
Re: [ptp-dev] Questions about PTP SDM debugger

Dave,

On Aug 22, 2008, at 10:44 PM, Dave Wootton wrote:


I have a first attempt at changes to my PE proxy to allow a PE application to be debugged using the SDM debugger, and have some questions:
1) Around line 188 of SDMDebugger.java, I see code that sets up the --numnodes parameter to sdm with the number of processes + 1. Right now, my code isn't setting up the right job attributes to satisfy JobAttributes.getNumberOfProcessesAttributeDefinition, so I was getting a null pointer exception. I temporarily hard-coded --numnodes as 2 to get around that.
Is there an assumption that there is only one application task per node when debugging, or is this really the number of application tasks + 1, regardless of the number of tasks per node?
For PE, the user can run as many tasks per node as he likes, as long as system resources are available. If the user specifies a hostlist, I could probably figure out the number of nodes used by the application by looking at the hostlist before starting the debugger. If the user is using LoadLeveler to allocate nodes, then I have no idea how many nodes, or even which nodes, the application will run on, since LoadLeveler doesn't get control to handle node allocation until some time after the job is submitted.

You're right, this parameter should be --numprocs rather than --numnodes. I've changed it now. I've also changed it so that you specify the number of processes being debugged rather than that number + 1, since I think this makes more sense.


2) I think I have the argument list to sdm set up properly: argv[0] is the sdm executable name (sdm), the next elements of argv are whatever is passed as debugArgs, then the pathname of the application executable, and finally a NULL (to satisfy execve).
When I try to invoke the debugger, I see the sdm process show up for a few seconds, then it exits. If I'm quick enough, I can attach to sdm with gdb and let it run to completion. It looks like sdm just runs for a few seconds, then exits with an exit(1) call somewhere in main.
Is there any way I can turn on some debug output to see what is going wrong with SDM?

--debug will enable debug output. --debug=level will enable selective debug output. See config.h for the levels.
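The argument layout described in question 2, with the --debug flag added, can be sketched as follows. This is a minimal illustration, not the PE proxy's or SDM's code: with_debug and build_sdm_argv are hypothetical helpers, and the "--numprocs=4" spelling of the debugArgs entry is an assumption.

```python
def with_debug(debug_args, level=None):
    """Return a copy of debug_args with SDM debug output enabled.

    Appends "--debug", or "--debug=<level>" when a level is given (the
    valid levels are defined in SDM's config.h). If a --debug flag is
    already present, the arguments are returned unchanged.
    """
    if any(a == "--debug" or a.startswith("--debug=") for a in debug_args):
        return list(debug_args)
    flag = "--debug" if level is None else f"--debug={level}"
    return [*debug_args, flag]


def build_sdm_argv(debug_args, app_path, sdm_name="sdm"):
    """Lay out sdm's argument vector: argv[0] is the sdm executable name,
    followed by the debugArgs, followed by the application executable path.

    A C caller would also append a NULL entry for execve(); Python's
    os.exec*() functions build the NULL-terminated array internally.
    """
    return [sdm_name, *debug_args, app_path]


# "--numprocs=4" is a hypothetical spelling of the debugArgs entry.
argv = build_sdm_argv(with_debug(["--numprocs=4"]), "/path/to/app")
print(argv)  # ['sdm', '--numprocs=4', '--debug', '/path/to/app']
# In the proxy's forked child, os.execvp(argv[0], argv) would replace that process with sdm.
```

Keeping the debug flag separate from the argv layout means the same launch path works with or without debug output; only the debugArgs change.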

Note that the new debugger must be started in two steps. The first step is to run a master sdm on the head node. The second step is to start the server sdm's on the nodes using mpirun (or poe). All the sdm's will wait until they find a routing file formatted as:

numprocs
index address port
...

where numprocs is the number of processes being debugged, index is the rank of the process, address is the host address (node name) where the process is running, and port is a random port number (I'm not sure that this is used).

It's likely that the routing file format will change in the future.
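A routing file in the format above could be produced roughly as follows. This is a sketch under assumptions: the helper names, the port range, and writing through a temporary file plus rename are illustrative choices, not SDM's own behavior. The rename is there because the sdm's poll for the file, and os.replace within one filesystem means a waiting sdm sees either no file or the complete file.

```python
import os
import random
import tempfile


def format_routing_file(hosts, port_range=(49152, 65535)):
    """Render routing-file text: line 1 is numprocs, then one
    "index address port" line per process.

    hosts[i] is the node name where rank i runs; a node name may repeat
    when several tasks share a node. The port is random; the port_range
    default is an assumption, not something SDM specifies.
    """
    lines = [str(len(hosts))]
    for index, address in enumerate(hosts):
        port = random.randint(*port_range)
        lines.append(f"{index} {address} {port}")
    return "\n".join(lines) + "\n"


def parse_routing_file(text):
    """Parse routing-file text into (numprocs, [(index, address, port), ...])."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    numprocs = int(rows[0][0])
    entries = [(int(i), addr, int(port)) for i, addr, port in rows[1:]]
    if len(entries) != numprocs:
        raise ValueError(f"expected {numprocs} entries, found {len(entries)}")
    return numprocs, entries


def write_routing_file(path, hosts):
    """Write the routing file via a temporary file in the same directory,
    then rename it over `path`, so a waiting sdm never reads it half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(format_routing_file(hosts))
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave a stray temp file behind
        raise
```

For example, write_routing_file(path, ["node1", "node1", "node2"]) describes three ranks, two of them on node1, which matches PE's ability to run several tasks per node. The path is left to the caller because the file name sdm looks for isn't given here.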


3) I am intercepting stdout and stderr for the sdm process so that they can either be sent to a console or redirected to a file. In either case, I'm not seeing anything from SDM. If I redirect stdio for sdm, will that cause problems?

The only thing on stdout/stderr should be debugging output, I think.
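Since stdout and stderr should carry only debug output, redirecting them is a matter of handing the child a file descriptor. A minimal sketch, assuming both streams go to one log file; launch_with_log is a hypothetical helper, not part of the proxy or SDM.

```python
import subprocess


def launch_with_log(argv, log_path):
    """Start argv with its stdout and stderr both appended to log_path.

    The parent closes its copy of the log file once Popen returns; the
    child keeps its own duplicated descriptors. Returns the Popen handle.
    """
    with open(log_path, "ab") as log:
        return subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT)
```

To forward the output to a console instead, pass stdout=subprocess.PIPE and read from the returned handle. Either way, note that when a C program's stdout is a pipe or file rather than a terminal, stdio buffers it in blocks, so debug output may appear only when the buffer fills or the process exits normally (a process killed by a signal can lose it); stderr is not fully buffered.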


Dave

_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev

