Another thing I have to do for this setup is manually kill some processes on the node I start my job from before I can run another debug job.
Otherwise my job does not start under control of sdm, but terminates as soon as the sdm master and the sdm servers on the compute node(s) have connected:
this pid: 20130
waiting: 20129 20130
#PTP job_id=0
#launchMode=debug
launchCommand: salloc --time=01:00:00 --partition=gpu_short --ntasks=2 --ntasks-per-node=1 mpirun -np 2 -mca orte_show_resolved_nodenames 1 -display-map /nfs/home1/thomasge/.eclipsesettings/sdm --port=39704 --host=localhost --debugger=gdb-mi --debug=13 --routing_file=/nfs/home1/thomasge/source/testmpitype/routing_file
line: salloc: Granted job allocation 1151231
salloc: Granted job allocation 1151231
line: Data for JOB [32579,1] offset 0
Data for JOB [32579,1] offset 0
line:
line: ======================== JOB MAP ========================
found job map: ======================== JOB MAP ========================
found node gcn64, procs 1
found proc 0
found end of node map
found node gcn65, procs 1
found proc 1
found end of node map
found end of table
line: SDM: [server] effsize: 3, size: 2, rv: 0
SDM: [server] effsize: 3, size: 2, rv: 0
line: SDM: [server] Found routing file, size=2
SDM: [server] Found routing file, size=2
line: SDM: [1] size 3
SDM: [1] size 3
line: SDM: [1] sdm_route_get_route dest {0-1}, parent 2
SDM: [1] sdm_route_get_route dest {0-1}, parent 2
line: SDM: [server] effsize: 3, size: 2, rv: 0
SDM: [server] effsize: 3, size: 2, rv: 0
line: SDM: [1] nodeID: 0, hostname: gcn64, port: 59269
SDM: [1] nodeID: 0, hostname: gcn64, port: 59269
line: SDM: [1] nodeID: 1, hostname: gcn65, port: 55920
SDM: [1] nodeID: 1, hostname: gcn65, port: 55920
line: SDM: [server] effsize: 3, size: 2, rv: 0
SDM: [server] effsize: 3, size: 2, rv: 0
line: SDM: [server] Found routing file, size=2
SDM: [server] Found routing file, size=2
line: SDM: [0] size 3
SDM: [0] size 3
line: SDM: [0] sdm_route_get_route dest {0-1}, parent 2
SDM: [0] sdm_route_get_route dest {0-1}, parent 2
line: SDM: [server] effsize: 3, size: 2, rv: 0
SDM: [server] effsize: 3, size: 2, rv: 0
line: SDM: [0] nodeID: 0, hostname: gcn64, port: 59269
SDM: [0] nodeID: 0, hostname: gcn64, port: 59269
SDM: [master] effsize: 3, size: 2, rv: 0
SDM: [master] Found routing file, size=2
SDM: [2] size 3
SDM: [2] route for 0 is {}
SDM: [2] route for 1 is {}
SDM: [2] sdm_route_get_route dest {0-2}, parent 2
SDM: [master] effsize: 3, size: 2, rv: 0
SDM: [2] nodeID: 0, hostname: gcn64, port: 59269
SDM: [2] nodeID: 1, hostname: gcn65, port: 55920
line: SDM: [0] Initialization successful
SDM: [0] Initialization successful
line: SDM: starting task 0
SDM: starting task 0
SDM: [2] nodeID: 1, hostname: gcn65, port: 55920
SDM: [2] Initialization successful
SDM: starting client
SDM: DbgMasterInit num_svrs=2
SDM: DbgMasterCreateSession host=localhost port=39704
line: SDM: [1] Initialization successful
SDM: [1] Initialization successful
line: SDM: starting task 1
SDM: starting task 1
SDM: DbgMasterStartSession(testmpitype,/nfs/home1/thomasge/source/testmpitype,)
SDM: [2] sdm_route_get_route dest {0-1}, parent 2
line: SDM: [1] sdm_route_get_route dest {0}, parent 2
SDM: [1] sdm_route_get_route dest {0}, parent 2
line: SDM: [0] sdm_route_get_route dest {1}, parent 2
SDM: [0] sdm_route_get_route dest {1}, parent 2
line: SDM: [1] sdm_route_get_route dest {2}, parent 2
SDM: [1] sdm_route_get_route dest {2}, parent 2
line: SDM: [0] sdm_route_get_route dest {2}, parent 2
SDM: [0] sdm_route_get_route dest {2}, parent 2
SDM: dbg_master_cmd_completed src="">
SDM: DbgMasterSetFuncBreakpoint(2:03,0,1,0,,main,,0,0)
SDM: [2] sdm_route_get_route dest {0-1}, parent 2
line: SDM: [1] sdm_route_get_route dest {0}, parent 2
SDM: [1] sdm_route_get_route dest {0}, parent 2
line: SDM: [0] sdm_route_get_route dest {1}, parent 2
SDM: [0] sdm_route_get_route dest {1}, parent 2
line: SDM: [1] sdm_route_get_route dest {2}, parent 2
SDM: [1] sdm_route_get_route dest {2}, parent 2
line: SDM: [0] sdm_route_get_route dest {2}, parent 2
SDM: [0] sdm_route_get_route dest {2}, parent 2
SDM: dbg_master_cmd_completed src="">
SDM: DbgMasterQuit()
SDM: [2] sdm_route_get_route dest {0-1}, parent 2
SDM: shutdown completed
line: SDM: [1] sdm_route_get_route dest {0}, parent 2
SDM: [1] sdm_route_get_route dest {0}, parent 2
line: SDM: [0] sdm_route_get_route dest {1}, parent 2
SDM: [0] sdm_route_get_route dest {1}, parent 2
SDM: DbgMasterFinish
SDM: all finished
waiting: 20129 20130
line: SDM: [0] sdm_route_get_route dest {2}, parent 2
SDM: [0] sdm_route_get_route dest {2}, parent 2
line: SDM: all finished
SDM: all finished
line: SDM: [1] sdm_route_get_route dest {2}, parent 2
SDM: [1] sdm_route_get_route dest {2}, parent 2
line: SDM: all finished
SDM: all finished
line: salloc: Relinquishing job allocation 1151231
salloc: Relinquishing job allocation 1151231
line: salloc: Job allocation 1151231 has been revoked.
salloc: Job allocation 1151231 has been revoked.
exit
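
The manual cleanup mentioned at the top could be scripted roughly as follows. This is only a sketch under assumptions: it supposes the leftover processes are named `sdm` and belong to the current user, which the log above does not confirm, and the helper names `leftover_pids` and `kill_leftovers` are made up for illustration. It defaults to a dry run so nothing is terminated by accident.

```python
import os
import signal
import subprocess


def leftover_pids(ps_output, name="sdm", exclude=()):
    """Return PIDs from `ps -o pid= -o comm=` output whose command is `name`.

    The process name "sdm" is an assumption; adjust it to whatever
    actually stays behind on the launch node.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        pid, comm = int(parts[0]), parts[1].strip()
        if os.path.basename(comm) == name and pid not in exclude:
            pids.append(pid)
    return pids


def kill_leftovers(name="sdm", dry_run=True):
    """List, and with dry_run=False terminate, this user's processes named `name`."""
    out = subprocess.run(
        ["ps", "-u", str(os.getuid()), "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    pids = leftover_pids(out, name, exclude={os.getpid()})
    for pid in pids:
        if dry_run:
            print("would terminate", pid)
        else:
            os.kill(pid, signal.SIGTERM)  # polite termination, not SIGKILL
    return pids
```

Calling `kill_leftovers()` on the launch node only prints the candidate PIDs; `kill_leftovers(dry_run=False)` sends them SIGTERM. Matching on the exact command name rather than a substring keeps unrelated processes out of the list.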
Thanks,
Thomas