[ptp-dev] Re: Problem about SDM

Jie,

Thanks for the patch. I've applied it to 3.0 and HEAD. This appears to be a different problem from the one I fixed a while ago that was preventing debugging jobs with 5 or more processes. However, your other problem does seem very much like this one (bug #297914). I'll do some more testing to see if I can reproduce the problem.

Greg

On Mar 22, 2010, at 12:14 PM, JiangJie wrote:

Greg,

I checked out ptp_3.0_branch and HEAD, and this problem still exists.

After enabling debug message output, I found that it is due to a TCP port conflict,
i.e., the port is already in use, which causes the bind() call to fail.

I have submitted a patch to the sdm source code in Bug 306733 to avoid the port conflict.
Please check it.
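
For readers following the thread, here is a minimal sketch of one common way to avoid this kind of bind() failure: try the preferred port, and if the OS reports that the address is already in use, move on to the next port. This is an assumption for illustration only, not the patch attached to Bug 306733 and not sdm code (sdm is written in C). The function name bind_first_free_port and the max_tries parameter are hypothetical.

```python
import errno
import socket


def bind_first_free_port(host, base_port, max_tries=16):
    """Bind a listening TCP socket to the first free port at or above base_port.

    Hypothetical helper for illustration; not taken from sdm.
    Returns (socket, port). Raises OSError if every candidate is in use.
    """
    # Never probe past the highest valid TCP port number.
    last = min(base_port + max_tries, 65536)
    for port in range(base_port, last):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError as exc:
            sock.close()
            # Only "address already in use" is worth retrying on another
            # port; any other failure is reported to the caller unchanged.
            if exc.errno != errno.EADDRINUSE:
                raise
            continue
        sock.listen(1)
        return sock, port
    raise OSError(errno.EADDRINUSE, "no free port in the probed range")
```

With a scheme like this, every peer still has to learn which port was actually chosen, so the selected port would need to be reported back to whatever sets up the connections.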

However, even with this bug fixed, there are still some problems when setting up
the debug session for a debug job with more than 8 processes.
So far, the debug session for 8 processes can be set up successfully.
When it comes to a debug job with more than 8 processes (9, 10, or 16),
every process can stop at the first line of the main() function.

But the registered process window can't list process 0's stack frames.
I checked the debug output of sdm 0, and it seems that it did not
receive the "-stack-list-frames" MI command. And if I set a breakpoint at debugListStackframes(),
this breakpoint is not hit.

Why is the "-stack-list-frames" mi command not sent?
Why doesn't this happen with smaller process counts (8 or fewer)?

Any idea?

Jie


From: g.watson@xxxxxxxxxxxx
To: yangtzj@xxxxxxxxxxx
Subject: Re: Problem about SDM
Date: Thu, 18 Mar 2010 17:04:22 +0000

Jie,

This was a bug that has been fixed in the 3.0 branch and head.  

Cheers,
Greg 

On Mar 18, 2010, at 2:34 PM, JiangJie <yangtzj@xxxxxxxxxxx> wrote:

Hi all,

Recently I have been testing the SLURM proxy based on SLURM-2.1.
Basically, non-debug jobs can be launched successfully.
However, when it comes to debug jobs, there seem to be some problems with sdm.
For example, when the job scale is 2 or 4, the debug session can be set up successfully.
But when the job scale goes to 8, some sdm servers on some compute nodes exit (with exit code=1)
immediately after the debug job is launched (as recorded in the slurmd log file).

Is there anyone here who has tested the debugger at a larger scale (e.g., >= 8 processes) with any RMs or runtime systems?

Greg, any idea?

Regards,
Jie


