Re: [ptp-dev] Re: Question about bug 292049

Greg,
I just committed the routing file changes for the PE proxy, so the debugger 
should work again. I modeled the port number generation logic after what 
you had in SDMDebugger.java.

While I was fixing this, I saw the same "connect: Invalid argument" problem 
we were looking at last month, this time with just two MPI tasks. I think 
I know what is going on. It is a kind of timing problem caused by an old 
routing file being left behind after the debugger exits. 

In the PE proxy model, the child SDMs start as the PE application. I think 
that if the routing file doesn't exist, you have logic where they spin 
until the routing file appears and the master SDM starts. If there's no 
routing file, then the debugger starts correctly. If there's an old 
routing file hanging around, then the child SDMs read it and get bad port 
numbers, resulting in the connect failure.
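
The failure mode described above can be sketched as follows. This is only an illustration, assuming the child simply polls for the file's existence; the class name RoutingFileWaitSketch, the method waitForRoutingFile and the 50 ms poll interval are hypothetical and are not taken from the SDM source. The point is that an existence check cannot tell a fresh routing file from a stale one, so a leftover file is accepted immediately.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of a child that spins until a routing file exists.
final class RoutingFileWaitSketch {

    // Polls until 'file' exists or the timeout expires, then returns its lines.
    // Note: a file left over from an earlier session satisfies the check at once,
    // so the caller may read port numbers that no process is listening on.
    static List<String> waitForRoutingFile(Path file, long timeoutMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!Files.exists(file)) {
            if (System.currentTimeMillis() >= deadline) {
                throw new IOException("routing file did not appear: " + file);
            }
            Thread.sleep(50); // assumed poll interval
        }
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }
}
```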

I was reliably able to start the SDM debugger if I deleted the routing 
file before I started the debugger. I was reliably able to get either a 
connect: invalid argument failure or a child SDM exiting with rc -1 if I 
did not delete the old routing file before starting the debugger.

I think the solution is to delete the routing file once the master SDM has 
initialized. Note that this does not fix the case where somebody starts two 
debug sessions in the same working directory, since the second debug 
instance will likely trip over the old routing file. This case is unlikely, 
but using a unique filename for each routing file could fix it.
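
A minimal sketch of the two ideas in this paragraph, written in Java only for illustration (the PE proxy itself is not Java). The class RoutingFileLifecycleSketch, its method names and the "routing_" prefix are all hypothetical. It assumes the children have already read the file by the time the master has initialized, and it does not cover how a unique filename would be passed to the child SDMs, which the thread leaves open.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: unique routing file per session, removed after startup.
final class RoutingFileLifecycleSketch {

    // Creates a uniquely named routing file in the working directory and
    // writes one line per task. The JDK picks the unique part of the name.
    static Path writeUnique(Path workDir, List<String> lines) throws IOException {
        Path file = Files.createTempFile(workDir, "routing_", ".txt");
        Files.write(file, lines, StandardCharsets.UTF_8);
        return file;
    }

    // Call once the master SDM has initialized, so that a later session
    // cannot pick up this session's ports. Returns true if a file was removed.
    static boolean removeAfterInit(Path routingFile) throws IOException {
        return Files.deleteIfExists(routingFile);
    }
}
```

With unique names, two sessions in the same working directory write to different files, and deleting the file after startup means a later session never sees a stale one.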
Dave



From: Greg Watson <g.watson@xxxxxxxxxxxx>
To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Date: 11/05/2009 02:07 PM
Subject: Re: [ptp-dev] Re: Question about bug 292049
Sent by: ptp-dev-bounces@xxxxxxxxxxx



The third number is a TCP/IP port number that each process listens on 
for an incoming connection. The number should be unique within each node 
(so if two processes are on the same node, their port numbers will be 
different). It looks like the debugger currently generates a 
pseudo-random number between 50000 and 60000. It doesn't matter if the 
port number is already in use by another process, as the servers have an 
internal algorithm to deal with that.
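
A rough sketch of the scheme described above, under stated assumptions. The class RoutingPortSketch and its methods are hypothetical and are not the code in SDMDebugger.java. The space-separated "index hostname port" line layout is assumed from the description elsewhere in this thread (task index, hostname, third token), and the upper bound of the range is treated as exclusive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of per-task port assignment for a routing file.
final class RoutingPortSketch {
    static final int MIN_PORT = 50000;
    static final int MAX_PORT = 60000; // exclusive in this sketch

    // hosts.get(i) is the hostname of task i. Returns one pseudo-random port
    // per task; tasks that share a hostname never share a port. Assumes far
    // fewer tasks per host than there are ports in the range.
    static int[] assignPorts(List<String> hosts, Random rng) {
        Map<String, Set<Integer>> usedByHost = new HashMap<>();
        int[] ports = new int[hosts.size()];
        for (int task = 0; task < hosts.size(); task++) {
            Set<Integer> used =
                usedByHost.computeIfAbsent(hosts.get(task), h -> new HashSet<>());
            int port;
            do {
                port = MIN_PORT + rng.nextInt(MAX_PORT - MIN_PORT);
            } while (!used.add(port)); // retry if this host already has the port
            ports[task] = port;
        }
        return ports;
    }

    // Formats one routing line per task as "index hostname port" (assumed layout).
    static List<String> routingLines(List<String> hosts, int[] ports) {
        List<String> lines = new ArrayList<>();
        for (int task = 0; task < hosts.size(); task++) {
            lines.add(task + " " + hosts.get(task) + " " + ports[task]);
        }
        return lines;
    }
}
```

The sketch only guarantees uniqueness among the tasks it assigns. A port already held by an unrelated process is left to the servers' own handling, as noted above.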

I've already changed the Java code, so as soon as you change the PE 
RM, the debugger will be working again :-).

Greg

On Nov 5, 2009, at 1:21 PM, Dave Wootton wrote:

> OK, I will try to get this done in the next few days. Two questions:
> 1) What should I be using as the third token in each line? I suspect
> '7777' was some scaffolding code I had and that I need a real value to
> put there.
> 2) How should we coordinate the update of SDMDebugger.java?
>
> Dave
>
>
>
> From: Greg Watson <g.watson@xxxxxxxxxxxx>
> To: Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
> Date: 11/05/2009 12:51 PM
> Subject: Re: [ptp-dev] Re: Question about bug 292049
> Sent by: ptp-dev-bounces@xxxxxxxxxxx
>
>
>
> Dave,
>
> The debugger uses the working dir also, so that looks correct. I'd
> suggest checking whether it's passed and, if not, just using the
> current dir.
>
> Greg
>
> On Nov 5, 2009, at 11:25 AM, Dave Wootton wrote:
>
>> Greg
>> I have code in the proxy already, #ifdef'ed out for now, that is
>> supposed to generate the routing file. The code generates the routing
>> file after the attach.cfg file is read. The code I have now writes one
>> line per task with task index, hostname and the string '7777' (I don't
>> remember what 7777 is for). This is only a few lines of code so I
>> should be able to make the change fairly quickly.
>>
>> The questions I have are what directory do I need to create this in,
>> and how is that directory name passed to the proxy? Currently I think
>> my code is picking it up from the PTP_JOB_WORKING_DIR_ATTR passed in
>> the target program invocation request, but I'm not sure if that's the
>> right value or if I can count on that always being passed.
>> Dave
>>
>>
>>
>> From: Greg Watson <g.watson@xxxxxxxxxxxx>
>> To: JiangJie <yangtzj@xxxxxxxxxxx>
>> Cc: ptp-dev@xxxxxxxxxxx
>> Date: 11/05/2009 09:51 AM
>> Subject: [ptp-dev] Re: Question about bug 292049
>> Sent by: ptp-dev-bounces@xxxxxxxxxxx
>>
>>
>>
>> Jie,
>>
>> Yes, this should really be inside the 'if', but it was moved because
>> the PE RM does not currently generate a routing file.
>>
>> Dave, would it be possible to add this to the PE RM? Would it help if
>> I provided some support functions in the utils package?
>>
>> Greg
>>
>> On Nov 5, 2009, at 9:37 AM, JiangJie wrote:
>>
>> Hi Greg,
>>
>> I'm almost done with the new patch.
>> But during testing, I found a problem that had been solved before.
>> In SDMDebugger.java, the call to writeRoutingFile() has been moved
>> outside the following "if (fSdmRunner != null)" condition, which
>> defeats the purpose of SLURMServiceProvider.needsDebuggerLaunchHelp().
>> Even if needsDebuggerLaunchHelp() returns false, the PTP debugger will
>> still try to write the routing file. As we have discussed, the SLURM
>> proxy can't provide enough information for the PTP debugger to
>> generate the routing file. Instead, it writes the routing file on its
>> own.
>>
>> So is it possible to move the call to writeRoutingFile() inside the
>> "if" condition? (There is a version of PTP in my cvs update where the
>> call to writeRoutingFile() IS inside the "if" condition. When did this
>> change happen?)
>>
>> Regards,
>> Jie
>>
>> Subject: Re: Question about bug 292049
>> From: g.watson@xxxxxxxxxxxx
>> Date: Tue, 3 Nov 2009 10:04:59 -0500
>> CC: ptp-dev@xxxxxxxxxxx
>> To: yangtzj@xxxxxxxxxxx
>>
>> Hi Jie,
>>
>> The changes to the views look fine.
>>
>> To fix the slurm.h problem, I've modified the proxy code to add "PTP_"
>> to the beginning of all proxy*.h constants. Please update the slurm C
>> code to use the new names and hopefully this should resolve the
>> problem.
>>
>> Regards,
>> Greg
>>
>>
>> _______________________________________________
>> ptp-dev mailing list
>> ptp-dev@xxxxxxxxxxx
>> https://dev.eclipse.org/mailman/listinfo/ptp-dev



