Re: [ptp-dev] Re: Question about bug 292049

Dave,

Ok, great.

Would you mind opening a bug with this information in it? That way I can keep track of it.

Thanks,
Greg

On Nov 6, 2009, at 9:46 AM, Dave Wootton wrote:

Greg
I just committed the routing file changes for the PE proxy, so the
debugger should work again. I modeled the port number generation logic
after what you had in SDMDebugger.java.

While I was fixing this, I saw the same "connect: Invalid argument"
problem we were looking at last month, this time with just two MPI tasks.
I think I know what is going on. This is a timing problem caused by
leaving an old routing file hanging around after the debugger exits.

In the PE proxy model, the child SDMs start as the PE application. I think
that if the routing file doesn't exist, you have logic where they spin
until the routing file appears and the master SDM starts. If there's no
routing file, then the debugger starts correctly. If there's an old
routing file hanging around, then the child SDMs read it and get bad port
numbers, resulting in the connect failure.

I was reliably able to start the SDM debugger if I deleted the routing
file before I started the debugger. If I did not delete the old routing
file first, I reliably got either a "connect: Invalid argument" failure
or a child SDM exiting with rc -1.

I think the solution is to delete the routing file once the master SDM
has initialized. Note that this does not fix the case where somebody
starts two debug sessions in the same working directory, since the
second debug instance will likely trip over the first session's routing
file. This case is unlikely, but using a unique filename for each routing
file could fix that.
Dave
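
[Editor's sketch] The two fixes Dave suggests (remove the routing file once the master SDM has initialized, and give each debug session its own file name) might look roughly like this. It is a Python illustration only, not the PTP code, which is Java and C. The function names and the `routing_file.<session>` naming scheme are assumptions made for the example.

```python
import os
import tempfile
import uuid


def routing_file_path(work_dir, session_id=None):
    """Return a per-session routing file path (hypothetical naming scheme)."""
    if session_id is None:
        # A random identifier keeps two sessions in one directory apart.
        session_id = uuid.uuid4().hex
    return os.path.join(work_dir, "routing_file." + session_id)


def cleanup_routing_file(path):
    """Delete the routing file once the master SDM has initialized.

    Returns True if a file was removed, False if it was already gone.
    """
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    work_dir = tempfile.mkdtemp()
    path = routing_file_path(work_dir)
    with open(path, "w") as f:
        f.write("0 node0 50001\n")
    # ... master SDM initializes and the child SDMs connect here ...
    cleanup_routing_file(path)
```

With a unique name per session, the child SDMs would also need to be told which file to read; how that name is passed to them is not covered here.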



From:
Greg Watson <g.watson@xxxxxxxxxxxx>
To:
Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Date:
11/05/2009 02:07 PM
Subject:
Re: [ptp-dev] Re: Question about bug 292049
Sent by:
ptp-dev-bounces@xxxxxxxxxxx



The third number is a TCP/IP port number that each process listens on
for an incoming connection. The number should be unique for each node
(so if two processes are on the same node, their port numbers will be
different). It looks like the debugger currently generates a pseudo-
random number between 50000 and 60000. It doesn't matter if the port
number is being used by another process as the servers have an
internal algorithm to deal with that.
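
[Editor's sketch] The port rule described above (a pseudo-random port in the 50000 to 60000 range, never repeated on the same node) can be illustrated as follows. This is a Python sketch under assumptions, not the logic in SDMDebugger.java: the function name `assign_ports`, the half-open range, and the retry loop are choices made for the example.

```python
import random


def assign_ports(hosts, low=50000, high=60000, rng=None):
    """Assign each task a pseudo-random port that is unique on its host.

    hosts[i] is the hostname that task i runs on. Ports may repeat across
    different hosts, but never on the same host. Ports are drawn from the
    half-open range [low, high).
    """
    if rng is None:
        rng = random.Random()
    used = {}  # hostname -> set of ports already handed out on that host
    ports = []
    for host in hosts:
        taken = used.setdefault(host, set())
        if len(taken) >= high - low:
            raise ValueError("no free ports left in range for " + host)
        port = rng.randrange(low, high)
        while port in taken:
            # Collision on this host: draw again.
            port = rng.randrange(low, high)
        taken.add(port)
        ports.append(port)
    return ports
```

The sketch does not check whether a port is already in use by an unrelated process, which matches the note above that the servers handle that case themselves.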

I've already changed the java code, so as soon as you change the PE
RM, the debugger will be working again :-).

Greg

On Nov 5, 2009, at 1:21 PM, Dave Wootton wrote:

Ok, I will try to get this done in the next few days. Two questions:
1) What should I be using as the third token in each line? I suspect
'7777' was some scaffolding code I had and that I need a real value to
put there.
2) How should we coordinate the update of SDMDebugger.java?

Dave



From:
Greg Watson <g.watson@xxxxxxxxxxxx>
To:
Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
Date:
11/05/2009 12:51 PM
Subject:
Re: [ptp-dev] Re: Question about bug 292049
Sent by:
ptp-dev-bounces@xxxxxxxxxxx



Dave,

The debugger uses the working dir also, so that looks correct. I'd
suggest checking it's passed and if not just use the current dir.
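
[Editor's sketch] The fallback Greg suggests (use the working directory if it was passed, otherwise the current directory) is small enough to show in a few lines. This is Python for illustration; the attribute key `"workingDir"` is a placeholder, not the real value behind the proxy's working-directory attribute.

```python
import os

# Placeholder key; the real attribute name used by the proxy is not shown
# in this thread.
JOB_WORKING_DIR_ATTR = "workingDir"


def resolve_working_dir(attrs):
    """Return the job working directory, or the current directory if unset.

    `attrs` is a dict of launch attributes received with the request.
    """
    work_dir = attrs.get(JOB_WORKING_DIR_ATTR)
    if not work_dir:
        # Attribute missing or empty: fall back to the current directory.
        return os.getcwd()
    return work_dir
```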

Greg

On Nov 5, 2009, at 11:25 AM, Dave Wootton wrote:

Greg
I have code in the proxy already, ifdefed out for now, that is supposed
to generate the routing file. The code generates the routing file after
the attach.cfg file is read. The code I have now writes one line per task
with task index, hostname and the string '7777' (I don't remember what
7777 is for). This is only a few lines of code so I should be able to
make the change fairly quickly.

The questions I have are what directory do I need to create this in, and
how is that directory name passed to the proxy? Currently I think my code
is picking it up from the PTP_JOB_WORKING_DIR_ATTR passed in the target
program invocation request but I'm not sure if that's the right value or
if I can count on that always being passed.
Dave
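
[Editor's sketch] The routing file layout Dave describes (one line per task: task index, hostname, and a third token, which elsewhere in the thread turns out to be a port number) could be written like this. It is a Python illustration, not the proxy's C code. The file name `routing_file`, the space separator, and the function name are assumptions; the real file may carry additional content not described in this thread.

```python
import os
import tempfile


def write_routing_file(work_dir, tasks, name="routing_file"):
    """Write one line per task: task index, hostname, port.

    `tasks` is a list of (hostname, port) pairs; the task index is the
    position in the list. Returns the path of the file written.
    """
    path = os.path.join(work_dir, name)
    with open(path, "w") as f:
        for index, (host, port) in enumerate(tasks):
            f.write("%d %s %d\n" % (index, host, port))
    return path


if __name__ == "__main__":
    work_dir = tempfile.mkdtemp()
    write_routing_file(work_dir, [("node0", 50001), ("node1", 50002)])
```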



From:
Greg Watson <g.watson@xxxxxxxxxxxx>
To:
JiangJie <yangtzj@xxxxxxxxxxx>
Cc:
ptp-dev@xxxxxxxxxxx
Date:
11/05/2009 09:51 AM
Subject:
[ptp-dev] Re: Question about bug 292049
Sent by:
ptp-dev-bounces@xxxxxxxxxxx



Jie,

Yes, this should really be inside the 'if', but it was moved because the
PE RM does not currently generate a routing file.

Dave, would it be possible to add this to the PE RM? Would it help if I
provided some support functions in the utils package?

Greg

On Nov 5, 2009, at 9:37 AM, JiangJie wrote:

Hi Greg,

I'm almost done with the new patch.
But during the test process, I found a problem that had been solved
before. In SDMDebugger.java, the writeRoutingFile() call has been moved
outside the following "if (fSdmRunner != null)" condition, which
eliminates the use of SLURMServiceProvider.needsDebuggerLaunchHelp().
Even if needsDebuggerLaunchHelp() returns false, the PTP debugger will
still try to write the routing file. As we have discussed, the SLURM
proxy can't provide enough information for the PTP debugger to generate
the routing file. Instead, it writes the routing file on its own.

So is it possible to move the call to writeRoutingFile() inside the "if"
condition? (There is a version of PTP where the call to
writeRoutingFile() IS inside the "if" condition in my cvs update. When
did this change happen?)

Regards,
Jie
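
[Editor's sketch] The control flow Jie is asking for, with the routing file written only when the debugger itself launches the SDM, is sketched below in Python. This is not SDMDebugger.java: `launch_debugger`, the callable `write_routing_file`, and the runner's `start()` method are hypothetical stand-ins, and the link between a null runner and needsDebuggerLaunchHelp() returning false is inferred from the message above.

```python
def launch_debugger(sdm_runner, write_routing_file):
    """Write the routing file only when the debugger starts the SDM itself.

    `sdm_runner` is None when the resource manager does not need launch
    help (the SLURM case in this thread). Returns True if the debugger
    wrote the routing file and started the runner.
    """
    if sdm_runner is not None:
        write_routing_file()
        sdm_runner.start()
        return True
    # Otherwise the resource manager's proxy is expected to write the
    # routing file on its own, so the debugger leaves it alone.
    return False
```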

Subject: Re: Question about bug 292049
From: g.watson@xxxxxxxxxxxx
Date: Tue, 3 Nov 2009 10:04:59 -0500
CC: ptp-dev@xxxxxxxxxxx
To: yangtzj@xxxxxxxxxxx

Hi Jie,

The changes to the views look fine.

To fix the slurm.h problem, I've modified the proxy code to add "PTP_" to
the beginning of all proxy*.h constants. Please update the slurm C code
to use the new names and hopefully this should resolve the problem.

Regards,
Greg


_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev


