Re: [ptp-dev] PBS remote resource manager fails on NERSC's Hopper

Thank you, Greg. Wyatt, it is definitely worth a look in your .eclipsesettings directory. The README file there is very clear on how things are organized and, as Greg indicates, the rms subdirectory contains a directory for each resource manager type. For the Cray, you will likely need to modify da_nodes_info_LML.pl to replace pbsnodes with the apstat command, as Roland suggests. Somewhere there is also likely some parsing code that takes the output of pbsnodes and puts it into the XML format expected downstream of the nodes information command. That is probably the most complicated part of the monitoring package.
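
To make the parsing step concrete, here is a minimal sketch of turning pbsnodes-style text into XML. It is written in Python for illustration only (the actual monitoring scripts are Perl), the sample text is a simplified version of the layout Torque's pbsnodes typically prints, and the element names "nodes", "node" and "attr" are hypothetical placeholders, not the XML schema the client really expects.

```python
import xml.etree.ElementTree as ET


def parse_pbsnodes(text):
    """Parse pbsnodes-style text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None            # a blank line ends a node record
        elif not line[0].isspace():
            current = line.strip()    # an unindented line starts a new node
            nodes[current] = {}
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def nodes_to_xml(nodes):
    """Serialize parsed nodes using hypothetical element names (not the real schema)."""
    root = ET.Element("nodes")
    for name, attrs in nodes.items():
        node = ET.SubElement(root, "node", name=name)
        for key, value in attrs.items():
            ET.SubElement(node, "attr", key=key, value=value)
    return ET.tostring(root, encoding="unicode")


# Simplified, made-up sample in the usual "name, then indented key = value" layout.
SAMPLE = """\
node01
     state = free
     np = 8

node02
     state = job-exclusive
     np = 8
"""

if __name__ == "__main__":
    print(nodes_to_xml(parse_pbsnodes(SAMPLE)))
```

Supporting apstat would presumably mean writing a second parser for its output while keeping the XML-producing half unchanged, which is why the two steps are kept as separate functions here.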

 

I’d suggest modifying just enough of the PBS support to make it work on the Cray. It may be worth keeping a connection to a Torque or PBS Pro cluster open while you are working on Hopper, so you can see how best to modify the code.

 

 

 

From: ptp-dev-bounces@xxxxxxxxxxx [mailto:ptp-dev-bounces@xxxxxxxxxxx] On Behalf Of Greg Watson
Sent: Tuesday, October 04, 2011 12:17 PM
To: Parallel Tools Platform general developers
Subject: Re: [ptp-dev] PBS remote resource manager fails on NERSC's Hopper

 

Jay,

 

Monitoring is done by Perl scripts that run out of .eclipsesettings on the target machine. Take a look at the rms directory for examples of existing RMs. Yes, these are packaged in a plugin, but you can modify the scripts on the target machine to fix problems or add new RM support. When you have something working, we can add it to the plugin. I'd like to add a mechanism to specify these scripts via the XML configuration, but that's on the TODO list.

 

Greg

 

On Oct 4, 2011, at 5:33 PM, Jay Alameda wrote:



Then we need to provide a modified lml.da, right? (I'm trying to figure out where this is plumbed in; I believe it is packaged in a plugin.) I don’t think there is a place yet for configuring this in the resource manager XML (really the control XML), but I’m not certain.

 

Jay

 

 

From: ptp-dev-bounces@xxxxxxxxxxx [mailto:ptp-dev-bounces@xxxxxxxxxxx] On Behalf Of Roland Schulz
Sent: Tuesday, October 04, 2011 11:31 AM
To: Parallel Tools Platform general developers
Subject: Re: [ptp-dev] PBS remote resource manager fails on NERSC's Hopper

 

On Cray the pbsnodes command never gives useful information. On Jaguar it only shows the batch nodes. 

"apstat -v -n" gives the required information instead.
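
One way a node-information script could cope with both kinds of machine is to probe pbsnodes first and fall back to apstat when the result is unusable. The sketch below is only an illustration in Python, not how the existing Perl scripts work: the function names are made up, the "-a" flag to pbsnodes is an assumption, and the "node list is empty" check is based solely on the error text Wyatt reported from Hopper.

```python
import subprocess


def pbsnodes_is_usable(returncode, stdout, stderr):
    """Heuristic: pbsnodes succeeded and did not report an empty node list."""
    if returncode != 0:
        return False
    if "node list is empty" in (stdout + stderr):
        return False
    return bool(stdout.strip())


def node_info_command(returncode, stdout, stderr):
    """Return the argv of the command whose output should be parsed."""
    if pbsnodes_is_usable(returncode, stdout, stderr):
        return ["pbsnodes", "-a"]
    return ["apstat", "-v", "-n"]


def probe():
    """Run pbsnodes on the target machine and pick a command from the result."""
    try:
        result = subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True)
    except FileNotFoundError:
        # pbsnodes is not installed at all; go straight to apstat.
        return ["apstat", "-v", "-n"]
    return node_info_command(result.returncode, result.stdout, result.stderr)
```

This heuristic would not catch the Jaguar case, where pbsnodes succeeds but lists only the batch nodes; there an explicit per-machine setting is probably needed.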

 

Roland

On Tue, Oct 4, 2011 at 12:08 PM, Wyatt Spear <wspear@xxxxxxxxxxxxxx> wrote:

On Hopper pbsnodes returns: 

pbsnodes: Server has no node list MSG=node list is empty - check 'server_priv/nodes' file

 

I'll ask the NERSC-ies about this...

 

Thanks,

Wyatt

 

On Tue, Oct 4, 2011 at 8:48 AM, Jay Alameda <jalameda@xxxxxxxxxxxxxxxxx> wrote:

Well, I’m slowly coming up the steep learning curve on the configurable RM. I know that the monitoring code, lml.da, looks at pbsnodes output on machines that support PBS. Maybe start on Hopper and see what pbsnodes returns?

 

There could also be a missing Perl module that is needed to convert the raw output of pbsnodes into the XML that the client monitoring code expects to see. We saw this on one system here at NCSA.

 

Jay

 

 

From: ptp-dev-bounces@xxxxxxxxxxx [mailto:ptp-dev-bounces@xxxxxxxxxxx] On Behalf Of Wyatt Spear
Sent: Tuesday, October 04, 2011 10:31 AM
To: Parallel Tools Platform general developers
Subject: Re: [ptp-dev] PBS remote resource manager fails on NERSC's Hopper

 

I would be happy to dig around and check.  Is there a spot where I can do a sysout on what PBS is returning?

Thanks,

Wyatt

On Tue, Oct 4, 2011 at 7:51 AM, Jay Alameda <jalameda@xxxxxxxxxxxxxxxxx> wrote:

I think we tried this out on an older Cray system, Kraken, at NICS.  I
seem to recall that it worked, including the LML display.  I wonder what
may be different here?

Jay


-----Original Message-----
From: ptp-dev-bounces@xxxxxxxxxxx [mailto:ptp-dev-bounces@xxxxxxxxxxx] On
Behalf Of Greg Watson
Sent: Tuesday, October 04, 2011 6:37 AM
To: Parallel Tools Platform general developers
Subject: Re: [ptp-dev] PBS remote resource manager fails on NERSC's Hopper


Maybe open a bug?

Greg

On Oct 3, 2011, at 11:25 PM, Wyatt Spear wrote:

> Greetings,
>
> I would like to be able to test/demonstrate the remote tools on the
> Hopper system at NERSC.  I can connect, but once it tries to populate the
> system monitor view it errors out with:
> cvc-complex-type.2.4.b: The content of element 'scheme' is not complete.
> One of '{el1}' is expected.
>
> I'm guessing the version of PBS they have over there has a different
> sort of interface.  Is there any way I can help get Hopper supported by an
> Eclipse resource manager?
>
> Thanks,
> Wyatt
> _______________________________________________
> ptp-dev mailing list
> ptp-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/ptp-dev
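
The validation error quoted above says that a 'scheme' element in the generated XML has no 'el1' child, which is consistent with the node list coming back empty. As a rough diagnostic, assuming the XML produced on the target machine can be captured as a string, a check along these lines would confirm that before the client tries to validate it. This is a Python sketch with made-up function names; only the element names 'scheme' and 'el1' come from the error message.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def incomplete_schemes(xml_text):
    """Count <scheme> elements that have no <el1> child."""
    root = ET.fromstring(xml_text)
    count = 0
    for elem in root.iter():
        if local_name(elem.tag) != "scheme":
            continue
        if not any(local_name(child.tag) == "el1" for child in elem):
            count += 1
    return count
```

A non-zero result would point at the node-information step on the target machine rather than at the client.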


 



 



 

--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309


 

