Jie,
Do you have any thoughts on this? I see two options:
a) create a separate RM called SLURM-MPI (or something) that assumes the job would always be an MPI job and adds some extra fields to the launch configuration; or
b) modify the SLURM RM launch configuration to be more like the PBS one and allow the user to select the MPI command.
This doesn't need to be BG-specific; it could support any system that has MPI installed. Is this something your site might be interested in?
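To make option (b) a little more concrete, here is a rough sketch of a launch configuration with an optional MPI command. It is written in Python purely for illustration and is not PTP code: `LaunchConfig`, `job_command`, and all field names are hypothetical, and the `-np` flag is taken from Simon's description of mpirun below.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LaunchConfig:
    """Hypothetical launch configuration; none of these names come from PTP."""
    program: str
    num_tasks: int
    program_args: List[str] = field(default_factory=list)
    # None means "let SLURM start the tasks itself", as on non-BlueGene systems.
    mpi_command: Optional[str] = None
    # Extra launcher options chosen by the user; passed through unchanged.
    mpi_options: List[str] = field(default_factory=list)


def job_command(cfg: LaunchConfig) -> List[str]:
    """Return the command the resource manager would hand to SLURM."""
    if cfg.mpi_command is None:
        # Plain SLURM launch: the program is the job executable.
        return [cfg.program, *cfg.program_args]
    # MPI launch: the launcher becomes the executable and the user
    # program becomes one of its arguments.
    return [cfg.mpi_command, "-np", str(cfg.num_tasks),
            *cfg.mpi_options, cfg.program, *cfg.program_args]
```

With `mpi_command` left unset the behaviour is the same as today, so existing SLURM configurations would not need to change under this assumption.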
Regards,
Greg

On Jun 28, 2010, at 12:34 AM, Simon Wail wrote:

I'm using the latest version of Eclipse
(Helios) with PTP 4.0. I've been able to configure a resource manager
talking to SLURM, that is simulating an IBM BlueGene system. The
issue I have is it seems SLURM on the BlueGene is different than on other
systems. On other systems when you provide SLURM with the number
of nodes and/or tasks to run on, it automatically executes your MPI program
on the number of nodes specified. For a BlueGene it is different
as you need to execute the "mpirun" command with your MPI program
as an argument, as well as specify the number of nodes as the "-np"
argument to mpirun. Now of course the SLURM implementation for PTP
does not do this.
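As a minimal sketch of the difference described above (in Python for illustration only, not taken from the PTP-SLURM proxy), the BlueGene launch amounts to wrapping the user's program in an mpirun command line. The function name `bluegene_argv` and its parameters are hypothetical; only "mpirun" and "-np" come from the description above, and any further options would depend on the site's mpirun.

```python
from typing import List, Optional, Sequence


def bluegene_argv(program: str,
                  program_args: Sequence[str],
                  num_procs: int,
                  mpirun: str = "mpirun",
                  extra_opts: Optional[Sequence[str]] = None) -> List[str]:
    """Build an argument vector that runs `program` under mpirun.

    Illustrative only: the option layout is an assumption based on the
    description in this message, not on a particular mpirun version.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    argv = [mpirun, "-np", str(num_procs)]
    # Site-specific mpirun options are passed through untouched.
    argv.extend(extra_opts or [])
    # The user's program and its own arguments come last.
    argv.append(program)
    argv.extend(program_args)
    return argv


if __name__ == "__main__":
    print(" ".join(bluegene_argv("./my_mpi_app", ["input.dat"], 64)))
```

On a non-BlueGene system the job executable would simply be the program itself, with SLURM taking care of starting it on the requested nodes.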
So the question is how do we get the SLURM resource manager to work for
BlueGene? I've looked at the PTP-SLURM proxy code and it seems one
way would be to change the executable for the job to always be "mpirun"
and the user specified program is added as an argument. This seems
a bit of a hack, and it doesn't account for other options to mpirun needed
for the BlueGene, such as "mode" and processor mapping. I
suppose a better method would be to change/extend the existing SLURM resource
manager configuration in the UI to allow the specification of the mpirun
command if you're using a BlueGene - like the PBS configuration allows
the user to select the MPI command. Alternatively we could create
a new SLURM resource manager specifically for BlueGene but probably reusing
lots of the existing SLURM code. Either approach is probably a lot
more work and would need some proper design specs. Is there currently
any plan to add BlueGene support to the SLURM RM, or has anyone tried it?
Am I on my own here, or does this sound like a reasonable enhancement
to PTP and maybe something we can work on together?
Your feedback is much appreciated.
Thanks,
Simon Wail, Ph.D
HPC Specialist
IBM Research Collaboratory for Life Sciences - Melbourne
phone: +61 3 9035-4341
fax: +61 3 8344-9130
address: VLSCI, Gnd Floor, 187 Grattan St
         Carlton VIC 3010 Australia
email: simon.wail@xxxxxxxxxxx
_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev