RE: [ptp-dev] Using PTP with SLURM on a BlueGene/P

Hi Simon,
 
Sorry for the delayed response.
It would be great to port the SLURM proxy to support BG machines.
 
Since I have no BG machine available, could you please give an example of
the command used to submit an MPI job on a BG machine with SLURM as the resource manager?
On a non-BG machine, it usually looks like this:
srun -n x -N x ./a.out
What is the equivalent on a BG machine?
 
Regards,
Jie
 

To: ptp-dev@xxxxxxxxxxx
From: simon.wail@xxxxxxxxxxx
Date: Mon, 28 Jun 2010 14:34:22 +1000
Subject: [ptp-dev] Using PTP with SLURM on a BlueGene/P

I'm using the latest version of Eclipse (Helios) with PTP 4.0.  I've been able to configure a resource manager talking to a SLURM installation that is simulating an IBM BlueGene system.  The issue I have is that SLURM on the BlueGene seems to behave differently than on other systems.  On other systems, when you give SLURM the number of nodes and/or tasks to run on, it automatically executes your MPI program on the number of nodes specified.  On a BlueGene you instead need to execute the "mpirun" command with your MPI program as an argument, and also specify the number of nodes as the "-np" argument to mpirun.  The SLURM implementation for PTP does not currently do this.

So the question is: how do we get the SLURM resource manager to work for BlueGene?  I've looked at the PTP-SLURM proxy code, and it seems one way would be to change the executable for the job to always be "mpirun", with the user-specified program added as an argument.  This seems a bit of a hack, and it doesn't account for other mpirun options needed on the BlueGene, such as "mode" and processor mapping.  I suppose a better method would be to change or extend the existing SLURM resource manager configuration in the UI to allow the mpirun command to be specified when you're using a BlueGene, much as the PBS configuration allows the user to select the MPI command.  Alternatively, we could create a new SLURM resource manager specifically for BlueGene, probably reusing a lot of the existing SLURM code.  Either approach is probably a lot more work and would need some proper design specs.

Is there currently any plan to add BlueGene support to the SLURM RM, or has anyone tried it?  Am I on my own here, or does this sound like a reasonable enhancement to PTP and maybe something we can work on together?
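To make the first option concrete, here is a minimal sketch of the command rewriting it implies: the job's executable becomes "mpirun" and the user's program is passed along as an argument. It is written in Python purely for illustration and is not the PTP-SLURM proxy code. The helper name wrap_with_mpirun is hypothetical, and the "-mode" and "-mapfile" flag names are assumptions standing in for the "mode" and processor mapping options mentioned above; only "-np" comes from the description itself.

```python
import shlex


def wrap_with_mpirun(program, program_args, ntasks, mode=None, mapfile=None):
    """Return an argv list that launches `program` through mpirun.

    Hypothetical helper for illustration only. The "-mode" and "-mapfile"
    flag names are assumptions and should be checked against the mpirun
    installed on the target system.
    """
    if ntasks < 1:
        raise ValueError("ntasks must be at least 1")

    # mpirun replaces the user's program as the job executable.
    argv = ["mpirun", "-np", str(ntasks)]

    # Optional BlueGene-specific settings (flag names assumed).
    if mode is not None:
        argv += ["-mode", mode]
    if mapfile is not None:
        argv += ["-mapfile", mapfile]

    # The user's program becomes an argument, followed by its own arguments.
    argv.append(program)
    argv.extend(program_args)
    return argv


if __name__ == "__main__":
    # Print the resulting command line as a single shell-quoted string.
    print(shlex.join(wrap_with_mpirun("./a.out", ["input.dat"], 64, mode="VN")))
```

Even in this small form, the extra options have to come from somewhere, which is why exposing them in the resource manager configuration looks like the cleaner route.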


Your feedback is much appreciated.

Thanks,
Simon Wail, Ph.D
HPC Specialist
IBM Research Collaboratory for Life Sciences - Melbourne


phone:
+61 3 9035-4341  fax: +61 3 8344-9130
address:
VLSCI, Gnd Floor, 187 Grattan St
Carlton   VIC   3010   Australia
email:
simon.wail@xxxxxxxxxxx



