
Re: [ptp-dev] Re: About PTP's support for resource manager

As an explanation of where we are: I am in the middle of a major rewrite and restructuring of the runtime subsystem, prompted by changes to ORTE's behavior that we weren't expecting. We had outlined our requirements and reached an agreement on them, but they have subsequently changed ORTE's behavior, so we're adjusting accordingly. Most of my changes are done, but there are some bugs to be ironed out.

-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@xxxxxxxx
---------------------------------------------------------------------



Randy M. Roberts wrote:
Jie Jiang,

What you described, that job submission and control can be effected
entirely through the RM system, is what we are designing.  We are
not yet at the stage of having an API specified for someone
to hook up their own RM, but we hope to have one soon -- it is my current
task.  In fact, SLURM is on our short list as the target for our first
implementation, with LSF also under consideration.

About ORTE's future, we envision that even ORTE jobs will be handled
through the RM interface, an RM specifically for submitting,
monitoring, and controlling ORTE jobs.

The overall architecture will include the ability to instantiate
several RMs within an Eclipse session.  Each one could be of a
different type, e.g. SLURM or LSF, and each may be running on a different
remote site.  A "currently selected" RM would designate the RM through
which you submit, monitor, and control jobs.
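
To make the idea concrete, here is a minimal sketch of how several RMs and a "currently selected" one might fit together. This is not the PTP API (none has been specified yet); the names ResourceManager, InMemoryResourceManager, and ResourceManagerRegistry are hypothetical, and the backend only records jobs in memory rather than talking to SLURM, LSF, or ORTE.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface for illustration only; not the PTP RM API.
interface ResourceManager {
    String submit(String command);   // returns an id for the new job
    String status(String jobId);     // monitor
    void cancel(String jobId);       // control
}

// Stand-in backend that keeps job state in memory.
// A real implementation would delegate to SLURM, LSF, or ORTE.
class InMemoryResourceManager implements ResourceManager {
    private final String type;
    private final Map<String, String> jobs = new LinkedHashMap<>();
    private int nextId = 1;

    InMemoryResourceManager(String type) {
        this.type = type;
    }

    public String submit(String command) {
        String id = type + "-" + nextId++;
        jobs.put(id, "RUNNING");
        return id;
    }

    public String status(String jobId) {
        return jobs.getOrDefault(jobId, "UNKNOWN");
    }

    public void cancel(String jobId) {
        // Only jobs this RM knows about can be cancelled.
        jobs.computeIfPresent(jobId, (id, state) -> "CANCELLED");
    }
}

// Holds the RMs of one session and tracks the "currently selected" one.
class ResourceManagerRegistry {
    private final Map<String, ResourceManager> managers = new LinkedHashMap<>();
    private String selected;

    void register(String name, ResourceManager rm) {
        managers.put(name, rm);
        if (selected == null) {
            selected = name;  // the first registered RM is selected by default
        }
    }

    void select(String name) {
        if (!managers.containsKey(name)) {
            throw new IllegalArgumentException("Unknown RM: " + name);
        }
        selected = name;
    }

    ResourceManager current() {
        if (selected == null) {
            throw new IllegalStateException("No RM registered");
        }
        return managers.get(selected);
    }
}
```

With this shape, the UI layer would only ever call current() and would not need to know which kind of RM is behind it; an ORTE-backed RM would be just another implementation of the same interface.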

Aside: Are you volunteering to implement the SLURM RM?

Regards,
Randy

On Tue, 2006-08-01 at 17:33 +0800, jiangyangtz wrote:
Hi all,
Currently, the PTP runtime subsystem utilizes ORTE to launch, monitor, and control parallel jobs and processes, and also to obtain the status of computing nodes. However, there are other practical third-party resource management systems, such as SLURM, that can also launch processes and monitor process/node status. Their functionality is similar to ORTE's, but the implementations are different. Further, ORTE is not yet widely used, and some users utilize SLURM to manage the available computing resources.
It's really necessary to make PTP support other RM systems (like SLURM). Fortunately, the PTP team has recognized this problem and Randy has been working on this issue.
Here I'd like to know how RM support will be integrated into PTP. That is, can PTP's RM support subsystem completely substitute for ORTE for RM users? Since both can provide the same functions to the upper UI layer, does PTP's RM subsystem have a role equal to that of the current ORTE subsystem?
If so, a SLURM user could rely solely on the existing RM to run PTP, without relying on ORTE anymore.
Best regards,
Jie Jiang




