Re: [ptp-dev] PTP monthly conference call Dec 12

> First some questions:
>
> Were you planning on using OpenMPI or did you just grab it as a place
> to start?

I used OpenMPI as a starting point since I was able to get that running on
a small cluster and so I could use that as an example of how the proxy
talked to the PTP front end. I implemented a new proxy which has no
dependencies on OpenMPI or orte since my goal was to implement something
that did not use OpenMPI or orte for jobs run with PE.

>
> Are you primarily interested in job submission and monitoring?

My thinking is that OpenMPI represents the minimal implementation that we
need to be at least as good as, and hopefully better than. What I think
that means is that we need to be able to start PE jobs interactively, see
that the tasks get allocated on a set of nodes, and watch the jobs run to
completion. The user should be able to view the stdout/stderr output
directly from the jobs view in PTP and see the state of nodes in the
machines view. I think it would be nice to be able to disconnect from the
running job and connect again later, but there are some interesting
questions to be answered for that to work, as well as for managing the state
of nodes in the machines view. We need to be able to debug an interactive PE
job and need to be able to scale all of this to huge numbers of nodes
reliably.

The PE implementation I'm working on is strictly interactive and has no
concept of a resource manager or job scheduler. That, I think, falls within
the scope of a LoadLeveler implementation, which is a different piece of
work.

>
> How far have you gotten?

I think I have a fairly complete proxy at this point. I can submit a PE
job, watch the jobs view update through the submit/start/complete states, and
view the stdout/stderr from the jobs view. I can build an initial machines
view from the host file that PE uses to allocate tasks to nodes, but that
needs some design work. All of this appears to be working reliably, but
probably does not scale well.
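
To make the machines view idea concrete, here is a rough sketch of turning a host file into an initial node list. It is not the proxy code. It assumes one host name per line, that a host listed more than once gets one task slot per line, and that lines starting with "!" or "*" are comments; the Node class, its "unknown" state, and parse_host_file are names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One entry in a hypothetical machines view."""
    name: str
    task_slots: int = 0      # number of host-file lines naming this host
    state: str = "unknown"   # placeholder until real node state is known


def parse_host_file(text, comment_prefixes=("!", "*")):
    """Build an ordered node list from host-file text.

    Assumed format: one host name per line; blank lines and lines
    starting with a comment prefix are skipped.
    """
    nodes = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(comment_prefixes):
            continue
        host = line.split()[0]            # ignore anything after the host name
        node = nodes.setdefault(host, Node(host))
        node.task_slots += 1
    return list(nodes.values())           # dict keeps first-seen order
```

Keying the nodes by host name keeps the view at one row per machine even when the host file repeats a host. It says nothing about how node state would be kept current afterwards, which is the part that still needs design work.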

I also cloned the OpenMPI preference panels and made some changes to the
cloned panels to allow me to set the minimal settings I needed to run a PE
job. The preferences panels are pretty bare-bones right now. I need to
understand what the preferences panels should look like, and whether the
preferences panels are even the right place for these settings. I think
there is justification for some additional panels in the run configuration
dialog so we can have specific settings for each run configuration. There
are a lot of settings for PE, so those also need to be organized in some
meaningful way, and the most useful subset chosen.
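
One way to picture the split between preferences and run configurations is a two-level lookup, where a run configuration only stores the settings it overrides. This is a sketch of the idea only, not PTP code. MP_PROCS, MP_HOSTFILE and MP_LABELIO are PE environment variables, but the values shown, the choice of which settings to expose, and the effective_settings helper are assumptions for illustration.

```python
from collections import ChainMap

# Hypothetical workspace-wide defaults, as a preferences panel might hold them.
GLOBAL_PREFERENCES = {
    "MP_PROCS": "1",
    "MP_HOSTFILE": "host.list",
    "MP_LABELIO": "no",
}


def effective_settings(run_config_overrides, preferences=GLOBAL_PREFERENCES):
    """Merge per-run-configuration overrides on top of the global preferences.

    Keys in run_config_overrides win; everything else falls back to the
    preferences. Neither input dictionary is modified.
    """
    return dict(ChainMap(run_config_overrides, preferences))


# Example: a run configuration that only changes the task count.
settings = effective_settings({"MP_PROCS": "16"})
```

With this layout the preferences panel holds the defaults and each run configuration dialog shows only what differs, which might also help in deciding which subset of settings deserves a place in the UI.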

This was work that I started in mid November, using the 1.1 branch of PTP
code, and was mainly a learning exercise for me. If this is not consistent
with the direction PTP is going, I'll rework what's needed.
>
> Our current status:
>
> I've created a plugin that sounds very similar to what you are
> working on, see org.eclipse.ptp.lsf.proxy.  It doesn't do anything
> yet (contains only stubs) but I wanted to use it to work out the API
> necessary for supporting the LSF queueing system.  I was talking with
> Greg yesterday and we think we will probably start up the proxy from
> an ssh connection.  Randy Roberts is currently working on some new
> API to support remote job submission beyond OpenMPI.  We are also
> interested in supporting Moab.
>
> It would be great to work with you, first to get the design
> fleshed out, and then to get some code created.
>
> I'm planning on attending the CDT telecon today as HP is talking
> about remote projects.
>
> Cheers,
> Craig

Dave


