[ptp-dev] Some Changes I Need to Implement
I have a few changes I need to make to the base code, and with the
resource manager changes going in I think we need to work out
exactly how to do this in the "new system".
Firstly - I need to change the model (method, not model object :)) that
interfaces to the runtime subsystem so that, instead of the Eclipse side
asking for information about the runtime system (such as node status,
process status, job status), these updates are triggered by the runtime
subsystem when appropriate. An example would probably help.
Currently, when we start up we send a Startup() message to the runtime
system, and it starts. Then we ask the runtime system how many nodes it
knows about. It returns a number. Then, for each node in sequence, we
ask what it knows about that node (up/down, node name, general
attributes, etc). Similarly, when a job starts, we get a jobID back and
then we go back to the runtime system and ask it for information about
that job - including, in sequence, attributes related to each process.
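To make the cost of the current approach concrete, here is a minimal
sketch of the request sequence described above. PolledRuntime,
FakePolledRuntime, getNodeCount and the attribute values are made up for
illustration (only getNodeAttributes is a name used in the existing
code, and its signature here is assumed); this is not the actual PTP
interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical synchronous view of the runtime system; not the PTP interface. */
interface PolledRuntime {
    void startup();
    int getNodeCount();
    Map<String, String> getNodeAttributes(int nodeId);
}

/** In-memory stand-in so the sketch runs without a real runtime system. */
class FakePolledRuntime implements PolledRuntime {
    int requests = 0; // counts every round trip the Eclipse side makes

    @Override public void startup() { requests++; }

    @Override public int getNodeCount() { requests++; return 4; }

    @Override public Map<String, String> getNodeAttributes(int nodeId) {
        requests++;
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("name", "node" + nodeId); // illustrative values only
        attrs.put("status", "up");
        return attrs;
    }
}

public class PollingDemo {
    /** The Eclipse side drives everything: one request per node after startup. */
    static int pollAll(PolledRuntime rt) {
        rt.startup();
        int count = rt.getNodeCount();
        for (int i = 0; i < count; i++) {
            rt.getNodeAttributes(i);
        }
        return count;
    }

    public static void main(String[] args) {
        FakePolledRuntime rt = new FakePolledRuntime();
        int nodes = pollAll(rt);
        System.out.println(nodes + " nodes took " + rt.requests + " requests");
    }
}
```

The number of requests grows with the number of nodes (and, for jobs,
with the number of processes), and every runtime system has to be able
to answer each query on demand.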
OK, that's the old way. The new way I want to do it fits our
event-driven methodology better. In particular, the runtime subsystem
will tell us about the nodes, jobs, and processes it knows about with
a general 'discovery' message. So here's an example of how the new
model will work:
We send a Startup() message. Then we just sit there. At some point,
the runtime subsystem sends back information about the system: "Hey, I
know of these machines, with these nodes, and these are the attributes
of these nodes". I'll have to change some of the UI code so that it
copes with partial information (for instance, when we know there are
256 machines, I will create those components and they can be
displayed - but we might not yet know anything about the status of these
machines; that will come in asynchronously). Similarly, when we
start a job up we'll get back a jobID, and then we'll get back some
information about the processes related to that job as well - but,
again, asynchronously.
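A rough sketch of what the Eclipse side could look like under this
model, assuming the runtime subsystem pushes events to a listener. The
names RuntimeListener, NodeModel, nodesDiscovered and
nodeAttributeChanged are invented for illustration and are not existing
PTP code; the "unknown" placeholder is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback interface the runtime subsystem would invoke. */
interface RuntimeListener {
    void nodesDiscovered(int count);
    void nodeAttributeChanged(int nodeId, String key, String value);
}

/** Model that tolerates partial information: nodes exist before their attributes arrive. */
class NodeModel implements RuntimeListener {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    @Override
    public void nodesDiscovered(int count) {
        // Create empty placeholders so the UI can display the nodes right away.
        for (int i = 0; i < count; i++) {
            nodes.add(new HashMap<>());
        }
    }

    @Override
    public void nodeAttributeChanged(int nodeId, String key, String value) {
        // Assumes nodeId refers to a node announced by an earlier discovery event.
        nodes.get(nodeId).put(key, value);
    }

    int nodeCount() {
        return nodes.size();
    }

    /** Returns "unknown" until the runtime subsystem has reported the attribute. */
    String attribute(int nodeId, String key) {
        return nodes.get(nodeId).getOrDefault(key, "unknown");
    }
}

public class DiscoveryDemo {
    public static void main(String[] args) {
        NodeModel model = new NodeModel();

        // The calls below stand in for events arriving from the runtime subsystem.
        model.nodesDiscovered(256);
        System.out.println(model.nodeCount() + " nodes, node 3 status: "
                + model.attribute(3, "status"));

        model.nodeAttributeChanged(3, "status", "up");
        System.out.println("node 3 status: " + model.attribute(3, "status"));
    }
}
```

The point of the sketch is that the Eclipse side never asks for
anything after Startup(); it only reacts, and any attribute that has
not been reported yet simply shows up as unknown.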
Greg and I talked about this and we like this a lot better. It'll
simplify the model considerably and make it easier to implement the
runtime component for runtime systems that are less advanced than OMPI.
Not only will I remove the use of these functions
(getNodeAttributes, getProcessAttributes, etc) but I'll straight up
remove the functions altogether.
Any thoughts or concerns, Randy in particular, about how this interacts
with the RM? Should it at all?
I'll put my other thoughts in a separate email so they can be handled
separately.
--
-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@xxxxxxxx
---------------------------------------------------------------------