[ptp-dev] Some Changes I Need to Implement

I have a few changes I need to make to the base code, and with the resource manager changes going in I think we might need to work out exactly how to do this in the "new system".

Firstly, I need to change the model (method, not model object :)) we use to interface with the runtime subsystem. Instead of the Eclipse side asking for information about the runtime system (such as node status, process status, job status), these updates will be triggered by the runtime subsystem when appropriate. An example would probably help.

Currently, when we start up we send a Startup() message to the runtime system, and it starts up. Then we ask the runtime system how many nodes it knows about. It returns a number. Then, for each node in sequence, we ask it what it knows about that node (up/down, node name, general attributes, etc). Similarly, when a job starts, we get a jobID back and then we go back to the runtime system and ask it for information about that job, including, in sequence, the attributes of each process.
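
A rough sketch of the current flow, in Python just for brevity. FakeRuntime and the method names (startup, get_num_nodes, get_node_attributes) are made up for illustration and are not the actual PTP interfaces; the point is only that the Eclipse side drives every step and makes one request per node:

```python
class FakeRuntime:
    """Hypothetical stand-in for the runtime subsystem; counts requests received."""

    def __init__(self, nodes):
        self._nodes = nodes  # list of attribute dicts, one per node
        self.requests = 0

    def startup(self):
        self.requests += 1

    def get_num_nodes(self):
        self.requests += 1
        return len(self._nodes)

    def get_node_attributes(self, index):
        self.requests += 1
        return dict(self._nodes[index])


def poll_all_nodes(runtime):
    """Old model: the caller asks for the node count, then for each node in turn."""
    runtime.startup()
    count = runtime.get_num_nodes()
    return [runtime.get_node_attributes(i) for i in range(count)]


runtime = FakeRuntime([
    {"name": "n0", "state": "up"},
    {"name": "n1", "state": "down"},
])
nodes = poll_all_nodes(runtime)
```

With N nodes this costs N + 2 requests before anything can be shown, and every runtime system has to be able to answer each of those queries on demand.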

OK, that's the old way. The new way I want to do it fits our event-driven methodology better. In particular, the runtime subsystem will tell us about the nodes, jobs, and processes it knows about with a general 'discovery' message. So here's an example of how the new model will work:

We send a Startup() message. Then we just sit there. At some point, the runtime subsystem sends back information about the system: "Hey, I know of these machines, with these nodes, and these are the attributes of these nodes". I'll have to change some of the UI code so that it copes better with partial information. For instance, when we know there are 256 machines, I will create those components and they can be displayed, but we might not yet know anything about the status of these machines; that will come in asynchronously. Similarly, when we start a job up we'll get back a jobID, and then we'll get back some information about the processes related to that job as well - but, again, asynchronously.
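
To make the partial-information case concrete, here is a minimal sketch in Python. NodeModel and its callbacks (on_discovery, on_node_attributes) are hypothetical names I'm using for illustration, not existing PTP code, and the "unknown" placeholder state is an assumption about how the UI might represent a node whose status hasn't arrived yet:

```python
class NodeModel:
    """Hypothetical UI-side view of the machine that tolerates partial information."""

    UNKNOWN = "unknown"

    def __init__(self):
        self.nodes = {}  # node id -> attribute dict

    def on_discovery(self, node_ids):
        # Create a placeholder for each node so it can be displayed right away.
        for node_id in node_ids:
            self.nodes.setdefault(node_id, {"state": self.UNKNOWN})

    def on_node_attributes(self, node_id, attributes):
        # Merge whatever arrived; a node not seen before gets a placeholder first.
        self.nodes.setdefault(node_id, {"state": self.UNKNOWN}).update(attributes)

    def state_of(self, node_id):
        return self.nodes[node_id]["state"]


# The runtime subsystem would invoke these callbacks; here we call them by hand.
model = NodeModel()
model.on_discovery(range(256))
model.on_node_attributes(3, {"state": "up", "name": "node3"})
```

After these two events all 256 nodes exist in the model, node 3 is "up", and every other node still shows as "unknown" until its own update arrives.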

Greg and I talked about this and we like this a lot better. It'll simplify the model considerably and make it easier to implement the runtime component for runtime systems that are less advanced than OMPI. Not only will I remove the use of these functions (getNodeAttributes, getProcessAttributes, etc) but I'll straight up remove the functions altogether.

Any thoughts or concerns, Randy in particular, about how this interacts with the RM? Should it at all?

I'll put my other thoughts in a separate email so they can be handled separately.

--
-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@xxxxxxxx
---------------------------------------------------------------------


