Re: [ptp-dev] Some Changes I Need to Implement
Nathan,
I haven't looked into the current implementation a lot at this point
(that is next on my plate), but I was under the impression from our
discussions that it already worked the way you propose to change it.
I see nothing that should preclude making it more event-driven; the
only question is one of development timing. My next task was to rip
out ORTE, MPICH, and SIMULATION into separate plug-ins. I am concerned
about doing that at the same time as making changes to their
implementations.
I hate taking this off of ptp-dev, but I propose we meet to discuss
this. I am open today. We can post the results of our discussion to
ptp-dev.
Regards,
R^2
On Thu, 2006-06-08 at 08:24 -0600, Nathan DeBardeleben wrote:
> So I have a few things I need to change to the base code and with the
> resource manager changes going in I think we might need to work out
> exactly how to do this in the "new system".
>
> Firstly - I need to change the model (method, not model object :)) that
> interfaces to the runtime subsystem so that instead of the Eclipse-side
> asking for information about the runtime-system (such as node status,
> process status, job status) these are instead triggered by the runtime
> subsystem when appropriate. An example would probably help.
>
> Currently, when we start up we send a Startup() message to the runtime
> system. It does so. Then we ask the runtime system how many nodes it
> knows about. It returns a number. Then, for each node we ask it, in
> sequence, what it knows about those nodes (up/down, node name, general
> attributes, etc). Similarly, when a job starts, we get a jobID back and
> then we go back to the runtime system and ask it for information from
> that jobID - including, in sequence, attributes related to each process.
>
> OK, that's the old way. The new way I want to do it fits better with our
> event-driven methodology. In particular, the runtime subsystem will
> tell us information about the nodes, jobs, processes it knows about with
> a general 'discovery' message. So here's an example of how the new
> model will work:
>
> We send a Startup() message. Then we just sit there. At some point,
> the runtime subsystem sends back information about the system. "Hey, I
> know of these machines, with these nodes, and these are the attributes
> of these nodes". I'll have to change some of the UI code so that it is
> more flexible to having partial information (for instance, when we know
> there are 256 machines, I will create those components and they can be
> displayed - but we might not yet know anything about the status of these
> machines, that will be coming in asynchronously). Similarly, when we
> start a job up we'll get back a jobID, and then we'll get back some
> information about the processes related to that job as well - but,
> again, asynchronously.
>
> Greg and I talked about this and we like this a lot better. It'll
> simplify the model considerably and make implementing the runtime
> component for some runtime systems that are less advanced than OMPI
> easier. Not only will I remove the use of these functions
> (getNodeAttributes, getProcessAttributes, etc) but I'll straight up
> remove the functions altogether.
>
> Any thoughts or concerns, Randy in particular, about how this interacts
> with the RM? Should it at all?
>
> I'll put my other thoughts in a separate email so they can be handled
> separately.
>
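
For reference, the event-driven model described in the quoted message could look roughly like the sketch below. This is only an illustration, not the actual PTP code: the names NodeDiscoveryEvent, RuntimeListener, NodeModel, nodesDiscovered and nodeUpdated are all hypothetical. It shows the one point the message stresses, that the Eclipse side creates placeholder nodes as soon as it learns how many exist and fills in attributes whenever the runtime subsystem reports them, instead of calling getNodeAttributes for each node in sequence.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical payload of one asynchronous 'discovery' message about a node.
final class NodeDiscoveryEvent {
    final int nodeId;
    final Map<String, String> attributes;

    NodeDiscoveryEvent(int nodeId, Map<String, String> attributes) {
        this.nodeId = nodeId;
        this.attributes = attributes;
    }
}

// Hypothetical callback interface: the runtime subsystem pushes information
// to the Eclipse side instead of being polled for it.
interface RuntimeListener {
    void nodesDiscovered(int count);
    void nodeUpdated(NodeDiscoveryEvent event);
}

// Model that tolerates partial information: a node can exist (and be shown
// in the UI) before any of its attributes have arrived.
final class NodeModel implements RuntimeListener {
    static final String UNKNOWN = "unknown";

    private final Map<Integer, Map<String, String>> nodes = new LinkedHashMap<>();

    @Override
    public void nodesDiscovered(int count) {
        // Create placeholders only; attributes will arrive later.
        for (int id = 0; id < count; id++) {
            nodes.putIfAbsent(id, new LinkedHashMap<>());
        }
    }

    @Override
    public void nodeUpdated(NodeDiscoveryEvent event) {
        // Merge whatever attributes this message carries into the node,
        // creating the node if it was not announced beforehand.
        nodes.computeIfAbsent(event.nodeId, k -> new LinkedHashMap<>())
             .putAll(event.attributes);
    }

    int nodeCount() {
        return nodes.size();
    }

    // Returns UNKNOWN for attributes (or nodes) not reported yet.
    String attribute(int nodeId, String key) {
        Map<String, String> attrs = nodes.get(nodeId);
        if (attrs == null) {
            return UNKNOWN;
        }
        return attrs.getOrDefault(key, UNKNOWN);
    }
}
```

With this shape, a view can render all 256 placeholder nodes immediately and show "unknown" status until the corresponding nodeUpdated call arrives; no per-node query functions are needed on the Eclipse side.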