Re: [ptp-dev] Some Changes I Need to Implement

Nathan,

I haven't looked into the current implementation a lot at this point 
(that is next on my plate), but I was under the impression from our
discussions that it already worked the way you propose to change it.

I see nothing that should preclude making it more event-driven; the
only question is one of development timing.  My next task was to rip
out ORTE, MPICH, and SIMULATION into separate plug-ins.  I am concerned
about doing that at the same time as making changes to their
implementations.

I hate taking this off of ptp-dev, but I propose we meet to discuss
this.  I am open today.  We can post the results of our discussion to
ptp-dev.

Regards,
R^2

On Thu, 2006-06-08 at 08:24 -0600, Nathan DeBardeleben wrote:
> So I have a few things I need to change to the base code and with the 
> resource manager changes going in I think we might need to work out 
> exactly how to do this in the "new system".
> 
> Firstly - I need to change the model (method, not model object :)) that 
> interfaces to the runtime subsystem so that instead of the Eclipse-side 
> asking for information about the runtime-system (such as node status, 
> process status, job status) these are instead triggered by the runtime 
> subsystem when appropriate.  An example would probably help.
> 
> Currently, when we start up we send a Startup() message to the runtime 
> system.  It does so.  Then we ask the runtime system how many nodes it 
> knows about.  It returns a number.  Then, for each node we ask it, in 
> sequence, what it knows about those nodes (up/down, node name, general 
> attributes, etc).  Similarly, when a job starts, we get a jobID back and 
> then we go back to the runtime system and ask it for information from 
> that jobID - including, in sequence, attributes related to each process.
> 
> OK, that's the old way.  The new way I want to do it is more consistent
> with our event-driven methodology.  In particular, the runtime subsystem will 
> tell us information about the nodes, jobs, processes it knows about with 
> a general 'discovery' message.   So here's an example of how the new 
> model will work:
> 
> We send a Startup() message.  Then we just sit there.  At some point, 
> the runtime subsystem sends back information about the system.  "Hey, I 
> know of these machines, with these nodes, and these are the attributes 
> of these nodes".  I'll have to change some of the UI code so that it
> tolerates partial information (for instance, when we know 
> there are 256 machines, I will create those components and they can be 
> displayed - but we might not yet know anything about the status of these 
> machines; that will be coming in asynchronously).  Similarly, when we 
> start a job up we'll get back a jobID, and then we'll get back some 
> information about the processes related to that job as well - but, 
> again, asynchronously.
> 
> Greg and I talked about this and we like this a lot better.  It'll 
> simplify the model considerably and make implementing the runtime 
> component for some runtime systems that are less advanced than OMPI 
> easier.  Not only will I remove the use of these functions 
> (getNodeAttributes, getProcessAttributes, etc) but I'll straight up 
> remove the functions altogether.
> 
> Any thoughts or concerns, Randy in particular, about how this interacts 
> with the RM?  Should it at all?
> 
> I'll put my other thoughts in a separate email so they can be handled 
> separately.
> 
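
For illustration only, here is a minimal sketch of the event-driven
model described in the quoted message: the Eclipse side registers a
listener, the runtime subsystem pushes discovery information when it
has some, and the model tolerates partial information in the meantime.
Every name in it (DiscoveryListener, NodeModel, nodesDiscovered,
nodeAttributeChanged, the "status" key) is hypothetical and is not
taken from the PTP code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical callback interface; the runtime side calls these when it has news. */
interface DiscoveryListener {
    /** The runtime reports how many nodes it knows about. */
    void nodesDiscovered(int count);

    /** The runtime reports one attribute of one node, possibly much later. */
    void nodeAttributeChanged(int nodeId, String key, String value);
}

/** Hypothetical Eclipse-side model that can be displayed before all attributes arrive. */
class NodeModel implements DiscoveryListener {
    static final String UNKNOWN = "unknown";

    // nodeId -> (attribute name -> attribute value)
    private final Map<Integer, Map<String, String>> nodes = new LinkedHashMap<>();

    @Override
    public void nodesDiscovered(int count) {
        // Create placeholder entries so the UI has something to show right away.
        for (int id = 0; id < count; id++) {
            nodes.putIfAbsent(id, new LinkedHashMap<>());
        }
    }

    @Override
    public void nodeAttributeChanged(int nodeId, String key, String value) {
        // Attributes may arrive in any order, even for a node not yet announced.
        nodes.computeIfAbsent(nodeId, id -> new LinkedHashMap<>()).put(key, value);
    }

    int nodeCount() {
        return nodes.size();
    }

    /** Returns the attribute value, or UNKNOWN if nothing has been reported yet. */
    String attribute(int nodeId, String key) {
        Map<String, String> attrs = nodes.get(nodeId);
        return attrs == null ? UNKNOWN : attrs.getOrDefault(key, UNKNOWN);
    }
}
```

With this shape there is nothing like getNodeAttributes left for the
Eclipse side to call; it only reacts to whatever the runtime chooses
to send, which is the simplification the quoted message is after.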


