[ptp-dev] PTP now correctly hooks into node state change with bproc and ORTE
I just committed some code that finishes up something I've been working
on for a while. Now, with PTP using ORTE/OMPI on a bproc system, we can
correctly monitor node status information and display it on the correct
node. For instance, I can run PTP on the front-end of a 10-node bproc
test machine I've got here and it will show me, graphically, who owns
each node, what its state is, etc. Then I can change the ownership or
permissions of one node and, wham, the icon for that node changes
immediately. :)
Then I can reboot the entire cluster and we get a flurry of messages as
each node goes through a series of states, such as 'reboot, down,
booting, up'. The nodes don't all do this in lockstep, of course, since
machines can't be expected to boot at exactly the same time, and it's
wonderful to see my little grid of icons flicker as the machines change
state.
One thing it doesn't do yet is coalesce events: it currently sends one
event for every event in the subsystem. This means that if we rebooted,
say, a 2000-node bproc cluster, we'd get 2000 events for each state
change. That obviously won't scale, so after the release, a focus for
the coming weeks will be adding some sort of throttling or coalescing
code.
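
To make the coalescing idea concrete, here is a minimal sketch in Python, purely for illustration. It is not PTP's or ORTE's code, and `NodeEventCoalescer`, `on_state_change`, `flush`, and `max_batch` are hypothetical names. It assumes node IDs are integers, that the GUI only needs the latest state of each node, and that something like a timer calls `flush()` periodically; intermediate states that arrive between flushes are simply overwritten.

```python
from typing import Callable, Dict, List, Tuple

# A batch is a list of (node_id, latest_state) pairs.
Batch = List[Tuple[int, str]]


class NodeEventCoalescer:
    """Hypothetical sketch: coalesce per-node state changes into batched events.

    Not a PTP or ORTE API; names and behavior are illustrative assumptions.
    """

    def __init__(self, deliver: Callable[[Batch], None], max_batch: int = 1000) -> None:
        self._deliver = deliver      # callback that receives one coalesced batch
        self._max_batch = max_batch  # flush early once this many distinct nodes are pending
        # dict preserves insertion order (Python 3.7+); re-assigning a key keeps its slot
        self._pending: Dict[int, str] = {}

    def on_state_change(self, node_id: int, state: str) -> None:
        # Keep only the most recent state per node; an earlier pending state is overwritten.
        self._pending[node_id] = state
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        # Assumed to be called periodically (e.g. from a GUI timer) as well as on overflow.
        if not self._pending:
            return
        batch = list(self._pending.items())
        self._pending = {}
        self._deliver(batch)


if __name__ == "__main__":
    batches: List[Batch] = []
    coalescer = NodeEventCoalescer(batches.append, max_batch=5000)
    # Simulate a 2000-node reboot: every node passes through four states.
    for state in ("reboot", "down", "booting", "up"):
        for node in range(2000):
            coalescer.on_state_change(node, state)
    coalescer.flush()
    print(len(batches), len(batches[0]))  # prints "1 2000"
```

In this sketch the 8000 raw state changes from the simulated reboot collapse into a single batch of 2000 (node, state) pairs. The trade-off is that the GUI may never see short-lived states such as 'booting' if they begin and end between flushes.
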
Anyway, I just wanted to relay some good news. Since I don't think many
of the people on this list actually run bproc systems, you guys might
not get to see it firsthand. :)
Thanks go out to the OMPI guys for getting me the correct code to make
this possible.
--
-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@xxxxxxxx
---------------------------------------------------------------------