[ptp-dev] Core/Launch/UI Release Strategy Update
Attached is the updated release strategy and feature list. It includes
the newer feature where you can select which machine to run your job on.
Today's RC1 for these components will not have a completely functional
OMPI runtime layer implementation. Recent bugs in OMPI, which I believe
were resolved last night, were preventing me from hooking in a pile of
code I've been holding here to test. That code should go in and begin
testing very early next week.
--
-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@xxxxxxxx
---------------------------------------------------------------------
PTP RELEASE STRATEGY FOR THE CORE/RUNTIME COMPONENTS
----------------------------------------------------
FEATURES:
1: A Parallel Development Perspective comprising a Machines View,
Jobs View, Process View, and Legend, plus a Preferences Page.
2: A Machines View which displays the status of all the machines the
user knows of.
3: Dynamically updated status of the nodes of machines as those nodes
change state (such as 'up', 'down', 'has a job running on it')
4: A Jobs View which displays all the jobs that were started during
the current session. This includes job state and a listing of
processes comprising the job (including process state).
5: A Process View which displays the status of a single process (though
multiple Process Views may be open concurrently). This includes the
stdout of the process and has status and exit code fields that are
updated dynamically as the state of the process changes.
6: Ability to focus on a machine, node, job, or process and display
current status of that entity.
7: A Legend dialog that displays the various icons for nodes and
processes that represent the states these entities can be in.
8: A Preferences Page which lets the user specify settings for the
type of monitoring and control system to use.
9: A model (series of instantiated data structures) that represents the
known universe (machines, nodes, jobs, processes). The model is
organized hierarchically and each entity contains attributes
(key/value pairs) that represent additional information about the
object (such as process state, node ownership, etc.)
10: An interface to external control and monitoring systems (runtime
system components).
11: Open-MPI control and monitoring systems implementations. These
interface to Open-RTE through the Java Native Interface (JNI).
12: Simulated control and monitoring systems which exercise the
runtime system interface and the user interface, and which allow
demonstration in environments without other control/monitoring
systems.
13: Ability to start a job on a specified machine with a specified
number of processes.
14: Ability to terminate a running job.
15: Ability to create sets of nodes and processes for ease of viewing.
Also the ability to delete entries from these sets, add to them, and
focus on a given set.
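As an illustration of feature 9, here is a minimal sketch of a
hierarchical model whose entities carry key/value attributes. It is not
taken from the PTP source: the class names (ModelElement, ModelSketch),
the attribute keys, and the element names are all hypothetical, and it
is written against a current Java rather than the one shipped alongside
Eclipse 3.1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model entity: one element of the hierarchy plus its attributes. */
class ModelElement {
    private final String name;
    private final ModelElement parent;
    private final List<ModelElement> children = new ArrayList<>();
    private final Map<String, String> attributes = new LinkedHashMap<>();

    ModelElement(String name, ModelElement parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this); // register with the parent to form the tree
        }
    }

    String getName() { return name; }
    ModelElement getParent() { return parent; }
    List<ModelElement> getChildren() { return children; }

    /** Attributes are plain key/value pairs (e.g. process state, node ownership). */
    void setAttribute(String key, String value) { attributes.put(key, value); }
    String getAttribute(String key) { return attributes.get(key); }

    /** Slash-separated path from the root down to this element. */
    String path() {
        return parent == null ? name : parent.path() + "/" + name;
    }
}

class ModelSketch {
    /** Builds a tiny universe: one machine with one node, one job with one process. */
    static ModelElement buildExample() {
        ModelElement universe = new ModelElement("universe", null);
        ModelElement machine = new ModelElement("machine0", universe);
        ModelElement node = new ModelElement("node0", machine);
        node.setAttribute("state", "up");
        node.setAttribute("owner", "alice");

        ModelElement job = new ModelElement("job0", universe);
        ModelElement process = new ModelElement("process0", job);
        process.setAttribute("state", "running");
        process.setAttribute("node", node.path());
        return universe;
    }

    public static void main(String[] args) {
        ModelElement universe = buildExample();
        ModelElement process = universe.getChildren().get(1).getChildren().get(0);
        System.out.println(process.path() + " state=" + process.getAttribute("state"));
    }
}
```

Keeping attributes as free-form key/value pairs, as the feature list
describes, lets a runtime system attach extra information to an entity
without changing the model classes.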
TESTING PLAN:
SETUP:
Start on a bproc machine.
Make sure Open-MPI is set up and working (not part of this test,
just required to utilize Open-MPI).
Start with a fresh Eclipse install (including workspace).
Install PTP.
Compile the PTP Open-MPI JNI library.
Acquire some set of nodes for a long period for testing.
Launch Eclipse and then launch a new Eclipse with the PTP plugins
running.
1: Using the Parallel Development Preferences Page select the Simulated
runtime system.
2: Using the menu system, open the Machines View and Jobs View.
3: Confirm that the Jobs View shows no jobs since it is a clean start.
4: Confirm that the Machines View displays the current state of the
machines. Use the drop-down menus to observe other machines that
are known and confirm they too display the current machine state.
5: Create a set of nodes using the user interface and name the set.
6: Confirm that the user can switch between (focus on) the full set of
nodes for the given machine and the newly created set.
7: Add a few more nodes to the new set.
8: Focus on a different machine and confirm that the set is no longer
visible (since it pertains to the original machine).
9: Create a new C project.
10: Create a new C-MPI source file in the project. The source file will
have each process producing periodic output and run for a few
minutes (so that the tests can be performed on a running job).
11: Compile the C-MPI application.
12: Create a new Run Configuration for this project, utilizing the
Parallel Development configuration to specify the number of
processes for this run and a chosen simulated machine.
13: Run the job (under simulated control).
14: Confirm that the appropriate nodes change state in the Machines
View to indicate they contain a running job and that the job starts
on the correct machine.
15: Focus on a node where one of the processes has been assigned.
Confirm that the Machines View displays the processes on that node,
including which job the process belongs to.
16: Double-click on one of the processes in the Machines View to bring
up the Process View. Confirm that the MPI rank, node number, job
number, and status are correct.
17: Observe process output in the output section of the Process View.
18: Wait for job to terminate.
19: Observe that the process state and exit code display correctly in
the Process View, Machines View (for the appropriate node), and Jobs
View (for the appropriate job).
20: Bring the Jobs View to the foreground.
21: Confirm the job previously run, as well as the processes contained
within it, are listed and shown as terminated.
22: Re-run the same job. Confirm the Jobs View displays both the job
and its processes as running.
23: Double-click on a process of the job, opening the Process View.
Confirm the running state.
24: Terminate the job by using the terminate icon.
25: Confirm the Jobs View updates to show the terminated state.
26: Confirm the Process View updates to show the terminated state,
including an exit code.
27: Using the Parallel Development Preferences Page select the Open-MPI
runtime system.
28: Using the Open MPI Preferences Page under the Parallel Development
Preferences Page set the path and arguments to the ORTE daemon
(ORTEd).
29: Run the same job from step #12 (under OMPI control).
30: Repeat steps #14 through #26 for this second runtime system
(Open-MPI as opposed to simulation).
31: Switch back to the Machines View.
32: Using another terminal, change the state of one of the nodes (reboot
it, change ownership, etc.) and confirm that the node's status
changes in the Machines View (both the icon, which should match the
legend, and the detailed text information, which should display the
new change(s)).
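The start/terminate sequence that the testing plan exercises under
simulated control (run a job, confirm it shows as running, terminate it,
confirm it shows as terminated) can be sketched as follows. This is only
an illustration under assumptions: the interface and class names
(ControlSystem, JobStateListener, SimulatedControlSystem) and the state
strings are hypothetical and are not PTP's actual runtime system
interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical listener: notified whenever a job changes state. */
interface JobStateListener {
    void jobStateChanged(int jobId, String newState);
}

/** Hypothetical control interface that each runtime system would implement. */
interface ControlSystem {
    int startJob(String machine, int numProcesses);
    void terminateJob(int jobId);
    void addListener(JobStateListener listener);
}

/** Simulated implementation: tracks job state only, launches nothing. */
class SimulatedControlSystem implements ControlSystem {
    private final Map<Integer, String> jobStates = new LinkedHashMap<>();
    private final List<JobStateListener> listeners = new ArrayList<>();
    private int nextJobId = 0;

    @Override
    public int startJob(String machine, int numProcesses) {
        if (numProcesses < 1) {
            throw new IllegalArgumentException("numProcesses must be at least 1");
        }
        int jobId = nextJobId++;
        setState(jobId, "running");
        return jobId;
    }

    @Override
    public void terminateJob(int jobId) {
        if (!"running".equals(jobStates.get(jobId))) {
            throw new IllegalStateException("job " + jobId + " is not running");
        }
        setState(jobId, "terminated");
    }

    @Override
    public void addListener(JobStateListener listener) {
        listeners.add(listener);
    }

    /** Current state of a job, or null if the job id is unknown. */
    String getState(int jobId) {
        return jobStates.get(jobId);
    }

    /** Records the new state and pushes it to every registered listener (e.g. a view). */
    private void setState(int jobId, String state) {
        jobStates.put(jobId, state);
        for (JobStateListener listener : listeners) {
            listener.jobStateChanged(jobId, state);
        }
    }
}
```

A view would register itself as a listener and refresh on each
notification, which is the behavior the "confirm the view updates" steps
check by hand.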
SUPPORTED ARCHITECTURES/RUNTIMES:
bproc Linux [SIMULATOR AND OPEN-MPI]
Mac OS X 10.4 (Tiger) [SIMULATOR ONLY]
Requires Eclipse 3.1.0
Requires CDT version 3.0.0 for building C-MPI applications.
PACKAGING:
1: Open-MPI binary build for bproc 64-bit Linux.
2: PTP's Open-MPI JNI library build for 64-bit Linux.
3: PTP core, launch, and UI packages as source.
All of the above will be gzipped tarballs (X.tar.gz).
The user will gunzip and untar the tarballs.
The user will need to set up and test Open-MPI's compile and run
facilities themselves, confirming it works on their architecture.
When the user launches Eclipse, the PTP source will build.