Re: [ptp-dev] Orte jobs do not stop

Wyatt,

PTP 2.0 doesn't support OpenMPI with threading enabled. I think there's still a bug open on this. I'm not sure that OMPI 1.2 even fully supports enabling threads, and the APIs that PTP uses are not thread safe.

Greg

On Aug 20, 2008, at 5:48 PM, wspear wrote:

I've got a little more information on this issue now.  It occurs when
OpenMPI has been configured with:

--enable-mpi-threads
--with-devel-headers
--enable-orterun-prefix-by-default
--with-tm=/opt/torque

This appears to hold for any version of OpenMPI (tested with 1.2.3 and
1.2.6). It does not occur when --enable-mpi-threads and
--enable-orterun-prefix-by-default are omitted.
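
The build that does not show the hang can be sketched as the configure line below. This is only an illustration assembled from the flags listed above; it assumes the command is run from the top of an unpacked OpenMPI 1.2.x source tree, and any other options a site normally passes are left out.

```sh
# Sketch only: the flags from the list above, minus the two whose removal
# made the problem go away (--enable-mpi-threads and
# --enable-orterun-prefix-by-default). Run from the OpenMPI source directory.
./configure --with-devel-headers --with-tm=/opt/torque
```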

I guess the focus now is on getting the new launch manager together,
but if any of this suggests a work-around for the current release,
please let me know.

-Wyatt

On Thu, Aug 7, 2008 at 5:13 PM, wspear <wspear@xxxxxxxxxxxxxx> wrote:
Greetings Greg,

Have you had a chance to look at this yet?

Thanks,
Wyatt

On Thu, Jul 10, 2008 at 2:32 AM, Greg Watson <g.watson@xxxxxxxxxxxx> wrote:
Wyatt,

I haven't tried PTP 2.0 with Open MPI 1.2.6 (only 1.2.5) so it's possible
that something has broken. I'll install it on my Linux VM and let you know
how it goes.

Greg

On Jul 9, 2008, at 6:08 PM, wspear wrote:

This is openmpi 1.2.6 built with gnu 4.1.2.  It's running on x86_64
Linux. I have been using the PTP 2.0 available from the update site
(2.0.0.200806061515).  The behavior is the same in both Europa and
Ganymede.

-Wyatt

On Wed, Jul 9, 2008 at 5:35 AM, Greg Watson <g.watson@xxxxxxxxxxxx> wrote:

Wyatt,

What version of Open MPI are you using? What type of system is it? Is this
PTP 2.0 or from CVS?

PTP 2.0 has not been tested with Ganymede, but it sounds like this is a
problem with Open MPI. Can you try with Europa to see if you have the same
problem?

Thanks,

Greg

On Jul 8, 2008, at 11:36 PM, wspear wrote:

Greetings,

When I try to execute an MPI application with PTP via ORTE, it seems to
run successfully, but after what should be the final output is printed,
PTP continues to list the job status as running, and the orte process's
processor usage shoots up to 100% in top. If I try to stop the job or
shut down the ORTE resource manager manually, Eclipse freezes solid and
I need to kill the orte process from the command line.

Three possibly relevant factors are that I'm using a version of
openmpi configured for use with pbs (though I'm just running on the
head node at the moment), I'm running these tests in the Ganymede
Eclipse release, and I get a warning about oversubscribed nodes (which
is also normal for running with mpirun on the head node in this case).

I don't know if any of those could explain why the application would
run successfully while the orte fails to stop, though.

When I run it on a back-end node, where interactive jobs are allowed,
the execution completes without the warning, but the output only shows
up on the command line where Eclipse was launched, and there is no sign
that the start of the process or individual jobs were detected or
handled by the PTP.  The orte process still freezes as described
above.

Any ideas how I might fix this? Has anyone been working on a pbs
resource manager for ptp?

Thanks,

Wyatt
_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev

