Re: [ptp-dev] Orte jobs do not stop
Wyatt,
PTP 2.0 doesn't support OpenMPI with threading enabled. I think
there's still a bug open on this. I'm not sure that OMPI 1.2 even
fully supports enabling threads, and the APIs that PTP uses are not
thread safe.
Greg
On Aug 20, 2008, at 5:48 PM, wspear wrote:
I've got a little more information on this issue now. It occurs when
OpenMPI has been configured with:
--enable-mpi-threads
--with-devel-headers
--enable-orterun-prefix-by-default
--with-tm=/opt/torque
This appears to hold for any version of OpenMPI (I tested 1.2.3 and
1.2.6). It does not occur when --enable-mpi-threads and
--enable-orterun-prefix-by-default are omitted.
I guess the focus now is on getting the new launch manager together,
but if any of this suggests a work-around for the current release,
please let me know.
-Wyatt
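For anyone wanting to confirm whether an installed Open MPI was built with MPI thread support (the condition discussed above), a minimal Python sketch follows. It assumes that ompi_info is on the PATH and that its output contains a line of the form "Thread support: posix (mpi: yes, progress: no)", as 1.2-era builds are believed to print; other releases may format this line differently, in which case the function returns None. The function names are hypothetical and not part of PTP or Open MPI.

```python
import re
import subprocess


def parse_thread_support(ompi_info_text):
    """Return (mpi_threads, progress_threads) as booleans.

    Assumes a line like 'Thread support: posix (mpi: yes, progress: no)'.
    Returns None if no such line is present in the text.
    """
    match = re.search(
        r"Thread support:\s*\S+\s*\(\s*mpi:\s*(yes|no)\s*,"
        r"\s*progress:\s*(yes|no)\s*\)",
        ompi_info_text,
        re.IGNORECASE,
    )
    if match is None:
        return None
    return (match.group(1).lower() == "yes",
            match.group(2).lower() == "yes")


def installed_thread_support():
    """Run ompi_info from PATH and parse its output; None if unavailable."""
    try:
        result = subprocess.run(
            ["ompi_info"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_thread_support(result.stdout)


if __name__ == "__main__":
    # Prints a (mpi, progress) tuple, or None if the line was not found.
    print(installed_thread_support())
```

If the first element of the result is True, the build has MPI threads enabled and, per the reply above, is not expected to work with PTP 2.0.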
On Thu, Aug 7, 2008 at 5:13 PM, wspear <wspear@xxxxxxxxxxxxxx> wrote:
Greetings Greg,
Have you had a chance to look at this yet?
Thanks,
Wyatt
On Thu, Jul 10, 2008 at 2:32 AM, Greg Watson
<g.watson@xxxxxxxxxxxx> wrote:
Wyatt,
I haven't tried PTP 2.0 with Open MPI 1.2.6 (only 1.2.5) so it's
possible that something has broken. I'll install it on my Linux VM and
let you know how it goes.
Greg
On Jul 9, 2008, at 6:08 PM, wspear wrote:
This is openmpi 1.2.6 built with gnu 4.1.2. It's running on x86_64
Linux. I have been using the PTP 2.0 available from the update site
(2.0.0.200806061515). The behavior is the same in both Europa and
Ganymede.
-Wyatt
On Wed, Jul 9, 2008 at 5:35 AM, Greg Watson
<g.watson@xxxxxxxxxxxx> wrote:
Wyatt,
What version of Open MPI are you using? What type of system is it? Is
this PTP 2.0 or from CVS?
PTP 2.0 has not been tested with Ganymede, but it sounds like this is a
problem with Open MPI. Can you try with Europa to see if you have the
same problem?
Thanks,
Greg
On Jul 8, 2008, at 11:36 PM, wspear wrote:
Greetings,
When I try to execute an mpi application with ptp via the orte it
seems to run successfully, but after what should be the final output
is printed the ptp continues to list the job status as running, and
the orte process's processor usage shoots up to 100% in top. If I try
to stop the job or shut down the orte resource manager manually
eclipse freezes solid and I need to kill the orte process from the
command line.
Three possibly relevant factors are that I'm using a version of
openmpi configured for use with pbs (though I'm just running on the
head node at the moment), I'm running these tests in the Ganymede
Eclipse release, and I get a warning about oversubscribed nodes (which
is also normal for running with mpirun on the headnode in this case).
I don't know if any of those could explain why the application would
run successfully while the orte fails to stop, though.
When I run it on a back-end node, where interactive jobs are allowed,
the execution completes without the warning, but the output only shows
up on the command line where Eclipse was launched, and there is no
sign that the start of the process or individual jobs were detected or
handled by the PTP. The orte process still freezes as described above.
Any ideas how I might fix this? Has anyone been working on a pbs
resource manager for ptp?
Thanks,
Wyatt
_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev