Re: [ptp-dev] Orte jobs do not stop

Greetings Greg,

Have you had a chance to look at this yet?

Thanks,
Wyatt

On Thu, Jul 10, 2008 at 2:32 AM, Greg Watson <g.watson@xxxxxxxxxxxx> wrote:
> Wyatt,
>
> I haven't tried PTP 2.0 with Open MPI 1.2.6 (only 1.2.5) so it's possible
> that something has broken. I'll install it on my Linux VM and let you know
> how it goes.
>
> Greg
>
> On Jul 9, 2008, at 6:08 PM, wspear wrote:
>
>> This is openmpi 1.2.6 built with gnu 4.1.2.  It's running on x86_64
>> Linux.  I have been using the PTP 2.0 available from the update site
>> (2.0.0.200806061515).  The behavior is the same in both Europa and
>> Ganymede.
>>
>> -Wyatt
>>
>> On Wed, Jul 9, 2008 at 5:35 AM, Greg Watson <g.watson@xxxxxxxxxxxx> wrote:
>>>
>>> Wyatt,
>>>
>>> What version of Open MPI are you using? What type of system is it? Is this
>>> PTP 2.0 or from CVS?
>>>
>>> PTP 2.0 has not been tested with Ganymede, but it sounds like this is a
>>> problem with Open MPI. Can you try with Europa to see if you have the same
>>> problem?
>>>
>>> Thanks,
>>>
>>> Greg
>>>
>>> On Jul 8, 2008, at 11:36 PM, wspear wrote:
>>>
>>>> Greetings,
>>>>
>>>> When I try to execute an MPI application with PTP via ORTE, it seems
>>>> to run successfully, but after what should be the final output is
>>>> printed, PTP continues to list the job status as running, and the
>>>> orte process's processor usage shoots up to 100% in top.  If I try to
>>>> stop the job or shut down the ORTE resource manager manually, Eclipse
>>>> freezes solid and I need to kill the orte process from the command
>>>> line.
>>>>
>>>> Three possibly relevant factors are that I'm using a version of
>>>> openmpi configured for use with pbs (though I'm just running on the
>>>> head node at the moment), I'm running these tests in the Ganymede
>>>> Eclipse release, and I get a warning about oversubscribed nodes (which
>>>> is also normal for running with mpirun on the headnode in this case).
>>>>
>>>> I don't know if any of those could explain why the application would
>>>> run successfully while the orte fails to stop, though.
>>>>
>>>> When I run it on a back-end node, where interactive jobs are allowed,
>>>> the execution completes without the warning, but the output only shows
>>>> up on the command line where Eclipse was launched, and there is no
>>>> sign that the start of the process or individual jobs were detected or
>>>> handled by the PTP.  The orte process still freezes as described
>>>> above.
>>>>
>>>> Any ideas how I might fix this?  Has anyone been working on a pbs
>>>> resource manager for ptp?
>>>>
>>>> Thanks,
>>>>
>>>> Wyatt
>>>> _______________________________________________
>>>> ptp-dev mailing list
>>>> ptp-dev@xxxxxxxxxxx
>>>> https://dev.eclipse.org/mailman/listinfo/ptp-dev
>>>>

