[ptp-dev] MPI Runtime Environment Commits/Updates
I just committed some code for the MPI Runtime Environment layer.
When linked with Open MPI (OMPI), this code lets you spawn a parallel
job from within PTP/Eclipse. You get messages about the job as it
runs, etc.
There are some very big caveats currently. Some are my problems, others
aren't.
1. First, you must already have started the Open Runtime
Environment Daemon (ORTEd) from a console. I'm going to
change this so that a file selector somewhere in the UI
lets you pick the daemon, and PTP will spawn it for you.
2. The job it spawns is hard coded. Yes, I know this is stupid,
but it was for testing purposes. Right now it looks for a
file in a directory that I have. If you want to start tinkering
with this code, you have to change this hard-coded String.
Obviously I'll quickly change this to pull the information
from the Run... box.
3. The ORTEd has a lot of locking issues due to it not being
thread-safe. Right now I'm doing the locks in JNI-C because of
another issue that the OMPI team is aware of. In the future I'll
move these locks out of C and into Java where they're easier and
more portable - in Java we'll have more control over them as well.
4. Currently the ORTEd doesn't like it when you sleep in its
progress function while another thread tries to start a job. This
is a thread-safety issue and, while there are multithreaded
versions of OMPI, they apparently have other bugs that are
problems for us. We've got a workaround in which the progress
function does not truly sleep (the sleep is implemented as a
blocking select(), as I understand it). Instead we use a
nonblocking select() and then put the thread to sleep for
1000 usecs. This results in lightweight polling, which I'm not
happy with long-term. It'll allow work to continue for now, and
the OMPI team is addressing the underlying problem as we speak.
5. There's also an interesting bug with ORTE/OMPI where the messages
coming out of your MPI task are repeated more and more the more
times you run your job. Again, the OMPI team is aware of this now
and is trying to fix it. Probably only a few days on this one.
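For anyone curious what the workaround in item 4 looks like in practice, here is a rough sketch in C. It is not the committed code: the names poll_fd_nonblocking, short_sleep, poll_loop, and consume_byte are made up for illustration, and it uses only plain POSIX select() and nanosleep() on an ordinary file descriptor, with no ORTE calls. The idea is simply a select() with a zero timeout followed by a sleep of about 1000 usecs when nothing is ready.

```c
#define _POSIX_C_SOURCE 200809L  /* expose nanosleep() under strict -std= modes */

#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical handler type: called when the descriptor has data to read. */
typedef void (*ready_handler)(int fd, void *ctx);

/* Check once whether fd is readable. Returns 1 if readable, 0 if not,
 * -1 on error. The zero timeout makes select() return immediately. */
int poll_fd_nonblocking(int fd)
{
    fd_set readable;
    struct timeval no_wait = {0, 0};

    assert(fd >= 0 && fd < FD_SETSIZE);  /* FD_SET is undefined outside this range */
    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    if (select(fd + 1, &readable, NULL, NULL, &no_wait) < 0)
        return -1;
    return FD_ISSET(fd, &readable) ? 1 : 0;
}

/* Sleep for roughly 1000 microseconds between polls. */
void short_sleep(void)
{
    struct timespec ts = {0, 1000L * 1000L};  /* 1000 usec in nanoseconds */
    nanosleep(&ts, NULL);
}

/* Poll fd at most max_polls times. Calls on_ready whenever data is waiting,
 * otherwise naps briefly. Returns the number of handler calls, or -1 on error. */
int poll_loop(int fd, int max_polls, ready_handler on_ready, void *ctx)
{
    int handled = 0;
    for (int i = 0; i < max_polls; i++) {
        int state = poll_fd_nonblocking(fd);
        if (state < 0)
            return -1;
        if (state > 0) {
            on_ready(fd, ctx);
            handled++;
        } else {
            short_sleep();
        }
    }
    return handled;
}

/* Example handler: consume one byte and count it in the int that ctx points to. */
void consume_byte(int fd, void *ctx)
{
    char c;
    if (read(fd, &c, 1) == 1)
        (*(int *)ctx)++;
}
```

To try it, create a pipe, write a byte to the write end, and call poll_loop on the read end with consume_byte and a pointer to an int counter. Because the thread never blocks inside select(), another thread gets a chance to run between polls, at the cost of waking up about a thousand times a second.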
So, some real progress here. And if you want to try this sub-alpha
version feel free. :) You may want to wait a day or two for me to tidy
it up with the GUI so that it gets rid of this bit of hardcoding.
--
-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@xxxxxxxx
---------------------------------------------------------------------