Re: [ptp-dev] How PTP 2.1 runs in the PC cluster(remote target)

Wei-Cheng,

Did you recompile the program after installing the new version of Open MPI? Also, please try running the program from the command line first to make sure your Open MPI installation is OK.

Greg

On Jan 22, 2009, at 10:54 AM, Wei-Cheng Lau wrote:

Hi, Greg

I followed your suggestion: I set up the new PTP 2.1.1 and upgraded Open MPI from 1.2 to 1.3.
I submitted a job to two machines, but the following error messages appeared at runtime:

[michael6:01708] mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_ras_dash_host: file not found (ignored)
[michael6:01708] mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_ras_gridengine: file not found (ignored)
[michael6:01708] mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_ras_localhost: file not found (ignored)
[michael6:01708] mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_errmgr_hnp: file not found (ignored)
[michael6:01708] mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_errmgr_orted: file not found (ignored)
[michael6:01708] mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_errmgr_proxy: file not found (ignored)
[michael6:01708] mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_iof_proxy: file not found (ignored)
[michael6:01708] mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_iof_svc: file not found (ignored)
Host key verification failed.
<map>
    <host name="michael6" slots="1" max_slots="0">
        <process rank="0"/>
    </host>
</map>
--------------------------------------------------------------------------
A daemon (pid 1709) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished


It can't find the files (mca_iof_svc...), but I checked that directory and the files are there!
I also set LD_LIBRARY_PATH to "/usr/local/lib/openmpi/".
Did I miss something?


Regards,

Wei-Cheng
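
[Editor's note] The output quoted above mixes two kinds of messages: component warnings that Open MPI itself marks as "(ignored)", and the line "Host key verification failed.", which is the message OpenSSH prints when it cannot verify a remote host's key. The following is a minimal Python sketch, not part of PTP or Open MPI, that separates the two so the line that is not marked as ignored stands out. The function name triage_log and the shortened sample text are illustrative assumptions; whether the ssh message is the actual cause of the failure here is not confirmed by the thread.

```python
import re

# Matches the component warnings quoted above; "(ignored)" marks them as non-fatal.
COMPONENT_WARNING = re.compile(
    r"component_find: unable to open (?P<path>\S+): file not found \(ignored\)"
)


def triage_log(text):
    """Split launcher output into ignored component warnings and ssh errors."""
    ignored, ssh_errors = [], []
    for line in text.splitlines():
        match = COMPONENT_WARNING.search(line)
        if match:
            # Keep only the component name, e.g. "mca_iof_svc".
            ignored.append(match.group("path").rsplit("/", 1)[-1])
        elif "Host key verification failed" in line:
            ssh_errors.append(line.strip())
    return ignored, ssh_errors


# Shortened copy of the output quoted in this message.
sample = """\
[michael6:01708] mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_iof_proxy: file not found (ignored)
[michael6:01708] mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_iof_svc: file not found (ignored)
Host key verification failed.
"""

ignored, ssh_errors = triage_log(sample)
print("ignored components:", ignored)
print("ssh errors:", ssh_errors)
```

If the second list is non-empty, checking that a plain ssh login from the head node to each compute node succeeds without prompting may be a useful next step before looking further at the component warnings.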

 

2009/1/21 Greg Watson <g.watson@xxxxxxxxxxxx>
Hi Wei-Cheng,

Apologies for taking so long to reply. The message "Hostnames from Open MPI output do not match expected hostname" occurs when the machine running Eclipse and the cluster do not agree on the names of the nodes. Please try the latest PTP 2.1.1 build (http://wiki.eclipse.org/PTP/builds/2.1.1) which includes a fix for this problem (for OpenMPI 1.3).

Regards,

Greg
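
[Editor's note] The hostname mismatch Greg describes can be illustrated with a small Python sketch. It reads host names from a <map> block shaped like the one quoted earlier in this thread and reports any name that is not in an expected list. This is only an illustration of the idea, not how PTP performs its check; the function names and the name "michael6.example.org" are hypothetical.

```python
import xml.etree.ElementTree as ET


def hosts_in_map(map_xml):
    """Return host names from a <map> block like the one quoted in this thread."""
    root = ET.fromstring(map_xml.strip())
    return [host.get("name") for host in root.iter("host")]


def unexpected_hosts(map_xml, expected):
    """List hosts reported in the map that are not in the expected names."""
    known = set(expected)
    return [name for name in hosts_in_map(map_xml) if name not in known]


sample_map = """
<map>
    <host name="michael6" slots="1" max_slots="0">
        <process rank="0"/>
    </host>
</map>
"""

# Same short name on both sides: nothing unexpected.
print(unexpected_hosts(sample_map, ["michael6"]))
# Hypothetical fully qualified name on the Eclipse side: the short name no longer matches.
print(unexpected_hosts(sample_map, ["michael6.example.org"]))
```

A mismatch of this kind (short name on one side, a different form of the name on the other) is one way the two machines can disagree about node names.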

On Dec 20, 2008, at 3:02 AM, Wei-Cheng Lau wrote:

Hi, Greg
My explanation may not have been clear enough, and I think you misunderstood. In fact, I am already able to connect to the head node of the PC cluster and work successfully. I want to submit my job to two machines, so I set the hostfile in the Resources option of the configuration and ran it, but it shows these error messages:
'Open Mpi Job' has encountered a problem.
Failed after executing command to launch parallel application.
Details:
Failed after executing command to launch parallel application.
Hostnames from Open MPI output do not match expected hostname.

 

In addition, I ran the command generated by PTP directly:
mpirun -display-map -np 2 -bynode -hostfile /home/XXX/hostfile /home/XXX/test
It works on our cluster. How should I configure PTP's parameters to run the parallel program on multiple machines?

thanks,

Wei-Cheng Lau
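
[Editor's note] The thread does not show the contents of the hostfile. As a rough illustration, Open MPI hostfiles list one host per line, optionally followed by settings such as slots=N. The Python sketch below parses text in that shape and totals the slots, which can help confirm that the process count given to mpirun fits the listed machines. The host names node1 and node2, the function name parse_hostfile, and the default of one slot per host are assumptions made for this example.

```python
def parse_hostfile(text):
    """Parse lines like 'node1 slots=2' into {host: slots}."""
    hosts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, *options = line.split()
        slots = 1  # assumption of this sketch when no slots= is given
        for option in options:
            key, _, value = option.partition("=")
            if key == "slots":
                slots = int(value)
        hosts[name] = slots
    return hosts


sample_hostfile = """\
# hypothetical two-node hostfile
node1 slots=1
node2 slots=1
"""

hosts = parse_hostfile(sample_hostfile)
print(hosts)
print("total slots:", sum(hosts.values()))
```

With two hosts of one slot each, a run with -np 2 has one slot available per process.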

2008/12/18 Greg Watson <g.watson@xxxxxxxxxxxx>
Do you have Open MPI installed on the cluster? If not, then you'll need to do that first, and configure it so that you can run MPI programs. Once you have that working, create an Open MPI resource manager, then a remote tools service provider that connects to the head node of the cluster.

Regards,

Greg


On Dec 14, 2008, at 1:38 PM, Wei-Cheng Lau wrote:

Hi,
I want to run PTP 2.1 on a PC cluster (remote target), but so far it only runs on a single machine. How can I do this? I found a hostfile setting in the Resource Manager; is that the key setting? If so, how do I use it?
My execution environment: PTP 2.1 is installed on Windows XP, and the PC cluster's OS is Fedora 9.
_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev
