Re: [ptp-dev] Is there a generic method, in LSF, to find the process ID's of all LSF processes?

Q is flaky. I wouldn't worry much about it not working on Q, for many reasons. A lot of software has bugs in its port to Q.

-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard@xxxxxxxx
---------------------------------------------------------------------



Randy M. Roberts wrote:
I've been looking into LSF's API to determine process id's from job information. I've had LANL's ICN Consulting help me with this.

The results are mixed.  We found that the lsb_readjobinfo(...) function
call returns the pid's for a job.  This works on Lambda and Theta, but
does not work on QSC.  Both Lambda and Theta are Linux machines (one
runs BProc, the other does not);  QSC is a Compaq Q machine.  On QSC the
result is always npids = 0, no matter how many processes the job
actually has.

Are there any LSF API experts out there that can help out?

Regards,
Randy

-------- Forwarded Message --------
From: consult@xxxxxxxx
To: rsqrd@xxxxxxxx
Subject: [CIC600000210542] Is there a generic method, in LSF, to find
the process ID's of all LSF processes?
Date: Wed, 24 May 2006 11:13:19 -0600
X-Mailer: Rem-Mail

From:    robd

Digging into the API for lsf, there are indications that some process id info is available.
A test code run on flash using the API produces much, if not all, of the info requested
(significant help from user himself in writing this code).

The code works quite similarly on lambda.

However, an initial test on qsc does not produce the same level of output, i.e., the pidInfo
table is apparently not filled.

See http://www.science-computing.de/manuals/lsf/6.2/lsf6.2_api_ref/lsb_readjobinfo.3.html for structure definitions.


Here is a sample of the code used; compiler info is in the comments:

----------------------------------
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <time.h>     /* ctime() */
#include <lsf/lsf.h>
#include <lsf/lsbatch.h>
/*
 * compile with intel/8.1 on flash
 icc -I/opt/LSF/6.0/linux2.4-glibc2.3-x86-bproc/share/include -L/opt/LSF/6.0/linux2.4-glibc2.3-x86-bproc/lib lsfBjobs.c -o lsfBjobs -llsf -lbat -lnsl

 * compile with Compaq C V6.5-011 on qsc
 cc -I/lsfdir/QSC/eagle/6.0/include -L/lsfdir/QSC/eagle/6.0/alpha5-rms/lib lsfBjobs.c -o lsfBjobs -llsf -lbat

 * compile with gcc on lambda
 gcc -I/lsfdir/LAMBDA/linux2.4/6.0/include -L/lsfdir/LAMBDA/linux2.4/6.0/linux2.4-glibc2.3-x86/lib lsfBjobs.c -o lsfBjobs -llsf -lbat -lnsl

 liblsf for ls_ api's
 libbat for lsb_ api's
 libnsl for yp_ api's

see: http://www.lanl.gov/asci/bluemtn/LSF/DOCS/lsf3.2/html/program/prog-con.htm LSF API Services
 */

int main(int argc, char **argv)
{
    int  options = RUN_JOB;
    char *user = "robd";             /* match jobs for user robd by default */
    struct jobInfoEnt *job;
    int more;
    int i;
    struct jRusage runUsage;
    struct pidInfo pidInf;

    if (lsb_init(argv[0]) < 0) {
        lsb_perror("lsb_init");
        exit(-1);
    }

    if (argc > 1) {
        user = argv[1];
    }
    if (lsb_openjobinfo(0, NULL, user, NULL, NULL, options) < 0) {
        lsb_perror("lsb_openjobinfo");
        exit(-1);
    }

    printf("All running jobs submitted by user %s:\n",user);
    for (;;) {
        job = lsb_readjobinfo(&more);
        if (job == NULL) {
            lsb_perror("lsb_readjobinfo");
            exit(-1);
        }

        /* display the job */
        printf("%sLSF Job <%lld> <%d pid> of user <%s>, submitted from host <%s> : status <%o> \n",
                ctime(&job->submitTime), job->jobId, job->jobPid, job->user,
                job->fromHost, job->status);

        runUsage = job->runRusage;
        printf("\t%d processes, mem <%d> swap <%d> utime <%d> stime <%d>\n",
                runUsage.npids, runUsage.mem, runUsage.swap,
                runUsage.utime, runUsage.stime);
        for (i = 0; i < runUsage.npids; ++i) {
            pidInf = runUsage.pidInfo[i];
            printf("\t\tpgid <%d> ppid <%d> pid <%d>\n",
                    pidInf.pgid, pidInf.ppid, pidInf.pid);
        }
        if (!more) break;
    }

    lsb_closejobinfo();

    exit(0);
}
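
The pid loop in the listing above could be factored into a small helper, which
also makes the empty-table case seen on qsc (npids = 0) explicit. This is only
a sketch: collect_pids is a hypothetical name, not part of the LSF API, and the
two structs are cut-down stand-ins holding just the fields used in the listing,
so that the fragment builds without the LSF headers. When compiling against
<lsf/lsbatch.h>, drop the stand-ins and use the real definitions.

```c
#include <stddef.h>

/* Stand-ins (assumptions) for the LSF structures; only the fields that the
 * listing above touches are reproduced here. */
struct pidInfo {
    int pid;
    int ppid;
    int pgid;
};

struct jRusage {
    int mem;
    int swap;
    int utime;
    int stime;
    int npids;
    struct pidInfo *pidInfo;
};

/*
 * Copy up to max_out pids from a usage record into out[].
 * Returns the number of pids copied, 0 when the pid table is empty
 * (the behaviour reported on qsc), or -1 on invalid arguments.
 */
int collect_pids(const struct jRusage *ru, int *out, int max_out)
{
    int i;
    int n = 0;

    if (ru == NULL || out == NULL || max_out < 0) {
        return -1;
    }
    if (ru->npids <= 0 || ru->pidInfo == NULL) {
        return 0;   /* table not filled in */
    }
    for (i = 0; i < ru->npids && n < max_out; ++i) {
        out[n++] = ru->pidInfo[i].pid;
    }
    return n;
}
```

With the real headers, a call such as collect_pids(&job->runRusage, pids, 64)
inside the read loop would fill pids. A return of 0 for a job that is known to
be running is the symptom described for qsc.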

Original Problem Statement:



Previous Email Message:



=======================================================
Derrick Robert M.
CCN-7 High Performance Computing    |   consult@xxxxxxxx
Mail Stop B-251                  |   (505) 665-4444
Los Alamos National Lab     |   (505) 665-6647  (fax)
Los Alamos,  NM  87545     |
=======================================================





_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev


