
Re: [ptp-dev] Questions related to batch job submission

Greg
We weren't planning to reimplement any LL function in PTP. An LL user would 
submit a multi-step job as a single command file containing the 
specification statements to define the multi-step job. Our implementation 
for PTP would follow that model.

The question we had was about the 'jobid' that the proxy fills in for 
each new job event it sends to the front end. A comment in 
proxy_event.c states that new jobs created in response to a submit 
command must fill in jobid with the jobid passed to the proxy in that 
command. This works fine when the job command file specifies a single job 
step, since when we query LL for the job info we get only a single new 
job event.

In the case of a multi-step job, there will be a single submit command, 
but when LL processes the job submission, it can result in multiple new 
jobs, where the number cannot be determined by parsing the command file 
(only LL knows the correct number of steps after it has parsed the command 
file as part of the submit process).

So for a multi-step job, as we generate a new job event for each 
job step of the just-submitted job, what do we use for the jobid in each 
event? It seems like we would generate the first event using the jobid 
from the submit command and use -1 as the jobid for the remaining new 
job events, but we're not sure that's right.
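
The tentative scheme described above (first event carries the jobid from the submit command, every later event carries -1) could be sketched roughly as follows. This is only an illustration in Python, not PTP proxy code: NewJobEvent, new_job_events, UNKNOWN_JOBID and the step identifier strings are hypothetical names, and the list of step identifiers is assumed to come from querying LL after the submit.

```python
from dataclasses import dataclass
from typing import List

# Assumption: -1 marks a job step the front end did not know about at submit time.
UNKNOWN_JOBID = -1


@dataclass
class NewJobEvent:
    """Hypothetical stand-in for a new job event sent to the front end."""
    jobid: int          # front-end jobid from the submit command, or -1
    proxy_job_id: str   # proxy-side identifier for the job step


def new_job_events(submit_jobid: int, step_ids: List[str]) -> List[NewJobEvent]:
    """Build one new job event per job step reported for a single submit."""
    events = []
    for index, step_id in enumerate(step_ids):
        # Only the first step reuses the jobid passed in the submit command.
        jobid = submit_jobid if index == 0 else UNKNOWN_JOBID
        events.append(NewJobEvent(jobid=jobid, proxy_job_id=step_id))
    return events
```

With a single-step job this reduces to the existing behavior: one event, carrying the submitted jobid.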

Also, we have figured out how to recognize new jobs as they are initially 
submitted, so we are able to provide an immediate indication to the user 
that the job was submitted.

Dave



Greg Watson <g.watson@xxxxxxxxxxxx> 
Sent by: ptp-dev-bounces@xxxxxxxxxxx
06/19/2007 03:52 PM
Please respond to
Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>


To
Parallel Tools Platform general developers <ptp-dev@xxxxxxxxxxx>
cc

Subject
Re: [ptp-dev] Questions related to batch job submission







On Jun 18, 2007, at 1:38 PM, Dave Wootton wrote:

> Hi
> We were discussing details of batch job submission through PTP and had some
> questions about expected PTP behavior and how we should implement our
> support.
>
> 1) The user can submit a job which contains a set of job steps. From our
> perspective, each job step behaves as if it were a separate job, although
> there may be dependency and conditional execution specifications that
> require job step 1 to complete before job step 2 can begin, or that job
> step 2 can only run if job step 1 completed successfully, etc. The job
> submission/job command file that specifies the individual job steps is a
> single file that will be passed to the proxy in a single run command.
>
> Current PTP behavior is that the run command includes a jobid that is
> generated by the front end and passed to the proxy. The proxy responds to
> the run command with an event containing that jobid as well as the
> proxy-generated identifier for that job. This works for a single job, or
> for the first step of a multi-step job.
>
> How should multi-step jobs be handled? Should the PTP front end have a
> list or array of job steps built at the time the job is submitted, and
> use the same jobid for each of those steps? Should the front end generate
> a unique jobid for each step that is then passed across in the run
> command, maybe as an array of jobids, and then the proxy generate a new
> job event for each step using the corresponding jobid? Should the proxy
> just use the passed jobid for the first step and use -1 as the jobid for
> all subsequent steps, since the front end doesn't know about the
> additional job steps?

It sounds like LL handles multi-step jobs using a command file. I 
presume the user just submits this command file with a single job 
submission command? Why wouldn't you do the same thing from Eclipse 
(submit the job command file) rather than trying to implement LL 
functionality in the UI?

It would be possible for you to build a job control UI that monitors 
the status of a job and, when it has completed successfully, submits 
the next job in turn. However, there is currently no functionality to 
control whether a job is run, only whether it is submitted. This 
sounds like some internal LL functionality that you would need to 
expose. How does it work now?

>
> 2) When we submit a job, the job may not appear on any job queue for a
> while, possibly several minutes. We won't have some job-related
> information, such as the cluster (machine name) where the job was queued,
> until the job appears on the job queue. If we delay our event response to
> the run command until we have the required information, does that cause
> problems, such as blocking any additional jobs from being queued until the
> event notification from the first run command is received? Does the front
> end have problems tracking multiple 'in process' run commands active at
> the same time?

The architecture supports multiple outstanding submit commands, so 
delaying the new job event won't cause the UI any problems. However, 
it would be nice if the user got feedback that something had actually 
happened when they submitted the job. One thought is that you could 
have a dummy 'submitted' queue that these jobs would go onto 
immediately. Then when the job is actually placed on a cluster, you 
could remove it from the dummy queue and add it to the correct queue.
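
A rough sketch of that dummy queue idea, in Python for illustration only. QueueView, job_submitted, job_placed and the queue name "submitted" are hypothetical and not part of PTP; the sketch assumes the proxy learns the real queue name some time after the submit.

```python
from typing import Dict, Set

# Assumption: name of the placeholder queue shown to the user right after submit.
SUBMITTED_QUEUE = "submitted"


class QueueView:
    """Hypothetical model of the queues a UI would display."""

    def __init__(self) -> None:
        self.queues: Dict[str, Set[int]] = {SUBMITTED_QUEUE: set()}

    def job_submitted(self, jobid: int) -> None:
        # Give immediate feedback: the job shows up on the dummy queue.
        self.queues[SUBMITTED_QUEUE].add(jobid)

    def job_placed(self, jobid: int, queue_name: str) -> None:
        # Once the real queue is known, move the job off the dummy queue.
        self.queues[SUBMITTED_QUEUE].discard(jobid)
        self.queues.setdefault(queue_name, set()).add(jobid)
```

Several jobs can sit on the dummy queue at once, which matches having multiple outstanding submit commands.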

Greg

_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev



