Re: [ptp-dev] new rm prototype

From: Greg Watson <g.watson@xxxxxxxxxxxx>
Date: Tue, 4 Jan 2011 21:50:44 -0500
Delivered-to: ptp-dev@xxxxxxxxxxx

The main issue for me is not requiring installation of the LLview server side in order to use PTP. For example, to use PTP to launch a parallel job on a remote system using Open MPI, I don't want the user to have to install any additional components on the remote machine. Having said that, Open MPI, MPICH2, etc. are sufficiently different from PBS, SLURM, etc. that we may do things differently for these (maybe even use the existing infrastructure?).

Greg

On Jan 4, 2011, at 9:34 PM, Albert L. Rossi wrote:

> Greg,
> 
> Thanks for starting this work.  I will be looking at what you've done within the next day or so.  Without having seen it, though, I do have one immediate reaction to your proposal to have job state checked independently of the monitoring.  I'm not sure this is such a good idea.  I can see it leading to the kind of complexity we currently have in the proxies, a good bit of which will duplicate what the monitoring does anyway.  Unless you want to arbitrarily limit the control framework to handling only one job at a time, which I doubt, you are going to have to implement a matching update against stored values.  If this is all done on the client side, it may run up against the very scalability issues we want to avoid (unless we do something like construct an OR'd grep of jobIds against the polling invocation, e.g., qstat ... | grep -e <pattern>; but even with this, we have made the client stateful in a stronger way than previously: do we want that?).
> 
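To make the OR'd-grep idea above concrete, here is a minimal sketch, assuming the client keeps a set of job IDs it has submitted and receives the scheduler listing as plain text lines. The class and method names (JobIdFilter, orPattern, matchingLines) are hypothetical and not part of PTP; the same alternation string could instead be passed to grep on the remote side.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper: reduces a scheduler listing to the lines for tracked jobs. */
final class JobIdFilter {
    private JobIdFilter() {
    }

    /** Builds one alternation pattern from the job IDs, quoting each ID literally. */
    static Pattern orPattern(Collection<String> jobIds) {
        if (jobIds.isEmpty()) {
            throw new IllegalArgumentException("at least one job ID is required");
        }
        String alternation = jobIds.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation);
    }

    /** Keeps only the lines that mention one of the tracked job IDs. */
    static List<String> matchingLines(List<String> lines, Collection<String> jobIds) {
        List<String> kept = new ArrayList<>();
        if (jobIds.isEmpty()) {
            return kept; // nothing tracked, nothing to keep
        }
        Pattern pattern = orPattern(jobIds);
        for (String line : lines) {
            // find() is a substring match, like grep without anchors
            if (pattern.matcher(line).find()) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

Like an unanchored grep, this is a substring match, so a short ID such as "12" would also match "112"; real use would probably need to anchor on the ID column. It also illustrates the statefulness concern: the client has to remember which IDs to put in the pattern.
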
> On the other hand, there is probably a way we can arrange to abstract the LLview client calls such that they are cleanly separated from whatever UI representations are going to be given them; then LLview can simply serve as the implementation of this subset of monitoring which is job state handling; building a dependency between our control abstraction and a job state abstraction implementable by LLview doesn't seem such a terrible thing to me at least.
> 
> At any rate, I'd like to discuss this some more. 
> 
> Al
> 
> 
> ----- Greg Watson <g.watson@xxxxxxxxxxxx> wrote:
>> Happy new year!
>> 
>> I've just committed a prototype of the new RM control framework to org.eclipse.ptp.core.rm. The framework roughly follows the DRMAA spec, but since we're not actually implementing DRMAA I chose only to use those parts I thought were relevant to us.
>> 
>> The way the new framework will work is that a new RM will need to provide implementations of IResourceManager (by extending AbstractResourceManager), and the IJobTemplateFactory and IJobTemplate interfaces.
>> 
>> There are two new primary methods:
>> 
>> 	public String runJob(IJobTemplate jobTemplate, IProgressMonitor monitor) throws ResourceManagerException;
>> 
>> and 
>> 
>> 	public void control(String job, JobControlOperation operation, IProgressMonitor monitor) throws ResourceManagerException;
>> 
>> The runJob method basically replaces the current submitJob method. The control method is used to control jobs, such as placing them on hold, terminating them, etc.
>> 
>> Job status changes are notified via the IResourceManagerListener interface (much as they are now, but simpler). The resource manager will need to call the handleJobStatusChange method whenever a job's status changes (i.e. when it changes from submitted to running to done, etc.). This is probably going to be the trickiest bit of the implementation, since most RMs will need to be polled for this information. I would prefer not to have to use the monitoring system to provide this information if possible; that way we could keep a clean separation between the control and monitoring components.
>> 
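As a rough illustration of the polling problem described above, the sketch below diffs each poll result against the last known state and fires a callback only on a change. It is an assumption-laden stand-in: the real IResourceManagerListener and handleJobStatusChange signatures are not shown in this message, so JobStatusListener, JobStatusTracker, and the plain-string statuses are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in callback; the actual listener signature in the prototype may differ. */
interface JobStatusListener {
    void handleJobStatusChange(String jobId, String newStatus);
}

/** Hypothetical client-side tracker that compares each poll with stored values. */
final class JobStatusTracker {
    private final Map<String, String> lastKnown = new HashMap<>();
    private final JobStatusListener listener;

    JobStatusTracker(JobStatusListener listener) {
        this.listener = listener;
    }

    /** Starts tracking a job, e.g. right after it has been submitted. */
    void track(String jobId, String initialStatus) {
        lastKnown.put(jobId, initialStatus);
    }

    /**
     * Applies one poll result (job ID to status). Jobs that are not tracked are
     * ignored; tracked jobs whose status changed trigger the listener.
     * Returns the number of notifications sent.
     */
    int update(Map<String, String> polled) {
        int changes = 0;
        for (Map.Entry<String, String> entry : polled.entrySet()) {
            String previous = lastKnown.get(entry.getKey());
            if (previous == null) {
                continue; // someone else's job
            }
            if (!previous.equals(entry.getValue())) {
                lastKnown.put(entry.getKey(), entry.getValue());
                listener.handleJobStatusChange(entry.getKey(), entry.getValue());
                changes++;
            }
        }
        return changes;
    }
}
```

The stored map is exactly the client-side state that a pure monitoring-based approach would avoid, so the sketch is meant to show the cost of the separation rather than to argue for it.
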
>> The piece I'm least sure about is the IJobTemplate. The DRMAA spec includes a whole lot of methods for things like setStartTime, setHardWallclockTimeLimit, etc., but these seem more like attributes to me. I presume that they are there to mandate their use (some are optional), so I'm not sure if we should keep them or not. On one hand we could just pass an ILaunchConfiguration with all the attributes from the launch dialog. On the other hand, it might be nice to enumerate the common attributes more explicitly. I'm also not sure what to do about stdin/stdout/stderr (if anything).
>> 
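One possible middle ground between the two options above is a template that stores everything as generic attributes but exposes typed accessors for the common ones. The sketch below is only an illustration of that trade-off; AttributeJobTemplate and the attribute key are hypothetical and are not the IJobTemplate committed in the prototype.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical template: generic attribute storage with a few typed accessors. */
final class AttributeJobTemplate {
    /** Assumed key name, chosen here only for illustration. */
    static final String HARD_WALLCLOCK_TIME_LIMIT = "hardWallclockTimeLimit";

    private final Map<String, String> attributes = new LinkedHashMap<>();

    /** Generic path: any attribute coming from a launch dialog can be stored. */
    void setAttribute(String key, String value) {
        attributes.put(key, value);
    }

    /** Returns the raw attribute value, or null if it was never set. */
    String getAttribute(String key) {
        return attributes.get(key);
    }

    /** Typed path for a common attribute; stored in the same map. */
    void setHardWallclockTimeLimit(long seconds) {
        setAttribute(HARD_WALLCLOCK_TIME_LIMIT, Long.toString(seconds));
    }

    /** Returns the limit in seconds, or null if it was never set. */
    Long getHardWallclockTimeLimit() {
        String raw = getAttribute(HARD_WALLCLOCK_TIME_LIMIT);
        return raw == null ? null : Long.valueOf(raw);
    }

    /** Read-only view, e.g. for turning attributes into command-line arguments. */
    Map<String, String> getAttributes() {
        return Collections.unmodifiableMap(attributes);
    }
}
```

With this shape, RM-specific attributes need no interface changes, while the enumerated ones still get compile-time checking.
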
>> Note that this is all pretty high level. I presume that we'll provide a concrete implementation that gets configuration/attribute information from XML files, uses remote commands, etc., and that can be easily cloned to support different RMs.
>> 
>> Please take a look and provide feedback.
>> 
>> Thanks,
>> Greg
>> _______________________________________________
>> ptp-dev mailing list
>> ptp-dev@xxxxxxxxxxx
>> https://dev.eclipse.org/mailman/listinfo/ptp-dev
>> 
