RE: [ptp-dev] Support for SLURM 2.1 committed


Greg,

As you said, the original intention of the ProcessStatus attribute was to reflect the status of the processes in SLURM jobs.
Since SLURM doesn't provide any process state information, we fake each process's state/status using its parent job's state/status.

I have now modified the slurm.ui plugin to remove the use of ProcessAttributes.getStatusAttributeDefinition();
it uses the parent job's status attribute to display process state icons instead.
So statusAttrDef and getStatusAttributeDefinition() can be removed from the ProcessAttributes class.

Here are two related problems worth discussing.
The first is related to the scalability of the UI. The RM proxy needs to send a NewProcess event for each process in a newly submitted job. When a job has thousands of processes or more, handling that many events puts a heavy load on the UI. Similarly, as machines continue to grow, node state change events will cause the same problem when the number of nodes is large and node states change frequently. How should we handle this problem?
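
One possible way to reduce the event count, sketched here in Java, is to describe a whole block of processes in a single event by collapsing their indices into contiguous ranges. The class and method names below (ProcessRangeBatcher, Range, coalesce) are hypothetical and are not part of PTP or the proxy protocol; the sketch only assumes that process indices arrive sorted and without duplicates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a PTP class: collapses process indices into ranges
// so that one event can describe many processes.
public class ProcessRangeBatcher {

    /** Inclusive range of process indices, for example 0-1023. */
    public static final class Range {
        public final int first;
        public final int last;

        Range(int first, int last) {
            this.first = first;
            this.last = last;
        }

        @Override
        public String toString() {
            return first == last ? Integer.toString(first) : first + "-" + last;
        }
    }

    /** Assumes the input is sorted ascending and contains no duplicates. */
    public static List<Range> coalesce(int[] sortedIndices) {
        List<Range> ranges = new ArrayList<>();
        if (sortedIndices.length == 0) {
            return ranges;
        }
        int start = sortedIndices[0];
        int prev = start;
        for (int i = 1; i < sortedIndices.length; i++) {
            int cur = sortedIndices[i];
            if (cur != prev + 1) {
                // Gap found: close the current range and start a new one.
                ranges.add(new Range(start, prev));
                start = cur;
            }
            prev = cur;
        }
        ranges.add(new Range(start, prev));
        return ranges;
    }

    public static void main(String[] args) {
        int[] indices = new int[4096];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = i;
        }
        // A 4096-process job collapses to a single range.
        System.out.println(coalesce(indices)); // prints [0-4095]
    }
}
```

With something like this, a newly submitted job whose processes are numbered consecutively would need one range per event rather than one event per process; whether the proxy protocol can carry such ranges is a separate question.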

The second is related to the element icon view. Since SLURM provides node and job states that differ from the enumerated definitions, I use the node/job status attributes to choose their icons. However, adding more status icons would change the "Legend" dialog and destroy its simplicity. One potential solution is to keep the node/job states from all RM systems consistent with the enumerated values and display icons based on the state value (this keeps the Legend simple and consistent), and to provide another way to show the RM-specific node/job status. For example, the node status attribute could be displayed in the Node Attributes view, and the job list view could provide another field to show the job's status attribute.
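
A minimal Java sketch of that idea follows: the icon is chosen from a small generic state, while the SLURM-specific status string is kept only as text for an attributes view. The GenericJobState enum and the mapping table are assumptions made for illustration and do not reproduce the actual PTP state definitions; the status names follow the SLURM job state names such as JOB_FAILED and JOB_TIMEOUT.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical mapping, not PTP code: SLURM-specific status text is reduced
// to a small generic state that drives the icon and the Legend.
public class SlurmStatusMapping {

    /** Assumed generic states shared by all resource managers. */
    public enum GenericJobState { PENDING, RUNNING, SUSPENDED, COMPLETED, ERROR, UNKNOWN }

    // Assumed mapping from SLURM job status names to generic states.
    private static final Map<String, GenericJobState> TO_GENERIC = Map.of(
            "JOB_PENDING", GenericJobState.PENDING,
            "JOB_RUNNING", GenericJobState.RUNNING,
            "JOB_SUSPENDED", GenericJobState.SUSPENDED,
            "JOB_COMPLETE", GenericJobState.COMPLETED,
            "JOB_CANCELLED", GenericJobState.COMPLETED,
            "JOB_FAILED", GenericJobState.ERROR,
            "JOB_TIMEOUT", GenericJobState.ERROR,
            "JOB_NODE_FAIL", GenericJobState.ERROR);

    /** Returns the generic state used for the icon; unknown statuses map to UNKNOWN. */
    public static GenericJobState iconState(String slurmStatus) {
        if (slurmStatus == null) {
            return GenericJobState.UNKNOWN;
        }
        return TO_GENERIC.getOrDefault(
                slurmStatus.toUpperCase(Locale.ROOT), GenericJobState.UNKNOWN);
    }

    public static void main(String[] args) {
        String status = "job_timeout"; // RM-specific text, shown as-is in an attributes view
        System.out.println(status + " -> icon for " + iconState(status));
    }
}
```

The Legend would then only need one icon per generic state, and the raw status string stays available for any view that wants to show it.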

Jie


From: g.watson@xxxxxxxxxxxx
Subject: Re: [ptp-dev] Support for SLURM 2.1 committed
Date: Thu, 27 May 2010 15:38:03 -0400
To: ptp-dev@xxxxxxxxxxx

Jie,

We would like to remove the ProcessAttributes.getStatusAttributeDefinition() interface if possible. 

I've looked briefly at the SLURM code, and it seems like you're only using this attribute to replicate the job status in the individual processes. Do you have a specific reason for wanting to do this? The original intention was that the process icons would reflect the status of the actual processes running on the system, which is adequately covered by the state, exit status, and signal attributes. States such as "pending", "timeout", "cancelled", etc., don't really apply to processes, only to the jobs that start them.

In any case, our plan is to transition away from displaying individual process icons (except in some specific cases, such as debug), since this is inherently non-scalable, so there may not be a processes view in the future.

I would suggest a couple of approaches for SLURM. The ideal approach would be to restrict your processes to the states defined by the state attribute, and use the job status to display additional icons. Alternatively, if you really want to keep the process status, I would suggest defining your own SLURM-specific attribute and use that instead (none of the other resource managers use status). I would also suggest using a new set type attribute on the job element with a base enumerated type rather than a string.
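
To illustrate why a set-type attribute with an enumerated base type scales better than a per-process string, here is a rough Java sketch that stores one bit set of process indices per enum value. EnumSetAttribute and SlurmJobStatus are hypothetical names invented for this example and are not the PTP attribute classes.

```java
import java.util.BitSet;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch, not the PTP implementation: one BitSet of element
// indices per enum value, so storage grows with the number of distinct
// values rather than with one string per process.
public class EnumSetAttribute<E extends Enum<E>> {

    /** Example status values, assumed for illustration only. */
    public enum SlurmJobStatus { PENDING, RUNNING, TIMEOUT, CANCELLED }

    private final Map<E, BitSet> members;

    public EnumSetAttribute(Class<E> type) {
        members = new EnumMap<>(type);
        for (E value : type.getEnumConstants()) {
            members.put(value, new BitSet());
        }
    }

    /** Assigns the value to every index in the inclusive range [first, last]. */
    public void setRange(int first, int last, E value) {
        for (BitSet set : members.values()) {
            set.clear(first, last + 1); // BitSet ranges are end-exclusive
        }
        members.get(value).set(first, last + 1);
    }

    /** Returns the value for an index, or null if none has been assigned. */
    public E get(int index) {
        for (Map.Entry<E, BitSet> entry : members.entrySet()) {
            if (entry.getValue().get(index)) {
                return entry.getKey();
            }
        }
        return null;
    }

    /** Number of indices currently holding the given value. */
    public int count(E value) {
        return members.get(value).cardinality();
    }

    public static void main(String[] args) {
        EnumSetAttribute<SlurmJobStatus> status = new EnumSetAttribute<>(SlurmJobStatus.class);
        status.setRange(0, 999, SlurmJobStatus.RUNNING);
        status.setRange(10, 19, SlurmJobStatus.TIMEOUT);
        System.out.println(status.count(SlurmJobStatus.RUNNING)); // prints 990
    }
}
```

An arbitrary string attribute offers no such bound, because nothing limits the number of distinct values that would each need their own set.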

Does this sound ok?

Please note that the last possible date for changes to be included in Helios is June 15, although I would urge you to get any changes in much earlier so that they can be tested.

Regards,
Greg

On May 25, 2010, at 11:45 AM, Randy Roberts wrote:

Greg,

It's only a concern.  If we can make sure to limit the number of actual status values.

One thought...  If this is a SLURM-specific status (and is only being used in the SLURM UI code),
could we define a SLURM-specific attribute definition, only usable by the SLURM UI?

Even so, I would still feel better if it were an enumeration.

R^2
--
Randy M. Roberts
work: (505)665-4285

"In my many years I have come to a conclusion
 that one useless man is a shame, two is a law firm,
 and three or more is a congress."
                      -- John Adams

On May 24, 2010, at 8:16 PM, Greg Watson wrote:

Randy,

The state attribute has a small number of states that must be implemented by all resource managers, and are recognized by the UI code to display particular icons, etc. The status attribute is a string that can be used by resource managers to provide arbitrary resource manager-specific status information. It is displayed as a text string in the generic views, but is also intended for resource manager-specific views (if any were provided) to use. 

We can't add resource manager-specific values to the ProcessAttributes.State enumeration because these must remain generic as they are implemented by all resource managers.

Is there any reason you can't use the new attribute sets for the status attribute also?

Thanks,
Greg




On May 24, 2010, at 11:32 AM, Randy Roberts wrote:

Greg, Jie,

I have to apologize!   The method getStatusAttributeDefinition() **WAS** commented out.
I got it confused with getStateAttributeDefinition() in all of my earlier replies.

Jie, why must you use getStatusAttributeDefinition() instead of getStateAttributeDefinition()?
Can't we add more values to the ProcessAttributes.State enumeration?

The reason the Status Attribute was commented out is that no other code was explicitly using
it, except to set it.  For scalability, String-valued process attributes are suspect, since they do
not predictably take on only a small number of distinct values -- a requirement to scalably
store Processes.  I would much prefer using an enumerated process attribute.

Regards,
R^2
--
Randy M. Roberts
work: (505)665-4285

"This is how Liberty dies - with thunderous applause."
                            -- Senator Amidala (Star Wars: Episode III)

On May 24, 2010, at 9:09 AM, Randy Roberts wrote:

It is still being used.  So it shouldn't be commented out.

R^2
--
Randy M. Roberts
work: (505)665-4285

"In my many years I have come to a conclusion
 that one useless man is a shame, two is a law firm,
 and three or more is a congress."
                      -- John Adams

On May 20, 2010, at 7:53 PM, JiangJie wrote:

Hi Greg and Randy,
 
ProcessAttributes.getStatusAttributeDefinition() was commented out in head,
but I restored it when I submitted the new SLURM proxy code last night.
 
The defined process states cannot reflect the real status of a process/job in SLURM.
So I use the process status attribute to report the process state provided by SLURM
(such as job_failed, job_timeout, job_node_fail). These extra statuses are used to
display the corresponding state icons in the UI.
 
I'm not sure why this method was commented out.
Please keep it if possible.
 
Jie
 

 

From: g.watson@xxxxxxxxxxxx
Subject: Re: [ptp-dev] Support for SLURM 2.1 committed
Date: Thu, 20 May 2010 16:02:34 -0400
To: ptp-dev@xxxxxxxxxxx

It was commented out in my code also, but now it appears to be back. Weird.

Greg

On May 20, 2010, at 3:46 PM, Randy Roberts wrote:

Unfortunately I'm seeing an error in SLURMRuntimeModelPresentation which is still trying to use ProcessAttributes.getStatusAttributeDefinition(). This method no longer exists, so you might want to check that you've updated to the latest head.

Greg,

This method still exists.  Remember, you use it in calls to the process' parent IPJob.

The definition methods that were removed from ProcessAttributes are:

// public static IntegerAttributeDefinition getPIDAttributeDefinition() {
//     return pidAttrDef;
// }

and
// public static IntegerAttributeDefinition getIndexAttributeDefinition() {
//     return indexAttrDef;
// }

Regards,
R^2
--
Randy M. Roberts
work: (505)665-4285

"This is how Liberty dies - with thunderous applause."
                            -- Senator Amidala (Star Wars: Episode III)

On May 20, 2010, at 1:25 PM, Randy Roberts wrote:

Randy, I notice that you've just commented out code in ProcessAttributes. Would you please either deprecate the interfaces or remove them completely. Ditto for anywhere else this has been done.

Greg,

I guess I was just a bit uncertain about it.  Now that you've commented on it,
I will remove the commented code.

Thanks,
R^2
--
Randy M. Roberts
work: (505)665-4285

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."  -- Benjamin Franklin

On May 20, 2010, at 11:38 AM, Greg Watson wrote:

Great! Thanks for the update.

Unfortunately I'm seeing an error in SLURMRuntimeModelPresentation which is still trying to use ProcessAttributes.getStatusAttributeDefinition(). This method no longer exists, so you might want to check that you've updated to the latest head.

Randy, I notice that you've just commented out code in ProcessAttributes. Would you please either deprecate the interfaces or remove them completely. Ditto for anywhere else this has been done.

Thanks,
Greg

On May 20, 2010, at 11:50 AM, JiangJie wrote:


Hi all,

I have committed the new ptp_slurm_proxy and related code to support SLURM 2.1/2.x.

Major improvements include:
(1) The proxy has been reimplemented entirely on top of the SLURM API.
(2) SLURM 2.1 is supported (and later versions, if the API remains compatible).
       Earlier SLURM 1.x versions are no longer supported.
(3) The topology information of parallel processes is now obtained from the job step layout,
       no longer from the "srun" ELF image.
(4) Job/process/node states are updated in separate threads.
(5) The proxy complies with the PTP 4.0 RM proxy protocol.

Any feedback would be appreciated.

Regards,
Jie 

_______________________________________________
ptp-dev mailing list
ptp-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/ptp-dev
