[ptp-user] slurm tasks not honoured?
After seeing the email about the release of PTP 5.0.5, I updated Eclipse, downloaded the proxy zip file, and recompiled utils, proxy and sdm.
All seems fine, but when I run a job, the number of tasks always seems to be 1.
Launching with 16 tasks on one node produces the following output (note the exception, which occurs on every job launch):
SLURM@Local: ptp_slurm_proxy: Job step aborted: Waiting up to 2 seconds for job step to finish.
SLURM@Local: Send Job/Process StateChange Event: state=32772
SLURM@Local: job[15974] iothread exit on EOF/ERROR of stdout fd
SLURM@Local: job[15974] iothread exit on Error/EOF of stderr fd.
SLURM@Local: Send Job/Process StateChange Event: state=4
SLURM@Local: Job[15974] no longer exist in SLURM. Romove it!
SLURM@Local: SLURM_SubmitJob (2):
SLURM@Local: job submit commands:
SLURM@Local: jobTimeLimit=55
SLURM@Local: launchedByPTP=true
SLURM@Local: jobNumProcs=16
SLURM@Local: execPath=/project/csvis/biddisco/eiger/build/pv-os/bin
SLURM@Local: progArgs=-rc
SLURM@Local: progArgs=-ch=148.187.14.220
SLURM@Local: progArgs=--use-offscreen-rendering
SLURM@Local: jobNumNodes=1
SLURM@Local: execName=pvserver
SLURM@Local: jobPartition=stdMem
SLURM@Local: jobSubId=JOB_13297370315374
SLURM@Local: Job[15975] io thread create done.
SLURM@Local: Send Job/Process StateChange Event: state=1
java.lang.NullPointerException
at org.eclipse.ptp.ui.views.MachinesNodesView$JobListener.handleEvent(MachinesNodesView.java:111)
at org.eclipse.ptp.rmsystem.AbstractResourceManagerMonitor.fireJobChanged(AbstractResourceManagerMonitor.java:241)
at org.eclipse.ptp.rmsystem.AbstractResourceManager.fireJobChanged(AbstractResourceManager.java:510)
at org.eclipse.ptp.rtsystem.AbstractRuntimeResourceManager.fireJobChanged(AbstractRuntimeResourceManager.java:145)
at org.eclipse.ptp.rtsystem.AbstractRuntimeResourceManagerMonitor.doUpdateJobs(AbstractRuntimeResourceManagerMonitor.java:988)
at org.eclipse.ptp.rtsystem.AbstractRuntimeResourceManagerMonitor.handleEvent(AbstractRuntimeResourceManagerMonitor.java:348)
at org.eclipse.ptp.rtsystem.AbstractRuntimeSystem.fireRuntimeJobChangeEvent(AbstractRuntimeSystem.java:90)
at org.eclipse.ptp.rtsystem.AbstractProxyRuntimeSystem.handleEvent(AbstractProxyRuntimeSystem.java:368)
at org.eclipse.ptp.proxy.runtime.client.AbstractProxyRuntimeClient.fireProxyRuntimeJobChangeEvent(AbstractProxyRuntimeClient.java:249)
at org.eclipse.ptp.proxy.runtime.client.AbstractProxyRuntimeClient.processRunningEvent(AbstractProxyRuntimeClient.java:677)
at org.eclipse.ptp.proxy.runtime.client.AbstractProxyRuntimeClient.runStateMachine(AbstractProxyRuntimeClient.java:937)
at org.eclipse.ptp.proxy.runtime.client.AbstractProxyRuntimeClient$StateMachineThread.run(AbstractProxyRuntimeClient.java:94)
at java.lang.Thread.run(Thread.java:736)
Running scontrol show job ID --details then gives this:
JobId=15975 Name=pvserver
UserId=biddisco(20569) GroupId=csstaff(1000)
Priority=11025 Account=csstaff QOS=normal
JobState=COMPLETED Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:01:08 TimeLimit=00:55:00 TimeMin=N/A
SubmitTime=12:23:51 EligibleTime=12:23:51
StartTime=12:23:51 EndTime=12:24:59
PreemptTime=NO_VAL SuspendTime=None SecsPreSuspend=0
Partition=stdMem AllocNode:Sid=eiger220:4509
ReqNodeList=(null) ExcNodeList=(null)
NodeList=eiger200
BatchHost=eiger200
NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
Nodes=eiger200 CPU_IDs=1 Mem=0
MinCPUsNode=1 MinMemoryCPU=12000M MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=(null)
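The mismatch between the requested tasks (jobNumProcs=16 in the proxy log) and the allocation (NumCPUs=1 in the scontrol output) can be checked mechanically. The following is only a minimal sketch: the helper name parse_scontrol is hypothetical, the sample text is copied from the output above, and it assumes that with CPUs/Task=1 a 16-task job should be allocated at least 16 CPUs.

```python
def parse_scontrol(text):
    """Collect the whitespace-separated KEY=VALUE tokens of scontrol output."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            # Keep the first occurrence if a key appears more than once.
            fields.setdefault(key, value)
    return fields


# Two lines copied from the scontrol output above.
sample = """NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=1 MinMemoryCPU=12000M MinTmpDiskNode=0"""

requested_tasks = 16  # jobNumProcs from the proxy log

fields = parse_scontrol(sample)
allocated_cpus = int(fields["NumCPUs"])
cpus_per_task = int(fields["CPUs/Task"])

# Assumption: the job needs requested_tasks * cpus_per_task CPUs in total.
under_allocated = allocated_cpus < requested_tasks * cpus_per_task
print("allocated CPUs:", allocated_cpus, "under-allocated:", under_allocated)
```

If the check reports an under-allocation, as it does for the values above, that suggests the task count never reached SLURM, which is consistent with the suspicion that the parameter generation is at fault.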
I suspect the SLURM parameters are being generated incorrectly. Is it possible to edit them by hand? (I think there was a template somewhere, but I can't remember or find it.)
It's quite possible I'm doing something wrong as I'm new to this.
Any advice welcome.
Thanks
JB