Support GPU information and job array query job status for SLURM #671
Comments
How GPUs or other 'special' node types are indicated is very scheduler- and site-specific. There is no standard way of doing this, so Xenon cannot know how it is done at each site; it is up to the sysadmin. Some sites use separate queues for GPU nodes (Cartesius, for example), while others use node properties to mark the GPU nodes (DAS5 does this). Both run SLURM but chose different ways to expose their GPU nodes, so some site-specific knowledge is needed in the application; there is no way around that. For DAS5, there is extra GPU-related information on each job. Note, however, that since the GPU nodes on DAS5 are not in a separate queue, normal CPU jobs can also be scheduled onto a GPU node. The flag therefore only says something about the job, not necessarily about the node it ends up running on.
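Since the GPU indication ends up in scheduler-specific job information rather than in a standard field, a client has to inspect it itself. Below is a minimal sketch of such a check; the key names (`gres`, `tres_per_node`) are assumptions for illustration — the actual keys depend on the SLURM version and the site configuration, not on Xenon.

```java
import java.util.Map;

public class GpuJobInfo {

    // Hypothetical key names: the real entries in the
    // scheduler-specific information map vary per SLURM
    // version and site (e.g. a GRES or TRES field).
    static boolean requestsGpu(Map<String, String> info) {
        String gres = info.getOrDefault("gres", "");
        String tres = info.getOrDefault("tres_per_node", "");
        return gres.contains("gpu") || tres.contains("gpu");
    }
}
```

A job whose info map contains an entry like `gres=gpu:2` would be flagged; a plain CPU job with neither key would not.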
For the second question, about job arrays: this seems to be an issue with the SLURM parser not properly recognizing the individual jobs produced by array jobs. I'll see if I can reproduce the problem.
I cannot seem to reproduce the issue. See #672 for my test.
Context
I am working on a project that aims to provide a user-side solution for higher resource utilization on a SLURM cluster.
It requires information on pending jobs in the queue as well as on running jobs.
Problem
The method
JobQueueScheduler.getJobStatus(jobIdentifier)
returns the JobStatus of a job. However, it only contains basic information such as the start time, the time limit, and the required number of nodes. Jobs with a GPU requirement cannot be recognized from it.
There is also a problem with querying jobs generated by a job array. The job array and its running jobs can be found with
String[] jobIDs = scheduler.getJobs(PartitionName);
However, when I try to get the status of those jobs, an error is raised saying no such jobs exist. The pending job array has an id like 1080_[5-1024], while the running jobs have ids like 1080_2. When
JobQueueScheduler.getJobStatus(jobIdentifier)
is invoked, the error is raised.
Question
Is it possible to provide information about GPUs and job arrays via the job status? After all, the JobStatus implementation maintains a map,
schedulerSpecificInformation
, so perhaps the related information could be added to that map. The job array queries also need to be fixed.