SLURM Basics: Running Jobs
The HPC clusters use the SLURM Resource Manager and Scheduler. Below is some basic information.
One way to submit jobs is to create a SLURM batch script and submit it with the sbatch command. Here is a sample job script:
#!/bin/bash
#SBATCH --job-name=my_job_name # Job name
#SBATCH --partition=haswell # Partition/Queue name
#SBATCH --mail-type=END,FAIL # Mail events
#SBATCH --mail-user=email@maine.edu # Where to send mail
#SBATCH --ntasks=1 # Run a single task
#SBATCH --cpus-per-task=4 # Run with 4 threads
#SBATCH --mem=60gb # Job memory request
#SBATCH --time=24:00:00 # Time limit hrs:min:sec
#SBATCH --output=test_%j.log # Standard output and error log
module load module_name ...
srun program param1 ...
The ntasks and cpus-per-task values depend on whether you have a multi-threaded program (one process that uses multiple threads) or a multi-process program (multiple processes, as with MPI). In the MPI case, you would set --ntasks=50 to run with 50 processes. Other directives control how the job is laid out across nodes:
#SBATCH --ntasks-per-node=4 # Run 4 tasks per node
#SBATCH --nodes=2 # Run on 2 nodes
This combination would be appropriate for, say, an MPI job running with 8 processes (4 per node across 2 nodes).
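For example, the 8-process case above might be requested with a script along these lines (a sketch only; the partition, memory, time, module, and program names are illustrative placeholders):
#!/bin/bash
#SBATCH --job-name=mpi_job # Job name
#SBATCH --partition=haswell # Partition/Queue name
#SBATCH --nodes=2 # Run on 2 nodes
#SBATCH --ntasks-per-node=4 # 4 MPI processes per node (8 total)
#SBATCH --cpus-per-task=1 # One CPU core per MPI process
#SBATCH --mem=8gb # Memory request per node
#SBATCH --time=01:00:00 # Time limit hrs:min:sec
#SBATCH --output=mpi_test_%j.log # Standard output and error log
module load module_name ...
srun program param1 ...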
sbatch: Command to submit a job:
sbatch script-name
The email directives are optional.
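If the submission is accepted, sbatch responds with the assigned job ID, for example (the ID shown is illustrative):
Submitted batch job 962658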
squeue: Command to check all jobs in the queue:
squeue
or to check only the jobs belonging to a particular user:
squeue -u user-name
A related command is sq, which shows the same information in a slightly different format; the main addition is the total number of cores for each job in the second-to-last column.
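Typical squeue output looks something like the following (the job shown is illustrative, and the exact column widths may differ):
JOBID PARTITION NAME     USER      ST TIME NODES NODELIST(REASON)
962658 haswell  parallel blackbear R  0:10 2     node-[140-141]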
sinfo: Command to get the status of all of the Slurm partitions:
sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
debug up infinite 1 mix node-153
haswell* up infinite 4 down* node-[127-129,139]
haswell* up infinite 15 mix node-[55-58,61,81,90,122-123,125-126,130,140-142]
haswell* up infinite 67 idle node-[59,63-80,82-89,91-121,124,131-138]
haswell-test up infinite 1 idle node-62
skylake up infinite 1 drain* node-148
skylake up infinite 4 mix node-[143-144,149-150]
skylake up infinite 3 idle node-[145-147]
gpu up infinite 5 mix node-[g101,g102,g103,g104,g105]
epyc up infinite 4 mix node-[151,153,155,158,163]
epyc up infinite 1 alloc node-152
epyc up infinite 7 idle node-[154,156-157,159-162,164]
epyc-hm up infinite 2 alloc node-[167-168]
epyc-hm up infinite 2 idle node-[169-170]
scancel: Delete a job:
scancel JOB_ID
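scancel also accepts a user filter, which cancels all of that user's jobs at once:
scancel -u user-name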
checkjob: The checkjob command can be used to get more information about a job or to check on the status of a job:
checkjob JOB_ID
This command mimics the command of the same name in the Moab scheduler that we used previously. Sample output:
[blackbear@penobscot pi_MPI]$ checkjob 962658
JobId=962658 JobName=parallel_pi_test
UserId=blackbear(1028) GroupId=blackbear(1003) MCS_label=N/A
Priority=10010 Nice=0 Account=(null) QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
DerivedExitCode=2:0
RunTime=00:00:10 TimeLimit=00:05:00 TimeMin=N/A
SubmitTime=2022-09-14T12:29:08 EligibleTime=2022-09-14T12:29:08
AccrueTime=2022-09-14T12:29:08
StartTime=2022-09-14T12:29:08 EndTime=2022-09-14T12:34:08 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2022-09-14T12:29:08
Partition=haswell AllocNode:Sid=penobscot:9998
ReqNodeList=(null) ExcNodeList=(null)
NodeList=node-[140-141]
BatchHost=node-140
NumNodes=2 NumCPUs=8 NumTasks=8 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=8,mem=2G,node=2,billing=8
Socks/Node=* NtasksPerN:B:S:C=4:0:*:* CoreSpec=*
Nodes=node-[140-141] CPU_IDs=16-19 Mem=0 GRES_IDX=
MinCPUsNode=4 MinMemoryNode=1G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/blackbear/pi_MPI/go.slurm
WorkDir=/blackbear/abol/pi_MPI
StdErr=/blackbear/abol/pi_MPI/parallel_pi_962658.log
StdIn=/dev/null
StdOut=/home/blackbear/pi_MPI/parallel_pi_962658.log
Power=
seff: Command to check the memory and CPU efficiency of a job. This command is mostly useful after a job has completed:
seff JOB_ID
for instance:
[root@penobscot slurm]# seff 962658
Job ID: 962658
Cluster: penobscot
Use of uninitialized value $user in concatenation (.) or string at /bin/seff line 154, <DATA> line 628.
User/Group: /blackbear
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 4
CPU Utilized: 00:00:00
CPU Efficiency: 0.00% of 00:32:08 core-walltime
Job Wall-clock time: 00:04:01
Memory Utilized: 508.00 KB
Memory Efficiency: 0.02% of 2.00 GB
tail -f: Checking output files. Since you are logged into Katahdin while your job runs in the background on a compute node, you can check the job's progress by following its output file with tail -f.
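For example, with the output file naming from the sample script above (test_%j.log) and an illustrative job ID:
tail -f test_962658.log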
In Slurm, the term "Partition" refers to a set of nodes. Other Resource Managers refer to these as Queues. Currently, the list of partitions on the Penobscot cluster is:
- debug: general debugging of code. Currently just a single node.
- haswell: the largest partition in terms of nodes and cores. Around 90 nodes, each with Intel Haswell or Broadwell CPUs with either 24 or 28 cores and 64 GB or 128 GB of RAM.
- skylake: 8 Intel Skylake nodes, each with 36 cores and 256 GB of RAM.
- epyc: a newer partition for the AMD EPYC3 nodes. These 14 nodes each have 96 cores and 512 GB of RAM.
- epyc-hm: four nodes with AMD EPYC3 CPUs, 32 cores per node and 1 TB of RAM.
- gpu: Penobscot only. Five NVIDIA nodes with a variety of GPUs and up to 1 TB of RAM.
The list of partitions, along with the current state of the nodes in each, can be retrieved with the sinfo command.
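To look at a single partition, sinfo accepts a partition filter, for example:
sinfo -p epyc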
When you submit a job to SLURM, the job gets sent to a node or set of nodes and runs in the background. After you submit the job, you are returned to the shell prompt and can continue with what you were doing. Occasionally, you might want to interact with the job directly on the node where it is running. You can do this with the srun command, like this:
srun --partition=gpu --ntasks=1 --cpus-per-task=4 --mem=64gb --gres=gpu:1 --time=10:00:00 --pty /bin/bash
This asks to run a job in the gpu partition with 4 CPU cores, one GPU, and 64 GB of RAM for 10 hours. Once the resources are available, you will be given a shell prompt on the node that is running the job. From there, you can run commands. This is particularly helpful when you want to use the nvcc command on a GPU node to compile with CUDA.
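Once you have a prompt on the GPU node, you can work interactively. For example, a rough sketch of compiling and running a CUDA program there (the module name and file names are illustrative and may differ on the cluster):
module load cuda
nvcc -o my_cuda_prog my_cuda_prog.cu
./my_cuda_prog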
For more information, visit the Advanced Research Computing, Security & Information Management web site.