Scheduling Jobs at the CHPC with Slurm
Slurm is a scalable, open-source scheduler used by over 60% of the world's top clusters and supercomputers. There are several short training videos about Slurm, covering concepts such as batch scripts and interactive jobs.
About Slurm
Slurm (Simple Linux Utility for Resource Management) manages job scheduling on clusters. It was originally created at the Livermore Computing Center and has grown into full-fledged open-source software that is backed by a large community, commercially supported by the original developers, and installed on many of the Top500 supercomputers. The Slurm development team is based in Lehi, Utah, close to the University of Utah.
You may submit jobs to the Slurm batch system in two ways:
- Submitting a batch script
- Requesting an interactive job
Using Slurm to Submit Jobs: #SBATCH Directives
To create a batch script, use your favorite text editor to create a text file that details job requirements and instructions on how to run your job.
Every job requirement is passed to Slurm on a line that begins with #SBATCH. Slurm uses these directives to determine which resources to give your job. The two most important parameters are the account and the partition, in the form:
#SBATCH --account=<youraccount>
#SBATCH --partition=<yourpartition>
Accounts are typically named after your group, which is likely your PI's last name. If your group has owner nodes, the account is usually <unix_group>-<cluster_abbreviation> (where the cluster abbreviation is np, kp, lp, rw, or ash). There are other types of accounts, typically named for specific partitions. These can include owner-guest, <cluster>-gpu, notchpeak-shared-short, and smithp-guest.
Partitions are virtual groups of node types. Naming schemes include cluster, cluster-shared, cluster-gpu, cluster-gpu-guest, cluster-guest, cluster-shared-guest, and pi-cl, where cluster is the full name of the cluster and cl is the abbreviated form. We have our partition names described here.
How to Determine which Slurm Accounts you are in
The easiest way to find the accounts and partitions you have access to at the CHPC is the mychpc batch command. It outputs each cluster, the applicable account and partition for that cluster, and your allocation status for that partition.
Example output:
GENERAL
CPU --partition=kingspeak-shared --qos=kingspeak --account=baggins [21% idle]
The above shows a general (i.e. non-preemptable) allocation on the kingspeak cluster under the baggins account within the kingspeak-shared partition. It also indicates how much of the partition is available without a wait - in this example, 21% of the CPUs within the kingspeak-shared partition are idle and available for jobs.
If you notice anything incorrect in the output from the mychpc batch command that you feel should be changed, please let us know.
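The flags printed by mychpc batch can be pasted into a batch script almost unchanged. The sketch below shows one way to do that. The helper name mychpc_line_to_sbatch is hypothetical, and it assumes each line carries its flags in the --name=value form shown in the example above.

```shell
# Turn one line of `mychpc batch` output into #SBATCH directives.
# Assumes the flags appear as --name=value, as in the example above;
# other output layouts are not handled.
mychpc_line_to_sbatch() {
    printf '%s\n' "$1" | grep -o -- '--[a-z]*=[^ ]*' | sed 's/^/#SBATCH /'
}

mychpc_line_to_sbatch "CPU --partition=kingspeak-shared --qos=kingspeak --account=baggins [21% idle]"
```

For the example line this prints three directives, one each for the partition, the QOS, and the account.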
Other Important #SBATCH Directives
Other important #SBATCH parameters specify how long your job will run, the number of nodes, the number of CPUs/tasks, the amount of memory, and the stdout and stderr files. They are written as follows:
#SBATCH --time=DD-HH:MM:SS #DD is days, HH is hours, MM is minutes, SS is seconds
#SBATCH --nodes=<number-of-nodes>
#SBATCH --ntasks=<number-of-cpus>
#SBATCH --mem=<size>[units]
#SBATCH -o slurmjob-%j.out-%N #stdout file in format slurmjob-SLURM_JOB_ID.out-NODEID
#SBATCH -e slurmjob-%j.err-%N #stderr file in format slurmjob-SLURM_JOB_ID.err-NODEID
Note: There is a walltime limit of 72 hours for jobs on general cluster nodes and 14 days on owner cluster nodes. If your job requires more time than these hard limits, you can email the CHPC at helpdesk@chpc.utah.edu, providing the job ID, cluster, and length of time you would like to extend the job to.
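If you estimate run times in seconds, a small helper can produce the D-HH:MM:SS form that --time accepts. This is only a sketch, and the function name to_slurm_time is hypothetical.

```shell
# Convert a run time in whole seconds to Slurm's D-HH:MM:SS form.
to_slurm_time() {
    local secs days hours mins rem
    secs=$1
    days=$(( secs / 86400 ))
    hours=$(( (secs % 86400) / 3600 ))
    mins=$(( (secs % 3600) / 60 ))
    rem=$(( secs % 60 ))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$mins" "$rem"
}

to_slurm_time 7200      # two hours
to_slurm_time 259200    # 72 hours, the general-node limit quoted above
```

The two calls print 0-02:00:00 and 3-00:00:00, either of which can be used as the value of --time.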
Where to Run Your Slurm Job
There are three main places you can run your job: your home directory, /scratch spaces, or group spaces (available if your group has purchased group storage). Your choice determines where I/O is handled for the duration of your job. Each has its own characteristics, outlined below:
Home | Scratch | Group Space |
---|---|---|
Free | Free | $150/TB without backups |
Automatically provisioned per user | 60 day automatic deletion of untouched files | $450/TB with backups |
50 GB soft limit | Two files systems: vast and nfs1 | Is shared among your group |
Because of the storage limit on each user's home directory, we recommend setting up your jobs to run in our scratch file systems. Note that files in the CHPC's scratch file systems are deleted if untouched for 60 days.
To run jobs in the CHPC scratch file systems (vast or nfs1), place the following commands in your Slurm batch script. The commands that you use depend on what Linux shell you have. Unsure? Type 'echo $SHELL' in your terminal.
BASH:
SCRDIR=/scratch/general/<file-system>/$USER/$SLURM_JOB_ID
mkdir -p $SCRDIR
cp <input-files> $SCRDIR
cd $SCRDIR
TCSH:
set SCRDIR = /scratch/general/<file-system>/$USER/$SLURM_JOB_ID
mkdir -p $SCRDIR
cp <input-files> $SCRDIR
cd $SCRDIR
Replace <file-system> with either vast or nfs1.
$USER points to your uNID and $SLURM_JOB_ID points to the job ID that Slurm assigned your job.
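The scratch setup can also be wrapped in a function that quotes its paths and reports failure if the directory cannot be created. The following is a minimal bash sketch. The function name make_scratch_dir and the "manual" fallback used when SLURM_JOB_ID is unset are assumptions added for illustration.

```shell
# Create a per-job scratch directory under a given base path and print it.
# SLURM_JOB_ID is set by Slurm inside a job; the "manual" fallback is an
# assumption so the function can also be tried outside a job.
make_scratch_dir() {
    local base dir
    base=$1
    dir="$base/${USER:-unknown}/${SLURM_JOB_ID:-manual}"
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Possible use inside a batch script:
#   SCRDIR=$(make_scratch_dir /scratch/general/vast) || exit 1
#   cd "$SCRDIR"
```

Stopping the script when the directory cannot be created avoids running the job from an unexpected location.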
Running Your Program in Slurm
To run the software/script you have against your input data, simply pass the same commands that you would use at the command line to your Slurm script.
Putting it all Together: An Example Slurm Script
Below is an example job that combines the information above. Suppose your PI is Frodo Baggins (group ID baggins) and you are requesting general user access to 1 lonepeak node with at least 8 CPUs and 32 GB of memory. The job will run for two hours.
#!/bin/bash
#SBATCH --account=baggins
#SBATCH --partition=lonepeak
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --mem=32G
#SBATCH -o slurmjob-%j.out-%N
#SBATCH -e slurmjob-%j.err-%N
#set up scratch directory
SCRDIR=/scratch/general/vast/$USER/$SLURM_JOB_ID
mkdir -p $SCRDIR
#copy input files and move over to the scratch directory
cp inputfile.csv myscript.r $SCRDIR
cd $SCRDIR
#load your module
module load R/4.4.0
#run your script
Rscript myscript.r inputfile.csv
#copy output to your home directory and clean up
cp outputfile.csv $HOME
cd $HOME
rm -rf $SCRDIR
For more examples of Slurm job scripts, see CHPC MyJobs templates.
Submitting your Job to Slurm
To submit a job, you must be logged onto the CHPC systems. Once logged on, submit the job with Slurm's sbatch command.
For example, to submit a script named SlurmScript.sh, type:
sbatch SlurmScript.sh
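When a job is submitted from another script, it is often useful to capture the job ID for later squeue or scancel calls. The sketch below extracts the ID from sbatch output. The function name job_id_from_sbatch is hypothetical, and it assumes either the default "Submitted batch job <id>" message or the output of sbatch --parsable, which prints the ID, optionally followed by a semicolon and the cluster name.

```shell
# Extract the numeric job ID from sbatch output. Handles the default
# message ("Submitted batch job 12345") and --parsable output
# ("12345" or "12345;cluster").
job_id_from_sbatch() {
    printf '%s\n' "$1" | sed -e 's/^Submitted batch job //' -e 's/;.*$//'
}

# On a login node one might write (not run here):
#   jobid=$(job_id_from_sbatch "$(sbatch --parsable SlurmScript.sh)")
job_id_from_sbatch "Submitted batch job 12345"
```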
NOTE: sbatch by default passes all environment variables to the compute node, which differs from the behavior in PBS (which started with a clean shell). If you need to start with a clean environment, you will need to use the following directive in your batch script:
#SBATCH --export=NONE
This will still execute .bashrc/.tcshrc scripts, but any changes you make in your interactive environment will not be present in the compute session. As an additional precaution, if you are using modules, use module purge to guarantee a fresh environment.
Checking the Status of your Job
To check the status of your job, use the squeue command. Run on its own, squeue lists all jobs currently submitted to the cluster you are logged onto. You can filter the output to only your own jobs in a number of ways:
squeue --me
squeue -u uNID
squeue -j job#
Adding -l (for "long" output) gives more details in the squeue output.
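For a quick overview of many jobs, the squeue output can be condensed into a count per job state. The helper below is a sketch with a hypothetical name, count_job_states. It reads one state per line, which squeue produces with the -h (no header) and -o %T (state only) options.

```shell
# Summarize job states read from standard input, one state per line,
# for example the output of: squeue --me -h -o %T
count_job_states() {
    sort | uniq -c | awk '{ printf "%s=%d\n", $2, $1 }'
}

# With Slurm available:  squeue --me -h -o %T | count_job_states
printf 'RUNNING\nPENDING\nRUNNING\n' | count_job_states
```

For the sample input this prints PENDING=1 and RUNNING=2 on separate lines.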
Special Circumstances
Slurm Reservations
Upon request, we can create reservations that guarantee node availability for users. To request one, email helpdesk@chpc.utah.edu. Once a reservation is in place, pass it to Slurm with the --reservation flag followed by the reservation name (the -R short form is accepted by squeue for filtering by reservation, but not by sbatch or salloc).
For policies regarding reservations see the Batch Policies document.
QOS
Every account (found through the 'mychpc batch' command) is associated with at least one QOS, or Quality of Service. The QOS dictates a job's base priority. In some cases, a single account may have multiple QOSes that differ in preemption status and maximum job walltime.
One example of multiple QOSes on a single account is when a user needs to override the normal 3 day wall time limit. In this case, the user can request access to a special long QOS that we have set up for the general nodes of a cluster, <cluster>-long, that allows a longer wall time to be specified. To get access to the long QOS of a given cluster, send a request with an explanation of why you need a longer wall time to helpdesk@chpc.utah.edu.
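As a sketch of how the long QOS might be requested once access has been granted, the directives below combine it with an account and partition. The account, the partition, and the five-day limit are illustrative assumptions, and the QOS name simply follows the <cluster>-long pattern.

```shell
#SBATCH --account=baggins        # hypothetical account, as in the earlier example
#SBATCH --partition=lonepeak     # general nodes of the chosen cluster
#SBATCH --qos=lonepeak-long      # assumed name, following <cluster>-long
#SBATCH --time=5-00:00:00        # five days, beyond the normal 3 day limit
```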
Requesting GPUs
If you would like to request GPU resources, you can find information on the GPUs available at the CHPC as well as information on reserving a GPU via Slurm. Not all software or research benefits from GPUs and, therefore, not all CHPC users have access to GPUs at the CHPC. If you need access to our GPUs, you can email us at helpdesk@chpc.utah.edu and explain how your research requires GPUs, at which point we will grant you access.
Slurm Job Arrays
Slurm arrays enable quick submission of many related jobs. For each job in an array, Slurm sets an environment variable, SLURM_ARRAY_TASK_ID, which distinguishes the jobs within the array by index number.
For example, if we need to run the same program against 30 different samples, we can utilize Slurm arrays to run the program across the 30 different samples with a naming convention such as sample_[1-30].data using the following script:
#!/bin/bash
#SBATCH -n 1 # Number of tasks
#SBATCH -N 1 # All tasks on one machine
#SBATCH -p PARTITION # Partition on some cluster
#SBATCH -A ACCOUNT # The account associated with the above partition
#SBATCH -t 02:00:00 # 2 hours (HH:MM:SS)
#SBATCH -o myprog%A%a.out # Standard output
#SBATCH -e myprog%A%a.err # Standard error
#SBATCH --array=1-30
./myprogram sample_$SLURM_ARRAY_TASK_ID.data
You can also limit the number of jobs that can be running simultaneously to "n" by adding a %n after the end of the array range:
#SBATCH --array=1-30%5
Apart from $SLURM_ARRAY_TASK_ID, Slurm provides a few other filename patterns and environment variables that are useful with job arrays:
- %A and %a are filename patterns that expand to the job ID of the array and the job array index, respectively. They can be used in #SBATCH parameters such as -o and -e to generate unique file names.
- SLURM_ARRAY_TASK_COUNT is the number of tasks in the job array.
- SLURM_ARRAY_TASK_MAX is the highest job array index value.
- SLURM_ARRAY_TASK_MIN is the lowest job array index value.
When submitting jobs that use less than the full CPU count per node, use the shared partitions to allow multiple array jobs on one node. For more information, see the Node Sharing page.
Depending on the characteristics of your job, there may be a number of other solutions you could use, detailed on the running multiple serial jobs page.
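When input files do not follow a numbered naming convention, one common approach is to list them in a text file and let each array task pick the line matching its index. The sketch below illustrates this. The function name nth_line and the file samples.txt are hypothetical.

```shell
# Print line N of a file. Inside an array job, N would be
# $SLURM_ARRAY_TASK_ID, so each task picks its own input.
nth_line() {
    sed -n "${1}p" "$2"
}

# Possible use in an array job script, where samples.txt (an assumed
# file) lists one input path per line:
#   input=$(nth_line "$SLURM_ARRAY_TASK_ID" samples.txt)
#   ./myprogram "$input"
```

With this pattern, the --array range should match the number of lines in the list file.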
Interactive Batch Jobs
An interactive job is requested directly from the command line. To launch an interactive session on a compute node, use the salloc command and pass it flags in the same format as #SBATCH directives:
salloc --time=02:00:00 --ntasks=2 --nodes=1 --account=baggins --partition=lonepeak
The salloc flags can be abbreviated as:
salloc -t 02:00:00 -n 2 -N 1 -A baggins -p lonepeak
CHPC cluster queues tend to be very busy, so it may take some time for an interactive job to start. For this reason, we have added two nodes in a special partition on the notchpeak cluster that are geared more towards interactive work. Job limits on this partition are 8 hours of wall time and a maximum of ten submitted jobs per user, of which at most two may be running, with a maximum total of 32 tasks and 128 GB of memory. To access this special partition, notchpeak-shared-short, request both an account and partition under this name, e.g.:
salloc -N 1 -n 2 -t 2:00:00 -A notchpeak-shared-short -p notchpeak-shared-short
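Once the interactive session starts, it can be handy to confirm that the shell is really running inside the allocation. The sketch below reads environment variables that Slurm sets within a job (SLURM_JOB_ID, SLURM_JOB_NODELIST, SLURM_NTASKS). The function name describe_allocation is hypothetical.

```shell
# Report the current Slurm allocation, or say that there is none.
# Returns non-zero when SLURM_JOB_ID is not set.
describe_allocation() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "not inside a Slurm job"
        return 1
    fi
    echo "job ${SLURM_JOB_ID} on ${SLURM_JOB_NODELIST:-unknown nodes} with ${SLURM_NTASKS:-unknown} task(s)"
}

# Possible use after salloc returns a prompt:
#   describe_allocation
```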
Handy Slurm Information
Slurm User Commands
Slurm Command | What it does |
---|---|
sinfo | reports the state of partitions and nodes managed by Slurm. It has a wide variety of filtering, sorting, and formatting options. For a personalized view, showing only information about the partitions to which you have access, see mysinfo. |
squeue | reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order. For a personalized view, showing only information about the jobs in the queues/partitions to which you have access, see mysqueue. |
sbatch | is used to submit a job script for later execution. The script will typically contain one or more #SBATCH directives. |
scancel | is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step. |
sacct | is used to report job or job step accounting information about active or completed jobs. |
srun | is used to submit a job for execution or initiate job steps in real time. srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features, etc.). A job can contain multiple job steps executing sequentially or in parallel on independent or shared nodes within the job's node allocation. |
spart | lists partitions and their utilization. |
pestat | lists efficiency of cluster utilization on a per node, per user, or per partition basis. By default it prints utilization of all cluster nodes. To select only nodes utilized by a user, run pestat -u $USER. |
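As an illustration of sacct, its machine-readable output can be filtered to show only the job steps that did not finish cleanly. The sketch below assumes pipe-delimited rows as produced by the -n (no header) and -P (parsable) options with --format=JobID,State,Elapsed. The function name unfinished_steps and the sample rows are hypothetical.

```shell
# Read `sacct -n -P --format=JobID,State,Elapsed` output on stdin and
# print the rows whose state is not COMPLETED.
unfinished_steps() {
    awk -F'|' '$2 != "COMPLETED" { print $1 ": " $2 " after " $3 }'
}

# With Slurm available:
#   sacct -j <jobid> -n -P --format=JobID,State,Elapsed | unfinished_steps
printf '12345|COMPLETED|00:10:05\n12346|FAILED|00:00:03\n' | unfinished_steps
```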
Useful Slurm Aliases
Bash to add to .aliases file:
#SLURM Aliases that provide information in a useful manner for our clusters
alias si="sinfo -o \"%20P %5D %14F %8z %10m %10d %11l %32f %N\""
alias si2="sinfo -o \"%20P %5D %6t %8z %10m %10d %11l %32f %N\""
alias sq="squeue -o \"%8i %12j %4t %10u %20q %20a %10g %20P %10Q %5D %11l %11L %R\""
Tcsh to add to .aliases file:
#SLURM Aliases that provide information in a useful manner for our clusters
alias si 'sinfo -o "%20P %5D %14F %8z %10m %10d %11l %32f %N"'
alias si2 'sinfo -o "%20P %5D %6t %8z %10m %10d %11l %32f %N"'
alias sq 'squeue -o "%8i %12j %4t %10u %20q %20a %10g %20P %10Q %5D %11l %11L %R"'
sview GUI Tool
sview is a graphical user interface for viewing and modifying Slurm state. Run it by typing sview in a FastX (or X11-forwarded) terminal session. It is useful for viewing partitions, node characteristics, and information on jobs. Right-clicking on a job, node, or partition allows you to perform actions on it. Use this carefully so as not to accidentally modify or remove your job.
Logging Onto Computational Nodes: Checking Job Stats
Sometimes it is useful to connect to the node(s) where a job runs to monitor the executable and determine if it is running correctly and efficiently. For that, we allow users with active jobs on compute nodes to ssh to these compute nodes. To determine the name of your compute node, run the squeue -u $USER command, and then ssh to the node(s) listed.
Once logged onto the compute node, you can run the top command to view CPU and memory usage of the node. If using GPUs, you can view GPU usage through the nvidia-smi command.
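For multi-node jobs, squeue prints a compressed node list (for example kp[001-003], a hypothetical name), which cannot be passed to ssh directly. The sketch below expands such lists with scontrol show hostnames. The function name my_job_nodes is an assumption, and it must be run where the Slurm commands are available.

```shell
# List the individual hostnames of the nodes running your jobs.
# squeue prints compressed node lists; `scontrol show hostnames`
# expands each list to one hostname per line.
my_job_nodes() {
    squeue --me -h -t RUNNING -o %N | sort -u | while read -r nodelist; do
        if [ -n "$nodelist" ]; then
            scontrol show hostnames "$nodelist"
        fi
    done
}

# Possible use on a login node:
#   my_job_nodes        # then: ssh <one of the printed hostnames>
```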
Other CHPC Documentation on Slurm
Looking for more information on running Slurm at the CHPC? Check out these pages. If you have a specific question, please don't hesitate to contact us at helpdesk@chpc.utah.edu.
Slurm Job Preemption and Restarting of Jobs
Slurm Priority Scoring for Jobs
Running Independent Serial Calculations with Slurm
Accessing CHPC's Data Transfer Nodes (DTNs) through Slurm
Other Slurm Constraint Suggestions and Owner Node Utilization
Sharing Nodes Among Jobs with Slurm
Other Good Sources of Information
- http://slurm.schedmd.com/pdfs/summary.pdf A two-page summary of common Slurm commands and options.
- http://slurm.schedmd.com/documentation.html The best source for online documentation.
- http://slurm.schedmd.com/slurm.html
- http://slurm.schedmd.com/man_index.html
- man <slurm_command> (from the command line)
- http://www.glue.umd.edu/hpcc/help/slurm-vs-moab.html A more complete comparison table between Slurm and Moab.
- http://www.schedmd.com/slurmdocs/rosetta.pdf A table of Slurm commands and their counterparts in a number of different batch systems.