
Using Slurm to Submit Batch Jobs


#SBATCH Directives

To create a batch script, use your favorite text editor to create a text file that details job requirements and instructions on how to run your job. 

Job requirements are passed to Slurm through #SBATCH directives. These directives describe the computational requirements of your job, which Slurm uses to determine what resources to give it.

The most commonly used #SBATCH directives for your Slurm job are:

#SBATCH --account=<youraccount>

Specifies the allocation account (often tied to a research group or class) that will be charged for the resources consumed by this job.

If your group has owner nodes, the account is usually <unix_group>-<cluster_abbreviation> (where the cluster abbreviation is np, kp, lp, or rw).

There are other types of accounts not associated with a research group. These are typically named for specific partitions, for example owner-guest and <cluster>-gpu accounts.
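
For illustration, with a hypothetical unix group baggins that owns nodes on kingspeak (abbreviation kp), the account directive might look like one of the following:

#SBATCH --account=baggins-kp     # owner account for the (hypothetical) baggins group on kingspeak
#SBATCH --account=owner-guest    # guest access to owner nodes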

#SBATCH --partition=<yourpartition>

Designates the cluster queue or partition where the job will run (e.g., lonepeak is a specific set of nodes/resources).

Naming schemes include cluster, cluster-gpu, cluster-gpu-guest, cluster-guest, and pi-cl, where cluster is the full name of the cluster and cl is its abbreviated form.

We have our partition names described here.
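
As an illustration of these naming schemes (check our partition list for the names that actually exist on each cluster):

#SBATCH --partition=lonepeak          # cluster: general nodes on lonepeak
#SBATCH --partition=kingspeak-gpu     # cluster-gpu: GPU nodes on kingspeak
#SBATCH --partition=lonepeak-guest    # cluster-guest: guest access to lonepeak owner nodes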

#SBATCH --qos=<yourqos>

Specifies the appropriate QoS for your partition.

Most partitions have only one QoS but, depending on the use case, a partition may have more than one. A QoS specifies the type of resources you are accessing and your level of access to those resources. For instance, one QoS may allow a longer maximum walltime than another.
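
A minimal sketch with a placeholder QoS name (use mychpc batch, described below, to see the QoS names defined for your partitions):

#SBATCH --qos=lonepeak     # placeholder: the default QoS for the chosen partition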

 

Unsure which Slurm account, partition, and QoS combination to use? Run the command mychpc batch to list the accounts, partitions, and qualities of service available to you when submitting jobs on CHPC systems.

 

#SBATCH --time=DD-HH:MM:SS

Sets the maximum wall-clock time the job is allowed to run. If the job exceeds this limit, it will be automatically terminated.

DD - Days, HH - Hours, MM - Minutes, SS - Seconds

Note: There is a walltime limit of 72 hours for jobs on general cluster nodes and 14 days on owner cluster nodes. If your job requires more time than these hard limits, email the CHPC at helpdesk@chpc.utah.edu with the job ID, the cluster, and the length of time you would like the job extended to.
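
Slurm accepts either the DD-HH:MM:SS form or a shorter HH:MM:SS form, so both of the following request the same 36-hour limit:

#SBATCH --time=1-12:00:00    # 1 day and 12 hours
#SBATCH --time=36:00:00      # 36 hours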

#SBATCH --ntasks=<number-of-cpus>

Requests the total number of tasks (processes) the job will launch. This is commonly used for parallel jobs; since Slurm allocates one CPU core per task by default, it usually also sets the number of CPU cores the job receives.
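
A minimal sketch for a parallel job that launches 16 tasks:

#SBATCH --ntasks=16     # 16 tasks, one CPU core per task by default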

#SBATCH --mem=<size>[units]

Requests the amount of physical memory (RAM) required for the job on each allocated node.
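
Units can be given with a K, M, G, or T suffix (megabytes are assumed if no unit is given). For example:

#SBATCH --mem=32G     # 32 GB of RAM on each allocated node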

 

#SBATCH --nodes=<number-of-nodes>

Specifies the number of compute nodes required for the job (a <min>-<max> range is also accepted). It tells the scheduler how many individual physical machines in the cluster to allocate to the job.
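
For example, to keep an 8-task job on a single machine:

#SBATCH --nodes=1     # all tasks placed on one node
#SBATCH --ntasks=8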

 

#SBATCH --gres=gpu:<type-gpu>:<num-gpus>

To use a GPU, you must request a GPU-containing partition and add #SBATCH --gres=gpu.

Optionally, you can specify the number of GPUs you require by replacing <num-gpus>. You can also optionally specify a particular type of GPU that you require with <type-gpu>.
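
For example (the GPU type shown here is only an illustration; the types available depend on the partition you selected):

#SBATCH --gres=gpu:1          # any single GPU on the node
#SBATCH --gres=gpu:a100:2     # two GPUs of a specific type, here "a100"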

#SBATCH --constraint

The constraint flag is useful when you require a very specific set of compute resources.

For instance, use it if you require nodes with a specific architecture, amount of memory, or number of cores, or if you want to target certain owner nodes through an owner-guest partition.
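
Features can be combined with & (and) and | (or). The feature names below are placeholders; you can list the features defined on a cluster's nodes with sinfo:

#SBATCH --constraint="c32"         # placeholder feature, e.g. nodes tagged as 32-core
#SBATCH --constraint="c32&m256"    # placeholder: nodes with both the c32 and m256 features

sinfo -o "%N %f"     # show node names and their feature tags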

#SBATCH -o slurmjob-%j.out-%N

Specifies the file path for the job's standard output (stdout). The %j and %N variables are replaced with the job ID and the name of the first node used, respectively.

#SBATCH -e slurmjob-%j.err-%N

Specifies the file path for the job's standard error (stderr). Any error messages will be directed to this file, using the job ID and node name for clarity.
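
With the two directives above, a job with the (hypothetical) ID 1234567 that starts on node lp001 would write to files named:

slurmjob-1234567.out-lp001
slurmjob-1234567.err-lp001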

Where to Run Your Slurm Job

There are three main places you can run your job: your home directory, /scratch spaces, or group spaces (available if your group has purchased group storage). This determines where I/O is handled for the duration of your job. Each has its own benefits, outlined below:

  • Home: free; automatically provisioned per user; 50 GB soft limit.
  • Scratch: free; two file systems, vast and nfs1; files untouched for 60 days are automatically deleted.
  • Group Space: $150/TB without backups (for 7 years) or $450/TB with backups; shared among your group.

The home and group spaces are on Vast storage systems that are not as performant as the /scratch storage spaces. If your job has high I/O and you choose to run it in the home or group spaces, you could crash the storage system. For this reason, we highly recommend setting up your jobs to run in our scratch file systems. Note that files in the CHPC's scratch file systems are deleted if untouched for 60 days, so be sure to transfer any necessary files back to home or group space.

To run jobs in the CHPC scratch file systems (vast or nfs1), place the following commands in your Slurm batch script. The commands that you use depend on what Linux shell you have.

Unsure? Type echo $SHELL in your terminal.

Bash

SCRDIR=/scratch/general/<file-system>/$USER/$SLURM_JOB_ID
mkdir -p $SCRDIR
cp <input-files> $SCRDIR
cd $SCRDIR

TCSH

set SCRDIR = /scratch/general/<file-system>/$USER/$SLURM_JOB_ID
mkdir -p $SCRDIR
cp <input-files> $SCRDIR
cd $SCRDIR

  • Replace <file-system> with either vast or nfs1.
  • $USER points to your uNID and $SLURM_JOB_ID points to the job ID that Slurm assigned your job.

Note: You have the necessary permissions to make your own directory under your uNID in the scratch file systems.

Putting it all Together: An Example Slurm Script

Below is an example job script that combines the information from above. In this example, we will suppose your PI is Frodo Baggins (group ID baggins) and you are requesting general user access on lonepeak with 8 CPU cores and 32 GB of memory. The job will run for at most two hours.

#!/bin/bash
#SBATCH --account=baggins
#SBATCH --partition=lonepeak
#SBATCH --time=02:00:00
#SBATCH --ntasks=8
#SBATCH --mem=32G
#SBATCH -o slurmjob-%j.out-%N
#SBATCH -e slurmjob-%j.err-%N

#set up scratch directory
SCRDIR=/scratch/general/vast/$USER/$SLURM_JOB_ID
mkdir -p $SCRDIR

#copy input files and move over to the scratch directory
cp inputfile.csv myscript.r $SCRDIR
cd $SCRDIR

#load your module - R is an example
module load R/4.4.0

#run your script - for example, running an Rscript would look like
Rscript myscript.r inputfile.csv

#copy output to your home directory and clean up
cp outputfile.csv $HOME
cd $HOME
rm -rf $SCRDIR

 

NOTE: When specifying an account or partition, you may use either an equals sign or a space before the account or partition name, but you may not use both in the same line. For example, "#SBATCH --account=kingspeak-gpu" and "#SBATCH --account kingspeak-gpu" are acceptable, but "#SBATCH --account = kingspeak-gpu" is not.

For more examples of Slurm job scripts, see the CHPC MyJobs templates.

Submitting your Job to Slurm

To submit a job, you must be logged in to a CHPC system. Once logged in, submit your job with the Slurm sbatch command.

For example, to submit a script named SlurmScript.sh, type:

sbatch SlurmScript.sh
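
If the submission succeeds, sbatch prints the ID that Slurm assigned to your job (the number below is only an example); you can use this ID with squeue, described below, to check on the job:

Submitted batch job 1234567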

 

NOTE: sbatch by default passes all environment variables to the compute node, which differs from the behavior in PBS (which started with a clean shell). If you need to start with a clean environment, you will need to use the following directive in your batch script:

  • #SBATCH --export=NONE

This will still execute .bashrc/.tcshrc scripts, but any changes you make in your interactive environment will not be present in the compute session. As an additional precaution, if you are using modules, run module purge in your script to guarantee a fresh environment.
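
A minimal sketch of a batch script header that starts from a clean environment (the module shown is only an example):

#SBATCH --export=NONE

module purge            # clear any modules inherited from login scripts
module load R/4.4.0     # load exactly what the job needs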

Checking the Status of your Job

To check the status of your job, use the squeue command. On its own, squeue lists all jobs currently submitted to the cluster you are logged in to. You can filter the output of squeue to just the jobs that pertain to you in a number of ways:

squeue --me

squeue -u uNID

squeue -j job#

 

Adding -l (for "long" output) gives more details in the squeue output.
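
For example, to see the long output for only your own jobs:

squeue --me -l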

Users can also check the status of their jobs through Open OnDemand.

Slurm Job Arrays

Slurm job arrays enable quick submission of many related jobs. Slurm provides an environment variable, SLURM_ARRAY_TASK_ID, which distinguishes the jobs within an array by their index numbers.

For example, if we need to run the same program against 30 different samples, we can utilize Slurm arrays to run the program across the 30 different samples with a naming convention such as sample_[1-30].data using the following script:

#!/bin/bash
#SBATCH -n 1 # Number of tasks
#SBATCH -N 1 # All tasks on one machine
#SBATCH -p PARTITION # Partition on some cluster
#SBATCH -A ACCOUNT # The account associated with the above partition
#SBATCH -t 02:00:00 # 2 hours (HH:MM:SS)
#SBATCH -o myprog%A%a.out # Standard output
#SBATCH -e myprog%A%a.err # Standard error
#SBATCH --array=1-30

./myprogram sample_$SLURM_ARRAY_TASK_ID.data

You can also limit the number of jobs that can be running simultaneously to "n" by adding a %n after the end of the array range:

#SBATCH --array=1-30%5

Apart from $SLURM_ARRAY_TASK_ID, Slurm provides a few other variables and filename patterns that are useful for job arrays (see the sketch after this list). These include:

  • %A and %a, which represent the job ID and the job array index, respectively, in filename patterns. These can be used in the #SBATCH -o and -e parameters to generate unique names.
  • SLURM_ARRAY_TASK_COUNT is the number of tasks in the array.
  • SLURM_ARRAY_TASK_MAX is the highest job array index value.
  • SLURM_ARRAY_TASK_MIN is the lowest job array index value.
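
A small sketch of how these could appear in an array job (the -o pattern and the job ID in the comment are illustrative only):

#SBATCH -o myprog-%A_%a.out     # e.g. myprog-1234567_5.out for array job 1234567, index 5
echo "Running task $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT (indices $SLURM_ARRAY_TASK_MIN to $SLURM_ARRAY_TASK_MAX)"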

When submitting jobs that use less than the full CPU count per node, use the shared partitions to allow multiple array jobs on one node. For more information, see the Node Sharing page. 

Depending on the characteristics of your job, there may be a number of other solutions you could use, detailed on the running multiple serial jobs page.

Last Updated: 1/23/26