
Owner Nodes and Using Slurm Constraints

Slurm jobs submitted to guest partitions, using #SBATCH --account owner-guest and #SBATCH --partition <cluster>-guest (substituting the proper cluster name), are eligible for preemption by jobs submitted by the group that owns the nodes. To help minimize the chance of preemption and avoid wasting time and other resources, nodes owned by groups with (historically) low utilization can be targeted directly in batch scripts and interactive submissions. Note that any constraint suggestions are based solely on historical usage and are not indicative of the future behavior of any group.
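As a point of reference, a minimal guest-job batch header might look like the sketch below; notchpeak is used purely as an example cluster, and the resource requests are placeholders to adjust for your own job:

  #!/bin/bash
  #SBATCH --account=owner-guest
  #SBATCH --partition=notchpeak-guest   # substitute your cluster's guest partition
  #SBATCH --nodes=1
  #SBATCH --ntasks=16
  #SBATCH --time=02:00:00

  # job commands follow here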

 

Utilization of Owner Nodes by Cluster

Understanding how owners utilize their nodes can be helpful for users who want to shorten their jobs' wait time in the Slurm queue by submitting to the <cluster>-guest and <cluster>-gpu-guest partitions. The CHPC provides graphics that detail the owners' use of their node partitions.

Information about owner utilization is presented as a heatmap and is generated from Slurm logs. Lighter colors mean fewer nodes with the given node feature were in use by owner groups, which is beneficial for guest jobs, while darker colors mean more of the nodes were being used by the owners. 

Please click on a link below to view utilization of owner nodes by the owners for each cluster:

If the images are not updating, it may be because your browser is caching older versions. Try an uncached reload of the page.

 

How to Use Constraint Suggestions

Both the utilization of the owner nodes and the size of the pool of nodes must be considered when selecting constraints; if an owner group has many nodes and utilizes only some of them, the remainder will still be available for guest jobs. Selecting constraints based on both effective pool size and owner utilization can therefore further reduce the likelihood of preemption.
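To get a rough sense of how many nodes carry a given feature, a generic sinfo command can summarize node counts and feature lists for a partition; the sketch below uses notchpeak-guest as an example partition name:

  # number of nodes (%D) and their feature list (%f), grouped by identical configuration
  sinfo -p notchpeak-guest -o "%D %f"

Features that appear on many nodes and belong to owner groups showing light usage on the heatmaps are generally the safest targets for guest jobs.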

Constraints, specified with the line #SBATCH -C or #SBATCH --constraint, can be used to target specific nodes based on features a job may require, such as memory, core count, or even a particular node or owner group's nodes. Constraints allow for a finer-grained specification of resources.

 Features commonly specified are:

  • Core count on node: The core count is denoted as c#, e.g., c16. To request 16-core nodes, use #SBATCH -C c16.
  • Amount of memory per node: This constraint takes the form m#, where the number is the amount of memory in the node in GB, e.g., m32. To request nodes with exactly 32 GB of memory, use #SBATCH -C m32.
    • IMPORTANT: There is a difference between the memory constraint, #SBATCH -C m32, and the batch directive #SBATCH --mem=32000. With #SBATCH --mem=32000, you specify the number as it appears in the MEMORY entry of the si command, which is in MB. The job is then only eligible to run on a node with at least this much memory, and it is restricted to using only 32 GB, even if the node has more. With #SBATCH -C m32, the job will only run on a node with 32 GB of memory, but it will have access to all of the memory of that node. See the example script after this list.
  • Node owner: This constraint targets the nodes of a specific owner group, ideally one with low historical usage, in order to reduce the chance of a guest job being preempted. For example, to target nodes owned by the group "ucgd", use #SBATCH -C "ucgd". Historical usage (the past two weeks) of the different owner groups' nodes can be found on CHPC's constraint suggestion page.
  • GPUs: For the GPU nodes, the features include the GPU line, e.g., geforce or tesla, and the GPU type, e.g., a100, 3090, or t4. There is additional information about specifying the GPUs being requested for a job on CHPC's GPU and Accelerator page.
  • Processor architecture: These features are currently available only on notchpeak and redwood. They are useful for jobs that need to restrict which processor architecture is used. Examples are bwk for Intel Broadwell, skl for Intel Skylake, csl for Intel Cascade Lake, icl for Intel Ice Lake, npl for AMD Naples, rom for AMD Rome, and mil for AMD Milan.
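The sketch below contrasts the two memory directives discussed in the list above; only one of the two should be active in a real script, and the account, partition, and resource values are placeholders:

  #!/bin/bash
  #SBATCH --account=owner-guest
  #SBATCH --partition=notchpeak-guest   # example cluster; substitute your own
  #SBATCH --ntasks=16
  #SBATCH --time=01:00:00

  # Option 1: run only on nodes that have 32 GB of memory; the job may use all of it
  #SBATCH -C m32

  # Option 2 (commented out with ##): run on any node with at least 32000 MB,
  # but limit the job to 32 GB even on larger nodes
  ##SBATCH --mem=32000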

Multiple constraints can be specified at once with logical operators in Slurm directives. This allows for submission to nodes owned by one of several owner groups at a time (which might help reduce queue times and increase the number of nodes available) as well as the specification of exact core counts and available memory.

To select from multiple owner groups' nodes, use the "or" operator; a directive like #SBATCH -C "group1|group2|group3" will select nodes matching any of the features listed. By contrast, the "and" operator can be used to achieve further specificity. To request nodes owned by a particular group and with a particular amount of memory, for example, a directive like #SBATCH -C "group1&m256" could be used. (This will only work where multiple node features are associated with the nodes and the combination is valid. To view the available node features, the sinfo aliases si and si2, documented on the Slurm page, are helpful.)

When submitting through Open OnDemand, enter only the constraint string into the Constraints text field, e.g., "group1|group2|group3".

 

CPU Microarchitecture Constraints

Due to the variety of CPU microarchitectures on some CHPC clusters, each node carries a three-letter constraint that identifies its CPU microarchitecture. Use these constraints to restrict a job to certain CPU types. The most common restriction is to use only Intel or only AMD nodes, since some codes do not work when CPUs from both manufacturers appear in a single job. For example, to use only AMD nodes, use #SBATCH -C "rom|mil|gen".

Notchpeak

  • skl Intel Skylake microarchitecture (Xeon 51xx or 61xx)
  • csl Intel Cascade Lake microarchitecture (Xeon 52xx or 62xx)
  • icl Intel Ice Lake microarchitecture (Xeon 53xx or 63xx)
  • srp Intel Sapphire Rapids microarchitecture (Xeon 54xx or 64xx)
  • npl AMD Naples microarchitecture (Zen1, EPYC 7xx1)
  • rom AMD Rome microarchitecture (Zen2, EPYC 7xx2)
  • mil AMD Milan microarchitecture (Zen3, EPYC 7xx3)
  • gen AMD Genoa microarchitecture (Zen4, EPYC 9xx1)
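For example, using the notchpeak feature names listed above, a job could be restricted to Intel nodes only (the mirror of the AMD-only example earlier) with a directive such as:

  #SBATCH -C "skl|csl|icl|srp"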

 

More Information on Slurm

Looking for more information on running Slurm at the CHPC? Check out these pages. If you have a specific question, please don't hesitate to contact us at helpdesk@chpc.utah.edu.

Setting up a Slurm Batch Script or Interactive Job Session

Slurm Priority Scoring for Jobs

MPI with Slurm

GPUs with Slurm

Accessing CHPC's Data Transfer Nodes (DTNs) through Slurm

Other Slurm Constraint Suggestions and Owner Node Utilization

Sharing Nodes Among Jobs with Slurm

Personalized Slurm Queries

Moab/PBS to Slurm

Last Updated: 9/4/24