General Cluster Information

Filesystems
User Environment
Applications
Slurm batch system
- Walltime
- Partitions and accounts
Individual Cluster Guides

Filesystems

NFS home directory

Your home directory, which is an NFS-mounted file system, is one choice for I/O. Statistically, this space has the slowest I/O performance of the available options; in addition, using the home directory for job I/O can impact all users of CHPC general resources. This space is visible to all nodes on the general clusters through an auto-mounting system.

NFS group directory

A second choice for I/O is the CHPC group space IF your group owns a group space. Group spaces are shares of space on a larger group file system (CHPC purchases file systems as needed and sells shares in TB chunks). This is also an NFS mounted file system, and therefore also has performance limitations. The use of your group space has the potential to impact all users from the groups that own a share of a given file system. This space is visible to all nodes on the general environment clusters through an auto-mounting system.

Scratch

All general environment cluster nodes have access to several network-mounted scratch file systems, including /scratch/general/lustre with a 700 TB capacity, /scratch/general/nfs1 with 595 TB, and /scratch/general/vast with 1 PB.

Local disk (`/scratch/local`)

The local scratch space is a storage space unique to each individual node. It can be accessed on each node through /scratch/local. This space will be one of the fastest, but certainly not the largest, with the amount available varying between the clusters. The local scratch space is cleaned aggressively, with files older than 1 week being scrubbed. Users must remove all their files from /scratch/local at the end of their calculation.

It is a good idea to stage data from one storage system to another when you are running jobs. At the start of the batch job, data files should be copied from the home directory to the scratch space, and at the end of the run the output should be copied back to the user's home directory.

It is important to keep in mind that ALL users must remove excess files on their own. Preferably this should be done within the user's batch job once the computation has finished. Leaving files in any scratch space creates an impediment to other users who are trying to run their own jobs. Simply delete all extra files from any space, other than your home directory, when they are not being used immediately.
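The staging pattern described above (copy in, compute, copy back, clean up) can be sketched as a small bash function for use inside a batch script. This is only an illustration under stated assumptions: the directory names under $HOME/myproject, the per-user subdirectory of /scratch/local, and the function name stage_run_cleanup are all hypothetical, and the #SBATCH values are placeholders.

```shell
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --ntasks=1

# Hypothetical locations; adjust to your own project layout.
INPUT_DIR="${INPUT_DIR:-$HOME/myproject/input}"
RESULT_DIR="${RESULT_DIR:-$HOME/myproject/results}"
# Assumed layout: a per-user, per-job subdirectory of the node-local scratch.
WORK_DIR="${WORK_DIR:-/scratch/local/${USER:-$(id -un)}/${SLURM_JOB_ID:-manual}}"

# Copy inputs to scratch, run the given command there, copy the scratch
# contents back, and remove the scratch directory.
# Returns the exit status of the command that was run.
stage_run_cleanup() {
    mkdir -p "$WORK_DIR" "$RESULT_DIR" || return 1
    cp -r "$INPUT_DIR"/. "$WORK_DIR"/ || return 1
    ( cd "$WORK_DIR" && "$@" )
    local status=$?
    # Only delete the scratch copy once the results are safely copied back.
    if cp -r "$WORK_DIR"/. "$RESULT_DIR"/; then
        rm -rf "$WORK_DIR"
    else
        echo "copy back failed; leaving files in $WORK_DIR" >&2
    fi
    return "$status"
}
```

At the end of such a script one might call, for example, stage_run_cleanup my_program input.dat, where my_program stands in for your own application.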

User Environment

CHPC currently supports two shells: bash and tcsh. In addition, we use the Lmod module system to control the user environment.

CHPC provides two login files for each shell choice: the first, .tcshrc/.bashrc, sets up a basic environment, and the second, .custom.csh/.custom.sh, is the one a user can edit to customize their environment. Details can be found on the modules documentation page. All new accounts are created using the modules framework, and all accounts, even those with the older style login files, have been enabled to make use of modules.
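As a rough illustration of the kind of customization the second file is meant for, a bash user's .custom.sh might look like the sketch below. The module name and the $HOME/bin directory are hypothetical; the modules documentation page is the authority on what belongs in this file.

```shell
# ~/.custom.sh (bash users): hypothetical contents

# Load modules only when the Lmod "module" command is defined in this shell.
if command -v module >/dev/null 2>&1; then
    # "gcc" is a placeholder; "module avail" lists the names that actually exist.
    module load gcc || echo "could not load module gcc" >&2
fi

# Put personal scripts on the search path (hypothetical directory).
export PATH="$HOME/bin:$PATH"
```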

Applications

CHPC maintains a number of user applications as well as tools needed to install applications. For some of these we have software documentation pages which can be found in the software documentation section. Also note that there are several packages which we have installed, e.g., abaqus, ansys, charmm, comsol, star-CCM+, that are licensed by individual groups and therefore are not accessible outside of that group.

Historically, applications that are not cluster specific have been installed in /uufs/chpc.utah.edu/sys/pkg, whereas cluster specific applications (most typically due to the use of a cluster specific installation of MPI or another support package) are located in /uufs/$UUFSCELL/sys/pkg, where $UUFSCELL is kingspeak.peaks, ash.peaks, lonepeak.peaks, or notchpeak.peaks.

Moving forward, we are working to consolidate applications in /uufs/chpc.utah.edu/sys/installdir.

Batch System

The batch implementation on the CHPC clusters is Slurm.

Any process which requires more than 15 minutes of run time needs to be submitted through the batch system. The command to submit to the batch system is sbatch, and there is also a way to submit an interactive job:

salloc -t 1:00:00 -n 4 -N 2

Other options for running jobs are given in the Slurm documentation.
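For comparison with the interactive request above, the same resources (one hour, 4 tasks, 2 nodes) can be requested in a batch script. The sketch below writes a minimal script and submits it only if sbatch is available. The file name example_job.sh and the srun hostname payload are placeholders, and on CHPC clusters an account and partition will normally also be needed (see the partitions and accounts section).

```shell
# Write a minimal job script; the #SBATCH lines are the long-form
# equivalents of "-t 1:00:00 -n 4 -N 2".
cat > example_job.sh <<'EOF'
#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --ntasks=4
#SBATCH --nodes=2

# Placeholder payload: print the host name once per task.
srun hostname
EOF

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch example_job.sh
else
    echo "sbatch not found; wrote example_job.sh without submitting"
fi
```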

Walltime

Each cluster has a hard walltime limit on general resources, as well as on jobs run as guest on owner resources. On most clusters this is 72 hours; however, please see the individual cluster guides for specifics on a given cluster. If you find you need longer than this, please contact CHPC.

With respect to how much wall time to ask for, we suggest that you start with some small runs to gauge how long the production runs will take. As a rule of thumb, request a wall time which is 10-15% larger than the actual run time. If you specify a shorter wall time, you risk your job running out of wall time and being killed before it finishes. If you set a wall time that is too large, you may face a longer waiting time in the queue.
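The rule of thumb above is simple arithmetic, sketched here as a bash helper. The function name padded_walltime is hypothetical; it takes a measured run time in seconds and a padding percentage (15 by default), rounds up to a whole second, and prints the result as hours:minutes:seconds for use with the -t option.

```shell
# Print a wall time request that is pad_percent larger than the measured run time.
# Usage: padded_walltime RUN_SECONDS [PAD_PERCENT]
padded_walltime() {
    local run_seconds=$1
    local pad_percent=${2:-15}
    # Integer arithmetic, rounded up to the next whole second.
    local total=$(( (run_seconds * (100 + pad_percent) + 99) / 100 ))
    printf '%02d:%02d:%02d\n' \
        $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

padded_walltime 36000 15   # a 10-hour run padded by 15% -> 11:30:00
```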

Partitions and accounts

Most clusters have general nodes, available to all University affiliates with an allocation, and owner nodes, owned by research groups and available only to the members of those research groups. Both the general and owner nodes are accessible to everyone, even those without an allocation or out of allocation, in "freecycle" or "guest" mode, respectively. A "freecycle" job runs in the general (CHPC-owned) partition, whereas "guest" jobs run on owner (research-group-owned) partitions. Both "freecycle" and "guest" jobs are preemptable, i.e., they are subject to termination if a job with an allocation needs the resources.

The GPU resources are handled separately. The general GPU nodes are run without allocation, but you must request to be added to the appropriate accounts. There are also owner GPU nodes which all users can use in guest mode, again with jobs subject to preemption. As there are not many GPU nodes, CHPC requests that only jobs that are making use of the GPUs be run on these resources.

In the table below we list options for resources available to a user based on their allocation and access to owner nodes. Note that although freecycle is listed as one of the options for groups without a general allocation, we don't recommend its use unless you can automatically restart your calculation, since the chances of preemption are very high. Also, freecycle is not available if a group has an active allocation. To find out whether a group has a general allocation, and how much, see CHPC's usage page.

When using guest mode on owner nodes, you can use the Slurm feature descriptors and the constraint directive described on the CHPC Slurm page, along with information on past usage of owner nodes, to target nodes that have been less heavily used in the recent past. If you do so, please remember that past utilization is not necessarily indicative of future usage patterns.

Allocations and node ownership status	What resource(s) are available
No general allocation, no owner nodes	Unallocated general nodes; Allocated general nodes in freecycle mode (not recommended); Guest access on owner nodes
General allocation, no owner nodes	Unallocated general nodes; Allocated general nodes; Guest access on owner nodes
Group owner nodes, no general allocation	Unallocated general nodes; Allocated general nodes in freecycle mode (not recommended); Group owned nodes; Guest access on owner nodes of other groups
Group owner nodes, general allocation	Unallocated general nodes; Allocated general nodes; Group owned nodes; Guest access on owner nodes of other groups

The table below lists possible ways to run on CHPC clusters. The general account name is the group name (typically the PI's last name), which can be obtained by running the groups $USER command. The account and partition for owner nodes are typically the group name, although there are exceptions. Available accounts and partitions can be obtained by running the sacctmgr -ps list user $USER command. The partitions are not listed directly in the output of this command, but the QOSes (Quality of Service) are, and the partition for a given QOS typically has the same name as the QOS. A newer alternative that presents your partition and account information in an easy-to-read form is the myallocation command.

For details on how to target the different partitions and accounts, see the Slurm user's guide.

Execution mode	Cluster	Partition (-p)	Account (-A)
General nodes on unallocated clusters - open to all users	lonepeak kingspeak	lonepeak kingspeak	group-name
General nodes on allocated clusters - if your group has allocation	notchpeak granite	notchpeak granite	group-name
General GPU nodes	kingspeak notchpeak granite	kingspeak-gpu notchpeak-gpu granite-gpu	kingspeak-gpu notchpeak-gpu granite-gpu
Freecycle - can only be used if your group does not have allocation	notchpeak	notchpeak-freecycle	group-name
Owner nodes (non-GPU)	kingspeak notchpeak lonepeak	name-kp name-np name-lp	name-kp name-np name-lp
Owner GPU nodes	kingspeak notchpeak	name-gpu-kp name-gpu-np	name-gpu-kp name-gpu-np
Guest access to owner nodes (non-GPU)	kingspeak notchpeak lonepeak	kingspeak-guest notchpeak-guest lonepeak-guest	owner-guest owner-guest owner-guest
Guest access to owner GPU nodes	kingspeak notchpeak	kingspeak-gpu-guest notchpeak-gpu-guest	owner-gpu-guest owner-gpu-guest
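The naming pattern in the table above can be expressed as a small lookup, sketched here as a bash function that prints the -p and -A values for a given execution mode and cluster. The function name slurm_flags and the default group name mygroup are hypothetical. Owner partitions (name-kp, name-np, name-lp) are left out because their names vary by group, and the function does not check whether a mode actually exists on a given cluster (freecycle, for instance, is listed only for notchpeak).

```shell
# Print sbatch partition/account options for an execution mode and cluster.
# Usage: slurm_flags MODE CLUSTER [GROUP]
slurm_flags() {
    local mode=$1 cluster=$2 group=${3:-mygroup}   # "mygroup" is a placeholder
    case "$mode" in
        general)   echo "--partition=$cluster --account=$group" ;;
        gpu)       echo "--partition=$cluster-gpu --account=$cluster-gpu" ;;
        freecycle) echo "--partition=$cluster-freecycle --account=$group" ;;
        guest)     echo "--partition=$cluster-guest --account=owner-guest" ;;
        gpu-guest) echo "--partition=$cluster-gpu-guest --account=owner-gpu-guest" ;;
        *)         echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

slurm_flags guest notchpeak
```

The output could then be passed on the command line, for example sbatch $(slurm_flags guest notchpeak) job.sh, where job.sh stands in for your own batch script.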

In the following individual cluster user guides, details of a typical batch script are given.

Nodes with different core counts are present on most clusters. The number of cores a job is to use can be specified with Slurm constraints.
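As a hedged sketch of how such a constraint might be assembled, the helper below joins node feature names with Slurm's OR operator (|) into a --constraint option. The function name constraint_any and the feature names c32 and c64 are hypothetical; the feature names actually defined on each cluster are given in the individual cluster guides and on the CHPC Slurm page.

```shell
# Build a --constraint option that accepts nodes having ANY of the given features.
# Usage: constraint_any FEATURE [FEATURE...]
constraint_any() {
    local IFS='|'                       # "$*" joins arguments with the first IFS character
    printf -- '--constraint=%s\n' "$*"
}

constraint_any c32 c64   # hypothetical feature names -> --constraint=c32|c64
```

When pasting the result into a command line, quote it (for example sbatch "$(constraint_any c32 c64)" job.sh, with job.sh as a placeholder) so that the shell does not treat | as a pipe.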

Individual Cluster User Guides

Granite User Guide
Notchpeak User Guide
Kingspeak User Guide (no allocation required)
Lonepeak User Guide (no allocation required)
Redwood User Guide (Restricted use: See New Protected Environment)