General Cluster Information

Filesystems

NFS home directory

Your home directory, an NFS-mounted file system, is one choice for I/O. This space has the poorest I/O performance of the available options; in addition, using your home directory for job I/O has the potential to impact all users of CHPC general resources. This space is visible to all nodes on the general clusters through an auto-mounting system.

NFS group directory

A second choice for I/O is the CHPC group space, IF your group owns a group space. Group spaces are shares of a larger group file system (CHPC purchases file systems as needed and sells shares in TB chunks). This is also an NFS-mounted file system and therefore has similar performance limitations. The use of your group space has the potential to impact all users from the groups that own a share of a given file system. This space is visible to all nodes on the general environment clusters through an auto-mounting system.

Scratch 

All general environment cluster nodes have access to several NFS-mounted scratch file systems, including /scratch/general/lustre with a 700 TB capacity, /scratch/kingspeak/serial with 175 TB, and /scratch/lonepeak/serial with 33 TB.

Local disk (/scratch/local)

The local scratch space is storage unique to each individual node. It is cleaned aggressively, with files older than one week being scrubbed. It can be accessed on each node through /scratch/local. This space will be one of the fastest, but certainly not the largest, with the amount available varying between the clusters. Users must remove all of their files from /scratch/local at the end of their calculation.

It is a good idea to stage data from one storage system to another when running jobs. At the start of the batch job, data files should be copied from the home directory to the scratch space; at the end of the run, the output should be copied back to the user's home directory.
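As a concrete illustration, this flow can be sketched in a Slurm batch script. The paths, file names, and resource requests below are hypothetical; substitute your own directories and the scratch file system you intend to use. The sketch falls back to temporary directories when run outside Slurm, so it can be tried as a dry run.

```shell
#!/bin/bash
#SBATCH -t 2:00:00
#SBATCH -n 4

# Hypothetical locations; override or edit for real use, e.g.
# SCRATCH_BASE=/scratch/general/lustre/$USER and DATA_DIR=$HOME/myproject.
SCRATCH_BASE=${SCRATCH_BASE:-$(mktemp -d)}
DATA_DIR=${DATA_DIR:-$(mktemp -d)}
JOB_ID=${SLURM_JOB_ID:-dryrun}     # Slurm sets SLURM_JOB_ID inside a real job

WORK_DIR=$SCRATCH_BASE/$JOB_ID
mkdir -p "$WORK_DIR"

# Stage input data from the home/data directory to scratch.
touch "$DATA_DIR/input.dat"        # placeholder input so the dry run works
cp "$DATA_DIR/input.dat" "$WORK_DIR/"
cd "$WORK_DIR"

# The real computation would go here (e.g. mpirun ./my_program).
cp input.dat output.dat            # placeholder for the actual computation

# Copy results back home and clean up the scratch space, as described above.
cp output.dat "$DATA_DIR/"
cd "$DATA_DIR"
rm -rf "$WORK_DIR"
```

The cleanup step at the end matters: as noted below, users must remove their own files from scratch spaces when the computation finishes.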

It is important to keep in mind that ALL users must remove excess files on their own. Preferably this should be done within the user's batch job once the computation has finished. Leaving files in any scratch space creates an impediment to other users who are trying to run their own jobs. Simply delete all extra files from any space other than your home directory when they are not in immediate use.

User Environment

CHPC currently supports two shells: bash and tcsh. In addition, we use the Lmod module system to control the user environment.

CHPC provides two login files for each shell choice: the first, .tcshrc/.bashrc, sets up a basic environment, and the second, .custom.csh/.custom.sh, can be used to customize the environment. Details can be found on the modules documentation page. All new accounts are created using the modules framework, and all accounts, even those with the older-style login files, have been enabled to make use of modules.
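For example, a bash user's .custom.sh might load a few modules and set personal environment variables. The module names and settings below are illustrative only; run module avail to see what is actually installed.

```shell
# ~/.custom.sh -- sourced after the CHPC-provided .bashrc.
# tcsh users would put the equivalent csh syntax in ~/.custom.csh.
module load gcc          # hypothetical compiler module
module load intel-mpi    # hypothetical MPI module

export MYPROJECT="$HOME/myproject"   # hypothetical personal setting
```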

Applications

CHPC maintains a number of user applications as well as tools needed to install applications. For some of these we have software documentation pages, which can be found in the software documentation section. We also have a searchable database of the installed packages. Also note that there are several packages which we have installed, e.g., abaqus, ansys, charmm, comsol, star-CCM+, that are licensed by individual groups and are therefore not accessible outside of those groups.

Historically, applications that are not cluster specific have been installed in /uufs/chpc.utah.edu/sys/pkg, whereas cluster-specific applications (most typically due to the use of a cluster-specific installation of MPI or another support package) are located in /uufs/$UUFSCELL/sys/pkg, where $UUFSCELL is kingspeak.peaks, ash.peaks, or ember.arches.

Moving forward, we are working to consolidate applications in /uufs/chpc.utah.edu/sys/installdir.

Batch System 

The batch implementation on the CHPC clusters is Slurm.

Any process which requires more than 15 minutes of run time must be submitted through the batch system. The command to submit to the batch system is sbatch, and there is also a way to submit an interactive job:

  • srun --pty -t 1:00:00 -n 4 -N 2 /bin/tcsh -l  

Other options for running jobs are given in the Slurm documentation.
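A minimal batch workflow looks like the following; the script name myjob.slurm and the job id are hypothetical.

```shell
sbatch myjob.slurm     # submit the batch script; prints the assigned job id
squeue -u $USER        # check the state of your queued and running jobs
scancel 1234567        # cancel a job by its id, if needed
```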

Walltime

Each cluster has a hard walltime limit on general resources as well as on jobs run as guest on owner resources. On most clusters this is 72 hours; however, please see the individual cluster guides for specifics on a given cluster. If you find you need longer than this, please contact CHPC.

With respect to how much wall time to ask for, we suggest that you start with some small runs to gauge how long the production runs will take. As a rule of thumb, request a wall time which is 10-15% larger than the expected run time. If you specify a shorter wall time, you risk your job running out of wall time and being killed before finishing. If you set a wall time that is too large, you may face a longer wait in the queue.
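For example, if your test runs suggest a production run takes about 21 hours, a 10-15% margin puts the request at roughly 24 hours. In a batch script the wall time is set with Slurm's --time (or -t) directive:

```shell
#SBATCH --time=24:00:00    # wall time request; short form: #SBATCH -t 24:00:00
```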

Partitions and accounts

Most clusters have general nodes, available to all University affiliates with an allocation, and owner nodes, owned by research groups and available only to the members of those research groups. Both the general and owner nodes are accessible to everyone, even those without or out of allocation, in "freecycle" or "guest" mode, respectively. A "freecycle" job runs in the general (CHPC-owned) partitions, whereas "guest" jobs run on owner (research-group-owned) partitions. Both "freecycle" and "guest" jobs are preemptable, i.e., they are subject to termination if a job with allocation needs the resources.

The GPU resources are handled separately.  The general GPU nodes are run without allocation, but you must request to be added to the appropriate accounts. There are also owner GPU nodes which all users can use in guest mode, again with jobs subject to preemption. As there are not many GPU nodes, CHPC requests that only jobs that are making use of the GPUs be run on these resources.
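A GPU job therefore needs both a GPU partition/account pair (see the table below) and an explicit GPU request via Slurm's generic-resource syntax. The GPU count, and any type suffix such as gpu:p100:1, depends on the node; check the cluster guides for what is available.

```shell
#SBATCH -p kingspeak-gpu    # general GPU partition (see the table below)
#SBATCH -A kingspeak-gpu    # matching GPU account (request access first)
#SBATCH --gres=gpu:1        # ask for one GPU on the node
```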

In the table below we list the resource options available to a user based on their allocation and access to owner nodes. Note that although freecycle is listed as one of the options for groups without a general allocation, we do not recommend its use unless you can automatically restart your calculation, since the chances of preemption are very high. Also, freecycle is not available if a group has an active allocation. To find out whether, and how much, general allocation a group has, see CHPC's usage page.

Allocations and node ownership status, and the resources available in each case:

No general allocation, no owner nodes:
  • Unallocated general nodes
  • Allocated general nodes in freecycle mode (not recommended)
  • Guest access on owner nodes

General allocation, no owner nodes:
  • Unallocated general nodes
  • Allocated general nodes
  • Guest access on owner nodes

Group owner nodes, no general allocation:
  • Unallocated general nodes
  • Allocated general nodes in freecycle mode (not recommended)
  • Group owned nodes
  • Guest access on owner nodes of other groups

Group owner nodes, general allocation:
  • Unallocated general nodes
  • Allocated general nodes
  • Group owned nodes
  • Guest access on owner nodes of other groups

 

The table below lists the possible ways to run on CHPC clusters. The general account name is the group name, obtained by running the groups $USER command (typically the PI's last name). The account and partition for owner nodes are typically also the group name, although there are exceptions. Available accounts and partitions can be obtained by running the sacctmgr -ps list user $USER command. Partitions are not directly listed in the output of this command, but QOSs (Qualities of Service) are, and the partition for a given QOS typically has the same name as the QOS.
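The two discovery commands mentioned above can be run from any CHPC login shell:

```shell
groups $USER                    # your group name(s) -- the likely general account name
sacctmgr -ps list user $USER    # accounts and QOSs available to you (pipe-separated output)
```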

For details on how to target the different partitions and accounts, see the SLURM user's guide.

Each entry below gives the execution mode, the clusters it applies to, and the corresponding partition (-p) and account (-A) names.

General nodes on unallocated clusters (open to all users)
  Cluster: ember, lonepeak
  Partition (-p): ember, lonepeak
  Account (-A): group-name

General nodes on allocated clusters (if your group has allocation)
  Cluster: kingspeak, notchpeak
  Partition (-p): kingspeak, notchpeak
  Account (-A): group-name

General GPU nodes
  Cluster: ember, kingspeak, notchpeak
  Partition (-p): ember-gpu, kingspeak-gpu, notchpeak-gpu
  Account (-A): ember-gpu, kingspeak-gpu, notchpeak-gpu

Freecycle (can only be used if your group does not have allocation)
  Cluster: kingspeak, notchpeak
  Partition (-p): kingspeak-freecycle, notchpeak-freecycle
  Account (-A): group-name

Owner nodes (non-GPU)
  Cluster: ember, kingspeak, notchpeak, lonepeak, ash
  Partition (-p): partition-name-em, partition-name-kp, partition-name-np, partition-name-lp, smithp-ash
  Account (-A): partition-name-em, partition-name-kp, partition-name-np, partition-name-lp, smithp-ash (note there are different ash owner accounts)

Owner GPU nodes
  Cluster: kingspeak
  Partition (-p): partition-name-gpu-kp
  Account (-A): partition-name-gpu-kp

Guest access to owner nodes (non-GPU)
  Cluster: ember, kingspeak, notchpeak, lonepeak, ash
  Partition (-p): ember-guest, kingspeak-guest, notchpeak-guest, lonepeak-guest, ash-guest
  Account (-A): owner-guest (smithp-guest on ash)

Guest access to owner GPU nodes
  Cluster: kingspeak
  Partition (-p): kingspeak-gpu-guest
  Account (-A): owner-gpu-guest

 

In the following individual cluster user guides, details of a typical batch script are given.  

Nodes with different core counts are present on most clusters. The number of cores a job is to use can be specified with Slurm constraints.
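For example, assuming the cluster guide lists constraint names like c16 and c20 for 16- and 20-core nodes (the exact constraint names vary by cluster; consult the individual cluster guides), a job could request:

```shell
#SBATCH -C c20          # only nodes with 20 cores
#SBATCH -C "c16|c20"    # either 16- or 20-core nodes
```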

Individual Cluster User Guides

Last Updated: 4/13/18