General Cluster Information

Filesystems

NFS home directory

Your home directory, an NFS-mounted file system, is one choice for I/O, but it typically has the slowest I/O performance of the available storage options. This space is visible to all nodes on the clusters through an auto-mounting system. The one exception is apexarch, whose protected environment has a separate, isolated home directory space.

 

NFS Scratch 

All interactive nodes have access to several NFS-mounted scratch file systems, including /scratch/general/lustre (700 TB), /scratch/kingspeak/serial (175 TB), and /scratch/lonepeak/serial (33 TB). See the individual cluster guides for information on which of these scratch file systems are mounted on the compute nodes of each cluster. Again, apexarch is the exception, having its own scratch file system.

 

Local disk (/scratch/local)

The local scratch space is storage unique to each individual node and can be accessed on each node through /scratch/local. It is one of the fastest storage options but not the largest, and the amount available varies between the clusters. The local scratch space is cleaned aggressively, with files older than 1 week being scrubbed. Users must remove all their files from /scratch/local at the end of their calculation.

It is good practice to move data between storage systems as part of a job. At the start of the batch job, data files should be copied from the home directory to the scratch space; at the end of the run, the output should be copied back to the user's home directory.

It is important to keep in mind that ALL users must remove their excess files themselves. Preferably this should be done within the batch job, once the computation has finished. Files left behind in any scratch space get in the way of other users who are trying to run their own jobs. Delete all files you no longer need from any space other than your home directory as soon as they are not in active use.
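The flow described above (copy in, compute, copy out, clean up) can be sketched as a small bash function. This is only an illustration under stated assumptions: the function name stage_run_cleanup, the file names input.dat and result.out, and the line-counting "computation" are hypothetical placeholders, not part of any CHPC-provided script.

```shell
#!/bin/bash
# Sketch of a stage-in / compute / stage-out / cleanup flow for a batch job.
# All names below (function, files, directories) are hypothetical placeholders.

# Usage: stage_run_cleanup INPUT_DIR SCRATCH_BASE RESULT_DIR
stage_run_cleanup() {
    local input_dir=$1 scratch_base=$2 result_dir=$3
    local workdir

    # Use a per-job subdirectory so jobs sharing a node do not collide.
    workdir="$scratch_base/${USER:-user}/${SLURM_JOB_ID:-manual.$$}"
    mkdir -p "$workdir" "$result_dir" || return 1

    # Stage in: copy the input files from the home directory to scratch.
    cp -r "$input_dir"/. "$workdir"/ || return 1

    # Placeholder for the real computation: count the lines of input.dat.
    ( cd "$workdir" && wc -l < input.dat > result.out ) || return 1

    # Stage out, and remove the scratch files only once the copy succeeded.
    cp "$workdir/result.out" "$result_dir"/ && rm -rf "$workdir"
}

# Example call inside a batch job (hypothetical directories):
#   stage_run_cleanup "$HOME/myjob/input" /scratch/local "$HOME/myjob/output"
```

Removing the scratch directory only after a successful copy means a failed stage-out leaves the results on scratch for manual recovery; those files then still have to be deleted by hand.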

User Environment

CHPC currently supports two shells: bash and tcsh. In addition, we currently offer two systems for setting the shell environment.

Historically, we have offered login files that source a set of scripts to create the environment. This login script enables users to switch on or off (machine-specific) initializations for packages that have been installed on the cluster (e.g., Gaussian, Matlab, TotalView) and to set cluster-dependent MPI defaults. The first part of the .tcshrc/.bashrc script determines the type of machine being logged in to from a few parameters: the machine's operating system, its IP address, or its UUFSCELL variable (defined at the system level). Upon each login, the address list of the CHPC Linux machines is retrieved from the CHPC webserver. If the retrieval succeeds, the address list is stored in the file linuxips.csh (tcsh) or linuxips.sh (bash). If the CHPC webserver is unresponsive, the .tcshrc/.bashrc script uses the address list from previous sessions or, in the absence of the latter, issues a warning.
The .tcshrc/.bashrc script then looks up the machine's IP address and performs a host-specific initialization.
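The retrieve-or-fall-back behavior described above can be illustrated with a short bash sketch. It is not the actual CHPC login script: the function name refresh_address_list is hypothetical, and the fetch step is passed in as a command name because the real retrieval command and URL are not given here.

```shell
#!/bin/bash
# Sketch of "refresh the cached address list, else reuse the old one".
# refresh_address_list is a hypothetical name, not the CHPC login script.

# Usage: refresh_address_list FETCH_CMD CACHE_FILE
#   FETCH_CMD  command or function that prints the address list to stdout
#   CACHE_FILE file such as linuxips.sh that keeps the last good list
refresh_address_list() {
    local fetch_cmd=$1 cache=$2 tmp
    tmp=$(mktemp) || return 1

    if "$fetch_cmd" > "$tmp" 2>/dev/null && [ -s "$tmp" ]; then
        # Successful retrieval: replace the cached list.
        mv "$tmp" "$cache"
        return 0
    fi
    rm -f "$tmp"

    if [ -f "$cache" ]; then
        # Server unreachable: keep using the list from a previous session.
        return 0
    fi

    echo "warning: no address list available" >&2
    return 1
}
```

Writing to a temporary file first keeps a failed or empty download from overwriting a good cached list.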

More recently, we started to offer the Lmod module system to control the user environment, providing two login files for each shell choice: one that sets up a basic environment and one that the user can customize. Details can be found on the modules documentation page. Going forward, all new accounts are being created using the modules framework, and all accounts, even those with the older style login files, have been enabled to make use of modules.
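As an illustration of the kind of line a user might add to a customizable login file, the sketch below wraps the standard Lmod command module load in a guard so that it is harmless on machines where the module command is not defined. The function name load_modules and the module name in the example call are hypothetical.

```shell
#!/bin/bash
# Sketch: load modules only when the Lmod "module" command is available.
# load_modules is a hypothetical helper name.

load_modules() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; skipping: $*" >&2
        return 1
    fi
    # "module load" is the standard Lmod command for adding packages
    # to the current environment.
    module load "$@"
}

# Example (hypothetical module name):
#   load_modules matlab
```

At an interactive prompt, module avail lists the modules that can be loaded and module list shows those currently loaded.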

Applications

CHPC maintains a number of user applications as well as tools needed to install applications. For some of these we have software documentation pages, which can be found in the software documentation section. We are working on establishing a searchable database of the installed packages, but for now you can look in the locations listed below to see whether a specific application is already installed at CHPC. Also note that several packages we have installed (e.g., abaqus, ansys, charmm, comsol, AVL, schrodinger, star-CCM+) are licensed by individual groups and are therefore not accessible outside of those groups.

Historically, applications that are not cluster-specific have been installed in /uufs/chpc.utah.edu/sys/pkg, whereas cluster-specific applications (most typically due to the use of a cluster-specific installation of MPI or another support package) are located in /uufs/$UUFSCELL/sys/pkg, where $UUFSCELL is kingspeak.peaks, ash.peaks, or ember.arches.

Moving forward, we are working to consolidate applications in /uufs/chpc.utah.edu/sys/installdir.
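Until a searchable database exists, checking the installation locations named above can be scripted. The following bash sketch is only a convenience under stated assumptions: the helper names app_search_dirs and find_app are hypothetical, and the match is a simple case-sensitive file name glob.

```shell
#!/bin/bash
# Sketch: look for an application in the CHPC installation directories.
# app_search_dirs and find_app are hypothetical helper names.

# Print the directories to search, one per line.
# Usage: app_search_dirs [CELL]   (CELL defaults to $UUFSCELL)
app_search_dirs() {
    local cell=${1:-${UUFSCELL:-}}
    echo "/uufs/chpc.utah.edu/sys/pkg"
    if [ -n "$cell" ]; then
        echo "/uufs/$cell/sys/pkg"
    fi
    echo "/uufs/chpc.utah.edu/sys/installdir"
}

# List entries whose names contain NAME in each existing search directory.
# Usage: find_app NAME [CELL]
find_app() {
    local name=$1 dir
    for dir in $(app_search_dirs "${2:-}"); do
        if [ -d "$dir" ]; then
            ls -d "$dir"/*"$name"* 2>/dev/null
        fi
    done
    return 0
}

# Example: find_app matlab kingspeak.peaks
```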

Batch System 

The batch implementation on the CHPC clusters is Slurm.

Any process that requires more than 15 minutes of run time must be submitted through the batch system. The command to submit a batch job is sbatch; an interactive job can be started with srun, for example:

  • srun --pty -t 1:00:00 -n 4 -N 2 /bin/tcsh -l  

Other options for sbatch and srun are given in the Slurm documentation.

Each cluster has a hard walltime limit for jobs on general resources as well as for jobs run as guest on owner resources. On most clusters this is 72 hours; please see the individual cluster guides for specifics on a given cluster. If you find you need longer than this, please contact CHPC. Users without an allocation, or those who have used all of their allocation, can still run, but only in "freecycle" mode. "Freecycle" jobs are preemptable, i.e., they are subject to termination if a job with allocation needs the resources. We suggest that you start with some small runs to gauge how long the production runs will take. As a rule of thumb, request a wall time that is 10-15% larger than the expected run time. If you specify a wall time that is too short, you risk your job being killed before it finishes. If you set a wall time that is too large, you may face a longer wait in the queue.
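The rule of thumb above is simple arithmetic. As a worked example, the bash sketch below pads a measured run time (in seconds) by a percentage and prints it in the hours:minutes:seconds form accepted by the Slurm -t option. The function name padded_walltime and the 15% default are illustrative choices.

```shell
#!/bin/bash
# Sketch: turn a measured run time into a padded wall time request.
# padded_walltime is a hypothetical helper name.

# Usage: padded_walltime SECONDS [PERCENT]   (PERCENT defaults to 15)
padded_walltime() {
    local measured=$1 pct=${2:-15}
    # Integer arithmetic, rounded up to the next whole second.
    local total=$(( (measured * (100 + pct) + 99) / 100 ))
    printf '%d:%02d:%02d\n' \
        $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# A test run that took 10 hours (36000 s), padded by 15%:
padded_walltime 36000   # prints 11:30:00
```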

Details of a typical batch script are given in the individual cluster user guides listed below.

Nodes with different core counts are present on most clusters. The number of cores a job is to use can be specified with Slurm constraints.
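For orientation, a minimal batch script might look like the sketch below. It is not a CHPC-provided template: the job name, resource values, and in particular the constraint value c16 are hypothetical placeholders, and a real script may also need account and partition settings as described in the individual cluster guides.

```shell
#!/bin/bash
# Minimal Slurm batch script sketch; all values below are placeholders.
#SBATCH --job-name=example
#SBATCH --time=01:00:00
#SBATCH --nodes=2
#SBATCH --ntasks=4
# "c16" is a hypothetical feature name standing in for a core-count
# constraint; check the cluster guide for the names actually defined.
#SBATCH --constraint=c16

# Slurm sets these variables inside a job; the defaults let the script
# also run outside the batch system for a quick check.
job_id=${SLURM_JOB_ID:-none}
ntasks=${SLURM_NTASKS:-1}
summary="job $job_id: $ntasks task(s) requested"
echo "$summary"
```

Saved under a name such as example.sh (hypothetical), the script would be submitted with sbatch example.sh.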

Individual Cluster User Guides

Last Updated: 11/22/17