Notchpeak, started in January 2018, is operated as a condominium cluster: it combines general CHPC nodes, on which groups can obtain an allocation, with additional nodes owned by individual research groups. All users have guest access to the owner nodes, but guest jobs on those nodes are preemptable.
The General CHPC nodes include:
- 8 GPU nodes each with 32 (Intel XeonSP Skylake) or 40 (Intel XeonSP Cascadelake) cores, 192GB memory, and a mix of P40, V100, and RTX2080Ti GPUs. These are described in more detail on our GPU & Accelerators page.
- 25 dual socket nodes (Intel XeonSP Skylake) with 32 cores each
  - 4 nodes with 96 GB memory
  - 19 nodes with 192 GB memory
  - 2 nodes with 768 GB memory
- 1 dual socket node (Intel XeonSP Skylake) with 36 cores, 768 GB memory
- 2 dual socket nodes (Intel XeonSP Cascadelake) with 40 cores, 192 GB memory
- 2 dual socket AMD Epyc nodes, each with 64 cores, 512 GB memory (note that these are in a special short partition, see below)
Other information on notchpeak's hardware and cluster configuration:
- Mellanox EDR Infiniband interconnect
- 2 general interactive nodes
Added April 2019: Two new AMD processor (Epyc 7601) based nodes are now available as compute nodes on notchpeak. Each node has 64 physical cores and 512GB of memory.
Instead of adding these nodes to the general notchpeak partition, we are using them to explore having a “test or debug” queue, with a shorter maximum wall time. We are doing this as we have had several requests for a test queue, and the arrival of these nodes has given us an opportunity to see if there will be sufficient usage of this queue.
These nodes are available for use by all users, regardless of whether they have a general allocation; use of these nodes does not count against any allocation. To use them, set both the partition and the account to notchpeak-shared-short. Because node sharing is in use, users MUST specify the number of cores and the amount of memory; see https://www.chpc.utah.edu/documentation/software/node-sharing.php for additional details.
In order to maximize throughput of short jobs and provide access to all users, these nodes have been placed in a separate partition with node sharing enabled. In addition, the nodes are configured to accept a load of up to twice the number of physical cores, again to maximize job throughput.
Use of these nodes is limited:
- Maximum wall time is 8 hours
- Maximum jobs in the queue per user is 10
- Maximum running jobs per user is 2
- Maximum cores per user is 32
- Maximum memory per user is 128GB
The Skylake and Cascadelake processors offer AVX-512 support. See our page on Single Executables for all CHPC Platforms for details on building applications to take advantage of this feature.
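As a rough sketch of the notchpeak-shared-short settings described above, a job script header might look like the following. It is written for bash; the walltime, task count, memory request and output file name are illustrative values chosen to fit within the listed limits, and `my_program` is a hypothetical executable.

```shell
#!/bin/bash
#SBATCH --partition=notchpeak-shared-short   # partition and account take the same name
#SBATCH --account=notchpeak-shared-short
#SBATCH --time=2:00:00                       # illustrative; the limit is 8 hours
#SBATCH --ntasks=4                           # cores must be given explicitly (limit: 32 per user)
#SBATCH --mem=16G                            # memory must be given explicitly (limit: 128GB per user)
#SBATCH -o slurm-%j.out-%N

# Slurm sets SLURM_NTASKS inside a job; the fallback only matters when the script is run outside Slurm.
summary="running with ${SLURM_NTASKS:-4} task(s)"
echo "$summary"
# ./my_program   # hypothetical executable, replace with your own
```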
CHPC resources are available to qualified faculty, students (under faculty supervision), and researchers from any Utah institution of higher education. Users can request accounts for CHPC computer systems by filling out the account request form.
Users requiring priority on their jobs may apply for an allocation of wall clock hours per quarter. To apply, send a brief proposal using the allocation form.
The notchpeak cluster can be accessed via ssh (secure shell) at the following address:
All CHPC machines mount the same user home directories. This means that the user files on notchpeak will be exactly the same as the ones on other CHPC clusters. The advantage is obvious: users do not need to copy files between machines.
Notchpeak compute nodes mount the following scratch file systems:
As a reminder, the non-restricted scratch file systems are automatically scrubbed of files that have not been accessed for 60 days.
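To see which of your scratch files might be at risk, a sketch like the following lists files whose last access time is more than 60 days old. The function name `stale_files` and the example path are hypothetical, and the exact criteria used by the scrubber are an assumption here; the sketch relies only on the standard `find` utility.

```shell
# List regular files under a directory that have not been accessed for more than 60 days.
# find counts age in whole 24-hour periods, so "-atime +60" matches files
# whose last access was at least 61 days ago.
stale_files() {
  find "$1" -type f -atime +60
}

# Hypothetical usage on a scratch directory:
#   stale_files /scratch/path/to/your/directory
```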
Your environment is set up through the use of modules. Please see the User Environment section of the General Cluster Information page for details on setting up your environment for batch and other applications.
The batch implementation on notchpeak is Slurm.
The creation of a batch script on the Notchpeak cluster
A shell script is a bundle of shell commands which are fed one after another to a shell (bash, tcsh, ...). As soon as one command has finished, the next one is executed. This process continues until either the script exits, for example on a fatal error, or the complete sequence of individual shell commands has been executed. A batch script is a shell script which defines the tasks a particular job has to execute on a cluster.
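The sequential behaviour described above can be illustrated with a minimal bash sketch. The function `run_steps` and its file names are made up for the example. Each step runs only after the previous one has succeeded, and the function returns early on the first failure; a shell stops on errors only when the script asks for it, as the explicit checks here do.

```shell
# Hypothetical three-step job: create a directory, write an input file, copy it to an output file.
run_steps() {
  workdir=$1
  mkdir -p "$workdir"                      || return 1   # step 1
  echo "hello" > "$workdir/in.txt"         || return 1   # step 2 runs only if step 1 succeeded
  cp "$workdir/in.txt" "$workdir/out.txt"  || return 1   # step 3 runs only if step 2 succeeded
  cat "$workdir/out.txt"                                 # final step: show the result
}
```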
Below is an example batch script for running under Slurm on the notchpeak cluster. The lines at the top of the file begin with #SBATCH; the shell treats them as comments, but Slurm reads them as options.
Example Slurm Script for notchpeak:
#!/bin/tcsh
#SBATCH --time=1:00:00 # walltime, abbreviated by -t
#SBATCH --nodes=2 # number of cluster nodes, abbreviated by -N
#SBATCH -o slurm-%j.out-%N # name of the stdout, using the job number (%j) and the first node (%N)
#SBATCH --ntasks=64 # number of MPI tasks, abbreviated by -n
# additional information for allocated clusters
#SBATCH --account=baggins # account - abbreviated by -A
#SBATCH --partition=notchpeak # partition, abbreviated by -p
#
# set data and working directories
setenv WORKDIR $HOME/mydata
setenv SCRDIR /scratch/notchpeak/serial/UNID/myscratch
mkdir -p $SCRDIR
cp -r $WORKDIR/* $SCRDIR
# load appropriate modules, in this case Intel compilers, MPICH2
module load intel mpich2
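# for MPICH2 over Ethernet, set communication method to TCP
# (this setting was written for the general lonepeak nodes; notchpeak uses an InfiniBand interconnect, so it may not be needed here)
# see above for network interface selection options for other MPI distributions
setenv MPICH_NEMESIS_NETMOD tcp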
# run the program
# see above for other MPI distributions
mpirun -np $SLURM_NTASKS my_mpi_program > my_program.out
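The --ntasks value in the script follows from the node count and the cores per node: assuming one MPI task per core, two of the 32-core general nodes give 64 tasks. A small bash sketch of that arithmetic, using a hypothetical helper named `total_tasks`:

```shell
# total_tasks NODES CORES_PER_NODE: assumes one MPI task per core on every node.
total_tasks() {
  echo $(( $1 * $2 ))
}

total_tasks 2 32   # 2 nodes x 32 cores per node, prints 64
```

The same calculation for two of the 40-core Cascadelake nodes would give 80 tasks.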
For more details and example scripts, please see our Slurm documentation. For help specifying your job and the instructions in your Slurm script, please also review CHPC Policy 2.1.6, the notchpeak Job Scheduling Policy.
Job Submission on Notchpeak
To submit a job on notchpeak, you must first log in to a notchpeak interactive node. Note that this is a change from the way job submission has worked in the past on our other clusters, where you could submit from any interactive node to any cluster.
To submit a script named slurmjob.script, just type:
sbatch slurmjob.script
To check the status of your job, use the "squeue" command.
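When scripting submissions, it can help to capture the job ID so that squeue can be limited to that one job. sbatch normally confirms a submission with a line of the form "Submitted batch job <id>"; the bash sketch below extracts the ID from such a line. The helper name `job_id_from` is hypothetical, and the Slurm commands are shown as comments because they only work on a system with Slurm installed.

```shell
# Print the last whitespace-separated word of the given line, which is the job ID
# in sbatch's "Submitted batch job <id>" confirmation.
job_id_from() {
  echo "${1##* }"
}

# On a notchpeak interactive node (requires Slurm):
#   reply=$(sbatch slurmjob.script)
#   squeue -j "$(job_id_from "$reply")"   # status of just this job
#   squeue -u "$USER"                     # status of all of your jobs

job_id_from "Submitted batch job 12345"   # prints 12345
```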
For information on compiling on the clusters at CHPC, please see our Programming Guide.