Granite User Guide 

Hardware Overview

The Granite cluster began development in Summer 2024 and is operated in a condominium-style fashion; because it is new, it currently contains only CHPC-owned nodes, but it will also host nodes owned by individual research groups. Users can access all nodes on Granite with their allocation, or through guest access in a preemptable fashion.

The General CHPC nodes include:

  • 3 GPU nodes, each with 64 cores (AMD Genoa); 1 with 768GB of memory and 2 with 384GB of memory. One node contains H100 NVL GPUs and 2 nodes contain L40S GPUs. These nodes are in the granite-gpu partition, require an allocation to access, and are described in more detail on our GPU & Accelerators page.
  • 5 single socket nodes (AMD Genoa) with 96 cores and 768GB of memory each.
  • 3 dual socket nodes (AMD Genoa) with 96 cores and 1536GB of memory each.

Other information on Granite's hardware and cluster configuration:

  • RDMA over Converged Ethernet (RoCE) at 100 Gbps
  • 2 general interactive nodes

 

Important Differences from other CHPC Clusters

  • There is no InfiniBand network on Granite. We are instead utilizing RDMA over Converged Ethernet (RoCE) at 100 Gbps.
  • There is no separation between regular and shared partitions anymore. The granite partition will be in a shared state by default.
  • In addition to the Slurm --account and --partition options, the --qos option must be used as well. In most cases the --qos value is the same as the --partition value. Use the new "mychpc batch" command, which replaces the previous "myallocation" command, to list the account/partition/qos values available to you.
  • Due to the default shared state, users are required to specify core counts and the amount of memory in their Slurm batch scripts. If core counts and memory are not specified, a job will default to 1 core and 2GB of memory. See the example batch script after this list.
  • All users have access to GPUs on Granite through the granite-gpu and granite-gpu-freecycle partitions.
  • Access to the granite-gpu partition will require an allocation.
  • Due to the high usage of GPU nodes, we now monitor GPU usage. Any job that reserves a GPU, but does not utilize that GPU, will be cancelled by the CHPC.
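
As a minimal sketch of how these options fit together, the batch script below requests cores and memory explicitly on the granite partition. The account name and qos value shown are placeholders; run "mychpc batch" to see the account/partition/qos combinations that apply to you, and "my_program" stands in for your own executable.

    #!/bin/bash
    #SBATCH --account=my-account      # placeholder; list your accounts with "mychpc batch"
    #SBATCH --partition=granite       # shared-by-default CPU partition
    #SBATCH --qos=granite             # in most cases the same value as the partition
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=16        # specify cores explicitly (default is 1 core)
    #SBATCH --mem=64G                 # specify memory explicitly (default is 2GB)
    #SBATCH --time=02:00:00

    ./my_program                      # placeholder for your own program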

 

Granite Usage

CHPC resources are available to faculty, students under faculty supervision, and researchers from any Utah institution of higher education. Users can request accounts for CHPC computer systems by filling out an account request form.  

Until January 1, 2025, Granite will run without allocation, just as the Kingspeak and Lonepeak clusters do. Groups that want an allocation starting on that date must complete the allocation request form before December 1, 2024. Starting in January, Granite will require an allocation for jobs running in a non-preemptive state, and allocations will be required to access both the general CPU nodes and the GPU nodes. Groups may apply for an allocation of wall clock hours per quarter by submitting a brief proposal; links to the allocation form and the allocation due dates can be found here.

 

Granite Access and Environment

The Granite cluster has two interactive (login) nodes that can be accessed via ssh (secure shell) at the following addresses:

  • granite1.chpc.utah.edu
  • granite2.chpc.utah.edu
  • granite.chpc.utah.edu (will randomly assign you to granite1 or granite2)
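
For example, to connect from a terminal (u0123456 below is a placeholder; use your own username):

    ssh u0123456@granite1.chpc.utah.edu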

Arbiter runs on the interactive nodes because interactive nodes are not meant for computational workloads. If you violate this policy, Arbiter will send you warnings and increasingly throttle your processes until the computational work is moved to the compute nodes.

All CHPC machines mount the same user home directories. This means that the user files on Granite will be exactly the same as the ones on other CHPC general environment clusters (Notchpeak, Kingspeak, Lonepeak, Ash).

Granite compute nodes mount the following scratch file systems:

  • /scratch/general/nfs1
  • /scratch/general/vast

These scratch file systems are automatically scrubbed of files that have not been accessed for 60 days.
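
As one possible pattern (a sketch, not a required workflow), a batch job can create its own subdirectory on one of these scratch file systems, run there, and copy results back before the scrub removes old files; the program name below is a placeholder:

    # hypothetical fragment of a Slurm batch script
    SCRDIR=/scratch/general/vast/$USER/$SLURM_JOB_ID
    mkdir -p "$SCRDIR"
    cd "$SCRDIR"
    ./my_program > results.out       # placeholder executable
    cp results.out "$HOME/"          # copy results back to your home directory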

Your environment is set up through the use of modules. Please see the User Environment section of the General Cluster Information page for details on setting up your environment for batch and other applications.
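
For illustration, a few common module commands are shown below (the "gcc" module is only an example; check what is actually available on Granite):

    module avail             # list modules available on the cluster
    module spider gcc        # search for a package and list its versions
    module load gcc          # load an example module into your environment
    module list              # show the modules currently loaded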

Last Updated: 11/7/24