Storage Services at CHPC
CHPC currently offers four different types of storage: home directories, group space, scratch file systems, and an archive storage system. All storage types except the archive storage system are accessible from all CHPC resources; data on the archive storage space must be moved to one of the other spaces in order to be accessible.
Note that the information below is specific for the General Environment. In the Protected Environment (PE), all four types of storage exist; however, the nature of the storage, pricing, and policies vary in the PE. See the Protected Environment page for more details.
For more information on CHPC data policies, including details on current backup policies, please visit our File Storage Policies page.
Please remember that you should always have an additional copy, and possibly multiple copies, of any critical data on independent storage systems. While storage systems built with data resiliency mechanisms (such as RAID and erasure coding mentioned in the offerings listed below or other, similar technologies) allow for multiple component failures, they do not offer any protection against large-scale hardware failures, software failures leading to corruption, or the accidental deletion or overwriting of data. Please take the necessary steps to protect your data to the level you deem necessary.
Home Directories
By default, each user is provided with a 50 GB general environment home directory free of charge. To view your current home directory usage and quota status, run the command mydiskquota.
This space is not backed up; important data should be copied to a departmental file server or other locations.
CHPC can provide a temporary increase to your home directory quota; please reach out via helpdesk@chpc.utah.edu and include the reason the temporary increase is needed as well as how long you will need it.
Home directories can be mounted on local desktops. See the Data Transfer Services page for information on mounting CHPC file systems on local machines.
Quota Enforcement Policies
The 50 GB quota on this directory is enforced with a two-level quota system. Once you exceed 50 GB, you have 7 days to clean up your space to below 50 GB; if you do not, after 7 days you will no longer be able to write or edit any files until your home directory is back under the quota. If your home directory grows to 75 GB, you will immediately lose the ability to write any files until your usage is back under the quota. When over quota, you will not be able to start a FastX or OnDemand session, as those tasks write to your home directory, but an SSH session can be used to connect and free up space.
To find what is taking up space in your home directory, run the command du -h --max-depth=1 from your home directory; it will show the size of each subdirectory. If your quota is more than 50 GB, it is possible your home directory is in a shared home space; see below for more information about shared home spaces.
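For example (a simple illustration; the sort step is optional), you can list the size of each top-level directory in your home space, largest last:

    # From your home directory, report per-directory usage and sort by size
    cd ~
    du -h --max-depth=1 | sort -h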
Note that the output of mydiskquota only updates every hour or so, so it will not immediately reflect files you have just deleted from your home directory.
Purchases of Larger Home Directory Space
CHPC also allows CHPC PIs to buy larger home directory storage at a price based on hardware cost recovery. The hardware for the new home directory solution was originally purchased at $900/TB and was put into service in May 2022, as described in the Spring 2022 newsletter. Home directory storage is now priced at a prorated $540/TB for the remaining warranty lifetime; the current warranty expires in May 2027, and the prorated price is updated every May for the remaining lifetime of the storage. Once purchased, the home directories of all members of the PI's group will be provisioned in this space.
Purchase of home directory space includes the cost of the space on the VAST storage system along with backup of this space. The backup is to the CHPC object storage, Pando, and consists of a weekly full backup with nightly incremental backups and a two-week retention window.
If you are interested in this option, please contact us by emailing helpdesk@chpc.utah.edu to discuss your storage needs.
Group-Level Storage
CHPC PIs can purchase general environment group-level file storage at the TB level. CHPC purchases the hardware for this storage in bulk and then sells it to individual groups in TB quantities, so, depending on the amount of group storage space you are interested in purchasing, CHPC may have the storage to meet your needs on hand. Group space is on shared hardware and is not designed for running jobs that have high IO requirements; running such jobs on group space can bog down the system and cause issues for other groups on the hardware. Please refer to the scratch file system information below.
If interested, a more detailed description of this storage offering is available.
The current pricing is $150/TB for the lifetime of the hardware without backups. CHPC provides a backup option at $450/TB (original + 1 full copy). Hardware is purchased with a 5-year warranty, and we are usually able to obtain an additional 2 years of warranty after purchase. If you are interested in purchasing group-level storage, please contact us at helpdesk@chpc.utah.edu.
Current backup policies can be found at File Storage Policies. The CHPC also provides information on a number of user-driven alternatives to our group-level storage backup service; see the User-Driven Backup Options section below for more information.
Group directories can be mounted on local desktops. See the Data Transfer Services page for information on mounting CHPC file systems on local machines.
For group level storage options (project space) in the protected environment, please visit this link.
Scratch File Systems
A scratch space is a high-performance temporary file system for files being accessed and operated on during jobs. It is recommended to transfer data from home directories or group spaces to scratch when running IO-intensive jobs, as the scratch systems are designed for better performance and this prevents group spaces from getting bogged down.
These scratch file systems are not backed up, and they are not intended for long-term file storage: files that have not been accessed for 60 days are automatically scrubbed. The CHPC provides two scratch file systems, available free of charge, on the General Environment clusters.
The current scratch file systems are:
- /scratch/general/nfs1 - a 595 TB NFS system accessible from all general environment CHPC resources
- /scratch/general/vast - a 1 PB file system accessible from all general environment CHPC resources
  - There is a per-user quota of 50 TB on this scratch file system
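As an illustration of the recommended workflow (all paths and file names below are hypothetical), a job can stage its data onto a scratch file system, run there, and copy the results back to home or group space before the 60-day scrub removes them:

    # Create a personal working directory on the shared scratch file system
    SCRDIR=/scratch/general/vast/$USER/myjob
    mkdir -p "$SCRDIR"

    # Stage input data from home or group space before the IO-intensive run
    cp ~/project/input.dat "$SCRDIR/"

    # ... run the calculation with its input and output in $SCRDIR ...

    # Copy the results you need to keep back to home or group space
    cp "$SCRDIR/results.dat" ~/project/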
If you have questions about using the scratch file systems or about IO-intensive jobs, please contact helpdesk@chpc.utah.edu.
Temporary File Systems
/scratch/local
Each node on the cluster has a local disk mounted at /scratch/local. This disk can be used for storing intermediate files during calculations; because it is local to the node, it offers lower-latency file access. However, be aware that these files are only accessible on that node and are not directly accessible from the cluster interactive nodes, so any files needed after job completion should be moved to a shared file system (home, group, or scratch) before the end of the job.
Access permissions to /scratch/local have been set such that users cannot create directories in the top-level /scratch/local directory. Instead, as part of the slurm job prolog (before the job is started), a job-level directory, /scratch/local/$USER/$SLURM_JOB_ID, will be created. Only the job owner will have access to this directory. At the end of the job, in the slurm job epilog, this job-level directory will be removed.
All slurm scripts that make use of /scratch/local must be adapted to accommodate this change. Additional updated information is provided on the CHPC Slurm page.
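As a minimal sketch of such an adaptation (the program name and data paths are placeholders), a batch script can work inside the prolog-created directory and copy its results out before the epilog removes it:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    # The slurm prolog creates this directory; do not create it yourself
    SCRDIR=/scratch/local/$USER/$SLURM_JOB_ID

    # Stage input onto the node-local disk and work there
    cp ~/project/input.dat "$SCRDIR/"
    cd "$SCRDIR"

    # Run the calculation (my_program is a placeholder)
    ~/project/my_program input.dat > output.dat

    # Copy results to a shared file system before the epilog removes the directory
    cp output.dat ~/project/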
/scratch/local is now software-encrypted. Each time a node is rebooted, this software encryption is set up again from scratch, purging all content of this space. There is also a cron job in place to scrub /scratch/local of content that has not been accessed for over 2 weeks. This scrub policy can be adjusted on a per-host basis; a group that owns a node can opt to have us disable the scrub, and it will not run on that host.
/tmp and /var/tmp
Linux defines temporary file systems at /tmp and /var/tmp. CHPC cluster nodes set these up as a RAM disk with limited capacity. All interactive and compute nodes also have spinning-disk local storage at /scratch/local. If a user program is known to need temporary storage, it is advantageous to define the location of that storage by setting the environment variable TMPDIR to point to a location under /scratch/local. Local disk drives range from 40 to 500 GB depending on the node, which is much more than the default /tmp size.
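For example, inside a batch job you might point TMPDIR at the job's node-local scratch directory (a sketch; whether a given program honors TMPDIR depends on the program):

    # Direct temporary files to the node-local disk instead of the RAM-backed /tmp
    export TMPDIR=/scratch/local/$USER/$SLURM_JOB_ID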
Archive Storage
Pando
CHPC has an object-based archival storage system, specifically Ceph, a distributed object store suite developed at UC Santa Cruz. We offer a 6+3 erasure coding configuration, allowing for the $150/TB price for the 7-year lifetime of the hardware. In alignment with our current group space offerings, we operate this space in a condominium-style model, reselling the space in TB chunks.
One of the key features of the archive system is that users manage the archive directly. Users can move data in and out of the archive storage as needed: they can archive milestone moments in their research, store an additional copy of crucial instrument data, and retrieve data as needed. This space is a standalone entity and is not mounted on other CHPC resources.
Pando is available as an endpoint on Globus, which allows for data transfer from other CHPC resources or local sources (see the Data Transfer page for more information). Ceph presents the storage as an S3 endpoint, which allows access via applications that use Amazon's S3 API. GUI tools such as Cyberduck or Transmit (for Mac) as well as command-line tools such as s3cmd and rclone can be used to move the data.
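As an illustration, once an S3-capable tool such as rclone has been configured with your Pando access and secret keys (the remote name pando and bucket name mybucket below are placeholders), data can be moved with commands such as:

    # Copy a directory into a bucket on the archive, then list it to verify the transfer
    rclone copy ~/project/dataset pando:mybucket/dataset
    rclone ls pando:mybucket/dataset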
Pando is currently the backend storage used for CHPC-provided automatic backups (e.g., home or group spaces that are backed up). As such, groups looking for additional data resiliency that already have CHPC-provided backups should look for other options. See User Driven Backup Options below.
It should also be noted that this archive storage space is for use in the General Environment, and is not for use with regulated data; there is a separate archive space in the Protected Environment.
User-Driven Backup Options
Campus level options for a backup location include Box and Microsoft OneDrive. Note: There is a UIT Knowledge Base article with information on the suitability of the campus level options for different types of data (public/sensitive/restricted). Please follow these university guidelines to determine a suitable location for your data.
Owner backup to University of Utah Box: This is an option suitable for sensitive/restricted data. See the link to get more information about the limitations. If using rclone, the credentials expire and have to be reset periodically.
Owner backup to University of Utah Microsoft OneDrive: As with Box, this option is suitable for sensitive/restricted data. See the link above to get more information about the limitations.
Owner backup to CHPC archive storage (Pando in the General Environment and Elm in the Protected Environment): This choice, mentioned in the archive storage section above, requires that the group purchase the required space on CHPC's archive storage options.
Owner backup to other storage external to CHPC: Some groups have access to other storage resources, external to the CHPC, whether at the University of Utah or at other sites. The tools that can be used for doing this are dependent on the nature of the target storage.
There are a number of tools, mentioned on our Data Transfer Services page, that can be used to transfer data for backup. The tool best suited for transfers to object storage file systems is rclone. Other tools include fpsync, a parallel version of rsync suited for transfers between typical Linux "POSIX-like" file systems, and Globus, best suited for transfers to and from resources outside of the CHPC.
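For instance, a simple user-driven backup with rclone might mirror a directory to a previously configured remote (the remote name mybackup and the paths are placeholders); a dry run is a good way to check what would change first:

    # Preview the transfer first, then mirror the directory to the backup remote
    rclone sync --dry-run ~/project mybackup:chpc-backup/project
    rclone sync --progress ~/project mybackup:chpc-backup/project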
If you are considering a user-driven backup option for your data, CHPC staff are available for consultation at helpdesk@chpc.utah.edu.
Mounting CHPC Storage
For making direct mounts of home and group space on your local machine, see the instructions provided on our Data Transfer Services page.