CHPC provides several options for home directory file systems.
General HPC home directories
The General HPC home directory storage system is the default home directory file system that is available to groups free of charge. If your group has not purchased storage or you do not fit into one of the other categories listed below, then this is the space where your CHPC home directory will be provisioned. This file system has a 50 GB per user quota which is enforced. CHPC can provide temporary increases to this space. THIS SPACE IS NOT BACKED UP. It is assumed important data will be copied to a departmental file server or another location.
Owner home directories - New solution as of August 2017
CHPC currently allows CHPC PIs with sponsored research projects to buy in to storage at a price determined based on cost recovery. The current limit for this space is 1 TB/group, and all members of the research group will have home directories on this space. The hardware for the current home directory solution was purchased in the summer of 2017. Initially, it was sold at a price of $1250/TB for the 5-year warranty period of the hardware. The current price is prorated to $1000/TB for the remaining warranty lifetime.
This solution is based on an offering from Dell known as their Compellent solution. In this solution there are two RAID disk-based copies, one of which is the primary storage with which users will normally interact. The second copy is used as a failover, and is effectively a replicated copy of the primary side. In case of a hardware issue with the primary copy, the failover will become the active working copy until repairs can be made. For increased performance, solid state drives are used in a transparent manner as a first tier for I/O to this space, in front of the larger-capacity traditional spinning drives. Note that these features, including the failover copy, will be present for all home directories, including the default 50 GB ones provided to users whose groups do not purchase the larger home directory space.
Note that a redundant copy is not a backup solution: any change on the primary side, such as deleting or overwriting a file, is synced to the secondary side. THIS SPACE IS BACKED UP. The price of this solution also includes a backup, with nightly incremental and weekly full backups and a two-week retention window.
We will continue to prorate the cost of this storage based on the remaining warranty time. When it is time to refresh the hardware, CHPC will contact all groups who have purchased space about the new pricing policy. Please contact us by emailing email@example.com and request to meet with us to discuss your needs and timing.
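The proration described in this section can be illustrated with a small calculation. This is only a sketch that assumes the price scales linearly with the remaining warranty time; the function name and the linear rule are assumptions for illustration, not CHPC's published formula. Under that assumption, four remaining years of a five-year warranty at an initial $1250/TB gives $1000/TB.

```python
def prorated_price_per_tb(full_price, warranty_years, years_remaining):
    """Return the prorated price per TB.

    Assumption: linear proration over the warranty period. The actual
    pricing policy is set by CHPC and may differ.
    """
    if warranty_years <= 0:
        raise ValueError("warranty_years must be positive")
    if not 0 <= years_remaining <= warranty_years:
        raise ValueError("years_remaining must lie within the warranty period")
    return full_price * years_remaining / warranty_years


if __name__ == "__main__":
    # Initial price and warranty length are taken from the text above;
    # the number of remaining years is an illustrative input.
    print(prorated_price_per_tb(1250, 5, 4))
```

For an actual quote, contact CHPC rather than relying on this estimate.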
Group Level Storage File Systems
CHPC currently allows CHPC PIs with sponsored research projects to buy-in to file storage at a price determined based on cost recovery. A more detailed description of this storage offering is available. The current pricing is $150/TB for the lifetime of the hardware which is purchased with a 5 year warranty. CHPC purchases the hardware for this storage in bulk and then sells it to individual groups in TB quantities, so depending on the amount of group storage space you are interested in purchasing, CHPC may have the storage to meet your needs on hand. Please contact us by emailing firstname.lastname@example.org and request to meet with us to discuss your needs and timing. BY DEFAULT THIS SPACE IS NOT BACKED UP, HOWEVER CHPC PROVIDES A BACK UP OPTION.
NOTE: March 2019. We are no longer offering backup of NEW group spaces to tape. For groups who have already purchased tapes, we will continue to provide backups of their group spaces until the group space is retired. Details of the new options for backup of group spaces are given in CHPC's Spring 2019 Newsletter as well as in the Backup section below.
New archive backups of group level storage will be to the Archive Storage discussed below. CHPC will perform the backups on a quarterly basis, provided the group purchases enough space on pando to allow for two copies of the data. Contact us at email@example.com to set up any group space backup. CHPC also provides information on a number of user driven alternatives to this service; see the section on User Driven Backup Options below.
Scratch File Systems
There are various scratch file systems which are available on the HPC clusters. THE SCRATCH FILE SYSTEMS ARE NOT BACKED UP. This space is provided for users to store intermediate files required for the duration of a job on one of the HPC clusters. On these scratch file systems, files that have not been accessed for 60 days are automatically scrubbed. There is no charge for this service.
The current scratch file systems are:
- /scratch/general/lustre - a 700 TB Lustre parallel file system accessible from all CHPC resources
- /scratch/kingspeak/serial - a 175 TB NFS system accessible from all CHPC resources
- /scratch/general/nfs1 - a 595 TB NFS system accessible from all CHPC resources
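Because files that have not been accessed for 60 days are scrubbed, it can help to check which files are approaching that age. The following is a minimal sketch, assuming the policy is evaluated against each file's last access time (atime); the function name is hypothetical and the real scrub process may use different criteria.

```python
import os
import time
from pathlib import Path

SCRUB_AGE_DAYS = 60  # from the scratch policy described above


def files_not_accessed_since(root, max_age_days=SCRUB_AGE_DAYS, now=None):
    """Return regular files under root whose atime is older than max_age_days.

    Assumption: "accessed" means the file's atime as reported by the
    file system. Reading atime via lstat() does not itself update it.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.lstat().st_atime < cutoff:
                    stale.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(stale)


if __name__ == "__main__":
    import sys

    # Pass your own scratch directory as the first argument.
    for path in files_not_accessed_since(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

Passing a smaller `max_age_days` (for example 45) lists files before they reach the scrub threshold, leaving time to copy them elsewhere.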
Linux defines a temporary file system at /var/tmp, where temporary user and system files are stored. CHPC cluster nodes set up temporary file systems as a RAM disk with limited capacity. All interactive and compute nodes also have local spinning-disk storage at /scratch/local. If a user program is known to need temporary storage, it is advantageous to set the environment variable TMPDIR, which defines the location of the temporary storage, and point it to /scratch/local. Local disk drives range from 40 to 500 GB depending on the node, which is much more than the default temporary file system provides.
/scratch/local can also be used for storing intermediate files during a calculation; however, be aware that getting to these files after the job finishes will be difficult, since they are local to the (compute) node and not directly accessible from cluster interactive nodes.
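As an illustration of pointing TMPDIR at the local scratch disk, here is a minimal Python sketch. The helper name `use_local_scratch` is hypothetical; the only path taken from the text is /scratch/local. Python's `tempfile` module honors TMPDIR, so temporary files created after this call land on the local disk when it is available.

```python
import os
import tempfile


def use_local_scratch(candidates=("/scratch/local",)):
    """Set TMPDIR to the first existing, writable directory in candidates.

    Returns the chosen path, or None if no candidate is usable, in which
    case TMPDIR is left unchanged.
    """
    for path in candidates:
        if os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK):
            os.environ["TMPDIR"] = path
            # tempfile caches its directory; reset so it re-reads TMPDIR.
            tempfile.tempdir = None
            return path
    return None


if __name__ == "__main__":
    chosen = use_local_scratch()
    if chosen is None:
        print("local scratch not available; keeping the default location")
    print("temporary files go to:", tempfile.gettempdir())
```

Child processes started afterwards inherit the updated TMPDIR, so the setting also applies to external programs launched from the script.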
Archive Storage
CHPC now has a new archive storage solution based around object storage, specifically Ceph, a distributed object store suite developed at UC Santa Cruz. We are offering a 6+3 erasure coding configuration, which results in a price of $140/TB of usable capacity for the 5-year lifetime of the hardware. As we currently do with our group space, we will operate this space in a condominium model by reselling this space in TB chunks.
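To make the 6+3 erasure coding figure concrete: in a k+m scheme, each object is split into k data chunks plus m coding chunks, so k/(k+m) of the raw capacity is usable and up to m chunks can be lost without losing data. The short sketch below computes these ratios; the function name is an illustrative choice, and the calculation ignores any additional file system or metadata overhead.

```python
from fractions import Fraction


def erasure_coding_overhead(k, m):
    """Return (usable_fraction, raw_per_usable) for a k+m erasure code.

    k: number of data chunks, m: number of coding (parity) chunks.
    """
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m non-negative")
    usable_fraction = Fraction(k, k + m)
    raw_per_usable = Fraction(k + m, k)
    return usable_fraction, raw_per_usable


if __name__ == "__main__":
    usable, raw = erasure_coding_overhead(6, 3)
    print(f"usable fraction: {usable}, raw TB per usable TB: {float(raw)}")
```

For 6+3 this means two thirds of the raw disk capacity is usable, or 1.5 TB of raw disk for every 1 TB sold.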
This space is a stand-alone entity and will not be mounted on other CHPC resources. One of the key features of the archive system is that users manage the archive directly, unlike the tape archive option. Users can move data in and out of the archive storage as needed: they can archive milestone moments in their research, store an additional copy of crucial instrument data, and retrieve data as needed. This archive storage solution will be accessible via applications that use Amazon’s S3 API. GUI tools such as Transmit (for Mac) as well as command-line tools such as s3cmd and rclone can be used to move the data. In addition, Globus can be used to access this space; however, note that the Globus Ceph plugin is a new tool that is still being developed and should be treated as such.
It should also be noted that this archive storage space is for use in the general environment, and is not for use with regulated data; there is a separate archive space in the protected environment.
Backup
The backup policy of each individual file system is described in its section above.
Note: March 2019. CHPC is migrating the backup of group home directories from tape to the disk-based archive storage mentioned above. At the same time, we started phasing out the backup of group spaces to tape by moving the CHPC-managed quarterly archives of any newly purchased spaces to the archive storage.
For additional information on user driven backup options see the next section.
User Driven Backup Options
Owner backup to Google Drive: There is a University agreement with Google that provides for unlimited storage, and this is an option that a number of CHPC users already use for backup, using rclone. Please keep in mind that Google Drive is only suitable for public data; it is NOT suitable for sensitive or restricted data. Details can be found on the University’s Google Drive page and CHPC's rclone page. One other consideration is that the Google Drive storage is owned by an individual, not by a group.
Owner backup to Box: This is an option suitable for sensitive/restricted data. However, there is a file size limitation of 15 GB. In addition, if using rclone, the credentials expire and have to be reset periodically.
Owner backup to pando: This choice, mentioned in the Archive storage section above, is a good option if a group wishes not to use Google Drive, especially if only a subset of the data needs to be backed up or if a different backup frequency is desired.
Owner backup to other storage external to CHPC: Some groups have access to other storage resources, external to CHPC, whether at the University or at other sites. The tools that can be used for doing this are dependent on the nature of the target storage.
There are a number of tools, mentioned on our Data Transfer Services page, that can be used. In several places above we mentioned rclone, which is the tool best suited for transfers to object storage systems; others are fpsync, a parallel version of rsync suited for transfers between typical Linux "POSIX-like" file systems, and Globus, best suited for transfers to and from resources outside of CHPC.
In addition we have a page that presents a number of considerations and tips for user driven backups.
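Since Box has a per-file size limit, it can be useful to find oversized files before starting a backup to it. The sketch below walks a directory tree and reports files larger than a given limit. The function name is hypothetical, and treating 15 GB as 15 x 10^9 bytes is an assumption; check Box's own documentation for the exact threshold.

```python
import os
from pathlib import Path

# Assumption: the 15 GB limit is interpreted as decimal gigabytes.
BOX_LIMIT_BYTES = 15 * 10**9


def files_over_limit(root, limit_bytes=BOX_LIMIT_BYTES):
    """Return a sorted list of (path, size_in_bytes) for files above limit_bytes."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                size = path.stat().st_size
            except OSError:
                continue  # broken symlink or unreadable file; skip it
            if size > limit_bytes:
                oversized.append((path, size))
    return sorted(oversized)


if __name__ == "__main__":
    import sys

    for path, size in files_over_limit(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{size / 10**9:.1f} GB  {path}")
```

Files reported here would need to be split, compressed, or sent to a different target such as the archive storage.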
Mounting CHPC Storage
For making direct mounts of home and group space on your local machine see the instructions provided on our Data Transfer Services page.
For more information on CHPC Data policies, visit: File Storage Policies