Backup Considerations and Tips 

This page covers a few considerations and tips to help users manage backups of their data. For details on how to use a specific backup tool, refer to the page for that tool.

 

  • Determine who should do the backup, keeping in mind the ownership of the backup.
    • Also keep in mind that the user making the backup must have access to the data, so make sure all group members set their Linux file permissions appropriately.
  • Determine what data needs to be backed up.
    • Do you need the full content of your group space backed up?
    • Is there only a subset of the data that needs to be backed up?
      • If you determine that only a subset of the data needs to be backed up, is it possible to group this data together into a directory (or a few directories) that can be backed up?
    • How much data are you backing up – and how long will it take?
    • What is the frequency of backup needed?
      • Is the data to be backed up fairly static? If so, making a point in time copy may be sufficient. You should keep a minimum of two copies (two “buckets”) in the archive space, keeping the last successful backup intact until the current backup has been completed and verified.
      • If the data changes on a daily basis, running full backups periodically and using sync to add the incremental differences in between would be a good choice.
      • This goes hand in hand with the amount being backed up: you need to make sure one backup of a dataset finishes before the next starts.
    • To minimize the archive space needed, tar (and zip) your files when appropriate, keeping in mind the potential loss of timestamps if you do so.
    • Sync vs copy
      • When the destination location does not already exist, copy and sync are identical in terms of content (but with copy the creation date is not preserved).
      • When the destination already exists with content, they behave differently, and sync is the faster option:
        • Sync will make changes to make the destination identical to the source, including deleting files/folders that no longer exist at the source
        • Copy will only copy over files, but will not delete files at the destination if they no longer exist at the source
      • Automate your backups using a script and a cron job
        • See templates on the rclone page for both an example rclone sync and a cron job to run it on a regular basis
        • Remember to check the logs of these runs to confirm that your data was backed up!
      • Backups should be performed from the Data Transfer Nodes (DTNs), not from other CHPC resources.
        • Use dtn03.chpc.utah.edu for transfers to/from pando from other CHPC resources
        • For transfers between campus and off-campus locations, use the Science DMZ nodes, (dtn01, dtn04, airplane01, airplane02, or airplane04)-dmz.chpc.utah.edu, which bypass the campus firewall
          • Note that UBox is an off campus resource.
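
The difference between copy and sync described above can be illustrated with a small, self-contained Python sketch. It does not use rclone; it only mimics the two behaviors on temporary local directories with the standard library. The function names `copy_tree` and `sync_tree` and the file names are hypothetical, chosen for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def copy_tree(src: Path, dst: Path) -> None:
    """Copy every file from src into dst; never delete anything at dst."""
    for path in src.rglob("*"):
        target = dst / path.relative_to(src)
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)


def sync_tree(src: Path, dst: Path) -> None:
    """Make dst identical to src: copy files over, then delete extras."""
    copy_tree(src, dst)
    # Visit the deepest paths first so files go before their parent directories.
    for path in sorted(dst.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        if not (src / path.relative_to(dst)).exists():
            if path.is_dir():
                path.rmdir()
            else:
                path.unlink()


def demo() -> tuple[list[str], list[str]]:
    """Return the names left at the destination after copy and after sync."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src, dst_copy, dst_sync = root / "src", root / "dst_copy", root / "dst_sync"
        for d in (src, dst_copy, dst_sync):
            d.mkdir()
        (src / "keep.txt").write_text("current data")
        # stale.txt stands for a file removed at the source after an earlier backup.
        (dst_copy / "stale.txt").write_text("old data")
        (dst_sync / "stale.txt").write_text("old data")

        copy_tree(src, dst_copy)
        sync_tree(src, dst_sync)
        return (
            sorted(p.name for p in dst_copy.iterdir()),
            sorted(p.name for p in dst_sync.iterdir()),
        )


if __name__ == "__main__":
    after_copy, after_sync = demo()
    print("after copy:", after_copy)  # ['keep.txt', 'stale.txt']
    print("after sync:", after_sync)  # ['keep.txt']
```

The stale file survives the copy but is removed by the sync. This is why a sync should only be pointed at a destination whose extra content you are willing to lose.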
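
The earlier point about file permissions can be checked before a backup run with a sketch along the following lines. It assumes that the person making the backup reads the data through group permissions, and it only inspects the group permission bits; it does not check which group owns each file, nor any ACLs. The function name `not_group_readable` and the directory and file names are hypothetical.

```python
import stat
import tempfile
from pathlib import Path


def not_group_readable(root: Path) -> list[Path]:
    """Return paths under root that lack group read (directories also need group execute)."""
    problems = []
    for path in [root, *root.rglob("*")]:
        mode = path.lstat().st_mode
        if stat.S_ISLNK(mode):
            continue  # the permission bits of a symlink itself are not used on Linux
        needed = stat.S_IRGRP
        if stat.S_ISDIR(mode):
            needed |= stat.S_IXGRP  # group members must be able to enter the directory
        if (mode & needed) != needed:
            problems.append(path)
    return problems


def demo() -> list[str]:
    """Build a tiny example tree and report which entries the group cannot read."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "group_space"
        root.mkdir()
        root.chmod(0o750)
        shared = root / "shared.dat"
        shared.write_text("x")
        shared.chmod(0o640)
        private = root / "private.dat"
        private.write_text("x")
        private.chmod(0o600)
        return [p.name for p in not_group_readable(root)]


if __name__ == "__main__":
    print(demo())  # ['private.dat']
```

Running a check like this against the directories to be backed up, and asking the owners of any reported files to adjust their permissions, helps avoid a backup that silently skips unreadable data.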
Last Updated: 7/5/23