Backup Considerations and Tips
This page covers a few considerations and tips to help users manage backups of their data. For details on how to use a specific tool, see the page for that tool.
- Determine who should do the backup, keeping in mind the ownership of the backup.
- Also keep in mind that the user making the backup must have access to the data, so make sure all group members set their Linux file permissions appropriately.
- Determine what data needs to be backed up.
- Do you need the full content of your group space backed up?
- Is there only a subset of the data that needs to be backed up?
- If you determine that only a subset of the data needs to be backed up, is it possible to group this data together into a directory (or a few directories) that can be backed up?
- How much data are you backing up – and how long will it take?
- What is the frequency of backup needed?
- Is the data to be backed up fairly static? If so, making a point in time copy may be sufficient. You should keep a minimum of two copies (two “buckets”) in the archive space, keeping the last successful backup intact until the current backup has been completed and verified.
- If the data changes on a daily basis, running full backups periodically and using sync to add the incremental differences in between would be a good choice.
- This goes hand in hand with the amount being backed up – you need to make sure one backup of a dataset finishes before the next starts
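
The relationship between data volume, transfer rate, and backup frequency can be checked with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the dataset size and throughput in the example are made-up numbers, not measured CHPC values. Measure your own sustained transfer rate before relying on an estimate like this.

```python
def transfer_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate the hours needed to move size_gb at a sustained rate in MB/s."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = (size_gb * 1000) / throughput_mb_per_s  # using 1 GB = 1000 MB
    return seconds / 3600


def fits_in_interval(size_gb: float, throughput_mb_per_s: float,
                     interval_hours: float) -> bool:
    """True if one backup should finish before the next one is due to start."""
    return transfer_hours(size_gb, throughput_mb_per_s) < interval_hours


if __name__ == "__main__":
    # Hypothetical example: a 5000 GB dataset backed up daily.
    for rate in (100, 50):  # assumed sustained rates in MB/s
        hours = transfer_hours(5000, rate)
        ok = fits_in_interval(5000, rate, 24)
        print(f"{rate} MB/s: {hours:.1f} h, fits in 24 h: {ok}")
```

If the estimate does not fit in the interval, the options raised in this list apply: back up a smaller subset, back up less often, or transfer only the differences.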
- In order to minimize the archive space needed, tar your files (and zip) when appropriate, keeping in mind potential loss of timestamps if you do so.
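
As a minimal sketch of bundling a directory before archiving it, the example below uses Python's standard `tarfile` module to write a gzip-compressed tar file and then reads back the modification time recorded for each member. The directory and file names are hypothetical. The sketch only shows what is stored inside the archive; it does not address how timestamps are handled once the archive itself is uploaded to the archive space.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def make_archive(source_dir: Path, archive_path: Path) -> None:
    """Write source_dir into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name)


def member_mtimes(archive_path: Path) -> dict:
    """Map each archived file name to the modification time stored for it."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return {m.name: int(m.mtime) for m in tar.getmembers() if m.isfile()}


def demo() -> dict:
    """Archive a throwaway directory and return the recorded mtimes."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "results"          # hypothetical directory name
        data.mkdir()
        sample = data / "run1.txt"            # hypothetical file name
        sample.write_text("example data\n")
        os.utime(sample, (1_600_000_000, 1_600_000_000))  # fixed timestamp
        archive = Path(tmp) / "results.tar.gz"
        make_archive(data, archive)
        return member_mtimes(archive)


if __name__ == "__main__":
    print(demo())
```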
- Sync vs copy
- When the destination location does not already exist, copy and sync are identical in terms of content (but with copy the creation date is not preserved)
- When the destination already exists with content, they are different – and sync is a faster option
- Sync will make changes to make the destination identical to the source, including deleting files/folders that no longer exist at the source
- Copy will only copy over files, but will not delete files at the destination if they no longer exist at the source
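
The difference between the two modes can be illustrated on local directories. The sketch below is not how rclone is implemented; it is a small stand-in, with hypothetical function names, that reproduces the behavior described above: copy adds or overwrites files and leaves extras at the destination alone, while sync also deletes destination files that no longer exist at the source.

```python
import shutil
import tempfile
from pathlib import Path


def relative_files(root: Path) -> set:
    """Return every file under root as a path relative to root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def copy_tree(src: Path, dst: Path) -> None:
    """Copy-style transfer: add or overwrite files, never delete at dst."""
    dst.mkdir(parents=True, exist_ok=True)
    for rel in relative_files(src):
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src / rel, target)


def sync_tree(src: Path, dst: Path) -> None:
    """Sync-style transfer: copy, then delete dst files missing from src."""
    copy_tree(src, dst)
    for rel in relative_files(dst) - relative_files(src):
        (dst / rel).unlink()  # empty directories are left behind in this sketch


def demo() -> tuple:
    """Show destination contents after a copy and then after a sync."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "src", Path(tmp) / "dst"
        src.mkdir()
        dst.mkdir()
        (src / "keep.txt").write_text("current\n")
        (dst / "stale.txt").write_text("removed at the source\n")
        copy_tree(src, dst)
        after_copy = sorted(p.as_posix() for p in relative_files(dst))
        sync_tree(src, dst)
        after_sync = sorted(p.as_posix() for p in relative_files(dst))
        return after_copy, after_sync


if __name__ == "__main__":
    print(demo())
```

Because sync deletes at the destination, a mistake in the source path can remove good backup data, which is one reason to keep the previous successful backup intact until the new one is verified.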
- Automate your backups using a script and a cron job
- See templates on the rclone page for both an example rclone sync and a cron job to run it on a regular basis
- Remember to check the logs of these runs to confirm that your data was backed up!
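
Checking the logs can itself be automated. The sketch below scans a log for error lines and combines that with the exit status of the backup command. The log line layout and the sample log text are assumptions made for illustration, and the function names are hypothetical; adjust the pattern to match the logs your own runs produce.

```python
import re

# Assumed layout: "<date> <time> <LEVEL> : <message>". Adjust for your logs.
ERROR_PATTERN = re.compile(r"\bERROR\b")

# Made-up sample log, for illustration only.
SAMPLE_LOG = """\
2024/01/01 02:00:01 INFO  : data/a.dat: Copied (new)
2024/01/01 02:00:05 ERROR : data/b.dat: Failed to copy
"""


def find_errors(log_text: str) -> list:
    """Return the log lines that contain the word ERROR."""
    return [line for line in log_text.splitlines() if ERROR_PATTERN.search(line)]


def backup_succeeded(exit_code: int, log_text: str) -> bool:
    """Treat a run as successful only if it exited 0 and logged no errors."""
    return exit_code == 0 and not find_errors(log_text)


if __name__ == "__main__":
    for line in find_errors(SAMPLE_LOG):
        print("needs attention:", line)
    print("backup ok:", backup_succeeded(0, SAMPLE_LOG))
```

A check like this could run at the end of the backup script and send a notification when it fails, so that a broken cron job does not go unnoticed.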
- Backups should be performed from the Data Transfer Nodes (DTNs), not from other CHPC resources.
- Use dtn03.chpc.utah.edu for transfers to/from pando from other CHPC resources
- Between campus and off-campus, use the science DMZ nodes (dtn01, dtn04, airplane01, airplane02, or airplane04)-dmz.chpc.utah.edu, which bypass the campus firewall
- Note that UBox is an off-campus resource.