I have a folder that contains the files of my current and previous projects, which I plan to back up using versioned rsync. For a more robust backup strategy I also want to store a monthly snapshot offsite (e.g. Amazon Glacier).
To save space and bandwidth I want to compress the backup before sending it offsite. However, since only a small fraction of the files change from month to month, sending the whole compressed archive each backup would also be a huge waste of bandwidth.
Ideally, I would like to compress the backup into volumes of 500 MB (or some other size) and upload them to my offsite storage. The next time I back up, most of these volumes should be identical to those of the previous backup, except for the ones containing files that have changed since then. In that scenario I only need to upload the changed volumes, saving bandwidth (and file write requests).
Is it possible to do what I describe using a combination of tar and gzip (maybe split?), or with other command-line tools?
One issue I can foresee: if a file contained in some volume changes, the content of all subsequent volumes may be offset, requiring a re-upload of the changed volume and everything after it. Perhaps it's better to segment the volumes by folder somehow?
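For reference, a minimal sketch of the volume approach described above, using tar, gzip, and split (all paths and the tiny 1k volume size are placeholders for demonstration; a real backup would use something like -b 500M). Note that gzip -n omits the timestamp from the header, which matters here: without it, even unchanged input produces different bytes on each run, and no volume would ever match the previous upload.

```shell
set -e
mkdir -p /tmp/voldemo/src /tmp/voldemo/vols /tmp/voldemo/restore
echo "project data" > /tmp/voldemo/src/notes.txt

# Archive, compress reproducibly (-n drops the gzip timestamp),
# and cut the stream into fixed-size numbered volumes.
tar -cf - -C /tmp/voldemo src | gzip -n \
  | split -b 1k -d - /tmp/voldemo/vols/backup.tgz.

# Restore: concatenate the volumes back into one stream and extract.
cat /tmp/voldemo/vols/backup.tgz.* | tar -xzf - -C /tmp/voldemo/restore
cat /tmp/voldemo/restore/src/notes.txt
```

This reproduces the offset problem from the question exactly: a change early in the tar stream shifts every byte after it, so all later volumes differ too.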
I would love to hear any input or suggestions you have.
Best regards
M
Best Answer
tar can do this with the --listed-incremental flag, so as described I would probably do that. You can use whatever compressors tar supports to compress it (or just pipe it through an arbitrary compressor). See https://www.gnu.org/software/tar/manual/html_section/tar_39.html

I'm not sure what sort of projects these are, but if it's code or some other text-based format, I'd probably look into using git or some other source control system.

I should also point out that this is GNU tar. If you are on a BSD or other Unix, you might need to install gnutar, because I don't think bsdtar supports this.
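A minimal sketch of the --listed-incremental workflow with GNU tar (all paths and filenames here are placeholders): the .snar snapshot file records the state of the tree at the level-0 backup, and subsequent runs against a copy of it produce archives containing only what changed. To restore, extract the full backup and then each incremental in order, passing --listed-incremental=/dev/null as the manual suggests.

```shell
set -e
mkdir -p /tmp/incdemo/data
echo "v1" > /tmp/incdemo/data/a.txt

# Level-0 (full) backup; snapshot.snar records file metadata.
tar --listed-incremental=/tmp/incdemo/snapshot.snar \
    -czf /tmp/incdemo/full.tar.gz -C /tmp/incdemo data

# Change a file, then take a level-1 backup against a copy of the
# snapshot (tar updates the .snar in place, so keep the level-0 copy).
echo "v2" > /tmp/incdemo/data/a.txt
cp /tmp/incdemo/snapshot.snar /tmp/incdemo/snapshot.1.snar
tar --listed-incremental=/tmp/incdemo/snapshot.1.snar \
    -czf /tmp/incdemo/incr1.tar.gz -C /tmp/incdemo data

# Restore: full backup first, then incrementals in order.
mkdir -p /tmp/incdemo/restore
tar --listed-incremental=/dev/null \
    -xzf /tmp/incdemo/full.tar.gz -C /tmp/incdemo/restore
tar --listed-incremental=/dev/null \
    -xzf /tmp/incdemo/incr1.tar.gz -C /tmp/incdemo/restore
cat /tmp/incdemo/restore/data/a.txt
```

Only the incremental archives need to be shipped offsite each month, which addresses the bandwidth concern without worrying about volume offsets at all.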