Rsync-like tool that supports compression and maintains hardlinks

backup compression rsync

I use an rsync-based backup scheme to maintain daily (and soon weekly and monthly as well) "snapshots" of my file server, using rsync's --link-dest option to store hardlinks between the snapshots for files that haven't changed. This works very well: with more than 330 GB of data backed up, an entire snapshot takes up only 1.5 MB on disk if no files changed.

Recently a hard drive failure in this server shocked me into realizing how fragile my setup is right now. While I lost no data thanks to these backup snapshots, the backups and the data they back up both live in the same physical box in the office in my apartment; something as simple as a small fire in my apartment building could completely obliterate every last bit (pun intended) of my data!

I have an off-site location that I could take external hard drives to, so I intend to implement a rotating off-site backup: one hard drive would be offline and off-site while a second is plugged in and backing up my snapshots, and I would periodically (e.g. once a week) swap them, so that if the worst does happen I lose at most the last week's worth of data.

Now here's the rub: I want to back up my snapshots, not the data itself, to the external hard drive. (I failed to notice the failed hard drive for a day, so that day's daily snapshot is all but worthless; having additional snapshots, however, is what saved my bacon, and I want that same level of assurance here as well. Note to self: Actually monitor that handy SMART data…) Tried-and-true cp won't work because it will see individual files, not hard links, increasing the storage needed sevenfold! rsync would work, but I'd like the data on these external drives to be compressed so that I can get away with smaller drives. (I'm targeting 1 TB, more than enough for the current 337 GB of data but markedly smaller than the nearly 3 TB capacity of my file server. I don't expect to be able to back up all 3 TB to just a 1 TB drive, even with compression; I just want the 1 TB drive to have the maximum useful lifespan as my data grows.)

So, does anyone know of a method by which I could maintain hard-link associations between the snapshots while also compressing the files I'm backing up? The ideal solution would also support an exclusion list so I could simply skip compressing already-compressed files (.zip, .gz, .mp3, .jpg, etc.).

Just for absolute clarity, I'm looking for a scheme where a hard-link in /backup/snapshots/daily.1/file1 that points to /backup/snapshots/daily.0/file1 would get copied over to /mnt/external_hdd/snapshots/daily.1/file1 as a hard-link that points to /mnt/external_hdd/snapshots/daily.0/file1, the latter of which (the actual file itself) is now compressed (e.g. gzip).
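
To make the scheme above concrete, here is a rough Python sketch of the idea, not an existing tool. It walks the snapshot tree, gzips each inode the first time it is seen, and re-creates every later name for that inode as a hard link to the already-compressed copy. The function name mirror_compressed and the SKIP_COMPRESSION list are made up for illustration. The sketch also assumes an empty destination on a single file system, appends ".gz" to compressed names (unlike the paths in the example above), and ignores symlinks, ownership, and permissions.

```python
import gzip
import os
import shutil
import stat

# Hypothetical exclusion list: extensions assumed to be compressed already.
SKIP_COMPRESSION = {".zip", ".gz", ".mp3", ".jpg"}


def mirror_compressed(src_root, dst_root, skip_ext=SKIP_COMPRESSION):
    """Copy src_root to dst_root, gzipping files and re-creating hard links."""
    seen = {}  # (st_dev, st_ino) of a source file -> path already written
    for dirpath, dirnames, filenames in os.walk(src_root):
        dirnames.sort()  # deterministic order, e.g. daily.0 before daily.1
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            st = os.lstat(src)
            if not stat.S_ISREG(st.st_mode):
                continue  # symlinks and special files are skipped here
            compress = os.path.splitext(name)[1].lower() not in skip_ext
            dst = os.path.join(dst_dir, name + ".gz" if compress else name)
            key = (st.st_dev, st.st_ino)
            if key in seen:
                # Same inode as an earlier file: link instead of copying.
                os.link(seen[key], dst)
                continue
            if compress:
                with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
                    shutil.copyfileobj(fin, fout)
            else:
                shutil.copy2(src, dst)
            seen[key] = dst
    return seen
```

Calling mirror_compressed("/backup/snapshots", "/mnt/external_hdd/snapshots") would then leave daily.1/file1.gz as a hard link to daily.0/file1.gz on the external drive. A real solution would also need to handle incremental runs against a destination that already holds older snapshots.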

Best Answer

See rdup-simple (from rdup). You said you wanted compression, but in case you change your mind, I strongly recommend rsnapshot.

By the way, if two hardlinks point to a file, it's the same file. You can't compress only one of the hardlinks since it's the same underlying data in the file-system.
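
If it helps to see that point in action, here is a small Python illustration (the file names are invented for the demo). It creates a file, adds a second hard link to it, and shows that both names report the same inode and that a change made through one name is visible through the other.

```python
import os
import tempfile


def hardlink_demo():
    """Create a file and a hard link to it, then modify it through one name."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "daily.0_file1")
        linked = os.path.join(tmp, "daily.1_file1")
        with open(original, "w") as f:
            f.write("plain text")
        os.link(original, linked)  # second name for the same inode
        same_inode = os.stat(original).st_ino == os.stat(linked).st_ino
        # Overwrite the data in place through the first name...
        with open(original, "w") as f:
            f.write("changed")
        # ...and read it back through the second name.
        with open(linked) as f:
            seen_via_link = f.read()
        return same_inode, seen_via_link


print(hardlink_demo())  # (True, 'changed')
```

So in a compressed copy of the snapshots, every name that shares an inode ends up pointing at the same compressed data.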
