Mac – Is it safe to thin out Time Machine backups by deduplicating files

time-machine

When I move a large file between folders, I notice that Time Machine takes a long time on the next hourly backup. Apparently it is quite stupid about this: it treats a file move as the deletion of one file plus the addition of a completely new one…

Since my backups are getting quite large, I'd like to shrink them by removing duplicate files across the backup sets. Would it be safe to simply run fdupes or a similar program on the backup sets? If not, would it be reasonable to instead use a custom script that replaces each duplicate it finds with a hard link to the retained copy, roughly like the sketch below?
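
To make the idea concrete, this is roughly the kind of script I have in mind. It is only a naive sketch: the hash choice is arbitrary, and it just hashes every regular file it walks.

    import hashlib
    import os
    import sys

    def file_digest(path, chunk=1 << 20):
        """Hash file contents in chunks so large files don't exhaust memory."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while block := f.read(chunk):
                h.update(block)
        return h.hexdigest()

    def dedupe_to_hardlinks(root):
        """Replace byte-identical regular files under root with hard links."""
        seen = {}  # digest -> first path found with that content
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path) or not os.path.isfile(path):
                    continue  # skip symlinks and special files
                digest = file_digest(path)
                if digest in seen:
                    os.remove(path)              # drop the duplicate...
                    os.link(seen[digest], path)  # ...and relink it to the first copy
                else:
                    seen[digest] = path

    if __name__ == "__main__":
        dedupe_to_hardlinks(sys.argv[1])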

Best Answer

No, it is not safe to run most deduplication tools on a Time Machine backup.

Time Machine makes extensive use of hard links internally: a file that has not changed since the previous snapshot is not copied again, it is hard-linked to the existing copy. Any application that is not aware of this will find hundreds or thousands of false duplicates (which are really hard links to the same file) and, in "removing" them, hose your Time Machine backup. Apple also does some filesystem-specific things in the backup (among them, hard links to directories, which only HFS+ supports), which is why the backup has to live on an HFS+ volume (either an entire drive, or a .sparsebundle image on a network share).
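
You can verify the hard-link structure yourself: two snapshot copies of a file that never changed share one device/inode pair, so they occupy disk space only once, yet a byte-comparison tool sees them as a "duplicate" pair. Here is a minimal check (the paths are placeholders; point it at the same file in two different snapshots):

    import os
    import sys

    def same_inode(path_a, path_b):
        """True if both paths are hard links to the same on-disk file."""
        a, b = os.stat(path_a), os.stat(path_b)
        return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

    if __name__ == "__main__":
        a, b = sys.argv[1], sys.argv[2]
        st = os.stat(a)
        print(f"{a}: inode {st.st_ino}, {st.st_nlink} hard link(s)")
        print("same file on disk" if same_inode(a, b) else "distinct files on disk")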

A lot of the Time Machine mechanism is poorly documented, or not documented at all, and if you break something, Time Machine will simply stop working without any meaningful diagnostics. A hard-link-aware deduplicator might work, but it is just as likely to break something in a way you cannot recover from.

Unfortunately, there do not seem to be any decent tools for managing Time Machine backups.