Is it possible to merge identical files across different snapshots in a btrfs file system, and if so, how?

backup btrfs compression merge snapshot

I regularly use btrfs snapshots to back up the whole system, but keeping identical copies of the same files in different snapshots wastes space. For example, I took a snapshot of @ as @_without_install_nvidia_driver and then installed the NVIDIA driver. A few days later I updated the system, which changed a large number of files. A few days after that, I found that the nvidia_driver package was unstable, so I reverted to @_without_install_nvidia_driver. I then had to update the system again, and that is when I realized both snapshots now contained many identical files, because the same updates had been applied to each of them separately.

That got me wondering: Is there a way to merge identical files between different snapshots?

Best Answer

You can use the bedup utility to de-duplicate the identical files. Once you've installed it, usage is fairly simple:

# bedup dedup /path/to/btrfs

You may need to make your snapshots writable (btrfs property set -ts /path/to/snapshot ro false) so that bedup can deduplicate them. You can make them read-only again afterwards by setting ro back to true.
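With several snapshots it is easy to forget to set one back to read-only, especially if bedup fails part-way through. The following is a minimal Python sketch of that make-writable, dedup, restore sequence, assuming the btrfs and bedup commands shown above are on PATH, that it runs as root, and that every listed snapshot started out read-only (it does not check, and sets them all to ro true on exit). The helper names writable_snapshots and dedup_snapshots are hypothetical, and the run parameter exists only so the command sequence can be inspected without touching a real filesystem.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

# A runner takes an argv list and raises if the command fails.
Runner = Callable[[list[str]], None]


def _run(argv: list[str]) -> None:
    """Run a command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(argv, check=True)


@contextmanager
def writable_snapshots(snapshots: Sequence[str], run: Runner = _run) -> Iterator[None]:
    """Temporarily clear the read-only flag on each snapshot (hypothetical helper)."""
    made_writable: list[str] = []
    try:
        for snap in snapshots:
            run(["btrfs", "property", "set", "-ts", snap, "ro", "false"])
            made_writable.append(snap)
        yield
    finally:
        # Restore read-only even if bedup (or a later 'ro false') failed.
        for snap in reversed(made_writable):
            run(["btrfs", "property", "set", "-ts", snap, "ro", "true"])


def dedup_snapshots(fs_root: str, snapshots: Sequence[str], run: Runner = _run) -> None:
    """Run 'bedup dedup' on fs_root while the given snapshots are writable."""
    with writable_snapshots(snapshots, run):
        run(["bedup", "dedup", fs_root])
```

For the question's setup this might be called as dedup_snapshots("/path/to/btrfs", ["/path/to/btrfs/@_without_install_nvidia_driver"]), with the paths adjusted to wherever the filesystem and snapshots are actually mounted. The try/finally is the design point: the read-only flags are restored whether or not the dedup run succeeds.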

Note that, depending on how many files you have, this can take a while: bedup first looks for files of the same size and then compares the contents of those files, so a large number of big files that happen to share the same size can slow it down considerably.
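The two-stage search described above can be sketched in Python as follows. This only illustrates the idea and is not bedup's code: it finds groups of identical files under a directory but does not merge anything, and it compares contents via SHA-256 digests rather than byte-by-byte, so a real tool should verify a match before sharing data. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def _digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file's full contents, reading it in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(root: str) -> list[list[str]]:
    """Return sorted groups of paths under root whose contents are identical."""
    # Stage 1: bucket regular files by size; this only needs stat().
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Stage 2: read contents only for files that share a size.
    groups: list[list[str]] = []
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # unique size, or empty files: nothing worth merging
        by_hash: dict[str, list[str]] = defaultdict(list)
        for path in paths:
            by_hash[_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Stage 1 is cheap, and stage 2 is where the time goes: every file in a size bucket with two or more members has to be read in full, which is why many large files of equal size make the scan slow.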

Finally, you can rerun it from time to time; later runs are much quicker because bedup keeps track of the btrfs generation number and uses it to skip files that have not changed since the previous run.
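The incremental idea can be illustrated with a small sketch. How bedup actually reads generation numbers from btrfs is not shown here: FileRecord and files_to_rescan are hypothetical, and the generation values are assumed to come from some btrfs-aware source. The point is that the scanner only needs to remember one number, the highest generation it has seen, and on the next run it looks only at files with a newer generation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FileRecord:
    path: str
    # Transaction/generation that last modified the file (assumed input).
    generation: int


def files_to_rescan(records: Sequence[FileRecord], last_generation: int) -> tuple[list[str], int]:
    """Return files changed after last_generation and the new high-water mark."""
    changed = [r.path for r in records if r.generation > last_generation]
    # The mark never moves backwards, even if no records are supplied.
    new_mark = max([last_generation, *(r.generation for r in records)])
    return changed, new_mark
```

A first run passes 0 and gets every file back; subsequent runs pass the stored mark and get only what changed. A filesystem-assigned generation suits this better than a file's mtime, because userspace cannot set it backwards, whereas tools that restore archived timestamps can.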
