Linux – filesystem that keeps only one copy of a file, and other copies are just references


The question may be imprecise, so I will try to explain it in more detail.

For a number of reasons, I have lots of copies of the same file on my Linux file-system. Many of them are quite large.

Say I have /path/to/some.file and two copies of it, /other/path/file.name and /yet/another/path/third.copy. I wonder whether there is a file-system that would literally make these two copies act as references to the original. Naturally, if a user modifies one of them, then and only then should it become an independent file.

PS. I know this can be (partially) accomplished with links, but I want the behavior described above to be handled transparently by the file-system.

Best Answer

This feature is called deduplication. None of the popular Linux filesystems (the ext* family) supports it, but ZFS apparently supports it partially. There is also a table of filesystems that lists deduplication among other features, but none of the popular choices appear to support it, although it is a planned feature for Btrfs.
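
To make the idea concrete, here is a toy in-memory model in Python. It is purely illustrative: `DedupStore` and its methods are hypothetical names, not the API of ZFS, Btrfs or any other filesystem. It deduplicates whole files by SHA-256 hash (real implementations such as ZFS typically work on blocks instead) and gives the copy-on-write behavior the question asks for: copies are only references until one of them is written.

```python
import hashlib


class DedupStore:
    """Toy in-memory model of deduplication with copy-on-write semantics.

    Hypothetical class for illustration only, not a real filesystem: every
    distinct content is stored once, each path only refers to it, and writing
    through one path leaves the other paths intact.
    """

    def __init__(self):
        self.blobs = {}     # content hash -> the single stored copy
        self.refcount = {}  # content hash -> number of paths referring to it
        self.paths = {}     # path -> content hash

    def _point(self, path, digest):
        """Make `path` refer to `digest`, dropping its previous reference."""
        self.refcount[digest] += 1
        old = self.paths.get(path)
        self.paths[path] = digest
        if old is not None:
            self.refcount[old] -= 1
            if self.refcount[old] == 0:  # last reference gone: free the data
                del self.blobs[old]
                del self.refcount[old]

    def write(self, path, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:  # new content: store one physical copy
            self.blobs[digest] = data
            self.refcount[digest] = 0
        self._point(path, digest)

    def copy(self, src, dst):
        # A "copy" only adds a reference; no data is duplicated.
        self._point(dst, self.paths[src])

    def read(self, path):
        return self.blobs[self.paths[path]]

    def physical_bytes(self):
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
store.write("/path/to/some.file", b"x" * 1000)
store.copy("/path/to/some.file", "/other/path/file.name")
store.copy("/path/to/some.file", "/yet/another/path/third.copy")
print(store.physical_bytes())  # 1000: three names, one copy
store.write("/other/path/file.name", b"changed")
print(store.physical_bytes())  # 1007: only the modified file got its own data
print(store.read("/path/to/some.file") == b"x" * 1000)  # True
```

Writing identical content back into a modified file makes it share storage again, since the store looks content up by hash on every write.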

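I would guess that periodically scanning your filesystem for identical files and replacing the copies with hard links is the best you can do at the moment. Note, however, that hard links do not give you copy-on-write: all names point to the same inode, so an in-place edit through one name is visible through all of them.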
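
The periodic hard-linking approach could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a production tool: `hardlink_duplicates` is a hypothetical helper, two files are treated as identical when their size and SHA-256 hash match, and only non-empty regular files on the same filesystem are linked, because hard links cannot cross filesystems. Each replaced copy also loses its own permissions, ownership and timestamps, since afterwards every name shares the kept file's inode.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root, dry_run=False):
    """Hypothetical helper: replace identical regular files under `root`
    with hard links to one copy.

    Returns how many files were replaced (or would be, if dry_run is True).
    """
    # Only files of equal size on the same device can be identical and
    # hard-linked to each other, so group by (device, size) first.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_size > 0:
                by_size[(st.st_dev, st.st_size)].append((path, st.st_ino))

    replaced = 0
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue
        # Hash only files that share a size; SHA-256 collisions are
        # treated as impossible in this sketch.
        by_hash = defaultdict(list)
        for path, ino in candidates:
            by_hash[file_digest(path)].append((path, ino))
        for group in by_hash.values():
            keep_path, keep_ino = group[0]
            for path, ino in group[1:]:
                if ino == keep_ino:
                    continue  # already a hard link to the kept copy
                if not dry_run:
                    tmp = path + ".dedup-tmp"  # assumed-unused temporary name
                    os.link(keep_path, tmp)    # second name for the kept inode
                    os.replace(tmp, path)      # atomically swap it over the copy
                replaced += 1
    return replaced
```

Calling it first with `dry_run=True`, e.g. `hardlink_duplicates("/some/dir", dry_run=True)` on a hypothetical directory, reports how many files would be replaced without touching anything; a scheduled job could then run it for real. Linking to a temporary name and renaming it over the copy means the duplicate's path never disappears, even briefly.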
