Deduplication on partition level

deduplication filesystems

What solutions are available for block-level or finer-grained deduplication?

There are file-based ones that use a copy-on-write (CoW) approach.

I'm looking for block-level copy-on-write, so that I could periodically scan for common blocks or, preferably, common parts of files, merge them, and flag them for CoW handling.
Is something like this available, or does it still need to be created?
I am not sure whether Btrfs deduplication works at the block, file, or sub-file level.
There is also LessFS, but I'm not sure what level of deduplication it provides. Is there perhaps another solution?

Best Answer

As far as block-level deduplication goes, I think ZFS is the best implementation currently available. It isn't designed for after-the-fact optimization, because its deduplication (if turned on) is built directly into the read/write functions. Because of this, it can be memory-expensive under load, as it tries to keep the most relevant portions of the deduplication table in memory. ZFS is good at restricting itself to not much more than 50% of memory, although depending on how much memory is installed that limit can seem quite arbitrary (50% of 2 GB versus 50% of 64 GB, especially if few or no user tasks need memory).
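
To make "built directly into the read/write functions" a bit more concrete, here is a toy model of inline block-level deduplication in Python. It is only a sketch and not how ZFS is actually implemented: the DedupStore class, the fixed 128 KiB block size and the choice of SHA-256 as the block checksum are all assumptions made for illustration.

```python
import hashlib

# Assumption: fixed-size 128 KiB blocks; real filesystems may use variable sizes.
BLOCK_SIZE = 128 * 1024


class DedupStore:
    """Toy inline block-level dedup store (illustrative only, not ZFS code)."""

    def __init__(self):
        self.blocks = {}    # checksum -> block bytes, each unique block stored once
        self.refcount = {}  # checksum -> number of references to that block

    def write(self, data):
        """Split data into blocks, store unseen blocks, return the block pointers."""
        pointers = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            key = hashlib.sha256(block).digest()
            if key not in self.blocks:
                # First time this content is seen: allocate it.
                self.blocks[key] = block
                self.refcount[key] = 0
            # Seen before or not, the write only adds a reference.
            self.refcount[key] += 1
            pointers.append(key)
        return pointers

    def read(self, pointers):
        """Reassemble data from a list of block pointers."""
        return b"".join(self.blocks[k] for k in pointers)

    def dedup_ratio(self):
        """Referenced bytes divided by allocated bytes."""
        referenced = sum(len(self.blocks[k]) * n for k, n in self.refcount.items())
        allocated = sum(len(b) for b in self.blocks.values())
        return referenced / allocated if allocated else 1.0
```

The two dictionaries play the role of the deduplication table: there is one entry per unique block, and every write has to consult it, which is why keeping the table in memory matters so much under load.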

Depending on which platform you want to use it on, you have some options:

OpenIndiana, which is derived from OpenSolaris, appears to have some good desktop and server options.

FreeBSD (since 9.0) has a fairly advanced version of ZFS, including deduplication, built in. One notable FreeBSD-derived (and originally m0n0wall-derived) distribution is NAS4Free, which makes setting up a NAS pretty easy.

Linux has a few options, some with dedup and others without. Since you're looking for dedup, the most notable one I've seen is zfsonlinux. I'm not sure how far along or how stable the project is, but it definitely looks promising.

As for partial-block deduplication, I have seen nothing so far that claims to be able to do that.

Related Solutions

Deduplication Scripts Using Btrfs CoW

I wrote bedup for this purpose. It combines incremental btree scanning with CoW deduplication. It is best used with Linux 3.6, where you can run:

sudo bedup dedup
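
For readers who want to see what an after-the-fact scan involves, the following Python sketch finds groups of files with identical content, which are the candidates a file-level CoW deduplicator could merge. It is not bedup's actual algorithm (bedup scans incrementally, this walks everything each time), and the function names find_duplicate_files and file_digest are made up for illustration. The step that actually makes the files share extents is filesystem-specific and deliberately left out.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_files(root):
    """Return a list of groups (sorted path lists) of files with identical content."""
    # Cheap first pass: only files of equal size can be identical.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # empty files and unique sizes are not worth hashing
        # Expensive second pass: hash only the files that share a size.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Grouping by size before hashing keeps the scan cheap, since most files never need to be read at all.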

How to evaluate if it’s worth using deduplication

You didn't specify which filesystem. If you're talking about ZFS, you can use the zdb command to see what effect turning on dedup would have:

# zdb -S tank
Simulated DDT histogram:

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1      775   96.8M   96.8M   96.8M      775   96.8M   96.8M   96.8M
     2        2    256K    256K    256K        6    768K    768K    768K
     4        3    384K    384K    384K       13   1.62M   1.62M   1.62M
   128        1    128K    128K    128K      158   19.8M   19.8M   19.8M
 Total      781   97.5M   97.5M   97.5M      952    119M    119M    119M

dedup = 1.22, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.22
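
The dedup figure on the last line can be reproduced from the Total row: it is the referenced DSIZE divided by the allocated DSIZE. The short Python sketch below redoes that arithmetic from the rounded values shown above; the helper names parse_size and dedup_ratio are made up here, and the assumption that K/M/G/T are powers of 1024 is mine.

```python
# Assumption: zdb's size suffixes are binary multiples (K = 1024 bytes, and so on).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size string such as '97.5M' to a number of bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)


def dedup_ratio(referenced_dsize, allocated_dsize):
    """Ratio of the data referenced to the data that would actually be stored."""
    return parse_size(referenced_dsize) / parse_size(allocated_dsize)


# Values taken from the Total row of the histogram above.
ratio = dedup_ratio("119M", "97.5M")
print(f"dedup = {ratio:.2f}")  # prints "dedup = 1.22"
```

A ratio this close to 1 means only about a fifth of the data is redundant, so the memory cost of the deduplication table may well outweigh the disk space saved.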
