How to evaluate if it’s worth using deduplication

deduplication, filesystems

I have a partition on which I am considering using deduplication.

Given the profile of its data, I think it will be a good choice. Still, before doing so, I would like to evaluate the impact more systematically than by gut feeling.

Is there a tool that evaluates the impact of deduplication on a partition (at either file level or block level)?

For now I have Ubuntu and ext4, but if deduplication proves valuable in this situation I am considering using opendedup or lessfs. Any other suggestions are welcome, even if that means using a different distribution or another free *nix.

Best Answer

You didn't specify which filesystem. If you're talking about ZFS, you can use the zdb command to see what effect turning on dedup would have had:

# zdb -S tank
Simulated DDT histogram:

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1      775   96.8M   96.8M   96.8M      775   96.8M   96.8M   96.8M
     2        2    256K    256K    256K        6    768K    768K    768K
     4        3    384K    384K    384K       13   1.62M   1.62M   1.62M
   128        1    128K    128K    128K      158   19.8M   19.8M   19.8M
 Total      781   97.5M   97.5M   97.5M      952    119M    119M    119M

dedup = 1.22, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.22
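
In the summary line, the dedup figure is the referenced size divided by the allocated size (here 119M / 97.5M ≈ 1.22). For data that is not on ZFS yet, such as an ext4 partition, a rough block-level estimate of the same kind of ratio can be obtained by hashing fixed-size blocks and counting how many are unique. The Python sketch below is only an illustration: the function name `estimate_block_dedup` and the 128 KiB default block size are assumptions made for this example (128K is the block size visible in the output above). It ignores compression, hard links, sparse files and alignment differences, and it keeps one 32-byte digest per unique block in memory, so a real deduplicating filesystem may well report a different number.

```python
import hashlib
import os


def estimate_block_dedup(root, block_size=128 * 1024):
    """Return (total_blocks, unique_blocks, ratio) for regular files under root.

    ratio is total_blocks / unique_blocks, a rough analogue of the
    "dedup" figure printed by zdb -S.
    """
    seen = set()   # SHA-256 digests of every distinct block seen so far
    total = 0      # number of blocks read, duplicates included
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, devices, sockets and other non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as fh:
                    while True:
                        block = fh.read(block_size)
                        if not block:
                            break
                        total += 1
                        seen.add(hashlib.sha256(block).digest())
            except OSError:
                # Unreadable file: skip whatever is left of it.
                continue
    unique = len(seen)
    ratio = total / unique if unique else 1.0
    return total, unique, ratio
```

Calling `estimate_block_dedup("/path/to/data")` (the path is a placeholder) returns the block counts and the estimated ratio. A value close to 1.0 suggests that block-level deduplication would save little on that data set.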