I have a zfs formatted partition using zfs-fuse for linux (Ubuntu).
I had used it for a while, and then enabled dedup and compression on it (zfs set compression=on/dedup=on). Now I think I have some files that are deduped and compressed, and some files that are not yet.
It was OK, but sometimes I was confused. For example, the following command would consume almost 4GB of my zfs storage:
cp oldfile.4GB newfile.4GB
.. and this would consume almost zero:
cp newfile.4GB newfile.4GB.2
I think this is because the old file is not yet compressed, so dedup did not happen.
My idea is this: if I can find the old files that are not yet deduped/compressed, I can batch copy/rename/remove them to eliminate duplication and redundancy. But how can I check that?
I know that re-copying the whole contents of my storage should work (even better with a check of each file's time stamp), but I'd be happier if I had a zfsstat-like tool that shows some file properties.
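The batch copy/rename idea above could be sketched roughly as follows. This is only an illustration: `rewrite_file` is a hypothetical helper name, and it assumes that `cp` writes fresh blocks (as in the `cp oldfile.4GB newfile.4GB` example), so the new copy picks up the dataset's current compression and dedup settings.

```shell
# Hypothetical helper: rewrite one file so that its blocks are written
# again under the current compression/dedup settings of the dataset.
# Assumes GNU cp/mv, as shipped with Ubuntu.
rewrite_file() {
    f=$1
    tmp="$f.rewrite.$$"          # temporary name next to the original
    # -p keeps mode, ownership and timestamps on the new copy
    if cp -p -- "$f" "$tmp"; then
        mv -- "$tmp" "$f"        # rename the copy over the original
    else
        rm -f -- "$tmp"          # clean up a partial copy
        return 1
    fi
}

# Usage sketch (path taken from the test dataset below):
#   for f in /zfs/test/orig/*; do rewrite_file "$f"; done
```

Note that replacing a file this way breaks hard links to it, and any snapshot that still references the old blocks keeps them allocated.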
EDIT: Verified jlliagre's tip on my environment.
First, made some dataset and directories:
$ sudo zfs create zfs/test
$ sudo install -d -m 1777 /zfs/test/orig /zfs/test/copy
Created some files:
$ yes > /zfs/test/orig/yes.1s & sleep 1; kill %1
$ dd if=/dev/zero of=/zfs/test/orig/zero.1M bs=1K count=1024
$ dd if=/dev/urandom of=/zfs/test/orig/rand.1M bs=1K count=1024
Turned compression on, and copied the above files:
$ sudo zfs set compress=on zfs/test
$ cp /zfs/test/orig/* /zfs/test/copy
Now the directories look like:
$ ls -hil /zfs/test/*
/zfs/test/copy:
total 1.5K
10 -rw-r--r-- 1 kimura kimura 1.0M Mar 2 01:30 rand.1M
11 -rw-r--r-- 1 kimura kimura 40M Mar 2 01:30 yes.1s
12 -rw-r--r-- 1 kimura kimura 1.0M Mar 2 01:30 zero.1M
/zfs/test/orig:
total 42M
9 -rw-r--r-- 1 kimura kimura 1.0M Mar 2 01:29 rand.1M
7 -rw-r--r-- 1 kimura kimura 40M Mar 2 01:29 yes.1s
8 -rw-r--r-- 1 kimura kimura 1.0M Mar 2 01:29 zero.1M
And zdb tool shows some information:
kimura@kimura-desktop:~$ sudo zdb zfs/test
Dataset zfs/test [ZPL], ID 196, cr_txg 108306, 44.2M, 12 objects
Object lvl iblk dblk dsize lsize %full type
0 7 16K 16K 16K 16K 37.50 DMU dnode
-1 1 16K 512 1K 512 100.00 ZFS user/group used
-2 1 16K 512 1K 512 100.00 ZFS user/group used
1 1 16K 512 1K 512 100.00 ZFS master node
2 1 16K 512 1K 512 100.00 ZFS delete queue
3 1 16K 512 1K 512 100.00 ZFS directory
4 1 16K 512 1K 512 100.00 ZFS directory
5 1 16K 512 1K 512 100.00 ZFS directory
6 1 16K 512 1K 512 100.00 ZFS directory
7 3 16K 128K 39.8M 39.8M 100.00 ZFS plain file
8 2 16K 128K 1.00M 1M 100.00 ZFS plain file
9 2 16K 128K 1.00M 1M 100.00 ZFS plain file
10 2 16K 128K 1.00M 1M 100.00 ZFS plain file
11 3 16K 128K 1.41M 39.8M 100.00 ZFS plain file
12 2 16K 128K 0 128K 0.00 ZFS plain file
I can see "yes" and "zero" are well compressed.
Best Answer
You can get overall deduplication statistics with the
zdb -D poolname
command. For per-file compression status, it's not very straightforward, but you might use this:
This will output lines looking like these ones:
The first column is the inode number, columns 5 and 6 are the size on disk and the file size, and column 7 is the percentage of the file that really exists. Any file with different values in columns 5 and 6 and 100% in column 7 is compressed.
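As a rough sketch, this rule can be applied automatically to an object table like the one in the question (header `Object lvl iblk dblk dsize lsize %full type`). The function name `list_compressed` is made up for this example, and it assumes the sizes are printed with K/M/G/T suffixes as shown there. It prints the object (inode) number, dsize and lsize of every plain file whose dsize is smaller than its lsize while %full is 100.

```shell
# Read a zdb object listing on stdin and print the plain files that look
# compressed: dsize (column 5) below lsize (column 6), %full (column 7) = 100.
list_compressed() {
    awk '
    # Convert a size such as 512, 16K or 1.41M to bytes.
    function to_bytes(s,    n, u) {
        n = s + 0                       # leading numeric part
        u = substr(s, length(s), 1)     # last character, maybe a unit
        if (u == "K") return n * 1024
        if (u == "M") return n * 1024 * 1024
        if (u == "G") return n * 1024 * 1024 * 1024
        if (u == "T") return n * 1024 * 1024 * 1024 * 1024
        return n
    }
    $8 == "ZFS" && $9 == "plain" && $10 == "file" && $7 + 0 == 100 &&
        to_bytes($5) < to_bytes($6) { print $1, $5, $6 }
    '
}

# Usage sketch, with the dataset name from the question:
#   sudo zdb zfs/test | list_compressed
# A reported object number can be mapped back to a path with:
#   find /zfs/test -inum 11
```

On the table from the question this would report only object 11 (the copied yes.1s). Object 12 (the copied zero.1M) is skipped because its %full is 0, so an all-zero file stored with no data blocks is not caught by this rule.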