Linux – How to tell if ZFS (zfs-fuse) dedup/compression is applied to a particular file

I have a ZFS-formatted partition that I use through zfs-fuse on Linux (Ubuntu).

I used it for a while and then enabled dedup and compression on it (zfs set compression=on, zfs set dedup=on). Now I think some of my files are dedup'ed and compressed, and some are not yet.

That works, but the results sometimes confuse me. For example, the following command consumes almost 4GB of my ZFS storage:

cp oldfile.4GB newfile.4GB

...while this one consumes almost nothing:

cp newfile.4GB newfile.4GB.2

I think this is because the old file was written before compression and dedup were enabled, so it is not compressed and its blocks were never dedup'ed.

My idea is this: if I can find the old files that are not yet dedup'ed/compressed, I can batch copy, rename and remove them to eliminate the duplication and redundancy. But how can I check that?

I know re-copying the whole contents of my storage should work (even better if I check each file's time stamp), but I'd be happier with a zfsstat-like tool that shows these properties per file.
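
A minimal sketch of the copy-and-rename idea. It relies on the fact that ZFS applies the compression and dedup properties only to blocks written after they are set, so copying a file and renaming the copy over the original stores its data again under the current settings. rewrite_file and the .rewrite.$$ temporary suffix are hypothetical names, not part of ZFS.

```shell
# Re-write one file so its blocks are written again under the dataset's
# current compression/dedup settings (hypothetical helper, not a ZFS tool).
rewrite_file() {
    f=$1
    tmp="$f.rewrite.$$"          # temporary name in the same directory
    # cp -p keeps mode, ownership and timestamps where permitted;
    # remove the partial copy if cp fails.
    cp -p -- "$f" "$tmp" || { rm -f -- "$tmp"; return 1; }
    # A rename within one directory replaces the original entry.
    mv -f -- "$tmp" "$f"
}
```

For example: rewrite_file /zfs/test/orig/yes.1s. Two caveats: any snapshot that still references the old blocks keeps them allocated, and if the file has several hard links, the other names keep pointing to the old, unrewritten copy. To pick files by time stamp, the function could be fed from find with ! -newer REFERENCE_FILE.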

EDIT: I verified jlliagre's tip in my environment.

First, I created a dataset and some directories:
$ sudo zfs create zfs/test
$ sudo install -d -m 1777 /zfs/test/orig /zfs/test/copy

Then I created some files:
$ yes > /zfs/test/orig/yes.1s & sleep 1; kill %1
$ dd if=/dev/zero of=/zfs/test/orig/zero.1M bs=1K count=1024
$ dd if=/dev/urandom of=/zfs/test/orig/rand.1M bs=1K count=1024

Then I turned compression on and copied the files above:
$ sudo zfs set compress=on zfs/test
$ cp /zfs/test/orig/* /zfs/test/copy

Now the directories look like this:
$ ls -hil /zfs/test/*
/zfs/test/copy:
total 1.5K
10 -rw-r--r-- 1 kimura kimura 1.0M Mar  2 01:30 rand.1M
11 -rw-r--r-- 1 kimura kimura  40M Mar  2 01:30 yes.1s
12 -rw-r--r-- 1 kimura kimura 1.0M Mar  2 01:30 zero.1M

/zfs/test/orig:
total 42M
9 -rw-r--r-- 1 kimura kimura 1.0M Mar  2 01:29 rand.1M
7 -rw-r--r-- 1 kimura kimura  40M Mar  2 01:29 yes.1s
8 -rw-r--r-- 1 kimura kimura 1.0M Mar  2 01:29 zero.1M

And the zdb tool shows this:
$ sudo zdb zfs/test
Dataset zfs/test [ZPL], ID 196, cr_txg 108306, 44.2M, 12 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K    16K    16K   37.50  DMU dnode
        -1    1    16K    512     1K    512  100.00  ZFS user/group used
        -2    1    16K    512     1K    512  100.00  ZFS user/group used
         1    1    16K    512     1K    512  100.00  ZFS master node
         2    1    16K    512     1K    512  100.00  ZFS delete queue
         3    1    16K    512     1K    512  100.00  ZFS directory
         4    1    16K    512     1K    512  100.00  ZFS directory
         5    1    16K    512     1K    512  100.00  ZFS directory
         6    1    16K    512     1K    512  100.00  ZFS directory
         7    3    16K   128K  39.8M  39.8M  100.00  ZFS plain file
         8    2    16K   128K  1.00M     1M  100.00  ZFS plain file
         9    2    16K   128K  1.00M     1M  100.00  ZFS plain file
        10    2    16K   128K  1.00M     1M  100.00  ZFS plain file
        11    3    16K   128K  1.41M  39.8M  100.00  ZFS plain file
        12    2    16K   128K      0   128K    0.00  ZFS plain file

I can see that the copies of "yes" and "zero" (objects 11 and 12) are well compressed.

Best Answer

You can get overall deduplication statistics with the zdb -D poolname command.

Per-file compression status is less straightforward, but you might use this:

zdb dataset | grep plain

This will output lines looking like these ones:

     8    2    16K   128K  3.03M  5.00M  100.00  ZFS plain file
     9    2    16K   128K  3.03M  5.00M  100.00  ZFS plain file
    10    2    16K   128K  5.00M  5.00M  100.00  ZFS plain file
    11    2    16K   128K  3.03M  6.00M   83.33  ZFS plain file

The first column is the object number, which is the same as the inode number shown by ls -i; columns 5 and 6 are the size on disk and the file size, and column 7 is the percentage of the file that really exists. Any file with different values in columns 5 and 6 and 100% in column 7 is compressed.
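
That check can be scripted. A minimal shell/awk sketch, assuming zdb prints the column layout shown in the question (Object, lvl, iblk, dblk, dsize, lsize, %full, type) with sizes such as 512, 128K or 1.00M. Sizes are converted to bytes before comparing because zdb can print the same size two ways: objects 8 to 10 in the question show 1.00M on disk and 1M logical. zdb_compressed is a hypothetical helper name, not part of ZFS.

```shell
# Print the object (inode) numbers of plain files that zdb reports as
# compressed: %full is 100 and dsize is smaller than lsize.
# Assumes the layout: Object lvl iblk dblk dsize lsize %full type
zdb_compressed() {
    awk '
    # Convert a zdb size such as "512", "128K" or "1.00M" to bytes.
    function to_bytes(s,    n, u) {
        u = substr(s, length(s), 1)
        n = s + 0
        if (u == "K") return n * 1024
        if (u == "M") return n * 1024 * 1024
        if (u == "G") return n * 1024 * 1024 * 1024
        if (u == "T") return n * 1024 * 1024 * 1024 * 1024
        return n
    }
    /ZFS plain file/ {
        # $5 = dsize (on disk), $6 = lsize (logical), $7 = %full
        if ($7 + 0 == 100 && to_bytes($5) < to_bytes($6))
            print $1
    }'
}
```

After sourcing the function, run sudo zdb zfs/test | zdb_compressed. On the zdb output from the question it should print only 11 (the copy of yes.1s); object 12, the all-zero copy, is skipped because its %full is 0.00 rather than 100. Since the object numbers match the inode numbers, find /zfs/test -inum 11 maps a number back to a file name.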

Related Solutions

How to access the contents of a ZFS snapshot without affecting its current data

It's been a while since I played with ZFS, but you should be able to use zfs list -t snapshot to find your available snapshots, and then access their files under the special .zfs directory in your ZFS mountpoint.

[~]# zfs list -t snapshot
NAME                       USED  AVAIL  REFER  MOUNTPOINT
mypool                    1.49G   527M   528M  /mnt/zfspool
mypool@snap1                28K      -   993M  -
mypool@snap2                28K      -   993M  -
mypool@snap3                28K      -   993M  -

[~]# cd /mnt/zfspool/.zfs/snapshot/snap1
[snap1]# ls
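
Building on that layout, here is a small read-only sketch that lists which snapshots still hold a given file. It assumes each snapshot shows up as a directory under MOUNTPOINT/.zfs/snapshot/ named after the part of the snapshot name following the @ (snap1 for mypool@snap1, as above); snapshot_versions is a hypothetical name.

```shell
# List every snapshot copy of a file (read-only; hypothetical helper).
#   $1: mountpoint of the dataset, e.g. /mnt/zfspool
#   $2: path of the file relative to that mountpoint
snapshot_versions() {
    mnt=$1
    rel=$2
    for snapdir in "$mnt"/.zfs/snapshot/*/; do
        # If there are no snapshots the glob stays unexpanded; skip it.
        [ -d "$snapdir" ] || continue
        if [ -e "$snapdir$rel" ]; then
            printf '%s\n' "$snapdir$rel"
        fi
    done
}
```

For example, snapshot_versions /mnt/zfspool docs/report.txt (a hypothetical file) prints one path per snapshot that contains docs/report.txt; copying one of those paths out with cp leaves the snapshot untouched.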

IIRC, snapshots are already read-only, so attempts to change data in the snapshot directory should fail. If the data changes in the real fs, the snapshot's space usage should grow, because it keeps holding on to the pre-change blocks (ZFS is copy-on-write, so nothing is actually copied) to stay consistent.

To make edits, you would need to zfs clone the snapshot to a new location (at which point you would be editing the clone, not the snapshot).

As I said, though, it's been a while, so test first...

ref: http://www.googlux.com/zfs-snapshot.html

Windows – How to make Windows modify CIFS/ZFS ACLs ‘correctly’

So I've not really dealt much with any of this, and I may be seriously wrong, but here are a few bits that might explain it:

NFSv4 ACLs are similar but not quite like Windows/CIFS/NTFS ACLs.

For example, the Windows ACL format does not distinguish between 'user', 'group' and 'special' SIDs at all. Since the NAS doesn't know which accounts your Windows system has, it cannot tell which SIDs are users and which are groups; it has to guess.

However, while NFSv4 ACLs do not support POSIX-style 'default ACEs', they do support the Windows/CIFS-style 'inheritable' ACEs; that is, each entry can be marked inheritable or not.

In FreeBSD's getfacl output, you can see the f and d flags, which correspond to "inheritable by files" and "inheritable by directories".

There is also an "inherit-only" flag i, which is practically an exact equivalent to 'default' ACEs in POSIX ACLs – that is, the ACE is only inherited, but isn't used for the directory itself.
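
A sketch that decodes those flags, assuming FreeBSD getfacl's NFSv4 output has one ACE per line, colon-separated, with the inheritance flags in the second-to-last field and allow/deny in the last (for example owner@:rwxp--aARWcCos:fd-----:allow; the ACEs in the test are illustrative). Only f, d and i are decoded, and decode_ace_flags is a hypothetical name.

```shell
# Decode the f/d/i inheritance flags of NFSv4 ACEs read on stdin
# (hypothetical helper; assumes FreeBSD getfacl's colon-separated format).
decode_ace_flags() {
    awk -F: '
    /^#/ { next }                      # skip the "# file:" style header lines
    {
        sub(/^[ \t]+/, "")             # drop indentation (re-splits fields)
        if (NF < 4) next               # not an ACE line
        who = $1                       # principal, e.g. owner@ or user:alice
        for (k = 2; k <= NF - 3; k++) who = who ":" $k
        flags = $(NF - 1)              # inheritance flags field
        desc = ""
        if (index(flags, "f")) desc = desc " file-inherit"
        if (index(flags, "d")) desc = desc " dir-inherit"
        if (index(flags, "i")) desc = desc " inherit-only"
        if (desc == "") desc = " none"
        print who ":" desc
    }'
}
```

Usage would be along the lines of getfacl /some/share | decode_ace_flags, where /some/share is a placeholder path.
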
When a file is created, it is always owned by the user who created it; ownership is not inheritable.

If the CIFS server were running Windows Server, it'd have an option to make the built-in "Administrators" group the default file owner (again note how it lets the owner be either a user or a group, and lacks 'group ownership').
