What is the order of operations for data written to a ZFS filesystem on ZFS on Linux?
The only specific document I found, http://docs.oracle.com/cd/E36784_01/html/E36835/gkknx.html, says: "When a file is written, the data is compressed, encrypted, and the checksum is verified. Then, the data is deduplicated, if possible."
But if that were true, dedup could never deduplicate blocks compressed with different compression algorithms.
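The reasoning can be reproduced outside ZFS. In a sketch (file names made up, and gzip levels standing in for lz4 vs. gzip-9, since lz4 may not be installed everywhere): if dedup hashes blocks *after* compression, identical plaintext compressed with different settings yields different byte streams and thus different checksums.

```shell
#!/bin/sh
# Compressible plaintext; seq output compresses well.
seq 1 10000 > plain.txt

# Compress the identical input with two different settings
# (-n omits the gzip timestamp so the output is deterministic).
gzip -n -1 -c plain.txt > block.fast.gz
gzip -n -9 -c plain.txt > block.best.gz

# The compressed streams differ, so a checksum taken after
# compression cannot match across the two datasets.
cmp -s block.fast.gz block.best.gz && echo identical || echo different
# prints "different"
```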
I tested this myself and believe the order is: dedup, compress, encrypt.
My test setup:
zpool create tank /dev/sdb
zfs create tank/lz4
zfs create tank/gzip9
zfs set compression=lz4 tank/lz4
zfs set compression=gzip-9 tank/gzip9
zfs set dedup=on tank
Output of zfs list:
NAME USED AVAIL REFER MOUNTPOINT
tank 106K 19.3G 19K /tank
tank/gzip9 19K 19.3G 19K /tank/gzip9
tank/lz4 19K 19.3G 19K /tank/lz4
Generate a random file with dd if=/dev/urandom of=random.txt count=128K bs=1024:
131072+0 records in
131072+0 records out
134217728 bytes (134 MB) copied, 12.8786 s, 10.4 MB/s
Output of zpool list on empty pool:
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank 19.9G 134K 19.9G - 0% 0% 1.00x ONLINE -
Then copy the file to the datasets with different compression algorithms:
cp random.txt /tank/lz4
cp random.txt /tank/gzip9
Output of zfs list after copying:
NAME USED AVAIL REFER MOUNTPOINT
tank 257M 19.1G 19K /tank
tank/gzip9 128M 19.1G 128M /tank/gzip9
tank/lz4 128M 19.1G 128M /tank/lz4
Output of zpool list after copying:
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank 19.9G 129M 19.7G - 0% 0% 2.00x ONLINE -
The dedup ratio is 2.00 after copying the same file to two datasets with different compression algorithms. In my opinion, this means that dedup is done on data blocks before compression and encryption.
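As a back-of-the-envelope check on that number (assuming the dedup ratio is roughly logically referenced data divided by physically stored data): two 128 MiB copies are referenced, but apparently only one is stored, which matches the 129M ALLOC in zpool list (the extra ~1M being metadata).

```shell
#!/bin/sh
# Dedup ratio ~= logically referenced bytes / physically stored bytes.
# Two copies of the 128 MiB random.txt are referenced; one is stored.
awk 'BEGIN { referenced = 2 * 134217728; stored = 134217728;
             printf "%.2f\n", referenced / stored }'
# prints 2.00
```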
Could someone please verify whether this is correct?
Best Answer
It turns out that http://docs.oracle.com/cd/E36784_01/html/E36835/gkknx.html is right.
My assumption based on the random file was flawed: ZFS aborts compression if it cannot achieve a certain minimum compression ratio (see https://wiki.illumos.org/display/illumos/LZ4+Compression). Incompressible blocks are therefore stored identically on both datasets regardless of the compression setting, which is why they still deduplicated.
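This early-abort behavior can be illustrated without ZFS (file names made up, gzip standing in for the ZFS compressors): random data is incompressible, so a compressor gains nothing and the block ends up effectively unchanged on every dataset alike.

```shell
#!/bin/sh
# Random data does not compress: the gzip output is about as large
# as (in fact slightly larger than) the 1 MiB input, the situation
# in which ZFS gives up and stores the block uncompressed.
head -c 1048576 /dev/urandom > rand.bin
gzip -9 -c rand.bin > rand.bin.gz
wc -c < rand.bin.gz   # roughly 1048576 or a bit more; no savings
```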
For testing, I created a text file (which compresses well) from my filesystem with
find / >> tree.txt
After copying the file to both datasets, zpool get dedupratio confirmed it: dedup really is the last part of the write chain, and choosing different compression algorithms results in a poor dedup ratio!
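The flip side can also be checked outside ZFS (a sketch with made-up file names): with *matching* compression settings, identical plaintext compresses to identical bytes, so the post-compression checksums match and dedup can collapse the blocks.

```shell
#!/bin/sh
# Identical input + identical compression settings = identical
# compressed bytes, hence matching checksums for the dedup stage
# (-n keeps the gzip output deterministic).
seq 1 10000 > listing.txt
gzip -n -9 -c listing.txt > copy1.gz
gzip -n -9 -c listing.txt > copy2.gz
cmp -s copy1.gz copy2.gz && echo dedupable || echo distinct
# prints "dedupable"
```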
Unfortunately, my ZoL version does not support encryption, but it seems that encrypting datasets differently could also ruin dedup. Info on encryption: https://docs.oracle.com/cd/E53394_01/html/E54801/gkkih.html