What does rmlint's “clone” for btrfs do?

Tags: btrfs, deduplication, ioctl, reflink, rmlint

I was reading the rmlint manual, and two of the duplicate handlers are clone and reflink:

· clone: btrfs only. Try to clone both files with the BTRFS_IOC_FILE_EXTENT_SAME ioctl(3p). This will physically delete duplicate extents. Needs at least kernel 4.2.

· reflink: Try to reflink the duplicate file to the original. See also --reflink in man 1 cp. Fails if the filesystem does not support it.

What exactly does this clone do, and how is it different from a reflink? What does the BTRFS_IOC_FILE_EXTENT_SAME ioctl do?

Best Answer

The differences are somewhat subtle.

Reflink deletes the duplicate file and creates a new file in its place which is a clone of the original file. The metadata of the duplicate is lost, although rmlint does its best to preserve the metadata via some trickery with touch -mr.

Clone uses the BTRFS_IOC_FILE_EXTENT_SAME ioctl (or, in the latest version, the FIDEDUPERANGE ioctl), which asks the kernel to check whether the files are identical and, if so, to make them share the same data extents. Both files keep their original metadata. It's arguably safer than reflink because it's done atomically by the kernel, and because the kernel checks that the files are still identical.
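
The reflink behaviour can be approximated in shell. This is only a sketch of the idea, not rmlint's actual code: the function name reflink_replace and the REFLINK_MODE variable are made up for illustration. It clones the original with cp --reflink, carries the duplicate's mtime over with touch -mr, and then renames the clone over the duplicate.

```shell
# Hypothetical helper, not part of rmlint: replace DUP with a reflink copy of ORIG.
# REFLINK_MODE is an illustrative variable; "always" fails on filesystems
# without reflink support, "auto" silently falls back to a normal copy.
reflink_replace() {
    orig=$1
    dup=$2
    tmp="$dup.reflink-tmp.$$"

    # Create the clone next to the duplicate so the final rename stays on one filesystem.
    cp --reflink="${REFLINK_MODE:-always}" -- "$orig" "$tmp" || { rm -f -- "$tmp"; return 1; }

    # Carry over the duplicate's modification time (the "touch -mr" trick).
    touch -m -r "$dup" -- "$tmp"

    # Rename over the duplicate; the old inode and the rest of its metadata are gone.
    mv -f -- "$tmp" "$dup"
}
```

Note that nothing here re-checks that the two files are still identical at the moment of the rename, which is the gap the clone handler closes by letting the kernel do the comparison.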

Related Solutions

How to clone btrfs filesystem into different medium preserving snapshots’ sharing data

I asked a similar question 2 years ago.

However in my case, I was only planning to copy a single device onto raid0.

I eventually found a solution. At the time you couldn't convert from raid0 to raid10, but it looks like you can since kernel 3.3. So that solution may work for you in the end.

A problem with that approach is that it copies the filesystem UUID (fsid), which means you can't mount both the FS and its copy on the same machine. At the time, there was no tool to change the fsid of a FS, but that may have changed since (newer btrfs-progs have btrfstune -u, which looks like it does this).

The idea is to add a copy-on-write layer on top of the original device so that it can be written to, but any modification is done somewhere else which you can discard later on. That means you need additional storage space (for instance on an external drive).

Then mount that COW'd FS instead of the original, add the devices for the FS copy and remove the COW's device.

For copy-on-write, you can use the device mapper.

For the disposable copy on write area, here I use a loop device.

Let's say you want to clone /dev/sda onto /dev/sd[bcde]:

Create the COW back store:

truncate -s 100G /media/STORE/snap-store
losetup /dev/loop0 /media/STORE/snap-store

Now unmount the original FS if it is mounted, and run modprobe -r btrfs so that the module doesn't interfere and forgets its device scan.

Then make the COW'd device:

echo "0 $(blockdev --getsize /dev/sda) snapshot /dev/sda /dev/loop0 N 8" | dmsetup create cowed

Now /dev/mapper/cowed is like /dev/sda except that anything written to it will end up in /dev/loop0 and /dev/sda will be untouched.
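
Since every write lands in the COW store, it is worth keeping an eye on how full it gets; as far as I know, a dm snapshot whose store overflows becomes invalid. For a snapshot target, dmsetup status reports allocated/total sectors in its fourth field. A small sketch for reading that (cow_usage_percent is a hypothetical helper name, not an existing tool):

```shell
# Hypothetical helper: read one "dmsetup status" line for a snapshot target
# on stdin and print how full the COW store is, as a whole percentage.
# Expected fields: <start> <length> snapshot <allocated>/<total> <metadata>
# Lines that don't match (e.g. an invalidated snapshot) produce no output.
cow_usage_percent() {
    awk '$3 == "snapshot" && $4 ~ /^[0-9]+\/[0-9]+$/ {
        split($4, s, "/")
        if (s[2] > 0) printf "%d\n", s[1] * 100 / s[2]
    }'
}

# Intended use (needs root and the "cowed" device created above):
#   dmsetup status cowed | cow_usage_percent
```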

Now, you can mount it:

mount /dev/mapper/cowed /mnt

Add the other devices:

btrfs dev add /dev/sd[bcde] /mnt

And remove the old one:

btrfs dev del /dev/mapper/cowed /mnt

When that's over, you may want to shut down and unplug /dev/sda, or make it read-only: because it has the same fsid as the other devices, btrfs might still mess with it.

Now, if I understand correctly, assuming you've got a recent btrfs-progs, you should be able to do a:

btrfs balance start -d convert=raid10 /mnt

To convert to raid10. In theory, that should make sure that every data chunk is copied on at least 2 disks.

I would strongly recommend that you do tests on a dummy btrfs on loop devices first as all that is from memory and I might have gotten it wrong (see for instance my initial answer before my edit).
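
For such a rehearsal, sparse files attached to loop devices are enough. The sketch below only prepares the image files; the directory, sizes, and the make_test_images name are assumptions for illustration, and the privileged steps are left as comments rather than run.

```shell
# Hypothetical helper: create COUNT sparse image files of SIZE each in DIR,
# to stand in for /dev/sda and /dev/sd[bcde] during a rehearsal.
make_test_images() {
    dir=$1
    count=${2:-5}
    size=${3:-1G}
    mkdir -p -- "$dir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        # Sparse file: takes no real space until it is written to.
        truncate -s "$size" -- "$dir/disk$i.img" || return 1
        i=$((i + 1))
    done
}

# Then, as root (not run here; /tmp/btrfs-test is just an example path):
#   make_test_images /tmp/btrfs-test
#   for f in /tmp/btrfs-test/disk*.img; do losetup -f --show "$f"; done
#   mkfs.btrfs /dev/loopN    # on the loop device standing in for /dev/sda
```

The loop devices printed by losetup then take the place of /dev/sda and /dev/sd[bcde] in the steps above.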

Note that since kernel 3.6, btrfs implements send/receive a bit like in zfs. That might be an option for you.
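
A rough idea of what the send/receive route looks like: each read-only snapshot after the first is sent incrementally against the previous one with -p, so the data they share stays shared on the receiving side. The helper below is hypothetical and only prints the commands for review instead of running them; the paths in the usage line are placeholders, and it does no quoting, so it assumes paths without spaces.

```shell
# Hypothetical helper: print (not run) the send/receive commands that would
# replicate a list of read-only snapshots to DEST, oldest first.
print_send_chain() {
    dest=$1
    shift
    parent=
    for snap in "$@"; do
        if [ -z "$parent" ]; then
            # First snapshot: full send.
            printf 'btrfs send %s | btrfs receive %s\n' "$snap" "$dest"
        else
            # Later snapshots: incremental send against the previous one.
            printf 'btrfs send -p %s %s | btrfs receive %s\n' "$parent" "$snap" "$dest"
        fi
        parent=$snap
    done
}

# Example with placeholder paths:
#   print_send_chain /mnt/new /mnt/old/snap1 /mnt/old/snap2
```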

Why does “cp -R --reflink=always” perform a standard copy on a btrfs filesystem

cp --reflink=always is almost certainly working correctly. If it weren't, you would be getting an error. By design, that's the difference between --reflink=always and --reflink=auto. The error would look like this:

# Filesystem that does not support the feature at all
cp: failed to clone `xx' from `yy': Inappropriate ioctl for device

# Filesystem that does support it, but copy across filesystems
cp: failed to clone `xx' from `yy': Invalid cross-device link

Are you copying a directory structure with lots of small files? In that case cp still has to create every directory and open and close every file, so it will still take time, unlike btrfs subvolume snapshot. That most likely explains the time it takes to perform the operation.
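
If you want to see which case you are in for a given source and destination, something like the following makes the always/auto distinction explicit. It is a sketch only, and copy_report is a made-up name: it tries --reflink=always first and reports when it has to fall back to an ordinary copy, which is roughly what --reflink=auto does silently.

```shell
# Hypothetical helper: copy SRC to DST and report how it was done.
# Prints "reflink" if --reflink=always succeeded, otherwise "copy"
# after falling back to an ordinary copy.
copy_report() {
    src=$1
    dst=$2
    if cp --reflink=always -- "$src" "$dst" 2>/dev/null; then
        echo reflink
    else
        rm -f -- "$dst"    # a failed clone attempt can leave an empty file behind
        cp -- "$src" "$dst" && echo copy
    fi
}
```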
