Why is moving files between btrfs subvolumes an expensive operation?

benchmarking btrfs file-transfer

From what I understand, btrfs subvolumes share the same filesystem "storage", so I was surprised to learn that moving files between subvolumes is an expensive operation, just like moving them between different filesystems (copy + delete).

I was especially surprised when someone suggested this workaround: reflink-copy the files between subvolumes, then delete the originals. This is said to be a cheap operation (only metadata is moved around). How is it that different subvolumes can share data blocks when using CoW, but not in the supposedly easier operation of moving data?

Best Answer

How is it that different subvolumes can share data blocks when using CoW, but not in the supposedly easier operation of moving data?

mv first attempts the move with the rename syscall. The btrfs kernel implementation of rename detects a cross-subvolume move and explicitly disallows it (even when both subvolumes are under the same mount point):

/* we only allow rename subvolume link between subvolumes */
if (old_ino != BTRFS_FIRST_FREE_OBJECTID && root != dest)
    return -EXDEV;
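
From userspace, that -EXDEV is the same error a rename across two different filesystems produces, which is why the move degrades to copy + delete. Here is a minimal sketch of that fallback logic in Python. The function name move_file and the returned labels are made up for illustration, and this is not the actual mv source, just the general shape of "try rename, fall back on EXDEV":

```python
import errno
import os
import shutil


def move_file(src: str, dst: str) -> str:
    """Move a regular file and report which path was taken (hypothetical helper)."""
    try:
        # Fast path: a pure metadata operation when src and dst are in
        # the same subvolume.
        os.rename(src, dst)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # unrelated failure: permissions, missing file, ...
    # Slow path: the kernel answered EXDEV, so copy the bytes and then
    # remove the source, exactly as for a move between filesystems.
    shutil.copy2(src, dst)
    os.unlink(src)
    return "copied"
```

On btrfs, calling this with src and dst in different subvolumes should take the "copied" branch, since the rename is refused with EXDEV. Python's own shutil.move follows the same pattern.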

This probably has to do with per-subvolume inode accounting and the code paths these operations take. A reflink copy actually creates new metadata, accounted to the destination subvolume, while the data itself is shared copy-on-write. In theory, rename could probably "move" the metadata by doing something similar to what cp --reflink followed by rm on the source does; it seems that simply no one has put in the effort to implement it.
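
The reflink-then-delete workaround can be sketched in Python through the Linux FICLONE ioctl, which asks the filesystem to make the destination share the source's data extents. The helper name reflink_move is made up for illustration, the constant is written out by hand because Python's fcntl module is not assumed to export it, and the sketch does not preserve ownership, mode, or timestamps:

```python
import fcntl
import os

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int). Linux-only; spelled
# out here as an assumption that the fcntl module does not provide it.
FICLONE = 0x40049409


def reflink_move(src: str, dst: str) -> bool:
    """Clone src's extents into a new file dst, then unlink src.

    Returns True if the clone succeeded. If the filesystem refuses to
    share extents, the empty dst is removed, src is left untouched, and
    False is returned. Hypothetical helper, not a standard API.
    """
    # "xb" refuses to overwrite an existing destination.
    with open(src, "rb") as fsrc, open(dst, "xb") as fdst:
        try:
            # Only new metadata is written; no file data is copied.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        except OSError:
            cloned = False
        else:
            cloned = True
    if cloned:
        os.unlink(src)  # the data extents stay alive through dst
    else:
        os.unlink(dst)  # clean up the empty file we created
    return cloned
```

When the call returns False (for example on a filesystem without reflink support), a caller would fall back to an ordinary copy + delete.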
