Why is moving files between btrfs subvolumes an expensive operation?

Tags: benchmarking, btrfs, file-transfer

From what I understand, btrfs subvolumes share the same file system "storage", so I was surprised to learn that moving files between subvolumes is an expensive operation, just like moving between different filesystems (copy + delete).

I was especially surprised when someone suggested this workaround: reflink-copy the files between subvolumes, then delete the originals. This is said to be a cheap operation (it only moves metadata around). How is it that different subvolumes can share data blocks when using COW, but not in the should-be-easier operation of moving data?

Best Answer

How is it that different subvolumes can share data blocks when using COW, but not in the should-be-easier operation of moving data?

mv uses the rename syscall to attempt the move. The btrfs kernel rename implementation detects a cross-subvolume move and explicitly disallows it (even under the same mount point):

/* we only allow rename subvolume link between subvolumes */
if (old_ino != BTRFS_FIRST_FREE_OBJECTID && root != dest)
    return -EXDEV;

This probably has to do with per-subvolume inode accounting and the code paths these operations take. When rename returns EXDEV, mv falls back to copying the data and deleting the source, which is what makes the move expensive. A reflink copy, by contrast, creates new metadata accounted in the destination subvolume, while the data extents themselves are shared (CoW). In theory rename could probably "move" the metadata by doing something similar to cp --reflink followed by rm of the source; it seems that simply no one has taken the effort to implement it.
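The reflink-then-delete workaround from the question can be sketched as a small shell function. This is only a minimal sketch, assuming GNU cp; the name reflink_mv is hypothetical and not a standard tool. It handles a single regular file and removes the source only if the copy succeeded.

```shell
# reflink_mv SRC DST: hypothetical helper, not a standard tool.
# Copies SRC to DST, sharing data extents where the filesystem allows it,
# then removes SRC only if the copy succeeded.
reflink_mv() {
    # --reflink=auto falls back to an ordinary data copy where reflinks
    # are unsupported; use --reflink=always to fail instead of copying.
    cp --reflink=auto -p -- "$1" "$2" && rm -- "$1"
}
```

For example, reflink_mv /mnt/ssd/@/data/big.img /mnt/ssd/@home/big.img (both paths are made up for illustration) should only write new metadata on btrfs, provided the kernel accepts the reflink between the two locations.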

Related Solutions

How to copy a btrfs filesystem

Option 1 - Dumb data copy then change UUID

Ensure that the source partition is unmounted and will not be automounted.

Use either dd (slow, dumb) or partclone.btrfs -b -s /dev/src -o /dev/target

Use btrfstune -u to change UUID after copy and before mounting.

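Data loss warning: do NOT try to (auto)mount either the original or the copy until the UUID has been changed. Until then both carry the same filesystem UUID, and btrfs identifies the devices of a filesystem by that UUID, so the kernel can confuse the two.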
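The order of the steps above can be sketched as a dry run. This is only a sketch: clone_plan is a hypothetical helper that prints the commands and executes nothing. Using umount for the first step and running btrfstune -u on the copy (rather than on the original) are assumptions, since the answer does not spell out either.

```shell
# clone_plan SRC DST: hypothetical dry-run helper. It only prints the
# Option 1 commands in order; nothing is executed.
clone_plan() {
    # Assumption: the source is unmounted with umount before copying.
    printf 'umount %s\n' "$1"
    # Block-level clone, using the flags quoted in the answer.
    printf 'partclone.btrfs -b -s %s -o %s\n' "$1" "$2"
    # Assumption: the UUID is changed on the copy, before any mount.
    printf 'btrfstune -u %s\n' "$2"
}
```

Running clone_plan /dev/src /dev/target prints three lines that can be reviewed before anything is run by hand as root.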

Option 2 - btrfs-clone

I have not personally tried btrfs-clone, but it purports to clone an existing BTRFS file system to a new one, cloning each subvolume in order.

Linux – btrfs: browsing subvolumes

Keep in mind that the Btrfs directory (and subvolume) tree on your device is conceptually different from the directory structure in the OS. The root of each is denoted / but they are different.

The @ subvolume is identified within the Btrfs filesystem itself as @ (or /@) but this path is not directly available in your OS. I guess the subvolume is mounted to / which is the root of your directory tree as seen by the OS and programs (note: mount namespaces aside).

Similarly @home is mounted under /home.

The output of mount command in my Kubuntu contains (among other lines):

/dev/sda1 on / type btrfs (rw,relatime,ssd,space_cache,subvolid=1902,subvol=/@)
/dev/sda1 on /home type btrfs (rw,relatime,ssd,space_cache,subvolid=258,subvol=/@home)

So my setup is identical to yours: the /@ subvolume from the Btrfs tree becomes / in the OS tree, and the /@home subvolume from the Btrfs tree becomes /home in the OS tree.
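To see which Btrfs subvolume backs a given mount point, the subvol= option can be pulled out of a line of mount output. A minimal sketch, assuming the line has the format shown above; subvol_of is a hypothetical helper name.

```shell
# subvol_of LINE: hypothetical helper that prints the value of the
# subvol= option found in one line of mount output.
subvol_of() {
    # Require '(' or ',' before "subvol=" so that "subvolid=" is skipped.
    printf '%s\n' "$1" | sed -n 's/.*[(,]subvol=\([^,)]*\).*/\1/p'
}
```

Feeding it the two lines above should give /@ for the line mounted on / and /@home for the line mounted on /home.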

But I also have access to the entire Btrfs tree:

/dev/sda1 on /mnt/ssd type btrfs (rw,relatime,ssd,space_cache,subvolid=5,subvol=/)

This means the root (/) of the Btrfs tree is available as /mnt/ssd in my OS. From there I can peek into every subvolume and directory. I set this mountpoint up by myself, exactly to be able to see and manage the entire Btrfs structure. The relevant line in my /etc/fstab is as follows:

UUID=<UUID of my /dev/sda1 here>    /mnt/ssd            btrfs   defaults,subvol=/       0   2

Even without the above line I could still mount the root Btrfs volume manually:

mount -o rw,relatime,ssd,space_cache,subvol=/ /dev/sda1 /mnt/ssd

The main conclusion is that you should mount the root of your Btrfs filesystem somewhere, using the subvol=/ option. This way you gain access to the filesystem in its entirety.

Note that it's a good idea not to mount the Btrfs / as your OS /. If you did, directories such as /etc and /bin would sit directly under your Btrfs /, alongside subvolumes like /timeshift-btrfs, and in your OS all of these entries would appear under / after mounting the Btrfs / to the OS /.

By deriving your OS's root tree from Btrfs /@ you keep it tidy. You (and/or proper tools) organize subvolumes outside Btrfs /@, while the OS keeps the majority of its / in Btrfs /@. Majority, because e.g. in my case /mnt/ssd/@/proc is just an empty directory (after Btrfs /@ is mounted as /, the proc filesystem is available in the OS's /proc); the same for /mnt/ssd/@/home (after Btrfs /@ is mounted as /, the Btrfs /@home subvolume gets mounted at what's now the OS's /home).
