I asked a similar question 2 years ago.
However in my case, I was only planning to copy a single device onto raid0.
I eventually found a solution. At the time you couldn't convert from raid0 to raid10, but it looks like you can since kernel 3.3, so that solution may work for you in the end.
A problem with that approach is that it copies the fsuid (the filesystem UUID), which means you can't mount both the FS and its copy on the same machine. At the time there was no tool to change the fsuid of a FS, but that might have changed by now.
The idea is to add a copy-on-write layer on top of the original device so that it can be written to, but any modification is done somewhere else which you can discard later on. That means you need additional storage space (for instance on an external drive).
Then mount that COW'd FS instead of the original, add the devices for the FS copy and remove the COW'd device.
For copy-on-write, you can use the device mapper.
For the disposable copy-on-write area, I use a loop device here.
Let's say you want to clone /dev/sda onto /dev/sd[bcde]:
Create the COW back store:
truncate -s 100G /media/STORE/snap-store
losetup /dev/loop0 /media/STORE/snap-store
Now unmount the origin FS if mounted, and modprobe -r btrfs to make sure it's not going to interfere and to make it forget its device scan.
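For example (assuming the original FS was mounted on /mnt; adjust the mount point to yours):
umount /mnt
modprobe -r btrfs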
Then make the COW'd device:
echo "echo 0 $(blockdev --getsize /dev/sda) snapshot /dev/sda /dev/loop0 N 8 | dmsetup create cowed
Now /dev/mapper/cowed is like /dev/sda except that anything written to it will end up in /dev/loop0 and /dev/sda will be untouched.
Now, you can mount it:
mount /dev/mapper/cowed /mnt
Add the other devices:
btrfs dev add /dev/sd[bcde] /mnt
And remove the old one:
btrfs dev del /dev/mapper/cowed /mnt
When that's over, you may want to shut down and unplug /dev/sda, or make it read-only, because it's got the same fsuid as the other devices and btrfs might still get confused by it.
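A minimal way to do the latter (assuming nothing else is using the device) is to flip the read-only flag at the block layer:
blockdev --setro /dev/sda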
Now, if I understand correctly, assuming you've got recent btrfs-progs, you should be able to do a:
btrfs balance start -d convert=raid10 /mnt
to convert to raid10. In theory, that should make sure that every data chunk is copied on at least 2 disks.
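Note that -d convert only touches the data chunks; if you want the metadata on raid10 as well, something like this should work (again from memory, so verify against your btrfs-progs version):
btrfs balance start -d convert=raid10 -m convert=raid10 /mnt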
I would strongly recommend that you do tests on a dummy btrfs on loop devices first as all that is from memory and I might have gotten it wrong (see for instance my initial answer before my edit).
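A throwaway test setup could look like this (file names, sizes and the /mnt/test mount point are arbitrary, and it assumes loop1 and loop2 are free):
truncate -s 1G /tmp/d1.img /tmp/d2.img
losetup /dev/loop1 /tmp/d1.img
losetup /dev/loop2 /tmp/d2.img
mkfs.btrfs /dev/loop1
mount /dev/loop1 /mnt/test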
Note that since kernel 3.6, btrfs implements send/receive a bit like in zfs. That might be an option for you.
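A minimal sketch of the send/receive approach (paths are placeholders; send needs a read-only snapshot, and the target must be a btrfs filesystem):
btrfs subvolume snapshot -r /mnt /mnt/snap
btrfs send /mnt/snap | btrfs receive /backup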
You should take a look at btrfs quota and btrfs qgroups (quota groups).
Basically qgroups do exactly what you requested: they track how much space is allocated by subvolumes. To enable qgroup functionality for a btrfs filesystem you have to run
# btrfs quota enable /path/to/btrfs/filesystem
However, before you do this, be warned that it triggers a complete re-computation of the qgroup data, which will take some time, especially for large filesystems with many subvolumes. This process runs asynchronously in the background. You can already check the status of the qgroups with
# btrfs qgroup show /path/to/btrfs/filesystem
This will give you some output like this:
WARNING: rescan is running, qgroup data may be incorrect
qgroupid rfer excl
-------- ---- ----
0/5 843.69GiB 61.91MiB
0/4881 811.06GiB 9.34GiB
0/7990 867.32GiB 329.91MiB
0/8400 867.17GiB 37.64MiB
(The warning in the first line is present as long as the rescan is still running.)
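You can also check on the rescan directly (-s shows its status, -w waits for it to finish):
# btrfs quota rescan -s /path/to/btrfs/filesystem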
Btrfs automatically creates a qgroup for each subvolume. In this case there are three subvolumes with subvolume IDs 4881, 7990, and 8400. The part before the forward slash is the level of the qgroup. Each subvolume qgroup is on level 0. Additionally there is a special qgroup on level 0 that always has ID 5 and corresponds to the root of the btrfs filesystem.
For each qgroup the above output shows how much space is referenced by it. That means that the corresponding subvolume contains files whose total size equals the shown number.
However, due to snapshots and the copy-on-write nature of btrfs, subvolumes may share files. This means that the content (or actually the extents) of files may be referenced by more than one subvolume. This is expressed by the second number, which shows how much space is exclusively allocated by each subvolume and is not shared with any other subvolume. In case you delete a subvolume, this is the space that will actually be freed.
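For example, in the output above subvolume 0/4881 references 811.06GiB, but deleting it on its own would only free about 9.34GiB; the rest of its data is shared with other subvolumes.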
If you want to find out how much space would be freed if you delete multiple subvolumes, you can use the aforementioned levels. qgroups are organized in a hierarchy, and groups on upper levels (higher than 0) aggregate the information of lower levels.
Thus, to find out how much space would be freed if subvolumes 4881 and 7990 (in the above example) were deleted, create a new qgroup (arbitrarily with ID 0, but you may choose whatever you like here) on level 1 with
# btrfs qgroup create 1/0 /path/to/btrfs/filesystem
Then assign the newly created qgroup as a parent to the qgroups of the subvolumes you want to delete with
# btrfs qgroup assign 0/4881 1/0 /path/to/btrfs/filesystem
# btrfs qgroup assign 0/7990 1/0 /path/to/btrfs/filesystem
This will trigger another re-scan of the quota information, which may take a while. When it has finished and you then issue
# btrfs qgroup show -p /path/to/btrfs/filesystem
you get an output like this:
qgroupid rfer excl parent
-------- ---- ---- ------
0/5 1.38TiB 2.51GiB ---
0/4881 1.11TiB 10.86GiB 1/0
0/7990 1.23TiB 502.41MiB 1/0
0/8400 1.34TiB 1.69GiB 1/0
1/0 1.51TiB 132.23GiB ---
(I added the -p flag to add the parent column to the output, which shows the parent/child relationship of the qgroups.)
Now the line with qgroup 1/0 tells you how much space is referenced by both subvolumes you want to delete and, more importantly, how much space is allocated by them exclusively. This is the amount of space that will be freed if you delete both subvolumes.
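When you are done with the measurement, you can detach the subvolume qgroups and remove the helper group again (this only affects the quota bookkeeping, not the subvolumes themselves):
# btrfs qgroup remove 0/4881 1/0 /path/to/btrfs/filesystem
# btrfs qgroup remove 0/7990 1/0 /path/to/btrfs/filesystem
# btrfs qgroup destroy 1/0 /path/to/btrfs/filesystem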
I also wonder why they are saying that it would be so slow?
This is due to the copy-on-write nature of btrfs together with snapshots. If you create a snapshot in btrfs (normally) all actual data in the newly created subvolume that contains the snapshot is shared with the source of the snapshot. Only when a file is changed or replaced in the source does it point to different content (extents). This makes it very difficult to assess how much space would actually be freed if a subvolume is deleted because you have to account for all the space that is shared with other subvolumes.
I use btrfs for my personal NAS. It's a 3.7T filesystem with over a thousand snapshots. I use the snapshots to sync backups to external drives. For my use case, enabling quotas has detrimental effects on system stability and performance. BTRFS transactions can become stalled for hours doing quota calculations. This causes any process that touches that filesystem to hang in uninterruptible disk sleep. Even ls or df will hang and become unkillable until the quota calculations complete. I think if I were to use far fewer snapshots I would not experience this problem. Quotas do seem to perform tolerably well for some people's workloads, just not mine.
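If you run into the same kind of stalls, quota tracking can be switched off again at any time (the path below is a placeholder for your mount point):
btrfs quota disable /path/to/filesystem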