I asked a similar question 2 years ago.
However in my case, I was only planning to copy a single device onto raid0.
I eventually found a solution. At the time you couldn't convert from raid0 to raid10, but it looks like you can since kernel 3.3, so that solution may work for you in the end.
A problem with that approach is that it copies the fsuid (the filesystem UUID), which means you can't mount both the FS and its copy on the same machine. At the time there was no tool to change the fsuid of a FS, but that might have changed by now.
The idea is to add a copy-on-write layer on top of the original device so that it can be written to, but any modification is done somewhere else which you can discard later on. That means you need additional storage space (for instance on an external drive).
Then mount that COW'd FS instead of the original, add the devices for the FS copy and remove the COW'd device.
For copy-on-write, you can use the device mapper.
For the disposable copy-on-write area, I use a loop device here.
Let's say you want to clone /dev/sda onto /dev/sd[bcde]:
Create the COW back store:
truncate -s 100G /media/STORE/snap-store
losetup /dev/loop0 /media/STORE/snap-store
Now unmount the origin FS if mounted, and modprobe -r btrfs to make sure it's not going to interfere and to make it forget its device scan.
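For example (assuming the original FS was mounted on /mnt; adjust the mount point to yours):
umount /mnt
modprobe -r btrfs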
Then make the COW'd device:
echo "echo 0 $(blockdev --getsize /dev/sda) snapshot /dev/sda /dev/loop0 N 8 | dmsetup create cowed
Now /dev/mapper/cowed is like /dev/sda except that anything written to it will end up in /dev/loop0 and /dev/sda will be untouched.
Now, you can mount it:
mount /dev/mapper/cowed /mnt
Add the other devices:
btrfs dev add /dev/sd[bcde] /mnt
And remove the old one:
btrfs dev del /dev/mapper/cowed /mnt
When that's over, you may want to shut down and unplug /dev/sda, or make it read-only, because it's got the same fsuid as the other devices and btrfs might still get confused by it.
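A minimal way to do the latter (assuming nothing else is using the device) is to flip the read-only flag at the block layer:
blockdev --setro /dev/sda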
Now, if I understand correctly, assuming you've got recent btrfs-progs, you should be able to do a:
btrfs balance start -d convert=raid10 /mnt
to convert to raid10. In theory, that should make sure that every data chunk is copied on at least 2 disks.
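Note that -d convert only touches the data chunks; if you want the metadata on raid10 as well, something like this should work (again from memory, so verify against your btrfs-progs version):
btrfs balance start -d convert=raid10 -m convert=raid10 /mnt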
I would strongly recommend that you do tests on a dummy btrfs on loop devices first as all that is from memory and I might have gotten it wrong (see for instance my initial answer before my edit).
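A throwaway test setup could look like this (file names, sizes and the /mnt/test mount point are arbitrary, and it assumes loop1 and loop2 are free):
truncate -s 1G /tmp/d1.img /tmp/d2.img
losetup /dev/loop1 /tmp/d1.img
losetup /dev/loop2 /tmp/d2.img
mkfs.btrfs /dev/loop1
mount /dev/loop1 /mnt/test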
Note that since kernel 3.6, btrfs implements send/receive a bit like in zfs. That might be an option for you.
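A minimal sketch of the send/receive approach (paths are placeholders; send needs a read-only snapshot, and the target must be a btrfs filesystem):
btrfs subvolume snapshot -r /mnt /mnt/snap
btrfs send /mnt/snap | btrfs receive /backup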
You should take a look at btrfs quota and btrfs qgroups (quota groups).
Basically qgroups do exactly what you requested: they track how much space is allocated by subvolumes. To enable qgroup functionality for a btrfs filesystem you have to run
# btrfs quota enable /path/to/btrfs/filesystem
However, before you do this, be warned that it triggers a complete re-computation of the qgroup data, which will take some time, especially for large filesystems with many subvolumes. This process runs asynchronously in the background. You can already check the status of the qgroups with
# btrfs qgroup show /path/to/btrfs/filesystem
This will give you some output like this:
WARNING: rescan is running, qgroup data may be incorrect
qgroupid rfer excl
-------- ---- ----
0/5 843.69GiB 61.91MiB
0/4881 811.06GiB 9.34GiB
0/7990 867.32GiB 329.91MiB
0/8400 867.17GiB 37.64MiB
(The warning in the first line is present as long as the rescan is still running.)
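You can also check on the rescan directly (-s shows its status, -w waits for it to finish):
# btrfs quota rescan -s /path/to/btrfs/filesystem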
Btrfs automatically creates a qgroup for each subvolume. In this case there are three subvolumes with subvolume IDs 4881, 7990, and 8400. The part before the forward slash is the level of the qgroup. Each subvolume qgroup is on level 0. Additionally there is a special qgroup on level 0 that always has ID 5 and corresponds to the root of the btrfs filesystem.
For each qgroup the above output shows how much space is referenced by it. That means that the corresponding subvolume contains files whose total size equals the shown number.
However, due to snapshots and the copy-on-write nature of btrfs, subvolumes may share files. This means that the content (or actually the extents) of files may be referenced by more than one subvolume. This is expressed by the second number, which shows how much space is exclusively allocated by each subvolume and is not shared with any other subvolume. In case you delete a subvolume, this is the space that will actually be freed.
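For example, in the output above subvolume 0/4881 references 811.06GiB, but deleting it on its own would only free about 9.34GiB; the rest of its data is shared with other subvolumes.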
If you want to find out how much space would be freed if you delete multiple subvolumes, you can use the aforementioned levels. qgroups are organized in a hierarchy, and groups on upper levels (higher than 0) aggregate the information of lower levels.
Thus, to find out how much space would be freed if subvolumes 4881 and 7990 (in the above example) were deleted, create a new qgroup (arbitrarily with ID 0, but you may choose whatever you like here) on level 1 with
# btrfs qgroup create 1/0 /path/to/btrfs/filesystem
Then assign the newly created qgroup as a parent to the qgroups of the subvolumes you want to delete with
# btrfs qgroup assign 0/4881 1/0 /path/to/btrfs/filesystem
# btrfs qgroup assign 0/7990 1/0 /path/to/btrfs/filesystem
This will trigger another re-scan of the quota information, which may take a while. When it has finished and you then issue
# btrfs qgroup show -p /path/to/btrfs/filesystem
you get an output like this:
qgroupid rfer excl parent
-------- ---- ---- ------
0/5 1.38TiB 2.51GiB ---
0/4881 1.11TiB 10.86GiB 1/0
0/7990 1.23TiB 502.41MiB 1/0
0/8400 1.34TiB 1.69GiB 1/0
1/0 1.51TiB 132.23GiB ---
(I added the -p flag to add the parent column to the output, which shows the parent/child relationship of the qgroups.)
Now the line with qgroup 1/0 tells you how much space is referenced by both subvolumes you want to delete and, more importantly, how much space is allocated by them exclusively. This is the amount of space that will be freed if you delete both subvolumes.
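When you are done with the measurement, you can detach the subvolume qgroups and remove the helper group again (this only affects the quota bookkeeping, not the subvolumes themselves):
# btrfs qgroup remove 0/4881 1/0 /path/to/btrfs/filesystem
# btrfs qgroup remove 0/7990 1/0 /path/to/btrfs/filesystem
# btrfs qgroup destroy 1/0 /path/to/btrfs/filesystem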
I also wonder why they are saying that it would be so slow?
This is due to the copy-on-write nature of btrfs together with snapshots. If you create a snapshot in btrfs (normally) all actual data in the newly created subvolume that contains the snapshot is shared with the source of the snapshot. Only when a file is changed or replaced in the source does it point to different content (extents). This makes it very difficult to assess how much space would actually be freed if a subvolume is deleted because you have to account for all the space that is shared with other subvolumes.
I use btrfs for my personal NAS. It's a 3.7T filesystem with over a thousand snapshots. I use the snapshots to sync backups to external drives. For my use case, enabling quotas has detrimental effects on system stability and performance. BTRFS transactions can become stalled for hours doing quota calculations. This causes any process that touches that filesystem to hang in uninterruptible disk sleep. Even ls or df will hang and become unkillable until the quota calculations complete. I think if I were to use far fewer snapshots I would not experience this problem. Quotas do seem to perform tolerably well for some people's workloads, just not mine.
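If you run into the same kind of stalls, quota tracking can be switched off again at any time (the path below is a placeholder for your mount point):
btrfs quota disable /path/to/filesystem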