How to Replace a Device in a BTRFS RAID-1 Filesystem

btrfs

I have a BTRFS RAID-1 filesystem with 2 legs. One disk needs to be replaced because of recurring read errors.

Thus, the plan is:

  1. add a 3rd leg -> result should be: 3 way mirror
  2. remove the faulty disk -> result should be: 2 way mirror

So I ran the following steps:

btrfs dev add /dev/new_device /mnt/foo
btrfs balance /mnt/foo

I assume that btrfs does the right thing, i.e. create a 3 way mirror.

The alternative would be to use a balance filter, I guess. But since the filesystem already is a RAID-1 one, that shouldn't be necessary?

I am a bit concerned because btrfs fi show prints this:

Before balance start:

    Total devices 3 FS bytes used 2.15TiB
    devid    1 size 2.73TiB used 2.16TiB path /dev/left
    devid    2 size 2.73TiB used 2.16TiB path /dev/right
    devid    3 size 2.73TiB used 0.00B path /dev/new_device

During balancing:

    Total devices 3 FS bytes used 2.15TiB
    devid    1 size 2.73TiB used 1.93TiB path /dev/left
    devid    2 size 2.73TiB used 1.93TiB path /dev/right
    devid    3 size 2.73TiB used 458.03GiB path /dev/new_device

I mean, this looks like btrfs balances one half of the existing RAID-1 group to a single disk … right?

Thus, my question: do I need to specify a balance filter to get a 3-way mirror?

PS: Does btrfs even support n-way mirrors? A note in the btrfs wiki says that it does not – but perhaps it is outdated? Oh boy, cks has a pretty recent article on the 2-way limit.

Best Answer

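At the time of writing, btrfs does not support n-way mirrors: the RAID-1 profile keeps exactly two copies of each block, no matter how many devices the filesystem has. The balance in the question therefore spreads those two copies over three disks rather than creating a third copy.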
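
One way to sanity-check this against the output in the question is to compare the summed per-device "used" values with "FS bytes used": a ratio near 2 suggests two copies, while a ratio near 3 would suggest three. The following is a minimal sketch, assuming the btrfs fi show line format shown in the question. The function name copies_from_fi_show is hypothetical, and the ratio is only approximate because allocated chunk space also covers metadata and unused space inside chunks.

```shell
#!/bin/sh
# Hypothetical helper: estimate the number of copies from `btrfs fi show`
# output read on stdin, as (sum of per-device "used") / ("FS bytes used").
copies_from_fi_show() {
    awk '
        # Convert a size such as "1.93TiB" or "458.03GiB" to TiB.
        # Sizes in plain bytes ("0.00B") and smaller units count as 0 here.
        function tib(s,   v) {
            v = s + 0                      # awk keeps the leading number
            if (s ~ /TiB$/) return v
            if (s ~ /GiB$/) return v / 1024
            return 0
        }
        /FS bytes used/ { fs = tib($NF) }  # last field, e.g. 2.15TiB
        $1 == "devid"   { total += tib($6) }
        END { if (fs > 0) printf "%.1f\n", total / fs }
    '
}

# The "during balancing" numbers from the question:
copies_from_fi_show <<'EOF'
    Total devices 3 FS bytes used 2.15TiB
    devid    1 size 2.73TiB used 1.93TiB path /dev/left
    devid    2 size 2.73TiB used 1.93TiB path /dev/right
    devid    3 size 2.73TiB used 458.03GiB path /dev/new_device
EOF
# prints 2.0
```

On a live system the input would come from btrfs fi show /mnt/foo piped into the function.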

Btrfs does have a special replace subcommand:

btrfs replace start /dev/left /dev/new_device /mnt/foo

Reading between the lines of the btrfs-replace man page, this command should be able to read from both existing legs, e.g. when both legs have read errors but the two error sets are disjoint.
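
Since the disk in the question is failing, the -r option of btrfs replace start may be relevant: per the btrfs-replace man page, it reads from the source device only if no other zero-defect mirror exists. The sketch below only prints the resulting command so it can be reviewed before being run as root. The helper name replace_cmd is hypothetical; the device names and mount point are the ones used above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) a replace command
# that avoids reading from the failing source device where possible.
# $1 = failing source device, $2 = new target device, $3 = mount point
replace_cmd() {
    printf 'btrfs replace start -r %s %s %s\n' "$1" "$2" "$3"
}

replace_cmd /dev/left /dev/new_device /mnt/foo
# prints: btrfs replace start -r /dev/left /dev/new_device /mnt/foo
```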

The btrfs replace command is executed in the background - you can check its status via the status subcommand, e.g.:

btrfs replace status /mnt/foo
45.4% done, 0 write errs, 0 uncorr. read errs
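
For scripting, the progress line can be split into its fields. This is a minimal sketch that assumes exactly the in-progress line format shown above; other status lines (for example after the replace has finished) are not matched and produce no output. The function name parse_replace_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a `btrfs replace status` progress line on stdin
# and print "<percent> <write_errs> <uncorr_read_errs>".
parse_replace_status() {
    sed -n 's/^\([0-9.]*\)% done, \([0-9]*\) write errs, \([0-9]*\) uncorr\. read errs.*$/\1 \2 \3/p'
}

echo '45.4% done, 0 write errs, 0 uncorr. read errs' | parse_replace_status
# prints: 45.4 0 0
```

On a live system, btrfs replace status -1 /mnt/foo prints the status once instead of continuously, which makes it suitable as input for the function.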

Alternatively, one can also add a device to the RAID-1 filesystem and then delete an existing leg:

btrfs dev add /dev/mapper/new_device /mnt/foo
btrfs dev delete /dev/mapper/right /mnt/foo

The add should return quickly, since it just adds the device (issue a btrfs fi show to confirm).

The subsequent delete should trigger a balance between the remaining devices such that each extent is available on each remaining device. Thus, the command can run for a very long time. This method also works for the situation described in the question.

In comparison with btrfs replace, the add/delete cycle spams the syslog with low-level info messages. It also takes much longer to finish (e.g. 2-3 times longer on my test system with 3 TB SATA drives at 80% filesystem usage).

Finally, after the actual replacement, if the new devices are larger than the original ones, you need to issue a btrfs fi resize on each device to use the entire available disk space. For the replace example at the top, this looks something like:

btrfs fi resize <devid>:max /mnt/foo

where devid stands for the device id which btrfs fi show returns.
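
Looking up the devid can be scripted. The following is a minimal sketch, assuming the devid line format shown in the btrfs fi show output in the question; the helper name devid_for_path is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs fi show` output on stdin and print the
# devid of the device whose path equals $1.
devid_for_path() {
    awk -v dev="$1" '$1 == "devid" && $NF == dev { print $2 }'
}

# Example with a line in the format from the question:
echo '    devid    3 size 2.73TiB used 0.00B path /dev/new_device' |
    devid_for_path /dev/new_device
# prints: 3
```

On a live system this could be combined with the resize, e.g. btrfs fi resize "$(btrfs fi show /mnt/foo | devid_for_path /dev/new_device):max" /mnt/foo.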
