Btrfs RAID1 Disk Replacement – How to Replace a Missing Disk

btrfs, disk, raid1, replace

I have a btrfs RAID1 system with the following state:

# btrfs filesystem show
Label: none  uuid: 975bdbb3-9a9c-4a72-ad67-6cda545fda5e
        Total devices 2 FS bytes used 1.65TiB
        devid    1 size 1.82TiB used 1.77TiB path /dev/sde1
        *** Some devices missing

The missing device is a disk drive that failed completely and that the OS could no longer recognize. I removed the faulty disk and sent it for recycling.

Now I have a new disk installed as /dev/sdd. Searching the web, I fail to find instructions for this scenario (bad choice of search terms?). There are many examples of how to recover a RAID system when the faulty disk is still somewhat accessible to the OS, but the btrfs replace command seems to require a source disk.
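From the btrfs-replace man page, the source can apparently be given as a numeric devid instead of a device path, which seems to be the only option when the device is physically gone; here the missing device should be devid 2, since btrfs filesystem show only lists devid 1. Assuming the filesystem is mounted read-write, the general form would be something like:

# btrfs replace start -B <devid of missing disk> <new disk> <mount point>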

I tried the following:

# btrfs replace start 2 /dev/sdd /mnt/brtfs-raid1-b
# btrfs replace status /mnt/brtfs-raid1-b
Never started

There was no error message, but the status indicates it never started. I cannot figure out what is wrong with my attempt.

I am running Ubuntu 16.04 LTS Xenial Xerus, Linux kernel 4.4.0-57-generic.

Update #1

OK, when running the command in non-background mode (-B), I see an error that did not show up before:

# btrfs replace start -B 2 /dev/sdd /mnt/brtfs-raid1-b                                                                                                                     
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt/brtfs-raid1-b": Read-only file system

/mnt/brtfs-raid1-b is mounted RO (Read Only). I have no choice; Btrfs does not allow me to mount the remaining disk as RW (Read Write). When I try to mount the disk RW, I get the following error in syslog:

BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed

When in RO mode, it seems I cannot do anything: I cannot replace, add, or delete a disk. But there is no way for me to mount the filesystem as RW. What option is left?
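For completeness, these are the kinds of mount commands involved (same device and mount point as above); the degraded option is required when a device is missing, and on this kernel only the read-only variant succeeds:

# mount -o degraded,ro /dev/sde1 /mnt/brtfs-raid1-b
# mount -o degraded,rw /dev/sde1 /mnt/brtfs-raid1-b
# mount -o remount,rw /mnt/brtfs-raid1-b

The read-write attempts are what trigger the "missing devices(1) exceeds the limit(0)" message above.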

It shouldn't be this complicated when a single disk fails. The system should continue running read-write and warn me about the failed drive. I should be able to insert a new disk and have the data copied onto it, while applications remain unaware of the disk issue. That is what a proper RAID does.

Best Answer

It turns out that this is a limitation of btrfs as of early 2017: to get the filesystem mounted read-write again, one needs to patch the kernel. I have not tried that, though. I am planning to move away from btrfs because of this; one should not have to patch a kernel just to replace a faulty disk.
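For anyone who does want to attempt it: my understanding is that newer kernels (around 4.14 and later) check for missing devices per chunk rather than per filesystem, so on such a kernel, or on a patched one, the recovery should presumably look something like this, reusing the device names from the question; the final balance converts any chunks written as single while degraded back to RAID1:

# mount -o degraded,rw /dev/sde1 /mnt/brtfs-raid1-b
# btrfs replace start -B 2 /dev/sdd /mnt/brtfs-raid1-b
# btrfs replace status /mnt/brtfs-raid1-b
# btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/brtfs-raid1-b

Again, I have not tested this myself.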


Please leave a comment if you still suffer from this problem as of 2020. I believe that people would like to know if this has been fixed or not.

Update: as of 2020-10-20, I have moved to good old mdadm and LVM and am very happy with my RAID10 of 4x4 TB disks (8 TB usable space). It is proven, works well, is not resource-intensive, and I fully trust it.
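For reference, the mdadm plus LVM setup amounts to only a few commands; the device and volume names below are placeholders, not the devices from the question:

# mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sda /dev/sdb /dev/sdc /dev/sdd
# pvcreate /dev/md0
# vgcreate vg0 /dev/md0
# lvcreate -n data -l 100%FREE vg0
# mkfs.ext4 /dev/vg0/data

Replacing a failed member is then the kind of operation I was expecting from RAID1 in the first place:

# mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb
# mdadm /dev/md0 --add /dev/sdf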
