Can BTRFS recover/continue after disk failure in “single” mode

btrfs

Testing btrfs for the first time to see if I can use it for a specific project.

I'm working in a virtual machine.

  1. Created a volume from three small, randomly sized disks.

    mkfs.btrfs -d single /dev/sdb /dev/sdc /dev/sdd
    mount /dev/sdb /mnt/data
    
  2. Added another device just to test

    btrfs device add /dev/sde /mnt/data 
    
  3. Created a bunch of 1GB files to fill up the disks

    dd if=/dev/urandom of=1GB_07.bin bs=64M count=16 iflag=fullblock
    
  4. I removed one of the disks from the VM and rebooted

  5. I was able to force a mount in read-only mode

    mount -ro degraded /dev/sdb /mnt/data
    

I can see all of the files. I tried to rsync them to a different directory and could not copy one of the 1GB files I created. Makes sense: it's on the missing disk!

From here, is there a way to just “trash” the missing disk and the files that were on it, and get things running in read/write mode again? I'm just trying to piece together a box out of a bunch of randomly sized disks. Redundancy isn't important to me here, and I don't want the overhead of mirroring data on this box.

If I lose a drive with some data on it, I want to just replace/remove it and re-rsync from the source to get new copies of the missing files on the BTRFS machine.

Does that make sense?
Is this possible?

Best Answer

Given your exact description, no, it's not possible, because you will have lost part of the metadata tree as well. If you're really unlucky, you will also have lost the chunk tree (the System chunks in btrfs fi df output), which is equivalent to wiping the superblocks and part of the inode tables on an ext4 filesystem. This missing metadata is part of why you were forced to mount read-only.

By default, BTRFS uses the dup profile for metadata. This means that it stores two copies of each metadata block, but both copies are kept on the same device (even if you have more than one device). As a result, if you lose one device from a multi-device BTRFS volume using this metadata profile, you will probably lose some of your metadata. With the metadata tree damaged like that, large parts of the filesystem will likely be missing, and you may not be able to mount the filesystem at all.
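You can check which profile each chunk type is using with btrfs filesystem df; the profile appears right after the chunk type on each line. A quick look, assuming the volume is mounted at /mnt/data as in your example (the output shown is illustrative, yours will differ):

    # Show allocation and profile per chunk type on the mounted volume
    btrfs filesystem df /mnt/data
    # Illustrative output; the profile follows the chunk type:
    #   Data, single: total=4.00GiB, used=3.21GiB
    #   System, DUP: total=32.00MiB, used=16.00KiB
    #   Metadata, DUP: total=1.00GiB, used=498.31MiB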

What you would need to do is use the raid1 profile for metadata. Seriously, this is not as much of a performance hit as you might think, especially if you are not writing to the filesystem regularly, and it will prevent a single device failure from nuking the whole filesystem.
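For a new filesystem, that means passing -m raid1 at mkfs time; for an existing, healthy volume, you could convert the metadata in place with a balance. A minimal sketch, reusing the device names and mount point from the question:

    # New filesystem: single data, raid1 metadata
    mkfs.btrfs -d single -m raid1 /dev/sdb /dev/sdc /dev/sdd

    # Existing mounted volume: convert metadata chunks to raid1
    # (depending on your btrfs-progs version you may also need
    #  -sconvert=raid1 -f to convert the System chunks)
    btrfs balance start -mconvert=raid1 /mnt/data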

With that in place, once a device fails (see the command sketch after this list):

  1. Use mount -o remount,rw,degraded to force the filesystem to be writable again. DO NOT LEAVE THE FILESYSTEM RUNNING LIKE THIS IF YOU AREN'T FIXING IT! Seriously, very bad things can happen if you leave the filesystem degraded but writable.
  2. Delete every file affected by the failure. Reliably figuring out which files are affected is currently non-trivial, especially if you have any degree of fragmentation.
  3. Once those files and directories are removed, use btrfs device delete to remove the failed device (if the device is completely missing, you can use btrfs device delete missing to get rid of it). Using btrfs replace in this scenario will probably fail, and doesn't get you any better performance. Using btrfs device delete also removes the requirement that the new device be at least as large as the old one (and thus makes your life easier since you're not dealing with uniformly sized devices).
  4. Use btrfs device add to add the replacement device, and then btrfs balance start -musage=100 to rebalance the metadata chunks (the data chunks will naturally rebalance as you copy lost files in).
  5. Use rsync or a similar tool to copy back the stuff that is now missing.
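
Putting the steps together, a minimal sketch of the whole recovery sequence might look like this (the mount point and most device names come from the question; /dev/sdf as the replacement device, the rm path, and the rsync source are assumptions, since identifying the affected files is the hard part):

    # 1. Make the degraded filesystem writable again (don't leave it like this)
    mount -o remount,rw,degraded /mnt/data

    # 2. Remove the files that lived on the dead disk (hypothetical path)
    rm /mnt/data/1GB_07.bin

    # 3. Drop the missing device from the volume
    btrfs device delete missing /mnt/data

    # 4. Add the replacement disk and rebalance the metadata chunks
    btrfs device add /dev/sdf /mnt/data
    btrfs balance start -musage=100 /mnt/data

    # 5. Re-copy the lost files from the original source (hypothetical path)
    rsync -a /path/to/source/ /mnt/data/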