Yes, the Linux implementation of RAID1 speeds up disk read operations by a factor of up to two, as long as two separate disk read operations are performed at the same time. That means reading one 10 GB file won't be any faster on RAID1 than on a single disk, but reading two distinct 10 GB files *will* be faster.
To demonstrate it, just read some data with dd. Before each measurement, clear the disk read cache with sync && echo 3 > /proc/sys/vm/drop_caches; otherwise dd will report unrealistically fast reads served from the page cache.
Single file:
# COUNT=1000; dd if=/dev/md127 of=/dev/null bs=10M count=$COUNT &
(...)
10485760000 bytes (10 GB) copied, 65,9659 s, 159 MB/s
Two files:
# COUNT=1000; dd if=/dev/md127 of=/dev/null bs=10M count=$COUNT & dd if=/dev/md127 of=/dev/null bs=10M count=$COUNT skip=$COUNT &
(...)
10485760000 bytes (10 GB) copied, 64,9794 s, 161 MB/s
10485760000 bytes (10 GB) copied, 68,6484 s, 153 MB/s
Reading 10 GB of data took 66.0 seconds, whereas reading 10 GB + 10 GB = 20 GB of data took 68.6 seconds in total, which means concurrent disk reads benefit greatly from RAID1 on Linux. The skip=$COUNT part is very important: it makes the second process read 10 GB of data starting from a 10 GB offset, so the two processes don't read the same blocks.
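The two-reader test above can be wrapped in a small script. This is a sketch under assumptions: DEV defaults to a scratch file so the script can be tried safely without an array; set DEV=/dev/md127 (and a much larger COUNT) to benchmark a real md device.

```shell
#!/bin/sh
# Sketch of the parallel-read test above. DEV and COUNT are assumptions:
# DEV defaults to a scratch file for a safe dry run; point it at your
# array (e.g. /dev/md127) for a real measurement.
set -eu
DEV="${DEV:-/tmp/raid1-read-test.bin}"
COUNT="${COUNT:-2}"                     # 10 MiB blocks per reader

# Create a scratch file big enough for both readers when DEV is not a
# block device.
[ -b "$DEV" ] || dd if=/dev/zero of="$DEV" bs=10M count=$((COUNT * 2)) 2>/dev/null

# Drop the page cache so reads hit the disk; needs root, so ignore failure
# in a dry run.
sync && { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || true

start=$(date +%s)
dd if="$DEV" of=/dev/null bs=10M count="$COUNT" 2>/dev/null &
dd if="$DEV" of=/dev/null bs=10M count="$COUNT" skip="$COUNT" 2>/dev/null &
wait                                    # both readers run concurrently
end=$(date +%s)
echo "parallel read of $((COUNT * 20)) MiB took $((end - start)) s"
```

The skip=$COUNT on the second reader keeps the two streams on disjoint regions, exactly as in the manual test.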
Jared's answer and ssh's comments referring to http://www.unicom.com/node/459 are wrong. The benchmark there appears to show that disk reads don't benefit from RAID1. However, the test was performed with the bonnie++ benchmarking tool, which doesn't perform two separate reads at one time. The author explicitly states that bonnie++ is not suitable for benchmarking RAID arrays (refer to its readme).
Alright, I figured it out with the help of this Trello link. In case anyone else wants to do this, here's the procedure.
Procedure
From a RAID1 array of two disks, one /dev/sda which is faulty and another /dev/sdc which is known-good:
- Disable auto-mounting of this array in /etc/fstab, then reboot. Basically, we want btrfs to forget this array exists, as there's a bug where it'll still try to use one of the drives if it's unplugged.
- Now that your array is unmounted, execute echo 1 | sudo tee /sys/block/sda/device/delete, replacing sda with the faulty device name. This causes the disk to spin down (you should verify this in dmesg) and become inaccessible to the kernel.
Alternatively: just take the drive out of the computer before booting! I chose not to opt for this method, as the above works fine for me.
- Mount your array with the -o degraded option.
- Begin a rebalancing operation with sudo btrfs balance start -f -mconvert=single -dconvert=single /mountpoint. This will reorganise the extents on the known-good drive, converting them to the single (non-RAID) profile. This can take the better part of a day, depending on the speed of your drive and the size of your array (mine had ~700 GiB and rebalanced at a rate of one 1 GiB chunk per minute). Luckily, this operation can be paused, and the array stays online while it runs.
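Since the balance runs for hours, it helps to be able to watch or pause it. These are standard btrfs balance subcommands, shown here with the same assumed /mountpoint as above:

```shell
sudo btrfs balance status /mountpoint   # show progress (chunks relocated so far)
sudo btrfs balance pause /mountpoint    # pause the running balance
sudo btrfs balance resume /mountpoint   # pick up where it left off
```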
- Once this is done, you can issue sudo btrfs device remove missing /mountpoint to remove the 'missing' faulty device.
- Begin a second rebalance with sudo btrfs balance start -mconvert=dup /mountpoint to restore metadata redundancy. This took a few minutes on my system.
- You're done! Your data is now in the single profile, with the RAID1 redundancy removed (metadata remains duplicated on the one drive thanks to the previous step).
- Take your faulty drive outside, and beat it with a hammer.
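For reference, the whole procedure above can be recapped as a dry-run script: run() only prints each command, so nothing touches your disks. The device names (sda faulty, /dev/sdc1 known-good) and /mountpoint are assumptions carried over from the steps above; substitute your own and remove the wrapper to execute for real.

```shell
#!/bin/sh
# Dry-run recap of the procedure above. run() prints instead of executing;
# sda (faulty), /dev/sdc1 (known-good) and /mountpoint are assumptions.
run() { echo "+ $*"; }

run tee /sys/block/sda/device/delete                  # spin down the faulty disk (feed it '1')
run mount -o degraded /dev/sdc1 /mountpoint           # mount the survivor degraded
run btrfs balance start -f -mconvert=single -dconvert=single /mountpoint  # convert to single
run btrfs device remove missing /mountpoint           # drop the missing device
run btrfs balance start -mconvert=dup /mountpoint     # re-duplicate metadata
```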
Troubleshooting
- Help, btrfs tried to write to my faulty disk, errored out, and forced it readonly!
- Did you follow step 1, and reboot before continuing? It's likely that btrfs still thinks the drive you spun down is present. Rebooting will cause btrfs to forget any errors, and will let you continue.
Best Answer
Let Btrfs do everything.
For one thing, Btrfs has its own integrated mirroring code, which can be smarter than mdadm's.
Of course, if a disk fails hard in a mirrored pair in an mdadm RAID10, you can replace the bad disk and move on with your life (albeit after a distressingly complex set of shell commands). The problem is when your disk fails a bit more softly: if a few blocks give back the wrong bits instead of returning the appropriate error codes for a bad block, then reading the data will randomly return bad data. Btrfs is smarter than that: it checksums every bit of data. To be honest I don't know if it's more correct to say "every B-tree node" or "every block", but the point is that when it reads some data from a mirrored array, it checks the checksum before giving it back to your userland process. If the checksum doesn't match, it consults the other mirror in the array, and if that copy has the correct checksum, it will alert you that your disk has started to silently fail.
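If you want to trigger that checksum verification across the whole array instead of waiting for a bad read, a scrub does exactly this. These are standard btrfs subcommands; /mnt is an assumed mountpoint:

```shell
sudo btrfs scrub start -B /mnt     # -B: run in the foreground, report when done
sudo btrfs device stats /mnt       # per-device read/write/checksum error counters
```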
The Btrfs wiki specifically addresses your question.
Finally, even without this substantial advantage, the command-line workflow for dealing with removed or added Btrfs devices is super simple. I'm not even sure I could get the degraded-mount-then-fix-your-filesystem shell commands right, but for Btrfs it's very clearly documented on the multiple devices page. At this point, if you have enough space on your remaining disks, you can always just btrfs balance and be done with it; there's no need to replace the mirror, as you would absolutely have to do with mdadm! And if you want to replace it, you can do btrfs device add first.
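As a concrete sketch of that last point (assumed replacement device /dev/sdd and mountpoint /mnt):

```shell
# Option A: enough free space on the remaining disks, just rebalance
sudo btrfs balance start /mnt

# Option B: add a replacement disk first, then rebalance onto it
sudo btrfs device add /dev/sdd /mnt
sudo btrfs balance start /mnt
```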