Linux – How to safely replace a not-yet-failed disk in a Linux RAID5 array

linux mdadm raid5 software-raid

I have a software RAID5 array (Linux md) on 4 disks.

I would like to replace one of the disks with a new one, without putting the array in a degraded state, and if possible, online. How would that be possible?

It's important because I don't want to:

  • risk stressing the other disks so much that one of them fails during a rebuild,
  • risk running in a "no-parity" state, left without a safety net for some time.

I suppose doing this online is too much to ask, and that I should just raw-copy (dd) the data from the old disk to the new one offline and then swap them, but I think it is theoretically possible…

Some context: these disks have all been spinning almost continuously for more than 5.5 years. They still work perfectly for the moment and they all pass the (long) SMART self-test. However, I have reason to think that one of these four disks will not last much longer (suspected predictive failure).
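
(For the record, this is roughly how I check them — smartmontools assumed installed, and /dev/sdd standing in for each member disk in turn:)

# smartctl -t long /dev/sdd    # start the long offline self-test
# smartctl -H -A /dev/sdd      # health verdict and attributes such as Power_On_Hours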

Best Answer

Using mdadm 3.3

Since mdadm 3.3 (released 3 September 2013), provided you are running a 3.2+ kernel, you can proceed as follows:

# mdadm /dev/md0 --add /dev/sdc1
# mdadm /dev/md0 --replace /dev/sdd1 --with /dev/sdc1

Here sdd1 is the device you want to replace and sdc1 the preferred device to replace it with; sdc1 must already be declared as a spare in your array (which is what the --add above does).

The --with option is optional; if it is not specified, any available spare will be used.
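
Before relying on --replace, it may be worth a quick sanity check that both the tool and the running kernel are recent enough (output format varies by distribution):

# mdadm --version
# uname -r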

Older mdadm version

Note: You still need a 3.2+ kernel.

First, add a new drive as a spare (replace md0 and sdc1 with your RAID and disk device, respectively):

# mdadm /dev/md0 --add /dev/sdc1

Then, initiate a copy-replace operation like this (sdd1 being the failing device):

# echo want_replacement > /sys/block/md0/md/dev-sdd1/state 
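
The copy runs in the background. One way to follow it (a sketch, using the same paths as above — the dev-sdd1 entry is named after the member device) is to re-read the state file, which should now list want_replacement among its flags, and to watch /proc/mdstat, where the operation appears as a recovery-style progress line:

# cat /sys/block/md0/md/dev-sdd1/state
# cat /proc/mdstat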

Result

The system will copy all readable blocks from sdd1 to sdc1. If it encounters an unreadable block, it will reconstruct it from parity. Once the operation is complete, the former spare (here sdc1) becomes active, and the failing drive is marked as failed (F) so you can remove it.
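
Once the old member shows as failed, cleanup typically looks like the following sketch (same device names as above; zero the superblock only if you intend to reuse the pulled disk elsewhere):

# mdadm /dev/md0 --remove /dev/sdd1
# mdadm --zero-superblock /dev/sdd1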

Note: credit goes to frostschutz and Ansgar Esztermann, who found the original solution (see the duplicate question).

Older kernels

Other answers suggest:

  • Johnny's approach: convert the array to RAID6, "replace" the disk, then convert back to RAID5 (a rough sketch follows after this list),
  • Hauke Laging's approach: temporarily remove the disk from the RAID5 array, make it one half of a RAID1 (mirror) with the new disk, and add that mirror device back to the RAID5 array (theoretical)...
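
As a very rough illustration of the first approach — an outline only, not a tested recipe; whether mdadm insists on a --backup-file for the level changes depends on your mdadm version and layout:

# mdadm /dev/md0 --add /dev/sdc1
# mdadm --grow /dev/md0 --level=6 --raid-devices=5
# ... swap out the suspect disk while the array still has double parity ...
# mdadm --grow /dev/md0 --level=5 --raid-devices=4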