Using mdadm 3.3
Since mdadm 3.3 (released September 3, 2013), if you have a 3.2+ kernel, you can proceed as follows:
# mdadm /dev/md0 --add /dev/sdc1
# mdadm /dev/md0 --replace /dev/sdd1 --with /dev/sdc1
sdd1 is the device you want to replace (actually a partition on the failing drive; I prefer to create RAID sets on partitions rather than on raw disks), and sdc1 is the preferred device to replace it with; it must be declared as a spare on your array.
The --with option is optional; if it is not specified, any available spare will be used.
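Not part of the answer above, but the copy-replace shows up in the usual status interfaces, so you can follow its progress like any other rebuild:
# follow the copy-replace progress
cat /proc/mdstat
mdadm --detail /dev/md0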
Older mdadm versions
Note: You still need a 3.2+ kernel.
First, add a new drive as a spare (replace md0 and sdc1 with your RAID and disk device, respectively):
# mdadm /dev/md0 --add /dev/sdc1
Then, initiate a copy-replace operation like this (sdd1 being the failing device):
# echo want_replacement > /sys/block/md0/md/dev-sdd1/state
Result
The system will copy all readable blocks from sdd1 to sdc1. If it comes to an unreadable block, it will reconstruct it from parity. Once the operation is complete, the former spare (here: sdc1) will become active, and the failing drive will be marked as failed (F) so you can remove it.
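The removal itself is the usual manage-mode command; with the same device names as above (zeroing the superblock is optional and only useful if the disk will be reused elsewhere):
# detach the failed drive from the array before physically pulling it
mdadm /dev/md0 --remove /dev/sdd1
# optional: wipe its superblock so it is not picked up as an array member later
mdadm --zero-superblock /dev/sdd1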
Note: credit goes to frostschutz and Ansgar Esztermann who found the original solution (see the duplicate question).
Older kernels
Other answers suggest:
- Johnny's approach: convert the array to RAID6, "replace" the disk, then convert back to RAID5 (a rough sketch of this route follows below),
- Hauke Laging's approach: briefly remove the disk from the RAID5 array, make it part of a RAID1 (mirror) with the new disk, and add that mirror back to the RAID5 array (theoretical).
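A rough, untested sketch of the RAID6 route only, with assumed device names (md0 as a 4-disk RAID5, sdc1 the new disk, sdd1 the failing one); the exact reshape and --backup-file requirements depend on your mdadm and kernel versions, so check the man page before attempting this:
# add the new disk and reshape to RAID6 (a reshape needs a backup file on another device)
mdadm /dev/md0 --add /dev/sdc1
mdadm --grow /dev/md0 --level=6 --raid-devices=5 --backup-file=/root/md0-grow.backup
# as RAID6 the array survives losing the bad disk, so fail and remove it
mdadm /dev/md0 --fail /dev/sdd1 --remove /dev/sdd1
# reshape back to RAID5 on the remaining disks
mdadm --grow /dev/md0 --level=5 --raid-devices=4 --backup-file=/root/md0-shrink.backup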
OK, it looks like we now have access to the raid. At least the first files we checked looked good. So here is what we did:
The raid recovery article on the kernel.org wiki suggests two possible solutions for our problem:
using --assemble --force (also mentioned by derobert)
The article says:
[...] If the event count differs by less than 50, then the information on the drive is probably still ok. [...] If the event count closely matches but not exactly, use "mdadm --assemble --force /dev/mdX " to force mdadm to assemble the array [...]. If the event count of a drive is way off [...] that drive [...] shouldn't be included in the assembly.
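The event counts the article refers to can be read straight from the member superblocks; the egrep pattern below is just one way to pick them out:
# show the event count of every member device
mdadm --examine /dev/sd[acdefghij] | egrep '/dev/sd|Events'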
In our case the drive sde had an event count difference of 9, so there was a good chance that --force would work. However, after we executed the --add command, the event count dropped to 0 and the drive was marked as a spare. So we refrained from using --force.
recreate the array
This solution is explicitly marked as dangerous because you can lose data if you do something wrong. However, this seemed to be the only option we had.
The idea is to create a new raid on the existing raid devices (that is, overwriting the devices' superblocks) with the same configuration as the old raid, and to explicitly tell mdadm that the raid already existed and should be assumed to be clean.
Since the event count difference was just 9, and the only problem was that we lost the superblock of sde, there was a good chance that writing new superblocks would get us access to our data... and it worked :-)
Our solution
Note: This solution was specifically geared to our problem and may not work on your setup. Take these notes to get an idea of how things can be done, but you need to research what is best in your case.
Backup
We had already lost a superblock, so this time we saved the first and last gigabyte of every raid device (sd[acdefghij]) using dd before working on the raid. For each device we did the following:
# save the first gigabyte of sda
dd if=/dev/sda of=bak_sda_start bs=4096 count=262144
# determine the size of the device
fdisk -l /dev/sda
# In this case the size was 4000787030016 bytes.
# To get the last gigabyte we need to skip everything except the last gigabyte.
# So we need to skip: 4000787030016 byte - 1073741824 byte = 3999713288192 byte.
# Since we read blocks of 4096 bytes we need to skip 3999713288192/4096 = 976492502 blocks.
dd if=/dev/sda of=bak_sda_end bs=4096 skip=976492502
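Should the recreation go wrong, the saved images can be written back the same way; this is only a sketch of what we would have done, we never actually needed it:
# restore the first gigabyte of sda (only if something went badly wrong!)
dd if=bak_sda_start of=/dev/sda bs=4096 count=262144
# restore the last gigabyte at the same offset it was read from
dd if=bak_sda_end of=/dev/sda bs=4096 seek=976492502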
Gather information
When recreating the raid it is important to use the same configuration as the old raid. This is especially important if you want to recreate the array on another machine using a different mdadm version. In that case mdadm's default values may differ and could create superblocks that do not match the existing raid (see the wiki article).
In our case we used the same machine (and thus the same mdadm version) to recreate the array. However, the array was originally created by a third-party tool, so we did not want to rely on default values and had to gather some information about the existing raid.
From the output of mdadm --examine /dev/sd[acdefghij] we get the following information about the raid (Note: sdb was the SSD containing the OS and was not part of the raid):
Raid Level : raid5
Raid Devices : 9
Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
The Used Dev Size is denominated in blocks of 512 bytes. You can check this:
7814034432*512/1000000000 ~= 4000.79
But mdadm requires the size in kibibytes: 7814034432*512/1024 = 3907017216
The Device Role is important: in the new raid, each device must have the same role as before. In our case:
device role
------ ----
sda 0
sdc 1
sdd 2
sde 3
sdf 4
sdg 5
sdh 6
sdi spare
sdj 8
Note: Drive letters (and thus the order) can change after reboot!
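Because of that, it is worth re-checking the mapping right before recreating; the commands below are just one way to tie the current drive letters to their roles and to the physical disks:
# re-check which role each current drive letter has
mdadm --examine /dev/sd[acdefghij] | egrep '/dev/sd|Device Role'
# tie the letters to physical disks via their serial numbers
ls -l /dev/disk/by-id/ | grep -v part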
We also need the layout and the chunk size in the next step.
Recreate raid
We can now use the information from the last step to recreate the array:
mdadm --create --assume-clean --level=5 --raid-devices=9 --size=3907017216 \
--chunk=512 --layout=left-symmetric /dev/md127 /dev/sda /dev/sdc /dev/sdd \
/dev/sde /dev/sdf /dev/sdg /dev/sdh missing /dev/sdj
It is important to pass the devices in the correct order!
Moreover, we did not add sdi as its event count was too low, so we set raid slot 7 to missing. Thus the raid5 contains 8 of 9 devices and will be assembled in degraded mode. And because it lacks a spare device, no rebuild will start automatically.
Then we used --examine to check whether the new superblocks matched our old superblocks. And they did :-) We were able to mount the filesystem and read the data. The next step is to back up the data, then add sdi back and start the rebuild.
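Re-adding the device and watching the rebuild would look roughly like this (array name and remaining device from above; adapt to your setup):
# add sdi back into the degraded array; the rebuild starts automatically
mdadm /dev/md127 --add /dev/sdi
# follow the rebuild progress
watch cat /proc/mdstat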