Linux – Reusing older raid discs

linuxmdadmraidsoftware-raid

I have a software RAID5 array (Linux md) on 4 disks; the OS is OpenSUSE 12.3.

Recently I replaced a disc with a larger one (fail disc, remove it, add new disc). The capacity of the array has not yet increased as I didn't replace all discs. Now the new one seems to be bad and I want to replace it with the old one. Can I simply remove the new one (fail, remove) and replace it with the unchanged old one or should I format the old one before adding it to the array? The array has a bitmap.

What would happen if I simply shut down the machine and replaced the discs without using mdadm?

Best Answer

If the bitmap has not changed when the old disk was replaced by the new one, it should work to mark the disk as failed and remove it from the array.

mdadm -f /dev/md0 /dev/sda1
mdadm -r /dev/md0 /dev/sda1

Then replace the disk and add the old one to the array:

mdadm --add /dev/md0 /dev/sde1

I think that shutting down the machine and replacing the disks would also work, but the mdadm method has the advantage that the disks can be hot-plugged if supported by the machine.

Related Solutions

Linux – RAID 5 with 4 disks fails to operate with one failed disk

This is a fundamental problem with RAID5—bad blocks on rebuild are a killer.

Oct  2 15:08:51 it kernel: [1686185.573233] md/raid:md0: device xvdc operational as raid disk 0
Oct  2 15:08:51 it kernel: [1686185.580020] md/raid:md0: device xvde operational as raid disk 2
Oct  2 15:08:51 it kernel: [1686185.588307] md/raid:md0: device xvdd operational as raid disk 1
Oct  2 15:08:51 it kernel: [1686185.595745] md/raid:md0: allocated 4312kB
Oct  2 15:08:51 it kernel: [1686185.600729] md/raid:md0: raid level 5 active with 3 out of 4 devices, algorithm 2
Oct  2 15:08:51 it kernel: [1686185.608928] md0: detected capacity change from 0 to 2705221484544
⋮

The array has been assembled, degraded. It has been assembled with xvdc, xvde, and xvdd. Apparently, there is a hot spare:

Oct  2 15:08:51 it kernel: [1686185.615772] md: recovery of RAID array md0
Oct  2 15:08:51 it kernel: [1686185.621150] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Oct  2 15:08:51 it kernel: [1686185.627626] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Oct  2 15:08:51 it kernel: [1686185.634024]  md0: unknown partition table
Oct  2 15:08:51 it kernel: [1686185.645882] md: using 128k window, over a total of 880605952k.

The 'partition table' message is unrelated. The other messages are telling you that md is attempting to do a recovery, probably on to a hot spare (which might be the device that failed out before, if you've attempted to remove/re-add it).

⋮
Oct  2 15:24:19 it kernel: [1687112.817845] end_request: I/O error, dev xvde, sector 881423360
Oct  2 15:24:19 it kernel: [1687112.820517] raid5_end_read_request: 1 callbacks suppressed
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: read error not correctable (sector 881423360 on xvde).
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: Disk failure on xvde, disabling device.
Oct  2 15:24:19 it kernel: [1687112.821837] md/raid:md0: Operation continuing on 2 devices.

And this here is md attempting to read a sector from xvde (one of the remaining three devices). That fails [bad sector, probably], and md (since the array is degraded) can not recover. It thus kicks the disk out of the array, and with a double-disk failure, your RAID5 is dead.

I'm not sure why its being labeled as a spare—that's weird (though, I guess I normally look at /proc/mdstat, so maybe that's just how mdadm labels it). Also, I thought newer kernels were much more hesitant to kick out for bad blocks—but maybe you're running something older?

What can you do about this?

Good backups. That's always an important part of any strategy to keep data alive.

Make sure that the array gets scrubbed for bad blocks routinely. Your OS may already include a cron job for this. You do this by echoing either repair or check to /sys/block/md0/md/sync_action. "Repair" will also repair any discovered parity errors (e.g., the parity bit doesn't match with the data on the disks).

# echo repair > /sys/block/md0/md/sync_action
#

Progress can be watched with cat /proc/mdstat, or the various files in that sysfs directory. (You can find somewhat up-to-date documentation at the Linux Raid Wiki mdstat article.

NOTE: On older kernels—not sure the exact version—check may not fix bad blocks.

One final option is to switch to RAID6. This will require another disk (you can run a four- or even three-disk RAID6, you probably don't want to). With new enough kernels, bad blocks are fixed on the fly when possible. RAID6 can survive two disk failures, so when one disk has failed, it can still survive a bad block—and thus it'll both map out the bad block and continue the rebuild.

Mdadm; previously working; after “failure”, cannot join array due to disk size

So uh... I guess... well... the disks... shrank?

The area mdadm reserves for metadata by default probably grew... I've had some cases recently where mdadm wasted a whopping 128MiB for no apparent reason. You want to check mdadm --examine /dev/device* for the data offset entry. Ideally it should be no more than 2048 sectors.

If that is indeed the problem, you could use mdadm --create along with the --data-offset= parameter to make mdadm waste less space for metadata.

If that's still not sufficient, you'd have to either try your luck with the old 0.90 metadata (which might be the most space efficient as it uses no such offset), or shrink the other side of the RAID a little (remember to shrink the LV / filesystem first).

Best Answer

Related Solutions

Linux – RAID 5 with 4 disks fails to operate with one failed disk

What can you do about this?

Mdadm; previously working; after “failure”, cannot join array due to disk size

Related Question