CentOS – mdadm: can't remove components in RAID 1

centos, mdadm, raid, raid1, software-raid

I have my /boot partition in a RAID 1 array using mdadm. This array has degraded a few times in the past, and every time I remove the failed physical drive and add a new one to bring the array back to normal, the new drive gets a new drive letter, leaving the old one still listed in the array as failed. I can't seem to remove all of the components that no longer exist.

[root@xxx ~]# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 sdg1[10] sde1[8](F) sdb1[7](F) sdd1[6](F) sda1[4] sdc1[5]
      358336 blocks super 1.0 [4/3] [UUU_]

Here's what I've tried in order to remove the non-existent drives and partitions, using /dev/sdb1 as an example.

[root@xxx ~]# mdadm /dev/md0 -r /dev/sdb1
mdadm: Cannot find /dev/sdb1: No such file or directory
[root@xxx ~]# mdadm /dev/md0 -r faulty
mdadm: Cannot find 8:49: No such file or directory
[root@xxx ~]# mdadm /dev/md0 -r detached
mdadm: Cannot find 8:49: No such file or directory

That 8:49, I believe, refers to the major and minor numbers shown in --detail, but I'm not quite sure where to go from here. I'm trying to avoid a reboot or restarting the array.
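For reference, major 8 is the Linux sd (SCSI/SATA disk) driver, and minor numbers are allocated in blocks of 16 per whole disk (sda = 0-15, sdb = 16-31, sdc = 32-47, sdd = 48-63, and so on), so 8:49 would have been the old sdd1 and 8:17 the old sdb1. For devices that still have nodes, the mapping can be checked directly (these commands are just illustrative):

ls -l /dev/sd?1                  # the "8, 17"-style pair appears where the file size normally is
lsblk -o NAME,MAJ:MIN,SIZE,TYPE  # same information in table form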

[root@xxx ~]# mdadm --detail /dev/md0 
/dev/md0:
        Version : 1.0
  Creation Time : Thu Aug  8 18:07:35 2013
     Raid Level : raid1
     Array Size : 358336 (350.00 MiB 366.94 MB)
  Used Dev Size : 358336 (350.00 MiB 366.94 MB)
   Raid Devices : 4
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Sat Apr 18 16:44:20 2015
          State : clean, degraded 
 Active Devices : 3
Working Devices : 3
 Failed Devices : 3
  Spare Devices : 0

           Name : xxx.xxxxx.xxx:0  (local to host xxx.xxxxx.xxx)
           UUID : 991eecd2:5662b800:34ba96a4:2039d40a
         Events : 694

    Number   Major   Minor   RaidDevice State
       4       8        1        0      active sync   /dev/sda1
      10       8       97        1      active sync   /dev/sdg1
       5       8       33        2      active sync   /dev/sdc1
       6       0        0        6      removed

       6       8       49        -      faulty
       7       8       17        -      faulty
       8       8       65        -      faulty

Note: The array is legitimately degraded right now and I'm getting a new drive in there as we speak. However, as you can see above, that shouldn't matter. I should still be able to remove /dev/sdb1 from this array.

Best Answer

It's because the device nodes no longer exist on your system (probably udev removed them when the drive died). You should be able to remove them by using the keyword failed or detached instead:

mdadm -r /dev/md0 failed     # all failed devices
mdadm -r /dev/md0 detached   # failed ones that aren't in /dev anymore
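
Once the stale entries are removed, the same commands from the question will confirm it:

cat /proc/mdstat          # the (F) members should no longer be listed
mdadm --detail /dev/md0   # Failed Devices should drop back to 0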

If your version of mdadm is too old to accept those keywords, you might be able to get it to work by using mknod to recreate the device node. Or, honestly, just ignore it; it's not really a problem, and it should go away the next time you reboot.
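
A minimal sketch of that mknod workaround, assuming the stale member is the old /dev/sdb1 with major 8 and minor 17 (the Major/Minor columns in --detail above):

mknod /dev/sdb1 b 8 17        # temporarily recreate the missing block device node
mdadm /dev/md0 -r /dev/sdb1   # mdadm can now find the node and remove the stale member
rm /dev/sdb1                  # clean up the temporary node afterwards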
