CentOS – mdadm: can't remove components in RAID 1

centos, mdadm, raid, raid1, software-raid

I have my /boot partition in a RAID 1 array using mdadm. This array has degraded a few times in the past, and every time I remove the failed physical drive and add a new one to bring the array back to normal, the new drive gets a new drive letter, leaving the old one still listed in the array as failed. I can't seem to remove all of the components that no longer exist.

[root@xxx ~]# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 sdg1[10] sde1[8](F) sdb1[7](F) sdd1[6](F) sda1[4] sdc1[5]
      358336 blocks super 1.0 [4/3] [UUU_]

Here's what I've tried in order to remove the non-existent drives and partitions, using /dev/sdb1 as an example.

[root@xxx ~]# mdadm /dev/md0 -r /dev/sdb1
mdadm: Cannot find /dev/sdb1: No such file or directory
[root@xxx ~]# mdadm /dev/md0 -r faulty
mdadm: Cannot find 8:49: No such file or directory
[root@xxx ~]# mdadm /dev/md0 -r detached
mdadm: Cannot find 8:49: No such file or directory

That 8:49, I believe, refers to the major and minor numbers shown in --detail, but I'm not quite sure where to go from here. I'm trying to avoid a reboot or restarting the array.
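For reference, major 8 is the Linux sd (SCSI/SATA disk) driver, and minor numbers are allocated in blocks of 16 per whole disk (sda = 0-15, sdb = 16-31, sdc = 32-47, sdd = 48-63, and so on), so 8:49 would have been the old sdd1 and 8:17 the old sdb1. For devices that still have nodes, the mapping can be checked directly (these commands are just illustrative):

ls -l /dev/sd?1                  # the "8, 17"-style pair appears where the file size normally is
lsblk -o NAME,MAJ:MIN,SIZE,TYPE  # same information in table form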

[root@xxx ~]# mdadm --detail /dev/md0 
/dev/md0:
        Version : 1.0
  Creation Time : Thu Aug  8 18:07:35 2013
     Raid Level : raid1
     Array Size : 358336 (350.00 MiB 366.94 MB)
  Used Dev Size : 358336 (350.00 MiB 366.94 MB)
   Raid Devices : 4
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Sat Apr 18 16:44:20 2015
          State : clean, degraded 
 Active Devices : 3
Working Devices : 3
 Failed Devices : 3
  Spare Devices : 0

           Name : xxx.xxxxx.xxx:0  (local to host xxx.xxxxx.xxx)
           UUID : 991eecd2:5662b800:34ba96a4:2039d40a
         Events : 694

    Number   Major   Minor   RaidDevice State
       4       8        1        0      active sync   /dev/sda1
      10       8       97        1      active sync   /dev/sdg1
       5       8       33        2      active sync   /dev/sdc1
       6       0        0        6      removed

       6       8       49        -      faulty
       7       8       17        -      faulty
       8       8       65        -      faulty

Note: The array is legitimately degraded right now and I'm getting a new drive in there as we speak. However, as you can see above, that shouldn't matter. I should still be able to remove /dev/sdb1 from this array.

Best Answer

It's because the device nodes no longer exist on your system (probably udev removed them when the drive died). You should be able to remove them by using the keyword failed or detached instead:

mdadm -r /dev/md0 failed     # all failed devices
mdadm -r /dev/md0 detached   # failed ones that aren't in /dev anymore
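
Once the stale entries are removed, the same commands from the question will confirm it:

cat /proc/mdstat          # the (F) members should no longer be listed
mdadm --detail /dev/md0   # Failed Devices should drop back to 0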

If your version of mdadm is too old to accept those keywords, you might be able to get it to work by using mknod to recreate the device node. Or, honestly, just ignore it; it's not really a problem, and it should go away the next time you reboot.
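
A minimal sketch of that mknod workaround, assuming the stale member is the old /dev/sdb1 with major 8 and minor 17 (the Major/Minor columns in --detail above):

mknod /dev/sdb1 b 8 17        # temporarily recreate the missing block device node
mdadm /dev/md0 -r /dev/sdb1   # mdadm can now find the node and remove the stale member
rm /dev/sdb1                  # clean up the temporary node afterwards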
