Ubuntu – mdadm warning (system unbootable) from update-initramfs, mkconf’s suggested fix seems inconsistent with mdadm’s description of problem

Tags: boot, initramfs, mdadm, raid, uuid

Summary: update-initramfs warns that the system is unbootable and points to mkconf, but mkconf's suggested rewrite of mdadm.conf looks like it would break the RAID. The system is up (for now), but the next reboot may kill it, and it is unclear how to proceed. mdadm.conf looks good to me, so what is the update-initramfs warning telling me, and why does mkconf seem to suggest something bad?

I have a dedicated server at 1and1.com running Ubuntu 12.04, and "update-initramfs -u" reports an mdadm error message indicating the server will not reboot properly. I've looked at the relevant configuration files and have not been able to identify the problem. I haven't tried to reboot since seeing this message, because I do not want to "just see what happens" on a server I can't physically access (and possibly make diagnosis even more difficult, if I lose access to a running system that I can probe for information). I feel I should try to understand the error message and the system configuration until I have confidence that a reboot is likely to succeed.

First, the error message (from update-initramfs -u):

W: mdadm: the array /dev/md3 with UUID dffcb503:bc157198:3fb6082e:e5593158
W: mdadm: is currently active, but it is not listed in mdadm.conf. if
W: mdadm: it is needed for boot, then YOUR SYSTEM IS NOW UNBOOTABLE!
W: mdadm: please inspect the output of /usr/share/mdadm/mkconf, compare
W: mdadm: it to /etc/mdadm/mdadm.conf, and make the necessary changes.
W: mdadm: the array /dev/md1 with UUID a46d442b:4e5b8a52:3fb6082e:e5593158
W: mdadm: is currently active, but it is not listed in mdadm.conf. if
W: mdadm: it is needed for boot, then YOUR SYSTEM IS NOW UNBOOTABLE!
W: mdadm: please inspect the output of /usr/share/mdadm/mkconf, compare
W: mdadm: it to /etc/mdadm/mdadm.conf, and make the necessary changes.

I concentrate below on md1 since that's where /boot is located (so "needed for boot" === TRUE), but the same error message is also generated for md3.

The md structure is from the ISP's default Ubuntu image, this part of the system has not been touched. The only change to the drive/partition structure was expanding the size of logical drives (lvextend and resize2fs), which (although it might affect other things) would not seem to affect "/" (on md1) and its ability to boot.
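For context, that resize was roughly along these lines (the volume and size shown here are illustrative, not the actual values used):

lvextend -L +50G /dev/vg00/home
resize2fs /dev/vg00/home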

cat /etc/fstab

/dev/md1    /       ext3    defaults       1 1
/dev/sda2   none        swap    sw          
/dev/sdb2   none        swap    sw          
/dev/vg00/usr   /usr        ext4    errors=remount-ro       0 2
/dev/vg00/var   /var        ext4    errors=remount-ro       0 2
/dev/vg00/home  /home       ext4    errors=remount-ro   0 2

proc /proc proc nodev,noexec,nosuid 0 0

We can see the md system running properly, with md1 on sda1 and sdb1:

cat /proc/mdstat
-----
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid1 sdb1[1] sda1[0]
      4194240 blocks [2/2] [UU]

md3 : active raid1 sdb3[0] sda3[1]
      1458846016 blocks [2/2] [UU]

It seems that these md's are defined in ARRAY lines in mdadm.conf:

cat /etc/mdadm/mdadm.conf

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays

# This file was auto-generated on Tue, 11 May 2010 20:53:30 +0200
# by mkconf $Id$

ARRAY /dev/md1 level=raid1 num-devices=2 devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md3 level=raid1 num-devices=2 devices=/dev/sda3,/dev/sdb3

The most recent initrd in /boot is initrd.img-3.2.0-37-generic, and the mdadm.conf cached there looks identical (checked via "gunzip -c /boot/initrd.img-3.2.0-37-generic | cpio -i --quiet --to-stdout etc/mdadm/mdadm.conf").
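(A quick way to do that comparison, assuming the same paths, is to diff the cached copy against the live file; no output means they are identical:)

gunzip -c /boot/initrd.img-3.2.0-37-generic | cpio -i --quiet --to-stdout etc/mdadm/mdadm.conf | diff - /etc/mdadm/mdadm.conf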

So the actual situation (the running md's and how they are defined for boot) looks fine to me. Going back to the "update-initramfs -u" error message, it suggests comparing mdadm.conf to the output of /usr/share/mdadm/mkconf. This is where we start to see something that looks really different:

/usr/share/mdadm/mkconf

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays

If I'm reading this correctly, the system's proposed rewrite of mdadm.conf (to fix that md1 and md3 are "currently active, but … not listed in mdadm.conf") would DROP the listing of md1 and md3 from mdadm.conf. So I can't square the error message with the proposed fix: the message says the arrays need to go from unlisted to listed, while the proposed fix would take them from listed to unlisted. My inability to (1) find any actual problem, and (2) reconcile the error message with the proposed fix, makes me distrust the output of /usr/share/mdadm/mkconf (and the error message from update-initramfs -u that directs me there). But I don't want to ignore the system calling for help, especially on a part of the system that is so critical. I trust that the OS knows something that I don't, and experimentation (a remote reboot) is a last resort.

When searching online for others with similar error messages, the related issues seem to involve mkconf generating ARRAY lines that differ from what is presently in mdadm.conf (and using mkconf's output is generally the recommended fix for the ARRAY lines). But in this case mkconf produces no ARRAY lines at all, so that line of research has not led to relevant assistance. Comments in mdadm.conf say that it scans for MD superblocks by default, so the fact that the generated file omits explicit reference to md1/md3 is perhaps okay (?) if mdadm can draw that information from the superblocks. But if so, why does the error message say the problem is that md1/md3 are unlisted, and what is wrong with the present configuration (why is there an error message at all)? So that line of thought (trying to understand how a generated file without ARRAY lines might help) has not worked out either.
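One way to see what mdadm can actually read from the superblocks on disk (as opposed to what is written in mdadm.conf) should be something like the following; the output shown is what I would expect given the --detail output further down, not a verbatim capture:

mdadm --examine --scan
-----
ARRAY /dev/md1 UUID=a46d442b:4e5b8a52:3fb6082e:e5593158
ARRAY /dev/md3 UUID=dffcb503:bc157198:3fb6082e:e5593158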

This is perhaps barking up the wrong tree, but are device names like sda1 still allowed on ARRAY lines in mdadm.conf? I know that UUIDs are preferred; could the use of device names be the cause of the error message? If so, which option might work: (1) no change to mdadm.conf, relying on the system continuing to assemble md1 based on device names; (2) use the output of mkconf, with no explicit md's at all (no device names, no UUIDs), relying on automatic discovery based on superblocks; or (3) find the UUIDs and write new ARRAY lines for mdadm.conf (which would be neither the existing values nor the mkconf-proposed replacement)?
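For option (3), I believe mdadm can generate ARRAY lines from the currently running arrays (rather than by scanning the superblocks), which could then be pasted into mdadm.conf; again, the output shown is what I would expect based on the --detail output below, not a capture:

mdadm --detail --scan
-----
ARRAY /dev/md1 UUID=a46d442b:4e5b8a52:3fb6082e:e5593158
ARRAY /dev/md3 UUID=dffcb503:bc157198:3fb6082e:e5593158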

How should the cause of this error message be identified, what is it trying to tell me?

Additional information that might be useful:

mdadm --misc --detail /dev/md1

/dev/md1:
        Version : 0.90
  Creation Time : Sun Feb 24 19:11:59 2013
     Raid Level : raid1
     Array Size : 4194240 (4.00 GiB 4.29 GB)
  Used Dev Size : 4194240 (4.00 GiB 4.29 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sun Apr 27 23:39:38 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : a46d442b:4e5b8a52:3fb6082e:e5593158
         Events : 0.122

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1

mdadm --misc --detail /dev/md3

/dev/md3:
        Version : 0.90
  Creation Time : Sun Feb 24 19:11:59 2013
     Raid Level : raid1
     Array Size : 1458846016 (1391.26 GiB 1493.86 GB)
  Used Dev Size : 1458846016 (1391.26 GiB 1493.86 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Sun Apr 27 23:43:41 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : dffcb503:bc157198:3fb6082e:e5593158
         Events : 0.1883

    Number   Major   Minor   RaidDevice State
       0       8       19        0      active sync   /dev/sdb3
       1       8        3        1      active sync   /dev/sda3

Best Answer

I found a solution here: http://www.howtoforge.com/forums/showthread.php?t=65066

Obtain the UUID for each array in question with the command mdadm --misc --detail /dev/mdX (where X is the array number), then edit /etc/mdadm/mdadm.conf and replace the existing ARRAY lines with:

ARRAY /dev/md1 UUID=a46d442b:4e5b8a52:3fb6082e:e5593158
ARRAY /dev/md3 UUID=dffcb503:bc157198:3fb6082e:e5593158

Replace my /dev/mdX devices and UUIDs with yours. I just did this on one of mine and it worked. I'm posting this not really for you (you likely solved it ages ago) but for anyone else this has happened to.
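One follow-up step that seems worth adding, implied by the warning itself: after editing /etc/mdadm/mdadm.conf, regenerate the initramfs so the corrected file is copied into the initrd, and confirm the warning no longer appears:

update-initramfs -u

The gunzip/cpio check from the question can then be repeated against the new initrd to verify that the cached copy matches.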
