mdadm: previously working; after “failure”, cannot join array due to disk size

mdadm, raid, software-raid

Abstract

I had a functional RAID 5 array; I rebooted the box, and then mdadm couldn't re-assemble one component.

Seeing that it was only one part, I thought it would be easy to just re-sync. But that turned out not to work, because apparently now the device is not large enough to join the array!?

Initial Raid Setup

Sadly, it is rather complicated. I have a RAID 5 combining two 3 TB disks with two linear arrays (each consisting of a 1 TB and a 2 TB disk).
I did not partition the disks; that is, the arrays span the physical disks directly. In hindsight this is probably what caused the initial failure.
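
For illustration, the layout was roughly equivalent to the following (only /dev/md0, /dev/md2, /dev/sda and /dev/sdc correspond to names that appear later; the other device names are placeholders):

# mdadm --create /dev/md1 --level=linear --raid-devices=2 /dev/sdd /dev/sde
# mdadm --create /dev/md2 --level=linear --raid-devices=2 /dev/sda /dev/sdc
# mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sdf /dev/sdg /dev/md1 /dev/md2

So md0 is the RAID 5 over the two 3 TB disks plus the two linear arrays.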

After the fateful reboot

mdadm would refuse to assemble one of the linear arrays, claiming that no superblock existed (checking with mdadm --examine on both components didn't return anything). Stranger yet, they apparently still had some partition-table remnants on them.
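
For reference, the check amounted to little more than this, run on the two component disks:

# mdadm --examine /dev/sda /dev/sdc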

At this point I thought the quickest solution would be to just re-create the linear array, add it to the bigger RAID 5 array, and have it re-sync.
Hence I opted to just remove those partition table entries, that is, wipe the disks back to free space.
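Something along these lines (illustrative; wipefs is just one way to clear stale signatures, deleting the partitions with fdisk achieves the same):

# wipefs -a /dev/sda
# wipefs -a /dev/sdc

Then I created a linear array spanning both of the disks: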

# mdadm --create /dev/md2 --level=linear --raid-devices=2 /dev/sda /dev/sdc

However, when I try to add it back to the array, I get

# mdadm --add /dev/md0 /dev/md2        
mdadm: /dev/md2 not large enough to join array

So am I correct in guessing that the disks shrank?

Counting blocks

I guess it's time for some block counts!

The two components of the linear array:

RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0   1000204886016   /dev/sda
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw   256   512  4096          0   2000398934016   /dev/sdc
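
(For reference, a listing like the one above can be produced with blockdev, e.g.:)

# blockdev --report /dev/sda /dev/sdc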

If mdadm's linear mode had no overhead, the resulting array (the sum of the two sizes, 3000603820032 bytes) would be bigger than one of the 3 TB drives (3000592982016 bytes). But that is not the case:

/proc/mdstat reports that the linear array has a size of 2930015024 KiB, which is 120016 KiB less than the required size:

# mdadm --detail /dev/md0 | grep Dev\ Size
Used Dev Size : 2930135040 (2794.39 GiB 3000.46 GB)
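
To spell the arithmetic out (all values in 1 KiB blocks, taken straight from the outputs above):

(1000204886016 + 2000398934016) / 1024 = 2930277168    raw capacity of sda + sdc
2930277168 - 2930015024 = 262144                       lost somewhere (= 256 MiB)
2930135040 - 2930015024 = 120016                       shortfall vs. the required size (~117 MiB)

So roughly a quarter of a gigabyte of raw capacity is going somewhere other than data.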

But that… is terribly fishy! Before the reboot, an (albeit earlier) incarnation of this linear array was part of the bigger array!

What I believe happened

After the reboot, mdadm recognized that a part of the array was missing. Since it was the smallest member, the array device size was automagically grown to fill up the next smallest device.

But that does not sound like uh, sensible behavior, does it?

An alternative explanation would be that for some reason I am no longer creating a maximum-size linear array, but… that seems sort of nonsensical as well.

What I have been pondering doing

Shrink the degraded array to exclude the "broken" linear array and then try to --add and --grow again. But I am afraid that does not actually change the device size.
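
The thing that would actually change the device size is presumably --grow with an explicit --size (given in KiB), something like this (untested, and the filesystem / LV on top would have to be shrunk accordingly first):

# mdadm --grow /dev/md0 --size=2930015024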

Since I do not understand what exactly went wrong, I would prefer to first understand what caused this problem in the first place before doing anything hasty.

Best Answer

So uh... I guess... well... the disks... shrank?

The area mdadm reserves for metadata by default probably grew... I've had some cases recently where mdadm wasted a whopping 128 MiB for no apparent reason. You want to check mdadm --examine /dev/device* for the Data Offset entry. Ideally it should be no more than 2048 sectors.
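
For example, with the device names from the question:

# mdadm --examine /dev/sda /dev/sdc | grep Offset

With 1.x metadata this prints a "Data Offset : ... sectors" line for each device.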

If that is indeed the problem, you could use mdadm --create along with the --data-offset= parameter to make mdadm waste less space for metadata.
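
A sketch of that (untested here; re-creating /dev/md2 is acceptable in this case because its data is gone anyway, and --data-offset needs a reasonably recent mdadm):

# mdadm --stop /dev/md2
# mdadm --create /dev/md2 --level=linear --raid-devices=2 --data-offset=1M /dev/sda /dev/sdc

1M means one megabyte per member, i.e. the 2048 sectors mentioned above.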

If that's still not sufficient, you'd have to either try your luck with the old 0.90 metadata (which might be the most space efficient as it uses no such offset), or shrink the other side of the RAID a little (remember to shrink the LV / filesystem first).
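
The 0.90 variant would look something like this (again only a sketch; 0.90 superblocks live at the end of each device, so there is no data offset at the front, but the format has its own limits, e.g. roughly 2 TiB per component):

# mdadm --create /dev/md2 --level=linear --raid-devices=2 --metadata=0.90 /dev/sda /dev/sdc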
