LVM not coming up after reboot, couldn’t find device with uuid

data-recovery hard-disk lvm

Had a VM that was, up until recently, working without issue, but it needed to be rebooted after some configuration changes. However, after rebooting, the VM didn't come back up, complaining that it couldn't find the root device (which is an LVM volume under /dev/mapper).

Booting into recovery mode, I saw that the device nodes under /dev/mapper and /dev/dm-* did indeed not exist.

The disk should be laid out as follows; a read-only way to confirm this from the recovery shell is sketched after the list:

  • /dev/sda1 as the boot partition
  • /dev/sda2 as an extended partition containing /dev/sda5 and /dev/sda6 (the LVM partitions)
  • /dev/sda{5,6} are both PVs in a single VG
  • the VG holds two LVs, for the root FS and swap
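
For reference, a read-only way to double-check that layout from the recovery shell (on a minimal busybox shell the LVM tools may need to be invoked as lvm pvs, lvm vgs, and so on):

fdisk -l /dev/sda   # sda1 (boot), sda2 (extended), sda5 and sda6 with type 8e (Linux LVM)
pvs                 # both /dev/sda5 and /dev/sda6 should show up as PVs in the one VG
vgs                 # the single VG spanning both PVs
lvs                 # the two LVs, root and swap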

Doing an lvm pvdisplay gives me:

  Couldn't find device with uuid '8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi'.
  Couldn't find device with uuid '8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi'.
  Couldn't find device with uuid '8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi'.
  --- Physical volume ---
  PV Name               unknown device
  VG Name               of1-server-lucid
  PV Size               19.76 GiB / not usable 2.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              5058
  Free PE               0
  Allocated PE          5058
  PV UUID               8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi

  --- Physical volume ---
  PV Name               /dev/sda6
  VG Name               of1-server-lucid
  PV Size               100.00 GiB / not usable 2.66 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              25599
  Free PE               0
  Allocated PE          25599
  PV UUID               cuhP6R-QbiO-U7ye-WvXN-ZNq5-cqUs-VVZpux

So it appears as though /dev/sda5 is no longer recognized as a PV (its entry shows up as "unknown device"), which is what's causing the errors.
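
The UUID it's complaining about can be cross-checked against the metadata archives LVM keeps automatically (the archive filename here is the one I use for the restore further down; adjust to whichever file matches the VG):

ls /etc/lvm/archive/ /etc/lvm/backup/                            # automatic metadata archives/backups
grep -A3 'pv[01] {' /etc/lvm/archive/of1-dev-server_00000.vg     # each pvN stanza holds the expected id = "..." and device = "..."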

fdisk -l:

Disk /dev/sda: 128.8 GB, 128849018880 bytes
255 heads, 63 sectors/track, 15665 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00044a6c

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          32      248832   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              32       15665   125579256+   5  Extended
/dev/sda5              32        2611    20722970   8e  Linux LVM
/dev/sda6            2612       15665   104856223+  8e  Linux LVM

So I can see the /dev/sda5 device exists, but blkid isn't reporting anything for it:

~ # blkid
/dev/sda1: UUID="d997d281-2909-41d3-a835-dba400e7ceec" TYPE="ext2" 
/dev/sda6: UUID="cuhP6R-QbiO-U7ye-WvXN-ZNq5-cqUs-VVZpux" TYPE="LVM2_member" 

After taking a snapshot of the disks, I tried recovering the PV from the archive config:

~ # pvremove -ff /dev/sda5
Labels on physical volume "/dev/sda5" successfully wiped
~ # pvcreate --uuid=8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi /dev/sda5 --restorefile=/etc/lvm/archive/of1-dev-server_00000.vg
Couldn't find device with uuid '8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi'.
  Physical volume "/dev/sda5" successfully created
~ # vgchange -a y
2 logical volume(s) in volume group "of1-dev-server" now active
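
(For what it's worth, the restore procedure in the LVM documentation also writes the archived VG metadata back with vgcfgrestore before activating anything. A sketch of that sequence, reusing the same archive file and the VG name from the output above:)

pvcreate --uuid 8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi \
         --restorefile /etc/lvm/archive/of1-dev-server_00000.vg /dev/sda5
vgcfgrestore -f /etc/lvm/archive/of1-dev-server_00000.vg of1-dev-server   # write the archived metadata back onto the new PV
vgchange -ay of1-dev-server                                               # only then activate the VG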

So at least now the device shows up in blkid:

/dev/sda1: UUID="d997d281-2909-41d3-a835-dba400e7ceec" TYPE="ext2" 
/dev/sda6: UUID="cuhP6R-QbiO-U7ye-WvXN-ZNq5-cqUs-VVZpux" TYPE="LVM2_member" 
/dev/sda5: UUID="8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi" TYPE="LVM2_member" 

Doing a pvdisplay now also shows the correct device:

  --- Physical volume ---
  PV Name               /dev/sda5
  VG Name               of1-dev-danr-lucid
  PV Size               19.76 GiB / not usable 2.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              5058
  Free PE               0
  Allocated PE          5058
  PV UUID               8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi

  --- Physical volume ---
  PV Name               /dev/sda6
  VG Name               of1-dev-danr-lucid
  PV Size               100.00 GiB / not usable 2.66 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              25599
  Free PE               0
  Allocated PE          25599
  PV UUID               cuhP6R-QbiO-U7ye-WvXN-ZNq5-cqUs-VVZpux

And the mapper devices exist:

crw-rw----    1 root     root      10,  59 Jul 10 10:47 control
brw-rw----    1 root     root     252,   0 Jul 10 11:21 of1--dev--server-root
brw-rw----    1 root     root     252,   1 Jul 10 11:21 of1--dev--server-swap_1

The LVs also seem to be listed correctly:

~ # lvdisplay
  --- Logical volume ---
  LV Name                /dev/of1-dev-danr-lucid/root
  VG Name                of1-dev-danr-lucid
  LV UUID                pioKjE-iJEp-Uf9S-0MxQ-UR0H-cG9m-5mLJm7
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                118.89 GiB
  Current LE             30435
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

  --- Logical volume ---
  LV Name                /dev/of1-dev-danr-lucid/swap_1
  VG Name                of1-dev-danr-lucid
  LV UUID                mIq22L-RHi4-tudV-G6nP-T1e6-UQcS-B9hYUF
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                888.00 MiB
  Current LE             222
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1
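
(Another read-only sanity check, just as a sketch: dmsetup can show whether the mapper devices actually point at the expected extents on sda5 and sda6.)

dmsetup table                        # one line per LV segment: start, length, target, backing device:offset
dmsetup info of1--dev--server-root   # open count, state and major:minor for the root LV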

But trying to mount the root device gives me an error:

~ # mount /dev/mapper/of1--dev--server-root /mnt2
mount: mounting /dev/mapper/of1--dev--server-root on /mnt2 failed: Invalid argument
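
(Again only as a sketch, a few read-only checks that can narrow down an "Invalid argument" from mount:)

dmesg | tail                                  # the kernel usually logs a more specific ext4 error
file -s /dev/mapper/of1--dev--server-root     # what, if anything, is at the start of the LV
blkid /dev/mapper/of1--dev--server-root       # does it still identify as an ext4 filesystem?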

So I tried a disk consistency check:

~ # fsck.ext4 -f /dev/mapper/of1--dev--server-root
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/mapper/of1--dev--server-root
[...]

So I used mke2fs -n (which only prints what it would do, without creating a filesystem) to find the backup superblock locations, and then ran fsck against one of them:

~ # mke2fs -n /dev/mapper/of1--dev--server-root
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
7798784 inodes, 31165440 blocks
1558272 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
952 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872
~ # fsck.ext4 -y -b 23887872 /dev/mapper/of1--dev--server-root

Upon which I received a ridiculous number of errors; the main ones I saw were:

  • Superblock has an invalid journal
  • One or more block group descriptor checksums are invalid.
  • Truncating orphaned inode ()
  • Already cleared block #0 () found in orphaned inode
  • /dev/mapper/of1--dev--server-root contains a filesystem with errors, check forced
  • Resize inode not valid. Recreate
  • Root inode is not a directory.
  • Reserved inode 3 () has invalid mode
  • HTREE directory inode has invalid root node
  • Inode , i_blocks is , should be 0.
  • Unconnected directory inode

After a lot of messages it said it was done. Mounting the device as above now works, but the filesystem is empty apart from a lost+found directory full of files; most are just numbers, though some have filenames vaguely relating to files that once existed.
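
(A rough way to see what e2fsck actually salvaged, assuming the LV is still mounted on /mnt2 as above:)

ls /mnt2/lost+found | wc -l                                                        # how many orphans were recovered
file /mnt2/lost+found/* | awk -F: '{print $2}' | sort | uniq -c | sort -rn | head  # rough breakdown by file type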

So, how do I bring the VM back up?

Whenever I see disk errors, my first instinct is to snapshot so things don't get worse, so I have a snapshot from just after the reboot, when I first saw the error.

I know the data is there somewhere, as the VM worked without issue until I rebooted it. The user can't remember changing anything on the filesystem recently, but it had almost a year of uptime when I rebooted it, so all sorts could have happened in that time.

We also, unfortunately, don't have backups as Puppet had been disabled on this node.

The original OS was Ubuntu Lucid, running on VMware.

Best Answer

If I understood correctly, you have already fixed the volume, even though you ended up with a lost+found directory that may or may not contain your critical files.

What is going on now that's blocking the VM from booting? It still can't find the boot device?

Your fdisk -l output seems a bit off to me. Have you considered the possibility that only the partition table was damaged? In this scenario, your snapshot may be helpful, and in the best case you won't even need a(nother) fsck. But we'll need something to try to find the partition offsets - I've used testdisk successfully more than once.
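
Roughly something like this, working on a copy of the snapshotted disk, never the original (the image path is only a placeholder):

dd if=/dev/sda of=/mnt/external/sda.img bs=1M conv=noerror,sync   # image the disk first
testdisk /mnt/external/sda.img                                    # then let testdisk hunt for the partition offsets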

In the worst case scenario, if you need to scrape anything from the volume, forensic tools like PhotoRec or Autopsy/The Sleuth Kit may prove useful.
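
For example (again just a sketch, with a placeholder output directory):

photorec /log /d recovered/ /mnt/external/sda.img   # carve files by signature from the image
fls -r /dev/mapper/of1--dev--server-root            # Sleuth Kit: list whatever directory entries remain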

If none of this works, give us the output of lsblk -o NAME,RM,SIZE,RO,TYPE,MAJ:MIN -fat as well (those flags are just there to show as much information as possible), along with any relevant dmesg output.
