Had a VM that was, up until recently, working without issue, but it needed to be rebooted after some configuration changes. However, after rebooting, the VM didn't come back up, complaining that it couldn't find the root device (an LVM volume under /dev/mapper).
Booting into recovery mode, I saw that the device nodes under /dev/mapper and /dev/dm-* did indeed not exist.
The disk should be laid out as follows:
- /dev/sda1 is the boot partition
- /dev/sda2 is an extended partition containing /dev/sda5 and /dev/sda6 as LVM partitions
- /dev/sda{5,6} are both PVs in a single VG, with two LVs for the root FS and swap
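For reference, a quick way to confirm that layout when everything is healthy (standard lvm2 reporting commands, nothing specific to this VM):

~ # pvs    # should list /dev/sda5 and /dev/sda6, both in the one VG
~ # vgs    # a single VG spanning both PVs
~ # lvs    # the root and swap LVs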
Doing an lvm pvdisplay gives me:
Couldn't find device with uuid '8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi'.
Couldn't find device with uuid '8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi'.
Couldn't find device with uuid '8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi'.
  --- Physical volume ---
  PV Name               unknown device
  VG Name               of1-server-lucid
  PV Size               19.76 GiB / not usable 2.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              5058
  Free PE               0
  Allocated PE          5058
  PV UUID               8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi

  --- Physical volume ---
  PV Name               /dev/sda6
  VG Name               of1-server-lucid
  PV Size               100.00 GiB / not usable 2.66 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              25599
  Free PE               0
  Allocated PE          25599
  PV UUID               cuhP6R-QbiO-U7ye-WvXN-ZNq5-cqUs-VVZpux
So it appears as though /dev/sda5 is not listed as a PV and is causing the errors.
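At this point a more direct probe would have been pvck, which reads the LVM label and metadata areas straight off the device (it's part of the standard lvm2 tools); a sketch:

~ # pvck /dev/sda5                         # check the LVM label/metadata areas on the partition
~ # pvs -a -o pv_name,pv_uuid,vg_name      # -a also lists block devices that are not PVs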
fdisk -l:
Disk /dev/sda: 128.8 GB, 128849018880 bytes
255 heads, 63 sectors/track, 15665 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00044a6c
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          32      248832   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2              32       15665   125579256+   5  Extended
/dev/sda5              32        2611    20722970   8e  Linux LVM
/dev/sda6            2612       15665   104856223+  8e  Linux LVM
So I can see the /dev/sda5 device exists (its 20722970 1-KiB blocks match the 19.76 GiB reported for the missing PV), but blkid isn't reporting anything for it:
~ # blkid
/dev/sda1: UUID="d997d281-2909-41d3-a835-dba400e7ceec" TYPE="ext2"
/dev/sda6: UUID="cuhP6R-QbiO-U7ye-WvXN-ZNq5-cqUs-VVZpux" TYPE="LVM2_member"
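An LVM2 PV carries its label in one of the first four 512-byte sectors (usually the second), marked with the string LABELONE, so a read-only peek can confirm whether the label was wiped. A minimal sketch:

~ # dd if=/dev/sda5 bs=512 count=4 2>/dev/null | strings | grep LABELONE

If that prints nothing, the label really is gone, which would match what blkid sees.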
After taking a snapshot of the disks, I tried recovering the PV from the archive config:
~ # pvremove -ff /dev/sda5
Labels on physical volume "/dev/sda5" successfully wiped
~ # pvcreate --uuid=8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi /dev/sda5 --restorefile=/etc/lvm/archive/of1-dev-server_00000.vg
Couldn't find device with uuid '8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi'.
Physical volume "/dev/sda5" successfully created
~ # vgchange -a y
2 logical volume(s) in volume group "of1-dev-server" now active
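One note on the recovery sequence: pvcreate --restorefile only rewrites the PV label and metadata area layout; the procedure documented for lvm2 also restores the VG metadata from the same archive file before activating, roughly (VG name inferred from the archive filename):

~ # pvcreate --uuid=8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi --restorefile=/etc/lvm/archive/of1-dev-server_00000.vg /dev/sda5
~ # vgcfgrestore -f /etc/lvm/archive/of1-dev-server_00000.vg of1-dev-server
~ # vgchange -ay of1-dev-server

Skipping vgcfgrestore can work when the metadata on the surviving PV (/dev/sda6 here) is current, but the full sequence is the safer habit.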
So at least now the device shows up in blkid:
/dev/sda1: UUID="d997d281-2909-41d3-a835-dba400e7ceec" TYPE="ext2"
/dev/sda6: UUID="cuhP6R-QbiO-U7ye-WvXN-ZNq5-cqUs-VVZpux" TYPE="LVM2_member"
/dev/sda5: UUID="8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi" TYPE="LVM2_member"
Doing a pvdisplay now also shows the correct device:
  --- Physical volume ---
  PV Name               /dev/sda5
  VG Name               of1-dev-danr-lucid
  PV Size               19.76 GiB / not usable 2.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              5058
  Free PE               0
  Allocated PE          5058
  PV UUID               8x38hf-mzd7-xTes-y6IV-xRMr-qrNP-0dNnLi

  --- Physical volume ---
  PV Name               /dev/sda6
  VG Name               of1-dev-danr-lucid
  PV Size               100.00 GiB / not usable 2.66 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              25599
  Free PE               0
  Allocated PE          25599
  PV UUID               cuhP6R-QbiO-U7ye-WvXN-ZNq5-cqUs-VVZpux
And the mapper devices exist:
crw-rw---- 1 root root 10, 59 Jul 10 10:47 control
brw-rw---- 1 root root 252, 0 Jul 10 11:21 of1--dev--server-root
brw-rw---- 1 root root 252, 1 Jul 10 11:21 of1--dev--server-swap_1
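To see how those mapper nodes stack onto the physical partitions, device-mapper can print its dependency tree and the raw table behind each node (dmsetup ships alongside lvm2); a quick sketch:

~ # dmsetup ls --tree
~ # dmsetup table of1--dev--server-root    # the linear segments and their backing devices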
The LVs also seem to be listed correctly:
~ # lvdisplay
  --- Logical volume ---
  LV Name                /dev/of1-dev-danr-lucid/root
  VG Name                of1-dev-danr-lucid
  LV UUID                pioKjE-iJEp-Uf9S-0MxQ-UR0H-cG9m-5mLJm7
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                118.89 GiB
  Current LE             30435
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:0

  --- Logical volume ---
  LV Name                /dev/of1-dev-danr-lucid/swap_1
  VG Name                of1-dev-danr-lucid
  LV UUID                mIq22L-RHi4-tudV-G6nP-T1e6-UQcS-B9hYUF
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                888.00 MiB
  Current LE             222
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:1
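Because the root LV has two segments spanning both PVs, it's worth confirming that each segment still points at the extents it originally occupied; lvs can report that mapping (standard lvm2 segment fields):

~ # lvs -o lv_name,seg_start_pe,seg_size,devices of1-dev-danr-lucid

If the restored segment on /dev/sda5 were mapped to the wrong extents, you'd see exactly these symptoms: an intact-looking LV with unreadable contents.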
But trying to mount the root device gives me an error:
~ # mount /dev/mapper/of1--dev--server-root /mnt2
mount: mounting /dev/mapper/of1--dev--server-root on /mnt2 failed: Invalid argument
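mount's "Invalid argument" generally means the kernel rejected the superblock, and its actual complaint lands in the kernel log; two quick read-only checks:

~ # dmesg | tail                                   # the kernel's reason for refusing the mount
~ # file -s /dev/mapper/of1--dev--server-root      # identify what sits at the start of the LV

If file -s reports only "data" rather than an ext filesystem, the start of the LV has been overwritten.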
So I tried a disk consistency check:
~ # fsck.ext4 -f /dev/mapper/of1--dev--server-root
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/mapper/of1--dev--server-root
[...]
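Before writing anything to disk, dumpe2fs offers a read-only way to probe a backup superblock; it accepts an alternate superblock location and block size (for stock 4 KiB-block ext4, the first backup sits at block 32768). A sketch under those assumptions:

~ # dumpe2fs -o superblock=32768 -o blocksize=4096 /dev/mapper/of1--dev--server-root | head -20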
So I used mke2fs -n to locate the backup superblocks and tried one of them:
~ # mke2fs -n /dev/mapper/of1--dev--server-root
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
7798784 inodes, 31165440 blocks
1558272 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
952 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
~ # fsck.ext4 -y -b 23887872 /dev/mapper/of1--dev--server-root
Upon which I received a ridiculous number of errors; the main ones I saw were:
- Superblock has an invalid journal
- One or more block group descriptor checksums are invalid.
- Truncating orphaned inode ()
- Already cleared block #0 () found in orphaned inode
- /dev/mapper/of1--dev--server-root contains a filesystem with errors, check forced
- Resize inode not valid. Recreate
- Root inode is not a directory.
- Reserved inode 3 () has invalid mode
- HTREE directory inode has invalid root node
- Inode , i_blocks is , should be 0.
- Unconnected directory inode
After a lot of messages, it says it's done. Mounting the device as above now works, but the filesystem is empty apart from a lost+found directory full of files: most are named with just numbers, and some have filenames vaguely relating to files that once existed.
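For triaging what fsck salvaged into lost+found, file(1) can identify the recovered inodes by content even though the original names are mostly gone; a minimal sketch (mount point as above):

~ # file /mnt2/lost+found/* | sort -t: -k2 | less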
So, how do I bring the VM back up?
Whenever I see disk errors, my first instinct is to snapshot so things don't get worse, so I have a snapshot from just after reboot when I first saw the error.
I know the data is there somewhere, as the VM worked without issue until I rebooted. The user can't remember changing anything on the filesystem recently, but it had almost a year of uptime when I rebooted it so all sorts could have happened since then.
We also, unfortunately, don't have backups, as Puppet had been disabled on this node.
The original OS was Ubuntu Lucid, running on VMware.
Best Answer
If I understood correctly, you have already fixed the volume, even though you have a lost+found directory which may or may not contain critical files. What is going on now that's blocking the VM from booting? Does it still fail to find the boot device?
Your fdisk -l output seems a bit off to me. Have you considered the possibility that only the partition table was damaged? In this scenario your snapshot may be helpful, and in the best case you won't even need a(nother) fsck. But we'll need something to find the partition offsets; I've used testdisk successfully more than once (a rough session outline follows below). In the worst-case scenario, if you need to scrape anything from the volume, forensic tools like PhotoRec or Autopsy/The Sleuth Kit may prove useful.
If none of this works, give us lsblk -o NAME,RM,SIZE,RO,TYPE,MAJ:MIN -fat output too (these flags are just to show as much information as possible), and relevant dmesg output, if any.