Ubuntu – Won’t boot (drop to busybox) after kernel upgrade (with encrypted disks)

16.04 boot encryption luks

I have two systems, one running 14.04.03 LTS and one running 16.04 LTS, and neither boots after recent kernel upgrades. Any help would be appreciated. I'm no expert, but it seems to me that the new kernel just isn't seeing the encrypted filesystems at boot time.

For both systems, I used the installer to encrypt the boot drive. Later, I added a second disk, encrypted with LUKS and mounted automatically at boot using a keyfile. I followed the instructions here (only roughly, since I did this after the system was already installed and booted):

https://www.martineve.com/2012/11/02/luks-encrypting-multiple-partitions-on-debianubuntu-with-a-single-passphrase/

Here's some more details of the systems:

For the 16.04 LTS system:

Hardware is an mSATA SSD on /dev/sdb, on which Ubuntu was installed using the automatic partition setup (/ and swap). A 500GB SATA drive on /dev/sda is used for /data. After the system was up and running, a LUKS partition was created on it and set to mount automatically at boot with a keyfile.

$ uname -a
Linux <hostname> 4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 x86_64 X86_64 GNU/Linux

$ cat /etc/fstab
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
/dev/mapper/ubuntu--vg-root /               ext4    errors=remount-ro 0       1

# /boot was on /dev/sdb2 during installation
UUID=5039XXXX-XXXX-XXXX-XXXX-d2b4XXXXXXXX /boot           ext2    defaults        0       2

# /boot/efi was on /dev/sdb1 during installation
UUID=1DXX-XX4D  /boot/efi       vfat    umask=0077        0       1
/dev/mapper/ubuntu--vg-swap_1 none            swap    sw              0       0

# encrypted disk /data (unlocked with keyfile
/dev/mapper/data_crypt  /data   ext4    errors=remount-ro   0   1

$ cat /etc/crypttab 
sdb3_crypt UUID=0bg9rz-XXXX-XXXX-XXXX-XXXX-XXXX-XXqvBH none luks
data_crypt UUID=453cXXXX-XXXX-XXXX-XXXX-6926XXXXXXXX /root/<keyfile> luks 

When 4.4.0-21 is selected from GRUB, a password prompt to unlock sdb3_crypt comes up straight away. The system then hangs at the Ubuntu startup prompt. Pressing delete shows the error messages below. After a few minutes the system actually does boot, and data_crypt is mounted on /data as expected.

[ 2.091698] [drm:intel_set_pch_fifo_underrun_reporting [i915]] ERROR uncleared pch fifo underrun on pch transcoder a
[ 2.091735] [drm:intel_pch_fifo_underrun_irq_handler [i915]] ERROR PCH transcoder A FIFO underrun
lvmetad is not active yet, using direct activation during sysinit
Volume group "ubuntu-vg" not found
cannot process volume group ubuntu-vg
/run/lvm/lvmetad.socket: connect failed: no such file or directory:
Reading all physical volumes. This may take a while...
Found volume group "ubuntu-vg" using metadata type lvm2
/run/lvm/lvmetad.socket: connect failed: no such file or directory:
WARNING: failed to connect to lvmetad. Falling back to internal scanning.
2 logical volume(s) in volume group "ubuntu-vg" now active
/dev/mapper/ubuntu--vg-root: clean 488960/6750209 files, 11825728/26996736 blocks
[  ***] A start job is running for dev-disk-by\x2duuid-0bg9rZ\X2deNSE\x2d1E8u\x2XXXXX\x2XXXXX\x2XXXX\x2dZpqvBH.device (xxS / 1min 30s)
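
The unit name in that last line is a systemd-escaped device path, so it can be decoded to see which device the boot is waiting for. A minimal sketch, assuming the standard systemd escaping ("/" written as "-", a literal "-" written as "\x2d"); the function name and the sample unit name are hypothetical:

```shell
# Decode a systemd device unit name back into the device path it refers to.
# Assumes the standard escaping: "/" becomes "-", a literal "-" becomes "\x2d".
# Only the \x2d escape is handled here; other \xNN escapes are left untouched.
unit_to_path() {
    printf '%s\n' "$1" | sed \
        -e 's/\.device$//' \
        -e 's|-|/|g' \
        -e 's|\\x2d|-|g' \
        -e 's|^|/|'
}

# Hypothetical unit name, not the one from the log above.
unit_to_path 'dev-disk-by\x2duuid-1234\x2dabcd.device'
```

If the decoded path does not exist under /dev/disk/by-uuid/, the start job can only time out.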

From the GRUB menu, selecting either 4.4.0-22 or 4.4.0-24 and then pressing delete shows the following information (transcribed):

[ 2.091698] [drm:intel_set_pch_fifo_underrun_reporting [i915]] ERROR uncleared pch fifo underrun on pch transcoder a
[ 2.091735] [drm:intel_pch_fifo_underrun_irq_handler [i915]] ERROR PCH transcoder A FIFO underrun
lvmetad is not active yet, using direct activation during sysinit
Volume group "ubuntu-vg" not found
cannot process volume group ubuntu-vg
/run/lvm/lvmetad.socket: connect failed: no such file or directory:
WARNING: failed to connect to lvmetad. Falling back to internal scanning.
Reading all physical volumes. This may take a while.

The last 3 lines repeat maybe 30 times, and after a few minutes the system drops to a busybox shell. Manually running cryptsetup luksOpen /dev/sdb3 sdb3_crypt (and entering the passphrase) works, but I then can't mount the result (probably because I'm inside the initramfs and don't know what I'm doing there).

Deleting lines relating to data_crypt from /etc/fstab and /etc/crypttab makes no difference, so I don't think this is related to the automount of this disk causing issues.

I have also tried re-creating the initramfs for these kernels, which has not made any difference.

I am also experiencing a similar problem on a 14.04.03 LTS system as described below, but can't provide exact details (it is the system I'm posting this from).

For 14.04.03 LTS System:

Hardware is an NVMe SSD for / and swap, and a 1TB SATA disk for /data. Again, Ubuntu was installed onto the SSD. The SATA drive was connected later, set up as an encrypted LUKS partition, and added to automount at boot with a keyfile.

Packages for the xenial kernel have been installed:

  • linux-generic-lts-xenial
  • linux-headers-generic-lts-xenial
  • linux-image-generic-lts-xenial

Command output:

$ uname -a
Linux hostname 4.2.0-36-generic #42~14.04.1-Ubuntu SMP Fri May 13 17:27:22 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/fstab

# <file system> <mount point>   <type>  <options>       <dump>  <pass>
/dev/mapper/ubuntu--vg-root /               ext4    errors=remount-ro 0       1

# /boot was on /dev/nvme0n1p2 during installation
UUID=0657XXXX-XXXX-XXXX-XXXX-1be3XXXXXXXX /boot           ext2    defaults        0       2

# /boot/efi was on /dev/nvme0n1p1 during installation
UUID=AEXX-XX66  /boot/efi       vfat    defaults        0       1
/dev/mapper/ubuntu--vg-swap_1 none            swap    sw              0       0

# encrypted disk /data (unlocked with keyfile
/dev/mapper/data_crypt  /data   ext4    errors=remount-ro   0   1

$ cat /etc/crypttab 
nvme0n1p3_crypt UUID=61235695-XXXX-XXXX-XXXX-962cXXXXXXXX none luks,discard
data_crypt UUID=2e92XXXX-XX57-XXXX-XXXX-af9fXXXXXXXX /root/<keyfile> luks

Any of the 4.4 series kernels fails to boot. The 4.2 series kernel boots fine though.

Best Answer

Ok, figured it out.

The UUID of sdb3_crypt (where / and swap are located) was wrong in /etc/crypttab. I verified this by comparing the UUIDs listed in /etc/crypttab with those listed in /dev/disk/by-uuid/. No idea how it ended up wrong, but I must have fat-fingered it somewhere along the way.
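
That comparison can be scripted. This is a rough sketch rather than exactly what I ran: the function name check_crypttab_uuids is made up for illustration, and it only looks at entries written as UUID=... in the second column.

```shell
# Report crypttab entries whose UUID= source has no matching entry in the
# by-uuid directory. Both paths are parameters so the check can be tried on
# copies; the defaults are the real system locations.
check_crypttab_uuids() {
    crypttab=${1:-/etc/crypttab}
    uuid_dir=${2:-/dev/disk/by-uuid}
    status=0
    while read -r name source _rest || [ -n "$name" ]; do
        case $name in ''|'#'*) continue ;; esac   # skip blank lines and comments
        case $source in
            UUID=*)
                uuid=${source#UUID=}
                if [ -e "$uuid_dir/$uuid" ]; then
                    echo "ok      $name $uuid"
                else
                    echo "MISSING $name $uuid"
                    status=1
                fi
                ;;
        esac
    done < "$crypttab"
    return $status
}
```

Running check_crypttab_uuids with no arguments checks the live system. Any line marked MISSING names a mapping whose UUID does not correspond to a device that is currently present.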

I corrected /etc/crypttab with the correct UUID of /dev/sdb3, then updated the initramfs (update-initramfs -c -k 4.4.0-24-generic, run as root). After a reboot it now works.
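
For anyone who would rather not edit the file by hand, the correction can be sketched as below. I made the change in an editor, so treat this as an illustration: set_crypttab_uuid is a hypothetical helper, and it prints the corrected file to stdout instead of modifying anything in place.

```shell
# Print a copy of a crypttab file with the source of one mapping replaced
# by UUID=<new uuid>. The input file itself is not modified.
#   $1 = mapping name (first column), $2 = new UUID, $3 = crypttab file
set_crypttab_uuid() {
    awk -v name="$1" -v uuid="$2" '
        $1 == name { $2 = "UUID=" uuid }   # rewrite only the matching entry
        { print }
    ' "$3"
}

# Possible use on the affected system (as root; device and mapping names
# are the ones from this post, the temporary file name is arbitrary):
#   uuid=$(blkid -s UUID -o value /dev/sdb3)
#   set_crypttab_uuid sdb3_crypt "$uuid" /etc/crypttab > /tmp/crypttab.new
#   cp /etc/crypttab /etc/crypttab.bak && cp /tmp/crypttab.new /etc/crypttab
#   update-initramfs -c -k 4.4.0-24-generic
```

Writing to a separate file first leaves room to review the result and keep a backup before the real /etc/crypttab is replaced.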