Debugging dpkg configure failure with grub

Tags: dpkg, grub2

On one of my machines dpkg is unable to finish installing/configuring grub, only giving the error message:

subprocess installed post-installation script returned error exit status 255

Full output:

# dpkg --configure grub-pc
Setting up grub-pc (1.99-27+deb7u3) ...
device node not found
device node not found
device node not found
device node not found
Installation finished. No error reported.
Installation finished. No error reported.
dpkg: error processing grub-pc (--configure):
 subprocess installed post-installation script returned error exit status 255
Errors were encountered while processing:
 grub-pc

There is nothing in the log files that would shed any more light.

Running dpkg with some debug options reveals a little more:

# dpkg -D10113 --configure grub-pc
Setting up grub-pc (1.99-27+deb7u3) ...
D000002: fork/exec /var/lib/dpkg/info/grub-pc.postinst ( configure  )
device node not found
device node not found
device node not found
device node not found
Installation finished. No error reported.
Installation finished. No error reported.
dpkg: error processing grub-pc (--configure):
 subprocess installed post-installation script returned error exit status 255
D010000: trigproc_run_deferred
Errors were encountered while processing:
 grub-pc

Now I know the problem is somewhere in /var/lib/dpkg/info/grub-pc.postinst configure, but that script doesn't seem to have any verbosity or debug options and, at nearly 700 lines, is just too large to read through. The script also doesn't contain any exit 255 calls, so I tend to believe the problem isn't even in there but in some other script that it calls.
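One generic way to narrow such a script down is to run it under sh -x, which prints every command before executing it, so the last traced line before the failure points at the sub-command that returned 255. Here is a sketch of the pattern with a stand-in script (the real target would be sh -x /var/lib/dpkg/info/grub-pc.postinst configure):

```shell
# Create a stand-in maintainer script (hypothetical; the real one is
# /var/lib/dpkg/info/grub-pc.postinst and receives "configure" as $1).
cat > /tmp/fake-postinst <<'EOF'
#!/bin/sh
set -e
case "$1" in
    configure)
        echo "running helper"
        ;;
esac
EOF
# -x makes the shell print each command to stderr before running it.
sh -x /tmp/fake-postinst configure
```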

The configure task also fails for the kernel packages:

# dpkg --configure linux-image-3.16.0-0.bpo.4-amd64
Setting up linux-image-3.16.0-0.bpo.4-amd64 (3.16.39-1+deb8u1~bpo70+1) ...
vmlinuz(/boot/vmlinuz-3.16.0-0.bpo.4-amd64
) points to /boot/vmlinuz-3.16.0-0.bpo.4-amd64
 (/boot/vmlinuz-3.16.0-0.bpo.4-amd64) -- doing nothing at /var/lib/dpkg/info/linux-image-3.16.0-0.bpo.4-amd64.postinst line 263.
initrd.img(/boot/initrd.img-3.16.0-0.bpo.4-amd64
) points to /boot/initrd.img-3.16.0-0.bpo.4-amd64
 (/boot/initrd.img-3.16.0-0.bpo.4-amd64) -- doing nothing at /var/lib/dpkg/info/linux-image-3.16.0-0.bpo.4-amd64.postinst line 263.
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-3.16.0-0.bpo.4-amd64
run-parts: /etc/kernel/postinst.d/zz-update-grub exited with return code 255
Failed to process /etc/kernel/postinst.d at /var/lib/dpkg/info/linux-image-3.16.0-0.bpo.4-amd64.postinst line 634.
dpkg: error processing linux-image-3.16.0-0.bpo.4-amd64 (--configure):
 subprocess installed post-installation script returned error exit status 1
Errors were encountered while processing:
 linux-image-3.16.0-0.bpo.4-amd64

Line 634 in /var/lib/dpkg/info/linux-image-3.16.0-0.bpo.4-amd64.postinst boils down to this command:

run-parts --report --exit-on-error --arg=3.16.0-0.bpo.4-amd64 --arg=/boot/vmlinuz-3.16.0-0.bpo.4-amd64 /etc/kernel/postinst.d

Running this command manually results in:

run-parts: /etc/kernel/postinst.d/zz-update-grub exited with return code 255

This script is, as far as I can tell, only a wrapper that does a check and then calls update-grub, which works without error.

update-grub just runs grub-mkconfig, so I ran this command and checked the return value:

# grub-mkconfig -o /boot/grub/grub.cfg
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-3.16.0-0.bpo.4-amd64
Found initrd image: /boot/initrd.img-3.16.0-0.bpo.4-amd64
Found linux image: /boot/vmlinuz-3.2.0-4-amd64
Found initrd image: /boot/initrd.img-3.2.0-4-amd64
# echo $?
255

This seems to be the culprit. The script runs, finds all kernels, generates a valid grub config (though it leaves it as /boot/grub/grub.cfg.new) and then exits with return code 255. And of course it doesn't have any debug options.
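One thing worth keeping in mind here: a shell script's exit status is simply that of the last command it runs, so a script can do all of its useful work and still return 255 if some final step fails. A minimal illustration (the exit 255 is a stand-in, not the actual cause):

```shell
# Stand-in script: does its work, then a final step fails with 255.
cat > /tmp/demo-exit <<'EOF'
#!/bin/sh
echo "config generated"
exit 255   # hypothetical failing last step
EOF
# The || keeps the caller going so we can report the status.
sh /tmp/demo-exit || echo "status: $?"
```

This is consistent with what you see: a complete grub.cfg.new on disk, yet a non-zero exit status from grub-mkconfig.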

How can I go on debugging the problem?

Additional information that might or might not be helpful:

  • the system is running Debian wheezy
  • GRUB is version 1.99-27+deb7u3
  • the system has an mdraid
  • the system has been running for years (it's not a new installation); the error appeared only recently
  • not quite sure, but I believe the error started to appear after a faulty hard drive was replaced
  • the configure task only fails for grub and the kernel packages. All other packages can be installed without error

More information from questions that came up later

zulu668:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda4[2] sdb4[1]
      1456504640 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda3[2] sdb3[1]
      7996352 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda2[2] sdb2[1]
      499392 blocks super 1.2 [2/2] [UU]

unused devices: <none>
zulu668:~# sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed Oct 29 12:40:33 2014
     Raid Level : raid1
     Array Size : 499392 (487.77 MiB 511.38 MB)
  Used Dev Size : 499392 (487.77 MiB 511.38 MB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed Mar 15 14:51:01 2017
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : zulu668:0  (local to host zulu668)
           UUID : 22e14818:7754cf01:67287402:c31a3328
         Events : 217

    Number   Major   Minor   RaidDevice State
       2       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2

Best Answer

So, at the time of writing, you have brilliantly narrowed your problem down to grub-mkconfig and are wondering how to debug it further.

grub-mkconfig is a shell script that builds your grub.cfg configuration file by executing every script in /etc/grub.d. There is a set -e command at the beginning of grub-mkconfig, meaning "stop at the first unhandled error you encounter". Chances are your problem is due to the failure of one of the grub.d scripts.
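The effect of that set -e can be seen with a tiny stand-alone script (the file name and commands here are made up for illustration):

```shell
cat > /tmp/set-e-demo <<'EOF'
#!/bin/sh
set -e
echo "before"
false          # first failing command: set -e stops the script here
echo "after"   # never reached
EOF
# The script aborts at `false` and propagates its exit status (1).
sh /tmp/set-e-demo || echo "exited with status $?"
```

In the same way, grub-mkconfig aborts as soon as one of the grub.d helpers returns a non-zero status, and that status becomes its own.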

First, let's identify the culprit. Run:

dash -vx grub-mkconfig -o /boot/grub/grub.cfg

dash, the POSIX shell that /bin/sh most likely points to on Debian, will output every line it executes. Since the script probably fails because of the set -e command, the last lines of the trace should show the grub.d sub-script that fails. I assume you will get something like:

+ echo ### BEGIN /etc/grub.d/99_buggy_script ###
+ /etc/grub.d/99_buggy_script

The script name alone probably won't give you enough evidence of what is going on. Since it is also a Bourne-style shell script, you can debug it the same way. Change the first line of the grub.d script from

#!/bin/sh

To:

#!/bin/sh -vx

Then run grub-mkconfig -o /boot/grub/grub.cfg again (the dash -vx prefix is no longer necessary). The trace you get this time comes from the grub.d script itself.

Hopefully, the problem will be obvious now. Once you have fixed it, don't forget to remove the -vx flags from the first line of the grub.d sub-script.
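If you prefer not to edit the script at all, another way to find the offender is to run each helper and print its exit status. This sketch uses a throwaway directory with made-up names, since running the real /etc/grub.d helpers standalone lacks the environment variables grub-mkconfig normally sets up for them, so a standalone failure there is a hint rather than proof:

```shell
# Throwaway stand-in for /etc/grub.d; names and exit codes are invented.
mkdir -p /tmp/grub.d-demo
printf '#!/bin/sh\nexit 0\n'   > /tmp/grub.d-demo/00_header
printf '#!/bin/sh\nexit 255\n' > /tmp/grub.d-demo/10_linux
chmod +x /tmp/grub.d-demo/*

for s in /tmp/grub.d-demo/*; do
    st=0
    "$s" || st=$?      # capture the status without aborting the loop
    echo "$s exited with $st"
done
```

The helper that reports 255 is the one to trace further.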
