Debugging boot performance issues in grub / before kernel logging starts

bootboot-loaderdebugginggrubperformance

My PC takes a long time booting. I have reason to believe it is caused by delays after grub, but before kernel logging starts (booting takes 30 seconds, but dmesg messages have time stamps between 0.000009.34223, details in this post).

Are there any ways to debug what's going on? In particular:

  • Is there a way to make grub itself more verbose, or keep logs?
  • Are there any other processes between grub and the kernel logging that could be taking time? How do I debug those?

I believe the question is not specific to my setup. But just in case, I'm running Ubuntu 16.10 and grub(2)

EDIT:

Following a suggestion by @TooTea, I set the debug=all environment variables which produced a lot of messages a la script/script.c:50 malloc 0x7a9a2ca0 that should not be a problem. Then there is one block of messages that coincide with a 8 seconds delay delay:

kern/dl.c:56 Detecting ext2... 
lib/relocator.c:1397 chunks = 0x7a7e0ae0
lib/relocator.c:434 trying to allocate in ...-... aligned ... size ...
lib/relocator.c:1198 allocated: ...+...
lib/relocator.c:1409 allocated .../...
lib/relocator.c:1410  chunks = 0x7a7e0ae0

Best Answer

As you can probably imagine, the handover between GRUB and Linux involves some really intricate low-level steps, so there's no room for advanced tracing or logging. Fortunately, that shouldn't be a problem as there's also no room for any extended stalls in that code. You can get a really detailed trace of all the preparatory steps in GRUB by setting the debug environment variable.

It is, however, more likely that whatever delays you see happen after the control is transferred to the Linux kernel. Normally, you can't see the log messages before the console is initialized. Also, as you have noted, all the timestamps are zero until the timekeeping subsystem is initialized, so it is impossible to figure out the timing later.

Fortunately, you can use the earlyprintk boot option to make the kernel actually print the log messages somewhere, so that you can follow them in real time and see where the delay occurs. earlyprintk can be directed to various destinations, but the most relevant for usual (physical) machines will be serial, vga (old-school console), or efi. Make sure that your kernel is built with the appropriate config options (CONFIG_EARLY_PRINTK*).

Related Question