linux-kernel interrupt – How to Deduce the Nature of an Interrupt from Its Number


I'm trying to boot/install Linux for learning purposes, using an older PC (HP Pavilion Elite m9660de). The following message is the first thing that shows up when booting (Ubuntu and Fedora, both from a bootable USB-stick and a fresh install):

do_IRQ: 1.55 No irq handler for vector

do_IRQ: 2.55 No irq handler for vector

do_IRQ: 3.55 No irq handler for vector

The boot process will stall there for a very long time (like 15 minutes), and eventually continue.


I'm not asking to get support for this concrete problem, but rather to understand how to interpret such a message.

I found out in the kernel code of do_IRQ that 55 is a vector. As I understand it, this is more or less the number of an interrupt, corresponding to a memory location containing the address of the interrupt handler.

I would have expected that there is a fixed correspondence between these numbers and the events that cause the interrupt. Where can I find documentation on this? Is this Linux-specific, processor-specific or motherboard-specific?

Best Answer

do_IRQ: 1.55 No irq handler for vector

This message can be found in Linux kernel source file arch/x86/kernel/irq.c, so it's about x86-specific handling of interrupts.

/*
 * do_IRQ handles all normal device IRQ's (the special
 * SMP cross-CPU interrupts have their own specific
 * handlers).
 */
__visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
{
        struct pt_regs *old_regs = set_irq_regs(regs);
        struct irq_desc * desc;
        /* high bit used in ret_from_ code  */
        unsigned vector = ~regs->orig_ax;

        entering_irq();

        /* entering_irq() tells RCU that we're not quiescent.  Check it. */
        RCU_LOCKDEP_WARN(!rcu_is_watching(), "IRQ failed to wake up RCU");

        desc = __this_cpu_read(vector_irq[vector]);

        if (!handle_irq(desc, regs)) {
                ack_APIC_irq();

                if (desc != VECTOR_RETRIGGERED && desc != VECTOR_SHUTDOWN) {
                        pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
                                             __func__, smp_processor_id(),
                                             vector);
                } else {
                        __this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
                }
        }

        exiting_irq();

        set_irq_regs(old_regs);
        return 1;
}

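So, the first number (before the dot) is the ID of the reporting processor, and the 55 is the interrupt vector, as you discovered. Both are printed with %d, so they are decimal. The message is suppressed only when the entry for that vector is in the state VECTOR_SHUTDOWN or VECTOR_RETRIGGERED; in that case the kernel quietly resets the entry to VECTOR_UNUSED instead.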

According to arch/x86/kernel/apic/vector.c the state VECTOR_SHUTDOWN indicates an interrupt vector that was intentionally cleared (e.g. a hardware device was stopped and its driver unloaded in a controlled fashion).

The state VECTOR_RETRIGGERED is set in fixup_irqs() at the end of arch/x86/kernel/irq.c and seems to be related to CPU hotplugging, or more specifically to marking a CPU as offline.

So, neither of those states should be applicable on a regular PC at boot time.
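To make the message format concrete, here is a minimal sketch that splits such a log line back into the two numbers the kernel printed. The helper name parse_do_irq_line is hypothetical (it is not part of the kernel); it only mirrors the "%s: %d.%d No irq handler for vector" format string quoted above.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not kernel code: extracts the CPU ID and the
 * vector from a "do_IRQ: <cpu>.<vector> No irq handler for vector" line.
 * Both numbers are decimal because the kernel prints them with %d.
 * Returns 1 if both numbers were read, 0 otherwise.
 */
static int parse_do_irq_line(const char *line, int *cpu, int *vector)
{
    return sscanf(line, "do_IRQ: %d.%d", cpu, vector) == 2;
}
```

For the first line in the question, this would yield CPU 1 and vector 55 (0x37 in hexadecimal, which is how vectors are usually written in the kernel headers).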

Your idea of a fixed correspondence between interrupt vector numbers and causes of interrupts would have been valid with the ISA bus architecture of the original IBM PC... and quite a while after that.

But somewhere in the era of 486 processors and the first Pentiums, an APIC (Advanced Programmable Interrupt Controller) was introduced. It was one of the components enabling multiple processors to coexist in PC architecture. It opened the way to increase the number of available hardware interrupt lines from 15 (the pair of 8259 interrupt controllers like in the first IBM PC-AT), eventually up to 224 discrete hardware interrupts. This enabled the design of more complex systems, and also helped in making truly auto-configurable buses possible.
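The figure of 224 follows from simple arithmetic: an x86 CPU has 256 interrupt vectors, and the first 32 are reserved for CPU exceptions, which leaves 224 for everything else. The sketch below illustrates this; the constant and function names are made up for the example, although the values match NR_VECTORS and FIRST_EXTERNAL_VECTOR in the kernel's arch/x86/include/asm/irq_vectors.h. Note that the kernel keeps some of those 224 vectors for its own use (IPIs, the local APIC timer and so on), so this is only an upper bound for device interrupts.

```c
#include <stdbool.h>

enum {
    EXAMPLE_NUM_VECTORS    = 256, /* entries in the x86 interrupt table */
    EXAMPLE_FIRST_EXTERNAL = 32   /* vectors 0..31 are CPU exceptions */
};

/* Number of vectors that are not reserved for CPU exceptions. */
static int external_vector_count(void)
{
    return EXAMPLE_NUM_VECTORS - EXAMPLE_FIRST_EXTERNAL;
}

/* True if the vector lies outside the exception range 0..31. */
static bool is_external_vector(int vector)
{
    return vector >= EXAMPLE_FIRST_EXTERNAL && vector < EXAMPLE_NUM_VECTORS;
}
```

Vector 55 from the question falls into the external range, so it is a vector that the kernel hands out dynamically rather than one with a fixed architectural meaning.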

Essentially, either the system firmware or the operating system is supposed to configure the device on the bus to use a particular interrupt line, and then to program the APIC to route the interrupt signal to an available interrupt vector in the CPU. This requires knowledge of how the bus is actually wired on the motherboard, so in practice the routing information almost always comes from the system firmware, and many of the cases where the operating system overrides it exist specifically to patch up firmware bugs.

The PCI bus originally had its interrupts mapped to ISA-style interrupts, but when APICs became integrated in CPUs, this limitation could be removed, reducing IRQ latency and allowing more complex systems to be built. With PCI bus version 2.2, Message Signaled Interrupts (MSI) were introduced, which allowed discrete hardware interrupts without dedicated physical interrupt lines. In PCI Express, MSI became the standard way to handle interrupts.
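With MSI, the "interrupt line" is really just a memory write whose address and data encode where the interrupt should go. As a rough sketch, assuming the classic x86 MSI layout from Intel's documentation with interrupt remapping disabled (with remapping enabled the fields have a different meaning), the target CPU and vector can be read out of the two values like this. The struct and function names are invented for the example.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields (not a kernel type). */
struct msi_target {
    uint8_t dest_apic_id;  /* address bits 19:12: destination APIC ID */
    uint8_t vector;        /* data bits 7:0: interrupt vector */
    uint8_t delivery_mode; /* data bits 10:8: 0 means "fixed" delivery */
};

/* Decodes an x86 MSI address/data pair, assuming no interrupt remapping. */
static struct msi_target decode_msi(uint32_t address, uint16_t data)
{
    struct msi_target t;

    t.dest_apic_id  = (uint8_t)((address >> 12) & 0xff);
    t.vector        = (uint8_t)(data & 0xff);
    t.delivery_mode = (uint8_t)((data >> 8) & 0x7);
    return t;
}
```

For example, an address of 0xFEE01000 with data 0x0037 would decode to APIC ID 1 and vector 55. The point is that the vector number is whatever the operating system wrote into the device's MSI registers, so it says nothing about the kind of device by itself.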

So... it looks like your system's hardware includes an active source of interrupts routed to IRQ vector 55, but Linux currently has no driver loaded to handle it. Since the PCI configuration space is readable in a standard fashion, and Linux does read it, any device on the PCI bus (or on a PCIe link) should have been detected and identified, and its interrupt configuration should be known.

The source of the IRQs might also be something that is not a PCI device, i.e. a platform device: for example, something that is part of the system chipset or connected to it through some non-PCI-compatible interface. All such devices should be described by the firmware's ACPI tables... but apparently, in your case, the source of these IRQs isn't.

My conclusion is that this might be a firmware bug: see if HP offers a BIOS update for your system. (At this moment, HP's support downloads page for the Pavilion Elite m9660de seems to be failing to load for me.)

According to this thread in Ubuntu forums it could also be a hardware bug in the VIA chipset: if your system has this chipset, adding the boot option pci=nomsi,noaer in GRUB might fix it.

If your current kernel has debugfs support and CONFIG_GENERIC_IRQ_DEBUGFS kernel option enabled, you might get a lot of information on the state of IRQ vector 55 with the following commands as root:

mount -t debugfs none /sys/kernel/debug
grep "Vector.*55" /sys/kernel/debug/irq/irqs/*

This should tell you which files in that directory mention "Vector: 55". Reading those files should tell you basically everything the kernel knows about that interrupt vector.
