Linux – How does the Linux kernel handle shared IRQs

interruptirqkernellinuxpci

According to what I've read so far, "when the kernel receives an interrupt, all the registered handlers are invoked."

I understand that the registered handlers for each IRQ can be viewed via /proc/interrupts, and I also understand that the registered handlers come from the drivers that have invoked request_irq passing in a callback roughly of the form:

irqreturn_t (*handler)(int, void *)

Based on what I know, each of these interrupt handler callbacks associated with the particular IRQ should be invoked, and it is up to the handler to determine whether the interrupt should indeed be handled by it. If the handler should not handle the particular interrupt it must return the kernel macro IRQ_NONE.

What I am having trouble understanding is, how each driver is expected to determine whether it should handle the interrupt or not. I suppose they can keep track internally if they're supposed to be expecting an interrupt. If so, I don't know how they'd be able to deal with the situation in which multiple drivers behind the same IRQ are expecting an interrupt.

The reason I'm trying to understand these details is because I'm messing with the kexec mechanism to re-execute the kernel in the middle of system operation while playing with the reset pins and various registers on a PCIe bridge as well as a downstream PCI device. And in doing so, after a reboot I'm either getting kernel panics, or other drivers complaining that they're receiving interrupts even though no operation was taking place.

How the handler decided that the interrupt should be handled by it is the mystery.

Edit: In case it's relevant, the CPU architecture in question is x86.

Best Answer

This is covered in chapter 10 of Linux Device Drivers, 3rd edition, by Corbet et al. It is available for free online, or you may toss some shekels O'Reilly's way for dead tree or ebook forms. The part relevant to your question begins on page 278 in the first link.

For what it's worth, here is my attempt to paraphrase those three pages, plus other bits I've Googled up:

  • When you register a shared IRQ handler, the kernel checks that either:

    a. no other handler exists for that interrupt, or

    b. all of those previously registered also requested interrupt sharing

    If either case applies, it then checks that your dev_id parameter is unique, so that the kernel can differentiate the multiple handlers, e.g. during handler removal.

  • When a PCI¹ hardware device raises the IRQ line, the kernel's low-level interrupt handler is called, and it in turn calls all of the registered interrupt handlers, passing each back the dev_id you used to register the handler via request_irq().

    The dev_id value needs to be machine-unique. The common way to do that is to pass a pointer to the per-device struct your driver uses to manage that device. Since this pointer must be within your driver's memory space for it to be useful to the driver, it is ipso facto unique to that driver.²

    If there are multiple drivers registered for a given interrupt, they will all be called when any of the devices raises that shared interrupt line. If it wasn't your driver's device that did this, your driver's interrupt handler will be passed a dev_id value that doesn't belong to it. Your driver's interrupt handler must immediately return when this happens.

    Another case is that your driver is managing multiple devices. The driver's interrupt handler will get one of the dev_id values known to the driver. Your code is supposed to poll each device to find out which one raised the interrupt.

    The example Corbet et al. give is that of a PC parallel port. When it asserts the interrupt line, it also sets the top bit in its first device register. (That is, inb(0x378) & 0x80 == true, assuming standard I/O port numbering.) When your handler detects this, it is supposed to do its work, then clear the IRQ by writing the value read from the I/O port back to the port with the top bit cleared.

    I don't see any reason that particular mechanism is special. A different hardware device could choose a different mechanism. The only important thing is that for a device to allow shared interrupts, it has to have some way for the driver to read the interrupt status of the device, and some way to clear the interrupt. You'll have to read your device's datasheet or programming manual to find out what mechanism your particular device uses.

  • When your interrupt handler tells the kernel it handled the interrupt, that doesn't stop the kernel from continuing to call any other handlers registered for that same interrupt. This is unavoidable if you are to share an interrupt line when using level-triggered interrupts.

    Imagine two devices assert the same interrupt line at the same time. (Or at least, so close in time that the kernel doesn't have time to call an interrupt handler to clear the line and thereby see the second assertion as separate.) The kernel must call all handlers for that interrupt line, to give each a chance to query its associated hardware to see if it needs attention. It is quite possible for two different drivers to successfully handle an interrupt within the same pass through the handler list for a given interrupt.

    Because of this, it is imperative that your driver tell the device it is managing to clear its interrupt assertion sometime before the interrupt handler returns. It's not clear to me what happens otherwise. The continuously-asserted interrupt line will either result in the kernel continuously calling the shared interrupt handlers, or it will mask the kernel's ability to see new interrupts so the handlers are never called. Either way, disaster.


Footnotes:

  1. I specified PCI above because all of the above assumes level-triggered interrupts, as used in the original PCI spec. ISA used edge-triggered interrupts, which made sharing tricky at best, and possible even then only when supported by the hardware. PCIe uses message-signalled interrupts; the interrupt message contains a unique value the kernel can use to avoid the round-robin guessing game required with PCI interrupt sharing. PCIe may eliminate the very need for interrupt sharing. (I don't know if it actually does, just that it has the potential to.)

  2. Linux kernel drivers all share the same memory space, but an unrelated driver isn't supposed to be mucking around in another's memory space. Unless you pass that pointer around, you can be pretty sure another driver isn't going to come up with that same value accidentally on its own.

Related Question