Centos – How to know which IRQ is responsible of high CPU usage

centoscpukernel

I've moved a server from one mainboard to another due a disk controller failure.

Since then I've noticed that constantly a 25% of one of the cores goes always to IRQ however I haven't managed myself to know which is the IRQ responsible for that.

The kernel is a Linux 2.6.18-194.3.1.el5 (CentOS). mpstat -P ALLshows:

18:20:33     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
18:20:33     all    0,23    0,00    0,08    0,11    6,41    0,02    0,00   93,16   2149,29
18:20:33       0    0,25    0,00    0,12    0,07    0,01    0,05    0,00   99,49    127,08
18:20:33       1    0,14    0,00    0,03    0,04    0,00    0,00    0,00   99,78      0,00
18:20:33       2    0,23    0,00    0,02    0,03    0,00    0,00    0,00   99,72      0,02
18:20:33       3    0,28    0,00    0,15    0,28   25,63    0,03    0,00   73,64   2022,19

This is the /proc/interrupts

cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  0:        245          0          0    7134094    IO-APIC-edge  timer
  8:          0          0         49          0    IO-APIC-edge  rtc
  9:          0          0          0          0   IO-APIC-level  acpi
 66:         67          0          0          0   IO-APIC-level  ehci_hcd:usb2
 74:     902214          0          0          0         PCI-MSI  eth0
169:          0          0         79          0   IO-APIC-level  ehci_hcd:usb1
177:          0          0          0    7170885   IO-APIC-level  ata_piix, b4xxp
185:          0          0          0      59375   IO-APIC-level  ata_piix
NMI:          0          0          0          0 
LOC:    7104234    7104239    7104243    7104218 
ERR:          0
MIS:          0

How can I identify which IRQ is causing the high CPU usage?

Edit:

Output from dmesg | grep -i b4xxp

wcb4xxp 0000:30:00.0: probe called for b4xx...
wcb4xxp 0000:30:00.0: Identified Wildcard B410P (controller rev 1) at 00012000, IRQ 177
wcb4xxp 0000:30:00.0: VPM 0/1 init: chip ver 33
wcb4xxp 0000:30:00.0: VPM 1/1 init: chip ver 33
wcb4xxp 0000:30:00.0: Hardware echo cancellation enabled.
wcb4xxp 0000:30:00.0: Port 1: TE mode
wcb4xxp 0000:30:00.0: Port 2: TE mode
wcb4xxp 0000:30:00.0: Port 3: TE mode
wcb4xxp 0000:30:00.0: Port 4: TE mode
wcb4xxp 0000:30:00.0: Did not do the highestorder stuff
wcb4xxp 0000:30:00.0: new card sync source: port 3

Best Answer

Well, since you're specifically asking how to know which IRQ is responsible for the number in mpstat, you can assume it's not the local interrupt timer (LOC), since those numbers are fairly equal, and yet mpstat shows some of those cpus at 0 %irq.

That leaves IRQ 0, which is the system timer, and which you can't do anything about, and IRQ 177, which is tied to your b4xxp driver.

My guess is that IRQ 177 would be your culprit.

If this is causing a problem, and you would like to change the behavior your see, try:

  1. disabling the software that uses that card, and see if the interrupts decrease.

  2. removing that card from the system, and unloading the driver, and see if there's improvement.

  3. move that card to another slot and see if that helps.

  4. check for updated drivers or patches for the software.

If it's not a problem, and you were just curious, then carry on. :)

Related Question