Ubuntu – Occasional dmesg log “NOHZ: local_softirq_pending 08”


I have an Ubuntu 14.04 server that is occasionally issuing "NOHZ: local_softirq_pending 08" errors to the dmesg log. This started after upgrading to kernel 4.4; previously it was running without issue on a 3.16 kernel. Here's an excerpt from the end of the log:

[    7.805258] audit: type=1400 audit(1484883362.092:11): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/sbin/dhclient" pid=1636 comm="apparmor_parser"
[   10.605443] igb 0000:c1:00.0 eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[   10.605545] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   19.219187] ixgbe 0000:02:00.1 p4p2: NIC Link is Up 10 Gbps, Flow Control: None
[   19.219368] IPv6: ADDRCONF(NETDEV_CHANGE): p4p2: link becomes ready
[   52.010390] ip_tables: (C) 2000-2006 Netfilter Core Team
[   52.089283] init: plymouth-upstart-bridge main process ended, respawning
[ 2857.027773] perf interrupt took too long (2542 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 7195.391731] perf interrupt took too long (5012 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[37277.461862] perf interrupt took too long (10050 > 10000), lowering kernel.perf_event_max_sample_rate to 12500
[239795.500056] NOHZ: local_softirq_pending 08
[579047.644110] NOHZ: local_softirq_pending 08
[837865.916051] NOHZ: local_softirq_pending 08

It's a production database host with 32 cores under a decent amount of load.

I'm wondering if I should be concerned about these messages, and if so how I might go about fixing the issue.

Kernel details here:

[    0.000000] Linux version 4.4.0-59-generic (buildd@lcy01-32) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) ) #80~14.04.1-Ubuntu SMP Fri Jan 6 18:02:02 UTC 2017 (Ubuntu 4.4.0-59.80~14.04.1-generic 4.4.35)
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.4.0-59-generic root=UUID=5db4a2c8-24f4-409b-b437-6120682cc518 ro noautogroup transparent_hugepage=never nomdmonddf nomdmonisw
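For context on what the message means: the kernel prints it when a CPU tries to stop its periodic tick on going idle while softirqs are still pending, and the trailing "08" is a hexadecimal bitmask with one bit per softirq vector. Below is a minimal sketch for decoding such lines from dmesg output. It assumes the vector numbering used by 4.4-era kernels (include/linux/interrupt.h; later kernels renamed bit 5). The helper names decode_mask and scan_dmesg are hypothetical.

```python
from __future__ import annotations

import re

# Softirq vector order ASSUMED from include/linux/interrupt.h in the 4.4 series;
# bit N of the mask corresponds to SOFTIRQ_NAMES[N].
SOFTIRQ_NAMES = [
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "BLOCK_IOPOLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
]

# Matches e.g. "[239795.500056] NOHZ: local_softirq_pending 08"
NOHZ_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+NOHZ: local_softirq_pending ([0-9a-fA-F]+)")


def decode_mask(mask_hex: str) -> list[str]:
    """Return the names of the softirqs whose bits are set in a hex mask."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]


def scan_dmesg(text: str) -> list[tuple[float, list[str]]]:
    """Find NOHZ warnings; return (seconds since boot, pending softirqs) pairs."""
    return [(float(ts), decode_mask(mask)) for ts, mask in NOHZ_RE.findall(text)]


if __name__ == "__main__":
    # The three warnings from the log excerpt above
    sample = """\
[239795.500056] NOHZ: local_softirq_pending 08
[579047.644110] NOHZ: local_softirq_pending 08
[837865.916051] NOHZ: local_softirq_pending 08
"""
    hits = scan_dmesg(sample)
    for ts, names in hits:
        print(f"{ts:>14.6f}s  pending: {', '.join(names)}")
    # Gaps between consecutive warnings, converted from seconds to days
    gaps = [(b[0] - a[0]) / 86400 for a, b in zip(hits, hits[1:])]
    print("gaps (days):", [round(g, 1) for g in gaps])
```

Under that numbering, 0x08 sets only bit 3, which is NET_RX (network receive processing). The three warnings in the excerpt are roughly three to four days apart.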

Best Answer

Add nohz=off to the kernel boot parameters to disable dynamic ticks (NOHZ).
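On Ubuntu, boot parameters are usually added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by sudo update-grub and a reboot; the running kernel's command line can then be read from /proc/cmdline. Here is a minimal sketch of string helpers for checking and setting a key=value parameter. The names has_param and set_param are hypothetical, and the sketch does not read or write any files.

```python
from __future__ import annotations


def has_param(cmdline: str, key: str, value: str) -> bool:
    """True if key=value appears as a whole whitespace-separated token."""
    return f"{key}={value}" in cmdline.split()


def set_param(cmdline: str, key: str, value: str) -> str:
    """Drop any existing `key` or `key=...` tokens, then append key=value."""
    tokens = [t for t in cmdline.split() if t.split("=", 1)[0] != key]
    tokens.append(f"{key}={value}")
    return " ".join(tokens)


if __name__ == "__main__":
    # Command line taken from the question's boot log
    current = ("BOOT_IMAGE=/vmlinuz-4.4.0-59-generic "
               "root=UUID=5db4a2c8-24f4-409b-b437-6120682cc518 ro noautogroup "
               "transparent_hugepage=never nomdmonddf nomdmonisw")
    print("nohz=off active:", has_param(current, "nohz", "off"))
    print(set_param(current, "nohz", "off"))
    # On a live system, check the running kernel instead:
    # with open("/proc/cmdline") as f:
    #     print(has_param(f.read(), "nohz", "off"))
```

Replacing any existing nohz token, rather than blindly appending, avoids leaving a conflicting nohz=on earlier on the line.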

The kernel's documentation for a related RCU option describes the trade-off:

"This option causes RCU to attempt to accelerate grace periods in order to allow CPUs to enter dynticks-idle state more quickly. On the other hand, this option increases the overhead of the dynticks-idle checking, particularly on systems with large numbers of CPUs."

You seem to be affected by that last part, since this is a 32-core machine.
