I have a lab setup with 16 HP Z620 systems, all alike (purchased at the same time), with exactly the same Ubuntu 12.04 installation with current kernel 3.13.0-44-generic. Well, not quite all alike: 15 of these have BIOS version J61 v03.06, and the 16th has BIOS version J61 v03.18. All have static IP addresses, with network-manager, avahi-daemon, and cups-browsed disabled.
The bizarre thing is that the 15 systems show load averages much less than 1 (as I write this, uptime shows a load average of 0.00), but the 16th system always shows a load average of 1.00 or above. Here's a top snapshot:
top - 13:13:04 up 25 min, 3 users, load average: 1.00, 1.03, 0.91
Tasks: 203 total, 1 running, 202 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.9 us, 0.3 sy, 0.0 ni, 97.5 id, 1.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 12232332 total, 1583716 used, 10648616 free, 63148 buffers
KiB Swap: 12505084 total, 0 used, 12505084 free. 626708 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 33772 3024 1468 S 0.0 0.0 0:00.79 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.10 ksoftirqd/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:+
7 root 20 0 0 0 0 S 0.0 0.0 0:01.64 rcu_sched
8 root 20 0 0 0 0 S 0.0 0.0 0:00.28 rcuos/0
9 root 20 0 0 0 0 S 0.0 0.0 0:00.23 rcuos/1
10 root 20 0 0 0 0 S 0.0 0.0 0:00.20 rcuos/2
11 root 20 0 0 0 0 S 0.0 0.0 0:01.95 rcuos/3
12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuob/0
14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuob/1
15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuob/2
16 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuob/3
I'm baffled as to why the load average on this one box is always 1.00 or above. Any suggestions?
BTW, I upgraded the BIOS on system 16 to version 3.85, but this didn't change anything. I also installed Ubuntu 14.04, but I still get the same behavior.
Best Answer
When top does not show CPU usage or I/O wait accounting for the load average, the cause is typically a task or tasks in uninterruptible sleep (one task, in your case). Identify them with this command:
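(The command itself appears to have been lost from this post; a common way to list tasks in uninterruptible sleep, state "D", including the kernel function they are blocked in, is something like the following.)

```shell
# List tasks in uninterruptible sleep (state "D").
# The wchan column shows the kernel function each task is waiting in.
ps -eo state,pid,user,wchan:32,comm | grep '^D'
```

On a healthy box this usually prints nothing (and grep exits non-zero); on the problem box you would expect to see the stuck task and its wait channel.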
vmstat can also be used, but only to give the number of tasks in uninterruptible sleep. Example:
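(The original output is missing; the numbers below are invented for illustration. A single task in uninterruptible sleep shows up as b = 1.)

```shell
$ vmstat 5 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1      0 10648616  63148 626708    0    0     2     1   51   83  1  0 98  1  0
 0  1      0 10648616  63148 626708    0    0     0     2   48   79  1  0 98  1  0
 0  1      0 10648616  63148 626708    0    0     0     0   45   75  0  0 99  1  0
```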
where tasks in uninterruptible sleep appear in the "b" column under "procs".
It would be highly unusual to observe a constant non-zero number in the "r" column (processes waiting for run time) without also observing CPU usage and/or I/O wait. The two examples below show an unloaded system and a loaded system.
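(The original screenshots did not survive; these are illustrative snapshots with invented numbers.)

```shell
$ vmstat 5 2    # unloaded system: r stays at 0, CPU almost entirely idle
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 10650000  63148 626708    0    0     0     1   40   60  0  0 100  0  0
 0  0      0 10650000  63148 626708    0    0     0     0   38   55  0  0 100  0  0

$ vmstat 5 2    # loaded system: r is non-zero and user CPU time is high
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0      0  9800000  63148 626708    0    0     0     2  300 1200 75  2 23  0  0
 3  0      0  9800000  63148 626708    0    0     0     0  310 1250 76  2 22  0  0
```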
If some sort of hung process in the queue is suspected, try this to identify:
Example (where I have 3 heavy processes running, properly):
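(The command and its output were dropped from the post; a likely candidate is sorting processes by CPU usage. The output below is illustrative, and the busy-loop process name is just a stand-in.)

```shell
$ ps -eo pid,user,stat,pcpu,comm --sort=-pcpu | head -6
  PID USER     STAT %CPU COMMAND
 2465 me       R    99.2 burnP6
 2466 me       R    98.9 burnP6
 2467 me       R    98.5 burnP6
    1 root     Ss    0.0 init
    2 root     S     0.0 kthreadd
```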
Run the command a few times to help identify the real culprit, since other, legitimate processes may briefly appear near the top of the list.
The last thing to try is to just look at the entire list of threads for any anomalies. Example:
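(The example is missing here; a thread-level listing with the state in the first column can be obtained with something like this.)

```shell
# List every thread (LWP) on the system, state in the first column.
ps -eLo state,pid,lwp,comm
```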
Anything other than "S" or "R" in the first column is of interest. Perhaps filter the list with:
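(An assumption about the lost snippet: drop the common sleeping and running states so only unusual ones remain.)

```shell
# Show only threads NOT in state S (sleeping) or R (running),
# e.g. D (uninterruptible sleep), T (stopped), Z (zombie).
ps -eLo state,pid,lwp,comm | grep -Ev '^[SR]'
```

An empty result here is good news; note that grep exits non-zero when nothing matches.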