Why load is high despite the fact that neither CPU nor disk is overused

load performance

I'm getting the following output from top:

Cpu(s): 43.8%us, 32.5%sy,  4.8%ni,  2.0%id, 15.6%wa,  0.2%hi,  1.2%si,  0.0%st
Mem:  16331504k total, 15759412k used,   572092k free,  4575980k buffers
Swap:  4194296k total,   260644k used,  3933652k free,  1588044k cached

The output from iostat -xk 6 shows the following:

Device: rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda       0.00   360.20   86.20  153.40  1133.60  2054.40    26.61     1.51    6.27   0.77  18.38
sdb       0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd      22.60   198.80   17.40   31.60   265.60   921.60    48.46     0.18    3.70   1.67   8.20
sdc      16.80   218.20   22.20   23.40   261.60   966.40    53.86     0.21    4.56   1.49   6.78

Based on the above it looks like something must be overloaded. But what?

Questions

  1. If it's not the hard disk or the CPU, then what?
  2. It seems as though 15.6% of the CPU's time is spent waiting. What exactly could it be waiting for?

Best Answer

As a point of clarification, load is not directly tied to CPU. This is one of the most common misconceptions about load. Since you mention disk, you seem to be aware of this already, but I wanted to say it anyway because I see comments suggesting that some people believe otherwise.

Load is defined as the number of processes that are running or waiting on system resources. The resource is most commonly CPU, disk, or network, but it can be any hardware, really.
A "process" here is not necessarily a full process, either. A thread is a "lightweight process", and each thread that is waiting increases the load count.
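
As a rough illustration of that definition, the sketch below parses the Linux /proc/loadavg file. Its fourth field reports the currently runnable scheduling entities (threads included) over the total number of them. This assumes a Linux /proc layout; the function name parse_loadavg is made up for this example.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a small dict."""
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    # Fourth field is "runnable/total" scheduling entities (threads count too).
    runnable, total = (int(x) for x in fields[3].split("/"))
    return {
        "load_1m": one,
        "load_5m": five,
        "load_15m": fifteen,
        "runnable": runnable,
        "total": total,
    }


if __name__ == "__main__":
    # Assumes Linux: /proc/loadavg does not exist on other systems.
    with open("/proc/loadavg") as f:
        info = parse_loadavg(f.read())
    cpus = os.cpu_count() or 1
    print(info)
    print(f"1-minute load per CPU: {info['load_1m'] / cpus:.2f}")
```

Dividing the load by the CPU count is only a rule of thumb: because waiting on disk or other hardware also counts, a high figure does not by itself mean the CPUs are saturated.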


To figure out which processes are a problem:

Run top -H (the -H flag makes top show individual threads).

The keyboard shortcuts vary by version.

With newer top (3.3 and after):

Press f to bring up the field options.
Use the arrow keys to go to S = Process Status and press s.
Press q to go back to the main page.
Press Shift+R to reverse the sorting.

With older top (before 3.3):

Press Shift+O to bring up the sort options.
Press w to sort by process status.
Press Enter to go back to the main page.
Press Shift+R to reverse the sorting.

Then in the S column, look for processes which have D or R (they should now be at the top). These will be processes contributing to system load.

If the process shows a D, that means "uninterruptible sleep". This usually happens when the process is waiting on I/O (disk, network, etc).
If the process shows an R, that means it is running or runnable, i.e. just doing normal computation or queued for a CPU.
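
The same filter can be sketched without top by reading per-thread state straight from /proc. This is a minimal sketch that assumes the Linux /proc/<pid>/task/<tid>/stat layout; the helper names thread_state and busy_threads, and the proc_root parameter, are hypothetical and exist only for this illustration.

```python
import glob


def thread_state(stat_line):
    """Return (comm, state) from the contents of a /proc/.../stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' and then on the first '('.
    head, rest = stat_line.rsplit(")", 1)
    comm = head.split("(", 1)[1]
    state = rest.split()[0]
    return comm, state


def busy_threads(proc_root="/proc"):
    """Yield (pid, tid, comm, state) for threads in state D or R."""
    for path in glob.glob(f"{proc_root}/[0-9]*/task/[0-9]*/stat"):
        parts = path.split("/")
        pid, tid = parts[-4], parts[-2]
        try:
            with open(path) as f:
                comm, state = thread_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # thread exited or the file was unreadable mid-scan
        if state in ("D", "R"):
            yield int(pid), int(tid), comm, state


if __name__ == "__main__":
    for pid, tid, comm, state in busy_threads():
        print(f"{state} pid={pid} tid={tid} {comm}")
```

Run it a few times in a row: a thread that shows up in D on every pass is a better lead than one that appears once, since each scan is only a snapshot.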


To find more about what those processes are doing:

With newer top (3.3 and after):

Press f to bring up the field options.
Use the arrow keys to go to WCHAN = Sleeping in Function and press d to enable it.
Then q to get back to the main page.

With older top (before 3.3):

Press f then y to enable the WCHAN field.

If your system has the necessary kernel options, and the wchan file is present on your system (on Linux it is /proc/<pid>/wchan), the WCHAN field should show you which kernel function the process is currently sleeping in. If the field just shows a - or a ? on everything, you don't have support.
A bit of googling on that function name and you should be on your way.
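
The wait channel can also be read directly rather than through top. The following is a minimal sketch, assuming the Linux layout where each process exposes /proc/<pid>/wchan and each thread exposes /proc/<pid>/task/<tid>/wchan; read_wchan and its proc_root parameter are hypothetical names used only for this example.

```python
import sys


def read_wchan(pid, tid=None, proc_root="/proc"):
    """Return the kernel symbol a task is sleeping in, or None if unknown."""
    if tid is None:
        path = f"{proc_root}/{pid}/wchan"
    else:
        path = f"{proc_root}/{pid}/task/{tid}/wchan"
    try:
        with open(path) as f:
            name = f.read().strip()
    except OSError:
        return None  # task is gone, or wchan is not exposed on this kernel
    # "0" (or an empty string) means the task is not currently blocked.
    return name if name not in ("", "0") else None


if __name__ == "__main__":
    # Usage: python wchan.py <pid>   (defaults to PID 1 if no argument given)
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    print(read_wchan(pid))
```

Depending on kernel settings, reading another user's wchan may require root; in that case the sketch simply returns None.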

If you don't have wchan support, you can always try an strace on the processes to find out what they're doing, but that's the difficult way.
