Huge CPU load due to high system usage

top

Tasks: 747 total, 176 running, 560 sleeping,   0 stopped,  11 zombie
Cpu(s): 10.5%us, 89.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  74236420k total, 73285344k used,   951076k free, 12261184k buffers
Swap:  8388600k total,    10404k used,  8378196k free, 27872176k cached

89% of CPU is being used by %sy. What is that %sy?

This is what iostat shows:

root@host [~]# iostat -xk 5
Linux 2.6.32-431.20.3.el6.x86_64 (host.superhostsite.com)       09/03/2014      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          43.02    0.28   50.00    0.05    0.00    6.65

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.25    64.95   14.21   79.82    91.86   579.51    14.28     0.15    1.60   0.09   0.84
sda               0.87   182.70   28.06  206.05   247.08  1629.10    16.03     0.49    2.07   0.09   2.22

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.45    0.00   91.55    0.00    0.00    0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00    14.00    0.20   15.00     3.20   116.00    15.68     0.03    1.92   0.28   0.42
sda               0.00    23.20    2.00   47.80    25.60   284.00    12.43     0.02    0.42   0.14   0.70

So disk usage is small. Everything is small. And yet a huge 89.2% of the CPU is used by system.

Why is %sy high? Why not %us?

Best Answer

I assume your question is basically "What's going on here?".

I will answer by explaining your output - if that helps, let me know and I'll add more detail.
(Try to edit the question so that it's clearer what you are asking, otherwise it may get closed.)

So, yes, you see "huge CPU load due to high CPU usage"!

Let's look at the top output:

Cpu(s): 10.5%us, 89.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st

The percentages show where CPU time is spent: in user code (%us) or in system (kernel) code (%sy). Here 89.2% goes to kernel code and another 10.5% to user code, so the CPU is 100% busy. (The 0.0%id - idle - shows the same thing.)
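As a quick check of that arithmetic, here is a minimal sketch that adds up the busy fields of a top-style Cpu(s) line. The function name cpu_busy is made up for illustration, and it assumes the older procps line format pasted in the question; idle and I/O wait are left out because the CPU is not executing anything during that time.

```shell
# Hypothetical helper: sum the busy fields of a top-style "Cpu(s):" line.
# Assumes the "10.5%us, 89.2%sy, ..." format shown in the question.
cpu_busy() {
    printf '%s\n' "$1" | awk '{
        sub(/^[^:]*:/, "")                  # drop the "Cpu(s):" label
        n = split($0, f, ",")               # one field per state, e.g. " 89.2%sy"
        busy = 0
        for (i = 1; i <= n; i++) {
            gsub(/ /, "", f[i])
            split(f[i], kv, "%")            # kv[1] = value, kv[2] = state name
            if (kv[2] != "id" && kv[2] != "wa") busy += kv[1]
        }
        printf "%.1f\n", busy
    }'
}

cpu_busy 'Cpu(s): 10.5%us, 89.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st'
# prints 100.0
```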

But there is even more:

Tasks: 747 total, 176 running, 560 sleeping, 0 stopped, 11 zombie

There are 176 running tasks. But if you have fewer than 176 cores, some of them are only runnable: they could run if they got CPU time.
That means there is more work waiting than your CPUs can take on, even at 100% usage.
So your CPU is not 89.2% used - it is 100% used.
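To put a number on it, here is a rough sketch that divides the running-task count by the core count. The helper name tasks_per_core is invented for this example; the figures are the 176 running tasks from top and the "(8 CPU)" in the iostat header of the question.

```shell
# Hypothetical helper: runnable tasks per core.
# A value well above 1 means tasks are queueing for CPU time.
tasks_per_core() {
    awk -v running="$1" -v cores="$2" 'BEGIN { printf "%.1f\n", running / cores }'
}

tasks_per_core 176 8
# prints 22.0
```

On a live Linux box the two inputs could come from the first number in the 4th field of /proc/loadavg (runnable/total) and from nproc.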

From this, there is no reason to look at iostat - the system does not need much IO in this state.

But the information we need is: what are these (at least) 176 processes or threads? There may be many more similar tasks that are not in the running state.

And the next will be: what are they doing, and why?

So take a look at the process list in top - it may show some obvious problem.

It could help to know more about the processes in the "runnable" state.
The command below lists all processes and threads that are in the "runnable" state - the tasks that could run if they got CPU time:

ps -o comm,pid,ppid,user,time,etime,start,pcpu,state --sort=comm aH | grep '^COMMAND\|R$'

For me, that lists only one or two lines, including ps itself.
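With 176 runnable tasks the list gets long, so it may help to group it by command name. The following is only a sketch: count_runnable is a made-up helper, it expects two columns on stdin (state, then command name), and the sample input is invented to show the output shape.

```shell
# Hypothetical helper: count runnable ("R") tasks per command name.
# Expects two whitespace-separated columns on stdin: state, command name.
# Only the first word of a command name containing spaces is used.
count_runnable() {
    awk '$1 == "R" { n[$2]++ } END { for (c in n) print n[c], c }' |
        sort -k1,1nr -k2
}

# Invented sample input, not taken from the system in the question.
printf '%s\n' 'R httpd' 'S sshd' 'R httpd' 'R php' | count_runnable
# prints:
# 2 httpd
# 1 php
```

On a live system with procps, feeding it one line per thread should work along the lines of: ps -eL -o state= -o comm= | count_runnable | head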
