Top c – CPU usage >200% Will it Crash

cpu

I would appreciate some advice from your experience. My main concern is I really REALLY do not want to cause the computer server to crash.

The question is, I am running a program on a Linux computer server (super-computer? Maybe.). The program I am running is able to specify the of thread which can be used. I specified I would like to use 15 thread.

The computer server I am using has about 20+ processor (Intel Xeon CPU with 6 core). From top c, I saw that from running my program I am using

%CPU
190.7%

So I proceed to check with top c (1) and below is the output

Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 95.7%us,  0.3%sy,  0.0%ni,  0.0%id,  3.6%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 : 96.0%us,  0.7%sy,  0.0%ni,  0.0%id,  3.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu16 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu17 :  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu18 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu19 :  0.3%us,  0.0%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu20 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

The % of CPU used shifts between different CPU, for example sometimes cpu20 gets 90% and cpu1 returns to 0%.

Is there a chance the computer server might crash because I am using 190% of CPU?

Best Answer

Percentage of cpu is reported differently, by different tools, by different systems. A better way to consider cpu load is with load. Consider the below overloaded worker machine:

# w 
 02:22:31 up 221 days, 11:06,  1 user,  load average: 9.87, 9.50, 7.25
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT
stephan  pts/0    173.13.169.18    02:22    0.00s  0.44s  0.00s w


~$ cat /proc/cpuinfo |grep processor
processor   : 0
processor   : 1

This says I have a 1 min load of 9.87, 5 min load of 9.50, and 15 min load of 7.25. The 'load' number represents how many processors worth of work this machine has been assigned, while the cpuinfo command shows me how many actual processors I have to do the work. If I had 12 cpus, this load level would be perfectly fine.

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                
11579 app       20   0  263m  97m 4104 R   22  1.3   0:00.85 ruby                                                                                   
11586 app       20   0     0    0    0 Z   20  0.0   0:00.62 ruby <defunct>                                                                         
11589 app       20   0  262m  96m 3884 S   18  1.3   0:00.53 ruby                                                                                   
11592 app       20   0  260m  95m 3000 R   17  1.3   0:00.50 ruby                                                                                   
11600 app       20   0  260m  95m 2744 R   15  1.3   0:00.45 ruby                                                                                   
11595 app       20   0  260m  95m 2744 R   13  1.3   0:00.39 ruby                                                                                   
11598 app       20   0  262m  95m 3096 R   12  1.3   0:00.35 ruby                                                                                   
11604 app       20   0  258m  93m 2744 R   10  1.3   0:00.30 ruby                                                                                   
11607 app       20   0  257m  92m 2496 R    8  1.2   0:00.25 ruby                                                                                   
11610 app       20   0  256m  91m 2560 S    4  1.2   0:00.11 ruby

So you can see cpus are being divvied out between processes, but what I care is that there's more work than the cpus can feasibly keep up with. That's causing queued jobs to have to wait for cpu to be free to use them.

Related Question