I would appreciate some advice from your experience. My main concern is I really REALLY do not want to cause the computer server to crash.
The question is, I am running a program on a Linux computer server (super-computer? Maybe.). The program I am running is able to specify the of thread which can be used. I specified I would like to use 15 thread.
The computer server I am using has about 20+ processor (Intel Xeon CPU with 6 core). From top c, I saw that from running my program I am using
%CPU
190.7%
So I proceed to check with top c (1) and below is the output
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 95.7%us, 0.3%sy, 0.0%ni, 0.0%id, 3.6%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 96.0%us, 0.7%sy, 0.0%ni, 0.0%id, 3.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu13 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu16 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu17 : 0.0%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu18 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu19 : 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu20 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
The % of CPU used shifts between different CPU, for example sometimes cpu20 gets 90% and cpu1 returns to 0%.
Is there a chance the computer server might crash because I am using 190% of CPU?
Best Answer
Percentage of cpu is reported differently, by different tools, by different systems. A better way to consider cpu load is with load. Consider the below overloaded worker machine:
This says I have a 1 min load of 9.87, 5 min load of 9.50, and 15 min load of 7.25. The 'load' number represents how many processors worth of work this machine has been assigned, while the cpuinfo command shows me how many actual processors I have to do the work. If I had 12 cpus, this load level would be perfectly fine.
So you can see cpus are being divvied out between processes, but what I care is that there's more work than the cpus can feasibly keep up with. That's causing queued jobs to have to wait for cpu to be free to use them.