Linux – How to detect and warn if a process is using 100% CPU for a long time

cpu usage, linux, monitoring

Every few days I notice that a process is using 100% CPU. The process is avrdude, started by the Arduino IDE, which under certain circumstances that I haven't been able to reproduce just sits there at 100% CPU, as shown in top.

Possibly this happens when an upload to the Arduino board starts and the board is disconnected partway through.

I have 8 cores in the processor, so it isn't immediately obvious that one of them is maxed out. In fact, it only becomes noticeable if it happens a few times in a row, and then I have maybe 3 cores at 100% CPU.

Is there a way to have a background task check for this (say, every 15 minutes) and then alert me in some way (e.g. with a pop-up dialog)? I am using Ubuntu 14.04 LTS.


Thanks to MelBurslan for his answer, but I'm stumped as to why it isn't fully working. My current script is this:

cpupercentthreshold=2
pstring=""
top -b -n 1 | sed -e "1,7d" | while read line; do
cpuutil=$(echo ${line} | awk '{print $9}' | cut -d"." -f 1)
procname=$(echo ${line} | awk '{print $12}' )
if [ ${cpuutil} -ge ${cpupercentthreshold} ]
then
  echo ${cpuutil}
  pstring=${pstring}${procname}" "
  echo pstring is currently ${pstring}
fi
done
echo pstring is ${pstring}
if [ -n "${pstring}" ]
then
  zenity --title="Warning!" --question --text="These processes are above CPU threshold limit ${pstring}" --ok-label="OK"
fi

I dropped the threshold down for testing. However, as you can see, it collects the individual processes OK, but the final test (to display the dialog box) fails because pstring has suddenly become empty, for reasons I can't see:

13
pstring is currently VirtualBox
6
pstring is currently VirtualBox Xorg
6
pstring is currently VirtualBox Xorg compiz
6
pstring is currently VirtualBox Xorg compiz ibus-engin+
6
pstring is currently VirtualBox Xorg compiz ibus-engin+ top
pstring is
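
The likely reason pstring ends up empty: each part of a shell pipeline runs in its own subshell, so the assignments made inside the while loop are lost when the loop ends. A minimal sketch of one way around this, feeding the loop from a here-document so that it runs in the current shell. Here sample_top is a hypothetical stand-in for top -b -n 1 | sed -e "1,7d", and its lines are made-up sample data so that the result is reproducible:

```shell
#!/bin/sh
cpupercentthreshold=2
pstring=""

# Hypothetical stand-in for: top -b -n 1 | sed -e "1,7d"
# (made-up sample lines in top's column layout)
sample_top() {
  cat <<'SAMPLE'
30734 nick 20 0 6233848 3.833g 3.731g S 13.3 12.2 3:11.75 VirtualBox
 1234 root 20 0 400000 50000 30000 S 6.6 0.3 1:00.00 Xorg
 2345 nick 20 0 100000 5000 3000 S 0.0 0.1 0:00.10 bash
SAMPLE
}

# The loop reads from a here-document instead of a pipe, so it runs in
# the current shell and pstring keeps its value after "done".
while read -r line; do
  cpuutil=$(echo "${line}" | awk '{print $9}' | cut -d"." -f 1)
  procname=$(echo "${line}" | awk '{print $12}')
  if [ "${cpuutil}" -ge "${cpupercentthreshold}" ]; then
    pstring="${pstring}${procname} "
  fi
done <<INPUT
$(sample_top)
INPUT

echo "pstring is ${pstring}"
```

In the real script, replace sample_top with the top pipeline. Bash also accepts done < <(top -b -n 1 | sed -e "1,7d"), which has the same effect.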

Best Answer

After reading the answer by MelBurslan and various comments, I decided, inspired by their suggestions, to write a version in Lua. This was done in Lua 5.1.5; I'm not sure whether it will work with the latest Lua.

The general idea is to use Lua's popen (open a pipe) to execute top and then process the resulting data using a regular expression (or pattern, as it is called in Lua). Matching lines (which would be most of them) are then considered for crossing the threshold percentage. If they do, they are added to a table.

If the table is not empty, then zenity is called to display a message to the user. A few "gotchas" I found during development:

  • I added a timeout of 60 seconds to zenity so that, if you were not at the PC at the time, you didn't fill the screen with warning dialogs.
  • I added --display=:0.0 so that a display screen was found when running under cron.
  • I simplified the test for "every 15 minutes" in the crontab, like this:

    */15 * * * * /home/nick/check_cpu_usage.lua
    
  • The regular expression captures everything from top in case you want to do other tests (eg. using too much memory).

I think this would be faster than firing off lots of processes and subshells. It seems to work OK. Test by reducing the threshold (e.g. to 5) and changing the crontab entry to check every minute.
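
For comparison, the per-line processes can also be avoided while staying in shell, by letting a single awk process do the parsing and the threshold test. This is only a sketch, not the script used in this answer: busy_processes is a hypothetical helper, the sample lines are made up, and the field positions ($9 for %CPU, $12 for COMMAND) assume the top column layout shown in this post.

```shell
#!/bin/sh
threshold=90

# Hypothetical helper: reads top-style lines on stdin and prints
# "command (cpu%)" for every process at or above the limit in $1.
# Lines whose first field is not a PID (headers, summary) are skipped.
busy_processes() {
  awk -v limit="$1" '
    $1 ~ /^[0-9]+$/ && $9 + 0 >= limit + 0 { printf "%s (%s%%)\n", $12, $9 }
  '
}

# Made-up sample input in the column layout assumed above.
sample='  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30734 nick      20   0 6233848 3.833g 3.731g S   8.6 12.2   3:11.75 VirtualBox
 4321 nick      20   0   20000   1500   1200 R  99.7  0.0  55:02.10 avrdude'

report=$(printf '%s\n' "$sample" | busy_processes "$threshold")
echo "$report"
```

In real use the printf would be replaced by top -b -n 1 -w 512. Note that $12 holds only the first word of the command, whereas the Lua pattern below captures the rest of the line.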


check_cpu_usage.lua

#! /usr/local/bin/lua

THRESHOLD = 90  -- percent

-- run top and read its output through a pipe, as file handle "f"
f = assert (io.popen ("top -b -n 1 -w 512"))
t = { }

-- check each line
for line in f:lines() do

  -- match top output, eg.
  --   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  -- 30734 nick      20   0 6233848 3.833g 3.731g S   8.6 12.2   3:11.75 VirtualBox

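  -- Note: user and priority are matched as any non-space run so that names
  -- like "www-data" or "systemd+" and the priority "rt" still match; sizes
  -- may carry any one-letter unit suffix, and status is any single letter.
  local pid, user, priority, nice, virt, res, shr,
        status, cpu, mem, time, command =
    string.match (line,
      "^%s*(%d+)%s+(%S+)%s+(%S+)%s+(%-?%d+)" ..
--         pid      user  priority   nice
      "%s+([%d.]+%a?)%s+([%d.]+%a?)%s+([%d.]+%a?)%s+(%a)%s+(%d+%.%d+)%s+(%d+%.%d+)" ..
--        virtual          res           shr       status     %cpu        %mem
      "%s+([0-9:.]+)%s+(.*)$")
--         time       command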

  -- if a match (first few lines won't) check for CPU threshold
  if pid then
    cpu = tonumber (cpu)
    if cpu >= THRESHOLD then
      table.insert (t, string.format ("%s (%.1f%%)", command, cpu))
    end -- if
  end -- if

end -- for loop

f:close()

-- if any over the limit, alert us
if #t > 0 then
  os.execute ('zenity --title="CPU usage warning!" --info ' ..
              '--text="These processes are using more than ' ..
              THRESHOLD .. '% CPU:\n' ..
              table.concat (t, ", ") ..
              '" --ok-label="OK" ' ..
              '--timeout=60 ' ..   -- close dialog after one minute in case we aren't around
              '--display=:0.0 '  -- ensure visible when running under cron
              )
end -- if