Linux – Detect long-running commands and notify when they complete

linux zsh

This is a long shot — I don't think it's possible, but I figured I'd ask.

I frequently run commands that take a few minutes to complete, and while they run I check my email or browse the Internet. When I remember, I run command; finished, where finished is a command that pops up a dialog notifying me that the command is done. However, I don't always remember, so I'd like to have something that automatically detects when a command takes more than N seconds and runs finished when the process completes. If it matters, I prefer using zsh but I'd switch if another shell was capable of this.

iTerm (a terminal emulator for OSX) has a feature like this, where it shows you a notification when the terminal goes idle or produces output after being idle for a while. If there are any terminal emulators for Linux that you know of that have this feature, that would work just as well (or better).

Can you think of any way to make this work? Thanks so much!

Best Answer

You need a way to find your processes which have been running for a while. To do that, it helps to know that the etime format specifier of ps shows the elapsed time since the process started, in the format [[DD-]hh:]mm:ss, where the days and hours parts appear only when they are needed. So, you can do

ps -U sauer -o etime,pid,command

(on AIX, you could use ps -U sauer -o "%t,%p,%c")

You could use -t $(tty) in place of -U username to select the processes on the current tty (that is, in the current terminal). That's what I'll do later. Also, note that giving a column an empty header, as in -o cmd=, suppresses the column header, but it means you need a separate -o option for each column.

So you end up with ps -t $(tty) -o etime= -o pid= -o command= to show all the processes running on the current terminal, showing columns "elapsed wall clock time", "process ID", and "command", and suppressing the column headers.

Now you need to get only the ones which have been running for longer than some number of seconds. So, you need to extract the first column and convert the time to seconds. How about a shell script to do that (ksh/bash):

ps -t $(tty) -o etime= -o pid= -o command= | while read time pid cmd
do
  secs=0
  while true
  do
    part=${time##*:}
    secs=$(( secs + ${part#0} ))
    [[ "$part" != "$time" ]] || break # only had seconds

    time=${time%:$part}
    part=${time##*:}
    secs=$(( secs + (60 * ${part#0}) ))
    [[ "$part" != "$time" ]] || break # only had minutes left

    time=${time%:$part}
    part=${time##*-}
    secs=$(( secs + (60 * 60 * ${part#0}) ))
    [[ "$part" != "$time" ]] || break # only had hours left

    time=${time%-$part} # all that's left are days-hours
    secs=$(( secs + (24 * 60 * 60 * time) ))
    break
  done

  echo "Process '$cmd' (pid '$pid') has been up for '$secs' seconds"
done

A couple of things may need explaining. The ${part#0} is in there so that times like "06" seconds are converted to "6" instead of being treated as octal. Sure, octal 06 and decimal 6 are the same, but 09 isn't a valid octal number, and shells sometimes complain about that.
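To make the octal problem concrete, here is a minimal bash sketch. The helper name strip_zero is made up for this illustration and is not part of the script above; it only wraps the same ${part#0} expansion.

```shell
#!/bin/bash
# strip_zero is a hypothetical helper used only for this illustration.
# It removes at most one leading zero, exactly like ${part#0} in the loop.
strip_zero() {
  local part=$1
  printf '%s\n' "${part#0}"
}

# Inside $(( )), "09" would be rejected as an invalid octal number,
# while the stripped value "9" is plain decimal.
echo $(( $(strip_zero 09) + 1 ))   # prints 10
echo $(( $(strip_zero 00) + 1 ))   # prints 1 ("00" becomes "0")
```

In bash you can also force base 10 explicitly by writing $(( 10#$part )), which avoids the stripping step altogether.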

The other thing is that ${var##*:} evaluates to whatever $var is with the largest string matching "*:" pruned from the front (one # is the shortest match, and %% / % does the same from the end). If the pattern doesn't match, it evaluates to $var. So, we yank off everything but the seconds, add that to seconds, then remove the seconds from the end. If we have stuff left, that's minutes. So, yank the minutes off, convert to seconds, add to the running total, prune it from the end. And so forth.
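The same peel-from-the-right idea can be packaged as a function, which makes it easy to try on sample values. This is only a sketch for bash: the name etime_to_secs is hypothetical, and it is a variation on the loop above rather than the same code. It splits off the days first, uses a running multiplier instead of three unrolled steps, and uses 10# to force decimal.

```shell
#!/bin/bash
# etime_to_secs is a hypothetical name; it converts an elapsed time in the
# form [[DD-]hh:]mm:ss to a number of seconds.
etime_to_secs() {
  local time=$1 days=0 secs=0 mult=1 part

  # Split off the optional "DD-" prefix first.
  if [[ $time == *-* ]]; then
    days=${time%%-*}   # everything before the first "-"
    time=${time#*-}    # everything after it
  fi

  # Peel fields off the right-hand end: seconds, then minutes, then hours.
  while :; do
    part=${time##*:}                     # longest "*:" removed from the front
    secs=$(( secs + mult * 10#$part ))   # 10# forces decimal, so "09" is fine
    [[ $part != "$time" ]] || break      # no colon left, this was the last field
    time=${time%:*}                      # drop the field just handled
    mult=$(( mult * 60 ))
  done

  echo $(( secs + 86400 * 10#$days ))
}

etime_to_secs 01:02        # prints 62
etime_to_secs 1-02:03:04   # prints 93784
```

With a helper like this, the body of the outer loop reduces to something like secs=$(etime_to_secs "$time").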

Alright. From there, now you just need to put a #!/bin/ksh (or bash, if you have to) at the top, and you have a script. Except you also need a way to find just the pids that are running for more than X seconds. How about a file in /tmp. Let's use this in place of the echo command:

  if [[ $secs -gt 120 ]]
  then
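    { cat /tmp/pids.$$; echo "$pid $cmd"; } 2>/dev/null | sort -u > /tmp/pids.$$.tmp
    # start from an empty list; processes that are still alive get appended below
    : > /tmp/pids.$$
    while read p c
    do
      if ps -p "$p" >/dev/null 2>&1
      then
        # put it back in the list (append, so earlier entries are not overwritten)
        echo "$p $c" >> /tmp/pids.$$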
      else
        echo "Command '$c' exited" | mailx -s "exit report" $USER
      fi
    done < /tmp/pids.$$.tmp
  fi

Now all you have to do is wrap the whole giant thing inside a while loop

while true
do
  # the big while loop and the if statement
  sleep 10
done

Call it process_monitor.sh and stick a process_monitor.sh& in your .profile or .bash_login or whatever. That'll sit in the background watching all the processes in that terminal, and when it finds one running for over 120 seconds, it'll stick that into the list to watch, and email you when the process exits.
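Since the question asks for a pop-up rather than an email, the mailx line could be swapped for a small notification function. The sketch below is bash and makes two assumptions: that the notify-send command from libnotify is installed on your desktop, and that printing to stderr is an acceptable fallback when it is not. The function name notify_done is made up for this example.

```shell
#!/bin/bash
# notify_done is a hypothetical helper; $1 is the command line that exited.
notify_done() {
  local msg="Command '$1' exited"

  # Assumption: notify-send (libnotify) is available and a desktop
  # session is reachable. If either is not the case, fall through.
  if command -v notify-send >/dev/null 2>&1 &&
     notify-send "exit report" "$msg" 2>/dev/null
  then
    return 0
  fi

  # Fallback: no usable desktop notifier, so just print the message.
  printf '%s\n' "$msg" >&2
}
```

In the monitor script you would then call notify_done "$c" in the else branch in place of the mailx pipeline.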

There are things I'd probably add to make it more pretty, like trapping the EXIT signal and removing the temp files when the script exits, etc. But that's an exercise for you. I'm hoping this should be enough to get you well on the way. :)
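As a starting point for the cleanup, here is one way the trap could look in bash. It is a sketch only: the variable name pidfile and the function name cleanup are made up, and it assumes the script keeps using the /tmp/pids.$$ and /tmp/pids.$$.tmp files from above.

```shell
#!/bin/bash
# pidfile and cleanup are hypothetical names for this sketch.
pidfile=/tmp/pids.$$

cleanup() {
  # -f keeps rm quiet if a file was never created
  rm -f "$pidfile" "$pidfile.tmp"
}

# Run cleanup whenever the script exits.
trap cleanup EXIT
# Turn the usual termination signals into a normal exit so the EXIT trap fires.
trap 'exit 1' INT TERM HUP
```

These lines would go near the top of process_monitor.sh, before the big while loop starts.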
