Bash – timeout without killing process in bash


I have a main script that I'm running, and from it I want to kick off a second "slow process" and then have the main script "do something" that depends on whether the slow process completed within a time limit. N.B. If the slow process finishes before the time limit, I don't want to wait out the entire time limit.

I want the slow process to keep running regardless, so I can gather stats and forensics about its performance.

I've looked into using timeout, but it kills my slow process when the time limit expires.

Consider this simplified example:

main.sh

#!/bin/bash
result=$(timeout 3 ./slowprocess.sh)
if [ "$result" = "Complete" ]
then
  echo "Cool it completed, do stuff..."
else
  echo "It didn't complete, do something else..."
fi

slowprocess.sh

#!/bin/bash
start=$(date +%s)
sleep 5
end=$(date +%s)
total=$((end - start))
echo "$total" >> /tmp/performance.log
echo "Complete"

Here, because of timeout, slowprocess.sh is killed after 3 seconds, so nothing ends up in /tmp/performance.log. I want slowprocess.sh to run to completion, but I want main.sh to move on to its next step even if slowprocess.sh doesn't finish within the 3 seconds.

Best Answer

With ksh/bash/zsh:

{
  (./slowprocess.sh >&3 3>&-; echo "$?") |
    if read -t 3 status; then
      echo "Cool it completed with status $status, do stuff..."
    else
      echo "It didn't complete, do something else..."
    fi
} 3>&1

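We duplicate the original stdout onto fd 3 (3>&1) so that slowprocess.sh can write to it (>&3, with 3>&- closing the extra descriptor in that process), while the stdout of the rest of the (...) subshell, i.e. the echo "$?", goes to the pipe that read -t 3 reads from. If slowprocess.sh exits within 3 seconds, read receives its exit status; otherwise read times out.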
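One caveat: bash waits for every command in a pipeline to finish, so with the construct above only the if/else branch runs at the 3-second mark, and anything placed after the closing } still waits for slowprocess.sh. If the main script must carry on entirely, a different technique (not the one from the answer) is to start the process in the background and poll it. A minimal sketch, assuming bash, a sleep that accepts fractional seconds (GNU and BSD sleep do), and a hypothetical helper name wait_up_to:

```shell
#!/bin/bash
# wait_up_to LIMIT CMD [ARGS...]   (hypothetical helper name)
# Starts CMD in the background and waits at most LIMIT whole seconds for it.
# CMD is never killed. Returns 0 if CMD finished in time, 1 if it is still
# running. Either way, its PID remains available in $! for a later `wait`.
wait_up_to() {
  local limit=$1 pid ticks
  shift
  "$@" &
  pid=$!
  ticks=$((limit * 10))            # poll every 0.1 s
  while kill -0 "$pid" 2>/dev/null; do
    if ((ticks-- <= 0)); then
      return 1                     # deadline passed: leave CMD running
    fi
    sleep 0.1
  done
  return 0                         # CMD has already exited
}

# Demo: `sleep 3` stands in for ./slowprocess.sh.
if wait_up_to 1 sleep 3; then
  echo "completed in time"
else
  echo "still running, moving on"
fi
```

In main.sh this would read `if wait_up_to 3 ./slowprocess.sh; then ... fi`. Polling costs up to 0.1 s of extra latency but needs no pipe, and because slowprocess.sh stays a child of main.sh, saving $! right after the call lets you collect its exit status later with `wait`.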

Alternatively, if you want to use timeout (here assuming GNU timeout):

timeout --foreground 3 sh -c './slowprocess.sh;exit'

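would avoid slowprocess.sh being killed: with --foreground, timeout sends its signal only to the command it runs directly (sh here), not to that command's children. The ;exit is necessary for sh implementations that optimise by executing the last command in place of the shell process (exec), which would make slowprocess.sh the direct child and therefore the one that gets killed. timeout still exits with status 124 when the limit is reached, so main.sh can test for that.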
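A quick way to check this behaviour, assuming GNU coreutils timeout: in the sketch below an inner sh -c that sleeps 2 seconds and then creates a marker file stands in for slowprocess.sh (the marker path from mktemp -u is only for illustration). timeout gives up after 1 second, yet the stand-in still finishes.

```shell
#!/bin/bash
# Assumes GNU coreutils `timeout`. The inner `sh -c` is a stand-in for
# ./slowprocess.sh: it sleeps 2 s, then creates the marker file.
marker=$(mktemp -u)

# --foreground: on timeout, only the direct child (the outer sh) is signalled.
# The trailing `exit` keeps the outer sh from exec'ing the inner command.
timeout --foreground 1 sh -c "sh -c 'sleep 2; touch \"$marker\"'; exit"
rc=$?
echo "timeout exit status: $rc"   # 124 means the time limit was reached

sleep 2                            # give the stand-in time to finish
if [ -e "$marker" ]; then
  completed=yes
else
  completed=no
fi
echo "slow process completed anyway: $completed"
rm -f "$marker"
```

Dropping the trailing `; exit` may let some sh implementations exec the inner command directly, in which case it becomes the process that timeout signals.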

Bash – Why ‘Terminated’ Message Appears After Killing Process

Short answer

In bash (and dash), the various job status messages are not displayed from signal handlers; they require an explicit check. This check is performed only before a new prompt is displayed, probably so as not to disturb the user while they are typing a new command.

The message is not shown before the prompt that immediately follows the kill, probably because the process is not dead yet. This is particularly likely because kill is a shell builtin: it executes very quickly and doesn't need to fork.

Running the same experiment with killall instead usually yields the message immediately, a sign that the extra time needed to run an external command (forking, exec, context switches and so on) is long enough for the process to die before control returns to the shell.

matteo@teokubuntu:~$ dash
$ sleep 60 &
$ ps
  PID TTY          TIME CMD
 4540 pts/3    00:00:00 bash
 4811 pts/3    00:00:00 sh
 4812 pts/3    00:00:00 sleep
 4813 pts/3    00:00:00 ps
$ kill -9 4812
$ 
[1] + Killed                     sleep 60
$ sleep 60 &
$ killall sleep
[1] + Terminated                 sleep 60
$
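In a script there is no prompt, so the explicit check has to be made by hand. A minimal bash sketch: wait blocks until the background job has actually died, so its result is reliable even though the kill builtin returns as soon as the signal is sent. For a job killed by a signal, bash reports 128 plus the signal number, i.e. 137 for SIGKILL (9).

```shell
#!/bin/bash
sleep 60 &
pid=$!

kill -9 "$pid"      # the builtin returns as soon as the signal is sent
wait "$pid"         # block until the job is really gone and collect its status
status=$?

echo "exit status: $status"   # 137 = 128 + 9 (SIGKILL)
```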

Long answer

`dash`

First of all, I had a look at the dash sources, since dash exhibits the same behavior and its code is much simpler than bash's.

As said above, the point seems to be that job status messages are not emitted from a signal handler (which could interrupt the "normal" shell control flow); instead, they are the result of an explicit check (a showjobs(out2, SHOW_CHANGED) call in dash) that is performed only before requesting new input from the user, in the REPL loop.

Thus, while the shell is blocked waiting for user input, no such message is emitted.

Now, why doesn't the check performed just after the kill show that the process was terminated? As explained above, probably because kill is too fast: it is a shell builtin, so it runs without forking, and when the check is performed immediately afterwards the process is still alive (or, at least, still being killed).
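Whether a command name resolves to a shell builtin or to an external program can be checked with bash's type -t. A small sketch (killall is only reported as a file if it is installed on the system):

```shell
#!/bin/bash
# Print how bash resolves each name: "builtin" for shell builtins,
# "file" for external programs found on $PATH.
for cmd in kill killall; do
  kind=$(type -t "$cmd") || kind="not found"
  echo "$cmd: $kind"
done
```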

`bash`

As expected, bash, being a much more complex shell, was trickier and required some gdb-fu.

The backtrace at the point where that message is emitted looks something like this:

(gdb) bt
#0  pretty_print_job (job_index=job_index@entry=0, format=format@entry=0, stream=0x7ffff7bd01a0 <_IO_2_1_stderr_>) at jobs.c:1630
#1  0x000000000044030a in notify_of_job_status () at jobs.c:3561
#2  notify_of_job_status () at jobs.c:3461
#3  0x0000000000441e97 in notify_and_cleanup () at jobs.c:2664
#4  0x00000000004205e1 in shell_getc (remove_quoted_newline=1) at /Users/chet/src/bash/src/parse.y:2213
#5  shell_getc (remove_quoted_newline=1) at /Users/chet/src/bash/src/parse.y:2159
#6  0x0000000000423316 in read_token (command=<optimized out>) at /Users/chet/src/bash/src/parse.y:2908
#7  read_token (command=0) at /Users/chet/src/bash/src/parse.y:2859
#8  0x00000000004268e4 in yylex () at /Users/chet/src/bash/src/parse.y:2517
#9  yyparse () at y.tab.c:2014
#10 0x000000000041df6a in parse_command () at eval.c:228
#11 0x000000000041e036 in read_command () at eval.c:272
#12 0x000000000041e27f in reader_loop () at eval.c:137
#13 0x000000000041c6fd in main (argc=1, argv=0x7fffffffdf48, env=0x7fffffffdf58) at shell.c:749

The call that checks for dead jobs & co. is notify_of_job_status (roughly the equivalent of showjobs(..., SHOW_CHANGED) in dash); #0-#2 are notify_of_job_status and its inner workings; #6-#9 are the parser (parse.y and the yacc-generated y.tab.c); #10-#12 are the REPL loop.

The interesting frame here is #4, i.e. the place where the notify_and_cleanup call comes from. It seems that bash, unlike dash, may check for terminated jobs at each character read from the command line, but here's what I found in shell_getc:

      /* If the shell is interatctive, but not currently printing a prompt
         (interactive_shell && interactive == 0), we don't want to print
         notifies or cleanup the jobs -- we want to defer it until we do
         print the next prompt. */
      if (interactive_shell == 0 || SHOULD_PROMPT())
        {
#if defined (JOB_CONTROL)
          /* This can cause a problem when reading a command as the result
             of a trap, when the trap is called from flush_child.  This call
             had better not cause jobs to disappear from the job table in
             that case, or we will have big trouble. */
          notify_and_cleanup ();
#else /* !JOB_CONTROL */
          cleanup_dead_jobs ();
#endif /* !JOB_CONTROL */
        }

So, in interactive mode, delaying the check until a new prompt is displayed is intentional, presumably so as not to disturb the user while they enter commands. As for why the check doesn't spot the dead process when the new prompt is displayed immediately after the kill, the previous explanation holds: the process is not dead yet.
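Related to this, the bash manual documents an option for exactly this case: set -b (long form set -o notify) causes the status of terminated background jobs to be reported immediately rather than before the next primary prompt. It only has a visible effect in an interactive shell with job control (for example when placed in ~/.bashrc), so this minimal sketch just enables it and checks that the b flag shows up in $-.

```shell
#!/bin/bash
# Report terminated background jobs immediately instead of waiting for the
# next prompt. Only visible in an interactive shell (e.g. from ~/.bashrc).
set -o notify                 # equivalent to: set -b

case $- in
  *b*) echo "notify is enabled" ;;
  *)   echo "notify is disabled" ;;
esac
```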

Bash – Killing background process in bash script when exiting the script

# $! is the PID of the most recently started background job (the clock process)
CLOCK_PID=$!
trap 'kill -9 "$CLOCK_PID"' EXIT
tail -f mylog.log
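A self-contained sketch of the same pattern that runs without mylog.log: a hypothetical clock function that appends a line to a temporary file every 0.1 s stands in for the clock process, and sleep 1 stands in for tail -f mylog.log. The pattern runs in a subshell so that its EXIT trap fires when the subshell finishes, after which the file stops growing. It uses a plain kill (SIGTERM) instead of kill -9, which is usually enough and gives the process a chance to clean up.

```shell
#!/bin/bash
# Hypothetical stand-in for the clock: append a line every 0.1 s.
clock() {
  while :; do
    echo tick >> "$1"
    sleep 0.1
  done
}

# The trap pattern, run in a subshell ( ... ) so the EXIT trap fires when
# the subshell ends. `sleep 1` stands in for `tail -f mylog.log`.
run_with_clock() (
  clock "$1" &
  CLOCK_PID=$!
  # Single quotes: $CLOCK_PID is expanded when the trap runs.
  trap 'kill "$CLOCK_PID" 2>/dev/null' EXIT
  sleep 1
)

log=$(mktemp)
run_with_clock "$log"
sleep 0.5                          # let any in-flight iteration finish
before=$(wc -l < "$log")
sleep 0.5
after=$(wc -l < "$log")
echo "lines: $before then $after"  # equal once the clock has been stopped
rm -f "$log"
```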
