Bash – Why does ‘jobs’ always return a line for finished processes when run in a subshell within a script

bash, jobs, shell, shell-script

Normally, when a job is launched in the background, jobs will report that it is finished the first time it is run after the job's completion, and nothing for subsequent executions:

$ ping -c 4 localhost &>/dev/null &
[1] 9666
$ jobs
[1]+  Running                 ping -c 4 localhost &> /dev/null &
$ jobs
[1]+  Done                    ping -c 4 localhost &> /dev/null
$ jobs  ## returns nothing
$ 

However, when run in a subshell within a script, it seems to always produce output. This script will never exit:

#!/usr/bin/env bash
ping -c 3 localhost &>/dev/null &
while [[ -n $(jobs) ]]; do
    sleep 1
done

If I use tee in the [[ ]] construct to see the output of jobs, I see that it is always printing the Done ... line: not just once, as I expected, but, apparently, forever.

What is even stranger is that running jobs within the loop causes it to exit as expected:

#!/usr/bin/env bash
ping -c 3 localhost &>/dev/null &
while [[ -n $(jobs) ]]; do
    jobs
    sleep 1
done

Finally, as pointed out by @mury, the first script works as expected and exits if run from the command line:

$ ping -c 5 localhost &>/dev/null & 
[1] 13703
$ while [[ -n $(jobs) ]]; do echo -n . ; sleep 1; done
...[1]+  Done                    ping -c 5 localhost &> /dev/null
$ 

This came up when I was answering a question on Super User, so please don't post answers recommending better ways of doing what that loop does; I can think of a few myself. What I am curious about is:

  1. Why does jobs act differently within the [[ ]] construct? Why does it always print the Done... line there, when it doesn't when run manually?

  2. Why does running jobs within the loop change the behavior of the script?

Best Answer

You know, of course, that $(…) causes the command(s) within the parentheses to run in a subshell.  And you know, of course, that jobs is a shell builtin.  Well, it looks like jobs clears a job from the shell’s memory once its death has been reported.  But when you run $(jobs), the jobs command runs in a subshell, so it doesn’t get a chance to tell the parent shell (the one that’s running the script) that the death of the background job (ping, in your example) has been reported.  So, each time the shell spawns a subshell to run the $(jobs) thingie, that subshell still has a complete list of jobs (i.e., the ping job is there, even though it’s dead after the third iteration), and so jobs still (again) believes that it needs to report on the status of the ping job (even if it’s been dead for seconds).
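A minimal demonstration of this, using sleep as a stand-in for ping: each $(jobs) runs in a fresh subshell, so the Done entry never goes away until jobs is run in the parent shell itself.

```shell
#!/usr/bin/env bash
sleep 1 &
sleep 2                  # let the background job finish
first=$(jobs)            # subshell: reports the Done line
second=$(jobs)           # still Done: the parent never saw the report
jobs > /dev/null         # builtin runs in the parent; the job is now reported
third=$(jobs)            # empty: the parent has cleared the job table
echo "first=${first:+Done} second=${second:+Done} third=${third:-empty}"
```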

This explains why running an unadulterated jobs command within the loop causes it to exit as expected: once you run jobs in the parent shell, the parent shell knows that the job’s termination has been reported to the user.
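This can be seen by re-running the second script from the question with sleep standing in for ping, plus an iteration counter (the counter and its cap are additions, only there to prove the loop terminates):

```shell
#!/usr/bin/env bash
sleep 2 &
i=0
while [[ -n $(jobs) ]]; do
    jobs > /dev/null       # parent-shell jobs: marks the Done job as reported
    sleep 1
    i=$((i + 1))
    (( i >= 10 )) && break # safety valve; never reached once the table is cleared
done
echo "exited after $i iterations"
```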

Why is it different in the interactive shell?  Because, whenever a foreground child of an interactive shell terminates, the shell reports on any background jobs that have terminated¹ while the foreground process was running.  So, the ping terminates while the sleep 1 is running, and when the sleep terminates, the shell reports on the background job’s death.  Et voilà.


¹ It might be more accurate to say “any background jobs that have changed state while the foreground process was running.”  I believe that it might also report on jobs that have been suspended (kill -TSTP, the programmatic equivalent of Ctrl+Z) or become unsuspended (kill -CONT, which is what the fg command sends).
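The footnote’s claim about state changes can be sketched as follows, under the assumption that job control is switched on with set -m (scripts have it off by default, and without it the job table may not track stop/continue transitions):

```shell
#!/usr/bin/env bash
set -m                    # enable job control in a non-interactive shell
sleep 5 &
kill -TSTP %1             # programmatic Ctrl+Z
sleep 0.2                 # give the signal time to land
stopped=$(jobs)           # should mention "Stopped"
kill -CONT %1             # resume the job, as fg would
sleep 0.2
running=$(jobs)           # should mention "Running" again
kill %1 2>/dev/null       # clean up the background job
echo "$stopped"
echo "$running"
```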
