Shell – Launch background processes in one group and later kill them all

process, shell

The other day I tried writing a script that would kill a PID and all of its child processes, but after spending some time on it, I decided it wasn't trustworthy, because sometimes some of the children would end up with a PPID of 1.

Now, what I'm looking for is a way to run the following function in the background, with the function and curl all in the same process group, so that if the script gets stuck I can later kill it together with all the children it left in the background:

download(){
    until curl -s -S -L -A Mozilla/5.0 -m 300 "$@"; do
        echo Retrying in 5 seconds... >&2
        sleep 5
    done
}

for url in foo.com bar.com baz.com; do
    download "$url" >"$url" &
done
wait

I know there's no need for the download function here, but I included it because it is a function I use many times in this script. Also, I'm giving curl a max time limit of 300 seconds, but with some network errors it still gets stuck.

Best Answer

Typically, if you invoke your script from an interactive shell, it will be put in a new process-group (aka job), so if you Ctrl-C on it, all the processes started by that script will receive a SIGINT.

If you started it in background as well (still from an interactive shell), it will also be started in its own process group.

You can find out about process groups with:

ps -j

(j is for job control: the shell feature that runs each command line in its own process group so that it can be managed as a unit (foreground/background/kill/suspend/resume)).

You can find out about the jobs of your interactive shell with the jobs command (though it does not list every process in each job). jobs -p will show you the process group id.

You can kill the members of a process group by sending a signal to -x where x is the process group id (PGID) or by using jobs specifications with %job-number. For instance:

$ sleep 30 | sleep 40 &
[1] 6950 6951
$ ps -j
  PID  PGID   SID TTY          TIME CMD
 6031  6031  6031 pts/3    00:00:00 zsh
 6950  6950  6031 pts/3    00:00:00 sleep
 6951  6950  6031 pts/3    00:00:00 sleep
 6952  6952  6031 pts/3    00:00:00 ps
$ kill -- -6950
[1]  + terminated  sleep 30 | sleep 40
$ sleep 30 | sleep 40 &
[1] 6955 6957
$ jobs
[1]  + running    sleep 30 | sleep 40
$ kill %1
[1]  + terminated  sleep 30 | sleep 40

Now, if not started from an interactive shell, your script will end up in the same process group as its parent. So killing that process group could end up killing a bunch of other processes you don't want to kill.
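As a rough illustration of that point, the sketch below compares the PGID of a script with the PGID of a background child it starts. The helper name pgid_of is hypothetical (it is not part of any tool), and the sketch assumes a ps that accepts -o pgid= and -p, as used elsewhere in this answer. When run as a non-interactive script, the two values should match, because no new process group is created for the child:

```shell
#!/bin/sh
# pgid_of is a hypothetical helper name chosen for this sketch.
# It prints the process group ID of PID $1, with ps padding stripped.
pgid_of() {
    ps -o pgid= -p "$1" | tr -d ' '
}

# Start a background child the way a plain script would.
sleep 5 &
child=$!

# Without job control, both lines should report the same PGID.
echo "script PGID: $(pgid_of "$$")"
echo "child  PGID: $(pgid_of "$child")"

# Clean up the child started for the demonstration.
kill "$child"
wait "$child" 2>/dev/null || true
```

If the two values are equal and also equal to the PGID of whatever launched the script, then kill -- -PGID would hit the launcher too, which is the risk described above.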

What you could do is have the script start its own process group, for instance by adding:

[ "$(ps -o pgid= -p "$$")" -eq "$$" ] ||
  exec perl -e 'setpgrp or die "setpgrp: $!"; exec @ARGV' -- "$0" "$@"

(This goes at the start of your script. If the script detects that it is not a process group leader, it re-executes itself through perl, which first starts a new process group.)

That way, you're guaranteed that the script will run in its own process group.

What that means though is that if you do:

something | myscript | something

in an interactive shell, chances are myscript will not be the process group leader. After the setpgrp above, the script is no longer in the foreground process group of the terminal, which means Ctrl-C won't kill it.
