Bash – Why can’t I kill a timeout called from a Bash script with a keystroke

Tags: bash, kill, shell-script, signals

[Edit: This looks similar to some other questions asking how to kill all spawned processes – the answers all seem to be to use pkill. So the core of my question may be: Is there a way to propagate Ctrl-C/Z to all processes spawned by a script?]

When SoX's rec is run under the timeout command from coreutils (discussed here), there doesn't seem to be any way to kill it with a keystroke once it has been invoked from within a Bash script.

Examples:

timeout 10 rec test.wav

…can be killed with Ctrl+C or Ctrl+Z from bash, but not when it's been called from inside a script.

timeout 10 ping nowhere

…can be killed with Ctrl+C or Ctrl+Z from bash, and with Ctrl+Z when it's run from inside a script.

I can find the process ID and kill it that way, but why can't I use a standard break keystroke? And is there any way to structure my script so that I can?

Best Answer

Signal keys such as Ctrl+C send a signal to all processes in the foreground process group.

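In the typical case, a process group is a pipeline. For example, in head <somefile | sort, the process running head and the process running sort are in the same process group, so both receive the signal. When the pipeline is run from a script, the shell running the script is in that group too and receives the signal as well; an interactive shell with job control instead puts each job in its own process group and stays outside it. When you run a job in the background from an interactive shell (somecommand &), that job is in its own process group, which is not the foreground one, so pressing Ctrl+C doesn't affect it.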

The timeout program places itself in its own process group. From the source code:

/* Ensure we're in our own group so all subprocesses can be killed.
   Note we don't just put the child in a separate group as
   then we would need to worry about foreground and background groups
   and propagating signals between them.  */
setpgid (0, 0);

When a timeout occurs, timeout goes through the simple expedient of killing the process group of which it is a member. Since it has put itself in a separate process group, its parent process will not be in the group. Using a process group here ensures that if the child application forks into several processes, all its processes will receive the signal.

When you run timeout directly on the command line and press Ctrl+C, the resulting SIGINT is received both by timeout and by the child process, but not by the interactive shell, which is timeout's parent process. When timeout is called from a script, only the shell running the script receives the signal: timeout doesn't get it since it's in a different process group.
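To see the separate process group for yourself, something like the following rough sketch may help. It assumes GNU coreutils timeout and a ps that accepts the POSIX -o pgid= and -p options; the pgid_of helper is just a name made up for this example.

```shell
#!/bin/sh
# Sketch: show that GNU timeout leaves the script's process group.
# Assumes GNU coreutils `timeout` and a POSIX-style `ps`.

pgid_of() {
    # Print the process group ID of the given PID, without padding.
    ps -o pgid= -p "$1" | tr -d ' '
}

timeout 5 sleep 2 &
timeout_pid=$!
sleep 1                       # give timeout a moment to call setpgid(0, 0)

script_pgid=$(pgid_of "$$")
timeout_pgid=$(pgid_of "$timeout_pid")

echo "script:  pid=$$ pgid=$script_pgid"
echo "timeout: pid=$timeout_pid pgid=$timeout_pgid"

wait "$timeout_pid"
```

If timeout behaves as the quoted source suggests, its PGID should equal its own PID and differ from the script's PGID, which is why a Ctrl+C aimed at the script's group never reaches it.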

You can set a signal handler in a shell script with the trap builtin. Unfortunately, it's not that simple. Consider this:

#!/bin/sh
trap 'echo Interrupted at $(date)' INT
date
timeout 5 sleep 10
date

If you press Ctrl+C after 2 seconds, this still waits the full 5 seconds, then prints the “Interrupted” message. That's because the shell refrains from running the trap code while a foreground job is active.

To remedy this, run the job in the background. In the signal handler, call kill to relay the signal to the timeout process group.

#!/bin/sh
trap 'kill -INT -$pid' INT
timeout 5 sleep 10 &
pid=$!
wait $pid
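As a possible alternative to relaying the signal by hand, GNU timeout has a --foreground option that skips the setpgid call, so timeout stays in the script's process group and receives Ctrl+C directly. This is only a sketch and assumes GNU coreutils; the documented trade-off is that, in this mode, children of the command are not timed out.

```shell
#!/bin/sh
# Sketch: keep timeout in the script's process group (GNU coreutils only).
# Ctrl+C at the terminal then reaches timeout and its child directly.
status=0
timeout --foreground 1 sleep 10 || status=$?

# 124 is timeout's exit status when the time limit was reached.
echo "timeout exited with status $status"
```

Run non-interactively, this simply times out after one second; the interesting part is that pressing Ctrl+C during the wait should now interrupt it without any trap.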

Related Solutions

Process descendants

If you send a signal to a process, that process gets killed. I wonder how the rumor that killing a process also kills other processes got started; it seems particularly counter-intuitive.

There are, however, ways to kill more than one process. But you won't be sending a signal to one process. You can kill a whole process group by sending a signal to -1234 where 1234 is the PGID (process group ID), which is the PID of the process group leader. When you run a pipeline, the whole pipeline starts out as a process group (the applications may change this by calling setpgid or setpgrp).
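A minimal sketch of signalling a group by its negative PGID follows. A script has no job control, so the shell does not create a new group for background jobs; the example therefore borrows GNU timeout purely as a convenient way to get a group leader (an assumption of this sketch, not a requirement of the technique). The marker file is also just for illustration.

```shell
#!/bin/sh
# Sketch: signal a whole process group through its negative PGID.
# Assumes GNU timeout, used here only because it makes itself a group leader.
marker=$(mktemp)

# Group members: timeout (leader), the inner sh, and its sleep.
timeout 30 sh -c 'sleep 2; echo survived >"$1"' sh "$marker" &
pgid=$!                       # timeout's PID doubles as the PGID
sleep 1                       # let timeout finish calling setpgid(0, 0)

kill -TERM -"$pgid"           # negative argument: every process in the group
wait "$pgid" || true          # non-zero status is expected after a signal

sleep 2                       # the inner sh would have written the marker by now
if [ -s "$marker" ]; then
    group_killed=no
else
    group_killed=yes
fi
echo "group killed: $group_killed"
rm -f "$marker"
```

The marker file stays empty only if the inner shell was stopped before it could finish, which is what a group-wide TERM should achieve.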

When you start processes in the background (foo &), they are in their own process group. Process groups are used to manage access to the terminal; normally only the foreground process group has access to the terminal. The background jobs remain in the same session, but there's no facility to kill a whole session or even to enumerate the process groups or processes in a session, so that doesn't help much.

When you close a terminal, the kernel sends the signal SIGHUP to all processes that have it as their controlling terminal. These processes form a session, but not all sessions have a controlling terminal. For your project, one possibility is therefore to start all the processes in their own terminal, created by script, screen, etc. Kill the terminal emulator process to kill the contained processes (assuming they haven't seceded with setsid).

You can provide more isolation by running the processes as their own user, who doesn't do anything else. Then it's easy to kill all the processes: run kill (the system call or the utility) as that user and use -1 as the PID argument to kill, meaning “all of that user's processes”.

You can provide even more isolation, but with considerably more setup, by running the contained processes in an actual container.

Bash – Script dies when parent process is terminated

On a CentOS 7 test system, with .NET Core installed via

$ sudo rpm -Uvh https://packages.microsoft.com/config/rhel/7/packages-microsoft-prod.rpm
$ sudo yum install dotnet-sdk-2.1

which results in dotnet-sdk-2.1-2.1.400-1.x86_64 being installed, and then with the test code

using System;
using System.Diagnostics;
using System.ComponentModel;
namespace myApp {
    class Program {
        static void Main(string[] args) {
            var process = new Process();
            process.EnableRaisingEvents = true; // to avoid [defunct] sh processes
            process.StartInfo.FileName = "/var/tmp/foo";
            process.StartInfo.Arguments = "";
            process.StartInfo.UseShellExecute = true;
            process.StartInfo.CreateNoWindow = true;
            process.Start();
            process.WaitForExit(10000);
            if (process.HasExited) {
                Console.WriteLine("Exit code: " + process.ExitCode);
            } else {
                Console.WriteLine("Child process still running after 10 seconds");
            }
        }
    }
}

and a shell script at /var/tmp/foo, an strace run stalls and shows that /var/tmp/foo is run through xdg-open, which on my system does... I'm not sure what; it seems a needless complication.

$ strace -o foo -f dotnet run
Child process still running after 10 seconds
^C
$ grep /var/tmp/foo foo
25907 execve("/usr/bin/xdg-open", ["/usr/bin/xdg-open", "/var/tmp/foo"], [/* 37 vars */] <unfinished ...>
...

A simpler solution is to exec a program directly (it can in turn be a shell script that does what you want), which for .NET requires not using the shell:

            process.StartInfo.UseShellExecute = false;

with this set the strace shows that /var/tmp/foo is being run via a (much simpler) execve(2) call:

26268 stat("/var/tmp/foo", {st_mode=S_IFREG|0755, st_size=37, ...}) = 0
26268 access("/var/tmp/foo", X_OK)      = 0
26275 execve("/var/tmp/foo", ["/var/tmp/foo"], [/* 37 vars */] <unfinished ...>

and that .NET refuses to exit:

$ strace -o foo -f dotnet run
Child process still running after 10 seconds
^C^C^C^C^C^C^C^C

because foo replaces itself with a program that catches most signals and carries on (notably not USR2; there is also always KILL, but avoid using that!):

$ cat /var/tmp/foo
#!/bin/sh
exec /var/tmp/stayin-alive
$ cat /var/tmp/stayin-alive
#!/usr/bin/perl
use Sys::Syslog;
for my $s (qw(HUP INT QUIT PIPE ALRM TERM CHLD USR1)) {
   $SIG{$s} = \&shandle;
}
openlog( 'stayin-alive', 'ndelay,pid', LOG_USER );
while (1) {
    syslog LOG_NOTICE, "oh oh oh oh oh stayin alive";
    sleep 7;
}
sub shandle {
    syslog LOG_NOTICE, "nice try - @_";
}
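For anyone who wants to reproduce the effect without Perl or syslog, here is a rough shell stand-in. It is not the program above: it ignores the signals outright rather than logging them from a handler, and the variable names are made up for the sketch. It does show the same escape hatch, namely that a signal left at its default action (USR2 here) still terminates the process.

```shell
#!/bin/sh
# Sketch: a shell stand-in for the Perl script above, not the original program.
# It ignores TERM (and a few others) but leaves USR2 at its default action.
sh -c 'trap "" HUP INT QUIT TERM USR1; while :; do sleep 1; done' &
pid=$!
sleep 1                         # let the child install its traps

kill -TERM "$pid"
sleep 1
if kill -0 "$pid" 2>/dev/null; then survived_term=yes; else survived_term=no; fi

kill -USR2 "$pid"               # not ignored, so the default action terminates it
wait "$pid" 2>/dev/null || true
if kill -0 "$pid" 2>/dev/null; then survived_usr2=yes; else survived_usr2=no; fi

echo "survived TERM: $survived_term, survived USR2: $survived_usr2"
```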

daemonize

With a process that disassociates itself from the parent and a shell script that runs a few commands (hopefully equivalent to your intended apt-get update; apt-get upgrade)

$ cat /var/tmp/a-few-things
#!/bin/sh
sleep 17 ; echo a >/var/tmp/output ; echo b >>/var/tmp/output

we can modify the .NET program to run /var/tmp/solitary /var/tmp/a-few-things

            process.StartInfo.FileName = "/var/tmp/solitary";
            process.StartInfo.Arguments = "/var/tmp/a-few-things";
            process.StartInfo.UseShellExecute = false;

which when run causes the .NET program to exit fairly quickly

$ dotnet run
Exit code: 0

and, eventually, the /var/tmp/output file does contain two lines written by a process that was not killed when the .NET program went away.
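The source of /var/tmp/solitary is not reproduced here. As a hypothetical stand-in, and assuming setsid(1) from util-linux is installed, a small shell wrapper along these lines might give a similar effect. The detach function name is invented for this sketch, and the temporary file stands in for /var/tmp/output.

```shell
#!/bin/sh
# Hypothetical stand-in for /var/tmp/solitary, whose source is not shown here.
# Assumes setsid(1) from util-linux is installed.
detach() {
    # New session, no inherited stdio: the command no longer shares the
    # caller's session, process group, terminal or pipes.
    setsid "$@" </dev/null >/dev/null 2>&1 &
}

out=$(mktemp)
detach sh -c 'sleep 1; echo a >"$1"; echo b >>"$1"' sh "$out"
echo "caller continues immediately"

sleep 2                         # only so the demo can show the result
cat "$out"
```

Because the detached command sits in its own session, signals sent to the caller's process group or a hangup on the caller's terminal should not reach it.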

You should probably save the output from the APT commands somewhere, and you may also need some form of locking so that two (or more!) updates do not run at the same time. This version does not stop for questions and ignores any TERM signals (INT may also need to be ignored).

#!/bin/sh
trap '' TERM
set -e
apt-get --yes update
apt-get --yes upgrade
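One way the "not two updates at once" concern could be handled is a lock directory, since mkdir either creates the directory or fails, atomically. This is only a sketch: the run_exclusive helper, the lock path and the exit status 75 are all choices made for the example, and a harmless echo stands in for the apt-get calls.

```shell
#!/bin/sh
# Sketch: serialize runs with a lock directory; mkdir is atomic, so only one
# caller can create it. The lock path and helper name are hypothetical.
run_exclusive() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another run holds $lockdir" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    rmdir "$lockdir"
    return "$status"
}

# Usage with a harmless command standing in for the apt-get calls:
lock="${TMPDIR:-/tmp}/upgrade-demo.lock.$$"
run_exclusive "$lock" echo "upgrade would run here"
```

A real script would use a fixed lock path rather than one containing $$, and would need some policy for stale locks left behind if the script is killed before rmdir runs.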
