Linux – When Does the System Send a SIGTERM to a Process?

busybox, linux, process, signals

My server program received a SIGTERM and stopped (with exit code 0). I am surprised by this, as I am pretty sure that there was plenty of memory for it. Under what conditions does Linux (busybox) send a SIGTERM to a process?

Best Answer

I'll post this as an answer so that there's some kind of resolution if this turns out to be the issue.

An exit status of 0 means a normal exit from a successful program. An exiting program can choose any integer between 0 and 255 as its exit status. Conventionally, programs use small values. Values 126 and above are used by the shell to report special conditions, so it's best to avoid them.

At the C API level, programs report a 16-bit status¹ that encodes both the program's exit status and the signal that killed it, if any.

In the shell, a command's exit status (saved in $?) conflates the actual exit status of the program and the signal value: if a program is killed by a signal, $? is set to a value greater than 128. With most shells, this value is 128 plus the signal number; AT&T ksh uses 256 + signal number and yash uses 384 + signal number, which avoids the ambiguity, but the other shells haven't followed suit.

In particular, if $? is 0, your program exited normally.

Note that this includes the case of a process that receives SIGTERM, but has a signal handler for it, and eventually exits normally (perhaps as an indirect consequence of the SIGTERM signal, perhaps not).
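A minimal sketch of that last case, assuming a POSIX sh whose $? follows the 128 + n convention (the helper names with_handler and without_handler are made up for illustration). The first child traps SIGTERM and exits 0; the second has no handler and is killed by the signal's default action:

```sh
#!/bin/sh
# Hypothetical helper names; any POSIX sh should do.

# A child that installs a SIGTERM handler, receives the signal, and exits 0.
with_handler() {
    sh -c 'trap "exit 0" TERM; kill -TERM "$$"; sleep 1; exit 1'
}

# The same child without a handler: the default action terminates it.
without_handler() {
    sh -c 'kill -TERM "$$"; sleep 1; exit 1'
}

status=0; with_handler || status=$?
echo "with handler:    \$? = $status"

status=0; without_handler || status=$?
echo "without handler: \$? = $status"
```

In bash or dash this should report 0 for the first child and 143 (128 + 15) for the second, which is why an exit code of 0 after a SIGTERM points at a handler inside the program.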

To answer the question in your title, SIGTERM is never sent automatically by the system. There are a few signals that are sent automatically, like SIGHUP when a terminal goes away, SIGSEGV/SIGBUS/SIGILL when a process does things it shouldn't be doing, SIGPIPE when it writes to a broken pipe/socket, etc. And there are a few signals that are sent due to a key press in a terminal, mainly SIGINT for Ctrl+C, SIGQUIT for Ctrl+\ and SIGTSTP for Ctrl+Z, but SIGTERM is not one of those. If a process receives SIGTERM, some other process sent that signal.

¹ roughly speaking

Related Solutions

Default exit code when process is terminated

Processes can call the _exit() system call (on Linux, see also exit_group()) with an integer argument to report an exit code to their parent. Though it's an integer, only the 8 least significant bits are available to the parent (the exception is when the parent uses waitid() or a SIGCHLD handler to retrieve that code, though not on Linux).

The parent will typically do a wait() or waitpid() to get the status of their child as an integer (though waitid() with somewhat different semantics can be used as well).

On Linux and most Unices, if the process terminated normally, bits 8 to 15 of that status number will contain the exit code as passed to exit(). If not, then the 7 least significant bits (0 to 6) will contain the signal number and bit 7 will be set if a core was dumped.

perl's $? for instance contains that number as set by waitpid():

$ perl -e 'system q(kill $$); printf "%04x\n", $?'
000f # killed by signal 15
$ perl -e 'system q(kill -ILL $$); printf "%04x\n", $?'
0084 # killed by signal 4 and core dumped
$ perl -e 'system q(exit $((0xabc))); printf "%04x\n", $?'
bc00 # terminated normally, 0xbc the lowest 8 bits of the status
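
The bit layout described above can be checked by hand with shell arithmetic. This is only a sketch: decode_wait_status is a hypothetical helper, it handles just the two cases discussed here (normal exit and death by signal), and it ignores stopped or continued statuses.

```sh
#!/bin/sh
# decode_wait_status is a hypothetical helper. It takes the raw 16-bit
# wait status as a decimal or 0x-prefixed number.
decode_wait_status() {
    raw=$(($1))
    sig=$((raw & 0x7f))          # bits 0-6: signal number, 0 on normal exit
    core=$(((raw >> 7) & 1))     # bit 7: set if a core was dumped
    code=$(((raw >> 8) & 0xff))  # bits 8-15: exit code
    if [ "$sig" -eq 0 ]; then
        echo "exited with code $code"
    elif [ "$core" -eq 1 ]; then
        echo "killed by signal $sig (core dumped)"
    else
        echo "killed by signal $sig"
    fi
}

# The three values printed by the perl examples above:
decode_wait_status 0x000f   # killed by signal 15
decode_wait_status 0x0084   # killed by signal 4 (core dumped)
decode_wait_status 0xbc00   # exited with code 188
```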

Bourne-like shells also make the exit status of the last run command available in their own $? variable. However, it does not directly contain the number returned by waitpid(), but a transformation of it, and that transformation differs between shells.

What's common between all shells is that $? contains the lowest 8 bits of the exit code (the number passed to exit()) if the process terminated normally.

Where it differs is when the process is terminated by a signal. In all cases, and that's required by POSIX, the number will be greater than 128. POSIX doesn't specify what the value may be. In practice though, in all Bourne-like shells that I know, the lowest 7 bits of $? will contain the signal number. But, where n is the signal number,

In ash, zsh, pdksh, bash and the Bourne shell, $? is 128 + n. That means that in those shells, if you get a $? of 129, you don't know whether the process exited with exit(129) or was killed by signal 1 (HUP on most systems). The rationale is that shells, when they exit themselves, by default return the exit status of the last exited command. Making sure $? is never greater than 255 keeps that exit status consistent:
```
$ bash -c 'sh -c "kill \$\$"; printf "%x\n" "$?"'
bash: line 1: 16720 Terminated              sh -c "kill \$\$"
8f # 128 + 15
$ bash -c 'sh -c "kill \$\$"; exit'; printf '%x\n' "$?"
bash: line 1: 16726 Terminated              sh -c "kill \$\$"
8f # here that 0x8f is from an exit(143) done by bash. Though it's
   # not from a killed process, that does tell us that probably
   # something was killed by a SIGTERM
```
In ksh93, $? is 256 + n. That means that from the value of $? you can differentiate between a killed and a non-killed process. Newer versions of ksh, upon exit, if $? was greater than 255, kill themselves with the same signal in order to report the same exit status to their parent. While that sounds like a good idea, it means that ksh will generate an extra core dump (potentially overwriting the other one) if the process was killed by a core-generating signal:
```
$ ksh -c 'sh -c "kill \$\$"; printf "%x\n" "$?"'
ksh: 16828: Terminated
10f # 256 + 15
$ ksh -c 'sh -c "kill -ILL \$\$"; exit'; printf '%x\n' "$?"
ksh: 16816: Illegal instruction(coredump)
Illegal instruction(coredump)
104 # 256 + 4, ksh did indeed kill itself so as to report the same
    # exit status as sh. Older versions of `ksh93` would have returned
    # 4 instead.
```
Where you could even say there's a bug is that ksh93 kills itself even if $? comes from a return 257 done by a function:
```
$ ksh -c 'f() { return "$1"; }; f 257; exit'
zsh: hangup     ksh -c 'f() { return "$1"; }; f 257; exit'
# ksh kills itself with a SIGHUP so as to report a 257 exit status
# to its parent
```
yash offers a compromise: $? is 256 + 128 + n. That means we can also differentiate between a killed process and one that terminated properly. And upon exiting, yash reports 128 + n without having to kill itself, avoiding the side effects that can have.
```
$ yash -c 'sh -c "kill \$\$"; printf "%x\n" "$?"'
18f # 256 + 128 + 15
$ yash -c 'sh -c "kill \$\$"; exit'; printf '%x\n' "$?"
8f  # that's from an exit(143), yash was not killed
```
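
Since all three conventions keep the signal number in the lowest 7 bits, one small helper can recover it regardless of which shell produced the value. This is a sketch under that assumption; signal_from_status is a hypothetical name, and in the 128 + n shells it still cannot tell a real exit(143) from a SIGTERM.

```sh
#!/bin/sh
# signal_from_status is a hypothetical helper: it prints the signal number
# encoded in a shell's $?, or 0 when the value does not indicate a signal.
signal_from_status() {
    if [ "$1" -gt 128 ]; then
        echo $(($1 & 127))   # lowest 7 bits hold the signal number
    else
        echo 0
    fi
}

signal_from_status 143   # 128 + 15       (ash, bash, zsh, ...)
signal_from_status 271   # 256 + 15       (ksh93)
signal_from_status 399   # 256 + 128 + 15 (yash)
signal_from_status 3     # plain exit status, no signal
```

The first three calls each print 15 and the last prints 0.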

To get the signal from the value of $?, the portable way is to use kill -l:

$ /bin/kill 0
Terminated
$ kill -l "$?"
TERM

(for portability, you should never use signal numbers, only signal names)
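
As a sketch of how that could be wrapped up, the hypothetical report_status helper below runs a command and uses kill -l to name the signal when $? is above 128. It assumes a shell whose kill builtin accepts an exit status as the argument to -l, which POSIX describes.

```sh
#!/bin/sh
# report_status is a hypothetical helper: run a command, then describe
# how it ended based on the shell's $?.
report_status() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ] && name=$(kill -l "$status" 2>/dev/null); then
        # In 128 + n shells this is ambiguous, so say so.
        echo "killed by SIG$name (or exited with status $status)"
    else
        echo "exited with status $status"
    fi
}

report_status sh -c 'exit 3'
report_status sh -c 'kill -TERM "$$"'
```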

On the non-Bourne fronts:

csh/tcsh and fish: same as the Bourne shell, except that the status is in $status instead of $? (note that zsh also sets $status for compatibility with csh, in addition to $?).
rc: the exit status is in $status as well, but when killed by a signal, that variable contains the name of the signal (like sigterm or sigill+core if a core was generated) instead of a number, which is yet another proof of the good design of that shell.
es: the exit status is not a variable. If you care about it, you run the command as:
```
status = <={cmd}
```
which will return a number or sigterm or sigsegv+core like in rc.

Maybe for completeness, we should mention zsh's $pipestatus and bash's $PIPESTATUS arrays that contain the exit status of the components of the last pipeline.

And also for completeness, when it comes to shell functions and sourced files, by default functions return with the exit status of the last command run, but can also set a return status explicitly with the return builtin. And we see some differences here:

bash and mksh (since R41, a regression^Wchange apparently introduced intentionally) will truncate the number (positive or negative) to 8 bits. So for instance return 1234 will set $? to 210, return -- -1 will set $? to 255.
zsh and pdksh (and derivatives other than mksh) allow any signed 32-bit decimal integer (-2³¹ to 2³¹-1) (and truncate the number to 32 bits).
ash and yash allow any positive integer from 0 to 2³¹-1 and return an error for any number out of that.
ksh93 sets $? as is for return 0 to return 320, but truncates anything else to 8 bits. Beware, as already mentioned, that returning a number between 256 and 320 could cause ksh to kill itself upon exit.
rc and es allow returning anything even lists.
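
The 8-bit truncation that bash and mksh apply can be reproduced with plain shell arithmetic. This is only an illustration of the arithmetic, not of any shell's internals, and truncate_to_8_bits is a hypothetical name:

```sh
#!/bin/sh
# truncate_to_8_bits is a hypothetical helper: keep only the low 8 bits,
# which is what "return N" ends up storing in $? in bash and mksh.
truncate_to_8_bits() {
    echo $(($1 & 255))
}

truncate_to_8_bits 1234   # 210, since 1234 = 4 * 256 + 210
truncate_to_8_bits -1     # 255
```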

Also note that some shells also use special values of $?/$status to report some error conditions that are not the exit status of a process, like 127 or 126 for command not found or not executable (or syntax error in a sourced file)...
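
A quick way to see those two special values, sketched for a POSIX sh. The command name below is a deliberately nonexistent placeholder, the function names are made up, and the temporary file is created without execute permission:

```sh
#!/bin/sh
# Both function names are hypothetical.

# 127: the command does not exist, so no process is ever started.
status_not_found() {
    status=0
    sh -c 'no_such_command_hypothetical_xyz' 2>/dev/null || status=$?
    echo "$status"
}

# 126: the file exists but cannot be executed.
status_not_executable() {
    tmp=$(mktemp)
    printf '#!/bin/sh\nexit 0\n' > "$tmp"   # mktemp gives it no x bit
    status=0
    sh -c '"$1"' sh "$tmp" 2>/dev/null || status=$?
    rm -f "$tmp"
    echo "$status"
}

echo "not found:      $(status_not_found)"
echo "not executable: $(status_not_executable)"
```

Neither value comes from a program's own exit(); the shell sets them because the command could not be run.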

Linux – What happens when sending SIGKILL to a Zombie Process in Linux

To answer that question, you have to understand how signals are sent to a process and how a process exists in the kernel.

Each process is represented as a task_struct inside the kernel (the definition is in the sched.h header file). That struct holds information about the process, for instance the pid. The important information is in line 1566, where the associated signal is stored. This is set only if a signal is sent to the process.

A dead process or a zombie process still has a task_struct. The struct remains, until the parent process (natural or by adoption) has called wait() after receiving SIGCHLD to reap its child process. When a signal is sent, the signal_struct is set. It doesn't matter if the signal is a catchable one or not, in this case.

Signals are evaluated every time the process runs or, to be exact, just before the process would run. The process is then in the TASK_RUNNING state. The kernel runs the schedule() routine, which determines the next running process according to its scheduling algorithm. Assuming this process is the next running process, the value of the signal_struct is evaluated to see whether there is a waiting signal to be handled. If a signal handler is manually defined (via signal() or sigaction()), the registered function is executed; if not, the signal's default action is executed. The default action depends on the signal being sent.

For instance, the SIGSTOP signal's default handler will change the current process's state to TASK_STOPPED and then run schedule() to select a new process to run. Notice, SIGSTOP is not catchable (like SIGKILL), therefore there is no possibility to register a manual signal handler. In case of an uncatchable signal, the default action will always be executed.
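
The state change caused by SIGSTOP can be observed from user space. The sketch below assumes Linux with /proc mounted; proc_state and stop_demo are hypothetical helper names, and the one-second pauses are just a crude way to let the kernel act on each signal.

```sh
#!/bin/sh
# proc_state is a hypothetical helper: print the one-letter process state
# from /proc/PID/status (Linux only).
proc_state() {
    sed -n 's/^State:[[:space:]]*\([A-Za-z]\).*/\1/p' "/proc/$1/status"
}

stop_demo() {
    sleep 30 &
    pid=$!
    kill -STOP "$pid"
    sleep 1
    echo "after SIGSTOP: $(proc_state "$pid")"   # T means stopped
    kill -CONT "$pid"
    sleep 1
    echo "after SIGCONT: $(proc_state "$pid")"
    kill -KILL "$pid"                            # clean up the background job
    wait "$pid" 2>/dev/null || true
}

stop_demo
```

After SIGSTOP the state should read T; after SIGCONT the sleep is typically back in S (interruptible sleep).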

To your question:

A defunct or dead process will never be determined by the scheduler to be in the TASK_RUNNING state again. Thus the kernel will never run the signal handler (default or defined) for the corresponding signal, whichever signal it was. Therefore the exit_signal will never be set again. The signal is "delivered" to the process by setting the signal_struct in the task_struct of the process, but nothing else will happen, because the process will never run again. There is no code to run; all that remains of the process is that process struct.

However, if the parent process reaps its children with wait(), the exit code it receives is the one from when the process "initially" died. It doesn't matter whether a signal is waiting to be handled.
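
This can be tried out with a short experiment, sketched here for Linux with /proc mounted. The helper names proc_state and zombie_demo are hypothetical, and the sleeps are a crude way to order events: the child exits after about one second, by which time its parent has replaced itself with sleep, which never calls wait().

```sh
#!/bin/sh
# proc_state is a hypothetical helper: print the one-letter process state
# from /proc/PID/status (Linux only).
proc_state() {
    sed -n 's/^State:[[:space:]]*\([A-Za-z]\).*/\1/p' "/proc/$1/status"
}

zombie_demo() {
    pidfile=$(mktemp)
    # The parent starts a short-lived child, records its PID, then execs
    # sleep. Nobody reaps the child, so it stays a zombie until sleep ends.
    sh -c 'sh -c "sleep 1; exit 7" & echo "$!" > "$1"; exec sleep 4' \
        sh "$pidfile" &
    parent=$!
    sleep 2
    zombie=$(cat "$pidfile")
    echo "before SIGKILL: $(proc_state "$zombie")"
    status=0
    kill -KILL "$zombie" || status=$?
    echo "kill returned: $status"
    echo "after SIGKILL: $(proc_state "$zombie")"
    wait "$parent" || true
    rm -f "$pidfile"
}

zombie_demo
```

The kill call succeeds because the process entry still exists, but the state should stay Z both before and after: there is nothing left to run the signal's action.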
