SIGKILLing after a grace period

init process-management signals systemd

I've seen a lot of process managers that try to do this. It was my understanding that you should only use SIGTERM to kill a process. The process could take an unknown amount of time to clean up after itself; on a slow system, it could take minutes. I always thought the only solution was to be patient and wait for the program to clean up and exit gracefully. If the process does not handle SIGTERM, then this is a bug and should be reported to the software's maintainer.

I've seen popular tools like Docker do this as well:

Usage: docker stop [OPTIONS] CONTAINER [CONTAINER…]

Stop a running container (Send SIGTERM, and then SIGKILL after grace period)

Is this bad practice? One thing I'm also curious about is how the shutdown procedure works on a System V-style init system. I've had a look at some manpages and some other questions but can't find a definite answer. I'm guessing each init service (terminology?) sees the change in runlevel and executes the proper "stop" function as defined in its init script. Does init do this one by one, making sure each service ends properly? What happens to processes not handled by init scripts? I've seen some questions vaguely mention a grace period being used before SIGKILLing them, but I was hoping someone could elaborate or at least point me in the right direction. 🙂

If someone could also point me to how this works with systemd, I'd be happy to research it further. I've had a look at the manpages but can't find anything definitive there either.

Best Answer

You can either have your cake or eat it. The program wants to have as long as it needs (or wants) to react to SIGTERM. The system (the process manager) wants to be able to terminate the program within a bounded time. If the system waits forever, this allows a program to hijack system shutdown by never responding (either because the program is malicious or because it is buggy).

In a normal shutdown sequence, each daemon is terminated via an init script (call it a shutdown script if you want) provided by the author or the packager of the daemon. Depending on the daemon, the init script may just send it a signal, or it may do a more controlled shutdown (e.g. by writing to a socket). The init script may wait for the daemon to report that it has satisfactorily shut down, or it may forcibly kill it. Init scripts run as root (they may call su to perform part of their task as a less privileged user); they are supposed to be cooperative, so they're allowed to leave the system hanging forever.

Once init scripts have finished their job, all services are supposed to be shut down. Any remaining process is supposed to be either unimportant or misbehaving. So at this stage the remaining processes get told to shut down (SIGTERM), and after a grace period (giving them a last chance to shut down cleanly) the system must go down so any remaining processes are killed forcibly (SIGKILL).
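As a rough illustration of that last step, here is a minimal Python sketch of the "SIGTERM, wait, then SIGKILL" pattern for a single child process. It is not how any particular init system implements it; the names `stop_with_grace` and `spawn_child` are made up for this example, and the demo child is a throwaway Python one-liner that can optionally ignore SIGTERM. It assumes a POSIX system.

```python
import signal
import subprocess
import sys


def stop_with_grace(proc: subprocess.Popen, grace_seconds: float) -> int:
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the child's exit status (negative signal number if it was
    killed by a signal, as reported by subprocess on POSIX).
    """
    proc.terminate()  # polite request: SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; the process cannot catch or ignore it
        return proc.wait()


def spawn_child(ignore_sigterm: bool) -> subprocess.Popen:
    """Start a hypothetical demo child that optionally ignores SIGTERM."""
    setup = "signal.signal(signal.SIGTERM, signal.SIG_IGN);" if ignore_sigterm else ""
    code = f"import signal, time; {setup} print('ready', flush=True); time.sleep(60)"
    proc = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    proc.stdout.readline()  # block until the child has set up its signal handling
    return proc
```

A well-behaved child exits on the SIGTERM and the function returns promptly; a child that ignores SIGTERM is only killed once the grace period has run out. The grace period is the trade-off knob: longer means more time for cleanup, shorter means a more predictable shutdown.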

Think of the init scripts as the normal arrest procedure — show warrants, shout a warning, etc. A law-abiding daemon should shut down at this point, though daemons have lawyers (the init scripts) that can delay the procedure as long as they want. Once the normal process is finished, remaining daemons are presumed hostile. The system fires a warning shot (SIGTERM), then after a delay shoots to SIGKILL.
