Bash – Why is a zombie waiting for its child

bash process signals zombie-process

I'm digging through different sources, but can't find a good description of the anatomy of child reaping. This is a simple case of what I would like to understand.

$ cat <( sleep 100 & wait ) &
[1] 14247
$ ps ax -O pgid | grep $$
12126 12126 S pts/17   00:00:00 bash
14248 12126 S pts/17   00:00:00 bash
14249 12126 S pts/17   00:00:00 sleep 100
14251 14250 S pts/17   00:00:00 grep --color=auto 12126
$ kill -2 14248

$ ps ax -O pgid | grep $$
12126 12126 S pts/17   00:00:00 bash
14248 12126 Z pts/17   00:00:00 [bash] <defunct>
14249 12126 S pts/17   00:00:00 sleep 100
14255 14254 S pts/17   00:00:00 grep --color=auto 12126

Why is the zombie waiting for the kid?

Can you explain this one? Do I need to know C and read Bash source code to get a wider understanding of this or is there any documentation? I've already consulted:

- various links on this site and Stack Overflow
- The Linux Command Line by W. Shotts
- man bash
- Bash Reference Manual (in Bash source code docs)
- Bash Guide for Beginners @ tldp.org
- Advanced Bash-Scripting Guide

GNU bash, version 4.3.42(1)-release (x86_64-pc-linux-gnu)

Linux 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Best Answer

The zombie isn't waiting for its child. Like any zombie process, it stays around until its parent collects it.

You should display all the processes involved to understand what's going on, and look at the PPID as well. Use this command line:

ps -t $(tty) -O ppid,pgid

The parent of the process you're killing is cat. What happens is that bash runs the background command cat <( sleep 100 & wait ) in a subshell. Since the only thing this subshell does is to set up some redirection and then run an external command, this subshell is replaced by the external command. Here's the rundown:

1. The original bash (12126) calls fork to execute the background command cat <( sleep 100 & wait ) in a child (14247).
   - The child (14247) calls pipe to create a pipe, then fork to create a child to run the process substitution sleep 100 & wait.
     - The grandchild (14248) calls fork to run sleep 100 in the background. Since the grandchild isn't interactive, the background process doesn't run in a separate process group. Then the grandchild waits for sleep to exit.
   - The child (14247) calls setpgid (it's a background job in an interactive shell so it gets its own process group), then execve to run cat. (I'm a bit surprised that the process substitution isn't happening in the background process group.)
2. You kill the grandchild (14248). Its parent is running cat, which knows nothing about any child process and has no business calling wait. Since the grandchild's parent doesn't reap it, the grandchild stays behind as a zombie.
3. Eventually, cat exits, either because you kill it, or because sleep exits and the last write end of the pipe is closed, so cat sees the end of its input. At that point, the zombie's parent dies, so the zombie is adopted by init, and init reaps it.
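The rundown above can be imitated outside bash. What follows is only a minimal Python sketch of the same pattern, not what bash does internally: a child forks a grandchild and then replaces itself with a program that never calls wait, standing in for cat. It assumes Linux (it reads /proc/<pid>/stat) and that SIGCHLD has its default disposition. The names proc_state and zombie_under_exec are made up for this illustration.

```python
import os
import sys
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The line looks like "pid (comm) state ..."; comm may itself contain ")".
    return data.rsplit(")", 1)[1].split()[0]


def zombie_under_exec(lifetime=1.0):
    """Return the state of a dead grandchild whose parent exec'ed a non-waiting program."""
    r, w = os.pipe()
    child = os.fork()
    if child == 0:
        # Child: plays the subshell that later becomes cat (14247 in the transcript).
        try:
            os.close(r)
            grandchild = os.fork()
            if grandchild == 0:
                os._exit(0)  # dies at once, like the killed bash (14248)
            os.write(w, str(grandchild).encode())
            os.close(w)
            # Replace this process with a program that never calls wait().
            code = f"import time; time.sleep({lifetime})"
            os.execv(sys.executable, [sys.executable, "-c", code])
        finally:
            os._exit(127)  # only reached if something above failed
    os.close(w)
    grandchild = int(os.read(r, 32).decode())
    os.close(r)
    state = None
    deadline = time.monotonic() + lifetime
    while time.monotonic() < deadline:
        state = proc_state(grandchild)
        if state == "Z":
            break
        time.sleep(0.01)
    os.waitpid(child, 0)  # this process is the child's parent, so it reaps the child
    return state
```

While the exec'ed program is alive, the grandchild should show up in state Z. Once that program exits, the grandchild is reparented (to init or a subreaper) and is normally reaped there, which corresponds to the last step of the rundown.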

If you change the command to

{ cat <( sleep 100 & wait ); echo done; } &

then cat runs in a separate process, not in the child of the original bash process: the first child has to stay behind to run echo done. In this case, if you kill the grandchild, it doesn't stay on as a zombie, because the child (which is still running bash at that point) reaps it.

How it happens

When a process ends on Linux/Unix, all of its information is removed from system memory; only the process descriptor stays. The process enters state Z (zombie). Its parent process gets a signal from the kernel, SIGCHLD, which means that one of its child processes has exited, been stopped, or resumed after being stopped (in our case it simply exited).

The parent process now needs to execute the wait() syscall to read the exit status and other information from its child process. Then the descriptor is removed from memory and the process is no longer a zombie.

If the parent process never calls the wait() syscall, the zombie process descriptor stays in memory and eats brains. Normally you don't see zombie processes, because the procedure above takes very little time.
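To watch that life cycle step by step, here is a small Python sketch, assuming Linux with /proc mounted. It uses waitid with WNOWAIT to wait until the child has exited without reaping it, looks at the state, and only then does the real wait. proc_state and zombie_lifecycle are names invented for this example.

```python
import os


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The line looks like "pid (comm) state ..."; comm may itself contain ")".
    return data.rsplit(")", 1)[1].split()[0]


def zombie_lifecycle(exit_code=3):
    """Fork a child, observe it as a zombie, reap it; return (before, code, after)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: terminate immediately
    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    before = proc_state(pid)        # the descriptor is still there
    _, status = os.waitpid(pid, 0)  # the reaping wait
    after = proc_state(pid)         # the descriptor has been released
    return before, os.WEXITSTATUS(status), after
```

The state seen before the reaping wait should be Z; after it, the /proc entry is gone and proc_state returns None.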

The dawn of the dead

Each process descriptor needs only a very small amount of memory, so a few zombies are not very dangerous (like in real life). One problem is that each zombie process keeps its process ID, and a Linux/Unix operating system has a limited number of PIDs. If badly written software generates a lot of zombie processes, it can happen that no new processes can be started because no more process IDs are available.

So in huge groups they are very dangerous (as many movies demonstrate very well).
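To get a feeling for how close a system is to that situation, one can compare the number of zombies with the PID limit. This is a rough Python sketch that assumes Linux, where the limit is exposed in /proc/sys/kernel/pid_max. The function names are illustrative only.

```python
import os


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The line looks like "pid (comm) state ..."; comm may itself contain ")".
    return data.rsplit(")", 1)[1].split()[0]


def pid_max():
    """Value at which PIDs wrap around (one greater than the largest PID)."""
    with open("/proc/sys/kernel/pid_max") as f:
        return int(f.read())


def count_zombies():
    """Count the processes visible in /proc that are currently in state Z."""
    count = 0
    for name in os.listdir("/proc"):
        if name.isdigit() and proc_state(int(name)) == "Z":
            count += 1
    return count
```

A handful of zombies against a limit in the tens of thousands or more is harmless; a count that keeps growing points at a parent that never calls wait.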

How can we defend ourselves against a horde of zombies?

A shot in the head would work, but I don't know the command for that (SIGKILL won't work because the process is already dead).

Well, you can send SIGCHLD to the parent process via kill, but what if it ignores this signal? Then your only option is to kill the parent process so that the init process "adopts" the zombie. Init periodically calls the wait() syscall to clean up its zombie children.
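Either way, the process to deal with is the zombie's parent, so the first step is finding it. The sketch below, assuming Linux and /proc, lists every zombie together with its parent PID (the second field after the command name in /proc/<pid>/stat). It only reports; it does not send any signal. zombies_with_parents is a hypothetical helper name.

```python
import os


def zombies_with_parents():
    """Return [(zombie_pid, parent_pid), ...] found by scanning /proc."""
    found = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/stat") as f:
                # After "pid (comm)" come the state, then the parent PID.
                fields = f.read().rsplit(")", 1)[1].split()
        except (FileNotFoundError, ProcessLookupError):
            continue  # the process vanished while we were scanning
        state, ppid = fields[0], int(fields[1])
        if state == "Z":
            found.append((int(name), ppid))
    return found
```

The parent PIDs in the result are the candidates for a SIGCHLD or, as a last resort, for being terminated.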

In your case

In your case, you have to send SIGCHLD to the crond process:

root@host:~# strace -p $(pgrep cron)
Process 1180 attached - interrupt to quit

Then from another terminal:

root@host:~$ kill -17 $(pgrep cron)

The output is:

restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7fff51be39dc, WNOHANG, NULL) = -1 ECHILD (No child processes) <-- Here it happens
rt_sigreturn(0xffffffffffffffff)        = -1 EINTR (Interrupted system call)
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1892, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {0x403170, [CHLD], SA_RESTORER|SA_RESTART, 0x7fd6a7e9d4a0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({42, 0}, ^C <unfinished ...>
Process 1180 detached

You can see that the wait4() syscall returns -1 ECHILD, which means there is no child process to wait for. So the conclusion is: cron reacts to the SIGCHLD signal and should not cause the apocalypse.
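The pattern visible in the trace, a SIGCHLD handler that calls wait with WNOHANG until there is nothing left to collect, can be sketched in Python as follows. This is not cron's code, just an illustration of the same idea; reap_children and reap_demo are invented names, and the handler reaps any child of the calling process.

```python
import os
import signal
import time

reaped = []  # (pid, raw wait status) pairs collected by the handler


def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has already exited."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # ECHILD: no children at all, as in the strace output
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))


def reap_demo(n=3):
    """Fork n short-lived children and let the handler reap them."""
    reaped.clear()
    signal.signal(signal.SIGCHLD, reap_children)
    try:
        pids = []
        for i in range(n):
            pid = os.fork()
            if pid == 0:
                os._exit(i)  # child: exit with a recognisable status
            pids.append(pid)
        deadline = time.monotonic() + 5
        while len(reaped) < n and time.monotonic() < deadline:
            time.sleep(0.01)  # the handler runs between these sleeps
    finally:
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    return pids, list(reaped)
```

The loop matters because several children may exit before the handler runs, and the kernel may then deliver only one SIGCHLD for all of them.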

Linux – What happens when sending SIGKILL to a Zombie Process in Linux

To answer that question, you have to understand how signals are sent to a process and how a process exists in the kernel.

Each process is represented as a task_struct inside the kernel (the definition is in the sched.h header file). That struct holds information about the process, for instance the pid. The important information is in line 1566, where the associated signal is stored. This is set only if a signal is sent to the process.

A dead process or a zombie process still has a task_struct. The struct remains, until the parent process (natural or by adoption) has called wait() after receiving SIGCHLD to reap its child process. When a signal is sent, the signal_struct is set. It doesn't matter if the signal is a catchable one or not, in this case.

Signals are evaluated every time the process runs, or, to be exact, just before the process would run. The process is then in the TASK_RUNNING state. The kernel runs the schedule() routine, which determines the next running process according to its scheduling algorithm. Assuming this process is the next one to run, the signal_struct is checked to see whether a signal is waiting to be handled. If a signal handler has been defined manually (via signal() or sigaction()), the registered function is executed; if not, the signal's default action is executed. The default action depends on the signal being sent.

For instance, the SIGSTOP signal's default handler will change the current process's state to TASK_STOPPED and then run schedule() to select a new process to run. Notice, SIGSTOP is not catchable (like SIGKILL), therefore there is no possibility to register a manual signal handler. In case of an uncatchable signal, the default action will always be executed.
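That SIGKILL and SIGSTOP cannot be given a handler is easy to check from user space. In this small Python sketch, the attempt to register a handler for either of them is rejected by the kernel, while an ordinary signal such as SIGUSR1 accepts one. can_install_handler is a name made up for the example.

```python
import signal


def can_install_handler(signum):
    """Try to register a do-nothing handler; report whether the kernel allowed it."""
    try:
        old = signal.signal(signum, lambda s, f: None)
    except (OSError, ValueError):
        return False  # e.g. EINVAL for SIGKILL and SIGSTOP
    # Put the previous disposition back so the check has no lasting effect.
    signal.signal(signum, old if old is not None else signal.SIG_DFL)
    return True
```

Calling it with signal.SIGUSR1 should succeed, whereas signal.SIGKILL and signal.SIGSTOP should both be refused.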

To your question:

A defunct or dead process will never be put back into the TASK_RUNNING state by the scheduler. Thus the kernel will never run the signal handler (default or defined) for the corresponding signal, whichever signal it was. Therefore the exit_signal will never be set again. The signal is "delivered" to the process by setting the signal_struct in the task_struct of the process, but nothing else will happen, because the process will never run again. There is no code to run; all that remains of the process is that process struct.

However, if the parent process reaps its children with wait(), the exit code it receives is the one from when the process "initially" died. It doesn't matter whether a signal is still waiting to be handled.
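This can be tried out directly. The following Python sketch, assuming Linux with /proc, turns a child into a zombie, sends it SIGKILL, and then reaps it to see which status comes back. kill_a_zombie and proc_state are illustrative names, not anything from the kernel or from bash.

```python
import os
import signal


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The line looks like "pid (comm) state ..."; comm may itself contain ")".
    return data.rsplit(")", 1)[1].split()[0]


def kill_a_zombie(exit_code=7):
    """Send SIGKILL to a zombie child; return (state after kill, exited?, exit code)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit normally with a known code
    # Wait until the child is dead but keep it unreaped, i.e. a zombie.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    os.kill(pid, signal.SIGKILL)  # the call succeeds, but there is nothing left to run
    state = proc_state(pid)
    _, status = os.waitpid(pid, 0)
    return state, os.WIFEXITED(status), os.WEXITSTATUS(status)
```

The zombie should still be in state Z after the kill, and the status collected by the parent should be the original exit code rather than "killed by signal 9".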
