Bash – Why is a zombie waiting for its child

bash process signals zombie-process

I'm digging through different sources, but can't find a good description of the anatomy of child reaping. This is a simple case of what I would like to understand.

$ cat <( sleep 100 & wait ) &
[1] 14247
$ ps ax -O pgid | grep $$
12126 12126 S pts/17   00:00:00 bash
14248 12126 S pts/17   00:00:00 bash
14249 12126 S pts/17   00:00:00 sleep 100
14251 14250 S pts/17   00:00:00 grep --color=auto 12126
$ kill -2 14248

$ ps ax -O pgid | grep $$
12126 12126 S pts/17   00:00:00 bash
14248 12126 Z pts/17   00:00:00 [bash] <defunct>
14249 12126 S pts/17   00:00:00 sleep 100
14255 14254 S pts/17   00:00:00 grep --color=auto 12126

Why is the zombie waiting for the kid?

Can you explain this one? Do I need to know C and read Bash source code to get a wider understanding of this or is there any documentation? I've already consulted:

GNU bash, version 4.3.42(1)-release (x86_64-pc-linux-gnu)

Linux 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Best Answer

The zombie isn't waiting for its child. Like any zombie process, it stays around until its parent collects it.

You should display all the processes involved to understand what's going on, and look at the PPID as well. Use this command line:

ps -t $(tty) -O ppid,pgid

The parent of the process you're killing is cat. What happens is that bash runs the background command cat <( sleep 100 & wait ) in a subshell. Since the only thing this subshell does is to set up some redirection and then run an external command, this subshell is replaced by the external command. Here's the rundown:

  • The original bash (12126) calls fork to execute the background command cat <( sleep 100 & wait ) in a child (14247).
    • The child (14247) calls pipe to create a pipe, then fork to create a child to run the process substitution sleep 100 & wait.
      • The grandchild (14248) calls fork to run sleep 100 in the background. Since the grandchild isn't interactive, the background process doesn't run in a separate process group. Then the grandchild waits for sleep to exit.
    • The child (14247) calls setpgid (it's a background job in an interactive shell so it gets its own process group), then execve to run cat. (I'm a bit surprised that the process substitution isn't happening in the background process group.)
  • You kill the grandchild (14248). Its parent is running cat, which knows nothing about any child process and has no business calling wait. Since the grandchild's parent doesn't reap it, the grandchild stays behind as a zombie.
  • Eventually, cat exits, either because you kill it or because sleep returns and closes the pipe so cat sees the end of its input. At that point, the zombie's parent dies, so the zombie is adopted by init, which reaps it.
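
The same situation can be reproduced without process substitution. The following is only a minimal sketch, not the exact command from the question: a shell forks a short-lived child and then replaces itself via exec with a program that never calls wait. The variable names parent and state and the sleep durations are made up for the illustration, and the --ppid option assumes the procps ps found on Linux.

```shell
# Stand-in for the scenario above: the shell forks "sleep 0.1" and then
# replaces itself (exec) with "sleep 5", a program that never calls wait.
bash -c 'sleep 0.1 & exec sleep 5' &
parent=$!    # PID of the process that ends up running "sleep 5"

sleep 1      # give the short-lived child time to exit

# Show the children of $parent with their state (procps-style options).
ps -o pid,ppid,stat,comm --ppid "$parent"

# Keep only the state column; it starts with Z for a zombie.
state=$(ps -o stat= --ppid "$parent" | tr -d ' ')

# Killing the parent orphans the zombie, so init can adopt and reap it.
kill "$parent"
```

The STAT column of the child should start with Z for as long as the parent is alive, which matches the defunct bash in the question.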

If you change the command to

{ cat <( sleep 100 & wait ); echo done; } &

then cat runs in a separate process, not in the child of the original bash process: the first child has to stay behind to run echo done. In this case, if you kill the grandchild, it doesn't stay on as a zombie, because the child (which is still running bash at that point) reaps it.
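
The difference can be sketched in the same simplified setting. Again this is an illustration under assumptions rather than the exact command above: the trailing ":" (a stand-in for echo done) keeps the parent running as a shell instead of letting it exec, and the names parent and zombies are hypothetical. The --ppid option again assumes procps ps.

```shell
# Variant: because another command follows "sleep 3", the shell cannot hand
# its process over to sleep; it stays a shell and waits on its children.
bash -c 'sleep 0.1 & sleep 3; :' &
parent=$!    # PID of the shell, which keeps running as bash

sleep 1      # give the short-lived child time to exit

# Show the remaining children of $parent with their state.
ps -o pid,ppid,stat,comm --ppid "$parent"

# Count children whose state starts with Z (grep -c prints 0 if none match).
zombies=$(ps -o stat= --ppid "$parent" | grep -c '^Z' || true)

# Stop the parent shell; its remaining "sleep 3" child exits on its own.
kill "$parent"
```

Here the short-lived child should already be gone from the listing, since the shell collects its exit status while waiting for the foreground sleep.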

See also "How does linux handles zombie process" and "Can a zombie have orphans? Will the orphan children be disturbed by reaping the zombie?"
