Linux Zombie Process – How Linux Handles Zombie Processes


Zombie processes are created in Unix/Linux systems.
We can remove them via the kill command.

But is there any in-built clean-up mechanism in Linux to handle zombie processes?

Best Answer

Zombie processes are already dead. You cannot kill them. The kill command or system call has no effect on a zombie process. (You can make a zombie go away with kill, but you have to shoot the parent, not the zombie, as we'll see in a minute.)

A zombie process is not really a process, it's only an entry in the process table. There are no other resources associated with the zombie process: it doesn't have any memory or any running code, it doesn't hold any files open, etc.

When a process dies, the last thing to go, after all other resources are cleaned up, is the entry in the process table. This entry is kept around, forming a zombie, to allow the parent process to track the exit status of the child. The parent reads the exit status by calling one of the wait family of syscalls; at this point, the zombie disappears. Calling wait is said to reap the child, extending the metaphor of a zombie being dead but in some way still not fully processed into the afterlife. The parent can also indicate that it doesn't care (by ignoring the SIGCHLD signal, or by calling sigaction with the SA_NOCLDWAIT flag), in which case the entry in the process table is deleted immediately when the child dies.

Thus a zombie only exists when a process has died and its parent hasn't called wait yet. This state can only last as long as the parent is still running. If the parent dies before the child or dies without reading the child's status, the zombie's parent process is set to the process with PID 1, which is init. One of the jobs of init is to call wait in a loop and thus reap any zombie process left behind by its parent.

Related Solutions

Way to identify which process turns into Zombie process

The audit subsystem of the Linux kernel can be very useful to figure out what processes are becoming zombie processes. I just had the following situation:

server ~ # ps -ef --forest
[...]
root     16385     1  0 17:04 ?        00:00:00 /usr/sbin/apache2 -k start
root     16388 16385  0 17:04 ?        00:00:00  \_ /usr/bin/perl -T -CSDAL /usr/lib/iserv/apache_user
root     16389 16385  0 17:04 ?        00:00:00  \_ /usr/bin/perl -T -CSDAL /usr/lib/iserv/apache_user
www-data 16415 16385  0 17:04 ?        00:00:00  \_ /usr/sbin/apache2 -k start
www-data 18254 16415  0 17:23 ?        00:00:00  |   \_ [sh] <defunct>
www-data 18347 16415  0 17:23 ?        00:00:00  |   \_ [sh] <defunct>
www-data 22966 16415  0 18:18 ?        00:00:00  |   \_ [sh] <defunct>
www-data 16583 16385  0 17:05 ?        00:00:01  \_ /usr/sbin/apache2 -k start
www-data 18306 16583  0 17:23 ?        00:00:00  |   \_ [sh] <defunct>
www-data 18344 16583  0 17:23 ?        00:00:00  |   \_ [sh] <defunct>
www-data 17561 16385  0 17:12 ?        00:00:00  \_ /usr/sbin/apache2 -k start
www-data 22983 17561  0 18:18 ?        00:00:00  |   \_ [sh] <defunct>
www-data 18318 16385  0 17:23 ?        00:00:00  \_ /usr/sbin/apache2 -k start
www-data 19725 16385  0 17:43 ?        00:00:01  \_ /usr/sbin/apache2 -k start
www-data 22638 16385  0 18:13 ?        00:00:00  \_ /usr/sbin/apache2 -k start
www-data 22659 16385  0 18:14 ?        00:00:00  \_ /usr/sbin/apache2 -k start
www-data 25102 16385  0 18:41 ?        00:00:00  \_ /usr/sbin/apache2 -k start
www-data 25175 16385  0 18:42 ?        00:00:00  \_ /usr/sbin/apache2 -k start
www-data 25272 16385  0 18:44 ?        00:00:00  \_ /usr/sbin/apache2 -k start

The cause for these zombie processes is most probably a PHP script, but as these Apache child processes are processing lots of HTTP requests and lots of different PHP scripts, it's very hard to figure out which one could be responsible. Linux has also already deallocated important information of these zombie processes, so we don't even have /proc/<pid>/cmdline to figure out which script or -c command /bin/sh may have been running:

server ~ # cat /proc/18254/cmdline 
server ~ #

To figure it out, I've installed auditd: https://linux-audit.com/configuring-and-auditing-linux-systems-with-audit-daemon/

I set up the following audit rules:

auditctl -a always,exit -F arch=b32 -S execve -F path=/bin/dash
auditctl -a always,exit -F arch=b64 -S execve -F path=/bin/dash

These rules audit all process creations of the /bin/dash binary. /bin/sh doesn't work here, because it's a symlink and audit apparently only sees the target file name:

server ~ # ls -l /bin/sh
lrwxrwxrwx 1 root root 4 Nov  8  2014 /bin/sh -> dash*

A simple test should now produce audit logs in /var/log/audit/audit.log (I've taken the liberty of adding a lot of line breaks to improve readability):

server ~ # sh -c 'echo test'
test

server ~ # tail -f /var/log/audit/audit.log
[...]
type=SYSCALL msg=audit(1488219335.976:43871): arch=40000003 syscall=11 \
  success=yes exit=0 a0=ffdca3ec a1=f7760e58 a2=ffdd399c a3=ffdca068 items=2 \
  ppid=27771 pid=27800 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 \
  fsgid=0 tty=pts7 ses=7532 comm="sh" exe="/bin/dash" key=(null)
type=EXECVE msg=audit(1488219335.976:43871): argc=3 a0="sh" a1="-c" \
  a2=6563686F2074657374
type=CWD msg=audit(1488219335.976:43871):  \
  cwd="/var/lib/iserv/remote-support/iserv-martin.von.wittich"
type=PATH msg=audit(1488219335.976:43871): item=0 name="/bin/sh" inode=10403900 \
  dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL
type=PATH msg=audit(1488219335.976:43871): item=1 name=(null) inode=5345368 \
  dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL
type=PROCTITLE msg=audit(1488219335.976:43871): \
  proctitle=7368002D63006563686F2074657374

Lots of the information is encoded, but ausearch can translate it with -i:

server ~ # ausearch -i -x /bin/dash | tail                                      
[...]
----
type=PROCTITLE msg=audit(27.02.2017 19:15:35.976:43871) : proctitle=sh 
type=PATH msg=audit(27.02.2017 19:15:35.976:43871) : item=1 name=(null) \
  inode=5345368 dev=08:01 mode=file,755 ouid=root ogid=root rdev=00:00 \
  nametype=NORMAL 
type=PATH msg=audit(27.02.2017 19:15:35.976:43871) : item=0 name=/bin/sh \
  inode=10403900 dev=08:01 mode=file,755 ouid=root ogid=root rdev=00:00 \
  nametype=NORMAL 
type=CWD msg=audit(27.02.2017 19:15:35.976:43871) :  \
  cwd=/var/lib/iserv/remote-support/iserv-martin.von.wittich 
type=EXECVE msg=audit(27.02.2017 19:15:35.976:43871) : argc=3 a0=sh a1=-c \
  a2=echo test 
type=SYSCALL msg=audit(27.02.2017 19:15:35.976:43871) : arch=i386 \
  syscall=execve success=yes exit=0 a0=0xffdca3ec a1=0xf7760e58 a2=0xffdd399c \
  a3=0xffdca068 items=2 ppid=27771 pid=27800 auid=root uid=root gid=root \
  euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=pts7 \
  ses=7532 comm=sh exe=/bin/dash key=(null) 
----

If you don't want to restrict the ausearch filtering to /bin/dash, you can also use ausearch -i -m ALL to translate the complete log. Another good filter would be ausearch -i -p <PID of a zombie process>, in this case ausearch -i -p 27800.

Just leave these rules in place until new zombie processes show up, and then search for the process creation of a zombie PID:

ausearch -i -p <PID>

This should be very helpful to identify the root cause of the zombie processes. In my case it was a PHP script that used proc_open to spawn a Perl script without closing the handle with proc_close.

Upper limit to the number of zombie processes you can have

I don't have HP-UX available to me, and I've never been a big HP-UX fan.

It appears that Linux has a per-process, or maybe per-user, limit on the number of child processes. You can see it with the limit Zsh built-in (which seems to be analogous to ulimit -u in bash):

1002 % limit
cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       8MB
coredumpsize    0kB
memoryuse       unlimited
maxproc         16136
  ...

That's on an Arch Linux laptop.

I wrote a little program to test that limit:

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

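/* Written from the signal handler, so use volatile sig_atomic_t. */
volatile sig_atomic_t sigchld_cnt = 0;

void
sigchld_hdlr(int signo)
{
        (void)signo;    /* unused */
        ++sigchld_cnt;
}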

int
main(int ac, char **av)
{
        int looping = 1;
        int child_cnt = 0;
        int status;

        signal(SIGCHLD, sigchld_hdlr);

        printf("Parent PID %d\n", getpid());

        while (looping)
        {
                switch (fork())
                {
                case 0:
                        _exit(0);
                        break;
                case -1:
                        fprintf(stderr, "Problem with fork(), %d children: %s\n",
                                child_cnt, strerror(errno));
                        looping = 0;
                        break;
                default:
                        ++child_cnt;
                        break;
                }
        }

        fprintf(stderr, "Sleeping, forked %d child processes\n", child_cnt);
        fprintf(stderr, "Received %d sigchild\n", sigchld_cnt);
        sleep(10);

        looping = 1;
        do {
                int x = wait(&status);

                if (x != -1)
                        --child_cnt;
                else if (errno != EINTR) {
                        fprintf(stderr, "wait() problem %d children left: %s\n",
                                child_cnt, strerror(errno));
                        looping = 0;
                }
        } while (looping);

        printf("%d children left, %d SIGCHLD\n", child_cnt, sigchld_cnt);

        return 0;
}

It was surprisingly difficult to "collect" all the zombies by calling wait(2) enough times. Also, the number of SIGCHLD signals received is never the same as the number of child processes forked: I believe the Linux kernel sometimes delivers a single SIGCHLD for several exited child processes, since standard signals are not queued.

Anyway, on my Arch Linux laptop, I get 16088 child processes forked, and that has to be the number of zombies, as the program doesn't do wait(2) system calls in the signal handler.

On my Slackware 12 server, I get 6076 child processes, which closely matches the value of maxproc 6079. My user ID has 2 other processes running, sshd and Zsh. Along with the first, non-zombie instance of the program above that makes 6079.

The fork(2) system call fails with a "Resource temporarily unavailable" error. I don't see any other evidence of what resource is unavailable. I do get somewhat different numbers if I run my program simultaneously in 2 different xterms, but they add up to the same number as if I run it in one xterm. I assume it's process table entries, or swap or some system-wide resource, and not just an arbitrary limit.

I don't have anything else running to try it on right now.
