I will execute following command for reaping zombie
/usr/bin/preap $(ps -ef | grep defunct | grep -v grep | awk '{ print $2 }' | xargs)
Is there any service impact of this approach ?
processsolariszombie-process
I will execute following command for reaping zombie
/usr/bin/preap $(ps -ef | grep defunct | grep -v grep | awk '{ print $2 }' | xargs)
Is there any service impact of this approach ?
The audit subsystem of the Linux kernel can be very useful to figure out what processes are becoming zombie processes. I just had the following situation:
server ~ # ps -ef --forest
[...]
root 16385 1 0 17:04 ? 00:00:00 /usr/sbin/apache2 -k start
root 16388 16385 0 17:04 ? 00:00:00 \_ /usr/bin/perl -T -CSDAL /usr/lib/iserv/apache_user
root 16389 16385 0 17:04 ? 00:00:00 \_ /usr/bin/perl -T -CSDAL /usr/lib/iserv/apache_user
www-data 16415 16385 0 17:04 ? 00:00:00 \_ /usr/sbin/apache2 -k start
www-data 18254 16415 0 17:23 ? 00:00:00 | \_ [sh] <defunct>
www-data 18347 16415 0 17:23 ? 00:00:00 | \_ [sh] <defunct>
www-data 22966 16415 0 18:18 ? 00:00:00 | \_ [sh] <defunct>
www-data 16583 16385 0 17:05 ? 00:00:01 \_ /usr/sbin/apache2 -k start
www-data 18306 16583 0 17:23 ? 00:00:00 | \_ [sh] <defunct>
www-data 18344 16583 0 17:23 ? 00:00:00 | \_ [sh] <defunct>
www-data 17561 16385 0 17:12 ? 00:00:00 \_ /usr/sbin/apache2 -k start
www-data 22983 17561 0 18:18 ? 00:00:00 | \_ [sh] <defunct>
www-data 18318 16385 0 17:23 ? 00:00:00 \_ /usr/sbin/apache2 -k start
www-data 19725 16385 0 17:43 ? 00:00:01 \_ /usr/sbin/apache2 -k start
www-data 22638 16385 0 18:13 ? 00:00:00 \_ /usr/sbin/apache2 -k start
www-data 22659 16385 0 18:14 ? 00:00:00 \_ /usr/sbin/apache2 -k start
www-data 25102 16385 0 18:41 ? 00:00:00 \_ /usr/sbin/apache2 -k start
www-data 25175 16385 0 18:42 ? 00:00:00 \_ /usr/sbin/apache2 -k start
www-data 25272 16385 0 18:44 ? 00:00:00 \_ /usr/sbin/apache2 -k start
The cause for these zombie processes is most probably a PHP script, but as these Apache child processes are processing lots of HTTP requests and lots of different PHP scripts, it's very hard to figure out which one could be responsible. Linux has also already deallocated important information of these zombie processes, so we don't even have /proc/<pid>/cmdline
to figure out which script or -c
command /bin/sh
may have been running:
server ~ # cat /proc/18254/cmdline
server ~ #
To figure it out, I've installed auditd
: https://linux-audit.com/configuring-and-auditing-linux-systems-with-audit-daemon/
I set up the following audit rules:
auditctl -a always,exit -F arch=b32 -S execve -F path=/bin/dash
auditctl -a always,exit -F arch=b64 -S execve -F path=/bin/dash
These rules audit all process creations of the /bin/dash
binary. /bin/sh
doesn't work here, because it's a symlink and audit apparently only sees the target file name:
server ~ # ls -l /bin/sh
lrwxrwxrwx 1 root root 4 Nov 8 2014 /bin/sh -> dash*
A simple test should now produce audit logs in /var/log/audit/audit.log
(I've taken the liberty and added a lot of line breaks to improve the readability):
server ~ # sh -c 'echo test'
test
server ~ # tail -f /var/log/audit/audit.log
[...]
type=SYSCALL msg=audit(1488219335.976:43871): arch=40000003 syscall=11 \
success=yes exit=0 a0=ffdca3ec a1=f7760e58 a2=ffdd399c a3=ffdca068 items=2 \
ppid=27771 pid=27800 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 \
fsgid=0 tty=pts7 ses=7532 comm="sh" exe="/bin/dash" key=(null)
type=EXECVE msg=audit(1488219335.976:43871): argc=3 a0="sh" a1="-c" \
a2=6563686F2074657374
type=CWD msg=audit(1488219335.976:43871): \
cwd="/var/lib/iserv/remote-support/iserv-martin.von.wittich"
type=PATH msg=audit(1488219335.976:43871): item=0 name="/bin/sh" inode=10403900 \
dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL
type=PATH msg=audit(1488219335.976:43871): item=1 name=(null) inode=5345368 \
dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL
type=PROCTITLE msg=audit(1488219335.976:43871): \
proctitle=7368002D63006563686F2074657374
Lots of the information is encoded, but ausearch
can translate it with -i
:
server ~ # ausearch -i -x /bin/dash | tail
[...]
----
type=PROCTITLE msg=audit(27.02.2017 19:15:35.976:43871) : proctitle=sh
type=PATH msg=audit(27.02.2017 19:15:35.976:43871) : item=1 name=(null) \
inode=5345368 dev=08:01 mode=file,755 ouid=root ogid=root rdev=00:00 \
nametype=NORMAL
type=PATH msg=audit(27.02.2017 19:15:35.976:43871) : item=0 name=/bin/sh \
inode=10403900 dev=08:01 mode=file,755 ouid=root ogid=root rdev=00:00 \
nametype=NORMAL
type=CWD msg=audit(27.02.2017 19:15:35.976:43871) : \
cwd=/var/lib/iserv/remote-support/iserv-martin.von.wittich
type=EXECVE msg=audit(27.02.2017 19:15:35.976:43871) : argc=3 a0=sh a1=-c \
a2=echo test
type=SYSCALL msg=audit(27.02.2017 19:15:35.976:43871) : arch=i386 \
syscall=execve success=yes exit=0 a0=0xffdca3ec a1=0xf7760e58 a2=0xffdd399c \
a3=0xffdca068 items=2 ppid=27771 pid=27800 auid=root uid=root gid=root \
euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=pts7 \
ses=7532 comm=sh exe=/bin/dash key=(null)
----
If you don't want to restrict the ausearch
filtering to /bin/dash
, you can also use ausearch -i -m ALL
to translate the complete log. Another good filter would be ausearch -i -p <PID of a zombie process>
, in this case ausearch -i -p 27800
.
Just leave these rules in place until new zombie processes show up, and then search for the process creation of a zombie PID:
ausearch -i -p <PID>
This should be very helpful to identify the root cause of the zombie processes. In my case it was a PHP script that used proc_open
to spawn a Perl script without closing the handle with proc_close
.
The zombie isn't waiting for its child. Like any zombie process, it stays around until its parent collects it.
You should display all the processes involved to understand what's going on, and look at the PPID as well. Use this command line:
ps -t $(tty) -O ppid,pgid
The parent of the process you're killing is cat
. What happens is that bash runs the background command cat <( sleep 100 & wait )
in a subshell. Since the only thing this subshell does is to set up some redirection and then run an external command, this subshell is replaced by the external command. Here's the rundown:
fork
to execute the background command cat <( sleep 100 & wait )
in a child (14247).
pipe
to create a pipe, then fork
to create a child to run the process substitution sleep 100 & wait
.
fork
to run sleep 100
in the background. Since the grandchild isn't interactive, the background process doesn't run in a separate process group. Then the grandchild waits for sleep
to exit.setpgid
(it's a background job in an interactive shell so it gets its own process group), then execve
to run cat
. (I'm a bit surprised that the process substitution isn't happening in the background process group.)cat
, which knows nothing about any child process and has no business calling wait
. Since the grandchild's parent doesn't reap it, the grandchild stays behind as a zombie.cat
exits — either because you kill it, or because sleep
returns and closes the pipe so cat
sees the end of its input. At that point, the zombie's parent dies, so the zombie is collected by init and init reaps it.If you change the command to
{ cat <( sleep 100 & wait ); echo done; } &
then cat
runs in a separate process, not in the child of the original bash process: the first child has to stay behind to run echo done
. In this case, if you kill the grandchild, it doesn't stay on as a zombie, because the child (which is still running bash at that point) reaps it.
See also How does linux handles zombie process and Can a zombie have orphans? Will the orphan children be disturbed by reaping the zombie?
Best Answer
If you reap a zombie before its parent, you lose whatever effect the reaping would have in the parent. This is obviously application-dependent.
There is very little reason to actively go and reap zombies. Some operating systems don't let you do it, short of manually ptracing the parent process and causing it to execute a
waitpid
system call. Solaris offers apreap
utility, but the only case when you should use it is when a program is misbehaving and filling the process table with zombies.