Linux – How to kill a process which can’t be killed without rebooting

Tags: kill, linux, process, zombie-process

There are 5 processes which can't be killed by kill -9 $PID and executing cat /proc/$PID/cmdline will hang the current session. Maybe they're zombie processes.

Executing ps -ef or htop also hangs the current session, but top and ps -e work fine.

So there seem to be two problems: processes that can't be killed, and the filesystem not responding.

This is a production machine running virtual machines, so rebooting isn't an option.

The following process IDs are affected:
16181 16765 5985 7427 7547

The parent of these processes is init:

        ├─collectd(16765)─┬─{collectd}(16776)
        │                 ├─{collectd}(16777)
        │                 ├─{collectd}(16778)
        │                 ├─{collectd}(16779)
        │                 ├─{collectd}(16780)
        │                 └─{collectd}(16781)
        ├─collectd(28642)───{collectd}(28650)
        ├─collectd(29868)─┬─{collectd}(29873)
        │                 ├─{collectd}(29874)
        │                 ├─{collectd}(29875)
        │                 └─{collectd}(29876)

One of the qemu processes is also not working:

|-qemu-system-x86(16181)-+-{qemu-system-x86}(16232)
|                        |-{qemu-system-x86}(16238)
|                        |-{qemu-system-x86}(16803)
|                        |-{qemu-system-x86}(17990)
|                        |-{qemu-system-x86}(17991)
|                        |-{qemu-system-x86}(17992)
|                        |-{qemu-system-x86}(18062)
|                        |-{qemu-system-x86}(18066)
|                        |-{qemu-system-x86}(18072)
|                        |-{qemu-system-x86}(18073)
|                        |-{qemu-system-x86}(18074)
|                        |-{qemu-system-x86}(18078)
|                        |-{qemu-system-x86}(18079)
|                        |-{qemu-system-x86}(18086)
|                        |-{qemu-system-x86}(18088)
|                        |-{qemu-system-x86}(18092)
|                        |-{qemu-system-x86}(18107)
|                        |-{qemu-system-x86}(18108)
|                        |-{qemu-system-x86}(18111)
|                        |-{qemu-system-x86}(18113)
|                        |-{qemu-system-x86}(18114)
|                        |-{qemu-system-x86}(18119)
|                        |-{qemu-system-x86}(23147)
|                        `-{qemu-system-x86}(27051)

Best Answer

You don't have zombies. cat /proc/$PID/cmdline wouldn't have any problem with a zombie. If kill -9 doesn't kill the program, it means the program is doing some uninterruptible I/O operation. That usually indicates one of three things:

  • a network filesystem that isn't responding;
  • a kernel bug;
  • a hardware bug.
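To tell a zombie from a process stuck in uninterruptible sleep, the state letter in /proc/PID/status can be checked: Z means zombie, D means uninterruptible sleep. The following is a minimal sketch, not a definitive diagnostic. The helper names describe_state and state_of are made up for illustration, and it assumes that /proc/PID/status is still readable on the affected machine (plausible here because ps -e still works, but not guaranteed).

```shell
#!/bin/sh
# Hypothetical helper: map a one-letter process state to a short hint.
describe_state() {
    case "$1" in
        D) echo "uninterruptible sleep: SIGKILL stays pending until the wait ends" ;;
        Z) echo "zombie: already dead, waiting for its parent to reap it" ;;
        *) echo "other state: $1" ;;
    esac
}

# Hypothetical helper: print the state letter of a PID from /proc/PID/status.
# Returns non-zero if the process does not exist or is not readable.
state_of() {
    [ -r "/proc/$1/status" ] || return 1
    awk '/^State:/ { print $2; exit }' "/proc/$1/status"
}

# PIDs taken from the question.
for pid in 16181 16765 5985 7427 7547; do
    if s=$(state_of "$pid"); then
        echo "$pid: $(describe_state "$s")"
    else
        echo "$pid: no such process"
    fi
done
```

If the affected PIDs report D rather than Z, that is consistent with the explanation above.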

Utilities such as ps may hang if they try to read some information such as the process executable path that the kernel isn't providing for one of the reasons above.

Try cat /proc/16181/syscall to see what process 16181 is doing. This may or may not work depending on how far gone your system is.
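The first field of /proc/PID/syscall says the most: it is the system call number if the process is blocked inside a system call, -1 if it is blocked outside one, or the single word "running" if it is not blocked. A small sketch of reading it follows. The function name explain_syscall is made up for illustration, reading the file may require root, and the meaning of a given syscall number depends on the architecture.

```shell
#!/bin/sh
# Hypothetical helper: interpret one line read from /proc/PID/syscall.
explain_syscall() {
    first=${1%% *}          # keep only the first space-separated field
    case "$first" in
        running) echo "not blocked" ;;
        -1)      echo "blocked outside a system call" ;;
        *)       echo "blocked in system call number $first" ;;
    esac
}

# PID taken from the question; the read itself may fail or hang on a
# badly wedged system, so treat this as a best-effort check.
pid=16181
if line=$(cat "/proc/$pid/syscall" 2>/dev/null); then
    echo "$pid: $(explain_syscall "$line")"
else
    echo "$pid: cannot read /proc/$pid/syscall"
fi
```

The number can then be looked up in the syscall table for the machine's architecture.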

If the problem is a network filesystem, you may be able to force-unmount it, or to make it come online. If the problem is a kernel or hardware bug, what you can do will depend on the nature of the bug. Rebooting (and upgrading to a fixed kernel, or replacing the broken hardware) is strongly recommended.
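For the network filesystem case, a first step is finding out which network mounts exist. The sketch below lists them from /proc/mounts. The function name network_mounts is made up for illustration, and the set of filesystem types it matches (nfs, nfs4, cifs) is an assumption that should be adjusted to what the machine actually uses. The unmount commands are left as comments because they are disruptive.

```shell
#!/bin/sh
# Hypothetical helper: print mount points whose filesystem type looks like
# a network filesystem. Reads /proc/mounts unless another file is given.
network_mounts() {
    awk '$3 ~ /^(nfs|nfs4|cifs)$/ { print $2 }' "${1:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    network_mounts
fi

# For a mount point reported above, a forced or lazy unmount can be tried:
#   umount -f /path/to/mountpoint   # force, intended for unreachable NFS servers
#   umount -l /path/to/mountpoint   # lazy: detach now, clean up when no longer busy
```

Neither option is guaranteed to release processes that are already blocked on the mount.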
