Why a process/program becomes a zombie

Tags: process, zombie-process

If a script runs fine from the command line, why does the same script end up in the zombie state when it is run through cron, and how would you troubleshoot it?

Here is a real example:

[root@abc ~]# ps ax | grep Z
23880 ?        Zs     0:00 [checkloadadv.sh] <defunct>
23926 pts/0    S+     0:00 grep Z
[root@abc ~]# strace -p 23880
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
[root@abc ~]# pstree | grep  checkload
init-+-crond---crond-+-checkloadadv.sh
[root@abc ~]# bash /usr/bin/checkloadadv.sh
System Load is OK : 0.05

Best Answer


Like an actual zombie, a zombie process cannot be killed, because it is already dead.

How it happens

When a process ends on Linux/Unix, all of its resources are removed from system memory; only the process descriptor stays. The process enters state Z (zombie). Its parent process gets a signal from the kernel, SIGCHLD, which means that one of its child processes has exited, been stopped, or resumed after being stopped (in our case it simply exits).

The parent process now needs to execute the wait() syscall to read the exit status and other information from its child process. Then the descriptor is removed from memory and the process is no longer a zombie.

If the parent process never calls the wait() syscall, the zombie process descriptor stays in memory and eats brains. Normally you don't see zombie processes, because the procedure above takes very little time.
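
The life cycle described above can be reproduced with a minimal Python sketch. It assumes a Linux system with /proc mounted; proc_state() and demo_zombie() are hypothetical helper names made up for this illustration. The child exits at once, the parent deliberately delays its wait call, and the state read from /proc/<pid>/stat is expected to be Z until the parent reaps the child.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    # The command name sits in parentheses and may contain spaces,
    # so the state is the first field after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie(timeout=5.0):
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately

    # Parent: do NOT wait yet, just poll until the kernel marks the child as Z.
    deadline = time.monotonic() + timeout
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    before = proc_state(pid)

    os.waitpid(pid, 0)  # read the exit status; the descriptor is released
    after = proc_state(pid)
    return before, after


if __name__ == "__main__":
    print(demo_zombie())
```

On a typical Linux box this should report the state Z before the wait call and no process entry at all afterwards.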

The dawn of the dead

Each process descriptor needs only a very small amount of memory, so a few zombies are not very dangerous (like in real life). One problem is that each zombie process keeps its process ID, and a Linux/Unix operating system has a limited number of PIDs. If badly written software generates a lot of zombie processes, it can happen that no new processes can be started because no more process IDs are available.

So in huge groups they are very dangerous (as many movies demonstrate very well).
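
To get a feeling for how close a machine is to that limit, a rough check could look like the sketch below. It assumes Linux, where the limit is exposed in /proc/sys/kernel/pid_max; pid_usage() is a hypothetical name. The count is only an approximation, because threads also consume IDs but do not appear as top-level entries in /proc.

```python
import os


def pid_usage():
    """Return (processes_in_use, pid_max) as seen through /proc."""
    with open("/proc/sys/kernel/pid_max") as f:
        pid_max = int(f.read())
    # Every numeric directory in /proc is one process (zombies included).
    # Threads use IDs too but are not listed here, so this undercounts.
    in_use = sum(1 for name in os.listdir("/proc") if name.isdigit())
    return in_use, pid_max


if __name__ == "__main__":
    in_use, pid_max = pid_usage()
    print(f"{in_use} processes visible, pid_max is {pid_max}")
```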

How can we defend ourselves against a horde of zombies?

A shot in the head would work, but I don't know the command for that (SIGKILL won't work because the process is already dead).

Well, you can send SIGCHLD via kill to the parent process, but what if it ignores this signal? Then your only option is to kill the parent process so that the init process "adopts" the zombie. Init periodically calls the wait() syscall to clean up its zombie children.
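
For comparison, this is roughly what a well-behaved parent does when SIGCHLD arrives. It is a sketch in Python, not how cron or init are actually implemented; reap_children() and demo() are hypothetical names. The handler calls waitpid(-1, WNOHANG) in a loop, because several children may have exited before the handler gets to run.

```python
import os
import signal
import time

reaped = {}  # pid -> raw wait status of every child collected so far


def reap_children(signum, frame):
    """SIGCHLD handler: collect every child that has already exited."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # ECHILD: this process has no children at all
        if pid == 0:
            return  # children exist, but none of them has exited yet
        reaped[pid] = status


def demo(exit_code=3, timeout=5.0):
    previous = signal.signal(signal.SIGCHLD, reap_children)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(exit_code)  # child: exit immediately
        # Parent: do some "work" until the handler has reaped the child.
        deadline = time.monotonic() + timeout
        while pid not in reaped and time.monotonic() < deadline:
            time.sleep(0.01)
        return pid, reaped.get(pid)
    finally:
        signal.signal(signal.SIGCHLD, previous)  # restore the old handler


if __name__ == "__main__":
    pid, status = demo()
    print(pid, os.WEXITSTATUS(status))
```

A parent that installs such a handler never leaves zombies behind for long, so sending it SIGCHLD by hand only helps when it has a handler like this but somehow missed the original signal.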

In your case

In your case, you have to send SIGCHLD to the crond process. First attach strace to it, so you can watch how it reacts:

root@host:~# strace -p $(pgrep cron)
Process 1180 attached - interrupt to quit

Then from another terminal:

root@host:~# kill -s CHLD $(pgrep cron)

The output is:

restart_syscall(<... resuming interrupted call ...>) = ? ERESTART_RESTARTBLOCK (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7fff51be39dc, WNOHANG, NULL) = -1 ECHILD (No child processes) <-- Here it happens
rt_sigreturn(0xffffffffffffffff)        = -1 EINTR (Interrupted system call)
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1892, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {0x403170, [CHLD], SA_RESTORER|SA_RESTART, 0x7fd6a7e9d4a0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({42, 0}, ^C <unfinished ...>
Process 1180 detached

You can see that the wait4() syscall returns -1 ECHILD, which means that cron has no child processes left to wait for. So the conclusion is: cron reacts to the SIGCHLD signal and should not force the apocalypse.
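
When troubleshooting, the first step is usually to find out which parent is responsible for a zombie, since that is the process you would signal or restart. The sketch below does this by scanning /proc; it assumes Linux, and find_zombies() is a hypothetical helper name. It gives the same information as the STAT and PPID columns of ps.

```python
import os


def find_zombies():
    """Return a list of (pid, parent_pid, command) for every process in state Z."""
    zombies = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/stat") as f:
                data = f.read()
        except OSError:
            continue  # the process vanished or is not readable
        # Format: pid (comm) state ppid ...; comm may contain spaces or ')'.
        comm = data[data.index("(") + 1 : data.rindex(")")]
        state, ppid = data[data.rindex(")") + 1 :].split()[:2]
        if state == "Z":
            zombies.append((int(name), int(ppid), comm))
    return zombies


if __name__ == "__main__":
    for pid, ppid, comm in find_zombies():
        print(f"zombie {pid} ({comm}) is waiting for parent {ppid}")
```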
