Upper limit to the number of zombie processes you can have

hp-ux kill limit process zombie-process

I used to work with an HP-UX system, and the old admin told me there is an upper limit on the number of zombie processes you can have on the system; I believe it was 1024.

  • Is this a hard ceiling? I would think you could have any number of zombies, just as you can have any number of processes…?
  • Does the value differ from distro to distro?
  • What happens if we hit the upper limit and try to create another zombie?

Best Answer

I don't have HP-UX available to me, and I've never been a big HP-UX fan.

It appears that on Linux there is a per-process, or maybe per-user, limit on how many child processes you can have. You can see it with the limit Zsh built-in (it seems to be analogous to ulimit -u in bash):

1002 % limit
cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       8MB
coredumpsize    0kB
memoryuse       unlimited
maxproc         16136
  ...

That's on an Arch Linux laptop.
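To see the same numbers without Zsh, bash exposes the per-user process limit as ulimit -u, and on Linux there is also a system-wide PID ceiling under /proc. A quick check (the /proc path is Linux-specific):

```shell
# Per-user limit on processes; zombies count against it
ulimit -u

# System-wide ceiling on process IDs (Linux only)
cat /proc/sys/kernel/pid_max
```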

I wrote a little program to test that limit:

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

volatile sig_atomic_t sigchld_cnt = 0;

void
sigchld_hdlr(int signo)
{
        ++sigchld_cnt;
}

int
main(int ac, char **av)
{
        int looping = 1;
        int child_cnt = 0;
        int status;

        signal(SIGCHLD, sigchld_hdlr);

        printf("Parent PID %d\n", getpid());

        while (looping)
        {
                switch (fork())
                {
                case 0:
                        _exit(0);
                        break;
                case -1:
                        fprintf(stderr, "Problem with fork(), %d children: %s\n",
                                child_cnt, strerror(errno));
                        looping = 0;
                        break;
                default:
                        ++child_cnt;
                        break;
                }
        }

        fprintf(stderr, "Sleeping, forked %d child processes\n", child_cnt);
        fprintf(stderr, "Received %d SIGCHLD\n", sigchld_cnt);
        sleep(10);

        looping = 1;
        do {
                int x = wait(&status);

                if (x != -1)
                        --child_cnt;
                else if (errno != EINTR) {
                        fprintf(stderr, "wait() problem %d children left: %s\n",
                                child_cnt, strerror(errno));
                        looping = 0;
                }
        } while (looping);

        printf("%d children left, %d SIGCHLD\n", child_cnt, sigchld_cnt);

        return 0;
}

It was surprisingly difficult to "collect" all the zombies by calling wait(2) enough times. Also, the number of SIGCHLD signals received never matches the number of child processes forked: I believe the Linux kernel sometimes delivers a single SIGCHLD for several exited child processes.

Anyway, on my Arch Linux laptop, I get 16088 child processes forked, and that has to be the number of zombies, as the program doesn't call wait(2) in the signal handler.

On my Slackware 12 server, I get 6076 child processes, which closely matches the maxproc value of 6079. My user ID has 2 other processes running, sshd and Zsh; along with the first, non-zombie instance of the program above, that makes 6079.

The fork(2) system call fails with a "Resource temporarily unavailable" (EAGAIN) error. I don't see any other evidence of which resource is unavailable. I do get somewhat different numbers if I run my program simultaneously in 2 different xterms, but they add up to the same total as when I run it in one xterm. I assume it's process table entries, or swap, or some per-user or system-wide resource, and not just an arbitrary per-process limit.

I don't have anything else running to try it on right now.
