Upper limit to the number of zombie processes you can have

hp-ux kill limit process zombie-process

I used to work with an HP-UX system, and the old admin told me there is an upper limit on the number of zombie processes you can have on the system; I believe it was 1024.

  • Is this a hard ceiling? I would think you could have any number of zombies, just as you can have any number of processes…?
  • Does the value differ from distro to distro?
  • What happens if we hit the upper limit and try to create another zombie?

Best Answer

I don't have HP-UX available to me, and I've never been a big HP-UX fan.

It appears that on Linux there is a per-process, or maybe per-user, limit on how many child processes you can have. You can see it with the limit Zsh built-in (it seems to be analogous to ulimit -u in bash):

1002 % limit
cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       8MB
coredumpsize    0kB
memoryuse       unlimited
maxproc         16136
  ...

That's on an Arch Linux laptop.
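To see the same numbers without Zsh, bash exposes the per-user process limit as ulimit -u, and on Linux there is also a system-wide PID ceiling under /proc. A quick check (the /proc path is Linux-specific):

```shell
# Per-user limit on processes; zombies count against it
ulimit -u

# System-wide ceiling on process IDs (Linux only)
cat /proc/sys/kernel/pid_max
```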

I wrote a little program to test that limit:

#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

volatile sig_atomic_t sigchld_cnt = 0;

void
sigchld_hdlr(int signo)
{
        ++sigchld_cnt;
}

int
main(int ac, char **av)
{
        int looping = 1;
        int child_cnt = 0;
        int status;

        signal(SIGCHLD, sigchld_hdlr);

        printf("Parent PID %d\n", getpid());

        while (looping)
        {
                switch (fork())
                {
                case 0:
                        _exit(0);
                        break;
                case -1:
                        fprintf(stderr, "Problem with fork(), %d children: %s\n",
                                child_cnt, strerror(errno));
                        looping = 0;
                        break;
                default:
                        ++child_cnt;
                        break;
                }
        }

        fprintf(stderr, "Sleeping, forked %d child processes\n", child_cnt);
        fprintf(stderr, "Received %d SIGCHLD\n", sigchld_cnt);
        sleep(10);

        looping = 1;
        do {
                int x = wait(&status);

                if (x != -1)
                        --child_cnt;
                else if (errno != EINTR) {
                        fprintf(stderr, "wait() problem %d children left: %s\n",
                                child_cnt, strerror(errno));
                        looping = 0;
                }
        } while (looping);

        printf("%d children left, %d SIGCHLD\n", child_cnt, sigchld_cnt);

        return 0;
}

It was surprisingly difficult to "collect" all the zombies by calling wait(2) enough times. Also, the number of SIGCHLD signals received never matches the number of child processes forked: I believe the Linux kernel sometimes delivers a single SIGCHLD for several exited child processes.

Anyway, on my Arch Linux laptop, I get 16088 child processes forked, and that has to be the number of zombies, as the program doesn't call wait(2) in the signal handler.

On my Slackware 12 server, I get 6076 child processes, which closely matches the maxproc value of 6079. My user ID has 2 other processes running, sshd and Zsh; along with the first, non-zombie instance of the program above, that makes 6079.

The fork(2) system call fails with a "Resource temporarily unavailable" (EAGAIN) error. I don't see any other evidence of which resource is unavailable. I do get somewhat different numbers if I run my program simultaneously in 2 different xterms, but they add up to the same total as when I run it in one xterm. I assume it's process table entries, or swap, or some per-user or system-wide resource, and not just an arbitrary per-process limit.

I don't have anything else running to try it on right now.
