Get the exit code of processes forked from the master process

background-process job-control process

I have a master process (run-jobs below) that starts other jobs as its sub-processes. When the master process fails (e.g. because of a database failure), it exits with a non-zero exit status, which is good, and which can be verified by inspecting the $? variable (echo $?).

However, I'd also like to inspect the exit codes of the sub-processes in case the master job fails. Is there a convenient way to check the exit code of process_1 and process_2 below, once the master process is gone?

This is simplified output of ps auxf:

vagrant 5167 | \_ php app/console run-jobs
vagrant 5461 | \_ php process_1
vagrant 5517 | \_ php process_2

Best Answer

Processes report their exit status to their parent. If the parent is dead, the status goes to the process with PID 1 (init) instead, though on Linux 3.4 or later you can designate another ancestor as the child subreaper for that role (using prctl(PR_SET_CHILD_SUBREAPER)).

More precisely, after they die, processes remain zombies until their parent (or init) retrieves their exit status (with waitpid() or a similar call).
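To make the zombie stage concrete, here is a minimal Python sketch (Linux only, since it reads /proc). The helper names child_state and run_and_reap are made up for this illustration. It forks a child that exits at once, polls until the kernel reports the child as a zombie, and only then collects the exit status with waitpid():

```python
import os
import time


def child_state(pid):
    """Return the one-letter state field from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state follows the parenthesised command name, which may itself
    # contain spaces or parentheses, so search for the last ")".
    return data[data.rindex(")") + 2]


def run_and_reap(exit_code):
    """Fork a child that exits with exit_code, observe it as a zombie, reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: terminate immediately

    # Parent: poll (for at most 5 seconds) until the child shows up as "Z".
    deadline = time.monotonic() + 5
    state = child_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = child_state(pid)

    # waitpid() retrieves the status and lets the kernel discard the zombie.
    _, status = os.waitpid(pid, 0)
    still_listed = os.path.exists(f"/proc/{pid}")
    return state, os.WEXITSTATUS(status), still_listed
```

run_and_reap(7) returns the state seen before reaping, the exit code, and whether /proc/<pid> still exists after the waitpid() call.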

In your case, you're saying the children are dying after (as a result of?) run-jobs dying. That means they'll report their exit status to init or to the process designated as child subreaper.

If init doesn't log that (and it generally doesn't) and if you don't use auditing or process accounting, that exit status will be lost.

On a recent version of Linux, you can create your own subreaper to get the PID and exit status of those orphaned processes. For example, with perl:

$ perl -MPOSIX -le '
  require "syscall.ph";
  # 36 is PR_SET_CHILD_SUBREAPER (Linux 3.4+); 1 turns it on
  syscall(&SYS_prctl, 36, 1) >= 0 or die "cannot set subreaper: $!";

  # example running 1 child and 2 grandchildren:
  if (!fork) {
    # Here, you would run:
    # exec("php", "run-jobs");
    if (!fork) {exec "sleep 1; exit 12"};
    if (!fork) {exec "sleep 2; exit 123"};
    exit(88);
  }
  # now reporting on all children and grandchildren:
  while (($pid = wait) > 0) {
    print "$pid: " . WEXITSTATUS($?);
  }'
22425: 88
22426: 12
22427: 123

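If you wanted to retrieve information on the dying processes (like command name, user, ppid...), you'd need to do that while they're still in the zombie state, that is before you've done a wait() on them. Note that a zombie's full command line is no longer available (/proc/<pid>/cmdline reads as empty); only what the kernel still keeps for the process, such as the command name, remains.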

To do that you'd need to use the waitid() API with the WNOWAIT option (and then get the information from /proc or the ps command). I don't think perl has an interface to that though, so you'd need to write it in another language like C.
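As one possible alternative to C, Python's os module exposes waitid() together with WNOWAIT, so the approach can be sketched there. This is a minimal sketch for Linux 3.4 or later, not a drop-in tool. The function names (become_subreaper, zombie_info, reap_all, demo) are invented for the illustration, the prctl() call goes through ctypes with the constant 36 hard-coded from <linux/prctl.h>, and demo() stands in for run-jobs with a toy process tree like the one in the perl example:

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h> (Linux 3.4+)


def become_subreaper():
    """Ask the kernel to reparent orphaned descendants to this process."""
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(1), zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def zombie_info(pid):
    """Read a few fields from /proc/<pid>/status, which survives for a zombie."""
    wanted = {"Name", "State", "PPid", "Uid"}
    info = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in wanted:
                info[key] = value.strip()
    return info


def reap_all():
    """Inspect, then reap, every terminated child and adopted descendant."""
    results = []
    while True:
        try:
            # WNOWAIT reports a terminated process but leaves it a zombie.
            peek = os.waitid(os.P_ALL, 0, os.WEXITED | os.WNOWAIT)
        except ChildProcessError:
            break  # nothing left to wait for
        info = zombie_info(peek.si_pid)
        _, status = os.waitpid(peek.si_pid, 0)  # now actually reap it
        if os.WIFEXITED(status):
            code = os.WEXITSTATUS(status)
        else:
            code = -os.WTERMSIG(status)  # killed by a signal
        results.append((peek.si_pid, code, info))
    return results


def demo():
    """Toy stand-in for run-jobs: one child that dies before its two children."""
    become_subreaper()
    if os.fork() == 0:
        for delay, code in ((0.2, 12), (0.4, 123)):
            if os.fork() == 0:
                time.sleep(delay)
                os._exit(code)
        os._exit(88)
    return reap_all()
```

Calling demo() returns one (pid, exit code, info) tuple per process, where info holds the Name, State, PPid and Uid fields read while the process was still a zombie. Note that reap_all() waits for every child of the calling process, so it is best run from a dedicated wrapper process.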