Pthreads and vfork

I am trying to check what really happens to a process's pthreads while one of them performs vfork.
The spec says that the parent "thread of control" is "suspended" until the child process calls exec* or _exit.
As I understand it, the consensus is that this means the whole parent process (that is, all of its pthreads) is suspended.
I'd like to confirm it using an experiment.
So far I have performed several experiments, all of which suggest that the other pthreads keep running. As I have no Linux experience, I suspect that my interpretation of these experiments is wrong, and learning the correct interpretation of these results could help me avoid further misconceptions.
So here are the experiments I did:

Experiment I

#include<unistd.h>
#include<pthread.h>
#include<signal.h>
#include<errno.h>
#include<cstring>
#include<string>
#include<iostream>
using namespace std;
void * job(void *x){
  int pid=vfork();
  if(-1==pid){
    cerr << "failed to fork: " << strerror(errno) << endl;
    _exit(-3);
  }
  if(!pid){
    cerr << "A" << endl;
    cerr << "B" << endl;
    if(-1 == execlp("/bin/ls","ls","repro.cpp",(char*)NULL)){
      cerr << "failed to exec : " << strerror(errno) << endl;
      _exit(-4);//serious problem, can not proceed
    }
  }
  return NULL;
}
int main(){
  signal(SIGPIPE,SIG_IGN);
  signal(SIGCHLD,SIG_IGN);
  const int thread_count = 4;
  pthread_t thread[thread_count];
  int err;
  for(size_t i=0;i<thread_count;++i){
    if((err = pthread_create(thread+i,NULL,job,NULL))){
      cerr << "failed to create pthread: " << strerror(err) << endl;
      return -7;
    }
  }
  for(size_t i=0;i<thread_count;++i){
    if((err = pthread_join(thread[i],NULL))){
      cerr << "failed to join pthread: " << strerror(err) << endl;
      return -17;
    }
  }
}

There are 4 pthreads, each of which performs vfork and then exec in the child.
Each child process performs two output operations, "A" and "B", between the vfork and the exec.
The theory suggests that the output should read ABABABAB… with nothing interleaved between an A and its B.
However, the output is a total mess, for example:

AAAA



BB
B

B

Experiment II

Suspecting that using the I/O library after vfork could be a bad idea, I've replaced the job() function with the following:

const int S = 10000000;
int t[S];
void * job(void *x){
  int pid=vfork();
  if(-1==pid){
    cerr << "failed to fork: " << strerror(errno) << endl;
    _exit(-3);
  }
  if(!pid){
    for(int i=0;i<S;++i){
      t[i]=i;
    }
    for(int i=0;i<S;++i){
      t[i]-=i;
    }
    for(int i=0;i<S;++i){
      if(t[i]){
        cout << "INCONSISTENT STATE OF t[" << i << "] = " << t[i] << " DETECTED" << endl;
      }
    }
    if(-1 == execlp("/bin/ls","ls","repro.cpp",(char*)NULL)){
      cerr << "failed to execlp : " << strerror(errno) << endl;
      _exit(-4);
    }
  }
  return NULL;
}

This time I perform two loops such that the second one undoes the results of the first one, so at the end the global array t[] should be back in its initial state (which, being a global array, is all zeros).
If entering the child process freezes the other pthreads, making them unable to call vfork until the current child finishes its loops, then the array should be all zeros at the end.
And I confirmed that when I use fork() instead of vfork(), the above code does not produce any output.
However, when I change fork() to vfork(), I get tons of inconsistencies reported to stdout.
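
If I understand the mechanism correctly, that is because with vfork() the child shares the parent's address space, so all the children and the parent are writing to the same t[], while with fork() each child works on its own copy-on-write copy. A stripped-down sketch of just that difference (not one of my actual experiments, and writing to a global in the child is itself presumably one of the dubious things to do after vfork):

#include<unistd.h>
#include<sys/wait.h>
#include<iostream>
using namespace std;
int shared_value = 0;
int main(){
  pid_t pid = vfork();                 // swap in fork() to compare
  if(!pid){
    shared_value = 42;                 // dubious after vfork(); only here to show the sharing
    _exit(0);
  }
  wait(NULL);
  cout << "shared_value = " << shared_value << endl; // 42 with vfork(), 0 with fork()
}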

Experiment III

One more experiment is described in https://unix.stackexchange.com/a/163761/88901. It involved calling sleep, but the results were the same when I replaced the sleep with a long for loop.
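
From memory (this is my paraphrase, not the code from that answer, and the sleeper/talker names are mine), the shape of that experiment is roughly the following: one thread vforks and its child just sleeps, while another thread keeps printing. If the whole process were suspended, the messages could only appear after the child's sleep finishes; the observation there (and in my for-loop variant) was that they keep appearing during it.

#include<unistd.h>
#include<pthread.h>
#include<iostream>
using namespace std;
void * sleeper(void *){
  pid_t pid = vfork();
  if(!pid){
    sleep(5);                          // again, technically not allowed in a vfork child
    _exit(0);
  }
  return NULL;
}
void * talker(void *){
  for(int i = 0; i < 5; ++i){
    cerr << "other thread still running" << endl;
    sleep(1);
  }
  return NULL;
}
int main(){
  pthread_t a, b;
  pthread_create(&a, NULL, sleeper, NULL);
  pthread_create(&b, NULL, talker, NULL);
  pthread_join(a, NULL);
  pthread_join(b, NULL);
}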

Best Answer

The Linux man page for vfork is quite specific:

vfork() differs from fork(2) in that the calling thread is suspended until the child terminates

It is not the whole process that is suspended, but only the calling thread. This behavior is not guaranteed by POSIX or other standards; other implementations may do different things (up to and including simply implementing vfork as a plain fork).

(Rich Felker also notes this behavior in vfork considered dangerous.)

Using fork in a multi-threaded program is hard enough to reason about already, and calling vfork is at least as bad. Your tests are full of undefined behavior: you're not even allowed to call a function (let alone do I/O) inside the vfork'd child, except for the exec-family functions and _exit (not even exit, and returning from the child causes mayhem).
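
For reference, about the only well-defined shape for a vfork'd child is the bare exec-or-_exit pattern, roughly like this (a minimal sketch; /bin/true is an arbitrary choice):

#include<unistd.h>

int main(){
  pid_t pid = vfork();
  if(pid == 0){
    // The child may only store vfork's return value, call an exec-family
    // function, or call _exit: no other writes, no I/O, no returning.
    execlp("/bin/true","true",(char*)nullptr);
    _exit(127);                        // reached only if the exec failed
  }
  // The parent thread resumes here once the child has exec'd or _exit'd.
  return 0;
}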

Here's an example adapted from yours which I believe is nearly free of undefined behavior, assuming a compiler/implementation that doesn't emit function calls for atomic reads and writes on ints. (The one remaining problem is the write to start after the vfork; that's not allowed.) Error handling is elided to keep it short.

#include<unistd.h>
#include<pthread.h>
#include<signal.h>
#include<errno.h>
#include<atomic>
#include<cstring>
#include<string>
#include<iostream>

std::atomic<int> start;
std::atomic<int> counter;
const int thread_count = 4;

void *vforker(void *){
  std::cout << "vforker starting\n";
  int pid=vfork();
  if(pid == 0){
    start = 1;                        // the one bit of UB mentioned above: a write in the vfork child
    while (counter < (thread_count-1))
      ;                               // busy-wait until the other threads have incremented counter
    execlp("/bin/date","date",(char*)nullptr);
  }
  std::cout << "vforker done\n";
  return nullptr;
}

void *job(void *){
  while (start == 0)
    ;
  counter++;
  return NULL;
}

int main(){
  signal(SIGPIPE,SIG_IGN);
  signal(SIGCHLD,SIG_IGN);
  pthread_t thread[thread_count];
  counter = 0;
  start   = 0;

  pthread_create(&(thread[0]), nullptr, vforker, nullptr);
  for(int i=1;i<thread_count;++i)
    pthread_create(&(thread[i]), nullptr, job, nullptr);

  for(int i=0;i<thread_count;++i)
    pthread_join(thread[i], nullptr);
}

The idea is this: the normal threads wait (busy-loop) for the atomic global variable start to be 1 before incrementing a global atomic counter. The thread that does a vfork sets start to 1 in the vfork child, then waits (busy-loop again) for the other threads to have incremented the counter.

If the other threads were suspended during vfork, no progress could ever be made: the suspended threads would never increment counter (they'd have been suspended before start was set to 1), so the vforker thread would be stuck in an infinite busy-wait. Since the program does in fact complete and run /bin/date on Linux, the other threads are evidently not suspended.