Process Architecture – Why Fork is Needed to Create New Processes

Architectureforkprocess

In Unix whenever we want to create a new process, we fork the current process, creating a new child process which is exactly the same as the parent process; then we do an exec system call to replace all the data from the parent process with that for the new process.

Why do we create a copy of the parent process in the first place and not create a new process directly?

Best Answer

The short answer is, fork is in Unix because it was easy to fit into the existing system at the time, and because a predecessor system at Berkeley had used the concept of forks.

From The Evolution of the Unix Time-sharing System (relevant text has been highlighted):

Process control in its modern form was designed and implemented within a couple of days. It is astonishing how easily it fitted into the existing system; at the same time it is easy to see how some of the slightly unusual features of the design are present precisely because they represented small, easily-coded changes to what existed. A good example is the separation of the fork and exec functions. The most common model for the creation of new processes involves specifying a program for the process to execute; in Unix, a forked process continues to run the same program as its parent until it performs an explicit exec. The separation of the functions is certainly not unique to Unix, and in fact it was present in the Berkeley time-sharing system, which was well-known to Thompson. Still, it seems reasonable to suppose that it exists in Unix mainly because of the ease with which fork could be implemented without changing much else. The system already handled multiple (i.e. two) processes; there was a process table, and the processes were swapped between main memory and the disk. The initial implementation of fork required only

1) Expansion of the process table

2) Addition of a fork call that copied the current process to the disk swap area, using the already existing swap IO primitives, and made some adjustments to the process table.

In fact, the PDP-7's fork call required precisely 27 lines of assembly code. Of course, other changes in the operating system and user programs were required, and some of them were rather interesting and unexpected. But a combined fork-exec would have been considerably more complicated, if only because exec as such did not exist; its function was already performed, using explicit IO, by the shell.

Since that paper, Unix has evolved. fork followed by exec is no longer the only way to run a program.

  • vfork was created to be a more efficient fork for the case where the new process intends to do an exec right after the fork. After doing a vfork, the parent and child processes share the same data space, and the parent process is suspended until the child process either execs a program or exits.

  • posix_spawn creates a new process and executes a file in a single system call. It takes a bunch of parameters that let you selectively share the caller's open files and copy its signal disposition and other attributes to the new process.