System Calls – Difference Between fork() and vfork()

fork, linux, system-calls

I would like to understand in detail the difference between fork() and vfork(). I was not able to digest the man page completely.

I would also like to clarify one of my colleague's comments: "In current Linux, there is no vfork(), even if you call it, it will internally call fork()."

Best Answer

Man pages are usually terse reference documents. Wikipedia is a better place to turn to for conceptual explanations.

Fork duplicates a process: it creates a child process which is almost identical to the parent process (the most obvious difference is that the new process has a different process ID). In particular, fork (conceptually) must copy all the parent process's memory.

As this is rather costly, vfork was invented to handle a common special case where the copy is not necessary. Often, the first thing the child process does is to load a new program image, so this is what happens:

if (fork()) {
    /* parent process … */
} else {
    /* child process (with a new copy of the process memory) */
    execve("/bin/sh", …);  /* discard the process memory */
}

The execve call loads a new executable program, and this replaces the process's code and data memory by the code of the new executable and a fresh data memory. So the whole memory copy created by fork was all for nothing.
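As a minimal, runnable sketch of that fork-then-exec pattern: the helper name run_with_fork and the use of `sh -c` are assumptions made for illustration, not part of the original answer. It uses execl rather than execve only to keep the argument list short.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `sh -c cmd` in a child created with fork().
 * Returns the child's exit status, or -1 on any error. */
int run_with_fork(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0) {
        /* Child: replace the duplicated image with /bin/sh. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                     /* only reached if exec failed */
    }
    /* Parent: wait for the child and decode its status. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_with_fork("exit 3") should return 3, since the shell's exit status is passed back through waitpid.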

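Thus the vfork call was invented. It does not make a copy of the memory: the child runs in the parent's address space until it calls exec or _exit. Therefore vfork is cheap, but it's hard to use since you have to make sure you don't modify any of the process's stack or heap space in the child process. On Linux the parent is suspended until the child calls exec or _exit, but the standards did not guarantee that, so on other implementations even reading could be a problem, because the parent process may keep executing. For example, this code is not portable (where the parent is not suspended, it may or may not work depending on whether the child or the parent gets a time slice first):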

if (vfork()) {
    /* parent process */
    cmd = NULL; /* modify the only copy of cmd */
} else {
    /* child process */
    execl("/bin/sh", "sh", "-c", cmd, (char*)NULL);  /* read the only copy of cmd */
}
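For contrast, here is a hedged sketch of the conservative way to use vfork: everything the child needs is prepared before the call, and the child does nothing except execve or _exit. The helper name run_with_vfork is hypothetical, and the _DEFAULT_SOURCE define is a glibc-specific assumption to keep vfork declared under strict compiler modes.

```c
#define _DEFAULT_SOURCE   /* glibc: keep vfork() declared under strict -std= modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run `sh -c cmd` in a child created with vfork().
 * Returns the child's exit status, or -1 on any error. */
int run_with_vfork(const char *cmd)
{
    /* Build the argument vector before vfork(); the child only reads it. */
    char *const argv[] = { "sh", "-c", (char *)cmd, NULL };

    pid_t pid = vfork();
    if (pid < 0)
        return -1;                      /* vfork failed */
    if (pid == 0) {
        /* Child: touch nothing, just exec or bail out with _exit. */
        execve("/bin/sh", argv, environ);
        _exit(127);                     /* only reached if exec failed */
    }
    /* Parent: resumes once the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit rather than exit or return, because running exit handlers or unwinding the shared stack would corrupt the parent's state.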

Since the invention of vfork, better optimizations have been invented. Most modern systems, including Linux, use a form of copy-on-write, where the pages in the process memory are not copied at the time of the fork call, but later when the parent or child first writes to the page. That is, each page starts out as shared, and remains shared until either process writes to that page; the process that writes gets a new physical page (with the same virtual address). Copy-on-write makes vfork mostly useless, since fork won't make any copy in the cases where vfork would be usable.
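Copy-on-write itself is invisible to the program; what can be observed is its effect, namely that a write in the child never shows up in the parent. The following is a small sketch of that observation. The function name is hypothetical, and it demonstrates fork's separate address spaces rather than proving how the kernel implements them.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child overwrites `value`; the function returns
 * what the parent sees afterwards, or -1 on error. */
int parent_value_after_child_write(void)
{
    volatile int value = 1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;                      /* the write lands in the child's own copy */
        _exit(value == 2 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return value;                       /* the parent's copy was never modified */
}
```

With fork the function returns 1. Replacing fork with vfork here would be exactly the kind of child-side write that vfork forbids.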

Linux does retain vfork. The fork system call must still make a copy of the process's virtual memory table, even if it doesn't copy the actual memory; vfork doesn't even need to do this. The performance improvement is negligible in most applications.

Linux – the difference between “inode size” and “Bytes per inode”

-i bytes-per-inode (aka inode_ratio)

For some unknown reason this parameter is sometimes documented as bytes-per-inode and sometimes as inode_ratio. According to the documentation, this is the bytes/inode ratio. Most people will find it easier to understand when stated as either:

1 inode for every X bytes of storage (where X is bytes-per-inode).
The lowest average file size you can fit.

The formula (taken from the mke2fs source code):

inode_count = (blocks_count * blocksize) / inode_ratio

Or even simplified (assuming "partition size" is roughly equivalent to blocks_count * blocksize, I haven't checked the allocation):

inode_count = (partition_size_in_bytes) / inode_ratio
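To make the arithmetic concrete, here is a small sketch of the formula above. The function name estimate_inode_count is hypothetical, and the real mke2fs applies further rounding and per-group limits, so treat the result as an estimate. The figures in the usage note (a 100 GiB partition, 4096-byte blocks, a ratio of 16384) are illustrative assumptions.

```c
#include <stdint.h>

/* Hypothetical helper implementing
 *   inode_count = (blocks_count * blocksize) / inode_ratio
 * Returns 0 for a zero ratio instead of dividing by zero. */
uint64_t estimate_inode_count(uint64_t blocks_count,
                              uint64_t blocksize,
                              uint64_t inode_ratio)
{
    if (inode_ratio == 0)
        return 0;
    return (blocks_count * blocksize) / inode_ratio;
}
```

For example, estimate_inode_count(26214400, 4096, 16384) gives 6553600: a 100 GiB filesystem with one inode per 16 KiB can hold roughly 6.5 million files.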

Note 1: Even if you provide a fixed number of inodes at FS creation time (mkfs -N ...), the value is converted into a ratio, so you can fit more inodes as you extend the size of the filesystem.

Note 2: If you tune this ratio, make sure to allocate significantly more inodes than you plan to use... you really don't want to reformat your filesystem.

-I inode-size

This is the number of bytes the filesystem will allocate/reserve for each inode the filesystem may have. The space is used to store the attributes of the inode (read Intro to Inodes). In Ext3, the default size was 128. In Ext4, the default size is 256 (to store extra_isize and provide space for inline extended attributes). Read Linux: Why change inode size?

Note: X bytes of disk space are allocated for each allocated inode, whether it is free or used, where X=inode-size.
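Since every inode reserves inode-size bytes whether or not it is used, the two parameters together determine how much disk space goes to inode tables. A rough sketch follows, with a hypothetical function name and ignoring any other filesystem metadata:

```c
#include <stdint.h>

/* Hypothetical helper: bytes reserved for inode tables, assuming each of
 * inode_count inodes takes exactly inode_size bytes. */
uint64_t inode_table_bytes(uint64_t inode_count, uint64_t inode_size)
{
    return inode_count * inode_size;
}
```

With 6553600 inodes, a 256-byte inode size reserves 1677721600 bytes (about 1.56 GiB), while 128 bytes reserves half that. As a fraction of the filesystem this is simply inode-size divided by bytes-per-inode.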
