Linux – How Memory Mapping a File Increases Performance Over Standard I/O

io linux virtual-memory

Operating System Concepts says

Consider a sequential read of a file on disk using the standard system calls open(), read(), and write(). Each file access requires a system call and disk access.

Alternatively, we can use the virtual memory techniques discussed so
far to treat file I/O as routine memory accesses. This approach, known
as
memory mapping a file, allows a part of the virtual address space to be logically associated with the file. As we shall see, this can
lead to significant performance increases. Memory mapping a file is
accomplished by mapping a disk block to a page (or pages) in memory.
Initial access to the file proceeds through ordinary demand paging,
resulting in a page fault. However, a page-sized portion of the file is
read from the file system into a physical page (some systems may opt to
read in more than a page-sized chunk of memory at a time). Subsequent
reads and writes to the file are handled as routine memory accesses.
Manipulating files through memory rather than incurring the overhead of
using the read() and write() system calls simplifies and speeds up file
access and usage.

Could you analyze the performance of memory-mapped files?

If I understand correctly, memory mapping a file works as follows: it takes a system call to create the memory mapping, and then page faults occur when the mapped memory is accessed. Page faults also have overhead.

How does memory mapping a file provide a significant performance increase over the standard I/O system calls?

Thanks.

Best Answer

Memory mapping a file directly avoids copying buffers, which happens with the read() and write() system calls. Calls to read() and write() include a pointer to a buffer in the process's address space where the data is stored, and the kernel has to copy the data to or from those locations. Using mmap() maps the file into the process's address space, so the process can address the file's contents directly and no extra copies are required.

There is also no system call overhead when accessing a memory-mapped file after the initial mmap() call, provided the file's pages are already resident in memory. If a page of the mapped file is not in memory, an access generates a page fault and requires the kernel to load the page. Reading a large block with read() can be faster than mmap() in such cases, if mmap() would generate a significant number of faults while reading the file. (It is possible to advise the kernel in advance with madvise(), so that it may load the pages before they are accessed.)

For more details, there is a related question on Stack Overflow: mmap() vs. reading blocks
