Linux – Behavior of mmap’d memory on memory pressure

large files, linux, memory, mmap

I have a large tar file (60GB) containing image files. I'm using mmap() on this entire file to read in these images, which are accessed randomly.

I'm using mmap() for the following reasons:

  1. Thread safety — I cannot seek an ifstream from multiple threads.
  2. I can avoid extra buffering.
  3. I get some caching (in the form of a requested page already being resident.)

The question is: what happens once I've read every image in that 60GB file? Certainly not all of the images are in use at once; they're read, displayed, and then discarded.

My mmap() call is:

mmap(0, totalSize, PROT_READ, MAP_SHARED | MAP_NORESERVE, fd, 0); 

Specifically: does the kernel recognize that I've mapped read-only pages backed by a file, and simply purge the unused pages under memory pressure? I'm not sure whether this case is recognized. The man pages indicate that MAP_NORESERVE does not reserve swap space for the mapping, but there doesn't seem to be any guarantee about what happens to the pages under memory pressure. Is there any guarantee that the kernel will purge my unneeded pages before it, say, purges the filesystem cache or OOM-kills another process?

Thanks!

Best Answer

A read-only mmap is largely equivalent to open followed by lseek and read. If a chunk of memory mapped into a process is backed by a file, the copy in RAM is considered part of the disk cache and will be freed under memory pressure, just like a disk cache entry created by reading from the file.
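To make the equivalence concrete, here is a minimal sketch that fetches the same byte range twice, once through a read-only shared mapping and once through the file descriptor. The helper names read_via_mmap and read_via_pread are hypothetical, not from the question. I've used pread rather than lseek followed by read, since pread takes the offset as an argument and leaves the shared file position alone, which matters for the multi-threaded access described in the question. Mapping and unmapping on every call is only for illustration; real code would map once and keep the pointer.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes starting at off out of a
 * read-only shared mapping of fd. Returns 0 on success, -1 on error. */
static int read_via_mmap(int fd, off_t off, size_t len, unsigned char *out)
{
    struct stat st;

    assert(out != NULL);
    if (fstat(fd, &st) != 0 || off < 0 || off + (off_t)len > st.st_size)
        return -1;

    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return -1;

    /* Touching the mapping faults pages in from the page cache. */
    memcpy(out, (const unsigned char *)base + off, len);
    munmap(base, (size_t)st.st_size);
    return 0;
}

/* Hypothetical helper: the same range via pread, looping over short reads.
 * Returns 0 on success, -1 on error or premature end of file. */
static int read_via_pread(int fd, off_t off, size_t len, unsigned char *out)
{
    size_t done = 0;

    assert(out != NULL);
    while (done < len) {
        ssize_t n = pread(fd, out + done, len - done, off + (off_t)done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}
```

Both helpers should fill their buffers with identical bytes for any in-range offset. Either way the data comes from the same page cache; the mmap path just skips the extra copy into a user buffer once you read directly from the mapping.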

I haven't checked the source, but I believe MAP_NORESERVE makes no difference for read-only mappings.
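If you want to watch this behavior rather than take it on faith, one option is mincore(), which reports which pages of a mapping are currently resident in RAM. Below is a rough sketch; the helper name resident_pages is my own, and it assumes addr is page-aligned (as a pointer returned by mmap is). Keep in mind the result is only a snapshot and can be stale by the time the call returns.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: count how many pages in [addr, addr + len) are
 * resident in RAM right now. addr must be page-aligned.
 * Returns the page count, or -1 on error. */
static long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0 && len > 0);

    /* mincore fills one status byte per page of the range. */
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);
    if (vec == NULL)
        return -1;

    long count = -1;
    if (mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;   /* low bit set means the page is resident */
    }
    free(vec);
    return count;
}
```

Calling something like resident_pages(base, totalSize) periodically while the program reads images should show the count climbing as pages are touched and dropping again when the kernel reclaims them under memory pressure. For a 60GB mapping with 4KB pages the status vector is about 15MB, so you may prefer to sample a sub-range.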