Linux – What is the purpose of memory overcommitment on Linux?

linux mmap

I know about memory overcommitment, and I profoundly dislike it and usually disable it. I am not thinking of setuid-based system processes (like those running sudo or postfix) but of an ordinary Linux process started from some command line by a user without admin privileges.

With overcommitment, even a well-written program could malloc (or mmap, which malloc often uses internally) more memory than is available and crash later when using it. Without memory overcommitment, that malloc or mmap would fail, and the well-written program would catch and handle that failure. A poorly written program (one calling malloc without checking for failure) would crash when using the result of a failed malloc.
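
As a concrete illustration (my own sketch, not from the question), this is what checking an allocation for failure looks like; depending on the overcommit mode, the large request may succeed anyway, and the crash then only happens when the memory is touched:

    /* Minimal sketch (my example): a "well written" allocation that checks
     * for failure instead of blindly dereferencing the result. Without
     * overcommitment, a huge request fails here with ENOMEM and the program
     * can report it and exit cleanly. */
    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t huge = (size_t)1 << 40;      /* 1 TiB, likely more than RAM + swap */
        void *p = malloc(huge);
        if (p == NULL) {
            fprintf(stderr, "malloc(%zu) failed: %s\n", huge, strerror(errno));
            return EXIT_FAILURE;            /* graceful failure, no crash */
        }
        /* Depending on the overcommit mode, the malloc above may succeed even
         * without enough RAM + swap; the crash (OOM kill) then only happens
         * when the pages are actually touched: */
        memset(p, 0, huge);
        free(p);
        return EXIT_SUCCESS;
    }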

Of course virtual address space (which is extended by mmap, and hence by malloc) is not the same thing as RAM (RAM is a resource managed by the kernel, see this; processes have their virtual address space initialized by execve(2) and extended by mmap & sbrk, so they don't directly consume RAM, only virtual memory).

Notice that RAM usage can be optimized with madvise(2) (which can hint to the kernel, e.g. with MADV_DONTNEED, that a range of pages is no longer needed and may be reclaimed), when really needed. Programs wanting some overcommitment could use mmap(2) with MAP_NORESERVE. My understanding of memory overcommitment is that it behaves as if every memory mapping (by execve or mmap) implicitly used MAP_NORESERVE.
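
A minimal sketch (mine, using standard Linux calls) of a mapping that explicitly opts into overcommitment with MAP_NORESERVE and later hands its pages back with madvise:

    /* Minimal sketch (my example): an explicitly overcommitting mapping.
     * MAP_NORESERVE asks the kernel not to reserve swap space for this
     * region even under strict accounting; madvise(MADV_DONTNEED) later
     * tells the kernel the pages can be reclaimed (anonymous pages read
     * afterwards come back zero-filled). */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1UL << 30;             /* 1 GiB of address space */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return EXIT_FAILURE;
        }
        memset(p, 42, 4096);                /* touch only the first page */
        if (madvise(p, len, MADV_DONTNEED) != 0)   /* give the pages back */
            perror("madvise");
        munmap(p, len);
        return EXIT_SUCCESS;
    }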

My perception of it is that it is simply useful for very buggy programs. But IMHO a real developer should always check for failure of malloc, mmap and related virtual-address-space-changing functions (e.g. like here). And most free software programs whose source code I have studied have such a check, perhaps as some xmalloc function…
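
Such a wrapper usually looks roughly like this sketch (mine, not taken from any particular project):

    /* Minimal sketch (my example) of the usual xmalloc idiom: a thin wrapper
     * that checks the allocation and aborts with a clear message instead of
     * letting a NULL pointer propagate through the program. */
    #include <stdio.h>
    #include <stdlib.h>

    static void *xmalloc(size_t size)
    {
        void *p = malloc(size);
        if (p == NULL) {
            fprintf(stderr, "fatal: out of memory (requested %zu bytes)\n", size);
            exit(EXIT_FAILURE);
        }
        return p;
    }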

Are there real-life programs, e.g. packaged in typical Linux distributions, which actually need and use memory overcommitment in a sane and useful way? I know of none!

What are the disadvantages of disabling memory overcommitment? Many older Unixes (e.g. SunOS 4 and SunOS 5 from the previous century) did not have it, and IMHO their malloc (and perhaps even overall system performance, malloc-wise) was not much worse (and the improvements since then are unrelated to memory overcommitment).

I believe that memory overcommitment is a misfeature for lazy programmers.

The user of that program could set up some resource limit via setrlimit(2) called with RLIMIT_AS by the parent process (e.g. the ulimit builtin of /bin/bash, the limit builtin of zsh, or any modern equivalent for e.g. at, crontab, batch, …), or by a grand-parent process (up to, possibly, /sbin/init of pid 1 or its modern systemd variant).
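
A minimal sketch (mine; the command name is a placeholder) of a parent process imposing such a limit before exec'ing the program, roughly what ulimit -v does in a shell:

    /* Minimal sketch (my example): a parent process capping the virtual
     * address space of whatever it runs next. With this limit in place,
     * oversized malloc/mmap calls fail with ENOMEM even when the system
     * overcommits. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>
    #include <unistd.h>

    int main(void)
    {
        struct rlimit rl = {
            .rlim_cur = 512UL * 1024 * 1024,   /* soft limit: 512 MiB of address space */
            .rlim_max = 512UL * 1024 * 1024,   /* hard limit */
        };
        if (setrlimit(RLIMIT_AS, &rl) != 0) {
            perror("setrlimit");
            return EXIT_FAILURE;
        }
        execlp("some-program", "some-program", (char *)NULL);  /* hypothetical command */
        perror("execlp");                       /* only reached if exec fails */
        return EXIT_FAILURE;
    }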

Best Answer

The reason for overcommitting is to avoid underutilization of physical RAM. There is a difference between how much virtual memory a process has allocated and how much of that virtual memory has actually been mapped to physical page frames. In fact, right after a process is started, it uses very little RAM. This is due to demand paging: the process has a virtual memory layout, but the mapping from a virtual address to a physical page frame isn't established until the memory is read or written.

A program typically never uses its whole virtual address space, and the set of memory areas it touches varies during the run. For example, mappings to page frames containing initialization code that is executed only at the start of the run can be discarded, and those page frames can be reused for other mappings.

The same applies to data: when a program calls malloc, it reserves a sufficiently large contiguous virtual address space for storing data. However, mappings to physical page frames are not established until the pages are actually used, if ever. Or consider the program stack: every process gets a fairly big contiguous virtual memory area set aside for the stack (typically 8 MB). A process typically uses only a fraction of this stack space; small and well-behaving programs use even less.
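
To make this concrete, here is a small sketch (my illustration, not part of the original answer) that prints VmSize and VmRSS from /proc/self/status around a large allocation; the virtual size jumps immediately, while the resident size only grows once the pages are touched:

    /* Minimal sketch (my example): demonstrate demand paging by comparing
     * VmSize (virtual) with VmRSS (resident) before and after touching a
     * large malloc'ed block. Exact numbers depend on the system. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void print_mem(const char *label)
    {
        char line[256];
        FILE *f = fopen("/proc/self/status", "r");
        if (f == NULL)
            return;
        printf("%s\n", label);
        while (fgets(line, sizeof line, f) != NULL)
            if (strncmp(line, "VmSize:", 7) == 0 || strncmp(line, "VmRSS:", 6) == 0)
                fputs(line, stdout);
        fclose(f);
    }

    int main(void)
    {
        size_t len = 256UL * 1024 * 1024;       /* 256 MiB */
        print_mem("before malloc:");
        char *p = malloc(len);
        if (p == NULL)
            return EXIT_FAILURE;
        print_mem("after malloc (nothing touched yet):");
        memset(p, 1, len);                      /* now page frames get mapped */
        print_mem("after touching every page:");
        free(p);
        return EXIT_SUCCESS;
    }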

A Linux computer typically has a lot of heterogeneous processes running in different stages of their lifetimes. Statistically, at any point in time, they do not collectively need a mapping for every virtual page they have been assigned (or will be assigned later in the program run).

A strictly non-overcommitting scheme has to reserve backing store (RAM or swap) for every virtual page at the moment it is allocated, even for pages that are never touched. The result is a system that can run far fewer programs concurrently, because a lot of that capacity would be reserved for nothing.
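
One way to observe that gap in practice is to compare the committed address space with the strict-mode limit; a small sketch (mine, reading standard /proc files) follows. On a typical overcommitting desktop, Committed_AS often exceeds MemTotal.

    /* Minimal sketch (my example): show how much memory processes have
     * committed versus what strict (mode 2) accounting would allow. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[256];
        FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
        if (f != NULL) {
            if (fgets(line, sizeof line, f) != NULL)
                printf("overcommit_memory mode: %s", line);   /* 0, 1 or 2 */
            fclose(f);
        }
        f = fopen("/proc/meminfo", "r");
        if (f == NULL)
            return 1;
        while (fgets(line, sizeof line, f) != NULL)
            if (strncmp(line, "MemTotal:", 9) == 0 ||
                strncmp(line, "CommitLimit:", 12) == 0 ||
                strncmp(line, "Committed_AS:", 13) == 0)
                fputs(line, stdout);
        fclose(f);
        return 0;
    }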

I don't deny that overcommitting memory has its dangers, and can lead to out-of-memory situations that are messy to deal with. It's all about finding the right compromise.
