I've seen several posts around the web from people complaining that a hosted VPS unexpectedly killed their processes because they used too much RAM.
How is this possible? I thought all modern OSes provide "infinite RAM" by just using disk swap for whatever goes over the physical RAM. Is this correct?
What might be happening if a process is "killed due to low RAM"?
Best Answer
It's sometimes said that Linux by default never denies requests for more memory from application code -- e.g. malloc().1 This is not in fact true; the default uses a heuristic whereby "obvious overcommits of address space are refused" while "a seriously wild allocation fails" -- see [linux_src]/Documentation/vm/overcommit-accounting (all quotes are from the 3.11 tree). Exactly what counts as a "seriously wild allocation" isn't made explicit, so we would have to go through the source to determine the details. We could also use the experimental method in footnote 2 (below) to try to get some reflection of the heuristic -- based on that, my initial empirical observation is that under ideal circumstances (== the system is idle), if you don't have any swap you'll be allowed to allocate about half your RAM, and if you do have swap you'll get about half your RAM plus all of your swap. That is more or less per process (but note this limit is dynamic and subject to change because of state; see some observations in footnote 5).

Half your RAM plus swap is explicitly the default for the "CommitLimit" field in /proc/meminfo. Here's what it means -- and note it actually has nothing to do with the limit just discussed. According to [src]/Documentation/filesystems/proc.txt, CommitLimit is the total amount of memory currently available to be allocated on the system, calculated as (vm.overcommit_ratio percent of physical RAM) + swap, and it is only adhered to if strict overcommit accounting is enabled (mode 2 in vm.overcommit_memory).
The previously quoted overcommit-accounting doc states that the default vm.overcommit_ratio is 50. So if you sysctl vm.overcommit_memory=2, you can then adjust vm.overcommit_ratio (with sysctl) and see the consequences.3
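If you want to watch those numbers while you experiment, everything involved is exposed under /proc. Here is a minimal sketch (the name commitinfo.c is just a label; the paths are the standard procfs ones) that prints the current overcommit mode and ratio along with the CommitLimit and Committed_AS fields:

    /* commitinfo.c -- print the current overcommit settings plus the two
     * commit accounting fields from /proc/meminfo.  Error handling is minimal. */
    #include <stdio.h>
    #include <string.h>

    static void dump (const char *path) {
        char line[256];
        FILE *f = fopen(path, "r");
        if (!f) { perror(path); return; }
        while (fgets(line, sizeof line, f))
            printf("%s: %s", path, line);
        fclose(f);
    }

    int main (void) {
        dump("/proc/sys/vm/overcommit_memory");
        dump("/proc/sys/vm/overcommit_ratio");

        /* Pick the commit accounting lines out of /proc/meminfo. */
        char line[256];
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f) { perror("/proc/meminfo"); return 1; }
        while (fgets(line, sizeof line, f))
            if (!strncmp(line, "CommitLimit", 11) || !strncmp(line, "Committed_AS", 12))
                fputs(line, stdout);
        fclose(f);
        return 0;
    }

Run it before and after changing vm.overcommit_ratio under mode 2 and CommitLimit should move accordingly.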
The default mode, when CommitLimit is not enforced and only "obvious overcommits of address space are refused", is vm.overcommit_memory=0.

While the default strategy does have a heuristic per-process limit preventing the "seriously wild allocation", it does leave the system as a whole free to get seriously wild, allocation-wise.4 This means at some point it can run out of memory and have to declare bankruptcy to some process(es) via the OOM killer.
What does the OOM killer kill? Not necessarily the process that asked for memory when there was none, since that's not necessarily the truly guilty process, and more importantly, not necessarily the one that will most quickly get the system out of the problem it is in.
This is cited from here, which probably cites a 2.6.x source. Its rationale is that a good choice of victim means: 1) we lose the minimum amount of work done; 2) we recover a large amount of memory; 3) we don't kill anything innocent of eating tons of memory; 4) we want to kill the minimum amount of processes (one); and 5) we try to kill the process the user expects us to kill (the algorithm "has been meticulously tuned to meet the principle of least surprise").
Which seems like a decent rationale. However, without getting forensic, #5 (which is redundant of #1) seems like a tough sell implementation wise, and #3 is redundant of #2. So it might make sense to consider this pared down to #2/3 and #4.
I grepped through a recent source (3.11) and noticed that this comment has changed in the interim. It is now a little more explicit about #2: "The goal is to [kill] the task consuming the most memory to avoid subsequent oom failures," and, by implication, about #4 ("we want to kill the minimum amount of processes (one)").
If you want to see the OOM killer in action, see footnote 5.
1 A delusion Gilles thankfully rid me of, see comments.
2 Here's a straightforward bit of C which asks for increasingly large chunks of memory to determine when a request for more will fail:
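(What follows is a sketch rather than a canonical listing; the behaviour -- keep growing the request, free it each time, never touch the memory, report in kB -- is what the figures below refer to.)

    /* virtlimitcheck.c -- keep asking malloc() for a bigger and bigger block
     * until a request is refused, reporting the running total in kB.
     * The memory is never written to, so nothing is actually used. */
    #include <stdio.h>
    #include <stdlib.h>

    #define MB (1024 * 1024)

    int main (void) {
        size_t bytes = MB;
        void *p = malloc(bytes);
        while (p) {                     /* stop at the first refused request */
            fprintf(stderr, "%zu kB allocated.\n", bytes / 1024);
            free(p);                    /* give it back, then ask for more */
            bytes += MB;
            p = malloc(bytes);
        }
        fprintf(stderr, "Failed at %zu kB.\n", bytes / 1024);
        return 0;
    }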
Even if you don't know C, you can compile this with gcc virtlimitcheck.c -o virtlimitcheck, then run ./virtlimitcheck. It is completely harmless, as the process doesn't use any of the space it asks for -- i.e., it never really uses any RAM.

On a 3.11 x86_64 system with 4 GB of RAM and 6 GB of swap, I failed at ~7400000 kB; the number fluctuates, so perhaps state is a factor. This is coincidentally close to the CommitLimit in /proc/meminfo, but modifying this via vm.overcommit_ratio does not make any difference. On a 3.6.11 32-bit ARM system with 448 MB of RAM and 64 MB of swap, however, I fail at ~230 MB. This is interesting because in the first case the amount is almost double the amount of RAM, whereas in the second it is about 1/4 of it -- strongly implying the amount of swap is a factor. This was confirmed by turning swap off on the first system, when the failure threshold went down to ~1.95 GB, a very similar ratio to the little ARM box.

But is this really per process? It appears to be. The short program below asks for a user-defined chunk of memory and, if it succeeds, waits for you to hit return -- this way you can try multiple simultaneous instances:
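(Again a sketch, assuming the amount is passed in kB on the command line; the name memhold.c is just a label.)

    /* memhold.c -- ask for the number of kB given on the command line and,
     * if the request succeeds, hold it (untouched) until return is pressed. */
    #include <stdio.h>
    #include <stdlib.h>

    int main (int argc, char *argv[]) {
        if (argc < 2) {
            fprintf(stderr, "Usage: %s <kB to request>\n", argv[0]);
            return 1;
        }
        size_t kb = strtoul(argv[1], NULL, 10);
        void *p = malloc(kb * 1024);
        if (!p) {
            fprintf(stderr, "Could not allocate %zu kB.\n", kb);
            return 1;
        }
        fprintf(stderr, "Allocated %zu kB, press return to exit...\n", kb);
        getchar();                      /* keep the reservation until return */
        free(p);
        return 0;
    }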
Beware, however, that it is not strictly about the amount of RAM and swap regardless of use -- see footnote 5 for observations about the effects of system state.
3 CommitLimit refers to the amount of address space allowed for the system when vm.overcommit_memory = 2. Presumably, then, the amount you can allocate should be that minus what's already committed, which is apparently the Committed_AS field.

A potentially interesting experiment demonstrating this is to add #include <unistd.h> to the top of virtlimitcheck.c (see footnote 2) and a fork() right before the while() loop; a sketch of that change follows at the end of this footnote. It is not guaranteed to work as described here without some tedious synchronization, but there is a decent chance it will, YMMV. Looking at the output in detail (redirecting stderr to a file such as tmp.txt catches both processes; it's also easier if you throw the pid into each line), you can see the two processes alternate their bigger and bigger allocations until one, evidently, has claimed enough that the other one fails. The winner is then free to grab everything up to CommitLimit minus Committed_AS.
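Here is a sketch of that modification, based on the virtlimitcheck.c sketch in footnote 2; the pid-tagged output is just to make the interleaving readable:

    /* virtlimitcheck2.c -- virtlimitcheck with a fork() added so that two
     * processes race each other toward the commit limit. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define MB (1024 * 1024)

    int main (void) {
        size_t bytes = MB;
        void *p = malloc(bytes);
        fork();                          /* from here on there are two of us */
        while (p) {
            fprintf(stderr, "%d: %zu kB allocated.\n", getpid(), bytes / 1024);
            free(p);
            bytes += MB;
            p = malloc(bytes);
        }
        fprintf(stderr, "%d: failed at %zu kB.\n", getpid(), bytes / 1024);
        return 0;
    }

Both processes draw on the same system-wide allowance, which is why one of them can eventually lock the other out.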
4 It's worth mentioning at this point, if you do not already understand virtual addressing and demand paging, that what makes overcommitment possible in the first place is that what the kernel allocates to userland processes isn't physical memory at all -- it's virtual address space. For example, if a process reserves 10 MB for something, that's laid out as a sequence of (virtual) addresses, but those addresses do not yet correspond to physical memory. When such an address is accessed, this results in a page fault, and then the kernel attempts to map it onto real memory so that it can store a real value. Processes usually reserve much more virtual space than they actually access, which allows the kernel to make the most efficient use of RAM. However, physical memory is still a finite resource, and when all of it has been mapped to virtual address space, some virtual address space has to be eliminated to free up some RAM.
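You can see the distinction for yourself by reserving a large region, touching only a little of it, and comparing the VmSize (reserved address space) and VmRSS (resident RAM) lines of /proc/self/status. A minimal sketch, assuming nothing beyond standard procfs:

    /* vsz_vs_rss.c -- reserve 1 GB of address space, write to only the first
     * 10 MB of it, and show how much address space vs. actual RAM the
     * process is charged for at each step. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void show (void) {
        char line[256];
        FILE *f = fopen("/proc/self/status", "r");
        if (!f) { perror("/proc/self/status"); return; }
        while (fgets(line, sizeof line, f))
            if (!strncmp(line, "VmSize", 6) || !strncmp(line, "VmRSS", 5))
                fputs(line, stdout);
        fclose(f);
    }

    int main (void) {
        printf("Before reserving anything:\n");
        show();

        char *p = malloc(1024UL * 1024 * 1024);   /* 1 GB of virtual space */
        if (!p) { fprintf(stderr, "malloc failed\n"); return 1; }
        printf("\nAfter malloc(1 GB), before touching it:\n");
        show();                          /* VmSize jumps, VmRSS barely moves */

        memset(p, 1, 10 * 1024 * 1024);  /* fault in only the first 10 MB */
        printf("\nAfter writing to the first 10 MB:\n");
        show();                          /* now VmRSS grows by roughly 10 MB */

        free(p);
        return 0;
    }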
5 First a warning: if you try this with vm.overcommit_memory=0, make sure you save your work first and close any critical applications, because the system will be frozen for ~90 seconds and some process will die!

The idea is to run a fork bomb that times out after 90 seconds, with the forks allocating space and some of them writing large amounts of data to RAM, all the while reporting to stderr.
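A sketch of that kind of program: every new process asks for 1 GB and reports how long the chain of forks leading to it is; which branches actually write to their memory (here, those whose chain length is a multiple of eight) is an arbitrary choice:

    /* forkbomb.c -- keep forking; every new process reports how long its
     * chain of ancestors is ("says N forks"), asks for 1 GB, and some of
     * them really write to it.  Everything dies of SIGALRM after 90 seconds.
     * Do NOT run this anywhere you care about. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>

    #define CHUNK (1024UL * 1024 * 1024)   /* 1 GB per process */
    #define TIMEOUT 90                     /* seconds until everything exits */

    int main (void) {
        unsigned chain = 0;                /* how many forks led to this process */
        alarm(TIMEOUT);
        while (1) {
            pid_t pid = fork();
            if (pid == -1) {
                fprintf(stderr, "%d fork failed\n", getpid());
                sleep(1);
            } else if (pid == 0) {         /* child: one more link in the chain */
                chain++;
                alarm(TIMEOUT);            /* alarms are not inherited across fork() */
                fprintf(stderr, "%d says %u forks\n", getpid(), chain);
                char *p = malloc(CHUNK);
                if (!p)
                    fprintf(stderr, "%d failed to allocate %lu kB\n",
                            getpid(), CHUNK / 1024);
                else if (chain % 8 == 0) { /* these branches really use the RAM */
                    memset(p, 1, CHUNK);
                    fprintf(stderr, "%d used %lu kB\n", getpid(), CHUNK / 1024);
                }
            }
            /* both parent and child fall through and keep forking */
        }
    }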
Compile this with gcc forkbomb.c -o forkbomb. First, try it with sysctl vm.overcommit_memory=2 -- you'll see the forks quickly start failing to get their 1 GB; in this environment, this kind of fork bomb doesn't get very far. Note that the number in "says N forks" is not the total number of processes; it is the number of processes in the chain/branch leading up to that one.
Now try it with vm.overcommit_memory=0. If you redirect stderr to a file (say, tmp.txt), you can do some crude analysis afterward, e.g. by counting the various kinds of report lines. Only 15 processes failed to allocate 1 GB -- demonstrating that the heuristic for overcommit_memory=0 is affected by state. How many processes were there? Looking at the end of tmp.txt, probably > 100,000. Now, how many actually got to use the 1 GB?

Eight -- which again makes sense, since at the time I had ~3 GB of RAM free and 6 GB of swap.
Have a look at your system logs after you do this. You should see the OOM killer reporting scores (amongst other things); presumably this relates to oom_badness.
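The scoring is also exposed while the system is running: each process's current score is readable from /proc/<pid>/oom_score, and the adjustable bias from /proc/<pid>/oom_score_adj. A minimal sketch that prints both for itself:

    /* oomscore.c -- print this process's current OOM score and adjustment. */
    #include <stdio.h>

    static void show (const char *path) {
        char line[64];
        FILE *f = fopen(path, "r");
        if (!f) { perror(path); return; }
        if (fgets(line, sizeof line, f))
            printf("%s: %s", path, line);
        fclose(f);
    }

    int main (void) {
        show("/proc/self/oom_score");
        show("/proc/self/oom_score_adj");
        return 0;
    }

Higher scores mean more likely victims; oom_score_adj runs from -1000 (never kill this process) to 1000.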