My server runs out of memory even though there is swap available.
Why?
I can reproduce it this way:
eat_20GB_RAM() {
perl -e '$a="c"x10000000000;print "OK\n";sleep 10000';
}
export -f eat_20GB_RAM
parallel -j0 eat_20GB_RAM ::: {1..25} &
When that stabilizes (i.e. all processes reach sleep) I run a few more:
parallel --delay 5 -j0 eat_20GB_RAM ::: {1..25} &
When that stabilizes (i.e. all processes reach sleep) around 800 GB RAM/swap is used:
$ free -m
            total        used        free      shared  buff/cache   available
Mem:       515966      440676       74514           1         775       73392
Swap:     1256720      341124      915596
When I run a few more:
parallel --delay 15 -j0 eat_20GB_RAM ::: {1..50} &
I start to get:
Out of memory!
even though there is clearly swap available.
$ free
            total        used        free      shared  buff/cache   available
Mem:    528349276   518336524     7675784       14128     2336968     7316984
Swap:  1286882284  1017746244   269136040
Why?
$ cat /proc/meminfo
MemTotal: 528349276 kB
MemFree: 7647352 kB
MemAvailable: 7281164 kB
Buffers: 70616 kB
Cached: 1503044 kB
SwapCached: 10404 kB
Active: 476833404 kB
Inactive: 20837620 kB
Active(anon): 476445828 kB
Inactive(anon): 19673864 kB
Active(file): 387576 kB
Inactive(file): 1163756 kB
Unevictable: 18776 kB
Mlocked: 18776 kB
SwapTotal: 1286882284 kB
SwapFree: 269134804 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 496106244 kB
Mapped: 190524 kB
Shmem: 14128 kB
KReclaimable: 753204 kB
Slab: 15772584 kB
SReclaimable: 753204 kB
SUnreclaim: 15019380 kB
KernelStack: 46640 kB
PageTables: 3081488 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1551056920 kB
Committed_AS: 1549560424 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 1682132 kB
VmallocChunk: 0 kB
Percpu: 202752 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 12251620 kB
DirectMap2M: 522496000 kB
DirectMap1G: 3145728 kB
Best Answer
In /proc/meminfo you find:
CommitLimit: 1551056920 kB
Committed_AS: 1549560424 kB
So you are at the commit limit.
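If you have disabled overcommitting of memory (to avoid the OOM killer) by:
echo 2 > /proc/sys/vm/overcommit_memory
Then the commit limit is computed as swap plus a configurable percentage of physical RAM (overcommit_ratio, default 50).
(From: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting)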
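As a rough check of that rule against the numbers in the question, here is a minimal sketch. commit_limit_kb is a hypothetical helper name, not a kernel or procps tool, and it ignores huge pages, which the kernel subtracts from RAM first (Hugetlb is 0 kB in the output above):

```shell
# Hypothetical helper: approximate CommitLimit in kB when overcommit_memory=2.
# Args: SwapTotal (kB), MemTotal (kB), overcommit_ratio (percent).
commit_limit_kb() {
  local swap_kb=$1 mem_kb=$2 ratio=$3
  echo $(( swap_kb + mem_kb * ratio / 100 ))
}

# SwapTotal and MemTotal from the question's /proc/meminfo, default ratio 50.
commit_limit_kb 1286882284 528349276 50
# The same machine with overcommit_ratio=100: swap plus all of RAM.
commit_limit_kb 1286882284 528349276 100
```

The first call prints 1551056922, within a couple of kB of the reported CommitLimit of 1551056920 kB, which suggests the default ratio of 50 is in effect on this machine. The second prints 1815231560, i.e. about 264 GB more address space that could be committed.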
You can use the full memory by:
echo 100 > /proc/sys/vm/overcommit_ratio
Then you will get out-of-memory when physical RAM and swap are all reserved.
The name overcommit_ratio is in this case a bit misleading: you are not overcommitting anything.
Even with this setup you may see out-of-memory before swap is exhausted. malloc.c:
Compile as:
Run as (reserve 1 GB for 10 seconds):
If you run this you may see OOM even though there is swap free:
So while free will often do The Right Thing in practice, comparing CommitLimit and Committed_AS seems to be more bulletproof.
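One way to make that comparison is to print how much address space can still be committed. This is only a sketch: commit_headroom_kb is a hypothetical helper name, and it assumes the two field names appear exactly as in the /proc/meminfo output above:

```shell
# Hypothetical helper: print CommitLimit - Committed_AS in kB,
# reading /proc/meminfo-formatted text from stdin.
commit_headroom_kb() {
  awk '$1 == "CommitLimit:"  { limit = $2 }
       $1 == "Committed_AS:" { used = $2 }
       END { printf "%d\n", limit - used }'
}

# On a live system: commit_headroom_kb < /proc/meminfo
# Here it is fed the two values from the question instead:
printf '%s\n' 'CommitLimit: 1551056920 kB' 'Committed_AS: 1549560424 kB' | commit_headroom_kb
```

For the question's values this prints 1496496, so only about 1.4 GB could still be committed. That is far less than one more eat_20GB_RAM process needs, even though free reports roughly 269 GB of swap as unused.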