Docker “cannot allocate memory” – virtual memory tuning

Tags: docker, kernel, memory, out-of-memory, rhel

We are building or running Docker containers on our Jenkins instances, which run on CentOS 7 on AWS EC2.
We have two t2.medium instances with 2 CPUs and 3.5 GB of available memory.
In one case we build the containers; in the other we only pull and run them (a different container).

We started to get errors

open /var/lib/docker/overlay/<sha>-init/merged/dev/console: cannot allocate memory

and in journalctl we get

page allocation failure: order:4
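
As far as I understand, order:4 means the allocator needs 2^4 = 16 physically contiguous pages (64 kB), so this looks like memory fragmentation rather than a plain shortage. A quick way to see how fragmented free memory is at that moment (purely a diagnostic, not part of the original setup):

# Each column is the number of free blocks of a given order (0-10) per zone;
# no free blocks at order 4 or above means an order:4 request cannot be satisfied
cat /proc/buddyinfo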

Dropping the page cache resolves the issue for a while:

echo 1 > /proc/sys/vm/drop_caches
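
The kernel documentation for drop_caches also recommends running sync first, since dirty pages themselves are not freeable; the fuller form of the workaround would be:

# Flush dirty pages to disk first so more of the page cache can actually be dropped
sync
echo 1 > /proc/sys/vm/drop_caches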

What I noticed is that while the Docker task runs, the Dirty counter spikes (as it should) and Mapped jumps up after it. However, the DirectMap4k value is relatively close to the level those counters reach.

For example:
Idle machine

cat /proc/meminfo | grep -P "(Dirty|Mapped|DirectMap4k)"
Dirty:               104 kB
Mapped:            45696 kB
DirectMap4k:      100352 kB

Active machine

cat /proc/meminfo | grep -P "(Dirty|Mapped|DirectMap4k)"
Dirty:             72428 kB
Mapped:            70192 kB
DirectMap4k:      100352 kB
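
To watch these counters while a build is running, a simple poll does the job (assuming watch from procps is available):

# Refresh the three counters every 2 seconds during the Docker build
watch -n 2 'grep -P "(Dirty|Mapped|DirectMap4k)" /proc/meminfo'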

This machine takes some time before it starts failing, whereas an otherwise identical machine reports DirectMap4k: 77824 kB and fails regularly (it also has to build a more complex container), even though its sysctl vm settings are identical.

The underlying problem is that building/starting the Docker container throws an out-of-memory error, and the question is what kernel settings need to be tuned to make this stable.


Docker version

Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:20:36 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:21:56 2017
 OS/Arch:      linux/amd64
 Experimental: false

Kernel 3.10.0-327.10.1.el7.x86_64

sysctl vm

vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 30
vm.dirty_writeback_centisecs = 500
vm.drop_caches = 1
vm.extfrag_threshold = 500
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256   256     32
vm.max_map_count = 65530
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 67584
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 4096
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.stat_interval = 1
vm.swappiness = 30
vm.user_reserve_kbytes = 108990
vm.vfs_cache_pressure = 100
vm.zone_reclaim_mode = 0

Best Answer

TL;DR

sudo su
sysctl -w vm.swappiness=10
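
sysctl -w only changes the running kernel, so to keep the value across reboots drop it into /etc/sysctl.d as well (the file name below is just a convention, pick any):

# Persist the setting and load it immediately
echo "vm.swappiness = 10" > /etc/sysctl.d/99-swappiness.conf
sysctl -p /etc/sysctl.d/99-swappiness.conf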

Explanation

I've created a test scenario where I can reproduce this error 10/10 times: simply building a larger image directly from the command line rather than through CI.

As mentioned above, the only workaround I knew of was

echo 1 > /proc/sys/vm/drop_caches

So I tried to correlate it with the DirectMap values. Since I learned that those values reflect TLB usage and cannot be tuned directly, I looked up which tunable controls the kernel's preference for using that memory, and that is swappiness.

The RHEL 7 docs explain swappiness:

swappiness

The swappiness value, ranging from 0 to 100, controls the degree to which the system favors anonymous memory or the page cache. A high value improves file-system performance while aggressively swapping less active processes out of RAM. A low value avoids swapping processes out of memory, which usually decreases latency at the cost of I/O performance. The default value is 60.

WARNING
Setting swappiness==0 very aggressively avoids swapping out, which increases the risk of OOM killing under strong memory and I/O pressure.

So reducing it lowers the reliance on page cache. The EC2 CentOS 7 images we use set it to 30 by default; reducing it to 10 made the large image build successfully 10/10 times.
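
To double-check what a given image ships with, and to confirm the change took effect, read the live value back:

# Current value as seen by the running kernel
sysctl vm.swappiness
cat /proc/sys/vm/swappiness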
