Docker “cannot allocate memory” – virtual memory tuning

Tags: docker, kernel, memory, out-of-memory, rhel

We are building or running Docker containers on our Jenkins instances, which run on CentOS 7 on AWS EC2.
We have two t2.medium instances with 2 CPUs and 3.5 GB of available memory.
In one case we build the containers; in the other we only pull and run them (a different container).

We started to get errors

open /var/lib/docker/overlay/<sha>-init/merged/dev/console: cannot allocate memory

and in journalctl we get

page allocation failure: order:4
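
As far as I understand, order:4 means the allocator needs 2^4 = 16 physically contiguous pages (64 kB), so this looks like memory fragmentation rather than a plain shortage. A quick way to see how fragmented free memory is at that moment (purely a diagnostic, not part of the original setup):

# Each column is the number of free blocks of a given order (0-10) per zone;
# no free blocks at order 4 or above means an order:4 request cannot be satisfied
cat /proc/buddyinfo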

Dropping the page cache resolves the issue for a while:

echo 1 > /proc/sys/vm/drop_caches
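
The kernel documentation for drop_caches also recommends running sync first, since dirty pages themselves are not freeable; the fuller form of the workaround would be:

# Flush dirty pages to disk first so more of the page cache can actually be dropped
sync
echo 1 > /proc/sys/vm/drop_caches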

What I noticed is that while the Docker task runs, the Dirty counter spikes (as it should) and Mapped jumps up after it. However, the DirectMap4k value is relatively close to the level those counters reach.

For example:
Idle machine

cat /proc/meminfo | grep -P "(Dirty|Mapped|DirectMap4k)"
Dirty:               104 kB
Mapped:            45696 kB
DirectMap4k:      100352 kB

Active machine

cat /proc/meminfo | grep -P "(Dirty|Mapped|DirectMap4k)"
Dirty:             72428 kB
Mapped:            70192 kB
DirectMap4k:      100352 kB
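
To watch these counters while a build is running, a simple poll does the job (assuming watch from procps is available):

# Refresh the three counters every 2 seconds during the Docker build
watch -n 2 'grep -P "(Dirty|Mapped|DirectMap4k)" /proc/meminfo'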

This machine takes some time before it starts failing, whereas an otherwise identical machine reports DirectMap4k: 77824 kB and fails regularly (it also has to build a more complex container), even though its sysctl vm settings are identical.

The underlying problem is that building/starting the Docker container throws an out-of-memory error, and the question is what kernel settings need to be tuned to make this stable.


Docker version

Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:20:36 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:21:56 2017
 OS/Arch:      linux/amd64
 Experimental: false

Kernel 3.10.0-327.10.1.el7.x86_64

sysctl vm

vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 30
vm.dirty_writeback_centisecs = 500
vm.drop_caches = 1
vm.extfrag_threshold = 500
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256   256     32
vm.max_map_count = 65530
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 67584
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 4096
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.stat_interval = 1
vm.swappiness = 30
vm.user_reserve_kbytes = 108990
vm.vfs_cache_pressure = 100
vm.zone_reclaim_mode = 0

Best Answer

TL;DR

sudo su
sysctl -w vm.swappiness=10
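
sysctl -w only changes the running kernel, so to keep the value across reboots drop it into /etc/sysctl.d as well (the file name below is just a convention, pick any):

# Persist the setting and load it immediately
echo "vm.swappiness = 10" > /etc/sysctl.d/99-swappiness.conf
sysctl -p /etc/sysctl.d/99-swappiness.conf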

Explanation

I've created a test scenario where I can reproduce this error 10/10 times: simply building a larger image directly from the command line rather than through CI.

As mentioned above, the only workaround I knew of was

echo 1 > /proc/sys/vm/drop_caches

So I tried to correlate it with the DirectMap values. Since I learned that those values reflect TLB usage and cannot be tuned directly, I looked up which tunable controls the kernel's preference for using that memory, and that is swappiness.

The RHEL 7 docs explain swappiness:

swappiness

The swappiness value, ranging from 0 to 100, controls the degree to which the system favors anonymous memory or the page cache. A high value improves file-system performance while aggressively swapping less active processes out of RAM. A low value avoids swapping processes out of memory, which usually decreases latency at the cost of I/O performance. The default value is 60.

WARNING
Setting swappiness==0 very aggressively avoids swapping out, which increases the risk of OOM killing under strong memory and I/O pressure.

So reducing it lowers the reliance on page cache. The EC2 CentOS 7 images we use set it to 30 by default; reducing it to 10 made the large image build successfully 10/10 times.
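
To double-check what a given image ships with, and to confirm the change took effect, read the live value back:

# Current value as seen by the running kernel
sysctl vm.swappiness
cat /proc/sys/vm/swappiness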
