We have a Linux server running Debian with kernel 4.0.5 (package 4.0.0-2), 32G of RAM installed and 16G of swap configured. The system uses lxc containers for compartmentalisation, but that shouldn't matter here: the issue exists both inside and outside the various containers.
Here's a typical `free -h`:

```
              total        used        free      shared  buff/cache   available
Mem:            28G        2.1G         25G         15M        936M         26G
Swap:           15G        1.4G         14G
```

and `/proc/meminfo` has:

```
Committed_AS:   12951172 kB
```
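For reference, the overcommit numbers can be pulled straight from `/proc/meminfo`; this is a generic sanity check, nothing specific to our setup:

```shell
# Quick overcommit sanity check: how much address space is committed,
# and how does it compare to the kernel's commit limit?
grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo

# Same, condensed to a single percentage (both values are in kB):
awk '/^CommitLimit:/  {limit = $2}
     /^Committed_AS:/ {committed = $2}
     END { printf "%d kB committed of %d kB limit (%.0f%%)\n",
           committed, limit, committed * 100 / limit }' /proc/meminfo
```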
So there's plenty of free memory; even if everything committed were actually used at once, it would fit in RAM. Nevertheless, the system is constantly paging out even running processes.
This is most notable with Gitlab, a Rails application running under Unicorn: newly forked Unicorn workers are swapped out immediately. When a request comes in, a worker has to be read back from disk at ~1400 kB/s (figures from `iotop`) and runs into the timeout (currently 30s, so workers get restarted in time; no normal request should take more than 5s) before it is fully loaded back into memory, and so gets killed right away. Note that this is just an example; I have seen the same happen to redis, amavis, postgres, mysql, java (openjdk) and others.
The system is otherwise in a low-load situation with about 5% CPU utilization and a loadavg around 2 (on 8 cores).
What we tried (in no particular order):

- `swapoff -a`: fails with about 800M still swapped
- Reducing swappiness (in steps) using `sysctl vm.swappiness=NN`. This seems to have no impact at all; we went down to 0 and the behaviour is still exactly the same
- Stopping non-essential services (Gitlab, a Jetty-based webapp…), freeing ca. 8G of committed-but-not-mapped memory and bringing Committed_AS down to about 5G. No change at all
- Clearing system caches using `sync && echo 3 > /proc/sys/vm/drop_caches`. This frees up memory, but does nothing about the swap situation
- Combinations of the above
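A diagnostic that can help narrow this kind of problem down (not something from the original post): summing per-process swap usage from `/proc` to see which processes actually own the swapped pages:

```shell
# Top swap consumers: VmSwap (in kB) per process, from /proc/<pid>/status.
# Processes without a VmSwap line (e.g. kernel threads) are skipped.
for f in /proc/[0-9]*/status; do
    awk '/^Name:/ {name = $2} /^VmSwap:/ {print $2 " kB", name}' "$f" 2>/dev/null
done | sort -rn | head -n 10
```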
Restarting the machine with swap disabled via fstab, as a test, is not really an option: some services have availability requirements and need planned downtimes rather than ad-hoc poking around. Besides, we don't actually want to give up swap as a fallback.
I don't see why any swapping should be occurring here. Any ideas what may be going on?
This problem has existed for a while now, but it first showed up during a period of high IO load (a long background data-processing task), so I can't pinpoint a specific event. That task has been finished for some days and the problem persists, hence this question.
Best Answer
Remember how I said:

> The system uses lxc containers for compartmentalisation, but that shouldn't matter here.

Well, turns out it did matter. Or rather, the cgroups at the heart of lxc matter.
The host machine only sees reboots for kernel upgrades. So, what were the last kernels used? 3.19, replaced two months ago by 4.0.5, which was in turn replaced yesterday by 4.1.3. And what happened yesterday? Processes getting memkilled left, right and center. Checking `/var/log/kern.log`, the affected processes were in cgroups with a 512M memory limit. Wait, 512M? That can't be right (when the expected requirement is around 4G!). As it turns out, this is exactly what we configured in the lxc configs when setting this all up months ago.

So what happened is: 3.19 completely ignored the memory limit for cgroups; 4.0.5 always paged when a cgroup needed more than allowed (this is the core issue of this question); and only 4.1.3 does a full memkiller sweep.
The swappiness of the host system had no influence on this, since it never was anywhere near being out of physical memory.
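For anyone debugging the same symptom: a process's cgroup membership is visible in `/proc`, and with the cgroup-v1 memory controller (what these kernels used) the configured limit can be read from sysfs. The `lxc/box1` path below is a placeholder, not from our actual setup:

```shell
# Which cgroup(s) does a process (here: this shell) belong to?
# On cgroup v1 there is one line per controller; look for "memory".
cat /proc/$$/cgroup

# With the v1 memory controller mounted, a container's limit and current
# usage (in bytes) can then be read from sysfs, e.g. for a group "lxc/box1":
#   cat /sys/fs/cgroup/memory/lxc/box1/memory.limit_in_bytes
#   cat /sys/fs/cgroup/memory/lxc/box1/memory.usage_in_bytes
```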
The solution:
For a temporary change, you can directly modify the cgroup: for an lxc container named `box1`, the cgroup is called `lxc/box1`, and as root on the host machine you can raise its memory limit directly. The permanent solution is to correctly configure the container in `/var/lib/lxc/...`
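The concrete command did not survive in the text above; assuming the cgroup-v1 memory controller these kernels use, the temporary change would look something like this (the 4G value is illustrative, chosen to match the expected requirement mentioned earlier):

```shell
# Temporary fix: raise the memory limit of the running container's cgroup.
# Takes effect immediately but is lost on reboot. The value is in bytes,
# or may use a suffix like G. Must be run as root on the host.
echo 4G > /sys/fs/cgroup/memory/lxc/box1/memory.limit_in_bytes
```

The permanent counterpart would be the matching key in the container's config under `/var/lib/lxc/`; in lxc 1.x syntax that is a line like `lxc.cgroup.memory.limit_in_bytes = 4G`.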
Moral of the story: always check your configuration, even if you think it can't possibly be the issue (and it takes a different bug/inconsistency in the kernel to actually make it fail).