Debian – OOM killer doesn’t work properly, leads to a frozen OS

arch linux, debian, linux, out of memory

For years, the OOM killer of my operating system hasn't worked properly and has left me with a frozen system.
When memory usage is very high, the whole system tends to "freeze" (in fact: become extremely slow) for hours or even days, instead of killing processes to free memory.
The longest I have recorded is 7 days before resigning myself to a hard reset.
When OOM is about to be reached, iowait is very high (~70%), before becoming unmeasurable.
The iotop tool showed that every program was reading from my hard drive at very high throughput (tens of MB/s).
What are those programs reading?
– The directory hierarchy?
– The executable code itself?
I don't exactly know.
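
One way to check, in principle, would be to watch the major page-fault counters in /proc/<pid>/stat (field 12, majflt, per proc(5)): major faults mean a process is re-reading file-backed pages, such as its own code and shared libraries, from disk. A rough sketch of such a check (the 5-second interval is arbitrary):

    #!/usr/bin/env python3
    """Poll per-process major page-fault counters to see whether the heavy
    disk reads correspond to processes re-loading file-backed pages
    (executables, libraries) rather than reading ordinary data files."""
    import os
    import time

    def majflt_by_pid():
        faults = {}
        for pid in filter(str.isdigit, os.listdir("/proc")):
            try:
                with open(f"/proc/{pid}/stat") as f:
                    stat = f.read()
            except OSError:
                continue  # process exited between listdir() and open()
            # comm (field 2) may contain spaces, so split after the closing ')'
            rest = stat.rsplit(")", 1)[1].split()
            # after stripping pid and comm, majflt is the 10th remaining field
            faults[pid] = int(rest[9])
        return faults

    before = majflt_by_pid()
    time.sleep(5)
    after = majflt_by_pid()
    deltas = sorted(((after[p] - before[p], p) for p in after if p in before), reverse=True)
    for delta, pid in deltas[:10]:
        if delta:
            print(f"pid {pid}: {delta} major faults in 5 s")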

[edited] At the time I wrote this message (in 2017) I was using an up-to-date Arch Linux (4.9.27-1-lts), but had already been experiencing the issue for years.
I have experienced the same issue with various Linux distributions and different hardware configurations.
Currently (2019), I am using an up-to-date Debian 9.6 (kernel 4.9.0).
I have 16 GB of physical RAM, an SSD on which my OS is installed, and no swap partition.

Given the amount of RAM I have, I don't want to enable a swap partition, since it would only delay the onset of the issue.
Also, swapping too often could potentially reduce the lifespan of an SSD.
In any case, I have already tried with and without a swap partition; it only delayed the onset of the problem and was not a solution.

To me, the problem is that Linux drops essential data from its caches, which freezes the system because everything has to be re-read from the drive, every time.

I even wonder whether Linux drops the executable code pages of running programs, which would explain why programs that normally read very little data behave this way in this situation.

I have tried several things in the hope of fixing this issue.
One was setting /proc/sys/vm/min_free_kbytes to 1000000 (1 GB).
Because that 1 GB should remain free, I expected Linux to keep it in reserve for caching important data.
But it hasn't worked.
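
For reference, the experiment boils down to something like the following (run as root; a rough sketch, not a tuned script — the 1 GB value is simply the one I tried, and the setting does not persist across reboots):

    #!/usr/bin/env python3
    """Apply the min_free_kbytes setting described above and read back the
    effective value together with the current free memory for comparison."""

    MIN_FREE_KBYTES = 1_000_000  # 1 GB, the value tried above

    with open("/proc/sys/vm/min_free_kbytes", "w") as f:
        f.write(str(MIN_FREE_KBYTES))

    with open("/proc/sys/vm/min_free_kbytes") as f:
        print("min_free_kbytes =", f.read().strip())

    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(("MemFree:", "MemAvailable:")):
                print(line.strip())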

Also, I think it is useful to add that, even though it may sound great in theory, restricting the size of virtual memory to the size of physical memory by setting /proc/sys/vm/overcommit_memory to 2 is not realistically possible in my situation, because the kind of applications I use require, for various reasons, more virtual memory than they actually use.
According to /proc/meminfo, the Committed_AS value is often more than double the physical RAM on my system (16 GB of RAM, Committed_AS often > 32 GB).

I have experienced this problem with /proc/sys/vm/overcommit_memory at its default value of 0, and for a while I set it to 1, because I preferred programs to be killed by the OOM killer rather than misbehave because they don't check the return value of malloc when an allocation is refused.
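
For anyone who wants to check whether mode 2 would be viable on their own machine, the relevant numbers can be read directly from /proc/meminfo and the sysctl file. A small read-only sketch (the GiB conversion assumes the usual kB units of /proc/meminfo):

    #!/usr/bin/env python3
    """Compare Committed_AS (current commitments) against CommitLimit and
    physical RAM, and show the current overcommit mode."""

    def meminfo_kb(key):
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith(key + ":"):
                    return int(line.split()[1])  # values are in kB
        raise KeyError(key)

    with open("/proc/sys/vm/overcommit_memory") as f:
        print("overcommit_memory =", f.read().strip())  # 0, 1 or 2

    total = meminfo_kb("MemTotal")
    committed = meminfo_kb("Committed_AS")
    limit = meminfo_kb("CommitLimit")

    print(f"MemTotal     = {total / 2**20:.1f} GiB")
    print(f"Committed_AS = {committed / 2**20:.1f} GiB")
    print(f"CommitLimit  = {limit / 2**20:.1f} GiB (only enforced in mode 2)")
    if committed > limit:
        print("Committed_AS already exceeds CommitLimit: mode 2 would refuse new allocations.")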

When I was discussing this issue on IRC, I met other Linux users who had experienced this very same problem, so I suspect that many users are affected.
To me this is not acceptable, since even Windows handles high memory usage better.

If you need more information or have a suggestion, please tell me.

Documentation:
https://en.wikipedia.org/wiki/Thrashing_%28computer_science%29
https://en.wikipedia.org/wiki/Memory_overcommitment
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
https://lwn.net/Articles/317814/

Related discussions:
Why does linux out-of-memory (OOM) killer not run automatically, but works upon sysrq-key?
Why does OOM-killer sometimes fail to kill resource hogs?
Preloading the OOM Killer
Is it possible to trigger OOM-killer on forced swapping?
How to avoid high latency near OOM situation?
https://lwn.net/Articles/104179/
https://bbs.archlinux.org/viewtopic.php?id=233843

Best Answer

I've found two explanations (of the same thing) as to why kswapd0's constant disk reading happens well before the OOM killer kills the offending process:

  1. see the answer and its comments on this askubuntu SE answer
  2. see the answer and David Schwartz's comments on this answer on unix SE

I'll quote here the comment from 1., which really opened my eyes as to why I was getting constant disk reads while everything was frozen:

For example, consider a case where you have zero swap and system is nearly running out of RAM. The kernel will take memory from e.g. Firefox (it can do this because Firefox is running executable code that has been loaded from disk - the code can be loaded from disk again if needed). If Firefox then needs to access that RAM again N seconds later, the CPU generates "hard fault" which forces Linux to free some RAM (e.g. take some RAM from another process), load the missing data from disk and then allow Firefox to continue as usual. This is pretty similar to normal swapping and kswapd0 does it. – Mikko Rantalainen Feb 15 at 13:08
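
To make the mechanism concrete: a process can, in principle, exempt its own pages from this kind of reclaim by pinning them with mlockall(2). The sketch below is my own illustration, not something from the quoted answer; it needs CAP_IPC_LOCK or a raised RLIMIT_MEMLOCK, it only protects the one process that calls it, and it does not disable the behavior system-wide:

    #!/usr/bin/env python3
    """Pin all current and future pages of this process in RAM so the kernel
    cannot evict its code pages and force hard faults later."""
    import ctypes

    # Constants from <sys/mman.h> on Linux
    MCL_CURRENT = 1
    MCL_FUTURE = 2

    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        err = ctypes.get_errno()
        raise OSError(err, "mlockall failed (needs CAP_IPC_LOCK or a high RLIMIT_MEMLOCK)")
    print("All current and future pages of this process are locked in RAM")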

If anyone has a way to disable this behavior (maybe recompile the kernel with certain options?), please let me know as soon as possible! Much appreciated, thanks!

UPDATE: The only way I've found so far is patching the kernel. It works for me with swap disabled (i.e. CONFIG_SWAP is not set), but it seems it doesn't work for others with swap enabled; see the patch inside this question.
