By default, the OOM killer oversees every cgroup that uses the memory subsystem.
memory.oom_control
contains a flag (0 or 1) that enables or disables the Out of Memory killer for a cgroup. If enabled (0), tasks that attempt to consume more memory than they are allowed are immediately killed by the OOM killer. The OOM killer is enabled by default in every cgroup using the memory subsystem; to disable it, write 1 to the memory.oom_control file:
~]# echo 1 > /cgroup/memory/lab1/memory.oom_control
When the OOM killer is disabled, tasks that attempt to use more memory than they are allowed are paused until additional memory is freed.
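Reading memory.oom_control back reports the current state as simple key/value lines (on cgroup v1, `oom_kill_disable` and `under_oom`; `under_oom` is 1 while tasks in the cgroup sit paused waiting for memory). A minimal parser sketch, using sample text in place of the real file (on a live system you would read `/cgroup/memory/lab1/memory.oom_control` instead):

```python
# Sample contents mirroring what `cat memory.oom_control` prints (cgroup v1).
sample = """oom_kill_disable 1
under_oom 0"""

def parse_oom_control(text: str) -> dict:
    """Return the oom_control fields as a {name: int} mapping."""
    return {k: int(v) for k, v in (line.split() for line in text.splitlines())}

state = parse_oom_control(sample)
# under_oom is 1 while tasks in the cgroup are paused waiting for memory.
print(state)   # → {'oom_kill_disable': 1, 'under_oom': 0}
```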
References
The reason the OOM killer is not invoked automatically is that the system, although already completely slowed down and unresponsive when close to out of memory, has not actually reached the out-of-memory condition.
Oversimplified, the almost-full RAM contains three types of data:
- kernel data, which is essential
- pages of essential process data (e.g. any data the process created in RAM only)
- pages of non-essential process data (e.g. the code of executables, for which a copy exists on disk in the filesystem, and which, while currently mapped into memory, could be re-read from disk upon use)
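Assuming a standard Linux /proc/meminfo layout, this split roughly corresponds to `AnonPages` (process data with no disk copy) versus `Cached` (file-backed pages that can be dropped and re-read). A small parser sketch over sample text, with made-up numbers standing in for the real file:

```python
# Sample lines in /proc/meminfo format; values are illustrative only.
sample = """MemTotal:       16307356 kB
AnonPages:       5123456 kB
Cached:          7890123 kB"""

def meminfo_kb(text: str) -> dict:
    """Parse '/proc/meminfo'-style lines into {field: kilobytes}."""
    out = {}
    for line in text.splitlines():
        key, rest = line.split(":", 1)
        out[key] = int(rest.split()[0])   # value is in kB
    return out

info = meminfo_kb(sample)
# AnonPages cannot be dropped without data loss; Cached pages can be re-read.
print(info["AnonPages"], info["Cached"])   # → 5123456 7890123
```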
In a memory-starved situation, the Linux kernel (as far as I can tell, the kswapd0 kernel thread), to prevent data loss and functionality loss, cannot throw away 1. and 2., but it is at liberty to, at least temporarily, evict from RAM those mapped-into-memory file pages belonging to processes that are not currently running.
While this behaviour, which involves disk thrashing (constantly throwing data away and re-reading it from disk), can be seen as helpful because it avoids, or at least postpones, the need to kill a process and free (but also lose) its memory, it comes at a high price: performance.
[load pages from disk to ram with code of executable of process 1]
[ run process 1 ]
[evict pages with binary of process 1 from ram]
[load pages from disk to ram with code of executable of process 2]
[ run process 2 ]
[evict pages with binary of process 2 from ram]
[load pages from disk to ram with code of executable of process 3]
[ run process 3 ]
[evict pages with binary of process 3 from ram]
....
[load pages from disk to ram with code of executable of process 1]
[ run process 1 ]
[evict pages with binary of process 1 from ram]
This cycle is clearly I/O expensive, and the system is likely to become unresponsive, even though technically it has not yet completely run out of memory.
From a user perspective, however, the system appears hung/frozen, and the resulting unresponsive UI may well be less preferable than simply killing a process (e.g. a browser tab, whose memory usage might very well have been the root cause/culprit to begin with).
This is where, as the question indicated, the Magic SysRq key trigger for starting the OOM killer manually seems great, as the Magic SysRq key is less affected by the unresponsiveness of the system.
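The same effect is available without the keyboard: writing the letter `f` to /proc/sysrq-trigger asks the kernel to run the OOM killer once (root and an enabled sysrq are required). A minimal sketch; the helper name and its path parameter are illustrative, not an existing API:

```python
SYSRQ = "/proc/sysrq-trigger"   # kernel interface; 'f' invokes the OOM killer once

def oom_kill_once(path: str = SYSRQ) -> bool:
    """Ask the kernel to run the OOM killer once; False if not permitted."""
    try:
        with open(path, "w") as f:
            f.write("f")
        return True
    except (PermissionError, FileNotFoundError):
        return False
```

On a non-root shell this simply returns False instead of raising, so it is safe to call speculatively.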
While there might be use cases where it is important to preserve processes at all (performance) costs, for a desktop it is likely that users would prefer the OOM killer over a frozen UI. There is a patch that claims to exempt clean, mapped, filesystem-backed pages from eviction in such situations, described in this answer on Stack Overflow.
Best Answer
It's possible to register for a notification for when a cgroup's memory usage goes above a threshold. In principle, setting the threshold at a suitable point below the actual limit would let you send a signal or take other action.
See:
https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
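The mechanism described in that document can be sketched as follows: create an eventfd, open the cgroup's memory.usage_in_bytes, and write "<event_fd> <usage_fd> <threshold>" to cgroup.event_control; the eventfd then becomes readable when usage crosses the threshold. The cgroup path and the 90 MiB threshold below are made-up example values, and `os.eventfd` requires Python 3.10+ on Linux:

```python
import os

CGROUP = "/cgroup/memory/lab1"        # hypothetical cgroup path
THRESHOLD = 90 * 1024 * 1024          # bytes, set below the hard limit

def event_control_line(event_fd: int, usage_fd: int, threshold_bytes: int) -> str:
    """Format the registration line written to cgroup.event_control."""
    return f"{event_fd} {usage_fd} {threshold_bytes}"

def register_threshold(cgroup: str, threshold: int) -> int:
    """Arm an eventfd that becomes readable when usage crosses threshold."""
    efd = os.eventfd(0)               # Python 3.10+, Linux only
    usage_fd = os.open(os.path.join(cgroup, "memory.usage_in_bytes"), os.O_RDONLY)
    try:
        with open(os.path.join(cgroup, "cgroup.event_control"), "w") as ctl:
            ctl.write(event_control_line(efd, usage_fd, threshold))
    finally:
        os.close(usage_fd)
    return efd

if os.path.isdir(CGROUP):             # only runs where this cgroup exists
    efd = register_threshold(CGROUP, THRESHOLD)
    os.read(efd, 8)                   # blocks until the threshold is crossed
    print("memory usage crossed the threshold")
```

A daemon blocked on that read could then send a signal to the offending process, or log a warning, well before the cgroup's hard limit is hit.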