By default, the OOM killer oversees every cgroup that uses the memory subsystem.
memory.oom_control
contains a flag (0 or 1) that enables or disables the Out of Memory killer for a cgroup. If enabled (0), tasks that attempt to consume more memory than they are allowed are immediately killed by the OOM killer. The OOM killer is enabled by default in every cgroup using the memory subsystem; to disable it, write 1 to the memory.oom_control file:
~]# echo 1 > /cgroup/memory/lab1/memory.oom_control
When the OOM killer is disabled, tasks that attempt to use more memory than they are allowed are paused until additional memory is freed.
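Reading memory.oom_control back reports the current state as simple key/value lines (on cgroup v1, `oom_kill_disable` and `under_oom`; `under_oom` is 1 while tasks in the cgroup sit paused waiting for memory). A minimal parser sketch, using sample text in place of the real file (on a live system you would read `/cgroup/memory/lab1/memory.oom_control` instead):

```python
# Sample contents mirroring what `cat memory.oom_control` prints (cgroup v1).
sample = """oom_kill_disable 1
under_oom 0"""

def parse_oom_control(text: str) -> dict:
    """Return the oom_control fields as a {name: int} mapping."""
    return {k: int(v) for k, v in (line.split() for line in text.splitlines())}

state = parse_oom_control(sample)
# under_oom is 1 while tasks in the cgroup are paused waiting for memory.
print(state)   # → {'oom_kill_disable': 1, 'under_oom': 0}
```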
References
The reason the OOM killer is not invoked automatically is that the system, although already completely slowed down and unresponsive when close to out of memory, has not actually reached the out-of-memory condition.
Oversimplified, the almost-full RAM contains three types of data:
- kernel data, which is essential
- pages of essential process data (e.g. any data the process created in RAM only)
- pages of non-essential process data (e.g. the code of executables, for which a copy exists on disk in the filesystem, and which, while currently mapped into memory, could be re-read from disk upon use)
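Assuming a standard Linux /proc/meminfo layout, this split roughly corresponds to `AnonPages` (process data with no disk copy) versus `Cached` (file-backed pages that can be dropped and re-read). A small parser sketch over sample text, with made-up numbers standing in for the real file:

```python
# Sample lines in /proc/meminfo format; values are illustrative only.
sample = """MemTotal:       16307356 kB
AnonPages:       5123456 kB
Cached:          7890123 kB"""

def meminfo_kb(text: str) -> dict:
    """Parse '/proc/meminfo'-style lines into {field: kilobytes}."""
    out = {}
    for line in text.splitlines():
        key, rest = line.split(":", 1)
        out[key] = int(rest.split()[0])   # value is in kB
    return out

info = meminfo_kb(sample)
# AnonPages cannot be dropped without data loss; Cached pages can be re-read.
print(info["AnonPages"], info["Cached"])   # → 5123456 7890123
```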
In a memory-starved situation, the Linux kernel (as far as I can tell, the kswapd0 kernel thread), to prevent data loss and functionality loss, cannot throw away 1. and 2., but it is at liberty to, at least temporarily, evict from RAM those mapped-into-memory file pages belonging to processes that are not currently running.
While this behaviour, which involves disk thrashing (constantly throwing data away and re-reading it from disk), can be seen as helpful because it avoids, or at least postpones, the need to kill a process and free (but also lose) its memory, it comes at a high price: performance.
[load pages from disk to ram with code of executable of process 1]
[ run process 1 ]
[evict pages with binary of process 1 from ram]
[load pages from disk to ram with code of executable of process 2]
[ run process 2 ]
[evict pages with binary of process 2 from ram]
[load pages from disk to ram with code of executable of process 3]
[ run process 3 ]
[evict pages with binary of process 3 from ram]
....
[load pages from disk to ram with code of executable of process 1]
[ run process 1 ]
[evict pages with binary of process 1 from ram]
This cycle is clearly I/O expensive, and the system is likely to become unresponsive, even though technically it has not yet completely run out of memory.
From a user perspective, however, the system appears hung/frozen, and the resulting unresponsive UI may well be less preferable than simply killing a process (e.g. a browser tab, whose memory usage might very well have been the root cause/culprit to begin with).
This is where, as the question indicated, the Magic SysRq key trigger for starting the OOM killer manually seems great, as the Magic SysRq key is less affected by the unresponsiveness of the system.
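The same effect is available without the keyboard: writing the letter `f` to /proc/sysrq-trigger asks the kernel to run the OOM killer once (root and an enabled sysrq are required). A minimal sketch; the helper name and its path parameter are illustrative, not an existing API:

```python
SYSRQ = "/proc/sysrq-trigger"   # kernel interface; 'f' invokes the OOM killer once

def oom_kill_once(path: str = SYSRQ) -> bool:
    """Ask the kernel to run the OOM killer once; False if not permitted."""
    try:
        with open(path, "w") as f:
            f.write("f")
        return True
    except (PermissionError, FileNotFoundError):
        return False
```

On a non-root shell this simply returns False instead of raising, so it is safe to call speculatively.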
While there might be use cases where it is important to preserve processes at all (performance) costs, for a desktop it is likely that users would prefer the OOM killer over a frozen UI. There is a patch that claims to exempt clean, mapped, filesystem-backed pages from eviction in such situations, described in this answer on Stack Overflow.
Best Answer
It's possible to register for a notification for when a cgroup's memory usage goes above a threshold. In principle, setting the threshold at a suitable point below the actual limit would let you send a signal or take other action.
See:
https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
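The mechanism described in that document can be sketched as follows: create an eventfd, open the cgroup's memory.usage_in_bytes, and write "<event_fd> <usage_fd> <threshold>" to cgroup.event_control; the eventfd then becomes readable when usage crosses the threshold. The cgroup path and the 90 MiB threshold below are made-up example values, and `os.eventfd` requires Python 3.10+ on Linux:

```python
import os

CGROUP = "/cgroup/memory/lab1"        # hypothetical cgroup path
THRESHOLD = 90 * 1024 * 1024          # bytes, set below the hard limit

def event_control_line(event_fd: int, usage_fd: int, threshold_bytes: int) -> str:
    """Format the registration line written to cgroup.event_control."""
    return f"{event_fd} {usage_fd} {threshold_bytes}"

def register_threshold(cgroup: str, threshold: int) -> int:
    """Arm an eventfd that becomes readable when usage crosses threshold."""
    efd = os.eventfd(0)               # Python 3.10+, Linux only
    usage_fd = os.open(os.path.join(cgroup, "memory.usage_in_bytes"), os.O_RDONLY)
    try:
        with open(os.path.join(cgroup, "cgroup.event_control"), "w") as ctl:
            ctl.write(event_control_line(efd, usage_fd, threshold))
    finally:
        os.close(usage_fd)
    return efd

if os.path.isdir(CGROUP):             # only runs where this cgroup exists
    efd = register_threshold(CGROUP, THRESHOLD)
    os.read(efd, 8)                   # blocks until the threshold is crossed
    print("memory usage crossed the threshold")
```

A daemon blocked on that read could then send a signal to the offending process, or log a warning, well before the cgroup's hard limit is hit.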