How to Receive Signal Before Process Killed by OOM Killer

Tags: cgroups, kill, limit, out-of-memory

In our cluster, we restrict the resources of our processes, e.g. memory (memory.limit_in_bytes).

I think that, in the end, exceeding this limit is also handled by the OOM killer in the Linux kernel (it looks that way from reading the source code).

Is there any way to get a signal before my process is killed? (Just like the -notify option for SGE's qsub, which sends SIGUSR1 before the process is killed.)

I read about /dev/mem_notify here but I don't have it – is there something else nowadays? I also read this which seems somewhat relevant.

I want to be able to at least dump a small stack trace and maybe some other useful debug info – but maybe I can even recover by freeing some memory.

One workaround I'm currently using is this small script, which frequently checks whether I'm close to the limit (95%) and, if so, sends the process a SIGUSR1. In Bash, I start this script in the background (cgroup-mem-limit-watcher.py &) so that it watches the other procs in the same cgroup, and it quits automatically when the parent Bash process dies.
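The polling idea can be sketched roughly as follows. This is not the original script, only a minimal illustration that assumes the cgroup v1 memory controller; the cgroup directory passed to `watch` and the helper names (`read_int`, `over_threshold`, `watch`) are hypothetical and would need adapting.

```python
import os
import signal
import time


def read_int(path):
    """Read a single integer from a cgroup control file."""
    with open(path) as f:
        return int(f.read().strip())


def over_threshold(usage, limit, fraction=0.95):
    """Return True once usage has reached the given fraction of the limit."""
    return usage >= fraction * limit


def watch(cgroup_dir, pid, fraction=0.95, interval=1.0):
    """Poll the cgroup's memory usage and send SIGUSR1 to pid when it gets close.

    cgroup_dir is an assumed cgroup v1 directory such as
    /sys/fs/cgroup/memory/<group>; the exact path depends on the system.
    """
    usage_path = os.path.join(cgroup_dir, "memory.usage_in_bytes")
    limit_path = os.path.join(cgroup_dir, "memory.limit_in_bytes")
    while True:
        if over_threshold(read_int(usage_path), read_int(limit_path), fraction):
            os.kill(pid, signal.SIGUSR1)
            return
        time.sleep(interval)
```

The obvious weakness of polling is that a fast allocation can jump from below the threshold to the hard limit between two checks, so the signal may arrive too late or not at all.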

Best Answer

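It's possible to register for a notification when a cgroup's memory usage crosses a threshold. With the cgroup v1 memory controller, this is done by creating an eventfd and registering it, together with a threshold, through the cgroup's cgroup.event_control file. In principle, setting the threshold at a suitable point below the actual limit would let you send a signal or take other action before the limit is reached.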

See:

https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt
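As a rough sketch of that registration, assuming cgroup v1, Linux, and Python 3.10 or newer (for `os.eventfd`): the helper names and the cgroup directory argument are hypothetical, and the code has to run with permission to write to the cgroup's control files.

```python
import os


def event_control_line(event_fd, usage_fd, threshold_bytes):
    """Format the registration string the cgroup v1 memory docs describe:
    "<event_fd> <fd of memory.usage_in_bytes> <threshold>"."""
    return f"{event_fd} {usage_fd} {threshold_bytes}"


def wait_for_threshold(cgroup_dir, threshold_bytes):
    """Block until the cgroup's memory usage crosses threshold_bytes.

    cgroup_dir is an assumed cgroup v1 directory such as
    /sys/fs/cgroup/memory/<group>.
    """
    efd = os.eventfd(0)  # requires Python 3.10+ on Linux
    usage_fd = os.open(
        os.path.join(cgroup_dir, "memory.usage_in_bytes"), os.O_RDONLY
    )
    try:
        # Register the eventfd and threshold with the kernel.
        with open(os.path.join(cgroup_dir, "cgroup.event_control"), "w") as ctl:
            ctl.write(event_control_line(efd, usage_fd, threshold_bytes))
        # Blocks until the kernel signals the eventfd; returns the event count.
        return os.eventfd_read(efd)
    finally:
        os.close(usage_fd)
        os.close(efd)
```

A watcher process could call `wait_for_threshold` and then send SIGUSR1 to the target with `os.kill`. Note that, per the kernel documentation, the notification fires when usage crosses the threshold in either direction, so it is worth re-reading memory.usage_in_bytes before acting.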
