shell-script – Trigger a Script When OOM Killer Kills a Process

cron, monitoring, out of memory, shell-script

I do a lot of work in the cloud running statistical models that use a lot of memory, usually on Ubuntu 18.04. One big headache for me is when I set up a model to run for several hours or overnight, then check on it later and find that the process was killed. After doing some research, it seems like this is due to something called the Out Of Memory (OOM) killer.

I would like to know when the OOM Killer kills one of my processes as soon as it happens, so I don't spend a whole night paying for a cloud VM that is not even running anything.

It looks like OOM events are logged in /var/log/, so I suppose I could write a cron job that periodically looks for new messages in /var/log/. But this seems like a kludge. Is there any way to set up the OOM killer so that after it kills a process, it then runs a shell script that I can configure to send me notifications?

Best Answer

You can ask the kernel to panic on OOM (run this as root):

sysctl vm.panic_on_oom=1

or, to make the setting persist across reboots:

echo "vm.panic_on_oom=1" >> /etc/sysctl.conf

You can adjust a process's likelihood of being killed, but presumably you have already removed most other processes, so this may not be of use. See man 5 proc for /proc/[pid]/oom_score_adj.
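As a rough illustration of the oom_score_adj knob, here is a minimal sketch. The function name set_oom_score_adj and the PROC_ROOT variable are made up for this example; PROC_ROOT exists only so the function can be tried against a scratch directory instead of the real /proc. Values range from -1000 (never pick this process) to 1000 (pick it first), and lowering the value normally requires root.

```shell
#!/bin/sh
# PROC_ROOT is a hypothetical override for experimenting; it defaults to the real /proc.
PROC_ROOT="${PROC_ROOT:-/proc}"

# set_oom_score_adj PID VALUE
# Writes VALUE to the process's oom_score_adj file after a range check.
set_oom_score_adj() {
    pid=$1
    value=$2
    if [ "$value" -lt -1000 ] || [ "$value" -gt 1000 ]; then
        echo "oom_score_adj must be between -1000 and 1000" >&2
        return 1
    fi
    printf '%s\n' "$value" > "$PROC_ROOT/$pid/oom_score_adj"
}
```

For example, `set_oom_score_adj $$ 500` would make the current shell a more likely victim, which can help protect the model process relative to less important ones.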

Of course, you can test the exit code of your program. If it is 137 (128 + 9), it was killed by SIGKILL, which is the signal the OOM killer sends.
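The exit-code check can be sketched as a small wrapper around the long-running job. This is only an outline: run_and_report and notify are names invented here, and notify just prints a line where you would plug in mail, a webhook call, or whatever notification you use. Note that 137 tells you the process got SIGKILL, not who sent it, so the message is worded as a guess.

```shell
#!/bin/sh
# notify is a hypothetical placeholder; replace the echo with a real notification.
notify() {
    echo "NOTIFY: $*"
}

# run_and_report COMMAND [ARGS...]
# Runs the command, then reports if it exited with 137 (128 + SIGKILL).
run_and_report() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 137 ]; then
        notify "'$*' was killed by SIGKILL (exit $status), possibly by the OOM killer"
    elif [ "$status" -ne 0 ]; then
        notify "'$*' failed with exit status $status"
    fi
    return "$status"
}
```

You would then start the job as, say, `run_and_report Rscript model.R` (the command is just an example) instead of launching it directly.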

If you are using rsyslogd, you can match the OOM message (I don't know its exact format) in the log stream and run a program:

:msg, contains, "oom killer..." ^/bin/myprogram
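What the program run by that rule might look like is sketched below, on the assumption that rsyslog hands the formatted log message to the program as its first command-line argument (check the rsyslog documentation for the shell-execute action on your version). The function name handle_oom_message and the OOM_NOTIFY_LOG path are invented for this example; appending to a file stands in for a real notification.

```shell
#!/bin/sh
# OOM_NOTIFY_LOG is a hypothetical path; point it wherever suits you.
OOM_NOTIFY_LOG="${OOM_NOTIFY_LOG:-/tmp/oom-notify.log}"

# handle_oom_message MESSAGE
# Appends a timestamped record of the matched log message.
handle_oom_message() {
    msg=$1
    printf '%s OOM event: %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$msg" >> "$OOM_NOTIFY_LOG"
    # Replace or extend this with mail, a webhook call, etc.
}

# Only act when a message was actually passed on the command line.
if [ "$#" -gt 0 ]; then
    handle_oom_message "$1"
fi
```

Keep the handler quick and self-contained, since it is started by the logging daemon rather than from your own login shell.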