Ubuntu – Script that detects kernel activity and reboots when the kernel freezes

bash, boot, kernel, scripts

I'm running a machine with a GPU that sometimes causes the machine to freeze. When
I look at the syslog file, it says that the kernel is hung:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

I would like to create a script that detects activity in the kernel so that when it hangs,
the machine reboots automatically. However, when I run a bash script that watches the syslog file for a certain keyword, such as "kernel", the script stops running at the moment the kernel freezes, so it never gets the opportunity to execute the reboot command.

Is there a way to keep track of kernel activity so that the machine reboots automatically when the kernel freezes, like the automatic reboot that can happen on a kernel panic?

regards

Best Answer

Most machines have a /dev/watchdog device, provided by a kernel driver for some built-in hardware. The user-space API is fairly simple, and there is now also a wdctl command that reports the hardware features of the device. systemd also has a configuration option, RuntimeWatchdogSec, to enable the watchdog at boot.
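As a rough sketch of the systemd route: RuntimeWatchdogSec belongs in the [Manager] section of /etc/systemd/system.conf, or in a drop-in file under /etc/systemd/system.conf.d/. The function name write_watchdog_dropin, the file name watchdog.conf, and the 30s value below are illustrative assumptions, not something from the question; choose a timeout your hardware actually supports.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that asks PID 1 to drive the hardware watchdog.
# Assumptions (hypothetical, for illustration): the function name, the drop-in
# file name "watchdog.conf", and the default timeout of 30s.
write_watchdog_dropin() {
    conf_dir="${1:-/etc/systemd/system.conf.d}"   # drop-in directory for system.conf
    timeout="${2:-30s}"                           # value for RuntimeWatchdogSec
    mkdir -p "$conf_dir" || return 1
    printf '[Manager]\nRuntimeWatchdogSec=%s\n' "$timeout" > "$conf_dir/watchdog.conf"
}

# Example (as root):
#   write_watchdog_dropin
#   systemctl daemon-reexec    # or simply reboot so systemd re-reads its configuration
```

With this in place, systemd itself keeps the watchdog fed, so a frozen kernel stops the feeding and the hardware performs the reset without any user script having to survive the hang.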

In general, the watchdog hardware is configured with an action and a time delay (some hardware has a fixed configuration). Once started, it has to be tickled repeatedly within that delay, or it will trigger the action, often a reset. Sometimes closing the device clears the watchdog, but this is often undesirable, so the watchdog can be configured to keep timing and to trigger no matter what. After a reboot, the cause of the reset may be available from the device or from some other hardware, so that we can see whether the watchdog was responsible.
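The tickling described above can be sketched in a few lines of shell. This is a minimal illustration, not a replacement for a proper watchdog daemon: the function name pet_watchdog, the 10-second interval, and the optional iteration count (there only so the loop can be tried against an ordinary file) are assumptions. The interval must be shorter than the hardware timeout, and the final 'V' is the "magic close" character that many drivers accept as a request to disarm; drivers configured never to stop will ignore it.

```shell
#!/bin/sh
# Sketch: keep a watchdog device fed from user space.
# Assumptions (hypothetical, for illustration): the function name, the default
# 10-second interval, and the third "count" argument used for dry runs.
pet_watchdog() {
    dev="${1:-/dev/watchdog}"   # watchdog device; opening it usually arms the timer
    interval="${2:-10}"         # seconds between writes; must be below the timeout
    count="${3:-0}"             # 0 = loop forever; a positive value stops after N writes
    {
        i=0
        while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
            printf '.' >&3      # any write counts as a keepalive
            i=$((i + 1))
            sleep "$interval"
        done
        printf 'V' >&3          # magic close: ask the driver to disarm before closing
    } 3> "$dev"
}

# Example (as root; this arms the real watchdog):
#   pet_watchdog /dev/watchdog 10
# Dry run against a plain file:
#   pet_watchdog /tmp/fake-watchdog 0 3
```

If the machine freezes, this loop stops along with everything else, the writes stop arriving, and the hardware resets the machine once the delay expires. The same applies if the script is killed before it writes the 'V'.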
