Ubuntu – Script that detects kernel activity and reboots when the kernel freezes

bash, boot, kernel, scripts

I'm running a machine with a GPU that sometimes causes the machine to freeze. When
I look at the syslog file, it says that the kernel is hung:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

I would like to create a script that detects activity in the kernel so that, when it hangs,
the machine reboots automatically. However, when I run a bash script that watches the syslog file for a certain keyword, such as kernel, the script itself stops running once the kernel freezes, so it never gets the opportunity to execute the reboot command.

Is there a way to keep track of kernel activity so that the machine reboots automatically when the kernel freezes? Something like the automatic reboot that happens after a kernel panic.

regards

Best Answer

Most machines have a /dev/watchdog device, provided by a kernel driver for some built-in hardware. The user-space API is fairly simple, and there is now also a wdctl command that reports the hardware features of the device. There is also a systemd configuration option, RuntimeWatchdogSec, to set it up at boot.
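As a rough illustration of the RuntimeWatchdogSec option, the sketch below prints a systemd drop-in that asks systemd to open and pet the hardware watchdog itself. The function name watchdog_dropin, the 30s value, and the file name watchdog.conf are assumptions chosen for this example; they do not come from the original answer.

```shell
# watchdog_dropin TIMEOUT
# Prints a systemd manager drop-in that enables the runtime watchdog.
# TIMEOUT (e.g. 30s) is illustrative; pick a value your hardware supports.
watchdog_dropin() {
    printf '[Manager]\nRuntimeWatchdogSec=%s\n' "$1"
}

# Possible use (needs root; the file name is an assumption):
#   sudo mkdir -p /etc/systemd/system.conf.d
#   watchdog_dropin 30s | sudo tee /etc/systemd/system.conf.d/watchdog.conf
#   sudo wdctl    # inspect the watchdog device and its features
```

With this approach no separate script is needed: if systemd stops being scheduled because the kernel is frozen, the hardware timer is no longer reset and the configured action fires.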

The generic watchdog operation is as follows: the watchdog hardware is configured with an action and a time delay (some hardware has a fixed configuration), it is started, and it then has to be tickled repeatedly within that delay or it will carry out the action, often a reset. Because the countdown runs in hardware, it still fires when the kernel and every user-space script have stopped running. Sometimes closing the device clears the watchdog, but often this is not desirable, so the watchdog can be configured to keep timing and to trigger no matter what. After a reboot, the cause of the reset might be available from the device or some other hardware, so you can see whether the watchdog was the cause.
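The tickling loop described above can be sketched in a few lines of shell. This is a minimal illustration, not a hardened daemon: the function name pet_watchdog, the optional COUNT argument, and the 10 second interval in the usage comment are assumptions, and the interval must be shorter than whatever timeout the hardware is actually configured with.

```shell
# pet_watchdog DEVICE INTERVAL [COUNT]
# Holds DEVICE open and writes one byte every INTERVAL seconds.
# COUNT = 0 (the default) loops forever; a positive COUNT stops after that
# many writes, which is mainly useful for trying the function on a plain file.
# The body runs in a subshell, so its variables do not leak into the caller.
pet_watchdog() (
    dev=$1
    interval=$2
    count=${3:-0}
    i=0
    {
        while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
            printf '.' >&3      # any write restarts the hardware countdown
            i=$((i + 1))
            sleep "$interval"
        done
        # "Magic close": drivers that support it disarm the watchdog only if
        # 'V' was written just before close (ignored when nowayout is set).
        printf 'V' >&3
    } 3>"$dev"
)

# Possible use as root, assuming the hardware timeout is well above 10 s:
#   pet_watchdog /dev/watchdog 10
```

If the kernel hangs, this loop stops along with everything else, the writes cease, and the hardware performs its action once the delay expires.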

Related Solutions

Ubuntu – How to configure automatic reboot after kernel panic

The kernel parameter that you're looking for is kernel.panic=1 (where 1 is the number of seconds before rebooting).

You can add that to your sysctl.conf, a file under sysctl.d, the kernel boot line (where the equivalent parameter is panic=1), or wherever you normally set your kernel parameters. Make sure you have some way of monitoring your uptime so that you know when kernel panics have occurred.
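One way to make the setting persistent through sysctl.d is sketched below. The function name panic_sysctl_conf and the file name 99-panic-reboot.conf are assumptions made for this example.

```shell
# panic_sysctl_conf DELAY
# Prints a sysctl fragment that reboots DELAY seconds after a kernel panic.
panic_sysctl_conf() {
    printf 'kernel.panic = %s\n' "$1"
}

# Possible use (needs root; the file name is an assumption):
#   panic_sysctl_conf 1 | sudo tee /etc/sysctl.d/99-panic-reboot.conf
#   sudo sysctl --system     # reload all sysctl configuration files
#   sysctl kernel.panic      # show the value currently in effect
```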

Ubuntu – Regular freezing on Ryzen based system, 16.04 LTS and newer kernel

I had the same problem... What I did to solve this issue:

Set the CPU frequency governor to performance:

sudo cpufreq-set -r -g performance

Set on boot:

sudo apt-get install cpufrequtils
echo 'GOVERNOR="performance"' | sudo tee /etc/default/cpufrequtils
sudo systemctl disable ondemand
