Debian – How to Configure Watchdog Daemon for BIOS Watch Dog

Supermicro mainboards have a BIOS feature named "Watch Dog Function". On Debian 6.0.6 with kernel "Linux debian 2.6.32-5-amd64 #1 SMP", we did the following:

  1. Changed the BIOS "Watch Dog Function" setting from Disabled to Enabled.
  2. Installed the watchdog package (# apt-get install watchdog).

Expected: that this would be all that is needed for the watchdog function to work correctly.

Result: the system reboots roughly every 5 minutes.

Changing the BIOS "Watch Dog Function" back from Enabled to Disabled stops the unwanted reboots.

The boot process appears to start the watchdog daemon correctly. At least, the console shows the following (with the BIOS Watch Dog disabled):

Starting watchdog keepalive daemon: wd_keepalive.
Stopping watchdog keepalive daemon....
Starting watchdog daemon....

And on reboot this output is generated:

INIT: Using makefile-style concurrent boot in runlevel 6.
Stopping watchdog daemon....
Starting watchdog keepalive daemon....

What else needs to be done to make the BIOS watch dog function and the Linux watchdog daemon work together correctly?

Best Answer

1. Load the hardware module

First, in order to actually 'feed' the watchdog, you need to have the watchdog hardware driver module loaded. This may not happen automatically, because most watchdog drivers are blacklisted in case no watchdog daemon is installed (e.g. in /etc/modprobe.d/blacklist-watchdog.conf on an Ubuntu/Debian system). Check whether /dev/watchdog (or a similar device) has appeared, as that would indicate that the module has been loaded.
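Both checks from the paragraph above can be scripted. The following Python is a minimal sketch under stated assumptions: the function names are hypothetical, it only understands plain `blacklist <module>` lines in modprobe.d files (not `install` lines or other tricks), and it treats the mere existence of /dev/watchdog as a sign that some driver has registered a watchdog device.

```python
from __future__ import annotations

import os


def blacklisted_modules(conf_text: str) -> set[str]:
    """Return module names listed on 'blacklist' lines of modprobe.d-style text.

    Hypothetical helper: handles only 'blacklist <name>' lines and '#' comments.
    """
    names: set[str] = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "blacklist":
            # modprobe treats '-' and '_' in module names as equivalent,
            # so normalize to '_' for easy comparison
            names.add(parts[1].replace("-", "_"))
    return names


def watchdog_device_present(path: str = "/dev/watchdog") -> bool:
    """True if the device node exists, i.e. a driver has registered it."""
    return os.path.exists(path)
```

For example, `blacklisted_modules(open("/etc/modprobe.d/blacklist-watchdog.conf").read())` lists the drivers excluded from automatic loading. A blacklist entry only stops automatic (alias-based) loading, so an explicit `modprobe iTCO_wdt` still works.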

I don't know which watchdog hardware the Supermicro board uses, but it may be supported by the Intel TCO driver (iTCO_wdt). Note that iTCO_wdt may need other modules, such as i2c-i801 and i2c-smbus, to work. Try loading it with modprobe iTCO_wdt and see whether it is accepted.

Success looks like:

iTCO_wdt: Found a Intel PCH TCO device (Version=4, TCOBASE=0x0400)
iTCO_wdt: initialized. heartbeat=120 sec (nowayout=0)

On failure, nothing more is printed after:

iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11 

Also check syslog. If iTCO_wdt does not work, take a look at the IPMI tools, as they include a watchdog driver.
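The success/failure check above comes down to looking for the "initialized. heartbeat=... (nowayout=...)" line. Here is a minimal Python sketch, assuming the kernel log (e.g. dmesg or syslog output) is passed in as a string and that the message has exactly the format quoted above; other kernel versions or drivers may word it differently. The function name itco_status is hypothetical.

```python
import re

# Matches the success line quoted above; 'search' also tolerates a
# leading dmesg timestamp such as "[    4.2] ".
HEARTBEAT_RE = re.compile(
    r"iTCO_wdt: initialized\. heartbeat=(\d+) sec \(nowayout=(\d)\)"
)


def itco_status(log_text: str):
    """Return (heartbeat_seconds, nowayout) if iTCO_wdt initialized, else None."""
    for line in log_text.splitlines():
        m = HEARTBEAT_RE.search(line)
        if m:
            return int(m.group(1)), m.group(2) == "1"
    return None  # only the driver banner (or nothing) was found
```

nowayout=0 means the driver allows a "magic close" to stop the timer; with nowayout=1, once the device has been opened the timer cannot be stopped again.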

2. Edit /etc/watchdog.conf

Second, you need to edit the watchdog configuration file, e.g. # nano /etc/watchdog.conf.
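watchdog.conf consists of simple `name = value` lines, with `#` marking comments, so a commented-out setting has no effect. The Python sketch below (hypothetical function name, not the daemon's real parser) reports whether a setting such as watchdog-device is actually active; it assumes one setting per line and returns the first uncommented match.

```python
def conf_option(conf_text: str, name: str):
    """Return the value of the first active 'name = value' line, or None."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and commented-out settings are ignored
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            return value.strip()
    return None  # setting missing or still commented out
```

If `conf_option(open("/etc/watchdog.conf").read(), "watchdog-device")` returns None, the daemon is not configured to use the hardware device.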

3. Uncomment watchdog-device = ...

This makes the daemon actually access the module through the /dev/watchdog device. Otherwise the daemon will not use the hardware at all and will rely only on its internal code to soft-reboot a broken machine (which is not very useful, since a machine that has locked up may be unable to reboot itself).
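To illustrate what "using the hardware" means, here is a minimal Python sketch of the Linux kernel watchdog interface that the daemon builds on: opening the device arms the timer, every write feeds it, and writing the magic character 'V' just before closing asks the driver to disarm it (ignored when the driver runs with nowayout=1). The function name and its parameters are hypothetical. This is not a replacement for the watchdog daemon and should not be run while the daemon is active, since a watchdog device typically allows only one open at a time.

```python
import os
import time


def feed_watchdog(path: str = "/dev/watchdog", pings: int = 3,
                  interval: float = 1.0) -> None:
    """Ping a watchdog device `pings` times, then attempt a magic close."""
    # On a real watchdog device, opening it arms the hardware timer.
    fd = os.open(path, os.O_WRONLY)
    try:
        for i in range(pings):
            os.write(fd, b"\0")  # any write resets ("feeds") the timer
            if i < pings - 1:
                time.sleep(interval)  # must stay below the heartbeat
        # 'V' right before close requests that the driver stop the timer.
        # If we exit via an exception instead, no 'V' is written and the
        # timer keeps running, so a real machine would reboot.
        os.write(fd, b"V")
    finally:
        os.close(fd)
```

The path parameter lets the logic be tried against an ordinary file. Pointing it at the real /dev/watchdog on a test machine with an interval longer than the heartbeat will reboot that machine, because the timer expires without being fed.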

Again, when starting the watchdog daemon, look in syslog for messages about it starting and about which hardware driver it has found.
