Supermicro mainboards have a BIOS feature named "Watch Dog Function". On Debian 6.0.6 with kernel "Linux debian 2.6.32-5-amd64 #1 SMP", we did the following:
- Changed the BIOS "Watch Dog Function" setting from Disabled to Enabled.
- Installed the watchdog package:
# apt-get install watchdog
Expected: that this would be all that is needed for the watchdog function to work correctly.
Result: the system reboots roughly every 5 minutes.
Changing the BIOS "Watch Dog Function" back from Enabled to Disabled stops the unwanted reboots.
The boot process appears to start the watchdog daemon correctly. At least, the console shows the following (with the BIOS watchdog disabled):
Starting watchdog keepalive daemon: wd_keepalive.
Stopping watchdog keepalive daemon....
Starting watchdog daemon....
And on reboot this output is generated:
INIT: SUsing makefile-style concurrent boot in runlevel 6.
Stopping watchdog daemon....
Starting watchdog keepalive daemon....
What else needs to be done to make the BIOS watchdog function and the Linux watchdog daemon work together correctly?
Best Answer
1. Load the hardware module
Firstly, in order to actually 'feed' the watchdog, you need to have the watchdog hardware module loaded. This may not happen automatically, as most watchdog drivers are blacklisted in case there is no watchdog daemon (e.g. in /etc/modprobe.d/blacklist-watchdog.conf on an Ubuntu/Debian system). Check whether /dev/watchdog (or similar) has appeared, as that would imply the module has been loaded.
I don't know what the Supermicro board uses, but it may be the Intel TCO driver (iTCO_wdt). Note that iTCO_wdt might require some other modules, such as i2c-i801 and i2c-smbus, to do its magic. Try loading that module with
# modprobe iTCO_wdt
and see whether it is accepted.
Success looks like:
Failure shows nothing after:
Also check syslog. Otherwise, check out the IPMI tools, as they include a watchdog driver.
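The checks in step 1 can be scripted. What follows is a minimal Python sketch, assuming a Debian-style layout in which blacklist entries live in /etc/modprobe.d/*.conf as "blacklist <module>" lines and loaded modules are listed in /proc/modules. iTCO_wdt is only the guess from above, and the helper names (normalize, blacklisted_modules, loaded_modules, report) are hypothetical. The script only reads files, so it neither arms nor feeds the watchdog.

```python
"""Hypothetical diagnostic: is a watchdog driver blacklisted, loaded, and
exposing /dev/watchdog? Assumes a Debian-style /etc/modprobe.d and /proc/modules."""
from __future__ import annotations

import glob
import os


def normalize(name: str) -> str:
    # modprobe treats '-' and '_' in module names as equivalent.
    return name.replace("-", "_")


def read_text(path: str) -> str:
    """Read a file, returning '' if it is missing or unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


def blacklisted_modules(conf_text: str) -> set[str]:
    """Return the module names from 'blacklist <module>' lines."""
    names = set()
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        parts = line.split()
        if parts[0] == "blacklist" and len(parts) >= 2:
            names.add(normalize(parts[1]))
    return names


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Return module names from /proc/modules (first field of each line)."""
    return {normalize(line.split()[0])
            for line in proc_modules_text.splitlines() if line.strip()}


def report(module: str = "iTCO_wdt") -> None:
    """Print whether `module` is blacklisted or loaded, and whether /dev/watchdog exists."""
    mod = normalize(module)
    for path in sorted(glob.glob("/etc/modprobe.d/*.conf")):
        if mod in blacklisted_modules(read_text(path)):
            print(f"{module} is blacklisted in {path}")
    print(f"{module} loaded: {mod in loaded_modules(read_text('/proc/modules'))}")
    print(f"/dev/watchdog present: {os.path.exists('/dev/watchdog')}")


if __name__ == "__main__":
    report()
```

A blacklist entry only stops the module from being loaded automatically through its aliases; loading it explicitly by name with modprobe, as above, still works.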
2. Edit /etc/watchdog.conf
Secondly, you need to edit the watchdog configuration file, for example:
# nano /etc/watchdog.conf
3. Un-comment watchdog-device = ...
Un-comment this line so that the daemon actually uses the /dev/watchdog device to access the module. Otherwise the watchdog daemon will not use the hardware and will rely only on its internal code to soft-reboot a broken machine (which is not so useful). Again, when starting the watchdog daemon, look for messages in syslog about it starting and which hardware module it has found.
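Steps 2 and 3 amount to making sure one active watchdog-device line exists. The Python sketch below does that on the file's text, assuming watchdog.conf uses "key = value" lines with '#' for comments; the function name enable_watchdog_device and the /dev/watchdog default are illustrative assumptions, not part of the watchdog package.

```python
from __future__ import annotations

import re

# Matches an active or commented-out watchdog-device line, e.g.
# "#watchdog-device = /dev/watchdog" or "watchdog-device=/dev/watchdog".
DEVICE_LINE = re.compile(r"^\s*#?\s*watchdog-device\s*=")


def enable_watchdog_device(conf_text: str, device: str = "/dev/watchdog") -> str:
    """Return conf_text with the first watchdog-device line (commented or not)
    replaced by an active one pointing at `device`; append one if none exists."""
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        if DEVICE_LINE.match(line):
            lines[i] = f"watchdog-device = {device}"
            break
    else:
        # No existing entry: add one at the end of the file.
        lines.append(f"watchdog-device = {device}")
    return "\n".join(lines) + "\n"
```

The function returns new text instead of editing the file in place, so the result can be previewed (for example by printing enable_watchdog_device applied to the contents of /etc/watchdog.conf) before it is written back as root and the watchdog daemon is restarted.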