If you are using the Raspberry Pi as a server you might want to enable the built-in hardware watchdog. This will automatically reboot the machine if user space fails to periodically write to /dev/watchdog within a reasonable time.
There are two major tasks: (1) installing and configuring a program to say "hi" to the heartbeat device periodically, and (2) having the kernel expose the heartbeat device to users.
The hardware is simple enough. It counts down from n to 0, one tick per second. Upon reaching zero the hardware reboots the machine. If the hardware is tickled before then, it resets the countdown timer to n and the countdown begins anew.
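The countdown behaviour can be sketched as a small shell simulation. This is only an illustration of the logic described above, not how the hardware or the kernel driver is implemented; the function name simulate_watchdog and its arguments are hypothetical.

```shell
#!/bin/sh
# Simulate a watchdog countdown (hypothetical helper, for illustration only).
# Usage: simulate_watchdog TIMEOUT DURATION [TICKLE_SECOND...]
# Prints "reboot at Ns" if the counter reaches zero within DURATION seconds,
# otherwise prints "still running".
simulate_watchdog() {
    timeout=$1
    duration=$2
    shift 2
    counter=$timeout
    second=1
    while [ "$second" -le "$duration" ]; do
        counter=$((counter - 1))        # one tick per second
        for t in "$@"; do
            if [ "$t" -eq "$second" ]; then
                counter=$timeout        # tickled: restart the countdown
            fi
        done
        if [ "$counter" -le 0 ]; then
            echo "reboot at ${second}s"
            return 0
        fi
        second=$((second + 1))
    done
    echo "still running"
}
```

For example, simulate_watchdog 14 30 4 8 12 tickles the timer at seconds 4, 8 and 12 and then stops, so it prints "reboot at 26s": fourteen seconds after the last tickle.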
The Linux kernel exposes the hardware countdown timer as /dev/watchdog. The convention is that the countdown takes 60 seconds.
The Raspberry Pi watchdog runs for less time than this, between 1 and 14 seconds. That's understandable: at its heart the Raspberry Pi has a mobile phone CPU, and no one is going to look at a blank screen for a minute wondering what will happen.
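To make "tickling" concrete, here is a minimal sketch of a heartbeat writer in shell. The function name write_heartbeats and its arguments are hypothetical, and the sketch assumes the usual Linux watchdog behaviour in which any write to the device counts as a heartbeat. Pointing it at the real /dev/watchdog starts the countdown, so the machine may reboot once the loop ends; try it on a scratch file first.

```shell
#!/bin/sh
# Write COUNT heartbeats to DEVICE, INTERVAL seconds apart.
# Hypothetical helper: on a real system DEVICE would be /dev/watchdog and
# the loop would run for as long as the machine is considered healthy.
write_heartbeats() {
    device=$1
    interval=$2
    count=$3
    # Keep the device open between writes; what happens when it is closed
    # depends on the driver's nowayout setting.
    exec 3>"$device"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '.' >&3                  # assumed: any write is a heartbeat
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
    exec 3>&-                           # close the device
}
```

A harmless trial run is write_heartbeats /tmp/fake-watchdog 4 3, which writes three dots to a scratch file over eight seconds.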
User space heartbeat - Raspbian
The watchdog(8) daemon is the simplest way for Raspbian to periodically tickle /dev/watchdog.
sudo apt-get install watchdog
sudo update-rc.d watchdog enable
The watchdog daemon requires some configuration on the Raspberry Pi. Edit /etc/watchdog.conf to contain only:
watchdog-device = /dev/watchdog
watchdog-timeout = 14
realtime = yes
priority = 1
If you want the daemon to consume less CPU you can extend the interval between heartbeats. Four seconds still gives three chances per fourteen-second interval:
interval = 4
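The "three chances" figure is just integer division of the timeout by the interval. A quick check in shell arithmetic (the variable names are only for illustration):

```shell
#!/bin/sh
timeout=14      # watchdog-timeout from /etc/watchdog.conf
interval=4      # interval from /etc/watchdog.conf
# Whole heartbeats that fit inside one timeout period.
chances=$((timeout / interval))
echo "$chances heartbeats fit in ${timeout}s"
```

With the values above this prints "3 heartbeats fit in 14s".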
User space heartbeat - systemd
The great simplification of system utilities by systemd encompasses watchdog timers too. Edit /etc/systemd/system.conf and set:
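A minimal sketch of such a setting, assuming systemd's documented RuntimeWatchdogSec= option in the [Manager] section and a 14 second timeout chosen to match the Raspberry Pi's hardware limit:

```ini
[Manager]
# Assumed value: the Raspberry Pi's maximum watchdog timeout in seconds.
RuntimeWatchdogSec=14
```

With this set, systemd itself writes the heartbeats, so the separate watchdog daemon should not be needed.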
Kernel watchdog device
Configure the kernel to expose the watchdog device. Set the parameters to the kernel module by creating a new file /etc/modprobe.d/bcm2708_wdog.conf containing:
alias char-major-10-130 bcm2708_wdog
alias char-major-10-131 bcm2708_wdog
options bcm2708_wdog heartbeat=14 nowayout=1
The periodic writes from user space are called "heartbeats". The heartbeat parameter to the kernel module is the maximum gap between heartbeats that the device tolerates before the hardware reboots the machine. On the Raspberry Pi this gap can be as large as 14 seconds, substantially less than the common value of 60 seconds.
The nowayout parameter determines what happens when the /dev/watchdog device is closed: is a heartbeat still expected or not? A value of 0 says that no further heartbeats are expected, so if the process writing the heartbeats fails then the machine will not reboot, even if that failure is a sign that the machine is in a poor way. A value of 1 says that the countdown keeps running: unless the device is reopened and a heartbeat written, the machine will reboot. Note that the Raspberry Pi does not remove power to itself when halted, so with nowayout=1 it will reboot about 14 seconds after shutdown -h now completes.
Normally we would put the module name into /etc/modules, but what if starting the system takes longer than the fourteen seconds available? Rather than risk a continual reboot we should let udev load the module the first time something opens /dev/watchdog. Unfortunately I can't figure out how to do that in this case :-(
The second-best option is to install the module just before it is used. The watchdog daemon on Debian allows for this in /etc/default/watchdog:
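A sketch of the relevant line, assuming the watchdog_module variable that Debian's watchdog init script reads from this file, and the module name used in the modprobe configuration above:

```shell
# /etc/default/watchdog (fragment)
# Module to load just before the watchdog daemon starts.
watchdog_module="bcm2708_wdog"
```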
Start watchdog service
This will all take effect at the next reboot, or kick it off without interrupting service with:
sudo modprobe bcm2708_wdog
sudo service watchdog start
Check watchdog service
Check operation in the system log. Here is the module activating /dev/watchdog:
bcm2708 watchdog, heartbeat=14 sec (nowayout=1)
Here is the start of the watchdog daemon which writes the heart beats:
watchdog: starting daemon (5.12):
watchdog: int=4s realtime=yes sync=no soft=no mla=0 mem=0
watchdog: ping: no machine to check
watchdog: file: no file to check
watchdog: pidfile: no server process to check
watchdog: interface: no interface to check
watchdog: test=none(0) repair=none(0) alive=/dev/watchdog heartbeat=none temp=none to=root no_act=no
watchdog: hardware wartchdog identity: BCM2708
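One way to pull these lines out of the log is a small grep wrapper. The function name watchdog_log_lines is hypothetical, and the default path /var/log/syslog is an assumption about where your system writes its log.

```shell
#!/bin/sh
# Print watchdog-related lines from a log file.
# Defaults to /var/log/syslog (assumed location) when no file is given.
watchdog_log_lines() {
    grep -i -E 'watchdog|wdog' "${1:-/var/log/syslog}"
}
```

Run watchdog_log_lines with no argument to search the default log, or pass another file as the first argument.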