I have a backup daemon running on my server that's crashing every few days. I'm not sure why. In the long run I'd like to figure out why and fix it, but in the meantime I'd like systemd to restart it when it crashes.
It has an old-style SysV init script, which is being picked up by systemd-sysv-generator. Apparently when it crashes it does so with a zero ("successful") exit code. To try to get it to restart after these crashes, I dropped in an override.conf:
~$ cat /etc/systemd/system/crashplan.service.d/override.conf
[Service]
Restart=always
systemd does appear to be picking this up:
roberts:~$ sudo systemctl show crashplan.service | grep Restart
Restart=always
RestartUSec=100ms
Nonetheless, when I checked on it after a few days, I found:
roberts:~$ sudo systemctl status crashplan.service
● crashplan.service - LSB: CrashPlan Engine
Loaded: loaded (/etc/init.d/crashplan; bad; vendor preset: enabled)
Drop-In: /etc/systemd/system/crashplan.service.d
└─override.conf
Active: active (exited) since Thu 2017-01-05 00:33:50 PST; 5 days ago
Docs: man:systemd-sysv-generator(8)
Jan 05 00:33:50 roberts systemd[1]: Stopped LSB: CrashPlan Engine.
Jan 05 00:33:50 roberts systemd[1]: Starting LSB: CrashPlan Engine...
Jan 05 00:33:50 roberts crashplan[25491]: Starting CrashPlan Engine ... Using standard startup
Jan 05 00:33:50 roberts crashplan[25491]: OK
Jan 05 00:33:50 roberts systemd[1]: Started LSB: CrashPlan Engine.
So… systemd seems to think that it's not running and that's cool? There are no logs suggesting that it even tried to restart it? I can't even figure out how to tell when it crashed. What's going on here?
Best Answer
When the init.d script doesn't specify a PID file, its autogenerated unit has RemainAfterExit=yes. In most cases such scripts represent oneshot tasks which don't have a long-running process, so this option makes such units show up as 'active' even after the process exits. This allows the admin to 'stop' such a unit manually (e.g. "starting" /etc/init.d/iptables would load firewall rules, and "stopping" it would flush them). However, since the unit is always 'active', the restart logic will never trigger. (After all, as far as systemd can tell, there is nothing to restart.)
The solution here would be to write a native systemd .service file for CrashPlan – or at least make the daemon produce a pidfile and add
# pidfile: /run/...
to the initscript accordingly.
Alternatively, first run
systemctl cat crashplan.service
to see the full unit contents, then manually undo all the wrong parameters: RemainAfterExit, GuessMainPID, and so on.
See also commit f87883039 and file sysv-generator.c line 197.
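For the drop-in route, a minimal sketch of what the extended override.conf might look like is below. It is untested against CrashPlan and assumes the generated unit is Type=forking with RemainAfterExit=yes and GuessMainPID=no (confirm with systemctl cat first). Whether systemd then guesses the right main PID depends on how the init script forks the daemon.

```ini
# /etc/systemd/system/crashplan.service.d/override.conf
[Service]
# Already present in the question's drop-in: restart even on a zero exit code.
Restart=always
# Assumption: the generated unit sets RemainAfterExit=yes; turn it off so the
# unit stops counting as 'active' once the daemon process is gone.
RemainAfterExit=no
# Assumption: the generated unit sets GuessMainPID=no; let systemd try to
# find the daemon's PID itself, since no PID file is declared.
GuessMainPID=yes
```

After editing, run systemctl daemon-reload and restart the service, then check with systemctl show crashplan.service | grep -E 'RemainAfterExit|GuessMainPID|MainPID' that the values took effect and that MainPID is non-zero. PID guessing can pick the wrong process when the script starts more than one, so a PID file or a native unit remains the more reliable fix.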