Why isn’t systemd restarting this service that has Restart=always?

systemd

I have a backup daemon running on my server that's crashing every few days. I'm not sure why. In the long run I'd like to figure out why and fix it, but in the meantime I'd like systemd to restart it when it crashes.

It has an old-style SysV init script, which is being picked up by systemd-sysv-generator. Apparently when it crashes it does so with a zero ("successful") exit code. To try to get it to restart after these crashes, I dropped in an override.conf:

~$ cat /etc/systemd/system/crashplan.service.d/override.conf
[Service]
Restart=always

systemd does appear to be picking this up:

roberts:~$ sudo systemctl show crashplan.service | grep Restart
Restart=always
RestartUSec=100ms

Nonetheless, when I checked on it after a few days, I found:

roberts:~$ sudo systemctl status crashplan.service
● crashplan.service - LSB: CrashPlan Engine
   Loaded: loaded (/etc/init.d/crashplan; bad; vendor preset: enabled)
  Drop-In: /etc/systemd/system/crashplan.service.d
           └─override.conf
   Active: active (exited) since Thu 2017-01-05 00:33:50 PST; 5 days ago
     Docs: man:systemd-sysv-generator(8)

Jan 05 00:33:50 roberts systemd[1]: Stopped LSB: CrashPlan Engine.
Jan 05 00:33:50 roberts systemd[1]: Starting LSB: CrashPlan Engine...
Jan 05 00:33:50 roberts crashplan[25491]: Starting CrashPlan Engine ... Using standard startup
Jan 05 00:33:50 roberts crashplan[25491]: OK
Jan 05 00:33:50 roberts systemd[1]: Started LSB: CrashPlan Engine.

So… systemd seems to think that it's not running and that's cool? There are no logs suggesting that it even tried to restart it? I can't even figure out how to tell when it crashed. What's going on here?

Best Answer

When the init.d script doesn't specify a PID file, its autogenerated unit has RemainAfterExit=yes. In most cases such scripts represent oneshot tasks which don't have a long-running process, so this option makes such units show up as 'active' even after the process exits.

This allows the admin to 'stop' such a unit manually (e.g. "starting" /etc/init.d/iptables would load firewall rules, and "stopping" it would flush them). However, because the unit stays 'active' even after its process exits, the restart logic never triggers. (After all, from systemd's point of view there is nothing to restart.)

The solution here would be to write a native systemd .service file for CrashPlan – or at least make the daemon produce a pidfile and add # pidfile: /run/... to the initscript accordingly.
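As a rough sketch of the first option, a native unit might look something like the one below. The install paths, the PID file location, and the RestartSec value are assumptions for illustration, not taken from the question; check where CrashPlan is actually installed and whether its launcher writes a PID file before using any of this.

```ini
# /etc/systemd/system/crashplan.service
# Hypothetical native unit; every path below is an assumed placeholder.
[Unit]
Description=CrashPlan Engine
After=network.target

[Service]
# The launcher script is assumed to fork the daemon into the background.
Type=forking
ExecStart=/usr/local/crashplan/bin/CrashPlanEngine start
ExecStop=/usr/local/crashplan/bin/CrashPlanEngine stop
# Assumed PID file location; lets systemd track the real daemon process.
PIDFile=/usr/local/crashplan/CrashPlanEngine.pid
# Restart on any exit, including a "successful" zero exit code.
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

A unit file in /etc/systemd/system takes precedence over the one produced by systemd-sysv-generator, so the init script can stay in place. Run systemctl daemon-reload after creating the file.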

Alternatively, first run systemctl cat crashplan.service to see the full contents of the generated unit, then manually undo the parameters that don't fit a long-running daemon: RemainAfterExit, GuessMainPID, and so on.
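For the override approach, the existing drop-in from the question could be extended along these lines. This is a minimal sketch that assumes the generated unit contains RemainAfterExit=yes and GuessMainPID=no; confirm that with systemctl cat before relying on it.

```ini
# /etc/systemd/system/crashplan.service.d/override.conf
[Service]
Restart=always
# Let the unit become inactive when the daemon exits, so Restart= can trigger.
RemainAfterExit=no
# No PID file is configured, so ask systemd to guess the main process.
GuessMainPID=yes
```

Run systemctl daemon-reload and restart the service afterwards. Main-PID guessing is not reliable when the init script leaves more than one process behind, so a PID file or a native unit is still the sturdier fix.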

See also commit f87883039 and file sysv-generator.c line 197.
