The fix seems to have been adding the following line to my init.d script, just before the start-stop-daemon calls in do_start:
ulimit -r ## (where ## is a sufficiently high number; 99 works)
I worked this out by running ulimit -a from inside my code, wrapped in a bash command:
bash -c "ulimit -a"
The bash part is necessary because ulimit is a shell builtin, and ulimit -a under /bin/sh returns different information that does not include real-time priority. I found that my real-time priority was limited to 0 (no real-time priority) when my service was started at boot. When I start it with service or by calling the init.d script directly, it inherits the limits of my login session, which allow real-time priority. But when the system starts it through the Upstart/SystemV backwards-compatibility layer, it doesn't get that elevated limit.
I suppose this relates to posts I have seen saying that Upstart doesn't read /etc/security/limits.conf, which is where you would set system-wide real-time priority permissions for non-privileged users. That file is applied by pam_limits when a PAM session is opened, so a service started at boot never passes through it.
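On Linux, the same limit can also be read from /proc without relying on a shell builtin. This is only a minimal sketch, assuming /proc/<pid>/limits is available; rt_limits is a name made up for this example:

```shell
# Print the soft and hard "Max realtime priority" limits from a limits file.
# With no argument it reads /proc/self/limits, i.e. the limits inherited by
# the awk process started from this shell (Linux only).
rt_limits() {
    # Row layout: "Max realtime priority  <soft>  <hard>"
    awk '/^Max realtime priority/ { print "soft=" $4, "hard=" $5 }' \
        "${1:-/proc/self/limits}"
}

# Only attempt the live read where /proc exposes the file.
if [ -r /proc/self/limits ]; then
    rt_limits
fi
```

Pointing it at /proc/<pid>/limits for the running daemon (substituting the daemon's PID) should show what the service actually received at boot, as opposed to what an interactive shell gets.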
If anyone can verify or explain why this solution works, I would love to hear it.
The need for an After= or Before= can finally be seen in examples from archlinux (a remarkable source of help as usual). Based on that link, there are two solutions to running a command on suspend and resume.
One method is to use two units, say mysyssuspend and mysysresume. The following examples just run the date command, with its output sent to syslog, so we can see when they get activated:
/etc/systemd/system/mysyssuspend.service
[Unit]
Before=suspend.target
[Service]
Type=simple
StandardOutput=syslog
ExecStart=/bin/date +'mysyssuspend start %%H:%%M:%%S'
[Install]
WantedBy=suspend.target
/etc/systemd/system/mysysresume.service
[Unit]
After=suspend.target
[Service]
Type=simple
StandardOutput=syslog
ExecStart=/bin/date +'mysysresume start %%H:%%M:%%S'
[Install]
WantedBy=suspend.target
As usual, do a systemctl daemon-reload and systemctl enable mysyssuspend mysysresume after creating the unit files.
The first unit has a Before dependency on the suspend target and gets run when the computer enters suspend. The second unit similarly has an After dependency, and gets run on resuming.
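The doubled %% in the ExecStart lines is systemd's escape for a literal %, so date itself only ever sees single % signs. As a rough way to preview what the two units will log, the same call can be run from a shell; stamp is a hypothetical helper used only for this sketch:

```shell
# Shell equivalent of the ExecStart lines in the two units.
# systemd collapses each %% to one %, so this is the format date receives.
stamp() {
    date +"$1 start %H:%M:%S"
}

stamp mysyssuspend
stamp mysysresume
```

Each call prints the unit name followed by "start" and the current time as HH:MM:SS, which is the text to look for in the journal after a suspend and resume cycle.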
The other method puts all the commands in a single unit:
/etc/systemd/system/mysuspendresume.service
[Unit]
Before=sleep.target
StopWhenUnneeded=yes
[Service]
Type=oneshot
StandardOutput=syslog
RemainAfterExit=yes
ExecStart=/bin/date +'mysuspendresume start %%H:%%M:%%S'
ExecStop=/bin/date +'mysuspendresume stop %%H:%%M:%%S'
[Install]
WantedBy=sleep.target
This works with StopWhenUnneeded=yes, so the service is stopped when no active service requires it. The sleep target also has StopWhenUnneeded, so when it is finished it will run ExecStop of our unit.
RemainAfterExit is needed so that our unit is still seen as active, even after ExecStart has finished.
I tested both of these methods on Ubuntu 18.04.5 with systemd version 237 and they both seem to work correctly.
Rather than trying to merge your requirement into the above working mechanisms, it is probably more pragmatic to use one of them to stop/start an independent unit. For example, use the second method and add a mylongrun service:
/etc/systemd/system/mysuspendresume.service
[Unit]
Before=sleep.target
StopWhenUnneeded=yes
[Service]
Type=oneshot
StandardOutput=syslog
RemainAfterExit=yes
ExecStart=-/bin/date +'my1 %%H:%%M:%%S' ; /bin/systemctl stop mylongrun ; /bin/date +'my2 %%H:%%M:%%S'
ExecStop=-/bin/date +'my3 %%H:%%M:%%S' ; /bin/systemctl start mylongrun ; /bin/date +'my4 %%H:%%M:%%S'
[Install]
WantedBy=sleep.target
/etc/systemd/system/mylongrun.service
[Unit]
Description=Long Run
[Service]
Type=simple
StandardOutput=syslog
ExecStart=/bin/bash -c 'date +"my11 %%H:%%M:%%S"; while sleep 2; do date +"my12 %%H:%%M:%%S"; done'
ExecStop=/bin/bash -c 'date +"my13 %%H:%%M:%%S"; sleep 10; date +"my14 %%H:%%M:%%S"'
[Install]
WantedBy=multi-user.target
Testing this by starting mylongrun, then closing the lid, gives the following journalctl entries:
09:29:19 bash[3626]: my12 09:29:19
09:29:21 bash[3626]: my12 09:29:21
09:29:22 systemd-logind[803]: Lid closed.
09:29:22 systemd-logind[803]: Suspending...
09:29:22 date[3709]: my1 09:29:22
09:29:22 systemd[1]: Stopping Long Run...
09:29:22 bash[3715]: my13 09:29:22
09:29:23 bash[3626]: my12 09:29:23
09:29:25 bash[3626]: my12 09:29:25
09:29:27 bash[3626]: my12 09:29:27
09:29:29 bash[3626]: my12 09:29:29
09:29:31 bash[3626]: my12 09:29:31
09:29:32 bash[3715]: my14 09:29:32
09:29:32 systemd[1]: Stopped Long Run.
09:29:32 date[3729]: my2 09:29:32
09:29:32 systemd[1]: Reached target Sleep.
09:29:33 systemd[1]: Starting Suspend...
We can see that the long-running stop command (sleep 10) completed correctly before the system reached the sleep target. On resume, the long run command is started again:
09:35:12 systemd[1]: Stopped target Sleep.
09:35:12 systemd[1]: mysuspendresume.service: Unit not needed anymore. Stopping.
09:35:12 systemd[1]: Reached target Suspend.
09:35:12 date[3813]: my3 09:35:12
09:35:12 systemd[1]: Started Long Run.
09:35:12 date[3817]: my4 09:35:12
09:35:12 bash[3816]: my11 09:35:12
09:35:14 bash[3816]: my12 09:35:14
09:35:16 bash[3816]: my12 09:35:16
09:35:18 bash[3816]: my12 09:35:18
Best Answer
Upstart will consider the job stopped if the main process (what is run if the script or exec stanzas are specified) exits. Upstart will then run the post-start process.
So what is happening is that the first script runs and exits, Upstart considers the job stopped, and then the second script runs and exits. If you run the stop command on an already stopped job, it prints the message you saw.
To handle this, use a pre-start stanza:
If you do this, Upstart will see the job as started once the pre-start stanza finishes, and not as stopped.
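As a hedged illustration of that layout, the sketch below writes a job file that has no main process at all, only a pre-start stanza and, as an assumption about what the second script is for, a post-stop stanza. The script paths, the description, and the write_job helper are placeholders invented for this example, not taken from the question:

```shell
# Write a hypothetical Upstart job that has no main process.
# /path/to/first-script and /path/to/second-script are placeholders.
write_job() {
    cat > "$1" <<'EOF'
description "hypothetical job without a main process"

# Runs on "start"; the job counts as running once this finishes.
pre-start script
    /path/to/first-script
end script

# Assumed cleanup step: runs on "stop", after which the job is stopped.
post-stop script
    /path/to/second-script
end script
EOF
}

# Write to a temporary file and show the result; a real job would be
# installed as /etc/init/<jobname>.conf instead.
job_file=$(mktemp)
write_job "$job_file"
cat "$job_file"
```

Because there is no script or exec stanza, nothing exits to make Upstart mark the job as stopped, so a later stop command finds a running job to act on.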