Okay, quite simply I have a shell script that needs to wait for something to happen, but it has a lock-file and some child-processes that I need to ensure are tidied up if the script is interrupted.
I've achieved this without issue by using the trap command to set some appropriate actions, and have come up with a script that looks a bit like this:
#!/bin/sh
LOG="$0.log"
# Create a lock-file to prevent simultaneous access
lockfile -l 86400 "$LOG.lock" || { echo 'Locking failed' >&2; exit 3; }
# Create trap for interrupt and cleanup
on_complete() {
    echo "$(date +%R) Ended." >> "$LOG"
    kill $(jobs -p)
    rm -f "$LOG.lock"
    exit
}
trap 'on_complete 2> /dev/null' SIGTERM SIGINT SIGHUP EXIT
# Do nothing
echo $(date +%R)' Running…' >> "$LOG"
sleep 86400 &
while wait; do sleep 86400 & done
This can be run just fine in a terminal via sh Example.sh, and terminating it with Ctrl + C causes it to remove its lock-file without any fuss.
I then tried creating a launchd job for this script like so:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>org.example</string>
<key>ProgramArguments</key>
<array>
<string>sh</string>
<string>~/Downloads/Example.sh</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>EnableGlobbing</key>
<true/>
</dict>
</plist>
Creating Example.sh and Example.plist from the above in the ~/Downloads folder allows me to run the launchd job via launchctl load ~/Downloads/Example.plist and end it via launchctl unload ~/Downloads/Example.plist. However, ending the job doesn't cause a SIGTERM to reach the script, which is instead SIGKILL'd after the 20 second timeout.
So what I'd like to know is: why isn't my script receiving SIGTERM, and how can I ensure it does?
Best Answer
The ultimate problem here is that Bash does not normally kill its non-builtin children.
When you hit Ctrl + C you're killing the shell script, which behaves normally, but the sleep lives on. Use ps to see.
When you try to stop things externally, via kill, Bash behaves as above: the sleep is left alone. In addition, Bash does not run a trap while it is waiting for a foreground command; the trap only runs once that command has finished. After some time-out period (I'm guessing 20 seconds) launchd then issues a kill -9, which the script cannot trap.
The solution is to run the sleep in the background and issue a wait after it, to indicate to Bash that it can interrupt itself.
This will allow the script to be interrupted, but the sleep will still survive. I'm sure there's a way to kill the children, but I didn't bother looking it up...
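One possible way to kill the child as well, sketched under the same assumptions: remember the child's PID from $! and terminate it inside the trap handler. This tracks the PID explicitly rather than using jobs -p as the question does. The names cleanup, child_pid, cleaned_up and child_alive are illustrative, and the helper subshell again stands in for launchd:

```shell
#!/bin/sh
# Sketch: the trap handler kills and reaps the background child.

child_pid=
cleaned_up=no
cleanup() {
    if [ -n "$child_pid" ]; then
        kill "$child_pid" 2>/dev/null || true   # fine if it is already gone
        wait "$child_pid" 2>/dev/null || true   # reap it; non-zero status expected
    fi
    cleaned_up=yes
}
trap cleanup TERM

# Hypothetical stand-in for launchd: send SIGTERM to this shell after 1 second.
( sleep 1; kill -TERM $$ ) &

sleep 30 &
child_pid=$!
wait "$child_pid" || true    # interrupted by the trapped SIGTERM

# kill -0 sends no signal; it only checks whether the process still exists.
if kill -0 "$child_pid" 2>/dev/null; then
    child_alive=yes
else
    child_alive=no
fi
echo "cleaned up: $cleaned_up, child still alive: $child_alive"
```

In a real script the handler would also remove the lock-file and call exit, as on_complete does in the question.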