Sometimes you have to make sure that only one instance of a shell script is running at a time, for example for a cron job executed via a crond that does not provide locking on its own (e.g. the default Solaris crond).
A common pattern to implement locking is code like this:
#!/bin/sh
LOCK=/var/tmp/mylock
if [ -f "$LOCK" ]; then   # 'test' -> race begin
    echo "Job is already running!"
    exit 6
fi
touch "$LOCK"             # 'set' -> race end
# do some work
rm "$LOCK"
Of course, such code has a race condition: there is a time window in which two instances can both get past the test on line 3 before either of them touches the $LOCK file.
For a cron job this is usually not a problem because you have an interval of
minutes between two invocations.
But things can go wrong, for example when the lockfile is on an NFS server that hangs. In that case several cron jobs can block on line 3 and queue up. Once the NFS server is reachable again, you have a thundering herd of jobs running in parallel.
Searching the web I found the tool lockrun, which seems like a good solution to that problem. With it you run a script that needs locking like this:
$ lockrun --lockfile=/var/tmp/mylock myscript.sh
You can put this in a wrapper or use it from your crontab.
It uses lockf() (POSIX) if available and falls back to flock() (BSD). And lockf() support over NFS should be relatively widespread.
Are there alternatives to lockrun?
What about other cron daemons? Are there common cronds that support locking in a sane way? A quick look into the man page of Vixie cron (the default on Debian/Ubuntu systems) does not show anything about locking.
Would it be a good idea to include a tool like lockrun into coreutils? In my opinion it implements a theme very similar to timeout, nice and friends.
Best Answer
Here's another way to do locking in a shell script that can prevent the race condition you describe above, where two jobs may both pass line 3. The noclobber option will work in ksh and bash. Don't use set noclobber, because you shouldn't be scripting in csh/tcsh. ;) YMMV with locking on NFS (you know, when NFS servers are not reachable), but in general it's much more robust than it used to be (10 years ago).
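The idea can be sketched like this (the lockfile path and the trap handling are my own choices for illustration, not the original answer's exact code). With noclobber set, the `>` redirection fails if the file already exists, so the existence test and the lock creation become a single atomic step:

```shell
#!/bin/sh
# Hypothetical lockfile path; adjust to taste.
lockfile=/var/tmp/mylock

# In a subshell, enable noclobber and try to create the lockfile.
# The redirection fails if the file already exists, so there is no
# separate test-then-set window for a second instance to sneak into.
if ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null; then
    # Remove the lockfile on normal exit and on common signals.
    trap 'rm -f "$lockfile"; exit' INT TERM EXIT

    # ... do some work ...

    rm -f "$lockfile"
    trap - INT TERM EXIT
else
    echo "Job is already running!"
    exit 6
fi
```

The subshell matters: noclobber is only set for the lock attempt, so the rest of the script can still overwrite files normally.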
If you have cron jobs that do the same thing at the same time, from multiple servers, but you only need one instance to actually run, then something like this might work for you.
I have no experience with lockrun, but having a pre-set lock environment prior to the script actually running might help. Or it might not. You're just moving the test for the lockfile outside your script into a wrapper, and theoretically, couldn't you hit the same race condition if two jobs were called by lockrun at exactly the same time, just as with the 'inside-the-script' solution?
File locking is pretty much honor-system behavior anyway, and any script that doesn't check for the lockfile's existence before running will do whatever it's going to do. Just by putting in the lockfile test and proper behavior, you'll be solving 99% of potential problems, if not 100%.
If you run into lockfile race conditions a lot, it may be an indicator of a larger problem, like not having your jobs timed right, or, if the interval is less important than the job completing, perhaps your job is better suited to being daemonized.
EDIT BELOW - 2016-05-06 (if you're using KSH88)
Based on @Clint Pachl's comment below, if you use ksh88, use mkdir instead of noclobber. This mostly mitigates a potential race condition, but doesn't entirely eliminate it (though the risk is minuscule). For more information, read the link that Clint posted below.
And, as an added advantage, if you need to create tmpfiles in your script, you can use the lockdir directory for them, knowing they will be cleaned up when the script exits.
For more modern bash, the noclobber method at the top should be suitable.
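A sketch of the mkdir variant (the directory name and cleanup handling are my own assumptions; the original answer's exact code isn't reproduced here). mkdir either creates the directory or fails if it already exists, in one atomic operation, so it serves as both the test and the lock:

```shell
#!/bin/sh
# Hypothetical lock directory; adjust to taste.
lockdir=/var/tmp/mylock.d

# mkdir is atomic: it fails if the directory already exists, so the
# test and the lock creation cannot be interleaved by a second instance.
if mkdir "$lockdir" 2>/dev/null; then
    # Remove the lock directory (and anything in it) on exit.
    trap 'rm -rf "$lockdir"; exit' INT TERM EXIT

    # Tmpfiles can live inside the lock directory; they are cleaned
    # up together with the lock when the script exits.
    tmpfile="$lockdir/scratch.$$"

    # ... do some work with "$tmpfile" ...

    rm -rf "$lockdir"
    trap - INT TERM EXIT
else
    echo "Job is already running!"
    exit 6
fi
```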