Sometimes you have to make sure that only one instance of a shell script is running at a time, for example for a cron job executed via a crond that does not provide locking on its own (e.g. the default Solaris crond).
A common pattern to implement locking is code like this:
#!/bin/sh
LOCK=/var/tmp/mylock
if [ -f "$LOCK" ]; then   # 'test' -> race begin
    echo "Job is already running!"
    exit 6
fi
touch "$LOCK"             # 'set' -> race end
# do some work
rm "$LOCK"
Of course, such code has a race condition: there is a time window in which two instances can both get past the test on line 3 before either of them touches the $LOCK file.
For a cron job this is usually not a problem because you have an interval of
minutes between two invocations.
But things can go wrong, for example when the lockfile is on an NFS server that hangs. In that case several cron jobs can block on line 3 and queue up. Once the NFS server is reachable again, you have a thundering herd of jobs running in parallel.
Searching the web I found the tool lockrun, which seems like a good solution to that problem. With it you run a script that needs locking like this:
$ lockrun --lockfile=/var/tmp/mylock myscript.sh
You can put this in a wrapper or use it from your crontab.
It uses lockf() (POSIX) if available and falls back to flock() (BSD). And lockf() support over NFS should be relatively widespread.
Are there alternatives to lockrun?
What about other cron daemons? Are there common cronds that support locking in a sane way? A quick look into the man page of Vixie cron (the default on Debian/Ubuntu systems) does not show anything about locking.
Would it be a good idea to include a tool like lockrun into coreutils? In my opinion it implements a theme very similar to timeout, nice and friends.
Best Answer
Here's another way to do locking in a shell script that can prevent the race condition you describe above, where two jobs may both pass line 3. The noclobber option will work in ksh and bash. Don't use set noclobber, because you shouldn't be scripting in csh/tcsh. ;) YMMV with locking on NFS (you know, when NFS servers are not reachable), but in general it's much more robust than it used to be (10 years ago).
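The idea can be sketched like this (the lockfile path and the trap handling are my own choices for illustration, not the original answer's exact code). With noclobber set, the `>` redirection fails if the file already exists, so the existence test and the lock creation become a single atomic step:

```shell
#!/bin/sh
# Hypothetical lockfile path; adjust to taste.
lockfile=/var/tmp/mylock

# In a subshell, enable noclobber and try to create the lockfile.
# The redirection fails if the file already exists, so there is no
# separate test-then-set window for a second instance to sneak into.
if ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null; then
    # Remove the lockfile on normal exit and on common signals.
    trap 'rm -f "$lockfile"; exit' INT TERM EXIT

    # ... do some work ...

    rm -f "$lockfile"
    trap - INT TERM EXIT
else
    echo "Job is already running!"
    exit 6
fi
```

The subshell matters: noclobber is only set for the lock attempt, so the rest of the script can still overwrite files normally.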
If you have cron jobs that do the same thing at the same time, from multiple servers, but you only need one instance to actually run, then something like this might work for you.
I have no experience with lockrun, but having a pre-set lock environment prior to the script actually running might help. Or it might not. You're just moving the test for the lockfile outside your script into a wrapper, and theoretically, couldn't you hit the same race condition if two jobs were called by lockrun at exactly the same time, just as with the 'inside-the-script' solution?
File locking is pretty much honor-system behavior anyway, and any script that doesn't check for the lockfile's existence before running will do whatever it's going to do. Just by putting in the lockfile test and proper behavior, you'll be solving 99% of potential problems, if not 100%.
If you run into lockfile race conditions a lot, it may be an indicator of a larger problem, like not having your jobs timed right, or, if the interval is less important than the job completing, perhaps your job is better suited to being daemonized.
EDIT BELOW - 2016-05-06 (if you're using KSH88)
Based on @Clint Pachl's comment below, if you use ksh88, use mkdir instead of noclobber. This mostly mitigates a potential race condition, but doesn't entirely eliminate it (though the risk is minuscule). For more information, read the link that Clint posted below.
And, as an added advantage, if you need to create tmpfiles in your script, you can use the lockdir directory for them, knowing they will be cleaned up when the script exits.
For more modern bash, the noclobber method at the top should be suitable.
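A sketch of the mkdir variant (the directory name and cleanup handling are my own assumptions; the original answer's exact code isn't reproduced here). mkdir either creates the directory or fails if it already exists, in one atomic operation, so it serves as both the test and the lock:

```shell
#!/bin/sh
# Hypothetical lock directory; adjust to taste.
lockdir=/var/tmp/mylock.d

# mkdir is atomic: it fails if the directory already exists, so the
# test and the lock creation cannot be interleaved by a second instance.
if mkdir "$lockdir" 2>/dev/null; then
    # Remove the lock directory (and anything in it) on exit.
    trap 'rm -rf "$lockdir"; exit' INT TERM EXIT

    # Tmpfiles can live inside the lock directory; they are cleaned
    # up together with the lock when the script exits.
    tmpfile="$lockdir/scratch.$$"

    # ... do some work with "$tmpfile" ...

    rm -rf "$lockdir"
    trap - INT TERM EXIT
else
    echo "Job is already running!"
    exit 6
fi
```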