Shell – Handling of stale file locks in Linux and robust usage of flock

flock, lock, shell-script

I have a script I execute via cron regularly (every few minutes). However, the script must not run multiple times in parallel, and it sometimes runs a bit longer, so I wanted to implement some locking, i.e. make sure the script terminates early if a previous instance is already running.

Based on various recommendations I have a locking that looks like this:

# Create/open the lock file and keep it open on a freshly allocated fd.
lock="/run/$(basename "$0").lock"
exec {fd}<>"$lock"
# Try to take an exclusive lock without blocking; exit if it is already held.
flock -n "$fd" || exit 1

This should trigger the exit 1 in case another instance of the script is still running.

Now here's the problem: sometimes a stale lock survives even though the script has already terminated. This effectively means the cron job is never executed again (until the next reboot, or until the lock file is deleted), which of course is not what I want.

I figured out there's the lslocks command that lists existing file locks. It shows this:

(unknown)        2732 FLOCK        WRITE 0     0   0 /run...                                                                 

The process (2732 in this case) no longer exists (e.g. in ps aux). It is also unclear to me why it doesn't show the full filename (i.e. only /run…). lslocks has a --notruncate parameter, which sounded as if it might avoid truncating filenames, but it does not change the output; it's still /run…

So I have multiple questions:

  • Why are these locks there and what situation causes a lock from flock to exist beyond the lifetime of the process?
  • Why does lslocks not show the full path/filename?
  • What is a good way to avoid this and make the locking in the script more robust?
  • Is there some way to clean up stale locks without a reboot?

Best Answer

An flock lock is associated with an open file description; it goes away only once all file descriptors referring to that file description have been closed (see the flock(2) manpage).
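A minimal sketch of this behaviour (the lock path and the sleep are just illustrative): any background child inherits the open fd, so the lock stays held even after the parent script has exited.

```shell
#!/bin/bash
# Demo: an flock lock lives as long as ANY fd referring to the open
# file description exists - including fds inherited by child processes.
lock=/tmp/flock-demo.lock   # example path

exec {fd}<>"$lock"
flock -n "$fd" || exit 1

# The background child inherits $fd; the lock therefore remains held
# until the child exits, even though this parent returns immediately.
sleep 60 &

exit 0   # lock is NOT released here - the sleep still holds the fd
```

After this script returns, another `flock -n /tmp/flock-demo.lock true` keeps failing until the background sleep terminates, which is exactly the "stale lock with no matching process name" symptom described in the question.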

If the file is still locked, then the file descriptor is almost certainly still referenced from either the original process or a child process (assuming that you haven't used things like file descriptor passing to propagate a reference to it outside the original process hierarchy).

I would recommend checking sudo fuser $lock_path.

To work around this issue, there are two methods I know of: either prevent the shell from letting child processes inherit the file descriptor, or kill all the processes still referencing it, e.g. using fuser -k ....
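A sketch of the first method, assuming bash: the `{fd}>&-` redirection closes the descriptor for that one command, so a background child started this way cannot keep the lock alive (`long_task` is a placeholder name, not something from the question).

```shell
#!/bin/bash
lock="/run/$(basename "$0").lock"

exec {fd}<>"$lock"
flock -n "$fd" || exit 1

# Close the lock fd for this child only, so it does not inherit it and
# cannot hold the lock after the script exits ("long_task" stands for
# whatever you run in the background):
long_task {fd}>&- &

# Alternatively, hand the whole job to flock(1), which releases the
# lock as soon as the wrapped command exits:
#   exec flock -n "$lock" /path/to/real-work
```

With the fd closed in the child, the lock disappears the moment the parent's descriptor is closed, regardless of how long the background job runs.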

The path you are seeing is incomplete because lslocks uses /proc/locks to gather information; this file contains an identifier for the mountpoint and information on the process that acquired the lock, but not the path to the locked file. If lslocks can't find the file descriptor holding the lock while inspecting that process, it falls back to only printing the mount point.
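You can look at /proc/locks yourself; each line identifies the locked file only by device and inode numbers, never by path, which is why a tool reading it has to reconstruct the path separately. A rough way to map an inode back to a file (the inode number below is purely hypothetical):

```shell
# Each /proc/locks line shows lock type, owning PID, and
# major:minor:inode of the locked file - but no path.
cat /proc/locks

# Map an inode back to a path by searching the containing filesystem
# (1234 is a made-up inode number; this scan can be slow):
sudo find /run -xdev -inum 1234
```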
