I have a cluster of servers with a shared disk holding a GFS (Global File System) file system that all nodes access simultaneously.
Each node in the cluster runs the same program (the core of it is a shell script).
The system processes files that appear in a couple of input directories, and it works like this:
- the program loops through the input directories.
- for each file found, check for a "lock file"; if one exists, skip to the next file.
- if no lock file is found, create one. If lock file creation fails (race lost), skip to the next file.
- if "we" own the lock, process the file and move it out of the way when it is finished.
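The steps above can be sketched roughly as follows. This is a minimal illustration rather than the actual script: the function names try_lock, process_dir and process_file, and the directory arguments, are placeholders invented for the example. The lock is left in place after the move, since nothing above says when it gets cleaned up.

```shell
#!/bin/sh
# Hypothetical names throughout: try_lock, process_dir, process_file.

# Placeholder for the real work done on each file.
process_file() {
    :
}

# Try to take the lock for one file; returns 0 only for the winner.
try_lock() {
    currlock=$1   # lock file path
    unid=$2       # unique session identifier
    # Cheap pre-check: skip if a lock file already exists.
    [ -e "$currlock" ] && return 1
    # Write a private temp file, then hard-link it to the lock name.
    date "+%s:${unid}" > "${currlock}.${unid}" || return 1
    ln "${currlock}.${unid}" "$currlock" 2>/dev/null
    lockrc=$?
    rm -f "${currlock}.${unid}"
    return "$lockrc"
}

# Loop over one input directory, processing every file we manage to lock.
process_dir() {
    in_dir=$1; lock_dir=$2; done_dir=$3; unid=$4
    for f in "$in_dir"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # Lock exists or race lost: skip to the next file.
        try_lock "$lock_dir/lock.$name" "$unid" || continue
        process_file "$f"
        mv "$f" "$done_dir/"   # move it out of the way when finished
    done
}
```

A node would call something like process_dir /gfs/in /gfs/tmp /gfs/done "$unid" once per input directory on each pass; those paths are only examples.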
This all works very well, but I wonder if there are cheaper (less complex) solutions that would also work. I'm thinking NFS or SMB perhaps.
There are two reasons for my use of GFS:
- each file is stored in one place only (on redundant underlying hardware of course)
- file locking works reliably
I create the lockfile like this:
date '+%s:'${unid} > ${currlock}.${unid}
ln ${currlock}.${unid} ${currlock}
lockrc=$?
rm -f ${currlock}.${unid}
where $unid is a unique session identifier and $currlock is /gfs/tmp/lock.${file_to_process}
The beauty of ln is that it is atomic, so it fails for all but one of the processes that attempt the same thing at the same time.
So, I guess what I'm asking is: will NFS meet my needs? Does ln work as reliably on NFS as it does on GFS?
Best Answer
The link() system call on the NFS client should map directly to the NFS LINK operation, which the server should implement using its link() system call. So as long as link() is atomic on the server, it will also be atomic on the clients.
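One caveat worth hedging against: on NFS the exit status of ln may not always tell the whole story, for example if the server creates the link but the reply never reaches the client. The Linux open(2) manual page suggests checking the link count of the unique file as well: if it has risen to 2, the lock was obtained even though link() reported an error. A minimal sketch of that check, assuming ls -ld prints the link count in its second column (the helper name try_lock_nfs is made up for this example):

```shell
#!/bin/sh
# Hypothetical helper: try_lock_nfs LOCKFILE UNID
# Returns 0 if the lock was obtained, non-zero otherwise.
try_lock_nfs() {
    currlock=$1
    uniq_file="${currlock}.$2"
    # Create the unique file that will be hard-linked to the lock name.
    date "+%s:$2" > "$uniq_file" || return 1
    # Attempt the link; the exit status is deliberately ignored here.
    ln "$uniq_file" "$currlock" 2>/dev/null
    # A link count of 2 on our own file means our link exists.
    links=$(ls -ld "$uniq_file" | awk '{print $2}')
    rm -f "$uniq_file"
    [ "$links" -eq 2 ]
}
```

It would be used in place of the plain ln test, for example try_lock_nfs "$currlock" "$unid" || continue inside the loop over input files.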