Will an `unlink` or `rename` portably and atomically make a `link` fail?

Tags: concurrency, files, portability, rename

Question

Suppose I have some non-directory (file, named pipe/socket, whatever) at the pathname /tmp/foo and some other non-directory at the pathname /tmp/bar. Then two (or more) processes start executing concurrently:

Process one does:

unlink("/tmp/foo"); /* or rename("/tmp/foo", "/tmp/removed"); */
unlink("/tmp/bar"); /* or rename("/tmp/bar", "/tmp/removed"); */

Process two (and so on) does:

link("/tmp/foo", "/tmp/bar");

As I understand it, there is no way process two could possibly succeed: either link(2) is attempted while /tmp/foo is still present, in which case /tmp/bar is also present, so it must fail with EEXIST; or /tmp/foo is gone, so it must fail with ENOENT.
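The two outcomes in this reasoning can be replayed one after the other (not raced) with a minimal sketch, assuming a POSIX system. It works in a fresh directory from mkdtemp rather than on /tmp/foo and /tmp/bar, and the helper names link_errno, touch_new and check_both_cases are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: return 0 if link(2) succeeds, else the errno it set. */
static int link_errno(const char *oldpath, const char *newpath)
{
    return link(oldpath, newpath) == 0 ? 0 : errno;
}

/* Create an empty regular file that must not already exist. */
static int touch_new(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    return fd < 0 ? -1 : close(fd);
}

/* Replay what process two can observe, sequentially: first with both
   foo and bar present, then after process one has removed foo. */
int check_both_cases(int *err_both_present, int *err_foo_gone)
{
    char dir[] = "/tmp/linkdemoXXXXXX";
    char foo[64], bar[64];

    assert(err_both_present != NULL && err_foo_gone != NULL);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(foo, sizeof foo, "%s/foo", dir);
    snprintf(bar, sizeof bar, "%s/bar", dir);
    if (touch_new(foo) != 0 || touch_new(bar) != 0)
        return -1;

    *err_both_present = link_errno(foo, bar); /* bar exists: expect EEXIST */
    unlink(foo);                              /* process one's first step */
    *err_foo_gone = link_errno(foo, bar);     /* foo gone: expect ENOENT */

    unlink(bar);
    rmdir(dir);
    return 0;
}
```

This only confirms the two end states the reasoning relies on; whether some interleaving slips between them is exactly the question being asked.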

But this intuition relies on the assumption that the unlink(2) and/or rename(2) system calls are inherently sequential in their unlinking effects, so I am looking for verification of my understanding: is there any *nix-like system out there whose kernel allows the two unlink(2) and/or rename(2) calls to succeed, but simultaneously lets link(2) succeed as well (whether by reordering the unlinking of /tmp/foo and /tmp/bar without hiding that from the process calling link(2), or through some other quirky race condition or bug)?

Current Understanding

I have read the man pages for unlink(2), rename(2), and link(2) for Linux and a few BSDs, and the POSIX specification for these functions. But on careful reading, I don't think they contain anything reassuring on this matter. At least with rename(2), we're promised that the destination is atomically replaced if it's already present (bugs in the OS itself aside), but nothing else.

I have seen claims that multiple simultaneous executions of rename(foo, qux) will atomically and portably have all but one rename fail with ENOENT – so that's promising! I am just uncertain if that can be extended to having a link(foo, bar) fail with ENOENT under the same circumstances as well.
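That rename claim can be exercised, though not proved, with a sketch like the following, assuming a POSIX system with fork(2) and a local filesystem under /tmp. The children are forked one after another, so they only run roughly concurrently; count_rename_winners is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork `nproc` children that all call rename(foo, qux); return how many
   succeeded and store in *enoent_losers how many failed with ENOENT.
   Returns -1 on a setup error. */
int count_rename_winners(int nproc, int *enoent_losers)
{
    char dir[] = "/tmp/renamedemoXXXXXX";
    char foo[64], qux[64];
    int winners = 0, status;

    assert(nproc > 0 && enoent_losers != NULL);
    *enoent_losers = 0;
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(foo, sizeof foo, "%s/foo", dir);
    snprintf(qux, sizeof qux, "%s/qux", dir);
    int fd = open(foo, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) /* child exit code: 0 = won, 1 = ENOENT, 2 = other */
            _exit(rename(foo, qux) == 0 ? 0 : (errno == ENOENT ? 1 : 2));
    }

    while (wait(&status) > 0) { /* reap every child */
        if (!WIFEXITED(status))
            continue;
        if (WEXITSTATUS(status) == 0)
            winners++;
        else if (WEXITSTATUS(status) == 1)
            (*enoent_losers)++;
    }

    unlink(qux);
    rmdir(dir);
    return winners;
}
```

On a kernel behaving as claimed, count_rename_winners(8, &losers) returns 1 with losers set to 7.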

Preferred Answers

I realize that this is one of those "can't prove a negative" situations – we can at best only note that there is no evidence that a *nix-like system which will allow process two's link(2) to succeed exists.

So what I'm looking for is answers covering as many *nix-like systems as possible (at least Linux, OS X, and the various BSDs, but ideally also proprietary systems still in some use, such as Solaris 10), from people familiar enough with these systems and with this narrow set of problems (atomic, well-ordered file system operations) to be confident, as much as one realistically can be, that they would know of issues like the Mac OS X bug where rename(2) was not actually atomic, if such issues existed on the platforms they know. That would give me enough confidence that this works the way I think it does, in a portable-enough manner to rely on.

Final Note

This isn't an "X/Y problem" question – there's no underlying problem that can be answered by referring me to the various locking/IPC mechanisms or something else that works around the uncertainty about how these particular system calls interact: I specifically want to know if one can rely on the above system calls portably interacting as expected across *nix-like systems in practical use today.

Best Answer

Look at standards such as POSIX for portability guarantees. In practice, most POSIX-compliant systems have minor deviations from the specifications, but generally speaking you can rely on the guarantees given in the specification. Most modern unices comply with the specification even if they haven't been formally tested. They may need to be run in a POSIX mode, e.g. setting POSIXLY_CORRECT=1 with bash or making sure that /usr/xpg4/bin is ahead of /bin and /usr/bin in PATH on Solaris.

Single Unix v2 (an older extension of POSIX) has this to say about link:

The link() function will atomically create a new link for the existing file and the link count of the file is incremented by one.
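The observable half of that sentence, the link count going from one to two, can be checked with stat(2); the atomicity itself cannot be seen this way. A minimal sketch assuming a POSIX system, with hypothetical helper names nlink_of and link_count_demo:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the link count of `path`, or -1 if stat(2) fails. */
static long nlink_of(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_nlink : -1;
}

/* Create a file, hard-link it, and report the link count before and after. */
int link_count_demo(long *before, long *after)
{
    char dir[] = "/tmp/nlinkdemoXXXXXX";
    char a[64], b[64];

    assert(before != NULL && after != NULL);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);
    int fd = open(a, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    *before = nlink_of(a);   /* one name: expect 1 */
    if (link(a, b) != 0)
        return -1;
    *after = nlink_of(a);    /* two names for the same inode: expect 2 */

    unlink(b);
    unlink(a);
    rmdir(dir);
    return 0;
}
```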

About rename:

If the link named by the new argument exists, it is removed and old renamed to new. In this case, a link named new will remain visible to other processes throughout the renaming operation and will refer either to the file referred to by new or old before the operation began.

POSIX explicitly states that if the destination exists, its replacement must be atomic. It does not, however, state that the renaming itself must be atomic, i.e. that there is no point in time when both old and new refer to the file in question, or when neither does. In practice, those properties hold on Unix systems, at least with local filesystems.
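A sketch of the end state POSIX guarantees when the destination already exists: afterwards old is gone (ENOENT) and new holds what old held. It shows only the before and after states, not the atomicity of the switch, and assumes a POSIX system; write_file and rename_over_existing are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write `text` to `path`, creating or truncating it; 0 on success. */
static int write_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    ssize_t len = (ssize_t)strlen(text);
    ssize_t n = write(fd, text, (size_t)len);
    close(fd);
    return n == len ? 0 : -1;
}

/* Rename "old" over an existing "new", then record the errno from
   opening "old" and copy the contents now found at "new" into buf. */
int rename_over_existing(char *buf, size_t bufsize, int *old_errno)
{
    char dir[] = "/tmp/renameoverXXXXXX";
    char oldp[64], newp[64];

    assert(buf != NULL && bufsize > 0 && old_errno != NULL);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(oldp, sizeof oldp, "%s/old", dir);
    snprintf(newp, sizeof newp, "%s/new", dir);
    if (write_file(oldp, "old") != 0 || write_file(newp, "new") != 0)
        return -1;

    if (rename(oldp, newp) != 0) /* replaces the existing "new" */
        return -1;

    *old_errno = open(oldp, O_RDONLY) < 0 ? errno : 0; /* expect ENOENT */
    int fd = open(newp, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, bufsize - 1);
    close(fd);
    buf[n > 0 ? n : 0] = '\0';

    unlink(newp);
    rmdir(dir);
    return 0;
}
```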

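Furthermore, the order of operations is guaranteed: in C, statements separated by `;` are executed in sequence, so the first unlink call has returned before the second one starts; in sh, commands separated by `;` or a newline run sequentially (as do commands joined by `&&`, and so on); other programming languages offer similar guarantees. So in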

unlink("/tmp/foo");
unlink("/tmp/bar");

it is guaranteed that there is no point in time when /tmp/foo exists but not /tmp/bar (assuming that /tmp/bar exists initially). Therefore a concurrent process executing link("/tmp/foo", "/tmp/bar") cannot succeed.
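The conclusion can be exercised, not proved, with a small stress test, assuming a POSIX system with fork(2) and a local filesystem under /tmp. Each round recreates foo and bar, forks a child playing process two that retries link(foo, bar) while it fails with EEXIST, and has the parent play process one. The name link_race_successes is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int create_empty(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    return fd < 0 ? -1 : close(fd);
}

/* Run `trials` rounds of the race and return how many times the child's
   link(foo, bar) succeeded (-1 on a setup error). */
int link_race_successes(int trials)
{
    char dir[] = "/tmp/linkraceXXXXXX";
    char foo[64], bar[64];
    int successes = 0;

    assert(trials > 0);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(foo, sizeof foo, "%s/foo", dir);
    snprintf(bar, sizeof bar, "%s/bar", dir);

    for (int t = 0; t < trials; t++) {
        if (create_empty(foo) != 0 || create_empty(bar) != 0)
            return -1;
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {                      /* process two */
            for (;;) {
                if (link(foo, bar) == 0)
                    _exit(1);                /* would contradict the answer */
                if (errno != EEXIST)
                    _exit(0);                /* ENOENT: foo is already gone */
            }
        }
        unlink(foo);                         /* process one, in order */
        unlink(bar);
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)
            && WEXITSTATUS(status) == 1) {
            successes++;
            unlink(bar);                     /* remove the unexpected link */
        }
    }
    rmdir(dir);
    return successes;
}
```

On a system behaving as described above, link_race_successes(100) returns 0. A zero result is only evidence for the particular kernel and filesystem it ran on; it says nothing about, for example, NFS shared between clients.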

Note that atomicity does not guarantee resilience. Atomicity is about observable behavior on a live system. Resilience, in the context of filesystems, is about what happens in case of a system crash. Many filesystems sacrifice resilience for performance, so if the execution of unlink("foo"); unlink("bar"); is interrupted (with the current directory on on-disk storage), it is possible that bar will be deleted and foo will remain behind.
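If the on-disk order matters, one common technique is to fsync the directory after each unlink, so the first removal reaches stable storage before the second one starts. This is a sketch under assumptions: it relies on fsync on a directory file descriptor flushing the directory's entries, which the Linux fsync(2) man page recommends for persisting directory changes but which POSIX spells out less clearly, so treat it as an assumption elsewhere. The names unlink_durably and remove_in_order are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Remove `name` from the directory open as `dirfd`, then fsync the
   directory so the removal is pushed to stable storage before returning.
   Returns 0 on success, -1 with errno set on failure. */
static int unlink_durably(int dirfd, const char *name)
{
    if (unlinkat(dirfd, name, 0) != 0)
        return -1;
    return fsync(dirfd);
}

/* Remove foo, then bar, inside `dirpath`. If each fsync is honoured, a
   crash can no longer leave bar deleted while foo survives. */
int remove_in_order(const char *dirpath)
{
    assert(dirpath != NULL);
    int dirfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return -1;
    int rc = unlink_durably(dirfd, "foo");
    if (rc == 0)
        rc = unlink_durably(dirfd, "bar");
    close(dirfd);
    return rc;
}
```

This trades performance for ordering on every call, which is the same trade-off a filesystem makes when it favours resilience.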

Some network filesystems give fewer guarantees when operations happen on different clients. Older implementations of NFS were notorious for this. I think modern implementations are better but I have no experience of modern NFS.

Related Solutions

What’s the use of the dirsync option for mount?

sync does everything dirsync does, plus more. Unfortunately this 'more' is a significant performance penalty. With sync enabled, all disk I/O is immediately written to disk. With dirsync, only directory operations are immediately written.

The only situation I've seen where one might want dirsync instead of sync is with network filesystems. When multiple boxes are working in a shared directory, they might try to create the file at around the same time. One box will create the file, but without dirsync on, the file isn't visible to other boxes yet. With dirsync on, the file will show up immediately, so the other servers at least know it exists and can now perform file locking on it.

Bash – Batch rename image files by age plus add date and variable to filename

Most unices don't track a file's creation date¹. “Creation date” is ill-defined anyway (does copying a file create a new file?). You can use the file's modification time, which is by a reasonable interpretation the date at which the latest version of the data was created. If you make copies of the file, make sure to retain the modification time (e.g. cp -p or cp -a if you use the cp command, not bare cp).

A few file formats have a field inside the file where the creator application fills in a creation date. This is often the case for photos, where the camera will fill in some Exif data in JPEG or TIFF images, including the creation time. Nikon's NEF image format wraps around TIFF and supports Exif as well.

There are ready-made tools to rename image files containing Exif data to include the creation date in the file name. The question “renaming images to include creation date in name” shows two solutions, with exiftool and exiv2.

I don't think either tool lets you include a counter in the file name. You can do your renaming in two passes: first include the date (with as high resolution as possible to retain the order) in the file name, then number the files according to that date part (and chuck away the time). Since modern DSLRs can fire bursts of images (Nikon's D4s shoots at 11fps) it is advisable to retain the original filename as well in the first phase, as otherwise it would potentially lead to several files with the same file name.

exiv2 mv -r %Y%m%d-%H%M%S_:basename: *.NEF
# exiv2 uses `strftime(3)`, so `%Y%m%d-%H%M%S` yields YYYYMMDD-hhmmss
# :basename: is a placeholder that exiv2's -r option expands to the original file name without extension. See `exiv2 -h` for more
# Now you have files with names like 20140630-235958_DSCC1234.NEF.
# Note that chronological order and lexicographic order agree with this naming format.
i=10000
for x in *.NEF; do
  i=$((i+1))
  mv "$x" "${x%-*}_FOO_${i#1}.NEF"
done

${x%-*} removes the last - character and everything after it, leaving only the date. The counter variable i starts at 10000 and is incremented before each use, and its leading 1 digit is stripped, so the first file gets 0001; this is a trick to get leading zeroes so that all counter values have the same number of digits.
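For comparison, the same name transformation sketched in C with a hypothetical helper counter_name: strrchr plus a precision-limited %.*s reproduces ${x%-*}, and %04d produces the leading zeroes directly instead of the 10000-offset trick.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<date>_FOO_<NNNN>.NEF" from a name such as
   "20140630-235958_DSCC1234.NEF", mirroring ${x%-*}_FOO_${i#1}.NEF.
   Returns 0 on success, -1 if buf is too small or the name has no '-'
   (where the shell would instead leave the name unchanged). */
int counter_name(char *buf, size_t size, const char *name, int counter)
{
    const char *dash = strrchr(name, '-'); /* like ${x%-*}: cut at last '-' */

    assert(counter >= 0 && counter <= 9999);
    if (dash == NULL)
        return -1;
    int n = snprintf(buf, size, "%.*s_FOO_%04d.NEF",
                     (int)(dash - name), name, counter);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the example name from the script and counter 1, this produces 20140630_FOO_0001.NEF.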

The question “Rename files by incrementing a number within the filename” has other solutions for renaming a bunch of files to include a counter.

If you want to use a file's timestamp rather than Exif data, see “Renaming a bunch of files with date modified timestamp at the end of the filename?”.

As a general note, don't generate shell code and then pipe it into a shell. It's needlessly convoluted. For example, instead of

find -name '*.NEF' | 
gawk 'BEGIN{ a=1 }{ printf "mv %s %04d.NEF\n", $0, a++ }' | 
bash

you can write

find -name '*.NEF' | 
gawk 'BEGIN{ a=1 }{ system(sprintf("mv %s %04d.NEF\n", $0, a++)) }'

Note that both versions could lead to catastrophic results if a file name contained shell special characters (such as spaces, ', $, `, etc.), since the file name is interpreted as shell code. There are ways to turn this into robust code, but this isn't the easiest approach, so I won't pursue it.
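One robust alternative (not necessarily what the author had in mind) is to avoid a shell entirely and hand the names to rename() as plain strings. A minimal sketch in C with a hypothetical rename_numbered helper; like mv, it silently replaces an existing destination.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Rename dir/oldname to dir/NNNN.NEF without involving a shell: the name
   is passed to rename() as an ordinary string, so spaces, quotes, '$' and
   backquotes in it are never interpreted. Returns 0 on success, -1 on error. */
int rename_numbered(const char *dir, const char *oldname, int counter)
{
    char oldpath[4096], newpath[4096];
    int n;

    assert(counter >= 0 && counter <= 9999);
    n = snprintf(oldpath, sizeof oldpath, "%s/%s", dir, oldname);
    if (n < 0 || (size_t)n >= sizeof oldpath)
        return -1; /* path would be truncated */
    n = snprintf(newpath, sizeof newpath, "%s/%04d.NEF", dir, counter);
    if (n < 0 || (size_t)n >= sizeof newpath)
        return -1;
    return rename(oldpath, newpath) == 0 ? 0 : -1;
}
```

A file called it's a $HOME `x`.NEF is renamed to 0001.NEF like any other, because no shell ever sees the name.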

¹ Note that there is something called the “ctime”, but the c isn't for creation, it's for change. The ctime changes every time anything changes about the file, either in its content or in its metadata (name, permissions, …). The ctime is pretty much the antithesis of a creation time.