Atomically write a file without changing inodes (preserve hard link)

Tags: hard-link, inode, rename, system-calls, write

The normal way to safely, atomically write a file X on Unix is:

1. Write the new file contents to a temporary file Y.
2. rename(2) Y to X.

In two steps it appears that we have done nothing but change X "in-place".

It is protected against race conditions and unintentional data loss (where X is destroyed but Y is incomplete or destroyed).

The drawback (in this case) of this is that it doesn't write the inode referred to by X in-place; rename(2) makes X refer to a new inode number.

If X was a file with link count > 1 (an explicit hard link), it no longer refers to the same inode as before: the hard link is broken.
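To make the drawback concrete, here is a minimal Python sketch of the write-then-rename pattern, followed by a check of what happens to a second hard link. The helper names atomic_replace and demo_broken_link and the file names X and X-alias are made up for illustration; os.replace is Python's wrapper around rename(2) on POSIX systems.

```python
import contextlib
import os
import tempfile


def atomic_replace(path, data):
    """Write data to a temporary file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the contents to disk before renaming
        os.replace(tmp_path, path)  # rename(2): the name switches atomically
    except BaseException:
        # On failure, drop the temporary file; the original is untouched.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise


def demo_broken_link(workdir):
    """Show that the rename leaves other hard links on the old inode."""
    x = os.path.join(workdir, "X")
    alias = os.path.join(workdir, "X-alias")  # hypothetical second name
    with open(x, "wb") as f:
        f.write(b"old")
    os.link(x, alias)  # X and X-alias now share one inode
    inode_before = os.stat(x).st_ino

    atomic_replace(x, b"new")

    with open(x, "rb") as f:
        x_data = f.read()
    with open(alias, "rb") as f:
        alias_data = f.read()
    return {
        "same_inode": os.stat(x).st_ino == inode_before,
        "x_data": x_data,
        "alias_data": alias_data,
    }
```

After the call, X holds the new contents under a new inode, while X-alias still names the old inode with the old contents. Note that mkstemp creates the temporary file with mode 0600, so a real implementation would probably also want to copy the original's permissions and ownership.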

The obvious way to eliminate the drawback is to write the file in-place, but this is not atomic, can fail, might result in data loss etc.

Is there some way to do it atomically like rename(2) but preserve hard links?

Perhaps by changing the inode number of Y (the temporary file) to that of X, and giving it X's name? An inode-level "rename."

This would effectively write the inode referred to by X with Y's new contents, but would not break its hard-link property, and would keep the old name.

If the hypothetical inode "rename" was atomic, then I think this would be atomic and protected against data loss / races.

Best Answer

The issue

You have a (mostly) exhaustive list of system calls here.

You will notice that there is no "replace the content of this inode" call. Modifying that content always implies:

1. Opening the file to get a file descriptor.
2. (Optional) Seeking to the desired write offset.
3. Writing to the file.
4. (Optional) Truncating old data, if the new data is smaller.

Step 4 can be done earlier. There are some shortcuts as well, such as pwrite, which writes directly at a specified offset, combining steps 2 and 3, or gather writes with writev.

An alternate way is to use a memory mapping, but it gets worse as every byte written may be sent to the underlying file independently (conceptually as if every write was a 1-byte write call).

→ The point is the very best scenario you can have is still 2 operations: one write and one truncate.

Whatever order you perform them in, another process can still interfere with the file in between, leaving you with a corrupted file.
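The write-then-truncate sequence can be sketched in Python as follows. The function name overwrite_in_place and its between hook are made up for illustration; the hook exists only so that the intermediate state can be observed from the outside.

```python
import os


def overwrite_in_place(path, data, between=None):
    """Replace the contents of path while keeping its inode. NOT atomic."""
    fd = os.open(path, os.O_WRONLY)  # step 1: open
    try:
        written = 0
        while written < len(data):
            # steps 2 and 3 combined: pwrite writes at an explicit offset
            written += os.pwrite(fd, data[written:], written)
        if between is not None:
            between()  # window in which the file is new data + old tail
        os.ftruncate(fd, len(data))  # step 4: cut off leftover old data
    finally:
        os.close(fd)
```

If the file previously held 0123456789 and the new data is new, anything that reads the file during the between window sees new3456789. The inode and every hard link to it are preserved, which is exactly what the question asks for, but only at the cost of that visible intermediate state.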

Solutions

Normal solution

As you have noted, this is why the canonical approach is to create a new file that you know you are the only writer of (you can even guarantee this by combining O_TMPFILE and linkat), then atomically redirect the old name to the new file.

There are two other options, however both fail in some way:

Mandatory locking

It enables file access to be denied to other processes by setting a special flag combination. Sounds like the tool for the job, right? However:

It must be enabled at the filesystem level (it's a flag when mounting).
Warning: the Linux implementation of mandatory locking is unreliable.

Since Linux 4.5, mandatory locking has been made an optional feature. This is an initial step toward removing this feature completely.

This is only logical, as Unix has always shied away from locks. They are error-prone, and it is impossible to cover all edge cases and guarantee no deadlock.

Advisory locking

It is set using the fcntl system call. However, it is only advisory, and most programs simply ignore it.

In fact, it is only good for managing locks on a file shared among several cooperating processes.
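A small Python sketch of what "advisory" means in practice: the parent takes an exclusive fcntl record lock (fcntl.lockf wraps the fcntl locking calls), then starts two child processes. The child scripts and the name demo_advisory_lock are made up for illustration. One child asks for the lock before writing and backs off; the other never asks and writes anyway.

```python
import fcntl
import subprocess
import sys

# Child that cooperates: it requests the lock and gives up if it is held.
COOPERATING = """
import fcntl, sys
with open(sys.argv[1], "r+b") as f:
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit(1)  # lock held by someone else: back off
    f.write(b"polite")
"""

# Child that ignores locking entirely and just writes.
IGNORING = """
import sys
with open(sys.argv[1], "wb") as f:
    f.write(b"clobbered")
"""


def demo_advisory_lock(path):
    """Return (cooperating exit code, ignoring exit code, final contents)."""
    with open(path, "r+b") as holder:
        fcntl.lockf(holder, fcntl.LOCK_EX)  # exclusive lock on the whole file
        polite = subprocess.run([sys.executable, "-c", COOPERATING, path])
        rude = subprocess.run([sys.executable, "-c", IGNORING, path])
        # Read through the lock holder's own descriptor: closing any other
        # descriptor for this file in this process would drop the lock.
        holder.seek(0)
        contents = holder.read()
    return polite.returncode, rude.returncode, contents
```

The cooperating child exits with status 1 without touching the file, while the other child overwrites it even though the lock is still held. Nothing in the kernel stops the second writer; the lock only matters to programs that check it.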

Conclusion

Is there some way to do it atomically like rename(2) but preserve hard links?

No.

Inodes are low level, almost an implementation detail. Very few APIs acknowledge their existence (I believe the stat family of calls is the only one).

Whatever you are trying to do probably relies on either misusing the design of Unix filesystems or simply asking too much of it.

Could this be somewhat of an XY-problem?

Related Solutions

Hard Links – Why Do Hard Links Seem to Take the Same Space as Originals?

A file is an inode with metadata, including a list of pointers to where the data is stored.

In order to be able to access a file, you have to link it to a directory (think of directories as phone directories, not folders), that is, add one or more entries to one or more directories to associate a name with that file.

All those links, those file names, point to the same file. There's not one that is the original and others that are links. They are all access points to the same file (same inode) in the directory tree. When you get the size of the file (lstat system call), you're retrieving information (the metadata referred to above) stored in the inode; it doesn't matter which file name, which link, you use to refer to that file.

By contrast, a symlink is another file (another inode) whose content is a path to the target file. Like any other file, symlinks have to be linked to a directory (must have a name) so you can access them. You can also have several links to a symlink; in other words, a symlink can be given several names (in one or more directories).

$ touch a
$ ln a b
$ ln -s a c
$ ln c d
$ ls -li [a-d]
10486707 -rw-r--r-- 2 stephane stephane 0 Aug 27 17:05 a
10486707 -rw-r--r-- 2 stephane stephane 0 Aug 27 17:05 b
10502404 lrwxrwxrwx 2 stephane stephane 1 Aug 27 17:05 c -> a
10502404 lrwxrwxrwx 2 stephane stephane 1 Aug 27 17:05 d -> a

Above, file number 10486707 is a regular file. Two entries in the current directory (one with name a, one with name b) link to it. Because the link count is 2, we know there's no other name for that file in the current directory or any other directory. File number 10502404 is another file, this time of type symlink, linked twice to the current directory. Its content (target) is the relative path "a".

Note that if 10502404 was linked to another directory than the current one, it would typically point to a different file depending on how it was accessed.

$ mkdir 1 2
$ echo foo > 1/a
$ echo bar > 2/a
$ ln -s a 1/b
$ ln 1/b 2/b
$ ls -lia 1 2
1:
total 92
10608644 drwxr-xr-x   2 stephane stephane  4096 Aug 27 17:26 ./
10485761 drwxrwxr-x 443 stephane stephane 81920 Aug 27 17:26 ../
10504186 -rw-r--r--   1 stephane stephane     4 Aug 27 17:24 a
10539259 lrwxrwxrwx   2 stephane stephane     1 Aug 27 17:26 b -> a

2:
total 92
10608674 drwxr-xr-x   2 stephane stephane  4096 Aug 27 17:26 ./
10485761 drwxrwxr-x 443 stephane stephane 81920 Aug 27 17:26 ../
10539044 -rw-r--r--   1 stephane stephane     4 Aug 27 17:24 a
10539259 lrwxrwxrwx   2 stephane stephane     1 Aug 27 17:26 b -> a
$ cat 1/b
foo
$ cat 2/b
bar

Files have no names associated with them other than in the directories that link them. The space taken by their names is the entries in those directories; it's accounted for in the file size/disk usage of the directories.

You'll notice that the system call to remove a file is unlink. That is, you don't remove files, you unlink them from the directories they're referenced in. Once unlinked from the last directory that had an entry to a given file, that file is then destroyed (as long as no process has it opened).
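The "as long as no process has it opened" part can be observed with a short Python sketch. The function name demo_unlink_while_open is made up for illustration, and the reported link count is what a typical local Linux filesystem shows; some filesystems (NFS, for instance) may behave differently.

```python
import os


def demo_unlink_while_open(path):
    """Unlink a file's only name while holding it open, then keep reading it."""
    with open(path, "wb") as f:
        f.write(b"still here")
    fd = os.open(path, os.O_RDONLY)
    try:
        os.unlink(path)  # remove the last directory entry
        name_exists = os.path.exists(path)  # the name is gone
        links = os.fstat(fd).st_nlink  # link count as seen through the open fd
        data = os.read(fd, 100)  # the inode and its data are still there
    finally:
        os.close(fd)  # last reference dropped: the file can now be destroyed
    return name_exists, links, data
```

The name disappears immediately, yet the data stays readable through the open descriptor until it is closed.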

Linux Hard Links – How Does Hard-Linking to a Directory Work?

Hard links to directories aren't fundamentally different to hard links for files. In fact, many filesystems do have hard links on directories, but only in a very disciplined way.

In a filesystem that doesn't allow users to create hard links to directories, a directory's links are exactly:

the . entry in the directory itself;
the .. entries in all the directories that have this directory as their parent;
one entry in the directory that .. points to.

An additional constraint in such filesystems is that from any directory, following .. nodes must eventually lead to the root. This ensures that the filesystem is presented as a single tree. This constraint is violated on filesystems that allow hard links to directories.

Filesystems that allow hard links to directories allow more cases than the three above. However, they maintain the constraint that these cases do exist: a directory's . always exists and points to itself; a directory's .. always points to a directory that has it as an entry. A directory entry that is a directory is only removed on unlink if the directory contains nothing but its . and .. entries.

Thus a dangling .. cannot happen. What can go wrong is that a part of the filesystem can become detached. This happens if a directory's .. points to one of its descendants, so that ../../../.. eventually forms a loop. (As seen above, filesystems that don't allow hard link manipulations prevent this.) If all the paths from the root to such a directory are unlinked, the part of the filesystem containing this directory cannot be reached anymore, unless there are processes that still have their current directory on it. That part can't even be deleted since there's no way to get at it.

GCFS allows directory hard links and runs a garbage collector to delete such detached parts of the filesystem. You should read its specification, which addresses your concerns in detail. This is an interesting intellectual exercise, but I don't know of any filesystem used in practice that provides garbage collection.