Linux – What happens when a file that is 100% paged in to the page cache gets modified by another process

buffer, cache, linux, virtual-memory

I know that when a page cache page is modified, it is marked dirty and requires a writeback, but what happens when:

Scenario:
The file /apps/EXE, which is an executable file, is paged in to the page cache completely (all of its pages are in cache/memory) and being executed by process P

Continuous release then replaces /apps/EXE with a brand new executable.

Assumption 1:
I assume that process P (and anyone else with a file descriptor referencing the old executable) will continue to use the old, in memory /apps/EXE without an issue, and any new process which tries to exec that path will get the new executable.

Assumption 2:
I assume that if not all pages of the file are mapped into memory, things will be fine until a page fault requires pages from the file that have been replaced, at which point a segfault will probably occur?

Question 1:
If you mlock all of the pages of the file with something like vmtouch, does that change the scenario at all?

Question 2:
If /apps/EXE is on a remote NFS mount, would that make any difference? (I assume not)

Please correct or validate my 2 assumptions and answer my 2 questions.

Let's assume this is a CentOS 7.6 box with some kind of 3.10.0-957.el7 kernel

Update:
Thinking about it further, I wonder whether this scenario is any different from any other dirty-page scenario.

I suppose the process that writes the new binary will do a read and get all of the cached pages, since the file is fully paged in, and then all those pages will be marked dirty. If they are mlocked, they will just be useless pages occupying core memory after the reference count goes to zero.

I suspect when the currently-executing programs end, anything else will use the new binary. Assuming that’s all correct, I guess it’s only interesting when only some of the file is paged in.

Best Answer

Continuous release then replaces /apps/EXE with a brand new executable.

This is the important part.

The way a new file is released is by creating a new file (e.g. /apps/EXE.tmp.20190907080000), writing the contents, setting permissions and ownership, and finally rename(2)ing it to the final name /apps/EXE, which replaces the old file.
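As a rough illustration, here is a minimal C sketch of that pattern, assuming an illustrative temp name and with error handling abbreviated:

    /* Minimal sketch of the "write a temp file, then rename over the target"
     * release pattern described above. Paths are illustrative. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char *tmp = "/apps/EXE.tmp.20190907080000";  /* hypothetical temp name */
        const char *dst = "/apps/EXE";

        int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0755);
        if (fd < 0) { perror("open"); return 1; }

        /* ... write the new executable's contents to fd here ... */

        if (fsync(fd) < 0) { perror("fsync"); return 1; }   /* flush the data first */
        close(fd);

        /* rename(2) atomically repoints the *name* /apps/EXE at the new inode;
         * the old inode lives on for anyone who already has it open or mapped. */
        if (rename(tmp, dst) < 0) { perror("rename"); return 1; }
        return 0;
    }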

The result is that the new file has a new inode number (which means, in effect, it's a different file.)

And the old file keeps its own inode number, which is actually still around even though the file name no longer points to it (in fact, no file name points to that inode anymore).

So, the key here is that when we talk about "files" in Linux, we're most often really talking about "inodes" since once a file has been opened, the inode is the reference we keep to the file.
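You can see that distinction from code as well. A small sketch, assuming a descriptor that was opened on /apps/EXE before the release: fstat(2) on that old descriptor and stat(2) on the path now report different inode numbers.

    /* Sketch: compare the inode behind an already-open fd with the inode the
     * name currently points to. After a release, they differ. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/apps/EXE";           /* path from the question */

        int old_fd = open(path, O_RDONLY);        /* opened before the release */
        if (old_fd < 0) { perror("open"); return 1; }

        /* ... imagine the rename(2) from the previous sketch happens here ... */

        struct stat via_fd, via_path;
        fstat(old_fd, &via_fd);                   /* the inode we opened */
        stat(path, &via_path);                    /* the inode the name points to now */

        printf("open fd -> inode %ju\n", (uintmax_t)via_fd.st_ino);
        printf("path    -> inode %ju\n", (uintmax_t)via_path.st_ino);
        /* After a replacement these differ: same name, different file. */

        close(old_fd);
        return 0;
    }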

Assumption 1: I assume that process P (and anyone else with a file descriptor referencing the old executable) will continue to use the old, in memory /apps/EXE without an issue, and any new process which tries to exec that path will get the new executable.

Correct.

Assumption 2: I assume that if not all pages of the file are mapped into memory, that things will be fine until there is a page fault requiring pages from the file that have been replaced, and probably a segfault will occur?

Incorrect. The old inode is still around, so page faults from the process using the old binary will still be able to find those pages on disk.

You can see some effects of this by looking at the /proc/${pid}/exe symlink (or, equivalently, lsof output) for the process running the old binary, which will show /apps/EXE (deleted) to indicate that the name is no longer there but the inode is still around.
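If you'd rather check that from code than from ls -l or lsof, a minimal sketch (with a made-up pid) is to readlink(2) the /proc/<pid>/exe entry yourself:

    /* Sketch: print what /proc/<pid>/exe points to. For a process still
     * running a replaced binary it ends in " (deleted)". The pid is made up. */
    #include <limits.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = 12345;                         /* hypothetical pid of process P */
        char link[64], target[PATH_MAX];

        snprintf(link, sizeof(link), "/proc/%d/exe", (int)pid);
        ssize_t n = readlink(link, target, sizeof(target) - 1);
        if (n < 0) { perror("readlink"); return 1; }
        target[n] = '\0';                          /* readlink does not NUL-terminate */

        printf("%s -> %s\n", link, target);        /* e.g. "/apps/EXE (deleted)" */
        return 0;
    }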

You can also see that the disk space used by the binary is only released after the process dies (assuming it's the only process with that inode open). Check the output of df before and after killing the process, and you'll see it drop by the size of that old binary you thought wasn't around anymore.
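A rough, code-level equivalent of that df check, sketched with statvfs(3) and an illustrative mount point:

    /* Sketch: sample free space on the filesystem holding /apps before and
     * after the last process using the old binary exits. */
    #include <stdio.h>
    #include <sys/statvfs.h>

    static unsigned long long free_bytes(const char *path)
    {
        struct statvfs vfs;
        if (statvfs(path, &vfs) < 0) return 0;
        return (unsigned long long)vfs.f_bavail * vfs.f_frsize;
    }

    int main(void)
    {
        printf("free before: %llu bytes\n", free_bytes("/apps"));
        puts("kill the last process running the old /apps/EXE, then press Enter");
        getchar();
        printf("free after:  %llu bytes\n", free_bytes("/apps"));
        /* The difference is roughly the size of the old, unlinked binary. */
        return 0;
    }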

BTW, this is not only true of binaries, but of any open file. If you open a file in a process and remove the file, it will be kept on disk until that process closes it (or dies). Just as hard links keep a count of how many names point to an inode on disk, the filesystem driver (in the Linux kernel) keeps a count of how many references exist to that inode in memory, and will only release the inode on disk once all references from the running system have been released as well.
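Here's a small, self-contained demonstration of that behavior with a throwaway file (the path is just an example): even after unlink(2) removes the last name, reads through the still-open descriptor keep working, and the space is freed only on close.

    /* Sketch: a file stays readable through an open fd after its name is gone. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/tmp/still-open-demo";  /* hypothetical test file */
        char buf[32];

        int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        write(fd, "hello\n", 6);

        unlink(path);               /* the last *name* is gone... */

        lseek(fd, 0, SEEK_SET);     /* ...but the inode and its data are still there */
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        buf[n > 0 ? n : 0] = '\0';
        printf("read back after unlink: %s", buf);

        close(fd);                  /* only now does the filesystem free the space */
        return 0;
    }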

Question 1: If you mlock all of the pages of the file with something like vmtouch does that change the scenario

This question is based on incorrect assumption 2, that not locking the pages will cause segfaults. It won't; since the old inode stays around, mlocking the pages doesn't change the scenario.

Question 2: If /apps/EXE is on a remote NFS, would that make any difference? (I assume not)

It's meant to work the same way and most of the time it does, but there are some "gotchas" with NFS.

Sometimes you can see the artifacts of deleting a file that's still open over NFS (it shows up as a hidden .nfsXXXX file in that directory).

You also have ways to assign fixed device identifiers to NFS exports (such as the fsid= export option), to make sure those won't get "reshuffled" when the NFS server reboots.

But the main idea is the same: the NFS client driver still uses inodes and will try to keep files around (on the server) while the inode is still referenced.