How is it possible to do a live update while a program is running?

executable files upgrade

I wonder how killer applications such as Thunderbird or Firefox can be updated via the system's package manager while they are still running. What happens with the old code while they are being updated? What do I have to do when I want to write a program a.out that updates itself while it is running?

Best Answer

Replacing files in general

First, there are several strategies to replace a file:

1. Open the existing file for writing, truncate it to 0 length, and write the new content. (A less common variant is to open the existing file, overwrite the old content with the new content, truncate the file to the new length if it's shorter.) In shell terms:
```
echo 'new content' >somefile
```
2. Remove the old file, and create a new file by the same name. In shell terms:
```
rm somefile
echo 'new content' >somefile
```
3. Write to a new file under a temporary name, then move the new file to the existing name. The move deletes the old file. In shell terms:
```
echo 'new content' >somefile.new
mv somefile.new somefile
```

I won't list all the differences between the strategies; I'll just mention some that are important here. With strategy 1, if any process is currently using the file, the process sees the new content as it's being updated. This can cause some confusion if the process expects the file content to remain the same. Note that this is only about processes that have the file open (as visible in lsof or in /proc/PID/fd/). Interactive applications that have a document open (e.g. a file opened in an editor) usually do not keep the file open: they load the file content during the “open document” operation and they replace the file (using one of the strategies above) during the “save document” operation.

With strategies 2 and 3, if some process has the file somefile open, the old file remains open during the content upgrade. With strategy 2, the step of removing the file in fact only removes the file's entry in the directory. The file itself is only removed when it has no directory entry leading to it (on typical Unix filesystems, there can be more than one directory entry for the same file) and no process has it open. Here's a way to observe this — the file is only removed when the sleep process is killed (rm only removes its directory entry).

```
echo 'old content' >somefile
sleep 9999999 <somefile &
df .
rm somefile
df .
cat /proc/$!/fd/0
kill $!
df .
```

With strategy 3, the step of moving the new file to the existing name removes the directory entry leading to the old content and creates a directory entry leading to the new content. This is done in one atomic operation, so this strategy has a major advantage: if a process opens the file at any time, it will either see the old content or the new content — there's no risk of getting mixed content or of the file not existing.
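
The difference between strategies 1 and 3 can be observed with inode numbers and a file descriptor that stays open across the update. This is only a small sketch: the variable names and the use of descriptor 3 are choices made for the illustration, and it runs in a scratch directory created with mktemp.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1

echo 'old content' >somefile
inode_before=$(ls -i somefile | awk '{print $1}')

# Strategy 1: truncate and rewrite in place while descriptor 3 is open.
exec 3<somefile
echo 'new content' >somefile
inode_s1=$(ls -i somefile | awk '{print $1}')   # same inode as before
seen_s1=$(cat <&3)                              # the open reader sees the new content
exec 3<&-

# Reset the file (still the same inode), then apply strategy 3.
echo 'old content' >somefile
exec 3<somefile
echo 'new content' >somefile.new
mv somefile.new somefile                        # atomic switch of the directory entry
inode_s3=$(ls -i somefile | awk '{print $1}')   # a different inode
seen_s3=$(cat <&3)                              # the open reader still sees the old content
exec 3<&-

echo "strategy 1: inode $inode_before -> $inode_s1, open reader saw: $seen_s1"
echo "strategy 3: inode $inode_before -> $inode_s3, open reader saw: $seen_s3"
```

With strategy 1 the name keeps pointing at the same file, so the reader observes the rewrite; with strategy 3 the name is switched to a different file and the reader keeps the old one until it closes it.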

Replacing executables

If you try strategy 1 with a running executable on Linux, you'll get an error.

```
cp /bin/sleep .
./sleep 999999 &
echo oops >|sleep
bash: sleep: Text file busy
```

For obscure historical reasons, a “text file” here means a file containing executable code. Linux, like many other unix variants, refuses to overwrite the code of a running program; a few unix variants allow this, leading to crashes unless the new code was a very well thought-out modification of the old code.

On Linux, you can overwrite the code of a dynamically loaded library. It's likely to lead to a crash of the program that's using it. (You might not be able to observe this with sleep because it loads all the library code it needs when it starts. Try a more complex program that does something useful after sleeping, like perl -e 'sleep 9; print lc $ARGV[0]'.)

If an interpreter is running a script, the script file is opened in an ordinary way by the interpreter, so there is no protection against overwriting the script. Some interpreters read and parse the whole script before they start executing the first line, others read the script as needed. See What happens if you edit a script during execution? and How Does Linux deal with shell scripts? for more details.

Strategies 2 and 3 are safe for executables as well: although running executables (and dynamically loaded libraries) aren't open files in the sense of having a file descriptor, they behave in a very similar way. As long as some program is running the code, the file remains on disk even without a directory entry.
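
Here is a way to observe this on Linux, as a rough sketch rather than a definitive test. It assumes /bin/sleep exists, that the scratch directory from mktemp allows executing files, and that /proc is mounted; the variable names are made up for the example.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
cp /bin/sleep ./sleep            # private copy, so nothing on the system is touched
./sleep 30 &
pid=$!

# Wait (at most about 5 seconds) until the child has really exec'ed our copy.
tries=0
until [ "$(readlink "/proc/$pid/exe")" = "$(pwd -P)/sleep" ] || [ "$tries" -ge 50 ]; do
    sleep 0.1
    tries=$((tries + 1))
done

# Strategy 3: prepare the replacement under another name, then rename it.
cp /bin/sleep ./sleep.new
mv ./sleep.new ./sleep           # no "Text file busy" error here

exe_after=$(readlink "/proc/$pid/exe")
still_running=no
kill -0 "$pid" 2>/dev/null && still_running=yes
echo "exe link: $exe_after"
echo "old process still running: $still_running"

kill "$pid"
wait "$pid" 2>/dev/null
```

The kernel marks the old program file as deleted in /proc/PID/exe, yet the process keeps running from it until it exits.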

Upgrading an application

Most package managers use strategy 3 to replace files, because of the major advantage mentioned above — at any point in time, opening the file leads to a valid version of it.

Where application upgrades can break is that while upgrading one file is atomic, upgrading the application as a whole isn't if the application consists of multiple files (program, libraries, data, …). Consider the following sequence of events:

1. An instance of the application is started.
2. The application is upgraded.
3. The running instance of the application opens one of its data files.

In step 3, the running instance of the old version of the application is opening a data file from the new version. Whether this works or not depends on the application, on which file it is, and on how much the file has been modified.

After an upgrade, you'll note that the old program is still running. If you want to run the new version, you'll have to exit the old program and run the new version. Package managers usually kill and restart daemons on an upgrade, but leave end-user applications alone.

A few daemons have special procedures to handle upgrades without having to kill the daemon and wait for the new instance to restart (which causes a service disruption). This is necessary in the case of init, which cannot be killed; init systems provide a way to request that the running instance call execve to replace itself with the new version.
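
For a program that wants to update itself, the same two ingredients can be combined: replace the file with strategy 3, then re-execute. The following is a minimal sketch using a shell script in place of a compiled a.out; the file names app and app.new are hypothetical, and the scripts are run through sh so that the scratch directory does not need to permit execution. A C program would call execve at the point where the script uses exec.

```shell
dir=$(mktemp -d) || exit 1

# Version 1: if a replacement is waiting, install it atomically and re-exec.
cat >"$dir/app" <<'EOF'
echo "running version 1"
if [ -e "$0.new" ]; then
    mv "$0.new" "$0"       # strategy 3: atomic switch of the directory entry
    exec sh "$0" "$@"      # same process ID, new code
fi
EOF

# Version 2: the upgrade that some installer dropped next to the program.
cat >"$dir/app.new" <<'EOF'
echo "running version 2"
EOF

output=$(sh "$dir/app")
echo "$output"
```

The exec keeps the process ID and the open file descriptors, which is what lets a daemon hand its state over to the new version; any in-memory state still has to be passed along explicitly, for example through arguments or the environment.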

Related Solutions

Upgrade – Identify Running Programs Using Old Version of a Replaced Library

I found two ways to do this:

1. Debian-specific, lists most deleted/replaced files held by processes (with the exception of certain files known to be transient, e.g. stuff in /tmp): The debian-goodies package contains checkrestart, which accomplishes something like what I've described by scraping the output of lsof to find open files that are gone or replaced on disk. It identifies the processes in question and (if possible) the package to which they belong and any init script that can be used to restart them. The -v option will identify the files concerned.
2. Generic, manual, allows specifying the file you're worried about: You can look at the output of lsof to identify open file handles to deleted or replaced files. In the output of lsof -nnP, such a file appears to be identified by DEL in the fourth column. You can do something like lsof -nnP | grep DEL.*libssl.so to look for stale handles to a particular library (OpenSSL, in this case). This is probably highly dependent on the specific version of lsof you use and the behavior of your package manager, so proceed with caution.
```
pluto      3592       root  DEL       REG      202,0               98831 /lib/i386-linux-gnu/libssl.so.1.0.0
pluto      3604       root  DEL       REG      202,0               98831 /lib/i386-linux-gnu/libssl.so.1.0.0
```

Linux – Monitoring what program calls an executable file

This would be hacky, but if it's a dynamically linked executable, you could set up a global preload in /etc/ld.so.preload which would only trigger a logging hook if it detected you were in the right executable.

Something like:

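```c
#define _XOPEN_SOURCE 700   /* needed for readlink() */
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>
#include <unistd.h>

#define TARGET "/some_executable"

/* Runs before main() in every process that loads this library.
   glibc passes argc/argv to constructors; other libcs may not. */
__attribute__((constructor))
static void
logger(int argc, char **argv)
{
    /* catch own argv right here and parent's later from /proc */

    /* Full-size buffer: a short one would truncate longer paths
       and make them compare equal to TARGET by accident. */
    char buf[PATH_MAX];
    ssize_t len = readlink("/proc/self/exe", buf, sizeof(buf) - 1);

    if (len < 0)
        return;
    buf[len] = '\0';        /* readlink() does not null-terminate */

    if (0 == strcmp(TARGET, buf)) {
        /* ... */
        /* Placeholder message: the original leaves the syslog arguments elided. */
        syslog(LOG_INFO, "%s started, parent pid %ld", buf, (long)getppid());
    }
}
```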

The obvious disadvantage of this approach is it would slightly delay the execution of each dynamically linked executable on your system, but my measurements indicate the delay is quite small (<1ms where fork+exec costs about 2ms).

As for the dropped permission problem, you could have a small setuid-root binary that will unconditionally read and echo its grandparent's /proc files (the status file, most likely), possibly if and only if its parent is the executable whose parents you want to log. You could then spawn that setuid executable inside your logging hook to obtain the info on the executable's parent (the grandparent of the setuid helper).