Linux – What Happens If You Edit a Script During Execution?

Tags: bash, linux, process-management, shell

I have a general question, which might be the result of a misunderstanding of how processes are handled in Linux.

For my purposes, I am going to define a 'script' as a snippet of bash code saved to a text file with execute permission enabled for the current user.

I have a series of scripts that call each other in sequence. For simplicity's sake, I'll call them scripts A, B, and C. Script A carries out a series of statements and then pauses; it then executes script B, pauses again, and then executes script C. In other words, the series of steps is something like this:

Run script A:

- Series of statements
- Pause
- Run script B
- Pause
- Run script C

I know from experience that if I run script A until the first pause and then make edits to script B, those edits are reflected in the execution of the code when I allow it to resume. Likewise, if I edit script C while script A is still paused and then let it continue after saving the changes, those changes are reflected in the execution of the code.

Here, then, is the real question: is there any way to edit script A while it is still running? Or is editing impossible once its execution begins?

Best Answer

In Unix, most editors work by creating a new temporary file containing the edited contents. When the edited file is saved, the original file is deleted and the temporary file is renamed to the original name. (There are, of course, various safeguards to prevent data loss.) This is, for example, the style used by sed or perl when invoked with the -i ("in-place") flag, which is not really "in-place" at all. It should have been called "new place with old name".
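
One way to see the "new place with old name" behaviour is to compare inode numbers before and after an edit. This is a minimal sketch, assuming GNU sed (BSD and macOS sed want `-i ''`) and using `ls -i` on a throwaway temp file; on a typical local filesystem `sed -i` should report a different inode, while a plain `>` redirection truncates and rewrites the existing one.

```bash
# Compare how `sed -i` and a plain `>` redirection treat the file's inode.
# Assumes GNU sed; on BSD/macOS use `sed -i ''` instead of `sed -i`.
dir=$(mktemp -d)
f="$dir/demo.txt"
printf 'hello\n' > "$f"

inode_of() { ls -i "$1" | awk '{print $1}'; }

before=$(inode_of "$f")
sed -i 's/hello/goodbye/' "$f"      # writes a temp file, renames it over the old name
after_sed=$(inode_of "$f")

printf 'hello again\n' > "$f"       # truncates and rewrites the existing file
after_redirect=$(inode_of "$f")

echo "original inode:      $before"
echo "after sed -i:        $after_sed"
echo "after > redirection: $after_redirect"
rm -r "$dir"
```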

This works well because Unix guarantees (at least for local filesystems) that an opened file continues to exist until it is closed, even if it is "deleted" and a new file with the same name is created. (It's not coincidental that the Unix system call to "delete" a file is actually called "unlink".) So, generally speaking, if a shell interpreter has some source file open, and you "edit" the file in the manner described above, the shell won't even see the changes, since it still has the original file open.
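
The effect on a running shell can be sketched with a script that replaces itself the way `sed -i` does: it writes a temporary file and renames it over its own name. The file names are made up for the demo, and the sketch assumes bash on a local filesystem.

```bash
# A script that "edits" itself like sed -i or most editors do:
# write the new version to a temp file, then rename it over the original.
dir=$(mktemp -d)
script="$dir/self_replace.sh"
cat > "$script" <<'EOF'
printf 'echo "new version"\n' > "$0.tmp"
mv "$0.tmp" "$0"
echo "old version still running"
EOF

first_run=$(bash "$script")    # this shell keeps reading the inode it opened
second_run=$(bash "$script")   # a fresh run opens the renamed (new) file
echo "first run:  $first_run"
echo "second run: $second_run"
rm -r "$dir"
```

On such a setup the first run should still print the old last line, and only the second run sees the new contents.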

[Note: as with all standards-based comments, the above is subject to multiple interpretations and there are various corner-cases, such as NFS. Pedants are welcome to fill the comments with exceptions.]

It is, of course, possible to modify files directly; it's just not very convenient for editing purposes, because while you can overwrite data in a file, you cannot delete or insert without shifting all following data, which would imply quite a lot of rewriting. Furthermore, while you were doing that shifting, the contents of the file would be unpredictable and processes which had the file open would suffer. In order to get away with this (as with database systems, for example), you need a sophisticated set of modification protocols and distributed locks; stuff which is well beyond the scope of a typical file editing utility.
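
For contrast, here is a sketch of a genuine in-place overwrite using the `dd ... conv=notrunc` idiom that the 99 bottles script in this answer also relies on. It only replaces bytes with the same number of bytes, which is exactly the limitation described above; the file name and offset are illustrative assumptions.

```bash
# Overwrite bytes in the middle of a file without truncating it:
# same file, same length, only the targeted bytes change.
dir=$(mktemp -d)
f="$dir/data.txt"
printf 'hello world\n' > "$f"
size_before=$(wc -c < "$f")

# Write 5 bytes at byte offset 6 (bs=1 makes seek= count single bytes).
printf 'WORLD' | dd of="$f" bs=1 seek=6 conv=notrunc 2>/dev/null

content=$(cat "$f")
size_after=$(wc -c < "$f")
echo "$content ($size_before -> $size_after bytes)"
rm -r "$dir"
```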

So, if you want to edit a file while it's being processed by a shell, you have two options:

- You can append to the file. This should always work.
- You can overwrite the file with new contents of exactly the same length. This may or may not work, depending on whether the shell has already read that part of the file. Since most file I/O involves read buffers, and since all the shells I know read an entire compound command before executing it, it is pretty unlikely that you can get away with this. It certainly wouldn't be reliable.
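
The first option can be checked with a script that appends a command to its own file while it runs. This is a sketch that relies on bash reading its script incrementally; other shells are not guaranteed to behave the same way. The appended command should run once the shell reaches the old end of the file.

```bash
# A script that appends a command to its own file while it is running.
# bash keeps reading from the same file, so it picks up the appended line.
dir=$(mktemp -d)
script="$dir/self_append.sh"
cat > "$script" <<'EOF'
echo "original line"
echo 'echo "appended line"' >> "$0"
EOF

out=$(bash "$script")
printf '%s\n' "$out"
rm -r "$dir"
```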

I don't know of any wording in the POSIX standard which actually requires the possibility of appending to a script file while the file is being executed, so it might not work with every POSIX-compliant shell, much less with the current offering of almost- and sometimes-POSIX-compliant shells. So YMMV. But as far as I know, it does work reliably with bash.

As evidence, here's a "loop-free" implementation of the infamous 99 bottles of beer program in bash, which uses dd to overwrite and append. (The overwriting is presumably safe because it replaces the currently executing line, which is always the last line of the file, with a comment of exactly the same length; I did that so that the end result can be executed without the self-modifying behaviour.)

```
#!/bin/bash
if [[ $1 == reset ]]; then
  printf "%s\n%-16s#\n" '####' 'next ${1:-99}' |
  dd if=/dev/stdin of=$0 seek=$(grep -bom1 ^#### $0 | cut -f1 -d:) bs=1 2>/dev/null
  exit
fi

step() {
  s=s
  one=one
  case $beer in
    2) beer=1; unset s;;
    1) beer="No more"; one=it;;
    "No more") beer=99; return 1;;
    *) ((--beer));;
  esac
}
next() {
  step ${beer:=$(($1+1))}
  refrain |
  dd if=/dev/stdin of=$0 seek=$(grep -bom1 ^next\  $0 | cut -f1 -d:) bs=1 conv=notrunc 2>/dev/null
}
refrain() {
  printf "%-17s\n" "# $beer bottles"
  echo echo ${beer:-No more} bottle$s of beer on the wall, ${beer:-No more} bottle$s of beer.
  if step; then
    echo echo Take $one down, pass it around, $beer bottle$s of beer on the wall.
    echo echo
    echo next abcdefghijkl
  else
    echo echo Go to the store, buy some more, $beer bottle$s of beer on the wall.
  fi
}
####
next ${1:-99}   #
```

Related Solutions

Shell – How to alter PATH within a shell script

You have to use source or eval, or spawn a new shell.

When you run a shell script, a new child shell is spawned. This child shell executes the script's commands. The parent shell's environment remains untouched by anything that happens in the child shell.
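
The difference can be sketched by running the same small file once as a child process and once with `.` (the POSIX spelling of bash's source). DEMO_HOME and /opt/demo are hypothetical names chosen only for the illustration.

```bash
# Contrast running a file in a child shell with sourcing it.
# DEMO_HOME and /opt/demo are hypothetical, used only for illustration.
dir=$(mktemp -d)
cat > "$dir/setenv.sh" <<'EOF'
export DEMO_HOME=/opt/demo
export PATH=$DEMO_HOME/bin:$PATH
EOF

unset DEMO_HOME
bash "$dir/setenv.sh"            # child shell: its changes vanish when it exits
after_child=${DEMO_HOME:-unset}

. "$dir/setenv.sh"               # same as `source`: runs in the current shell
after_source=${DEMO_HOME:-unset}

echo "after child run: DEMO_HOME=$after_child"
echo "after sourcing:  DEMO_HOME=$after_source, PATH starts with ${PATH%%:*}"
rm -r "$dir"
```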

There are a lot of different techniques to manage this situation:

Prepare a file sourcefile containing a list of commands to source in the current shell:
```
export JAVA_HOME=/cygdrive/c/dev/Java/jdk1.5.0_22
export PATH=$JAVA_HOME/bin:$PATH
```
and then source it
```
source sourcefile
```
Note that there is no need for a shebang at the beginning of sourcefile, but it will still work with one.

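Prepare a script evalfile.sh that prints the commands to set the environment:

```
#!/bin/sh
echo "export JAVA_HOME=/cygdrive/c/dev/Java/jdk1.5.0_22"
echo 'export PATH=$JAVA_HOME/bin:$PATH'
```

and then evaluate it:

```
eval "$(./evalfile.sh)"
```

The single quotes in the second echo stop $JAVA_HOME and $PATH from being expanded inside evalfile.sh (where JAVA_HOME is not set), so they are expanded by eval in the calling shell instead. Quoting the command substitution keeps the two printed commands on separate lines, so the first export has taken effect before the second one runs.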

Configure and run a new shell:

```
#!/bin/sh
export JAVA_HOME=/cygdrive/c/dev/Java/jdk1.5.0_22
export PATH=$JAVA_HOME/bin:$PATH

exec /bin/bash
```

Note that when you type exit in this shell, you will return to the parent one.
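
Exit returns to the parent because exec does not start a child: it replaces the current process, so no intermediate script process is left waiting. A small sketch, assuming bash is installed, showing that the PID is unchanged across exec:

```bash
# exec replaces the current shell process instead of forking a child,
# so the PID printed before and after exec should be identical.
out=$(bash -c 'echo "$$"; exec bash -c "echo \$\$"')
pid_before=$(printf '%s\n' "$out" | sed -n 1p)
pid_after=$(printf '%s\n' "$out" | sed -n 2p)
echo "before exec: $pid_before, after exec: $pid_after"
```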

Put an alias in your ~/.bashrc:

```
alias prepare_environ='export JAVA_HOME=/cygdrive/c/dev/Java/jdk1.5.0_22; export PATH=$JAVA_HOME/bin:$PATH;'
```

and call it when needed:

```
prepare_environ
```

Linux – Modifying binary during execution

While the Stack Overflow question seemed to be enough at first, I understand from your comments why you may still have doubts about this. To me, this is exactly the kind of critical situation involved when the two UNIX subsystems (processes and files) communicate.

As you may know, UNIX systems are usually divided into two subsystems: the file subsystem, and the process subsystem. Now, unless it is instructed otherwise through a system call, the kernel should not have these two subsystems interact with one another. There is however one exception: the loading of an executable file into a process' text regions. Of course, one may argue that this operation is also triggered by a system call (execve), but this is usually known to be the one case where the process subsystem makes an implicit request to the file subsystem.

Because the process subsystem naturally has no way of handling files (otherwise there would be no point in dividing the whole thing in two), it has to use whatever the file subsystem provides to access files. This also means that the process subsystem is subject to whatever measures the file subsystem takes regarding file editing and deletion. On this point, I would recommend reading Gilles' answer to this U&L question. The rest of my answer is based on this more general one from Gilles.

The first thing that should be noted is that internally, files are only accessible through inodes. If the kernel is given a path, its first step will be to translate it into an inode to be used for all other operations. When a process loads an executable into memory, it does so through its inode, which has been provided by the file subsystem after translation of a path. Inodes may be associated with several paths (links), and programs may only delete links. In order to delete a file and its inode, userland must remove all existing links to that inode and ensure that it is completely unused. When these conditions are met, the kernel will automatically delete the file from disk.
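
The link/inode distinction can be sketched with a hard link: two names share one inode, and removing one name leaves the data reachable through the other. This assumes a filesystem that supports hard links (most local Unix filesystems do); the file names are throwaway.

```bash
# Two names (links) for one inode: removing one name leaves the data reachable.
dir=$(mktemp -d)
printf 'payload\n' > "$dir/a"
ln "$dir/a" "$dir/b"                         # second directory entry, same inode

inode_a=$(ls -i "$dir/a" | awk '{print $1}')
inode_b=$(ls -i "$dir/b" | awk '{print $1}')
links=$(ls -l "$dir/b" | awk '{print $2}')   # link count field of ls -l

rm "$dir/a"                                  # unlink one name; the inode survives
survivor=$(cat "$dir/b")
echo "inodes: $inode_a $inode_b, links: $links, b still reads: $survivor"
rm -r "$dir"
```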

If you have a look at the replacing executables part of Gilles' answer, you'll see that depending on how you edit/delete the file, the kernel will react/adapt differently, always through a mechanism implemented within the file subsystem.

- If you try strategy one (open/truncate to zero/write or open/write/truncate to new size), you'll see that the kernel won't bother handling your request. You'll get error 26: Text file busy (ETXTBSY). No consequences whatsoever.
- If you try strategy two, the first step is to delete your executable. However, since it is being used by a process, the file subsystem will kick in and prevent the file (and its inode) from being truly deleted from disk. From this point on, the only way to access the old file's content is through its inode, which is what the process subsystem does whenever it needs to load new data into text sections (internally, there is no point in using paths, except when translating them into inodes). Even though you've unlinked the file (removed all its paths), the process can still use it as if you'd done nothing. Creating a new file with the old path doesn't change anything: the new file will be given a completely new inode, which the running process has no knowledge of.

  > Strategies 2 and 3 are safe for executables as well: although running executables (and dynamically loaded libraries) aren't open files in the sense of having a file descriptor, they behave in a very similar way. As long as some program is running the code, the file remains on disk even without a directory entry.

- Strategy three is quite similar, since the mv operation is atomic. When source and destination are on the same filesystem, mv uses the rename system call, which POSIX requires to be atomic: throughout the operation, the destination name refers either to the old file or to the new one, so no other process can observe an intermediate state. Again, there is no alteration of the old file's inode: a new one is created, and already-running processes will have no knowledge of it, even if it's been associated with one of the old inode's links.

  > With strategy 3, the step of moving the new file to the existing name removes the directory entry leading to the old content and creates a directory entry leading to the new content. This is done in one atomic operation, so this strategy has a major advantage: if a process opens the file at any time, it will either see the old content or the new content; there's no risk of getting mixed content or of the file not existing.

Recompiling a file: when using gcc (and the behaviour is probably similar for many other compilers), you are using strategy 2. You can see that by running strace on your compiler's processes:

```
stat("a.out", {st_mode=S_IFREG|0750, st_size=8511, ...}) = 0
unlink("a.out") = 0
open("a.out", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
chmod("a.out", 0750) = 0
```

- The compiler detects that the file already exists through the stat and lstat system calls.
- The file is unlinked. Here, while it is no longer accessible through the name a.out, its inode and contents remain on disk for as long as they are being used by already-running processes.
- A new file is created and made executable under the name a.out. This is a brand-new inode with brand-new contents, which already-running processes don't care about.
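
The same sequence can be mimicked from the shell. In this sketch an open file descriptor stands in for the already-running process (a real executable is kept alive through its memory mapping rather than a descriptor, as noted earlier in this answer), and the file names are illustrative only.

```bash
# Mimic the compiler's unlink-then-create sequence while a stand-in
# "running process" (an open file descriptor) still holds the old file.
dir=$(mktemp -d)
out="$dir/a.out"
printf 'old build\n' > "$out"
exec 3< "$out"                          # stand-in for a process using the old a.out
old_inode=$(ls -i "$out" | awk '{print $1}')

rm "$out"                               # like unlink("a.out")
printf 'new build\n' > "$out"           # new file under the same name (O_CREAT|O_TRUNC)
chmod 0750 "$out"                       # like chmod("a.out", 0750)
new_inode=$(ls -i "$out" | awk '{print $1}')

read -r via_fd <&3                      # the old inode is still readable
exec 3<&-
echo "old inode $old_inode -> new inode $new_inode; fd 3 still reads: $via_fd"
rm -r "$dir"
```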

Now, when it comes to shared libraries, the same behaviour will apply. As long as a library object is used by a process, it will not be deleted from disk, no matter how you change its links. Whenever something has to be loaded into memory, the kernel will do it through the file's inode, and will therefore ignore the changes you made to its links (such as associating them with new files).
