Shell – What Happens When Executing a File

exec, kernel, shebang, shell

So, I thought I had a good understanding of this, but just ran a test (in response to a conversation where I disagreed with someone) and found that my understanding is flawed…

In as much detail as possible, what exactly happens when I execute a file in my shell? What I mean is: if I type ./somefile some arguments into my shell and press return (and somefile exists in the cwd, and I have read+execute permissions on somefile), then what happens under the hood?

I thought the answer was:

1. The shell makes a syscall to exec, passing the path to somefile.
2. The kernel examines somefile and looks at the magic number of the file to determine whether it is in a format the processor can handle.
3. If the magic number indicates that the file is in a format the processor can execute, then
   1. a new process is created (with an entry in the process table), and
   2. somefile is read/mapped into memory. A stack is created and execution jumps to the entry point of the code of somefile, with ARGV initialized to an array of the parameters (a char**, ["some","arguments"]).
4. If the magic number is a shebang, then exec() spawns a new process as above, but the executable used is the interpreter referenced by the shebang (e.g. /bin/bash or /bin/perl) and somefile is passed to STDIN.
5. If the file doesn't have a valid magic number, then an error like "invalid file (bad magic number): Exec format error" occurs.

However, someone told me that if the file is plain text, then the shell tries to execute the commands (as if I had typed bash somefile). I didn't believe this, but I just tried it, and it was correct. So I clearly have some misconceptions about what actually happens here, and would like to understand the mechanics.

What exactly happens when I execute a file in my shell? (In as much detail as is reasonable…)

Best Answer

The definitive answer to "how programs get run" on Linux is the pair of articles on LWN.net titled, surprisingly enough, How programs get run and How programs get run: ELF binaries. The first article addresses scripts briefly. (Strictly speaking the definitive answer is in the source code, but these articles are easier to read and provide links to the source code.)

A little experimentation shows that you pretty much got it right, and that the execution of a file containing a simple list of commands, without a shebang, has to be handled by the shell. The execve(2) manpage contains source code for a test program, execve; we'll use that to see what happens without a shell. First, write a test script, testscr1, containing

#!/bin/sh

pstree

and another one, testscr2, containing only

pstree

Make them both executable, and verify that they both run from a shell:

chmod u+x testscr[12]
./testscr1 | less
./testscr2 | less

Now try again, using execve (assuming you built it in the current directory):

./execve ./testscr1
./execve ./testscr2

testscr1 still runs, but testscr2 produces

execve: Exec format error

This shows that the shell handles testscr2 differently. The shell doesn't interpret the script itself, though; it still uses /bin/sh to do that. This can be verified by piping the output of testscr2 to less:

./testscr2 | less -ppstree

On my system, I get

    |-gnome-terminal--+-4*[zsh]
    |                 |-zsh-+-less
    |                 |     `-sh---pstree
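
The fallback can also be observed without pstree. The following is a minimal sketch, assuming a POSIX-like shell such as bash or dash and a temporary directory that allows execution; the file name noshebang is made up for the illustration.

```shell
# Create an executable file that contains a command but no shebang line.
tmpdir=$(mktemp -d)
printf 'echo "run by a fallback shell"\n' > "$tmpdir/noshebang"
chmod u+x "$tmpdir/noshebang"

# execve(2) rejects this file, so the calling shell falls back to
# interpreting it as a shell script.
out=$("$tmpdir/noshebang")
echo "$out"

rm -rf "$tmpdir"
```

Run through the execve test program instead, the same file would be expected to fail with the Exec format error shown earlier, because nothing is there to perform the fallback.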

As you can see, there's the shell I was using, zsh, which started less, and a second shell, plain sh (dash on my system), which ran the script and in turn ran pstree. In zsh this is handled by zexecve in Src/exec.c. The shell first uses execve(2) to try to run the command. If that fails, it reads the file to see whether it has a shebang and processes it accordingly (which the kernel will also have done). If that fails too, it tries to run the file with sh, as long as it didn't read any zero byte from the file:

        for (t0 = 0; t0 != ct; t0++)
            if (!execvebuf[t0])
                break;
        if (t0 == ct) {
            argv[-1] = "sh";
            winch_unblock();
            execve("/bin/sh", argv - 1, newenvp);
        }
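
The zero-byte test is a cheap heuristic for telling text from binary data. A rough shell equivalent is sketched below; the function name looks_like_text and the 128-byte default limit are assumptions made for illustration, not values taken from zsh.

```shell
# Succeed when the first $2 bytes (default 128) of file $1 contain no NUL byte.
looks_like_text() {
    limit=${2:-128}
    # Count the bytes read, then count them again with NUL bytes deleted.
    total=$(( $(dd if="$1" bs="$limit" count=1 2>/dev/null | wc -c) ))
    kept=$(( $(dd if="$1" bs="$limit" count=1 2>/dev/null | tr -d '\000' | wc -c) ))
    [ "$total" -eq "$kept" ]
}

sample=$(mktemp)

# A plain list of commands passes the check.
printf 'pstree\n' > "$sample"
looks_like_text "$sample" && echo "text: would be handed to sh"

# Data containing a NUL byte does not.
printf 'ELF\000\001\002' > "$sample"
looks_like_text "$sample" || echo "binary: would not be handed to sh"

rm -f "$sample"
```

The check is deliberately crude: it only guards against handing obviously binary data to sh, and says nothing about whether the text is a valid script.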

bash has the same behaviour, implemented in execute_cmd.c with a helpful comment (as pointed out by taliezin):

Execute a simple command that is hopefully defined in a disk file somewhere.

1. fork ()
2. connect pipes
3. look up the command
4. do redirections
5. execve ()
6. If the execve failed, see if the file has executable mode set. If so, and it isn't a directory, then execute its contents as a shell script.
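
The last step of that comment only applies to files that are executable and are not directories. In the other cases the shell gives up, and POSIX specifies exit status 126 for a command that was found but could not be executed. A small sketch, assuming sh is a POSIX shell; the names adir and noexec are made up for the illustration.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/adir"
printf 'echo "should not run"\n' > "$tmpdir/noexec"
chmod a-x "$tmpdir/noexec"

# A directory cannot be executed, and no script fallback is attempted.
dir_status=0
sh -c '"$1"' sh "$tmpdir/adir" 2>/dev/null || dir_status=$?

# A file without execute permission is not run as a script either.
noexec_status=0
sh -c '"$1"' sh "$tmpdir/noexec" 2>/dev/null || noexec_status=$?

echo "directory: $dir_status, non-executable file: $noexec_status"
rm -rf "$tmpdir"
```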

POSIX defines a set of functions, known as the exec(3) functions, which wrap execve(2); of these, execlp and execvp provide the same fallback to sh when the file is not in a recognised executable format (see muru's answer for details). On Linux at least, these functions are implemented by the C library, not by the kernel.

Related Solutions

Bash – How does bash execute an ELF file

Bash knows nothing about ELF. It simply sees that you asked it to run an external program, so it passes the name you gave it as-is to execve(2). Knowledge of things like executable file formats, shebang lines, and execute permissions lives behind that syscall, in the kernel.

(It is the same for other shells, though they may choose to use another function in the exec(3) family instead.)
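
For scripts, the kernel's shebang handling can be made visible from the shell. In this sketch (the file name withshebang is hypothetical, and the temporary directory is assumed to allow execution), the kernel starts /bin/sh with the script's path as an argument, followed by the original arguments, so the script sees its own path in $0.

```shell
tmpdir=$(mktemp -d)
cat > "$tmpdir/withshebang" <<'EOF'
#!/bin/sh
echo "$0 $*"
EOF
chmod u+x "$tmpdir/withshebang"

# The shell passes the path to execve(2); the kernel reads the "#!" line
# and in effect runs: /bin/sh <path> some arguments
out=$("$tmpdir/withshebang" some arguments)
echo "$out"

rm -rf "$tmpdir"
```

The output should be the script's path followed by "some arguments", which shows that the interpreter receives the script as a file name on its command line.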

In Bash 4.3, this happens on line 5195 of execute_cmd.c in the shell_execve() function.

If you want to understand Linux at the source code level, I recommend downloading a copy of Research Unix V6 or V7, and going through that rather than all the complexity that is in the modern Linux systems. The Lions Book is a good guide to the code.

V7 is where the Bourne shell made its debut. Its entire C source code is just a bit over half the size of just that one C file in Bash. The Thompson shell in V6 is nearly half the size of the original Bourne shell. Yet, both of these simpler shells do the same sort of thing as Bash, and for the same reason. (It appears to be an execv(2) call from texec() in the Thompson shell and an execve() call from execs() in the Bourne shell's service.c module.)

Shell – Is it possible to exec some commands in a subshell without immediately exiting afterwards

The problem lies in how you're calling the . special builtin:

exec /bin/sh -c '. vars.sh; /usr/bin/fish'

In sh, if the argument doesn't contain any /, . searches for the file in $PATH. So above, it would look for vars.sh in $PATH instead of the current directory as you intended.

Also, . being a special builtin, its failure causes the shell to exit (when not interactive), so the next command (here fish) is not executed which is why your terminal emulator window goes away without a fish prompt.

That can be prevented by calling . as command . which removes the special attribute of special builtins.

Note that the behaviour of bash (the sh implementation of the GNU project) is different in that regard when not in POSIX mode (when not called as sh, nor with --posix, and when the environment doesn't contain POSIXLY_CORRECT= nor SHELLOPTS=posix):

bash's . doesn't cause the shell to exit upon failure, and it searches for a slash-less argument in the current directory if it can't find it in $PATH.

In any case, POSIX mode or not, if you want the vars.sh in the current directory, you need the ./vars.sh syntax. So it's

exec sh -c 'command . ./vars.sh; exec fish'
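
The effect of command on . can be checked in isolation. This is a minimal sketch, assuming sh is a POSIX-conforming shell such as dash, or bash in POSIX mode; the file name no-such-vars.sh is made up and must not exist in the current directory.

```shell
# Plain `.` is a special builtin: when it fails, a non-interactive sh
# should exit before reaching the echo.
plain=$(sh -c '. ./no-such-vars.sh; echo "still running"' 2>/dev/null) || true

# `command .` removes the special treatment, so the shell carries on.
wrapped=$(sh -c 'command . ./no-such-vars.sh; echo "still running"' 2>/dev/null) || true

echo "plain:   '$plain'"
echo "wrapped: '$wrapped'"
```

With such a shell, the first variable should come back empty and the second should contain "still running", which is why the fish prompt only appears in the command . variant when vars.sh cannot be found.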
