As you suspect, the exact behaviour is shell-dependent, but a baseline level of functionality is specified by POSIX.
Command search and execution for the standard shell command language (which most shells implement a superset of) has a lot of cases, but for the moment we're only interested in the case where `PATH` is used. In that case:
> the command shall be searched for using the PATH environment variable as described in XBD Environment Variables
and
> If the search is successful:
>
> [...] the shell executes the utility in a separate utility environment with actions equivalent to calling the `execl()` function [...] with the path argument set to the pathname resulting from the search.
In the unsuccessful case, execution fails and an exit code of 127 is returned with an error message.
This behaviour is consistent with the `execvp` function, in particular. All the `exec*` functions accept the file name of a program to run, a sequence of arguments (which will be the `argv` of the program), and perhaps a set of environment variables. For the versions using `PATH` lookup, POSIX defines that:
> The argument file is used to construct a pathname that identifies the new process image file [...] the path prefix for this file is obtained by a search of the directories passed as the environment variable PATH
The behaviour of `PATH` is defined elsewhere as:
> This variable shall represent the sequence of path prefixes that certain functions and utilities apply in searching for an executable file known only by a filename. The prefixes shall be separated by a <colon> ( ':' ). When a non-zero-length prefix is applied to this filename, a <slash> shall be inserted between the prefix and the filename if the prefix did not end in a <slash>. A zero-length prefix is a legacy feature that indicates the current working directory. It appears as two adjacent characters ( "::" ), as an initial <colon> preceding the rest of the list, or as a trailing <colon> following the rest of the list. A strictly conforming application shall use an actual pathname (such as .) to represent the current working directory in PATH. The list shall be searched from beginning to end, applying the filename to each prefix, until an executable file with the specified name and appropriate execution permissions is found. If the pathname being sought contains a <slash>, the search through the path prefixes shall not be performed. If the pathname begins with a <slash>, the specified path is resolved (see Pathname Resolution). If PATH is unset or is set to null, the path search is implementation-defined.
That's a bit dense, so a summary:
- If the program name has a `/` (slash, U+002F SOLIDUS) in it, treat it as a path in the usual fashion, and skip the rest of this process. For the shell, this case technically doesn't arise (because the shell rules will have dealt with it already).
- The value of `PATH` is split into pieces at each colon, and then each component is processed from left to right. As a special (historical) case, an empty component of a non-empty variable is treated as `.` (the current directory).
- For each component, the program name is appended to the end with a joining `/`, and the existence of a file by that name is checked; if one does exist, execute (`+x`) permission is checked as well. If either of those checks fails, the process moves on to the next component. Otherwise, the command resolves to this path and the search is done.
- If you run out of components, the search fails.
- If there's nothing in `PATH`, or it doesn't exist, do whatever you want (POSIX leaves this implementation-defined).
Real shells will have builtin commands, which are found before this lookup, and often aliases and functions as well. Those don't interact with `PATH`. POSIX defines some behaviour around those, and your shell may have much more.
While it's possible to rely on the `exec*` functions to do most of this for you, in practice the shell may implement this lookup itself, notably for caching purposes, but the empty-cache behaviour should be similar. Shells have fairly wide latitude here and have subtly different behaviours in the corner cases.
As you found, Bash uses a hash table to remember the full paths of commands it's seen before, and that table can be accessed with the `hash` builtin. The first time you run a command Bash searches for it, and when a result is found it gets added to the table so there's no need to search again the next time you try it.
In zsh, on the other hand, the full `PATH` is generally searched when the shell starts. A lookup table is prepopulated with all discovered command names so that runtime lookups usually aren't necessary (unless a new command has been added since). You can notice that happening when you try to tab-complete a command that didn't exist before.
Very lightweight shells, like `dash`, tend to delegate as much behaviour as possible to the system library and don't bother to remember past command paths.
Best Answer
All modern CPUs have the capacity to interrupt the currently-executing machine instruction. They save enough state (usually, but not always, on the stack) to make it possible to resume execution later, as if nothing had happened (the interrupted instruction will be restarted from scratch, usually). Then they start executing an interrupt handler, which is just more machine code, but placed at a special location so the CPU knows where it is in advance. Interrupt handlers are always part of the kernel of the operating system: the component that runs with the greatest privilege and is responsible for supervising execution of all the other components. [1] [2]
Interrupts can be synchronous, meaning that they are triggered by the CPU itself as a direct response to something the currently-executing instruction did, or asynchronous, meaning that they happen at an unpredictable time because of an external event, like data arriving on the network port. Some people reserve the term "interrupt" for asynchronous interrupts, and call synchronous interrupts "traps", "faults", or "exceptions" instead, but those words all have other meanings so I'm going to stick with "synchronous interrupt".
Now, most modern operating systems have a notion of processes. At its most basic, this is a mechanism whereby the computer can run more than one program at the same time, but it is also a key aspect of how operating systems configure memory protection, which is a feature of most (but, alas, still not all) modern CPUs. It goes along with virtual memory, which is the ability to alter the mapping between memory addresses and actual locations in RAM. Memory protection allows the operating system to give each process its own private chunk of RAM, that only it can access. It also allows the operating system (acting on behalf of some process) to designate regions of RAM as read-only, executable, shared among a group of cooperating processes, etc. There will also be a chunk of memory that is only accessible by the kernel. [3]
As long as each process accesses memory only in the ways that the CPU is configured to allow, memory protection is invisible. When a process breaks the rules, the CPU will generate a synchronous interrupt, asking the kernel to sort things out. It regularly happens that the process didn't really break the rules; the kernel just needs to do some work before the process can be allowed to continue. For instance, if a page of a process's memory needs to be "evicted" to the swap file in order to free up space in RAM for something else, the kernel will mark that page inaccessible. The next time the process tries to use it, the CPU will generate a memory-protection interrupt; the kernel will retrieve the page from swap, put it back where it was, mark it accessible again, and resume execution.
But suppose that the process really did break the rules. It tried to access a page that has never had any RAM mapped to it, or it tried to execute a page that is marked as not containing machine code, or whatever. The family of operating systems generally known as "Unix" all use signals to deal with this situation. [4] Signals are similar to interrupts, but they are generated by the kernel and fielded by processes, rather than being generated by the hardware and fielded by the kernel. Processes can define signal handlers in their own code, and tell the kernel where they are. Those signal handlers will then execute, interrupting the normal flow of control, when necessary. Signals all have a number and two names, one of which is a cryptic acronym and the other a slightly less cryptic phrase. The signal that's generated when a process breaks the memory-protection rules is (by convention) number 11, and its names are `SIGSEGV` and "Segmentation fault". [5] [6]

An important difference between signals and interrupts is that there is a default behavior for every signal. If the operating system fails to define handlers for all interrupts, that is a bug in the OS, and the entire computer will crash when the CPU tries to invoke a missing handler. But processes are under no obligation to define signal handlers for all signals. If the kernel generates a signal for a process, and that signal has been left at its default behavior, the kernel will just go ahead and do whatever the default is and not bother the process. Most signals' default behaviors are either "do nothing" or "terminate this process and maybe also produce a core dump."
`SIGSEGV` is one of the latter.

So, to recap: we have a process that broke the memory-protection rules. The CPU suspended the process and generated a synchronous interrupt. The kernel fielded that interrupt and generated a `SIGSEGV` signal for the process. Let's assume the process did not set up a signal handler for `SIGSEGV`, so the kernel carries out the default behavior, which is to terminate the process. This has all the same effects as the `_exit` system call: open files are closed, memory is deallocated, etc.

Up till this point nothing has printed out any messages that a human can see, and the shell (or, more generally, the parent process of the process that just got terminated) has not been involved at all.
`SIGSEGV` goes to the process that broke the rules, not to its parent. The next step in the sequence, though, is to notify the parent process that its child has been terminated. This can happen in several different ways, of which the simplest is when the parent is already waiting for this notification, using one of the `wait` system calls (`wait`, `waitpid`, `wait4`, etc.). In that case, the kernel will just cause that system call to return, and supply the parent process with a code number called an exit status. [7] The exit status informs the parent why the child process was terminated; in this case, it will learn that the child was terminated due to the default behavior of a `SIGSEGV` signal.

The parent process may then report the event to a human by printing a message; shell programs almost always do this. Your `crsh` doesn't include code to do that, but it happens anyway, because the C library routine `system` runs a full-featured shell, `/bin/sh`, "under the hood". `crsh` is the grandparent in this scenario; the parent-process notification is fielded by `/bin/sh`, which prints its usual message. Then `/bin/sh` itself exits, since it has nothing more to do, and the C library's implementation of `system` receives that exit notification. You can see that exit notification in your code, by inspecting the return value of `system`; but it won't tell you that the grandchild process died on a segfault, because that information was consumed by the intermediate shell process.

Footnotes
1. Some operating systems don't implement device drivers as part of the kernel; however, all interrupt handlers still have to be part of the kernel, and so does the code that configures memory protection, because the hardware doesn't allow anything but the kernel to do these things.
2. There may be a program called a "hypervisor" or "virtual machine manager" that is even more privileged than the kernel, but for purposes of this answer it can be considered part of the hardware.
3. The kernel is a program, but it is not a process; it is more like a library. All processes execute parts of the kernel's code, from time to time, in addition to their own code. There may be a number of "kernel threads" that only execute kernel code, but they do not concern us here.
4. The one and only OS you are likely to have to deal with anymore that can't be considered an implementation of Unix is, of course, Windows. It does not use signals in this situation. (Indeed, it does not have signals; on Windows the `<signal.h>` interface is completely faked by the C library.) It uses something called "structured exception handling" instead.

5. Some memory-protection violations generate `SIGBUS` ("Bus error") instead of `SIGSEGV`. The line between the two is underspecified and varies from system to system. If you've written a program that defines a handler for `SIGSEGV`, it is probably a good idea to define the same handler for `SIGBUS`.

6. "Segmentation fault" was the name of the interrupt generated for memory-protection violations by one of the computers that ran the original Unix, probably the PDP-11. "Segmentation" is a type of memory protection, but nowadays the term "segmentation fault" refers generically to any sort of memory-protection violation.
7. All the other ways the parent process might be notified of a child having terminated end up with the parent calling `wait` and receiving an exit status. It's just that something else happens first.