As you suspect, the exact behaviour is shell-dependent, but a baseline level of functionality is specified by POSIX.
Command search and execution for the standard shell command language (which most shells implement a superset of) has a lot of cases, but for the moment we're only interested in the case where `PATH` is used. In that case:
> the command shall be searched for using the PATH environment variable as described in XBD Environment Variables
and
> If the search is successful:
>
> [...] the shell executes the utility in a separate utility environment with actions equivalent to calling the `execl()` function [...] with the path argument set to the pathname resulting from the search.
In the unsuccessful case, execution fails and an exit code of 127 is returned with an error message.
This behaviour is consistent with the `execvp` function, in particular. All the `exec*` functions accept the file name of a program to run, a sequence of arguments (which will be the `argv` of the program), and perhaps a set of environment variables. For the versions using `PATH` lookup, POSIX defines that:
> The argument file is used to construct a pathname that identifies the new process image file [...] the path prefix for this file is obtained by a search of the directories passed as the environment variable PATH
The behaviour of `PATH` is defined elsewhere as:
> This variable shall represent the sequence of path prefixes that certain functions and utilities apply in searching for an executable file known only by a filename. The prefixes shall be separated by a <colon> ( ':' ). When a non-zero-length prefix is applied to this filename, a <slash> shall be inserted between the prefix and the filename if the prefix did not end in a <slash>. A zero-length prefix is a legacy feature that indicates the current working directory. It appears as two adjacent characters ( "::" ), as an initial <colon> preceding the rest of the list, or as a trailing <colon> following the rest of the list. A strictly conforming application shall use an actual pathname (such as .) to represent the current working directory in PATH. The list shall be searched from beginning to end, applying the filename to each prefix, until an executable file with the specified name and appropriate execution permissions is found. If the pathname being sought contains a <slash>, the search through the path prefixes shall not be performed. If the pathname begins with a <slash>, the specified path is resolved (see Pathname Resolution). If PATH is unset or is set to null, the path search is implementation-defined.
That's a bit dense, so a summary:
- If the program name has a `/` (slash, U+002F SOLIDUS) in it, treat it as a path in the usual fashion, and skip the rest of this process. For the shell, this case technically doesn't arise (because the shell rules will have dealt with it already).
- The value of `PATH` is split into pieces at each colon, and then each component is processed from left to right. As a special (historical) case, an empty component of a non-empty variable is treated as `.` (the current directory).
- For each component, the program name is appended to the end with a joining `/`, and the existence of a file by that name is checked; if one does exist, execute (`+x`) permission is checked as well. If either of those checks fails, the process moves on to the next component. Otherwise, the command resolves to this path and the search is done.
- If you run out of components, the search fails.
- If there's nothing in `PATH`, or it doesn't exist, do whatever you want (POSIX leaves this implementation-defined).
Real shells will have builtin commands, which are found before this lookup, and often aliases and functions as well. Those don't interact with `PATH`. POSIX defines some behaviour around those, and your shell may have much more.
While it's possible to rely on the `exec*` functions to do most of this for you, in practice the shell may implement this lookup itself, notably for caching purposes, but the empty-cache behaviour should be similar. Shells have fairly wide latitude here and have subtly different behaviours in the corner cases.
As you found, Bash uses a hash table to remember the full paths of commands it's seen before, and that table can be accessed with the `hash` builtin. The first time you run a command Bash searches for it, and when a result is found it gets added to the table so there's no need to search again the next time you try it.
In zsh, on the other hand, the full `PATH` is generally searched when the shell starts. A lookup table is prepopulated with all discovered command names so that runtime lookups usually aren't necessary (unless a new command has been added since). You can notice that happening when you try to tab-complete a command that didn't exist before.
Very lightweight shells, like `dash`, tend to delegate as much behaviour as possible to the system library and don't bother to remember past command paths.
Best Answer
All modern CPUs have the capacity to interrupt the currently-executing machine instruction. They save enough state (usually, but not always, on the stack) to make it possible to resume execution later, as if nothing had happened (the interrupted instruction will be restarted from scratch, usually). Then they start executing an interrupt handler, which is just more machine code, but placed at a special location so the CPU knows where it is in advance. Interrupt handlers are always part of the kernel of the operating system: the component that runs with the greatest privilege and is responsible for supervising execution of all the other components. [1] [2]
Interrupts can be synchronous, meaning that they are triggered by the CPU itself as a direct response to something the currently-executing instruction did, or asynchronous, meaning that they happen at an unpredictable time because of an external event, like data arriving on the network port. Some people reserve the term "interrupt" for asynchronous interrupts, and call synchronous interrupts "traps", "faults", or "exceptions" instead, but those words all have other meanings so I'm going to stick with "synchronous interrupt".
Now, most modern operating systems have a notion of processes. At its most basic, this is a mechanism whereby the computer can run more than one program at the same time, but it is also a key aspect of how operating systems configure memory protection, which is a feature of most (but, alas, still not all) modern CPUs. It goes along with virtual memory, which is the ability to alter the mapping between memory addresses and actual locations in RAM. Memory protection allows the operating system to give each process its own private chunk of RAM, that only it can access. It also allows the operating system (acting on behalf of some process) to designate regions of RAM as read-only, executable, shared among a group of cooperating processes, etc. There will also be a chunk of memory that is only accessible by the kernel. [3]
As long as each process accesses memory only in the ways that the CPU is configured to allow, memory protection is invisible. When a process breaks the rules, the CPU will generate a synchronous interrupt, asking the kernel to sort things out. It regularly happens that the process didn't really break the rules; the kernel just needs to do some work before the process can be allowed to continue. For instance, if a page of a process's memory needs to be "evicted" to the swap file in order to free up space in RAM for something else, the kernel will mark that page inaccessible. The next time the process tries to use it, the CPU will generate a memory-protection interrupt; the kernel will retrieve the page from swap, put it back where it was, mark it accessible again, and resume execution.
But suppose that the process really did break the rules. It tried to access a page that has never had any RAM mapped to it, or it tried to execute a page that is marked as not containing machine code, or whatever. The family of operating systems generally known as "Unix" all use signals to deal with this situation. [4] Signals are similar to interrupts, but they are generated by the kernel and fielded by processes, rather than being generated by the hardware and fielded by the kernel. Processes can define signal handlers in their own code, and tell the kernel where they are. Those signal handlers will then execute, interrupting the normal flow of control, when necessary. Signals all have a number and two names, one of which is a cryptic acronym and the other a slightly less cryptic phrase. The signal that's generated when a process breaks the memory-protection rules is (by convention) number 11, and its names are `SIGSEGV` and "Segmentation fault". [5] [6]

An important difference between signals and interrupts is that there is a default behavior for every signal. If the operating system fails to define handlers for all interrupts, that is a bug in the OS, and the entire computer will crash when the CPU tries to invoke a missing handler. But processes are under no obligation to define signal handlers for all signals. If the kernel generates a signal for a process, and that signal has been left at its default behavior, the kernel will just go ahead and do whatever the default is and not bother the process. Most signals' default behaviors are either "do nothing" or "terminate this process and maybe also produce a core dump."
`SIGSEGV` is one of the latter.

So, to recap: we have a process that broke the memory-protection rules. The CPU suspended the process and generated a synchronous interrupt. The kernel fielded that interrupt and generated a `SIGSEGV` signal for the process. Let's assume the process did not set up a signal handler for `SIGSEGV`, so the kernel carries out the default behavior, which is to terminate the process. This has all the same effects as the `_exit` system call: open files are closed, memory is deallocated, etc.

Up till this point nothing has printed out any messages that a human can see, and the shell (or, more generally, the parent process of the process that just got terminated) has not been involved at all.
`SIGSEGV` goes to the process that broke the rules, not to its parent. The next step in the sequence, though, is to notify the parent process that its child has been terminated. This can happen in several different ways, of which the simplest is when the parent is already waiting for this notification, using one of the `wait` system calls (`wait`, `waitpid`, `wait4`, etc.). In that case, the kernel will just cause that system call to return, and supply the parent process with a code number called an exit status. [7] The exit status informs the parent why the child process was terminated; in this case, it will learn that the child was terminated due to the default behavior of a `SIGSEGV` signal.

The parent process may then report the event to a human by printing a message; shell programs almost always do this. Your `crsh` doesn't include code to do that, but it happens anyway, because the C library routine `system` runs a full-featured shell, `/bin/sh`, "under the hood". `crsh` is the grandparent in this scenario; the parent-process notification is fielded by `/bin/sh`, which prints its usual message. Then `/bin/sh` itself exits, since it has nothing more to do, and the C library's implementation of `system` receives that exit notification. You can see that exit notification in your code, by inspecting the return value of `system`; but it won't tell you that the grandchild process died on a segfault, because that information was consumed by the intermediate shell process.

Footnotes
1. Some operating systems don't implement device drivers as part of the kernel; however, all interrupt handlers still have to be part of the kernel, and so does the code that configures memory protection, because the hardware doesn't allow anything but the kernel to do these things.
2. There may be a program called a "hypervisor" or "virtual machine manager" that is even more privileged than the kernel, but for purposes of this answer it can be considered part of the hardware.
3. The kernel is a program, but it is not a process; it is more like a library. All processes execute parts of the kernel's code, from time to time, in addition to their own code. There may be a number of "kernel threads" that only execute kernel code, but they do not concern us here.
4. The one and only OS you are likely to have to deal with anymore that can't be considered an implementation of Unix is, of course, Windows. It does not use signals in this situation. (Indeed, it does not have signals; on Windows the `<signal.h>` interface is completely faked by the C library.) It uses something called "structured exception handling" instead.

5. Some memory-protection violations generate `SIGBUS` ("Bus error") instead of `SIGSEGV`. The line between the two is underspecified and varies from system to system. If you've written a program that defines a handler for `SIGSEGV`, it is probably a good idea to define the same handler for `SIGBUS`.

6. "Segmentation fault" was the name of the interrupt generated for memory-protection violations by one of the computers that ran the original Unix, probably the PDP-11. "Segmentation" is a type of memory protection, but nowadays the term "segmentation fault" refers generically to any sort of memory-protection violation.
7. All the other ways the parent process might be notified of a child having terminated end up with the parent calling `wait` and receiving an exit status. It's just that something else happens first.