Linux – sigaction(7): semantics of siginfo_t’s si_code member

linux-kernel, segmentation fault, signals

I've got a long-running program (it becomes a daemon via a daemon(3) call) that exits on signal 11 (Segmentation Violation) every so often. I can't tell why. So I wrote a SIGSEGV handler and installed it with the sigaction() system call. The handler function has this prototype: void (*sa_sigaction)(int, siginfo_t *, void *), which means it gets a pointer to a siginfo_t structure as a formal argument.

On the occasion of a mysterious SIGSEGV, the si_code element of the siginfo_t has a value of 0x80, which means, according to the sigaction man page, "The kernel" sent the signal. This is on a Red Hat RHEL system: Linux blahblah 2.6.18-308.20.1.el5 #1 SMP Tue Nov 6 04:38:29 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

Why does the kernel send a SIGSEGV? Is this from the famed OOM-Killer, or does some other reason exist for getting a SIGSEGV? As a mere user on this system, I can't see /var/log/messages, and the sysadmins are more than a bit aloof, probably because they come from a Windows background.

A SIGSEGV generated on purpose (by dereferencing a NULL pointer) does not get an si_code value of 0x80; it gets 0x1, which means "address not mapped to object".

Best Answer

The undocumented meaning of si_code = SI_KERNEL with si_errno = 0 is one of the following:

  1. processor-specific traps
  2. kernel segment memory violation (except for semaphore access)
  3. ELF file format violations, and
  4. stack violations.

All other SIGSEGVs should have si_errno set to a non-zero value. Read on for the details.

When the kernel sets up a userspace process, it defines a table of virtual memory pages for the process. When the kernel scheduler runs the process, it reconfigures the CPU's memory management unit (MMU) according to the page table for the process.

When a userspace process attempts to access memory that is outside of its page table, the CPU MMU detects this violation and generates an exception. Note that this happens at the hardware level. The kernel is not involved yet.

The kernel is set up to handle MMU exceptions. It catches the exception caused by the running process's attempt to access memory outside of its page table. The kernel then calls do_page_fault(), which sends the SIGSEGV signal to the process. This is why the signal comes from the kernel and not from the process itself or from another process.

This is a highly simplified explanation of course. The best simple explanation that I have seen of this is the "Page Faults" section of William Gatliff's beautiful article The Linux Kernel’s Memory Management Unit API.

Note that on CPUs without an MMU, such as the Blackfin MPUs, Linux userspace processes can generally access any memory, i.e. there is no SIGSEGV signal for memory violations (only for traps such as stack overflow), and debugging memory access problems can be tricky.

I second jordanm's comment regarding setting the ulimit and inspecting the core file with gdb. You can do ulimit -c unlimited from the command line if you run the process from a shell, or use the libc setrlimit system call wrapper (man setrlimit) in your program. You can set the name of the core file and its location in the file /proc/sys/kernel/core_pattern. See A.P. Lawrence's excellent gloss on this at Controlling core files (Linux). To use gdb on the core file, see this little tutorial on Steve.org.

A segmentation violation with si_code SEGV_MAPERR (0x1) is likely a null pointer dereference, an access of non-existent memory such as 0xfffffc0000004000, or a malloc or free problem. In the case of malloc, the cause is typically heap corruption or the process exceeding its runtime limits (man getrlimit); in the case of free, it is typically a double free or a free of an address that was never allocated. Look at the si_errno element for more clues.

A userspace access to virtual memory above the TASK_SIZE limit causes a segmentation violation with an si_code of SI_KERNEL. In other words, the TASK_SIZE limit is the highest virtual address that any process is allowed to access. On 32-bit x86 this is normally 3GB unless the kernel is configured for high memory support. The area above the TASK_SIZE limit is referred to as the "kernel segment". See linux-2.6//arch/x86/mm/fault.c:__bad_area_nosemaphore(...) where it calls force_sig_info_fault(...).

For each architecture there are also a number of specific traps that cause a SIGSEGV with SI_KERNEL. For x86 these are defined by the DO_ERROR macros in linux-2.6//arch/x86/kernel/traps.c.

The OOM handler sends SIGKILL, not SIGSEGV, as can be seen in the function linux-2.6//mm/oom_kill.c:oom_kill_process(...) at about line 498:

do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);

for related processes and line 503:

do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);

for the process that was the proximal cause of the OOM.

You can get more information from the parent process by looking at the wait status of the process that was killed, and possibly by looking at dmesg or, better, by configuring the kernel log and looking at it.
