If you send a signal to a process, only that process receives it. I don't know how the rumor that killing a process also kills other processes got started; it seems particularly counter-intuitive.
There are, however, ways to kill more than one process, but not by sending a signal to a single process. You can kill a whole process group by sending a signal to `-1234`, where 1234 is the PGID (process group ID), which is the PID of the process group leader. When you run a pipeline, the whole pipeline starts out as a process group (the applications may change this by calling `setpgid` or `setpgrp`).
When you start processes in the background (`foo &`), they are in their own process group. Process groups are used to manage access to the terminal; normally only the foreground process group has access to the terminal. The background jobs remain in the same session, but there's no facility to kill a whole session, or even to enumerate the process groups or processes in a session, so that doesn't help much.
When you close a terminal, the kernel sends the signal `SIGHUP` to all processes that have it as their controlling terminal. These processes form a session, but not all sessions have a controlling terminal. For your project, one possibility is therefore to start all the processes in their own terminal, created by `script`, `screen`, etc. Kill the terminal emulator process to kill the contained processes (assuming they haven't seceded with `setsid`).
You can provide more isolation by running the processes as a dedicated user who doesn't do anything else. Then it's easy to kill all the processes: run `kill` (the system call or the utility) as that user and pass -1 as the PID argument, meaning "all of this user's processes".
You can provide even more isolation, at the cost of considerably more setup, by running the contained processes in an actual container.
The undocumented semantics of `si_code = SI_KERNEL` with `si_errno = 0` are:
- processor-specific traps
- kernel segment memory violation (except for semaphore access)
- ELF file format violations, and
- stack violations.
All other `SIGSEGV`s should have `si_errno` set to a non-zero value. Read on for the details.
When the kernel sets up a userspace process, it defines a table of virtual memory pages for the process. When the kernel scheduler runs the process, it reconfigures the CPU's memory management unit (MMU) according to the page table for the process.
When a userspace process attempts to access memory that is outside of its page table, the CPU MMU detects this violation and generates an exception. Note that this happens at the hardware level. The kernel is not involved yet.
The kernel is set up to handle MMU exceptions. It catches the exception caused by the running process's attempt to access memory outside of its page table. The kernel then calls `do_page_fault()`, which sends the SIGSEGV signal to the process. This is why the signal comes from the kernel and not from the process itself or from another process.
This is a highly simplified explanation of course. The best simple explanation that I have seen of this is the "Page Faults" section of William Gatliff's beautiful article The Linux Kernel’s Memory Management Unit API.
Note that on CPUs without an MMU, such as the Blackfin MPUs, Linux userspace processes can generally access any memory. That is, there is no SIGSEGV signal for memory violations (only for traps such as stack overflow), and debugging memory access problems can be tricky.
I second jordanm's comment regarding setting the `ulimit` and inspecting the core file with `gdb`. You can do `ulimit -c unlimited` from the command line if you run the process from a shell, or use the libc `setrlimit` system call wrapper (`man setrlimit`) in your program. You can set the name of the core file and its location via the file `/proc/sys/kernel/core_pattern`. See A.P. Lawrence's excellent gloss on this at Controlling core files (Linux). To use `gdb` on the core file, see this little tutorial on Steve.org.
A segmentation violation with `si_code` SEGV_MAPERR (0x1) is likely a null pointer dereference, an access of non-existent memory such as 0xfffffc0000004000, or a `malloc`/`free` problem: heap corruption or the process exceeding its resource limits (`man getrlimit`) in the case of `malloc`, and a double free or a free of a non-allocated address in the case of `free`. Look at the `si_errno` element for more clues.
A segmentation violation that occurs as a result of a userspace process accessing virtual memory above the `TASK_SIZE` limit will have an `si_code` of `SI_KERNEL`. In other words, the `TASK_SIZE` limit is the highest virtual address that any process is allowed to access. On 32-bit x86 this is normally 3GB, unless the kernel is configured for high memory support. The area above the `TASK_SIZE` limit is referred to as the "kernel segment". See `linux-2.6//arch/x86/mm/fault.c:__bad_area_nosemaphore(...)`, where it calls `force_sig_info_fault(...)`.
For each architecture there are also a number of specific traps that cause a SIGSEGV with `SI_KERNEL`. For x86 these are defined by the DO_ERROR macros in `linux-2.6//arch/x86/kernel/traps.c`.
The OOM handler sends SIGKILL, not SIGSEGV, as can be seen in function `linux-2.6//mm/oom_kill.c:oom_kill_process(...)` at about line 498:

`do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);`

for related processes, and line 503:

`do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);`

for the process that was the proximal cause of the OOM.
You can get more information by looking at the `wait` status of the killed process from its parent, and possibly by looking at `dmesg` or, better, by configuring the kernel log and looking at it.
Your information seems to be outdated. From my `sigaction` man page: