Apache – Unkillable apache2 processes

apache-httpd, process, segmentation fault, strace

After upgrading from Debian lenny to squeeze (apache2 2.2.16-6+squeeze4 and php 5.3.10-1~dotdeb.1), my Apache processes started to exit with segmentation faults. It happens every 5-30 minutes (for one process), so it has no real impact right now. The problem is that sometimes, instead of exiting with SIGSEGV, a process goes crazy and loops on a SIGBUS error at 100% system CPU usage:

# strace -p27635
Process 27635 attached - interrupt to quit
--- SIGBUS (Bus error) @ 0 (0) ---
--- SIGBUS (Bus error) @ 0 (0) ---
--- SIGBUS (Bus error) @ 0 (0) ---
--- SIGBUS (Bus error) @ 0 (0) ---
--- SIGBUS (Bus error) @ 0 (0) ---
--- SIGBUS (Bus error) @ 0 (0) ---

They don't respond to kill -9. But when I attach strace to the process after sending the kill, it reports SIGBUS twice more and then exits:

# strace -p27635
Process 27635 attached - interrupt to quit
--- SIGBUS (Bus error) @ 0 (0) ---
--- SIGBUS (Bus error) @ 0 (0) ---
+++ killed by SIGKILL +++

Why is the process unkillable without strace? How does running strace influence the process so that it can exit?

(I know that there is probably something wrong with my setup of apache/php modules, but in this question I'm interested in the strange behaviour of the unkillable processes. I will ask another question if I am unable to fix apache/php.)

Best Answer

First, check your RAM.

A process that doesn't respond to SIGKILL is a symptom of either a kernel bug or a hardware bug. When you haven't just changed your kernel, the most likely cause is failing RAM, so check it. Kernel bugs can have subtle causes (such as using the wrong version of gcc) and manifest themselves subtly (such as everything working perfectly except that the X server won't start; both from the same true story). It's not very likely that your new kernel is buggy if you're using the distribution-provided kernel that a lot of other users are running, but it could happen, possibly as a rare bug triggered by a combination of drivers and activity patterns. Try another kernel.

There may also be a bug in Apache that causes it to crash, but if SIGKILL doesn't work, it's not Apache's fault.
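To see what the kernel thinks of such a process while it is stuck, one option is to read its entry under /proc. The following is only a minimal sketch, assuming a Linux system with /proc mounted; the helper names (parse_status, sigkill_pending, describe) are made up for illustration. It prints the State field and checks whether SIGKILL is set in the SigPnd or ShdPnd mask of /proc/<pid>/status. If SIGKILL shows up as pending and stays there, the signal was queued but the kernel never got around to acting on it, which would be consistent with the problem sitting below Apache.

```python
import sys

SIGKILL = 9  # signal number on Linux


def parse_status(text):
    """Return the fields of a /proc/<pid>/status dump as a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def sigkill_pending(fields):
    """True if SIGKILL is set in the per-thread or process-wide pending mask."""
    bit = 1 << (SIGKILL - 1)  # signal n is bit n-1 of the hex mask
    masks = (fields.get("SigPnd", "0"), fields.get("ShdPnd", "0"))
    return any(int(mask, 16) & bit for mask in masks)


def describe(fields):
    """One-line summary: scheduler state plus whether SIGKILL is pending."""
    state = fields.get("State", "?")
    return f"state={state} sigkill_pending={sigkill_pending(fields)}"


if __name__ == "__main__":
    # Hypothetical usage: python3 procstate.py 27635
    if len(sys.argv) == 2 and sys.argv[1].isdigit():
        with open(f"/proc/{sys.argv[1]}/status") as f:
            print(describe(parse_status(f.read())))
```

Run it with the PID of the stuck process as root or as the process owner. This only reports what the kernel exposes; it does not tell a kernel bug apart from bad RAM, so the memory test and the kernel swap are still the actual diagnostics.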
