Kernel Panic – How to Diagnose and Fix Fatal Machine Check Error

dual-boot, hardware, kernel

I have a new Samsung Series 7 laptop with a dual-boot setup for Windows 8 and Ubuntu 12.10. It is a fine machine, comparable to a MacBook Pro. Installing Ubuntu was quite a hassle, but with the help of Boot Repair it finally seemed to work. Or so I thought. Windows 8 starts fine, but when I start Ubuntu, the following Machine Check Exception regularly occurs, quite similar to this one:

[Hardware Error] CPU 1: Machine Check Exception: 5 Bank 6
[Hardware Error] RIP !inexact! 33 <00007fab2074598a>
[Hardware Error] TSC 95b623464c ADDR fe400 MISC 3880000086
.. [similar messages for CPU 2,3 and 0] ..
[Hardware Error] Machine Check: Processor context corrupt
Kernel panic - not syncing: Fatal Machine Check
Rebooting in 30 seconds

Kernel panic does not sound good. The machine then reboots, and the second boot attempt often works. Is it a kernel or driver problem? The laptop has an Intel Core i7 processor. I already disabled Hyper-Threading in the BIOS, but it does not seem to help 🙁

I also disabled the Execute Disable Bit (EDB) flag in the BIOS. EDB is an Intel hardware-based security feature that can help reduce system exposure to viruses and malicious code. Since I disabled it, the error has occurred less frequently, but it still appears occasionally 🙁 It seems to be the same error as described here and here. Maybe a Samsung-specific kernel problem? A similar error also happens on a Samsung Ultrabook Series 9 (which seems to be kernel bugs 49161 and 47121).

On my Samsung Series 7, it still occurs, for instance when booting on battery, right after "Checking battery state". Does anyone else have an idea? These kernel panics are really annoying.

Best Answer

According to section 15.9.1 in volume 3 of the Intel Architecture Software Developer's Manual, the Machine Check Exception 5 from the MSR_IA32_MCG_STATUS MSR indicates an internal parity error. After reading the manual, I am not sure which specific component bank 6 refers to, so I cannot determine where this physical error occurred.

The "Processor context corrupt" message indicates that bit 57 of the Machine Check Status MSR is set. This means the internal processor state may have been corrupted by the detected error condition, and that reliably restarting the processor may not be possible. At this point the kernel has no real choice other than to stop with a kernel panic.
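To make the bit 57 check concrete, here is a minimal Python sketch that decodes the flag bits in the upper part of a machine check status value. The bit positions follow the IA32_MCi_STATUS layout in the Intel manual. The helper name `decode_mci_status` and the sample value are made up for illustration, because the log in the question does not include the status word itself.

```python
# Flag bits in the upper part of an IA32_MCi_STATUS register
# (bit positions as given in the Intel SDM, volume 3).
MCI_STATUS_FLAGS = {
    63: "VAL",    # register holds valid error information
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # the MISC register is valid
    58: "ADDRV",  # the ADDR register is valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status):
    """Return the names of the flag bits set in a 64-bit status value,
    ordered from the highest bit to the lowest."""
    return [
        name
        for bit, name in sorted(MCI_STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]


# Hypothetical value for illustration only: the log in the question
# omits the status word, so this is NOT taken from the real machine.
example_status = 0xBE00000000000000

flags = decode_mci_status(example_status)
print("flags set:", ", ".join(flags))
print("context corrupt (PCC):", "PCC" in flags)
```

If PCC shows up in the decoded flags, the kernel treats the error as fatal, which matches the panic in the question.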

As things stand, this kind of error is rare.

I am unsure why disabling EDB would affect this issue. Perhaps it is just a coincidence. Does this panic happen frequently with EDB enabled? Incidentally, which processor model do you have? To find out, use:

cat /proc/cpuinfo 
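
The full output repeats once per logical CPU, so it can be long. As a rough sketch, the Python snippet below pulls out only the identifying fields for the first processor. The function name `cpu_identity` and the choice of fields are my own. It assumes the usual x86 layout of `/proc/cpuinfo`, with `key : value` lines and a blank line between processors.

```python
import os


def cpu_identity(cpuinfo_text):
    """Extract identifying fields for the first processor listed in
    /proc/cpuinfo-style text (assumes 'key : value' lines)."""
    wanted = ("vendor_id", "cpu family", "model", "model name",
              "stepping", "microcode")
    info = {}
    for line in cpuinfo_text.splitlines():
        if not line.strip():
            if info:
                break  # blank line ends the first processor's block
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in wanted:
            info[key] = value.strip()
    return info


# Only read the real file where it exists (i.e. on Linux).
if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        for key, value in cpu_identity(f.read()).items():
            print(f"{key}: {value}")
```

Posting the model name together with the family, model, and stepping numbers makes it easier to match the machine against known errata or kernel bug reports.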