How does a multi-core computer freeze (at the hardware level)?

I have a 4-core i7 computer that freezes. The last image stays on the display, but nothing on it ever moves again. This question is not asking for help with that particular problem. It is a general question about how a computer can freeze.

It is not about blue screens either. I am talking about a sudden, complete halt of the system. Although one can never be sure, here is what I mean by completely frozen:

  • Indicator lights on the keyboard (like Caps Lock) no longer toggle
  • Purpose-built software that blinks an icon in the system tray no longer updates
  • No input is possible (mouse, keyboard, and power button are all unresponsive)
  • I can't ping the computer or wake it with WOL (Wake-on-LAN)
  • Music (read from the network or locally) stops
  • The Bluetooth radio is no longer responsive
  • Closing and opening the cover has no effect
  • It stays that way for hours, and the CPU stays somewhat cool (I can't reach it)

Way back when, your single CPU could halt if it encountered an unexpected situation, such as an unknown opcode, and the computer would suddenly freeze. If you had an ICE (in-circuit emulator) debugger attached, you could see the trace that led to the frozen CPU. I've seen that (too) often with Z80, 6800, and 8086 CPUs.

With multiple cores, why can't the computer keep running on the remaining cores, if only to write a core dump? In other words, what other single points of failure are there on a multi-core computer?

Best Answer

Given the freeze you're describing, it does sound like a hardware-level issue, though not necessarily one caused by the CPU. That said, a multi-CPU system can definitely tangle itself into a deadlock on all cores if each core is running a thread or process that is waiting on a resource another thread or process already holds, so that none of them can ever proceed. A search for "CPU deadlock" turns up plenty of detail on the possible conditions. Overheating or improper voltage settings could also cause intermittent behavior, although in those cases I've only seen systems shut down or refuse to POST.
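The circular wait described above can be illustrated in user space. This is a minimal sketch, assuming two Python threads stand in for two cores; a real whole-system hang would involve kernel spinlocks, firmware, or hardware rather than Python locks. The function name `demo_deadlock` and its `timeout` parameter are hypothetical, and the timeout exists only so the demo reports the stall instead of hanging forever.

```python
import threading


def demo_deadlock(timeout: float = 0.5) -> dict[str, bool]:
    """Form a circular wait: each thread holds one lock and wants the other's.

    A true deadlock blocks forever; here the second acquire() uses a timeout
    so the demo can report the stall instead of hanging.
    """
    lock_a = threading.Lock()
    lock_b = threading.Lock()
    # Both threads must hold their first lock before either tries the second,
    # so the circular wait forms on every run rather than by chance.
    both_holding = threading.Barrier(2)
    # Neither thread releases its first lock until both have given up,
    # so one thread's timeout cannot "rescue" the other.
    both_gave_up = threading.Barrier(2)
    results: dict[str, bool] = {}

    def worker(name, first, second):
        with first:  # hold our own resource...
            both_holding.wait()
            got_second = second.acquire(timeout=timeout)  # ...and wait for theirs
            if got_second:
                second.release()
            results[name] = got_second  # True would mean no deadlock occurred
            both_gave_up.wait()

    t1 = threading.Thread(target=worker, args=("thread-1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("thread-2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results


if __name__ == "__main__":
    # Each thread reports whether it ever obtained the second lock.
    print(demo_deadlock())
```

Both entries should come back `False`: each thread gave up because the other never released its lock. Without the timeout, both threads would block forever, which is the software analogue of every core waiting on every other core.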

FYI: I've seen similar problems on systems with bad memory sticks and bad video cards. You might try running burn-in diagnostics such as MemTest+, and/or benchmarking the system with different pieces of hardware removed to see whether you can isolate the unstable component(s).