iMac – Late 2013 iMac: help with troubleshooting intermittent crashing problem

crash imac

I've been having an intermittent problem with my late 2013 27" iMac for more than a year now and it seems to be getting more frequent.

Basically, I could be doing anything on my iMac (or nothing at all!) when I suddenly hear the fans spin up to what sounds like max speed. When the fans spin up like this, the applications I am using (typically Chrome, but others too) keep working for a little while (~30 seconds) and then become unresponsive. I am not even able to restart the iMac from the Apple menu (it just doesn't respond). Eventually, I get the "rainbow circle" (beach ball?) mouse cursor and can't do anything other than move the mouse around. If I keep waiting, the screen goes black, and if I wait longer still, the black screen shows a circle with a slash through it. What I normally do is just power cycle the iMac as soon as I hear the fans spin up and the applications become unresponsive. The system crashes this way usually at least once, and often a few times, per day if I use it a lot.

Here's what I've done so far to troubleshoot:

  • After power-cycling, I open Console.app and select "Crash Reports" in the sidebar; it is always empty. Is it correct to assume that this means it was not an application that crashed?

  • I've tried sifting through the logs near the time of the crash, and I don't notice anything obviously strange: basically just normal-looking logs and then boot-up messages. The problem is there are A LOT of messages. I am looking for "grave"-sounding messages, but I honestly don't know what, exactly, to look for. Is it possible there are no clues in the logs?

  • I've run Memtest86; it completed all four passes with no failures detected.

  • I've wiped the machine and re-installed the OS about 3 times. The problem persists. I am on Catalina now, but this has been happening even with the previous OS, Mojave.

  • I've tried not using some long-running applications. I've tried switching from Chrome to Firefox, and disabled Dropbox. No change.

  • Fan RPMs and die temperatures appear normal. I do notice that when the system crashes, these no longer update. Could it be a thermal issue that happens so fast that the system crashes before the sensors log the problem? I am not doing anything super-taxing to the system. I have noticed at least one spike in CPU temperature around the time of a crash, but I haven't been able to see that consistently.

  • I cleared out the /Library/Logs/DiagnosticReports directory, and used the machine normally until the next crash. After the crash, the directory had a bunch of files. None of them had a suspicious filename extension (e.g. .panic, .spin, .tailspin). Only one had a timestamp within a few minutes of when the crash happened: "Google Chrome Helper (Renderer)_2021-02-02-082350_MY-MACHINENAME.wakeups_resource.diag". Sadly, I don't have even the foggiest idea what this log file is trying to tell me. The central issue seems to be something about "wake-ups". Here's part of it (I can upload the whole thing somewhere if someone thinks this has critical clues):

Date/Time:        2021-02-02 08:21:48 -0500
End time:         2021-02-02 08:23:50 -0500
OS Version:       Mac OS X 10.15.7 (Build 19H114)
Architecture:     x86_64h
Report Version:   29
Incident Identifier: 52354DFF-B2C8-497A-8421-369191E5D935

Data Source:      Microstackshots
Shared Cache:     0x7b5a000 57CFFC05-B33E-3B2A-9BBC-D3A0F410A70D

Command:          Google Chrome Helper (Renderer)
Path:             /Applications/Google Chrome.app/Contents/Frameworks/Google Chrome Framework.framework/Versions/88.0.4324.96/Helpers/Google Chrome Helper (Renderer).app/Contents/MacOS/Google Chrome Helper (Renderer)
Identifier:       com.google.Chrome.helper.renderer
Version:          88.0.4324.96 (4324.96)
PID:              18226
Event:            wakeups
Action taken:     none
Wakeups:          45001 wakeups over the last 122 seconds (370 wakeups per second average), exceeding limit of 150 wakeups per second over 300 seconds
Wakeups limit:    45000
Limit duration:   300s
Wakeups caused:   45001
Wakeups duration: 122s
Duration:         121.77s
Duration Sampled: 87.46s
Steps:            21

Hardware model:   iMac14,2
Active cpus:      8

Fan speed:        1202 rpm

admin@mt-iMac DiagnosticReports % kextstat | grep -v com.apple
   Index Refs Address            Size       Wired      Name (Version) UUID <Linked Against>
     160    3 0xffffff7f84180000 0xf2000    0xf2000    org.virtualbox.kext.VBoxDrv (6.1.18) 9C1C33DF-8061-30A2-9266-C9284816A6A2 <8 6 5 3 1>
     163    0 0xffffff7f84272000 0x8000     0x8000     org.virtualbox.kext.VBoxUSB (6.1.18) 51E577B4-43B6-359F-B817-9C63A69E7943 <162 160 59 8 6 5 3 1>
     164    0 0xffffff7f8427a000 0x5000     0x5000     org.virtualbox.kext.VBoxNetFlt (6.1.18) 96E530DE-E34D-3447-89A5-FCF6646AE47E <160 8 6 5 3 1>
     165    0 0xffffff7f8427f000 0x6000     0x6000     org.virtualbox.kext.VBoxNetAdp (6.1.18) 63EFABA5-3341-3BEB-B47A-AAFCDD7312A5 <160 6 5 1>
     173    0 0xffffff7f80fb6000 0x6000     0x6000     com.getdropbox.dropbox.kext (1.13.0) 4FFF485B-204E-3E48-BC54-C1D406AB9E75 <8 6 5 2 1>
admin@my-iMac DiagnosticReports %
  • No third party hardware was connected, just Apple keyboard + trackpad.

  • I will try running safemode for some days and see if these crashes still occur. I understand that will mean it's a hardware issue, but what component/sub-system?

  • Switching to Safari as my browser has kept the machine stable for 3 days and counting. Still curious about the root cause.
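
For what it's worth, the numbers in the wakeups_resource.diag excerpt above can be sanity-checked with a short Python sketch. It only re-does the arithmetic from the "Wakeups:" line; the function name `parse_wakeups` is made up for illustration, and the regular expression assumes the line has been joined onto a single line exactly as worded in the excerpt.

```python
import re

# The "Wakeups:" line from the report excerpt, joined onto one line.
WAKEUPS_LINE = (
    "Wakeups:          45001 wakeups over the last 122 seconds "
    "(370 wakeups per second average), exceeding limit of "
    "150 wakeups per second over 300 seconds"
)


def parse_wakeups(line):
    """Return (count, seconds, limit_per_second, limit_window_seconds)."""
    match = re.search(
        r"(\d+) wakeups over the last (\d+) seconds.*?"
        r"limit of (\d+) wakeups per second over (\d+) seconds",
        line,
    )
    if match is None:
        raise ValueError("not a recognisable Wakeups line")
    return tuple(int(group) for group in match.groups())


count, seconds, limit_rate, limit_window = parse_wakeups(WAKEUPS_LINE)

# 150 wakeups/s sustained over 300 s gives the budget shown as "Wakeups limit".
budget = limit_rate * limit_window

# Average rate over the rounded 122 s; the report's own figure of 370/s is
# presumably computed from the unrounded 121.77 s duration.
observed_rate = count / seconds

print(f"budget for the window: {budget} wakeups")
print(f"observed rate: {observed_rate:.1f}/s "
      f"({observed_rate / limit_rate:.2f}x the per-second limit)")
```

Read this way, the report says the renderer process used up a 300-second wakeup budget in about 122 seconds, i.e. roughly two and a half times the allowed rate. "Action taken: none" in the excerpt indicates the process was only reported, not terminated.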

What else can I try? I am comfortable with disassembly and swapping parts, and in fact am thinking about upgrading from the Fusion Drive to an SSD and increasing the RAM. But while the system is unstable, I am hesitant to spend money on an upgrade unless I can also find and fix the root cause of these crashes.

Any other ideas?

Here's the system…

  Model Name:                   iMac
  Model Identifier:             iMac14,2
  Processor Name:               Quad-Core Intel Core i7
  Processor Speed:              3.5 GHz
  Number of Processors:         1
  Total Number of Cores:        4
  L2 Cache (per Core):          256 KB
  L3 Cache:                     8 MB
  Hyper-Threading Technology:   Enabled
  Memory:                       16 GB
  Boot ROM Version:             429.0.0.0.0
  SMC Version (system):         2.15f7

Best Answer

  1. Let’s skip Console.app and check the source. Open /Library/Logs/DiagnosticReports and sort the contents by date, newest first. Check for any reports around the time of your crashes. You want to look for files ending in the extension .panic, .spin or .tailspin. If you find any, please provide a way for us to view those files.
  2. I’d suggest scanning your filesystem for corruption (e.g. with Disk Utility), though I expect it to turn out OK, given that this issue seems to survive full disk erases.
  3. Are there third-party kernel extensions that you have installed? Please provide a copy of the output of the Terminal command kextstat.
  4. Can you work for a while in Safe Boot Mode (booting with Shift held down) and see if the problem persists?
  5. Do you have any third-party devices attached? Can you run without them?
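
If sorting the folder by hand gets tedious, step 1 can be approximated with a small Python sketch like the one below. It lists the reports newest first and flags the three extensions mentioned above. The function name `list_reports` and the `!!` marker are just illustrative choices, and the script only looks at the system-wide folder named in step 1.

```python
from datetime import datetime
from pathlib import Path

# Extensions singled out in step 1 as worth a closer look.
SUSPICIOUS = {".panic", ".spin", ".tailspin"}


def list_reports(directory="/Library/Logs/DiagnosticReports"):
    """Return (modified_time, filename, is_suspicious) tuples, newest first."""
    folder = Path(directory)
    if not folder.is_dir():
        return []  # nothing to report if the folder is missing
    entries = []
    for path in folder.iterdir():
        if path.is_file():
            modified = datetime.fromtimestamp(path.stat().st_mtime)
            entries.append((modified, path.name, path.suffix in SUSPICIOUS))
    # Sort on the timestamp only, in descending order.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries


if __name__ == "__main__":
    for modified, name, flagged in list_reports():
        marker = "!!" if flagged else "  "
        print(f"{marker} {modified:%Y-%m-%d %H:%M:%S}  {name}")
```

Comparing the printed timestamps with the time you had to power cycle shows quickly whether anything was written around the hang. A file such as a .wakeups_resource.diag report would be listed but not flagged.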

UPDATE:

  1. Can you share the output of pmset -g log for the timespan between your most recent clean boot and the subsequent reboot after failure?
  2. Can you try reproducing this issue a few times and see if you consistently see the Chrome wakeups_resource.diag file in your logs around the time of the hang? If so, please find a way to share the file (e.g. via PasteBin). Chrome may be indirectly responsible for an interrupt storm and we might be able to see that.
  3. After reproducing a few times, can you temporarily stop using Chrome completely and try living on Safari?
  4. If the hangs continue without Chrome ever running, can you try uninstalling VirtualBox and Dropbox? (Make sure that kextstat doesn’t show them anymore.) Even though you are not running those apps, their kernel extensions start at boot and are always loaded into kernel memory, so we have to remove them to eliminate the possibility that they’re involved in the failure sequence.
  5. Can you list the filenames of the new logs that appear in DiagnosticReports?
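
To double-check step 4 after uninstalling, the kextstat output can be scanned for leftovers. The Python sketch below works on text you have already captured (for example by pasting the output of kextstat into a string); it does not run kextstat itself. The helper names are hypothetical, the two sample rows are copied from the listing in the question, and the prefixes `org.virtualbox.` and `com.getdropbox.` are taken from the bundle identifiers shown there.

```python
# Two rows copied from the kextstat listing in the question, plus its header.
SAMPLE = """\
Index Refs Address            Size       Wired      Name (Version) UUID <Linked Against>
  160    3 0xffffff7f84180000 0xf2000    0xf2000    org.virtualbox.kext.VBoxDrv (6.1.18) 9C1C33DF-8061-30A2-9266-C9284816A6A2 <8 6 5 3 1>
  173    0 0xffffff7f80fb6000 0x6000     0x6000     com.getdropbox.dropbox.kext (1.13.0) 4FFF485B-204E-3E48-BC54-C1D406AB9E75 <8 6 5 2 1>
"""


def third_party_kexts(kextstat_output):
    """Return bundle identifiers of loaded kexts that are not com.apple.*."""
    names = []
    for line in kextstat_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric index; this skips the header,
        # shell prompts and blank lines.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        name = fields[5]  # sixth column is the bundle identifier
        if not name.startswith("com.apple."):
            names.append(name)
    return names


def still_loaded(kextstat_output,
                 prefixes=("org.virtualbox.", "com.getdropbox.")):
    """Return the VirtualBox/Dropbox kexts that are still present."""
    return [name for name in third_party_kexts(kextstat_output)
            if name.startswith(prefixes)]


print(still_loaded(SAMPLE))
```

An empty list after a reboot would mean neither product's kernel extension is loaded any more; anything still listed means the uninstall did not remove it.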

Re: your comment about safe mode:

"I will try running safemode for some days and see if these crashes still occur. I understand that will mean it's a hardware issue, but what component/sub-system?"

Actually, the overwhelming majority of unstable behavior these days is due to software (and occasionally firmware) bugs. Your symptoms don’t smell like hardware failure, particularly since you’ve tested both your HDD and your DRAM. There is very likely a software cause and fix for this.