GPU problem – Boot Hangs on Grey Screen

Tags: boot, gpu, graphics, hardware

I found the following description, which matches my problem, in this thread:
Boot hangs on grey screen (even when booting from USB drive with fresh OS X install)

My MacBook Pro 15" Early 2011 with AMD Radeon HD 6750M exhibited display corruption and associated system crashes/resets over a period of two weeks before it entirely failed to boot. The boot would progress through the grey screen with the Apple logo and spinner, but just when it seems it should have switched to the login screen the Apple logo and spinner would disappear and hang on a blank grey screen.

Initially I suspected hard-drive corruption and went about trying to remedy that. Unsuccessfully, I tried the following, with each continuing to hang as described above:

Safe boot
Boot into recovery (including Internet Recovery)
Boot from install media on USB drive
Boot from OS X installation on USB drive
Clear NVRAM
Reset SMC

I also ran the Apple Hardware Test many times without it finding any issues.

Verbose safe boot (Cmd+Shift+V) output everything that I'd expect to see but would then hang as described above.

After coming across more posts online at Apple's discussion forums of GPU related problems I revisited this as the cause:

2011 MacBook Pro and Discrete Graphics Card or
2011 MacBook Pro and Discrete Graphics Card

Attempting to boot Ubuntu from a USB flash drive, I could only get as far as GRUB. When trying to boot Ubuntu Desktop or run the graphicstest in GRUB the system would hang.

At this point, running Apple Hardware Test hung right before the end of the standard test, possibly [guessing] when doing a video test.

Based on the advice in the Apple Discussions posts above I did the following:

Boot into Single-User mode
Execute the following commands:

/sbin/fsck -fy /
/sbin/mount -uw /
mkdir /Disabled_System_Library_Extensions
cd /Disabled_System_Library_Extensions
mv /System/Library/Extensions/ATI* .
mv /System/Library/Extensions/AMD* .
touch /System/Library/Extensions
exit

This time the machine booted all the way through. However, graphics are extremely slow, even just transitions when minimizing windows. I will be taking my MBP to Apple to demand a replacement as the large number of reports of others facing similar issues makes it look like a recurrence of a similar GPU-related failure that resulted in them doing a recall.

But when I use the "mv" command, the files are neither moved nor deleted, and it shows me:
Sandbox deny(01) file-write-unlinked…

Is there any solution?

Best Answer

Background and explanations

Please read all of this post at least once from start to finish before taking any action.

All 15-inch and 17-inch MacBook Pros from 2011 have a serious design defect. The thermal management and the generated heat together with the robustness of the discrete AMD graphics chips do not match up very well. Apple knew this and acted like a typical Soapy Smith, only reacting to this after an outrage. This scandal took on the name of RadeonGate. Only when threatened with a class-action lawsuit was Apple finally pressured into offering a so-called "Repair Extension Program".

The Apple Repair Extension program is not available anymore. The only real way to fix this problem is to replace the AMD chip alone. Not the logic board. Not "re-balling", not "reflowing", not "baking". Apple replaced a failed chip with a failing chip. Time and time again. But even replacing only the graphics chip is a costly hardware procedure for such a vintage laptop.

The only known way – that is: with software alone – to get a 2011 MacBook Pro (8,2) with 'only' a failed AMD graphics chip to almost reliably turn on again and boot into macOS and be quite useable with an accelerated GUI is this guide or a variation of it. Most previous tips just removed all AMD-kexts and this results in a horrible user experience with no GUI acceleration at all.

It is necessary to know your exact OS version. The following guide will be simpler for Yosemite but assumes El Capitan or newer. El Capitan, Sierra and High Sierra need SIP (System Integrity Protection) disabled. On previous systems (10.6–10.10) these steps are unnecessary.
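Since the steps depend on the OS version, a small check can tell you up front whether the SIP step applies. The following is only a sketch: the function name needs_sip_step is made up for illustration, and it only understands 10.x version strings such as those printed by sw_vers -productVersion.

```shell
#!/bin/sh
# needs_sip_step VERSION   (hypothetical helper, not an Apple tool)
# Succeeds (exit status 0) when VERSION is 10.11 or newer in the 10.x
# series, i.e. a release for which this guide says SIP must be disabled.
# Fails for 10.10 and older.
needs_sip_step() (
    major=${1%%.*}      # text before the first dot, e.g. "10"
    rest=${1#*.}        # text after the first dot, e.g. "12.6"
    minor=${rest%%.*}   # second component, e.g. "12"
    [ "$major" -eq 10 ] && [ "$minor" -ge 11 ]
)

# Example on the Mac itself:
#   if needs_sip_step "$(sw_vers -productVersion)"; then
#       echo "El Capitan or newer: SIP has to be disabled first"
#   fi
```

The version is passed in as an argument so the check can also be tried with a version string typed by hand.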

Important: This guide assumes further that all kexts are still in their default location /System/Library/Extensions. Having all AMD-kexts there except one is beneficial for 'proper' operation. Previous hacks in this direction might have instructed you to move, or worse: remove all AMD*/ATI* kernel extensions. If that is the case: either move the kexts back into their default location or reinstall a system of your choice. Having most of the AMD kexts in place and then having the X3000-kext loaded with a delay will enable power management of the GPU which will otherwise burn electricity for nothing (and might hasten the final heat death of the chip on top of that). To reiterate: Only the file AMDRadeonX3000.kext really has to be absent on boot to enable a successful startup, but all other (needed) AMD drivers should be in their default location and the X3000-kext loaded afterwards/delayed to get back into a realm of almost sensible power and temperature management.

Bypassing the discrete graphics chip

To get some display acceleration back it will be necessary to force the machine to not boot in discrete graphics (dGPU) but directly into integrated graphics (iGPU) and stay in this mode.

Booting into dGPU mode is the default on Macs with two switchable graphics cards. The procedure below will set an NVRAM variable that disables the dGPU and forces the system to only use the integrated Intel graphics even when booting.

The NVRAM variable is undocumented but appears to be universally applicable to all Macs with two switchable graphics cards. That means it should work on iMacs and MacBook Pros. Whether they have AMD or NVIDIA chips. The specifics about the drivers that might be necessary to move only cover AMD in this guide. But the NVRAM variable will bypass the discrete graphics chip in any case.

This will give you back your machine – but you will lose some features: e.g. the ability to drive an external display from the DisplayPort, a bit of 3D performance. Thunderbolt data connections should work.

In case this guide fails or is not wanted anymore: this procedure is pure software configuration and therefore fully reversible at any time with a simple NVRAM reset.
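Because an NVRAM reset silently undoes the setting, it can help to check whether the variable is currently set. The sketch below assumes that nvram prints a variable as its name, a tab, and then the value with non-printable bytes shown as %xx. I have not verified that on every OS version. The helper name igpu_forced is hypothetical.

```shell
#!/bin/sh
# Variable name and value exactly as used in this guide.
GPU_VAR='fa4ce28d-b62f-4c99-9cc3-6815686e30f9:gpu-power-prefs'
GPU_VAL='%01%00%00%00'

# igpu_forced   (hypothetical helper)
# Reads "name value" lines (whitespace or tab separated) on stdin and
# succeeds only if the gpu-power-prefs variable is present with the
# value from this guide.
igpu_forced() {
    awk -v name="$GPU_VAR" -v want="$GPU_VAL" '
        $1 == name && $2 == want { found = 1 }
        END { exit !found }
    '
}

# Example on the Mac itself (assumed output format, see above):
#   if nvram "$GPU_VAR" 2>/dev/null | igpu_forced; then
#       echo "iGPU is forced"
#   else
#       echo "variable missing or different: set it again"
#   fi
```

Reading from stdin keeps the check independent of nvram itself, so it can be tried against a pasted line of output.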

The initial procedure:

Part 1: Disable SIP, disable dGPU, move one kernel extension

  1. To start from a clean slate: reset SMC and NVRAM:
    shutdown, unplug everything except power, now hold

    leftShift+Ctrl+Opt+Power
    and release all at the same time;

  2. Now power on again and hold

    Cmd+Opt+p+r
    at the same time until you hear the startup chime two times.

  3. Boot into Single User Recovery by holding

    Cmd+r+s

  4. Disable SIP: enter:

    csrutil disable

  5. disable dGPU on boot by setting the following variable:

    nvram fa4ce28d-b62f-4c99-9cc3-6815686e30f9:gpu-power-prefs=%01%00%00%00

  6. enable verbose boot mode:

    nvram boot-args="-v"

  7. reboot into Single User-mode by holding

    Cmd+s
    on boot

  8. mount root partition writeable

    /sbin/mount -uw /

  9. make a kext-backup directory

    mkdir -p /System/Library/Extensions-off

  10. only move ONE offending kext out of the way:

    mv /System/Library/Extensions/AMDRadeonX3000.kext /System/Library/Extensions-off/

  11. inform the system to update its kextcache:

    touch /System/Library/Extensions/

  12. reboot normally:

You should now have an iGPU accelerated display, but the system doesn't know how to power-manage the failed AMD chip. (In this state the GPU is always idling with relatively high power, consuming quite a bit of battery when unplugged and leading to GPU temperatures from 60°C upwards [on average 60-85°C], despite not being used for anything by the system.)
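Steps 9 to 11 can also be wrapped into one function, which makes them easy to rehearse on a scratch directory before touching the real system. This is a sketch only: the function name move_x3000 and the optional ROOT argument are my own additions, and refusing to overwrite an already moved kext is a design choice, not something the steps above require.

```shell
#!/bin/sh
# move_x3000 [ROOT]   (hypothetical helper)
# Performs steps 9-11 relative to ROOT. With no argument ROOT is empty,
# so the paths are the real /System/Library/... locations.
move_x3000() (
    root=${1:-}
    ext="$root/System/Library/Extensions"
    off="$root/System/Library/Extensions-off"
    kext="AMDRadeonX3000.kext"

    if [ ! -d "$ext/$kext" ]; then
        echo "nothing to move: $ext/$kext not found" >&2
        exit 1
    fi
    if [ -e "$off/$kext" ]; then
        # Assumption: never clobber a previously moved copy.
        echo "refusing to overwrite existing $off/$kext" >&2
        exit 1
    fi

    mkdir -p "$off" &&        # step 9: backup directory
    mv "$ext/$kext" "$off/" && # step 10: move the ONE offending kext
    touch "$ext"               # step 11: trigger a kextcache update
)

# Example in Single User mode, after "/sbin/mount -uw /":
#   move_x3000
```

If a system update has put a fresh kext back while an older copy still sits in Extensions-off, the function stops and leaves the decision to you.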

Part 2: improve thermal and power management

For improved power management of the disabled GPU you can manually load the one crucial kext after boot with:

sudo kextload /System/Library/Extensions-off/AMDRadeonX3000.kext

If you have a temperature sensor application you might want to have it open before issuing the above command and watch the temps drop…

Alternatively, automate this with the following LoginHook that will get executed after the next reboot:

sudo mkdir -p /Library/LoginHook   
sudo nano /Library/LoginHook/LoadX3000.sh

with the following content:

#!/bin/bash
kextload  /System/Library/Extensions-off/AMDRadeonX3000.kext
pmset -a force gpuswitch 0    # undocumented/experimental
exit 0

then make it*1 executable and active:

sudo chmod a+x /Library/LoginHook/LoadX3000.sh  
sudo defaults write com.apple.loginwindow LoginHook /Library/LoginHook/LoadX3000.sh 
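A hook script with a mistyped first line or without the executable bit fails quietly, so a quick sanity check may save a reboot. The helper below is a sketch with a made-up name (hook_ok). It only looks at the file itself and does not check whether loginwindow actually has the hook registered.

```shell
#!/bin/sh
# hook_ok FILE   (hypothetical helper)
# Succeeds if FILE is a regular file, is executable, and its first
# line starts with "#!". Prints the reason to stderr otherwise.
hook_ok() (
    f=$1
    if [ ! -f "$f" ]; then
        echo "$f: not a regular file" >&2
        exit 1
    fi
    if [ ! -x "$f" ]; then
        echo "$f: not executable (chmod a+x missing?)" >&2
        exit 1
    fi
    first=$(head -n 1 "$f")
    case $first in
        '#!'*) exit 0 ;;
        *)     echo "$f: first line is not a #! line" >&2
               exit 1 ;;
    esac
)

# Example:
#   hook_ok /Library/LoginHook/LoadX3000.sh && echo "hook script looks sane"
```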

*1: The undocumented use of this pmset command seems to improve sleep/wake/shutdown behaviour. If it doesn't, experiment with leaving it out.
See Disclaimer below. The following is just speculation: sleep/wake/shutdown may remain troublesome. The theory here is that "something slowly corrupts" what is saved in the SMC. Therefore, resetting the SMC and re-applying the variable hack seems to alleviate the situation for a time. (Permanent solutions for this welcome!) As a short-term workaround you might want to avoid "lid-closing sleep", which seems to give more trouble than other methods (Apple menu, keyboard shortcut). Apparent hangs on shutdown are usually just very long delays; the machine will eventually shut down cleanly and successfully.
Unscientific sampling suggests that Yosemite is worst for this and El Capitan and Sierra much better behaved in this regard.

Manually or otherwise delayed loading of this crucial kernel extension allows the system to handle power management a bit better. The battery will be less used and the temperatures emanating from the unused GPU will drop to a range significantly below 50°C (on average between 15-50°C).

For proper power management the minimal set of kexts loaded on boot is (versions for 10.12.6, check with kextstat | grep AMD):

com.apple.kext.AMDLegacySupport (1.5.1) 
com.apple.kext.AMD6000Controller (1.5.1)  
com.apple.kext.AMDSupport (1.5.1)
com.apple.kext.AMDLegacyFramebuffer (1.5.1) 

And if the above method of loading succeeded this should appear added to the list:

com.apple.AMDRadeonX3000 (1.5.1) 
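Comparing the kextstat output against this list by eye gets tedious, so here is a sketch that prints whichever of the five bundle identifiers are not loaded. The function name missing_amd_kexts is hypothetical. It only assumes that each kextstat line contains the bundle identifier as a separate whitespace-delimited field, and it ignores version numbers.

```shell
#!/bin/sh
# Bundle identifiers taken from the two lists above.
EXPECTED_KEXTS='com.apple.kext.AMDLegacySupport
com.apple.kext.AMD6000Controller
com.apple.kext.AMDSupport
com.apple.kext.AMDLegacyFramebuffer
com.apple.AMDRadeonX3000'

# missing_amd_kexts   (hypothetical helper)
# Reads kextstat output on stdin and prints every expected bundle
# identifier that does not appear as a whole field in it.
missing_amd_kexts() (
    loaded=$(cat)
    for id in $EXPECTED_KEXTS; do
        if ! printf '%s\n' "$loaded" |
            awk -v id="$id" '{ for (i = 1; i <= NF; i++) if ($i == id) found = 1 }
                             END { exit !found }'
        then
            printf '%s\n' "$id"
        fi
    done
)

# Example on the Mac itself:
#   kextstat | missing_amd_kexts
# No output means all five are loaded.
```

Right after boot, before the LoginHook has run, you would expect only com.apple.AMDRadeonX3000 to be reported.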

A final step is to reboot once again into SingleUserRecovery with Cmd+r+s.
After the command line becomes active, enter:

 nvram boot-args="-v agc=0"   

and reboot normally.

This will cool down the dGPU a bit further.
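Setting boot-args replaces the whole value, so anything already stored there is lost unless you include it again. A sketch of how the new argument could be merged into an existing value follows. The helper add_boot_arg is made up for illustration and works on plain strings only, leaving the actual nvram call to you.

```shell
#!/bin/sh
# add_boot_arg CURRENT ARG   (hypothetical helper)
# Prints CURRENT with ARG appended, unless ARG is already one of the
# space-separated words in CURRENT.
add_boot_arg() (
    set -f              # no filename expansion while splitting words
    current=$1
    arg=$2
    for word in $current; do
        if [ "$word" = "$arg" ]; then
            printf '%s\n' "$current"
            exit 0
        fi
    done
    if [ -n "$current" ]; then
        printf '%s %s\n' "$current" "$arg"
    else
        printf '%s\n' "$arg"
    fi
)

# Example, typed at the SingleUserRecovery prompt:
#   nvram boot-args="$(add_boot_arg "-v" "agc=0")"
```

With "-v" as the current value this yields the same "-v agc=0" string as the command above.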

It is imperative to issue this command from SingleUserRecovery since the system with SIP enabled will block your attempts to set this variable when booted from the normal boot volume, whether in normal full boot mode or regular SingleUser. It is important to note that this step therefore cannot be easily integrated into the force-iGPU-boot.sh script (that you will be creating in a minute) and has to be repeated on its own after an NVRAM reset.

This last step assumes that System Integrity Protection has been reenabled. But if SIP is intentionally and permanently kept off, then this step can be integrated into the force-iGPU-boot.sh script described below.
However, I once intended to keep SIP off permanently and it was somehow turned back on without my noticing, so relying on SIP staying "off" might not be the best approach. Clearing NVRAM, where SIP settings are stored, might be one such unforeseen disturbance.

Preventive measures for future use

There are two further caveats to know: This hack is undone whenever the SMC/NVRAM is reset. If that happens, the gpu-power-prefs NVRAM variable has to be set again to force the use of the iGPU from boot time.

Since this can happen quite easily (and an NVRAM reset is recommended far more often than it is actually useful), you should probably prepare for such a scenario and create a simple script to greatly speed up the process and also make entering the necessary variable much less error prone:

 sudo nano /force-iGPU-boot.sh

– Enter the following content to this file:

#!/bin/sh
sudo nvram boot-args="-v"
sudo nvram fa4ce28d-b62f-4c99-9cc3-6815686e30f9:gpu-power-prefs=%01%00%00%00
exit 0

– Now make that executable:

sudo chmod a+x /force-iGPU-boot.sh

In the future, when the SMC/PRAM/NVRAM gets reset to default values it is now possible to boot into SingleUser with:

Cmd+s

– And, after mounting your boot volume read-write, execute just this single line:

sh /force-iGPU-boot.sh

Remember that the agc variable is now also cleared. (See above) Also, make sure you set the default boot-volume again in System Preferences> Startup Disk.

Part 3: Handling Updates from Apple

This setup has now one kext in a place Apple's installers do not expect. That is why in this guide SIP has not been reenabled. If an update that contains changes to the AMD drivers is about to take place it is advisable to move back the AMDRadeonX3000.kext to its default location before the update process. Otherwise the updater writes at least another kext of a different version to its default location or at worst you end up with an undefined state of partially non-matching drivers.

After any system update the folder /System/Library/Extensions has to be checked for the offending kext. Its presence there will lead to e.g. a boot hang on Yosemite and Sierra, an overheating boot-loop in High Sierra.
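That post-update check can be sketched as a small function reporting where the kext currently lives. The name kext_state and the four state words are my own invention. The optional ROOT argument is there so the same check could be pointed at another mounted volume, which I have not tested.

```shell
#!/bin/sh
# kext_state [ROOT]   (hypothetical helper)
# Prints one word describing where AMDRadeonX3000.kext is found:
#   default  only in /System/Library/Extensions (will break booting)
#   moved    only in /System/Library/Extensions-off (the wanted state)
#   both     an update re-installed it next to the moved copy
#   absent   in neither place
kext_state() (
    root=${1:-}
    kext=AMDRadeonX3000.kext
    in_default=no
    in_off=no
    if [ -d "$root/System/Library/Extensions/$kext" ]; then
        in_default=yes
    fi
    if [ -d "$root/System/Library/Extensions-off/$kext" ]; then
        in_off=yes
    fi
    case "$in_default/$in_off" in
        yes/yes) echo both ;;
        yes/no)  echo default ;;
        no/yes)  echo moved ;;
        *)       echo absent ;;
    esac
)

# Example after an update, before the first normal reboot:
#   kext_state
```

Anything other than "moved" means the kext has to be dealt with before the next normal boot.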

Upgrading to High Sierra 10.13 with this hack in place is almost straightforward: Despite applying a firmware update, the installation process should not touch the NVRAM variable. The installation process also does not use a fully accelerated AMD chip but basic acceleration that is not problematic regarding this hack. However, as noted in the paragraph above, the first boot into a system that has finished installing but is just about to start the setup process will produce a heat/crash induced boot loop. The offending kernel extension has to be moved again as described above (starting at Step 3). After moving the kext, all shall be well.

Recent updates from Apple: Do not update before you have read the following.

Recent updates break the machine again. They update the firmware and the RecoveryPartition, seem to disable the possibility to boot into SingleUserRecoveryMode, and to top it off they install (even with the DeltaUpdate) a working AMDRadeonX3000.kext!
Without preparation and with just the machine at hand you will be stuck a bit.

In case SingleUserRecoveryMode is gone for good, use regular RecoveryMode. Results are the same, it is just a little bit slower to boot into: the above procedure is still valid, and faster for all previous versions of Mac OS X/macOS.

But if you update to 10.13.6, or later:
Then you have to replace the instructions for SingleUserRecoveryMode (Command+r+s) with regular RecoveryMode (Command+r) and disable SIP via Terminal (Example for this precise use case).

In case you do belong to those where even regular RecoveryMode doesn't work as expected:
Workarounds for the inability to disable SIP with SingleUserRecovery:

  1. First, boot into single user recovery mode. csrutil edits are not allowed in this mode, but you can set the gpu-power-prefs nvram property. This will help to reboot the machine in recovery mode. Then you have to replace the instructions for SingleUserRecoveryMode (Command+r+s) with regular RecoveryMode (Command+r) and disable SIP via Terminal (Example for this precise use case).

  2. Before you update, prepare a bootable volume. That can be an external disk or a stick. Any version that boots the machine will be fine. Such a drive can be created on another Mac.
    Keep in mind that on the external drive the AMDRadeonX3000.kext has to be (re)moved as well. Try booting from that drive. Only if that works as expected and you can mount your internal drive with it: reboot from your internal drive and proceed with the update of your internal drive/system to 10.13.6.
    After the update is nearly finished, one reboot will hang. Force a shutdown and reboot from your external drive. Mount the internal drive and move the Radeon.kext. SIP only protects the booted system.

  3. Suggested somewhere online, but really a desperate guess and untested: Instead of SingleUserRecoveryMode with Cmd+r+s you might try InternetRecoverySingleUserMode with Cmd+Opt+r+s. Alternatively, it might be worth a try to see whether SafeRecoveryMode works with Cmd+Shift+r.

Graphical recovery mode might not work either, as was the case for me. However, in the last version of High Sierra, it is still possible to boot to single user recovery mode. It just needs good timing. The trick is to first enable recovery mode by pressing cmd+R and, immediately after it is recognized, press cmd+S for single user mode. The exact moment has to be figured out by the user. If cmd+R+S are pressed at the same time, only single user mode will be activated. If cmd+R is pressed first and cmd+S is pressed too late, graphical recovery mode is loaded. – TAKeanice

Screen brightness keys not working in High Sierra?

Apple changed the way that keyboard events for changing screen brightness are handled in High Sierra. With this hack in place the keys will be functionless. One more reason to stay with Sierra. But with this hack you might also resort to using another software solution. Aside from hacking your own AppleScript solution you might want to try out ready made applications or apps.

For example Brightness Slider on the AppStore offers customizable keyboard shortcuts.


To avoid these crashes/hangs/boot-loops, which are never a good idea for your filesystem, on a fresh install or an upgrade: make sure to babysit the installation process and always boot into SafeMode (keep Shift pressed during boot-ups) until the kext is moved into a safe place. The installation should proceed just fine in SafeMode.

Closing Remarks and Recommendations

Further: this laptop is overheating, no matter what you do. The cooling system is inadequate and the huge number of failing AMD chips is just proof of that.

To prolong the life of this now hacked machine it is advisable to abstain from really heavy lifting over prolonged stretches of time. Strictly follow the usual recommendations for laptops: use on hard surfaces, keep the fans and fins inside it clean. Using any fancontrol software with relatively aggressive settings should also help: like smcFanControl, MacsFanControl, or TGPro (both commercial).

Disclaimer: This whole procedure is no magic bullet. The state of failure for these chips is not 100% predictable. Very few users have issues even with this hack in place: there might be issues with rebooting, going to sleep or waking up properly. Most of these reports come from users with Yosemite; the least trouble seems to be on Sierra. In these cases it sometimes seems necessary to not use the AMDRadeonX3000.kext, and therefore also not the LoginHook from Part 2. (But see the additional note under *1 above.) High Sierra users report issues with their display's backlight adjustment. So currently, the sweet spot for choice of operating system is in my view 10.12 Sierra.

In some cases, even with all these measures in place it seems that the still functioning Thunderbolt port will cause some issues if any peripherals are attached and active when the machine goes to sleep. After this happens any subsequent sleep cycle might be affected and an NVRAM reset with subsequent variable setting dance outlined above will be necessary, again. It seems advisable in such cases to either prevent machine sleep or unplug any hardware on the Thunderbolt port before letting the machine sleep.

Within the restrictions outlined at the start of this answer: Most users report complete success.


Hardware mods/hacks

Several ways are available now, some bad, some good.

Bad solution: A very cheap hardware modification is available at/from RealMacMods: While they use a relatively complicated way of setting the necessary EFI variable with linux the following has the advantage of cutting the core voltage to the dGPU completely by removing just one tiny resistor! (Pictures at the link)

On this reboot it is essential that you boot once into safe mode (hold Shift throughout boot), and then choose shutdown(not restart) from the menu.
Do this safe boot with the R8911 resistor in place. Without this SAFE BOOT, the next steps may not work.
Do no more boots until you complete the next steps.
The safe boot clears OS level GPU preferences, that may interfere with the following process.
This will now cause your MacBook Pro to stop switching to the Radeon automatically, but it will still draw power, create heat, and be visible to the OS.
We discovered that simply removing 1 resistor will resolve this.
The resistor can also be replaced with a switch, in case you need to turn your radeon back on for any reason.
The placement of this resistor varies between logic board models.
The resistor in question is R8911 on the 17″ MBP and R8911 on the 15″ MBP a 1 Ohm resistor that provides a current path to the ISL6263C DC to DC Converter.
This resistor controls power to the Voltage Regulator that provides the Core Voltage to the Radeon GPU. Simply put, no core voltage, no GPU. You will find the resistor just to the right of a cooling fan (in the above orientation). It will be near the ISL Voltage converter chip. This is the chip we will be disabling.
Just remove it. The preferred method is a professional reflow station, but an iron and a steady hand will get you where you need to be. If you used flux to remove it (not needed), make sure you clean up with a little Alcohol or other suitable solvent.
That is basically it. Next time you boot up you will notice your GPU defect issue is gone, and you will no longer see the AMD GPU as installed hardware.

I have not tested this but it should eliminate any need to care for the kexts and also solve any issues regarding sleep, wakeup, hibernation, reboot etc.
Caveat for considering this method: since it also seems to rely on having this NVRAM variable set, it looks like it is absolutely essential to have a fully automated method in place to set this variable without any user intervention (like a Linux stick that makes the necessary changes). Otherwise an NVRAM reset might practically brick the machine. The vendor suggests protecting against accidental NVRAM resets via password and, if a reset was made, calling them for a 'procedure'.

(If bitten by this method, ending up with just a black screen: it seems possible to remotely access the machine with VNC or ssh, so if these are set up beforehand, it may indeed be a not-so-bad option after all, as the nvram variable can be set in this way. Remember: Untested internet story.)

Permanent, reliable and cheap hardware solution!

Dosdude1 has found a solution that looks like the holy grail for this problem: Permanently Disable 2011 15"/17" MacBook Pro Dedicated GPU - gMux IC Bypass

  • Option A, which will be detailed below, is to hard-wire the LVDS output lines from the integrated graphics LVDS output lines straight to the lines connecting to the display.
  • Option B would be to re-program the gMux IC (which is simply a Lattice LFXP2 micro-controller), with a custom firmware to disable the GPU switching functionality. I may experiment with this in the future, but doing so requires special hardware that I don't have. This would, of course, be the optimal solution, though.

This is almost easy. All that's needed are various lengths of wire. A demonstration is also on YouTube.

The 'bad solution' from above is now made into an almost professional and pre-made hardware solution, eliminating the previous 'badness' of that approach:

Tiresias (the GPUkiller): The Tiresias is a small board that can be soldered onto the motherboard of the MacBook Pro 15-inch or 17-inch 2011 (Early or Late) models.
These are all the models that have the 820-2914-A, 820-2914-B, 820-2915-A or 820-2915-B motherboard.
The 820-2914 and 820-2915 board has two GPUs. The internal (Intel) GPU that is part of the PCH, and an external (discrete) AMD GPU. It is the external GPU that fails in 'a small percentage of MacBook Pro systems' (Apple-speak for: 'very many'). The Tiresias writes the 'gpu-power-prefs' nvram-variable to the ROM so that the Mac does not use the (dead) external (discrete) AMD GPU anymore. If the user clears the NVRAM (PRAM) there is no problem as the Tiresias will write the record again, and the Mac will work again.
This makes this the ideal solution to bring a 820-2914 or 820-2915 with dead GPU back to life. Installation is easy (no wires to solder). You will need to mount a very small board onto the motherboard. An experienced technician can do this in minutes. Other than that R8911 should be removed to turn off power to the dead GPU. This saves energy, generating less heat and conserving battery life. Removing R8911 also prevents the Mac from getting confused by the dead GPU because even with the GPU turned off it will still try to talk to the dead GPU. Depending on what internal contacts in the GPU are ruptured this might confuse or even crash the Mac.
Mac OS X 10.13 High Sierra is also supported. To solve the issue of the backlight not coming back on after sleep also remove R9704 and connect R9704 pin 2 to C9711 pin 1.

OS X 10.6 - 10.12 (Sierra)
The backlight slider (in System Preferences) and the backlight keys (F1 and F2) work. System sleep works. Video out on the Thunderbolt port does not work, but all other functions of the Thunderbolt port do work.

OS X 10.13 (High Sierra)
As far as we know 10.13 (High Sierra) offers no advantages over 10.12 (Sierra). Apple totally redid the video drivers in High Sierra, and seem to have made a mess of it. The backlight controls will not work. And worse, after the machine wakes up from sleep the backlight is not turned back on at all.
To solve the issue of the backlight not coming back on after sleep remove R9704 and connect R9704 pin 2 to C9711 pin 1. This sets the backlight to full brightness. The downside is that with this modification the backlight will also be at full brightness with the older OSes.
Tiresias for 820-2915 (15-Inch) Quantity one (1) Including shipping (world wide) 60 EURO.



Update for a one-stop software solution

The procedure above seems to have been cast into an application related to the hardware hack! Well, at least parts of it. But on the other hand this application is more universal than the solution above, as it seems to handle also NVidia cards, that is: it is for disabling all discrete GPUs in all Macs.

Alas, this app made by dosdude1 is not well documented. The readme screen says that it would set the NVRAM variable, move all graphic acceleration drivers, and then install a launchdaemon to handle updates and ensure that the variable stays set.

Not tested by me and not endorsed by me if you already have followed the procedure outlined above!
But if the procedure didn't work at some point for you or seems just too daunting to begin with, then you might try this:

dosdude1: Other, undocumented software that I've written is stored here: MacBook Pro dGPU Disabler.zip

You might have to look into the procedure above again, as the application seems to miss out on improving the thermal management part (if you modify the hardware by removing the resistor this becomes moot: mix and match).
If someone tests this out, please give feedback here via comments or an edit.


Update 2019: A $20 solution that uses a 64-bit Windows computer and a Lattice HW-USBN-2A ICSP FPGA programmer to apply a custom firmware to the gMux IC. Dosdude1 claims this to be the 'perfect' solution, meaning that even under High Sierra and Mojave battery life, temperature, brightness control and wake/sleep work as expected. This solution is permanent and makes everything above obsolete.
But this new solution is not free and requires hardware (a Windows PC and the programmer), as well as, currently, soldering a few wires temporarily to the logic board.

[Update June 2020]: This gMUX Bypass with native brightness control is a cheaper (opensource) version of the Dosdude1 hardware hack. GitHub: gMUXBypass