Laptop overheating after doing the usual cleaning routine

laptop overheating temperature

I have a Vaio VGN-CR353 laptop that was given to me around September or October 2012, and I installed Ubuntu on it. I have made it into a very personal laptop and installed games under Wine (SC2, Frozen Throne) and several IDEs (Sublime Text 2, Eclipse, NetBeans) without a hitch… until last November.

Just so you know, I never touched the internals until the last week of November, when I determined that it was not software that was causing this problem.

Ubuntu reports that the laptop frequently hits the 95C or 105C critical marks, and it then shuts down automatically. I have already tried to address the issue. So far I have:

  • Dusted-off the internals. Amazingly, it was very clean to begin with.
  • Removed very minor accumulations in the fan and sinks.
  • Reapplied thermal compound several times already, just in case I applied it wrong. Currently testing different application techniques. Also chose a nano-diamond compound to rule out shorting due to the compound.
  • Reseated the sinks tightly. Even bent the arms that hold the sinks up a bit to ensure that the sinks are as tight as possible.
  • Made sure vents were clear
  • Bought a cooler
  • Elevated the laptop by buying larger "rubber feet". The laptop now sits at least 1 cm from a flat surface
  • Reinstalled different versions of Ubuntu since Linux kernels from 2.6 to 3.2 suffer an overheat issue. Currently on a 3.5 kernel (Lubuntu 12.10).

Even after all of this, the overheating persists. It happens when:

  • I surf the net on any browser (Firefox, Chromium) even when flash plugin is not installed (And so Flash is not to blame)
  • I copy 39GB worth of files to an external hard disk via the terminal. Oddly, it does not overheat when I copy them using the GUI.
  • Using NetBeans, even when just writing code, not even compiling yet.
  • Randomly!
  • Even when I'm in the school computer lab which is crazy cold.
  • After a clean install of Windows

Limitations:

  • No BIOS settings for fan nor frequency settings for processors (It's Sony, what do you expect?)
  • lm-sensors doesn't detect fan sensors or any other sensors besides the CPU cores and motherboard, because Vaio laptops notoriously don't implement them.

I already installed lm-sensors and gkrellm to monitor the temperatures. I currently have a view of both CPU cores and the ACPI temps. Oddly, I never saw them go beyond 60C. The latest readings range from 32C on a fresh boot and 43C when idle at room temperature to 49C on moderate load (multi-tab surfing) and 53C when using NetBeans. It's quite weird that the temperatures differ so much between uses.

Also, sometimes the system reports having reached the critical temps even when the laptop does not feel hot at all, like a while ago in the lab.

Until now, I am still waging this war with the laptop. Am I missing a vital routine that could turn the tables around and once and for all fix this issue? I am running out of ideas.

Update1:

Currently downloading drivers for another laptop via Firefox. CPU usage is 80% and 21% with temps of 58C and 51C on both cores. ACPI temperature at 60C and disk usage (write due to download) up to 205KB/s. Ram usage approx. 500MB. No overheating just yet.

Update2:

Just before running Prime95, I already tested installing and using Windows for a couple of days. Same thing happens on Windows. The only difference is that unlike Linux which shuts down the machine semi-properly, on Windows, it just turns off! It's like pulling the plug suddenly.

Therefore it's not a Linux issue.

Update3:

Managed to get hold of and run Prime95 on Linux. Amazingly, I could even push the laptop to 100% load on both cores, 100% memory use and reach ~90C stable and without going over (tested for like 10-15 mins) without overheating. I just wonder why the machine suddenly reports 95C and 105C.

Update4:

Dismantled the laptop for a thorough clean and then reassembled it. Nothing out of the ordinary, just a minor dust layer. After that, I ran Prime95 for 30 mins to prove that the laptop can't overheat. It tops out at 91C and averages 85C. It must be a faulty sensor.

Update5:

Finally ran a script that logs the temperatures for graphing, rather than just watching the current temps go up. Modified the script on this post to monitor the ACPI (as GKrellM labels it), core and HDD temps on my rig once per second. Then I used the laptop in different scenarios, like surfing, compiling code, and the low power, balanced and high power modes.

Then an amazing discovery: the ACPI sensor skyrockets to critical in a split second! This event trips the OS thermal protection, which shuts down the PC. I have a log of the temps (ACPI, Core1, Core2, HDD) and the critical warning from /var/log/syslog. I also made a graph of the log. You can see that in this per-second log, it pops to a whopping 111 Celsius, out of its usual range of 40-50. Not only that, there is virtually nothing causing it. As you can see in the log and graph, the HDD and cores are acting just fine. It's the ACPI reading that's gone wild.
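For anyone who wants to find such an event in their own per-second log without graphing it, here is a rough sketch. It is not the script I actually used. The function name `find_spikes`, the 20-degree threshold and the sample readings are all made up for illustration; only the shape (steady 40s, one sample at 111, back to normal) follows what I described above.

```python
def find_spikes(readings, max_jump=20.0):
    """Return (index, previous, current) for every sample that rises more
    than `max_jump` degrees C above the sample right before it."""
    spikes = []
    for i in range(1, len(readings)):
        prev, cur = readings[i - 1], readings[i]
        if cur - prev > max_jump:
            spikes.append((i, prev, cur))
    return spikes


# Hypothetical per-second ACPI readings, NOT values from my real log.
sample = [44.0, 45.0, 46.0, 111.0, 45.0, 44.0]
print(find_spikes(sample))  # [(3, 46.0, 111.0)]
```

A jump that large between two samples one second apart is not something a real heatsink can do, which is why a per-sample difference is enough to separate a sensor glitch from actual heating.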

By the way, the "ACPI" temps come from this path: /sys/class/thermal/thermal_zone0/temp
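If you want to log that file yourself, a minimal sketch could look like the following. This is not the script from the post I modified, just an illustration. It assumes the usual sysfs convention that the file holds the temperature in millidegrees Celsius (so 45000 means 45C); the names `read_celsius` and `log_temps` are my own, and your machine may use a different zone number than thermal_zone0.

```python
import time
from pathlib import Path

# Path from my machine; other laptops may expose a different zone number.
ACPI_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")


def read_celsius(path=ACPI_ZONE):
    """Return the zone temperature in degrees Celsius.

    Assumes the file contains millidegrees, e.g. "45000" for 45 C.
    """
    return int(Path(path).read_text().strip()) / 1000.0


def log_temps(path=ACPI_ZONE, interval=1.0, samples=10, out=print):
    """Emit one 'HH:MM:SS,temp' line per interval, `samples` times."""
    for _ in range(samples):
        out(f"{time.strftime('%H:%M:%S')},{read_celsius(path):.1f}")
        time.sleep(interval)
```

Calling `log_temps(samples=3600)` would then print one reading per second for an hour; redirect the output to a file if you want to graph it later.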

[Image: terminal check]

[Image: graph check]

Best Answer

It's been 3 months and I have finally pinpointed the problem. It's a hardware problem, and that spammy-looking, ad-filled Indian site was right (I won't post it here as it's a commercial entity): it's chip-level damage that's common to a number of Vaio laptops.

So the best and probably the only solution is to turn it over to your nearest service center for repairs. If it's under warranty, you're fine. If not, well, expect to shell out a few bucks for it. You might be better off buying a new notebook.


Anyway, I found a workaround, and it's highly dangerous. I am only sharing it to show that there is a way around the shutdowns, and it has its tradeoffs. This is not sound advice, just a statement that it's possible.

This dangerous move involves disabling the ACPI critical trip point handling in the Linux kernel. To do this, edit the grub file:

gksudo leafpad /etc/default/grub

And add thermal.nocrt=1 to GRUB_CMDLINE_LINUX_DEFAULT as shown:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash thermal.nocrt=1"

Then update grub:

sudo update-grub

Then reboot.
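After the reboot you may want to confirm that the option actually reached the kernel, which lists its boot parameters in /proc/cmdline. Here is a small sketch of such a check. The helper names `kernel_param` and `nocrt_enabled` are mine, and the parsing simply splits on whitespace, so it assumes no quoted values with spaces on the command line.

```python
from pathlib import Path


def kernel_param(cmdline, name):
    """Return the value of `name` in a kernel command line, or None.

    A bare flag without an '=value' part yields an empty string.
    """
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == name:
            return value
    return None


def nocrt_enabled(cmdline_path="/proc/cmdline"):
    """True if the running kernel was booted with thermal.nocrt=1."""
    text = Path(cmdline_path).read_text()
    return kernel_param(text, "thermal.nocrt") == "1"
```

If `nocrt_enabled()` returns False after rebooting, the grub edit probably did not take effect, for example because update-grub was not run.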

This disables the ACPI critical trip point but not the thermal sensor, so we can still monitor it afterwards.

After doing so, I ran my logger script. To compensate for the lack of the kernel's own trip point handler, I set GKrellM to fire an action when the temperature goes over the trip point. Since GKrellM's readings are usually delayed, an action firing there means the temperature has been over the trip point for a significant amount of time.
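The same idea (only react when the reading stays high, not on a single sample) can be written down in a few lines if you would rather not depend on GKrellM's delay. This is just a sketch of that logic, not what GKrellM does internally. The name `sustained_over` and the hold of 3 samples are my own choices; the 95 limit is the critical mark my system reports.

```python
def sustained_over(readings, limit, hold):
    """Return the index at which `readings` has stayed at or above `limit`
    for `hold` consecutive samples, or None if that never happens."""
    run = 0
    for i, temp in enumerate(readings):
        # Extend the current run of hot samples, or reset it on a cool one.
        run = run + 1 if temp >= limit else 0
        if run >= hold:
            return i
    return None


# Hypothetical per-second readings, for illustration only.
glitch = [45.0, 111.0, 45.0, 46.0]          # one-sample spike
real_heat = [90.0, 96.0, 97.0, 98.0, 99.0]  # stays above the limit

print(sustained_over(glitch, limit=95.0, hold=3))     # None
print(sustained_over(real_heat, limit=95.0, hold=3))  # 3
```

With a check like this, a one-second glitch is ignored while genuine overheating still triggers whatever action you hook up to it. Keep in mind that a longer hold also means the machine runs hot for longer before anything happens.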

Then I went on with my usual routine, and the temperature went over the trip point again. However, the spike was so sudden that it did not even register in GKrellM, although my logger recorded it. It was one very abrupt spike and that was it.
