Ubuntu – Ubuntu 20.04 – Shutdown after Overheating

20.04 overheating thinkpad

I use a ThinkPad L13 and have thermal issues, especially under full load. When I run my Python program, which utilizes all cores, the laptop shuts down shortly afterwards.

What have I tried so far? I installed TLP and thermald on my machine. Furthermore, I changed the Intel settings in BIOS to "Balanced".

Recently, two things took place:

  1. I installed Ubuntu 20.04.

  2. Due to graphical issues with my ThinkPad, the mainboard was replaced. Maybe it's a hardware issue, e.g. the cooler doesn't fit properly?

Before that, no problem occurred.

The command grep -i -e temp -e therm /var/log/syslog* produces the following output on this occasion:

Apr 29 09:20:50 omikron systemd[1]: Started Daily Cleanup of Temporary Directories.
Apr 29 09:20:50 omikron systemd[1]: Starting Thermal Daemon Service...
Apr 29 09:20:50 omikron kernel: [    0.221560] mce: CPU0: Thermal monitoring enabled (TM1)
Apr 29 09:20:50 omikron kernel: [    0.376125] ACPI: \_SB_.PR00: _OSC native thermal LVT Acked
Apr 29 09:20:50 omikron kernel: [    0.539054] thermal_sys: Registered thermal governor 'fair_share'
Apr 29 09:20:50 omikron kernel: [    0.539055] thermal_sys: Registered thermal governor 'bang_bang'
Apr 29 09:20:50 omikron kernel: [    0.539056] thermal_sys: Registered thermal governor 'step_wise'
Apr 29 09:20:50 omikron kernel: [    0.539056] thermal_sys: Registered thermal governor 'user_space'
Apr 29 09:20:50 omikron kernel: [    0.539057] thermal_sys: Registered thermal governor 'power_allocator'
Apr 29 09:20:50 omikron kernel: [    0.725855] thermal LNXTHERM:00: registered as thermal_zone0
Apr 29 09:20:50 omikron kernel: [    0.725856] ACPI: Thermal Zone [THM0] (31 C)
Apr 29 09:20:50 omikron kernel: [    2.056100] proc_thermal 0000:00:04.0: enabling device (0000 -> 0002)
Apr 29 09:20:50 omikron kernel: [    2.147392] proc_thermal 0000:00:04.0: Creating sysfs group for PROC_THERMAL_PCI
Apr 29 09:20:50 omikron kernel: [    2.412750] thermal thermal_zone5: failed to read out thermal zone (-61)
Apr 29 09:20:50 omikron sensors[826]: temp1:            N/A
Apr 29 09:20:50 omikron sensors[826]: coretemp-isa-0000
Apr 29 09:20:50 omikron sensors[826]: temp1:         +1.0°C
Apr 29 09:20:50 omikron sensors[826]: temp2:         +1.0°C
Apr 29 09:20:50 omikron sensors[826]: temp3:         +4.0°C
Apr 29 09:20:50 omikron sensors[826]: temp4:         +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp5:       +121.0°C
Apr 29 09:20:50 omikron sensors[826]: temp6:       +121.0°C
Apr 29 09:20:50 omikron sensors[826]: temp7:         +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp8:         +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp9:        +64.0°C
Apr 29 09:20:50 omikron sensors[826]: temp10:        +3.0°C
Apr 29 09:20:50 omikron sensors[826]: temp11:       -80.0°C
Apr 29 09:20:50 omikron sensors[826]: temp12:        +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp13:        +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp14:        +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp15:        +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp16:        +0.0°C
Apr 29 09:20:50 omikron sensors[826]: temp1:        +48.0°C  (crit = +98.0°C)
Apr 29 09:20:50 omikron thermald[822]: [WARN]22 CPUID levels; family:model:stepping 0x6:8e:c (6:142:12)
Apr 29 09:20:50 omikron thermald[822]: [WARN]Polling mode is enabled: 4
Apr 29 09:20:50 omikron thermald[822]: [WARN]sensor id 10 : No temp sysfs for reading raw temp
Apr 29 09:20:50 omikron thermald[822]: message repeated 2 times: [ [WARN]sensor id 10 : No temp sysfs for reading raw temp]
Apr 29 09:20:50 omikron thermald[822]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 29 09:20:50 omikron thermald[822]: [WARN]error: could not parse file /etc/thermald/thermal-conf.xml
Apr 29 09:20:50 omikron thermald[822]: [WARN]sysfs open failed
Apr 29 09:20:50 omikron thermald[822]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 29 09:20:50 omikron thermald[822]: [WARN]error: could not parse file /etc/thermald/thermal-conf.xml
Apr 29 09:20:50 omikron systemd[1]: Started Thermal Daemon Service.
Apr 29 09:20:50 omikron thermald[822]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 29 09:20:50 omikron thermald[822]: [WARN]error: could not parse file /etc/thermald/thermal-conf.xml
Apr 29 09:21:04 omikron gsd-print-notif[1262]: Source ID 3 was not found when attempting to remove it
Apr 29 09:29:01 omikron kernel: [  493.759292] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759293] mce: CPU4: Core temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759295] mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759296] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759298] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759299] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759300] mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759302] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759326] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.759327] mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 09:29:01 omikron kernel: [  493.760277] mce: CPU4: Core temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760278] mce: CPU0: Core temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760279] mce: CPU5: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760280] mce: CPU1: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760281] mce: CPU6: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760282] mce: CPU2: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760283] mce: CPU0: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760284] mce: CPU4: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760317] mce: CPU7: Package temperature/speed normal
Apr 29 09:29:01 omikron kernel: [  493.760318] mce: CPU3: Package temperature/speed normal
Apr 29 09:35:50 omikron systemd[1]: Starting Cleanup of Temporary Directories...
Apr 29 09:35:50 omikron systemd[1]: Finished Cleanup of Temporary Directories.
Apr 29 10:14:58 omikron kernel: [ 3250.661431] mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 10:14:58 omikron kernel: [ 3250.661431] mce: CPU7: Core temperature above threshold, cpu clock throttled (total events = 1)
Apr 29 10:14:58 omikron kernel: [ 3250.661433] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661434] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661435] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661436] mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661437] mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661438] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661438] mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.661440] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 196)
Apr 29 10:14:58 omikron kernel: [ 3250.665320] mce: CPU3: Core temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665321] mce: CPU7: Core temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665322] mce: CPU2: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665323] mce: CPU0: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665324] mce: CPU4: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665325] mce: CPU5: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665325] mce: CPU6: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665326] mce: CPU1: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665327] mce: CPU7: Package temperature/speed normal
Apr 29 10:14:58 omikron kernel: [ 3250.665328] mce: CPU3: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.746988] mce: CPU4: Core temperature above threshold, cpu clock throttled (total events = 323)
Apr 29 10:20:05 omikron kernel: [ 3557.746989] mce: CPU0: Core temperature above threshold, cpu clock throttled (total events = 323)
Apr 29 10:20:05 omikron kernel: [ 3557.746991] mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.746992] mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.746993] mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.746994] mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.747022] mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.747023] mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.747025] mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.747026] mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 650)
Apr 29 10:20:05 omikron kernel: [ 3557.749589] mce: CPU4: Core temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749590] mce: CPU0: Core temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749591] mce: CPU7: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749591] mce: CPU3: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749592] mce: CPU0: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749593] mce: CPU4: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749625] mce: CPU5: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749626] mce: CPU1: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749627] mce: CPU6: Package temperature/speed normal
Apr 29 10:20:05 omikron kernel: [ 3557.749628] mce: CPU2: Package temperature/speed normal
Apr 29 10:23:09 omikron kernel: [ 3741.654959] thermal thermal_zone0: critical temperature reached (100 C), shutting down
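
To make a long log like the one above easier to read, the throttle and shutdown messages can be condensed with a few lines of Python. This is only a minimal sketch: the function name summarize_thermal_log is made up for illustration, and the two patterns assume the kernel messages are worded exactly as in the output above.

```python
import re
from collections import Counter

# Patterns written against the kernel messages shown in the log above.
THROTTLE_RE = re.compile(
    r"mce: CPU(\d+): (Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (\d+)\)"
)
CRITICAL_RE = re.compile(r"critical temperature reached \((\d+) C\), shutting down")


def summarize_thermal_log(lines):
    """Return (highest event count per scope, list of critical temperatures)."""
    max_events = Counter()
    critical = []
    for line in lines:
        m = THROTTLE_RE.search(line)
        if m:
            scope, events = m.group(2), int(m.group(3))
            # Keep the largest "total events" value seen for Core / Package.
            max_events[scope] = max(max_events[scope], events)
            continue
        m = CRITICAL_RE.search(line)
        if m:
            critical.append(int(m.group(1)))
    return dict(max_events), critical
```

It could be fed with the lines of /var/log/syslog, for example via open("/var/log/syslog", errors="replace"). A non-empty second return value means the kernel logged at least one critical-temperature shutdown.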

EDIT (05/01/2020):

Today, I had a Zoom meeting, and the laptop got so hot that it turned off during the meeting. This is not supposed to happen, right? What is going on here? I was not running any heavy computation. Perhaps it has something to do with the power supply, since it was plugged in?


EDIT (05/09/2020):

I set the performance settings to the maximum level and ran the same stress test that is used in various temperature reviews of my notebook. On Windows, I get values similar to theirs. Therefore, I think it must be an issue with the new Ubuntu 20.04: somehow, Ubuntu does not throttle the frequency enough to bring the temperature down.
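
Whether the frequency actually drops as the temperature rises can be checked by sampling both while the stress test runs. The following is a rough sketch, assuming the standard Linux sysfs layout (thermal zones under /sys/class/thermal, cpufreq under /sys/devices/system/cpu). The function names are hypothetical, and the base directories are parameters only so the code can be tried against a test directory.

```python
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees Celsius} for every readable thermal zone."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            # sysfs reports the temperature in millidegrees Celsius.
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Some zones cannot be read (the log shows thermal_zone5 failing).
            continue
        temps[zone.name] = millideg / 1000.0
    return temps


def read_cpu_freqs_mhz(base="/sys/devices/system/cpu"):
    """Return {cpu_name: current frequency in MHz} where cpufreq is available."""
    freqs = {}
    for f in sorted(Path(base).glob("cpu[0-9]*/cpufreq/scaling_cur_freq")):
        try:
            # scaling_cur_freq is given in kHz.
            khz = int(f.read_text().strip())
        except (OSError, ValueError):
            continue
        freqs[f.parent.parent.name] = khz / 1000.0
    return freqs
```

Calling both functions once per second in a loop and printing the results should show whether the clock goes down before the critical temperature from the log is reached.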


EDIT (07/19/2020):

I contacted Lenovo support and they repaired my notebook (whatever they did). For a couple of weeks, it worked fine. Now, I have the same issue again.

I've updated my BIOS, which helps but comes with another issue: the CPU throttles down to 400 MHz as soon as the temperature gets close to overheating. As a result, my notebook is barely usable for demanding tasks.

As a possible solution, I deactivated Intel's turbo boost. The temperatures are now in tolerable ranges and everything works smoothly enough. That's a compromise I am willing to make.
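
The post does not say how turbo boost was deactivated (it may have been done in the BIOS). One way to do it from within Linux, assuming the intel_pstate driver is active, is the no_turbo file in sysfs. The sketch below uses hypothetical helper names; writing to the file needs root, and the setting does not survive a reboot.

```python
from pathlib import Path

# Assumption: the intel_pstate driver is in use, which exposes this file.
NO_TURBO = Path("/sys/devices/system/cpu/intel_pstate/no_turbo")


def turbo_disabled(path=NO_TURBO):
    """Return True if turbo is disabled, False if enabled, None if unknown."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        # File missing or unreadable, e.g. a different cpufreq driver is used.
        return None


def set_turbo_disabled(disabled, path=NO_TURBO):
    """Write 1 (turbo off) or 0 (turbo on); needs root on a real system."""
    Path(path).write_text("1" if disabled else "0")
```

If turbo_disabled() returns None, the file is not there and this approach does not apply to the running system.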

Best Answer

A full diagnosis of a combined hardware and software system is hard to perform via Ask Ubuntu in your case. Hardware issues are particularly difficult.

A possible first step in the diagnosis is to install another OS side by side with your Ubuntu 20.04 and run intensive tests there as well.

You could run the same Python program (if you can configure it to use all cores). Even so, it might not run under exactly the same conditions in which you see shutdowns. There are quite a few performance-testing applications out there, and they should be good enough (or even more demanding than your program). Such a test would also be free of any "contamination" from your Ubuntu 20.04 configuration.
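
If the original program is awkward to move to the other OS, a small stand-in that keeps every core busy might be enough for a first comparison. This is a minimal sketch, not a replacement for a proper benchmark. The function names are made up, and the "fork" start method is an assumption that restricts it to Linux and similar systems.

```python
import multiprocessing as mp
import time


def burn(seconds):
    """Busy-loop for roughly `seconds` and return the iteration count."""
    deadline = time.monotonic() + seconds
    n = 0
    while time.monotonic() < deadline:
        n += 1
    return n


def stress_all_cores(seconds, workers=None):
    """Run `burn` in one process per core and return each worker's count."""
    workers = workers or mp.cpu_count()
    # Assumption: "fork" is available (Linux); it is not on Windows.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=workers) as pool:
        return pool.map(burn, [seconds] * workers)
```

Calling stress_all_cores(600) would load all cores for about ten minutes. Watching the temperatures during that time on both systems should show whether the behaviour differs.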

Later on, when the diagnosis is finished, you can remove this OS and reclaim the space for Ubuntu.