Ubuntu – ThinkPad S440 wifi and unexpected system halt


A few days ago I installed a new wifi driver on my Lenovo ThinkPad S440 and everything seemed to work fine. Now I have a problem with wifi: the speed drops to zero, so no page loads in the web browser. The second problem (possibly related, I'm not sure) is that the system crashes unexpectedly. This is a pastebin of the dmesg output so you can see what's happening. When the system crashes, X and all other programs crash too. I can't even switch to a tty and restart X. I can't reproduce the crash on demand, and I don't know how to show you the output I see on the screen.

I used the kernel boot flags acpi_os=Windows noapic on my Ubuntu 12.04 on the Lenovo ThinkPad S440 because I suppose that if they've built it with Windows 8+ in mind, it could help somehow, but I'm not sure. In this pastebin you can read all the output from /var/log/syslog.

Update 2
What is this? I ran memtest86 for several passes and it finished without errors, but:

[    0.000000] PM: Registered nosave memory: 00000000be97f000 - 00000000c2e7f000

repeats a few times in a row.

[    5.170944] AMD IOMMUv2 driver by Joerg Roedel <joerg.roedel@amd.com>
[    5.170948] AMD IOMMUv2 functionality not available on this system
[    5.186546] ACPI Warning: 0x0000000000001828-0x000000000000182f SystemIO conflicts with Region \PMIO 1 (20121018/utaddress-251)
[    5.186556] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[    5.186560] ACPI Warning: 0x0000000000000830-0x000000000000083f SystemIO conflicts with Region \GPRL 1 (20121018/utaddress-251)
[    5.186564] ACPI Warning: 0x0000000000000830-0x000000000000083f SystemIO conflicts with Region \GPR_ 2 (20121018/utaddress-251)
[    5.186567] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[    5.186568] ACPI Warning: 0x0000000000000800-0x000000000000082f SystemIO conflicts with Region \GPRL 1 (20121018/utaddress-251)
[    5.186571] ACPI Warning: 0x0000000000000800-0x000000000000082f SystemIO conflicts with Region \GPR_ 2 (20121018/utaddress-251)
[    5.186574] ACPI Warning: 0x0000000000000800-0x000000000000082f SystemIO conflicts with Region \IO_D 3 (20121018/utaddress-251)
[    5.186577] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver

What is drm?

[   12.535066] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
[   12.614067] fbcon: inteldrmfb (fb0) is primary device
[   13.805535] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[   14.129091] [drm:intel_dp_set_link_train] *ERROR* Timed out waiting for DP idle patterns
[   14.129093] [drm:i915_write32] *ERROR* Unknown unclaimed register before writing to 64040

This line may be a result of the acpi_os=Windows noapic kernel boot parameters, I guess:

[   14.189856] [Firmware Bug]: ACPI(PEGP) defines _DOD but not _DOS

[   14.194131] ACPI Error: [\_SB_.PCI0.GFX0.DD02._BCL] Namespace lookup failure, AE_NOT_FOUND (20121018/psargs-359)
[   14.194139] ACPI Error: Method parse/execution failed [\_SB_.PCI0.RP05.PEGP.DD02._BCL] (Node ffff88012920ded8), AE_NOT_FOUND (20121018/psparse-537)

[  322.663766] [drm:i915_write8] *ERROR* Unknown unclaimed register before writing to 3b4

[  326.687401] [drm:i915_write32] *ERROR* Unclaimed write to 70030
[  326.689118] i915 0000:00:02.0: More than 8 outputs detected
[  326.894826] usb 2-7: reset full-speed USB device number 5 using xhci_hcd
[  326.904666] dpm_run_callback(): pnp_bus_resume+0x0/0x70 returns -19
[  326.904668] PM: Device 00:06 failed to resume: error -19
[  326.913169] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880124c65000
[  326.913171] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880124c65040
[  326.913172] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880124c65080
[  326.913173] xhci_hcd 0000:00:14.0: xHCI xhci_drop_endpoint called with disabled ep ffff880124c650c0

[  328.005875] [drm:intel_dp_set_link_train] *ERROR* Timed out waiting for DP idle patterns
[  328.005879] [drm:i915_write32] *ERROR* Unknown unclaimed register before writing to 64040

Update 3
So, how can I handle this? Here is a snapshot of what I see after the crash.

Best Answer

I decided to move my comments to an answer. I don't have a complete answer, but at the very least I can help explain the messages you are seeing and point you in the right direction.

BIOS on modern computers is a HUGE legacy kludge. ACPI is a function of the BIOS. There are a lot of little chips and sensors that control all the small things on your computer: a little GPIO chip reporting fan speed, another reporting temperature readings, and so on. They are like little microcontrollers that handle the small stuff and talk to the hardware directly. All of these feed into an ACPI controller, which is either its own chip or part of a different chip. When people talk about the motherboard "chipset", this is part of the picture. These devices need a way to communicate with the larger system so that your OS (or BIOS) can correctly decide what it needs to do (thermal shutdown, raise fan speed, etc.). The easiest way is to carve out a little bit of memory that the ACPI controller reads from and writes to. Exactly which block of memory is used is up to the BIOS/motherboard designer, but that's irrelevant here. Your ACPI driver will look up (or know) what that memory section is and write directly to it. Most of this is completely transparent to you as the user; the only time it becomes an issue is when the driver, kernel, and BIOS do not agree on what is going on.

The kernel log is letting you know these little details at boot, by telling you the following:

[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009d000-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000afc61fff] usable
[    0.000000] BIOS-e820: [mem 0x00000000afc62000-0x00000000afe63fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000afe64000-0x00000000be97efff] usable
[    0.000000] BIOS-e820: [mem 0x00000000be97f000-0x00000000c2e7efff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000c2e7f000-0x00000000c2f7efff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000c2f7f000-0x00000000c2ffefff] ACPI data
[    0.000000] BIOS-e820: [mem 0x00000000c2fff000-0x00000000c2ffffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000c3000000-0x00000000cf9fffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fe101000-0x00000000fe112fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed08000-0x00000000fed08fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000012f5fffff] usable

If you work out the math, the last line tells you that about 758 MiB of memory above the 4 GB mark is free for the OS to use as it wants (a memory address points to a byte of memory, and the numbers are hexadecimal 64-bit addresses, so it's easy to compute how many bytes are in the range: 0x12f5fffff - 0x100000000 + 1 = 0x2f600000 bytes). This is all the usable memory above the 4 GB 32-bit address limit, and the range ends at roughly 4.74 GiB. Above that in the list, you can see how the lower 4 GB of address space has some parts carved off by the BIOS for various reasons (primarily legacy), but the bulk of it is marked "usable". The OS reads this information and knows that it cannot use the reserved sections for general OS functions and virtual memory. In the middle you have about 69 MiB marked reserved, followed by about 1.5 MiB marked ACPI NVS and ACPI data for the ACPI firmware.
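The arithmetic can be checked mechanically. Below is a minimal Python sketch that parses BIOS-e820 lines and prints the size of each range; the function names parse_e820 and size_mib are made up for this illustration, and the sample text is four lines copied from the map above.

```python
import re

# Matches lines such as:
# [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000012f5fffff] usable
E820_LINE = re.compile(
    r"BIOS-e820: \[mem 0x([0-9a-f]+)-0x([0-9a-f]+)\] (.+)$"
)


def parse_e820(log_text):
    """Return a list of (start, end, kind) tuples; end is inclusive."""
    regions = []
    for line in log_text.splitlines():
        match = E820_LINE.search(line.strip())
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            regions.append((start, end, match.group(3)))
    return regions


def size_mib(start, end):
    """Size of an inclusive address range in MiB."""
    return (end - start + 1) / (1024 * 1024)


# Four lines copied from the e820 map in the dmesg output.
sample = """\
[    0.000000] BIOS-e820: [mem 0x00000000be97f000-0x00000000c2e7efff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000c2e7f000-0x00000000c2f7efff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x00000000c2f7f000-0x00000000c2ffefff] ACPI data
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000012f5fffff] usable
"""

for start, end, kind in parse_e820(sample):
    print(f"{kind:10s} {size_mib(start, end):8.1f} MiB")
```

For these four lines the sizes come out as 69.0, 1.0, 0.5 and 758.0 MiB. Feeding the whole dmesg output into parse_e820 should work the same way, since lines that do not match are skipped.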

Now, turning to your errors.

As the OS loads, various drivers are loaded and run through some basic initialization and checks (verifying the device, turning it on, etc.). Several times, you get a complaint that there is a conflict between the address range a (low-level) native driver assumes is its own and a region that ACPI has declared (the SystemIO ranges in the ACPI warnings). Combined with the error messages about ACPI namespace lookups failing for certain devices, this tells me there is a very real possibility that not everyone is on the same page about what goes where, meaning that somebody could overwrite memory or registers that they shouldn't, or cause other mischief.
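To see at a glance which ranges are disputed, the conflict warnings can be grouped by address range. This is only a sketch: the helper name conflicts_by_range is invented here, and the regular expression assumes the exact wording of the "ACPI Warning ... SystemIO conflicts with Region" lines quoted in the question.

```python
import re
from collections import defaultdict

# Assumes the message format seen in the question's dmesg output.
ACPI_CONFLICT = re.compile(
    r"ACPI Warning: 0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+) "
    r"SystemIO conflicts with Region (\S+)"
)


def conflicts_by_range(log_text):
    """Map each conflicting (start, end) range to the ACPI region names it hits."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        match = ACPI_CONFLICT.search(line)
        if match:
            key = (int(match.group(1), 16), int(match.group(2), 16))
            found[key].append(match.group(3))
    return dict(found)


# Three lines copied from the question.
sample = r"""
[    5.186546] ACPI Warning: 0x0000000000001828-0x000000000000182f SystemIO conflicts with Region \PMIO 1 (20121018/utaddress-251)
[    5.186560] ACPI Warning: 0x0000000000000830-0x000000000000083f SystemIO conflicts with Region \GPRL 1 (20121018/utaddress-251)
[    5.186564] ACPI Warning: 0x0000000000000830-0x000000000000083f SystemIO conflicts with Region \GPR_ 2 (20121018/utaddress-251)
"""

for (start, end), regions in conflicts_by_range(sample).items():
    print(f"0x{start:04x}-0x{end:04x}: {', '.join(regions)}")
```

A range that shows up against several regions (like 0x0830-0x083f here) is a hint that one native driver is touching an area that ACPI has declared more than once.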

As for your crash message.

kthread is the generic name for a kernel thread. A kernel thread runs in kernel space rather than user space, so its ability to cause mischief is greatly increased, since it has direct access to system memory. Kernel threads typically carry out driver work and other low-level kernel functions.

Your computer crashes out of a kthread with a "Tainted" flag in the message. Despite the name, this is not taint analysis in the security sense: the kernel marks itself tainted when something has happened that makes its state harder to trust or debug, for example a proprietary or out-of-tree module was loaded, a module was force-loaded, or an earlier warning or oops occurred. The flag itself does not cause the panic, but if the wifi driver you installed was built outside the kernel tree, it would set the flag, which is consistent with the crash being a kernel or driver bug that is, most likely, related to your ACPI/wifi issue.
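The taint state can also be inspected directly: the kernel exposes it as a bitmask in /proc/sys/kernel/tainted. Here is a minimal Python sketch for decoding it. The table covers only a subset of the bits described in the kernel's tainted-kernels documentation, the function names are invented for this example, and the value 4097 is a hypothetical reading, not one taken from your machine.

```python
# Subset of taint bits, following the kernel's tainted-kernels documentation.
TAINT_FLAGS = {
    0: "P: proprietary module loaded",
    1: "F: module was force-loaded",
    7: "D: kernel died recently (oops or BUG)",
    9: "W: kernel issued a warning",
    10: "C: staging driver loaded",
    11: "I: workaround for a firmware bug applied",
    12: "O: out-of-tree module loaded",
}


def decode_taint(value):
    """Return descriptions of the known taint bits set in value."""
    return [desc for bit, desc in sorted(TAINT_FLAGS.items()) if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint bitmask (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


# Hypothetical value for illustration; on the affected machine use read_taint().
example_value = 4097  # bits 0 and 12 set
for description in decode_taint(example_value):
    print(description)
```

A value of 0 means the kernel is not tainted. Bits that are not in the table are ignored by this sketch.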

Finally, the solution to your issues. It is hard to determine exactly what is causing the mischief just from those logs and the snapshot. However, it's safe to say that the wifi driver is not working correctly, and your attempted ACPI fix only made the issue more complicated. As such, I would recommend the following sequence of steps.

  1. I would revert whatever ACPI flags that you added to your boot command
  2. I would revert the wifi driver to what was working previously
  3. I would update the system kernel at least, and maybe the entire system.
  4. I would try to update the driver again
  5. If it doesn't work, then file a bug report on the driver
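
For step 1, it helps to confirm which flags the running kernel was actually booted with. The following Python sketch filters a kernel command line for noapic and anything starting with acpi; the helper names and the example command line are hypothetical, and on the real machine the input would come from /proc/cmdline.

```python
def suspect_params(cmdline):
    """Return boot parameters that are 'noapic' or start with 'acpi'."""
    return [
        param for param in cmdline.split()
        if param == "noapic" or param.startswith("acpi")
    ]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read().strip()


# Hypothetical command line for illustration; on the real machine use read_cmdline().
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet splash acpi_os=Windows noapic"
print(suspect_params(example))
```

If the list is not empty after you have removed the flags, the old settings are probably still in the boot loader configuration. On Ubuntu that is usually GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by running update-grub.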