Ubuntu – Ath10k and QCA6174 causing PCIe errors, firmware crashes, and connection drops

Tags: 18.04, atheros, drivers, firmware, wireless

I recently (re)installed Ubuntu 18.04 on my Razer Blade Pro (2017). My wireless card performs extremely poorly and frequently drops the connection. Inspecting dmesg for Atheros messages yields the following (nasty-looking) crash:

[ 6709.200017] ath10k_pci 0000:3c:00.0: firmware crashed! (guid 01e29e97-0ee6-4538-8756-764abe49705f)
[ 6709.200048] ath10k_pci 0000:3c:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1a56:1535
[ 6709.200056] ath10k_pci 0000:3c:00.0: kconfig debug 0 debugfs 1 tracing 1 dfs 0 testmode 0
[ 6709.201666] ath10k_pci 0000:3c:00.0: firmware ver WLAN.RM.4.4.1-00079-QCARMSWPZ-1 api 6 features wowlan,ignore-otp crc32 fd869beb
[ 6709.202773] ath10k_pci 0000:3c:00.0: board_file api 2 bmi_id N/A crc32 20d869c3
[ 6709.202784] ath10k_pci 0000:3c:00.0: htt-ver 3.47 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
[ 6709.204809] ath10k_pci 0000:3c:00.0: firmware register dump:
[ 6709.204822] ath10k_pci 0000:3c:00.0: [00]: 0x05030000 0x000015B3 0x009E6FD4 0x00955B31
[ 6709.204830] ath10k_pci 0000:3c:00.0: [04]: 0x009E6FD4 0x00060730 0x0000001D 0x00473AD4
[ 6709.204838] ath10k_pci 0000:3c:00.0: [08]: 0x0049C59C 0x0044DEB4 0x004290B0 0x00449AB0
[ 6709.204847] ath10k_pci 0000:3c:00.0: [12]: 0x00000009 0xFFFFFFFF 0x00952F6C 0x00952F77
[ 6709.204854] ath10k_pci 0000:3c:00.0: [16]: 0x00952CC4 0x0091080D 0x00000000 0x0091080D
[ 6709.204862] ath10k_pci 0000:3c:00.0: [20]: 0x409E6FD4 0x0040E818 0x00405820 0x0049C464
[ 6709.204870] ath10k_pci 0000:3c:00.0: [24]: 0x809E9395 0x0040E878 0x0049C6E8 0xC09E6FD4
[ 6709.204879] ath10k_pci 0000:3c:00.0: [28]: 0x80932EF9 0x0040EA68 0x0040A054 0x00000009
[ 6709.204887] ath10k_pci 0000:3c:00.0: [32]: 0x809F8C46 0x0040EA98 0x0041201C 0x00000004
[ 6709.204894] ath10k_pci 0000:3c:00.0: [36]: 0x80911210 0x0040EAC8 0x00000005 0x004040F4
[ 6709.204902] ath10k_pci 0000:3c:00.0: [40]: 0x80911154 0x0040EB28 0x00400000 0x00000000
[ 6709.204910] ath10k_pci 0000:3c:00.0: [44]: 0x8091122D 0x0040EB48 0x00000000 0x00400600
[ 6709.204922] ath10k_pci 0000:3c:00.0: [48]: 0x40910024 0x0040EB78 0x0040AB98 0x0040AB98
[ 6709.204930] ath10k_pci 0000:3c:00.0: [52]: 0x00000000 0x0040EB98 0x009BB001 0x00040020
[ 6709.204938] ath10k_pci 0000:3c:00.0: [56]: 0x809EDA21 0x0040E938 0x00499F10 0x00000000
[ 6709.204944] ath10k_pci 0000:3c:00.0: Copy Engine register dump:
[ 6709.204967] ath10k_pci 0000:3c:00.0: [00]: 0x00034400  14  14   3   3
[ 6709.204990] ath10k_pci 0000:3c:00.0: [01]: 0x00034800  17  17 510 511
[ 6709.205012] ath10k_pci 0000:3c:00.0: [02]: 0x00034c00   5   5  68  69
[ 6709.205034] ath10k_pci 0000:3c:00.0: [03]: 0x00035000  27  27  29  27
[ 6709.205057] ath10k_pci 0000:3c:00.0: [04]: 0x00035400 131 131 131  67
[ 6709.205079] ath10k_pci 0000:3c:00.0: [05]: 0x00035800   0   0  64   0
[ 6709.205101] ath10k_pci 0000:3c:00.0: [06]: 0x00035c00  26  26  24  24
[ 6709.205123] ath10k_pci 0000:3c:00.0: [07]: 0x00036000   1   1   1   1
[ 6710.053042] ath10k_pci 0000:3c:00.0: Unknown eventid: 118809
[ 6710.056101] ath10k_pci 0000:3c:00.0: Unknown eventid: 90118
[ 6710.153420] ath10k_pci 0000:3c:00.0: device successfully recovered

There are also the following entries related to the wireless card:

[ 7403.617792] pcieport 0000:00:1c.6: AER: Corrected error received: id=00e6
[ 7403.617797] pcieport 0000:00:1c.6: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=00e6(Transmitter ID)
[ 7403.617800] pcieport 0000:00:1c.6:   device [8086:a116] error status/mask=00001000/00002000
[ 7403.617802] pcieport 0000:00:1c.6:    [12] Replay Timer Timeout 

The lspci output of the card is as follows:

3c:00.0 Network controller: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter (rev 32)
    Subsystem: Bigfoot Networks, Inc. QCA6174 802.11ac Wireless Network Adapter
    Flags: bus master, fast devsel, latency 0, IRQ 145
    Memory at dc200000 (64-bit, non-prefetchable) [size=2M]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable+ Count=1/8 Maskable+ 64bit-
    Capabilities: [70] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [148] Virtual Channel
    Capabilities: [168] Device Serial Number 00-00-00-00-00-00-00-00
    Capabilities: [178] Latency Tolerance Reporting
    Capabilities: [180] L1 PM Substates
    Kernel driver in use: ath10k_pci
    Kernel modules: ath10k_pci

-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
           +- ...
           +-1c.0-[02-3a]--
           +-1c.4-[3b]----00.0  ...
           +-1c.6-[3c]----00.0  Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter
           +-1d.0-[3d]----00.0  ...
           +- ...

Loading the card (at boot) shows the following dmesg output:

[   29.432791] ath10k_pci 0000:3c:00.0: enabling device (0000 -> 0002)
[   29.433628] ath10k_pci 0000:3c:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 reset_mode 0
[   29.721996] ath10k_pci 0000:3c:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:3c:00.0.bin failed with error -2
[   29.722023] ath10k_pci 0000:3c:00.0: Direct firmware load for ath10k/cal-pci-0000:3c:00.0.bin failed with error -2
[   29.725059] ath10k_pci 0000:3c:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1a56:1535
[   29.725061] ath10k_pci 0000:3c:00.0: kconfig debug 0 debugfs 1 tracing 1 dfs 0 testmode 0
[   29.725481] ath10k_pci 0000:3c:00.0: firmware ver WLAN.RM.4.4.1-00079-QCARMSWPZ-1 api 6 features wowlan,ignore-otp crc32 fd869beb
[   29.791271] ath10k_pci 0000:3c:00.0: board_file api 2 bmi_id N/A crc32 20d869c3
[   30.386364] ath10k_pci 0000:3c:00.0: Unknown eventid: 118809
[   30.389342] ath10k_pci 0000:3c:00.0: Unknown eventid: 90118
[   30.389967] ath10k_pci 0000:3c:00.0: htt-ver 3.47 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1
[   30.471606] ath: EEPROM regdomain: 0x6c
[   30.471606] ath: EEPROM indicates we should expect a direct regpair map
[   30.471607] ath: Country alpha2 being used: 00
[   30.471608] ath: Regpair used: 0x6c
[   30.475073] ath10k_pci 0000:3c:00.0 wlp60s0: renamed from wlan0
[   31.698248] ath10k_pci 0000:3c:00.0: Unknown eventid: 118809
[   31.701166] ath10k_pci 0000:3c:00.0: Unknown eventid: 90118

Notably, my system does not have hw3.2 under /lib/firmware/ath10k/QCA6174. I have version 1.173.1 of linux-firmware installed, and no proprietary drivers seem to be available for my wireless card. Mandatory AIO script results are available on Pastebin.
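To see which hardware revisions actually have firmware installed, the directory can be listed per revision. This is a minimal sketch: the function name `list_qca6174_firmware` is illustrative, and the default path is the one mentioned above.

```shell
# List the firmware-*.bin files under each hw* revision directory.
# The base directory defaults to the path from the question; pass another to override.
list_qca6174_firmware() {
    base="${1:-/lib/firmware/ath10k/QCA6174}"
    for dir in "$base"/hw*/; do
        # Skip the unexpanded glob when no hw* directory exists.
        [ -d "$dir" ] || continue
        for f in "$dir"firmware-*.bin; do
            if [ -e "$f" ]; then
                # Print the path relative to the base directory.
                printf '%s\n' "${f#"$base"/}"
            fi
        done
    done
}

# Typical use:
#   list_qca6174_firmware
#   dpkg-query -W -f='${Version}\n' linux-firmware   # installed package version
```

If only hw2.1 and hw3.0 show up, that alone may not indicate a missing file, since a driver can map more than one chip revision onto the same firmware directory.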

After my wireless card crashes, I can generally restore connectivity by toggling WiFi off and then back on in the GNOME menu, but this is annoying to do every time (crashes occur anywhere from a few minutes to a few hours apart). This worked fine in 16.04 HWE before I had to uninstall Linux, so I'm not sure why 18.04 would bring a whole new host of problems, but apparently they exist now.
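The same off/on toggle can be done from a terminal instead of the GNOME menu. This is a minimal sketch, assuming NetworkManager manages the card and `nmcli` is installed; the function name `wifi_toggle` and the two-second default pause are illustrative.

```shell
# Turn the WiFi radio off, wait briefly, then turn it back on.
# Assumes NetworkManager; the optional argument is the pause in seconds.
wifi_toggle() {
    pause="${1:-2}"
    nmcli radio wifi off
    sleep "$pause"
    nmcli radio wifi on
}

# Typical use after a crash:
#   wifi_toggle
```

This only automates the manual recovery. It does nothing to prevent the firmware crash itself.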

I'm assuming this is a kernel-related bug (although I have yet to file a report on this), but I would like to know if there are any workarounds present to make my wireless connection last longer than ten minutes and/or stop the PCIe Bus Error messages from cluttering my syslog.

Short of replacing my wireless card or waiting for an official fix, what can I do to improve wireless performance (and stop the crashes)?

Best Answer

Warning: This is only a partial solution!

While the main issue (the wifi drop and crash) appears to be solved, the AER Corrected Error message still spams the logs. At least wifi is more consistent now.
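To get a rough idea of how often the corrected errors still occur, the kernel log can be tallied per PCIe port. This is a sketch only: the function name `count_aer_corrected` is illustrative, and it assumes the log lines look like the pcieport entries quoted in the question.

```shell
# Count "AER: Corrected error received" lines per PCIe port.
# Reads kernel log text on stdin and prints "<count> <port>" for each port seen.
count_aer_corrected() {
    awk '/AER: Corrected error received/ {
             for (i = 1; i < NF; i++)
                 if ($i == "pcieport") {
                     port = $(i + 1)
                     sub(/:$/, "", port)   # drop the trailing colon after the address
                     n[port]++
                 }
         }
         END { for (p in n) print n[p], p }'
}

# Typical use (dmesg may need sudo on some systems):
#   dmesg | count_aer_corrected
```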

Bernard Wei's comment led to the repository for ath10k firmware, which conveniently included an update for the hw3.0 chain.

Downloading firmware-6.bin_WLAN.RM.4.4.1-00110-QCARMSWP-1 and replacing firmware-6.bin in /lib/firmware/ath10k/QCA6174/hw3.0 followed by a reboot brought a far more stable wireless experience.

cd /lib/firmware/ath10k/QCA6174/hw3.0
sudo mv firmware-6.bin firmware-6.bin.old
sudo wget https://github.com/kvalo/ath10k-firmware/raw/master/QCA6174/hw3.0/4.4.1/firmware-6.bin_WLAN.RM.4.4.1-00110-QCARMSWP-1 -O firmware-6.bin
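
After the reboot, it is worth confirming that the driver actually loaded the replacement file. This is a small sketch that pulls the version string out of the ath10k "firmware ver" log line; the function name `ath10k_fw_version` is illustrative, and it assumes the line has the same layout as in the dmesg output from the question.

```shell
# Print the firmware version from the most recent ath10k "firmware ver" log line.
# Reads kernel log text on stdin, so it works with dmesg or a saved log file.
ath10k_fw_version() {
    sed -n 's/.*ath10k_pci.*firmware ver \([^ ]*\).*/\1/p' | tail -n 1
}

# Typical use after rebooting (dmesg may need sudo on some systems):
#   dmesg | ath10k_fw_version
```

If the swap worked, the reported version should presumably contain 00110 rather than the 00079 shown in the original logs, assuming the downloaded file name reflects the version embedded in the firmware.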

Note, however, that the following lines are now in the syslog:

[   21.482256] ath10k_pci 0000:3c:00.0: Unknown eventid: 3
[   21.498398] ath10k_pci 0000:3c:00.0: Unknown eventid: 118809
[   21.501401] ath10k_pci 0000:3c:00.0: Unknown eventid: 90118

Now... to wait for this to hit the linux-firmware package for real. And also fix the AER errors...
