Linux – PCIe Bus Error filling logs FAST

linux-mint logs nvidia

Something is wrong with my system: the error below is showing up in syslog and kern.log several thousand times per second. Because it is listed as corrected, I suspect it is transient and nothing is really broken, but at this rate the logs are filling up the root partition (have you seen a 250+ GB kern.log?).

pcieport 0000:00:03.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0018(Receiver ID)
pcieport 0000:00:03.0:   device [8086:2f08] error status/mask=00000001/00002000
pcieport 0000:00:03.0:    [ 0] Receiver Error         (First)
pcieport 0000:00:03.0: AER: Multiple Corrected error received: id=0018

Sometimes there is another line saying "can't find device of ID0018" thrown into the mix, too. Beyond these log entries filling up root, there are no other symptoms. The system behaves fine for the minimal browsing, video playing and encoding I do with it. Graphics output goes to the 4K display over HDMI with the nvidia-343 drivers.

All I can really tell from that is that "device [8086:2f08]" is one of the PCIe root ports on the CPU. The GPU is the only PCIe device I have plugged in, but I don't know whether any of the motherboard's on-board features also sit on the PCIe bus.

System info:

mnemosyne ~ # inxi -Fxz
System:    Host: mnemosyne Kernel: 3.13.0-24-generic x86_64 (64 bit, gcc: 4.8.2) Console: tty 5 Distro: Linux Mint 17 Qiana
Machine:   System: ASUS product: All Series
           Mobo: ASUSTeK model: X99-A version: Rev 1.xx Bios: American Megatrends version: 0216 date: 08/29/2014
CPU:       Hexa core Intel Core i7-5820K CPU (-HT-MCP-) cache: 15360 KB flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 40398.7
           Clock Speeds: 1: 1200.00 MHz 2: 1200.00 MHz 3: 1200.00 MHz 4: 3301.00 MHz 5: 1200.00 MHz 6: 1200.00 MHz 7: 3301.00 MHz 8: 1200.00 MHz 9: 1200.00 MHz 10: 1200.00 MHz 11: 1200.00 MHz 12: 1200.00 MHz
Graphics:  Card: NVIDIA GM107 [GeForce GTX 750] bus-ID: 01:00.0
           X.org: 1.15.1 drivers: nvidia (unloaded: fbdev,vesa,nouveau) tty size: 175x51 Advanced Data: N/A out of X
Audio:     Card: NVIDIA Device 0fbc driver: snd_hda_intel bus-ID: 01:00.1 Sound: ALSA ver: k3.13.0-24-generic
Network:   Card: Intel Ethernet Connection (2) I218-V driver: e1000e ver: 2.3.2-k port: f020 bus-ID: 00:19.0
           IF: eth0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Drives:    HDD Total Size: 36519.2GB (13.7% used) 1: id: /dev/sda model: WDC_WD60EFRX size: 6001.2GB
           2: id: /dev/sdb model: WDC_WD60EFRX size: 6001.2GB 3: id: /dev/sdc model: WDC_WD60EFRX size: 6001.2GB
           4: id: /dev/sdd model: WDC_WD60EFRX size: 6001.2GB 5: id: /dev/sde model: Crucial_CT512MX1 size: 512.1GB
           6: id: /dev/sdf model: WDC_WD60EFRX size: 6001.2GB 7: id: /dev/sdg model: WDC_WD60EFRX size: 6001.2GB
Partition: ID: / size: 454G used: 342G (80%) fs: ext4 ID: swap-1 size: 17.08GB used: 0.93GB (5%) fs: swap
RAID:      No RAID devices detected - /proc/mdstat and md_mod kernel raid module present
Sensors:   System Temperatures: cpu: 32.0C mobo: N/A
           Fan Speeds (in rpm): cpu: N/A
Info:      Processes: 318 Uptime: 9 days Memory: 4353.6/15950.5MB Runlevel: 2 Gcc sys: 4.8.2 Client: Shell inxi: 1.8.4

Any suggestions as to what could be causing this, or any way to narrow down the options would be greatly appreciated. This is all the computer hardware I own, so swapping parts isn't an option.

Best Answer

You can determine which device is attached to this root port with the command:

lspci -v -s 3.0 | grep Bus:

You should see a line something like this:

Bus: primary=00, secondary=04, subordinate=04, sec-latency=0

The secondary and subordinate bus numbers are often the same, so you can then use the command

lspci -s 4:0

to see what devices are on that bus. On my system, it looks like this:

lspci -s 4:0
04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11)
04:00.1 IDE interface: Marvell Technology Group Ltd. 88SE912x IDE Controller (rev 11)
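If you want to do the two steps in one go, something like the following might work. It is only a rough sketch: the function names secondary_bus and list_behind_port are made up for illustration, and it assumes the "Bus:" line in the lspci -v output has the same layout as the example above. The port address 00:03.0 comes from the log lines in the question.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -v` output on stdin and print the
# secondary bus number taken from the first "Bus:" line.
# Assumes the line looks like:
#   Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
secondary_bus() {
    sed -n 's/.*Bus: primary=[0-9a-fA-F]*, secondary=\([0-9a-fA-F]*\),.*/\1/p' | head -n 1
}

# Hypothetical wrapper: list every device on the bus behind a given port.
# "$1" is the port address, e.g. 00:03.0 for the port in the question.
list_behind_port() {
    bus=$(lspci -v -s "$1" | secondary_bus)
    if [ -n "$bus" ]; then
        # A trailing colon selects all devices on that bus.
        lspci -s "${bus}:"
    else
        echo "no Bus: line found for $1" >&2
        return 1
    fi
}
```

After sourcing it, list_behind_port 00:03.0 should print the same kind of listing as the manual lspci -s call, as long as the port really is a bridge with a "Bus:" line.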