TCP – Troubleshooting TCP Issues on a Linux Laptop

debiannetworkingtcp

Once in several days I have the following problem. My laptop (Debian testing)
suddenly becomes unable to work with TCP connections to the internet.

The following things continue working fine:

  • UDP (DNS), ICMP (ping) — I get instant response
  • TCP connections to other machines in the local network (e.g. I can ssh to a
    neighbour laptop)
  • everything is ok for other machines in my LAN

But when I try TCP connections from my laptop, they time out (no response
to SYN packets). Here's a typical curl output:

% curl -v google.com     
* About to connect() to google.com port 80 (#0)
*   Trying 173.194.39.105...
* Connection timed out
*   Trying 173.194.39.110...
* Connection timed out
*   Trying 173.194.39.97...
* Connection timed out
*   Trying 173.194.39.102...
* Timeout
*   Trying 173.194.39.98...
* Timeout
*   Trying 173.194.39.96...
* Timeout
*   Trying 173.194.39.103...
* Timeout
*   Trying 173.194.39.99...
* Timeout
*   Trying 173.194.39.101...
* Timeout
*   Trying 173.194.39.104...
* Timeout
*   Trying 173.194.39.100...
* Timeout
*   Trying 2a00:1450:400d:803::1009...
* Failed to connect to 2a00:1450:400d:803::1009: Network is unreachable
* Success
* couldn't connect to host
* Closing connection #0
curl: (7) Failed to connect to 2a00:1450:400d:803::1009: Network is unreachable

Restarting the connection and/or reloading the network card kernel module doesn't help.
The only thing that helps is reboot.

Clearly something is wrong with my system (everything else works fine), but I
have no idea what exactly.

My setup is a wireless router that is connected to the ISP via PPPoE.

Any advice?

Answers to comments

What NIC is it?

12:00.0 Network controller: Broadcom Corporation BCM4313 802.11b/g/n Wireless LAN Controller (rev 01)
  Subsystem: Dell Inspiron M5010 / XPS 8300
  Flags: bus master, fast devsel, latency 0, IRQ 17
  Memory at fbb00000 (64-bit, non-prefetchable) [size=16K]
  Capabilities: [40] Power Management version 3
  Capabilities: [58] Vendor Specific Information: Len=78 <?>
  Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
  Capabilities: [d0] Express Endpoint, MSI 00
  Capabilities: [100] Advanced Error Reporting
  Capabilities: [13c] Virtual Channel
  Capabilities: [160] Device Serial Number 00-00-9d-ff-ff-aa-1c-65
  Capabilities: [16c] Power Budgeting <?>
  Kernel driver in use: brcmsmac

What is the state of your NIC when the problem occurs?

iptables-save prints nothing.

ip rule show:

0:  from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default 

ip route show table all:

default via 192.168.1.1 dev wlan0 
192.168.1.0/24 dev wlan0  proto kernel  scope link  src 192.168.1.105 
broadcast 127.0.0.0 dev lo  table local  proto kernel  scope link  src 127.0.0.1 
local 127.0.0.0/8 dev lo  table local  proto kernel  scope host  src 127.0.0.1 
local 127.0.0.1 dev lo  table local  proto kernel  scope host  src 127.0.0.1 
broadcast 127.255.255.255 dev lo  table local  proto kernel  scope link  src 127.0.0.1 
broadcast 192.168.1.0 dev wlan0  table local  proto kernel  scope link  src 192.168.1.105 
local 192.168.1.105 dev wlan0  table local  proto kernel  scope host  src 192.168.1.105 
broadcast 192.168.1.255 dev wlan0  table local  proto kernel  scope link  src 192.168.1.105 
fe80::/64 dev wlan0  proto kernel  metric 256 
unreachable default dev lo  table unspec  proto kernel  metric 4294967295  error -101 hoplimit 255
local ::1 via :: dev lo  table local  proto none  metric 0 
local fe80::1e65:9dff:feaa:b1f1 via :: dev lo  table local  proto none  metric 0 
ff00::/8 dev wlan0  table local  metric 256 
unreachable default dev lo  table unspec  proto kernel  metric 4294967295  error -101 hoplimit 255

All of the above is the same when the machine works in normal mode.

ifconfig — I ran it, but somehow forgot to save before rebooting. Will have to
wait till the next time the problem occurs. Sorry about that.

Any QoS in place?

Probably not — at least I haven't done anything specifically to enable it.

Have you tried sniffing the traffic actually sent on the interface?

I ran curl and tcpdump several times, and there were two patterns.

The first is just SYN packets without answers.

17:14:37.836917 IP (tos 0x0, ttl 64, id 4563, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.105.42030 > fra07s07-in-f102.1e100.net.http: Flags [S], cksum 0x27fc (incorrect -> 0xbea8), seq 3764607647, win 13600, options [mss 1360,sackOK,TS val 33770316 ecr 0,nop,wscale 4], length 0
17:14:38.836650 IP (tos 0x0, ttl 64, id 4564, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.105.42030 > fra07s07-in-f102.1e100.net.http: Flags [S], cksum 0x27fc (incorrect -> 0xbdae), seq 3764607647, win 13600, options [mss 1360,sackOK,TS val 33770566 ecr 0,nop,wscale 4], length 0
17:14:40.840649 IP (tos 0x0, ttl 64, id 4565, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.105.42030 > fra07s07-in-f102.1e100.net.http: Flags [S], cksum 0x27fc (incorrect -> 0xbbb9), seq 3764607647, win 13600, options [mss 1360,sackOK,TS val 33771067 ecr 0,nop,wscale 4], length 0

The second is this:

17:22:56.507827 IP (tos 0x0, ttl 64, id 41583, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.105.42036 > fra07s07-in-f102.1e100.net.http: Flags [S], cksum 0x27fc (incorrect -> 0x2244), seq 1564709704, win 13600, options [mss 1360,sackOK,TS val 33894944 ecr 0,nop,wscale 4], length 0
17:22:56.546763 IP (tos 0x58, ttl 54, id 65442, offset 0, flags [none], proto TCP (6), length 60)
    fra07s07-in-f102.1e100.net.http > 192.168.1.105.42036: Flags [S.], cksum 0x6b1e (correct), seq 1407776542, ack 1564709705, win 14180, options [mss 1430,sackOK,TS val 3721836586 ecr 33883552,nop,wscale 6], length 0
17:22:56.546799 IP (tos 0x58, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.105.42036 > fra07s07-in-f102.1e100.net.http: Flags [R], cksum 0xf301 (correct), seq 1564709705, win 0, length 0
17:22:58.511843 IP (tos 0x0, ttl 64, id 41584, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.105.42036 > fra07s07-in-f102.1e100.net.http: Flags [S], cksum 0x27fc (incorrect -> 0x204f), seq 1564709704, win 13600, options [mss 1360,sackOK,TS val 33895445 ecr 0,nop,wscale 4], length 0
17:22:58.555423 IP (tos 0x58, ttl 54, id 65443, offset 0, flags [none], proto TCP (6), length 60)
    fra07s07-in-f102.1e100.net.http > 192.168.1.105.42036: Flags [S.], cksum 0x3b03 (correct), seq 1439178112, ack 1564709705, win 14180, options [mss 1430,sackOK,TS val 3721838596 ecr 33883552,nop,wscale 6], length 0
17:22:58.555458 IP (tos 0x58, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.105.42036 > fra07s07-in-f102.1e100.net.http: Flags [R], cksum 0xf301 (correct), seq 1564709705, win 0, length 0

ethtool output

ethtool -k wlan0:

Features for wlan0:
rx-checksumming: off [fixed]
tx-checksumming: off
  tx-checksum-ipv4: off [fixed]
  tx-checksum-unneeded: off [fixed]
  tx-checksum-ip-generic: off [fixed]
  tx-checksum-ipv6: off [fixed]
  tx-checksum-fcoe-crc: off [fixed]
  tx-checksum-sctp: off [fixed]
scatter-gather: off
  tx-scatter-gather: off [fixed]
  tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
  tx-tcp-segmentation: off [fixed]
  tx-tcp-ecn-segmentation: off [fixed]
  tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: on [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]

iptables

# namei -l "$(command -v iptables)"
f: /sbin/iptables
drwxr-xr-x root root /
drwxr-xr-x root root sbin
lrwxrwxrwx root root iptables -> xtables-multi
-rwxr-xr-x root root   xtables-multi

# dpkg -S "$(command -v iptables)"
iptables: /sbin/iptables

# iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
# iptables -t mangle -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
# iptables -t security -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

module info

# ethtool -i wlan0                   
driver: brcmsmac
version: 3.2.0-3-686-pae
firmware-version: N/A
bus-info: 0000:12:00.0
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

# modinfo brcmsmac
filename:       /lib/modules/3.2.0-3-686-pae/kernel/drivers/net/wireless/brcm80211/brcmsmac/brcmsmac.ko
license:        Dual BSD/GPL
description:    Broadcom 802.11n wireless LAN driver.
author:         Broadcom Corporation
alias:          pci:v000014E4d00000576sv*sd*bc*sc*i*
alias:          pci:v000014E4d00004727sv*sd*bc*sc*i*
alias:          pci:v000014E4d00004353sv*sd*bc*sc*i*
alias:          pci:v000014E4d00004357sv*sd*bc*sc*i*
depends:        mac80211,brcmutil,cfg80211,cordic,crc8
intree:         Y
vermagic:       3.2.0-3-686-pae SMP mod_unload modversions 686 

There's no /sys/module/brcmsmac/parameters. Here's what I have there:

# tree /sys/module/brcmsmac
/sys/module/brcmsmac
├── drivers
│   └── pci:brcmsmac -> ../../../bus/pci/drivers/brcmsmac
├── holders
├── initstate
├── notes
├── refcnt
├── sections
│   └── __bug_table
└── uevent

Some sites actually work

As suggested by dr, I tried some other sites, and to my great surprise some of
them indeed worked. Here are some hosts that worked:

  • rambler.ru
  • google.ru
  • ya.ru
  • opennet.ru
  • tut.by
  • ro-che.info
  • yahoo.com
  • ebay.com

And here are some that didn't:

  • vk.com
  • meta.ua
  • ukr.net
  • tenet.ua
  • prom.ua
  • reddit.com
  • github.com
  • stackexchange.com

Network capture

I made a network capture and uploaded it here.

Best Answer

In the capture you provided, the Time Stamp Echo Reply in the SYN-ACK in the second packet doesn't match the TSVal in the SYN in the first packet and is a few seconds behind.

And see how all the TSecr sent by both 173.194.70.108 and 209.85.148.100 are all the same and irrelevant from the TSVal you send.

It looks like there's something that mingles with the TCP timestamps. I have no idea what may be causing that, but it sounds like it is outside your machine. Does rebooting the router help in this instance?

I don't know if it's what's causing your machine to send a RST (on the 3rd packet). But it definitely doesn't like that SYN-ACK, and it's the only thing wrong I can find about it. The only other explanation I can think of is if it's not your machine that is sending the RST but given the time difference between the SYN-ACK and RST I would doubt so. But just in case, do you use virtual machines or containers or network namespaces on this machine?

You could try disabling TCP timestamps altogether to see if that helps:

sudo sysctl -w net.ipv4.tcp_timestamps=0

So, either those sites send bogus TSecr or there's something on the way there (any router on the way, or transparent proxy) that mangles either the outgoing TSVal or the incoming TSecr, or a proxy with a bogus TCP stack. Why one would mangle the tcp timestamps I can only speculate: bug, intrusion detection evasion, a too-smart/bogus traffic shaping algorithm. That's not something I've heard of before (but then I'm no expert on the subject).

How to investigate further:

  • See if the TPLink router is to blame why resetting it to see if that helps or capture the traffic on the outside as well if possible to see if it does mangle the timestamps
  • Check whether there's a transparent proxy on the way by playing with TTLs, looking at request headers received by web servers or see behaviour when requesting dead websites.
  • capture traffic on a remote web server to see if it's the TSVal or TSecr that is mangled.
Related Question