SSH: Fix Connection Freezes on 4G Hotspot

Tags: openssh, ssh

Short Description

I have been seeing a strange behaviour with my SSH connection for years but never thought of asking about it until today. I have searched a lot about this but couldn't find any explanation.

Environment

  1. Basically, I have various AWS EC2 instances running in different regions (Ireland, Mumbai, etc.).
  2. I have a Mac.
  3. And I'm located in India (in case that suggests a reason to someone).

Problem Statement 1

When my Mac is connected to a personal hotspot (from a Samsung device or an iPhone) over a 4G network, my SSH connection freezes after a few minutes (no more than 3) if I do not work in the SSH session (basically, when the connection goes idle). So I have to keep pressing an arrow key just to keep it alive.

Problem Statement 2 (which is not a problem)

But when my Mac is connected to a Wi-Fi broadband connection, this problem never occurs. My SSH connection stays connected for hours, even after I wake my Mac from sleep (open the lid).

Based on my Googling again today, I found various articles which suggest using options like TCPKeepAlive or ServerAliveInterval:

  1. What options `ServerAliveInterval` and `ClientAliveInterval` in sshd_config exactly do?
  2. How does tcp-keepalive work in ssh?
  3. https://raspberrypi.stackexchange.com/questions/8311/ssh-connection-timeout-when-connecting
  4. https://patrickmn.com/aside/how-to-keep-alive-ssh-sessions/

But I couldn't find any post that describes this problem. Do any of you have an idea about this behaviour? I'll be happy to provide any details about my 4G hotspot connection.

Best Answer

I would surmise that a system tracking (and forgetting) connections statefully is causing this. When NAT is in use (and it very often is when you're not on IPv6), the system doing NAT needs to keep state in memory to know where to send back replies. For your Wi-Fi broadband, the system doing NAT might have a longer memory for active connections (for example, Linux netfilter's conntrack by default remembers established TCP connections for 5 days, while it remembers UDP flows for only 2 or 3 minutes). The equivalent system doing NAT on your 4G path probably has a shorter memory, a bit less than 3 minutes.
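This is the same class of problem that OS-level TCP keepalive (mentioned in the question) is designed to defeat: periodic empty probes refresh the NAT device's tracking entry. A minimal Python sketch of enabling it on a raw socket; the 115-second idle value is an assumption chosen to sit under the suspected ~2 minute tracking window, and the option name is guarded because Linux calls it `TCP_KEEPIDLE` while macOS calls it `TCP_KEEPALIVE`:

```python
import socket

# Create a TCP socket and enable OS-level keepalive probes on it.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Platform-specific tuning: start probing after 115 s of idleness.
# The constant's name differs per platform, so probe for it.
for name in ("TCP_KEEPIDLE", "TCP_KEEPALIVE"):
    if hasattr(socket, name):
        s.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), 115)
        break

# Confirm keepalive is now enabled on the socket.
assert s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
s.close()
```

Note that ssh's own TCPKeepAlive option does exactly this, but the kernel's default idle time (2 hours on Linux) is far too long for a NAT entry that expires in minutes, which is why the SSH-protocol-level option below is the practical fix.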

To work around this, as you found and linked in your question, you can set the client-side ssh option ServerAliveInterval, which periodically sends empty data (inside the SSH protocol) when there is no activity, in a way similar to TCP keepalive. This makes the connection always appear active to the system doing NAT, so it won't forget it. So in your ~/.ssh/config file you could add:

ServerAliveInterval 115

with 115 chosen to be slightly less than 2 minutes, to stay conservative: a value lower than the estimated tracking duration of active connections on the invisible NAT device in the path, but not too low either (see below). That way, at worst, when the tracking state is 5 s from being deleted, it gets refreshed back to its supposed 120 s lifespan.

The drawback is that (on your Wi-Fi broadband access, anyway) if you lose connectivity for some time and then recover it, these keepalives might make the client think the remote server is down, and it will close the connection. You can also tweak ServerAliveCountMax for this, but with its default value of 3, that would require something like 3 × 115 = 345 s of connectivity loss (more than 5 minutes) before this problem could occur.
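Putting the two client-side options together, a sketch of what the ~/.ssh/config stanza could look like (the `Host *` scope is just one possible choice, and the CountMax value of 3 simply restates the default):

```
# ~/.ssh/config
Host *
    # Send an SSH-level keepalive after 115 s of inactivity,
    # just under the suspected ~2 min NAT tracking window.
    ServerAliveInterval 115
    # Close the connection only after 3 consecutive unanswered
    # keepalives, i.e. roughly 345 s without connectivity.
    ServerAliveCountMax 3
```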

The server side has an equivalent option, ClientAliveInterval, that you can set in its sshd_config file instead, for the same purpose. This has the added benefit of not keeping around ghost ssh client connections that are seen as still connected for some time after the client side has lost connectivity.
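For the server-side variant, a sketch of the corresponding sshd_config lines (the 115/3 values mirror the client-side choice above and are assumptions, not the sshd defaults; sshd must be reloaded for them to take effect):

```
# /etc/ssh/sshd_config
# Probe idle clients every 115 s via the SSH protocol.
ClientAliveInterval 115
# Drop a client after 3 consecutive unanswered probes.
ClientAliveCountMax 3
```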
