Ssh – Restart a specific reverse ssh tunnel

mobile modem ssh tunneling

I have multiple machines in the wild that open reverse ssh connections to my server. Each machine out there is using a different reverse ssh port, which I use to differentiate between the machines. I use these tunnels to log into the machines from the server (obviously):

me@server:~$ ssh -p 2219 root@localhost
Last login: Sun Jun  7 00:18:44 2015 from localhost
root@remote_machine:~#

The remote machines are using quite different access technologies (DSL, VSAT, GPRS/EDGE/3G/4G), so the endurance of the reverse ssh connection differs somewhat – and here apparently lies the problem.

This is what nmap lists after a longer idle period (i.e., no ssh tunnel has been forcefully restarted, see below):

me@server:~$ sudo nmap -sS -p 1000-3000 --open localhost

Starting Nmap 5.21 ( http://nmap.org ) at 2015-06-07 11:09 CEST
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000014s latency).
Hostname localhost resolves to 2 IPs. Only scanned 127.0.0.1
Not shown: 1988 closed ports
PORT     STATE SERVICE
1133/tcp open  unknown
1270/tcp open  ssserver
1356/tcp open  cuillamartin
1590/tcp open  unknown
1760/tcp open  unknown
1772/tcp open  unknown
1823/tcp open  unknown
1825/tcp open  unknown
1842/tcp open  unknown
1907/tcp open  unknown
2078/tcp open  unknown
2168/tcp open  unknown
2185/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.15 seconds
me@server:~$

Now, this is way too few connections, so let's kill them all and wait for the external connections to come back:

me@server:~$ for i in $(ps axww|grep ssh_key_used_for_reverse_connections|grep sshd|sed -e 's/^[ \t]*//'|cut -d " " -f 1); do sudo kill -9 $i; done
me@server:~$

Ok, all connections are gone:

Starting Nmap 5.21 ( http://nmap.org ) at 2015-06-07 11:13 CEST
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000014s latency).
Hostname localhost resolves to 2 IPs. Only scanned 127.0.0.1
All 2002 scanned ports on localhost (127.0.0.1) are closed

Nmap done: 1 IP address (1 host up) scanned in 0.15 seconds

Let's wait (the remote machines try every 30 seconds to establish a new connection) and see what's coming in now:

me@server:~$ sudo nmap -sS -p 1000-3000 --open localhost

Starting Nmap 5.21 ( http://nmap.org ) at 2015-06-07 11:14 CEST
Nmap scan report for localhost (127.0.0.1)
Host is up (0.000015s latency).
Hostname localhost resolves to 2 IPs. Only scanned 127.0.0.1
Not shown: 1950 closed ports
PORT     STATE SERVICE
1125/tcp open  unknown
1129/tcp open  unknown
1133/tcp open  unknown
1155/tcp open  unknown
1156/tcp open  unknown
1157/tcp open  unknown
1162/tcp open  unknown
1176/tcp open  unknown
1185/tcp open  unknown
1198/tcp open  unknown
1215/tcp open  unknown
1269/tcp open  unknown
1270/tcp open  ssserver
1343/tcp open  unknown
1345/tcp open  unknown
1351/tcp open  equationbuilder
1356/tcp open  cuillamartin
1420/tcp open  timbuktu-srv4
1432/tcp open  blueberry-lm
1541/tcp open  rds2
1590/tcp open  unknown
1698/tcp open  unknown
1743/tcp open  unknown
1760/tcp open  unknown
1772/tcp open  unknown
1773/tcp open  unknown
1812/tcp open  unknown
1823/tcp open  unknown
1825/tcp open  unknown
1842/tcp open  unknown
1859/tcp open  unknown
1900/tcp open  upnp
1907/tcp open  unknown
2002/tcp open  globe
2030/tcp open  device2
2031/tcp open  unknown
2032/tcp open  unknown
2033/tcp open  glogger
2035/tcp open  imsldoc
2058/tcp open  unknown
2078/tcp open  unknown
2093/tcp open  unknown
2159/tcp open  unknown
2168/tcp open  unknown
2169/tcp open  unknown
2180/tcp open  unknown
2185/tcp open  unknown
2186/tcp open  unknown
2219/tcp open  unknown
2221/tcp open  unknown
2228/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.16 seconds
me@server:~$

Ahh, much better.

Now, my questions: Even in the first scenario with few open connections, ps axww|grep ssh_key_used_for_remote_connections|grep sshd|sed -e 's/^[ \t]*//' shows a lot more ssh connections than are actually open, so connections seem to die silently in the background without the remote machine noticing.

A. Is there a better way of implementing the reverse ssh connections, for example any ssh options I might have missed that make the remote machine notice a dead/stuck connection better?
This is the script that's running on the remote machines to open up the reverse ssh tunnel:

#!/bin/bash

while true
do
  ssh -i /some/dir/reverse-ssh.key -o TCPKeepAlive=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3 -nNTv -R $(grep -o "[0-9][0-9][0-9][0-9]" /some/dir/id):localhost:22 reverse_ssh_user_name@office.radiopark.biz
  sleep 30
done

So I already use -o TCPKeepAlive=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=3. /some/dir/id holds a four-digit number that each machine uses as its reverse ssh port (that is, the port the server listens on for that machine).
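One option the loop above does not set is ExitOnForwardFailure. With it, the ssh client exits when the server cannot bind the requested -R port (for example because a stale sshd still holds it) instead of staying connected with no working forward, so the loop gets a chance to retry. The following is only a sketch under that assumption; read_port and open_tunnel are hypothetical helper names, while the paths and host are the placeholders from the script above.

```shell
#!/bin/bash
# Hypothetical helper: print the first four-digit number found in the id file.
read_port() {
  grep -o '[0-9][0-9][0-9][0-9]' "$1" | head -n 1
}

# Hypothetical helper: open one reverse tunnel and block until ssh exits.
# ExitOnForwardFailure=yes makes ssh quit if the remote port cannot be bound.
open_tunnel() {
  local port="$1"
  ssh -i /some/dir/reverse-ssh.key \
      -o ExitOnForwardFailure=yes \
      -o TCPKeepAlive=yes \
      -o ServerAliveInterval=5 \
      -o ServerAliveCountMax=3 \
      -nNT -R "${port}:localhost:22" \
      reverse_ssh_user_name@office.radiopark.biz
}
```

The retry loop would then call open_tunnel "$(read_port /some/dir/id)" followed by sleep 30, as before. On the server side, ClientAliveInterval and ClientAliveCountMax in sshd_config are the counterparts to the ServerAlive options and may help sshd drop dead clients sooner.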

B. Is there a better way to kill only unresponsive reverse connections, leaving all "working" connections intact? For now I kill all of them, but that seems crude and wrong. ps doesn't show the port number, so I would need to somehow map each reverse ssh port to the PID of its sshd process on my server.

I have looked into autossh but that seems to re-do what my scripts do(?).

mosh is out of the question as it uses UDP connections (which oftentimes don't go through at all) and random ports above 60000 (which don't get through, either).

Best Answer

This will show you the processes using the tunnel:

netstat -tnp | grep :2219 | awk '{print $NF}'
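To go the other way, from a tunnel port to the server-side process that holds its listening socket, one could parse the listener table instead. This is a minimal sketch, assuming a Linux server with ss from iproute2 and its newer users:(("sshd",pid=...,fd=...)) output format; pids_for_port is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the PID(s) listening on the given TCP port.
# Reads `ss -tlnp` style output on stdin, so the parsing can be checked offline.
pids_for_port() {
  local port="$1"
  awk -v port="$port" '
    # Only LISTEN rows whose local address ends in ":<port>"
    $1 == "LISTEN" && $4 ~ (":" port "$") {
      line = $0
      # Pull out every pid=<number> on the row
      while (match(line, /pid=[0-9]+/)) {
        print substr(line, RSTART + 4, RLENGTH - 4)
        line = substr(line, RSTART + RLENGTH)
      }
    }' | sort -u
}
```

Usage would be something like sudo ss -tlnp | pids_for_port 2219. Root is needed to see PIDs that belong to other users, and older ss versions print the process column differently, so the pid= pattern may need adjusting.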

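I can't reproduce your dead connections, but something like this should work:

for i in $(seq 2000 2030); do
  # nmap exits with status 0 even for a closed port, so check its output instead
  if ! nmap -p "$i" localhost | grep -q "^$i/tcp *open"; then
    # kill the ssh processes that netstat lists for this port
    netstat -tnp | grep ":$i " | grep '/ssh *$' | awk '{print $NF}' | sed -e 's#/ssh##' | xargs -r kill
  fi
done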
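An open port only shows that something on the server is still listening, not that the machine behind it answers. A stricter check could connect through the forwarded port and wait for the remote sshd's version banner, which starts with "SSH-". This is just a sketch under that assumption, using bash's /dev/tcp redirection and coreutils timeout; is_ssh_banner and tunnel_alive are hypothetical helper names and the timeouts are arbitrary.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the string looks like an SSH version banner.
is_ssh_banner() {
  [[ $1 == SSH-* ]]
}

# Hypothetical helper: succeed only if a banner arrives through the tunnel port.
# A dead tunnel either refuses the connection or accepts it and stays silent.
tunnel_alive() {
  local port="$1" banner
  banner=$(timeout 5 bash -c '
    exec 3<>"/dev/tcp/127.0.0.1/$1" &&
    IFS= read -r -t 4 line <&3 &&
    printf "%s" "$line"' _ "$port" 2>/dev/null)
  is_ssh_banner "$banner"
}
```

A call such as tunnel_alive 2219 || echo "2219 looks dead" would then single out the tunnels worth restarting, so that only the process holding that port gets killed rather than all of them.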