Advantage of using “–syn” for matching new TCP connections

iptablestcp

I have seen a similar technique used in several iptables examples for matching new connections:

-A INPUT -p tcp -m tcp --dport 443 --syn -m conntrack --ctstate NEW -j SSH

As seen above, TCP connections are checked against TCP flags(SYN has to be 1 and RST, ACK and FIN 0) by tcp module besides --ctstate NEW of conntrack module. Does it provide any advantage over this:

-A INPUT -p tcp -m tcp --dport 443 -m conntrack --ctstate NEW -j SSH

My assumption is that it does because match modules are evaluated in the order they are specified in the rule and without --syn, all the TCP packets to port 443 would be passed from tcp module to conntrack module. In other words, --syn should provide this fail-fast paradigm.

Best Answer

In other words, --syn should provide this fail-fast paradigm.

That’s pretty much it. In practice, it’s the extension mechanism which is short-circuiting; the manpage says

Matches are evaluated first to last as specified on the command line and work in short-circuit fashion, i.e. if one extension yields false, evaluation will stop.

In the above case, the rule uses two extensions: the tcp extension, which processes --dport and --syn, followed by the conntrack extension, which processes --ctstate. If the tcp extension fails to match, the conntrack extension will be skipped entirely.

Related Solutions

TCP – Troubleshooting TCP Issues on a Linux Laptop

In the capture you provided, the Time Stamp Echo Reply in the SYN-ACK in the second packet doesn't match the TSVal in the SYN in the first packet and is a few seconds behind.

And see how all the TSecr sent by both 173.194.70.108 and 209.85.148.100 are all the same and irrelevant from the TSVal you send.

It looks like there's something that mingles with the TCP timestamps. I have no idea what may be causing that, but it sounds like it is outside your machine. Does rebooting the router help in this instance?

I don't know if it's what's causing your machine to send a RST (on the 3rd packet). But it definitely doesn't like that SYN-ACK, and it's the only thing wrong I can find about it. The only other explanation I can think of is if it's not your machine that is sending the RST but given the time difference between the SYN-ACK and RST I would doubt so. But just in case, do you use virtual machines or containers or network namespaces on this machine?

You could try disabling TCP timestamps altogether to see if that helps:

sudo sysctl -w net.ipv4.tcp_timestamps=0

So, either those sites send bogus TSecr or there's something on the way there (any router on the way, or transparent proxy) that mangles either the outgoing TSVal or the incoming TSecr, or a proxy with a bogus TCP stack. Why one would mangle the tcp timestamps I can only speculate: bug, intrusion detection evasion, a too-smart/bogus traffic shaping algorithm. That's not something I've heard of before (but then I'm no expert on the subject).

How to investigate further:

See if the TPLink router is to blame why resetting it to see if that helps or capture the traffic on the outside as well if possible to see if it does mangle the timestamps
Check whether there's a transparent proxy on the way by playing with TTLs, looking at request headers received by web servers or see behaviour when requesting dead websites.
capture traffic on a remote web server to see if it's the TSVal or TSecr that is mangled.

Iptables: Matching Outgoing Traffic with Conntrack and Owner – Troubleshooting Drops

To cut a long story short, that ACK was sent when the socket didn't belong to anybody. Instead of allowing packets that pertain to a socket that belongs to user x, allow packets that pertain to a connection that was initiated by a socket from user x.

The longer story.

To understand the issue, it helps to understand how wget and HTTP requests work in general.

wget http://cachefly.cachefly.net/10mb.test

wget establishes a TCP connection to cachefly.cachefly.net, and once established sends a request in the HTTP protocol that says: "Please send me the content of /10mb.test (GET /10mb.test HTTP/1.1) and by the way, could you please not close the connection after you're done (Connection: Keep-alive). The reason it does that is because in case the server replies with a redirection for a URL on the same IP address, it can reuse the connection.

Now the server can reply with either, "here comes the data you requested, beware it's 10MB large (Content-Length: 10485760), and yes OK, I'll leave the connection open". Or if it doesn't know the size of the data, "Here's the data, sorry I can't leave the connection open but I'll tell when you can stop downloading the data by closing my end of the connection".

In the URL above, we're in the first case.

So, as soon as wget has obtained the headers for the response, it knows its job is done once it has downloaded 10MB of data.

Basically, what wget does is read the data until 10MB have been received and exit. But at that point, there's more to be done. What about the server? It's been told to leave the connection open.

Before exiting, wget closes (close system call) the file descriptor for the socket. Upon, the close, the system finishes acknowledging the data sent by the server and sends a FIN to say: "I won't be sending any more data". At that point close returns and wget exits. There is no socket associated to the TCP connection anymore (at least not one owned by any user). However it's not finished yet. Upon receiving that FIN, the HTTP server sees end-of-file when reading the next request from the client. In HTTP, that means "no more request, I'll close my end". So it sends its FIN as well, to say, "I won't be sending anything either, that connection is going away".

Upon receiving that FIN, the client sends a "ACK". But, at that point, wget is long gone, so that ACK is not from any user. Which is why it is blocked by your firewall. Because the server doesn't receive the ACK, it's going to send the FIN over and over until it gives up and you'll see more dropped ACKs. That also means that by dropping those ACKs, you're needlessly using resources of the server (which needs to maintain a socket in the LAST-ACK state) for quite some time.

The behavior would have been different if the client had not requested "Keep-alive" or the server had not replied with "Keep-alive".

As already mentioned, if you're using the connection tracker, what you want to do is let every packet in the ESTABLISHED and RELATED states through and only worry about NEW packets.

If you allow NEW packets from user x but not packets from user y, then other packets for established connections by user x will go through, and because there can't be established connections by user y (since we're blocking the NEW packets that would establish the connection), there will not be any packet for user y connections going through.

Best Answer

Related Solutions

TCP – Troubleshooting TCP Issues on a Linux Laptop

Iptables: Matching Outgoing Traffic with Conntrack and Owner – Troubleshooting Drops

Related Question