Why doesn’t NAT reserve ports from the machine’s TCP and UDP port pool?

iptables; nat; networking

I ran two experiments. This is the network for both of them:

        [private network]     [public network]
    A -------------------- R ----------------- B
192.168.0.5     192.168.0.1|192.0.2.1       192.0.2.8

A's default gateway is R. R has IPv4 forwarding active and the following iptables rule:

iptables -t nat -A POSTROUTING -p TCP -j MASQUERADE --to-ports 50000

The intent is that any TCP traffic from A will be masqueraded as 192.0.2.1 using R's port 50000.

I published a TCP service on port 60000 on B using nc -4l 192.0.2.8 60000.

Then I opened a connection from A: nc -4 192.0.2.8 60000

A started sending packets that looked like this:

192.168.0.5:53269 -> 192.0.2.8:60000

R translated that into

192.0.2.1:50000 -> 192.0.2.8:60000

So far, so good.
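R's translation can be modelled as a rewrite of the connection tuple. The following is a simplified Python sketch, not the kernel's implementation; `Tuple5`, `masquerade()` and `reply_of()` are hypothetical names. It also derives the tuple that B's replies will carry, which R has to map back to A.

```python
from typing import NamedTuple


class Tuple5(NamedTuple):
    """Hypothetical connection tuple: protocol plus source and destination transport addresses."""
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def masquerade(pkt: Tuple5, public_ip: str, to_port: int) -> Tuple5:
    """Simplified model: rewrite only the source address and port, as the rule above intends."""
    return pkt._replace(src_ip=public_ip, src_port=to_port)


def reply_of(t: Tuple5) -> Tuple5:
    """Tuple carried by packets coming back from the peer (source and destination swapped)."""
    return Tuple5(t.proto, t.dst_ip, t.dst_port, t.src_ip, t.src_port)


if __name__ == "__main__":
    from_a = Tuple5("tcp", "192.168.0.5", 53269, "192.0.2.8", 60000)
    on_wire = masquerade(from_a, "192.0.2.1", 50000)
    print(on_wire)            # what B sees: 192.0.2.1:50000 -> 192.0.2.8:60000
    print(reply_of(on_wire))  # what B sends back: 192.0.2.8:60000 -> 192.0.2.1:50000
```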

I then tried to open the following client on R: nc -4 192.0.2.8 60000 -p 50000. I sent messages, but nothing happened: no packets showed up in R's tcpdump.

Because the masquerade rule exists, or at least because it's active, I would have expected R's nc to fail with the error message "nc: Address already in use", which is what happens if I bind two ncs to the same port.

I then waited a while for conntrack's mapping to expire.

The second experiment consisted of opening R's client first. R talks to B just fine. If I then open the connection from A, its packets are ignored: A's SYNs arrive at R, but they aren't answered, not even with ICMP errors. I don't know whether this is because R knows it has run out of masquerading ports or because Linux is just flat-out confused (it technically masquerades the port, but the already established connection somehow interferes).

I feel the NAT's behaviour is wrong. I could accidentally configure a port for both masquerading (in particular, by not specifying --to-ports in the iptables rule) and a service, and the kernel will drop connections silently. I also don't see any of this documented anywhere.

For example:

  • A makes a normal request to B. R masquerades it using port 50k.
  • A makes a DNS query to R. Because R is a recursive resolver, R (using, by sheer coincidence, ephemeral port 50k) queries authoritative nameserver Z on port 53.

A collision just happened; R is now using port 50k for two separate TCP connections.

I guess it's because you don't normally publish services on routers. But then again, would it hurt the kernel to "borrow" the port from the TCP port pool when it becomes actively masqueraded?

I know that I can separate my ephemeral ports from my --to-ports. However, this doesn't seem to be the default behaviour. Both NAT and the ephemeral ports default to 32768-61000, which is creepy.

(I found the ephemeral range by querying /proc/sys/net/ipv4/ip_local_port_range, and the NAT range by simply NATting lots of UDP requests in a separate experiment – and printing the source port at the server side. I couldn't find a way to print the range using iptables.)
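Both checks can be scripted. Below is a minimal Python sketch, assuming Linux and hypothetical helper names (`parse_port_range`, `ranges_overlap`, `collect_source_ports`, `sample_loopback_ports`). The first two compare an `ip_local_port_range` value with a candidate --to-ports range; the last two record the source port of each received UDP datagram, as in the server-side experiment. Run over loopback, as here, it only shows this machine's own ephemeral ports; to observe the NAT range, the receiving part would have to run on B while A sends through R.

```python
import socket
from typing import List, Tuple


def parse_port_range(text: str) -> Tuple[int, int]:
    """Parse the two whitespace-separated numbers found in ip_local_port_range."""
    low, high = (int(field) for field in text.split())
    return low, high


def ranges_overlap(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if the inclusive port ranges a and b share at least one port."""
    return a[0] <= b[1] and b[0] <= a[1]


def collect_source_ports(server: socket.socket, count: int, timeout: float = 2.0) -> List[int]:
    """Receive `count` datagrams on `server` and return each sender's source port."""
    server.settimeout(timeout)
    ports: List[int] = []
    for _ in range(count):
        _data, addr = server.recvfrom(2048)
        ports.append(addr[1])  # for AF_INET, addr is (ip, port)
    return ports


def sample_loopback_ports(count: int = 20) -> List[int]:
    """Send one datagram from each of `count` fresh sockets to a local server; record source ports."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: the kernel picks a free port
    clients: List[socket.socket] = []
    try:
        for _ in range(count):
            client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            # No explicit bind: sendto() makes the kernel assign an ephemeral source port.
            client.sendto(b"probe", server.getsockname())
            clients.append(client)  # keep it open so its port isn't reused during the sample
        return collect_source_ports(server, count)
    finally:
        for client in clients:
            client.close()
        server.close()


if __name__ == "__main__":
    ephemeral = parse_port_range("32768 61000")       # the values reported in the question
    print(ranges_overlap(ephemeral, (50000, 50000)))  # --to-ports 50000 lies inside it
    print(ranges_overlap(ephemeral, (61001, 65535)))  # a disjoint range does not
    ports = sample_loopback_ports()
    print("observed source ports:", min(ports), "-", max(ports))
```

On a Linux machine, `parse_port_range(open("/proc/sys/net/ipv4/ip_local_port_range").read())` returns the live ephemeral range instead of the hard-coded example.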

Best Answer

> would it hurt the kernel to "borrow" the port from the TCP port pool when it becomes actively masqueraded?

I guess the answer is "no, but it doesn't matter much."

I incorrectly assumed that R used only the destination transport address of a response packet to tell whether it was headed to A or to R itself. It actually uses the entire tuple (protocol, source address and port, destination address and port) to identify a connection. Therefore it's normal for NAT to create multiple connections using the same R-owned port; this causes no confusion as long as the tuples differ. Consequently, the TCP/UDP port pools don't matter.

It's pretty obvious now that I think about it.
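A toy model of that lookup in Python, assuming the table is keyed by the full reply tuple (the real conntrack hash table is more elaborate; `Tuple5`, `replies` and `classify()` are hypothetical names). The second entry is R's own connection from port 50000 to B:60001, the case tested later in this answer. Both entries share the destination 192.0.2.1:50000, yet they never get mixed up.

```python
from typing import Dict, NamedTuple


class Tuple5(NamedTuple):
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Toy model (assumption): expected reply tuples, keyed by the FULL 5-tuple.
replies: Dict[Tuple5, str] = {
    # Reply direction of A's masqueraded connection (B:60000 -> R:50000).
    Tuple5("tcp", "192.0.2.8", 60000, "192.0.2.1", 50000): "forward to 192.168.0.5:53269",
    # Reply direction of R's own connection from port 50000 to B:60001.
    Tuple5("tcp", "192.0.2.8", 60001, "192.0.2.1", 50000): "deliver to R's local socket",
}


def classify(pkt: Tuple5) -> str:
    """Decide where an incoming packet goes by looking up its whole tuple."""
    return replies.get(pkt, "no matching connection")


if __name__ == "__main__":
    print(classify(Tuple5("tcp", "192.0.2.8", 60000, "192.0.2.1", 50000)))
    print(classify(Tuple5("tcp", "192.0.2.8", 60001, "192.0.2.1", 50000)))
    # Keyed by destination alone, the two connections would collapse into one entry.
    destinations = {(t.dst_ip, t.dst_port) for t in replies}
    print(len(destinations), "destination,", len(replies), "distinct connections")
```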

> I then tried to open the following client on R: nc -4 192.0.2.8 60000 -p 50000. I sent messages, nothing happens. No packets can be seen on R's tcpdump.

This is the part of the experiments where I messed up.

The failure happens because both the source and the destination transport addresses (IP and port) are the same as those of the masqueraded connection, not just because the source port is the same.

If I run, say, nc -4 192.0.2.8 60001 -p 50000, it actually works, even though it uses the same port as a NAT mapping.
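Both experiments can be reproduced with a toy table in which every connection is registered under both its original and its reply tuple, and a new connection is refused if either tuple is already taken. This is a simplified Python sketch under that assumption, not the kernel's code; all names are hypothetical. With a port range, the real NAT code should first try other ports from the --to-ports range before failing; with the single port 50000 there is nothing else to try.

```python
from typing import Dict, List, NamedTuple


class Tuple5(NamedTuple):
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def invert(t: Tuple5) -> Tuple5:
    """Tuple of the packets travelling in the opposite direction."""
    return Tuple5(t.proto, t.dst_ip, t.dst_port, t.src_ip, t.src_port)


class ToyConnTable:
    """Assumption (toy model): each connection is indexed under its original AND reply tuple."""

    def __init__(self) -> None:
        self._owners: Dict[Tuple5, str] = {}

    def try_add(self, original: Tuple5, reply: Tuple5, owner: str) -> bool:
        # Refuse (i.e. silently drop) the new connection if either tuple is already in use.
        if original in self._owners or reply in self._owners:
            return False
        self._owners[original] = owner
        self._owners[reply] = owner
        return True


A_ORIG = Tuple5("tcp", "192.168.0.5", 53269, "192.0.2.8", 60000)
# After MASQUERADE --to-ports 50000, B answers R's public address on port 50000.
A_REPLY = Tuple5("tcp", "192.0.2.8", 60000, "192.0.2.1", 50000)


def r_local(dst_port: int) -> Tuple5:
    """Connection started on R itself from port 50000 (nc -4 192.0.2.8 <dst_port> -p 50000)."""
    return Tuple5("tcp", "192.0.2.1", 50000, "192.0.2.8", dst_port)


def experiment_1() -> List[bool]:
    table = ToyConnTable()
    return [
        table.try_add(A_ORIG, A_REPLY, "A via NAT"),
        table.try_add(r_local(60000), invert(r_local(60000)), "R"),  # reply tuple equals A_REPLY
        table.try_add(r_local(60001), invert(r_local(60001)), "R"),  # different destination port
    ]


def experiment_2() -> List[bool]:
    table = ToyConnTable()
    return [
        table.try_add(r_local(60000), invert(r_local(60000)), "R"),
        table.try_add(A_ORIG, A_REPLY, "A via NAT"),  # A_REPLY is already owned by R's connection
    ]


if __name__ == "__main__":
    print(experiment_1())
    print(experiment_2())
```

In this model, `experiment_1()` returns `[True, False, True]` (R's client to port 60000 clashes, the one to 60001 does not) and `experiment_2()` returns `[True, False]` (A's connection is the one refused), matching what both experiments showed.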

> I feel the NAT's behaviour is wrong. I could accidentally configure a port for both masquerading (particularly, by not specifying --to-ports during the iptables rule) and a service, and the kernel will drop connections silently.

It won't, because the masked connections and the R-started connections will most likely have different destinations.

> Because the masquerade rule exists, or at least because it's active, I would have expected R's nc to fail with the error message "nc: Address already in use", which is what happens if I bind two ncs to the same port.

I'm still looking for a bulletproof answer to this, but everything seems to point to "it's an adverse consequence of how it's implemented, and it's so small we're willing to live with it."
