I believe the idea of making the socket unavailable to a program is to allow any TCP data segments still in transit to arrive and be discarded by the kernel. That is, an application can call close(2)
on a socket, but routing delays, lost control packets, or what have you can allow the other side of the TCP connection to keep sending data for a while. Since the application has indicated it no longer wants to deal with TCP data segments, the kernel should just discard them as they come in.
I hacked out a little program in C that you can compile and use to see how long the timeout is:
#include <stdio.h>      /* fprintf(), printf() */
#include <string.h>     /* strerror(), memset() */
#include <errno.h>      /* errno */
#include <stdlib.h>     /* atoi(), exit() */
#include <sys/time.h>   /* struct timeval, gettimeofday() */
#include <unistd.h>     /* close(), getopt(), sleep() */
#include <sys/types.h>  /* socket() */
#include <sys/socket.h> /* socket(), bind(), listen(), accept() */
#include <netinet/in.h> /* struct sockaddr_in, htons(), htonl() */
#include <arpa/inet.h>  /* inet_ntoa() */

float elapsed_time(struct timeval before, struct timeval after);

int
main(int ac, char **av)
{
	int opt;
	int listen_fd = -1;
	unsigned short port = 0;
	struct sockaddr_in serv_addr;
	struct timeval before_bind;
	struct timeval after_bind;

	while (-1 != (opt = getopt(ac, av, "p:"))) {
		switch (opt) {
		case 'p':
			port = (unsigned short)atoi(optarg);
			break;
		}
	}

	if (0 == port) {
		fprintf(stderr, "Need a port to listen on\n");
		return 2;
	}

	if (0 > (listen_fd = socket(AF_INET, SOCK_STREAM, 0))) {
		fprintf(stderr, "Opening socket: %s\n", strerror(errno));
		return 1;
	}

	memset(&serv_addr, '\0', sizeof(serv_addr));
	serv_addr.sin_family = AF_INET;
	serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
	serv_addr.sin_port = htons(port);

	gettimeofday(&before_bind, NULL);
	/* Retry once a second until the kernel lets us have the port. */
	while (0 > bind(listen_fd, (struct sockaddr *)&serv_addr, sizeof(serv_addr))) {
		fprintf(stderr, "binding socket to port %d: %s\n",
			ntohs(serv_addr.sin_port),
			strerror(errno));
		sleep(1);
	}
	gettimeofday(&after_bind, NULL);

	printf("bind took %.5f seconds\n", elapsed_time(before_bind, after_bind));
	printf("# Listening on port %d\n", ntohs(serv_addr.sin_port));

	if (0 > listen(listen_fd, 100)) {
		fprintf(stderr, "listen() on fd %d: %s\n",
			listen_fd,
			strerror(errno));
		return 1;
	}

	{
		struct sockaddr_in cli_addr;
		struct timeval before;
		int newfd;
		socklen_t clilen = sizeof(cli_addr);

		if (0 > (newfd = accept(listen_fd, (struct sockaddr *)&cli_addr, &clilen))) {
			fprintf(stderr, "accept() on fd %d: %s\n", listen_fd, strerror(errno));
			exit(2);
		}
		gettimeofday(&before, NULL);
		printf("At %ld.%06ld\tconnected to: %s\n",
			(long)before.tv_sec, (long)before.tv_usec,
			inet_ntoa(cli_addr.sin_addr)
		);
		fflush(stdout);
		/* close(2) returns -1 and sets errno to EINTR; it never
		 * returns EINTR itself, so compare against errno. */
		while (0 > close(newfd) && EINTR == errno)
			;
	}

	if (0 > close(listen_fd))
		fprintf(stderr, "Closing socket: %s\n", strerror(errno));
	return 0;
}

float
elapsed_time(struct timeval before, struct timeval after)
{
	float r = 0.0;

	if (before.tv_usec > after.tv_usec) {
		after.tv_usec += 1000000;
		--after.tv_sec;
	}
	r = (float)(after.tv_sec - before.tv_sec)
	    + (1.0E-6)*(float)(after.tv_usec - before.tv_usec);
	return r;
}
I tried this program on three different machines, and I measured a variable interval, between 55 and 59 seconds, during which the kernel refuses to let a non-root user rebind the port. I compiled the above code to an executable named "opener", and ran it like this:
./opener -p 7896; ./opener -p 7896
I opened another window and did this:
telnet otherhost 7896
That causes the first instance of "opener" to accept a connection, then close it. The second instance of "opener" tries to bind(2)
to the TCP port 7896 every second. "opener" reports 55 to 59 seconds of delay.
Googling around, I find that people recommend doing this:
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
to reduce that interval. It didn't work for me. Of the four Linux machines I had access to, two had the value set to 30 and two to 60. I also set it as low as 10; it made no difference to the "opener" program. That's not too surprising in hindsight: despite its name, tcp_fin_timeout controls how long a socket lingers in FIN-WAIT-2, not the TIME_WAIT interval that blocks the rebind.
Doing this:
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
did change things: the second "opener" took only about 3 seconds to get its new socket. Be careful with this knob, though: tcp_tw_recycle is known to break connections from clients behind NAT, and it was removed entirely in Linux 4.12.
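If you control the program, a more robust fix than sysctl tweaking is to set SO_REUSEADDR on the listening socket before calling bind(2); that lets you rebind a port whose previous incarnation is still in TIME_WAIT. A minimal sketch (the function name is mine, not part of the "opener" program above):

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Create a TCP listening socket on the given port, allowing immediate
 * rebinding even if the port's old socket is still in TIME_WAIT.
 * Returns the fd on success, -1 on error. */
int
listen_reusable(unsigned short port)
{
	int yes = 1;
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	/* SO_REUSEADDR must be set before bind() to have any effect. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0) {
		close(fd);
		return -1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 100) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Note that SO_REUSEADDR only lets you bind over a TIME_WAIT remnant; it does not let two live listeners share a port (that's SO_REUSEPORT).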
The Linux kernel's built-in support for TCP keepalive affects all keepalive-enabled TCP sockets. Keepalive is not enabled by default, though: applications must explicitly request it for their sockets through the setsockopt
interface.
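As a sketch, requesting keepalive for a socket fd looks like this (SO_KEEPALIVE is portable; the TCP_KEEP* timer options are Linux-specific, and the helper name is my own):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>  /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT (Linux) */
#include <sys/socket.h>

/* Enable keepalive on fd: send the first probe after `idle` seconds of
 * silence, then one probe every `intvl` seconds, and drop the connection
 * after `cnt` unanswered probes. Returns 0 on success, -1 on error. */
int
enable_keepalive(int fd, int idle, int intvl, int cnt)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
		return -1;
	return 0;
}
```

Without the three TCP_KEEP* calls, the socket falls back to the system-wide defaults in /proc/sys/net/ipv4/tcp_keepalive_*.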
You can check whether keepalive is enabled on a specific socket by using the netstat
command with the -o, --timers
option. In the output below you can see that the same process has sockets open in both modes:
# netstat -anpo | grep 8999
tcp 0 0 10.10.171.44:48744 10.10.139.30:8999 ESTABLISHED 18232/java keepalive (83.39/0/0)
# netstat -anpo | grep 8009
tcp 0 0 10.10.171.44:8009 10.10.171.42:40947 ESTABLISHED 18232/java off (0.00/0/0)
Best Answer
The reason you can't alter the RTO specifically is that it is not a static value. Instead (except for the initial SYN, naturally) it is based on the RTT (round-trip time) of each connection. Actually, it is based on a smoothed version of the RTT and on the RTT variance, with some constants thrown into the mix. Hence it is a dynamic value, calculated per TCP connection, and I highly recommend this article, which goes into more detail on the calculation and on RTO in general.
Also relevant is RFC 6298 which states (among a lot of other things):
(2.1) Until a round-trip time (RTT) measurement has been made for a segment sent between the sender and receiver in question, the sender SHOULD set RTO <- 1 second
Does the kernel always set RTO to 1 second, then? Well, with Linux you can show the current RTO values for your open connections by running the
ss -i
command. On a VM which I am logged into with SSH, with a couple of connections open to google.com, the RTO is in fact around 200 (milliseconds). You will note that is not rounded to the 1-second value from the RFC, and you may also think it's a little high. That's because there are minimum (200 milliseconds) and maximum (120 seconds) bounds in play when it comes to RTO on Linux (there is a great explanation of this in the article I linked above).
So, you can't alter the RTO value directly, but for lossy networks (like wireless) you can try tweaking F-RTO (this may already be enabled depending on your distro). There are two sysctl options related to F-RTO that you can tweak (good summary here):
Depending on what you are trying to optimize for, these may or may not be useful.
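On the kernels I'm familiar with, the main knob is net.ipv4.tcp_frto (older kernels also exposed net.ipv4.tcp_frto_response); checking and setting it looks roughly like this, though your distro's defaults may differ:

```shell
# Check whether F-RTO is enabled (0 = disabled; 2 is the default on
# modern kernels, enabling F-RTO for SACK-capable connections)
sysctl net.ipv4.tcp_frto

# Enable it (as root); add the setting to /etc/sysctl.conf to persist
sysctl -w net.ipv4.tcp_frto=2
```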
EDIT: following up on the ability to tweak the rto_min/max values for TCP from the comments.
You can't change the global minimum RTO for TCP (as an aside, you can do it for SCTP - those are exposed in sysctl), but the good news is that you can tweak the minimum value of the RTO on a per-route basis. Here's my routing table on my CentOS VM:
I can change the rto_min value on the default route as follows:
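The command has this general shape (the gateway address and device below are placeholders; copy the real ones from the output of `ip route show` for your default route):

```shell
# Replace the default route, adding a 15 ms minimum RTO for traffic using it.
# 192.168.1.1 and eth0 are placeholders for your actual gateway and interface.
ip route change default via 192.168.1.1 dev eth0 rto_min 15ms
```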
And now, my routing table looks like this:
Finally, let's initiate a connection and check out
ss -i
to see if this has been respected. Success! The rto on the HTTP connection (made after the change) is 15 ms, whereas on the SSH connection (established before the change) it is 200+ as before.
I actually like this approach - it allows you to set the lower value on appropriate routes rather than globally where it might screw up other traffic. Similarly (see the ip man page) you can tweak the initial rtt estimate and the initial rttvar for the route (used when calculating the dynamic RTO). While it's not a complete solution in terms of tweaking, I think most of the important pieces are there. You can't tweak the max setting, but I think that is not going to be as useful generally in any case.