I believe the idea of making the socket unavailable to a program is to allow any TCP data segments still in transit to arrive and be discarded by the kernel. That is, an application can call close(2)
on a socket, but routing delays, lost control packets, or what have you can allow the other side of the TCP connection to keep sending data for a while. Since the application has indicated it no longer wants to deal with TCP data segments, the kernel should just discard them as they come in.
I hacked out a little program in C that you can compile and use to see how long the timeout is:
#include <stdio.h>      /* fprintf(), printf() */
#include <string.h>     /* strerror(), memset() */
#include <errno.h>      /* errno */
#include <stdlib.h>     /* atoi(), exit() */
#include <sys/time.h>   /* struct timeval, gettimeofday() */
#include <unistd.h>     /* close(), getopt(), sleep() */
#include <sys/types.h>  /* socket() */
#include <sys/socket.h> /* socket(), bind(), listen(), accept() */
#include <netinet/in.h> /* struct sockaddr_in, htons(), htonl() */
#include <arpa/inet.h>  /* inet_ntoa() */

float elapsed_time(struct timeval before, struct timeval after);

int
main(int ac, char **av)
{
	int opt;
	int listen_fd = -1;
	unsigned short port = 0;
	struct sockaddr_in serv_addr;
	struct timeval before_bind;
	struct timeval after_bind;

	while (-1 != (opt = getopt(ac, av, "p:"))) {
		switch (opt) {
		case 'p':
			port = (unsigned short)atoi(optarg);
			break;
		}
	}

	if (0 == port) {
		fprintf(stderr, "Need a port to listen on\n");
		return 2;
	}

	if (0 > (listen_fd = socket(AF_INET, SOCK_STREAM, 0))) {
		fprintf(stderr, "Opening socket: %s\n", strerror(errno));
		return 1;
	}

	memset(&serv_addr, '\0', sizeof(serv_addr));
	serv_addr.sin_family = AF_INET;
	serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
	serv_addr.sin_port = htons(port);

	gettimeofday(&before_bind, NULL);
	/* Retry once a second until the kernel lets us have the port. */
	while (0 > bind(listen_fd, (struct sockaddr *)&serv_addr, sizeof(serv_addr))) {
		fprintf(stderr, "binding socket to port %d: %s\n",
			ntohs(serv_addr.sin_port),
			strerror(errno));
		sleep(1);
	}
	gettimeofday(&after_bind, NULL);

	printf("bind took %.5f seconds\n", elapsed_time(before_bind, after_bind));
	printf("# Listening on port %d\n", ntohs(serv_addr.sin_port));

	if (0 > listen(listen_fd, 100)) {
		fprintf(stderr, "listen() on fd %d: %s\n",
			listen_fd,
			strerror(errno));
		return 1;
	}

	{
		struct sockaddr_in cli_addr;
		struct timeval before;
		int newfd;
		socklen_t clilen = sizeof(cli_addr);

		if (0 > (newfd = accept(listen_fd, (struct sockaddr *)&cli_addr, &clilen))) {
			fprintf(stderr, "accept() on fd %d: %s\n", listen_fd, strerror(errno));
			exit(2);
		}
		gettimeofday(&before, NULL);
		printf("At %ld.%06ld\tconnected to: %s\n",
			(long)before.tv_sec, (long)before.tv_usec,
			inet_ntoa(cli_addr.sin_addr)
		);
		fflush(stdout);
		/* close(2) returns -1 and sets errno to EINTR; it never
		 * returns EINTR itself, so compare against errno. */
		while (0 > close(newfd) && EINTR == errno)
			;
	}

	if (0 > close(listen_fd))
		fprintf(stderr, "Closing socket: %s\n", strerror(errno));
	return 0;
}

float
elapsed_time(struct timeval before, struct timeval after)
{
	float r = 0.0;

	if (before.tv_usec > after.tv_usec) {
		after.tv_usec += 1000000;
		--after.tv_sec;
	}
	r = (float)(after.tv_sec - before.tv_sec)
	    + (1.0E-6)*(float)(after.tv_usec - before.tv_usec);
	return r;
}
I tried this program on three different machines, and I measured a variable interval, between 55 and 59 seconds, during which the kernel refuses to let a non-root user rebind the port. I compiled the above code to an executable named "opener", and ran it like this:
./opener -p 7896; ./opener -p 7896
I opened another window and did this:
telnet otherhost 7896
That causes the first instance of "opener" to accept a connection, then close it. The second instance of "opener" tries to bind(2)
to the TCP port 7896 every second. "opener" reports 55 to 59 seconds of delay.
Googling around, I find that people recommend doing this:
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
to reduce that interval. It didn't work for me. Of the four Linux machines I had access to, two had the value set to 30 and two to 60. I also set it as low as 10; it made no difference to the "opener" program. That's not too surprising in hindsight: despite its name, tcp_fin_timeout controls how long a socket lingers in FIN-WAIT-2, not the TIME_WAIT interval that blocks the rebind.
Doing this:
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle
did change things: the second "opener" took only about 3 seconds to get its new socket. Be careful with this knob, though: tcp_tw_recycle is known to break connections from clients behind NAT, and it was removed entirely in Linux 4.12.
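If you control the program, a more robust fix than sysctl tweaking is to set SO_REUSEADDR on the listening socket before calling bind(2); that lets you rebind a port whose previous incarnation is still in TIME_WAIT. A minimal sketch (the function name is mine, not part of the "opener" program above):

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Create a TCP listening socket on the given port, allowing immediate
 * rebinding even if the port's old socket is still in TIME_WAIT.
 * Returns the fd on success, -1 on error. */
int
listen_reusable(unsigned short port)
{
	int yes = 1;
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	/* SO_REUSEADDR must be set before bind() to have any effect. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0) {
		close(fd);
		return -1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 100) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Note that SO_REUSEADDR only lets you bind over a TIME_WAIT remnant; it does not let two live listeners share a port (that's SO_REUSEPORT).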
The Linux kernel's built-in support for TCP keepalive affects all keepalive-enabled TCP sockets. Keepalive is not enabled by default, though: applications must explicitly request it for their sockets through the setsockopt
interface.
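As a sketch, requesting keepalive for a socket fd looks like this (SO_KEEPALIVE is portable; the TCP_KEEP* timer options are Linux-specific, and the helper name is my own):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>  /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT (Linux) */
#include <sys/socket.h>

/* Enable keepalive on fd: send the first probe after `idle` seconds of
 * silence, then one probe every `intvl` seconds, and drop the connection
 * after `cnt` unanswered probes. Returns 0 on success, -1 on error. */
int
enable_keepalive(int fd, int idle, int intvl, int cnt)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
		return -1;
	return 0;
}
```

Without the three TCP_KEEP* calls, the socket falls back to the system-wide defaults in /proc/sys/net/ipv4/tcp_keepalive_*.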
You can check whether keepalive is enabled on a specific socket by using the netstat
command with the -o, --timers
option. In the output below you can see that the same process has sockets open in both modes:
# netstat -anpo | grep 8999
tcp 0 0 10.10.171.44:48744 10.10.139.30:8999 ESTABLISHED 18232/java keepalive (83.39/0/0)
# netstat -anpo | grep 8009
tcp 0 0 10.10.171.44:8009 10.10.171.42:40947 ESTABLISHED 18232/java off (0.00/0/0)
Best Answer
The reason you can't alter the RTO specifically is that it is not a static value. Instead (except for the initial SYN, naturally) it is based on the RTT (round-trip time) of each connection. Actually, it is based on a smoothed version of the RTT and on the RTT variance, with some constants thrown into the mix. Hence it is a dynamic value, calculated per TCP connection, and I highly recommend this article, which goes into more detail on the calculation and on RTO in general.
Also relevant is RFC 6298 which states (among a lot of other things):
(2.1) Until a round-trip time (RTT) measurement has been made for a segment sent between the sender and receiver in question, the sender SHOULD set RTO <- 1 second
Does the kernel always set RTO to 1 second, then? Well, with Linux you can show the current RTO values for your open connections by running the
ss -i
command. On a VM which I am logged into with SSH, with a couple of connections open to google.com, the RTO is in fact around 200 (milliseconds). You will note that is not rounded to the 1-second value from the RFC, and you may also think it's a little high. That's because there are minimum (200 milliseconds) and maximum (120 seconds) bounds in play when it comes to RTO on Linux (there is a great explanation of this in the article I linked above).
So, you can't alter the RTO value directly, but for lossy networks (like wireless) you can try tweaking F-RTO (this may already be enabled depending on your distro). There are two sysctl options related to F-RTO that you can tweak (good summary here):
Depending on what you are trying to optimize for, these may or may not be useful.
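On the kernels I'm familiar with, the main knob is net.ipv4.tcp_frto (older kernels also exposed net.ipv4.tcp_frto_response); checking and setting it looks roughly like this, though your distro's defaults may differ:

```shell
# Check whether F-RTO is enabled (0 = disabled; 2 is the default on
# modern kernels, enabling F-RTO for SACK-capable connections)
sysctl net.ipv4.tcp_frto

# Enable it (as root); add the setting to /etc/sysctl.conf to persist
sysctl -w net.ipv4.tcp_frto=2
```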
EDIT: following up on the ability to tweak the rto_min/max values for TCP from the comments.
You can't change the global minimum RTO for TCP (as an aside, you can do it for SCTP - those are exposed in sysctl), but the good news is that you can tweak the minimum value of the RTO on a per-route basis. Here's my routing table on my CentOS VM:
I can change the rto_min value on the default route as follows:
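The command has this general shape (the gateway address and device below are placeholders; copy the real ones from the output of `ip route show` for your default route):

```shell
# Replace the default route, adding a 15 ms minimum RTO for traffic using it.
# 192.168.1.1 and eth0 are placeholders for your actual gateway and interface.
ip route change default via 192.168.1.1 dev eth0 rto_min 15ms
```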
And now, my routing table looks like this:
Finally, let's initiate a connection and check out
ss -i
to see if this has been respected. Success! The rto on the HTTP connection (made after the change) is 15 ms, whereas on the SSH connection (established before the change) it is 200+ as before.
I actually like this approach - it allows you to set the lower value on appropriate routes rather than globally where it might screw up other traffic. Similarly (see the ip man page) you can tweak the initial rtt estimate and the initial rttvar for the route (used when calculating the dynamic RTO). While it's not a complete solution in terms of tweaking, I think most of the important pieces are there. You can't tweak the max setting, but I think that is not going to be as useful generally in any case.