Established TCP connections with no owner PID

netstat tcp

Both ss --processes and netstat --program (with sudo) list some ESTABLISHED TCP connections to local port 6514 with non-zero Recv-Q values and no owner process (netstat output shows - where PID/command should be).

There are other established TCP connections to the same local port which do reveal the owner PID of the Java-based (logstash) application which I expect to own all these connections (it owns the LISTENing socket). These connections have empty receive queues.

Furthermore lsof -i:6514 does not list the "unowned" established TCP connections at all.

Running ss on the remote end of one of the "unowned" connections shows that it believes the connection is established and has empty send and receive queues. The remote end shows the connection has been established for weeks. The remote end is behind a NAT.

I want to understand how these "unowned" yet established TCP connections can exist, and how they get cleaned up, if ever.

I can see that ss --listening shows the LISTEN socket for local port 6514 to have a Send-Q of 50 and a Recv-Q of 51. Can I assume this means the listening Java process has reached its concurrent connection limit and is the reason for the "unowned" established connections?
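
For readers comparing these two numbers: on Linux, for a socket in the LISTEN state, Recv-Q is the number of connections waiting to be accepted and Send-Q is the configured backlog. To my understanding the kernel only treats the queue as full once the count exceeds the backlog, which would explain a Recv-Q of 51 against a Send-Q of 50. A minimal sketch of that check follows. The helper names are hypothetical, the sample row is built from the figures above, and the column order (State, Recv-Q, Send-Q, Local, Peer) is an assumption about the ss output.

```python
from typing import NamedTuple


class ListenQueue(NamedTuple):
    recv_q: int  # connections completed by the kernel, waiting for accept()
    send_q: int  # backlog limit configured for the listening socket
    local: str   # local address:port column, kept as text


def parse_listen_line(line: str) -> ListenQueue:
    """Parse one data row of `ss --listening --tcp --numeric` output.

    Assumes the column order: State Recv-Q Send-Q Local Peer.
    """
    fields = line.split()
    return ListenQueue(int(fields[1]), int(fields[2]), fields[3])


def accept_queue_full(q: ListenQueue) -> bool:
    # Assumption: the queue counts as full once the pending count
    # exceeds the backlog, so a backlog of 50 can hold 51 connections.
    return q.recv_q > q.send_q


# Illustrative row using the numbers reported in the question.
sample = "LISTEN 51 50 :::6514 :::*"
queue = parse_listen_line(sample)
print(queue, accept_queue_full(queue))
```

A Recv-Q that stays at or above Send-Q suggests the process is not draining the accept queue, rather than that it has hit an application-level connection limit.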

# lsb_release -d
Description:    Ubuntu 14.04.1 LTS
# uname -irs
Linux 3.13.0-36-generic x86_64

Update

Running netstat --program --numeric-hosts --numeric-ports --extend shows that the user of the "unowned" connections is root, not the Java process user, and that the inode is 0.
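
The same information can be read straight from /proc/net/tcp (or /proc/net/tcp6 for IPv6 sockets), where each row carries the state, queue sizes, uid and inode. A connection that has not yet been handed to a process has no socket inode, so it shows up with inode 0 and uid 0. The sketch below filters for such rows. The function name is hypothetical and the sample rows are made up for illustration; only the field layout is taken from the kernel's format.

```python
ESTABLISHED = "01"  # TCP state code used in /proc/net/tcp


def unowned_established(proc_net_tcp: str, port: int) -> list:
    """Return (remote_address_hex, rx_queue) for ESTABLISHED rows
    on local `port` whose inode is 0."""
    hits = []
    for row in proc_net_tcp.splitlines()[1:]:  # skip the header row
        f = row.split()
        if len(f) < 10:
            continue
        local_port = int(f[1].rsplit(":", 1)[1], 16)  # port is hex
        rx_queue = int(f[4].split(":")[1], 16)        # tx_queue:rx_queue
        inode = int(f[9])
        if f[3] == ESTABLISHED and local_port == port and inode == 0:
            hits.append((f[2], rx_queue))
    return hits


# Made-up sample: port 0x1972 is 6514; the first row has inode 0.
sample = "\n".join([
    "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode",
    "   0: 0100007F:1972 0100007F:C350 01 00000000:00000040 00:00000000 00000000     0        0 0",
    "   1: 0100007F:1972 0100007F:C351 01 00000000:00000000 00:00000000 00000000   999        0 41234",
])
unowned = unowned_established(sample, 6514)
print(unowned)
```

On a live system the input would be something like open("/proc/net/tcp6").read(), since a Java listener is often bound to an IPv6 socket.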

The issue has re-appeared within an hour or two of restarting the Java process. This time the LISTEN socket Recv-Q is only 9, compared to the Send-Q of 50, and the total number of TCP connections to local port 6514 is 21, with 8 of those "unowned".

Update 2

I've now realised the Recv-Q number on the LISTEN socket matches the number of "unowned" ESTABLISHED connections. I believe this means that the kernel has completed the TCP SYN/SYN+ACK/ACK handshake on the incoming connections but the Java process has not yet called accept().

If my understanding is correct I need to investigate why the application is not accepting the new connections.
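
The behaviour described in this update can be reproduced in isolation. The sketch below is a stand-in for the real application, not logstash itself: it opens a listening socket that never calls accept() and then connects a few clients to it over loopback. The backlog of 5 and the client count of 3 are arbitrary choices.

```python
import socket

# Listener that binds and listens but never calls accept().
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(5)
port = listener.getsockname()[1]

clients = []
for _ in range(3):
    # The kernel completes the handshake on the listener's behalf.
    c = socket.create_connection(("127.0.0.1", port), timeout=2)
    c.sendall(b"hello")  # accepted by the kernel and buffered unread
    clients.append(c)

# Each client sees a connected socket although nothing was accepted.
connected = [c.getpeername() == ("127.0.0.1", port) for c in clients]
print(connected)

for c in clients:
    c.close()
listener.close()
```

While the clients are open, ss or netstat on the same machine should list the server side of these connections as ESTABLISHED with no owning process, and the data sent by each client sits in the server-side Recv-Q.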

Best Answer

I have narrowed this issue down to logstash and its use of JRuby's SSL implementation, in two different logstash plugins, on two different Java versions, on different machines, with different clients, and with or without an intermediate TCP proxy.

In all cases, replacing SSLServer with TCPServer in the Ruby code, and performing TLS offload in front of logstash solves the issue.

The underlying problem with the JRuby SSL implementation, or the way it is being used in the context of logstash, is unsolved.

Issues for each affected logstash plugin:
