Quite often in the course of troubleshooting and tuning things I find myself thinking about the following Linux kernel settings:
net.core.netdev_max_backlog
net.ipv4.tcp_max_syn_backlog
net.core.somaxconn
Other than fs.file-max, net.ipv4.ip_local_port_range, net.core.rmem_max, net.core.wmem_max, net.ipv4.tcp_rmem, and net.ipv4.tcp_wmem, they seem to be the important knobs to mess with when you are tuning a box for high levels of concurrency.
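For reference, the configured limits (not the live queue depths this question is about) can be read straight out of /proc/sys, where dots in the sysctl name become slashes:

```shell
# Current configured values of the three queue-related knobs above.
cat /proc/sys/net/core/netdev_max_backlog
cat /proc/sys/net/ipv4/tcp_max_syn_backlog
cat /proc/sys/net/core/somaxconn
```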
My question: How can I check how many items are in each of those queues? Usually people just set them super high, but I would like to log those queue sizes to help predict future failures and catch issues before they manifest in a user-noticeable way.
Best Answer
I too have wondered this and was motivated by your question!
I've collected the closest I could get to each of the queues you listed, along with some related information for each. I welcome comments/feedback; any improvement in monitoring makes things easier to manage!
This will show the current global count of connections in the queue; you can break it up per port and put it in exec statements in snmpd.conf if you want to poll it from a monitoring application.
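One way to pull that count, as a sketch reading /proc/net/tcp directly (sockets waiting in the SYN backlog sit in state SYN_RECV, which the kernel records as hex 03 in the fourth field; the net-tools equivalent is noted in the comment):

```shell
# Count sockets currently in SYN_RECV, i.e. entries in the SYN queue.
awk 'FNR > 1 && $4 == "03"' /proc/net/tcp | wc -l
# With net-tools installed: netstat -an | grep -c SYN_RECV
```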
These will show you how often you are seeing requests dropped from the queue:
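Assuming these are the listen-queue overflow/drop statistics, one place to get them is the TcpExt block of /proc/net/netstat; `netstat -s` reports the same counters in friendlier wording (roughly "times the listen queue of a socket overflowed" and "SYNs to LISTEN sockets dropped"). A sketch that pairs the counter names with their values:

```shell
# Print the cumulative listen-queue counters (e.g. ListenOverflows,
# ListenDrops) from the TcpExt header/value line pair in /proc/net/netstat.
awk '$1 == "TcpExt:" && !n { n = split($0, k); next }
     $1 == "TcpExt:" { split($0, v) }
     END { for (i = 2; i <= n; i++)
             if (k[i] ~ /Listen/) print k[i], v[i] }' /proc/net/netstat
```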
From: http://linux.die.net/man/5/proc
This (read-only) file gives the number of files presently opened. It contains three numbers: The number of allocated file handles, the number of free file handles and the maximum number of file handles.
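The file the man page is describing there is /proc/sys/fs/file-nr, so checking it is a one-liner:

```shell
# Three fields: allocated file handles, free file handles, maximum (file-max).
cat /proc/sys/fs/file-nr
```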
If you can build an exclusion list of services (netstat -an | grep LISTEN), then you can deduce how many connections are being used for ephemeral activity:
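A rough sketch of that deduction using /proc/net/tcp (LISTEN is hex state 0A in field 4, so everything else is a candidate for ephemeral activity), compared against the range you have to play with:

```shell
# Count TCP sockets that are NOT listeners -- a rough measure of
# connections consuming ports for ephemeral activity.
awk 'FNR > 1 && $4 != "0A"' /proc/net/tcp | wc -l
# ...and the ephemeral port range available to them:
cat /proc/sys/net/ipv4/ip_local_port_range
```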
Should also monitor (from SNMP):
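Presumably this is the established-connection count, which SNMP exposes as TCP-MIB::tcpCurrEstab; a sketch of both the SNMP poll (host and community string are placeholder assumptions) and the local equivalent:

```shell
# Over SNMP, with net-snmp tools ("localhost"/"public" are placeholders):
#   snmpget -v2c -c public localhost TCP-MIB::tcpCurrEstab.0
# Locally, the same figure is the number of sockets in ESTABLISHED
# (hex state 01 in /proc/net/tcp):
awk 'FNR > 1 && $4 == "01"' /proc/net/tcp | wc -l
```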
It may also be interesting to collect stats about all the states seen in this tree (established/time_wait/fin_wait/etc.):
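A sketch of that per-state tally, again from /proc/net/tcp (the state codes are hex: 01=ESTABLISHED, 03=SYN_RECV, 06=TIME_WAIT, 08=CLOSE_WAIT, 0A=LISTEN, among others):

```shell
# Count sockets in each TCP state currently present on the box.
awk 'FNR > 1 { count[$4]++ } END { for (s in count) print s, count[s] }' /proc/net/tcp
# With net-tools installed, a friendlier equivalent:
#   netstat -ant | awk 'NR > 2 { print $6 }' | sort | uniq -c
```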
You'd have to dtrace/strace your system for setsockopt requests; I don't think stats for these requests are tracked otherwise. From my understanding, this isn't really a value that changes: the application you've deployed will probably ask for a standard amount. I think you could 'profile' your application with strace and configure this value accordingly. (discuss?)
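A sketch of that profiling approach (`your_app` is a placeholder for the binary you want to inspect; SO_SNDBUF/SO_RCVBUF are the socket options such buffer requests use):

```shell
# Trace an application's setsockopt calls and keep the buffer-size ones:
#   strace -f -e trace=setsockopt your_app 2>&1 | grep -E 'SO_(SND|RCV)BUF'
# A matching line of strace output looks roughly like:
#   setsockopt(3, SOL_SOCKET, SO_RCVBUF, [262144], 4) = 0
```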
To track how close you are to the limit, you would have to look (on a regular basis) at the average and max of the tx_queue and rx_queue fields from:
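Assuming those fields come from /proc/net/tcp (where field 5 holds tx_queue:rx_queue as hex byte counts), a sketch that reports the max of each; the hand-rolled hex2dec keeps it portable across awk implementations that lack strtonum:

```shell
# Report the largest tx_queue and rx_queue (bytes) across all TCP sockets.
awk '
function hex2dec(h,    i, n) {
    n = 0
    for (i = 1; i <= length(h); i++)
        n = n * 16 + index("0123456789ABCDEF", toupper(substr(h, i, 1))) - 1
    return n
}
FNR > 1 {
    split($5, q, ":")
    tx = hex2dec(q[1]); rx = hex2dec(q[2])
    if (tx > mtx) mtx = tx
    if (rx > mrx) mrx = rx
}
END { printf "max tx_queue: %d  max rx_queue: %d\n", mtx, mrx }
' /proc/net/tcp
```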
To track errors related to this:
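Assuming the errors in question are the receive-buffer pressure counters, the TcpExt block of /proc/net/netstat carries them (e.g. PruneCalled, RcvPruned, TCPRcvCollapsed; `netstat -s` phrases these roughly as "packets pruned from receive queue" and "packets collapsed in receive queue due to low socket buffer"):

```shell
# Print cumulative prune/collapse counters, which rise when socket
# receive buffers run out of room.
awk '$1 == "TcpExt:" && !n { n = split($0, k); next }
     $1 == "TcpExt:" { split($0, v) }
     END { for (i = 2; i <= n; i++)
             if (k[i] ~ /Prune|Collaps/) print k[i], v[i] }' /proc/net/netstat
```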
Should also be monitoring the global 'buffer' pool (via SNMP):
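I believe the closest local view of that pool is /proc/net/sockstat; a sketch (note the "mem" figures are counted in pages, not bytes, matching the units of tcp_mem):

```shell
# Global socket and buffer-pool usage: per-protocol inuse/orphan/tw/alloc
# counts plus the "mem" pool consumption in pages.
cat /proc/net/sockstat
```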