How to tell how much memory TCP buffers are actually using

tcp

I've got a front end machine with about 1k persistent, very low-bandwidth TCP connections. It's a bit memory constrained so I'm trying to figure out where a few hundred MBs are going. TCP buffers are one possible culprit, but I can't make a dent in these questions:

  1. Where is the memory reported? Is it part of the buff/cache item in top, or is it part of the process's RES metric?
  2. If I want to reduce it on a per-process level, how do I ensure that my reductions are having the desired effect?
  3. Do the buffers continue to take up some memory even when there's minimal traffic flowing, or do they grow dynamically, with the buffer sizes merely being the maximum allowable size?

I realize one possible answer is "trust the kernel to do this for you," but I want to rule out TCP buffers as a source of memory pressure.

Investigation: Question 1

This page writes, "the 'buffers' memory is memory used by Linux to buffer network and disk connections." This implies that they're not part of the RES metric in top.

To find the actual memory usage, /proc/net/sockstat is the most promising:

sockets: used 3640
TCP: inuse 48 orphan 49 tw 63 alloc 2620 mem 248
UDP: inuse 6 mem 10
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

This is the best explanation I could find, but mem isn't addressed there. It is addressed here, but 248*4k ~= 1MB, or about 1/1000 the system-wide max, which seems like an absurdly low number for a server with hundreds of persistent connections and sustained .2-.3Mbit/sec network traffic.
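The page-to-bytes arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: the helper names `parse_sockstat` and `tcp_mem_bytes` are made up for this example, and the 4096-byte default page size is an assumption that should be checked against the real system page size.

```python
import os

def parse_sockstat(text):
    """Parse /proc/net/sockstat-style text into {protocol: {field: int}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        proto, _, rest = line.partition(":")
        tokens = rest.split()
        # Tokens alternate between a field name and its integer value.
        stats[proto.strip()] = {
            name: int(value) for name, value in zip(tokens[0::2], tokens[1::2])
        }
    return stats

def tcp_mem_bytes(stats, page_size=4096):
    """Convert the TCP 'mem' field (counted in kernel pages) to bytes."""
    return stats["TCP"]["mem"] * page_size

# The output quoted above, used as a fallback when /proc is unavailable.
SAMPLE = """\
sockets: used 3640
TCP: inuse 48 orphan 49 tw 63 alloc 2620 mem 248
UDP: inuse 6 mem 10
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
"""

if __name__ == "__main__":
    try:
        with open("/proc/net/sockstat") as f:
            text = f.read()
    except OSError:
        text = SAMPLE
    page = os.sysconf("SC_PAGE_SIZE")  # real page size instead of assuming 4k
    print("TCP buffer memory (bytes):", tcp_mem_bytes(parse_sockstat(text), page))
```

Sampling this periodically while traffic varies is one way to see whether the figure moves with load.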

Of course, the system memory limits themselves are:

$ grep . /proc/sys/net/ipv4/tcp*mem
/proc/sys/net/ipv4/tcp_mem:140631   187510  281262
/proc/sys/net/ipv4/tcp_rmem:4096    87380   6291456
/proc/sys/net/ipv4/tcp_wmem:4096    16384   4194304

tcp_mem's third parameter is the system-wide maximum number of 4k pages dedicated to TCP buffers; if the total buffer size ever surpasses this value, the kernel will start dropping packets. For non-exotic workloads there's no need to tune this value.
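To put the sockstat figure in context, here is a small sketch that converts the three tcp_mem thresholds to bytes and expresses current usage as a fraction of the maximum. The function names are hypothetical, the labels "low", "pressure" and "max" are my reading of the kernel's ip-sysctl documentation, and the 4096-byte page size is an assumption.

```python
def parse_tcp_mem(text, page_size=4096):
    """Return the three tcp_mem thresholds (given in pages) as bytes."""
    low, pressure, maximum = (int(value) for value in text.split())
    return {
        "low": low * page_size,
        "pressure": pressure * page_size,
        "max": maximum * page_size,
    }

def usage_fraction(mem_pages, tcp_mem_text):
    """Fraction of the system-wide maximum currently in use (pages / pages)."""
    maximum_pages = int(tcp_mem_text.split()[2])
    return mem_pages / maximum_pages

# Values quoted above: tcp_mem from /proc/sys, mem from /proc/net/sockstat.
TCP_MEM = "140631   187510  281262"
SOCKSTAT_MEM_PAGES = 248

if __name__ == "__main__":
    limits = parse_tcp_mem(TCP_MEM)
    print("max TCP memory (bytes):", limits["max"])
    print("fraction in use:", usage_fraction(SOCKSTAT_MEM_PAGES, TCP_MEM))
```

With the numbers quoted here the maximum works out to a bit over 1 GB, so 248 pages is on the order of a thousandth of the limit.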

Next up is /proc/meminfo, and its mysterious Buffers and Cached items. I looked at several sources but couldn't find any that claimed it accounted for TCP buffers.

...
MemAvailable:    8298852 kB
Buffers:          192440 kB
Cached:          2094680 kB
SwapCached:        34560 kB
...

Investigation: Questions 2-3

To inspect TCP buffer sizes at the process level, we've got quite a few options, but none of them seem to provide the actual allocated memory instead of the current queue size or maximum.

There's ss -m --info:

State       Recv-Q Send-Q
ESTAB       0      0
... <snip> ....
skmem:(r0,rb1062000,t0,tb2626560,f0,w0,o0,bl0)  ...<snip> rcv_space:43690

So we have

  • Recv-Q and Send-Q, the current buffer usage
  • r and t, which are explained in this excellent post, but it's unclear how they're different from Recv-Q and Send-Q
  • Something called rb, which looks suspiciously like some sort of max buffer size, but for which I couldn't find any documentation
  • rcv_space, which this page claims isn't the actual buffer size; for that you need to call getsockopt
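For comparing these values across many sockets, the skmem tuple can be pulled apart with a short parser. This is a sketch only: the long field names follow my reading of the ss(8) man page (r and t as memory currently allocated for receive and send, rb and tb as the corresponding buffer limits) and should be treated as assumptions, and `parse_skmem` is a made-up helper name.

```python
import re

# Short key -> descriptive name, as I understand the ss(8) man page.
SKMEM_FIELDS = {
    "r": "rmem_alloc",    # memory allocated for received packets
    "rb": "rcv_buf",      # receive buffer limit
    "t": "wmem_alloc",    # memory allocated for packets being sent
    "tb": "snd_buf",      # send buffer limit
    "f": "fwd_alloc",
    "w": "wmem_queued",
    "o": "opt_mem",
    "bl": "back_log",
    "d": "sock_drop",
}

def parse_skmem(line):
    """Extract the skmem:(...) tuple from one line of `ss -m` output."""
    match = re.search(r"skmem:\(([^)]*)\)", line)
    if match is None:
        return {}
    fields = {}
    for token in match.group(1).split(","):
        # Each token is a lowercase key immediately followed by a number.
        parts = re.fullmatch(r"([a-z]+)(\d+)", token.strip())
        if parts:
            key = SKMEM_FIELDS.get(parts.group(1), parts.group(1))
            fields[key] = int(parts.group(2))
    return fields

# Simplified version of the line quoted above.
SAMPLE_LINE = "skmem:(r0,rb1062000,t0,tb2626560,f0,w0,o0,bl0) rcv_space:43690"

if __name__ == "__main__":
    for name, value in parse_skmem(SAMPLE_LINE).items():
        print(f"{name}: {value}")
```

On this reading, an idle socket shows zero allocated memory (r and t) while still advertising large limits (rb and tb).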

This answer suggests lsof, but the size/off seems to be reporting the same buffer usage as ss:

COMMAND     PID   TID                USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME
sslocal    4032                   michael   82u     IPv4            1733921      0t0        TCP localhost:socks->localhost:59594 (ESTABLISHED)

And then these answers suggest that lsof can't return the actual buffer size. One of them does provide a kernel module that should do the trick, but it only seems to work on sockets whose buffer sizes have been fixed with setsockopt; if not, SO_SNDBUF and SO_RCVBUF aren't included.
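From inside the process that owns the socket, the configured limits can be read directly with getsockopt. A minimal Python sketch follows; `buffer_limits` is a hypothetical helper name. It reports the buffer limits rather than memory currently allocated, and as far as I know Linux returns double the value that was requested via setsockopt, per socket(7).

```python
import socket

def buffer_limits(sock):
    """Read the receive and send buffer limits of a socket via getsockopt."""
    return {
        "SO_RCVBUF": sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        "SO_SNDBUF": sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
    }

if __name__ == "__main__":
    # An unconnected TCP socket is enough to see the defaults.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        for option, size in buffer_limits(sock).items():
            print(f"{option}: {size} bytes")
```

This only helps for sockets you can reach from your own code; it cannot inspect another process's sockets.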

Best Answer

/proc/net/sockstat, specifically the mem field, is where to look. This value is reported in kernel pages and corresponds directly to /proc/sys/net/ipv4/tcp_mem.

At the individual socket level, memory is allocated in kernel space only until the user space code reads it, at which time the kernel memory is freed (see here). sk_buff->truesize is the sum of the amount of data buffered and the size of the socket structure itself (see here; the patch that corrected for memory alignment is discussed here).

I suspect that the mem field of /proc/net/sockstat is calculated simply by summing sk_buff->truesize for all sockets, but I'm not familiar enough with the kernel source to know where to look for that.

By way of confirmation, this feature request from the netdata monitoring system includes a lot of good discussion and relevant links as well, and it backs up this interpretation of /proc/net/sockstat.

This post on the "out of socket memory" error contains some more general discussion of different memory issues.
