I was going to post this in ServerFault originally but I thought this might be a better place. Let me know if you think there is a better place to post this question.
I have a user-space application that performs networking through Java NIO's API (aka epoll on Linux). For demonstration and diagnostic purposes, I have a line testing utility. It's basically the same thing as iperf.
Some information about the environment and how the test is run.
- Ubuntu 16.04 Desktop updated today (4.4.0-34-generic)
- irqbalance is off
- Intel X540T1 10GbE (ixgbe) receiver <-> Solarflare 10GbE (sfc) sender
- Uses 10,000 TCP sockets
- Sockets use the OS default configurations
- The user-space read buffer is 32KB
- Reading occurs at no more than 40hz
The line test consists of a single client that transmits as much information as possible over the TCP sockets.
- Each socket's read() may be called more than once per tick to obtain up to 98KB (the 32KB buffer would have to be read 3 times to hit the ceiling)
- This means that at 40hz and the 98KB ceiling, read() can be called up to 120 times per second per connection, reading a total of 3,840KB.
- The line tester shows that read() is called a total of about 110,000 times a second.
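The per-connection ceiling above can be checked with a little arithmetic. This is only a sketch: the class and constant names are made up for illustration, and it assumes each of the three reads per tick fills the whole 32KB buffer (which gives 96KB per tick, the figure that matches the 3,840KB total).

```java
public class ReadCeiling {
    // Values taken from the description above; the names are hypothetical.
    static final int TICKS_PER_SECOND = 40; // reader clock frequency
    static final int BUFFER_KB = 32;        // user-space read buffer size
    static final int READS_PER_TICK = 3;    // reads needed to reach the per-tick ceiling

    // Upper bound on read() calls per second for one connection.
    static int maxReadsPerSecondPerConnection() {
        return TICKS_PER_SECOND * READS_PER_TICK;
    }

    // Upper bound on KB read per second for one connection,
    // assuming every read fills the whole buffer.
    static int maxKbPerSecondPerConnection() {
        return maxReadsPerSecondPerConnection() * BUFFER_KB;
    }

    public static void main(String[] args) {
        System.out.println("max reads/s per connection: " + maxReadsPerSecondPerConnection());
        System.out.println("max KB/s per connection:    " + maxKbPerSecondPerConnection());
    }
}
```

With 10,000 connections the theoretical bound is far above the roughly 110,000 read() calls per second that the line tester reports, so the 10GbE link is the limit here, not the read schedule.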
The line test easily saturates the 10GbE adapter while using about 8% softirq:
top - 22:04:29 up 51 min, 1 user, load average: 1.31, 1.02, 0.66
Tasks: 258 total, 1 running, 257 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.2 us, 3.6 sy, 0.0 ni, 85.6 id, 1.1 wa, 0.0 hi, 7.4 si, 0.0 st
KiB Mem : 16378912 total, 12909832 free, 2383088 used, 1085992 buff/cache
KiB Swap: 16721916 total, 16721916 free, 0 used. 13746736 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4922 jon 20 0 1553556 492552 127160 S 125.0 3.0 0:54.61 firefox
5099 jon 20 0 7212040 218396 16872 S 75.0 1.3 2:59.88 java
3194 root 20 0 722144 163812 134052 S 18.8 1.0 1:25.63 Xorg
4149 jon 20 0 1588648 147848 75344 S 6.2 0.9 0:28.63 compiz
4197 jon 20 0 544660 40600 26804 S 6.2 0.2 0:01.20 indicator-+
5186 jon 20 0 41948 3696 3084 R 6.2 0.0 0:00.01 top
1 root 20 0 119744 5884 3964 S 0.0 0.0 0:00.84 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 5:01.01 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:+
7 root 20 0 0 0 0 S 0.0 0.0 0:01.06 rcu_sched
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
10 root rt 0 0 0 0 S 0.0 0.0 0:00.04 watchdog/0
11 root rt 0 0 0 0 S 0.0 0.0 0:00.01 watchdog/1
12 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
13 root 20 0 0 0 0 S 0.0 0.0 0:08.16 ksoftirqd/1
cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 17 0 0 0 0 0 0 0 IR-IO-APIC 2-edge timer
1: 0 1 0 0 1 0 0 0 IR-IO-APIC 1-edge i8042
5: 0 0 0 0 0 0 0 0 IR-IO-APIC 5-edge parport0
8: 0 0 0 0 0 1 0 0 IR-IO-APIC 8-edge rtc0
9: 0 0 0 0 0 0 0 0 IR-IO-APIC 9-fasteoi acpi
12: 2 0 1 0 1 0 0 0 IR-IO-APIC 12-edge i8042
16: 50 6 2 6 10 0 0 3 IR-IO-APIC 16-fasteoi ehci_hcd:usb1
17: 1138 35 14 24 227 25 35 24 IR-IO-APIC 17-fasteoi snd_hda_intel
19: 0 1 0 0 0 1 0 0 IR-IO-APIC 19-fasteoi firewire_ohci
23: 11 4 10 1 7 0 0 0 IR-IO-APIC 23-fasteoi ehci_hcd:usb2
24: 0 0 0 0 0 0 0 0 DMAR-MSI 0-edge dmar0
27: 4571 1431 1142 812 1286 1442 985 730 IR-PCI-MSI 327680-edge xhci_hcd
28: 26230 3078 1744 1325 6297 2715 1703 1258 IR-PCI-MSI 512000-edge 0000:00:1f.2
29: 754 43 28 30 215 176 129 76 IR-PCI-MSI 2097152-edge eth0-rx-0
30: 0 0 0 0 0 0 0 0 IR-PCI-MSI 2097153-edge eth0-tx-0
31: 0 0 0 0 1 0 0 0 IR-PCI-MSI 2097154-edge eth0
32: 757 64 28 33 205 169 129 66 IR-PCI-MSI 2621440-edge eth1-rx-0
33: 0 0 0 0 0 0 0 0 IR-PCI-MSI 2621441-edge eth1-tx-0
34: 1 0 0 0 0 0 0 0 IR-PCI-MSI 2621442-edge eth1
35: 1042128 233608 58916 16705 1612687 1484813 1121118 630363 IR-PCI-MSI 1048576-edge enp2s0-TxRx-0
36: 858271 736510 372134 165262 1704892 1127381 1265752 767377 IR-PCI-MSI 1048577-edge enp2s0-TxRx-1
37: 816359 711664 426719 192686 1475309 1307882 807216 712562 IR-PCI-MSI 1048578-edge enp2s0-TxRx-2
38: 934786 714007 432100 217627 1905295 1622682 1150693 517990 IR-PCI-MSI 1048579-edge enp2s0-TxRx-3
39: 0 0 0 0 14185366 0 0 0 IR-PCI-MSI 1048580-edge enp2s0-TxRx-4
40: 0 0 0 0 0 14332864 0 0 IR-PCI-MSI 1048581-edge enp2s0-TxRx-5
41: 0 0 0 0 0 0 14617282 0 IR-PCI-MSI 1048582-edge enp2s0-TxRx-6
42: 0 0 0 0 0 0 0 14840029 IR-PCI-MSI 1048583-edge enp2s0-TxRx-7
43: 57 88 47 34 77 64 75 58 IR-PCI-MSI 1048584-edge enp2s0
44: 0 0 0 0 0 13 1 1 IR-PCI-MSI 360448-edge mei_me
45: 246 20 30 4 345 132 128 142 IR-PCI-MSI 442368-edge snd_hda_intel
46: 63933 9794 7233 4753 28843 19323 17678 11191 IR-PCI-MSI 524288-edge nvidia
NMI: 57 43 35 42 103 98 83 76 Non-maskable interrupts
LOC: 300755 258293 257168 289802 373725 262211 218677 196510 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 57 43 35 42 103 98 83 76 Performance monitoring interrupts
IWI: 0 0 0 0 1 0 0 0 IRQ work interrupts
RTR: 0 0 0 0 0 0 0 0 APIC ICR read retries
RES: 7721466 2192716 1958606 3095012 1106115 1189666 309133 169884 Rescheduling interrupts
CAL: 2598 2206 2194 1751 1976 2255 2130 2211 Function call interrupts
TLB: 5450 6659 6103 5640 4352 5128 4535 4470 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 0 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 11 11 11 11 11 11 11 11 Machine check polls
ERR: 0
MIS: 0
PIN: 0 0 0 0 0 0 0 0 Posted-interrupt notification event
PIW: 0 0 0 0 0 0 0 0 Posted-interrupt wakeup event
Now, let's apply rate control to the socket reader.
- Inbound rate control is set to 50KB/s per connection
- That is about 500MB/s in total, since we have 10,000 connections
- Rate control sets the reading frequency to 5hz, down from 40hz in the previous example.
- Rate control's frequency is not aligned: connections do not all tick from the same starting reference, but they are all governed by a single clock.
- The clock is 40hz, meaning there are 40 opportunities per second for scheduled rate control reads to occur.
- During each of those 5hz rate control reads, the socket is only allowed to read up to 10KB. So, 5 times a second it reads 10KB out of the socket buffer.
- The line tester shows that read() is called a total of about 47,000 times a second.
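To make the capped read concrete, here is a minimal sketch of a per-tick read that stops once a byte budget is used up. It is not the actual line tester code: `CappedRead` and `readUpTo` are hypothetical names, and an in-memory channel stands in for a TCP socket so the example runs on its own. At 5 ticks a second and a 10KB budget this gives 50KB/s per connection, or about 500MB/s across 10,000 connections.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class CappedRead {
    // Reads at most budgetBytes from the channel and returns the number of bytes read.
    // Anything beyond the budget stays in the channel (for a socket: in the kernel buffer).
    static int readUpTo(ReadableByteChannel ch, ByteBuffer buf, int budgetBytes) throws IOException {
        int total = 0;
        while (total < budgetBytes) {
            buf.clear();
            // Never ask for more than what is left of this tick's budget.
            buf.limit(Math.min(buf.capacity(), budgetBytes - total));
            int n = ch.read(buf);
            if (n <= 0) {
                break; // 0: nothing available right now (non-blocking), -1: end of stream
            }
            total += n;
            // A real reader would consume the contents of buf here.
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // 64KB of pending data stands in for a socket with a backlog.
        ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(new byte[64 * 1024]));
        ByteBuffer buf = ByteBuffer.allocate(32 * 1024); // 32KB user-space buffer
        int budget = 10 * 1024;                          // 10KB per rate control tick
        System.out.println("bytes read this tick: " + readUpTo(ch, buf, budget));
    }
}
```

The point to notice is that a sender pushing as fast as it can will keep filling the socket buffer between ticks, because the reader drains only a small, fixed amount each time.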
The amount of softirq jumps from 8% to 50-65%; the number of interrupts almost triples, and there are 26-58 million RES interrupts (per core) compared to 1-7 million before.
top - 22:31:50 up 1:19, 1 user, load average: 2.30, 2.30, 1.96
Tasks: 259 total, 2 running, 257 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.3 us, 5.5 sy, 0.0 ni, 41.2 id, 0.0 wa, 0.0 hi, 50.0 si, 0.0 st
KiB Mem : 16378912 total, 11752520 free, 2189080 used, 2437312 buff/cache
KiB Swap: 16721916 total, 16721916 free, 0 used. 12590400 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3 root 20 0 0 0 0 S 82.1 0.0 26:57.43 ksoftirqd/0
5194 jon 20 0 7212040 233488 16720 S 46.2 1.4 12:08.73 java
28 root 20 0 0 0 0 S 40.2 0.0 9:04.84 ksoftirqd/4
33 root 20 0 0 0 0 S 30.9 0.0 7:26.84 ksoftirqd/5
43 root 20 0 0 0 0 R 21.6 0.0 4:26.41 ksoftirqd/7
38 root 20 0 0 0 0 S 21.3 0.0 5:37.16 ksoftirqd/6
4922 jon 20 0 1533388 475124 127784 S 5.6 2.9 2:41.82 firefox
3194 root 20 0 722448 163872 134052 S 5.3 1.0 2:50.84 Xorg
5154 jon 20 0 589896 83876 53964 S 1.7 0.5 0:26.08 plugin-con+
13 root 20 0 0 0 0 S 1.3 0.0 0:42.60 ksoftirqd/1
4548 jon 20 0 5492168 634252 43104 S 1.3 3.9 2:18.86 java
4149 jon 20 0 1604016 169732 75348 S 1.0 1.0 0:52.62 compiz
18 root 20 0 0 0 0 S 0.7 0.0 0:35.31 ksoftirqd/2
23 root 20 0 0 0 0 S 0.3 0.0 0:22.65 ksoftirqd/3
cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 17 0 0 0 0 0 0 0 IR-IO-APIC 2-edge timer
1: 0 1 0 0 1 0 0 0 IR-IO-APIC 1-edge i8042
5: 0 0 0 0 0 0 0 0 IR-IO-APIC 5-edge parport0
8: 0 0 0 0 0 1 0 0 IR-IO-APIC 8-edge rtc0
9: 0 0 0 0 0 0 0 0 IR-IO-APIC 9-fasteoi acpi
12: 2 0 1 0 1 0 0 0 IR-IO-APIC 12-edge i8042
16: 50 6 2 6 10 0 0 3 IR-IO-APIC 16-fasteoi ehci_hcd:usb1
17: 1138 35 14 24 227 25 35 24 IR-IO-APIC 17-fasteoi snd_hda_intel
19: 0 1 0 0 0 1 0 0 IR-IO-APIC 19-fasteoi firewire_ohci
23: 11 4 10 1 7 0 0 0 IR-IO-APIC 23-fasteoi ehci_hcd:usb2
24: 0 0 0 0 0 0 0 0 DMAR-MSI 0-edge dmar0
27: 6518 1966 1471 1031 4361 3847 2501 1673 IR-PCI-MSI 327680-edge xhci_hcd
28: 26732 3381 1957 1447 6687 3367 2112 1502 IR-PCI-MSI 512000-edge 0000:00:1f.2
29: 930 184 150 114 283 344 232 142 IR-PCI-MSI 2097152-edge eth0-rx-0
30: 0 0 0 0 0 0 0 0 IR-PCI-MSI 2097153-edge eth0-tx-0
31: 0 0 0 0 1 0 0 0 IR-PCI-MSI 2097154-edge eth0
32: 899 234 138 104 277 348 236 143 IR-PCI-MSI 2621440-edge eth1-rx-0
33: 0 0 0 0 0 0 0 0 IR-PCI-MSI 2621441-edge eth1-tx-0
34: 1 0 0 0 0 0 0 0 IR-PCI-MSI 2621442-edge eth1
35: 1339704 330929 97391 31445 2023348 1859243 1369358 782238 IR-PCI-MSI 1048576-edge enp2s0-TxRx-0
36: 1863223 3328011 1764431 788048 2411300 2677922 2540016 1742062 IR-PCI-MSI 1048577-edge enp2s0-TxRx-1
37: 1911973 3426913 2084294 955668 2216702 2894499 2008907 1723010 IR-PCI-MSI 1048578-edge enp2s0-TxRx-2
38: 2064515 3379490 2155421 1093171 2652077 3162801 2369659 1442568 IR-PCI-MSI 1048579-edge enp2s0-TxRx-3
39: 0 0 0 0 23079493 0 0 0 IR-PCI-MSI 1048580-edge enp2s0-TxRx-4
40: 0 0 0 0 0 23379687 0 0 IR-PCI-MSI 1048581-edge enp2s0-TxRx-5
41: 0 0 0 0 0 0 24721093 0 IR-PCI-MSI 1048582-edge enp2s0-TxRx-6
42: 0 0 0 0 0 0 0 25752073 IR-PCI-MSI 1048583-edge enp2s0-TxRx-7
43: 211 430 277 179 142 219 240 197 IR-PCI-MSI 1048584-edge enp2s0
44: 0 0 0 0 0 13 1 1 IR-PCI-MSI 360448-edge mei_me
45: 246 20 30 4 345 132 128 142 IR-PCI-MSI 442368-edge snd_hda_intel
46: 87961 29805 21965 14718 43334 42053 34617 23830 IR-PCI-MSI 524288-edge nvidia
NMI: 218 130 107 105 252 247 225 214 Non-maskable interrupts
LOC: 716630 636798 640606 679852 641275 555921 488433 446196 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 218 130 107 105 252 247 225 214 Performance monitoring interrupts
IWI: 0 0 0 0 3 0 0 0 IRQ work interrupts
RTR: 0 0 0 0 0 0 0 0 APIC ICR read retries
RES: 38554509 4165414 4123561 5839087 2680226 2883656 1297965 812274 Rescheduling interrupts
CAL: 3292 2356 2373 2014 2215 2496 2375 2474 Function call interrupts
TLB: 10997 21211 21364 22716 11757 23899 28023 27646 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 0 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 17 17 17 17 17 17 17 17 Machine check polls
ERR: 0
MIS: 0
PIN: 0 0 0 0 0 0 0 0 Posted-interrupt notification event
PIW: 0 0 0 0 0 0 0 0 Posted-interrupt wakeup event
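Since the counters in /proc/interrupts are cumulative since boot, comparing two snapshots is easier with per-CPU deltas. The following is a small sketch that parses one row and subtracts two snapshots; `InterruptDelta` and its methods are hypothetical helpers, and the two RES rows are copied from the dumps above. The delta covers whatever ran between the two snapshots, so treat it as a rough indicator only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InterruptDelta {
    // Parses the per-CPU counters from one /proc/interrupts row, e.g. "RES: 1 2 3 Rescheduling interrupts".
    static long[] parseCounts(String line) {
        String[] tokens = line.trim().split("\\s+");
        List<Long> counts = new ArrayList<>();
        // Token 0 is the row label; counters follow until the first non-numeric token.
        for (int i = 1; i < tokens.length; i++) {
            if (!tokens[i].matches("\\d+")) {
                break;
            }
            counts.add(Long.parseLong(tokens[i]));
        }
        long[] out = new long[counts.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = counts.get(i);
        }
        return out;
    }

    // Per-CPU difference between two snapshots of the same row.
    static long[] delta(long[] before, long[] after) {
        long[] d = new long[Math.min(before.length, after.length)];
        for (int i = 0; i < d.length; i++) {
            d[i] = after[i] - before[i];
        }
        return d;
    }

    public static void main(String[] args) {
        String before = "RES: 7721466 2192716 1958606 3095012 1106115 1189666 309133 169884 Rescheduling interrupts";
        String after = "RES: 38554509 4165414 4123561 5839087 2680226 2883656 1297965 812274 Rescheduling interrupts";
        System.out.println(Arrays.toString(delta(parseCounts(before), parseCounts(after))));
    }
}
```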
Can anyone explain why this is happening and possibly how to avoid it?
For reference, here is top when using Outbound Rate Control @ 500MB/s:
top - 01:26:15 up 4:13, 1 user, load average: 0.38, 0.31, 1.00
Tasks: 254 total, 1 running, 253 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.7 us, 3.7 sy, 0.0 ni, 93.3 id, 0.1 wa, 0.0 hi, 1.2 si, 0.0 st
KiB Mem : 16378912 total, 12912528 free, 2209912 used, 1256472 buff/cache
KiB Swap: 16721916 total, 16721916 free, 0 used. 13873312 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6560 jon 20 0 7212040 204656 16836 S 38.9 1.2 0:21.37 java
3194 root 20 0 871176 206844 175404 S 1.0 1.3 12:11.62 Xorg
4149 jon 20 0 1909092 221972 99348 S 0.7 1.4 3:21.75 compiz
4548 jon 20 0 5879804 662312 45948 S 0.7 4.0 6:48.86 java
3940 jon 20 0 350840 13196 5468 S 0.3 0.1 0:20.41 ibus-daemon
4922 jon 20 0 1779380 686992 145824 S 0.3 4.2 20:38.42 firefox
5827 root 20 0 0 0 0 S 0.3 0.0 0:00.64 kworker/4:1
6341 root 20 0 0 0 0 S 0.3 0.0 0:00.93 kworker/1:2
6539 root 20 0 0 0 0 S 0.3 0.0 0:00.31 kworker/0:2
1 root 20 0 185280 5896 3964 S 0.0 0.0 0:01.01 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 107:56.20 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:+
Attaching 2,500 TCP connections and using rate control sees an internal-tcp outbound packet rate of 20K pps; jumping to 5,000 TCP connections sees that number jump to 105K pps; jumping to 7,500 TCP connections makes outbound jump to 190K pps (these are just the packets acknowledging reads, or so I assume).
2: Putting the Solarflare card on the server and the Intel X540T1 on the client, I see the IRQ load pinned to ksoftirqd/0, which uses 100%, and the total si at 12.5%, which is about one core. With Solarflare the RES interrupts don't exceed 10,000 per core.
The following is the server when using the Solarflare card, but only about 360-400MB/s is being received instead of the target 500MB/s:
top - 11:07:55 up 16 min, 1 user, load average: 1.49, 1.09, 0.62
Tasks: 259 total, 3 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.5 us, 2.5 sy, 0.0 ni, 83.5 id, 0.0 wa, 0.0 hi, 12.5 si, 0.0 st
KiB Mem : 16378912 total, 12294300 free, 2356136 used, 1728476 buff/cache
KiB Swap: 16721916 total, 16721916 free, 0 used. 13067464 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3 root 20 0 0 0 0 R 99.7 0.0 5:20.82 ksoftirqd/0
4620 jon 20 0 7212040 246176 16712 S 25.6 1.5 1:24.67 java
3241 root 20 0 716936 161772 133628 R 3.3 1.0 0:15.42 Xorg
4659 jon 20 0 654928 36356 27820 S 1.0 0.2 0:00.63 gnome-term+
4103 jon 20 0 1567768 141048 75340 S 0.7 0.9 0:06.44 compiz
4542 jon 20 0 5688204 601804 43040 S 0.7 3.7 1:03.91 java
7 root 20 0 0 0 0 S 0.3 0.0 0:00.93 rcu_sched
4538 root 20 0 0 0 0 S 0.3 0.0 0:00.68 kworker/4:2
1 root 20 0 119844 5980 4028 S 0.0 0.0 0:00.84 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:+
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
10 root rt 0 0 0 0 S 0.0 0.0 0:00.02 watchdog/0
11 root rt 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
12 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
13 root 20 0 0 0 0 S 0.0 0.0 0:00.02 ksoftirqd/1
cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 17 0 0 0 0 0 0 0 IR-IO-APIC 2-edge timer
1: 1 0 0 1 0 0 0 0 IR-IO-APIC 1-edge i8042
5: 0 0 0 0 0 0 0 0 IR-IO-APIC 5-edge parport0
8: 0 0 0 0 0 1 0 0 IR-IO-APIC 8-edge rtc0
9: 0 0 0 0 0 0 0 0 IR-IO-APIC 9-fasteoi acpi
12: 1 0 1 0 1 0 1 0 IR-IO-APIC 12-edge i8042
16: 61 2 1 3 7 2 1 0 IR-IO-APIC 16-fasteoi ehci_hcd:usb1
17: 1166 55 10 19 245 45 13 19 IR-IO-APIC 17-fasteoi snd_hda_intel
19: 0 0 0 0 2 0 0 0 IR-IO-APIC 19-fasteoi firewire_ohci
23: 26 1 2 0 1 2 0 1 IR-IO-APIC 23-fasteoi ehci_hcd:usb2
24: 0 0 0 0 0 0 0 0 DMAR-MSI 0-edge dmar0
27: 1723 170 168 126 1603 166 135 47 IR-PCI-MSI 327680-edge xhci_hcd
28: 24980 1714 933 754 7492 1546 1202 936 IR-PCI-MSI 512000-edge 0000:00:1f.2
29: 298 2 1 7 159 4 6 1 IR-PCI-MSI 2097152-edge eth0-rx-0
30: 0 0 0 0 0 0 0 0 IR-PCI-MSI 2097153-edge eth0-tx-0
31: 1 0 0 0 0 0 0 0 IR-PCI-MSI 2097154-edge eth0
32: 16878 5179 2952 3044 18575 7842 3822 3939 IR-PCI-MSI 1048576-edge enp2s0f0-0
33: 16174 4967 2787 2583 19305 7883 3507 3862 IR-PCI-MSI 1048577-edge enp2s0f0-1
34: 16707 5192 2952 2659 18031 8588 3496 4393 IR-PCI-MSI 1048578-edge enp2s0f0-2
35: 17726 5431 2951 2746 17183 8105 3529 4238 IR-PCI-MSI 1048579-edge enp2s0f0-3
36: 6 1 0 3 6 3 0 1 IR-PCI-MSI 1050624-edge enp2s0f1-0
37: 1 1 0 0 0 0 0 0 IR-PCI-MSI 1050625-edge enp2s0f1-1
38: 1 1 0 0 0 0 0 0 IR-PCI-MSI 1050626-edge enp2s0f1-2
39: 1 1 0 0 0 0 0 0 IR-PCI-MSI 1050627-edge enp2s0f1-3
40: 414 12 9 3 0 14 18 8 IR-PCI-MSI 2621440-edge eth1-rx-0
41: 0 0 0 0 0 0 0 0 IR-PCI-MSI 2621441-edge eth1-tx-0
42: 1 0 0 0 0 0 0 0 IR-PCI-MSI 2621442-edge eth1
43: 0 0 0 0 10 0 5 0 IR-PCI-MSI 360448-edge mei_me
44: 95 26 8 33 398 384 51 16 IR-PCI-MSI 442368-edge snd_hda_intel
45: 17400 1413 1135 806 17781 1714 1401 988 IR-PCI-MSI 524288-edge nvidia
NMI: 37 3 5 3 2 1 1 1 Non-maskable interrupts
LOC: 112894 53399 87350 46718 43552 19663 25436 19705 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 37 3 5 3 2 1 1 1 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 0 IRQ work interrupts
RTR: 0 0 0 0 0 0 0 0 APIC ICR read retries
RES: 1808 7668 9364 1244 4161 2554 9171 954 Rescheduling interrupts
CAL: 1900 2028 1497 1984 1862 1931 2118 2004 Function call interrupts
TLB: 1991 2539 3176 2985 3176 2458 1612 2087 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 0 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 5 5 5 5 5 5 5 5 Machine check polls
ERR: 0
MIS: 0
PIN: 0 0 0 0 0 0 0 0 Posted-interrupt notification event
PIW: 0 0 0 0 0 0 0 0 Posted-interrupt wakeup event
Best Answer
The problem turned out to be that rate control combined with default-configured sockets let the internal TCP buffer auto-tune itself to larger and larger sizes because of the slow read-out times (the default maximum is around 6MB). As the buffer grew, the kernel's TCP compaction process started to churn like crazy, eating up all the softirq time. The fix is to set an explicit TCP buffer size when using rate control, which prevents this aberrant behavior.
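In Java NIO, an explicit buffer size can be requested through the standard SO_RCVBUF socket option; on Linux, setting it explicitly turns off receive buffer auto-tuning for that socket. The sketch below assumes the receive side is the one that needs fixing, and the 64KB value is only a placeholder, not a recommendation. `FixedReceiveBuffer` and its methods are hypothetical names. The option is set before bind (server) or connect (client), as the Java socket documentation advises for sizes that affect the advertised window.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class FixedReceiveBuffer {
    // Placeholder size; choose it from the target rate and tick interval.
    static final int RCVBUF_BYTES = 64 * 1024;

    // Listening side: set the option before bind so accepted sockets start with it.
    static ServerSocketChannel openServer(InetSocketAddress local) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.setOption(StandardSocketOptions.SO_RCVBUF, RCVBUF_BYTES);
        server.bind(local);
        return server;
    }

    // Connecting side: set the option before connect.
    static SocketChannel openClient(InetSocketAddress remote) throws IOException {
        SocketChannel ch = SocketChannel.open();
        ch.setOption(StandardSocketOptions.SO_RCVBUF, RCVBUF_BYTES);
        ch.connect(remote);
        return ch;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port on the loopback interface.
        InetSocketAddress local = new InetSocketAddress(InetAddress.getLoopbackAddress(), 0);
        try (ServerSocketChannel server = openServer(local);
             SocketChannel client = openClient((InetSocketAddress) server.getLocalAddress());
             SocketChannel accepted = server.accept()) {
            // The kernel may adjust the requested value, so print what is actually in effect.
            System.out.println("server SO_RCVBUF:   " + server.getOption(StandardSocketOptions.SO_RCVBUF));
            System.out.println("accepted SO_RCVBUF: " + accepted.getOption(StandardSocketOptions.SO_RCVBUF));
            System.out.println("client SO_RCVBUF:   " + client.getOption(StandardSocketOptions.SO_RCVBUF));
        }
    }
}
```

A smaller fixed buffer also caps how much unread data can pile up per connection, which matters with 10,000 sockets that are each drained only a few times a second.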