Linux – High softirq when using rate control networking

kernel, linux, networking, performance

I was going to post this in ServerFault originally but I thought this might be a better place. Let me know if you think there is a better place to post this question.

I have a user-space application that performs networking through Java NIO (which uses epoll on Linux). For demonstration and diagnostic purposes, I have a line-testing utility. It's basically the same thing as iperf.

Some information about the environment and how the test is run.

  • Ubuntu 16.04 Desktop updated today (4.4.0-34-generic)
  • irqbalance is off
  • Intel X540T1 10GbE (ixgbe) receiver <-> Solarflare 10GbE (sfc) sender
  • Uses 10,000 TCP sockets
  • Sockets use the OS default configurations
  • The user-space read buffer is 32KB
  • Reading occurs at no more than 40 Hz

The line test consists of a single client that transmits as much information as possible over the TCP sockets.

  • read() may be called more than once per socket per tick to obtain up to 96KB per tick (the 32KB buffer would have to be read 3 times to hit the ceiling)
  • This means that at 40 Hz with the 96KB ceiling, read() can be called up to 120 times per second per connection, reading a total of 3,840KB.
  • The line tester shows that read() is called about 110,000 times a second in total.
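The ceiling arithmetic in the list above can be checked with a short sketch. The class and method names are illustrative only and are not part of the line tester; the constants are the figures quoted in the question. Three full reads of a 32KB buffer come to 96KB per tick, which is the value consistent with the 3,840KB per-second total.

```java
public class ReadCeiling {
    // Figures quoted in the question; names are illustrative, not from the real tool.
    static final int BUFFER_KB = 32;        // user-space read buffer
    static final int READS_PER_TICK = 3;    // reads needed to hit the per-tick ceiling
    static final int TICK_HZ = 40;          // maximum read frequency
    static final int CONNECTIONS = 10_000;
    static final int OBSERVED_READS_PER_SEC = 110_000;

    static int kbPerTick() { return BUFFER_KB * READS_PER_TICK; }

    static int maxReadsPerSecond() { return READS_PER_TICK * TICK_HZ; }

    static int maxKbPerSecond() { return kbPerTick() * TICK_HZ; }

    // Average read() calls per connection per second actually observed.
    static int observedReadsPerConnection() { return OBSERVED_READS_PER_SEC / CONNECTIONS; }

    public static void main(String[] args) {
        System.out.println("KB per tick:                 " + kbPerTick());                  // 96
        System.out.println("max reads/s per connection:  " + maxReadsPerSecond());          // 120
        System.out.println("max KB/s per connection:     " + maxKbPerSecond());             // 3840
        System.out.println("observed reads/s per conn:   " + observedReadsPerConnection()); // 11
    }
}
```

The observed figure of roughly 11 reads per second per connection is far below the 120 ceiling, which suggests the sockets are drained well before the per-tick limit applies in the unthrottled test.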

The line test easily saturates the 10GbE adapter while using only about 8% softirq:

top - 22:04:29 up 51 min,  1 user,  load average: 1.31, 1.02, 0.66
Tasks: 258 total,   1 running, 257 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.2 us,  3.6 sy,  0.0 ni, 85.6 id,  1.1 wa,  0.0 hi,  7.4 si,  0.0 st
KiB Mem : 16378912 total, 12909832 free,  2383088 used,  1085992 buff/cache
KiB Swap: 16721916 total, 16721916 free,        0 used. 13746736 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
 4922 jon       20   0 1553556 492552 127160 S 125.0  3.0   0:54.61 firefox     
 5099 jon       20   0 7212040 218396  16872 S  75.0  1.3   2:59.88 java        
 3194 root      20   0  722144 163812 134052 S  18.8  1.0   1:25.63 Xorg        
 4149 jon       20   0 1588648 147848  75344 S   6.2  0.9   0:28.63 compiz      
 4197 jon       20   0  544660  40600  26804 S   6.2  0.2   0:01.20 indicator-+ 
 5186 jon       20   0   41948   3696   3084 R   6.2  0.0   0:00.01 top         
    1 root      20   0  119744   5884   3964 S   0.0  0.0   0:00.84 systemd     
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd    
    3 root      20   0       0      0      0 S   0.0  0.0   5:01.01 ksoftirqd/0 
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:+ 
    7 root      20   0       0      0      0 S   0.0  0.0   0:01.06 rcu_sched   
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh      
    9 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/0 
   10 root      rt   0       0      0      0 S   0.0  0.0   0:00.04 watchdog/0  
   11 root      rt   0       0      0      0 S   0.0  0.0   0:00.01 watchdog/1  
   12 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/1 
   13 root      20   0       0      0      0 S   0.0  0.0   0:08.16 ksoftirqd/1

cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
  0:         17          0          0          0          0          0          0          0  IR-IO-APIC   2-edge      timer
  1:          0          1          0          0          1          0          0          0  IR-IO-APIC   1-edge      i8042
  5:          0          0          0          0          0          0          0          0  IR-IO-APIC   5-edge      parport0
  8:          0          0          0          0          0          1          0          0  IR-IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0  IR-IO-APIC   9-fasteoi   acpi
 12:          2          0          1          0          1          0          0          0  IR-IO-APIC  12-edge      i8042
 16:         50          6          2          6         10          0          0          3  IR-IO-APIC  16-fasteoi   ehci_hcd:usb1
 17:       1138         35         14         24        227         25         35         24  IR-IO-APIC  17-fasteoi   snd_hda_intel
 19:          0          1          0          0          0          1          0          0  IR-IO-APIC  19-fasteoi   firewire_ohci
 23:         11          4         10          1          7          0          0          0  IR-IO-APIC  23-fasteoi   ehci_hcd:usb2
 24:          0          0          0          0          0          0          0          0  DMAR-MSI   0-edge      dmar0
 27:       4571       1431       1142        812       1286       1442        985        730  IR-PCI-MSI 327680-edge      xhci_hcd
 28:      26230       3078       1744       1325       6297       2715       1703       1258  IR-PCI-MSI 512000-edge      0000:00:1f.2
 29:        754         43         28         30        215        176        129         76  IR-PCI-MSI 2097152-edge      eth0-rx-0
 30:          0          0          0          0          0          0          0          0  IR-PCI-MSI 2097153-edge      eth0-tx-0
 31:          0          0          0          0          1          0          0          0  IR-PCI-MSI 2097154-edge      eth0
 32:        757         64         28         33        205        169        129         66  IR-PCI-MSI 2621440-edge      eth1-rx-0
 33:          0          0          0          0          0          0          0          0  IR-PCI-MSI 2621441-edge      eth1-tx-0
 34:          1          0          0          0          0          0          0          0  IR-PCI-MSI 2621442-edge      eth1
 35:    1042128     233608      58916      16705    1612687    1484813    1121118     630363  IR-PCI-MSI 1048576-edge      enp2s0-TxRx-0
 36:     858271     736510     372134     165262    1704892    1127381    1265752     767377  IR-PCI-MSI 1048577-edge      enp2s0-TxRx-1
 37:     816359     711664     426719     192686    1475309    1307882     807216     712562  IR-PCI-MSI 1048578-edge      enp2s0-TxRx-2
 38:     934786     714007     432100     217627    1905295    1622682    1150693     517990  IR-PCI-MSI 1048579-edge      enp2s0-TxRx-3
 39:          0          0          0          0   14185366          0          0          0  IR-PCI-MSI 1048580-edge      enp2s0-TxRx-4
 40:          0          0          0          0          0   14332864          0          0  IR-PCI-MSI 1048581-edge      enp2s0-TxRx-5
 41:          0          0          0          0          0          0   14617282          0  IR-PCI-MSI 1048582-edge      enp2s0-TxRx-6
 42:          0          0          0          0          0          0          0   14840029  IR-PCI-MSI 1048583-edge      enp2s0-TxRx-7
 43:         57         88         47         34         77         64         75         58  IR-PCI-MSI 1048584-edge      enp2s0
 44:          0          0          0          0          0         13          1          1  IR-PCI-MSI 360448-edge      mei_me
 45:        246         20         30          4        345        132        128        142  IR-PCI-MSI 442368-edge      snd_hda_intel
 46:      63933       9794       7233       4753      28843      19323      17678      11191  IR-PCI-MSI 524288-edge      nvidia
NMI:         57         43         35         42        103         98         83         76   Non-maskable interrupts
LOC:     300755     258293     257168     289802     373725     262211     218677     196510   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:         57         43         35         42        103         98         83         76   Performance monitoring interrupts
IWI:          0          0          0          0          1          0          0          0   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:    7721466    2192716    1958606    3095012    1106115    1189666     309133     169884   Rescheduling interrupts
CAL:       2598       2206       2194       1751       1976       2255       2130       2211   Function call interrupts
TLB:       5450       6659       6103       5640       4352       5128       4535       4470   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:         11         11         11         11         11         11         11         11   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event

Now, let's apply rate control to the socket reader.

  • Inbound rate control is set to 50KB/s per connection
  • That is about 500MB/s in total, since we have 10,000 connections
  • Rate control sets the reading frequency to 5 Hz, down from 40 Hz in the previous example.
  • The rate-control reads are not aligned: connections do not all tick from the same starting reference, but they are all governed by a single clock.
  • The clock runs at 40 Hz, meaning there are 40 opportunities per second for scheduled rate-control reads to occur.
  • During each of those 5 Hz rate-control reads, the socket is only allowed to read up to 10KB. So, 5 times a second it reads 10KB out of the socket buffer.
  • The line tester shows that read() is called about 47,000 times a second in total.
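The schedule described in the list above can be sketched as follows. This is a minimal illustration, not the application's actual scheduler: the phase assignment (`connectionId % 8`) is an assumption made for the example, since the question only says that connections do not share a starting reference.

```java
public class StaggeredRateControl {
    // Figures from the question.
    static final int CLOCK_HZ = 40;   // single governing clock
    static final int READ_HZ = 5;     // rate-controlled reads per connection per second
    static final int MAX_READ_KB = 10; // cap per rate-controlled read
    static final int TICKS_PER_READ = CLOCK_HZ / READ_HZ; // 8 clock ticks between reads

    // Hypothetical phase assignment: spreads connections evenly over the 8 slots.
    static int phaseOf(int connectionId) {
        return connectionId % TICKS_PER_READ;
    }

    // A connection is due for a read when the clock tick lands on its phase.
    static boolean isDue(int connectionId, long tick) {
        return tick % TICKS_PER_READ == phaseOf(connectionId);
    }

    static int connectionsDue(int connections, long tick) {
        int due = 0;
        for (int id = 0; id < connections; id++) {
            if (isDue(id, tick)) due++;
        }
        return due;
    }

    static long aggregateKbPerSecond(int connections) {
        return (long) connections * READ_HZ * MAX_READ_KB;
    }

    public static void main(String[] args) {
        int connections = 10_000;
        System.out.println("due per clock tick: " + connectionsDue(connections, 0));     // 1250
        System.out.println("scheduled reads/s:  " + connections * READ_HZ);              // 50000
        System.out.println("aggregate KB/s:     " + aggregateKbPerSecond(connections));  // 500000
    }
}
```

Under these assumptions the schedule yields 50,000 reads per second, close to the roughly 47,000 the line tester reports, and 500,000KB/s in aggregate, which matches the stated target of about 500MB/s.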

Softirq usage jumps from 8% to 50-65%, the number of interrupts almost triples, and there are 26-58 million RES interrupts (per core) compared to 1-7 million before.

top - 22:31:50 up  1:19,  1 user,  load average: 2.30, 2.30, 1.96
Tasks: 259 total,   2 running, 257 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.3 us,  5.5 sy,  0.0 ni, 41.2 id,  0.0 wa,  0.0 hi, 50.0 si,  0.0 st
KiB Mem : 16378912 total, 11752520 free,  2189080 used,  2437312 buff/cache
KiB Swap: 16721916 total, 16721916 free,        0 used. 12590400 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
    3 root      20   0       0      0      0 S  82.1  0.0  26:57.43 ksoftirqd/0 
 5194 jon       20   0 7212040 233488  16720 S  46.2  1.4  12:08.73 java        
   28 root      20   0       0      0      0 S  40.2  0.0   9:04.84 ksoftirqd/4 
   33 root      20   0       0      0      0 S  30.9  0.0   7:26.84 ksoftirqd/5 
   43 root      20   0       0      0      0 R  21.6  0.0   4:26.41 ksoftirqd/7 
   38 root      20   0       0      0      0 S  21.3  0.0   5:37.16 ksoftirqd/6 
 4922 jon       20   0 1533388 475124 127784 S   5.6  2.9   2:41.82 firefox     
 3194 root      20   0  722448 163872 134052 S   5.3  1.0   2:50.84 Xorg        
 5154 jon       20   0  589896  83876  53964 S   1.7  0.5   0:26.08 plugin-con+ 
   13 root      20   0       0      0      0 S   1.3  0.0   0:42.60 ksoftirqd/1 
 4548 jon       20   0 5492168 634252  43104 S   1.3  3.9   2:18.86 java        
 4149 jon       20   0 1604016 169732  75348 S   1.0  1.0   0:52.62 compiz      
   18 root      20   0       0      0      0 S   0.7  0.0   0:35.31 ksoftirqd/2 
   23 root      20   0       0      0      0 S   0.3  0.0   0:22.65 ksoftirqd/3 

cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
  0:         17          0          0          0          0          0          0          0  IR-IO-APIC   2-edge      timer
  1:          0          1          0          0          1          0          0          0  IR-IO-APIC   1-edge      i8042
  5:          0          0          0          0          0          0          0          0  IR-IO-APIC   5-edge      parport0
  8:          0          0          0          0          0          1          0          0  IR-IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0  IR-IO-APIC   9-fasteoi   acpi
 12:          2          0          1          0          1          0          0          0  IR-IO-APIC  12-edge      i8042
 16:         50          6          2          6         10          0          0          3  IR-IO-APIC  16-fasteoi   ehci_hcd:usb1
 17:       1138         35         14         24        227         25         35         24  IR-IO-APIC  17-fasteoi   snd_hda_intel
 19:          0          1          0          0          0          1          0          0  IR-IO-APIC  19-fasteoi   firewire_ohci
 23:         11          4         10          1          7          0          0          0  IR-IO-APIC  23-fasteoi   ehci_hcd:usb2
 24:          0          0          0          0          0          0          0          0  DMAR-MSI   0-edge      dmar0
 27:       6518       1966       1471       1031       4361       3847       2501       1673  IR-PCI-MSI 327680-edge      xhci_hcd
 28:      26732       3381       1957       1447       6687       3367       2112       1502  IR-PCI-MSI 512000-edge      0000:00:1f.2
 29:        930        184        150        114        283        344        232        142  IR-PCI-MSI 2097152-edge      eth0-rx-0
 30:          0          0          0          0          0          0          0          0  IR-PCI-MSI 2097153-edge      eth0-tx-0
 31:          0          0          0          0          1          0          0          0  IR-PCI-MSI 2097154-edge      eth0
 32:        899        234        138        104        277        348        236        143  IR-PCI-MSI 2621440-edge      eth1-rx-0
 33:          0          0          0          0          0          0          0          0  IR-PCI-MSI 2621441-edge      eth1-tx-0
 34:          1          0          0          0          0          0          0          0  IR-PCI-MSI 2621442-edge      eth1
 35:    1339704     330929      97391      31445    2023348    1859243    1369358     782238  IR-PCI-MSI 1048576-edge      enp2s0-TxRx-0
 36:    1863223    3328011    1764431     788048    2411300    2677922    2540016    1742062  IR-PCI-MSI 1048577-edge      enp2s0-TxRx-1
 37:    1911973    3426913    2084294     955668    2216702    2894499    2008907    1723010  IR-PCI-MSI 1048578-edge      enp2s0-TxRx-2
 38:    2064515    3379490    2155421    1093171    2652077    3162801    2369659    1442568  IR-PCI-MSI 1048579-edge      enp2s0-TxRx-3
 39:          0          0          0          0   23079493          0          0          0  IR-PCI-MSI 1048580-edge      enp2s0-TxRx-4
 40:          0          0          0          0          0   23379687          0          0  IR-PCI-MSI 1048581-edge      enp2s0-TxRx-5
 41:          0          0          0          0          0          0   24721093          0  IR-PCI-MSI 1048582-edge      enp2s0-TxRx-6
 42:          0          0          0          0          0          0          0   25752073  IR-PCI-MSI 1048583-edge      enp2s0-TxRx-7
 43:        211        430        277        179        142        219        240        197  IR-PCI-MSI 1048584-edge      enp2s0
 44:          0          0          0          0          0         13          1          1  IR-PCI-MSI 360448-edge      mei_me
 45:        246         20         30          4        345        132        128        142  IR-PCI-MSI 442368-edge      snd_hda_intel
 46:      87961      29805      21965      14718      43334      42053      34617      23830  IR-PCI-MSI 524288-edge      nvidia
NMI:        218        130        107        105        252        247        225        214   Non-maskable interrupts
LOC:     716630     636798     640606     679852     641275     555921     488433     446196   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:        218        130        107        105        252        247        225        214   Performance monitoring interrupts
IWI:          0          0          0          0          3          0          0          0   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:   38554509    4165414    4123561    5839087    2680226    2883656    1297965     812274   Rescheduling interrupts
CAL:       3292       2356       2373       2014       2215       2496       2375       2474   Function call interrupts
TLB:      10997      21211      21364      22716      11757      23899      28023      27646   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:         17         17         17         17         17         17         17         17   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event
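
Because the counters in /proc/interrupts are cumulative since boot, the growth between two snapshots is more telling than either snapshot alone. The sketch below subtracts the RES rows of the two dumps above, per CPU. The class name and parsing helper are illustrative, and the result covers everything that ran between the two captures, not only the rate-controlled test.

```java
import java.util.ArrayList;
import java.util.List;

public class ResDelta {
    // Parses the per-CPU counters from one /proc/interrupts row, e.g. "RES: 1 2 3 Rescheduling interrupts".
    static long[] parseCounts(String line) {
        String[] tokens = line.trim().split("\\s+");
        List<Long> counts = new ArrayList<>();
        // tokens[0] is the row label; numeric columns follow until the description starts.
        for (int i = 1; i < tokens.length && tokens[i].matches("\\d+"); i++) {
            counts.add(Long.parseLong(tokens[i]));
        }
        long[] out = new long[counts.size()];
        for (int i = 0; i < out.length; i++) out[i] = counts.get(i);
        return out;
    }

    static long[] delta(String before, String after) {
        long[] b = parseCounts(before);
        long[] a = parseCounts(after);
        long[] d = new long[Math.min(a.length, b.length)];
        for (int i = 0; i < d.length; i++) d[i] = a[i] - b[i];
        return d;
    }

    // RES rows copied from the first and second dumps in the question.
    static final String BEFORE =
        "RES:    7721466    2192716    1958606    3095012    1106115    1189666     309133     169884   Rescheduling interrupts";
    static final String AFTER =
        "RES:   38554509    4165414    4123561    5839087    2680226    2883656    1297965     812274   Rescheduling interrupts";

    public static void main(String[] args) {
        long[] d = delta(BEFORE, AFTER);
        for (int cpu = 0; cpu < d.length; cpu++) {
            System.out.println("CPU" + cpu + " RES delta: " + d[cpu]);
        }
    }
}
```

For these two rows, CPU0 accounts for about 30.8 million of the additional rescheduling interrupts, while the other cores each add well under 3 million.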

Can anyone explain why this is happening and possibly how to avoid it?

For reference, here is top when using outbound rate control at 500MB/s:

top - 01:26:15 up  4:13,  1 user,  load average: 0.38, 0.31, 1.00
Tasks: 254 total,   1 running, 253 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.7 us,  3.7 sy,  0.0 ni, 93.3 id,  0.1 wa,  0.0 hi,  1.2 si,  0.0 st
KiB Mem : 16378912 total, 12912528 free,  2209912 used,  1256472 buff/cache
KiB Swap: 16721916 total, 16721916 free,        0 used. 13873312 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
 6560 jon       20   0 7212040 204656  16836 S  38.9  1.2   0:21.37 java        
 3194 root      20   0  871176 206844 175404 S   1.0  1.3  12:11.62 Xorg        
 4149 jon       20   0 1909092 221972  99348 S   0.7  1.4   3:21.75 compiz      
 4548 jon       20   0 5879804 662312  45948 S   0.7  4.0   6:48.86 java        
 3940 jon       20   0  350840  13196   5468 S   0.3  0.1   0:20.41 ibus-daemon 
 4922 jon       20   0 1779380 686992 145824 S   0.3  4.2  20:38.42 firefox     
 5827 root      20   0       0      0      0 S   0.3  0.0   0:00.64 kworker/4:1 
 6341 root      20   0       0      0      0 S   0.3  0.0   0:00.93 kworker/1:2 
 6539 root      20   0       0      0      0 S   0.3  0.0   0:00.31 kworker/0:2 
    1 root      20   0  185280   5896   3964 S   0.0  0.0   0:01.01 systemd     
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.02 kthreadd    
    3 root      20   0       0      0      0 S   0.0  0.0 107:56.20 ksoftirqd/0 
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:+ 

Attaching 2,500 TCP connections and using rate control gives an internal TCP outbound packet rate of 20K pps; going to 5,000 TCP connections raises that to 105K pps, and 7,500 TCP connections raises it to 190K pps (I assume these are just the packets acknowledging reads).

2: Putting the Solarflare card on the server and the Intel X540T1 on the client, I see the IRQ load pinned to ksoftirqd/0, which uses 100% of its core, and total si at 12.5%, which is about one core. With Solarflare, the RES interrupts don't exceed 10,000 per core.

The following is the server when using the Solarflare card, but only about 360-400MB/s is being received instead of the target 500MB/s:

top - 11:07:55 up 16 min,  1 user,  load average: 1.49, 1.09, 0.62
Tasks: 259 total,   3 running, 256 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.5 us,  2.5 sy,  0.0 ni, 83.5 id,  0.0 wa,  0.0 hi, 12.5 si,  0.0 st
KiB Mem : 16378912 total, 12294300 free,  2356136 used,  1728476 buff/cache
KiB Swap: 16721916 total, 16721916 free,        0 used. 13067464 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND     
    3 root      20   0       0      0      0 R  99.7  0.0   5:20.82 ksoftirqd/0 
 4620 jon       20   0 7212040 246176  16712 S  25.6  1.5   1:24.67 java        
 3241 root      20   0  716936 161772 133628 R   3.3  1.0   0:15.42 Xorg        
 4659 jon       20   0  654928  36356  27820 S   1.0  0.2   0:00.63 gnome-term+ 
 4103 jon       20   0 1567768 141048  75340 S   0.7  0.9   0:06.44 compiz      
 4542 jon       20   0 5688204 601804  43040 S   0.7  3.7   1:03.91 java        
    7 root      20   0       0      0      0 S   0.3  0.0   0:00.93 rcu_sched   
 4538 root      20   0       0      0      0 S   0.3  0.0   0:00.68 kworker/4:2 
    1 root      20   0  119844   5980   4028 S   0.0  0.0   0:00.84 systemd     
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd    
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:+ 
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh      
    9 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/0 
   10 root      rt   0       0      0      0 S   0.0  0.0   0:00.02 watchdog/0  
   11 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 watchdog/1  
   12 root      rt   0       0      0      0 S   0.0  0.0   0:00.00 migration/1 
   13 root      20   0       0      0      0 S   0.0  0.0   0:00.02 ksoftirqd/1 

cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
  0:         17          0          0          0          0          0          0          0  IR-IO-APIC   2-edge      timer
  1:          1          0          0          1          0          0          0          0  IR-IO-APIC   1-edge      i8042
  5:          0          0          0          0          0          0          0          0  IR-IO-APIC   5-edge      parport0
  8:          0          0          0          0          0          1          0          0  IR-IO-APIC   8-edge      rtc0
  9:          0          0          0          0          0          0          0          0  IR-IO-APIC   9-fasteoi   acpi
 12:          1          0          1          0          1          0          1          0  IR-IO-APIC  12-edge      i8042
 16:         61          2          1          3          7          2          1          0  IR-IO-APIC  16-fasteoi   ehci_hcd:usb1
 17:       1166         55         10         19        245         45         13         19  IR-IO-APIC  17-fasteoi   snd_hda_intel
 19:          0          0          0          0          2          0          0          0  IR-IO-APIC  19-fasteoi   firewire_ohci
 23:         26          1          2          0          1          2          0          1  IR-IO-APIC  23-fasteoi   ehci_hcd:usb2
 24:          0          0          0          0          0          0          0          0  DMAR-MSI   0-edge      dmar0
 27:       1723        170        168        126       1603        166        135         47  IR-PCI-MSI 327680-edge      xhci_hcd
 28:      24980       1714        933        754       7492       1546       1202        936  IR-PCI-MSI 512000-edge      0000:00:1f.2
 29:        298          2          1          7        159          4          6          1  IR-PCI-MSI 2097152-edge      eth0-rx-0
 30:          0          0          0          0          0          0          0          0  IR-PCI-MSI 2097153-edge      eth0-tx-0
 31:          1          0          0          0          0          0          0          0  IR-PCI-MSI 2097154-edge      eth0
 32:      16878       5179       2952       3044      18575       7842       3822       3939  IR-PCI-MSI 1048576-edge      enp2s0f0-0
 33:      16174       4967       2787       2583      19305       7883       3507       3862  IR-PCI-MSI 1048577-edge      enp2s0f0-1
 34:      16707       5192       2952       2659      18031       8588       3496       4393  IR-PCI-MSI 1048578-edge      enp2s0f0-2
 35:      17726       5431       2951       2746      17183       8105       3529       4238  IR-PCI-MSI 1048579-edge      enp2s0f0-3
 36:          6          1          0          3          6          3          0          1  IR-PCI-MSI 1050624-edge      enp2s0f1-0
 37:          1          1          0          0          0          0          0          0  IR-PCI-MSI 1050625-edge      enp2s0f1-1
 38:          1          1          0          0          0          0          0          0  IR-PCI-MSI 1050626-edge      enp2s0f1-2
 39:          1          1          0          0          0          0          0          0  IR-PCI-MSI 1050627-edge      enp2s0f1-3
 40:        414         12          9          3          0         14         18          8  IR-PCI-MSI 2621440-edge      eth1-rx-0
 41:          0          0          0          0          0          0          0          0  IR-PCI-MSI 2621441-edge      eth1-tx-0
 42:          1          0          0          0          0          0          0          0  IR-PCI-MSI 2621442-edge      eth1
 43:          0          0          0          0         10          0          5          0  IR-PCI-MSI 360448-edge      mei_me
 44:         95         26          8         33        398        384         51         16  IR-PCI-MSI 442368-edge      snd_hda_intel
 45:      17400       1413       1135        806      17781       1714       1401        988  IR-PCI-MSI 524288-edge      nvidia
NMI:         37          3          5          3          2          1          1          1   Non-maskable interrupts
LOC:     112894      53399      87350      46718      43552      19663      25436      19705   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:         37          3          5          3          2          1          1          1   Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0   IRQ work interrupts
RTR:          0          0          0          0          0          0          0          0   APIC ICR read retries
RES:       1808       7668       9364       1244       4161       2554       9171        954   Rescheduling interrupts
CAL:       1900       2028       1497       1984       1862       1931       2118       2004   Function call interrupts
TLB:       1991       2539       3176       2985       3176       2458       1612       2087   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:          5          5          5          5          5          5          5          5   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event

Best Answer

The problem turned out to be that using rate control with default-configured sockets created a situation where the internal TCP buffer was automatically growing larger and larger because of the slow read-out times (the default maximum size is around 6MB). As the buffer grew, the TCP compaction process started to churn heavily, eating up softirq time. The fix is to set an explicit TCP buffer size when using rate control, which prevents this aberrant behavior.
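
As a minimal sketch of that fix in Java NIO, the receive buffer can be set on the listening channel before it is bound, so that accepted connections start with the same size. The 64KB value, the class name, and the helper are assumptions for illustration; the answer does not say which size was used. On Linux, setting SO_RCVBUF explicitly turns off receive-buffer auto-tuning for that socket, and the value reported back may be larger than the one requested.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

public class FixedReceiveBuffer {
    // Assumed size for illustration only; tune it to the per-connection rate and read interval.
    static final int RCVBUF_BYTES = 64 * 1024;

    // Opens a listener with an explicit receive buffer, set before bind().
    static ServerSocketChannel openListener(int port, int rcvBufBytes) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.setOption(StandardSocketOptions.SO_RCVBUF, rcvBufBytes);
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 lets the OS pick a free port; a real server would use its own address and port.
        try (ServerSocketChannel server = openListener(0, RCVBUF_BYTES)) {
            int effective = server.getOption(StandardSocketOptions.SO_RCVBUF);
            System.out.println("requested " + RCVBUF_BYTES + " bytes, kernel reports " + effective);
        }
    }
}
```

For client-side channels the equivalent is calling `setOption(StandardSocketOptions.SO_RCVBUF, ...)` on the `SocketChannel` before `connect()`.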
