As I mentioned in my final edit, the reason I cannot get higher bandwidth using round-robin bonding when the switch has a Link Aggregation Group configured is that switch LAGs do not round-robin-stripe the packets of a single TCP connection, whereas Linux bonding does. This is mentioned in the kernel.org docs:
12.1.1 MT Bonding Mode Selection for Single Switch Topology
This configuration is the easiest to set up and to understand, although you will have to decide which bonding mode best suits your
needs. The trade offs for each mode are detailed below:
balance-rr: This mode is the only mode that will permit a single TCP/IP connection to stripe traffic across multiple interfaces. It is
therefore the only mode that will allow a single TCP/IP stream to
utilize more than one interface's worth of throughput. This comes at a
cost, however: the striping generally results in peer systems
receiving packets out of order, causing TCP/IP's congestion control
system to kick in, often by retransmitting segments.
It is possible to adjust TCP/IP's congestion limits by altering the net.ipv4.tcp_reordering sysctl parameter. The usual default value
is 3. But keep in mind TCP stack is able to automatically increase
this when it detects reorders.
Note that the fraction of packets that will be delivered out of order is highly variable, and is unlikely to be zero. The level of
reordering depends upon a variety of factors, including the networking
interfaces, the switch, and the topology of the configuration.
Speaking in general terms, higher speed network cards produce more
reordering (due to factors such as packet coalescing), and a "many to
many" topology will reorder at a higher rate than a "many slow to one
fast" configuration.
Many switches do not support any modes that stripe traffic (instead choosing a port based upon IP or MAC level addresses); for
those devices, traffic for a particular connection flowing through the
switch to a balance-rr bond will not utilize greater than one
interface's worth of bandwidth.
If you are utilizing protocols other than TCP/IP, UDP for example, and your application can tolerate out of order delivery, then this
mode can allow for single stream datagram performance that scales near
linearly as interfaces are added to the bond.
This mode requires the switch to have the appropriate ports configured for "etherchannel" or "trunking."
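Putting the two quoted points together, a balance-rr bond plus a raised reordering threshold can be sketched with iproute2 and sysctl. The interface names and the threshold value below are assumptions for illustration, not values from my setup:

```shell
# Build a balance-rr bond from two NICs (eth0/eth1 are placeholder names)
ip link add bond0 type bond mode balance-rr
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0
ip link set bond0 up

# Check the current TCP reordering threshold (the usual default is 3) ...
sysctl net.ipv4.tcp_reordering
# ... and raise it so round-robin striping triggers fewer spurious
# fast retransmits (example value; the stack can also raise it itself)
sysctl -w net.ipv4.tcp_reordering=10
```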
The last note about having ports configured for "trunking" is odd, since when I put the ports in a LAG, all outgoing Tx from the switch goes down a single port. Removing the LAG makes it send and receive half-and-half on each port, but results in many retransmits, I assume due to out-of-order packets. Even so, I still get an increase in bandwidth.
Best Answer
Shortly after posting, I found the problem:
So it appears that nc will cycle through all A records for a given host and test each one individually. The first failure was for the incorrect IP address; the success was for the correct one.
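The behaviour above can be seen directly: a client like nc walks the ordered address list the resolver returns for a name. A small Python sketch (the hostname and port are placeholders; "localhost" is used so it resolves without external DNS):

```python
import socket

def a_records(host, port):
    """Return every IPv4 address the resolver reports for host,
    in the order a client like nc would try them."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return [sockaddr[0] for _, _, _, _, sockaddr in infos]

# A host with multiple A records yields several entries here,
# and nc attempts a connection to each in turn.
print(a_records("localhost", 22))
```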