Rate limit network but allow bursting per TCP connection before limiting

bandwidth, networking

We have a Cisco router that allows rate limiting (Cisco calls it policing) while permitting bursting on a per-TCP-connection basis. For example, we can cap the bandwidth at 50 Mbit/s, but the cap isn't imposed until 4 megabytes have been transferred. This is enforced separately for each TCP connection.

Is there some way to do this in Linux? Also, are there any drawbacks to such a solution? In case it's helpful to anyone, the Cisco command for setting the bursting is the third parameter to the police command which is run under a policy-map (at least on our ASA 5505).

The goal is to let a server take advantage of 95/5 bursting and serve web pages as quickly as possible for normal users, while reducing the chance of bursting more than 5% of the time (for example during a server-to-server transfer or when large files are downloaded from a website). I understand that this might not help against a DDoS attack that went on too long, but for various reasons that's not a concern here.

Best Answer

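This is doable in Linux with iptables and tc. You configure iptables to MARK packets on any connection that has already transferred more than some number of bytes. You then use tc to put those marked packets into a class in a queuing discipline that rate-limits the bandwidth. Note that in the example below the 50mbit class is shared by all marked connections together; it is not a separate 50mbit allowance for each connection.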

One somewhat tricky part is limiting the connection for both uploads and downloads, because tc cannot shape ingress traffic directly. You can get around this by shaping egress on your webserver-facing interface (which shapes downloads to your webserver) and shaping egress on your upstream-provider-facing interface (which shapes uploads from your webserver). You aren't really shaping the ingress (download) traffic, as you can't control how quickly your upstream provider sends data to you. However, shaping on the webserver-facing interface results in packets being delayed or dropped, and the uploader's TCP stack responds by shrinking its window to fit the bandwidth limit.

Example (assumes this runs on a Linux-based router, where the webserver-facing interface is eth0 and the upstream interface is eth1):

# mark the packets for connections over 4MB being forwarded out eth1
# (uploads from webserver)
iptables -t mangle -A FORWARD -p tcp -o eth1 -m connbytes --connbytes 4194304: --connbytes-dir both --connbytes-mode bytes -j MARK --set-mark 50

# mark the packets for connections over 4MB being forwarded out eth0
# (downloads to webserver)
iptables -t mangle -A FORWARD -p tcp -o eth0 -m connbytes --connbytes 4194304: --connbytes-dir both --connbytes-mode bytes -j MARK --set-mark 50

# Setup queuing discipline for server-download traffic
tc qdisc add dev eth0 root handle 1: htb
tc class add dev eth0 parent 1: classid 1:50 htb rate 50mbit

# Setup queuing discipline for server-upload traffic
tc qdisc add dev eth1 root handle 1: htb
tc class add dev eth1 parent 1: classid 1:50 htb rate 50mbit

# set the tc filters to catch the marked packets and direct them appropriately
tc filter add dev eth0 parent 1:0 protocol ip handle 50 fw flowid 1:50
tc filter add dev eth1 parent 1:0 protocol ip handle 50 fw flowid 1:50
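
The 4194304 in the connbytes rules above is simply 4 MiB expressed in bytes. As a minimal sketch for changing the burst allowance in one place, the helper below converts MiB to bytes and prints (rather than runs) the matching mark rule. The function names burst_bytes and mark_rule are hypothetical and not part of iptables.

```shell
#!/bin/sh
# Convert a burst allowance in MiB to the byte count that connbytes expects.
burst_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print (do not execute) the mark rule for one outgoing interface.
# Arguments: interface, burst allowance in MiB, mark value.
mark_rule() {
    printf 'iptables -t mangle -A FORWARD -p tcp -o %s -m connbytes --connbytes %s: --connbytes-dir both --connbytes-mode bytes -j MARK --set-mark %s\n' \
        "$1" "$(burst_bytes "$2")" "$3"
}

# Reproduce the two rules from the example (uploads out eth1, downloads out eth0).
mark_rule eth1 4 50
mark_rule eth0 4 50
```

Review the printed rules first, then run them as root or pipe the output to sh.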

If you want to do this on the webserver itself instead of on a Linux router, you can still use the upload portions of the above. One notable change is that you'd replace FORWARD with OUTPUT. For downloads you'd need to set up a queuing discipline using an "Intermediate Functional Block" device, or ifb. In short, it is a virtual interface that lets you treat ingress traffic as egress and shape it from there using tc. More info on how to set up an ifb can be found here: https://serverfault.com/questions/350023/tc-ingress-policing-and-ifb-mirroring
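
A rough sketch of the ifb approach for the download side follows. It is not taken from the linked answer and rests on several assumptions: the server's only interface is eth0, the ifb kernel module and the tc connmark action are available, and the mark is stored on the connection (CONNMARK) rather than on the packet. The last point matters because the tc ingress hook runs before netfilter, so a plain MARK set by iptables would not yet be on the packet when it reaches ifb0. Treat it as a starting point to test, not a verified configuration.

```shell
# Load the ifb module and bring up the virtual interface
modprobe ifb numifbs=1
ip link set dev ifb0 up

# Store the mark on the connection itself once it has passed 4MB
iptables -t mangle -A INPUT -p tcp -i eth0 -m connbytes --connbytes 4194304: --connbytes-dir both --connbytes-mode bytes -j CONNMARK --set-mark 50

# Redirect everything arriving on eth0 to ifb0, copying the connection mark
# onto each packet first so the fw filter on ifb0 can see it
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 action connmark action mirred egress redirect dev ifb0

# Shape on ifb0 exactly as on a normal egress interface
tc qdisc add dev ifb0 root handle 1: htb
tc class add dev ifb0 parent 1: classid 1:50 htb rate 50mbit
tc filter add dev ifb0 parent 1:0 protocol ip handle 50 fw flowid 1:50
```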

Note that this kind of setup tends to require a lot of tuning to scale. One immediate concern is that connbytes relies upon the conntrack module, which tends to hit scaling walls with large numbers of connections. I'd recommend heavy load testing.
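
One simple thing to watch during load testing is how full the conntrack table gets. The sketch below reads the kernel's counters under /proc/sys/net/netfilter (present only when the conntrack module is loaded) and prints the usage as a percentage. The helper name conntrack_pct is hypothetical.

```shell
#!/bin/sh
# Integer percentage of the conntrack table in use.
# Arguments: current entry count, maximum entries.
conntrack_pct() {
    echo $(( $1 * 100 / $2 ))
}

count_file=/proc/sys/net/netfilter/nf_conntrack_count
max_file=/proc/sys/net/netfilter/nf_conntrack_max

# Only report when the conntrack module is loaded and exposes its counters.
if [ -r "$count_file" ] && [ -r "$max_file" ]; then
    count=$(cat "$count_file")
    max=$(cat "$max_file")
    echo "conntrack: $count of $max entries ($(conntrack_pct "$count" "$max")%)"
fi
```

If the percentage approaches 100 under load, new connections risk being dropped, and nf_conntrack_max would need raising.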

Another caveat is that this doesn't work at all for UDP, since it is stateless. There are other techniques to tackle that, but it looks like your requirements are for TCP only.

Also, to undo all of the above, do the following:

# Flush the mangle FORWARD chain (don't run this if you have other stuff in there)
iptables -t mangle -F FORWARD

# Delete the queuing disciplines
tc qdisc del dev eth0 root
tc qdisc del dev eth1 root
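
If the mangle FORWARD chain holds other rules, a narrower alternative to flushing it is to delete just the two mark rules. iptables -D accepts the same rule specification that was given to -A, so this sketch assumes the rules were added exactly as in the example above.

```shell
# Delete only the two connbytes mark rules, leaving the rest of the chain intact
iptables -t mangle -D FORWARD -p tcp -o eth1 -m connbytes --connbytes 4194304: --connbytes-dir both --connbytes-mode bytes -j MARK --set-mark 50
iptables -t mangle -D FORWARD -p tcp -o eth0 -m connbytes --connbytes 4194304: --connbytes-dir both --connbytes-mode bytes -j MARK --set-mark 50
```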