Linux Bridge – Fix Virtual Machines Not Forwarding IP Packets

Tags: bridge, linux

Kernel: 5.5.8-arch1-1

I am trying to get virtual networking working using a bridge attached to my physical interface. This is a typical setup; I'm not even trying to do anything weird.

  • Bridge: br0
  • Phys interface: enp6s0f0

The problem is that Linux isn't forwarding any IP traffic out the physical interface. It's forwarding ARP traffic both ways since ARP resolution works, but no IP traffic gets sent out of enp6s0f0.

Things I've tried:

  • adding enp6s0f1 to the bridge, giving enp7s0f0 to the VM, and using a cable to link enp7s0f0 to enp6s0f1
    • same result (ARP traffic forwarded, IP traffic not)
  • stopping docker and flushing all tables
    • no change
  • disabling rp_filter
    • no change
  • using the onboard NIC
    • no change (this was actually the initial setup; I dropped this quad-port card in to see if it was the onboard NIC causing a problem)
  • pinging the VM from another machine
    • I could see the echo-request come in and saw it on br0, but it was not forwarded to the VM port (either the vnet port or enp6s0f1); see the tcpdump sketch after this list
  • enabling STP on the bridge (it was initially disabled)
    • no change
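For reference (not part of the original capture): the ARP-vs-IP asymmetry can be observed by capturing on the bridge and on the physical port at the same time, with the interface names as above.

# Terminal 1: what the bridge itself sees
tcpdump -ni br0 'arp or icmp'

# Terminal 2: what actually leaves the physical port
tcpdump -ni enp6s0f0 'arp or icmp'

# Symptom: ARP request/reply pairs appear in both captures,
# but the ICMP echo-requests only ever show up on br0.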
○ → ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp6s0f0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc mq master br0 state UP group default qlen 1000
    link/ether 00:10:18:85:1c:c0 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::210:18ff:fe85:1cc0/64 scope link 
       valid_lft forever preferred_lft forever
3: enp6s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:10:18:85:1c:c2 brd ff:ff:ff:ff:ff:ff
4: enp7s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:10:18:85:1c:c4 brd ff:ff:ff:ff:ff:ff
5: enp7s0f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:10:18:85:1c:c6 brd ff:ff:ff:ff:ff:ff
6: enp9s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b4:2e:99:a6:22:f9 brd ff:ff:ff:ff:ff:ff
7: wlp8s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 08:71:90:4e:e9:77 brd ff:ff:ff:ff:ff:ff
8: br-183e1a17d7f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:ba:03:e1:9d brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-183e1a17d7f6
       valid_lft forever preferred_lft forever
9: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:02:61:00:66 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
10: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:10:18:85:1c:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.205/24 brd 192.168.1.255 scope global dynamic noprefixroute br0
       valid_lft 9730sec preferred_lft 7930sec
    inet6 fe80::210:18ff:fe85:1cc0/64 scope link 
       valid_lft forever preferred_lft forever
11: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UNKNOWN group default qlen 1000
    link/ether fe:54:00:be:eb:3e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:febe:eb3e/64 scope link 
       valid_lft forever preferred_lft forever

○ → brctl showstp br0
br0
 bridge id      8000.001018851cc0
 designated root    1000.44e4d9d88a00
 root port         1            path cost          4
 max age          19.99         bridge max age        19.99
 hello time        1.99         bridge hello time      1.99
 forward delay        14.99         bridge forward delay      14.99
 ageing time         299.99
 hello timer           0.00         tcn timer          0.00
 topology change timer     0.00         gc timer          25.78
 flags          


enp6s0f0 (1)
 port id        8001            state            forwarding
 designated root    1000.44e4d9d88a00   path cost          4
 designated bridge  1000.44e4d9d88a00   message age timer     19.21
 designated port    800d            forward delay timer    0.00
 designated cost       0            hold timer         0.00
 flags          

vnet0 (2)
 port id        8002            state            forwarding
 designated root    1000.44e4d9d88a00   path cost        100
 designated bridge  8000.001018851cc0   message age timer      0.00
 designated port    8002            forward delay timer    0.00
 designated cost       4            hold timer         0.22
 flags          

○ → bridge -d link show
2: enp6s0f0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4 
    hairpin off guard off root_block off fastleave off learning on flood on mcast_flood on mcast_to_unicast off neigh_suppress off vlan_tunnel off isolated off enp6s0f0
8: br-183e1a17d7f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br-183e1a17d7f6 br-183e1a17d7f6
9: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master docker0 docker0
10: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 br0
11: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 100 
    hairpin off guard off root_block off fastleave off learning on flood on mcast_flood on mcast_to_unicast off neigh_suppress off vlan_tunnel off isolated off vnet0

○ → sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1

○ → sysctl net.ipv4.conf.br0.forwarding
net.ipv4.conf.br0.forwarding = 1

Best Answer

It appears that, probably because of an iptables rule added by Docker, you had the module br_netfilter loaded and active (i.e., sysctl net.bridge.bridge-nf-call-iptables returns 1). This makes bridged frames (Ethernet, layer 2) subject to iptables filtering (IP, layer 3).
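You can confirm that state directly (a minimal check; the FORWARD chain is simply where drops typically show up in this scenario):

# Is the module loaded?
lsmod | grep br_netfilter
# Is it hooked into iptables? (1 = yes)
sysctl net.bridge.bridge-nf-call-iptables
# Ping the VM and watch whether a DROP rule/policy counter climbs:
iptables -nvL FORWARD

The upstream documentation explains what this module does: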

What's bridge-netfilter?

Since Linux kernel 3.18-rc1, you have to modprobe br_netfilter to enable bridge-netfilter.

The bridge-netfilter code enables the following functionality:

{Ip,Ip6,Arp}tables can filter bridged IPv4/IPv6/ARP packets, even when encapsulated in an 802.1Q VLAN or PPPoE header. This enables the functionality of a stateful transparent firewall. All filtering, logging and NAT features of the 3 tools can therefore be used on bridged frames. Combined with ebtables, the bridge-nf code therefore makes Linux a very powerful transparent firewall. This enables, f.e., the creation of a transparent masquerading machine (i.e. all local hosts think they are directly connected to the Internet). Letting {ip,ip6,arp}tables see bridged traffic can be disabled or enabled using the appropriate proc entries, located in /proc/sys/net/bridge/:

  • bridge-nf-call-arptables

  • bridge-nf-call-iptables

  • bridge-nf-call-ip6tables

For example, this module gets loaded automatically whenever an iptables rule with the physdev match is in use, even in another network namespace.

The documentation explains the side effects caused by this module. Those side effects are intended when using it for transparent bridge firewalling. Note also that the iptables physdev match cannot work properly without it (it simply won't match anymore). The documentation also explains how to prevent its effects, especially in chapter 7:

Because of the br-nf code, there are 2 ways a frame/packet can pass through the 3 given iptables chains. The first way is when the frame is bridged, so the iptables chains are called by the bridge code. The second way is when the packet is routed.
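For illustration (a hypothetical LOG rule, not part of the original answer): the physdev match only matches on the bridged path, so it can be used to tell the two paths apart:

# Log frames that traverse FORWARD because they were bridged;
# routed packets also traverse FORWARD, but without physdev info:
iptables -I FORWARD -m physdev --physdev-is-bridged -j LOG --log-prefix "br-nf: "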

Rather than disabling this module's effect on iptables like this:

sysctl -w net.bridge.bridge-nf-call-iptables=0

one should adapt one's iptables rules as explained in chapter 7 to avoid the side effects. Otherwise, other parts of the system that rely on this filtering may be disrupted.
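The classic chapter 7 adaptation is to exempt bridged frames from FORWARD filtering explicitly (a sketch; it must be merged carefully with the existing ruleset, e.g. Docker's):

iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT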

Until kernel 5.3 this module was not namespace-aware, and merely loading it enabled it in all network namespaces at once, causing all kinds of trouble where it wasn't expected. Since then it is also possible to enable it per bridge (ip link set dev BRIDGE type bridge nf_call_iptables 1) rather than per namespace.
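Applied to this setup, that selective approach would look like the following sketch (the per-bridge flag is an opt-in that combines with the namespace-wide sysctl, and hooking only docker0 is an assumption about which bridge still needs it):

# Stop hooking every bridge in this namespace into iptables...
sysctl -w net.bridge.bridge-nf-call-iptables=0
# ...then opt back in only the bridges that need it:
ip link set dev docker0 type bridge nf_call_iptables 1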

Once tools (Docker...) and kernels (>= 5.3) catch up, having it enabled only in selected network namespaces and bridges should suffice, but today that is probably not yet an option. Also note that kernel 5.3 gained native stateful bridge firewalling, usable by nftables, which will probably make this module obsolete before long (once direct encapsulation/decapsulation support in the bridge for VLAN and PPPoE is available):

Netfilter

Add native connection tracking support for the bridge. Before this patchset, only chance for people to do stateful filtering is to use the br_netfilter emulation layer, this is a step forward to deprecate it
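As a taste of that nftables-native approach, a minimal stateful bridge ruleset might look like this (a sketch, assuming a kernel >= 5.3 with nf_conntrack_bridge; the table and chain names are arbitrary):

# Stateful filtering directly in the bridge family, no br_netfilter needed:
nft add table bridge filter
nft add chain bridge filter forward '{ type filter hook forward priority 0; policy accept; }'
nft add rule bridge filter forward ct state established,related accept
nft add rule bridge filter forward ct state invalid drop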
