Linux – Why would the Linux host suddently stop receiving multicast? All other nics on private network are receiving

linuxmulticastnetworkingudpxen

Here is my dilemma. Suddenly, as of yesterday, multicast packets are no longer being received from eth1 (private gigabit network) from one node. Routing between all the nodes is fine, no collisions, packet loss, etc.

ifconfig info, inet addr, Bcast, Mask, are all fine – they all share the same bcast and netmask. Also, they all share: UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 on eth1 .

These nodes are all hosted by a Xen VM provider. All the guests are seeing each others private IP addresses. There are no iptables rules involved. Multicast packets can be seen between all the nodes (20+) except one – using tcpdump. The system was restarted, etc.

just to add, the affected node, according to netstat -g is not being assigned the multicast group, "eth1 1 224.2.2.4" as all the others.

What would cause something like this? It seems that one node is no longer part of the multicast group – I have a ticket opened but I have a feeling they are stumped.

Best Answer

I'm not aware of any expiration policy for IGMP group membership within the Linux stack. It may happen, but I doubt it, since there are at least two ways for the kernel to be told (one explicit, the other implicit) when a program's IGMP membership should be dropped.

Therefore, I think you have a bug in the software listening for the multicast packets. (Care to name it?) The program receiving the multicast has somehow either dropped its own membership or neglected to add its membership on starting up. On restarting the multicast listener program, tcpdump should see the IGMPv2+ group add membership request go out on the network.

You may well have never noticed this bug when testing on a small LAN since cheap network switches don't understand IGMP. The feature is called IGMP snooping, and it's only found in switches roughly 5× the cost, per port, of the cheapest units, or more. A switch without IGMP snooping ability — or one with the feature turned off — turns multicast into broadcast, so IGMP group-add messages aren't necessary.

Your hosting provider apparently has IGMP snooping enabled on their network fabric, since multicast messages stopped coming in after the IGMP group membership went away on the troubled machine's network stack.

It may also be that the hosting provider's IGMP snooping options are misconfigured in the switches, so they're dropping the group membership, but that doesn't explain the netstat -g result.

Related Question