About MTU settings in machines and switch

lan mtu networking switch

Suppose I have two machines and one switch.

M1–Switch–M2.

The settings are:

  • M1 has MTU set to 100
  • Switch has MTU set to 1000
  • M2 has MTU set to 1000.

Questions:

  1. When M1 tries to send a 100-byte packet to M2, there should be no problem, right?

  2. When M2 tries to send a 1000-byte packet to M1, is there any problem?

  3. M2 can send a 1000-byte packet to Switch, but when Switch tries to send the packet to M1, it needs to fragment the packet into 10 small packets. Is that right?

Update:

To be more realistic:
M1, Switch and M2 are all running on a 10G network, and we use IPv4.

The settings are:

  • M1 has MTU set to 1500

  • Switch has MTU set to 9000

  • M2 has MTU set to 9000

Does that help answer the question?

Best Answer

You didn't specify what networking technologies you were talking about, so I'm going to assume Ethernet and IP[v4].

Ethernet has always defined its range of acceptable payload lengths to be from 46 to 1500 bytes, and requires all devices (hosts and switches) on the LAN to be able to receive frames with 1500-byte payloads. Because of this, Ethernet does not provide a fragmentation mechanism, nor does it provide a mechanism for communicating or negotiating MTUs (or, more importantly, MRUs -- Maximum Receive Units) between devices. In fact the term "MTU" or "maximum transmission unit" does not appear anywhere in the IEEE 802.3 specification.

So let's add IP into the picture. IP has a concept of an MTU, and most modern IP stacks let you set MTUs on a per-interface basis (and more). But your question as stated doesn't quite work out in the context of IP either, because 100 bytes is below what IPv4 hosts are expected to handle: RFC 791 requires every host to accept datagrams of at least 576 bytes (the absolute floor for a link MTU is 68 bytes). So allow me to restate your question as "M1 has an MTU of 600, and M2 has an MTU of 1200". But what MTU shall we say that "Switch" has? Well, if Switch is just a Layer 2 Ethernet switch, it doesn't have a concept of a settable MTU. So to make your question work out in the context of IP, we'll have to turn that switch into a router. So let's call it "Router" and say it has two Ethernet interfaces, one attached to M1 and one attached to M2. Let's also say it has MTUs of 1200 set on both of its interfaces.

  1. When M1 sends a frame with a 600-byte payload to M2, there would be no problem.
  2. When M2 sends a frame with a 1200-byte payload to M1, there still would be no problem. Why not? Because setting M1's MTU didn't necessarily change its MRU, and in my experience MTUs and MRUs are separate, and implementations don't give you a way to change your MRU. So M1's MRU on that interface would be 1500 since it's Ethernet.
  3. Router wouldn't know it needs to fragment the frames from M2, because it believes all hosts on the Ethernet LAN that M1 is on are able to receive frames with 1200-byte payloads, because it was configured for a 1200-byte MTU on that interface. Luckily this would still probably work out fine, as I discussed in (2).

Okay, still trying to find and answer the true spirit of your question, let's say the link between M1 and Router is actually PPP instead of Ethernet. The PPP protocol allows hosts to communicate/negotiate their MRUs. Let's say that M1 told Router that M1 has a 600-byte MRU limitation, so Router has set its MTU for that link to 600 bytes.

Now, in this case, if M2 sends a 1200-byte IP datagram to M1 (without setting the "Don't Fragment" bit in the IP header), Router will receive it just fine, and realize it needs to fragment it to send it to M1. So does Router fragment it into two 600-byte fragments? Well, no, it's not that simple for a couple reasons.

One reason is that every fragment has to have its own IP header, which adds 20 bytes to the size of each fragment after the first. The other reason is that IP's fragmentation offset field counts in 8-byte chunks instead of individual bytes.

So let's say the 1200-byte datagram was specifically 1172 bytes of application data in a UDP datagram (8 bytes of UDP headers, 20 bytes of IP headers). The largest fragment payload that fits in a 600-byte MTU is 580 bytes, which rounds down to 576 bytes, the nearest multiple of 8. After fragmentation, the first fragment would contain a 20-byte IP header, the 8-byte UDP header, and the first 568 bytes of the application data, for a total of 596 bytes. The second fragment would contain another 20-byte IP header, no UDP header, and the next 576 bytes of the application data, for a total of 596 bytes. That leaves 28 bytes of application data left over for the final fragment, which, with its IP header added, would be 48 bytes.
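To make the arithmetic easy to check, here is a minimal Python sketch of how a router might split an IPv4 datagram for a smaller MTU. The function name `fragment_ipv4` and its tuple layout are made up for illustration, and the sketch assumes a plain 20-byte IP header with no options and the "Don't Fragment" bit clear.

```python
def fragment_ipv4(total_len, mtu, header_len=20):
    """Split an IPv4 datagram of total_len bytes for a link with the given MTU.

    Returns a list of tuples:
      (offset_in_8_byte_units, payload_len, fragment_total_len, more_fragments)
    Assumes a fixed header_len (no IP options) and that fragmentation is allowed.
    """
    payload = total_len - header_len
    if total_len <= mtu:
        # Fits as-is: a single unfragmented datagram.
        return [(0, payload, total_len, False)]

    # Every fragment except the last must carry a multiple of 8 payload bytes,
    # because the fragment offset field counts in 8-byte units.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any fragment payload")

    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_payload, payload - offset)
        more = offset + size < payload
        fragments.append((offset // 8, size, header_len + size, more))
        offset += size
    return fragments


# The example from the text: a 1200-byte datagram crossing a 600-byte MTU link.
for frag in fragment_ipv4(1200, 600):
    print(frag)
```

For the 1200-byte datagram and 600-byte MTU this yields three fragments with total lengths 596, 596, and 48 bytes. The payload counted here is everything after the IP header, so the 8-byte UDP header is part of the first fragment's 576 payload bytes.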

Update based on Kavin's update that he was talking about Jumbo frames:
Jumbo frames are something that some Gigabit Ethernet product vendors created independently around the time GigE was created. They were (I believe) subsequently rejected or ignored by the IEEE and seem unlikely to ever become part of the 802.3 Ethernet standard. Even IEEE 802.3-2008, which includes not just 1000BASE-T but also 10GBASE-T, does not contain anything about 9000-byte frame payloads.

The vendors that came up with jumbo frames did not provide any kind of autonegotiation or communication mechanism for jumbo frame support, nor did they create an Ethernet-layer fragmentation method to handle the (very common) case you illustrated. If you want to run your Ethernet LAN in this nonstandard mode, you have to ensure that all hosts and switches on your LAN support jumbo frames.

If M1's NIC is not capable of receiving jumbo frames, it will consider a jumbo frame to be "Ethernet jabber": the kind of traffic a broken device produces when it "keeps jabbering on and on", sending bits well beyond the end of a maximum-length frame (1500 bytes of payload, or 1518 bytes once the Ethernet header and frame check sequence are counted). Note that this meaning of jabber is a term for a kind of Ethernet malfunction and is not to be confused with the similarly-named "Jabber" Internet chat system. You'll have to decide if you want to stop using jumbo frames on this network, or if you want to upgrade M1 to have a NIC that supports jumbo frames.
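The relationship between the 1500-byte payload limit and the 1518-byte frame limit can be sketched in a few lines of Python. The constant and function names are made up for illustration, and the sketch assumes untagged frames (no 802.1Q VLAN tag) and does not count the preamble.

```python
ETH_HEADER = 14         # destination MAC (6) + source MAC (6) + EtherType/length (2)
ETH_FCS = 4             # frame check sequence (CRC)
STD_MIN_PAYLOAD = 46    # shorter payloads are padded up to this size
STD_MAX_PAYLOAD = 1500  # largest payload in a standard Ethernet frame


def frame_size(payload_len):
    """Size of an untagged Ethernet frame carrying payload_len bytes."""
    return ETH_HEADER + max(payload_len, STD_MIN_PAYLOAD) + ETH_FCS


def fits_standard_ethernet(payload_len):
    """True if a receiver limited to standard frames could accept this payload."""
    return payload_len <= STD_MAX_PAYLOAD


print(frame_size(1500), fits_standard_ethernet(1500))  # standard maximum
print(frame_size(9000), fits_standard_ethernet(9000))  # typical jumbo frame
```

A full-size standard frame comes out at 1518 bytes, while a frame with a 9000-byte payload comes out at 9018 bytes, far past what a non-jumbo receiver expects.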

If M1's NIC is capable of receiving jumbo frames, I suspect that setting its IPv4 MTU for that interface down to 1500 will ensure it doesn't transmit any jumbo sized IP datagrams in a single jumbo Ethernet frame, but it will most likely be able to receive large IP datagrams in single jumbo Ethernet frames no problem, because again, MTU is not MRU, and setting an IP-layer MTU doesn't affect what size frame buffers the NIC allows. Now, if you're tweaking a NIC/driver setting to tell the NIC to only use 1500-byte buffers instead of 9000-byte buffers, that's an Ethernet-layer change, and would probably make your NIC act as if it didn't support 9000-byte buffers.
