MSS Initiative


The Path MTU Discovery Problem

This page attempts to explain Path MTU Discovery (PMTUD) and Path MTU Discovery black holes from the beginning. Only a basic understanding of networking is assumed.


Path MTU Discovery

Path MTU Discovery is how hosts are supposed to find out how much data they can send in a packet from one host to another without it being fragmented along the way. It works like this: your machine (we'll call it Host A) sends a request to a webserver (we'll call it Host B). Host B then attempts to respond with what it considers an appropriately sized packet (usually chosen by looking at the MSS Host A sent and the known MTU of its first hop) with the DF (Don't Fragment) bit set. If the packet is too large for a router somewhere in between (or even for Host A), the router or host that can't handle it sends back an ICMP type 3 code 4 message (Destination Unreachable, "fragmentation needed and DF set") telling Host B that the packet was too big. Newer routers also include the MTU of the hop that could not carry the packet. Host B must now resend with smaller packets. Newer systems simply use the MTU provided in the ICMP message (if one was provided), but hosts that don't support this information will keep sending smaller packets until the packets reach their destination... at least for a while. If Host B does not support using the MTU in the ICMP error message, it may periodically attempt to discover MTU increases by sending larger packets.
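The exchange above can be sketched as a small simulation. This is only an illustration, not how any particular IP stack does it: the function names and the routers_report_mtu switch are made up for this example, and the fallback sizes are a subset of the plateau values suggested in RFC 1191.

```python
from typing import List, Optional

# Subset of the plateau table suggested in RFC 1191, largest first.
PLATEAUS = [1492, 1006, 508, 296, 68]


def first_too_small_hop(packet_size: int, hop_mtus: List[int]) -> Optional[int]:
    """Return the MTU of the first hop that cannot carry the packet, or None."""
    for mtu in hop_mtus:
        if packet_size > mtu:
            return mtu
    return None


def discover_path_mtu(start_size: int, hop_mtus: List[int],
                      routers_report_mtu: bool = True) -> List[int]:
    """Return every packet size Host B tries, ending with the one that fits.

    Every packet is assumed to have DF set, and every ICMP type 3 code 4
    message is assumed to reach Host B (that is, PMTUD is working).
    """
    attempts = [start_size]
    size = start_size
    while True:
        blocking_mtu = first_too_small_hop(size, hop_mtus)
        if blocking_mtu is None:
            return attempts  # the packet made it all the way to Host A
        if routers_report_mtu:
            # Newer routers: the ICMP message carries the usable MTU.
            size = blocking_mtu
        else:
            # Older routers: no MTU in the message, so guess the next
            # smaller plateau value.
            smaller = next((p for p in PLATEAUS if p < size), None)
            if smaller is None:
                raise ValueError("no smaller packet size left to try")
            size = smaller
        attempts.append(size)
```

For a path with hop MTUs of 1500, 1400 and 1500, a sender that gets the MTU in the ICMP message tries 1500 and then 1400. A sender that has to guess tries 1500, 1492 and then 1006, and so ends up sending smaller packets than the path actually requires.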

When it comes to PPPoE, Host A is reporting (via MSS) that it can handle TCP segments of up to 1460 bytes (before headers), while the PPPoE connection can actually only handle 1452.

So what do we have? Host A can talk to the PPPoE router at 1460, but the PPPoE router can speak to the ISP only at 1452. From the ISP's router, the rest of the trip is all 1460. So both endpoints tell each other that they can handle 1460-byte segments, but in fact there's a hop in the middle that's smaller than this.
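The 1460 and 1452 figures come from simple header arithmetic. The sketch below assumes a standard 1500-byte Ethernet MTU and IPv4 and TCP headers without options (20 bytes each). PPPoE adds 8 bytes of its own (a 6-byte PPPoE header plus a 2-byte PPP protocol field). The function names are made up for this example.

```python
IP_HEADER = 20       # IPv4 header without options
TCP_HEADER = 20      # TCP header without options
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol field


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet of this MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def path_mss(hop_mtus: list) -> int:
    """The MSS that actually works end to end is set by the smallest hop."""
    return mss_for_mtu(min(hop_mtus))


ethernet_mtu = 1500
pppoe_mtu = ethernet_mtu - PPPOE_OVERHEAD   # 1492

advertised = mss_for_mtu(ethernet_mtu)      # 1460, what Host A announces
usable = path_mss([ethernet_mtu, pppoe_mtu, ethernet_mtu])  # 1452
```

The 8-byte gap between the advertised and the usable value is exactly the PPPoE overhead, and it is that gap which PMTUD has to discover.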

What happens next is that the router on the other end of your DSL connection sends the appropriate ICMP messages back to Host B. When this works, Host B sends smaller packets and all is well.

It may not be PPPoE - it may be your work VPN, your personal VPN, or a variety of other tunnels.

However, sometimes Path MTU Discovery is disabled or doesn't work, and there are a variety of reasons for this. It may be disabled because the admin feels there is nothing wrong with fragmenting (more on this below), because the admin doesn't want to let in the appropriate ICMP messages, or for any other reason. This situation should still work (albeit less efficiently). However, sometimes PMTUD breaks. This may happen when the network the webserver is on filters all ICMP messages (generally a bad idea), or when any network between Host B and the router or host needing smaller packets blocks ICMP messages. These are just two possibilities. For whatever reason, PMTUD may be disabled or broken at certain sites. These are the sites that users have problems with.

Fragmentation - When PMTU Discovery Is Disabled

When Path MTU Discovery is disabled, routers which cannot fit a packet down the next hop will fragment the packets and send the fragments down the line.
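To make this concrete, here is a rough sketch of how a router would split one IPv4 packet for a smaller next hop. It assumes a 20-byte IP header without options, and the function name is made up for this example. The one real constraint it shows is that fragment offsets are counted in 8-byte units, so every fragment except the last must carry a multiple of 8 bytes of data.

```python
from typing import List, Tuple


def fragment(total_length: int, next_hop_mtu: int,
             header_len: int = 20) -> List[Tuple[int, int, bool]]:
    """Split one IPv4 packet into fragments that fit next_hop_mtu.

    Returns (offset_in_8_byte_units, data_bytes, more_fragments) tuples.
    Assumes the original packet had DF=0, so fragmenting is allowed.
    """
    data_len = total_length - header_len
    # Largest data size per fragment, rounded down to a multiple of 8.
    max_chunk = (next_hop_mtu - header_len) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("next-hop MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < data_len:
        chunk = min(max_chunk, data_len - offset)
        more_fragments = offset + chunk < data_len
        fragments.append((offset // 8, chunk, more_fragments))
        offset += chunk
    return fragments
```

A full 1500-byte packet hitting a 1492-byte PPPoE hop becomes two fragments: one carrying 1472 bytes of data and a tiny second one carrying the remaining 8 bytes. This is why fragmenting works but is less efficient.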

Note for sysadmins: This should work fine unless your firewall does not accept fragments. In that case, it will accept the first fragment of the packet and reject the rest, so the packet can never be reassembled and your session will simply hang as the remaining fragments keep getting dropped. Updating your ruleset to accept fragments will fix this problem. With IP Filter, allow fragments by adding "keep frags" at the end of the appropriate "keep state" rules.

When PMTU Discovery Is Broken

When Path MTU Discovery is broken, i.e. the site is sending packets with DF=1 (Don't Fragment) that are too big, but is not getting the ICMP "fragmentation needed" messages (type 3 code 4) back due to filtering or a broken router, you will also experience problems. RFC 2923 gives suggestions for what should happen when Path MTU Discovery fails. The idea is that the site should "send smaller packets, perhaps turning off the DF flag for each packet." In other words, if the packets aren't getting through, the site should either keep decreasing the packet size or send with DF=0. If the former is tried but fails, the site should then send packets with DF=0. If this succeeds, the site should continue to disable Path MTU Discovery (by keeping DF=0).
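The difference between working PMTUD, a black hole, and the DF=0 fallback can be sketched as follows. This is a simplified model, not what any real stack does: it only models the "turn off DF" half of the suggestion, it assumes fragments are accepted along the path, and the names and the max_timeouts value are made up for this example.

```python
from typing import List, Tuple


def send(size: int, df: bool, path_mtu: int, icmp_filtered: bool) -> str:
    """Outcome of one packet: 'delivered', 'icmp' or 'silence'."""
    if size <= path_mtu:
        return "delivered"
    if not df:
        return "delivered"  # a router fragments it (assumes fragments get through)
    # Too big with DF=1: the router drops it and sends ICMP type 3 code 4,
    # which the sender only sees if nothing filters it on the way back.
    return "silence" if icmp_filtered else "icmp"


def send_with_fallback(size: int, path_mtu: int, icmp_filtered: bool,
                       max_timeouts: int = 3) -> List[Tuple[int, bool, str]]:
    """Return a log of (size, df, outcome) for every attempt."""
    log = []
    timeouts = 0
    df = True
    while True:
        outcome = send(size, df, path_mtu, icmp_filtered)
        log.append((size, df, outcome))
        if outcome == "delivered":
            return log
        if outcome == "icmp":
            size = path_mtu  # working PMTUD: shrink to the reported MTU
            continue
        timeouts += 1  # black hole: nothing came back at all
        if timeouts >= max_timeouts:
            df = False  # fallback: give up on PMTUD and allow fragmentation
```

With the ICMP messages filtered, a 1500-byte packet on a 1492-byte path is lost three times before the fourth attempt goes out with DF=0 and gets through. In real life each of those losses is a TCP retransmission timeout, which is where the long delays come from.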

Unfortunately, there are two problems here. Firstly, RFC 2923 is only a set of suggestions (it is an Informational RFC), and is by no means implemented by every IP stack out there. Secondly, in most cases, for the sender to notice that PMTUD has failed, TCP has to time out, or come close. This takes a very long time... possibly minutes. This is obviously bad, because not only do users not want to wait several minutes for a website to load, but many browsers may time out before the connection works.

There are three fixes. The first, and best, is to notify us so we can contact the site. They should either disable PMTUD or fix it. The second solution is to decrease the MSS value on all machines behind the gateway. This is a pain, and not the cleanest solution. Lastly, one can use MSS Clamping (discussed below). Clamping is a very ugly hack that breaks other things, and thus should be avoided. However, for broken sites, it provides a single fix on the network that should work for all clients on that network.

MSS Clamping

MSS Clamping is a process on your router or firewall that digs deep into the packets you send out while negotiating connections and lowers the MSS value below what your machines actually set. This way the webservers (and other hosts) will send smaller packets that will get through. This is sub-optimal. For starters, it messes with a part of the protocol that the router shouldn't really be touching. Additionally, it breaks things: many non-TCP protocols, such as IPsec, will choke if MSS Clamping is used.
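As a rough illustration of what the router is doing, the sketch below rewrites the MSS option inside the raw options bytes of a TCP SYN. It is a minimal sketch and not the code of any particular firewall: the function name is made up, and a real implementation would also have to recalculate the TCP checksum after changing the value. The option layout itself is standard TCP: kind 0 ends the list, kind 1 is a one-byte no-op, and the MSS option is kind 2, length 4, with a 16-bit value in network byte order.

```python
import struct


def clamp_mss(options: bytes, max_mss: int) -> bytes:
    """Return a copy of the TCP options with any MSS above max_mss lowered.

    Only the options bytes are handled here; fixing up the TCP checksum
    is left out of this sketch.
    """
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == 0:        # End of Option List
            break
        if kind == 1:        # No-Operation: a single byte, no length field
            i += 1
            continue
        if i + 1 >= len(out):
            break            # truncated option, leave the rest alone
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break            # malformed option, leave the rest alone
        if kind == 2 and length == 4:   # Maximum Segment Size
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > max_mss:
                struct.pack_into("!H", out, i + 2, max_mss)
        i += length
    return bytes(out)
```

For a PPPoE link, the router would call this with max_mss set to 1452 on every SYN passing through, so a host announcing 1460 appears to the far end to have announced 1452. Values that are already small enough are left alone.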

Other References

For more info on Path MTU Discovery, look at RFC 1191.

For more info on TCP Problems with Path MTU Discovery, look at RFC 2923.

For more info on PPPoE, look at RFC 2516.


This page is © Phil Dibowitz 2001 - 2009