MSS Problems with Sun PPPoE
Unlike Roaring Penguin, Sun does not provide an MSS Clamping feature with their PPPoE software. As a side effect, many users have found that they are unable to visit some websites from NAT'd machines behind gateway machines using Sun PPPoE. Sun does not provide MSS Clamping because it's ugly, it's a [bad] hack, and it breaks non-TCP protocols. This is OK, because it shouldn't ever be needed. The idea of this page is to explain the problem, why it happens, what should happen, and appropriate workarounds for when things don't work properly.
See also: The MSS Initiative.
THE SHORT VERSION
Note that the "short version" is for advanced readers, and does little by way of explanation. What you are probably experiencing is that most sites work (these are sites with which Path MTU Discovery is succeeding), and a handful of sites don't. Essentially, there is one of two problems (possibly both).
The first possibility is that Path MTU Discovery is disabled for some of the sites you're having problems with. The key to solving this is to allow fragments into your network, because when Path MTU Discovery is disabled, the packets will get fragmented. So, for those of you using IP Filter, you should add "keep frags" where appropriate. This is usually to your "keep state" rules. So, your rules:
pass in quick on iprb0 proto tcp from 10.0.0.0/8 to any flags S keep state
pass in quick on iprb0 proto udp from 10.0.0.0/8 to any keep state
pass in quick on iprb0 proto icmp from 10.0.0.0/8 to any keep state
would change to:
pass in quick on iprb0 proto tcp from 10.0.0.0/8 to any flags S keep state keep frags
pass in quick on iprb0 proto udp from 10.0.0.0/8 to any keep state keep frags
pass in quick on iprb0 proto icmp from 10.0.0.0/8 to any keep state keep frags
That should solve the problem. Your internal machines may need a shift+reload, or even a reboot, but the above change should actually solve the problem. For more details, read the "long version."
The other possible problem is that Path MTU Discovery is breaking. This is less likely, but certainly possible. The only three fixes are to notify the admin of the site, to use MSS Clamping, or to decrease the MSS values of all machines behind your gateway.
THE LONG VERSION
PATH MTU DISCOVERY
- MTU - Maximum Transmission Unit. This is the maximum number of bytes that your computer will send out in a single packet. This should be set according to what your connection can handle. For Ethernet this should be set to 1500. For PPPoE links this should be set to 1492.
- MSS - Maximum Segment Size. Each host announces this value when a TCP connection is opened, and it limits how much TCP data the other side will put in each packet. Essentially this is saying "please don't send me packets bigger than X." This should typically be set to 40 less than your MTU to allow room for the IP and TCP headers (20 bytes each).
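To make the arithmetic behind those two definitions concrete, here is a small sketch. The constant and function names are my own, and it assumes plain IPv4 and TCP headers with no options.

```python
IP_HEADER = 20       # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20      # bytes, TCP header without options (assumption)
PPPOE_OVERHEAD = 8   # bytes: 6-byte PPPoE header + 2-byte PPP protocol ID

ETHERNET_MTU = 1500
PPPOE_MTU = ETHERNET_MTU - PPPOE_OVERHEAD


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


print(PPPOE_MTU)                  # 1492
print(mss_for_mtu(ETHERNET_MTU))  # 1460
print(mss_for_mtu(PPPOE_MTU))     # 1452
```

The 8-byte gap between 1460 and 1452 is the whole problem: a workstation on Ethernet advertises the first number, but the PPPoE link can only carry the second.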
Path MTU Discovery is how hosts are supposed to find out how much data they can send in a packet from one host to another without having it fragmented along the way. The way it works is, your machine (we'll call it Host A) sends a request to a webserver (we'll call it Host B). Host B then attempts to respond with what it feels is an appropriately sized packet (usually by looking at the MSS Host A sent and the known MTU of its first hop) with the DF (Don't Fragment) bit set. If it's too large for a router somewhere in between (or even for Host A), the router or host that can't handle it will send back an ICMP message type 3 code 4 telling Host B that it was too big (newer routers will also include the MTU of the next hop). Now, Host B will continue to send smaller packets until the packets reach their destination (or use the provided MTU if available)... at least for a while. Periodically, Host B will attempt to discover MTU increases by sending larger packets.
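The exchange described above can be modeled in a few lines. This is only a toy simulation, not a real IP stack: the function name is made up, and it assumes every router is a "newer" one that reports its MTU in the ICMP message and that the ICMP message always makes it back to the sender.

```python
def discover_path_mtu(hop_mtus, first_size):
    """Simulate a sender with DF set shrinking packets until they fit.

    hop_mtus   -- MTU of each link between Host B and Host A, in order
    first_size -- size of the first packet Host B tries
    Returns (final_size, list_of_sizes_tried).
    """
    size = first_size
    probes = [size]
    while True:
        # Links that cannot carry a packet of the current size.
        too_small = [mtu for mtu in hop_mtus if mtu < size]
        if not too_small:
            return size, probes
        # The first such link answers with ICMP type 3 code 4 and
        # (in this model) tells the sender its MTU.
        size = too_small[0]
        probes.append(size)


# Host B on Ethernet, a PPPoE link in the middle, Host A on Ethernet.
print(discover_path_mtu([1500, 1492, 1500], 1500))
# (1492, [1500, 1492])
```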
When it comes to PPPoE, Host A is reporting (via MSS) that it can handle packets with up to 1460 bytes of data (before headers), while the PPPoE connection can actually only handle 1452 (this is because Host A is connected to the box doing PPPoE via Ethernet and therefore is using a standard Ethernet MTU and MSS). Thus the router on the other end of your DSL connection will send appropriate ICMP messages back to Host B. When this works, additional MSS Clamping is not needed.
However, sometimes PMTUD is disabled, or doesn't work. There are a variety of reasons for this. It may be disabled because the admin feels there is nothing wrong with fragmenting, or because he doesn't want to let in the appropriate ICMP messages, or for any other reason. This situation is easy to handle. However, sometimes PMTUD breaks. This may happen when the network the webserver is on filters all ICMP messages (generally a bad idea), or when any network in between Host B and the router or host needing smaller packets blocks ICMP messages. These are just two possibilities. For whatever reason, PMTUD may be disabled or broken at certain sites. These are the sites that users have problems with.
MSS Clamping just digs deep into the packets you send out and adjusts the MSS value lower than what the NAT'd machines actually set. This way the webservers (and other hosts) will send smaller packets that will get through. This is bad. For starters, this messes with a part of the protocol that the router shouldn't really be messing with. Additionally, this breaks things. Many non-TCP protocols such as IPsec will choke if MSS Clamping is used.
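For the curious, this is roughly what "digging into the packet" means. The sketch below walks the options area of a TCP SYN and lowers the MSS option (kind 2, length 4) if it is above a limit. The function name is hypothetical, and it deliberately leaves out the part a real clamper also has to do, which is recomputing the TCP checksum after the change.

```python
import struct


def clamp_mss(options, limit):
    """Return a copy of the TCP options bytes with MSS lowered to limit."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == 0:            # End of option list
            break
        if kind == 1:            # No-operation, a single padding byte
            i += 1
            continue
        if i + 1 >= len(out):    # Truncated option, stop
            break
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break                # Malformed option, leave the rest alone
        if kind == 2 and length == 4:
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > limit:
                struct.pack_into("!H", out, i + 2, limit)
        i += length
    return bytes(out)


# MSS 1460, two NOPs, then SACK-permitted (kind 4, length 2).
syn_options = bytes([2, 4]) + struct.pack("!H", 1460) + bytes([1, 1, 4, 2])
clamped = clamp_mss(syn_options, 1452)
print(struct.unpack_from("!H", clamped, 2)[0])  # 1452
```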
FRAGMENTATION - WHEN PMTUD IS DISABLED
When Path MTU Discovery is disabled, most routers will fragment the packets and send the fragments down the line. If your firewall does not accept fragments, it will accept the first part of the packet, and reject the rest of the pieces, thus you will not be able to re-assemble the packet, and therefore your session will just hang as the rest of that packet keeps getting dropped.
So, updating your ruleset to accept fragments will fix this problem. With IP Filter, allow fragments by adding "keep frags" at the end of the appropriate "keep state" rules. In my tests this eliminated all problems with sites that had PMTUD disabled.
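To see why the later fragments are the ones that get dropped, here is a sketch of how a router would split a full-size Ethernet packet to fit a PPPoE link. The names are my own and it assumes a 20-byte IPv4 header with no options. Fragment payloads have to start on 8-byte boundaries, and only the first fragment carries the TCP header with the port numbers a firewall rule matches on.

```python
IP_HEADER = 20  # bytes, IPv4 header without options (assumption)


def fragment(total_length, mtu):
    """Split one IP packet into fragments that fit the given MTU."""
    payload = total_length - IP_HEADER
    # Every fragment except the last carries a multiple of 8 payload bytes.
    chunk = (mtu - IP_HEADER) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload:
        size = min(chunk, payload - offset)
        fragments.append({
            "offset": offset,                       # in bytes
            "length": IP_HEADER + size,             # on-the-wire size
            "more_fragments": offset + size < payload,
            "has_tcp_header": offset == 0,
        })
        offset += size
    return fragments


for frag in fragment(1500, 1492):
    print(frag)
```

With a 1500-byte packet and a 1492-byte MTU this gives a 1492-byte first fragment and a tiny 28-byte second one that has no TCP header at all.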
WHEN PMTUD IS BROKEN
When Path MTU Discovery is broken, i.e. the site is sending packets with DF=1 that are too big, but is not getting the "fragmentation needed" ICMP packets (type 3 code 4) due to filtering or a broken router, you will also experience problems. RFC 2923 gives suggestions for what should happen in the event Path MTU Discovery fails. The idea is that the site should "send smaller packets, perhaps turning off the DF flag for each packet." In other words, if the packets aren't getting through, the site should either keep decreasing the packet size or set DF=0. If the former is used but fails, the site should then send packets with DF=0. If this succeeds, the site should continue to leave PMTUD disabled (by keeping DF=0).
Unfortunately there are two problems here. Firstly, RFC 2923 is just a suggestion, and by no means actually used by every IP stack out there. Secondly, in most cases, for PMTUD to fail, TCP has to fail, or come close. This takes a very long time... possibly minutes. This is obviously bad because not only do users not want to wait several minutes for any website to load, but most browsers will time out before TCP on the remote end times out.
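Both points can be sketched in code. The first function models a sender that follows the RFC 2923 suggestion when no ICMP ever arrives: try smaller sizes with DF=1, then give up and clear DF. The second adds up how long a run of doubling retransmission timeouts takes. All names here are made up, and the 3 second starting timeout and 6 timeouts are example values I picked, not something any particular stack is guaranteed to use.

```python
def fallback_attempts(path_mtu, sizes):
    """List (size, df_set, delivered) for each attempt the sender makes."""
    attempts = []
    for size in sizes:
        delivered = size <= path_mtu
        attempts.append((size, True, delivered))
        if delivered:
            return attempts
    # Nothing got through with DF=1, so send with DF=0 and let
    # routers fragment along the way.
    attempts.append((sizes[0], False, True))
    return attempts


def total_wait(initial_rto, timeouts):
    """Seconds spent waiting if every timeout doubles the previous one."""
    return sum(initial_rto * 2 ** k for k in range(timeouts))


print(fallback_attempts(1492, [1500, 1400]))
# [(1500, True, False), (1400, True, True)]
print(total_wait(3, 6))  # 189
```

Under those example values the sender waits just over three minutes before changing its behavior, which is well past the patience of most users and browsers.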
There are three fixes. The first, and best, is to notify the admin of the site. They should either disable PMTUD or fix it. I've started an initiative for this purpose. The second solution is to decrease the MSS value of all machines behind the gateway. This is a pain, and not the cleanest solution. Lastly, one can use MSS Clamping. Again, that's a very ugly hack, and breaks other things, and thus should be avoided. James Carlson at Sun is writing an external MSS clamping STREAMS module that people can use at their own risk for this situation. It will not be supported or official (that I am aware of). As soon as that's available, I'll update this page.
After changing my firewall ruleset, it took some work to get some workstations to connect to "problem" sites again. If bookmarks still do not work, try stripped-down URLs, such as just the domain name. Failing that, try rebooting.
There is, of course, another solution entirely that will fix this. Set all your internal workstations to use a smaller MSS (try 1450, and keep going smaller until you eliminate the problem). This is another reasonable solution that should not break anything. Unfortunately it requires a configuration change to every workstation behind the gateway.
You may find that some machines being NAT'd work with problem sites even though all the others don't. I found my Win2K machine was sending an MSS of 1272. Strangely enough, after further investigation it's actually set to 1460 with an MTU of 1500. Silly M$.
For more info on Path MTU Discovery, look at RFC 1191.
For more info on TCP Problems with Path MTU Discovery, look at RFC 2923.
For more info on PPPoE, look at RFC 2516.
Special thanks to James Carlson at Sun for all of your help, and your time.
Last Updated: 02/24/02
This page is © Phil Dibowitz 2001 - 2004