Network Working Group                                           K. Lahey
Request for Comments: 2923                               dotRocket, Inc.
Category: Informational                                   September 2000


                  TCP Problems with Path MTU Discovery

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

Abstract

   This memo catalogs several known Transmission Control Protocol (TCP)
   implementation problems dealing with Path Maximum Transmission Unit
   Discovery (PMTUD), including the long-standing black hole problem,
   stretch acknowledgements (ACKs) due to confusion between Maximum
   Segment Size (MSS) and segment size, and MSS advertisement based on
   PMTU.

1. Introduction

   This memo catalogs several known TCP implementation problems dealing
   with Path MTU Discovery [RFC1191], including the long-standing black
   hole problem, stretch ACKs due to confusion between MSS and segment
   size, and MSS advertisement based on PMTU.  The goal in doing so is
   to improve conditions in the existing Internet by enhancing the
   quality of current TCP/IP implementations.

   While Path MTU Discovery (PMTUD) can be used with any upper-layer
   protocol, it is most commonly used by TCP; this document does not
   attempt to treat problems encountered by other upper-layer protocols.
   Path MTU Discovery for IPv6 [RFC1981] treats only IPv6-specific
   issues, not the TCP issues raised in this document.

   Each problem is defined as follows:

   Name of Problem
      The name associated with the problem.  In this memo, the name is
      given as a subsection heading.

Lahey                        Informational                      [Page 1]

RFC 2923          TCP Problems with Path MTU Discovery    September 2000


   Classification
      One or more problem categories for which the problem is
      classified: "congestion control", "performance", "reliability",
      "non-interoperation -- connectivity failure".

   Description
      A definition of the problem, succinct but including necessary
      background material.

   Significance
      A brief summary of the sorts of environments for which the problem
      is significant.

   Implications
      Why the problem is viewed as a problem.

   Relevant RFCs
      The RFCs defining the TCP specification with which the problem
      conflicts.  These RFCs often qualify behavior using terms such as
      MUST, SHOULD, MAY, and others written in capital letters.  See
      RFC 2119 for the exact interpretation of these terms.

   Trace file demonstrating the problem
      One or more ASCII trace files demonstrating the problem, if
      applicable.

   Trace file demonstrating correct behavior
      One or more examples of how correct behavior appears in a trace,
      if applicable.

   References
      References that further discuss the problem.

   How to detect
      How to test an implementation to see if it exhibits the problem.
      This discussion may include difficulties and subtleties associated
      with causing the problem to manifest itself, and with interpreting
      traces to detect the presence of the problem (if applicable).

   How to fix
      For known causes of the problem, how to correct the
      implementation.

2. Known implementation problems

2.1.

   Name of Problem
      Black Hole Detection

   Classification
      Non-interoperation -- connectivity failure

   Description
      A host performs Path MTU Discovery by sending out as large a
      packet as possible, with the Don't Fragment (DF) bit set in the IP
      header.
      If the packet is too large for a router to forward on to a
      particular link, the router must send an ICMP Destination
      Unreachable -- Fragmentation Needed message to the source address.
      The host then adjusts the packet size based on the ICMP message.

      As was pointed out in [RFC1435], routers don't always do this
      correctly -- many routers fail to send the ICMP messages, for a
      variety of reasons ranging from kernel bugs to configuration
      problems.  Firewalls are often misconfigured to suppress all ICMP
      messages.  IPsec [RFC2401] and IP-in-IP [RFC2003] tunnels
      shouldn't cause these sorts of problems, if the implementations
      follow the advice in the appropriate documents.

      PMTUD, as documented in [RFC1191], fails when the appropriate ICMP
      messages are not received by the originating host.  The
      upper-layer protocol continues to try to send large packets and,
      without the ICMP messages, never discovers that it needs to reduce
      the size of those packets.  Its packets disappear into a PMTUD
      black hole.

   Significance
      When PMTUD fails due to the lack of ICMP messages, TCP will also
      completely fail under some conditions.

   Implications
      This failure is especially difficult to debug, as pings and some
      interactive TCP connections to the destination host work.  Bulk
      transfers fail with the first large packet and the connection
      eventually times out.

      These situations can almost always be blamed on a misconfiguration
      within the network, which should be corrected.  However, it seems
      inappropriate for some TCP implementations to suffer
      interoperability failures over paths which do not affect other TCP
      implementations (i.e., those without PMTUD).  This creates a
      market disincentive for deploying TCP implementations with PMTUD
      enabled.

   Relevant RFCs
      RFC 1191 describes Path MTU Discovery.
      RFC 1435 provides an early description of these sorts of problems.

   Trace file demonstrating the problem
      Made using tcpdump [Jacobson89] recording at an intermediate host.

      20:12:11.951321 A > B: S 1748427200:1748427200(0)
                             win 49152 <mss 1460>
      20:12:11.951829 B > A: S 1001927984:1001927984(0)
                             ack 1748427201 win 16384 <mss 65240>
      20:12:11.955230 A > B: . ack 1 win 49152 (DF)
      20:12:11.959099 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:12:13.139074 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:12:16.188685 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:12:22.290483 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:12:34.491856 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:12:58.896405 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:13:47.703184 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:14:52.780640 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:15:57.856037 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:17:02.932431 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:18:08.009337 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:19:13.090521 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:20:18.168066 A > B: . 1:1461(1460) ack 1 win 49152 (DF)
      20:21:23.242761 A > B: R 1461:1461(0) ack 1 win 49152 (DF)

      The short SYN packet has no trouble traversing the network, due to
      its small size.  Similarly, ICMP echo packets used to diagnose
      connectivity problems will succeed.

      Large data packets fail to traverse the network.  Eventually the
      connection times out.  This can be especially confusing when the
      application starts out with a very small write, which succeeds,
      and follows up with many large writes, which then fail.

   Trace file demonstrating correct behavior

      Made using tcpdump recording at an intermediate host.

      16:48:42.659115 A > B: S 271394446:271394446(0)
                             win 8192 <mss 1460> (DF)
      16:48:42.672279 B > A: S 2837734676:2837734676(0)
                             ack 271394447 win 16384 <mss 65240>
      16:48:42.676890 A > B: . ack 1 win 8760 (DF)
      16:48:42.870574 A > B: . 1:1461(1460) ack 1 win 8760 (DF)
      16:48:42.871799 A > B: . 1461:2921(1460) ack 1 win 8760 (DF)
      16:48:45.786814 A > B: . 1:1461(1460) ack 1 win 8760 (DF)
      16:48:51.794676 A > B: . 1:1461(1460) ack 1 win 8760 (DF)
      16:49:03.808912 A > B: . 1:537(536) ack 1 win 8760
      16:49:04.016476 B > A: . ack 537 win 16384
      16:49:04.021245 A > B: . 537:1073(536) ack 1 win 8760
      16:49:04.021697 A > B: . 1073:1609(536) ack 1 win 8760
      16:49:04.120694 B > A: . ack 1609 win 16384
      16:49:04.126142 A > B: . 1609:2145(536) ack 1 win 8760

      In this case, the sender sees four packets fail to traverse the
      network (using a two-packet initial send window) and turns off
      PMTUD.  All subsequent packets have the DF flag turned off, and
      the segment size set to the default MSS of 536 octets [RFC1122].

   References
      This problem has been discussed extensively on the tcp-impl
      mailing list; the name "black hole" has been in use for many
      years.

   How to detect
      This shows up as a TCP connection that hangs (fails to make
      progress) until it is closed by a timeout.  It often manifests
      itself as a connection that connects and starts to transfer, then
      eventually terminates after 15 minutes with zero bytes
      transferred.  This is particularly annoying with an application
      like ftp, which will work perfectly while it uses small packets
      for control information, and then fail on bulk transfers.

      A series of ICMP echo packets will show that the two end hosts are
      still capable of passing packets, a series of MTU-sized ICMP echo
      packets will show some fragmentation, and a series of MTU-sized
      ICMP echo packets with DF set will fail.
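      The three ping series described above can be read as a small
      decision procedure.  The sketch below is a hypothetical helper,
      not part of any tool; it assumes IPv4 with no IP options (a
      20-byte header) and an 8-byte ICMP echo header, so an "MTU-sized"
      echo request on a 1500-byte MTU link carries 1472 bytes of
      payload.

```python
# Hypothetical helpers for interpreting the ping-based black hole check.
# Assumes IPv4 without IP options (20-byte header) and an 8-byte ICMP
# echo header.

IPV4_HEADER_BYTES = 20
ICMP_ECHO_HEADER_BYTES = 8


def echo_payload_for_mtu(mtu: int) -> int:
    """Return the ICMP echo payload size that fills an IPv4 packet of `mtu` bytes."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_ECHO_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an ICMP echo request")
    return payload


def diagnose(small_ok: bool, large_ok: bool, large_df_ok: bool) -> str:
    """Interpret three series of pings sent to the far host.

    small_ok:    small echo requests get replies
    large_ok:    MTU-sized echo requests with DF clear get replies
    large_df_ok: MTU-sized echo requests with DF set get replies
    """
    if not small_ok:
        return "no connectivity"
    if large_df_ok:
        return "MTU-sized packets pass with DF set; no PMTU bottleneck seen"
    if large_ok:
        # Large packets only get through when routers may fragment them.
        return "possible PMTUD black hole: large packets only pass when fragmented"
    return "large packets are lost even when fragmentation is allowed"


if __name__ == "__main__":
    print(echo_payload_for_mtu(1500))  # 1472
    print(diagnose(small_ok=True, large_ok=True, large_df_ok=False))
```

      The pattern the text describes (small pings succeed, fragmented
      large pings succeed, DF-set large pings fail) maps to the "possible
      PMTUD black hole" branch.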

      These mixed results can be confusing for network engineers trying
      to diagnose the problem.

      Several traceroute implementations perform PMTUD and can
      demonstrate the problem.

   How to fix
      TCP should notice that the connection is timing out.  After
      several timeouts, TCP should attempt to send smaller packets,
      perhaps turning off the DF flag for each packet.  If this
      succeeds, it should keep PMTUD turned off for the connection for
      some reasonable period of time, after which it should probe again
      to try to determine if the path has changed.

      Note that, under IPv6, there is no DF bit -- it is implicitly on
      at all times.  Fragmentation is not allowed in routers, only at
      the originating host.  Fortunately, the minimum supported MTU for
      IPv6 is 1280 octets, which is significantly larger than the
      68-octet minimum in IPv4.  This should make it more reasonable for
      IPv6 TCP implementations to fall back to 1280-octet packets,
      whereas IPv4 implementations will probably have to turn off DF to
      respond to black hole detection.

      Ideally, the ICMP black holes should be fixed when they are found.

      If hosts start to implement black hole detection, it may be that
      these problems will go unnoticed and unfixed.  This is especially
      unfortunate, since detection can take several seconds each time,
      and these delays could result in a significant, hidden degradation
      of performance.  Hosts that implement black hole detection should
      probably log detected black holes, so that they can be fixed.

2.2.

   Name of Problem
      Stretch ACK due to PMTUD

   Classification
      Congestion Control / Performance

   Description
      When a naively implemented TCP stack communicates with a
      PMTUD-equipped stack, it will try to generate an ACK for every
      second full-sized segment.
      If it determines the full-sized segment based on the advertised
      MSS, this can degrade badly in the face of PMTUD.

      The PMTU can wind up being a small fraction of the advertised MSS;
      in this case, an ACK would be generated only very infrequently.

   Significance

      Stretch ACKs have a variety of unfortunate effects, more fully
      outlined in [RFC2525].  Most of these have to do with encouraging
      a more bursty connection, due to the infrequent arrival of ACKs.
      They can also impede congestion window growth.

   Implications

      The complete implications of stretch ACKs are outlined in
      [RFC2525].

   Relevant RFCs
      RFC 1122 outlines the requirements for frequency of ACK
      generation.  [RFC2581] expands on this and clarifies that delayed
      ACK is a SHOULD, not a MUST.

   Trace file demonstrating it

      Made using tcpdump recording at an intermediate host.  The
      timestamp options from all but the first two packets have been
      removed for clarity.

      18:16:52.976657 A > B: S 3183102292:3183102292(0) win 16384
              <mss 4312,nop,wscale 0,nop,nop,timestamp 12128 0> (DF)
      18:16:52.979580 B > A: S 2022212745:2022212745(0) ack 3183102293
              win 49152
              <mss 4312,nop,wscale 1,nop,nop,timestamp 1592957 12128> (DF)
      18:16:52.979738 A > B: . ack 1 win 17248 (DF)
      18:16:52.982473 A > B: . 1:4301(4300) ack 1 win 17248 (DF)
      18:16:52.982557 C > A: icmp: B unreachable -
              need to frag (mtu 1500)! (DF)
      18:16:52.985839 B > A: . ack 1 win 32768 (DF)
      18:16:54.129928 A > B: . 1:1449(1448) ack 1 win 17248 (DF)
      .
      .
      .
      18:16:58.507078 A > B: . 1463941:1465389(1448) ack 1 win 17248 (DF)
      18:16:58.507200 A > B: . 1465389:1466837(1448) ack 1 win 17248 (DF)
      18:16:58.507326 A > B: . 1466837:1468285(1448) ack 1 win 17248 (DF)
      18:16:58.507439 A > B: . 1468285:1469733(1448) ack 1 win 17248 (DF)
      18:16:58.524763 B > A: . ack 1452357 win 32768 (DF)
      18:16:58.524986 B > A: . ack 1461045 win 32768 (DF)
      18:16:58.525138 A > B: . 1469733:1471181(1448) ack 1 win 17248 (DF)
      18:16:58.525268 A > B: . 1471181:1472629(1448) ack 1 win 17248 (DF)
      18:16:58.525393 A > B: . 1472629:1474077(1448) ack 1 win 17248 (DF)
      18:16:58.525516 A > B: . 1474077:1475525(1448) ack 1 win 17248 (DF)
      18:16:58.525642 A > B: . 1475525:1476973(1448) ack 1 win 17248 (DF)
      18:16:58.525766 A > B: . 1476973:1478421(1448) ack 1 win 17248 (DF)
      18:16:58.526063 A > B: . 1478421:1479869(1448) ack 1 win 17248 (DF)
      18:16:58.526187 A > B: . 1479869:1481317(1448) ack 1 win 17248 (DF)
      18:16:58.526310 A > B: . 1481317:1482765(1448) ack 1 win 17248 (DF)
      18:16:58.526432 A > B: . 1482765:1484213(1448) ack 1 win 17248 (DF)
      18:16:58.526561 A > B: . 1484213:1485661(1448) ack 1 win 17248 (DF)
      18:16:58.526671 A > B: . 1485661:1487109(1448) ack 1 win 17248 (DF)
      18:16:58.537944 B > A: . ack 1478421 win 32768 (DF)
      18:16:58.538328 A > B: . 1487109:1488557(1448) ack 1 win 17248 (DF)

      Note that the interval between ACKs is significantly larger than
      two times the segment size; it works out to be almost exactly two
      times the advertised MSS.  The transfer was long enough to verify
      that the stretch ACKs were not the result of lost ACK packets.

   Trace file demonstrating correct behavior

      Made using tcpdump recording at an intermediate host.  The
      timestamp options from all but the first two packets have been
      removed for clarity.

      18:13:32.287965 A > B: S 2972697496:2972697496(0)
              win 16384 <mss 4312,nop,wscale 0,nop,nop,timestamp 11326 0> (DF)
      18:13:32.290785 B > A: S 245639054:245639054(0)
              ack 2972697497 win 34496 <mss 4312> (DF)
      18:13:32.290941 A > B: . ack 1 win 17248 (DF)
      18:13:32.293774 A > B: . 1:4313(4312) ack 1 win 17248 (DF)
      18:13:32.293856 C > A: icmp: B unreachable -
              need to frag (mtu 1500)! (DF)
      18:13:33.637338 A > B: . 1:1461(1460) ack 1 win 17248 (DF)
      .
      .
      .
      18:13:35.561691 A > B: . 1514021:1515481(1460) ack 1 win 17248 (DF)
      18:13:35.561814 A > B: . 1515481:1516941(1460) ack 1 win 17248 (DF)
      18:13:35.561938 A > B: . 1516941:1518401(1460) ack 1 win 17248 (DF)
      18:13:35.562059 A > B: . 1518401:1519861(1460) ack 1 win 17248 (DF)
      18:13:35.562174 A > B: . 1519861:1521321(1460) ack 1 win 17248 (DF)
      18:13:35.564008 B > A: . ack 1481901 win 64680 (DF)
      18:13:35.564383 A > B: . 1521321:1522781(1460) ack 1 win 17248 (DF)
      18:13:35.564499 A > B: . 1522781:1524241(1460) ack 1 win 17248 (DF)
      18:13:35.615576 B > A: . ack 1484821 win 64680 (DF)
      18:13:35.615646 B > A: . ack 1487741 win 64680 (DF)
      18:13:35.615716 B > A: . ack 1490661 win 64680 (DF)
      18:13:35.615784 B > A: . ack 1493581 win 64680 (DF)
      18:13:35.615856 B > A: . ack 1496501 win 64680 (DF)
      18:13:35.615952 A > B: . 1524241:1525701(1460) ack 1 win 17248 (DF)
      18:13:35.615966 B > A: . ack 1499421 win 64680 (DF)
      18:13:35.616088 A > B: . 1525701:1527161(1460) ack 1 win 17248 (DF)
      18:13:35.616105 B > A: . ack 1502341 win 64680 (DF)
      18:13:35.616211 A > B: . 1527161:1528621(1460) ack 1 win 17248 (DF)
      18:13:35.616228 B > A: . ack 1505261 win 64680 (DF)
      18:13:35.616327 A > B: . 1528621:1530081(1460) ack 1 win 17248 (DF)
      18:13:35.616349 B > A: . ack 1508181 win 64680 (DF)
      18:13:35.616448 A > B: . 1530081:1531541(1460) ack 1 win 17248 (DF)
      18:13:35.616565 A > B: . 1531541:1533001(1460) ack 1 win 17248 (DF)
      18:13:35.616891 A > B: . 1533001:1534461(1460) ack 1 win 17248 (DF)

      In this trace, an ACK is generated for every two segments that
      arrive.
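      The receiver behavior in the two traces, and the two fixes listed
      under "How to fix", can be compared with a simplified sketch that
      counts how many ACKs a receiver would send for a run of in-order
      segments.  The function names are hypothetical, and the sketch
      deliberately ignores the delayed-ACK timer, reordering and window
      updates.  Tracking the largest segment seen so far is only one
      assumed way to "determine what segment size the sender is using".
      With the numbers from the traces (advertised MSS 4312, 1448-byte
      segments after PMTUD), the MSS-based rule ACKs once every six
      segments, matching the 8688-byte gap between ACKs 1452357 and
      1461045 in the stretch-ACK trace.

```python
# Simplified comparison of receiver ACK rules (hypothetical helpers).
# Ignores the delayed-ACK timer, reordering and window updates.


def acks_by_advertised_mss(segment_sizes, advertised_mss):
    """Naive rule: ACK once 2 * advertised MSS bytes are unacknowledged."""
    acks = unacked = 0
    for size in segment_sizes:
        unacked += size
        if unacked >= 2 * advertised_mss:
            acks += 1
            unacked = 0  # the ACK covers everything received so far
    return acks


def acks_every_second_segment(segment_sizes):
    """Fix 1: ACK every second segment, whatever its size."""
    acks = pending = 0
    for _ in segment_sizes:
        pending += 1
        if pending == 2:
            acks += 1
            pending = 0
    return acks


def acks_by_observed_size(segment_sizes):
    """Fix 2 (one possible form): treat the largest segment seen so far as full-sized."""
    acks = unacked = largest = 0
    for size in segment_sizes:
        largest = max(largest, size)
        unacked += size
        if unacked >= 2 * largest:
            acks += 1
            unacked = 0
    return acks


segments = [1448] * 12  # twelve post-PMTUD segments, as in the trace
print(acks_by_advertised_mss(segments, 4312))  # 2
print(acks_every_second_segment(segments))     # 6
print(acks_by_observed_size(segments))         # 6
```

      A real receiver using the observed-size approach would also need
      to let its estimate shrink, since the PMTU can decrease during a
      connection and a stale "largest segment" would bring the stretch
      ACKs back.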

      The segment size is slightly larger in the correct-behavior trace
      (1460 bytes rather than 1448), even though the source hosts are
      the same, because that connection carries no timestamp options.

   How to detect
      This condition can be observed in a packet trace when the
      advertised MSS is significantly larger than the actual PMTU of a
      connection.

   How to fix
      Several solutions for this problem have been proposed:

      A simple solution is to ACK every other packet, regardless of
      size.  This has the drawback of generating large numbers of ACKs
      in the face of lots of very small packets; this shows up with
      applications like the X Window System.

      A slightly more complex solution would monitor the size of
      incoming segments and try to determine what segment size the
      sender is using.  This requires slightly more state in the
      receiver, but has the advantage of making receiver silly window
      syndrome avoidance computations more accurate [RFC813].

2.3.

   Name of Problem
      Determining MSS from PMTU

   Classification
      Performance

   Description
      The MSS advertised at the start of a connection should be based on
      the MTUs of the interfaces on the system.  (For efficiency and
      other reasons this may not be the largest MSS possible.)  Some
      systems instead use PMTUD-determined values to choose the MSS to
      advertise.

      This results in an advertised MSS that is smaller than the largest
      segment the system can actually receive.

   Significance
      The advertised MSS is an indication to the remote system of the
      largest TCP segment that can be received [RFC879].  If this value
      is too small, the remote system will be forced to use a smaller
      segment size when sending, purely because the local system found a
      particular PMTU earlier.

      Given the asymmetric nature of many routes on the Internet
      [Paxson97], it seems entirely possible that the return PMTU is
      different from the sending PMTU.
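      The difference between the two ways of choosing an advertised MSS
      can be sketched as follows.  The function names are hypothetical;
      the sketch assumes IPv4 and TCP headers without options (20 + 20
      bytes) and a single outgoing interface.  It also assumes host A's
      interface MTU is 1500 bytes, which is consistent with its first
      SYN advertising an MSS of 1460; a cached PMTU of 1024 then yields
      the MSS of 984 seen in the second connection of the problem trace.

```python
# Advertised MSS from the interface MTU versus from a cached PMTU.
# Assumes IPv4 and TCP headers without options (20 + 20 bytes) and a
# single outgoing interface; function names are hypothetical.

from typing import Optional

IP_AND_TCP_HEADERS = 40


def mss_from_interface(interface_mtu: int) -> int:
    """Recommended: base the advertised MSS on the local interface MTU."""
    return interface_mtu - IP_AND_TCP_HEADERS


def mss_from_cached_pmtu(interface_mtu: int, cached_pmtu: Optional[int]) -> int:
    """Problematic variant: clamp the advertised MSS to a PMTU found earlier."""
    if cached_pmtu is None:
        mtu = interface_mtu
    else:
        mtu = min(interface_mtu, cached_pmtu)
    return mtu - IP_AND_TCP_HEADERS


print(mss_from_interface(1500))          # 1460
print(mss_from_cached_pmtu(1500, 1024))  # 984
```

      Because the first function never consults the PMTU cache, every
      connection advertises the same MSS, leaving the peer free to
      discover the reverse-path PMTU for itself.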

      Limiting the advertised MSS to a previously discovered PMTU can
      reduce performance and frustrate the PMTUD algorithm.

      Even if the route were symmetric, setting this artificially
      lowered limit on segment size would make it impossible to probe
      later to determine if the PMTU has changed.

   Implications
      The whole point of PMTUD is to send as large a segment as
      possible.  If long-running connections cannot successfully probe
      for a larger PMTU, then potential performance gains will be
      impossible to realize.  This defeats the purpose of PMTUD.

   Relevant RFCs
      RFC 1191.  [RFC879] provides a complete discussion of MSS
      calculations and appropriate values.  Note that this practice
      does not violate any of the specifications in these RFCs.

   Trace file demonstrating it
      This trace was made using tcpdump running on an intermediate host.
      Host A initiates two separate consecutive connections, A1 and A2,
      to host B.  Router C is the location of the MTU bottleneck.  As
      usual, TCP options are removed from all non-SYN packets.

      22:33:32.305912 A1 > B: S 1523306220:1523306220(0)
              win 8760 <mss 1460> (DF)
      22:33:32.306518 B > A1: S 729966260:729966260(0)
              ack 1523306221 win 16384 <mss 65240>
      22:33:32.310307 A1 > B: . ack 1 win 8760 (DF)
      22:33:32.323496 A1 > B: P 1:1461(1460) ack 1 win 8760 (DF)
      22:33:32.323569 C > A1: icmp: 129.99.238.5 unreachable -
              need to frag (mtu 1024) (DF) (ttl 255, id 20666)
      22:33:32.783694 A1 > B: . 1:985(984) ack 1 win 8856 (DF)
      22:33:32.840817 B > A1: . ack 985 win 16384
      22:33:32.845651 A1 > B: . 1461:2445(984) ack 1 win 8856 (DF)
      22:33:32.846094 B > A1: . ack 985 win 16384
      22:33:33.724392 A1 > B: . 985:1969(984) ack 1 win 8856 (DF)
      22:33:33.724893 B > A1: . ack 2445 win 14924
      22:33:33.728591 A1 > B: . 2445:2921(476) ack 1 win 8856 (DF)
      22:33:33.729161 A1 > B: . ack 1 win 8856 (DF)
      22:33:33.840758 B > A1: . ack 2921 win 16384

      [...]

      22:33:34.238659 A1 > B: F 7301:8193(892) ack 1 win 8856 (DF)
      22:33:34.239036 B > A1: . ack 8194 win 15492
      22:33:34.239303 B > A1: F 1:1(0) ack 8194 win 16384
      22:33:34.242971 A1 > B: . ack 2 win 8856 (DF)
      22:33:34.454218 A2 > B: S 1523591299:1523591299(0)
              win 8856 <mss 984> (DF)
      22:33:34.454617 B > A2: S 732408874:732408874(0)
              ack 1523591300 win 16384 <mss 65240>
      22:33:34.457516 A2 > B: . ack 1 win 8856 (DF)
      22:33:34.470683 A2 > B: P 1:985(984) ack 1 win 8856 (DF)
      22:33:34.471144 B > A2: . ack 985 win 16384
      22:33:34.476554 A2 > B: . 985:1969(984) ack 1 win 8856 (DF)
      22:33:34.477580 A2 > B: P 1969:2953(984) ack 1 win 8856 (DF)

      [...]

      Notice that the SYN packet for session A2 specifies an MSS of 984.

   Trace file demonstrating correct behavior

      As before, this trace was made using tcpdump running on an
      intermediate host.  Host A initiates two separate consecutive
      connections, A1 and A2, to host B.  Router C is the location of
      the MTU bottleneck.  As usual, TCP options are removed from all
      non-SYN packets.

      22:36:58.828602 A1 > B: S 3402991286:3402991286(0) win 32768
              <mss 4312,wscale 0,nop,timestamp 1123370309 0,
              echo 1123370309> (DF)
      22:36:58.844040 B > A1: S 946999880:946999880(0)
              ack 3402991287 win 16384
              <mss 65240,nop,wscale 0,nop,nop,timestamp 429552 1123370309>
      22:36:58.848058 A1 > B: . ack 1 win 32768 (DF)
      22:36:58.851514 A1 > B: P 1:1025(1024) ack 1 win 32768 (DF)
      22:36:58.851584 C > A1: icmp: 129.99.238.5 unreachable -
              need to frag (mtu 1024) (DF)
      22:36:58.855885 A1 > B: . 1:969(968) ack 1 win 32768 (DF)
      22:36:58.856378 A1 > B: . 969:985(16) ack 1 win 32768 (DF)
      22:36:59.036309 B > A1: . ack 985 win 16384
      22:36:59.039255 A1 > B: FP 985:1025(40) ack 1 win 32768 (DF)
      22:36:59.039623 B > A1: . ack 1026 win 16344
      22:36:59.039828 B > A1: F 1:1(0) ack 1026 win 16384
      22:36:59.043037 A1 > B: . ack 2 win 32768 (DF)
      22:37:01.436032 A2 > B: S 3404812097:3404812097(0) win 32768
              <mss 4312,wscale 0,nop,timestamp 1123372916 0,
              echo 1123372916> (DF)
      22:37:01.436424 B > A2: S 949814769:949814769(0)
              ack 3404812098 win 16384
              <mss 65240,nop,wscale 0,nop,nop,timestamp 429562 1123372916>
      22:37:01.440147 A2 > B: . ack 1 win 32768 (DF)
      22:37:01.442736 A2 > B: . 1:969(968) ack 1 win 32768 (DF)
      22:37:01.442894 A2 > B: P 969:985(16) ack 1 win 32768 (DF)
      22:37:01.443283 B > A2: . ack 985 win 16384
      22:37:01.446068 A2 > B: P 985:1025(40) ack 1 win 32768 (DF)
      22:37:01.446519 B > A2: . ack 1025 win 16384
      22:37:01.448465 A2 > B: F 1025:1025(0) ack 1 win 32768 (DF)
      22:37:01.448837 B > A2: . ack 1026 win 16384
      22:37:01.449007 B > A2: F 1:1(0) ack 1026 win 16384
      22:37:01.452201 A2 > B: . ack 2 win 32768 (DF)

      Note that the same MSS was used for both session A1 and session A2.

   How to detect
      This can be detected using a packet trace of two separate
      connections; the first should invoke PMTUD; the second should
      start soon enough after the first that the PMTU value does not
      time out.

   How to fix
      The MSS should be determined based on the MTUs of the interfaces
      on the system, as outlined in [RFC1122] and [RFC1191].

3. Security Considerations

   The one security concern raised by this memo is that ICMP black holes
   are often caused by over-zealous security administrators who block
   all ICMP messages.  It is vitally important that those who design and
   deploy security systems understand the impact of strict filtering on
   upper-layer protocols.  The safest web site in the world is worthless
   if most TCP implementations cannot transfer data from it.
   It would be far nicer to have all of the black holes fixed rather
   than fixing all of the TCP implementations.

4. Acknowledgements

   Thanks to Mark Allman, Vern Paxson, and Jamshid Mahdavi for generous
   help reviewing the document, and to Matt Mathis for early suggestions
   of various mechanisms that can cause PMTUD black holes, as well as
   for his review.  The structure for describing TCP problems, and the
   early description of that structure, are from [RFC2525].  Special
   thanks to Amy Bock, who helped perform the PMTUD tests which
   discovered these bugs.

5. References

   [RFC2581]    Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
                Control", RFC 2581, April 1999.

   [RFC1122]    Braden, R., "Requirements for Internet Hosts --
                Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC813]     Clark, D., "Window and Acknowledgement Strategy in TCP",
                RFC 813, July 1982.

   [Jacobson89] Jacobson, V., Leres, C. and S. McCanne, "tcpdump", June
                1989, ftp.ee.lbl.gov.

   [RFC1435]    Knowles, S., "IESG Advice from Experience with Path MTU
                Discovery", RFC 1435, March 1993.

   [RFC1191]    Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
                November 1990.

   [RFC1981]    McCann, J., Deering, S. and J. Mogul, "Path MTU Discovery
                for IP version 6", RFC 1981, August 1996.

   [Paxson97]   Paxson, V., "End-to-End Routing Behavior in the
                Internet", IEEE/ACM Transactions on Networking (5),
                pp. 601-615, October 1997.

   [RFC2525]    Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner,
                J., Heavens, I., Lahey, K., Semke, I. and B. Volz, "Known
                TCP Implementation Problems", RFC 2525, March 1999.

   [RFC879]     Postel, J., "The TCP Maximum Segment Size and Related
                Topics", RFC 879, November 1983.

   [RFC2001]    Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast
                Retransmit, and Fast Recovery Algorithms", RFC 2001,
                January 1997.

6. Author's Address

   Kevin Lahey
   dotRocket, Inc.
   1901 S. Bascom Ave., Suite 300
   Campbell, CA 95008
   USA

   Phone: +1 408-371-8977 x115
   email: kml@dotrocket.com

7. Full Copyright Statement

   Copyright (C) The Internet Society (2000).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.