Diffstat (limited to 'doc/rfc/rfc1191.txt')
-rw-r--r-- | doc/rfc/rfc1191.txt | 1084 |
1 files changed, 1084 insertions, 0 deletions
diff --git a/doc/rfc/rfc1191.txt b/doc/rfc/rfc1191.txt new file mode 100644 index 0000000..b0af14f --- /dev/null +++ b/doc/rfc/rfc1191.txt @@ -0,0 +1,1084 @@ + + + + +Network Working Group J. Mogul +Request for Comments: 1191 DECWRL +Obsoletes: RFC 1063 S. Deering + Stanford University + November 1990 + + Path MTU Discovery + + +Status of this Memo + + This RFC specifies a protocol on the IAB Standards Track for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "IAB + Official Protocol Standards" for the standardization state and status + of this protocol. Distribution of this memo is unlimited. + + + Table of Contents + + Status of this Memo 1 + Abstract 2 + Acknowledgements 2 + 1. Introduction 2 + 2. Protocol overview 3 + 3. Host specification 4 + 3.1. TCP MSS Option 5 + 4. Router specification 6 + 5. Host processing of old-style messages 7 + 6. Host implementation 8 + 6.1. Layering 9 + 6.2. Storing PMTU information 10 + 6.3. Purging stale PMTU information 11 + 6.4. TCP layer actions 13 + 6.5. Issues for other transport protocols 14 + 6.6. Management interface 15 + 7. Likely values for Path MTUs 15 + 7.1. A better way to detect PMTU increases 16 + 8. Security considerations 18 + References 18 + Authors' Addresses 19 + + + List of Tables + + Table 7-1: Common MTUs in the Internet 17 + + + + + + +Mogul & Deering [page 1] + + +RFC 1191 Path MTU Discovery November 1990 + + + + +Abstract + + This memo describes a technique for dynamically discovering the + maximum transmission unit (MTU) of an arbitrary internet path. It + specifies a small change to the way routers generate one type of ICMP + message. For a path that passes through a router that has not been + so changed, this technique might not discover the correct Path MTU, + but it will always choose a Path MTU as accurate as, and in many + cases more accurate than, the Path MTU that would be chosen by + current practice. 
+ + +Acknowledgements + + This proposal is a product of the IETF MTU Discovery Working Group. + + The mechanism proposed here was first suggested by Geof Cooper [2], + who in two short paragraphs set out all the basic ideas that took the + Working Group months to reinvent. + + +1. Introduction + + When one IP host has a large amount of data to send to another host, + the data is transmitted as a series of IP datagrams. It is usually + preferable that these datagrams be of the largest size that does not + require fragmentation anywhere along the path from the source to the + destination. (For the case against fragmentation, see [5].) This + datagram size is referred to as the Path MTU (PMTU), and it is equal + to the minimum of the MTUs of each hop in the path. A shortcoming of + the current Internet protocol suite is the lack of a standard + mechanism for a host to discover the PMTU of an arbitrary path. + + Note: The Path MTU is what in [1] is called the "Effective MTU + for sending" (EMTU_S). A PMTU is associated with a path, + which is a particular combination of IP source and destination + address and perhaps a Type-of-service (TOS). + + The current practice [1] is to use the lesser of 576 and the + first-hop MTU as the PMTU for any destination that is not connected + to the same network or subnet as the source. In many cases, this + results in the use of smaller datagrams than necessary, because many + paths have a PMTU greater than 576. A host sending datagrams much + smaller than the Path MTU allows is wasting Internet resources and + probably getting suboptimal throughput. Furthermore, current + practice does not prevent fragmentation in all cases, since there are + some paths whose PMTU is less than 576. 
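As a small illustration of the definitions in the Introduction, the following sketch compares the true Path MTU (the minimum of the hop MTUs) with the estimate chosen by current practice. The function names and the example hop MTUs are assumptions made for illustration; they do not come from the memo.

```python
from typing import Sequence


def path_mtu(hop_mtus: Sequence[int]) -> int:
    """The PMTU is the minimum of the MTUs of each hop in the path."""
    return min(hop_mtus)


def current_practice_estimate(first_hop_mtu: int, same_network: bool = False) -> int:
    """Estimate used by current practice: the lesser of 576 and the
    first-hop MTU for any destination not on the same network or subnet."""
    if same_network:
        return first_hop_mtu
    return min(576, first_hop_mtu)


# Hypothetical path whose PMTU exceeds 576: current practice sends
# smaller datagrams than necessary.
wide_path = [4352, 1500, 1006]
# Hypothetical path whose PMTU is below 576: current practice does not
# prevent fragmentation.
narrow_path = [1500, 296]

for hops in (wide_path, narrow_path):
    print(hops, "PMTU:", path_mtu(hops),
          "current practice:", current_practice_estimate(hops[0]))
```

In the first case the estimate (576) is well below the PMTU (1006); in the second it is above the PMTU (296), so datagrams of that size would be fragmented.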
+ + +Mogul & Deering [page 2] + + +RFC 1191 Path MTU Discovery November 1990 + + + + + It is expected that future routing protocols will be able to provide + accurate PMTU information within a routing area, although perhaps not + across multi-level routing hierarchies. It is not clear how soon + that will be ubiquitously available, so for the next several years + the Internet needs a simple mechanism that discovers PMTUs without + wasting resources and that works before all hosts and routers are + modified. + + +2. Protocol overview + + In this memo, we describe a technique for using the Don't Fragment + (DF) bit in the IP header to dynamically discover the PMTU of a path. + The basic idea is that a source host initially assumes that the PMTU + of a path is the (known) MTU of its first hop, and sends all + datagrams on that path with the DF bit set. If any of the datagrams + are too large to be forwarded without fragmentation by some router + along the path, that router will discard them and return ICMP + Destination Unreachable messages with a code meaning "fragmentation + needed and DF set" [7]. Upon receipt of such a message (henceforth + called a "Datagram Too Big" message), the source host reduces its + assumed PMTU for the path. + + The PMTU discovery process ends when the host's estimate of the PMTU + is low enough that its datagrams can be delivered without + fragmentation. Or, the host may elect to end the discovery process + by ceasing to set the DF bit in the datagram headers; it may do so, + for example, because it is willing to have datagrams fragmented in + some circumstances. Normally, the host continues to set DF in all + datagrams, so that if the route changes and the new PMTU is lower, it + will be discovered. + + Unfortunately, the Datagram Too Big message, as currently specified, + does not report the MTU of the hop for which the rejected datagram + was too big, so the source host cannot tell exactly how much to + reduce its assumed PMTU. 
To remedy this, we propose that a currently + unused header field in the Datagram Too Big message be used to report + the MTU of the constricting hop. This is the only change specified + for routers in support of PMTU Discovery. + + The PMTU of a path may change over time, due to changes in the + routing topology. Reductions of the PMTU are detected by Datagram + Too Big messages, except on paths for which the host has stopped + setting the DF bit. To detect increases in a path's PMTU, a host + periodically increases its assumed PMTU (and if it had stopped, + resumes setting the DF bit). This will almost always result in + datagrams being discarded and Datagram Too Big messages being + + +Mogul & Deering [page 3] + + +RFC 1191 Path MTU Discovery November 1990 + + + + + generated, because in most cases the PMTU of the path will not have + changed, so it should be done infrequently. + + Since this mechanism essentially guarantees that host will not + receive any fragments from a peer doing PMTU Discovery, it may aid in + interoperating with certain hosts that (improperly) are unable to + reassemble fragmented datagrams. + + +3. Host specification + + When a host receives a Datagram Too Big message, it MUST reduce its + estimate of the PMTU for the relevant path, based on the value of the + Next-Hop MTU field in the message (see section 4). We do not specify + the precise behavior of a host in this circumstance, since different + applications may have different requirements, and since different + implementation architectures may favor different strategies. + + We do require that after receiving a Datagram Too Big message, a host + MUST attempt to avoid eliciting more such messages in the near + future. The host may either reduce the size of the datagrams it is + sending along the path, or cease setting the Don't Fragment bit in + the headers of those datagrams. 
Clearly, the former strategy may + continue to elicit Datagram Too Big messages for a while, but since + each of these messages (and the dropped datagrams they respond to) + consume Internet resources, the host MUST force the PMTU Discovery + process to converge. + + Hosts using PMTU Discovery MUST detect decreases in Path MTU as fast + as possible. Hosts MAY detect increases in Path MTU, but because + doing so requires sending datagrams larger than the current estimated + PMTU, and because the likelihood is that the PMTU will not have + increased, this MUST be done at infrequent intervals. An attempt to + detect an increase (by sending a datagram larger than the current + estimate) MUST NOT be done less than 5 minutes after a Datagram Too + Big message has been received for the given destination, or less than + 1 minute after a previous, successful attempted increase. We + recommend setting these timers at twice their minimum values (10 + minutes and 2 minutes, respectively). + + Hosts MUST be able to deal with Datagram Too Big messages that do not + include the next-hop MTU, since it is not feasible to upgrade all the + routers in the Internet in any finite time. A Datagram Too Big + message from an unmodified router can be recognized by the presence + of a zero in the (newly-defined) Next-Hop MTU field. (This is + required by the ICMP specification [7], which says that "unused" + fields must be zero.) In section 5, we discuss possible strategies + + +Mogul & Deering [page 4] + + +RFC 1191 Path MTU Discovery November 1990 + + + + + for a host to follow in response to an old-style Datagram Too Big + message (one sent by an unmodified router). + + A host MUST never reduce its estimate of the Path MTU below 68 + octets. + + A host MUST not increase its estimate of the Path MTU in response to + the contents of a Datagram Too Big message. 
A message purporting to + announce an increase in the Path MTU might be a stale datagram that + has been floating around in the Internet, a false packet injected as + part of a denial-of-service attack, or the result of having multiple + paths to the destination. + + +3.1. TCP MSS Option + + A host doing PMTU Discovery must obey the rule that it not send IP + datagrams larger than 576 octets unless it has permission from the + receiver. For TCP connections, this means that a host must not send + datagrams larger than 40 octets plus the Maximum Segment Size (MSS) + sent by its peer. + + Note: The TCP MSS is defined to be the relevant IP datagram + size minus 40 [9]. The default of 576 octets for the maximum + IP datagram size yields a default of 536 octets for the TCP + MSS. + + Section 4.2.2.6 of "Requirements for Internet Hosts -- Communication + Layers" [1] says: + + Some TCP implementations send an MSS option only if the + destination host is on a non-connected network. However, in + general the TCP layer may not have the appropriate information + to make this decision, so it is preferable to leave to the IP + layer the task of determining a suitable MTU for the Internet + path. + + Actually, many TCP implementations always send an MSS option, but set + the value to 536 if the destination is non-local. This behavior was + correct when the Internet was full of hosts that did not follow the + rule that datagrams larger than 576 octets should not be sent to + non-local destinations. Now that most hosts do follow this rule, it + is unnecessary to limit the value in the TCP MSS option to 536 for + non-local peers. + + Moreover, doing this prevents PMTU Discovery from discovering PMTUs + larger than 576, so hosts SHOULD no longer lower the value they send + + +Mogul & Deering [page 5] + + +RFC 1191 Path MTU Discovery November 1990 + + + + + in the MSS option. 
The MSS option should be 40 octets less than the + size of the largest datagram the host is able to reassemble (MMS_R, + as defined in [1]); in many cases, this will be the architectural + limit of 65495 (65535 - 40) octets. A host MAY send an MSS value + derived from the MTU of its connected network (the maximum MTU over + its connected networks, for a multi-homed host); this should not + cause problems for PMTU Discovery, and may dissuade a broken peer + from sending enormous datagrams. + + Note: At the moment, we see no reason to send an MSS greater + than the maximum MTU of the connected networks, and we + recommend that hosts do not use 65495. It is quite possible + that some IP implementations have sign-bit bugs that would be + tickled by unnecessary use of such a large MSS. + + +4. Router specification + + When a router is unable to forward a datagram because it exceeds the + MTU of the next-hop network and its Don't Fragment bit is set, the + router is required to return an ICMP Destination Unreachable message + to the source of the datagram, with the Code indicating + "fragmentation needed and DF set". To support the Path MTU Discovery + technique specified in this memo, the router MUST include the MTU of + that next-hop network in the low-order 16 bits of the ICMP header + field that is labelled "unused" in the ICMP specification [7]. The + high-order 16 bits remain unused, and MUST be set to zero. 
Thus, the + message has the following format: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type = 3 | Code = 4 | Checksum | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | unused = 0 | Next-Hop MTU | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Internet Header + 64 bits of Original Datagram Data | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + + The value carried in the Next-Hop MTU field is: + + The size in octets of the largest datagram that could be + forwarded, along the path of the original datagram, without + being fragmented at this router. The size includes the IP + header and IP data, and does not include any lower-level + headers. + + +Mogul & Deering [page 6] + + +RFC 1191 Path MTU Discovery November 1990 + + + + + This field will never contain a value less than 68, since every + router "must be able to forward a datagram of 68 octets without + fragmentation" [8]. + + +5. Host processing of old-style messages + + In this section we outline several possible strategies for a host to + follow upon receiving a Datagram Too Big message from an unmodified + router (i.e., one where the Next-Hop MTU field is zero). This + section is not part of the protocol specification. + + The simplest thing for a host to do in response to such a message is + to assume that the PMTU is the minimum of its currently-assumed PMTU + and 576, and to stop setting the DF bit in datagrams sent on that + path. Thus, the host falls back to the same PMTU as it would choose + under current practice (see section 3.3.3 of "Requirements for + Internet Hosts -- Communication Layers" [1]). This strategy has the + advantage that it terminates quickly, and does no worse than existing + practice. 
It fails, however, to avoid fragmentation in some cases, + and to make the most efficient utilization of the internetwork in + other cases. + + More sophisticated strategies involve "searching" for an accurate + PMTU estimate, by continuing to send datagrams with the DF bit while + varying their sizes. A good search strategy is one that obtains an + accurate estimate of the Path MTU without causing many packets to be + lost in the process. + + Several possible strategies apply algorithmic functions to the + previous PMTU estimate to generate a new estimate. For example, one + could multiply the old estimate by a constant (say, 0.75). We do NOT + recommend this; it either converges far too slowly, or it + substantially underestimates the true PMTU. + + A more sophisticated approach is to do a binary search on the packet + size. This converges somewhat faster, although it still takes 4 or 5 + steps to converge from an FDDI MTU to an Ethernet MTU. A serious + disadvantage is that it requires a complex implementation in order to + recognize when a datagram has made it to the other end (indicating + that the current estimate is too low). We also do not recommend this + strategy. + + One strategy that appears to work quite well starts from the + observation that there are, in practice, relatively few MTU values in + use in the Internet. Thus, rather than blindly searching through + arbitrarily chosen values, we can search only the ones that are + + +Mogul & Deering [page 7] + + +RFC 1191 Path MTU Discovery November 1990 + + + + + likely to appear. Moreover, since designers tend to choose MTUs in + similar ways, it is possible to collect groups of similar MTU values + and use the lowest value in the group as our search "plateau". (It + is clearly better to underestimate an MTU by a few per cent than to + overestimate it by one octet.) + + In section 7, we describe how we arrived at a table of representative + MTU plateaus for use in PMTU estimation. 
With this table, + convergence is as good as binary search in the worst case, and is far + better in common cases (for example, it takes only two round-trip + times to go from an FDDI MTU to an Ethernet MTU). Since the plateaus + lie near powers of two, if an MTU is not represented in this table, + the algorithm will not underestimate it by more than a factor of 2. + + Any search strategy must have some "memory" of previous estimates in + order to choose the next one. One approach is to use the + currently-cached estimate of the Path MTU, but in fact there is + better information available in the Datagram Too Big message itself. + All ICMP Destination Unreachable messages, including this one, + contain the IP header of the original datagram, which contains the + Total Length of the datagram that was too big to be forwarded without + fragmentation. Since this Total Length may be less than the current + PMTU estimate, but is nonetheless larger than the actual PMTU, it may + be a good input to the method for choosing the next PMTU estimate. + + Note: routers based on implementations derived from 4.2BSD + Unix send an incorrect value for the Total Length of the + original IP datagram. The value sent by these routers is the + sum of the original Total Length and the original Header + Length (expressed in octets). Since it is impossible for the + host receiving such a Datagram Too Big message to know if it + was sent by one of these routers, the host must be conservative + and assume that it is. If the Total Length field returned is + not less than the current PMTU estimate, it must be reduced by + 4 times the value of the returned Header Length field. + + The strategy we recommend, then, is to use as the next PMTU estimate + the greatest plateau value that is less than the returned Total + Length field (corrected, if necessary, according to the Note above). + + +6. Host implementation + + In this section we discuss how PMTU Discovery is implemented in host + software. 
This is not a specification, but rather a set of + suggestions. + + The issues include: + +Mogul & Deering [page 8] + + +RFC 1191 Path MTU Discovery November 1990 + + + + + - What layer or layers implement PMTU Discovery? + + - Where is the PMTU information cached? + + - How is stale PMTU information removed? + + - What must transport and higher layers do? + + +6.1. Layering + + In the IP architecture, the choice of what size datagram to send is + made by a protocol at a layer above IP. We refer to such a protocol + as a "packetization protocol". Packetization protocols are usually + transport protocols (for example, TCP) but can also be higher-layer + protocols (for example, protocols built on top of UDP). + + Implementing PMTU Discovery in the packetization layers simplifies + some of the inter-layer issues, but has several drawbacks: the + implementation may have to be redone for each packetization protocol, + it becomes hard to share PMTU information between different + packetization layers, and the connection-oriented state maintained by + some packetization layers may not easily extend to save PMTU + information for long periods. + + We therefore believe that the IP layer should store PMTU information + and that the ICMP layer should process received Datagram Too Big + messages. The packetization layers must still be able to respond to + changes in the Path MTU, by changing the size of the datagrams they + send, and must also be able to specify that datagrams are sent with + the DF bit set. We do not want the IP layer to simply set the DF bit + in every packet, since it is possible that a packetization layer, + perhaps a UDP application outside the kernel, is unable to change its + datagram size. Protocols involving intentional fragmentation, while + inelegant, are sometimes successful (NFS being the primary example), + and we do not want to break such protocols. 
+ + To support this layering, packetization layers require an extension + of the IP service interface defined in [1]: + + A way to learn of changes in the value of MMS_S, the "maximum + send transport-message size", which is derived from the Path + MTU by subtracting the minimum IP header size. + + + + + + +Mogul & Deering [page 9] + + +RFC 1191 Path MTU Discovery November 1990 + + + + +6.2. Storing PMTU information + + In general, the IP layer should associate each PMTU value that it has + learned with a specific path. A path is identified by a source + address, a destination address and an IP type-of-service. (Some + implementations do not record the source address of paths; this is + acceptable for single-homed hosts, which have only one possible + source address.) + + Note: Some paths may be further distinguished by different + security classifications. The details of such classifications + are beyond the scope of this memo. + + The obvious place to store this association is as a field in the + routing table entries. A host will not have a route for every + possible destination, but it should be able to cache a per-host route + for every active destination. (This requirement is already imposed + by the need to process ICMP Redirect messages.) + + When the first packet is sent to a host for which no per-host route + exists, a route is chosen either from the set of per-network routes, + or from the set of default routes. The PMTU fields in these route + entries should be initialized to be the MTU of the associated + first-hop data link, and must never be changed by the PMTU Discovery + process. (PMTU Discovery only creates or changes entries for + per-host routes). Until a Datagram Too Big message is received, the + PMTU associated with the initially-chosen route is presumed to be + accurate. 
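The storage scheme described in section 6.2 can be sketched as a small per-host route cache. This is only a minimal illustration under stated assumptions: the names `PathKey`, `PmtuCache`, `lookup` and `lower` are hypothetical, a single first-hop MTU stands in for the per-network and default route entries, and the `lower` method anticipates the update rule for Datagram Too Big messages together with the 68-octet floor from section 3.

```python
from dataclasses import dataclass, field
from typing import Dict, NamedTuple


class PathKey(NamedTuple):
    """A path: source address, destination address and type-of-service."""
    src: str
    dst: str
    tos: int


@dataclass
class PmtuCache:
    """Hypothetical cache of per-host routes (not part of the memo)."""
    # MTU of the first-hop data link; stands in for the per-network or
    # default route entry and is never changed by PMTU Discovery.
    first_hop_mtu: int
    # Per-host entries, the only ones PMTU Discovery creates or changes.
    host_routes: Dict[PathKey, int] = field(default_factory=dict)

    def lookup(self, key: PathKey) -> int:
        # Until a Datagram Too Big message is received, the first-hop
        # MTU is presumed to be the PMTU of the path.
        return self.host_routes.get(key, self.first_hop_mtu)

    def lower(self, key: PathKey, new_estimate: int) -> bool:
        """Store a lower estimate; return True if the entry changed."""
        new_estimate = max(new_estimate, 68)  # never go below 68 octets
        if new_estimate < self.lookup(key):
            self.host_routes[key] = new_estimate  # creates the per-host route
            return True
        return False  # estimates are never raised by this path


cache = PmtuCache(first_hop_mtu=1500)
key = PathKey("10.0.0.1", "192.0.2.7", 0)
print(cache.lookup(key))       # first-hop MTU, no per-host route yet
print(cache.lower(key, 1006))  # a lower estimate creates the per-host route
print(cache.lookup(key))
```

A real implementation would more likely keep this value as a field in its routing table entries, as the memo suggests; the dictionary here only keeps the sketch self-contained.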
+ + When a Datagram Too Big message is received, the ICMP layer + determines a new estimate for the Path MTU (either from a non-zero + Next-Hop MTU value in the packet, or using the method described in + section 5). If a per-host route for this path does not exist, then + one is created (almost as if a per-host ICMP Redirect is being + processed; the new route uses the same first-hop router as the + current route). If the PMTU estimate associated with the per-host + route is higher than the new estimate, then the value in the routing + entry is changed. + + The packetization layers must be notified about decreases in the + PMTU. Any packetization layer instance (for example, a TCP + connection) that is actively using the path must be notified if the + PMTU estimate is decreased. + + Note: even if the Datagram Too Big message contains an + Original Datagram Header that refers to a UDP packet, the TCP + layer must be notified if any of its connections use the given + + +Mogul & Deering [page 10] + + +RFC 1191 Path MTU Discovery November 1990 + + + + + path. + + Also, the instance that sent the datagram that elicited the Datagram + Too Big message should be notified that its datagram has been + dropped, even if the PMTU estimate has not changed, so that it may + retransmit the dropped datagram. + + Note: The notification mechanism can be analogous to the + mechanism used to provide notification of an ICMP Source + Quench message. In some implementations (such as + 4.2BSD-derived systems), the existing notification mechanism + is not able to identify the specific connection involved, and + so an additional mechanism is necessary. + + Alternatively, an implementation can avoid the use of an + asynchronous notification mechanism for PMTU decreases by + postponing notification until the next attempt to send a + datagram larger than the PMTU estimate. 
In this approach, + when an attempt is made to SEND a datagram with the DF bit + set, and the datagram is larger than the PMTU estimate, the + SEND function should fail and return a suitable error + indication. This approach may be more suitable to a + connectionless packetization layer (such as one using UDP), + which (in some implementations) may be hard to "notify" from + the ICMP layer. In this case, the normal timeout-based + retransmission mechanisms would be used to recover from the + dropped datagrams. + + It is important to understand that the notification of the + packetization layer instances using the path about the change in the + PMTU is distinct from the notification of a specific instance that a + packet has been dropped. The latter should be done as soon as + practical (i.e., asynchronously from the point of view of the + packetization layer instance), while the former may be delayed until + a packetization layer instance wants to create a packet. + Retransmission should be done only for those packets that are + known to be dropped, as indicated by a Datagram Too Big message. + + +6.3. Purging stale PMTU information + + Internetwork topology is dynamic; routes change over time. The PMTU + discovered for a given destination may be wrong if a new route comes + into use. Thus, PMTU information cached by a host can become stale. + + Because a host using PMTU Discovery always sets the DF bit, if the + stale PMTU value is too large, this will be discovered almost + + +Mogul & Deering [page 11] + + +RFC 1191 Path MTU Discovery November 1990 + + + + + immediately once a datagram is sent to the given destination. No + such mechanism exists for realizing that a stale PMTU value is too + small, so an implementation should "age" cached values. When a PMTU + value has not been decreased for a while (on the order of 10 + minutes), the PMTU estimate should be set to the first-hop data-link + MTU, and the packetization layers should be notified of the change. 
+ This will cause the complete PMTU Discovery process to take place + again. + + Note: an implementation should provide a means for changing + the timeout duration, including setting it to "infinity". For + example, hosts attached to an FDDI network which is then + attached to the rest of the Internet via a slow serial line + are never going to discover a new non-local PMTU, so they + should not have to put up with dropped datagrams every 10 + minutes. + + An upper layer MUST not retransmit datagrams in response to an + increase in the PMTU estimate, since this increase never comes in + response to an indication of a dropped datagram. + + One approach to implementing PMTU aging is to add a timestamp field + to the routing table entry. This field is initialized to a + "reserved" value, indicating that the PMTU has never been changed. + Whenever the PMTU is decreased in response to a Datagram Too Big + message, the timestamp is set to the current time. + + Once a minute, a timer-driven procedure runs through the routing + table, and for each entry whose timestamp is not "reserved" and is + older than the timeout interval: + + - The PMTU estimate is set to the MTU of the associated first + hop. + + - Packetization layers using this route are notified of the + increase. + + PMTU estimates may disappear from the routing table if the per-host + routes are removed; this can happen in response to an ICMP Redirect + message, or because certain routing-table daemons delete old routes + after several minutes. Also, on a multi-homed host a topology change + may result in the use of a different source interface. When this + happens, if the packetization layer is not notified then it may + continue to use a cached PMTU value that is now too small. One + solution is to notify the packetization layer of a possible PMTU + change whenever a Redirect message causes a route change, and + whenever a route is simply deleted from the routing table. 
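The timestamp-based aging procedure of section 6.3 can be sketched as follows. The names `RouteEntry`, `record_decrease` and `age_routes` are hypothetical, `None` plays the role of the "reserved" timestamp value, and resetting the timestamp to "reserved" after an entry has been aged is an assumption of this sketch rather than something the memo states.

```python
from dataclasses import dataclass
from typing import List, Optional

# "On the order of 10 minutes"; the memo asks that this be changeable,
# including being set to "infinity" (float("inf") works below).
DEFAULT_TIMEOUT_SECONDS = 600.0


@dataclass
class RouteEntry:
    first_hop_mtu: int
    pmtu: int
    decreased_at: Optional[float] = None  # None means "reserved"


def record_decrease(route: RouteEntry, new_pmtu: int, now: float) -> None:
    """Called when a Datagram Too Big message lowers the estimate."""
    route.pmtu = new_pmtu
    route.decreased_at = now


def age_routes(routes: List[RouteEntry], now: float,
               timeout: float = DEFAULT_TIMEOUT_SECONDS) -> List[RouteEntry]:
    """Run from a timer (for example once a minute).

    Returns the entries that were reset, so that the caller can notify
    the packetization layers using them of the increase.
    """
    reset = []
    for route in routes:
        if route.decreased_at is None:
            continue  # PMTU has never been decreased
        if now - route.decreased_at > timeout:
            route.pmtu = route.first_hop_mtu
            route.decreased_at = None  # assumption: back to "reserved"
            reset.append(route)
    return reset


route = RouteEntry(first_hop_mtu=4352, pmtu=4352)
record_decrease(route, 1492, now=0.0)
print(age_routes([route], now=300.0))  # too recent, nothing is reset
print(age_routes([route], now=601.0))  # entry reset to the first-hop MTU
```

Passing the current time in as an argument, instead of reading a clock inside the function, is a design choice made here so that the sketch can be exercised without waiting.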
+ + +Mogul & Deering [page 12] + + +RFC 1191 Path MTU Discovery November 1990 + + + + + Note: a more sophisticated method for detecting PMTU increases + is described in section 7.1. + + +6.4. TCP layer actions + + The TCP layer must track the PMTU for the destination of a + connection; it should not send datagrams that would be larger than + this. A simple implementation could ask the IP layer for this value + (using the GET_MAXSIZES interface described in [1]) each time it + created a new segment, but this could be inefficient. Moreover, TCP + implementations that follow the "slow-start" congestion-avoidance + algorithm [4] typically calculate and cache several other values + derived from the PMTU. It may be simpler to receive asynchronous + notification when the PMTU changes, so that these variables may be + updated. + + A TCP implementation must also store the MSS value received from its + peer (which defaults to 536), and not send any segment larger than + this MSS, regardless of the PMTU. In 4.xBSD-derived implementations, + this requires adding an additional field to the TCP state record. + + Finally, when a Datagram Too Big message is received, it implies that + a datagram was dropped by the router that sent the ICMP message. It + is sufficient to treat this as any other dropped segment, and wait + until the retransmission timer expires to cause retransmission of the + segment. If the PMTU Discovery process requires several steps to + estimate the right PMTU, this could delay the connection by many + round-trip times. + + Alternatively, the retransmission could be done in immediate response + to a notification that the Path MTU has changed, but only for the + specific connection specified by the Datagram Too Big message. The + datagram size used in the retransmission should, of course, be no + larger than the new PMTU. 
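The two limits on TCP segment size discussed in section 6.4 (the peer's MSS, and the PMTU less 40 octets of IP and TCP headers) can be combined as in the sketch below, which also shows how data from a dropped datagram might be re-divided before retransmission. The function names are hypothetical, and the sketch assumes minimum-size headers with no IP or TCP options.

```python
from typing import List

HEADER_OCTETS = 40      # minimum IP header plus minimum TCP header
DEFAULT_PEER_MSS = 536  # MSS assumed when the peer sent no MSS option


def max_segment_payload(pmtu: int, peer_mss: int = DEFAULT_PEER_MSS) -> int:
    """Largest TCP payload allowed by both the PMTU and the peer's MSS."""
    size = min(peer_mss, pmtu - HEADER_OCTETS)
    if size <= 0:
        raise ValueError("PMTU too small to carry any TCP payload")
    return size


def resegment(data: bytes, pmtu: int,
              peer_mss: int = DEFAULT_PEER_MSS) -> List[bytes]:
    """Split unacknowledged data into segments that fit the new PMTU."""
    size = max_segment_payload(pmtu, peer_mss)
    return [data[i:i + size] for i in range(0, len(data), size)]


# Hypothetical case: a 2000-octet backlog, PMTU lowered to 1006,
# peer advertised an MSS of 1460.
print([len(s) for s in resegment(b"x" * 2000, pmtu=1006, peer_mss=1460)])
```

Whether the resulting segments are retransmitted immediately or left to the retransmission timer is a separate decision, discussed in the surrounding text.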
+ + Note: One MUST not retransmit in response to every Datagram + Too Big message, since a burst of several oversized segments + will give rise to several such messages and hence several + retransmissions of the same data. If the new estimated PMTU + is still wrong, the process repeats, and there is an + exponential growth in the number of superfluous segments sent! + + This means that the TCP layer must be able to recognize when a + Datagram Too Big notification actually decreases the PMTU that + it has already used to send a datagram on the given + connection, and should ignore any other notifications. + + +Mogul & Deering [page 13] + + +RFC 1191 Path MTU Discovery November 1990 + + + + + Modern TCP implementations incorporate "congestion avoidance" and + "slow-start" algorithms to improve performance [4]. Unlike a + retransmission caused by a TCP retransmission timeout, a + retransmission caused by a Datagram Too Big message should not change + the congestion window. It should, however, trigger the slow-start + mechanism (i.e., only one segment should be retransmitted until + acknowledgements begin to arrive again). + + TCP performance can be reduced if the sender's maximum window size is + not an exact multiple of the segment size in use (this is not the + congestion window size, which is always a multiple of the segment + size). In many systems (such as those derived from 4.2BSD), the + segment size is often set to 1024 octets, and the maximum window size + (the "send space") is usually a multiple of 1024 octets, so the + proper relationship holds by default. If PMTU Discovery is used, + however, the segment size may not be a submultiple of the send space, + and it may change during a connection; this means that the TCP layer + may need to change the transmission window size when PMTU Discovery + changes the PMTU value. 
+   The maximum window size should be set to the
+   greatest multiple of the segment size (PMTU - 40) that is less than
+   or equal to the sender's buffer space size.
+
+   PMTU Discovery does not affect the value sent in the TCP MSS option,
+   because that value is used by the other end of the connection, which
+   may be using an unrelated PMTU value.
+
+
+6.5. Issues for other transport protocols
+
+   Some transport protocols (such as ISO TP4 [3]) are not allowed to
+   repacketize when doing a retransmission. That is, once an attempt is
+   made to transmit a datagram of a certain size, its contents cannot be
+   split into smaller datagrams for retransmission. In such a case, the
+   original datagram should be retransmitted without the DF bit set,
+   allowing it to be fragmented as necessary to reach its destination.
+   Subsequent datagrams, when transmitted for the first time, should be
+   no larger than allowed by the Path MTU, and should have the DF bit
+   set.
+
+   The Sun Network File System (NFS) uses a Remote Procedure Call (RPC)
+   protocol [11] that, in many cases, sends datagrams that must be
+   fragmented even for the first-hop link. This might improve
+   performance in certain cases, but it is known to cause reliability
+   and performance problems, especially when the client and server are
+   separated by routers.
+
+   We recommend that NFS implementations use PMTU Discovery whenever
+   routers are involved. Most NFS implementations allow the RPC
+   datagram size to be changed at mount-time (indirectly, by changing
+   the effective file system block size), but might require some
+   modification to support changes later on.
+
+   Also, since a single NFS operation cannot be split across several UDP
+   datagrams, certain operations (primarily, those operating on file
+   names and directories) require a minimum datagram size that may be
+   larger than the PMTU.
+   NFS implementations should not reduce the
+   datagram size below this threshold, even if PMTU Discovery suggests a
+   lower value. (Of course, in this case datagrams should not be sent
+   with DF set.)
+
+
+6.6. Management interface
+
+   We suggest that an implementation provide a way for a system utility
+   program to:
+
+      - Specify that PMTU Discovery not be done on a given route.
+
+      - Change the PMTU value associated with a given route.
+
+   The former can be accomplished by associating a flag with the routing
+   entry; when a packet is sent via a route with this flag set, the IP
+   layer leaves the DF bit clear no matter what the upper layer
+   requests.
+
+   These features might be used to work around an anomalous situation,
+   or by a routing protocol implementation that is able to obtain Path
+   MTU values.
+
+   The implementation should also provide a way to change the timeout
+   period for aging stale PMTU information.
+
+
+7. Likely values for Path MTUs
+
+   The algorithm recommended in section 5 for "searching" the space of
+   Path MTUs is based on a table of values that severely restricts the
+   search space. We describe here a table of MTU values that, as of
+   this writing, represents all major data-link technologies in use in
+   the Internet.
+
+   In table 7-1, data links are listed in order of decreasing MTU, and
+   grouped so that each set of similar MTUs is associated with a
+   "plateau" equal to the lowest MTU in the group. (The table also
+   includes some entries not currently associated with a data link, and
+   gives references where available). Where a plateau represents more
+   than one MTU, the table shows the maximum inaccuracy associated with
+   the plateau, as a percentage.
+
+   We do not expect that the values in the table, especially for higher
+   MTU levels, are going to be valid forever.
+   The values given here are
+   an implementation suggestion, NOT a specification or requirement.
+   Implementors should use up-to-date references to pick a set of
+   plateaus; it is important that the table not contain too many entries
+   or the process of searching for a PMTU might waste Internet
+   resources. Implementors should also make it convenient for customers
+   without source code to update the table values in their systems (for
+   example, the table in a BSD-derived Unix kernel could be changed
+   using a new "ioctl" command).
+
+      Note: It might be a good idea to add a few table entries for
+      values equal to small powers of 2 plus 40 (for the IP and TCP
+      headers), where no similar values exist, since this seems to
+      be a reasonably non-arbitrary way of choosing arbitrary
+      values.
+
+      The table might also contain entries for values slightly less
+      than large powers of 2, in case MTUs are defined near those
+      values (it is better in this case for the table entries to be
+      low than to be high, or else the next lowest plateau may be
+      chosen instead).
+
+
+7.1. A better way to detect PMTU increases
+
+   Section 6.3 suggests detecting increases in the PMTU value by
+   periodically increasing the PMTU estimate to the first-hop MTU.
+   Since it is likely that this process will simply "rediscover" the
+   current PMTU estimate, at the cost of several dropped datagrams, it
+   should not be done often.
+
+   A better approach is to periodically increase the PMTU estimate to
+   the next-highest value in the plateau table (or the first-hop MTU, if
+   that is smaller). If the increased estimate is wrong, at most one
+   round-trip time is wasted before the correct value is rediscovered.
+   If the increased estimate is still too low, a higher estimate will be
+   attempted somewhat later.
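
The probing step described above can be sketched as follows. This is an illustration rather than part of the memo: the plateau values are copied from Table 7-1, while the names `PLATEAUS` and `next_higher_estimate` are hypothetical.

```python
# Illustrative sketch only; the names are hypothetical.
# Plateau values are copied from Table 7-1 of this memo, in ascending order.
PLATEAUS = [68, 296, 508, 1006, 1492, 2002, 4352, 8166, 17914, 32000, 65535]


def next_higher_estimate(current_pmtu, first_hop_mtu):
    """Return the PMTU estimate to try when probing for an increase.

    The result is the next plateau above the current estimate, capped at
    the first-hop MTU. If it equals current_pmtu, no increase is possible.
    """
    for plateau in PLATEAUS:
        if plateau > current_pmtu:
            return min(plateau, first_hop_mtu)
    # Already at or above the highest plateau: nothing larger to try.
    return min(current_pmtu, first_hop_mtu)
```

For example, with a first-hop MTU of 1500, an estimate of 1006 is raised to 1492, and an estimate of 1492 is raised to 1500 because the next plateau (2002) exceeds the first-hop MTU.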
+
+   Because it may take several such periods to discover a significant
+   increase in the PMTU, we recommend that a short timeout period should
+   be used after the estimate is increased, and a longer timeout be used
+   after the PMTU estimate is decreased because of a Datagram Too Big
+   message. For example, after the PMTU estimate is decreased, the
+   timeout should be set to 10 minutes; once this timer expires and a
+   larger MTU is attempted, the timeout can be set to a much smaller
+   value (say, 2 minutes). In no case should the timeout be shorter
+   than the estimated round-trip time, if this is known.
+
+   Plateau    MTU    Comments                      Reference
+   ------     ---    --------                      ---------
+              65535  Official maximum MTU          RFC 791
+              65535  Hyperchannel                  RFC 1044
+   65535
+   32000             Just in case
+              17914  16Mb IBM Token Ring           ref. [6]
+   17914
+              8166   IEEE 802.4                    RFC 1042
+   8166
+              4464   IEEE 802.5 (4Mb max)          RFC 1042
+              4352   FDDI (Revised)                RFC 1188
+   4352 (1%)
+              2048   Wideband Network              RFC 907
+              2002   IEEE 802.5 (4Mb recommended)  RFC 1042
+   2002 (2%)
+              1536   Exp. Ethernet Nets            RFC 895
+              1500   Ethernet Networks             RFC 894
+              1500   Point-to-Point (default)      RFC 1134
+              1492   IEEE 802.3                    RFC 1042
+   1492 (3%)
+              1006   SLIP                          RFC 1055
+              1006   ARPANET                       BBN 1822
+   1006
+              576    X.25 Networks                 RFC 877
+              544    DEC IP Portal                 ref. [10]
+              512    NETBIOS                       RFC 1088
+              508    IEEE 802/Source-Rt Bridge     RFC 1042
+              508    ARCNET                        RFC 1051
+   508 (13%)
+              296    Point-to-Point (low delay)    RFC 1144
+   296
+   68                Official minimum MTU          RFC 791
+
+                Table 7-1: Common MTUs in the Internet
+
+
+8. Security considerations
+
+   This Path MTU Discovery mechanism makes possible two
+   denial-of-service attacks, both based on a malicious party sending
+   false Datagram Too Big messages to an Internet host.
+
+   In the first attack, the false message indicates a PMTU much smaller
+   than reality.
+   This should not entirely stop data flow, since the
+   victim host should never set its PMTU estimate below the absolute
+   minimum, but at 8 octets of IP data per datagram, progress could be
+   slow.
+
+   In the other attack, the false message indicates a PMTU greater than
+   reality. If believed, this could cause temporary blockage as the
+   victim sends datagrams that will be dropped by some router. Within
+   one round-trip time, the host would discover its mistake (receiving
+   Datagram Too Big messages from that router), but frequent repetition
+   of this attack could cause lots of datagrams to be dropped. A host,
+   however, should never raise its estimate of the PMTU based on a
+   Datagram Too Big message, so should not be vulnerable to this attack.
+
+   A malicious party could also cause problems if it could stop a victim
+   from receiving legitimate Datagram Too Big messages, but in this case
+   there are simpler denial-of-service attacks available.
+
+
+References
+
+[1] R. Braden, ed. Requirements for Internet Hosts -- Communication
+    Layers. RFC 1122, SRI Network Information Center, October, 1989.
+
+[2] Geof Cooper. IP Datagram Sizes. Electronic distribution of the
+    TCP-IP Discussion Group, Message-ID
+    <8705240517.AA01407@apolling.imagen.uucp>.
+
+[3] ISO. ISO Transport Protocol Specification: ISO DP 8073. RFC 905,
+    SRI Network Information Center, April, 1984.
+
+[4] Van Jacobson. Congestion Avoidance and Control. In Proc. SIGCOMM
+    '88 Symposium on Communications Architectures and Protocols, pages
+    314-329. Stanford, CA, August, 1988.
+
+[5] C. Kent and J. Mogul. Fragmentation Considered Harmful. In Proc.
+    SIGCOMM '87 Workshop on Frontiers in Computer Communications
+    Technology. August, 1987.
+
+[6] Drew Daniel Perkins. Private Communication.
+
+[7] J. Postel. Internet Control Message Protocol. RFC 792, SRI
+    Network Information Center, September, 1981.
+
+[8] J. Postel.
+    Internet Protocol. RFC 791, SRI Network Information
+    Center, September, 1981.
+
+[9] J. Postel. The TCP Maximum Segment Size and Related Topics. RFC
+    879, SRI Network Information Center, November, 1983.
+
+[10] Michael Reilly. Private Communication.
+
+[11] Sun Microsystems, Inc. RPC: Remote Procedure Call Protocol. RFC
+    1057, SRI Network Information Center, June, 1988.
+
+
+Authors' Addresses
+
+   Jeffrey Mogul
+   Digital Equipment Corporation Western Research Laboratory
+   100 Hamilton Avenue
+   Palo Alto, CA 94301
+
+   Phone: (415) 853-6643
+   EMail: mogul@decwrl.dec.com
+
+
+   Steve Deering
+   Xerox Palo Alto Research Center
+   3333 Coyote Hill Road
+   Palo Alto, CA 94304
+
+   Phone: (415) 494-4839
+   EMail: deering@xerox.com
\ No newline at end of file