diff --git a/doc/rfc/rfc1191.txt b/doc/rfc/rfc1191.txt
new file mode 100644
index 0000000..b0af14f
--- /dev/null
+++ b/doc/rfc/rfc1191.txt
@@ -0,0 +1,1084 @@
+
+
+
+
+Network Working Group J. Mogul
+Request for Comments: 1191 DECWRL
+Obsoletes: RFC 1063 S. Deering
+ Stanford University
+ November 1990
+
+ Path MTU Discovery
+
+
+Status of this Memo
+
+ This RFC specifies a protocol on the IAB Standards Track for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "IAB
+ Official Protocol Standards" for the standardization state and status
+ of this protocol. Distribution of this memo is unlimited.
+
+
+ Table of Contents
+
+ Status of this Memo 1
+ Abstract 2
+ Acknowledgements 2
+ 1. Introduction 2
+ 2. Protocol overview 3
+ 3. Host specification 4
+ 3.1. TCP MSS Option 5
+ 4. Router specification 6
+ 5. Host processing of old-style messages 7
+ 6. Host implementation 8
+ 6.1. Layering 9
+ 6.2. Storing PMTU information 10
+ 6.3. Purging stale PMTU information 11
+ 6.4. TCP layer actions 13
+ 6.5. Issues for other transport protocols 14
+ 6.6. Management interface 15
+ 7. Likely values for Path MTUs 15
+ 7.1. A better way to detect PMTU increases 16
+ 8. Security considerations 18
+ References 18
+ Authors' Addresses 19
+
+
+ List of Tables
+
+ Table 7-1: Common MTUs in the Internet 17
+
+
+
+
+
+
+Mogul & Deering [page 1]
+
+
+RFC 1191 Path MTU Discovery November 1990
+
+
+
+
+Abstract
+
+ This memo describes a technique for dynamically discovering the
+ maximum transmission unit (MTU) of an arbitrary internet path. It
+ specifies a small change to the way routers generate one type of ICMP
+ message. For a path that passes through a router that has not been
+ so changed, this technique might not discover the correct Path MTU,
+ but it will always choose a Path MTU as accurate as, and in many
+ cases more accurate than, the Path MTU that would be chosen by
+ current practice.
+
+
+Acknowledgements
+
+ This proposal is a product of the IETF MTU Discovery Working Group.
+
+ The mechanism proposed here was first suggested by Geof Cooper [2],
+ who in two short paragraphs set out all the basic ideas that took the
+ Working Group months to reinvent.
+
+
+1. Introduction
+
+ When one IP host has a large amount of data to send to another host,
+ the data is transmitted as a series of IP datagrams. It is usually
+ preferable that these datagrams be of the largest size that does not
+ require fragmentation anywhere along the path from the source to the
+ destination. (For the case against fragmentation, see [5].) This
+ datagram size is referred to as the Path MTU (PMTU), and it is equal
+ to the minimum of the MTUs of each hop in the path. A shortcoming of
+ the current Internet protocol suite is the lack of a standard
+ mechanism for a host to discover the PMTU of an arbitrary path.
+
+ Note: The Path MTU is what in [1] is called the "Effective MTU
+ for sending" (EMTU_S). A PMTU is associated with a path,
+ which is a particular combination of IP source and destination
+ address and perhaps a Type-of-service (TOS).
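   For illustration only (this computation is implied above; the hop MTU
   values below are hypothetical examples), the PMTU of a path is simply
   the minimum of the per-hop MTUs:

```python
# The Path MTU (PMTU) is the minimum of the MTUs of each hop in the path.
# The hop MTU values used here are hypothetical (e.g. FDDI-, Ethernet-,
# and X.25-sized links).
def path_mtu(hop_mtus):
    """Return the PMTU for a path given each hop's MTU in octets."""
    return min(hop_mtus)

print(path_mtu([4352, 1500, 1006]))  # constrained by the 1006-octet hop
```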
+
+ The current practice [1] is to use the lesser of 576 and the
+ first-hop MTU as the PMTU for any destination that is not connected
+ to the same network or subnet as the source. In many cases, this
+ results in the use of smaller datagrams than necessary, because many
+ paths have a PMTU greater than 576. A host sending datagrams much
+ smaller than the Path MTU allows is wasting Internet resources and
+ probably getting suboptimal throughput. Furthermore, current
+ practice does not prevent fragmentation in all cases, since there are
+ some paths whose PMTU is less than 576.
+
+
+Mogul & Deering [page 2]
+
+
+RFC 1191 Path MTU Discovery November 1990
+
+
+
+
+ It is expected that future routing protocols will be able to provide
+ accurate PMTU information within a routing area, although perhaps not
+ across multi-level routing hierarchies. It is not clear how soon
+ that will be ubiquitously available, so for the next several years
+ the Internet needs a simple mechanism that discovers PMTUs without
+ wasting resources and that works before all hosts and routers are
+ modified.
+
+
+2. Protocol overview
+
+ In this memo, we describe a technique for using the Don't Fragment
+ (DF) bit in the IP header to dynamically discover the PMTU of a path.
+ The basic idea is that a source host initially assumes that the PMTU
+ of a path is the (known) MTU of its first hop, and sends all
+ datagrams on that path with the DF bit set. If any of the datagrams
+ are too large to be forwarded without fragmentation by some router
+ along the path, that router will discard them and return ICMP
+ Destination Unreachable messages with a code meaning "fragmentation
+ needed and DF set" [7]. Upon receipt of such a message (henceforth
+ called a "Datagram Too Big" message), the source host reduces its
+ assumed PMTU for the path.
+
+ The PMTU discovery process ends when the host's estimate of the PMTU
+ is low enough that its datagrams can be delivered without
+ fragmentation. Or, the host may elect to end the discovery process
+ by ceasing to set the DF bit in the datagram headers; it may do so,
+ for example, because it is willing to have datagrams fragmented in
+ some circumstances. Normally, the host continues to set DF in all
+ datagrams, so that if the route changes and the new PMTU is lower, it
+ will be discovered.
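   The discovery loop described above can be sketched as follows
   (illustrative only, not part of the protocol; send_with_df is a
   hypothetical primitive standing in for the host's IP send path):

```python
# Sketch of the sender-side PMTU Discovery loop. send_with_df is a
# hypothetical primitive: it returns None when a datagram of the given
# size is delivered without fragmentation, or the Next-Hop MTU reported
# by a "Datagram Too Big" message when the datagram is dropped.
def discover_pmtu(first_hop_mtu, send_with_df):
    pmtu = first_hop_mtu            # initial assumption: first-hop MTU
    while True:
        reported = send_with_df(pmtu)
        if reported is None:        # delivered without fragmentation
            return pmtu
        pmtu = min(pmtu, reported)  # only ever reduce the estimate

# Simulated path whose true PMTU is 1500 octets.
def fake_send(size, true_pmtu=1500):
    return true_pmtu if size > true_pmtu else None

print(discover_pmtu(4352, fake_send))  # converges to 1500
```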
+
+ Unfortunately, the Datagram Too Big message, as currently specified,
+ does not report the MTU of the hop for which the rejected datagram
+ was too big, so the source host cannot tell exactly how much to
+ reduce its assumed PMTU. To remedy this, we propose that a currently
+ unused header field in the Datagram Too Big message be used to report
+ the MTU of the constricting hop. This is the only change specified
+ for routers in support of PMTU Discovery.
+
+ The PMTU of a path may change over time, due to changes in the
+ routing topology. Reductions of the PMTU are detected by Datagram
+ Too Big messages, except on paths for which the host has stopped
+ setting the DF bit. To detect increases in a path's PMTU, a host
+ periodically increases its assumed PMTU (and if it had stopped,
+ resumes setting the DF bit). This will almost always result in
+ datagrams being discarded and Datagram Too Big messages being
+ generated, because in most cases the PMTU of the path will not have
+ changed, so it should be done infrequently.
+
+   Since this mechanism essentially guarantees that a host will not
+ receive any fragments from a peer doing PMTU Discovery, it may aid in
+ interoperating with certain hosts that (improperly) are unable to
+ reassemble fragmented datagrams.
+
+
+3. Host specification
+
+ When a host receives a Datagram Too Big message, it MUST reduce its
+ estimate of the PMTU for the relevant path, based on the value of the
+ Next-Hop MTU field in the message (see section 4). We do not specify
+ the precise behavior of a host in this circumstance, since different
+ applications may have different requirements, and since different
+ implementation architectures may favor different strategies.
+
+ We do require that after receiving a Datagram Too Big message, a host
+ MUST attempt to avoid eliciting more such messages in the near
+ future. The host may either reduce the size of the datagrams it is
+ sending along the path, or cease setting the Don't Fragment bit in
+ the headers of those datagrams. Clearly, the former strategy may
+ continue to elicit Datagram Too Big messages for a while, but since
+ each of these messages (and the dropped datagrams they respond to)
+ consume Internet resources, the host MUST force the PMTU Discovery
+ process to converge.
+
+ Hosts using PMTU Discovery MUST detect decreases in Path MTU as fast
+ as possible. Hosts MAY detect increases in Path MTU, but because
+ doing so requires sending datagrams larger than the current estimated
+ PMTU, and because the likelihood is that the PMTU will not have
+ increased, this MUST be done at infrequent intervals. An attempt to
+ detect an increase (by sending a datagram larger than the current
+ estimate) MUST NOT be done less than 5 minutes after a Datagram Too
+ Big message has been received for the given destination, or less than
+ 1 minute after a previous, successful attempted increase. We
+ recommend setting these timers at twice their minimum values (10
+ minutes and 2 minutes, respectively).
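   These rate limits can be sketched as follows (illustrative; the
   constant and function names are invented, and the values are the
   recommended 10- and 2-minute settings, twice the stated minimums):

```python
# Sketch of the rate limits on probing for a PMTU increase. Times are
# in seconds. An increase attempt is permitted only when both timers
# have expired.
INCREASE_AFTER_TOO_BIG = 10 * 60   # recommended wait after Datagram Too Big
INCREASE_AFTER_SUCCESS = 2 * 60    # recommended wait after a successful probe

def may_probe_increase(now, last_too_big, last_successful_increase):
    return (now - last_too_big >= INCREASE_AFTER_TOO_BIG and
            now - last_successful_increase >= INCREASE_AFTER_SUCCESS)

print(may_probe_increase(1000, 0, 900))   # False: success only 100 s ago
print(may_probe_increase(1300, 0, 900))   # True: both timers have expired
```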
+
+ Hosts MUST be able to deal with Datagram Too Big messages that do not
+ include the next-hop MTU, since it is not feasible to upgrade all the
+ routers in the Internet in any finite time. A Datagram Too Big
+ message from an unmodified router can be recognized by the presence
+ of a zero in the (newly-defined) Next-Hop MTU field. (This is
+ required by the ICMP specification [7], which says that "unused"
+ fields must be zero.) In section 5, we discuss possible strategies
+ for a host to follow in response to an old-style Datagram Too Big
+ message (one sent by an unmodified router).
+
+ A host MUST never reduce its estimate of the Path MTU below 68
+ octets.
+
+ A host MUST not increase its estimate of the Path MTU in response to
+ the contents of a Datagram Too Big message. A message purporting to
+ announce an increase in the Path MTU might be a stale datagram that
+ has been floating around in the Internet, a false packet injected as
+ part of a denial-of-service attack, or the result of having multiple
+ paths to the destination.
+
+
+3.1. TCP MSS Option
+
+ A host doing PMTU Discovery must obey the rule that it not send IP
+ datagrams larger than 576 octets unless it has permission from the
+ receiver. For TCP connections, this means that a host must not send
+ datagrams larger than 40 octets plus the Maximum Segment Size (MSS)
+ sent by its peer.
+
+ Note: The TCP MSS is defined to be the relevant IP datagram
+ size minus 40 [9]. The default of 576 octets for the maximum
+ IP datagram size yields a default of 536 octets for the TCP
+ MSS.
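   The arithmetic above, as a minimal sketch:

```python
# The TCP MSS is the relevant IP datagram size minus 40 octets
# (20 octets of IP header plus 20 octets of TCP header, no options).
def mss_for_datagram_size(ip_datagram_size):
    return ip_datagram_size - 40

print(mss_for_datagram_size(576))    # default maximum datagram -> MSS 536
print(mss_for_datagram_size(65535))  # architectural limit -> MSS 65495
```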
+
+ Section 4.2.2.6 of "Requirements for Internet Hosts -- Communication
+ Layers" [1] says:
+
+ Some TCP implementations send an MSS option only if the
+ destination host is on a non-connected network. However, in
+ general the TCP layer may not have the appropriate information
+ to make this decision, so it is preferable to leave to the IP
+ layer the task of determining a suitable MTU for the Internet
+ path.
+
+ Actually, many TCP implementations always send an MSS option, but set
+ the value to 536 if the destination is non-local. This behavior was
+ correct when the Internet was full of hosts that did not follow the
+ rule that datagrams larger than 576 octets should not be sent to
+ non-local destinations. Now that most hosts do follow this rule, it
+ is unnecessary to limit the value in the TCP MSS option to 536 for
+ non-local peers.
+
+ Moreover, doing this prevents PMTU Discovery from discovering PMTUs
+ larger than 576, so hosts SHOULD no longer lower the value they send
+ in the MSS option. The MSS option should be 40 octets less than the
+ size of the largest datagram the host is able to reassemble (MMS_R,
+ as defined in [1]); in many cases, this will be the architectural
+ limit of 65495 (65535 - 40) octets. A host MAY send an MSS value
+ derived from the MTU of its connected network (the maximum MTU over
+ its connected networks, for a multi-homed host); this should not
+ cause problems for PMTU Discovery, and may dissuade a broken peer
+ from sending enormous datagrams.
+
+ Note: At the moment, we see no reason to send an MSS greater
+ than the maximum MTU of the connected networks, and we
+ recommend that hosts do not use 65495. It is quite possible
+ that some IP implementations have sign-bit bugs that would be
+ tickled by unnecessary use of such a large MSS.
+
+
+4. Router specification
+
+ When a router is unable to forward a datagram because it exceeds the
+ MTU of the next-hop network and its Don't Fragment bit is set, the
+ router is required to return an ICMP Destination Unreachable message
+ to the source of the datagram, with the Code indicating
+ "fragmentation needed and DF set". To support the Path MTU Discovery
+ technique specified in this memo, the router MUST include the MTU of
+ that next-hop network in the low-order 16 bits of the ICMP header
+ field that is labelled "unused" in the ICMP specification [7]. The
+ high-order 16 bits remain unused, and MUST be set to zero. Thus, the
+ message has the following format:
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Type = 3 | Code = 4 | Checksum |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | unused = 0 | Next-Hop MTU |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Internet Header + 64 bits of Original Datagram Data |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+
+ The value carried in the Next-Hop MTU field is:
+
+ The size in octets of the largest datagram that could be
+ forwarded, along the path of the original datagram, without
+ being fragmented at this router. The size includes the IP
+ header and IP data, and does not include any lower-level
+ headers.
+
+ This field will never contain a value less than 68, since every
+ router "must be able to forward a datagram of 68 octets without
+ fragmentation" [8].
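   Building and parsing the modified message header can be sketched as
   follows (illustrative only; the function names are invented and the
   checksum computation is omitted for brevity):

```python
import struct

# Sketch of the modified "Datagram Too Big" header shown above:
# Type = 3 (Destination Unreachable), Code = 4 ("fragmentation needed
# and DF set"), checksum (zeroed here), 16 unused bits (must be zero),
# then the 16-bit Next-Hop MTU. All fields are in network byte order.
def build_too_big_header(next_hop_mtu):
    assert next_hop_mtu >= 68  # routers must forward 68-octet datagrams
    return struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)

def parse_next_hop_mtu(icmp_header):
    _type, _code, _cksum, _unused, mtu = struct.unpack("!BBHHH", icmp_header[:8])
    return mtu  # zero means the message came from an unmodified router

hdr = build_too_big_header(1500)
print(parse_next_hop_mtu(hdr))  # 1500
```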
+
+
+5. Host processing of old-style messages
+
+ In this section we outline several possible strategies for a host to
+ follow upon receiving a Datagram Too Big message from an unmodified
+ router (i.e., one where the Next-Hop MTU field is zero). This
+ section is not part of the protocol specification.
+
+ The simplest thing for a host to do in response to such a message is
+ to assume that the PMTU is the minimum of its currently-assumed PMTU
+ and 576, and to stop setting the DF bit in datagrams sent on that
+ path. Thus, the host falls back to the same PMTU as it would choose
+ under current practice (see section 3.3.3 of "Requirements for
+ Internet Hosts -- Communication Layers" [1]). This strategy has the
+ advantage that it terminates quickly, and does no worse than existing
+ practice. It fails, however, to avoid fragmentation in some cases,
+ and to make the most efficient utilization of the internetwork in
+ other cases.
+
+ More sophisticated strategies involve "searching" for an accurate
+ PMTU estimate, by continuing to send datagrams with the DF bit while
+ varying their sizes. A good search strategy is one that obtains an
+ accurate estimate of the Path MTU without causing many packets to be
+ lost in the process.
+
+ Several possible strategies apply algorithmic functions to the
+ previous PMTU estimate to generate a new estimate. For example, one
+ could multiply the old estimate by a constant (say, 0.75). We do NOT
+ recommend this; it either converges far too slowly, or it
+ substantially underestimates the true PMTU.
+
+ A more sophisticated approach is to do a binary search on the packet
+ size. This converges somewhat faster, although it still takes 4 or 5
+ steps to converge from an FDDI MTU to an Ethernet MTU. A serious
+ disadvantage is that it requires a complex implementation in order to
+ recognize when a datagram has made it to the other end (indicating
+ that the current estimate is too low). We also do not recommend this
+ strategy.
+
+ One strategy that appears to work quite well starts from the
+ observation that there are, in practice, relatively few MTU values in
+ use in the Internet. Thus, rather than blindly searching through
+ arbitrarily chosen values, we can search only the ones that are
+
+   likely to appear. Moreover, since designers tend to choose MTUs in
+ similar ways, it is possible to collect groups of similar MTU values
+ and use the lowest value in the group as our search "plateau". (It
+ is clearly better to underestimate an MTU by a few per cent than to
+ overestimate it by one octet.)
+
+ In section 7, we describe how we arrived at a table of representative
+ MTU plateaus for use in PMTU estimation. With this table,
+ convergence is as good as binary search in the worst case, and is far
+ better in common cases (for example, it takes only two round-trip
+ times to go from an FDDI MTU to an Ethernet MTU). Since the plateaus
+ lie near powers of two, if an MTU is not represented in this table,
+ the algorithm will not underestimate it by more than a factor of 2.
+
+ Any search strategy must have some "memory" of previous estimates in
+   order to choose the next one. One approach is to use the
+ currently-cached estimate of the Path MTU, but in fact there is
+ better information available in the Datagram Too Big message itself.
+ All ICMP Destination Unreachable messages, including this one,
+ contain the IP header of the original datagram, which contains the
+ Total Length of the datagram that was too big to be forwarded without
+ fragmentation. Since this Total Length may be less than the current
+ PMTU estimate, but is nonetheless larger than the actual PMTU, it may
+ be a good input to the method for choosing the next PMTU estimate.
+
+ Note: routers based on implementations derived from 4.2BSD
+ Unix send an incorrect value for the Total Length of the
+ original IP datagram. The value sent by these routers is the
+ sum of the original Total Length and the original Header
+ Length (expressed in octets). Since it is impossible for the
+   host receiving such a Datagram Too Big message to know if it was
+ sent by one of these routers, the host must be conservative
+ and assume that it is. If the Total Length field returned is
+ not less than the current PMTU estimate, it must be reduced by
+ 4 times the value of the returned Header Length field.
+
+ The strategy we recommend, then, is to use as the next PMTU estimate
+ the greatest plateau value that is less than the returned Total
+ Length field (corrected, if necessary, according to the Note above).
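   This recommended strategy can be sketched as follows (illustrative
   only; the plateau list here is a small subset of representative MTU
   values, and the full table appears in section 7):

```python
# Sketch of the recommended fallback for old-style messages: correct
# the returned Total Length for the 4.2BSD-derived router bug described
# in the Note above, then take the greatest "plateau" value below it.
# PLATEAUS lists only a few representative MTUs (hypothetical subset).
PLATEAUS = [65535, 4352, 1500, 1006, 508, 296, 68]

def corrected_total_length(total_length, header_length, current_estimate):
    # A buggy router returns Total Length plus the header length in
    # octets; be conservative and correct whenever the value is not
    # already smaller than the current PMTU estimate. The IP Header
    # Length field counts 32-bit words, hence the factor of 4.
    if total_length >= current_estimate:
        total_length -= 4 * header_length
    return total_length

def next_pmtu_estimate(total_length, header_length, current_estimate):
    length = corrected_total_length(total_length, header_length, current_estimate)
    return max(p for p in PLATEAUS if p < length)

print(next_pmtu_estimate(4352, 5, 4352))  # FDDI-sized datagram -> try 1500
```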
+
+
+6. Host implementation
+
+ In this section we discuss how PMTU Discovery is implemented in host
+ software. This is not a specification, but rather a set of
+ suggestions.
+
+ The issues include:
+
+ - What layer or layers implement PMTU Discovery?
+
+ - Where is the PMTU information cached?
+
+ - How is stale PMTU information removed?
+
+ - What must transport and higher layers do?
+
+
+6.1. Layering
+
+ In the IP architecture, the choice of what size datagram to send is
+ made by a protocol at a layer above IP. We refer to such a protocol
+ as a "packetization protocol". Packetization protocols are usually
+ transport protocols (for example, TCP) but can also be higher-layer
+ protocols (for example, protocols built on top of UDP).
+
+ Implementing PMTU Discovery in the packetization layers simplifies
+ some of the inter-layer issues, but has several drawbacks: the
+ implementation may have to be redone for each packetization protocol,
+ it becomes hard to share PMTU information between different
+ packetization layers, and the connection-oriented state maintained by
+ some packetization layers may not easily extend to save PMTU
+ information for long periods.
+
+ We therefore believe that the IP layer should store PMTU information
+ and that the ICMP layer should process received Datagram Too Big
+ messages. The packetization layers must still be able to respond to
+ changes in the Path MTU, by changing the size of the datagrams they
+ send, and must also be able to specify that datagrams are sent with
+ the DF bit set. We do not want the IP layer to simply set the DF bit
+ in every packet, since it is possible that a packetization layer,
+ perhaps a UDP application outside the kernel, is unable to change its
+ datagram size. Protocols involving intentional fragmentation, while
+ inelegant, are sometimes successful (NFS being the primary example),
+ and we do not want to break such protocols.
+
+ To support this layering, packetization layers require an extension
+ of the IP service interface defined in [1]:
+
+ A way to learn of changes in the value of MMS_S, the "maximum
+ send transport-message size", which is derived from the Path
+ MTU by subtracting the minimum IP header size.
+
+
+6.2. Storing PMTU information
+
+ In general, the IP layer should associate each PMTU value that it has
+ learned with a specific path. A path is identified by a source
+ address, a destination address and an IP type-of-service. (Some
+ implementations do not record the source address of paths; this is
+ acceptable for single-homed hosts, which have only one possible
+ source address.)
+
+ Note: Some paths may be further distinguished by different
+ security classifications. The details of such classifications
+ are beyond the scope of this memo.
+
+ The obvious place to store this association is as a field in the
+ routing table entries. A host will not have a route for every
+ possible destination, but it should be able to cache a per-host route
+ for every active destination. (This requirement is already imposed
+ by the need to process ICMP Redirect messages.)
+
+ When the first packet is sent to a host for which no per-host route
+ exists, a route is chosen either from the set of per-network routes,
+ or from the set of default routes. The PMTU fields in these route
+ entries should be initialized to be the MTU of the associated
+ first-hop data link, and must never be changed by the PMTU Discovery
+ process. (PMTU Discovery only creates or changes entries for
+ per-host routes). Until a Datagram Too Big message is received, the
+ PMTU associated with the initially-chosen route is presumed to be
+ accurate.
+
+ When a Datagram Too Big message is received, the ICMP layer
+ determines a new estimate for the Path MTU (either from a non-zero
+ Next-Hop MTU value in the packet, or using the method described in
+ section 5). If a per-host route for this path does not exist, then
+ one is created (almost as if a per-host ICMP Redirect is being
+ processed; the new route uses the same first-hop router as the
+ current route). If the PMTU estimate associated with the per-host
+ route is higher than the new estimate, then the value in the routing
+ entry is changed.
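   This cache-and-update rule can be sketched as follows (illustrative
   only; a real implementation would use routing-table entries rather
   than a dictionary, and the names here are invented):

```python
# Sketch of the per-path PMTU cache update described above. Paths are
# keyed by (source address, destination address, TOS). The estimate is
# initialized from the first-hop MTU, only ever decreases in response
# to Datagram Too Big messages, and is never reduced below 68 octets.
pmtu_cache = {}

def update_pmtu(src, dst, tos, new_estimate, first_hop_mtu):
    key = (src, dst, tos)
    current = pmtu_cache.get(key, first_hop_mtu)  # create per-host entry
    pmtu_cache[key] = max(68, min(current, new_estimate))
    return pmtu_cache[key]

print(update_pmtu("10.0.0.1", "10.1.0.9", 0, 1500, 4352))  # lowered to 1500
print(update_pmtu("10.0.0.1", "10.1.0.9", 0, 4352, 4352))  # not raised: 1500
```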
+
+ The packetization layers must be notified about decreases in the
+ PMTU. Any packetization layer instance (for example, a TCP
+ connection) that is actively using the path must be notified if the
+ PMTU estimate is decreased.
+
+ Note: even if the Datagram Too Big message contains an
+ Original Datagram Header that refers to a UDP packet, the TCP
+ layer must be notified if any of its connections use the given
+ path.
+
+ Also, the instance that sent the datagram that elicited the Datagram
+ Too Big message should be notified that its datagram has been
+ dropped, even if the PMTU estimate has not changed, so that it may
+ retransmit the dropped datagram.
+
+ Note: The notification mechanism can be analogous to the
+ mechanism used to provide notification of an ICMP Source
+ Quench message. In some implementations (such as
+ 4.2BSD-derived systems), the existing notification mechanism
+ is not able to identify the specific connection involved, and
+ so an additional mechanism is necessary.
+
+ Alternatively, an implementation can avoid the use of an
+ asynchronous notification mechanism for PMTU decreases by
+ postponing notification until the next attempt to send a
+ datagram larger than the PMTU estimate. In this approach,
+ when an attempt is made to SEND a datagram with the DF bit
+ set, and the datagram is larger than the PMTU estimate, the
+ SEND function should fail and return a suitable error
+ indication. This approach may be more suitable to a
+ connectionless packetization layer (such as one using UDP),
+ which (in some implementations) may be hard to "notify" from
+ the ICMP layer. In this case, the normal timeout-based
+ retransmission mechanisms would be used to recover from the
+ dropped datagrams.
+
+ It is important to understand that the notification of the
+ packetization layer instances using the path about the change in the
+ PMTU is distinct from the notification of a specific instance that a
+ packet has been dropped. The latter should be done as soon as
+ practical (i.e., asynchronously from the point of view of the
+ packetization layer instance), while the former may be delayed until
+ a packetization layer instance wants to create a packet.
+   Retransmission should be done only for those packets that are
+ known to be dropped, as indicated by a Datagram Too Big message.
+
+
+6.3. Purging stale PMTU information
+
+ Internetwork topology is dynamic; routes change over time. The PMTU
+ discovered for a given destination may be wrong if a new route comes
+ into use. Thus, PMTU information cached by a host can become stale.
+
+ Because a host using PMTU Discovery always sets the DF bit, if the
+ stale PMTU value is too large, this will be discovered almost
+ immediately once a datagram is sent to the given destination. No
+ such mechanism exists for realizing that a stale PMTU value is too
+ small, so an implementation should "age" cached values. When a PMTU
+ value has not been decreased for a while (on the order of 10
+ minutes), the PMTU estimate should be set to the first-hop data-link
+ MTU, and the packetization layers should be notified of the change.
+ This will cause the complete PMTU Discovery process to take place
+ again.
+
+ Note: an implementation should provide a means for changing
+ the timeout duration, including setting it to "infinity". For
+ example, hosts attached to an FDDI network which is then
+ attached to the rest of the Internet via a slow serial line
+ are never going to discover a new non-local PMTU, so they
+ should not have to put up with dropped datagrams every 10
+ minutes.
+
+ An upper layer MUST not retransmit datagrams in response to an
+ increase in the PMTU estimate, since this increase never comes in
+ response to an indication of a dropped datagram.
+
+ One approach to implementing PMTU aging is to add a timestamp field
+ to the routing table entry. This field is initialized to a
+ "reserved" value, indicating that the PMTU has never been changed.
+ Whenever the PMTU is decreased in response to a Datagram Too Big
+ message, the timestamp is set to the current time.
+
+ Once a minute, a timer-driven procedure runs through the routing
+ table, and for each entry whose timestamp is not "reserved" and is
+ older than the timeout interval:
+
+ - The PMTU estimate is set to the MTU of the associated first
+ hop.
+
+ - Packetization layers using this route are notified of the
+ increase.
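   The aging procedure can be sketched as follows (illustrative only;
   the field and constant names are invented):

```python
# Sketch of timestamp-based PMTU aging: entries whose estimate has not
# been lowered within the timeout revert to the first-hop MTU, and the
# caller notifies the packetization layers of the increase.
RESERVED = None          # "PMTU has never been changed" marker
TIMEOUT = 10 * 60        # on the order of 10 minutes, in seconds

def age_routes(routes, now):
    """routes: list of dicts with 'pmtu', 'first_hop_mtu', 'stamp'."""
    raised = []
    for r in routes:
        if r["stamp"] is not RESERVED and now - r["stamp"] > TIMEOUT:
            r["pmtu"] = r["first_hop_mtu"]   # restart discovery from scratch
            r["stamp"] = RESERVED
            raised.append(r)                 # notify packetization layers
    return raised

route = {"pmtu": 1500, "first_hop_mtu": 4352, "stamp": 0}
age_routes([route], now=700)
print(route["pmtu"])  # reverted to 4352
```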
+
+ PMTU estimates may disappear from the routing table if the per-host
+ routes are removed; this can happen in response to an ICMP Redirect
+ message, or because certain routing-table daemons delete old routes
+ after several minutes. Also, on a multi-homed host a topology change
+ may result in the use of a different source interface. When this
+ happens, if the packetization layer is not notified then it may
+ continue to use a cached PMTU value that is now too small. One
+ solution is to notify the packetization layer of a possible PMTU
+ change whenever a Redirect message causes a route change, and
+ whenever a route is simply deleted from the routing table.
+
+ Note: a more sophisticated method for detecting PMTU increases
+ is described in section 7.1.
+
+
+6.4. TCP layer actions
+
+ The TCP layer must track the PMTU for the destination of a
+ connection; it should not send datagrams that would be larger than
+ this. A simple implementation could ask the IP layer for this value
+ (using the GET_MAXSIZES interface described in [1]) each time it
+ created a new segment, but this could be inefficient. Moreover, TCP
+ implementations that follow the "slow-start" congestion-avoidance
+ algorithm [4] typically calculate and cache several other values
+ derived from the PMTU. It may be simpler to receive asynchronous
+ notification when the PMTU changes, so that these variables may be
+ updated.
+
+ A TCP implementation must also store the MSS value received from its
+ peer (which defaults to 536), and not send any segment larger than
+ this MSS, regardless of the PMTU. In 4.xBSD-derived implementations,
+ this requires adding an additional field to the TCP state record.
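   The resulting segment-size bound can be sketched as (illustrative):

```python
# A TCP segment is bounded both by the path (PMTU minus 40 octets of
# IP and TCP headers) and by the peer's advertised MSS (default 536).
def tcp_segment_limit(pmtu, peer_mss=536):
    return min(pmtu - 40, peer_mss)

print(tcp_segment_limit(1500))        # peer default MSS caps it at 536
print(tcp_segment_limit(1500, 1460))  # path allows a 1460-octet segment
```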
+
+ Finally, when a Datagram Too Big message is received, it implies that
+ a datagram was dropped by the router that sent the ICMP message. It
+ is sufficient to treat this as any other dropped segment, and wait
+ until the retransmission timer expires to cause retransmission of the
+ segment. If the PMTU Discovery process requires several steps to
+ estimate the right PMTU, this could delay the connection by many
+ round-trip times.
+
+ Alternatively, the retransmission could be done in immediate response
+ to a notification that the Path MTU has changed, but only for the
+ specific connection specified by the Datagram Too Big message. The
+ datagram size used in the retransmission should, of course, be no
+ larger than the new PMTU.
+
+ Note: One MUST not retransmit in response to every Datagram
+ Too Big message, since a burst of several oversized segments
+ will give rise to several such messages and hence several
+ retransmissions of the same data. If the new estimated PMTU
+ is still wrong, the process repeats, and there is an
+ exponential growth in the number of superfluous segments sent!
+
+ This means that the TCP layer must be able to recognize when a
+ Datagram Too Big notification actually decreases the PMTU that
+ it has already used to send a datagram on the given
+ connection, and should ignore any other notifications.
+
+   Modern TCP implementations incorporate "congestion avoidance" and
+ "slow-start" algorithms to improve performance [4]. Unlike a
+ retransmission caused by a TCP retransmission timeout, a
+ retransmission caused by a Datagram Too Big message should not change
+ the congestion window. It should, however, trigger the slow-start
+ mechanism (i.e., only one segment should be retransmitted until
+ acknowledgements begin to arrive again).
+
+ TCP performance can be reduced if the sender's maximum window size is
+ not an exact multiple of the segment size in use (this is not the
+ congestion window size, which is always a multiple of the segment
+   size). In many systems (such as those derived from 4.2BSD), the
+ segment size is often set to 1024 octets, and the maximum window size
+ (the "send space") is usually a multiple of 1024 octets, so the
+ proper relationship holds by default. If PMTU Discovery is used,
+ however, the segment size may not be a submultiple of the send space,
+ and it may change during a connection; this means that the TCP layer
+ may need to change the transmission window size when PMTU Discovery
+ changes the PMTU value. The maximum window size should be set to the
+ greatest multiple of the segment size (PMTU - 40) that is less than
+ or equal to the sender's buffer space size.
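The window-sizing rule above is simple arithmetic; the following sketch is illustrative (the helper name and sample sizes are assumptions, not from the RFC).

```python
# Illustrative sketch: recomputing the TCP maximum window size when PMTU
# Discovery changes the PMTU.  The segment size is PMTU - 40, allowing
# 20 octets each for the IP and TCP headers.

def max_window(pmtu: int, send_space: int) -> int:
    """Greatest multiple of the segment size that is no larger than the
    sender's buffer space."""
    mss = pmtu - 40
    return (send_space // mss) * mss

# With a 4096-octet send space (a common 4.2BSD-style default):
assert max_window(1064, 4096) == 4096   # 1024-octet segments divide evenly
assert max_window(1500, 4096) == 2920   # 2 * 1460; window shrinks to fit
```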
+
+ PMTU Discovery does not affect the value sent in the TCP MSS option,
+ because that value is used by the other end of the connection, which
+ may be using an unrelated PMTU value.
+
+
+6.5. Issues for other transport protocols
+
+ Some transport protocols (such as ISO TP4 [3]) are not allowed to
+ repacketize when doing a retransmission. That is, once an attempt is
+ made to transmit a datagram of a certain size, its contents cannot be
+ split into smaller datagrams for retransmission. In such a case, the
+ original datagram should be retransmitted without the DF bit set,
+ allowing it to be fragmented as necessary to reach its destination.
+ Subsequent datagrams, when transmitted for the first time, should be
+ no larger than allowed by the Path MTU, and should have the DF bit
+ set.
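That rule for non-repacketizing transports can be summarized as follows; this is an illustrative sketch with invented names, not an implementation of TP4.

```python
# Illustrative sketch: DF handling for a transport that cannot
# repacketize.  A too-big datagram is retransmitted with DF clear so
# routers may fragment it; new datagrams are sized to the PMTU with DF set.

def send_params(is_retransmission: bool, datagram_size: int, pmtu: int):
    """Return (size to send, DF bit) for one datagram."""
    if is_retransmission and datagram_size > pmtu:
        return datagram_size, False   # allow fragmentation en route
    return min(datagram_size, pmtu), True

assert send_params(True, 4352, 1500) == (4352, False)
assert send_params(False, 4352, 1500) == (1500, True)
```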
+
+ The Sun Network File System (NFS) uses a Remote Procedure Call (RPC)
+ protocol [11] that, in many cases, sends datagrams that must be
+ fragmented even for the first-hop link. This might improve
+ performance in certain cases, but it is known to cause reliability
+ and performance problems, especially when the client and server are
+ separated by routers.
+
+ We recommend that NFS implementations use PMTU Discovery whenever
+ routers are involved. Most NFS implementations allow the RPC
+ datagram size to be changed at mount-time (indirectly, by changing
+ the effective file system block size), but might require some
+ modification to support changes later on.
+
+ Also, since a single NFS operation cannot be split across several UDP
+ datagrams, certain operations (primarily, those operating on file
+ names and directories) require a minimum datagram size that may be
+ larger than the PMTU. NFS implementations should not reduce the
+ datagram size below this threshold, even if PMTU Discovery suggests a
+ lower value. (Of course, in this case datagrams should not be sent
+ with DF set.)
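The NFS sizing rule above amounts to a clamp; the sketch below is illustrative only (function name and values assumed).

```python
# Illustrative sketch: choosing an NFS/RPC datagram size.  The size never
# drops below the minimum certain operations require, and DF is set only
# when the resulting datagram actually fits within the PMTU.

def nfs_datagram_size(pmtu: int, min_required: int):
    """Return (datagram size, DF bit)."""
    size = max(pmtu, min_required)
    dont_fragment = size <= pmtu   # oversized datagrams go with DF clear
    return size, dont_fragment

assert nfs_datagram_size(576, 1024) == (1024, False)   # must exceed PMTU
assert nfs_datagram_size(1500, 1024) == (1500, True)
```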
+
+
+6.6. Management interface
+
+ We suggest that an implementation provide a way for a system utility
+ program to:
+
+ - Specify that PMTU Discovery not be done on a given route.
+
+ - Change the PMTU value associated with a given route.
+
+ The former can be accomplished by associating a flag with the routing
+ entry; when a packet is sent via a route with this flag set, the IP
+ layer leaves the DF bit clear no matter what the upper layer
+ requests.
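A per-route flag of this kind might look like the following; the class and names are hypothetical, not from the RFC.

```python
# Hypothetical sketch of the per-route flag: when set, the IP layer
# leaves DF clear no matter what the upper layer requests.

class Route:
    def __init__(self, pmtu: int, no_pmtu_discovery: bool = False):
        self.pmtu = pmtu
        self.no_pmtu_discovery = no_pmtu_discovery

def df_bit(route: Route, upper_layer_wants_df: bool) -> bool:
    return upper_layer_wants_df and not route.no_pmtu_discovery

assert df_bit(Route(1500), True) is True
assert df_bit(Route(1500, no_pmtu_discovery=True), True) is False
```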
+
+ These features might be used to work around an anomalous situation,
+ or by a routing protocol implementation that is able to obtain Path
+ MTU values.
+
+ The implementation should also provide a way to change the timeout
+ period for aging stale PMTU information.
+
+
+7. Likely values for Path MTUs
+
+ The algorithm recommended in section 5 for "searching" the space of
+ Path MTUs is based on a table of values that severely restricts the
+ search space. We describe here a table of MTU values that, as of
+ this writing, represents all major data-link technologies in use in
+ the Internet.
+
+ In table 7-1, data links are listed in order of decreasing MTU, and
+ grouped so that each set of similar MTUs is associated with a
+ "plateau" equal to the lowest MTU in the group. (The table also
+ includes some entries not currently associated with a data link, and
+ gives references where available). Where a plateau represents more
+ than one MTU, the table shows the maximum inaccuracy associated with
+ the plateau, as a percentage.
+
+ We do not expect that the values in the table, especially for higher
+ MTU levels, are going to be valid forever. The values given here are
+ an implementation suggestion, NOT a specification or requirement.
+ Implementors should use up-to-date references to pick a set of
+ plateaus; it is important that the table not contain too many entries
+ or the process of searching for a PMTU might waste Internet
+ resources. Implementors should also make it convenient for customers
+ without source code to update the table values in their systems (for
+ example, the table in a BSD-derived Unix kernel could be changed
+ using a new "ioctl" command).
+
+ Note: It might be a good idea to add a few table entries for
+ values equal to small powers of 2 plus 40 (for the IP and TCP
+ headers), where no similar values exist, since this seems to
+ be a reasonably non-arbitrary way of choosing arbitrary
+ values.
+
+ The table might also contain entries for values slightly less
+ than large powers of 2, in case MTUs are defined near those
+ values (it is better in this case for the table entries to be
+ low than to be high, or else the next lowest plateau may be
+ chosen instead).
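The "small powers of 2 plus 40" suggestion is easy to enumerate; this one-liner is illustrative.

```python
# Illustrative: candidate plateau values equal to small powers of 2 plus
# 40 octets (for the IP and TCP headers), per the note above.

candidates = [(1 << k) + 40 for k in range(5, 12)]
assert candidates == [72, 104, 168, 296, 552, 1064, 2088]
# Note that 296 already appears in Table 7-1 as the low-delay
# point-to-point plateau.
```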
+
+
+7.1. A better way to detect PMTU increases
+
+ Section 6.3 suggests detecting increases in the PMTU value by
+ periodically increasing the PMTU estimate to the first-hop MTU.
+ Since it is likely that this process will simply "rediscover" the
+ current PMTU estimate, at the cost of several dropped datagrams, it
+ should not be done often.
+
+ A better approach is to periodically increase the PMTU estimate to
+ the next-highest value in the plateau table (or the first-hop MTU, if
+ that is smaller). If the increased estimate is wrong, at most one
+ round-trip time is wasted before the correct value is rediscovered.
+ If the increased estimate is still too low, a higher estimate will be
+ attempted somewhat later.
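This plateau-by-plateau search can be sketched directly from the plateau values of Table 7-1; the function name is an assumption for illustration.

```python
# Illustrative sketch: raise the PMTU estimate to the next-highest
# plateau, capped by the first-hop MTU.  Plateau values from Table 7-1.

PLATEAUS = [68, 296, 508, 1006, 1492, 2002, 4352, 8166, 17914, 32000, 65535]

def next_pmtu_estimate(current: int, first_hop_mtu: int) -> int:
    for plateau in PLATEAUS:
        if plateau > current:
            return min(plateau, first_hop_mtu)
    return min(current, first_hop_mtu)

assert next_pmtu_estimate(1006, 1500) == 1492
assert next_pmtu_estimate(1492, 1500) == 1500  # capped by first-hop MTU
```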
+
+ Because it may take several such periods to discover a significant
+ increase in the PMTU, we recommend that a short timeout period be
+ used after the estimate is increased, and a longer timeout be used
+ after the PMTU estimate is decreased because of a Datagram Too Big
+ message. For example, after the PMTU estimate is decreased, the
+ timeout should be set to 10 minutes; once this timer expires and a
+ larger MTU is attempted, the timeout can be set to a much smaller
+ value (say, 2 minutes). In no case should the timeout be shorter
+ than the estimated round-trip time, if this is known.
+
+    Plateau    MTU    Comments                      Reference
+    ------     ---    --------                      ---------
+               65535  Official maximum MTU          RFC 791
+               65535  Hyperchannel                  RFC 1044
+    65535
+    32000             Just in case
+               17914  16Mb IBM Token Ring           ref. [6]
+    17914
+               8166   IEEE 802.4                    RFC 1042
+    8166
+               4464   IEEE 802.5 (4Mb max)          RFC 1042
+               4352   FDDI (Revised)                RFC 1188
+    4352 (1%)
+               2048   Wideband Network              RFC 907
+               2002   IEEE 802.5 (4Mb recommended)  RFC 1042
+    2002 (2%)
+               1536   Exp. Ethernet Nets            RFC 895
+               1500   Ethernet Networks             RFC 894
+               1500   Point-to-Point (default)      RFC 1134
+               1492   IEEE 802.3                    RFC 1042
+    1492 (3%)
+               1006   SLIP                          RFC 1055
+               1006   ARPANET                       BBN 1822
+    1006
+               576    X.25 Networks                 RFC 877
+               544    DEC IP Portal                 ref. [10]
+               512    NETBIOS                       RFC 1088
+               508    IEEE 802/Source-Rt Bridge     RFC 1042
+               508    ARCNET                        RFC 1051
+    508 (13%)
+               296    Point-to-Point (low delay)    RFC 1144
+    296
+    68                Official minimum MTU          RFC 791
+
+    Table 7-1: Common MTUs in the Internet
+
+
+8. Security considerations
+
+ This Path MTU Discovery mechanism makes possible two denial-of-
+ service attacks, both based on a malicious party sending false
+ Datagram Too Big messages to an Internet host.
+
+ In the first attack, the false message indicates a PMTU much smaller
+ than reality. This should not entirely stop data flow, since the
+ victim host should never set its PMTU estimate below the absolute
+ minimum, but at 8 octets of IP data per datagram, progress could be
+ slow.
+
+ In the other attack, the false message indicates a PMTU greater than
+ reality. If believed, this could cause temporary blockage as the
+ victim sends datagrams that will be dropped by some router. Within
+ one round-trip time, the host would discover its mistake (receiving
+ Datagram Too Big messages from that router), but frequent repetition
+ of this attack could cause lots of datagrams to be dropped. A host,
+ however, should never raise its estimate of the PMTU based on a
+ Datagram Too Big message, so should not be vulnerable to this attack.
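The two defenses above amount to a simple validation of each reported Next-Hop MTU; this sketch is illustrative, with assumed names.

```python
# Illustrative sketch: validating a Next-Hop MTU reported in a Datagram
# Too Big message.  A host never raises its estimate on such a report,
# and never lowers it below the official minimum MTU of 68 octets.

MIN_MTU = 68

def accept_pmtu_report(current_estimate: int, reported_mtu: int) -> int:
    if reported_mtu >= current_estimate:
        return current_estimate          # inflation attack: ignore
    return max(reported_mtu, MIN_MTU)    # deflation attack: clamp

assert accept_pmtu_report(1500, 65535) == 1500
assert accept_pmtu_report(1500, 8) == 68
```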
+
+ A malicious party could also cause problems if it could stop a victim
+ from receiving legitimate Datagram Too Big messages, but in this case
+ there are simpler denial-of-service attacks available.
+
+
+References
+
+[1] R. Braden, ed. Requirements for Internet Hosts -- Communication
+ Layers. RFC 1122, SRI Network Information Center, October, 1989.
+
+[2] Geof Cooper. IP Datagram Sizes. Electronic distribution of the
+ TCP-IP Discussion Group, Message-ID
+ <8705240517.AA01407@apolling.imagen.uucp>.
+
+[3] ISO. ISO Transport Protocol Specification: ISO DP 8073. RFC 905,
+ SRI Network Information Center, April, 1984.
+
+[4] Van Jacobson. Congestion Avoidance and Control. In Proc. SIGCOMM
+ '88 Symposium on Communications Architectures and Protocols, pages
+ 314-329. Stanford, CA, August, 1988.
+
+[5] C. Kent and J. Mogul. Fragmentation Considered Harmful. In Proc.
+ SIGCOMM '87 Workshop on Frontiers in Computer Communications
+ Technology. August, 1987.
+
+[6] Drew Daniel Perkins. Private Communication.
+
+[7] J. Postel. Internet Control Message Protocol. RFC 792, SRI
+ Network Information Center, September, 1981.
+
+[8] J. Postel. Internet Protocol. RFC 791, SRI Network Information
+ Center, September, 1981.
+
+[9] J. Postel. The TCP Maximum Segment Size and Related Topics. RFC
+ 879, SRI Network Information Center, November, 1983.
+
+[10] Michael Reilly. Private Communication.
+
+[11] Sun Microsystems, Inc. RPC: Remote Procedure Call Protocol. RFC
+ 1057, SRI Network Information Center, June, 1988.
+
+
+
+Authors' Addresses
+
+ Jeffrey Mogul
+ Digital Equipment Corporation Western Research Laboratory
+ 100 Hamilton Avenue
+ Palo Alto, CA 94301
+
+ Phone: (415) 853-6643
+ EMail: mogul@decwrl.dec.com
+
+
+ Steve Deering
+ Xerox Palo Alto Research Center
+ 3333 Coyote Hill Road
+ Palo Alto, CA 94304
+
+ Phone: (415) 494-4839
+ EMail: deering@xerox.com
+