doc: Add RFC documents

author: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit: 4bfd864f10b68b71482b35c818559068ef8d5797
tree: e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc1191.txt
parent: ea76e11061bda059ae9f9ad130a9895cc85607db
1 files changed, 1084 insertions, 0 deletions
diff --git a/doc/rfc/rfc1191.txt b/doc/rfc/rfc1191.txt
new file mode 100644
index 0000000..b0af14f
--- /dev/null
+++ b/doc/rfc/rfc1191.txt
@@ -0,0 +1,1084 @@
+
+
+
+
+Network Working Group                                           J. Mogul
+Request for Comments: 1191                                        DECWRL
+Obsoletes: RFC 1063                                           S. Deering
+                                                     Stanford University
+                                                           November 1990
+
+                           Path MTU Discovery
+
+
+Status of this Memo
+
+   This RFC specifies a protocol on the IAB Standards Track for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "IAB
+   Official Protocol Standards" for the standardization state and status
+   of this protocol.  Distribution of this memo is unlimited.
+
+
+                           Table of Contents
+
+       Status of this Memo                                             1
+       Abstract                                                        2
+       Acknowledgements                                                2
+       1. Introduction                                                 2
+       2. Protocol overview                                            3
+       3. Host specification                                           4
+           3.1. TCP MSS Option                                         5
+       4. Router specification                                         6
+       5. Host processing of old-style messages                        7
+       6. Host implementation                                          8
+           6.1. Layering                                               9
+           6.2. Storing PMTU information                              10
+           6.3. Purging stale PMTU information                        11
+           6.4. TCP layer actions                                     13
+           6.5. Issues for other transport protocols                  14
+           6.6. Management interface                                  15
+       7. Likely values for Path MTUs                                 15
+           7.1. A better way to detect PMTU increases                 16
+       8. Security considerations                                     18
+       References                                                     18
+       Authors' Addresses                                             19
+
+
+                             List of Tables
+
+       Table 7-1:   Common MTUs in the Internet                       17
+
+
+
+
+
+
+Mogul & Deering                                                 [page 1]
+
+
+RFC 1191                   Path MTU Discovery              November 1990
+
+
+
+
+Abstract
+
+   This memo describes a technique for dynamically discovering the
+   maximum transmission unit (MTU) of an arbitrary internet path.  It
+   specifies a small change to the way routers generate one type of ICMP
+   message.  For a path that passes through a router that has not been
+   so changed, this technique might not discover the correct Path MTU,
+   but it will always choose a Path MTU as accurate as, and in many
+   cases more accurate than, the Path MTU that would be chosen by
+   current practice.
+
+
+Acknowledgements
+
+   This proposal is a product of the IETF MTU Discovery Working Group.
+
+   The mechanism proposed here was first suggested by Geof Cooper [2],
+   who in two short paragraphs set out all the basic ideas that took the
+   Working Group months to reinvent.
+
+
+1. Introduction
+
+   When one IP host has a large amount of data to send to another host,
+   the data is transmitted as a series of IP datagrams.  It is usually
+   preferable that these datagrams be of the largest size that does not
+   require fragmentation anywhere along the path from the source to the
+   destination.  (For the case against fragmentation, see [5].)  This
+   datagram size is referred to as the Path MTU (PMTU), and it is equal
+   to the minimum of the MTUs of each hop in the path.  A shortcoming of
+   the current Internet protocol suite is the lack of a standard
+   mechanism for a host to discover the PMTU of an arbitrary path.
+
+          Note: The Path MTU is what in [1] is called the "Effective MTU
+          for sending" (EMTU_S).  A PMTU is associated with a path,
+          which is a particular combination of IP source and destination
+          address and perhaps a Type-of-service (TOS).
+
+   The current practice [1] is to use the lesser of 576 and the
+   first-hop MTU as the PMTU for any destination that is not connected
+   to the same network or subnet as the source.  In many cases, this
+   results in the use of smaller datagrams than necessary, because many
+   paths have a PMTU greater than 576.  A host sending datagrams much
+   smaller than the Path MTU allows is wasting Internet resources and
+   probably getting suboptimal throughput.  Furthermore, current
+   practice does not prevent fragmentation in all cases, since there are
+   some paths whose PMTU is less than 576.
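Editorial sketch (not part of the RFC text): the definition of the Path MTU and the "current practice" rule described above can be compared side by side. The function names, the `same_network` flag, and the sample MTU values in the usage note are illustrative assumptions.

```python
def path_mtu(hop_mtus):
    """The PMTU is the minimum of the MTUs of each hop in the path."""
    return min(hop_mtus)


def current_practice_pmtu(first_hop_mtu, same_network):
    """Pre-discovery rule: for a destination that is not on the same
    network or subnet, use the lesser of 576 and the first-hop MTU.
    Using the first-hop MTU for local destinations is an assumption here."""
    if same_network:
        return first_hop_mtu
    return min(576, first_hop_mtu)
```

For a hypothetical path with hop MTUs of 1500, 4352 and 1006, `path_mtu` gives 1006 while the current practice would settle for 576, which illustrates the wasted capacity the memo refers to.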
+
+
+Mogul & Deering                                                 [page 2]
+
+
+RFC 1191                   Path MTU Discovery              November 1990
+
+
+
+
+   It is expected that future routing protocols will be able to provide
+   accurate PMTU information within a routing area, although perhaps not
+   across multi-level routing hierarchies.  It is not clear how soon
+   that will be ubiquitously available, so for the next several years
+   the Internet needs a simple mechanism that discovers PMTUs without
+   wasting resources and that works before all hosts and routers are
+   modified.
+
+
+2. Protocol overview
+
+   In this memo, we describe a technique for using the Don't Fragment
+   (DF) bit in the IP header to dynamically discover the PMTU of a path.
+   The basic idea is that a source host initially assumes that the PMTU
+   of a path is the (known) MTU of its first hop, and sends all
+   datagrams on that path with the DF bit set.  If any of the datagrams
+   are too large to be forwarded without fragmentation by some router
+   along the path, that router will discard them and return ICMP
+   Destination Unreachable messages with a code meaning "fragmentation
+   needed and DF set" [7].  Upon receipt of such a message (henceforth
+   called a "Datagram Too Big" message), the source host reduces its
+   assumed PMTU for the path.
+
+   The PMTU discovery process ends when the host's estimate of the PMTU
+   is low enough that its datagrams can be delivered without
+   fragmentation.  Or, the host may elect to end the discovery process
+   by ceasing to set the DF bit in the datagram headers; it may do so,
+   for example, because it is willing to have datagrams fragmented in
+   some circumstances.  Normally, the host continues to set DF in all
+   datagrams, so that if the route changes and the new PMTU is lower, it
+   will be discovered.
+
+   Unfortunately, the Datagram Too Big message, as currently specified,
+   does not report the MTU of the hop for which the rejected datagram
+   was too big, so the source host cannot tell exactly how much to
+   reduce its assumed PMTU.  To remedy this, we propose that a currently
+   unused header field in the Datagram Too Big message be used to report
+   the MTU of the constricting hop.  This is the only change specified
+   for routers in support of PMTU Discovery.
+
+   The PMTU of a path may change over time, due to changes in the
+   routing topology.  Reductions of the PMTU are detected by Datagram
+   Too Big messages, except on paths for which the host has stopped
+   setting the DF bit.  To detect increases in a path's PMTU, a host
+   periodically increases its assumed PMTU (and if it had stopped,
+   resumes setting the DF bit).  This will almost always result in
+   datagrams being discarded and Datagram Too Big messages being
+
+
+Mogul & Deering                                                 [page 3]
+
+
+RFC 1191                   Path MTU Discovery              November 1990
+
+
+
+
+   generated, because in most cases the PMTU of the path will not have
+   changed, so it should be done infrequently.
+
+   Since this mechanism essentially guarantees that a host will not
+   receive any fragments from a peer doing PMTU Discovery, it may aid in
+   interoperating with certain hosts that (improperly) are unable to
+   reassemble fragmented datagrams.
+
+
+3. Host specification
+
+   When a host receives a Datagram Too Big message, it MUST reduce its
+   estimate of the PMTU for the relevant path, based on the value of the
+   Next-Hop MTU field in the message (see section 4).  We do not specify
+   the precise behavior of a host in this circumstance, since different
+   applications may have different requirements, and since different
+   implementation architectures may favor different strategies.
+
+   We do require that after receiving a Datagram Too Big message, a host
+   MUST attempt to avoid eliciting more such messages in the near
+   future.  The host may either reduce the size of the datagrams it is
+   sending along the path, or cease setting the Don't Fragment bit in
+   the headers of those datagrams.  Clearly, the former strategy may
+   continue to elicit Datagram Too Big messages for a while, but since
+   each of these messages (and the dropped datagrams they respond to)
+   consume Internet resources, the host MUST force the PMTU Discovery
+   process to converge.
+
+   Hosts using PMTU Discovery MUST detect decreases in Path MTU as fast
+   as possible.  Hosts MAY detect increases in Path MTU, but because
+   doing so requires sending datagrams larger than the current estimated
+   PMTU, and because the likelihood is that the PMTU will not have
+   increased, this MUST be done at infrequent intervals.  An attempt to
+   detect an increase (by sending a datagram larger than the current
+   estimate) MUST NOT be done less than 5 minutes after a Datagram Too
+   Big message has been received for the given destination, or less than
+   1 minute after a previous, successful attempted increase.  We
+   recommend setting these timers at twice their minimum values (10
+   minutes and 2 minutes, respectively).
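Editorial sketch (not part of the RFC text): one way to express the timing rule for probing PMTU increases. The function name, the use of seconds, and the convention of passing `None` for "never happened" are assumptions made for illustration.

```python
MIN_AFTER_TOO_BIG = 5 * 60    # seconds: minimum wait after a Datagram Too Big
MIN_AFTER_INCREASE = 1 * 60   # seconds: minimum wait after a successful increase
RECOMMENDED_AFTER_TOO_BIG = 2 * MIN_AFTER_TOO_BIG      # 10 minutes
RECOMMENDED_AFTER_INCREASE = 2 * MIN_AFTER_INCREASE    # 2 minutes


def may_probe_increase(now, last_too_big, last_increase,
                       after_too_big=RECOMMENDED_AFTER_TOO_BIG,
                       after_increase=RECOMMENDED_AFTER_INCREASE):
    """Return True if a datagram larger than the current estimate may be sent.

    Timestamps are in seconds; None means the event has not occurred.
    Configured intervals are never allowed below the stated minimums.
    """
    after_too_big = max(after_too_big, MIN_AFTER_TOO_BIG)
    after_increase = max(after_increase, MIN_AFTER_INCREASE)
    if last_too_big is not None and now - last_too_big < after_too_big:
        return False
    if last_increase is not None and now - last_increase < after_increase:
        return False
    return True
```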
+
+   Hosts MUST be able to deal with Datagram Too Big messages that do not
+   include the next-hop MTU, since it is not feasible to upgrade all the
+   routers in the Internet in any finite time.  A Datagram Too Big
+   message from an unmodified router can be recognized by the presence
+   of a zero in the (newly-defined) Next-Hop MTU field.  (This is
+   required by the ICMP specification [7], which says that "unused"
+   fields must be zero.)  In section 5, we discuss possible strategies
+
+
+Mogul & Deering                                                 [page 4]
+
+
+RFC 1191                   Path MTU Discovery              November 1990
+
+
+
+
+   for a host to follow in response to an old-style Datagram Too Big
+   message (one sent by an unmodified router).
+
+   A host MUST never reduce its estimate of the Path MTU below 68
+   octets.
+
+   A host MUST NOT increase its estimate of the Path MTU in response to
+   the contents of a Datagram Too Big message.  A message purporting to
+   announce an increase in the Path MTU might be a stale datagram that
+   has been floating around in the Internet, a false packet injected as
+   part of a denial-of-service attack, or the result of having multiple
+   paths to the destination.
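Editorial sketch (not part of the RFC text): the two constraints above (never go below 68 octets, never raise the estimate because of a Datagram Too Big message) might be combined as follows. The function name is hypothetical, and the sketch only covers messages that carry a non-zero Next-Hop MTU.

```python
MIN_PMTU = 68  # octets; the estimate is never reduced below this


def reduce_pmtu(current_pmtu, reported_mtu):
    """Return the new PMTU estimate after a Datagram Too Big message
    that reports a non-zero Next-Hop MTU."""
    if reported_mtu >= current_pmtu:
        # A message claiming a larger MTU is ignored: it may be stale,
        # forged, or refer to a different path.
        return current_pmtu
    return max(reported_mtu, MIN_PMTU)
```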
+
+
+3.1. TCP MSS Option
+
+   A host doing PMTU Discovery must obey the rule that it not send IP
+   datagrams larger than 576 octets unless it has permission from the
+   receiver.  For TCP connections, this means that a host must not send
+   datagrams larger than 40 octets plus the Maximum Segment Size (MSS)
+   sent by its peer.
+
+          Note: The TCP MSS is defined to be the relevant IP datagram
+          size minus 40 [9].  The default of 576 octets for the maximum
+          IP datagram size yields a default of 536 octets for the TCP
+          MSS.
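Editorial sketch (not part of the RFC text): the arithmetic relating the TCP MSS to the IP datagram size. The function names are illustrative assumptions; the constants come from the Note above.

```python
TCP_IP_HEADER_OCTETS = 40       # combined header allowance used by the MSS definition
DEFAULT_DATAGRAM_OCTETS = 576   # default maximum IP datagram size


def mss_for_datagram(datagram_size):
    """The TCP MSS is the relevant IP datagram size minus 40."""
    return datagram_size - TCP_IP_HEADER_OCTETS


def max_datagram_to_peer(peer_mss=None):
    """Largest datagram a host may send on a TCP connection:
    40 octets plus the MSS sent by its peer (default MSS if none was sent)."""
    if peer_mss is None:
        peer_mss = mss_for_datagram(DEFAULT_DATAGRAM_OCTETS)
    return peer_mss + TCP_IP_HEADER_OCTETS
```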
+
+   Section 4.2.2.6 of "Requirements for Internet Hosts -- Communication
+   Layers" [1] says:
+
+          Some TCP implementations send an MSS option only if the
+          destination host is on a non-connected network.  However, in
+          general the TCP layer may not have the appropriate information
+          to make this decision, so it is preferable to leave to the IP
+          layer the task of determining a suitable MTU for the Internet
+          path.
+
+   Actually, many TCP implementations always send an MSS option, but set
+   the value to 536 if the destination is non-local.  This behavior was
+   correct when the Internet was full of hosts that did not follow the
+   rule that datagrams larger than 576 octets should not be sent to
+   non-local destinations.  Now that most hosts do follow this rule, it
+   is unnecessary to limit the value in the TCP MSS option to 536 for
+   non-local peers.
+
+   Moreover, doing this prevents PMTU Discovery from discovering PMTUs
+   larger than 576, so hosts SHOULD no longer lower the value they send
+
+
+Mogul & Deering                                                 [page 5]
+
+
+RFC 1191                   Path MTU Discovery              November 1990
+
+
+
+
+   in the MSS option.  The MSS option should be 40 octets less than the
+   size of the largest datagram the host is able to reassemble (MMS_R,
+   as defined in [1]); in many cases, this will be the architectural
+   limit of 65495 (65535 - 40) octets.  A host MAY send an MSS value
+   derived from the MTU of its connected network (the maximum MTU over
+   its connected networks, for a multi-homed host); this should not
+   cause problems for PMTU Discovery, and may dissuade a broken peer
+   from sending enormous datagrams.
+
+          Note: At the moment, we see no reason to send an MSS greater
+          than the maximum MTU of the connected networks, and we
+          recommend that hosts do not use 65495.  It is quite possible
+          that some IP implementations have sign-bit bugs that would be
+          tickled by unnecessary use of such a large MSS.
+
+
+4. Router specification
+
+   When a router is unable to forward a datagram because it exceeds the
+   MTU of the next-hop network and its Don't Fragment bit is set, the
+   router is required to return an ICMP Destination Unreachable message
+   to the source of the datagram, with the Code indicating
+   "fragmentation needed and DF set".  To support the Path MTU Discovery
+   technique specified in this memo, the router MUST include the MTU of
+   that next-hop network in the low-order 16 bits of the ICMP header
+   field that is labelled "unused" in the ICMP specification [7].  The
+   high-order 16 bits remain unused, and MUST be set to zero.  Thus, the
+   message has the following format:
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |   Type = 3    |   Code = 4    |           Checksum            |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |           unused = 0          |         Next-Hop MTU          |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |      Internet Header + 64 bits of Original Datagram Data      |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+
+   The value carried in the Next-Hop MTU field is:
+
+          The size in octets of the largest datagram that could be
+          forwarded, along the path of the original datagram, without
+          being fragmented at this router.  The size includes the IP
+          header and IP data, and does not include any lower-level
+          headers.
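Editorial sketch (not part of the RFC text): packing and parsing the first eight octets of the message format shown above with Python's `struct` module. The function names are hypothetical, and the checksum is left at zero rather than computed, so this is only a layout illustration.

```python
import struct

ICMP_DEST_UNREACHABLE = 3   # Type
ICMP_FRAG_NEEDED = 4        # Code: fragmentation needed and DF set
HEADER_FORMAT = "!BBHHH"    # type, code, checksum, unused, Next-Hop MTU


def build_too_big_header(next_hop_mtu, checksum=0):
    """Build the 8-octet ICMP header; the high-order 16 bits of the
    formerly unused field stay zero, the low-order 16 carry the MTU."""
    return struct.pack(HEADER_FORMAT, ICMP_DEST_UNREACHABLE,
                       ICMP_FRAG_NEEDED, checksum, 0, next_hop_mtu)


def parse_too_big(icmp_bytes):
    """Return the Next-Hop MTU, or None if this is not a Datagram Too Big
    message. A result of 0 indicates an unmodified (old-style) router."""
    if len(icmp_bytes) < 8:
        return None
    icmp_type, code, _checksum, _unused, mtu = struct.unpack(
        HEADER_FORMAT, icmp_bytes[:8])
    if icmp_type != ICMP_DEST_UNREACHABLE or code != ICMP_FRAG_NEEDED:
        return None
    return mtu
```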
+
+
+Mogul & Deering                                                 [page 6]
+
+
+RFC 1191                   Path MTU Discovery              November 1990
+
+
+
+
+   This field will never contain a value less than 68, since every
+   router "must be able to forward a datagram of 68 octets without
+   fragmentation" [8].
+
+
+5. Host processing of old-style messages
+
+   In this section we outline several possible strategies for a host to
+   follow upon receiving a Datagram Too Big message from an unmodified
+   router (i.e., one where the Next-Hop MTU field is zero).  This
+   section is not part of the protocol specification.
+
+   The simplest thing for a host to do in response to such a message is
+   to assume that the PMTU is the minimum of its currently-assumed PMTU
+   and 576, and to stop setting the DF bit in datagrams sent on that
+   path.  Thus, the host falls back to the same PMTU as it would choose
+   under current practice (see section 3.3.3 of "Requirements for
+   Internet Hosts -- Communication Layers" [1]).  This strategy has the
+   advantage that it terminates quickly, and does no worse than existing
+   practice.  It fails, however, to avoid fragmentation in some cases,
+   and to make the most efficient utilization of the internetwork in
+   other cases.
+
+   More sophisticated strategies involve "searching" for an accurate
+   PMTU estimate, by continuing to send datagrams with the DF bit while
+   varying their sizes.  A good search strategy is one that obtains an
+   accurate estimate of the Path MTU without causing many packets to be
+   lost in the process.
+
+   Several possible strategies apply algorithmic functions to the
+   previous PMTU estimate to generate a new estimate.  For example, one
+   could multiply the old estimate by a constant (say, 0.75).  We do NOT
+   recommend this; it either converges far too slowly, or it
+   substantially underestimates the true PMTU.
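Editorial sketch (not part of the RFC text): a small simulation of the multiplicative strategy that the memo advises against. The function name is hypothetical, and the FDDI (4352) and Ethernet (1500) MTU values used in the usage note are assumptions not stated in this part of the text.

```python
def multiplicative_search(start_mtu, true_pmtu, factor=0.75):
    """Shrink the estimate by a constant factor until datagrams fit.

    Returns (final_estimate, dropped_datagrams). Each reduction corresponds
    to one datagram that was too big and had to be discarded.
    """
    estimate = start_mtu
    dropped = 0
    while estimate > true_pmtu:
        estimate = int(estimate * factor)
        dropped += 1
    return estimate, dropped
```

Starting at 4352 with a true PMTU of 1500, this drops four datagrams and settles on 1377, so it is both slower than a table-driven search and leaves part of the available MTU unused.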
+
+   A more sophisticated approach is to do a binary search on the packet
+   size.  This converges somewhat faster, although it still takes 4 or 5
+   steps to converge from an FDDI MTU to an Ethernet MTU.  A serious
+   disadvantage is that it requires a complex implementation in order to
+   recognize when a datagram has made it to the other end (indicating
+   that the current estimate is too low).  We also do not recommend this
+   strategy.
+
+   One strategy that appears to work quite well starts from the
+   observation that there are, in practice, relatively few MTU values in
+   use in the Internet.  Thus, rather than blindly searching through
+   arbitrarily chosen values, we can search only the ones that are
+
+
+Mogul & Deering                                                 [page 7]
+
+
+RFC 1191                   Path MTU Discovery              November 1990
+
+
+
+
+   likely to appear.  Moreover, since designers tend to choose MTUs in
+   similar ways, it is possible to collect groups of similar MTU values
+   and use the lowest value in the group as our search "plateau".  (It
+   is clearly better to underestimate an MTU by a few per cent than to
+   overestimate it by one octet.)
+
+   In section 7, we describe how we arrived at a table of representative
+   MTU plateaus for use in PMTU estimation.  With this table,
+   convergence is as good as binary search in the worst case, and is far
+   better in common cases (for example, it takes only two round-trip
+   times to go from an FDDI MTU to an Ethernet MTU).  Since the plateaus
+   lie near powers of two, if an MTU is not represented in this table,
+   the algorithm will not underestimate it by more than a factor of 2.
+
+   Any search strategy must have some "memory" of previous estimates in
+   order to choose the next one.  One approach is to use the
+   currently-cached estimate of the Path MTU, but in fact there is
+   better information available in the Datagram Too Big message itself.
+   All ICMP Destination Unreachable messages, including this one,
+   contain the IP header of the original datagram, which contains the
+   Total Length of the datagram that was too big to be forwarded without
+   fragmentation.  Since this Total Length may be less than the current
+   PMTU estimate, but is nonetheless larger than the actual PMTU, it may
+   be a good input to the method for choosing the next PMTU estimate.
+
+          Note: routers based on implementations derived from 4.2BSD
+          Unix send an incorrect value for the Total Length of the
+          original IP datagram.  The value sent by these routers is the
+          sum of the original Total Length and the original Header
+          Length (expressed in octets).  Since it is impossible for the
+          host receiving such a Datagram Too Big message to know if it
+          was sent by one of these routers, the host must be conservative
+          and assume that it is.  If the Total Length field returned is
+          not less than the current PMTU estimate, it must be reduced by
+          4 times the value of the returned Header Length field.
+
+   The strategy we recommend, then, is to use as the next PMTU estimate
+   the greatest plateau value that is less than the returned Total
+   Length field (corrected, if necessary, according to the Note above).
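Editorial sketch (not part of the RFC text): the recommended strategy for old-style messages, including the 4.2BSD correction from the Note above. The function name is hypothetical. The plateau values are an assumption here, since they are taken from the table the memo presents later in section 7 rather than from this section.

```python
# Assumed plateau table (section 7 of the memo), largest first.
PLATEAUS = (65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68)


def next_pmtu_estimate(total_length, header_length_words, current_pmtu):
    """Pick the next estimate from the returned original-datagram header.

    total_length and header_length_words are the Total Length and Header
    Length (in 32-bit words) fields echoed in the Datagram Too Big message.
    """
    if total_length >= current_pmtu:
        # Conservatively assume a 4.2BSD-derived router inflated the value.
        total_length -= 4 * header_length_words
    for plateau in PLATEAUS:
        if plateau < total_length:
            return plateau   # greatest plateau below the returned length
    return PLATEAUS[-1]      # never go below 68
```

Under these assumptions, a host starting at 4352 steps to 2002 and then to 1492, which matches the two round trips the memo cites for going from an FDDI MTU to an Ethernet MTU.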
+
+
+6. Host implementation
+
+   In this section we discuss how PMTU Discovery is implemented in host
+   software.  This is not a specification, but rather a set of
+   suggestions.
+
+   The issues include:
+
+Mogul & Deering                                                 [page 8]
+
+
+RFC 1191                   Path MTU Discovery              November 1990
+
+
+
+
+      - What layer or layers implement PMTU Discovery?
+
+      - Where is the PMTU information cached?
+
+      - How is stale PMTU information removed?
+
+      - What must transport and higher layers do?
+
+
+6.1. Layering
+
+   In the IP architecture, the choice of what size datagram to send is
+   made by a protocol at a layer above IP.  We refer to such a protocol
+   as a "packetization protocol".  Packetization protocols are usually
+   transport protocols (for example, TCP) but can also be higher-layer
+   protocols (for example, protocols built on top of UDP).
+
+   Implementing PMTU Discovery in the packetization layers simplifies
+   some of the inter-layer issues, but has several drawbacks: the
+   implementation may have to be redone for each packetization protocol,
+   it becomes hard to share PMTU information between different
+   packetization layers, and the connection-oriented state maintained by
+   some packetization layers may not easily extend to save PMTU
+   information for long periods.
+
+   We therefore believe that the IP layer should store PMTU information
+   and that the ICMP layer should process received Datagram Too Big
+   messages.  The packetization layers must still be able to respond to
+   changes in the Path MTU, by changing the size of the datagrams they
+   send, and must also be able to specify that datagrams are sent with
+   the DF bit set.  We do not want the IP layer to simply set the DF bit
+   in every packet, since it is possible that a packetization layer,
+   perhaps a UDP application outside the kernel, is unable to change its
+   datagram size.  Protocols involving intentional fragmentation, while
+   inelegant, are sometimes successful (NFS being the primary example),
+   and we do not want to break such protocols.
+
+   To support this layering, packetization layers require an extension
+   of the IP service interface defined in [1]:
+
+          A way to learn of changes in the value of MMS_S, the "maximum
+          send transport-message size", which is derived from the Path
+          MTU by subtracting the minimum IP header size.
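Editorial sketch (not part of the RFC text): the MMS_S derivation stated above, assuming the minimum IP header size of 20 octets. The function name is illustrative.

```python
MIN_IP_HEADER_OCTETS = 20  # minimum IP header size (no options)


def mms_s(path_mtu):
    """Maximum send transport-message size: Path MTU minus the
    minimum IP header size."""
    return path_mtu - MIN_IP_HEADER_OCTETS
```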
+
+
+
+
+
+
+Mogul & Deering                                                 [page 9]
+
+
+RFC 1191                   Path MTU Discovery              November 1990
+
+
+
+
+6.2. Storing PMTU information
+
+   In general, the IP layer should associate each PMTU value that it has
+   learned with a specific path.  A path is identified by a source
+   address, a destination address and an IP type-of-service.  (Some
+   implementations do not record the source address of paths; this is
+   acceptable for single-homed hosts, which have only one possible
+   source address.)
+
+          Note: Some paths may be further distinguished by different
+          security classifications.  The details of such classifications
+          are beyond the scope of this memo.
+
+   The obvious place to store this association is as a field in the
+   routing table entries.  A host will not have a route for every
+   possible destination, but it should be able to cache a per-host route
+   for every active destination.  (This requirement is already imposed
+   by the need to process ICMP Redirect messages.)
+
+   When the first packet is sent to a host for which no per-host route
+   exists, a route is chosen either from the set of per-network routes,
+   or from the set of default routes.  The PMTU fields in these route
+   entries should be initialized to be the MTU of the associated
+   first-hop data link, and must never be changed by the PMTU Discovery
+   process.  (PMTU Discovery only creates or changes entries for
+   per-host routes).  Until a Datagram Too Big message is received, the
+   PMTU associated with the initially-chosen route is presumed to be
+   accurate.
+
+   When a Datagram Too Big message is received, the ICMP layer
+   determines a new estimate for the Path MTU (either from a non-zero
+   Next-Hop MTU value in the packet, or using the method described in
+   section 5).  If a per-host route for this path does not exist, then
+   one is created (almost as if a per-host ICMP Redirect is being
+   processed; the new route uses the same first-hop router as the
+   current route).  If the PMTU estimate associated with the per-host
+   route is higher than the new estimate, then the value in the routing
+   entry is changed.
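Editorial sketch (not part of the RFC text): a minimal per-host PMTU store following the steps above. A plain dictionary stands in for routing table entries, and the class and method names are hypothetical.

```python
class PmtuCache:
    """Per-host PMTU estimates keyed by (source, destination, TOS)."""

    def __init__(self):
        self.host_routes = {}

    def lookup(self, path, first_hop_mtu):
        """Until a per-host entry exists, the first-hop MTU of the
        initially chosen route is presumed accurate."""
        return self.host_routes.get(path, first_hop_mtu)

    def on_too_big(self, path, first_hop_mtu, new_estimate):
        """Create the per-host entry if needed, then lower it if the new
        estimate is smaller. Returns True when the estimate decreased,
        meaning packetization layers using the path should be notified."""
        current = self.host_routes.setdefault(path, first_hop_mtu)
        if new_estimate < current:
            self.host_routes[path] = new_estimate
            return True
        return False
```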
+
+   The packetization layers must be notified about decreases in the
+   PMTU.  Any packetization layer instance (for example, a TCP
+   connection) that is actively using the path must be notified if the
+   PMTU estimate is decreased.
+
+          Note: even if the Datagram Too Big message contains an
+          Original Datagram Header that refers to a UDP packet, the TCP
+          layer must be notified if any of its connections use the given
+
+
+Mogul & Deering                                                [page 10]
+
+
+RFC 1191                   Path MTU Discovery              November 1990
+
+
+
+
+          path.
+
+   Also, the instance that sent the datagram that elicited the Datagram
+   Too Big message should be notified that its datagram has been
+   dropped, even if the PMTU estimate has not changed, so that it may
+   retransmit the dropped datagram.
+
+          Note: The notification mechanism can be analogous to the
+          mechanism used to provide notification of an ICMP Source
+          Quench message.  In some implementations (such as
+          4.2BSD-derived systems), the existing notification mechanism
+          is not able to identify the specific connection involved, and
+          so an additional mechanism is necessary.
+
+          Alternatively, an implementation can avoid the use of an
+          asynchronous notification mechanism for PMTU decreases by
+          postponing notification until the next attempt to send a
+          datagram larger than the PMTU estimate.  In this approach,
+          when an attempt is made to SEND a datagram with the DF bit
+          set, and the datagram is larger than the PMTU estimate, the
+          SEND function should fail and return a suitable error
+          indication.  This approach may be more suitable to a
+          connectionless packetization layer (such as one using UDP),
+          which (in some implementations) may be hard to "notify" from
+          the ICMP layer.  In this case, the normal timeout-based
+          retransmission mechanisms would be used to recover from the
+          dropped datagrams.
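Editorial sketch (not part of the RFC text): the synchronous alternative described in the Note, where SEND itself fails instead of the ICMP layer notifying the sender. The exception and function names are hypothetical, and actual transmission is omitted.

```python
class DatagramTooBig(Exception):
    """Raised when a DF datagram exceeds the current PMTU estimate."""


def send(datagram_len, pmtu_estimate, df=True):
    """Refuse to send a DF datagram larger than the PMTU estimate.

    Returns the datagram length on success; the caller is expected to
    shrink its datagrams and rely on its normal retransmission logic.
    """
    if df and datagram_len > pmtu_estimate:
        raise DatagramTooBig(
            f"{datagram_len} octets exceeds PMTU estimate {pmtu_estimate}")
    return datagram_len
```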
+
+   It is important to understand that the notification of the
+   packetization layer instances using the path about the change in the
+   PMTU is distinct from the notification of a specific instance that a
+   packet has been dropped.  The latter should be done as soon as
+   practical (i.e., asynchronously from the point of view of the
+   packetization layer instance), while the former may be delayed until
+   a packetization layer instance wants to create a packet.
+   Retransmission should be done only for those packets that are
+   known to be dropped, as indicated by a Datagram Too Big message.
+
+
+6.3. Purging stale PMTU information
+
+   Internetwork topology is dynamic; routes change over time.  The PMTU
+   discovered for a given destination may be wrong if a new route comes
+   into use.  Thus, PMTU information cached by a host can become stale.
+
+   Because a host using PMTU Discovery always sets the DF bit, if the
+   stale PMTU value is too large, this will be discovered almost
+
+
+Mogul & Deering                                                [page 11]
+
+
+RFC 1191                   Path MTU Discovery              November 1990
+
+
+
+
+   immediately once a datagram is sent to the given destination.  No
+   such mechanism exists for realizing that a stale PMTU value is too
+   small, so an implementation should "age" cached values.  When a PMTU
+   value has not been decreased for a while (on the order of 10
+   minutes), the PMTU estimate should be set to the first-hop data-link
+   MTU, and the packetization layers should be notified of the change.
+   This will cause the complete PMTU Discovery process to take place
+   again.
+
+          Note: an implementation should provide a means for changing
+          the timeout duration, including setting it to "infinity".  For
+          example, hosts attached to an FDDI network which is then
+          attached to the rest of the Internet via a slow serial line
+          are never going to discover a new non-local PMTU, so they
+          should not have to put up with dropped datagrams every 10
+          minutes.
+
+   An upper layer MUST NOT retransmit datagrams in response to an
+   increase in the PMTU estimate, since this increase never comes in
+   response to an indication of a dropped datagram.
+
+   One approach to implementing PMTU aging is to add a timestamp field
+   to the routing table entry.  This field is initialized to a
+   "reserved" value, indicating that the PMTU has never been changed.
+   Whenever the PMTU is decreased in response to a Datagram Too Big
+   message, the timestamp is set to the current time.
+
+   Once a minute, a timer-driven procedure runs through the routing
+   table, and for each entry whose timestamp is not "reserved" and is
+   older than the timeout interval:
+
+      - The PMTU estimate is set to the MTU of the associated first
+        hop.
+
+      - Packetization layers using this route are notified of the
+        increase.
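Editorial sketch (not part of the RFC text): the once-a-minute aging pass described above. The entry layout, the function name, and the use of `None` as the "reserved" timestamp are assumptions. Resetting the timestamp to "reserved" after raising the estimate is also an assumption, made so that an entry is not raised again on every pass.

```python
RESERVED = None  # timestamp value meaning "PMTU never decreased"


def age_entries(routes, now, timeout=600):
    """Raise stale PMTU estimates back to the first-hop MTU.

    routes maps a path to a dict with keys 'pmtu', 'first_hop_mtu' and
    'timestamp' (seconds). Returns the paths whose packetization layers
    should be notified of the increase.
    """
    raised = []
    for path, entry in routes.items():
        stamp = entry["timestamp"]
        if stamp is RESERVED or now - stamp <= timeout:
            continue
        entry["pmtu"] = entry["first_hop_mtu"]
        entry["timestamp"] = RESERVED
        raised.append(path)
    return raised
```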
+
+   PMTU estimates may disappear from the routing table if the per-host
+   routes are removed; this can happen in response to an ICMP Redirect
+   message, or because certain routing-table daemons delete old routes
+   after several minutes.  Also, on a multi-homed host a topology change
+   may result in the use of a different source interface.  When this
+   happens, if the packetization layer is not notified then it may
+   continue to use a cached PMTU value that is now too small.  One
+   solution is to notify the packetization layer of a possible PMTU
+   change whenever a Redirect message causes a route change, and
+   whenever a route is simply deleted from the routing table.
+
+          Note: a more sophisticated method for detecting PMTU increases
+          is described in section 7.1.
+
+
+6.4. TCP layer actions
+
+   The TCP layer must track the PMTU for the destination of a
+   connection; it should not send datagrams that would be larger than
+   this.  A simple implementation could ask the IP layer for this value
+   (using the GET_MAXSIZES interface described in [1]) each time it
+   created a new segment, but this could be inefficient.  Moreover, TCP
+   implementations that follow the "slow-start" congestion-avoidance
+   algorithm [4] typically calculate and cache several other values
+   derived from the PMTU.  It may be simpler to receive asynchronous
+   notification when the PMTU changes, so that these variables may be
+   updated.
+
+   A TCP implementation must also store the MSS value received from its
+   peer (which defaults to 536), and not send any segment larger than
+   this MSS, regardless of the PMTU.  In 4.xBSD-derived implementations,
+   this requires adding an additional field to the TCP state record.
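+
+   Taken together, the two limits mean that the largest segment a
+   sender may emit is the smaller of the peer's MSS and the PMTU less
+   the headers.  The sketch below assumes the 40-octet figure used
+   later in this section (IP and TCP headers without options); the
+   function name is illustrative only.
+

```python
DEFAULT_PEER_MSS = 536   # used when the peer sent no MSS option
IP_TCP_HEADERS = 40      # assumption: 20-octet IP + 20-octet TCP header, no options


def max_segment_size(pmtu, peer_mss=None):
    """Largest TCP segment payload allowed by both the peer's MSS and the PMTU."""
    if peer_mss is None:
        peer_mss = DEFAULT_PEER_MSS
    return min(peer_mss, pmtu - IP_TCP_HEADERS)
```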
+
+   Finally, when a Datagram Too Big message is received, it implies that
+   a datagram was dropped by the router that sent the ICMP message.  It
+   is sufficient to treat this as any other dropped segment, and wait
+   until the retransmission timer expires to cause retransmission of the
+   segment.  If the PMTU Discovery process requires several steps to
+   estimate the right PMTU, this could delay the connection by many
+   round-trip times.
+
+   Alternatively, the retransmission could be done in immediate response
+   to a notification that the Path MTU has changed, but only for the
+   specific connection specified by the Datagram Too Big message.  The
+   datagram size used in the retransmission should, of course, be no
+   larger than the new PMTU.
+
+          Note: One MUST NOT retransmit in response to every Datagram
+          Too Big message, since a burst of several oversized segments
+          will give rise to several such messages and hence several
+          retransmissions of the same data.  If the new estimated PMTU
+          is still wrong, the process repeats, and there is an
+          exponential growth in the number of superfluous segments sent!
+
+          This means that the TCP layer must be able to recognize when a
+          Datagram Too Big notification actually decreases the PMTU that
+          it has already used to send a datagram on the given
+          connection, and should ignore any other notifications.
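+
+   One way to apply this rule is to remember, per connection, the PMTU
+   value that was in use when datagrams were last sent, and to act
+   only on a notification that lowers it.  The following is a sketch
+   under that assumption; Connection and should_retransmit are
+   hypothetical names, not part of any existing interface.
+

```python
from dataclasses import dataclass


@dataclass
class Connection:
    pmtu_in_use: int   # PMTU value used for datagrams already sent


def should_retransmit(conn: Connection, reported_pmtu: int) -> bool:
    """Return True only for a notification that lowers the PMTU in use."""
    if reported_pmtu >= conn.pmtu_in_use:
        return False               # duplicate or stale notification: ignore
    conn.pmtu_in_use = reported_pmtu
    return True                    # retransmit once, using the new size
```

+   With this check, a burst of identical Datagram Too Big messages
+   triggers a single retransmission rather than one per message.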
+
+   Modern TCP implementations incorporate "congestion avoidance" and
+   "slow-start" algorithms to improve performance [4].  Unlike a
+   retransmission caused by a TCP retransmission timeout, a
+   retransmission caused by a Datagram Too Big message should not change
+   the congestion window.  It should, however, trigger the slow-start
+   mechanism (i.e., only one segment should be retransmitted until
+   acknowledgements begin to arrive again).
+
+   TCP performance can be reduced if the sender's maximum window size is
+   not an exact multiple of the segment size in use (this is not the
+   congestion window size, which is always a multiple of the segment
+   size).  In many systems (such as those derived from 4.2BSD), the
+   segment size is often set to 1024 octets, and the maximum window size
+   (the "send space") is usually a multiple of 1024 octets, so the
+   proper relationship holds by default.  If PMTU Discovery is used,
+   however, the segment size may not be a submultiple of the send space,
+   and it may change during a connection; this means that the TCP layer
+   may need to change the transmission window size when PMTU Discovery
+   changes the PMTU value.  The maximum window size should be set to the
+   greatest multiple of the segment size (PMTU - 40) that is less than
+   or equal to the sender's buffer space size.
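+
+   The rounding rule above is simple integer arithmetic.  The sketch
+   below is illustrative; the function name is hypothetical, and the
+   result is zero when the buffer is smaller than one segment, a case
+   a real implementation would have to handle separately.
+

```python
def max_window(pmtu, send_space, headers=40):
    """Greatest multiple of the segment size (PMTU - 40) that fits in send_space."""
    segment = pmtu - headers
    return (send_space // segment) * segment
```

+   For example, with a PMTU of 1500 and a 4096-octet send space, the
+   segment size is 1460 and the window becomes 2920 octets.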
+
+   PMTU Discovery does not affect the value sent in the TCP MSS option,
+   because that value is used by the other end of the connection, which
+   may be using an unrelated PMTU value.
+
+
+6.5. Issues for other transport protocols
+
+   Some transport protocols (such as ISO TP4 [3]) are not allowed to
+   repacketize when doing a retransmission.  That is, once an attempt is
+   made to transmit a datagram of a certain size, its contents cannot be
+   split into smaller datagrams for retransmission.  In such a case, the
+   original datagram should be retransmitted without the DF bit set,
+   allowing it to be fragmented as necessary to reach its destination.
+   Subsequent datagrams, when transmitted for the first time, should be
+   no larger than allowed by the Path MTU, and should have the DF bit
+   set.
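+
+   For a transport that cannot repacketize, the rule above reduces to
+   a choice of size and DF bit per datagram.  The sketch below is only
+   an illustration; plan_datagram and its parameters are hypothetical.
+

```python
def plan_datagram(payload_size, pmtu, original_size=None):
    """Return (datagram_size, df_set) for a transport that cannot repacketize.

    original_size is None for a first transmission; for a retransmission it
    is the size of the datagram as originally sent.
    """
    if original_size is not None:
        # Retransmission: same size as before, DF clear so routers may fragment.
        return original_size, False
    # First transmission: stay within the Path MTU and set DF.
    return min(payload_size, pmtu), True
```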
+
+   The Sun Network File System (NFS) uses a Remote Procedure Call (RPC)
+   protocol [11] that, in many cases, sends datagrams that must be
+   fragmented even for the first-hop link.  This might improve
+   performance in certain cases, but it is known to cause reliability
+   and performance problems, especially when the client and server are
+   separated by routers.
+
+   We recommend that NFS implementations use PMTU Discovery whenever
+   routers are involved.  Most NFS implementations allow the RPC
+   datagram size to be changed at mount-time (indirectly, by changing
+   the effective file system block size), but might require some
+   modification to support changes later on.
+
+   Also, since a single NFS operation cannot be split across several UDP
+   datagrams, certain operations (primarily, those operating on file
+   names and directories) require a minimum datagram size that may be
+   larger than the PMTU.  NFS implementations should not reduce the
+   datagram size below this threshold, even if PMTU Discovery suggests a
+   lower value.  (Of course, in this case datagrams should not be sent
+   with DF set.)
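+
+   The threshold rule can be sketched as follows.  This is a
+   simplification: it treats the datagram size as directly settable,
+   whereas real NFS implementations change it indirectly, and the
+   name nfs_datagram_size is hypothetical.
+

```python
def nfs_datagram_size(pmtu, min_operation_size):
    """Return (datagram_size, df_set) given the PMTU and the operation minimum."""
    if pmtu >= min_operation_size:
        return pmtu, True              # fits: follow PMTU Discovery, DF set
    return min_operation_size, False   # too small: keep the minimum, DF clear
```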
+
+
+6.6. Management interface
+
+   We suggest that an implementation provide a way for a system utility
+   program to:
+
+      - Specify that PMTU Discovery not be done on a given route.
+
+      - Change the PMTU value associated with a given route.
+
+   The former can be accomplished by associating a flag with the routing
+   entry; when a packet is sent via a route with this flag set, the IP
+   layer leaves the DF bit clear no matter what the upper layer
+   requests.
+
+   These features might be used to work around an anomalous situation,
+   or by a routing protocol implementation that is able to obtain Path
+   MTU values.
+
+   The implementation should also provide a way to change the timeout
+   period for aging stale PMTU information.
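+
+   The per-route flag described in this section might look like the
+   sketch below.  The Route structure and the field name
+   no_pmtu_discovery are assumptions made for illustration.
+

```python
from dataclasses import dataclass


@dataclass
class Route:
    pmtu: int
    no_pmtu_discovery: bool = False   # set by a system utility program


def effective_df(route: Route, upper_layer_wants_df: bool) -> bool:
    """DF is left clear on flagged routes no matter what the upper layer asks."""
    return upper_layer_wants_df and not route.no_pmtu_discovery
```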
+
+
+7. Likely values for Path MTUs
+
+   The algorithm recommended in section 5 for "searching" the space of
+   Path MTUs is based on a table of values that severely restricts the
+   search space.  We describe here a table of MTU values that, as of
+   this writing, represents all major data-link technologies in use in
+   the Internet.
+
+   In table 7-1, data links are listed in order of decreasing MTU, and
+   grouped so that each set of similar MTUs is associated with a
+   "plateau" equal to the lowest MTU in the group.  (The table also
+   includes some entries not currently associated with a data link, and
+   gives references where available).  Where a plateau represents more
+   than one MTU, the table shows the maximum inaccuracy associated with
+   the plateau, as a percentage.
+
+   We do not expect that the values in the table, especially for higher
+   MTU levels, are going to be valid forever.  The values given here are
+   an implementation suggestion, NOT a specification or requirement.
+   Implementors should use up-to-date references to pick a set of
+   plateaus; it is important that the table not contain too many entries
+   or the process of searching for a PMTU might waste Internet
+   resources.  Implementors should also make it convenient for customers
+   without source code to update the table values in their systems (for
+   example, the table in a BSD-derived Unix kernel could be changed
+   using a new "ioctl" command).
+
+          Note: It might be a good idea to add a few table entries for
+          values equal to small powers of 2 plus 40 (for the IP and TCP
+          headers), where no similar values exist, since this seems to
+          be a reasonably non-arbitrary way of choosing arbitrary
+          values.
+
+          The table might also contain entries for values slightly less
+          than large powers of 2, in case MTUs are defined near those
+          values (it is better in this case for the table entries to be
+          low than to be high, or else the next lowest plateau may be
+          chosen instead).
+
+
+7.1. A better way to detect PMTU increases
+
+   Section 6.3 suggests detecting increases in the PMTU value by
+   periodically increasing the PMTU estimate to the first-hop MTU.
+   Since it is likely that this process will simply "rediscover" the
+   current PMTU estimate, at the cost of several dropped datagrams, it
+   should not be done often.
+
+   A better approach is to periodically increase the PMTU estimate to
+   the next-highest value in the plateau table (or the first-hop MTU, if
+   that is smaller).  If the increased estimate is wrong, at most one
+   round-trip time is wasted before the correct value is rediscovered.
+   If the increased estimate is still too low, a higher estimate will be
+   attempted somewhat later.
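+
+   A sketch of this probing step, using the plateau values of table
+   7-1, is given below.  The function name next_probe is
+   hypothetical, and the plateau list is the implementation
+   suggestion from this memo rather than a required set.
+

```python
# Plateau values from table 7-1, in increasing order.
PLATEAUS = [68, 296, 508, 1006, 1492, 2002, 4352, 8166, 17914, 32000, 65535]


def next_probe(current_pmtu, first_hop_mtu):
    """Next-highest plateau above the current estimate, capped at the first hop."""
    for plateau in PLATEAUS:
        if plateau > current_pmtu:
            return min(plateau, first_hop_mtu)
    return first_hop_mtu   # already at or above the largest plateau
```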
+
+   Plateau    MTU    Comments                      Reference
+   ------     ---    --------                      ---------
+              65535  Official maximum MTU          RFC 791
+              65535  Hyperchannel                  RFC 1044
+   65535
+   32000             Just in case
+              17914  16Mb IBM Token Ring           ref. [6]
+   17914
+              8166   IEEE 802.4                    RFC 1042
+   8166
+              4464   IEEE 802.5 (4Mb max)          RFC 1042
+              4352   FDDI (Revised)                RFC 1188
+   4352 (1%)
+              2048   Wideband Network              RFC 907
+              2002   IEEE 802.5 (4Mb recommended)  RFC 1042
+   2002 (2%)
+              1536   Exp. Ethernet Nets            RFC 895
+              1500   Ethernet Networks             RFC 894
+              1500   Point-to-Point (default)      RFC 1134
+              1492   IEEE 802.3                    RFC 1042
+   1492 (3%)
+              1006   SLIP                          RFC 1055
+              1006   ARPANET                       BBN 1822
+   1006
+              576    X.25 Networks                 RFC 877
+              544    DEC IP Portal                 ref. [10]
+              512    NETBIOS                       RFC 1088
+              508    IEEE 802/Source-Rt Bridge     RFC 1042
+              508    ARCNET                        RFC 1051
+   508 (13%)
+              296    Point-to-Point (low delay)    RFC 1144
+   296
+   68                Official minimum MTU          RFC 791
+
+                Table 7-1:  Common MTUs in the Internet
+
+   Because it may take several such periods to discover a significant
+   increase in the PMTU, we recommend that a short timeout period be
+   used after the estimate is increased, and a longer timeout be used
+   after the PMTU estimate is decreased because of a Datagram Too Big
+   message.  For example, after the PMTU estimate is decreased, the
+   timeout should be set to 10 minutes; once this timer expires and a
+   larger MTU is attempted, the timeout can be set to a much smaller
+   value (say, 2 minutes).  In no case should the timeout be shorter
+   than the estimated round-trip time, if this is known.
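+
+   The two-timeout scheme can be sketched as follows, using the
+   example values of 10 minutes and 2 minutes.  The function name and
+   the string arguments are assumptions made for this illustration.
+

```python
AFTER_DECREASE = 10 * 60   # seconds; used after a Datagram Too Big decrease
AFTER_INCREASE = 2 * 60    # seconds; used after a larger MTU is attempted


def next_timeout(last_change, rtt=None):
    """Pick the aging timeout; never shorter than the round-trip time, if known."""
    base = AFTER_DECREASE if last_change == "decrease" else AFTER_INCREASE
    if rtt is not None:
        base = max(base, rtt)
    return base
```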
+
+
+8. Security considerations
+
+   This Path MTU Discovery mechanism makes possible two denial-of-
+   service attacks, both based on a malicious party sending false
+   Datagram Too Big messages to an Internet host.
+
+   In the first attack, the false message indicates a PMTU much smaller
+   than reality.  This should not entirely stop data flow, since the
+   victim host should never set its PMTU estimate below the absolute
+   minimum, but at 8 octets of IP data per datagram, progress could be
+   slow.
+
+   In the other attack, the false message indicates a PMTU greater than
+   reality.  If believed, this could cause temporary blockage as the
+   victim sends datagrams that will be dropped by some router.  Within
+   one round-trip time, the host would discover its mistake (receiving
+   Datagram Too Big messages from that router), but frequent repetition
+   of this attack could cause lots of datagrams to be dropped.  A host,
+   however, should never raise its estimate of the PMTU based on a
+   Datagram Too Big message, so should not be vulnerable to this attack.
+
+   A malicious party could also cause problems if it could stop a victim
+   from receiving legitimate Datagram Too Big messages, but in this case
+   there are simpler denial-of-service attacks available.
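+
+   The two defensive rules mentioned above (never raise the estimate
+   because of a Datagram Too Big message, and never go below the
+   absolute minimum of 68 octets) can be sketched as a single check.
+   The function name is hypothetical.
+

```python
MIN_PMTU = 68   # official minimum MTU (RFC 791)


def apply_datagram_too_big(current_pmtu, reported_mtu):
    """Return the new PMTU estimate after a Datagram Too Big message."""
    if reported_mtu >= current_pmtu:
        return current_pmtu             # never raise the estimate
    return max(reported_mtu, MIN_PMTU)  # never go below the minimum
```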
+
+
+References
+
+[1]   R. Braden, ed.  Requirements for Internet Hosts -- Communication
+      Layers.  RFC 1122, SRI Network Information Center, October, 1989.
+
+[2]   Geof Cooper.  IP Datagram Sizes.  Electronic distribution of the
+      TCP-IP Discussion Group, Message-ID
+      <8705240517.AA01407@apolling.imagen.uucp>.
+
+[3]   ISO.  ISO Transport Protocol Specification: ISO DP 8073.  RFC 905,
+      SRI Network Information Center, April, 1984.
+
+[4]   Van Jacobson.  Congestion Avoidance and Control.  In Proc. SIGCOMM
+      '88 Symposium on Communications Architectures and Protocols, pages
+      314-329.  Stanford, CA, August, 1988.
+
+[5]   C. Kent and J. Mogul.  Fragmentation Considered Harmful.  In Proc.
+      SIGCOMM '87 Workshop on Frontiers in Computer Communications
+      Technology.  August, 1987.
+
+[6]   Drew Daniel Perkins.  Private Communication.
+
+[7]   J. Postel.  Internet Control Message Protocol.  RFC 792, SRI
+      Network Information Center, September, 1981.
+
+[8]   J. Postel.  Internet Protocol.  RFC 791, SRI Network Information
+      Center, September, 1981.
+
+[9]   J. Postel.  The TCP Maximum Segment Size and Related Topics.  RFC
+      879, SRI Network Information Center, November, 1983.
+
+[10]  Michael Reilly.  Private Communication.
+
+[11]  Sun Microsystems, Inc.  RPC: Remote Procedure Call Protocol.  RFC
+      1057, SRI Network Information Center, June, 1988.
+
+
+
+Authors' Addresses
+
+   Jeffrey Mogul
+   Digital Equipment Corporation Western Research Laboratory
+   100 Hamilton Avenue
+   Palo Alto, CA  94301
+
+   Phone: (415) 853-6643
+   EMail: mogul@decwrl.dec.com
+
+
+   Steve Deering
+   Xerox Palo Alto Research Center
+   3333 Coyote Hill Road
+   Palo Alto, CA  94304
+
+   Phone: (415) 494-4839
+   EMail: deering@xerox.com
+\ No newline at end of file