Diffstat (limited to 'doc/rfc/rfc1063.txt')
-rw-r--r-- doc/rfc/rfc1063.txt 619
1 files changed, 619 insertions, 0 deletions
diff --git a/doc/rfc/rfc1063.txt b/doc/rfc/rfc1063.txt
new file mode 100644
index 0000000..27f9ed1
--- /dev/null
+++ b/doc/rfc/rfc1063.txt
@@ -0,0 +1,619 @@
+
+
+
+
+
+
+Network Working Group J. Mogul
+Request For Comments: 1063 C. Kent
+ DEC
+ C. Partridge
+ BBN
+ K. McCloghrie
+ TWG
+ July 1988
+
+
+ IP MTU Discovery Options
+
+STATUS OF THIS MEMO
+
+ A pair of IP options that can be used to learn the minimum MTU of a
+ path through an internet is described, along with its possible uses.
+ This is a proposal for an Experimental protocol. Distribution of
+ this memo is unlimited.
+
+INTRODUCTION
+
+ Although the Internet Protocol allows gateways to fragment packets
+ that are too large to forward, fragmentation is not always desirable.
+ It can lead to poor performance or even total communication failure
+ in circumstances that are surprisingly common. (For a thorough
+ discussion of this issue, see [1]).
+
+ A datagram will be fragmented if it is larger than the Maximum
+ Transmission Unit (MTU) of some network along the path it follows.
+ In order to avoid fragmentation, a host sending an IP datagram must
+ ensure that the datagram is no larger than the Minimum MTU (MINMTU)
+ over the entire path.
+
+ It has long been recognized that the methods for discovering the
+ MINMTU of an IP internetwork path are inadequate. The methods
+ currently available fall into two categories: (1) choosing small MTUs
+ to avoid fragmentation or (2) using additional probe packets to
+ discover when fragmentation will occur. Both methods have problems.
+
+ Choosing MTUs requires a balance between network utilization (which
+ requires the use of the largest possible datagram) and fragmentation
+ avoidance (which in the absence of knowledge about the network path
+ encourages the use of small, and thus too many, datagrams). Any
+ choice for the MTU size, without information from the network, is
+ likely to either fail to properly utilize the network or fail to
+ avoid fragmentation.
+
+ Probe packets have the problem of burdening the network with
+
+
+
+Mogul, Kent, Partridge, & McCloghrie [Page 1]
+
+RFC 1063 IP MTU Discovery Options July 1988
+
+
+ unnecessary packets. And because network paths often change during
+ the lifetime of a TCP connection, probe packets will have to be sent
+ on a regular basis to detect any changes in the effective MINMTU.
+
+ Implementors sometimes mistake the TCP MSS option as a mechanism for
+ learning the network MINMTU. In fact, the MSS option is only a
+ mechanism for learning about buffering capabilities at the two TCP
+ peers. Separate provisions must be made to learn the IP MINMTU.
+
+ In this memo, we propose two new IP options that, when used in
+ conjunction, will permit two peers to determine the MINMTU of the
+ paths between them. In this scheme, one option is used to determine
+ the lowest MTU in a path; the second option is used to convey this
+ MTU back to the sender (possibly in the IP datagram containing the
+ transport acknowledgement to the datagram which contained the MTU
+ discovery option).
+
+OPTION FORMATS
+
+ Probe MTU Option (Number 11)
+
+ Format
+
+ +--------+--------+--------+--------+
+ |00001011|00000100| 2 octet value |
+ +--------+--------+--------+--------+
+
+ Definition
+
+ This option always contains the lowest MTU of all the networks
+ that have been traversed so far by the datagram.
+
+ A host that sends this option must initialize the value field to
+ be the MTU of the directly-connected network. If the host is
+ multi-homed, this should be for the first-hop network.
+
+ Each gateway that receives a datagram containing this option must
+ compare the MTU field with the MTUs of the inbound and outbound
+ links for the datagram. If either MTU is lower than the value in
+ the MTU field of the option, the option value should be set to the
+ lower MTU. (Note that gateways conforming to RFC-1009 may not
+ know either the inbound interface or the outbound interface at the
+ time that IP options are processed. Accordingly, support for this
+ option may require major gateway software changes).
+
+ Any host receiving a datagram containing this option should
+ confirm that the value of the MTU field of the option is less than
+ or equal to the MTU of the inbound link, and if necessary, reduce
+ the MTU field value, before processing the option.
+
+ If the receiving host is not able to accept datagrams as large as
+ specified by the value of the MTU field of the option, then it
+ should reduce the MTU field to the size of the largest datagram it
+ can accept.
+
+ Reply MTU Option (Number 12)
+
+ Format
+
+ +--------+--------+--------+--------+
+ |00001100|00000100| 2 octet value |
+ +--------+--------+--------+--------+
+
+ Definition
+
+ This option is used to return the value learned from a Probe MTU
+ option to the sender of the Probe MTU option.
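
The two option formats and the gateway rule above can be sketched as follows. This is a minimal illustration, not part of the memo: the function names are hypothetical, and carrying the 2-octet value in network byte order (big-endian) is an assumption, since the memo only says "2 octet value".

```python
import struct
from typing import Tuple

PROBE_MTU = 11   # option number given in the memo
REPLY_MTU = 12   # option number given in the memo
OPTION_LEN = 4   # type octet + length octet + 2-octet value


def encode_mtu_option(option_type: int, mtu: int) -> bytes:
    """Pack type, length, and a 16-bit MTU (big-endian is an assumption)."""
    if option_type not in (PROBE_MTU, REPLY_MTU):
        raise ValueError("not an MTU discovery option")
    if not 0 <= mtu <= 0xFFFF:
        raise ValueError("MTU must fit in two octets")
    return struct.pack("!BBH", option_type, OPTION_LEN, mtu)


def decode_mtu_option(data: bytes) -> Tuple[int, int]:
    """Return (option_type, mtu) from a 4-octet MTU option."""
    option_type, length, mtu = struct.unpack("!BBH", data)
    if length != OPTION_LEN or option_type not in (PROBE_MTU, REPLY_MTU):
        raise ValueError("malformed MTU option")
    return option_type, mtu


def gateway_update(option: bytes, inbound_mtu: int, outbound_mtu: int) -> bytes:
    """Lower a Probe MTU value to the smallest MTU seen so far."""
    option_type, mtu = decode_mtu_option(option)
    if option_type != PROBE_MTU:
        return option  # a Reply MTU value is carried back unchanged
    return encode_mtu_option(PROBE_MTU, min(mtu, inbound_mtu, outbound_mtu))
```

For example, a probe initialized to 1500 that crosses a gateway whose outbound link has an MTU of 576 leaves that gateway carrying the value 576.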
+
+RELATION TO TCP MSS
+
+ Note that there are two superficially similar problems in choosing
+ the size of a datagram. First, there is the restriction [2] that a
+ host not send a datagram larger than 576 octets unless it has
+ assurance that the destination is prepared to accept a larger
+ datagram. Second, the sending host should not send a datagram larger
+ than MINMTU, in order to avoid fragmentation. The datagram size
+ should normally be the minimum of these two upper bounds.
+
+ In the past, the TCP MSS option [3] has been used to avoid sending
+ packets larger than the destination can accept. Unfortunately, this
+ is not the most general mechanism; it is not available to other
+ transport layers, and it cannot determine the MINMTU (because
+ gateways do not parse TCP options).
+
+ Because the MINMTU returned by a probe cannot be larger than the
+ maximum datagram size that the destination can accept, this IP option
+ could, in theory, supplant the use of the TCP MSS option, providing
+ an economy of mechanism. (Note, however, that some researchers
+ believe that the value of the TCP MSS is distinct from the path's
+ MINMTU. The MSS is the upper limit of the data size that the peer
+ will accept, while the MINMTU represents a statement about the data
+ size supported by the path).
+
+ Note that a failure to observe the MINMTU restriction is not normally
+ fatal; fragmentation will occur, but this is supposed to work. A
+ failure to observe the TCP MSS option, however, could be fatal
+ because it might lead to datagrams that can never be accepted by the
+ destination. Therefore, unless and until the Probe MTU option is
+ universally implemented, at least by hosts, the TCP MSS option must
+ be used as well.
+
+IMPLEMENTATION APPROACHES
+
+ Who Sends the Option
+
+ There are at least two ways to implement the MTU discovery scheme.
+ One method makes the transport layer responsible for MTU
+ discovery; the other method makes the IP layer responsible for MTU
+ discovery. A host system should support one of the two schemes.
+
+ Transport Discovery
+
+ In the transport case, the transport layer can include the Probe
+ MTU option in an outbound datagram. When a datagram containing
+ the Probe MTU option is received, the option must be passed up to
+ the receiving transport layer, which should then acknowledge the
+ Probe with a Reply MTU option in the next return datagram. Note
+ that because the options are placed on unreliable datagrams, the
+ original sender will have to resend Probes (possibly once per
+ window of data) until it receives a Reply option. Also note that
+ the Reply MTU option may arrive on an IP datagram belonging to a
+ different transport protocol from the one that sent the Probe
+ (e.g., TCP generated the Probe but the Reply arrived on a UDP
+ datagram).
+
+ IP Discovery
+
+ A better scheme is to put MTU discovery into the IP layer, using
+ control mechanisms in the routing cache. Whenever an IP datagram
+ is sent, the IP layer checks in the routing cache to see if a
+ Probe or Reply MTU option needs to be inserted in the datagram.
+ Whenever a datagram containing either option is received, the
+ information in those options is placed in the routing cache.
+
+ The basic working of the protocol is somewhat complex. We trace
+ it here through one round-trip. Implementors should realize that
+ there may be cases where both options are contained in one
+ datagram. For the purposes of this exposition, the sender of the
+ probe is called the Probe-Sender and the receiver, Probe-Receiver.
+
+ When the IP layer is asked to send a Probe MTU option (see the
+ section below on when to probe), it makes some record in the
+ routing cache that indicates the next IP datagram to Probe-
+ Receiver should contain the Probe MTU option.
+
+
+ When the next IP datagram to Probe-Receiver is sent, the Probe MTU
+ option is inserted. The IP layer in Probe-Sender should continue
+ to send an occasional Probe MTU in subsequent datagrams until a
+ Reply MTU option is received. It is strongly recommended that the
+ Probe MTU not be sent in all datagrams but only at such a rate
+ that, on average, one Probe MTU will be sent per round-trip
+ interval. (Another way of saying this is that we would hope that
+ only one datagram in a transport protocol window worth of data has
+ the Probe MTU option set). This mechanism might be implemented by
+ sending every Nth packet, or, in those implementations where the
+ round-trip time estimate to the destination is cached with the
+ route, once every estimated RTT.
+
+ When a Probe MTU option is received by Probe-Receiver, the
+ receiving IP should place the value of this option in the next
+ datagram it sends back to Probe-Sender. The value is then
+ discarded. In other words, each Probe MTU option causes the Reply
+ MTU option to be placed in one return datagram.
+
+ When Probe-Sender receives the Reply MTU option, it should check
+ the value of the option against the current MINMTU estimate in the
+ routing cache. If the option value is lower, it becomes the new
+ MINMTU estimate. If the option value is higher, Probe-Sender
+ should be more conservative about changing the MINMTU estimate.
+ If a route is flapping, the MINMTU may change frequently. In such
+ situations, keeping the smallest MINMTU of various routes in use
+ is preferred. As a result, a higher MINMTU estimate should only
+ be accepted after a lower estimate has been permitted to "age" a
+ bit. In other words, if the probe value is higher than the
+ estimated MINMTU, only update the estimate if the estimate is
+ several seconds old or more. Finally, whenever the Probe-Sender
+ receives a Reply MTU option, it should stop retransmitting probes
+ to Probe-Receiver.
+
+ A few additional issues complicate this discussion.
+
+ One problem is setting the default MINMTU when no Reply MTU
+ options have been received. We recommend the use of the minimum
+ of the supported IP datagram size (576 octets) and the connected
+ network MTU for destinations not on the local connected network,
+ and the connected network MTU for hosts on the connected network.
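
The default recommended above reduces to a small selection rule. The sketch below is illustrative only: the function name is hypothetical, and describing the connected network with a CIDR prefix (via Python's ipaddress module) is a convenience assumed here, not something the memo specifies.

```python
import ipaddress

SUPPORTED_DATAGRAM_SIZE = 576  # octets; the figure the memo cites


def default_minmtu(destination: str, connected_network: str,
                   connected_mtu: int) -> int:
    """Pick the MINMTU to use before any Reply MTU option has arrived."""
    dest = ipaddress.ip_address(destination)
    net = ipaddress.ip_network(connected_network)
    if dest in net:
        # Destination is on the connected network: use that network's MTU.
        return connected_mtu
    # Otherwise take the smaller of 576 and the connected network MTU.
    return min(SUPPORTED_DATAGRAM_SIZE, connected_mtu)
```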
+
+ The MINMTU information, while kept by the Internet layer, is, in
+ fact, only of interest to the transport and higher layers.
+ Accordingly, the Internet layer must keep the transport layer
+ informed of the current value of the estimated MINMTU.
+ Furthermore, minimal transport protocols, such as UDP, must be
+ prepared to pass this information up to the transport protocol
+ user.
+
+ It is expected that there will be a transition period during which
+ some hosts support this option and some do not. As a result, a
+ host should stop sending Probe MTU options to a remote system if,
+ after sending a certain number of them, it has received neither a
+ Probe MTU option nor a Reply MTU option from that system. In
+ short, if Probe-Sender
+ has sent several probes but has gotten no indication that Probe-
+ Receiver supports MTU probing, then Probe-Sender should assume
+ that Probe-Receiver does not support probes. (Obviously, if
+ Probe-Sender later receives a probe option from Probe-Receiver, it
+ should revise its opinion.)
+
+ Implementations should not assume that routes to the same
+ destination that have a different TOS have the same estimated
+ MINMTU. We recommend that the MTU be probed separately for each
+ TOS.
+
+ Respecting the TCP MSS
+
+ One issue concerning TCP MSS is that it is usually negotiated
+ assuming an IP header that contains no options. If the transport
+ layer is sending maximum size segments, it may not leave space for
+ IP to fit the options into the datagram. Thus, insertion of the
+ Probe MTU or Reply MTU option may violate the MSS restriction.
+ Because, unlike other IP options, the MTU options can be inserted
+ without the knowledge of the transport layer, the implementor must
+ carefully consider the implications of adding options to an IP
+ datagram.
+
+ One approach is to reserve 4 bytes from the MINMTU reported to the
+ transport layer; this will allow the IP layer to insert at least
+ one MTU option in every datagram (it can compare the size of the
+ outgoing datagram with the MINMTU stored in the route cache to see
+ how much room there actually is). This is simple to implement,
+ but does waste a little bandwidth in the normal case.
+
+ Another approach is to provide a means for the IP layer to notify
+ the transport layer that space must be reserved for sending an
+ option; the transport layer would then make a forthcoming segment
+ somewhat smaller than usual.
+
+ When a Probe Can Be Sent
+
+ A system that receives a Probe MTU option should always respond
+ with a Reply MTU option, unless the probe was sent to an IP or LAN
+ broadcast address.
+
+ A Probe MTU option should be sent in any of the following
+ situations:
+
+ (1) The MINMTU for the path is not yet known;
+
+ (2) A received datagram suffers a fragmentation re-assembly
+ timeout. (This is a strong hint the path has changed;
+ send a probe to the datagram's source);
+
+ (3) An ICMP Time Exceeded/Fragmentation Reassembly Timeout is
+ received (this is the only message we will get that
+ indicates fragmentation occurred along the network path);
+
+ (4) The transport layer requests it.
+
+ Implementations may also wish to periodically probe a path, even
+ if there is no indication that fragmentation is occurring. This
+ practice is perfectly reasonable; if fragmentation and reassembly
+ are working perfectly, the sender may never get any indication that
+ the path MINMTU has changed unless a probe is sent. We recommend,
+ however, that implementations send such periodic probes sparingly.
+ Once every few minutes, or once every few hundred datagrams is
+ probably sufficient.
+
+ There are also some scenarios in which the Probe MTU should not be
+ sent, even though there may be some indication of a MINMTU
+ change:
+
+ (1) Probes should not be sent in response to the receipt of
+ a probe option. Although the fact that the remote peer
+ is probing indicates that the MINMTU may have changed,
+ sending a probe in response to a probe causes a continuous
+ exchange of probe options.
+
+ (2) Probes must not be sent in response to fragmented
+ datagrams except when the fragmentation reassembly
+ of the datagram fails. The problem in this case is
+ that the receiver has no mechanism for informing the remote
+ peer that fragmentation has occurred, unless fragmentation
+ reassembly fails (in which case an ICMP message is sent).
+ Thus, a peer may use the wrong MTU for some time before
+ discovering a problem. If we probe on fragmented
+ datagrams, we may probe, unnecessarily, for some time
+ until the remote peer corrects its MTU.
+
+ (3) For compatibility with hosts that do not implement the
+ option, no Probe MTU Option should be sent more than
+ ten times without receiving a Reply MTU Option or a
+ Probe MTU Option from the remote peer. Peers which
+ ignore probes and do not send probes must be treated
+ as not supporting probes.
+
+ (4) Probes should not be sent to an IP or LAN broadcast
+ address.
+
+ (5) We recommend that Probe MTUs not be sent to other hosts
+ on the directly-connected network, but that this feature
+ be configurable. There are situations (for example, when
+ Proxy ARP is in use) where it may be difficult to determine
+ which systems are on the directly-connected network. In
+ this case, probing may make sense.
+
+SAMPLE IMPLEMENTATION SKETCH
+
+ We present here a somewhat more concrete description of how an IP-
+ layer implementation of MTU probing might be designed.
+
+ First, the routing cache entries are enhanced to store seven
+ additional values:
+
+ MINMTU: The current MINMTU of the path.
+
+ ProbeRetry: A timestamp indicating when the next probe
+ should be sent.
+
+ LastDecreased: A timestamp showing when the MTU was
+ last decreased.
+
+ ProbeReply: A bit indicating a Reply MTU option should be
+ sent.
+
+ ReplyMTU: The value to go in the Reply MTU option.
+
+ SupportsProbes: A bit indicating that the remote peer
+ can deal with probes (always defaults to
+ 1=true).
+
+ ConsecutiveProbes: The number of probes sent without
+ the receipt of a Probe MTU or Reply
+ MTU option.
+
+ There are also several configuration parameters; these should be
+ configurable by appropriate network management software; the values
+ we suggest are "reasonable":
+
+ Default_MINMTU: The default value for the MINMTU field of the
+ routing cache entry, to be used when the real
+ MINMTU is unknown. Recommended value: 576.
+
+ Max_ConsecutiveProbes: The maximum number of probes to send
+ before assuming that the destination does
+ not support the probe option.
+ Recommended value: 10.
+
+ ProbeRetryTime: The time (in seconds) to wait before retrying
+ an unanswered probe. Recommended value:
+ 60 seconds, or 2*RTT if the RTT is available
+ to the IP layer.
+
+ ReprobeInterval: The time to wait before sending a probe after
+ receiving a successful Reply MTU, in order to
+ detect increases in the route's MINMTU.
+ Recommended value: 5 times the ProbeRetryTime.
+
+ IncreaseInterval: The time to wait before increasing the MINMTU
+ after the value has been decreased, to prevent
+ flapping. Recommended value: same as
+ ProbeRetryTime.
+
+ When a new route is entered into the routing cache, the initial
+ values should be set as follows:
+
+ MINMTU = Default_MINMTU
+
+ ProbeRetry = Current Time
+
+ LastDecreased = Current Time - IncreaseInterval
+
+ ProbeReply = false
+
+ SupportsProbes = true
+
+ ConsecutiveProbes = 0
+
+ This initialization is done before attempting to send the first
+ packet along this route, so that the first packet will contain a
+ Probe MTU option.
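
One possible shape for the enhanced routing cache entry and its initial values is sketched below. The field names mirror the seven values listed above; the Python representation, the use of float seconds for timestamps, and the helper name are assumptions made for illustration.

```python
from dataclasses import dataclass

# Configuration parameters, using the values the memo recommends.
DEFAULT_MINMTU = 576
PROBE_RETRY_TIME = 60.0               # seconds
INCREASE_INTERVAL = PROBE_RETRY_TIME  # "same as ProbeRetryTime"


@dataclass
class RouteEntry:
    minmtu: int                   # MINMTU
    probe_retry: float            # ProbeRetry: when the next probe is due
    last_decreased: float         # LastDecreased
    probe_reply: bool = False     # ProbeReply: a Reply MTU option is owed
    reply_mtu: int = 0            # ReplyMTU: value to place in that Reply
    supports_probes: bool = True  # SupportsProbes
    consecutive_probes: int = 0   # ConsecutiveProbes


def new_route_entry(now: float) -> RouteEntry:
    """Initialize an entry so that the first packet carries a probe."""
    return RouteEntry(
        minmtu=DEFAULT_MINMTU,
        probe_retry=now,
        last_decreased=now - INCREASE_INTERVAL,
    )
```

Because ProbeRetry starts at the current time, the "less than or equal to the current time" check made on the first send succeeds immediately.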
+
+ Whenever the IP layer sends a datagram on this route it checks the
+ SupportsProbes bit to see if the remote system supports probing. If
+ the SupportsProbes bit is set, and the timestamp in ProbeRetry is
+ less than or equal to the current time, a Probe option should be sent
+ in the datagram, and the ProbeRetry field incremented by
+ ProbeRetryTime.
+
+ Whether or not the Probe MTU option is sent in a datagram, if the
+ ProbeReply bit is set, then a Reply MTU option with the value of the
+ ReplyMTU field is placed in the outbound datagram. The ProbeReply
+ bit is then cleared.
+
+ Every time a Probe option is sent, the ConsecutiveProbes value should
+ be incremented. If this value reaches Max_ConsecutiveProbes, the
+ SupportsProbes bit should be cleared.
+
+ When an IP datagram containing the Probe MTU option is received, the
+ receiving IP sets the ReplyMTU to the Probe MTU option value and sets
+ the ProbeReply bit in its outbound route to the source of the
+ datagram. The SupportsProbes bit is set, and the ConsecutiveProbes
+ value is reset to 0.
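
The send-side checks and the handling of a received Probe MTU option might be sketched as follows. This is a simplified model under stated assumptions: options are represented as (number, value) pairs rather than packed octets, only the fields these steps touch are modeled, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

PROBE_MTU, REPLY_MTU = 11, 12
PROBE_RETRY_TIME = 60.0        # seconds, recommended value
MAX_CONSECUTIVE_PROBES = 10    # recommended value


@dataclass
class RouteEntry:
    probe_retry: float
    probe_reply: bool = False
    reply_mtu: int = 0
    supports_probes: bool = True
    consecutive_probes: int = 0


def options_for_outbound(entry: RouteEntry, now: float,
                         first_hop_mtu: int) -> List[Tuple[int, int]]:
    """Decide which MTU options the next datagram on this route carries."""
    options = []
    if entry.supports_probes and entry.probe_retry <= now:
        # The probe value starts at the MTU of the first-hop network.
        options.append((PROBE_MTU, first_hop_mtu))
        entry.probe_retry += PROBE_RETRY_TIME
        entry.consecutive_probes += 1
        if entry.consecutive_probes >= MAX_CONSECUTIVE_PROBES:
            entry.supports_probes = False  # peer appears not to support probes
    if entry.probe_reply:
        # Sent whether or not a probe was also placed in this datagram.
        options.append((REPLY_MTU, entry.reply_mtu))
        entry.probe_reply = False
    return options


def on_probe_received(entry: RouteEntry, probe_value: int) -> None:
    """Record the value so the next outbound datagram returns it."""
    entry.reply_mtu = probe_value
    entry.probe_reply = True
    entry.supports_probes = True
    entry.consecutive_probes = 0
```

Note that a single datagram can carry both options, which matches the earlier remark that implementors should expect that case.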
+
+ If an IP datagram containing the Reply MTU option is received, the IP
+ layer must locate the routing cache entry corresponding to the source
+ of the Reply MTU option; if no such entry exists, a new one (with
+ default values) should be created. The SupportsProbes bit is set, and
+ the ConsecutiveProbes value is reset to 0. The ProbeRetry field is
+ set to the current time plus ReprobeInterval.
+
+ Four cases are possible when a Reply MTU option is received:
+
+ (1) The Reply MTU option value is less than the current
+ MINMTU: the MINMTU field is set to the new value, and
+ the LastDecreased field is set to the current time.
+
+ (2) The Reply MTU option value is greater than the
+ current MINMTU and the LastDecreased field plus
+ IncreaseInterval is greater than the current time (the
+ last decrease has not yet aged): set the ProbeRetry field
+ to LastDecreased plus IncreaseInterval, but do not change
+ MINMTU.
+
+ (3) The Reply MTU option value is greater than the
+ current MINMTU and the LastDecreased field plus
+ IncreaseInterval is less than or equal to the current
+ time: set the MINMTU field to the new value.
+
+ (4) The Reply MTU option value is equal to the current
+ MINMTU: do nothing more.
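
Reply handling can be sketched as below. The sketch follows the aging rule stated in the IP Discovery discussion: a higher value is accepted only once the last decrease is at least IncreaseInterval old, and is otherwise deferred by scheduling a re-probe for that moment. The function name, the boolean return value, and float-second timestamps are assumptions made for illustration.

```python
from dataclasses import dataclass

PROBE_RETRY_TIME = 60.0                    # seconds, recommended value
REPROBE_INTERVAL = 5 * PROBE_RETRY_TIME    # recommended value
INCREASE_INTERVAL = PROBE_RETRY_TIME       # recommended value


@dataclass
class RouteEntry:
    minmtu: int
    probe_retry: float
    last_decreased: float
    supports_probes: bool = True
    consecutive_probes: int = 0


def on_reply_received(entry: RouteEntry, reply_value: int, now: float) -> bool:
    """Apply a Reply MTU value; return True if MINMTU changed."""
    entry.supports_probes = True
    entry.consecutive_probes = 0
    entry.probe_retry = now + REPROBE_INTERVAL
    if reply_value < entry.minmtu:
        # A lower value is always accepted at once.
        entry.minmtu = reply_value
        entry.last_decreased = now
        return True
    if reply_value > entry.minmtu:
        aged_at = entry.last_decreased + INCREASE_INTERVAL
        if aged_at > now:
            # Too soon after a decrease: probe again once the estimate ages.
            entry.probe_retry = aged_at
            return False
        entry.minmtu = reply_value
        return True
    return False  # equal values: nothing more to do
```

A True return is the point at which the transport layer would be notified of the new MINMTU.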
+
+ Whenever the MINMTU field is changed, the transport layer should be
+ notified, either by an upcall or by a change in a shared variable
+ (which may be accessed from the transport layer by a downcall).
+
+ If a fragmentation reassembly timeout occurs, if an ICMP Time
+ Exceeded/Fragmentation Reassembly Timeout is received, or if the IP
+ layer is asked to send a probe by a higher layer, the ProbeRetry
+ field for the appropriate routing cache entry is set to the current
+ time. This will cause a Probe option to be sent with the next
+ datagram (unless the SupportsProbes bit is turned off).
+
+MANAGEMENT PARAMETERS
+
+ We suggest that the following parameters be made available to local
+ applications and remote network management systems:
+
+ (1) The number of probe retries to be made before determining
+ that a system does not support the option. The value of 10
+ is certain to be wrong in some situations.
+
+ (2) The frequency with which probes are sent. Systems may
+ find that more or less frequent probing is more cost
+ effective.
+
+ (3) The default MINMTU used to initialize routes.
+
+ (4) Applications should have the ability to force a probe
+ on a particular route. There are cases where a probe
+ needs to be sent but the sender doesn't know it. An
+ operator must be able to cause a probe in such situations.
+ Furthermore, it may be useful for applications to "ping"
+ for the MTU.
+
+REFERENCES
+
+ [1] Kent, C. and J. Mogul, "Fragmentation Considered
+ Harmful", Proc. ACM SIGCOMM '87, Stowe, VT, August 1987.
+
+ [2] Postel, J., Ed., "Internet Protocol", RFC-791,
+ USC/Information Sciences Institute, Marina del Rey, CA,
+ September 1981.
+
+ [3] Postel, J., Ed., "Transmission Control Protocol", RFC-793,
+ USC/Information Sciences Institute, Marina del Rey, CA,
+ September 1981.
+
+ [4] Postel, J., "The TCP Maximum Segment Size and Related Topics",
+ RFC-879, USC/Information Sciences Institute, Marina del Rey,
+ CA, November 1983.
+
+ \ No newline at end of file