1 files changed, 1179 insertions, 0 deletions
diff --git a/doc/rfc/rfc1185.txt b/doc/rfc/rfc1185.txt
new file mode 100644
index 0000000..4f467f5
--- /dev/null
+++ b/doc/rfc/rfc1185.txt
@@ -0,0 +1,1179 @@
+
+
+
+
+
+
+Network Working Group                                        V. Jacobson
+Request for Comments: 1185                                           LBL
+                                                               R. Braden
+                                                                     ISI
+                                                                L. Zhang
+                                                                    PARC
+                                                            October 1990
+
+
+                   TCP Extension for High-Speed Paths
+
+Status of This Memo
+
+   This memo describes an Experimental Protocol extension to TCP for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "IAB
+   Official Protocol Standards" for the standardization state and status
+   of this protocol.  Distribution of this memo is unlimited.
+
+Summary
+
+   This memo describes a small extension to TCP to support reliable
+   operation over very high-speed paths, using sender timestamps
+   carried in the TCP Echo option proposed in RFC-1072.
+
+1. INTRODUCTION
+
+   TCP uses positive acknowledgments and retransmissions to provide
+   reliable end-to-end delivery over a full-duplex virtual circuit
+   called a connection [Postel81].  A connection is defined by its two
+   end points; each end point is a "socket", i.e., a (host,port) pair.
+   To protect against data corruption, TCP uses an end-to-end checksum.
+   Duplication and reordering are handled using a fine-grained sequence
+   number space, with each octet receiving a distinct sequence number.
+
+   The TCP protocol [Postel81] was designed to operate reliably over
+   almost any transmission medium regardless of transmission rate,
+   delay, corruption, duplication, or reordering of segments.  In
+   practice, proper TCP implementations have demonstrated remarkable
+   robustness in adapting to a wide range of network characteristics.
+   For example, TCP implementations currently adapt to transfer rates in
+   the range of 100 bps to 10**7 bps and round-trip delays in the range
+   1 ms to 100 seconds.
+
+   However, the introduction of fiber optics is resulting in ever-higher
+   transmission speeds, and the fastest paths are moving out of the
+   domain for which TCP was originally engineered.  This memo and RFC-
+   1072 [Jacobson88] propose modest extensions to TCP to extend the
+
+
+
+Jacobson, Braden & Zhang                                        [Page 1]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+   domain of its application to higher speeds.
+
+   There is no one-line answer to the question: "How fast can TCP go?".
+   The issues are reliability and performance, and these depend upon the
+   round-trip delay and the maximum time that segments may be queued in
+   the Internet, as well as upon the transmission speed.  We must think
+   through these relationships very carefully if we are to successfully
+   extend TCP's domain.
+
+   TCP performance depends not upon the transfer rate itself, but rather
+   upon the product of the transfer rate and the round-trip delay.  This
+   "bandwidth*delay product" measures the amount of data that would
+   "fill the pipe"; it is the buffer space required at sender and
+   receiver to obtain maximum throughput on the TCP connection over the
+   path.  RFC-1072 proposed a set of TCP extensions to improve TCP
+   efficiency for "LFNs" (long fat networks), i.e., networks with large
+   bandwidth*delay products.
+
+   On the other hand, high transfer rate can threaten TCP reliability by
+   violating the assumptions behind the TCP mechanism for duplicate
+   detection and sequencing.  The present memo specifies a solution for
+   this problem, extending TCP reliability to transfer rates well beyond
+   the foreseeable upper limit of bandwidth.
+
+   An especially serious kind of error may result from an accidental
+   reuse of TCP sequence numbers in data segments.  Suppose that an "old
+   duplicate segment", e.g., a duplicate data segment that was delayed
+   in Internet queues, was delivered to the receiver at the wrong moment
+   so that its sequence numbers fell somewhere within the current
+   window.  There would be no checksum failure to warn of the error, and
+   the result could be an undetected corruption of the data.  Reception
+   of an old duplicate ACK segment at the transmitter could be only
+   slightly less serious: it is likely to lock up the connection so that
+   no further progress can be made and a RST is required to
+   resynchronize the two ends.
+
+   Duplication of sequence numbers might happen in either of two ways:
+
+   (1)  Sequence number wrap-around on the current connection
+
+        A TCP sequence number contains 32 bits.  At a high enough
+        transfer rate, the 32-bit sequence space may be "wrapped"
+        (cycled) within the time that a segment may be delayed in
+        queues.  Section 2 discusses this case and proposes a mechanism
+        to reject old duplicates on the current connection.
+
+   (2)  Segment from an earlier connection incarnation
+
+
+
+
+Jacobson, Braden & Zhang                                        [Page 2]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+        Suppose a connection terminates, either by a proper close
+        sequence or due to a host crash, and the same connection (i.e.,
+        using the same pair of sockets) is immediately reopened.  A
+        delayed segment from the terminated connection could fall within
+        the current window for the new incarnation and be accepted as
+        valid.  This case is discussed in Section 3.
+
+   TCP reliability depends upon the existence of a bound on the lifetime
+   of a segment: the "Maximum Segment Lifetime" or MSL.  An MSL is
+   generally required by any reliable transport protocol, since every
+   sequence number field must be finite, and therefore any sequence
+   number may eventually be reused.  In the Internet protocol suite, the
+   MSL bound is enforced by an IP-layer mechanism, the "Time-to-Live" or
+   TTL field.
+
+   Watson's Delta-T protocol [Watson81] includes network-layer
+   mechanisms for precise enforcement of an MSL.  In contrast, the IP
+   mechanism for MSL enforcement is loosely defined and even more
+   loosely implemented in the Internet.  Therefore, it is unwise to
+   depend upon active enforcement of MSL for TCP connections, and it is
+   unrealistic to imagine setting MSL's smaller than the current values
+   (e.g., 120 seconds specified for TCP).  The timestamp algorithm
+   described in the following section gives a way out of this dilemma
+   for high-speed networks.
+
+
+2.  SEQUENCE NUMBER WRAP-AROUND
+
+   2.1  Background
+
+      Avoiding reuse of sequence numbers within the same connection is
+      simple in principle: enforce a segment lifetime shorter than the
+      time it takes to cycle the sequence space, whose size is
+      effectively 2**31.
+
+      More specifically, if the maximum effective bandwidth at which TCP
+      is able to transmit over a particular path is B bytes per second,
+      then the following constraint must be satisfied for error-free
+      operation:
+
+          2**31 / B  > MSL (secs)                                    [1]
+
+      The following table shows the value for Twrap = 2**31/B in
+      seconds, for some important values of the bandwidth B:
+
+
+
+
+
+
+
+Jacobson, Braden & Zhang                                        [Page 3]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+           Network       B*8          B         Twrap
+                      bits/sec   bytes/sec      secs
+           _______    _______      ______       ______
+
+           ARPANET       56kbps       7KBps    3*10**5 (~3.6 days)
+
+           DS1          1.5Mbps     190KBps    10**4 (~3 hours)
+
+           Ethernet      10Mbps    1.25MBps    1700 (~30 mins)
+
+           DS3           45Mbps     5.6MBps    380
+
+           FDDI         100Mbps    12.5MBps    170
+
+           Gigabit        1Gbps     125MBps    17
+
+
+      It is clear why wrap-around of the sequence space was not a
+      problem for 56kbps packet switching or even 10Mbps Ethernets.  On
+      the other hand, at DS3 and FDDI speeds, Twrap is comparable to the
+      2 minute MSL assumed by the TCP specification [Postel81].  Moving
+      towards gigabit speeds, Twrap becomes too small for reliable
+      enforcement by the Internet TTL mechanism.
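+
+      As a rough check of constraint [1] against the table, assuming
+      the 120 second MSL of [Postel81] and using rounded figures:
+
+          DS3:      B = 45*10**6 / 8  = ~5.6*10**6 bytes/sec
+                    Twrap = 2**31 / B = ~380 secs  >  120 secs
+
+          Gigabit:  B = 10**9 / 8     = 1.25*10**8 bytes/sec
+                    Twrap = 2**31 / B = ~17 secs   <  120 secs
+
+      Under these assumptions DS3 satisfies [1] by only a factor of
+      about three, while a gigabit path violates it outright.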
+
+      The 16-bit window field of TCP limits the effective bandwidth B to
+      2**16/RTT, where RTT is the round-trip time in seconds
+      [McKenzie89].  If the RTT is large enough, this limits B to a
+      value that meets the constraint [1] for a large MSL value.  For
+      example, consider a transcontinental backbone with an RTT of 60ms
+      (set by the laws of physics).  With the bandwidth*delay product
+      limited to 64KB by the TCP window size, B is then limited to
+      1.1MBps, no matter how high the theoretical transfer rate of the
+      path.  This corresponds to cycling the sequence number space in
+      Twrap= 2000 secs, which is safe in today's Internet.
+
+      Based on this reasoning, an earlier RFC [McKenzie89] has cautioned
+      that expanding the TCP window space as proposed in RFC-1072 will
+      lead to sequence wrap-around and hence to possible data
+      corruption.  We believe that this is mis-identifying the culprit,
+      which is not the larger window but rather the high bandwidth.
+
+           For example, consider a (very large) FDDI LAN with a diameter
+           of 10km.  Using the speed of light, we can compute the RTT
+           across the ring as (2*10**4)/(3*10**8) = 67 microseconds, and
+           the delay*bandwidth product is then 833 bytes.  A TCP
+           connection across this LAN using a window of only 833 bytes
+           will run at the full 100Mbps and can wrap the sequence space
+           in about 3 minutes, very close to the MSL of TCP. Thus, high
+
+
+
+Jacobson, Braden & Zhang                                        [Page 4]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+           speed alone can cause a reliability problem with sequence
+           number wrap-around, even without extended windows.
+
+      An "obvious" fix for the problem of cycling the sequence space is
+      to increase the size of the TCP sequence number field.  For
+      example, the sequence number field (and also the acknowledgment
+      field) could be expanded to 64 bits.  However, the proposals for
+      making such a change while maintaining compatibility with current
+      TCP have tended towards complexity and ugliness.
+
+      This memo proposes a simple solution to the problem, using the TCP
+      echo options defined in RFC-1072.  Section 2.2 which follows
+      describes the original use of these options to carry timestamps in
+      order to measure RTT accurately.  Section 2.3 proposes a method of
+      using these same timestamps to reject old duplicate segments that
+      could corrupt an open TCP connection.  Section 3 discusses the
+      application of this mechanism to avoiding old duplicates from
+      previous incarnations.
+
+   2.2  TCP Timestamps
+
+      RFC-1072 defined two TCP options, Echo and Echo Reply.  Echo
+      carries a 32-bit number, and the receiver of the option must
+      return this same value to the source host in an Echo Reply option.
+
+      RFC-1072 furthermore describes the use of these options to contain
+      32-bit timestamps, for measuring the RTT.  A TCP sending data
+      would include Echo options containing the current clock value.
+      The receiver would echo these timestamps in returning segments
+      (generally, ACK segments).  The difference between a timestamp
+      from an Echo Reply option and the current time would then measure
+      the RTT at the sender.
+
+      This mechanism was designed to solve the following problem: almost
+      all TCP implementations base their RTT measurements on a sample of
+      only one packet per window.  If we look at RTT estimation as a
+      signal processing problem (which it is), a data signal at some
+      frequency (the packet rate) is being sampled at a lower frequency
+      (the window rate).  Unfortunately, this lower sampling frequency
+      violates Nyquist's criterion and may introduce "aliasing" artifacts
+      into the estimated RTT [Hamming77].
+
+      A good RTT estimator with a conservative retransmission timeout
+      calculation can tolerate the aliasing when the sampling frequency
+      is "close" to the data frequency.   For example, with a window of
+      8 packets, the sample rate is 1/8 the data frequency -- less than
+      an order of magnitude different.  However, when the window is tens
+      or hundreds of packets, the RTT estimator may be seriously in
+
+
+
+Jacobson, Braden & Zhang                                        [Page 5]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+      error, resulting in spurious retransmissions.
+
+      A solution to the aliasing problem that actually simplifies the
+      sender substantially (since the RTT code is typically the single
+      biggest protocol cost for TCP) is as follows: the sender will
+      place a timestamp in each segment and the receiver will reflect
+      these timestamps back in ACK segments.  Then a single subtract
+      gives the sender an accurate RTT measurement for every ACK segment
+      (which will correspond to every other data segment, with a
+      sensible receiver).  RFC-1072 defined a timestamp echo option for
+      this purpose.
+
+      It is vitally important to use the timestamp echo option with big
+      windows; otherwise, the door is opened to some dangerous
+      instabilities due to aliasing.  Furthermore, the option is
+      probably useful for all TCP's, since it simplifies the sender.
+
+   2.3  Avoiding Old Duplicate Segments
+
+      Timestamps carried from sender to receiver in TCP Echo options can
+      also be used to prevent data corruption caused by sequence number
+      wrap-around, as this section describes.
+
+      2.3.1  Basic Algorithm
+
+         Assume that every received TCP segment contains a timestamp.
+         The basic idea is that a segment received with a timestamp that
+         is earlier than the timestamp of the most recently accepted
+         segment can be discarded as an old duplicate.  More
+         specifically, the following processing is to be performed on
+         normal incoming segments:
+
+         R1)  If the timestamp in the arriving segment is less
+              than the timestamp of the most recently received in-
+              sequence segment, treat the arriving segment as not
+              acceptable:
+
+                   If SEG.LEN > 0, send an acknowledgment in reply as
+                   specified in RFC-793 page 69, and drop the segment;
+                   otherwise, just silently drop the segment.*
+
+_________________________
+*Sending an ACK segment in reply is not strictly necessary, since  the
+case  can  only  arise  when a later in-order segment has already been
+received.   However,  for  consistency  and  simplicity,  we   suggest
+treating  a  timestamp  failure  the  same  way  TCP  treats any other
+unacceptable segment.
+
+
+
+
+Jacobson, Braden & Zhang                                        [Page 6]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+         R2)  If the segment is outside the window, reject it (normal
+              TCP processing)
+
+         R3)  If an arriving segment is in-sequence (i.e., at the left
+              window edge), accept it normally and record its timestamp.
+
+         R4)  Otherwise, treat the segment as a normal in-window, out-
+              of-sequence TCP segment (e.g., queue it for later delivery
+              to the user).
+
+
+         Steps R2-R4 are the normal TCP processing steps specified by
+         RFC-793, except that in R3 the latest timestamp is set from
+         each in-sequence segment that is accepted.  Thus, the latest
+         timestamp recorded at the receiver corresponds to the left edge
+         of the window and only advances when the left edge moves
+         [Jacobson88].
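+
+         The receiver processing in R1-R4 can be sketched in C as
+         follows.  This is only an illustration under stated
+         assumptions, not part of the specification.  The names
+         rcv_state, ts_recent, ts_before and rcv_segment are
+         hypothetical (rcv_nxt and rcv_wnd are modeled on RCV.NXT and
+         RCV.WND of RFC-793).  The R2 test checks only the first octet
+         of the segment rather than the full RFC-793 acceptability
+         test, and sending the ACK called for in R1 and advancing the
+         left window edge are left to the caller.  Comparing wrapped
+         values by the sign of a 32-bit difference is one common
+         choice and assumes two's-complement arithmetic.
+
+            #include <stdint.h>
+
+            enum action { DROP, ACCEPT, QUEUE };
+
+            struct rcv_state {
+                uint32_t rcv_nxt;    /* left window edge */
+                uint32_t rcv_wnd;    /* receive window size */
+                uint32_t ts_recent;  /* latest in-sequence timestamp */
+            };
+
+            /* Nonzero if a is earlier than b, modulo 2**32. */
+            static int ts_before(uint32_t a, uint32_t b)
+            {
+                return (int32_t)(a - b) < 0;
+            }
+
+            enum action rcv_segment(struct rcv_state *s,
+                                    uint32_t seg_seq, uint32_t seg_ts)
+            {
+                if (ts_before(seg_ts, s->ts_recent))
+                    return DROP;               /* R1: old duplicate */
+                if ((uint32_t)(seg_seq - s->rcv_nxt) >= s->rcv_wnd)
+                    return DROP;               /* R2: outside window */
+                if (seg_seq == s->rcv_nxt) {
+                    s->ts_recent = seg_ts;     /* R3: record timestamp */
+                    return ACCEPT;
+                }
+                return QUEUE;                  /* R4: out of sequence */
+            }
+
+         In this sketch ts_recent changes only in the R3 branch, so
+         segments that are queued under R4 never advance it.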
+
+         It is important to note that the timestamp is checked only when
+         a segment first arrives at the receiver, regardless of whether
+         it is in-sequence or is queued.  Consider the following
+         example.
+
+              Suppose the segment sequence: A.1, B.1, C.1, ..., Z.1 has
+              been sent, where the letter indicates the sequence number
+              and the digit represents the timestamp.  Suppose also that
+              segment B.1 has been lost.  The highest in-sequence
+              timestamp is 1 (from A.1), so C.1, ..., Z.1 are considered
+              acceptable and are queued.  When B is retransmitted as
+              segment B.2 (using the latest timestamp), it fills the
+              hole and causes all the segments through Z to be
+              acknowledged and passed to the user.  The timestamps of
+              the queued segments are *not* inspected again at this
+              time, since they have already been accepted.  When B.2 is
+              accepted, the receiver's current timestamp is set to 2.
+
+         This rule is vital to allow reasonable performance under loss.
+         A full window of data is in transit at all times, and after a
+         loss a full window less one packet will show up out-of-sequence
+         to be queued at the receiver (e.g., up to ~2**30 bytes of
+         data); the timestamp option must not result in discarding this
+         data.
+
+         In certain unlikely circumstances, the algorithm of rules R1-R4
+         could lead to discarding some segments unnecessarily, as shown
+         in the following example:
+
+              Suppose again that segments: A.1, B.1, C.1, ..., Z.1 have
+
+
+
+Jacobson, Braden & Zhang                                        [Page 7]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+              been sent in sequence and that segment B.1 has been lost.
+              Furthermore, suppose delivery of some of C.1, ... Z.1 is
+              delayed until AFTER the retransmission B.2 arrives at the
+              receiver.  These delayed segments will be discarded
+              unnecessarily when they do arrive, since their timestamps
+              are now out of date.
+
+         This case is very unlikely to occur.  If the retransmission was
+         triggered by a timeout, some of the segments C.1, ... Z.1 must
+         have been delayed longer than the RTO time.  This is presumably
+         an unlikely event, or there would be many spurious timeouts and
+         retransmissions.  If B's retransmission was triggered by the
+         "fast retransmit" algorithm, i.e., by duplicate ACK's, then the
+         queued segments that caused these ACK's must have been received
+         already.
+
+         Even if a segment was delayed past the RTO, the selective
+         acknowledgment (SACK) facility of RFC-1072 will cause the
+         delayed packets to be retransmitted at the same time as B.2,
+         avoiding an extra RTT and therefore causing a very small
+         performance penalty.
+
+         We know of no case with a significant probability of occurrence
+         in which timestamps will cause performance degradation by
+         unnecessarily discarding segments.
+
+      2.3.2  Header Prediction
+
+         "Header prediction" [Jacobson90] is a high-performance
+         transport protocol implementation technique that is most
+         important for high-speed links.  This technique optimizes the
+         code for the most common case: receiving a segment correctly
+         and in order.  Using header prediction, the receiver asks the
+         question, "Is this segment the next in sequence?"  This
+         question can be answered in fewer machine instructions than the
+         question, "Is this segment within the window?"
+
+         Adding header prediction to our timestamp procedure leads to
+         the following sequence for processing an arriving TCP segment:
+
+         H1)  Check timestamp (same as step R1 above)
+
+         H2)  Do header prediction: if segment is next in sequence and
+              if there are no special conditions requiring additional
+              processing, accept the segment, record its timestamp, and
+              skip H3.
+
+         H3)  Process the segment normally, as specified in RFC-793.
+
+
+
+Jacobson, Braden & Zhang                                        [Page 8]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+              This includes dropping segments that are outside the
+              window and possibly sending acknowledgments, and queueing
+              in-window, out-of-sequence segments.
+
+         However, the timestamp check in step H1 is very unlikely to
+         fail, and it is a relatively expensive operation since it
+         requires interval arithmetic on a finite field.  To perform
+         this check on every single segment seems like poor
+         implementation engineering, defeating the purpose of header
+         prediction.  Therefore, we suggest that an implementor
+         interchange H1 and H2, i.e., perform header prediction FIRST,
+         performing H1 and H3 only if header prediction fails.  We
+         believe that this change might gain 5-10% in performance on
+         high-speed networks.
+
+         This reordering does raise a theoretical hazard: a segment from
+         2**32 bytes in the past may arrive at exactly the wrong time
+         and be accepted mistakenly by the header-prediction step.  We
+         make the following argument to show that the probability of
+         this failure is negligible.
+
+              If all segments are equally likely to show up as old
+              duplicates, then the probability of an old duplicate
+              exactly matching the left window edge is the maximum
+              segment size (MSS) divided by the size of the sequence
+              space.  This ratio must be less than 2**-16, since MSS
+              must be < 2**16; for example, it will be (2**12)/(2**32) =
+              2**-20 for an FDDI link.  However, the older a segment is,
+              the less likely it is to be retained in the Internet, and
+              under any reasonable model of segment lifetime the
+              probability of an old duplicate exactly at the left window
+              edge must be much smaller than 2**-16.
+
+              The 16 bit TCP checksum also allows a basic unreliability
+              of one part in 2**16.  A protocol mechanism whose
+              reliability exceeds the reliability of the TCP checksum
+              should be considered "good enough", i.e., it won't
+              contribute significantly to the overall error rate.  We
+              therefore believe we can ignore the problem of an old
+              duplicate being accepted by doing header prediction before
+              checking the timestamp.
+
+      2.3.3  Timestamp Frequency
+
+         It is important to understand that the receiver algorithm for
+         timestamps does not involve clock synchronization with the
+         sender.  The sender's clock is used to stamp the segments, and
+         the sender uses this fact to measure RTT's.  However, the
+
+
+
+Jacobson, Braden & Zhang                                        [Page 9]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+         receiver treats the timestamp as simply a monotone-increasing
+         serial number, without any necessary connection to its clock.
+         From the receiver's viewpoint, the timestamp is acting as a
+         logical extension of the high-order bits of the sequence
+         number.
+
+         However, the receiver algorithm does place some requirements on
+         the frequency of the timestamp "clock":
+
+         (a)  Timestamp clock must not be "too slow".
+
+              It must tick at least once for each 2**31 bytes sent.  In
+              fact, in order to be useful to the sender for round trip
+              timing, the clock should tick at least once per window's
+              worth of data, and even with the RFC-1072 window
+              extension, 2**31 bytes must be at least two windows.
+
+              To make this more quantitative, any clock faster than 1
+              tick/sec will reject old duplicate segments for link
+              speeds of ~2 Gbps;  a 1ms clock will work up to link
+              speeds of 2 Tbps (10**12 bps!).
+
+         (b)  Timestamp clock must not be "too fast".
+
+              Its cycling time must be greater than MSL seconds.  Since
+              the clock (timestamp) is 32 bits and the worst-case MSL is
+              255 seconds, the maximum acceptable clock frequency is one
+              tick every 59 ns.
+
+              However, since the sender is using the timestamp for RTT
+              calculations, the timestamp doesn't need to have much more
+              resolution than the granularity of the retransmit timer,
+              e.g., tens or hundreds of milliseconds.
+
+         Thus, both limits are easily satisfied with a reasonable clock
+         rate in the range 1-100ms per tick.
+
+         Using the timestamp option relaxes the requirements on MSL for
+         avoiding sequence number wrap-around.  For example, with a 1 ms
+         timestamp clock, the 32-bit timestamp will wrap its sign bit in
+         25 days.  Thus, it will reject old duplicates on the same
+         connection as long as MSL is 25 days or less.  This appears to
+         be a very safe figure.  If the timestamp has 10 ms resolution,
+         the MSL requirement is boosted to 250 days.  An MSL of 25 days
+         or longer can probably be assumed by the gateway system without
+         requiring precise MSL enforcement by the TTL value in the IP
+         layer.
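+
+         The figures in (b) and in the preceding paragraph follow from
+         simple arithmetic, shown here with rounded values (2**31 is
+         used for the MSL figures because the comparison wraps at the
+         sign bit):
+
+            2**32 ticks / 255 secs  = ~1.7*10**7 ticks/sec
+                                    = ~59 ns per tick
+
+            2**31 ticks * 1 ms      = ~2.1*10**6 secs = ~25 days
+            2**31 ticks * 10 ms     = ~2.1*10**7 secs = ~250 days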
+
+
+
+
+Jacobson, Braden & Zhang                                       [Page 10]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+3.  DUPLICATES FROM EARLIER INCARNATIONS OF CONNECTION
+
+   We turn now to the second potential cause of old duplicate packet
+   errors: packets from an earlier incarnation of the same connection.
+   The appendix contains a review of the mechanisms currently included
+   in TCP to handle this problem.  These mechanisms depend upon the
+   enforcement of a maximum segment lifetime (MSL) by the Internet
+   layer.
+
+   The MSL required to prevent failures due to an earlier connection
+   incarnation does not depend (directly) upon the transfer rate.
+   However, the timestamp option used as described in Section 2 can
+   provide additional security against old duplicates from earlier
+   connections.  Furthermore, we will see that with the universal use of
+   the timestamp option, enforcement of a maximum segment lifetime would
+   no longer be required for reliable TCP operation.
+
+   There are two cases to be considered (see the appendix for more
+   explanation):  (1) a system crashing (and losing connection state)
+   and restarting, and (2) the same connection being closed and reopened
+   without a loss of host state.  These will be described in the
+   following two sections.
+
+   3.1  System Crash with Loss of State
+
+      TCP's quiet time of one MSL upon system startup handles the loss
+      of connection state in a system crash/restart.  For an
+      explanation, see for example "When to Keep Quiet" in the TCP
+      protocol specification [Postel81].  The MSL that is required here
+      does not depend upon the transfer speed.  The current TCP MSL of 2
+      minutes seems acceptable as an operational compromise, as many
+      host systems take this long to boot after a crash.
+
+      However, the timestamp option may be used to ease the MSL
+      requirements (or to provide additional security against data
+      corruption).  If timestamps are being used and if the timestamp
+      clock can be guaranteed to be monotonic over a system
+      crash/restart, i.e., if the first value of the sender's timestamp
+      clock after a crash/restart can be guaranteed to be greater than
+      the last value before the restart, then a quiet time will be
+      unnecessary.
+
+      To dispense totally with the quiet time would seem to require that
+      the host clock be synchronized to a time source that is stable
+      over the crash/restart period, with an accuracy of one timestamp
+      clock tick or better.  Fortunately, we can back off from this
+      strict requirement.  Suppose that the clock is always re-
+      synchronized to within N timestamp clock ticks and that booting
+
+
+
+Jacobson, Braden & Zhang                                       [Page 11]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+      (extended with a quiet time, if necessary) takes more than N
+      ticks.  This will guarantee monotonicity of the timestamps, which
+      can then be used to reject old duplicates even without an enforced
+      MSL.
+
+   3.2  Closing and Reopening a Connection
+
+      When a TCP connection is closed, a delay of 2*MSL in TIME-WAIT
+      state ties up the socket pair for 4 minutes (see Section 3.5 of
+      [Postel81]).  Applications built upon TCP that close one
+      connection and open a new one (e.g., an FTP data transfer
+      connection using Stream mode) must choose a new socket pair each
+      time.  This delay serves two different purposes:
+
+      (a)  Implement the full-duplex reliable close handshake of TCP.
+
+           The proper time to delay the final close step is not really
+           related to the MSL; it depends instead upon the RTO for the
+           FIN segments and therefore upon the RTT of the path.*
+           Although there is no formal upper-bound on RTT, common
+           network engineering practice makes an RTT greater than 1
+           minute very unlikely.  Thus, the 4 minute delay in TIME-WAIT
+           state works satisfactorily to provide a reliable full-duplex
+           TCP close.  Note again that this is independent of MSL
+           enforcement and network speed.
+
+           The TIME-WAIT state could cause an indirect performance
+           problem if an application needed to repeatedly close one
+           connection and open another at a very high frequency, since
+           the number of available TCP ports on a host is less than
+           2**16.  However, high network speeds are not the major
+           contributor to this problem; the RTT is the limiting factor
+           in how quickly connections can be opened and closed.
+           Therefore, this problem will be no worse at high transfer
+           speeds.
+
+      (b)  Allow old duplicate segments to expire.
+
+           Suppose that a host keeps a cache of the last timestamp
+           received from each remote host.  This can be used to reject
+           old duplicate segments from earlier incarnations of the
+_________________________
+*Note: It could be argued that the side that is sending  a  FIN  knows
+what  degree  of reliability it needs, and therefore it should be able
+to  determine  the  length  of  the  TIME-WAIT  delay  for  the  FIN's
+recipient.   This could be accomplished with an appropriate TCP option
+in FIN segments.
+
+
+
+
+Jacobson, Braden & Zhang                                       [Page 12]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+           connection, if the timestamp clock can be guaranteed to have
+           ticked at least once since the old connection was open.
+           This requires that the TIME-WAIT delay plus the RTT together
+           must be at least one tick of the sender's timestamp clock.
+
+           Note that this is a variant on the mechanism proposed by
+           Garlick, Rom, and Postel (see the appendix), which required
+           each host to maintain connection records containing the
+           highest sequence numbers on every connection.  Using
+           timestamps instead, it is only necessary to keep one quantity
+           per remote host, regardless of the number of simultaneous
+           connections to that host.
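+
+           As a minimal sketch of the per-host cache described in (b),
+           the Python below keeps one timestamp per remote host and
+           accepts a segment of a new incarnation only if its timestamp
+           is strictly newer than the cached value.  The class and
+           method names are hypothetical, and timestamp wraparound is
+           ignored for brevity.
+
```python
class TimestampCache:
    """Hypothetical cache of the last timestamp received from each remote host."""

    def __init__(self):
        self._last = {}  # remote host -> newest timestamp value seen so far

    def note(self, host, ts):
        """Record the timestamp carried by an accepted segment."""
        prev = self._last.get(host)
        if prev is None or ts > prev:
            self._last[host] = ts

    def accept_new_incarnation(self, host, ts):
        """Return True if ts is strictly newer than anything cached for host.

        Strictly newer means the sender's timestamp clock has ticked at
        least once since the old connection was open, so a segment with
        ts <= cached value is treated as an old duplicate.
        """
        prev = self._last.get(host)
        return prev is None or ts > prev
```
+
+           Only one entry per remote host is stored, no matter how
+           many connections to that host exist.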
+
+      We conclude that if all hosts used the TCP timestamp algorithm
+      described in Section 2, enforcement of a maximum segment lifetime
+      would be unnecessary and the quiet time at system startup could be
+      shortened or removed.  In any case, the timestamp mechanism can
+      provide additional security against old duplicates from earlier
+      connection incarnations.   However, a 4 minute TIME-WAIT delay
+      (unrelated to MSL enforcement or network speed) must be retained
+      to provide the reliable close handshake of TCP.
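+
+      The port-exhaustion concern raised in (a) can be made concrete
+      with a little arithmetic.  The sketch below assumes that every
+      closed connection holds its local port for the full 4 minute
+      TIME-WAIT delay and that all 2**16 ports are usable, which
+      overstates what a real host has available; the function name is
+      hypothetical.
+
```python
TIME_WAIT_SECONDS = 4 * 60  # 2*MSL, the TIME-WAIT delay given in the text
MAX_PORTS = 2**16           # upper bound on the number of TCP ports per host


def max_reopen_rate(ports=MAX_PORTS, time_wait=TIME_WAIT_SECONDS):
    """Upper bound on sustained new connections per second to one remote socket.

    Each closed connection is assumed to tie up one local port for
    time_wait seconds, so at most `ports` connections can be closed
    within any window of that length.
    """
    return ports / time_wait


print(round(max_reopen_rate(), 1))  # about 273 connections per second
```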
+
+4. CONCLUSIONS
+
+   We have presented a mechanism, based upon the TCP timestamp echo
+   option of RFC-1072, that will allow very high TCP transfer rates
+   without reliability problems due to old duplicate segments on the
+   same connection.  This mechanism also provides additional security
+   against intrusion of old duplicates from earlier incarnations of the
+   same connection.  If the timestamp mechanism were used by all hosts,
+   the quiet time at system startup could be eliminated and enforcement
+   of a maximum segment lifetime (MSL) would no longer be necessary.
+
+REFERENCES
+
+   [Cerf76]  Cerf, V., "TCP Resynchronization", Tech Note #79, Digital
+   Systems Lab, Stanford, January 1976.
+
+   [Dalal74]  Dalal, Y., "More on Selecting Sequence Numbers", INWG
+   Protocol Note #4, October 1974.
+
+   [Garlick77]  Garlick, L., R. Rom, and J. Postel, "Issues in Reliable
+   Host-to-Host Protocols", Proc. Second Berkeley Workshop on
+   Distributed Data Management and Computer Networks, May 1977.
+
+   [Hamming77]  Hamming, R., "Digital Filters", ISBN 0-13-212571-4,
+   Prentice Hall, Englewood Cliffs, N.J., 1977.
+
+
+
+
+Jacobson, Braden & Zhang                                       [Page 13]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+   [Jacobson88]  Jacobson, V., and R. Braden, "TCP Extensions for
+   Long-Delay Paths", RFC 1072, LBL and USC/Information Sciences
+   Institute, October 1988.
+
+   [Jacobson90]  Jacobson, V., "4BSD Header Prediction", ACM Computer
+   Communication Review, April 1990.
+
+   [McKenzie89]  McKenzie, A., "A Problem with the TCP Big Window
+   Option", RFC 1110, BBN STC, August 1989.
+
+   [Postel81]  Postel, J., "Transmission Control Protocol", RFC 793,
+   DARPA, September 1981.
+
+   [Tomlinson74]  Tomlinson, R., "Selecting Sequence Numbers", INWG
+   Protocol Note #2, September 1974.
+
+   [Watson81]  Watson, R., "Timer-based Mechanisms in Reliable
+   Transport Protocol Connection Management", Computer Networks,
+   Vol. 5, 1981.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Jacobson, Braden & Zhang                                       [Page 14]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+APPENDIX -- Protection against Old Duplicates in TCP
+
+   During the development of TCP, a great deal of effort was devoted to
+   the problem of protecting a TCP connection from segments left from
+   earlier incarnations of the same connection.  Several different
+   mechanisms were proposed for this purpose [Tomlinson74] [Dalal74]
+   [Cerf76] [Garlick77].
+
+   The connection parameters that are required in this discussion are:
+
+           Tc = Connection duration in seconds.
+
+           Nc = Total number of bytes sent on connection.
+
+           B = Effective bandwidth of connection = Nc/Tc.
+
+   Tomlinson proposed a scheme with two parts: a clock-driven selection
+   of ISN (Initial Sequence Number) for a connection, and a
+   resynchronization procedure [Tomlinson74]. The clock-driven scheme
+   chooses:
+
+      ISN = (integer(R*t)) mod 2**32                 [2]
+
+   where t is the current time relative to an arbitrary origin, and R is
+   a constant.  R was intended to be chosen so that ISN will advance
+   faster than sequence numbers will be used up on the connection.
+   However, at high speeds this will not be true; the consequences of
+   this will be discussed below.
+
+   The clock-driven choice of ISN in formula [2] guarantees freedom from
+   old duplicates matching a reopened connection if the original
+   connection was "short-lived" and "slow".  By "short-lived", we mean a
+   connection that stayed open for a time Tc less than the time to cycle
+   the ISN, i.e., Tc < 2**32/R seconds.  By "slow", we mean that the
+   effective transfer rate B is less than R.
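+
+   Formula [2] and the two conditions above can be sketched in Python
+   as follows.  The function names are hypothetical.  The value of R
+   is an assumption: this appendix later quotes R as 250KBps with a
+   cycle time of 4.55 hours; R = 2**18 bytes per second reproduces the
+   4.55 hour figure, whereas exactly 250,000 bytes per second gives
+   about 4.77 hours.
+
```python
SEQ_MOD = 2**32


def isn(t, rate):
    """Formula [2]: ISN = integer(R*t) mod 2**32, with t in seconds."""
    return int(rate * t) % SEQ_MOD


def cycle_hours(rate):
    """Time for the clock-driven ISN to wrap around, 2**32/R, in hours."""
    return SEQ_MOD / rate / 3600


def is_short_lived(tc, rate):
    """True if the connection duration Tc is less than one ISN cycle."""
    return tc < SEQ_MOD / rate


def is_slow(b, rate):
    """True if the effective bandwidth B is less than R."""
    return b < rate


print(round(cycle_hours(2**18), 2))    # 4.55
print(round(cycle_hours(250_000), 2))  # 4.77
```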
+
+   This is illustrated in Figure 1, where sequence numbers are plotted
+   against time.  The asterisks show the ISN lines from formula [2],
+   while the circles represent the trajectories of several short-lived
+   incarnations of the same connection, each terminating at the "x".
+
+        Note: allowing rapid reuse of connections was believed to be an
+        important goal during the early TCP development.  This
+        requirement was driven by the hope that TCP would serve as a
+        basis for user-level transaction protocols as well as
+        connection-oriented protocols.  The paradigm discussed was the
+        "Christmas Tree" or "Kamikazee" segment that contained SYN and
+        FIN bits as well as data.  Enthusiasm for this was somewhat
+
+
+
+Jacobson, Braden & Zhang                                       [Page 15]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+        dampened when it was observed that the 3-way SYN handshake and
+        the FIN handshake mean that 5 packets are required for a minimum
+        exchange. Furthermore, the TIME-WAIT state delay implies that
+        the same connection really cannot be reopened immediately.  No
+        further work has been done in this area, although existing
+        applications (especially SMTP) often generate very short TCP
+        sessions.  The reuse problem is generally avoided by using a
+        different port pair for each connection.
+
+
+        |- 2**32       ISN             ISN
+        |              *               *
+        |             *               *
+        |            *               *
+        |           *x              *
+        |          o               *
+    ^   |         *               *
+    |   |        *  x            *
+        |       * o             *
+    S   |      *o              *
+    e   |     o               *
+    q   |    *               *
+        |   *               *
+    #   |  * x             *
+        | *o              *
+        |o_______________*____________
+                         ^         Time -->
+                       4.55hrs
+
+
+     Figure 1.  Clock-Driven ISN Avoiding Duplication on
+                Short-Lived, Slow Connections
+
+
+   However, clock-driven ISN selection does not protect against old
+   duplicate packets for a long-lived or fast connection:  the
+   connection may close (or crash) just as the ISN has cycled around and
+   reached the same value again.  If the connection is then reopened, a
+   datagram still in transit from the old connection may fall into the
+   current window.  This is illustrated by Figure 2 for a slow, long-
+   lived connection, and by Figures 3 and 4 for fast connections.  In
+   each case, the point "x" marks the place at which the original
+   connection closes or crashes.  The arrow in Figure 2 illustrates an
+   old duplicate segment.  Figure 3 shows a connection whose total byte
+   count Nc < 2**32, while Figure 4 concerns Nc >= 2**32.
+
+   To prevent the duplication illustrated in Figure 2, Tomlinson
+   proposed to "resynchronize" the connection sequence numbers if they
+
+
+
+Jacobson, Braden & Zhang                                       [Page 16]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+   came within an MSL of the ISN.  Resynchronization might take the form
+   of a delay (point "y") or the choice of a new sequence number (point
+   "z").
+
+        |- 2**32       ISN               ISN
+        |              *                 *
+        |             *                 *
+        |            *                 *
+        |           *                 *
+        |          *                 *
+    ^   |         *                 *
+    |   |        *                 *
+        |       *                 *
+    S   |      *                 *
+    e   |     *                x* y
+    q   |    *           o     *
+        |   *      o          *z
+    #   |  *o                *
+        | *                 *
+        |*_________________*____________
+                           ^         Time -->
+                          4.55hrs
+
+        Figure 2.  Resynchronization to Avoid Duplication
+                   on Slow, Long-Lived Connection
+
+
+
+        |- 2**32       ISN               ISN
+        |              *                 *
+        |       x   o *                 *
+        |            *                 *
+        |      o-->o*                 *
+        |          *                 *
+    ^   |     o   o                 *
+    |   |        *                 *
+        |    o  *                 *
+    S   |      *                 *
+    e   |   o *                 *
+    q   |    *                 *
+        |  o*                 *
+    #   |  *                 *
+        | o                 *
+        |*_________________*____________
+                           ^         Time -->
+                          4.55hrs
+
+     Figure 3.  Duplication on Fast Connection: Nc < 2**32 bytes
+
+
+
+Jacobson, Braden & Zhang                                       [Page 17]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+        |- 2**32       ISN               ISN
+        |      o       *                 *
+        |           x *                 *
+        |            *                 *
+        |     o     *                 *
+        |          o                 *
+    ^   |         *                 *
+    |   |    o   *                 *
+        |       * o               *
+    S   |      *                *
+    e   |   o *                 *
+    q   |    *   o             *
+        |   *                 *
+    #   |  o                 *
+        | *     o           *
+        |*_________________*____________
+                           ^         Time -->
+                          4.55hrs
+
+     Figure 4.  Duplication on Fast Connection: Nc >= 2**32 bytes
+
+   In summary, Figures 1-4 illustrate four possible failure modes for
+   old duplicate packets from an earlier incarnation.  We will call
+   these four modes F1, F2, F3, and F4:
+
+
+   F1:  B < R, Tc < 4.55 hrs. (Figure 1)
+
+   F2:  B < R, Tc >= 4.55 hrs. (Figure 2)
+
+   F3:  B >= R, Nc < 2**32 (Figure 3)
+
+   F4:  B >= R, Nc >= 2**32 (Figure 4)
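+
+   The four modes partition connections by B, Tc, and Nc.  As a rough
+   sketch, the hypothetical Python function below classifies a closed
+   connection; it assumes R = 2**18 bytes per second, which makes the
+   ISN cycle 2**32/R equal to the 4.55 hours used above.
+
```python
SEQ_MOD = 2**32
DEFAULT_RATE = 2**18  # assumed R in bytes/second; gives a 4.55 hour ISN cycle


def failure_mode(b, tc, nc, rate=DEFAULT_RATE):
    """Classify a connection into F1..F4.

    b  -- effective bandwidth B in bytes per second
    tc -- connection duration Tc in seconds
    nc -- total number of bytes Nc sent on the connection
    """
    cycle = SEQ_MOD / rate  # ISN cycle time in seconds
    if b < rate:
        return "F1" if tc < cycle else "F2"
    return "F3" if nc < SEQ_MOD else "F4"
```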
+
+
+   Another limitation of clock-driven ISN selection should be mentioned.
+   Tomlinson assumed that the current time t in formula [2] is obtained
+   from a clock that is persistent over a system crash.  For his scheme
+   to work correctly, the clock must be restarted with an accuracy of
+   1/R seconds (e.g., 4 microseconds in the case of TCP).  While this may
+   be possible for some hosts and some crashes, in most cases there will
+   be an uncertainty in the clock after a crash that ranges from a
+   second to several minutes.
+
+   As a result of this random clock offset after system
+   reinitialization, there is a possibility that old segments sent
+   before the crash may fall into the window of a new connection
+   incarnation.  The solution to this problem that was adopted in the
+
+
+
+Jacobson, Braden & Zhang                                       [Page 18]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+   final TCP spec is a "quiet time" of MSL seconds when the system is
+   initialized [Postel81, p. 28].  No TCP connection can be opened until
+   the expiration of this quiet time.
+
+   A different approach was suggested by Garlick, Rom, and Postel
+   [Garlick77].  Rather than using clock-driven ISN selection, they
+   proposed to maintain connection records containing the last ISN used
+   on every connection.  To immediately open a new incarnation of a
+   connection, the ISN is taken to be greater than the last sequence
+   number of the previous incarnation, so that the new incarnation will
+   have unique sequence numbers.  To handle a system crash, they
+   proposed a quiet time, i.e., a delay at system startup time to allow
+   old duplicates to expire.  Note that the connection records need be
+   kept only for MSL seconds; after that, no collision is possible, and
+   a new connection can start with sequence number zero.
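+
+   A minimal sketch of such connection records follows, under stated
+   assumptions: the class and method names are hypothetical, the
+   record stores the last sequence number of the closed incarnation,
+   "greater than" is taken modulo 2**32, and the crash-recovery quiet
+   time is left out.
+
```python
SEQ_MOD = 2**32


class ConnectionRecords:
    """Hypothetical table of recently closed connections, keyed by socket pair."""

    def __init__(self, msl):
        self.msl = msl      # maximum segment lifetime, in seconds
        self._closed = {}   # socket pair -> (last sequence number, close time)

    def close(self, pair, last_seq, now):
        """Remember the last sequence number used when a connection closes."""
        self._closed[pair] = (last_seq, now)

    def initial_seq(self, pair, now):
        """Choose the ISN for a new incarnation of the given socket pair."""
        record = self._closed.get(pair)
        if record is None:
            return 0
        last_seq, closed_at = record
        if now - closed_at >= self.msl:
            # The record has been kept for MSL seconds; no collision is
            # possible any more, so drop it and start from zero.
            del self._closed[pair]
            return 0
        # Continue just past the previous incarnation's sequence space.
        return (last_seq + 1) % SEQ_MOD
```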
+
+   The scheme finally adopted for TCP combines features of both these
+   proposals.  TCP uses three mechanisms:
+
+   (A)  ISN selection is clock-driven to handle short-lived connections.
+        The parameter R =  250KBps, so that the ISN value cycles in
+        2**32/R = 4.55 hours.
+
+   (B)  (One end of) a closed connection is left in a "busy" state,
+        known as "TIME-WAIT" state, for a time of 2*MSL.  TIME-WAIT
+        state handles the proper close of a long-lived connection
+        without resynchronization.  It also allows reliable completion
+        of the full-duplex close handshake.
+
+   (C)  There is a quiet time of one MSL at system startup.  This
+        handles a crash of a long-lived connection and avoids time
+        resynchronization problems in (A).
+
+   Notice that (B) and (C) together are logically sufficient to prevent
+   accidental reuse of sequence numbers from a different incarnation,
+   for any of the failure modes F1-F4.  (A) is not logically necessary
+   since the close delay (B) makes it impossible to reopen the same TCP
+   connection immediately.  However, the use of (A) does give additional
+   assurance in a common case, perhaps compensating for a host that has
+   set its TIME-WAIT state delay too short.
+
+   Some TCP implementations have permitted a connection in the TIME-WAIT
+   state to be reopened immediately by the other side, thus short-
+   circuiting mechanism (B).  Specifically, a new SYN for the same
+   socket pair is accepted when the earlier incarnation is still in
+   TIME-WAIT state.  Old duplicates in one direction can be avoided by
+   choosing the ISN to be the next unused sequence number from the
+   preceding connection (i.e., FIN+1); this is essentially an
+
+
+
+Jacobson, Braden & Zhang                                       [Page 19]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+   application of the scheme of Garlick, Rom, and Postel, using the
+   connection block in TIME-WAIT state as the connection record.
+
+   However, the connection is still vulnerable to old duplicates in the
+   other direction.  Mechanism (A) prevents trouble in mode F1, but
+   failures can arise in F2, F3, or F4; of these, F3, on short, fast
+   connections, is the most dangerous.
+
+   Finally, we note TCP will operate reliably without any MSL-based
+   mechanisms in the following restricted domain:
+
+   *    Total data sent is less than 2**32 octets, and
+
+   *    Effective sustained rate is less than 250KBps, and
+
+   *    Connection duration is less than 4.55 hours.
+
+   At present, the great majority of TCP usage falls into this
+   restricted domain.  The third component, connection duration, is the
+   most commonly violated.
+
+Security Considerations
+
+   Security issues are not discussed in this memo.
+
+Authors' Addresses
+
+   Van Jacobson
+   University of California
+   Lawrence Berkeley Laboratory
+   Mail Stop 46A
+   Berkeley, CA 94720
+
+   Phone: (415) 486-6411
+   EMail: van@CSAM.LBL.GOV
+
+
+   Bob Braden
+   University of Southern California
+   Information Sciences Institute
+   4676 Admiralty Way
+   Marina del Rey, CA 90292
+
+   Phone: (213) 822-1511
+   EMail: Braden@ISI.EDU
+
+
+
+
+
+
+Jacobson, Braden & Zhang                                       [Page 20]
+
+RFC 1185               TCP over High-Speed Paths            October 1990
+
+
+   Lixia Zhang
+   XEROX Palo Alto Research Center
+   3333 Coyote Hill Road
+   Palo Alto, CA 94304
+
+   Phone: (415) 494-4415
+   EMail: lixia@PARC.XEROX.COM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Jacobson, Braden & Zhang                                       [Page 21]
+
+\ No newline at end of file