summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc1185.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc1185.txt')
-rw-r--r--doc/rfc/rfc1185.txt1179
1 files changed, 1179 insertions, 0 deletions
diff --git a/doc/rfc/rfc1185.txt b/doc/rfc/rfc1185.txt
new file mode 100644
index 0000000..4f467f5
--- /dev/null
+++ b/doc/rfc/rfc1185.txt
@@ -0,0 +1,1179 @@
+
+
+
+
+
+
+Network Working Group V. Jacobson
+Request for Comments: 1185 LBL
+ R. Braden
+ ISI
+ L. Zhang
+ PARC
+ October 1990
+
+
+ TCP Extension for High-Speed Paths
+
+Status of This Memo
+
+ This memo describes an Experimental Protocol extension to TCP for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "IAB
+ Official Protocol Standards" for the standardization state and status
+ of this protocol. Distribution of this memo is unlimited.
+
+Summary
+
+ This memo describes a small extension to TCP to support reliable
+ operation over very high-speed paths, using sender timestamps
+ transmitted using the TCP Echo option proposed in RFC-1072.
+
+1. INTRODUCTION
+
+ TCP uses positive acknowledgments and retransmissions to provide
+ reliable end-to-end delivery over a full-duplex virtual circuit
+ called a connection [Postel81]. A connection is defined by its two
+ end points; each end point is a "socket", i.e., a (host,port) pair.
+ To protect against data corruption, TCP uses an end-to-end checksum.
+ Duplication and reordering are handled using a fine-grained sequence
+ number space, with each octet receiving a distinct sequence number.
+
+ The TCP protocol [Postel81] was designed to operate reliably over
+ almost any transmission medium regardless of transmission rate,
+ delay, corruption, duplication, or reordering of segments. In
+ practice, proper TCP implementations have demonstrated remarkable
+ robustness in adapting to a wide range of network characteristics.
+ For example, TCP implementations currently adapt to transfer rates in
+ the range of 100 bps to 10**7 bps and round-trip delays in the range
+ 1 ms to 100 seconds.
+
+ However, the introduction of fiber optics is resulting in ever-higher
+ transmission speeds, and the fastest paths are moving out of the
+ domain for which TCP was originally engineered. This memo and RFC-
+ 1072 [Jacobson88] propose modest extensions to TCP to extend the
+
+
+
+Jacobson, Braden & Zhang [Page 1]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ domain of its application to higher speeds.
+
+ There is no one-line answer to the question: "How fast can TCP go?".
+ The issues are reliability and performance, and these depend upon the
+ round-trip delay and the maximum time that segments may be queued in
+ the Internet, as well as upon the transmission speed. We must think
+ through these relationships very carefully if we are to successfully
+ extend TCP's domain.
+
+ TCP performance depends not upon the transfer rate itself, but rather
+ upon the product of the transfer rate and the round-trip delay. This
+ "bandwidth*delay product" measures the amount of data that would
+ "fill the pipe"; it is the buffer space required at sender and
+ receiver to obtain maximum throughput on the TCP connection over the
+ path. RFC-1072 proposed a set of TCP extensions to improve TCP
+ efficiency for "LFNs" (long fat networks), i.e., networks with large
+ bandwidth*delay products.
+
+ On the other hand, high transfer rate can threaten TCP reliability by
+ violating the assumptions behind the TCP mechanism for duplicate
+ detection and sequencing. The present memo specifies a solution for
+ this problem, extending TCP reliability to transfer rates well beyond
+ the foreseeable upper limit of bandwidth.
+
+ An especially serious kind of error may result from an accidental
+ reuse of TCP sequence numbers in data segments. Suppose that an "old
+ duplicate segment", e.g., a duplicate data segment that was delayed
+ in Internet queues, was delivered to the receiver at the wrong moment
+ so that its sequence numbers fell somewhere within the current
+ window. There would be no checksum failure to warn of the error, and
+ the result could be an undetected corruption of the data. Reception
+ of an old duplicate ACK segment at the transmitter could be only
+ slightly less serious: it is likely to lock up the connection so that
+ no further progress can be made and a RST is required to
+ resynchronize the two ends.
+
+ Duplication of sequence numbers might happen in either of two ways:
+
+ (1) Sequence number wrap-around on the current connection
+
+ A TCP sequence number contains 32 bits. At a high enough
+ transfer rate, the 32-bit sequence space may be "wrapped"
+ (cycled) within the time that a segment may be delayed in
+ queues. Section 2 discusses this case and proposes a mechanism
+ to reject old duplicates on the current connection.
+
+ (2) Segment from an earlier connection incarnation
+
+
+
+
+Jacobson, Braden & Zhang [Page 2]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ Suppose a connection terminates, either by a proper close
+ sequence or due to a host crash, and the same connection (i.e.,
+ using the same pair of sockets) is immediately reopened. A
+ delayed segment from the terminated connection could fall within
+ the current window for the new incarnation and be accepted as
+ valid. This case is discussed in Section 3.
+
+ TCP reliability depends upon the existence of a bound on the lifetime
+ of a segment: the "Maximum Segment Lifetime" or MSL. An MSL is
+ generally required by any reliable transport protocol, since every
+ sequence number field must be finite, and therefore any sequence
+ number may eventually be reused. In the Internet protocol suite, the
+ MSL bound is enforced by an IP-layer mechanism, the "Time-to-Live" or
+ TTL field.
+
+ Watson's Delta-T protocol [Watson81] includes network-layer
+ mechanisms for precise enforcement of an MSL. In contrast, the IP
+ mechanism for MSL enforcement is loosely defined and even more
+ loosely implemented in the Internet. Therefore, it is unwise to
+ depend upon active enforcement of MSL for TCP connections, and it is
+ unrealistic to imagine setting MSL's smaller than the current values
+ (e.g., 120 seconds specified for TCP). The timestamp algorithm
+ described in the following section gives a way out of this dilemma
+ for high-speed networks.
+
+
+2. SEQUENCE NUMBER WRAP-AROUND
+
+ 2.1 Background
+
+ Avoiding reuse of sequence numbers within the same connection is
+ simple in principle: enforce a segment lifetime shorter than the
+ time it takes to cycle the sequence space, whose size is
+ effectively 2**31.
+
+ More specifically, if the maximum effective bandwidth at which TCP
+ is able to transmit over a particular path is B bytes per second,
+ then the following constraint must be satisfied for error-free
+ operation:
+
+ 2**31 / B > MSL (secs) [1]
+
+ The following table shows the value for Twrap = 2**31/B in
+ seconds, for some important values of the bandwidth B:
+
+
+
+
+
+
+
+Jacobson, Braden & Zhang [Page 3]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ Network B*8 B Twrap
+ bits/sec bytes/sec secs
+ _______ _______ ______ ______
+
+ ARPANET 56kbps 7KBps 3*10**5 (~3.6 days)
+
+ DS1 1.5Mbps 190KBps 10**4 (~3 hours)
+
+ Ethernet 10Mbps 1.25MBps 1700 (~30 mins)
+
+ DS3 45Mbps 5.6MBps 380
+
+ FDDI 100Mbps 12.5MBps 170
+
+ Gigabit 1Gbps 125MBps 17
+
+
+ It is clear why wrap-around of the sequence space was not a
+ problem for 56kbps packet switching or even 10Mbps Ethernets. On
+ the other hand, at DS3 and FDDI speeds, Twrap is comparable to the
+ 2 minute MSL assumed by the TCP specification [Postel81]. Moving
+ towards gigabit speeds, Twrap becomes too small for reliable
+ enforcement by the Internet TTL mechanism.
+
+ The 16-bit window field of TCP limits the effective bandwidth B to
+ 2**16/RTT, where RTT is the round-trip time in seconds
+ [McKenzie89]. If the RTT is large enough, this limits B to a
+ value that meets the constraint [1] for a large MSL value. For
+ example, consider a transcontinental backbone with an RTT of 60ms
+ (set by the laws of physics). With the bandwidth*delay product
+ limited to 64KB by the TCP window size, B is then limited to
+ 1.1MBps, no matter how high the theoretical transfer rate of the
+ path. This corresponds to cycling the sequence number space in
+ Twrap= 2000 secs, which is safe in today's Internet.
+
+ Based on this reasoning, an earlier RFC [McKenzie89] has cautioned
+ that expanding the TCP window space as proposed in RFC-1072 will
+ lead to sequence wrap-around and hence to possible data
+ corruption. We believe that this is mis-identifying the culprit,
+ which is not the larger window but rather the high bandwidth.
+
+ For example, consider a (very large) FDDI LAN with a diameter
+ of 10km. Using the speed of light, we can compute the RTT
+ across the ring as (2*10**4)/(3*10**8) = 67 microseconds, and
+ the delay*bandwidth product is then 833 bytes. A TCP
+ connection across this LAN using a window of only 833 bytes
+ will run at the full 100mbps and can wrap the sequence space
+ in about 3 minutes, very close to the MSL of TCP. Thus, high
+
+
+
+Jacobson, Braden & Zhang [Page 4]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ speed alone can cause a reliability problem with sequence
+ number wrap-around, even without extended windows.
+
+ An "obvious" fix for the problem of cycling the sequence space is
+ to increase the size of the TCP sequence number field. For
+ example, the sequence number field (and also the acknowledgment
+ field) could be expanded to 64 bits. However, the proposals for
+ making such a change while maintaining compatibility with current
+ TCP have tended towards complexity and ugliness.
+
+ This memo proposes a simple solution to the problem, using the TCP
+ echo options defined in RFC-1072. Section 2.2 which follows
+ describes the original use of these options to carry timestamps in
+ order to measure RTT accurately. Section 2.3 proposes a method of
+ using these same timestamps to reject old duplicate segments that
+ could corrupt an open TCP connection. Section 3 discusses the
+ application of this mechanism to avoiding old duplicates from
+ previous incarnations.
+
+ 2.2 TCP Timestamps
+
+ RFC-1072 defined two TCP options, Echo and Echo Reply. Echo
+ carries a 32-bit number, and the receiver of the option must
+ return this same value to the source host in an Echo Reply option.
+
+ RFC-1072 furthermore describes the use of these options to contain
+ 32-bit timestamps, for measuring the RTT. A TCP sending data
+ would include Echo options containing the current clock value.
+ The receiver would echo these timestamps in returning segments
+ (generally, ACK segments). The difference between a timestamp
+ from an Echo Reply option and the current time would then measure
+ the RTT at the sender.
+
+ This mechanism was designed to solve the following problem: almost
+ all TCP implementations base their RTT measurements on a sample of
+ only one packet per window. If we look at RTT estimation as a
+ signal processing problem (which it is), a data signal at some
+ frequency (the packet rate) is being sampled at a lower frequency
+ (the window rate). Unfortunately, this lower sampling frequency
+ violates Nyquist's criteria and may introduce "aliasing" artifacts
+ into the estimated RTT [Hamming77].
+
+ A good RTT estimator with a conservative retransmission timeout
+ calculation can tolerate the aliasing when the sampling frequency
+ is "close" to the data frequency. For example, with a window of
+ 8 packets, the sample rate is 1/8 the data frequency -- less than
+ an order of magnitude different. However, when the window is tens
+ or hundreds of packets, the RTT estimator may be seriously in
+
+
+
+Jacobson, Braden & Zhang [Page 5]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ error, resulting in spurious retransmissions.
+
+ A solution to the aliasing problem that actually simplifies the
+ sender substantially (since the RTT code is typically the single
+ biggest protocol cost for TCP) is as follows: the will sender
+ place a timestamp in each segment and the receiver will reflect
+ these timestamps back in ACK segments. Then a single subtract
+ gives the sender an accurate RTT measurement for every ACK segment
+ (which will correspond to every other data segment, with a
+ sensible receiver). RFC-1072 defined a timestamp echo option for
+ this purpose.
+
+ It is vitally important to use the timestamp echo option with big
+ windows; otherwise, the door is opened to some dangerous
+ instabilities due to aliasing. Furthermore, the option is
+ probably useful for all TCP's, since it simplifies the sender.
+
+ 2.3 Avoiding Old Duplicate Segments
+
+ Timestamps carried from sender to receiver in TCP Echo options can
+ also be used to prevent data corruption caused by sequence number
+ wrap-around, as this section describes.
+
+ 2.3.1 Basic Algorithm
+
+ Assume that every received TCP segment contains a timestamp.
+ The basic idea is that a segment received with a timestamp that
+ is earlier than the timestamp of the most recently accepted
+ segment can be discarded as an old duplicate. More
+ specifically, the following processing is to be performed on
+ normal incoming segments:
+
+ R1) If the timestamp in the arriving segment timestamp is less
+ than the timestamp of the most recently received in-
+ sequence segment, treat the arriving segment as not
+ acceptable:
+
+ If SEG.LEN > 0, send an acknowledgement in reply as
+ specified in RFC-793 page 69, and drop the segment;
+ otherwise, just silently drop the segment.*
+
+_________________________
+*Sending an ACK segment in reply is not strictly necessary, since the
+case can only arise when a later in-order segment has already been
+received. However, for consistency and simplicity, we suggest
+treating a timestamp failure the same way TCP treats any other
+unacceptable segment.
+
+
+
+
+Jacobson, Braden & Zhang [Page 6]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ R2) If the segment is outside the window, reject it (normal
+ TCP processing)
+
+ R3) If an arriving segment is in-sequence (i.e, at the left
+ window edge), accept it normally and record its timestamp.
+
+ R4) Otherwise, treat the segment as a normal in-window, out-
+ of-sequence TCP segment (e.g., queue it for later delivery
+ to the user).
+
+
+ Steps R2-R4 are the normal TCP processing steps specified by
+ RFC-793, except that in R3 the latest timestamp is set from
+ each in-sequence segment that is accepted. Thus, the latest
+ timestamp recorded at the receiver corresponds to the left edge
+ of the window and only advances when the left edge moves
+ [Jacobson88].
+
+ It is important to note that the timestamp is checked only when
+ a segment first arrives at the receiver, regardless of whether
+ it is in-sequence or is queued. Consider the following
+ example.
+
+ Suppose the segment sequence: A.1, B.1, C.1, ..., Z.1 has
+ been sent, where the letter indicates the sequence number
+ and the digit represents the timestamp. Suppose also that
+ segment B.1 has been lost. The highest in-sequence
+ timestamp is 1 (from A.1), so C.1, ..., Z.1 are considered
+ acceptable and are queued. When B is retransmitted as
+ segment B.2 (using the latest timestamp), it fills the
+ hole and causes all the segments through Z to be
+ acknowledged and passed to the user. The timestamps of
+ the queued segments are *not* inspected again at this
+ time, since they have already been accepted. When B.2 is
+ accepted, the receivers's current timestamp is set to 2.
+
+ This rule is vital to allow reasonable performance under loss.
+ A full window of data is in transit at all times, and after a
+ loss a full window less one packet will show up out-of-sequence
+ to be queued at the receiver (e.g., up to ~2**30 bytes of
+ data); the timestamp option must not result in discarding this
+ data.
+
+ In certain unlikely circumstances, the algorithm of rules R1-R4
+ could lead to discarding some segments unnecessarily, as shown
+ in the following example:
+
+ Suppose again that segments: A.1, B.1, C.1, ..., Z.1 have
+
+
+
+Jacobson, Braden & Zhang [Page 7]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ been sent in sequence and that segment B.1 has been lost.
+ Furthermore, suppose delivery of some of C.1, ... Z.1 is
+ delayed until AFTER the retransmission B.2 arrives at the
+ receiver. These delayed segments will be discarded
+ unnecessarily when they do arrive, since their timestamps
+ are now out of date.
+
+ This case is very unlikely to occur. If the retransmission was
+ triggered by a timeout, some of the segments C.1, ... Z.1 must
+ have been delayed longer than the RTO time. This is presumably
+ an unlikely event, or there would be many spurious timeouts and
+ retransmissions. If B's retransmission was triggered by the
+ "fast retransmit" algorithm, i.e., by duplicate ACK's, then the
+ queued segments that caused these ACK's must have been received
+ already.
+
+ Even if a segment was delayed past the RTO, the selective
+ acknowledgment (SACK) facility of RFC-1072 will cause the
+ delayed packets to be retransmitted at the same time as B.2,
+ avoiding an extra RTT and therefore causing a very small
+ performance penalty.
+
+ We know of no case with a significant probability of occurrence
+ in which timestamps will cause performance degradation by
+ unnecessarily discarding segments.
+
+ 2.3.2 Header Prediction
+
+ "Header prediction" [Jacobson90] is a high-performance
+ transport protocol implementation technique that is is most
+ important for high-speed links. This technique optimizes the
+ code for the most common case: receiving a segment correctly
+ and in order. Using header prediction, the receiver asks the
+ question, "Is this segment the next in sequence?" This
+ question can be answered in fewer machine instructions than the
+ question, "Is this segment within the window?"
+
+ Adding header prediction to our timestamp procedure leads to
+ the following sequence for processing an arriving TCP segment:
+
+ H1) Check timestamp (same as step R1 above)
+
+ H2) Do header prediction: if segment is next in sequence and
+ if there are no special conditions requiring additional
+ processing, accept the segment, record its timestamp, and
+ skip H3.
+
+ H3) Process the segment normally, as specified in RFC-793.
+
+
+
+Jacobson, Braden & Zhang [Page 8]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ This includes dropping segments that are outside the
+ window and possibly sending acknowledgments, and queueing
+ in-window, out-of-sequence segments.
+
+ However, the timestamp check in step H1 is very unlikely to
+ fail, and it is a relatively expensive operation since it
+ requires interval arithmetic on a finite field. To perform
+ this check on every single segment seems like poor
+ implementation engineering, defeating the purpose of header
+ prediction. Therefore, we suggest that an implementor
+ interchange H1 and H2, i.e., perform header prediction FIRST,
+ performing H1 and H3 only if header prediction fails. We
+ believe that this change might gain 5-10% in performance on
+ high-speed networks.
+
+ This reordering does raise a theoretical hazard: a segment from
+ 2**32 bytes in the past may arrive at exactly the wrong time
+ and be accepted mistakenly by the header-prediction step. We
+ make the following argument to show that the probability of
+ this failure is negligible.
+
+ If all segments are equally likely to show up as old
+ duplicates, then the probability of an old duplicate
+ exactly matching the left window edge is the maximum
+ segment size (MSS) divided by the size of the sequence
+ space. This ratio must be less than 2**-16, since MSS
+ must be < 2**16; for example, it will be (2**12)/(2**32) =
+ 2**-20 for an FDDI link. However, the older a segment is,
+ the less likely it is to be retained in the Internet, and
+ under any reasonable model of segment lifetime the
+ probability of an old duplicate exactly at the left window
+ edge must be much smaller than 2**16.
+
+ The 16 bit TCP checksum also allows a basic unreliability
+ of one part in 2**16. A protocol mechanism whose
+ reliability exceeds the reliability of the TCP checksum
+ should be considered "good enough", i.e., it won't
+ contribute significantly to the overall error rate. We
+ therefore believe we can ignore the problem of an old
+ duplicate being accepted by doing header prediction before
+ checking the timestamp.
+
+ 2.3.3 Timestamp Frequency
+
+ It is important to understand that the receiver algorithm for
+ timestamps does not involve clock synchronization with the
+ sender. The sender's clock is used to stamp the segments, and
+ the sender uses this fact to measure RTT's. However, the
+
+
+
+Jacobson, Braden & Zhang [Page 9]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ receiver treats the timestamp as simply a monotone-increasing
+ serial number, without any necessary connection to its clock.
+ From the receiver's viewpoint, the timestamp is acting as a
+ logical extension of the high-order bits of the sequence
+ number.
+
+ However, the receiver algorithm dpes place some requirements on
+ the frequency of the timestamp "clock":
+
+ (a) Timestamp clock must not be "too slow".
+
+ It must tick at least once for each 2**31 bytes sent. In
+ fact, in order to be useful to the sender for round trip
+ timing, the clock should tick at least once per window's
+ worth of data, and even with the RFC-1072 window
+ extension, 2**31 bytes must be at least two windows.
+
+ To make this more quantitative, any clock faster than 1
+ tick/sec will reject old duplicate segments for link
+ speeds of ~2 Gbps; a 1ms clock will work up to link
+ speeds of 2 Tbps (10**12 bps!).
+
+ (b) Timestamp clock must not be "too fast".
+
+ Its cycling time must be greater than MSL seconds. Since
+ the clock (timestamp) is 32 bits and the worst-case MSL is
+ 255 seconds, the maximum acceptable clock frequency is one
+ tick every 59 ns.
+
+ However, since the sender is using the timestamp for RTT
+ calculations, the timestamp doesn't need to have much more
+ resolution than the granularity of the retransmit timer,
+ e.g., tens or hundreds of milliseconds.
+
+ Thus, both limits are easily satisfied with a reasonable clock
+ rate in the range 1-100ms per tick.
+
+ Using the timestamp option relaxes the requirements on MSL for
+ avoiding sequence number wrap-around. For example, with a 1 ms
+ timestamp clock, the 32-bit timestamp will wrap its sign bit in
+ 25 days. Thus, it will reject old duplicates on the same
+ connection as long as MSL is 25 days or less. This appears to
+ be a very safe figure. If the timestamp has 10 ms resolution,
+ the MSL requirement is boosted to 250 days. An MSL of 25 days
+ or longer can probably be assumed by the gateway system without
+ requiring precise MSL enforcement by the TTL value in the IP
+ layer.
+
+
+
+
+Jacobson, Braden & Zhang [Page 10]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+3. DUPLICATES FROM EARLIER INCARNATIONS OF CONNECTION
+
+ We turn now to the second potential cause of old duplicate packet
+ errors: packets from an earlier incarnation of the same connection.
+ The appendix contains a review the mechanisms currently included in
+ TCP to handle this problem. These mechanisms depend upon the
+ enforcement of a maximum segment lifetime (MSL) by the Internet
+ layer.
+
+ The MSL required to prevent failures due to an earlier connection
+ incarnation does not depend (directly) upon the transfer rate.
+ However, the timestamp option used as described in Section 2 can
+ provide additional security against old duplicates from earlier
+ connections. Furthermore, we will see that with the universal use of
+ the timestamp option, enforcement of a maximum segment lifetime would
+ no longer be required for reliable TCP operation.
+
+ There are two cases to be considered (see the appendix for more
+ explanation): (1) a system crashing (and losing connection state)
+ and restarting, and (2) the same connection being closed and reopened
+ without a loss of host state. These will be described in the
+ following two sections.
+
+ 3.1 System Crash with Loss of State
+
+ TCP's quiet time of one MSL upon system startup handles the loss
+ of connection state in a system crash/restart. For an
+ explanation, see for example "When to Keep Quiet" in the TCP
+ protocol specification [Postel81]. The MSL that is required here
+ does not depend upon the transfer speed. The current TCP MSL of 2
+ minutes seems acceptable as an operational compromise, as many
+ host systems take this long to boot after a crash.
+
+ However, the timestamp option may be used to ease the MSL
+ requirements (or to provide additional security against data
+ corruption). If timestamps are being used and if the timestamp
+ clock can be guaranteed to be monotonic over a system
+ crash/restart, i.e., if the first value of the sender's timestamp
+ clock after a crash/restart can be guaranteed to be greater than
+ the last value before the restart, then a quiet time will be
+ unnecessary.
+
+ To dispense totally with the quiet time would seem to require that
+ the host clock be synchronized to a time source that is stable
+ over the crash/restart period, with an accuracy of one timestamp
+ clock tick or better. Fortunately, we can back off from this
+ strict requirement. Suppose that the clock is always re-
+ synchronized to within N timestamp clock ticks and that booting
+
+
+
+Jacobson, Braden & Zhang [Page 11]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ (extended with a quiet time, if necessary) takes more than N
+ ticks. This will guarantee monotonicity of the timestamps, which
+ can then be used to reject old duplicates even without an enforced
+ MSL.
+
+ 3.2 Closing and Reopening a Connection
+
+ When a TCP connection is closed, a delay of 2*MSL in TIME-WAIT
+ state ties up the socket pair for 4 minutes (see Section 3.5 of
+ [Postel81]. Applications built upon TCP that close one connection
+ and open a new one (e.g., an FTP data transfer connection using
+ Stream mode) must choose a new socket pair each time. This delay
+ serves two different purposes:
+
+ (a) Implement the full-duplex reliable close handshake of TCP.
+
+ The proper time to delay the final close step is not really
+ related to the MSL; it depends instead upon the RTO for the
+ FIN segments and therefore upon the RTT of the path.*
+ Although there is no formal upper-bound on RTT, common
+ network engineering practice makes an RTT greater than 1
+ minute very unlikely. Thus, the 4 minute delay in TIME-WAIT
+ state works satisfactorily to provide a reliable full-duplex
+ TCP close. Note again that this is independent of MSL
+ enforcement and network speed.
+
+ The TIME-WAIT state could cause an indirect performance
+ problem if an application needed to repeatedly close one
+ connection and open another at a very high frequency, since
+ the number of available TCP ports on a host is less than
+ 2**16. However, high network speeds are not the major
+ contributor to this problem; the RTT is the limiting factor
+ in how quickly connections can be opened and closed.
+ Therefore, this problem will no worse at high transfer
+ speeds.
+
+ (b) Allow old duplicate segements to expire.
+
+ Suppose that a host keeps a cache of the last timestamp
+ received from each remote host. This can be used to reject
+ old duplicate segments from earlier incarnations of the
+_________________________
+*Note: It could be argued that the side that is sending a FIN knows
+what degree of reliability it needs, and therefore it should be able
+to determine the length of the TIME-WAIT delay for the FIN's
+recipient. This could be accomplished with an appropriate TCP option
+in FIN segments.
+
+
+
+
+Jacobson, Braden & Zhang [Page 12]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ connection, if the timestamp clock can be guaranteed to have
+ ticked at least once since the old conennection was open.
+ This requires that the TIME-WAIT delay plus the RTT together
+ must be at least one tick of the sender's timestamp clock.
+
+ Note that this is a variant on the mechanism proposed by
+ Garlick, Rom, and Postel (see the appendix), which required
+ each host to maintain connection records containing the
+ highest sequence numbers on every connection. Using
+ timestamps instead, it is only necessary to keep one quantity
+ per remote host, regardless of the number of simultaneous
+ connections to that host.
+
+ We conclude that if all hosts used the TCP timestamp algorithm
+ described in Section 2, enforcement of a maximum segment lifetime
+ would be unnecessary and the quiet time at system startup could be
+ shortened or removed. In any case, the timestamp mechanism can
+ provide additional security against old duplicates from earlier
+ connection incarnations. However, a 4 minute TIME-WAIT delay
+ (unrelated to MSL enforcement or network speed) must be retained
+ to provide the reliable close handshake of TCP.
+
+4. CONCLUSIONS
+
+ We have presented a mechanism, based upon the TCP timestamp echo
+ option of RFC-1072, that will allow very high TCP transfer rates
+ without reliability problems due to old duplicate segments on the
+ same connection. This mechanism also provides additional security
+ against intrusion of old duplicates from earlier incarnations of the
+ same connection. If the timestamp mechanism were used by all hosts,
+ the quiet time at system startup could be eliminated and enforcement
+ of a maximum segment lifetime (MSL) would no longer be necessary.
+
+REFERENCES
+
+ [Cerf76] Cerf, V., "TCP Resynchronization", Tech Note #79, Digital
+ Systems Lab, Stanford, January 1976.
+
+ [Dalal74] Dalal, Y., "More on Selecting Sequence Numbers", INWG
+ Protocol Note #4, October 1974.
+
+ [Garlick77] Garlick, L., R. Rom, and J. Postel, "Issues in Reliable
+ Host-to-Host Protocols", Proc. Second Berkeley Workshop on
+ Distributed Data Management and Computer Networks, May 1977.
+
+ [Hamming77] Hamming, R., "Digital Filters", ISBN 0-13-212571-4,
+ Prentice Hall, Englewood Cliffs, N.J., 1977.
+
+
+
+
+Jacobson, Braden & Zhang [Page 13]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ [Jacobson88] Jacobson, V., and R. Braden, "TCP Extensions for
+ Long-Delay Paths", RFC 1072, LBL and USC/Information Sciences
+ Institute, October 1988.
+
+ [Jacobson90] Jacobson, V., "4BSD Header Prediction", ACM Computer
+ Communication Review, April 1990.
+
+ [McKenzie89] McKenzie, A., "A Problem with the TCP Big Window
+ Option", RFC 1110, BBN STC, August 1989.
+
+ [Postel81] Postel, J., "Transmission Control Protocol", RFC 793,
+ DARPA, September 1981.
+
+ [Tomlinson74] Tomlinson, R., "Selecting Sequence Numbers", INWG
+ Protocol Note #2, September 1974.
+
+ [Watson81] Watson, R., "Timer-based Mechanisms in Reliable
+ Transport Protocol Connection Management", Computer Networks,
+ Vol. 5, 1981.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Jacobson, Braden & Zhang [Page 14]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+APPENDIX -- Protection against Old Duplicates in TCP
+
+ During the development of TCP, a great deal of effort was devoted to
+ the problem of protecting a TCP connection from segments left from
+ earlier incarnations of the same connection. Several different
+ mechanisms were proposed for this purpose [Tomlinson74] [Dalal74]
+ [Cerf76] [Garlick77].
+
+ The connection parameters that are required in this discussion are:
+
+ Tc = Connection duration in seconds.
+
+ Nc = Total number of bytes sent on connection.
+
+ B = Effective bandwidth of connection = Nc/Tc.
+
+ Tomlinson proposed a scheme with two parts: a clock-driven selection
+ of ISN (Initial Sequence Number) for a connection, and a
+ resynchronization procedure [Tomlinson74]. The clock-driven scheme
+ chooses:
+
+ ISN = (integer(R*t)) mod 2**32 [2]
+
+ where t is the current time relative to an arbitrary origin, and R is
+ a constant. R was intended to be chosen so that ISN will advance
+ faster than sequence numbers will be used up on the connection.
+ However, at high speeds this will not be true; the consequences of
+ this will be discussed below.
+
+ The clock-driven choice of ISN in formula [2] guarantees freedom from
+ old duplicates matching a reopened connection if the original
+ connection was "short-lived" and "slow". By "short-lived", we mean a
+ connection that stayed open for a time Tc less than the time to cycle
+ the ISN, i.e., Tc < 2**32/R seconds. By "slow", we mean that the
+ effective transfer rate B is less than R.
+
+ This is illustrated in Figure 1, where sequence numbers are plotted
+ against time. The asterisks show the ISN lines from formula [2],
+ while the circles represent the trajectories of several short-lived
+ incarnations of the same connection, each terminating at the "x".
+
+ Note: allowing rapid reuse of connections was believed to be an
+ important goal during the early TCP development. This
+ requirement was driven by the hope that TCP would serve as a
+ basis for user-level transaction protocols as well as
+ connection-oriented protocols. The paradigm discussed was the
+ "Christmas Tree" or "Kamikazee" segment that contained SYN and
+ FIN bits as well as data. Enthusiasm for this was somewhat
+
+
+
+Jacobson, Braden & Zhang [Page 15]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ dampened when it was observed that the 3-way SYN handshake and
+ the FIN handshake mean that 5 packets are required for a minimum
+ exchange. Furthermore, the TIME-WAIT state delay implies that
+ the same connection really cannot be reopened immediately. No
+ further work has been done in this area, although existing
+ applications (especially SMTP) often generate very short TCP
+ sessions. The reuse problem is generally avoided by using a
+ different port pair for each connection.
+
+
+ |- 2**32 ISN ISN
+ | * *
+ | * *
+ | * *
+ | *x *
+ | o *
+ ^ | * *
+ | | * x *
+ | * o *
+ S | *o *
+ e | o *
+ q | * *
+ | * *
+ # | * x *
+ | *o *
+ |o_______________*____________
+ ^ Time -->
+ 4.55hrs
+
+
+ Figure 1. Clock-Driven ISN avoiding duplication on
+ short-Lived, slow connections.
+
+
+ However, clock-driven ISN selection does not protect against old
+ duplicate packets for a long-lived or fast connection: the
+ connection may close (or crash) just as the ISN has cycled around and
+ reached the same value again. If the connection is then reopened, a
+ datagram still in transit from the old connection may fall into the
+ current window. This is illustrated by Figure 2 for a slow, long-
+ lived connection, and by Figures 3 and 4 for fast connections. In
+ each case, the point "x" marks the place at which the original
+ connection closes or crashes. The arrow in Figure 2 illustrates an
+ old duplicate segment. Figure 3 shows a connection whose total byte
+ count Nc < 2**32, while Figure 4 concerns Nc >= 2**32.
+
+ To prevent the duplication illustrated in Figure 2, Tomlinson
+ proposed to "resynchronize" the connection sequence numbers if they
+
+
+
+Jacobson, Braden & Zhang [Page 16]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ came within an MSL of the ISN. Resynchronization might take the form
+ of a delay (point "y") or the choice of a new sequence number (point
+ "z").
+
+ |- 2**32 ISN ISN
+ | * *
+ | * *
+ | * *
+ | * *
+ | * *
+ ^ | * *
+ | | * *
+ | * *
+ S | * *
+ e | * x* y
+ q | * o *
+ | * o *z
+ # | *o *
+ | * *
+ |*_________________*____________
+ ^ Time -->
+ 4.55hrs
+
+ Figure 2. Resynchronization to Avoid Duplication
+ on Slow, Long-Lived Connection
+
+
+
+ |- 2**32 ISN ISN
+ | * *
+ | x o * *
+ | * *
+ | o-->o* *
+ | * *
+ ^ | o o *
+ | | * *
+ | o * *
+ S | * *
+ e | o * *
+ q | * *
+ | o* *
+ # | * *
+ | o *
+ |*_________________*____________
+ ^ Time -->
+ 4.55hrs
+
+ Figure 3. Duplication on Fast Connection: Nc < 2**32 bytes
+
+
+
+Jacobson, Braden & Zhang [Page 17]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ |- 2**32 ISN ISN
+ | o * *
+ | x * *
+ | * *
+ | o * *
+ | o *
+ ^ | * *
+ | | o * *
+ | * o *
+ S | * *
+ e | o * *
+ q | * o *
+ | * *
+ # | o *
+ | * o *
+ |*_________________*____________
+ ^ Time -->
+ 4.55hrs
+
+ Figure 4. Duplication on Fast Connection: Nc > 2**32 bytes
+
+ In summary, Figures 1-4 illustrated four possible failure modes for
+ old duplicate packets from an earlier incarnation. We will call
+ these four modes F1 , F2, F3, and F4:
+
+
+ F1: B < R, Tc < 4.55 hrs. (Figure 1)
+
+ F2: B < R, Tc >= 4.55 hrs. (Figure 2)
+
+ F3: B >= R, Nc < 2**32 (Figure 3)
+
+ F4: B >= R, Nc >= 2**32 (Figure 4)
+
+
+ Another limitation of clock-driven ISN selection should be mentioned.
+ Tomlinson assumed that the current time t in formula [2] is obtained
+ from a clock that is persistent over a system crash. For his scheme
+ to work correctly, the clock must be restarted with an accuracy of
+ 1/R seconds (e.g, 4 microseconds in the case of TCP). While this may
+ be possible for some hosts and some crashes, in most cases there will
+ be an uncertainty in the clock after a crash that ranges from a
+ second to several minutes.
+
+ As a result of this random clock offset after system
+ reinitialization, there is a possibility that old segments sent
+ before the crash may fall into the window of a new connection
+ incarnation. The solution to this problem that was adopted in the
+
+
+
+Jacobson, Braden & Zhang [Page 18]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ final TCP spec is a "quiet time" of MSL seconds when the system is
+ initialized [Postel81, p. 28]. No TCP connection can be opened until
+ the expiration of this quiet time.
+
+ A different approach was suggested by Garlick, Rom, and Postel
+ [Garlick77]. Rather than using clock-driven ISN selection, they
+ proposed to maintain connection records containing the last ISN used
+ on every connection. To immediately open a new incarnation of a
+ connection, the ISN is taken to be greater than the last sequence
+ number of the previous incarnation, so that the new incarnation will
+ have unique sequence numbers. To handle a system crash, they
+ proposed a quiet time, i.e., a delay at system startup time to allow
+ old duplicates to expire. Note that the connection records need be
+ kept only for MSL seconds; after that, no collision is possible, and
+ a new connection can start with sequence number zero.
+
+ The scheme finally adopted for TCP combines features of both these
+ proposals. TCP uses three mechanisms:
+
+ (A) ISN selection is clock-driven to handle short-lived connections.
+ The parameter R = 250KBps, so that the ISN value cycles in
+ 2**32/R = 4.55 hours.
+
+ (B) (One end of) a closed connection is left in a "busy" state,
+ known as "TIME-WAIT" state, for a time of 2*MSL. TIME-WAIT
+ state handles the proper close of a long-lived connection
+ without resynchronization. It also allows reliable completion
+ of the full-duplex close handshake.
+
+ (C) There is a quiet time of one MSL at system startup. This
+ handles a crash of a long-lived connection and avoids time
+ resynchronization problems in (A).
+
+ Notice that (B) and (C) together are logically sufficient to prevent
+ accidental reuse of sequence numbers from a different incarnation,
+ for any of the failure modes F1-F4. (A) is not logically necessary
+ since the close delay (B) makes it impossible to reopen the same TCP
+ connection immediately. However, the use of (A) does give additional
+ assurance in a common case, perhaps compensating for a host that has
+ set its TIME-WAIT state delay too short.
+
+ Some TCP implementations have permitted a connection in the TIME-WAIT
+ state to be reopened immediately by the other side, thus short-
+ circuiting mechanism (B). Specifically, a new SYN for the same
+ socket pair is accepted when the earlier incarnation is still in
+ TIME-WAIT state. Old duplicates in one direction can be avoided by
+ choosing the ISN to be the next unused sequence number from the
+ preceding connection (i.e., FIN+1); this is essentially an
+
+
+
+Jacobson, Braden & Zhang [Page 19]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ application of the scheme of Garlick, Rom, and Postel, using the
+ connection block in TIME-WAIT state as the connection record.
+
+ However, the connection is still vulnerable to old duplicates in the
+ other direction. Mechanism (A) prevents trouble in mode F1, but
+ failures can arise in F2, F3, or F4; of these, F2, on short, fast
+ connections, is the most dangerous.
+
+ Finally, we note TCP will operate reliably without any MSL-based
+ mechanisms in the following restricted domain:
+
+ * Total data sent is less then 2**32 octets, and
+
+ * Effective sustained rate less than 250KBps, and
+
+ * Connection duration less than 4.55 hours.
+
+ At the present time, the great majority of current TCP usage falls
+ into this restricted domain. The third component, connection
+ duration, is the most commonly violated.
+
+Security Considerations
+
+ Security issues are not discussed in this memo.
+
+Authors' Addresses
+
+ Van Jacobson
+ University of California
+ Lawrence Berkeley Laboratory
+ Mail Stop 46A
+ Berkeley, CA 94720
+
+ Phone: (415) 486-6411
+ EMail: van@CSAM.LBL.GOV
+
+
+ Bob Braden
+ University of Southern California
+ Information Sciences Institute
+ 4676 Admiralty Way
+ Marina del Rey, CA 90292
+
+ Phone: (213) 822-1511
+ EMail: Braden@ISI.EDU
+
+
+
+
+
+
+Jacobson, Braden & Zhang [Page 20]
+
+RFC 1185 TCP over High-Speed Paths October 1990
+
+
+ Lixia Zhang
+ XEROX Palo Alto Research Center
+ 3333 Coyote Hill Road
+ Palo Alto, CA 94304
+
+ Phone: (415) 494-4415
+ EMail: lixia@PARC.XEROX.COM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Jacobson, Braden & Zhang [Page 21]
+ \ No newline at end of file