summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc1072.txt
diff options
context:
space:
mode:
authorThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committerThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
treee3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc1072.txt
parentea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc1072.txt')
-rw-r--r--doc/rfc/rfc1072.txt893
1 files changed, 893 insertions, 0 deletions
diff --git a/doc/rfc/rfc1072.txt b/doc/rfc/rfc1072.txt
new file mode 100644
index 0000000..6ed8d5b
--- /dev/null
+++ b/doc/rfc/rfc1072.txt
@@ -0,0 +1,893 @@
+Network Working Group V. Jacobson
+Request for Comments: 1072 LBL
+ R. Braden
+ ISI
+ October 1988
+
+
+ TCP Extensions for Long-Delay Paths
+
+
+Status of This Memo
+
+ This memo proposes a set of extensions to the TCP protocol to provide
+ efficient operation over a path with a high bandwidth*delay product.
+ These extensions are not proposed as an Internet standard at this
+ time. Instead, they are intended as a basis for further
+ experimentation and research on transport protocol performance.
+ Distribution of this memo is unlimited.
+
+1. INTRODUCTION
+
+ Recent work on TCP performance has shown that TCP can work well over
+ a variety of Internet paths, ranging from 800 Mbit/sec I/O channels
+ to 300 bit/sec dial-up modems [Jacobson88]. However, there is still
+ a fundamental TCP performance bottleneck for one transmission regime:
+ paths with high bandwidth and long round-trip delays. The
+ significant parameter is the product of bandwidth (bits per second)
+ and round-trip delay (RTT in seconds); this product is the number of
+ bits it takes to "fill the pipe", i.e., the amount of unacknowledged
+ data that TCP must handle in order to keep the pipeline full. TCP
+ performance problems arise when this product is large, e.g.,
+ significantly exceeds 10**5 bits. We will refer to an Internet path
+ operating in this region as a "long, fat pipe", and a network
+ containing this path as an "LFN" (pronounced "elephan(t)").
+
+ High-capacity packet satellite channels (e.g., DARPA's Wideband Net)
+ are LFN's. For example, a T1-speed satellite channel has a
+ bandwidth*delay product of 10**6 bits or more; this corresponds to
+ 100 outstanding TCP segments of 1200 bytes each! Proposed future
+ terrestrial fiber-optical paths will also fall into the LFN class;
+ for example, a cross-country delay of 30 ms at a DS3 bandwidth
+ (45Mbps) also exceeds 10**6 bits.
+
+ Clever algorithms alone will not give us good TCP performance over
+ LFN's; it will be necessary to actually extend the protocol. This
+ RFC proposes a set of TCP extensions for this purpose.
+
+ There are three fundamental problems with the current TCP over LFN
+
+
+
+Jacobson & Braden [Page 1]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ paths:
+
+
+ (1) Window Size Limitation
+
+ The TCP header uses a 16 bit field to report the receive window
+ size to the sender. Therefore, the largest window that can be
+ used is 2**16 = 65K bytes. (In practice, some TCP
+ implementations will "break" for windows exceeding 2**15,
+ because of their failure to do unsigned arithmetic).
+
+ To circumvent this problem, we propose a new TCP option to allow
+ windows larger than 2**16. This option will define an implicit
+ scale factor, to be used to multiply the window size value found
+ in a TCP header to obtain the true window size.
+
+
+ (2) Cumulative Acknowledgments
+
+ Any packet losses in an LFN can have a catastrophic effect on
+ throughput. This effect is exaggerated by the simple cumulative
+ acknowledgment of TCP. Whenever a segment is lost, the
+ transmitting TCP will (eventually) time out and retransmit the
+ missing segment. However, the sending TCP has no information
+ about segments that may have reached the receiver and been
+ queued because they were not at the left window edge, so it may
+ be forced to retransmit these segments unnecessarily.
+
+ We propose a TCP extension to implement selective
+ acknowledgements. By sending selective acknowledgments, the
+ receiver of data can inform the sender about all segments that
+ have arrived successfully, so the sender need retransmit only
+ the segments that have actually been lost.
+
+ Selective acknowledgments have been included in a number of
+ experimental Internet protocols -- VMTP [Cheriton88], NETBLT
+ [Clark87], and RDP [Velten84]. There is some empirical evidence
+ in favor of selective acknowledgments -- simple experiments with
+ RDP have shown that disabling the selective acknowlegment
+ facility greatly increases the number of retransmitted segments
+ over a lossy, high-delay Internet path [Partridge87]. A
+ simulation study of a simple form of selective acknowledgments
+ added to the ISO transport protocol TP4 also showed promise of
+ performance improvement [NBS85].
+
+
+
+
+
+
+
+Jacobson & Braden [Page 2]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ (3) Round Trip Timing
+
+ TCP implements reliable data delivery by measuring the RTT,
+ i.e., the time interval between sending a segment and receiving
+ an acknowledgment for it, and retransmitting any segments that
+ are not acknowledged within some small multiple of the average
+ RTT. Experience has shown that accurate, current RTT estimates
+ are necessary to adapt to changing traffic conditions and,
+ without them, a busy network is subject to an instability known
+ as "congestion collapse" [Nagle84].
+
+ In part because TCP segments may be repacketized upon
+ retransmission, and in part because of complications due to the
+ cumulative TCP acknowledgement, measuring a segments's RTT may
+ involve a non-trivial amount of computation in some
+ implementations. To minimize this computation, some
+ implementations time only one segment per window. While this
+ yields an adequate approximation to the RTT for small windows
+ (e.g., a 4 to 8 segment Arpanet window), for an LFN (e.g., 100
+ segment Wideband Network windows) it results in an unacceptably
+ poor RTT estimate.
+
+ In the presence of errors, the problem becomes worse. Zhang
+ [Zhang86], Jain [Jain86] and Karn [Karn87] have shown that it is
+ not possible to accumulate reliable RTT estimates if
+ retransmitted segments are included in the estimate. Since a
+ full window of data will have been transmitted prior to a
+ retransmission, all of the segments in that window will have to
+ be ACKed before the next RTT sample can be taken. This means at
+ least an additional window's worth of time between RTT
+ measurements and, as the error rate approaches one per window of
+ data (e.g., 10**-6 errors per bit for the Wideband Net), it
+ becomes effectively impossible to obtain an RTT measurement.
+
+ We propose a TCP "echo" option that allows each segment to carry
+ its own timestamp. This will allow every segment, including
+ retransmissions, to be timed at negligible computational cost.
+
+
+ In designing new TCP options, we must pay careful attention to
+ interoperability with existing implementations. The only TCP option
+ defined to date is an "initial option", i.e., it may appear only on a
+ SYN segment. It is likely that most implementations will properly
+ ignore any options in the SYN segment that they do not understand, so
+ new initial options should not cause a problem. On the other hand,
+ we fear that receiving unexpected non-initial options may cause some
+ TCP's to crash.
+
+
+
+
+Jacobson & Braden [Page 3]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ Therefore, in each of the extensions we propose, non-initial options
+ may be sent only if an exchange of initial options has indicated that
+ both sides understand the extension. This approach will also allow a
+ TCP to determine when the connection opens how big a TCP header it
+ will be sending.
+
+2. TCP WINDOW SCALE OPTION
+
+ The obvious way to implement a window scale factor would be to define
+ a new TCP option that could be included in any segment specifying a
+ window. The receiver would include it in every acknowledgment
+ segment, and the sender would interpret it. Unfortunately, this
+ simple approach would not work. The sender must reliably know the
+ receiver's current scale factor, but a TCP option in an
+ acknowledgement segment will not be delivered reliably (unless the
+ ACK happens to be piggy-backed on data).
+
+ However, SYN segments are always sent reliably, suggesting that each
+ side may communicate its window scale factor in an initial TCP
+ option. This approach has a disadvantage: the scale must be
+ established when the connection is opened, and cannot be changed
+ thereafter. However, other alternatives would be much more
+ complicated, and we therefore propose a new initial option called
+ Window Scale.
+
+2.1 Window Scale Option
+
+ This three-byte option may be sent in a SYN segment by a TCP (1)
+ to indicate that it is prepared to do both send and receive window
+ scaling, and (2) to communicate a scale factor to be applied to
+ its receive window. The scale factor is encoded logarithmically,
+ as a power of 2 (presumably to be implemented by binary shifts).
+
+ Note: the window in the SYN segment itself is never scaled.
+
+ TCP Window Scale Option:
+
+ Kind: 3
+
+ +---------+---------+---------+
+ | Kind=3 |Length=3 |shift.cnt|
+ +---------+---------+---------+
+
+ Here shift.cnt is the number of bits by which the receiver right-
+ shifts the true receive-window value, to scale it into a 16-bit
+ value to be sent in TCP header (this scaling is explained below).
+ The value shift.cnt may be zero (offering to scale, while applying
+ a scale factor of 1 to the receive window).
+
+
+
+Jacobson & Braden [Page 4]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ This option is an offer, not a promise; both sides must send
+ Window Scale options in their SYN segments to enable window
+ scaling in either direction.
+
+2.2 Using the Window Scale Option
+
+ A model implementation of window scaling is as follows, using the
+ notation of RFC-793 [Postel81]:
+
+ * The send-window (SND.WND) and receive-window (RCV.WND) sizes
+ in the connection state block and in all sequence space
+ calculations are expanded from 16 to 32 bits.
+
+ * Two window shift counts are added to the connection state:
+ snd.scale and rcv.scale. These are shift counts to be
+ applied to the incoming and outgoing windows, respectively.
+ The precise algorithm is shown below.
+
+ * All outgoing SYN segments are sent with the Window Scale
+ option, containing a value shift.cnt = R that the TCP would
+ like to use for its receive window.
+
+ * Snd.scale and rcv.scale are initialized to zero, and are
+ changed only during processing of a received SYN segment. If
+ the SYN segment contains a Window Scale option with shift.cnt
+ = S, set snd.scale to S and set rcv.scale to R; otherwise,
+ both snd.scale and rcv.scale are left at zero.
+
+ * The window field (SEG.WND) in the header of every incoming
+ segment, with the exception of SYN segments, will be left-
+ shifted by snd.scale bits before updating SND.WND:
+
+ SND.WND = SEG.WND << snd.scale
+
+ (assuming the other conditions of RFC793 are met, and using
+ the "C" notation "<<" for left-shift).
+
+ * The window field (SEG.WND) of every outgoing segment, with
+ the exception of SYN segments, will have been right-shifted
+ by rcv.scale bits:
+
+ SEG.WND = RCV.WND >> rcv.scale.
+
+
+ TCP determines if a data segment is "old" or "new" by testing if
+ its sequence number is within 2**31 bytes of the left edge of the
+ window. If not, the data is "old" and discarded. To insure that
+ new data is never mistakenly considered old and vice-versa, the
+
+
+
+Jacobson & Braden [Page 5]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ left edge of the sender's window has to be at least 2**31 away
+ from the right edge of the receiver's window. Similarly with the
+ sender's right edge and receiver's left edge. Since the right and
+ left edges of either the sender's or receiver's window differ by
+ the window size, and since the sender and receiver windows can be
+ out of phase by at most the window size, the above constraints
+ imply that 2 * the max window size must be less than 2**31, or
+
+ max window < 2**30
+
+ Since the max window is 2**S (where S is the scaling shift count)
+ times at most 2**16 - 1 (the maximum unscaled window), the maximum
+ window is guaranteed to be < 2*30 if S <= 14. Thus, the shift
+ count must be limited to 14. (This allows windows of 2**30 = 1
+ Gbyte.) If a Window Scale option is received with a shift.cnt
+ value exceeding 14, the TCP should log the error but use 14
+ instead of the specified value.
+
+
+3. TCP SELECTIVE ACKNOWLEDGMENT OPTIONS
+
+ To minimize the impact on the TCP protocol, the selective
+ acknowledgment extension uses the form of two new TCP options. The
+ first is an enabling option, "SACK-permitted", that may be sent in a
+ SYN segment to indicate that the the SACK option may be used once the
+ connection is established. The other is the SACK option itself,
+ which may be sent over an established connection once permission has
+ been given by SACK-permitted.
+
+ The SACK option is to be included in a segment sent from a TCP that
+ is receiving data to the TCP that is sending that data; we will refer
+ to these TCP's as the data receiver and the data sender,
+ respectively. We will consider a particular simplex data flow; any
+ data flowing in the reverse direction over the same connection can be
+ treated independently.
+
+3.1 SACK-Permitted Option
+
+ This two-byte option may be sent in a SYN by a TCP that has been
+ extended to receive (and presumably process) the SACK option once
+ the connection has opened.
+
+
+
+
+
+
+
+
+
+
+Jacobson & Braden [Page 6]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ TCP Sack-Permitted Option:
+
+ Kind: 4
+
+ +---------+---------+
+ | Kind=4 | Length=2|
+ +---------+---------+
+
+3.2 SACK Option
+
+ The SACK option is to be used to convey extended acknowledgment
+ information over an established connection. Specifically, it is
+ to be sent by a data receiver to inform the data transmitter of
+ non-contiguous blocks of data that have been received and queued.
+ The data receiver is awaiting the receipt of data in later
+ retransmissions to fill the gaps in sequence space between these
+ blocks. At that time, the data receiver will acknowledge the data
+ normally by advancing the left window edge in the Acknowledgment
+ Number field of the TCP header.
+
+ It is important to understand that the SACK option will not change
+ the meaning of the Acknowledgment Number field, whose value will
+ still specify the left window edge, i.e., one byte beyond the last
+ sequence number of fully-received data. The SACK option is
+ advisory; if it is ignored, TCP acknowledgments will continue to
+ function as specified in the protocol.
+
+ However, SACK will provide additional information that the data
+ transmitter can use to optimize retransmissions. The TCP data
+ receiver may include the SACK option in an acknowledgment segment
+ whenever it has data that is queued and unacknowledged. Of
+ course, the SACK option may be sent only when the TCP has received
+ the SACK-permitted option in the SYN segment for that connection.
+
+ TCP SACK Option:
+
+ Kind: 5
+
+ Length: Variable
+
+
+ +--------+--------+--------+--------+--------+--------+...---+
+ | Kind=5 | Length | Relative Origin | Block Size | |
+ +--------+--------+--------+--------+--------+--------+...---+
+
+
+ This option contains a list of the blocks of contiguous sequence
+ space occupied by data that has been received and queued within
+
+
+
+Jacobson & Braden [Page 7]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ the window. Each block is contiguous and isolated; that is, the
+ octets just below the block,
+
+ Acknowledgment Number + Relative Origin -1,
+
+ and just above the block,
+
+ Acknowledgment Number + Relative Origin + Block Size,
+
+ have not been received.
+
+ Each contiguous block of data queued at the receiver is defined in
+ the SACK option by two 16-bit integers:
+
+
+ * Relative Origin
+
+ This is the first sequence number of this block, relative to
+ the Acknowledgment Number field in the TCP header (i.e.,
+ relative to the data receiver's left window edge).
+
+
+ * Block Size
+
+ This is the size in octets of this block of contiguous data.
+
+
+ A SACK option that specifies n blocks will have a length of 4*n+2
+ octets, so the 44 bytes available for TCP options can specify a
+ maximum of 10 blocks. Of course, if other TCP options are
+ introduced, they will compete for the 44 bytes, and the limit of
+ 10 may be reduced in particular segments.
+
+ There is no requirement on the order in which blocks can appear in
+ a single SACK option.
+
+ Note: requiring that the blocks be ordered would allow a
+ slightly more efficient algorithm in the transmitter; however,
+ this does not seem to be an important optimization.
+
+3.3 SACK with Window Scaling
+
+ If window scaling is in effect, then 16 bits may not be sufficient
+ for the SACK option fields that define the origin and length of a
+ block. There are two possible ways to handle this:
+
+ (1) Expand the SACK origin and length fields to 24 or 32 bits.
+
+
+
+
+Jacobson & Braden [Page 8]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ (2) Scale the SACK fields by the same factor as the window.
+
+
+ The first alternative would significantly reduce the number of
+ blocks possible in a SACK option; therefore, we have chosen the
+ second alternative, scaling the SACK information as well as the
+ window.
+
+ Scaling the SACK information introduces some loss of precision,
+ since a SACK option must report queued data blocks whose origins
+ and lengths are multiples of the window scale factor rcv.scale.
+ These reported blocks must be equal to or smaller than the actual
+ blocks of queued data.
+
+ Specifically, suppose that the receiver has a contiguous block of
+ queued data that occupies sequence numbers L, L+1, ... L+N-1, and
+ that the window scale factor is S = rcv.scale. Then the
+ corresponding block that will be reported in a SACK option will
+ be:
+
+ Relative Origin = int((L+S-1)/S)
+
+ Block Size = int((L+N)/S) - (Relative Origin)
+
+ where the function int(x) returns the greatest integer contained
+ in x.
+
+ The resulting loss of precision is not a serious problem for the
+ sender. If the data-sending TCP keeps track of the boundaries of
+ all segments in its retransmission queue, it will generally be
+ able to infer from the imprecise SACK data which full segments
+ don't need to be retransmitted. This will fail only if S is
+ larger than the maximum segment size, in which case some segments
+ may be retransmitted unnecessarily. If the sending TCP does not
+ keep track of transmitted segment boundaries, the imprecision of
+ the scaled SACK quantities will only result in retransmitting a
+ small amount of unneeded sequence space. On the average, the data
+ sender will unnecessarily retransmit J*S bytes of the sequence
+ space for each SACK received; here J is the number of blocks
+ reported in the SACK, and S = snd.scale.
+
+3.4 SACK Option Examples
+
+ Assume the left window edge is 5000 and that the data transmitter
+ sends a burst of 8 segments, each containing 500 data bytes.
+ Unless specified otherwise, we assume that the scale factor S = 1.
+
+
+
+
+
+Jacobson & Braden [Page 9]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ Case 1: The first 4 segments are received but the last 4 are
+ dropped.
+
+ The data receiver will return a normal TCP ACK segment
+ acknowledging sequence number 7000, with no SACK option.
+
+
+ Case 2: The first segment is dropped but the remaining 7 are
+ received.
+
+ The data receiver will return a TCP ACK segment that
+ acknowledges sequence number 5000 and contains a SACK option
+ specifying one block of queued data:
+
+ Relative Origin = 500; Block Size = 3500
+
+
+ Case 3: The 2nd, 4th, 6th, and 8th (last) segments are
+ dropped.
+
+ The data receiver will return a TCP ACK segment that
+ acknowledges sequence number 5500 and contains a SACK option
+ specifying the 3 blocks:
+
+ Relative Origin = 500; Block Size = 500
+ Relative Origin = 1500; Block Size = 500
+ Relative Origin = 2500; Block Size = 500
+
+
+ Case 4: Same as Case 3, except Scale Factor S = 16.
+
+ The SACK option would specify the 3 scaled blocks:
+
+ Relative Origin = 32; Block Size = 30
+ Relative Origin = 94; Block Size = 31
+ Relative Origin = 157; Block Size = 30
+
+ These three reported blocks have sequence numbers 512 through
+ 991, 1504 through 1999, and 2512 through 2992, respectively.
+
+
+3.5 Generating the SACK Option
+
+ Let us assume that the data receiver maintains a queue of valid
+ segments that it has neither passed to the user nor acknowledged
+ because of earlier missing data, and that this queue is ordered by
+ starting sequence number. Computation of the SACK option can be
+ done with one pass down this queue. Segments that occupy
+
+
+
+Jacobson & Braden [Page 10]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ contiguous sequence space are aggregated into a single SACK block,
+ and each gap in the sequence space (except a gap that is
+ terminated by the right window edge) triggers the start of a new
+ SACK block. If this algorithm defines more than 10 blocks, only
+ the first 10 can be included in the option.
+
+3.6 Interpreting the SACK Option
+
+ The data transmitter is assumed to have a retransmission queue
+ that contains the segments that have been transmitted but not yet
+ acknowledged, in sequence-number order. If the data transmitter
+ performs re-packetization before retransmission, the block
+ boundaries in a SACK option that it receives may not fall on
+ boundaries of segments in the retransmission queue; however, this
+ does not pose a serious difficulty for the transmitter.
+
+ Let us suppose that for each segment in the retransmission queue
+ there is a (new) flag bit "ACK'd", to be used to indicate that
+ this particular segment has been entirely acknowledged. When a
+ segment is first transmitted, it will be entered into the
+ retransmission queue with its ACK'd bit off. If the ACK'd bit is
+ subsequently turned on (as the result of processing a received
+ SACK option), the data transmitter will skip this segment during
+ any later retransmission. However, the segment will not be
+ dequeued and its buffer freed until the left window edge is
+ advanced over it.
+
+ When an acknowledgment segment arrives containing a SACK option,
+ the data transmitter will turn on the ACK'd bits for segments that
+ have been selectively acknowleged. More specifically, for each
+ block in the SACK option, the data transmitter will turn on the
+ ACK'd flags for all segments in the retransmission queue that are
+ wholly contained within that block. This requires straightforward
+ sequence number comparisons.
+
+
+4. TCP ECHO OPTIONS
+
+ A simple method for measuring the RTT of a segment would be: the
+ sender places a timestamp in the segment and the receiver returns
+ that timestamp in the corresponding ACK segment. When the ACK segment
+ arrives at the sender, the difference between the current time and
+ the timestamp is the RTT. To implement this timing method, the
+ receiver must simply reflect or echo selected data (the timestamp)
+ from the sender's segments. This idea is the basis of the "TCP Echo"
+ and "TCP Echo Reply" options.
+
+
+
+
+
+Jacobson & Braden [Page 11]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+4.1 TCP Echo and TCP Echo Reply Options
+
+ TCP Echo Option:
+
+ Kind: 6
+
+ Length: 6
+
+ +--------+--------+--------+--------+--------+--------+
+ | Kind=6 | Length | 4 bytes of info to be echoed |
+ +--------+--------+--------+--------+--------+--------+
+
+ This option carries four bytes of information that the receiving TCP
+ may send back in a subsequent TCP Echo Reply option (see below). A
+ TCP may send the TCP Echo option in any segment, but only if a TCP
+ Echo option was received in a SYN segment for the connection.
+
+ When the TCP echo option is used for RTT measurement, it will be
+ included in data segments, and the four information bytes will define
+ the time at which the data segment was transmitted in any format
+ convenient to the sender.
+
+ TCP Echo Reply Option:
+
+ Kind: 7
+
+ Length: 6
+
+ +--------+--------+--------+--------+--------+--------+
+ | Kind=7 | Length | 4 bytes of echoed info |
+ +--------+--------+--------+--------+--------+--------+
+
+
+ A TCP that receives a TCP Echo option containing four information
+ bytes will return these same bytes in a TCP Echo Reply option.
+
+ This TCP Echo Reply option must be returned in the next segment
+ (e.g., an ACK segment) that is sent. If more than one Echo option is
+ received before a reply segment is sent, the TCP must choose only one
+ of the options to echo, ignoring the others; specifically, it must
+ choose the newest segment with the oldest sequence number (see next
+ section.)
+
+ To use the TCP Echo and Echo Reply options, a TCP must send a TCP
+ Echo option in its own SYN segment and receive a TCP Echo option in a
+ SYN segment from the other TCP. A TCP that does not implement the
+ TCP Echo or Echo Reply options must simply ignore any TCP Echo
+ options it receives. However, a TCP should not receive one of these
+
+
+
+Jacobson & Braden [Page 12]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ options in a non-SYN segment unless it included a TCP Echo option in
+ its own SYN segment.
+
+4.2 Using the Echo Options
+
+ If we wish to use the Echo/Echo Reply options for RTT measurement, we
+ have to define what the receiver does when there is not a one-to-one
+ correspondence between data and ACK segments. Assuming that we want
+ to minimize the state kept in the receiver (i.e., the number of
+ unprocessed Echo options), we can plan on a receiver remembering the
+ information value from at most one Echo between ACKs. There are
+ three situations to consider:
+
+ (A) Delayed ACKs.
+
+ Many TCP's acknowledge only every Kth segment out of a group of
+ segments arriving within a short time interval; this policy is
+ known generally as "delayed ACK's". The data-sender TCP must
+ measure the effective RTT, including the additional time due to
+ delayed ACK's, or else it will retransmit unnecessarily. Thus,
+ when delayed ACK's are in use, the receiver should reply with
+ the Echo option information from the earliest unacknowledged
+ segment.
+
+ (B) A hole in the sequence space (segment(s) have been lost).
+
+ The sender will continue sending until the window is filled, and
+ we may be generating ACKs as these out-of-order segments arrive
+ (e.g., for the SACK information or to aid "fast retransmit").
+ An Echo Reply option will tell the sender the RTT of some
+ recently sent segment (since the ACK can only contain the
+ sequence number of the hole, the sender may not be able to
+ determine which segment, but that doesn't matter). If the loss
+ was due to congestion, these RTTs may be particularly valuable
+ to the sender since they reflect the network characteristics
+ immediately after the congestion.
+
+ (C) A filled hole in the sequence space.
+
+ The segment that fills the hole represents the most recent
+ measurement of the network characteristics. On the other hand,
+ an RTT computed from an earlier segment would probably include
+ the sender's retransmit time-out, badly biasing the sender's
+ average RTT estimate.
+
+
+ Case (A) suggests the receiver should remember and return the Echo
+ option information from the oldest unacknowledged segment. Cases (B)
+
+
+
+Jacobson & Braden [Page 13]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ and (C) suggest that the option should come from the most recent
+ unacknowledged segment. An algorithm that covers all three cases is
+ for the receiver to return the Echo option information from the
+ newest segment with the oldest sequence number, as specified earlier.
+
+ A model implementation of these options is as follows.
+
+
+ (1) Receiver Implementation
+
+ A 32-bit slot for Echo option data, rcv.echodata, is added to
+ the receiver connection state, together with a flag,
+ rcv.echopresent, that indicates whether there is anything in the
+ slot. When the receiver generates a segment, it checks
+ rcv.echopresent and, if it is set, adds an echo-reply option
+ containing rcv.echodata to the outgoing segment then clears
+ rcv.echopresent.
+
+ If an incoming segment is in the window and contains an echo
+ option, the receiver checks rcv.echopresent. If it isn't set,
+ the value of the echo option is copied to rcv.echodata and
+ rcv.echopresent is set. If rcv.echopresent is already set, the
+ receiver checks whether the segment is at the left edge of the
+ window. If so, the segment's echo option value is copied to
+ rcv.echodata (this is situation (C) above). Otherwise, the
+ segment's echo option is ignored.
+
+
+ (2) Sender Implementation
+
+ The sender's connection state has a single flag bit,
+ snd.echoallowed, added. If snd.echoallowed is set or if the
+ segment contains a SYN, the sender is free to add a TCP Echo
+ option (presumably containing the current time in some units
+ convenient to the sender) to every outgoing segment.
+
+ Snd.echoallowed should be set if a SYN is received with a TCP
+ Echo option (presumably, a host that implements the option will
+ attempt to use it to time the SYN segment).
+
+
+5. CONCLUSIONS AND ACKNOWLEDGMENTS
+
+We have proposed five new TCP options for scaled windows, selective
+acknowledgments, and round-trip timing, in order to provide efficient
+operation over large-bandwidth*delay-product paths. These extensions
+are designed to provide compatible interworking with TCP's that do not
+implement the extensions.
+
+
+
+Jacobson & Braden [Page 14]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+The Window Scale option was originally suggested by Mike St. Johns of
+USAF/DCA. The present form of the option was suggested by Mike Karels
+of UC Berkeley in response to a more cumbersome scheme proposed by Van
+Jacobson. Gerd Beling of FGAN (West Germany) contributed the initial
+definition of the SACK option.
+
+All three options have evolved through discussion with the End-to-End
+Task Force, and the authors are grateful to the other members of the
+Task Force for their advice and encouragement.
+
+6. REFERENCES
+
+ [Cheriton88] Cheriton, D., "VMTP: Versatile Message Transaction
+ Protocol", RFC 1045, Stanford University, February 1988.
+
+ [Jain86] Jain, R., "Divergence of Timeout Algorithms for Packet
+ Retransmissions", Proc. Fifth Phoenix Conf. on Comp. and Comm.,
+ Scottsdale, Arizona, March 1986.
+
+ [Karn87] Karn, P. and C. Partridge, "Estimating Round-Trip Times
+ in Reliable Transport Protocols", Proc. SIGCOMM '87, Stowe, VT,
+ August 1987.
+
+ [Clark87] Clark, D., Lambert, M., and L. Zhang, "NETBLT: A Bulk
+ Data Transfer Protocol", RFC 998, MIT, March 1987.
+
+ [Nagle84] Nagle, J., "Congestion Control in IP/TCP
+ Internetworks", RFC 896, FACC, January 1984.
+
+ [NBS85] Colella, R., Aronoff, R., and K. Mills, "Performance
+ Improvements for ISO Transport", Ninth Data Comm Symposium,
+ published in ACM SIGCOMM Comp Comm Review, vol. 15, no. 5,
+ September 1985.
+
+ [Partridge87] Partridge, C., "Private Communication", February
+ 1987.
+
+ [Postel81] Postel, J., "Transmission Control Protocol - DARPA
+ Internet Program Protocol Specification", RFC 793, DARPA,
+ September 1981.
+
+ [Velten84] Velten, D., Hinden, R., and J. Sax, "Reliable Data
+ Protocol", RFC 908, BBN, July 1984.
+
+ [Jacobson88] Jacobson, V., "Congestion Avoidance and Control", to
+ be presented at SIGCOMM '88, Stanford, CA., August 1988.
+
+ [Zhang86] Zhang, L., "Why TCP Timers Don't Work Well", Proc.
+
+
+
+Jacobson & Braden [Page 15]
+
+RFC 1072 TCP Extensions for Long-Delay Paths October 1988
+
+
+ SIGCOMM '86, Stowe, Vt., August 1986.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Jacobson & Braden [Page 16]
+