Network Working Group                                          M. Mathis
Request for Comments: 2018                                    J. Mahdavi
Category: Standards Track                                            PSC
                                                                S. Floyd
                                                                    LBNL
                                                              A. Romanow
                                                        Sun Microsystems
                                                            October 1996


                  TCP Selective Acknowledgment Options

Status of this Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements. Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol. Distribution of this memo is unlimited.

Abstract

   TCP may experience poor performance when multiple packets are lost
   from one window of data. With the limited information available
   from cumulative acknowledgments, a TCP sender can only learn about a
   single lost packet per round trip time. An aggressive sender could
   choose to retransmit packets early, but such retransmitted segments
   may have already been successfully received.

   A Selective Acknowledgment (SACK) mechanism, combined with a
   selective repeat retransmission policy, can help to overcome these
   limitations. The receiving TCP sends back SACK packets to the sender
   informing the sender of data that has been received. The sender can
   then retransmit only the missing data segments.

   This memo proposes an implementation of SACK and discusses its
   performance and related issues.

Acknowledgements

   Much of the text in this document is taken directly from RFC1072 "TCP
   Extensions for Long-Delay Paths" by Bob Braden and Van Jacobson.
   The authors would like to thank Kevin Fall (LBNL), Christian Huitema
   (INRIA), Van Jacobson (LBNL), Greg Miller (MITRE), Greg Minshall
   (Ipsilon), Lixia Zhang (XEROX PARC and UCLA), Dave Borman (BSDI),
   Allison Mankin (ISI) and others for their review and constructive
   comments.



Mathis, et. al.             Standards Track                     [Page 1]

RFC 2018         TCP Selective Acknowledgement Options      October 1996


1.  Introduction

   Multiple packet losses from a window of data can have a catastrophic
   effect on TCP throughput. TCP [Postel81] uses a cumulative
   acknowledgment scheme in which received segments that are not at the
   left edge of the receive window are not acknowledged. This forces
   the sender either to wait a round-trip time to find out about each
   lost packet, or to unnecessarily retransmit segments which have been
   correctly received [Fall95]. With the cumulative acknowledgment
   scheme, multiple dropped segments generally cause TCP to lose its
   ACK-based clock, reducing overall throughput.

   Selective Acknowledgment (SACK) is a strategy which corrects this
   behavior in the face of multiple dropped segments. With selective
   acknowledgments, the data receiver can inform the sender about all
   segments that have arrived successfully, so the sender need
   retransmit only the segments that have actually been lost.

   Several transport protocols, including NETBLT [Clark87], XTP
   [Strayer92], RDP [Velten84], NADIR [Huitema81], and VMTP
   [Cheriton88], have used selective acknowledgment. There is some
   empirical evidence in favor of selective acknowledgments -- simple
   experiments with RDP have shown that disabling the selective
   acknowledgment facility greatly increases the number of
   retransmitted segments over a lossy, high-delay Internet path
   [Partridge87]. A recent simulation study by Kevin Fall and Sally
   Floyd [Fall95] demonstrates the strength of TCP with SACK over the
   non-SACK Tahoe and Reno TCP implementations.

   RFC1072 [VJ88] describes one possible implementation of SACK options
   for TCP. Unfortunately, it has never been deployed in the Internet,
   as there was disagreement about how SACK options should be used in
   conjunction with the TCP window scale option (initially described in
   RFC1072 and revised in [Jacobson92]).

   We propose slight modifications to the SACK options as proposed in
   RFC1072. Specifically, sending a selective acknowledgment for the
   most recently received data reduces the need for long SACK options
   [Keshav94, Mathis95]. In addition, the SACK option now carries full
   32-bit sequence numbers. These two modifications represent the only
   changes to the proposal in RFC1072. They make SACK easier to
   implement and address concerns about robustness.

   The selective acknowledgment extension uses two TCP options. The
   first is an enabling option, "SACK-permitted", which may be sent in a
   SYN segment to indicate that the SACK option can be used once the
   connection is established. The other is the SACK option itself,
   which may be sent over an established connection once permission has
   been given by SACK-permitted.

   The SACK option is to be included in a segment sent from a TCP that
   is receiving data to the TCP that is sending that data; we will refer
   to these TCPs as the data receiver and the data sender,
   respectively. We will consider a particular simplex data flow; any
   data flowing in the reverse direction over the same connection can be
   treated independently.

2.  Sack-Permitted Option

   This two-byte option may be sent in a SYN by a TCP that has been
   extended to receive (and presumably process) the SACK option once the
   connection has opened. It MUST NOT be sent on non-SYN segments.

        TCP Sack-Permitted Option:

        Kind: 4

        +---------+---------+
        | Kind=4  | Length=2|
        +---------+---------+

3.  Sack Option Format

   The SACK option is to be used to convey extended acknowledgment
   information from the receiver to the sender over an established TCP
   connection.

        TCP SACK Option:

        Kind: 5

        Length: Variable

                          +--------+--------+
                          | Kind=5 | Length |
        +--------+--------+--------+--------+
        |      Left Edge of 1st Block       |
        +--------+--------+--------+--------+
        |      Right Edge of 1st Block      |
        +--------+--------+--------+--------+
        |                                   |
        /            . . .                  /
        |                                   |
        +--------+--------+--------+--------+
        |      Left Edge of nth Block       |
        +--------+--------+--------+--------+
        |      Right Edge of nth Block      |
        +--------+--------+--------+--------+

   The SACK option is to be sent by a data receiver to inform the data
   sender of non-contiguous blocks of data that have been received and
   queued. The data receiver awaits the receipt of data (perhaps by
   means of retransmissions) to fill the gaps in sequence space between
   received blocks. When missing segments are received, the data
   receiver acknowledges the data normally by advancing the left window
   edge in the Acknowledgment Number field of the TCP header. The SACK
   option does not change the meaning of the Acknowledgment Number
   field.

   This option contains a list of some of the blocks of contiguous
   sequence space occupied by data that has been received and queued
   within the window.

   Each contiguous block of data queued at the data receiver is defined
   in the SACK option by two 32-bit unsigned integers in network byte
   order:

   *   Left Edge of Block

       This is the first sequence number of this block.

   *   Right Edge of Block

       This is the sequence number immediately following the last
       sequence number of this block.
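
   The two option layouts above can be illustrated with a short Python
   sketch. It is non-normative: the names `encode_sack_permitted`,
   `encode_sack`, `decode_sack` and `max_sack_blocks` are hypothetical
   helpers, not part of any real TCP stack, and only the option bytes
   are built, not the surrounding TCP header. Each edge is packed as a
   32-bit unsigned integer in network byte order, so an option with n
   blocks is 2 + 8*n bytes long. The cap on the number of blocks
   assumes the 40 bytes of TCP option space are shared only with the
   option sizes passed in.

```python
import struct

SACK_PERMITTED_KIND = 4  # Kind value of the SACK-Permitted option
SACK_KIND = 5            # Kind value of the SACK option
TCP_OPTION_SPACE = 40    # bytes available for all TCP options


def encode_sack_permitted() -> bytes:
    """SACK-Permitted: Kind=4, Length=2, no data; sent only on SYNs."""
    return bytes([SACK_PERMITTED_KIND, 2])


def max_sack_blocks(other_option_bytes: int = 0) -> int:
    """Blocks that fit beside other options (their size incl. padding)."""
    return max(0, (TCP_OPTION_SPACE - other_option_bytes - 2) // 8)


def encode_sack(blocks: list[tuple[int, int]]) -> bytes:
    """Encode (left_edge, right_edge) pairs as Kind=5, Length=2+8*n."""
    if not 1 <= len(blocks) <= max_sack_blocks():
        raise ValueError("a SACK option carries between 1 and 4 blocks")
    body = b"".join(
        # Sequence numbers live in a 32-bit space, so reduce mod 2**32.
        struct.pack("!II", left % 2**32, right % 2**32)
        for left, right in blocks
    )
    return bytes([SACK_KIND, 2 + len(body)]) + body


def decode_sack(option: bytes) -> list[tuple[int, int]]:
    """Parse a SACK option back into (left_edge, right_edge) pairs."""
    if len(option) < 2 or option[0] != SACK_KIND:
        raise ValueError("not a SACK option")
    length = option[1]
    # At least one block, a whole number of blocks, and no truncation.
    if length < 10 or (length - 2) % 8 != 0 or length > len(option):
        raise ValueError(f"bad SACK option length {length}")
    return [struct.unpack_from("!II", option, offset)
            for offset in range(2, length, 8)]
```

   For example, encode_sack([(5500, 6000)]) yields the 10-byte option
   05 0a 00 00 15 7c 00 00 17 70, and max_sack_blocks(12) returns 3,
   matching the three-block limit when a padded 12-byte Timestamp
   option is also present.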

   Each block represents received bytes of data that are contiguous and
   isolated; that is, the bytes just below the block (Left Edge of
   Block - 1) and just above the block (Right Edge of Block) have not
   been received.

   A SACK option that specifies n blocks will have a length of 8*n+2
   bytes, so the 40 bytes available for TCP options can specify a
   maximum of 4 blocks (34 bytes; a fifth block would need 42). It is
   expected that SACK will often be used in conjunction with the
   Timestamp option used for RTTM [Jacobson92], which takes an
   additional 10 bytes (plus two bytes of padding); the remaining 28
   bytes allow a maximum of 3 SACK blocks (26 bytes) in this case.

   The SACK option is advisory, in that, while it notifies the data
   sender that the data receiver has received the indicated segments,
   the data receiver is permitted to later discard data which have been
   reported in a SACK option. Section 8 below discusses the
   consequences of advisory SACK, in particular that the data receiver
   may renege, that is, drop already SACKed data.

4.  Generating Sack Options: Data Receiver Behavior

   If the data receiver has received a SACK-Permitted option on the SYN
   for this connection, the data receiver MAY elect to generate SACK
   options as described below. If the data receiver generates SACK
   options under any circumstance, it SHOULD generate them under all
   permitted circumstances. If the data receiver has not received a
   SACK-Permitted option for a given connection, it MUST NOT send SACK
   options on that connection.

   If sent at all, SACK options SHOULD be included in all ACKs which do
   not ACK the highest sequence number in the data receiver's queue. In
   this situation the network has lost or mis-ordered data, such that
   the receiver holds non-contiguous data in its queue.
   RFC 1122, Section 4.2.2.21, discusses the reasons for the receiver to
   send ACKs in response to additional segments received in this state.
   The receiver SHOULD send an ACK for every valid segment that arrives
   containing new data, and each of these "duplicate" ACKs SHOULD bear a
   SACK option.

   If the data receiver chooses to send a SACK option, the following
   rules apply:

   *   The first SACK block (i.e., the one immediately following the
       kind and length fields in the option) MUST specify the contiguous
       block of data containing the segment which triggered this ACK,
       unless that segment advanced the Acknowledgment Number field in
       the header. This ensures that the ACK with the SACK option
       reflects the most recent change in the data receiver's buffer
       queue.

   *   The data receiver SHOULD include as many distinct SACK blocks as
       possible in the SACK option. Note that the maximum available
       option space may not be sufficient to report all blocks present
       in the receiver's queue.

   *   The SACK option SHOULD be filled out by repeating the most
       recently reported SACK blocks (based on first SACK blocks in
       previous SACK options) that are not subsets of a SACK block
       already included in the SACK option being constructed. This
       ensures that in normal operation, any segment remaining part of
       a non-contiguous block of data held by the data receiver is
       reported in at least three successive SACK options, even for
       large-window TCP implementations [Jacobson92]. After the first
       SACK block, the following SACK blocks in the SACK option may be
       listed in arbitrary order.

   It is very important that the SACK option always report the block
   containing the most recently received segment, because this provides
   the sender with the most up-to-date information about the state of
   the network and the data receiver's queue.

5.  Interpreting the Sack Option and Retransmission Strategy: Data
    Sender Behavior

   When receiving an ACK containing a SACK option, the data sender
   SHOULD record the selective acknowledgment for future reference. The
   data sender is assumed to have a retransmission queue that contains
   the segments that have been transmitted but not yet acknowledged, in
   sequence-number order. If the data sender performs re-packetization
   before retransmission, the block boundaries in a SACK option that it
   receives may not fall on boundaries of segments in the retransmission
   queue; however, this does not pose a serious difficulty for the
   sender.

   One possible implementation of the sender's behavior is as follows.
   Let us suppose that for each segment in the retransmission queue
   there is a (new) flag bit "SACKed", to be used to indicate that this
   particular segment has been reported in a SACK option.

   When an acknowledgment segment arrives containing a SACK option, the
   data sender will turn on the SACKed bits for segments that have been
   selectively acknowledged. More specifically, for each block in the
   SACK option, the data sender will turn on the SACKed flags for all
   segments in the retransmission queue that are wholly contained within
   that block. This requires straightforward sequence number
   comparisons.

   After the SACKed bit is turned on (as the result of processing a
   received SACK option), the data sender will skip that segment during
   any later retransmission. Any segment that has the SACKed bit turned
   off and is less than the highest SACKed segment is available for
   retransmission.

   After a retransmit timeout the data sender SHOULD turn off all of the
   SACKed bits, since the timeout might indicate that the data receiver
   has reneged. The data sender MUST retransmit the segment at the left
   edge of the window after a retransmit timeout, whether or not the
   SACKed bit is on for that segment.
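
   The sender bookkeeping described above can be sketched in Python as
   follows. This is a hypothetical, non-normative sketch: `Segment`,
   `RetransmitQueue` and their methods are illustrative names only.
   Sequence numbers are compared as plain integers (a real
   implementation would need modulo-2**32 comparisons), segment
   boundaries are assumed fixed (no re-packetization), and none of the
   congestion control limits of Section 5.1 are applied.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    seq: int              # first sequence number carried by the segment
    end: int              # sequence number just past its last byte
    sacked: bool = False  # the "SACKed" flag bit


@dataclass
class RetransmitQueue:
    # Transmitted but not yet cumulatively acknowledged, in sequence order.
    segments: list[Segment] = field(default_factory=list)

    def on_ack(self, ack: int, sack_blocks: list[tuple[int, int]]) -> None:
        # Only the cumulative ACK may dequeue a segment and free its buffer.
        self.segments = [s for s in self.segments if s.end > ack]
        # Turn on SACKed for segments wholly contained within a SACK block.
        for left, right in sack_blocks:
            for s in self.segments:
                if left <= s.seq and s.end <= right:
                    s.sacked = True

    def retransmit_candidates(self) -> list[Segment]:
        # Un-SACKed segments below the highest SACKed segment.
        sacked = [s.seq for s in self.segments if s.sacked]
        if not sacked:
            return []
        highest = max(sacked)
        return [s for s in self.segments if not s.sacked and s.seq < highest]

    def on_retransmit_timeout(self) -> Optional[Segment]:
        # The receiver may have reneged: clear every SACKed bit and
        # retransmit the segment at the left window edge regardless.
        for s in self.segments:
            s.sacked = False
        return self.segments[0] if self.segments else None
```

   Applied to Case 3 of Section 7, an ACK of 5500 carrying the blocks
   8000-8500, 7000-7500 and 6000-6500 leaves the segments starting at
   5500, 6500 and 7500 eligible for retransmission; the segment at 8500
   is not, because no SACKed data lies above it.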

   A segment will not be dequeued and its buffer freed until the left
   window edge is advanced over it.

5.1  Congestion Control Issues

   This document does not attempt to specify in detail the congestion
   control algorithms for implementations of TCP with SACK. However,
   the congestion control algorithms present in the de facto standard
   TCP implementations MUST be preserved [Stevens94]. In particular, to
   preserve robustness in the presence of packets reordered by the
   network, recovery is not triggered by a single ACK reporting
   out-of-order packets at the receiver. Further, during recovery the
   data sender limits the number of segments sent in response to each
   ACK. Existing implementations limit the data sender to sending one
   segment during Reno-style fast recovery, or to two segments during
   slow-start [Jacobson88]. Other aspects of congestion control, such
   as reducing the congestion window in response to congestion, must
   similarly be preserved.

   The use of time-outs as a fall-back mechanism for detecting dropped
   packets is unchanged by the SACK option. Because the data receiver
   is allowed to discard SACKed data, when a retransmit timeout occurs
   the data sender MUST ignore prior SACK information in determining
   which data to retransmit.

   Future research into congestion control algorithms may take advantage
   of the additional information provided by SACK. One such area for
   future research concerns modifications to TCP for a wireless or
   satellite environment where packet loss is not necessarily an
   indication of congestion.

6.  Efficiency and Worst Case Behavior

   If the return path carrying ACKs and SACK options were lossless, one
   block per SACK option packet would always be sufficient.
   Every segment arriving while the data receiver holds discontinuous
   data would cause the data receiver to send an ACK with a SACK option
   containing the one altered block in the receiver's queue. The data
   sender is thus able to construct a precise replica of the receiver's
   queue by taking the union of all the first SACK blocks.

   Since the return path is not lossless, the SACK option is defined to
   include more than one SACK block in a single packet. The redundant
   blocks in the SACK option packet increase the robustness of SACK
   delivery in the presence of lost ACKs. For a receiver that is also
   using the Timestamp option [Jacobson92], the SACK option has room to
   include three SACK blocks. Thus each SACK block will generally be
   repeated at least three times, if necessary, once in each of three
   successive ACK packets. However, if all of the ACK packets reporting
   a particular SACK block are dropped, then the sender might assume
   that the data in that SACK block has not been received, and
   unnecessarily retransmit those segments.

   The deployment of other TCP options may reduce the number of
   available SACK blocks to 2 or even to 1. This will reduce the
   redundancy of SACK delivery in the presence of lost ACKs. Even so,
   the exposure of TCP SACK in regard to the unnecessary retransmission
   of packets is strictly less than the exposure of current
   implementations of TCP. The worst-case conditions necessary for the
   sender to needlessly retransmit data are discussed in more detail in
   a separate document [Floyd96].

   Older TCP implementations which do not have the SACK option will not
   be unfairly disadvantaged when competing against SACK-capable TCPs.
   This issue is discussed in more detail in [Floyd96].

7.  Sack Option Examples

   The following examples attempt to demonstrate the proper behavior of
   SACK generation by the data receiver.

   Assume the left window edge is 5000 and that the data transmitter
   sends a burst of 8 segments, each containing 500 data bytes.

      Case 1: The first 4 segments are received but the last 4 are
      dropped.

      The data receiver will return a normal TCP ACK segment
      acknowledging sequence number 7000, with no SACK option.

      Case 2: The first segment is dropped but the remaining 7 are
      received.

      Upon receiving each of the last seven packets, the data
      receiver will return a TCP ACK segment that acknowledges
      sequence number 5000 and contains a SACK option specifying
      one block of queued data:

         Triggering    ACK      Left Edge   Right Edge
         Segment

         5000          (lost)
         5500          5000     5500        6000
         6000          5000     5500        6500
         6500          5000     5500        7000
         7000          5000     5500        7500
         7500          5000     5500        8000
         8000          5000     5500        8500
         8500          5000     5500        9000

      Case 3: The 2nd, 4th, 6th, and 8th (last) segments are
      dropped.

      The data receiver ACKs the first packet normally. The
      third, fifth, and seventh packets trigger SACK options as
      follows:

         Triggering   ACK     First Block    2nd Block      3rd Block
         Segment              Left   Right   Left   Right   Left   Right
                              Edge   Edge    Edge   Edge    Edge   Edge

         5000         5500
         5500         (lost)
         6000         5500    6000   6500
         6500         (lost)
         7000         5500    7000   7500    6000   6500
         7500         (lost)
         8000         5500    8000   8500    7000   7500    6000   6500
         8500         (lost)

   Suppose at this point, the 4th packet is received out of order.
   (This could either be because the data was badly misordered in the
   network, or because the 2nd packet was retransmitted and lost, and
   then the 4th packet was retransmitted.)
   At this point the data receiver has only two SACK blocks to report.
   The data receiver replies with the following Selective
   Acknowledgment:

         Triggering   ACK     First Block    2nd Block      3rd Block
         Segment              Left   Right   Left   Right   Left   Right
                              Edge   Edge    Edge   Edge    Edge   Edge

         6500         5500    6000   7500    8000   8500

   Suppose at this point, the 2nd segment is received. The data
   receiver then replies with the following Selective Acknowledgment:

         Triggering   ACK     First Block    2nd Block      3rd Block
         Segment              Left   Right   Left   Right   Left   Right
                              Edge   Edge    Edge   Edge    Edge   Edge

         5500         7500    8000   8500

8.  Data Receiver Reneging

   Note that the data receiver is permitted to discard data in its queue
   that has not been acknowledged to the data sender, even if the data
   has already been reported in a SACK option. Such discarding of
   SACKed packets is discouraged, but may be used if the receiver runs
   out of buffer space.

   The data receiver MAY elect not to keep data which it has reported in
   a SACK option. In this case, the receiver's SACK generation is
   additionally qualified:

   *   The first SACK block MUST reflect the newest segment. Even if
       the newest segment is going to be discarded and the receiver has
       already discarded adjacent segments, the first SACK block MUST
       report, at a minimum, the left and right edges of the newest
       segment.

   *   Except for the newest segment, all SACK blocks MUST NOT report
       any old data which is no longer actually held by the receiver.

   Since the data receiver may later discard data reported in a SACK
   option, the sender MUST NOT discard data before it is acknowledged by
   the Acknowledgment Number field in the TCP header.

9.  Security Considerations

   This document neither strengthens nor weakens TCP's current security
   properties.

10.  References

   [Cheriton88] Cheriton, D., "VMTP: Versatile Message Transaction
   Protocol", RFC 1045, Stanford University, February 1988.

   [Clark87] Clark, D., Lambert, M., and L. Zhang, "NETBLT: A Bulk Data
   Transfer Protocol", RFC 998, MIT, March 1987.

   [Fall95] Fall, K., and S. Floyd, "Comparisons of Tahoe, Reno, and
   Sack TCP", ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z, December 1995.

   [Floyd96] Floyd, S., "Issues of TCP with SACK",
   ftp://ftp.ee.lbl.gov/papers/issues_sa.ps.Z, January 1996.

   [Huitema81] Huitema, C., and I. Valet, "An Experiment on High Speed
   File Transfer using Satellite Links", 7th Data Communication
   Symposium, Mexico, October 1981.

   [Jacobson88] Jacobson, V., "Congestion Avoidance and Control",
   Proceedings of SIGCOMM '88, Stanford, CA, August 1988.

   [VJ88] Jacobson, V., and R. Braden, "TCP Extensions for Long-Delay
   Paths", RFC 1072, October 1988.

   [Jacobson92] Jacobson, V., Braden, R., and D. Borman, "TCP Extensions
   for High Performance", RFC 1323, May 1992.

   [Keshav94] Keshav, presentation to the Internet End-to-End Research
   Group, November 1994.

   [Mathis95] Mathis, M., and J. Mahdavi, "TCP Forward Acknowledgment
   Option", presentation to the Internet End-to-End Research Group,
   June 1995.

   [Partridge87] Partridge, C., "Private Communication", February 1987.

   [Postel81] Postel, J., "Transmission Control Protocol - DARPA
   Internet Program Protocol Specification", RFC 793, DARPA, September
   1981.

   [Stevens94] Stevens, W., "TCP/IP Illustrated, Volume 1: The
   Protocols", Addison-Wesley, 1994.

   [Strayer92] Strayer, T., Dempsey, B., and A. Weaver, "XTP -- The
   Xpress Transfer Protocol", Addison-Wesley, 1992.

   [Velten84] Velten, D., Hinden, R., and J. Sax, "Reliable Data
   Protocol", RFC 908, BBN, July 1984.

11.  Authors' Addresses

   Matt Mathis and Jamshid Mahdavi
   Pittsburgh Supercomputing Center
   4400 Fifth Ave
   Pittsburgh, PA 15213
   mathis@psc.edu
   mahdavi@psc.edu

   Sally Floyd
   Lawrence Berkeley National Laboratory
   One Cyclotron Road
   Berkeley, CA 94720
   floyd@ee.lbl.gov

   Allyn Romanow
   Sun Microsystems, Inc.
   2550 Garcia Ave., MPK17-202
   Mountain View, CA 94043
   allyn@eng.sun.com