From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc5681.txt | 1011 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1011 insertions(+) create mode 100644 doc/rfc/rfc5681.txt (limited to 'doc/rfc/rfc5681.txt') diff --git a/doc/rfc/rfc5681.txt b/doc/rfc/rfc5681.txt new file mode 100644 index 0000000..07b414f --- /dev/null +++ b/doc/rfc/rfc5681.txt @@ -0,0 +1,1011 @@ + + + + + + +Network Working Group M. Allman +Request for Comments: 5681 V. Paxson +Obsoletes: 2581 ICSI +Category: Standards Track E. Blanton + Purdue University + September 2009 + + + TCP Congestion Control + +Abstract + + This document defines TCP's four intertwined congestion control + algorithms: slow start, congestion avoidance, fast retransmit, and + fast recovery. In addition, the document specifies how TCP should + begin transmission after a relatively long idle period, as well as + discussing various acknowledgment generation methods. This document + obsoletes RFC 2581. + +Status of This Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (c) 2009 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents in effect on the date of + publication of this document (http://trustee.ietf.org/license-info). + Please review these documents carefully, as they describe your rights + and restrictions with respect to this document. + + This document may contain material from IETF Documents or IETF + Contributions published or made publicly available before November + 10, 2008. The person(s) controlling the copyright in some of this + material may not have granted the IETF Trust the right to allow + modifications of such material outside the IETF Standards Process. + Without obtaining an adequate license from the person(s) controlling + the copyright in such materials, this document may not be modified + outside the IETF Standards Process, and derivative works of it may + + + + + +Allman, et al. Standards Track [Page 1] + +RFC 5681 TCP Congestion Control September 2009 + + + not be created outside the IETF Standards Process, except to format + it for publication as an RFC or to translate it into languages other + than English. + +Table Of Contents + + 1. Introduction ....................................................2 + 2. Definitions .....................................................3 + 3. Congestion Control Algorithms ...................................4 + 3.1. Slow Start and Congestion Avoidance ........................4 + 3.2. Fast Retransmit/Fast Recovery ..............................8 + 4. Additional Considerations ......................................10 + 4.1. Restarting Idle Connections ...............................10 + 4.2. Generating Acknowledgments ................................11 + 4.3. Loss Recovery Mechanisms ..................................12 + 5. Security Considerations ........................................13 + 6. Changes between RFC 2001 and RFC 2581 ..........................13 + 7. Changes Relative to RFC 2581 ...................................14 + 8. Acknowledgments ................................................15 + 9. References .....................................................15 + 9.1. Normative References ......................................15 + 9.2. Informative References ....................................16 + +1. Introduction + + This document specifies four TCP [RFC793] congestion control + algorithms: slow start, congestion avoidance, fast retransmit and + fast recovery. These algorithms were devised in [Jac88] and [Jac90]. + Their use with TCP is standardized in [RFC1122]. Additional early + work in additive-increase, multiplicative-decrease congestion control + is given in [CJ89]. + + Note that [Ste94] provides examples of these algorithms in action and + [WS95] provides an explanation of the source code for the BSD + implementation of these algorithms. + + In addition to specifying these congestion control algorithms, this + document specifies what TCP connections should do after a relatively + long idle period, as well as specifying and clarifying some of the + issues pertaining to TCP ACK generation. + + This document obsoletes [RFC2581], which in turn obsoleted [RFC2001]. + + This document is organized as follows. Section 2 provides various + definitions that will be used throughout the document. Section 3 + provides a specification of the congestion control algorithms. + Section 4 outlines concerns related to the congestion control + algorithms and finally, section 5 outlines security considerations. + + + +Allman, et al. Standards Track [Page 2] + +RFC 5681 TCP Congestion Control September 2009 + + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in [RFC2119]. + +2. Definitions + + This section provides the definition of several terms that will be + used throughout the remainder of this document. + + SEGMENT: A segment is ANY TCP/IP data or acknowledgment packet (or + both). + + SENDER MAXIMUM SEGMENT SIZE (SMSS): The SMSS is the size of the + largest segment that the sender can transmit. This value can be + based on the maximum transmission unit of the network, the path + MTU discovery [RFC1191, RFC4821] algorithm, RMSS (see next item), + or other factors. The size does not include the TCP/IP headers + and options. + + RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The RMSS is the size of the + largest segment the receiver is willing to accept. This is the + value specified in the MSS option sent by the receiver during + connection startup. Or, if the MSS option is not used, it is 536 + bytes [RFC1122]. The size does not include the TCP/IP headers and + options. + + FULL-SIZED SEGMENT: A segment that contains the maximum number of + data bytes permitted (i.e., a segment containing SMSS bytes of + data). + + RECEIVER WINDOW (rwnd): The most recently advertised receiver window. + + CONGESTION WINDOW (cwnd): A TCP state variable that limits the amount + of data a TCP can send. At any given time, a TCP MUST NOT send + data with a sequence number higher than the sum of the highest + acknowledged sequence number and the minimum of cwnd and rwnd. + + INITIAL WINDOW (IW): The initial window is the size of the sender's + congestion window after the three-way handshake is completed. + + LOSS WINDOW (LW): The loss window is the size of the congestion + window after a TCP sender detects loss using its retransmission + timer. + + RESTART WINDOW (RW): The restart window is the size of the congestion + window after a TCP restarts transmission after an idle period (if + the slow start algorithm is used; see section 4.1 for more + discussion). + + + +Allman, et al. Standards Track [Page 3] + +RFC 5681 TCP Congestion Control September 2009 + + + FLIGHT SIZE: The amount of data that has been sent but not yet + cumulatively acknowledged. + + DUPLICATE ACKNOWLEDGMENT: An acknowledgment is considered a + "duplicate" in the following algorithms when (a) the receiver of + the ACK has outstanding data, (b) the incoming acknowledgment + carries no data, (c) the SYN and FIN bits are both off, (d) the + acknowledgment number is equal to the greatest acknowledgment + received on the given connection (TCP.UNA from [RFC793]) and (e) + the advertised window in the incoming acknowledgment equals the + advertised window in the last incoming acknowledgment. + + Alternatively, a TCP that utilizes selective acknowledgments + (SACKs) [RFC2018, RFC2883] can leverage the SACK information to + determine when an incoming ACK is a "duplicate" (e.g., if the ACK + contains previously unknown SACK information). + +3. Congestion Control Algorithms + + This section defines the four congestion control algorithms: slow + start, congestion avoidance, fast retransmit, and fast recovery, + developed in [Jac88] and [Jac90]. In some situations, it may be + beneficial for a TCP sender to be more conservative than the + algorithms allow; however, a TCP MUST NOT be more aggressive than the + following algorithms allow (that is, MUST NOT send data when the + value of cwnd computed by the following algorithms would not allow + the data to be sent). + + Also, note that the algorithms specified in this document work in + terms of using loss as the signal of congestion. Explicit Congestion + Notification (ECN) could also be used as specified in [RFC3168]. + +3.1. Slow Start and Congestion Avoidance + + The slow start and congestion avoidance algorithms MUST be used by a + TCP sender to control the amount of outstanding data being injected + into the network. To implement these algorithms, two variables are + added to the TCP per-connection state. The congestion window (cwnd) + is a sender-side limit on the amount of data the sender can transmit + into the network before receiving an acknowledgment (ACK), while the + receiver's advertised window (rwnd) is a receiver-side limit on the + amount of outstanding data. The minimum of cwnd and rwnd governs + data transmission. + + Another state variable, the slow start threshold (ssthresh), is used + to determine whether the slow start or congestion avoidance algorithm + is used to control data transmission, as discussed below. + + + + +Allman, et al. Standards Track [Page 4] + +RFC 5681 TCP Congestion Control September 2009 + + + Beginning transmission into a network with unknown conditions + requires TCP to slowly probe the network to determine the available + capacity, in order to avoid congesting the network with an + inappropriately large burst of data. The slow start algorithm is + used for this purpose at the beginning of a transfer, or after + repairing loss detected by the retransmission timer. Slow start + additionally serves to start the "ACK clock" used by the TCP sender + to release data into the network in the slow start, congestion + avoidance, and loss recovery algorithms. + + IW, the initial value of cwnd, MUST be set using the following + guidelines as an upper bound. + + If SMSS > 2190 bytes: + IW = 2 * SMSS bytes and MUST NOT be more than 2 segments + If (SMSS > 1095 bytes) and (SMSS <= 2190 bytes): + IW = 3 * SMSS bytes and MUST NOT be more than 3 segments + if SMSS <= 1095 bytes: + IW = 4 * SMSS bytes and MUST NOT be more than 4 segments + + As specified in [RFC3390], the SYN/ACK and the acknowledgment of the + SYN/ACK MUST NOT increase the size of the congestion window. + Further, if the SYN or SYN/ACK is lost, the initial window used by a + sender after a correctly transmitted SYN MUST be one segment + consisting of at most SMSS bytes. + + A detailed rationale and discussion of the IW setting is provided in + [RFC3390]. + + When initial congestion windows of more than one segment are + implemented along with Path MTU Discovery [RFC1191], and the MSS + being used is found to be too large, the congestion window cwnd + SHOULD be reduced to prevent large bursts of smaller segments. + Specifically, cwnd SHOULD be reduced by the ratio of the old segment + size to the new segment size. + + The initial value of ssthresh SHOULD be set arbitrarily high (e.g., + to the size of the largest possible advertised window), but ssthresh + MUST be reduced in response to congestion. Setting ssthresh as high + as possible allows the network conditions, rather than some arbitrary + host limit, to dictate the sending rate. In cases where the end + systems have a solid understanding of the network path, more + carefully setting the initial ssthresh value may have merit (e.g., + such that the end host does not create congestion along the path). + + + + + + + +Allman, et al. Standards Track [Page 5] + +RFC 5681 TCP Congestion Control September 2009 + + + The slow start algorithm is used when cwnd < ssthresh, while the + congestion avoidance algorithm is used when cwnd > ssthresh. When + cwnd and ssthresh are equal, the sender may use either slow start or + congestion avoidance. + + During slow start, a TCP increments cwnd by at most SMSS bytes for + each ACK received that cumulatively acknowledges new data. Slow + start ends when cwnd exceeds ssthresh (or, optionally, when it + reaches it, as noted above) or when congestion is observed. While + traditionally TCP implementations have increased cwnd by precisely + SMSS bytes upon receipt of an ACK covering new data, we RECOMMEND + that TCP implementations increase cwnd, per: + + cwnd += min (N, SMSS) (2) + + where N is the number of previously unacknowledged bytes acknowledged + in the incoming ACK. This adjustment is part of Appropriate Byte + Counting [RFC3465] and provides robustness against misbehaving + receivers that may attempt to induce a sender to artificially inflate + cwnd using a mechanism known as "ACK Division" [SCWA99]. ACK + Division consists of a receiver sending multiple ACKs for a single + TCP data segment, each acknowledging only a portion of its data. A + TCP that increments cwnd by SMSS for each such ACK will + inappropriately inflate the amount of data injected into the network. + + During congestion avoidance, cwnd is incremented by roughly 1 full- + sized segment per round-trip time (RTT). Congestion avoidance + continues until congestion is detected. The basic guidelines for + incrementing cwnd during congestion avoidance are: + + * MAY increment cwnd by SMSS bytes + + * SHOULD increment cwnd per equation (2) once per RTT + + * MUST NOT increment cwnd by more than SMSS bytes + + We note that [RFC3465] allows for cwnd increases of more than SMSS + bytes for incoming acknowledgments during slow start on an + experimental basis; however, such behavior is not allowed as part of + the standard. + + The RECOMMENDED way to increase cwnd during congestion avoidance is + to count the number of bytes that have been acknowledged by ACKs for + new data. (A drawback of this implementation is that it requires + maintaining an additional state variable.) When the number of bytes + acknowledged reaches cwnd, then cwnd can be incremented by up to SMSS + bytes. Note that during congestion avoidance, cwnd MUST NOT be + + + + +Allman, et al. Standards Track [Page 6] + +RFC 5681 TCP Congestion Control September 2009 + + + increased by more than SMSS bytes per RTT. This method both allows + TCPs to increase cwnd by one segment per RTT in the face of delayed + ACKs and provides robustness against ACK Division attacks. + + Another common formula that a TCP MAY use to update cwnd during + congestion avoidance is given in equation (3): + + cwnd += SMSS*SMSS/cwnd (3) + + This adjustment is executed on every incoming ACK that acknowledges + new data. Equation (3) provides an acceptable approximation to the + underlying principle of increasing cwnd by 1 full-sized segment per + RTT. (Note that for a connection in which the receiver is + acknowledging every-other packet, (3) is less aggressive than allowed + -- roughly increasing cwnd every second RTT.) + + Implementation Note: Since integer arithmetic is usually used in TCP + implementations, the formula given in equation (3) can fail to + increase cwnd when the congestion window is larger than SMSS*SMSS. + If the above formula yields 0, the result SHOULD be rounded up to 1 + byte. + + Implementation Note: Older implementations have an additional + additive constant on the right-hand side of equation (3). This is + incorrect and can actually lead to diminished performance [RFC2525]. + + Implementation Note: Some implementations maintain cwnd in units of + bytes, while others in units of full-sized segments. The latter will + find equation (3) difficult to use, and may prefer to use the + counting approach discussed in the previous paragraph. + + When a TCP sender detects segment loss using the retransmission timer + and the given segment has not yet been resent by way of the + retransmission timer, the value of ssthresh MUST be set to no more + than the value given in equation (4): + + ssthresh = max (FlightSize / 2, 2*SMSS) (4) + + where, as discussed above, FlightSize is the amount of outstanding + data in the network. + + On the other hand, when a TCP sender detects segment loss using the + retransmission timer and the given segment has already been + retransmitted by way of the retransmission timer at least once, the + value of ssthresh is held constant. + + + + + + +Allman, et al. Standards Track [Page 7] + +RFC 5681 TCP Congestion Control September 2009 + + + Implementation Note: An easy mistake to make is to simply use cwnd, + rather than FlightSize, which in some implementations may + incidentally increase well beyond rwnd. + + Furthermore, upon a timeout (as specified in [RFC2988]) cwnd MUST be + set to no more than the loss window, LW, which equals 1 full-sized + segment (regardless of the value of IW). Therefore, after + retransmitting the dropped segment the TCP sender uses the slow start + algorithm to increase the window from 1 full-sized segment to the new + value of ssthresh, at which point congestion avoidance again takes + over. + + As shown in [FF96] and [RFC3782], slow-start-based loss recovery + after a timeout can cause spurious retransmissions that trigger + duplicate acknowledgments. The reaction to the arrival of these + duplicate ACKs in TCP implementations varies widely. This document + does not specify how to treat such acknowledgments, but does note + this as an area that may benefit from additional attention, + experimentation and specification. + +3.2. Fast Retransmit/Fast Recovery + + A TCP receiver SHOULD send an immediate duplicate ACK when an out- + of-order segment arrives. The purpose of this ACK is to inform the + sender that a segment was received out-of-order and which sequence + number is expected. From the sender's perspective, duplicate ACKs + can be caused by a number of network problems. First, they can be + caused by dropped segments. In this case, all segments after the + dropped segment will trigger duplicate ACKs until the loss is + repaired. Second, duplicate ACKs can be caused by the re-ordering of + data segments by the network (not a rare event along some network + paths [Pax97]). Finally, duplicate ACKs can be caused by replication + of ACK or data segments by the network. In addition, a TCP receiver + SHOULD send an immediate ACK when the incoming segment fills in all + or part of a gap in the sequence space. This will generate more + timely information for a sender recovering from a loss through a + retransmission timeout, a fast retransmit, or an advanced loss + recovery algorithm, as outlined in section 4.3. + + The TCP sender SHOULD use the "fast retransmit" algorithm to detect + and repair loss, based on incoming duplicate ACKs. The fast + retransmit algorithm uses the arrival of 3 duplicate ACKs (as defined + in section 2, without any intervening ACKs which move SND.UNA) as an + indication that a segment has been lost. After receiving 3 duplicate + ACKs, TCP performs a retransmission of what appears to be the missing + segment, without waiting for the retransmission timer to expire. + + + + + +Allman, et al. Standards Track [Page 8] + +RFC 5681 TCP Congestion Control September 2009 + + + After the fast retransmit algorithm sends what appears to be the + missing segment, the "fast recovery" algorithm governs the + transmission of new data until a non-duplicate ACK arrives. The + reason for not performing slow start is that the receipt of the + duplicate ACKs not only indicates that a segment has been lost, but + also that segments are most likely leaving the network (although a + massive segment duplication by the network can invalidate this + conclusion). In other words, since the receiver can only generate a + duplicate ACK when a segment has arrived, that segment has left the + network and is in the receiver's buffer, so we know it is no longer + consuming network resources. Furthermore, since the ACK "clock" + [Jac88] is preserved, the TCP sender can continue to transmit new + segments (although transmission must continue using a reduced cwnd, + since loss is an indication of congestion). + + The fast retransmit and fast recovery algorithms are implemented + together as follows. + + 1. On the first and second duplicate ACKs received at a sender, a + TCP SHOULD send a segment of previously unsent data per [RFC3042] + provided that the receiver's advertised window allows, the total + FlightSize would remain less than or equal to cwnd plus 2*SMSS, + and that new data is available for transmission. Further, the + TCP sender MUST NOT change cwnd to reflect these two segments + [RFC3042]. Note that a sender using SACK [RFC2018] MUST NOT send + new data unless the incoming duplicate acknowledgment contains + new SACK information. + + 2. When the third duplicate ACK is received, a TCP MUST set ssthresh + to no more than the value given in equation (4). When [RFC3042] + is in use, additional data sent in limited transmit MUST NOT be + included in this calculation. + + 3. The lost segment starting at SND.UNA MUST be retransmitted and + cwnd set to ssthresh plus 3*SMSS. This artificially "inflates" + the congestion window by the number of segments (three) that have + left the network and which the receiver has buffered. + + 4. For each additional duplicate ACK received (after the third), + cwnd MUST be incremented by SMSS. This artificially inflates the + congestion window in order to reflect the additional segment that + has left the network. + + Note: [SCWA99] discusses a receiver-based attack whereby many + bogus duplicate ACKs are sent to the data sender in order to + artificially inflate cwnd and cause a higher than appropriate + + + + + +Allman, et al. Standards Track [Page 9] + +RFC 5681 TCP Congestion Control September 2009 + + + sending rate to be used. A TCP MAY therefore limit the number of + times cwnd is artificially inflated during loss recovery to the + number of outstanding segments (or, an approximation thereof). + + Note: When an advanced loss recovery mechanism (such as outlined + in section 4.3) is not in use, this increase in FlightSize can + cause equation (4) to slightly inflate cwnd and ssthresh, as some + of the segments between SND.UNA and SND.NXT are assumed to have + left the network but are still reflected in FlightSize. + + 5. When previously unsent data is available and the new value of + cwnd and the receiver's advertised window allow, a TCP SHOULD + send 1*SMSS bytes of previously unsent data. + + 6. When the next ACK arrives that acknowledges previously + unacknowledged data, a TCP MUST set cwnd to ssthresh (the value + set in step 2). This is termed "deflating" the window. + + This ACK should be the acknowledgment elicited by the + retransmission from step 3, one RTT after the retransmission + (though it may arrive sooner in the presence of significant out- + of-order delivery of data segments at the receiver). + Additionally, this ACK should acknowledge all the intermediate + segments sent between the lost segment and the receipt of the + third duplicate ACK, if none of these were lost. + + Note: This algorithm is known to generally not recover efficiently + from multiple losses in a single flight of packets [FF96]. Section + 4.3 below addresses such cases. + +4. Additional Considerations + +4.1. Restarting Idle Connections + + A known problem with the TCP congestion control algorithms described + above is that they allow a potentially inappropriate burst of traffic + to be transmitted after TCP has been idle for a relatively long + period of time. After an idle period, TCP cannot use the ACK clock + to strobe new segments into the network, as all the ACKs have drained + from the network. Therefore, as specified above, TCP can potentially + send a cwnd-size line-rate burst into the network after an idle + period. In addition, changing network conditions may have rendered + TCP's notion of the available end-to-end network capacity between two + endpoints, as estimated by cwnd, inaccurate during the course of a + long idle period. + + + + + + +Allman, et al. Standards Track [Page 10] + +RFC 5681 TCP Congestion Control September 2009 + + + [Jac88] recommends that a TCP use slow start to restart transmission + after a relatively long idle period. Slow start serves to restart + the ACK clock, just as it does at the beginning of a transfer. This + mechanism has been widely deployed in the following manner. When TCP + has not received a segment for more than one retransmission timeout, + cwnd is reduced to the value of the restart window (RW) before + transmission begins. + + For the purposes of this standard, we define RW = min(IW,cwnd). + + Using the last time a segment was received to determine whether or + not to decrease cwnd can fail to deflate cwnd in the common case of + persistent HTTP connections [HTH98]. In this case, a Web server + receives a request before transmitting data to the Web client. The + reception of the request makes the test for an idle connection fail, + and allows the TCP to begin transmission with a possibly + inappropriately large cwnd. + + Therefore, a TCP SHOULD set cwnd to no more than RW before beginning + transmission if the TCP has not sent data in an interval exceeding + the retransmission timeout. + +4.2. Generating Acknowledgments + + The delayed ACK algorithm specified in [RFC1122] SHOULD be used by a + TCP receiver. When using delayed ACKs, a TCP receiver MUST NOT + excessively delay acknowledgments. Specifically, an ACK SHOULD be + generated for at least every second full-sized segment, and MUST be + generated within 500 ms of the arrival of the first unacknowledged + packet. + + The requirement that an ACK "SHOULD" be generated for at least every + second full-sized segment is listed in [RFC1122] in one place as a + SHOULD and another as a MUST. Here we unambiguously state it is a + SHOULD. We also emphasize that this is a SHOULD, meaning that an + implementor should indeed only deviate from this requirement after + careful consideration of the implications. See the discussion of + "Stretch ACK violation" in [RFC2525] and the references therein for a + discussion of the possible performance problems with generating ACKs + less frequently than every second full-sized segment. + + In some cases, the sender and receiver may not agree on what + constitutes a full-sized segment. An implementation is deemed to + comply with this requirement if it sends at least one acknowledgment + every time it receives 2*RMSS bytes of new data from the sender, + where RMSS is the Maximum Segment Size specified by the receiver to + the sender (or the default value of 536 bytes, per [RFC1122], if the + receiver does not specify an MSS option during connection + + + +Allman, et al. Standards Track [Page 11] + +RFC 5681 TCP Congestion Control September 2009 + + + establishment). The sender may be forced to use a segment size less + than RMSS due to the maximum transmission unit (MTU), the path MTU + discovery algorithm or other factors. For instance, consider the + case when the receiver announces an RMSS of X bytes but the sender + ends up using a segment size of Y bytes (Y < X) due to path MTU + discovery (or the sender's MTU size). The receiver will generate + stretch ACKs if it waits for 2*X bytes to arrive before an ACK is + sent. Clearly this will take more than 2 segments of size Y bytes. + Therefore, while a specific algorithm is not defined, it is desirable + for receivers to attempt to prevent this situation, for example, by + acknowledging at least every second segment, regardless of size. + Finally, we repeat that an ACK MUST NOT be delayed for more than 500 + ms waiting on a second full-sized segment to arrive. + + Out-of-order data segments SHOULD be acknowledged immediately, in + order to accelerate loss recovery. To trigger the fast retransmit + algorithm, the receiver SHOULD send an immediate duplicate ACK when + it receives a data segment above a gap in the sequence space. To + provide feedback to senders recovering from losses, the receiver + SHOULD send an immediate ACK when it receives a data segment that + fills in all or part of a gap in the sequence space. + + A TCP receiver MUST NOT generate more than one ACK for every incoming + segment, other than to update the offered window as the receiving + application consumes new data (see [RFC813] and page 42 of [RFC793]). + +4.3. Loss Recovery Mechanisms + + A number of loss recovery algorithms that augment fast retransmit and + fast recovery have been suggested by TCP researchers and specified in + the RFC series. While some of these algorithms are based on the TCP + selective acknowledgment (SACK) option [RFC2018], such as [FF96], + [MM96a], [MM96b], and [RFC3517], others do not require SACKs, such as + [Hoe96], [FF96], and [RFC3782]. The non-SACK algorithms use "partial + acknowledgments" (ACKs that cover previously unacknowledged data, but + not all the data outstanding when loss was detected) to trigger + retransmissions. While this document does not standardize any of the + specific algorithms that may improve fast retransmit/fast recovery, + these enhanced algorithms are implicitly allowed, as long as they + follow the general principles of the basic four algorithms outlined + above. + + That is, when the first loss in a window of data is detected, + ssthresh MUST be set to no more than the value given by equation (4). + Second, until all lost segments in the window of data in question are + repaired, the number of segments transmitted in each RTT MUST be no + more than half the number of outstanding segments when the loss was + detected. Finally, after all loss in the given window of segments + + + +Allman, et al. Standards Track [Page 12] + +RFC 5681 TCP Congestion Control September 2009 + + + has been successfully retransmitted, cwnd MUST be set to no more than + ssthresh and congestion avoidance MUST be used to further increase + cwnd. Loss in two successive windows of data, or the loss of a + retransmission, should be taken as two indications of congestion and, + therefore, cwnd (and ssthresh) MUST be lowered twice in this case. + + We RECOMMEND that TCP implementors employ some form of advanced loss + recovery that can cope with multiple losses in a window of data. The + algorithms detailed in [RFC3782] and [RFC3517] conform to the general + principles outlined above. We note that while these are not the only + two algorithms that conform to the above general principles these two + algorithms have been vetted by the community and are currently on the + Standards Track. + +5. Security Considerations + + This document requires a TCP to diminish its sending rate in the + presence of retransmission timeouts and the arrival of duplicate + acknowledgments. An attacker can therefore impair the performance of + a TCP connection by either causing data packets or their + acknowledgments to be lost, or by forging excessive duplicate + acknowledgments. + + In response to the ACK division attack outlined in [SCWA99], this + document RECOMMENDS increasing the congestion window based on the + number of bytes newly acknowledged in each arriving ACK rather than + by a particular constant on each arriving ACK (as outlined in section + 3.1). + + The Internet, to a considerable degree, relies on the correct + implementation of these algorithms in order to preserve network + stability and avoid congestion collapse. An attacker could cause TCP + endpoints to respond more aggressively in the face of congestion by + forging excessive duplicate acknowledgments or excessive + acknowledgments for new data. Conceivably, such an attack could + drive a portion of the network into congestion collapse. + +6. Changes between RFC 2001 and RFC 2581 + + [RFC2001] was extensively rewritten editorially and it is not + feasible to itemize the list of changes between [RFC2001] and + [RFC2581]. The intention of [RFC2581] was to not change any of the + recommendations given in [RFC2001], but to further clarify cases that + were not discussed in detail in [RFC2001]. Specifically, [RFC2581] + suggested what TCP connections should do after a relatively long idle + period, as well as specified and clarified some of the issues + + + + + +Allman, et al. Standards Track [Page 13] + +RFC 5681 TCP Congestion Control September 2009 + + + pertaining to TCP ACK generation. Finally, the allowable upper bound + for the initial congestion window was raised from one to two + segments. + +7. Changes Relative to RFC 2581 + + A specific definition for "duplicate acknowledgment" has been added, + based on the definition used by BSD TCP. + + The document now notes that what to do with duplicate ACKs after the + retransmission timer has fired is future work and explicitly + unspecified in this document. + + The initial window requirements were changed to allow Larger Initial + Windows as standardized in [RFC3390]. Additionally, the steps to + take when an initial window is discovered to be too large due to Path + MTU Discovery [RFC1191] are detailed. + + The recommended initial value for ssthresh has been changed to say + that it SHOULD be arbitrarily high, where it was previously MAY. + This is to provide additional guidance to implementors on the matter. + + During slow start, the usage of Appropriate Byte Counting [RFC3465] + with L=1*SMSS is explicitly recommended. The method of increasing + cwnd given in [RFC2581] is still explicitly allowed. Byte counting + during congestion avoidance is also recommended, while the method + from [RFC2581] and other safe methods are still allowed. + + The treatment of ssthresh on retransmission timeout was clarified. + In particular, ssthresh must be set to half the FlightSize on the + first retransmission of a given segment and then is held constant on + subsequent retransmissions of the same segment. + + The description of fast retransmit and fast recovery has been + clarified, and the use of Limited Transmit [RFC3042] is now + recommended. + + TCPs now MAY limit the number of duplicate ACKs that artificially + inflate cwnd during loss recovery to the number of segments + outstanding to avoid the duplicate ACK spoofing attack described in + [SCWA99]. + + The restart window has been changed to min(IW,cwnd) from IW. This + behavior was described as "experimental" in [RFC2581]. + + It is now recommended that TCP implementors implement an advanced + loss recovery algorithm conforming to the principles outlined in this + document. + + + +Allman, et al. Standards Track [Page 14] + +RFC 5681 TCP Congestion Control September 2009 + + + The security considerations have been updated to discuss ACK division + and recommend byte counting as a counter to this attack. + +8. Acknowledgments + + The core algorithms we describe were developed by Van Jacobson + [Jac88, Jac90]. In addition, Limited Transmit [RFC3042] was + developed in conjunction with Hari Balakrishnan and Sally Floyd. The + initial congestion window size specified in this document is a result + of work with Sally Floyd and Craig Partridge [RFC2414, RFC3390]. + + W. Richard ("Rich") Stevens wrote the first version of this document + [RFC2001] and co-authored the second version [RFC2581]. This present + version much benefits from his clarity and thoughtfulness of + description, and we are grateful for Rich's contributions in + elucidating TCP congestion control, as well as in more broadly + helping us understand numerous issues relating to networking. + + We wish to emphasize that the shortcomings and mistakes of this + document are solely the responsibility of the current authors. + + Some of the text from this document is taken from "TCP/IP + Illustrated, Volume 1: The Protocols" by W. Richard Stevens + (Addison-Wesley, 1994) and "TCP/IP Illustrated, Volume 2: The + Implementation" by Gary R. Wright and W. Richard Stevens (Addison- + Wesley, 1995). This material is used with the permission of + Addison-Wesley. + + Anil Agarwal, Steve Arden, Neal Cardwell, Noritoshi Demizu, Gorry + Fairhurst, Kevin Fall, John Heffner, Alfred Hoenes, Sally Floyd, + Reiner Ludwig, Matt Mathis, Craig Partridge, and Joe Touch + contributed a number of helpful suggestions. + +9. References + +9.1. Normative References + + [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC + 793, September 1981. + + [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - + Communication Layers", STD 3, RFC 1122, October 1989. + + [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, + November 1990. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + + +Allman, et al. Standards Track [Page 15] + +RFC 5681 TCP Congestion Control September 2009 + + +9.2. Informative References + + [CJ89] Chiu, D. and R. Jain, "Analysis of the Increase/Decrease + Algorithms for Congestion Avoidance in Computer Networks", + Journal of Computer Networks and ISDN Systems, vol. 17, no. + 1, pp. 1-14, June 1989. + + [FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of + Tahoe, Reno and SACK TCP", Computer Communication Review, + July 1996, ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z. + + [Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion + Control Scheme for TCP", In ACM SIGCOMM, August 1996. + + [HTH98] Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP + Slow-Start Restart After Idle", Work in Progress, March + 1998. + + [Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer + Communication Review, vol. 18, no. 4, pp. 314-329, Aug. + 1988. ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z. + + [Jac90] Jacobson, V., "Modified TCP Congestion Avoidance + Algorithm", end2end-interest mailing list, April 30, 1990. + ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail. + + [MM96a] Mathis, M. and J. Mahdavi, "Forward Acknowledgment: + Refining TCP Congestion Control", Proceedings of + SIGCOMM'96, August, 1996, Stanford, CA. Available from + http://www.psc.edu/networking/papers/papers.html + + [MM96b] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding + Parameters", Technical report. Available from + http://www.psc.edu/networking/papers/FACKnotes/current. + + [Pax97] Paxson, V., "End-to-End Internet Packet Dynamics", + Proceedings of SIGCOMM '97, Cannes, France, Sep. 1997. + + [RFC813] Clark, D., "Window and Acknowledgement Strategy in TCP", + RFC 813, July 1982. + + [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast + Retransmit, and Fast Recovery Algorithms", RFC 2001, + January 1997. + + [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP + Selective Acknowledgment Options", RFC 2018, October 1996. + + + + +Allman, et al. Standards Track [Page 16] + +RFC 5681 TCP Congestion Control September 2009 + + + [RFC2414] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's + Initial Window", RFC 2414, September 1998. + + [RFC2525] Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, J., + Heavens, I., Lahey, K., Semke, J., and B. Volz, "Known TCP + Implementation Problems", RFC 2525, March 1999. + + [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion + Control", RFC 2581, April 1999. + + [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An + Extension to the Selective Acknowledgement (SACK) Option + for TCP", RFC 2883, July 2000. + + [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission + Timer", RFC 2988, November 2000. + + [RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing + TCP's Loss Recovery Using Limited Transmit", RFC 3042, + January 2001. + + [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of + Explicit Congestion Notification (ECN) to IP", RFC 3168, + September 2001. + + [RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's + Initial Window", RFC 3390, October 2002. + + [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte + Counting (ABC)", RFC 3465, February 2003. + + [RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A + Conservative Selective Acknowledgment (SACK)-based Loss + Recovery Algorithm for TCP", RFC 3517, April 2003. + + [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno + Modification to TCP's Fast Recovery Algorithm", RFC 3782, + April 2004. + + [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU + Discovery", RFC 4821, March 2007. + + [SCWA99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, + "TCP Congestion Control With a Misbehaving Receiver", ACM + Computer Communication Review, 29(5), October 1999. + + [Ste94] Stevens, W., "TCP/IP Illustrated, Volume 1: The Protocols", + Addison-Wesley, 1994. + + + +Allman, et al. Standards Track [Page 17] + +RFC 5681 TCP Congestion Control September 2009 + + + [WS95] Wright, G. and W. Stevens, "TCP/IP Illustrated, Volume 2: + The Implementation", Addison-Wesley, 1995. + +Authors' Addresses + + Mark Allman + International Computer Science Institute (ICSI) + 1947 Center Street + Suite 600 + Berkeley, CA 94704-1198 + Phone: +1 440 235 1792 + EMail: mallman@icir.org + http://www.icir.org/mallman/ + + + Vern Paxson + International Computer Science Institute (ICSI) + 1947 Center Street + Suite 600 + Berkeley, CA 94704-1198 + Phone: +1 510/642-4274 x302 + EMail: vern@icir.org + http://www.icir.org/vern/ + + + Ethan Blanton + Purdue University Computer Sciences + 305 North University Street + West Lafayette, IN 47907 + EMail: eblanton@cs.purdue.edu + http://www.cs.purdue.edu/homes/eblanton/ + + + + + + + + + + + + + + + + + + + + +Allman, et al. Standards Track [Page 18] + -- cgit v1.2.3