Diffstat (limited to 'doc/rfc/rfc5681.txt')
-rw-r--r-- doc/rfc/rfc5681.txt 1011
1 files changed, 1011 insertions, 0 deletions
diff --git a/doc/rfc/rfc5681.txt b/doc/rfc/rfc5681.txt
new file mode 100644
index 0000000..07b414f
--- /dev/null
+++ b/doc/rfc/rfc5681.txt
@@ -0,0 +1,1011 @@
+
+
+
+
+
+
+Network Working Group                                          M. Allman
+Request for Comments: 5681                                     V. Paxson
+Obsoletes: 2581                                                     ICSI
+Category: Standards Track                                     E. Blanton
+                                                       Purdue University
+                                                          September 2009
+
+
+                         TCP Congestion Control
+
+Abstract
+
+ This document defines TCP's four intertwined congestion control
+ algorithms: slow start, congestion avoidance, fast retransmit, and
+ fast recovery. In addition, the document specifies how TCP should
+ begin transmission after a relatively long idle period, as well as
+ discussing various acknowledgment generation methods. This document
+ obsoletes RFC 2581.
+
+Status of This Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (c) 2009 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents in effect on the date of
+ publication of this document (http://trustee.ietf.org/license-info).
+ Please review these documents carefully, as they describe your rights
+ and restrictions with respect to this document.
+
+ This document may contain material from IETF Documents or IETF
+ Contributions published or made publicly available before November
+ 10, 2008. The person(s) controlling the copyright in some of this
+ material may not have granted the IETF Trust the right to allow
+ modifications of such material outside the IETF Standards Process.
+ Without obtaining an adequate license from the person(s) controlling
+ the copyright in such materials, this document may not be modified
+ outside the IETF Standards Process, and derivative works of it may
+
+
+
+
+
+Allman, et al. Standards Track [Page 1]
+
+RFC 5681 TCP Congestion Control September 2009
+
+
+ not be created outside the IETF Standards Process, except to format
+ it for publication as an RFC or to translate it into languages other
+ than English.
+
+Table Of Contents
+
+ 1. Introduction ....................................................2
+ 2. Definitions .....................................................3
+ 3. Congestion Control Algorithms ...................................4
+    3.1. Slow Start and Congestion Avoidance ........................4
+    3.2. Fast Retransmit/Fast Recovery ..............................8
+ 4. Additional Considerations ......................................10
+    4.1. Restarting Idle Connections ...............................10
+    4.2. Generating Acknowledgments ................................11
+    4.3. Loss Recovery Mechanisms ..................................12
+ 5. Security Considerations ........................................13
+ 6. Changes between RFC 2001 and RFC 2581 ..........................13
+ 7. Changes Relative to RFC 2581 ...................................14
+ 8. Acknowledgments ................................................15
+ 9. References .....................................................15
+    9.1. Normative References ......................................15
+    9.2. Informative References ....................................16
+
+1. Introduction
+
+ This document specifies four TCP [RFC793] congestion control
+ algorithms: slow start, congestion avoidance, fast retransmit and
+ fast recovery. These algorithms were devised in [Jac88] and [Jac90].
+ Their use with TCP is standardized in [RFC1122]. Additional early
+ work in additive-increase, multiplicative-decrease congestion control
+ is given in [CJ89].
+
+ Note that [Ste94] provides examples of these algorithms in action and
+ [WS95] provides an explanation of the source code for the BSD
+ implementation of these algorithms.
+
+ In addition to specifying these congestion control algorithms, this
+ document specifies what TCP connections should do after a relatively
+ long idle period, as well as specifying and clarifying some of the
+ issues pertaining to TCP ACK generation.
+
+ This document obsoletes [RFC2581], which in turn obsoleted [RFC2001].
+
+ This document is organized as follows. Section 2 provides various
+ definitions that will be used throughout the document. Section 3
+ provides a specification of the congestion control algorithms.
+ Section 4 outlines concerns related to the congestion control
+ algorithms and finally, section 5 outlines security considerations.
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [RFC2119].
+
+2. Definitions
+
+ This section provides the definition of several terms that will be
+ used throughout the remainder of this document.
+
+ SEGMENT: A segment is ANY TCP/IP data or acknowledgment packet (or
+ both).
+
+ SENDER MAXIMUM SEGMENT SIZE (SMSS): The SMSS is the size of the
+ largest segment that the sender can transmit. This value can be
+ based on the maximum transmission unit of the network, the path
+ MTU discovery [RFC1191, RFC4821] algorithm, RMSS (see next item),
+ or other factors. The size does not include the TCP/IP headers
+ and options.
+
+ RECEIVER MAXIMUM SEGMENT SIZE (RMSS): The RMSS is the size of the
+ largest segment the receiver is willing to accept. This is the
+ value specified in the MSS option sent by the receiver during
+ connection startup. Or, if the MSS option is not used, it is 536
+ bytes [RFC1122]. The size does not include the TCP/IP headers and
+ options.
+
+ FULL-SIZED SEGMENT: A segment that contains the maximum number of
+ data bytes permitted (i.e., a segment containing SMSS bytes of
+ data).
+
+ RECEIVER WINDOW (rwnd): The most recently advertised receiver window.
+
+ CONGESTION WINDOW (cwnd): A TCP state variable that limits the amount
+ of data a TCP can send. At any given time, a TCP MUST NOT send
+ data with a sequence number higher than the sum of the highest
+ acknowledged sequence number and the minimum of cwnd and rwnd.
+
+ INITIAL WINDOW (IW): The initial window is the size of the sender's
+ congestion window after the three-way handshake is completed.
+
+ LOSS WINDOW (LW): The loss window is the size of the congestion
+ window after a TCP sender detects loss using its retransmission
+ timer.
+
+ RESTART WINDOW (RW): The restart window is the size of the congestion
+ window after a TCP restarts transmission after an idle period (if
+ the slow start algorithm is used; see section 4.1 for more
+ discussion).
+
+ FLIGHT SIZE: The amount of data that has been sent but not yet
+ cumulatively acknowledged.
+
+ DUPLICATE ACKNOWLEDGMENT: An acknowledgment is considered a
+ "duplicate" in the following algorithms when (a) the receiver of
+ the ACK has outstanding data, (b) the incoming acknowledgment
+ carries no data, (c) the SYN and FIN bits are both off, (d) the
+ acknowledgment number is equal to the greatest acknowledgment
+ received on the given connection (SND.UNA from [RFC793]) and (e)
+ the advertised window in the incoming acknowledgment equals the
+ advertised window in the last incoming acknowledgment.
+
+ Alternatively, a TCP that utilizes selective acknowledgments
+ (SACKs) [RFC2018, RFC2883] can leverage the SACK information to
+ determine when an incoming ACK is a "duplicate" (e.g., if the ACK
+ contains previously unknown SACK information).
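
Conditions (a) through (e) of the non-SACK definition can be read as a single predicate. The following is a minimal sketch, not part of the RFC text. The `Ack` and `SenderState` records and their field names are hypothetical, and sequence-number wraparound is ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class Ack:
    """Hypothetical view of the fields of an incoming segment."""
    ack_no: int       # acknowledgment number
    payload_len: int  # bytes of data carried by the segment
    syn: bool
    fin: bool
    window: int       # advertised window in this segment


@dataclass
class SenderState:
    """Hypothetical per-connection sender state."""
    snd_una: int      # greatest acknowledgment received so far
    snd_nxt: int      # next sequence number to be sent
    last_window: int  # advertised window in the last incoming ACK


def is_duplicate_ack(ack: Ack, st: SenderState) -> bool:
    """Return True when all five conditions of the definition hold."""
    has_outstanding_data = st.snd_nxt != st.snd_una   # (a)
    return (has_outstanding_data
            and ack.payload_len == 0                  # (b)
            and not ack.syn and not ack.fin           # (c)
            and ack.ack_no == st.snd_una              # (d)
            and ack.window == st.last_window)         # (e)
```

A SACK-based sender would likely replace conditions (d) and (e) with a check for previously unknown SACK information, which this sketch does not model.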
+
+3. Congestion Control Algorithms
+
+ This section defines the four congestion control algorithms: slow
+ start, congestion avoidance, fast retransmit, and fast recovery,
+ developed in [Jac88] and [Jac90]. In some situations, it may be
+ beneficial for a TCP sender to be more conservative than the
+ algorithms allow; however, a TCP MUST NOT be more aggressive than the
+ following algorithms allow (that is, MUST NOT send data when the
+ value of cwnd computed by the following algorithms would not allow
+ the data to be sent).
+
+ Also, note that the algorithms specified in this document work in
+ terms of using loss as the signal of congestion. Explicit Congestion
+ Notification (ECN) could also be used as specified in [RFC3168].
+
+3.1. Slow Start and Congestion Avoidance
+
+ The slow start and congestion avoidance algorithms MUST be used by a
+ TCP sender to control the amount of outstanding data being injected
+ into the network. To implement these algorithms, two variables are
+ added to the TCP per-connection state. The congestion window (cwnd)
+ is a sender-side limit on the amount of data the sender can transmit
+ into the network before receiving an acknowledgment (ACK), while the
+ receiver's advertised window (rwnd) is a receiver-side limit on the
+ amount of outstanding data. The minimum of cwnd and rwnd governs
+ data transmission.
+
+ Another state variable, the slow start threshold (ssthresh), is used
+ to determine whether the slow start or congestion avoidance algorithm
+ is used to control data transmission, as discussed below.
+
+ Beginning transmission into a network with unknown conditions
+ requires TCP to slowly probe the network to determine the available
+ capacity, in order to avoid congesting the network with an
+ inappropriately large burst of data. The slow start algorithm is
+ used for this purpose at the beginning of a transfer, or after
+ repairing loss detected by the retransmission timer. Slow start
+ additionally serves to start the "ACK clock" used by the TCP sender
+ to release data into the network in the slow start, congestion
+ avoidance, and loss recovery algorithms.
+
+ IW, the initial value of cwnd, MUST be set using the following
+ guidelines as an upper bound.
+
+ If SMSS > 2190 bytes:
+     IW = 2 * SMSS bytes and MUST NOT be more than 2 segments
+ If (SMSS > 1095 bytes) and (SMSS <= 2190 bytes):
+     IW = 3 * SMSS bytes and MUST NOT be more than 3 segments
+ If SMSS <= 1095 bytes:
+     IW = 4 * SMSS bytes and MUST NOT be more than 4 segments
+
+ As specified in [RFC3390], the SYN/ACK and the acknowledgment of the
+ SYN/ACK MUST NOT increase the size of the congestion window.
+ Further, if the SYN or SYN/ACK is lost, the initial window used by a
+ sender after a correctly transmitted SYN MUST be one segment
+ consisting of at most SMSS bytes.
+
+ A detailed rationale and discussion of the IW setting is provided in
+ [RFC3390].
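
The three-way rule for the IW upper bound can be sketched as a small function. This is an illustration only; the function name is hypothetical, and it returns the bound in bytes without modeling the SYN or SYN/ACK loss case described above.

```python
def initial_window_upper_bound(smss: int) -> int:
    """Upper bound on IW, in bytes, for a given SMSS in bytes."""
    if smss > 2190:
        return 2 * smss   # at most 2 segments
    if smss > 1095:
        return 3 * smss   # at most 3 segments
    return 4 * smss       # at most 4 segments
```

For a typical Ethernet-derived SMSS of 1460 bytes, this yields 3 segments, or 4380 bytes.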
+
+ When initial congestion windows of more than one segment are
+ implemented along with Path MTU Discovery [RFC1191], and the MSS
+ being used is found to be too large, the congestion window cwnd
+ SHOULD be reduced to prevent large bursts of smaller segments.
+ Specifically, cwnd SHOULD be reduced by the ratio of the old segment
+ size to the new segment size.
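
One reading of "reduced by the ratio of the old segment size to the new segment size" is that cwnd is divided by old/new, which keeps the number of segments in the window unchanged. The sketch below assumes that reading; the function name is hypothetical.

```python
def cwnd_after_mss_reduction(cwnd: int, old_mss: int, new_mss: int) -> int:
    """Scale cwnd (bytes) so it spans the same number of segments as before."""
    if new_mss >= old_mss:
        return cwnd  # only a reduction in segment size triggers the rule
    # Dividing by (old_mss / new_mss), using integer arithmetic.
    return cwnd * new_mss // old_mss
```

For example, a 3-segment window of 4380 bytes at an MSS of 1460 would become a 3-segment window of 1608 bytes at an MSS of 536.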
+
+ The initial value of ssthresh SHOULD be set arbitrarily high (e.g.,
+ to the size of the largest possible advertised window), but ssthresh
+ MUST be reduced in response to congestion. Setting ssthresh as high
+ as possible allows the network conditions, rather than some arbitrary
+ host limit, to dictate the sending rate. In cases where the end
+ systems have a solid understanding of the network path, more
+ carefully setting the initial ssthresh value may have merit (e.g.,
+ such that the end host does not create congestion along the path).
+
+ The slow start algorithm is used when cwnd < ssthresh, while the
+ congestion avoidance algorithm is used when cwnd > ssthresh. When
+ cwnd and ssthresh are equal, the sender may use either slow start or
+ congestion avoidance.
+
+ During slow start, a TCP increments cwnd by at most SMSS bytes for
+ each ACK received that cumulatively acknowledges new data. Slow
+ start ends when cwnd exceeds ssthresh (or, optionally, when it
+ reaches it, as noted above) or when congestion is observed. While
+ traditionally TCP implementations have increased cwnd by precisely
+ SMSS bytes upon receipt of an ACK covering new data, we RECOMMEND
+ that TCP implementations increase cwnd, per:
+
+ cwnd += min (N, SMSS) (2)
+
+ where N is the number of previously unacknowledged bytes acknowledged
+ in the incoming ACK. This adjustment is part of Appropriate Byte
+ Counting [RFC3465] and provides robustness against misbehaving
+ receivers that may attempt to induce a sender to artificially inflate
+ cwnd using a mechanism known as "ACK Division" [SCWA99]. ACK
+ Division consists of a receiver sending multiple ACKs for a single
+ TCP data segment, each acknowledging only a portion of its data. A
+ TCP that increments cwnd by SMSS for each such ACK will
+ inappropriately inflate the amount of data injected into the network.
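
The effect of equation (2) under ACK Division can be illustrated with a short sketch. The function name is hypothetical; `bytes_acked` corresponds to N in equation (2), and all quantities are in bytes.

```python
def slow_start_on_ack(cwnd: int, bytes_acked: int, smss: int) -> int:
    """Apply equation (2): cwnd += min(N, SMSS)."""
    return cwnd + min(bytes_acked, smss)


def naive_slow_start_on_ack(cwnd: int, smss: int) -> int:
    """Traditional behavior: add a full SMSS for every ACK of new data."""
    return cwnd + smss
```

If a receiver splits the acknowledgment of one 1460-byte segment into three ACKs covering 487, 487, and 486 bytes, the byte-counting version grows cwnd by 1460 bytes in total, while the naive version grows it by 4380 bytes.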
+
+ During congestion avoidance, cwnd is incremented by roughly 1 full-
+ sized segment per round-trip time (RTT). Congestion avoidance
+ continues until congestion is detected. The basic guidelines for
+ incrementing cwnd during congestion avoidance are:
+
+ * MAY increment cwnd by SMSS bytes
+
+ * SHOULD increment cwnd per equation (2) once per RTT
+
+ * MUST NOT increment cwnd by more than SMSS bytes
+
+ We note that [RFC3465] allows for cwnd increases of more than SMSS
+ bytes for incoming acknowledgments during slow start on an
+ experimental basis; however, such behavior is not allowed as part of
+ the standard.
+
+ The RECOMMENDED way to increase cwnd during congestion avoidance is
+ to count the number of bytes that have been acknowledged by ACKs for
+ new data. (A drawback of this implementation is that it requires
+ maintaining an additional state variable.) When the number of bytes
+ acknowledged reaches cwnd, then cwnd can be incremented by up to SMSS
+ bytes. Note that during congestion avoidance, cwnd MUST NOT be
+ increased by more than SMSS bytes per RTT. This method both allows
+ TCPs to increase cwnd by one segment per RTT in the face of delayed
+ ACKs and provides robustness against ACK Division attacks.
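
The byte-counting method described above can be sketched as follows. The class and attribute names are hypothetical; `bytes_acked` is the additional state variable the text mentions, and the order of operations (subtract cwnd, then grow it) follows the approach of [RFC3465] as an assumption.

```python
from dataclasses import dataclass


@dataclass
class CongestionAvoidance:
    cwnd: int              # congestion window, in bytes
    smss: int              # sender maximum segment size, in bytes
    bytes_acked: int = 0   # bytes newly acknowledged since the last increase

    def on_new_ack(self, n: int) -> None:
        """Account for an ACK that newly acknowledges n bytes."""
        self.bytes_acked += n
        if self.bytes_acked >= self.cwnd:
            # Roughly one window's worth of data has been acknowledged,
            # so allow one increase of at most SMSS bytes.
            self.bytes_acked -= self.cwnd
            self.cwnd += self.smss
```

Because the counter tracks bytes rather than ACKs, five delayed ACKs that each cover two segments advance cwnd exactly as ten per-segment ACKs would.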
+
+ Another common formula that a TCP MAY use to update cwnd during
+ congestion avoidance is given in equation (3):
+
+ cwnd += SMSS*SMSS/cwnd (3)
+
+ This adjustment is executed on every incoming ACK that acknowledges
+ new data. Equation (3) provides an acceptable approximation to the
+ underlying principle of increasing cwnd by 1 full-sized segment per
+ RTT. (Note that for a connection in which the receiver is
+ acknowledging every-other packet, (3) is less aggressive than allowed
+ -- roughly increasing cwnd every second RTT.)
+
+ Implementation Note: Since integer arithmetic is usually used in TCP
+ implementations, the formula given in equation (3) can fail to
+ increase cwnd when the congestion window is larger than SMSS*SMSS.
+ If the above formula yields 0, the result SHOULD be rounded up to 1
+ byte.
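
A sketch of equation (3) with integer arithmetic and the round-up rule from the note above. The function name is hypothetical; cwnd and SMSS are in bytes.

```python
def congestion_avoidance_on_ack(cwnd: int, smss: int) -> int:
    """Apply equation (3) on an ACK of new data, never adding less than 1 byte."""
    increment = (smss * smss) // cwnd   # integer division may yield 0
    return cwnd + max(increment, 1)     # round a zero result up to 1 byte
```

With SMSS = 1460 and cwnd = 14600, the increment is 146 bytes per ACK. With cwnd = 3000000, which exceeds SMSS*SMSS = 2131600, the integer quotient is 0 and the increment is rounded up to 1 byte.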
+
+ Implementation Note: Older implementations have an additional
+ additive constant on the right-hand side of equation (3). This is
+ incorrect and can actually lead to diminished performance [RFC2525].
+
+ Implementation Note: Some implementations maintain cwnd in units of
+ bytes, while others in units of full-sized segments. The latter will
+ find equation (3) difficult to use, and may prefer to use the
+ counting approach discussed in the previous paragraph.
+
+ When a TCP sender detects segment loss using the retransmission timer
+ and the given segment has not yet been resent by way of the
+ retransmission timer, the value of ssthresh MUST be set to no more
+ than the value given in equation (4):
+
+ ssthresh = max (FlightSize / 2, 2*SMSS) (4)
+
+ where, as discussed above, FlightSize is the amount of outstanding
+ data in the network.
+
+ On the other hand, when a TCP sender detects segment loss using the
+ retransmission timer and the given segment has already been
+ retransmitted by way of the retransmission timer at least once, the
+ value of ssthresh is held constant.
+
+ Implementation Note: An easy mistake to make is to simply use cwnd,
+ rather than FlightSize, which in some implementations may
+ incidentally increase well beyond rwnd.
+
+ Furthermore, upon a timeout (as specified in [RFC2988]) cwnd MUST be
+ set to no more than the loss window, LW, which equals 1 full-sized
+ segment (regardless of the value of IW). Therefore, after
+ retransmitting the dropped segment the TCP sender uses the slow start
+ algorithm to increase the window from 1 full-sized segment to the new
+ value of ssthresh, at which point congestion avoidance again takes
+ over.
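
The response to a retransmission timeout described above can be sketched as one function. The names are hypothetical. The sketch sets ssthresh to the upper bound from equation (4), although a sender may choose a smaller value, and it takes FlightSize rather than cwnd as input, per the implementation note.

```python
from typing import Tuple


def on_retransmission_timeout(flight_size: int, smss: int, ssthresh: int,
                              already_retransmitted: bool) -> Tuple[int, int]:
    """Return (cwnd, ssthresh) in bytes after the retransmission timer fires."""
    if not already_retransmitted:
        # First timeout for this segment: apply equation (4).
        ssthresh = max(flight_size // 2, 2 * smss)
    # Otherwise ssthresh is held constant.
    cwnd = smss  # loss window (LW): 1 full-sized segment
    return cwnd, ssthresh
```

After this call, slow start grows cwnd from one segment until it reaches the returned ssthresh.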
+
+ As shown in [FF96] and [RFC3782], slow-start-based loss recovery
+ after a timeout can cause spurious retransmissions that trigger
+ duplicate acknowledgments. The reaction to the arrival of these
+ duplicate ACKs in TCP implementations varies widely. This document
+ does not specify how to treat such acknowledgments, but does note
+ this as an area that may benefit from additional attention,
+ experimentation and specification.
+
+3.2. Fast Retransmit/Fast Recovery
+
+ A TCP receiver SHOULD send an immediate duplicate ACK when an out-
+ of-order segment arrives. The purpose of this ACK is to inform the
+ sender that a segment was received out-of-order and which sequence
+ number is expected. From the sender's perspective, duplicate ACKs
+ can be caused by a number of network problems. First, they can be
+ caused by dropped segments. In this case, all segments after the
+ dropped segment will trigger duplicate ACKs until the loss is
+ repaired. Second, duplicate ACKs can be caused by the re-ordering of
+ data segments by the network (not a rare event along some network
+ paths [Pax97]). Finally, duplicate ACKs can be caused by replication
+ of ACK or data segments by the network. In addition, a TCP receiver
+ SHOULD send an immediate ACK when the incoming segment fills in all
+ or part of a gap in the sequence space. This will generate more
+ timely information for a sender recovering from a loss through a
+ retransmission timeout, a fast retransmit, or an advanced loss
+ recovery algorithm, as outlined in section 4.3.
+
+ The TCP sender SHOULD use the "fast retransmit" algorithm to detect
+ and repair loss, based on incoming duplicate ACKs. The fast
+ retransmit algorithm uses the arrival of 3 duplicate ACKs (as defined
+ in section 2, without any intervening ACKs which move SND.UNA) as an
+ indication that a segment has been lost. After receiving 3 duplicate
+ ACKs, TCP performs a retransmission of what appears to be the missing
+ segment, without waiting for the retransmission timer to expire.
+
+ After the fast retransmit algorithm sends what appears to be the
+ missing segment, the "fast recovery" algorithm governs the
+ transmission of new data until a non-duplicate ACK arrives. The
+ reason for not performing slow start is that the receipt of the
+ duplicate ACKs not only indicates that a segment has been lost, but
+ also that segments are most likely leaving the network (although a
+ massive segment duplication by the network can invalidate this
+ conclusion). In other words, since the receiver can only generate a
+ duplicate ACK when a segment has arrived, that segment has left the
+ network and is in the receiver's buffer, so we know it is no longer
+ consuming network resources. Furthermore, since the ACK "clock"
+ [Jac88] is preserved, the TCP sender can continue to transmit new
+ segments (although transmission must continue using a reduced cwnd,
+ since loss is an indication of congestion).
+
+ The fast retransmit and fast recovery algorithms are implemented
+ together as follows.
+
+ 1. On the first and second duplicate ACKs received at a sender, a
+ TCP SHOULD send a segment of previously unsent data per [RFC3042]
+ provided that the receiver's advertised window allows, the total
+ FlightSize would remain less than or equal to cwnd plus 2*SMSS,
+ and that new data is available for transmission. Further, the
+ TCP sender MUST NOT change cwnd to reflect these two segments
+ [RFC3042]. Note that a sender using SACK [RFC2018] MUST NOT send
+ new data unless the incoming duplicate acknowledgment contains
+ new SACK information.
+
+ 2. When the third duplicate ACK is received, a TCP MUST set ssthresh
+ to no more than the value given in equation (4). When [RFC3042]
+ is in use, additional data sent in limited transmit MUST NOT be
+ included in this calculation.
+
+ 3. The lost segment starting at SND.UNA MUST be retransmitted and
+ cwnd set to ssthresh plus 3*SMSS. This artificially "inflates"
+ the congestion window by the number of segments (three) that have
+ left the network and which the receiver has buffered.
+
+ 4. For each additional duplicate ACK received (after the third),
+ cwnd MUST be incremented by SMSS. This artificially inflates the
+ congestion window in order to reflect the additional segment that
+ has left the network.
+
+ Note: [SCWA99] discusses a receiver-based attack whereby many
+ bogus duplicate ACKs are sent to the data sender in order to
+ artificially inflate cwnd and cause a higher than appropriate
+ sending rate to be used. A TCP MAY therefore limit the number of
+ times cwnd is artificially inflated during loss recovery to the
+ number of outstanding segments (or, an approximation thereof).
+
+ Note: When an advanced loss recovery mechanism (such as outlined
+ in section 4.3) is not in use, this increase in FlightSize can
+ cause equation (4) to slightly inflate cwnd and ssthresh, as some
+ of the segments between SND.UNA and SND.NXT are assumed to have
+ left the network but are still reflected in FlightSize.
+
+ 5. When previously unsent data is available and the new value of
+ cwnd and the receiver's advertised window allow, a TCP SHOULD
+ send 1*SMSS bytes of previously unsent data.
+
+ 6. When the next ACK arrives that acknowledges previously
+ unacknowledged data, a TCP MUST set cwnd to ssthresh (the value
+ set in step 2). This is termed "deflating" the window.
+
+ This ACK should be the acknowledgment elicited by the
+ retransmission from step 3, one RTT after the retransmission
+ (though it may arrive sooner in the presence of significant out-
+ of-order delivery of data segments at the receiver).
+ Additionally, this ACK should acknowledge all the intermediate
+ segments sent between the lost segment and the receipt of the
+ third duplicate ACK, if none of these were lost.
+
+ Note: This algorithm is known to generally not recover efficiently
+ from multiple losses in a single flight of packets [FF96]. Section
+ 4.3 below addresses such cases.
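
Steps 1 through 6 can be sketched as a small state machine. This is an illustration under stated assumptions, not a complete sender: the class and attribute names are hypothetical, `flight_size` is assumed to exclude any limited-transmit segments, and the checks on rwnd, available data, and actual transmission (steps 1 and 5) are left to the caller and only recorded as actions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FastRecoverySender:
    smss: int
    cwnd: int
    ssthresh: int
    flight_size: int            # excludes limited-transmit data (assumption)
    dup_acks: int = 0
    in_recovery: bool = False
    actions: List[str] = field(default_factory=list)

    def on_duplicate_ack(self) -> None:
        self.dup_acks += 1
        if self.dup_acks < 3:
            # Step 1: limited transmit may send new data; cwnd is unchanged.
            self.actions.append("limited_transmit")
        elif self.dup_acks == 3:
            # Step 2: ssthresh set to the upper bound of equation (4).
            self.ssthresh = max(self.flight_size // 2, 2 * self.smss)
            # Step 3: retransmit the segment at SND.UNA and inflate cwnd.
            self.actions.append("retransmit_snd_una")
            self.cwnd = self.ssthresh + 3 * self.smss
            self.in_recovery = True
        else:
            # Step 4: one more segment has left the network.
            self.cwnd += self.smss

    def on_new_ack(self) -> None:
        if self.in_recovery:
            # Step 6: deflate the window.
            self.cwnd = self.ssthresh
            self.in_recovery = False
        self.dup_acks = 0
```

With SMSS = 1000, cwnd = 10000, and FlightSize = 10000, the third duplicate ACK sets ssthresh to 5000 and cwnd to 8000, a fourth raises cwnd to 9000, and the next ACK of new data deflates cwnd to 5000.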
+
+4. Additional Considerations
+
+4.1. Restarting Idle Connections
+
+ A known problem with the TCP congestion control algorithms described
+ above is that they allow a potentially inappropriate burst of traffic
+ to be transmitted after TCP has been idle for a relatively long
+ period of time. After an idle period, TCP cannot use the ACK clock
+ to strobe new segments into the network, as all the ACKs have drained
+ from the network. Therefore, as specified above, TCP can potentially
+ send a cwnd-size line-rate burst into the network after an idle
+ period. In addition, changing network conditions may have rendered
+ TCP's notion of the available end-to-end network capacity between two
+ endpoints, as estimated by cwnd, inaccurate during the course of a
+ long idle period.
+
+ [Jac88] recommends that a TCP use slow start to restart transmission
+ after a relatively long idle period. Slow start serves to restart
+ the ACK clock, just as it does at the beginning of a transfer. This
+ mechanism has been widely deployed in the following manner. When TCP
+ has not received a segment for more than one retransmission timeout,
+ cwnd is reduced to the value of the restart window (RW) before
+ transmission begins.
+
+ For the purposes of this standard, we define RW = min(IW,cwnd).
+
+ Using the last time a segment was received to determine whether or
+ not to decrease cwnd can fail to deflate cwnd in the common case of
+ persistent HTTP connections [HTH98]. In this case, a Web server
+ receives a request before transmitting data to the Web client. The
+ reception of the request makes the test for an idle connection fail,
+ and allows the TCP to begin transmission with a possibly
+ inappropriately large cwnd.
+
+ Therefore, a TCP SHOULD set cwnd to no more than RW before beginning
+ transmission if the TCP has not sent data in an interval exceeding
+ the retransmission timeout.
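
The idle-restart rule can be sketched as follows. The function and parameter names are hypothetical; the idle time is measured from the last data sent, as recommended above, and is expressed in the same units as the RTO.

```python
def cwnd_before_restart(cwnd: int, iw: int,
                        time_since_last_send: float, rto: float) -> int:
    """Return the cwnd (bytes) to use when transmission resumes."""
    if time_since_last_send > rto:
        return min(iw, cwnd)   # restart window: RW = min(IW, cwnd)
    return cwnd
```

Taking the minimum means an idle period never increases cwnd: a connection whose window was already below IW keeps its smaller value.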
+
+4.2. Generating Acknowledgments
+
+ The delayed ACK algorithm specified in [RFC1122] SHOULD be used by a
+ TCP receiver. When using delayed ACKs, a TCP receiver MUST NOT
+ excessively delay acknowledgments. Specifically, an ACK SHOULD be
+ generated for at least every second full-sized segment, and MUST be
+ generated within 500 ms of the arrival of the first unacknowledged
+ packet.
+
+ The requirement that an ACK "SHOULD" be generated for at least every
+ second full-sized segment is listed in [RFC1122] in one place as a
+ SHOULD and another as a MUST. Here we unambiguously state it is a
+ SHOULD. We also emphasize that this is a SHOULD, meaning that an
+ implementor should indeed only deviate from this requirement after
+ careful consideration of the implications. See the discussion of
+ "Stretch ACK violation" in [RFC2525] and the references therein for a
+ discussion of the possible performance problems with generating ACKs
+ less frequently than every second full-sized segment.
+
+ In some cases, the sender and receiver may not agree on what
+ constitutes a full-sized segment. An implementation is deemed to
+ comply with this requirement if it sends at least one acknowledgment
+ every time it receives 2*RMSS bytes of new data from the sender,
+ where RMSS is the Maximum Segment Size specified by the receiver to
+ the sender (or the default value of 536 bytes, per [RFC1122], if the
+ receiver does not specify an MSS option during connection
+ establishment). The sender may be forced to use a segment size less
+ than RMSS due to the maximum transmission unit (MTU), the path MTU
+ discovery algorithm or other factors. For instance, consider the
+ case when the receiver announces an RMSS of X bytes but the sender
+ ends up using a segment size of Y bytes (Y < X) due to path MTU
+ discovery (or the sender's MTU size). The receiver will generate
+ stretch ACKs if it waits for 2*X bytes to arrive before an ACK is
+ sent. Clearly this will take more than 2 segments of size Y bytes.
+ Therefore, while a specific algorithm is not defined, it is desirable
+ for receivers to attempt to prevent this situation, for example, by
+ acknowledging at least every second segment, regardless of size.
+ Finally, we repeat that an ACK MUST NOT be delayed for more than 500
+ ms waiting on a second full-sized segment to arrive.
+
+ Out-of-order data segments SHOULD be acknowledged immediately, in
+ order to accelerate loss recovery. To trigger the fast retransmit
+ algorithm, the receiver SHOULD send an immediate duplicate ACK when
+ it receives a data segment above a gap in the sequence space. To
+ provide feedback to senders recovering from losses, the receiver
+ SHOULD send an immediate ACK when it receives a data segment that
+ fills in all or part of a gap in the sequence space.
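
The receiver-side rules in this section can be combined into one decision function. This is a sketch under assumptions: the names are hypothetical, and it counts segments regardless of size, following the suggestion above for avoiding stretch ACKs, rather than counting 2*RMSS bytes.

```python
MAX_ACK_DELAY_MS = 500  # upper limit on delaying an acknowledgment


def should_ack_now(*, out_of_order: bool, fills_gap: bool,
                   unacked_segments: int, ms_since_first_unacked: float) -> bool:
    """Decide whether the receiver should send an ACK immediately."""
    if out_of_order or fills_gap:
        # Immediate (duplicate) ACK to speed up loss recovery.
        return True
    if unacked_segments >= 2:
        # Acknowledge at least every second segment.
        return True
    # Otherwise the ACK may be delayed, but not beyond 500 ms.
    return ms_since_first_unacked >= MAX_ACK_DELAY_MS
```

An implementation would typically use a shorter delayed-ACK timer than 500 ms; the constant here is only the ceiling stated in the text.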
+
+ A TCP receiver MUST NOT generate more than one ACK for every incoming
+ segment, other than to update the offered window as the receiving
+ application consumes new data (see [RFC813] and page 42 of [RFC793]).
+
+4.3. Loss Recovery Mechanisms
+
+ A number of loss recovery algorithms that augment fast retransmit and
+ fast recovery have been suggested by TCP researchers and specified in
+ the RFC series. While some of these algorithms are based on the TCP
+ selective acknowledgment (SACK) option [RFC2018], such as [FF96],
+ [MM96a], [MM96b], and [RFC3517], others do not require SACKs, such as
+ [Hoe96], [FF96], and [RFC3782]. The non-SACK algorithms use "partial
+ acknowledgments" (ACKs that cover previously unacknowledged data, but
+ not all the data outstanding when loss was detected) to trigger
+ retransmissions. While this document does not standardize any of the
+ specific algorithms that may improve fast retransmit/fast recovery,
+ these enhanced algorithms are implicitly allowed, as long as they
+ follow the general principles of the basic four algorithms outlined
+ above.
+
+ That is, when the first loss in a window of data is detected,
+ ssthresh MUST be set to no more than the value given by equation (4).
+ Second, until all lost segments in the window of data in question are
+ repaired, the number of segments transmitted in each RTT MUST be no
+ more than half the number of outstanding segments when the loss was
+ detected. Finally, after all loss in the given window of segments
+ has been successfully retransmitted, cwnd MUST be set to no more than
+ ssthresh and congestion avoidance MUST be used to further increase
+ cwnd. Loss in two successive windows of data, or the loss of a
+ retransmission, should be taken as two indications of congestion and,
+ therefore, cwnd (and ssthresh) MUST be lowered twice in this case.
+
+ We RECOMMEND that TCP implementors employ some form of advanced loss
+ recovery that can cope with multiple losses in a window of data. The
+ algorithms detailed in [RFC3782] and [RFC3517] conform to the general
+ principles outlined above. We note that while these are not the only
+ two algorithms that conform to the above general principles these two
+ algorithms have been vetted by the community and are currently on the
+ Standards Track.
+
+5. Security Considerations
+
+ This document requires a TCP to diminish its sending rate in the
+ presence of retransmission timeouts and the arrival of duplicate
+ acknowledgments. An attacker can therefore impair the performance of
+ a TCP connection by either causing data packets or their
+ acknowledgments to be lost, or by forging excessive duplicate
+ acknowledgments.
+
+ In response to the ACK division attack outlined in [SCWA99], this
+ document RECOMMENDS increasing the congestion window based on the
+ number of bytes newly acknowledged in each arriving ACK rather than
+ by a particular constant on each arriving ACK (as outlined in section
+ 3.1).
+
+ The Internet, to a considerable degree, relies on the correct
+ implementation of these algorithms in order to preserve network
+ stability and avoid congestion collapse. An attacker could cause TCP
+ endpoints to respond more aggressively in the face of congestion by
+ forging excessive duplicate acknowledgments or excessive
+ acknowledgments for new data. Conceivably, such an attack could
+ drive a portion of the network into congestion collapse.
+
+6. Changes between RFC 2001 and RFC 2581
+
+ [RFC2001] was extensively rewritten editorially and it is not
+ feasible to itemize the list of changes between [RFC2001] and
+ [RFC2581]. The intention of [RFC2581] was to not change any of the
+ recommendations given in [RFC2001], but to further clarify cases that
+ were not discussed in detail in [RFC2001]. Specifically, [RFC2581]
+ suggested what TCP connections should do after a relatively long idle
+ period, and also specified and clarified some of the issues
+ pertaining to TCP ACK generation. Finally, the allowable upper bound
+ for the initial congestion window was raised from one to two
+ segments.
+
+7. Changes Relative to RFC 2581
+
+ A specific definition for "duplicate acknowledgment" has been added,
+ based on the definition used by BSD TCP.
+
+ The document now notes that what to do with duplicate ACKs after the
+ retransmission timer has fired is future work and explicitly
+ unspecified in this document.
+
+ The initial window requirements were changed to allow Larger Initial
+ Windows as standardized in [RFC3390]. Additionally, the steps to
+ take when an initial window is discovered to be too large due to Path
+ MTU Discovery [RFC1191] are detailed.
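+
+ A short sketch of the initial window upper bound, assuming the
+ SMSS thresholds given in section 3.1 of this document (2190 and
+ 1095 bytes). The function name is hypothetical.

```python
def initial_window(smss):
    """Return the upper bound on the initial window (IW) in bytes.

    Thresholds follow section 3.1: larger segments get fewer of them.
    """
    if smss > 2190:
        return 2 * smss   # at most 2 segments
    if smss > 1095:
        return 3 * smss   # at most 3 segments
    return 4 * smss       # at most 4 segments
```

+ For example, an assumed SMSS of 1460 bytes falls in the middle band
+ and yields an initial window of three segments.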
+
+ The recommended initial value for ssthresh has been changed to say
+ that it SHOULD be arbitrarily high, where it was previously MAY.
+ This is to provide additional guidance to implementors on the matter.
+
+ During slow start, the usage of Appropriate Byte Counting [RFC3465]
+ with L=1*SMSS is explicitly recommended. The method of increasing
+ cwnd given in [RFC2581] is still explicitly allowed. Byte counting
+ during congestion avoidance is also recommended, while the method
+ from [RFC2581] and other safe methods are still allowed.
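+
+ Byte counting during congestion avoidance can be sketched as
+ follows, assuming the accounting described in section 3.1: newly
+ acknowledged bytes are accumulated, and cwnd grows by one SMSS once
+ a full cwnd's worth of data has been acknowledged. The class and
+ attribute names are hypothetical.

```python
class CongestionAvoidance:
    """Illustrative byte-counting state for congestion avoidance."""

    def __init__(self, cwnd, smss):
        self.cwnd = cwnd          # congestion window, in bytes
        self.smss = smss          # sender maximum segment size, in bytes
        self.bytes_acked = 0      # bytes acknowledged since last increase

    def on_new_ack(self, newly_acked):
        """Process an ACK that acknowledges `newly_acked` new bytes."""
        self.bytes_acked += newly_acked
        if self.bytes_acked >= self.cwnd:
            # A full window has been acknowledged: grow by one segment.
            self.bytes_acked -= self.cwnd
            self.cwnd += self.smss
```

+ With this accounting, splitting one acknowledgment into many small
+ ones does not speed up the increase, because only the byte total
+ matters.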
+
+ The treatment of ssthresh on retransmission timeout was clarified.
+ In particular, ssthresh must be set to half the FlightSize on the
+ first retransmission of a given segment and then is held constant on
+ subsequent retransmissions of the same segment.
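+
+ The rule can be sketched as below. The sketch assumes equation (4)
+ of section 3.1, which also places a floor of 2*SMSS on the new
+ ssthresh; the function and parameter names are hypothetical.

```python
def ssthresh_after_timeout(flight_size, smss, current_ssthresh,
                           first_retransmission):
    """Return ssthresh after the retransmission timer fires.

    Only the first retransmission of a segment recomputes ssthresh;
    later retransmissions of the same segment leave it unchanged.
    """
    if first_retransmission:
        return max(flight_size // 2, 2 * smss)
    return current_ssthresh
```

+ Holding ssthresh constant on repeated timeouts of the same segment
+ avoids driving it down again when no new information about the
+ path's capacity has been gained.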
+
+ The description of fast retransmit and fast recovery has been
+ clarified, and the use of Limited Transmit [RFC3042] is now
+ recommended.
+
+ TCPs now MAY limit the number of duplicate ACKs that artificially
+ inflate cwnd during loss recovery to the number of segments
+ outstanding to avoid the duplicate ACK spoofing attack described in
+ [SCWA99].
+
+ The restart window has been changed to min(IW,cwnd) from IW. This
+ behavior was described as "experimental" in [RFC2581].
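+
+ A minimal sketch of the restart window, assuming the idle test from
+ section 4.1 (no data sent for longer than the retransmission
+ timeout). The function and parameter names are hypothetical.

```python
def cwnd_after_idle(cwnd, iw, idle_time, rto):
    """Return the cwnd to use when sending resumes after an idle period.

    If the connection has been idle for longer than one RTO, cwnd is
    capped at the restart window RW = min(IW, cwnd).
    """
    if idle_time > rto:
        return min(iw, cwnd)
    return cwnd
```

+ Taking the minimum means an idle connection whose cwnd was already
+ below IW does not get a larger window merely by going idle.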
+
+ It is now recommended that TCP implementors implement an advanced
+ loss recovery algorithm conforming to the principles outlined in this
+ document.
+
+ The security considerations have been updated to discuss ACK division
+ and recommend byte counting as a counter to this attack.
+
+8. Acknowledgments
+
+ The core algorithms we describe were developed by Van Jacobson
+ [Jac88, Jac90]. In addition, Limited Transmit [RFC3042] was
+ developed in conjunction with Hari Balakrishnan and Sally Floyd. The
+ initial congestion window size specified in this document is a result
+ of work with Sally Floyd and Craig Partridge [RFC2414, RFC3390].
+
+ W. Richard ("Rich") Stevens wrote the first version of this document
+ [RFC2001] and co-authored the second version [RFC2581]. The present
+ version benefits greatly from his clarity and thoughtfulness of
+ description, and we are grateful for Rich's contributions in
+ elucidating TCP congestion control, as well as in more broadly
+ helping us understand numerous issues relating to networking.
+
+ We wish to emphasize that the shortcomings and mistakes of this
+ document are solely the responsibility of the current authors.
+
+ Some of the text from this document is taken from "TCP/IP
+ Illustrated, Volume 1: The Protocols" by W. Richard Stevens
+ (Addison-Wesley, 1994) and "TCP/IP Illustrated, Volume 2: The
+ Implementation" by Gary R. Wright and W. Richard Stevens (Addison-
+ Wesley, 1995). This material is used with the permission of
+ Addison-Wesley.
+
+ Anil Agarwal, Steve Arden, Neal Cardwell, Noritoshi Demizu, Gorry
+ Fairhurst, Kevin Fall, John Heffner, Alfred Hoenes, Sally Floyd,
+ Reiner Ludwig, Matt Mathis, Craig Partridge, and Joe Touch
+ contributed a number of helpful suggestions.
+
+9. References
+
+9.1. Normative References
+
+ [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC
+ 793, September 1981.
+
+ [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
+ Communication Layers", STD 3, RFC 1122, October 1989.
+
+ [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
+ November 1990.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+9.2. Informative References
+
+ [CJ89] Chiu, D. and R. Jain, "Analysis of the Increase/Decrease
+ Algorithms for Congestion Avoidance in Computer Networks",
+ Journal of Computer Networks and ISDN Systems, vol. 17, no.
+ 1, pp. 1-14, June 1989.
+
+ [FF96] Fall, K. and S. Floyd, "Simulation-based Comparisons of
+ Tahoe, Reno and SACK TCP", Computer Communication Review,
+ July 1996, ftp://ftp.ee.lbl.gov/papers/sacks.ps.Z.
+
+ [Hoe96] Hoe, J., "Improving the Start-up Behavior of a Congestion
+ Control Scheme for TCP", In ACM SIGCOMM, August 1996.
+
+ [HTH98] Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP
+ Slow-Start Restart After Idle", Work in Progress, March
+ 1998.
+
+ [Jac88] Jacobson, V., "Congestion Avoidance and Control", Computer
+ Communication Review, vol. 18, no. 4, pp. 314-329, Aug.
+ 1988. ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z.
+
+ [Jac90] Jacobson, V., "Modified TCP Congestion Avoidance
+ Algorithm", end2end-interest mailing list, April 30, 1990.
+ ftp://ftp.isi.edu/end2end/end2end-interest-1990.mail.
+
+ [MM96a] Mathis, M. and J. Mahdavi, "Forward Acknowledgment:
+ Refining TCP Congestion Control", Proceedings of
+ SIGCOMM'96, August, 1996, Stanford, CA. Available from
+ http://www.psc.edu/networking/papers/papers.html.
+
+ [MM96b] Mathis, M. and J. Mahdavi, "TCP Rate-Halving with Bounding
+ Parameters", Technical report. Available from
+ http://www.psc.edu/networking/papers/FACKnotes/current.
+
+ [Pax97] Paxson, V., "End-to-End Internet Packet Dynamics",
+ Proceedings of SIGCOMM '97, Cannes, France, Sep. 1997.
+
+ [RFC813] Clark, D., "Window and Acknowledgement Strategy in TCP",
+ RFC 813, July 1982.
+
+ [RFC2001] Stevens, W., "TCP Slow Start, Congestion Avoidance, Fast
+ Retransmit, and Fast Recovery Algorithms", RFC 2001,
+ January 1997.
+
+ [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
+ Selective Acknowledgment Options", RFC 2018, October 1996.
+
+ [RFC2414] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's
+ Initial Window", RFC 2414, September 1998.
+
+ [RFC2525] Paxson, V., Allman, M., Dawson, S., Fenner, W., Griner, J.,
+ Heavens, I., Lahey, K., Semke, J., and B. Volz, "Known TCP
+ Implementation Problems", RFC 2525, March 1999.
+
+ [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
+ Control", RFC 2581, April 1999.
+
+ [RFC2883] Floyd, S., Mahdavi, J., Mathis, M., and M. Podolsky, "An
+ Extension to the Selective Acknowledgement (SACK) Option
+ for TCP", RFC 2883, July 2000.
+
+ [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
+ Timer", RFC 2988, November 2000.
+
+ [RFC3042] Allman, M., Balakrishnan, H., and S. Floyd, "Enhancing
+ TCP's Loss Recovery Using Limited Transmit", RFC 3042,
+ January 2001.
+
+ [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
+ Explicit Congestion Notification (ECN) to IP", RFC 3168,
+ September 2001.
+
+ [RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's
+ Initial Window", RFC 3390, October 2002.
+
+ [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte
+ Counting (ABC)", RFC 3465, February 2003.
+
+ [RFC3517] Blanton, E., Allman, M., Fall, K., and L. Wang, "A
+ Conservative Selective Acknowledgment (SACK)-based Loss
+ Recovery Algorithm for TCP", RFC 3517, April 2003.
+
+ [RFC3782] Floyd, S., Henderson, T., and A. Gurtov, "The NewReno
+ Modification to TCP's Fast Recovery Algorithm", RFC 3782,
+ April 2004.
+
+ [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU
+ Discovery", RFC 4821, March 2007.
+
+ [SCWA99] Savage, S., Cardwell, N., Wetherall, D., and T. Anderson,
+ "TCP Congestion Control With a Misbehaving Receiver", ACM
+ Computer Communication Review, 29(5), October 1999.
+
+ [Ste94] Stevens, W., "TCP/IP Illustrated, Volume 1: The Protocols",
+ Addison-Wesley, 1994.
+
+ [WS95] Wright, G. and W. Stevens, "TCP/IP Illustrated, Volume 2:
+ The Implementation", Addison-Wesley, 1995.
+
+Authors' Addresses
+
+ Mark Allman
+ International Computer Science Institute (ICSI)
+ 1947 Center Street
+ Suite 600
+ Berkeley, CA 94704-1198
+ Phone: +1 440 235 1792
+ EMail: mallman@icir.org
+ http://www.icir.org/mallman/
+
+
+ Vern Paxson
+ International Computer Science Institute (ICSI)
+ 1947 Center Street
+ Suite 600
+ Berkeley, CA 94704-1198
+ Phone: +1 510/642-4274 x302
+ EMail: vern@icir.org
+ http://www.icir.org/vern/
+
+
+ Ethan Blanton
+ Purdue University Computer Sciences
+ 305 North University Street
+ West Lafayette, IN 47907
+ EMail: eblanton@cs.purdue.edu
+ http://www.cs.purdue.edu/homes/eblanton/
+