Diffstat (limited to 'doc/rfc/rfc2861.txt')
-rw-r--r-- | doc/rfc/rfc2861.txt | 619 |
1 files changed, 619 insertions, 0 deletions
diff --git a/doc/rfc/rfc2861.txt b/doc/rfc/rfc2861.txt new file mode 100644 index 0000000..e5a4998 --- /dev/null +++ b/doc/rfc/rfc2861.txt @@ -0,0 +1,619 @@ + + + + + + +Network Working Group M. Handley +Request for Comments: 2861 J. Padhye +Category: Experimental S. Floyd + ACIRI + June 2000 + + + TCP Congestion Window Validation + +Status of this Memo + + This memo defines an Experimental Protocol for the Internet + community. It does not specify an Internet standard of any kind. + Discussion and suggestions for improvement are requested. + Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2000). All Rights Reserved. + +Abstract + + TCP's congestion window controls the number of packets a TCP flow may + have in the network at any time. However, long periods when the + sender is idle or application-limited can lead to the invalidation of + the congestion window, in that the congestion window no longer + reflects current information about the state of the network. This + document describes a simple modification to TCP's congestion control + algorithms to decay the congestion window cwnd after the transition + from a sufficiently-long application-limited period, while using the + slow-start threshold ssthresh to save information about the previous + value of the congestion window. + + An invalid congestion window also results when the congestion window + is increased (i.e., in TCP's slow-start or congestion avoidance + phases) during application-limited periods, when the previous value + of the congestion window might never have been fully utilized. We + propose that the TCP sender should not increase the congestion window + when the TCP sender has been application-limited (and therefore has + not fully used the current congestion window). We have explored + these algorithms both with simulations and with experiments from an + implementation in FreeBSD. + +1. 
Conventions and Acronyms + + The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, + SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this + document, are to be interpreted as described in [B97]. + + + +Handley, et al. Experimental [Page 1] + +RFC 2861 TCP Congestion Window Validation June 2000 + + +2. Introduction + + TCP's congestion window controls the number of packets a TCP flow may + have in the network at any time. The congestion window is set using + an Additive-Increase, Multiplicative-Decrease (AIMD) mechanism that + probes for available bandwidth, dynamically adapting to changing + network conditions. This AIMD mechanism works well when the sender + continually has data to send, as is typically the case for TCP used + for bulk-data transfer. In contrast, for TCP used with telnet + applications, the data sender often has little or no data to send, + and the sending rate is often determined by the rate at which data is + generated by the user. With the advent of the web, including + developments such as TCP senders with dynamically-created data and + HTTP 1.1 with persistent-connection TCP, the interaction between + application-limited periods (when the sender sends less than is + allowed by the congestion or receiver windows) and network-limited + periods (when the sender is limited by the TCP window) becomes + increasingly important. More precisely, we define a network-limited + period as any period when the sender is sending a full window of + data. + + Long periods when the sender is application-limited can lead to the + invalidation of the congestion window. During periods when the TCP + sender is network-limited, the value of the congestion window is + repeatedly "revalidated" by the successful transmission of a window + of data without loss. 
When the TCP sender is network-limited, there + is an incoming stream of acknowledgements that "clocks out" new data, + giving concrete evidence of recent available bandwidth in the + network. In contrast, during periods when the TCP sender is + application-limited, the estimate of available capacity represented + by the congestion window may become steadily less accurate over time. + In particular, capacity that had once been used by the network- + limited connection might now be used by other traffic. + + Current TCP implementations have a range of behaviors for starting up + after an idle period. Some current TCP implementations slow-start + after an idle period longer than the RTO estimate, as suggested in + [RFC2581] and in the appendix of [J88], while other implementations + don't reduce their congestion window after an idle period. RFC 2581 + [RFC2581] recommends the following: "a TCP SHOULD set cwnd to no more + than RW [the initial window] before beginning transmission if the TCP + has not sent data in an interval exceeding the retransmission + timeout." A proposal for TCP's slow-start after idle has also been + discussed in [HTH98]. The issue of validation of congestion + information during idle periods has also been addressed in contexts + other than TCP and IP, for example in "Use-it or Lose-it" mechanisms + for ATM networks [JKBFL96,JKGFL95]. + + To address the revalidation of the congestion window after an + application-limited period, we propose a simple modification to TCP's + congestion control algorithms to decay the congestion window cwnd + after the transition from a sufficiently-long application-limited + period (i.e., at least one roundtrip time) to a network-limited + period. In particular, we propose that after an idle period, the TCP + sender should reduce its congestion window by half for every RTT that + the flow has remained idle. 
+ + When the congestion window is reduced, the slow-start threshold + ssthresh remains as "memory" of the recent congestion window. + Specifically, ssthresh is never decreased when cwnd is reduced after + an application-limited period; before cwnd is reduced, ssthresh is + set to the maximum of its current value, and half-way between the old + and the new values of cwnd. This use of ssthresh allows a TCP sender + increasing its sending rate after an application-limited period to + quickly slow-start to recover most of the previous value of the + congestion window. To be more precise, if ssthresh is less than 3/4 + cwnd when the congestion window is reduced after an application- + limited period, then ssthresh is increased to 3/4 cwnd before the + reduction of the congestion window. + + An invalid congestion window also results when the congestion window + is increased (i.e., in TCP's slow-start or congestion avoidance + phases) during application-limited periods, when the previous value + of the congestion window might never have been fully utilized. As + far as we know, all current TCP implementations increase the + congestion window when an acknowledgement arrives, if allowed by the + receiver's advertised window and the slow-start or congestion + avoidance window increase algorithm, without checking to see if the + previous value of the congestion window has in fact been used. This + document proposes that the window increase algorithm not be invoked + during application-limited periods [MSML99]. In particular, the TCP + sender should not increase the congestion window when the TCP sender + has been application-limited (and therefore has not fully used the + current congestion window). This restriction prevents the congestion + window from growing arbitrarily large, in the absence of evidence + that the congestion window can be supported by the network. 
From + [MSML99, Section 5.2]: "This restriction assures that [cwnd] only + grows as long as TCP actually succeeds in injecting enough data into + the network to test the path." + + A somewhat-orthogonal problem associated with maintaining a large + congestion window after an application-limited period is that the + sender, with a sudden large amount of data to send after a quiescent + period, might immediately send a full congestion window of back-to- + back packets. This problem of sending large bursts of packets back- + to-back can be effectively handled using rate-based pacing (RBP, + [VH97]), or using a maximum burst size control [FF96]. We would + contend that, even with mechanisms for limiting the sending of back- + to-back packets or pacing packets out over the period of a roundtrip + time, an old congestion window that has not been fully used for some + time cannot be trusted as an indication of the bandwidth currently + available for that flow. We would further contend that the mechanisms to + pace out packets allowed by the congestion window are largely + orthogonal to the algorithms used to determine the appropriate size + of the congestion window. + +3. Description + + When a TCP sender has sufficient data available to fill the available + network capacity for that flow, cwnd and ssthresh get set to + appropriate values for the network conditions. When a TCP sender + stops sending, the flow stops sampling the network conditions, and so + the value of the congestion window may become inaccurate. We believe + the correct conservative behavior under these circumstances is to + decay the congestion window by half for every RTT that the flow + remains inactive. The value of half is a very conservative figure + based on how quickly multiplicative decrease would have decayed the + window in the presence of loss. 
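A minimal Python sketch of the idle decay described above, assuming the window is tracked in bytes and is never reduced below one segment (the MSS floor is taken from the pseudo-code in Section 3.2). The function name decay_idle and its parameters are hypothetical, chosen for illustration only.

```python
def decay_idle(cwnd, idle_rtts, mss):
    """Halve cwnd once per idle RTT, never going below one segment.

    cwnd and mss are in bytes; idle_rtts is the number of whole RTTs
    the flow has been idle. All names here are illustrative assumptions.
    """
    for _ in range(idle_rtts):
        cwnd = max(cwnd // 2, mss)
    return cwnd


if __name__ == "__main__":
    # Three idle RTTs: 64000 -> 32000 -> 16000 -> 8000 bytes.
    print(decay_idle(64000, 3, 1000))
```

With these assumed values, a 64000-byte window that sits idle for three RTTs decays to 8000 bytes, and a small window stops decaying once it reaches one MSS.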
+ + Another possibility is that the sender may not stop sending, but may + become application-limited rather than network-limited, and offer + less data to the network than the congestion window allows to be + sent. In this case the TCP flow is still sampling network + conditions, but is not offering sufficient traffic to be sure that + there is still sufficient capacity in the network for that flow to + send a full congestion window. Under these circumstances we believe + the correct conservative behavior is for the sender to keep track of + the maximum amount of the congestion window used during each RTT, and + to decay the congestion window each RTT to midway between the current + cwnd value and the maximum value used. + + Before the congestion window is reduced, ssthresh is set to the + maximum of its current value and 3/4 cwnd. If the sender then has + more data to send than the decayed cwnd allows, the TCP will slow- + start (perform exponential increase) at least half-way back up to the + old value of cwnd. + + The justification for this value of "3/4 cwnd" is that 3/4 cwnd is a + conservative estimate of the recent average value of the congestion + window, and the TCP should safely be able to slow-start at least up + to this point. For a TCP in steady-state that has been reducing its + congestion window each time the congestion window reached some + maximum value `maxwin', the average congestion window has been 3/4 + maxwin. On average, when the connection becomes application-limited, + cwnd will be 3/4 maxwin, and in this case cwnd itself represents the + average value of the congestion window. However, if the connection + happens to become application-limited when cwnd equals maxwin, then + the average value of the congestion window is given by 3/4 cwnd. 
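The application-limited decay and the ssthresh rule can be sketched in Python as follows. This is an illustration under simplifying assumptions: windows are counted in whole segments, the receiver's advertised window is ignored, and the names validate_app_limited and sawtooth_average are hypothetical. The second function checks the 3/4 figure numerically for an idealized sawtooth that rises one segment at a time from maxwin/2 to maxwin.

```python
def validate_app_limited(cwnd, w_used, ssthresh):
    """One reduction step after an application-limited interval.

    cwnd, w_used and ssthresh are in segments. Returns (cwnd, ssthresh).
    """
    # ssthresh keeps a memory of the old window: it is never lowered,
    # and is raised to 3/4 of the old cwnd if it was below that.
    ssthresh = max(ssthresh, 3 * cwnd // 4)
    # cwnd decays to midway between its old value and what was used.
    cwnd = (cwnd + w_used) // 2
    return cwnd, ssthresh


def sawtooth_average(maxwin):
    """Mean window of an idealized sawtooth from maxwin/2 up to maxwin."""
    samples = range(maxwin // 2, maxwin + 1)
    return sum(samples) / len(samples)


if __name__ == "__main__":
    print(validate_app_limited(40, 10, 16))  # cwnd 40, only 10 segments used
    print(sawtooth_average(40))              # compare with 3/4 * 40
```

With cwnd = 40, W_used = 10 and ssthresh = 16, the sketch yields cwnd = 25 and ssthresh = 30, and the idealized sawtooth with maxwin = 40 averages 30 segments, that is, 3/4 maxwin.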
+ + An alternate possibility would be to set ssthresh to the maximum of + the current value of ssthresh, and the old value of cwnd, allowing + TCP to slow-start all of the way back up to the old value of cwnd. + Further experimentation can be used to evaluate these two options for + setting ssthresh. + + For the separate issue of the increase of the congestion window in + response to an acknowledgement, we believe the correct behavior is + for the sender to increase the congestion window only if the window + was full when the acknowledgment arrived. + + We term this set of modifications to TCP Congestion Window Validation + (CWV) because they are related to ensuring the congestion window is + always a valid reflection of the current network state as probed by + the connection. + +3.1. The basic algorithm for reducing the congestion window + + A key issue in the CWV algorithm is to determine how to apply the + guideline of reducing the congestion window once for every roundtrip + time that the flow is application-limited. We use TCP's + retransmission timer (RTO) as a reasonable upper bound on the + roundtrip time, and reduce the congestion window roughly once per + RTO. + + This basic algorithm could be implemented in TCP as follows: When TCP + sends a new packet it checks to see if more than RTO seconds have + elapsed since the previous packet was sent. If RTO has elapsed, + ssthresh is set to the maximum of 3/4 cwnd and the current value of + ssthresh, and then the congestion window is halved for every RTO that + elapsed since the previous packet was sent. In addition, T_prev is + set to the current time, and W_used is reset to zero. T_prev will be + used to determine the elapsed time since the sender last was network- + limited or had reduced cwnd after an idle period. When the sender is + application-limited, W_used holds the maximum congestion window + actually used since the sender was last network-limited. 
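The idle check made at send time might look like the following Python sketch. It covers only the idle case described so far. The class name CwvIdleState, its fields, the use of floating-point seconds for the TCP clock, and a connection that starts at time zero are all assumptions made for illustration; the receiver window is ignored.

```python
from dataclasses import dataclass


@dataclass
class CwvIdleState:
    cwnd: int             # congestion window, bytes
    ssthresh: int         # slow-start threshold, bytes
    rto: float            # retransmission timeout, seconds
    mss: int              # maximum segment size, bytes
    t_last: float = 0.0   # time the previous packet was sent
    t_prev: float = 0.0   # time of last network-limited send or reduction
    w_used: int = 0       # maximum window used while application-limited

    def on_send(self, now):
        """Apply the idle reduction, if due, when a packet is sent at `now`."""
        idle = now - self.t_last
        if idle >= self.rto:
            # Remember the old window in ssthresh before shrinking cwnd.
            self.ssthresh = max(self.ssthresh, 3 * self.cwnd // 4)
            # Halve cwnd once for every whole RTO that elapsed.
            for _ in range(int(idle // self.rto)):
                self.cwnd = max(self.cwnd // 2, self.mss)
            self.t_prev = now
            self.w_used = 0
        self.t_last = now


if __name__ == "__main__":
    s = CwvIdleState(cwnd=32000, ssthresh=16000, rto=1.0, mss=1000)
    s.on_send(0.5)    # less than one RTO since time zero: no change
    s.on_send(3.75)   # 3.25 s idle: three halvings
    print(s.cwnd, s.ssthresh)
```

With these assumed values, a send after 3.25 seconds of idleness leaves ssthresh at 24000 and cwnd at 4000, so a sender with more data can slow-start back to 24000 bytes.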
+ + The mechanism for determining the number of RTOs in the most recent + idle period could also be implemented by using a timer that expires + every RTO after the last packet was sent instead of a check per + packet - efficiency constraints on different operating systems may + dictate which is more efficient to implement. + + + + +Handley, et al. Experimental [Page 5] + +RFC 2861 TCP Congestion Window Validation June 2000 + + + After TCP sends a packet, it also checks to see if that packet filled + the congestion window. If so, the sender is network-limited, and + sets the variable T_prev to the current TCP clock time, and the + variable W_used to zero. + + When TCP sends a packet that does not fill the congestion window, and + the TCP send queue is empty, then the sender is application-limited. + The sender checks to see if the amount of unacknowledged data is + greater than W_used; if so, W_used is set to the amount of + unacknowledged data. In addition TCP checks to see if the elapsed + time since T_prev is greater than RTO. If so, then the TCP has not + just reduced its congestion window following an idle period. The TCP + has been application-limited rather than network-limited for at least + an entire RTO interval, but for less than two RTO intervals. In this + case, TCP sets ssthresh to the maximum of 3/4 cwnd and the current + value of ssthresh, and reduces its congestion window to + (cwnd+W_used)/2. W_used is then set to zero, and T_prev is set to + the current time, so a further reduction will not take place until at + least another RTO period has elapsed. Thus, during an application- + limited period the CWV algorithm reduces the congestion window once + per RTO. + +3.2. Pseudo-code for reducing the congestion window + + Initially: + T_last = tcpnow, T_prev = tcpnow, W_used = 0 + + After sending a data segment: + If tcpnow - T_last >= RTO + (The sender has been idle.) 
+ ssthresh = max(ssthresh, 3*cwnd/4) + For i=1 To (tcpnow - T_last)/RTO + win = min(cwnd, receiver's declared max window) + cwnd = max(win/2, MSS) + T_prev = tcpnow + W_used = 0 + + T_last = tcpnow + + If window is full + T_prev = tcpnow + W_used = 0 + Else + If no more data is available to send + W_used = max(W_used, amount of unacknowledged data) + If tcpnow - T_prev >= RTO + (The sender has been application-limited.) + ssthresh = max(ssthresh, 3*cwnd/4) + win = min(cwnd, receiver's declared max window) + cwnd = (win + W_used)/2 + T_prev = tcpnow + W_used = 0 + +4. Simulations + + The CWV proposal has been implemented as an option in the network + simulator NS [NS]. The simulations in the validation test suite for + CWV can be run with the command "./test-all-tcp" in the directory + "tcl/test". The simulations show the use of CWV to reduce the + congestion window after a period when the TCP connection was + application-limited, and to limit the increase in the congestion + window when a transfer is application-limited. As the simulations + illustrate, the use of ssthresh to maintain connection history is a + critical part of the Congestion Window Validation algorithm. [HPF99] + discusses these simulations in more detail. + +5. Experiments + + We have implemented the CWV mechanism in the TCP implementation in + FreeBSD 3.2. [HPF99] discusses these experiments in more detail. + + The first experiment examines the effects of the Congestion Window + Validation mechanisms for limiting cwnd increases during + application-limited periods. The experiment used a real ssh + connection through a modem link emulated using Dummynet [Dummynet]. + The link speed is 30Kb/s and the link has five packet buffers + available. Today most modem banks have more buffering available than + this, but the more buffer-limited situation sometimes occurs with + older modems. 
In the first half of the transfer, the user is typing + away over the connection. About halfway through the transfer, the user + lists a moderately large file, which causes a large burst of traffic + to be transmitted. + + For the unmodified TCP, every returning ACK during the first part of + the transfer results in an increase in cwnd. As a result, the large + burst of data arriving from the application to the transport layer is + sent as many back-to-back packets, most of which are lost and + subsequently retransmitted. + + For the modified TCP with Congestion Window Validation, the + congestion window is not increased when the window is not full, and + has been decreased during application-limited periods closer to what + the user actually used. The burst of traffic is now constrained by + the congestion window, resulting in a better-behaved flow with + minimal loss. The end result is that the transfer happens + approximately 30% faster than the transfer without CWV, due to + avoiding retransmission timeouts. + + The second experiment uses a real ssh connection over a real dialup + PPP connection, where the modem bank has much more buffering. For + the unmodified TCP, the initial burst from the large file does not + cause loss, but does cause the RTT to increase to approximately 5 + seconds, where the connection becomes bounded by the receiver's + window. + + For the modified TCP with Congestion Window Validation, the flow is + much better behaved, and produces no large burst of traffic. In this + case the linear increase for cwnd results in a slow increase in the + RTT as the buffer slowly fills. + + For the second experiment, both the modified and the unmodified TCP + finish delivering the data at precisely the same time. This is + because the link has been fully utilized in both cases due to the + modem buffer being larger than the receiver window. 
Clearly a modem + buffer of this size is undesirable due to its effect on the RTT of + competing flows, but it is necessary with current TCP implementations + that produce bursts similar to those shown in the top graph. + +6. Conclusions + + This document has presented several TCP algorithms for Congestion + Window Validation, to be employed after an idle period or a period in + which the sender was application-limited, and before an increase of + the congestion window. The goal of these algorithms is for TCP's + congestion window to reflect recent knowledge of the TCP connection + about the state of the network path, while at the same time keeping + some memory (i.e., in ssthresh) about the earlier state of the path. + We believe that these modifications will be of benefit to both the + network and to the TCP flows themselves, by preventing unnecessary + packet drops due to the TCP sender's failure to update its + information (or lack of information) about current network + conditions. Future work will document and investigate the benefit + provided by these algorithms, using both simulations and experiments. + Additional future work will describe a more complex version of the + CWV algorithm for TCP implementations where the sender does not have + an accurate estimate of the TCP roundtrip time. + + + + + + + + + +Handley, et al. Experimental [Page 8] + +RFC 2861 TCP Congestion Window Validation June 2000 + + +7. References + + [FF96] Fall, K., and Floyd, S., Simulation-based Comparisons of + Tahoe, Reno, and SACK TCP, Computer Communication Review, + V. 26 N. 3, July 1996, pp. 5-21. URL + "http://www.aciri.org/floyd/papers.html". + + [HPF99] Mark Handley, Jitendra Padhye, Sally Floyd, TCP Congestion + Window Validation, UMass CMPSCI Technical Report 99-77, + September 1999. URL "ftp://www- + net.cs.umass.edu/pub/Handley99-tcpq-tr-99-77.ps.gz". + + [HTH98] Amy Hughes, Joe Touch, John Heidemann, "Issues in TCP + Slow-Start Restart After Idle", Work in Progress. 
+ + [J88] Jacobson, V., Congestion Avoidance and Control, Originally + from Proceedings of SIGCOMM '88 (Palo Alto, CA, Aug. + 1988), and revised in 1992. URL "http://www- + nrg.ee.lbl.gov/nrg-papers.html". + + [JKBFL96] Raj Jain, Shiv Kalyanaraman, Rohit Goyal, Sonia Fahmy, and + Fang Lu, Comments on "Use-it or Lose-it", ATM Forum + Document Number: ATM Forum/96-0178, URL + "http://www.netlab.ohio- + state.edu/~jain/atmf/af_rl5b2.htm". + + [JKGFL95] R. Jain, S. Kalyanaraman, R. Goyal, S. Fahmy, and F. Lu, A + Fix for Source End System Rule 5, AF-TM 95-1660, December + 1995, URL "http://www.netlab.ohio- + state.edu/~jain/atmf/af_rl52.htm". + + [MSML99] Matt Mathis, Jeff Semke, Jamshid Mahdavi, and Kevin Lahey, + The Rate-Halving Algorithm for TCP Congestion Control, + June 1999. URL + "http://www.psc.edu/networking/ftp/papers/draft- + ratehalving.txt". + + [NS] NS, the UCB/LBNL/VINT Network Simulator. URL + "http://www-mash.cs.berkeley.edu/ns/". + + [RFC2581] Allman, M., Paxson, V. and W. Stevens, TCP Congestion + Control, RFC 2581, April 1999. + + [VH97] Vikram Visweswaraiah and John Heidemann. Improving Restart + of Idle TCP Connections, Technical Report 97-661, + University of Southern California, November, 1997. + + [Dummynet] Luigi Rizzo, "Dummynet and Forward Error Correction", + Freenix 98, June 1998, New Orleans. URL + "http://info.iet.unipi.it/~luigi/ip_dummynet/". + +8. Security Considerations + + General security considerations concerning TCP congestion control are + discussed in RFC 2581. This document describes an algorithm for one + aspect of those congestion control procedures, and so the + considerations described in RFC 2581 apply to this algorithm also. + There are no known additional security concerns for this specific + algorithm. + +9. 
Authors' Addresses + + Mark Handley + AT&T Center for Internet Research at ICSI (ACIRI) + + Phone: +1 510 666 2946 + EMail: mjh@aciri.org + URL: http://www.aciri.org/mjh/ + + + Jitendra Padhye + AT&T Center for Internet Research at ICSI (ACIRI) + + Phone: +1 510 666 2887 + EMail: padhye@aciri.org + URL: http://www-net.cs.umass.edu/~jitu/ + + + Sally Floyd + AT&T Center for Internet Research at ICSI (ACIRI) + + Phone: +1 510 666 2989 + EMail: floyd@aciri.org + URL: http://www.aciri.org/floyd/ + + + + + + + + + + + + + + +Handley, et al. Experimental [Page 10] + +RFC 2861 TCP Congestion Window Validation June 2000 + + +10. Full Copyright Statement + + Copyright (C) The Internet Society (2000). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. 
+ + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society.