1 file changed, 619 insertions, 0 deletions
diff --git a/doc/rfc/rfc2861.txt b/doc/rfc/rfc2861.txt
new file mode 100644
index 0000000..e5a4998
--- /dev/null
+++ b/doc/rfc/rfc2861.txt
@@ -0,0 +1,619 @@
+
+
+
+
+
+
+Network Working Group                                         M. Handley
+Request for Comments: 2861                                     J. Padhye
+Category: Experimental                                          S. Floyd
+                                                                   ACIRI
+                                                               June 2000
+
+
+                    TCP Congestion Window Validation
+
+Status of this Memo
+
+   This memo defines an Experimental Protocol for the Internet
+   community.  It does not specify an Internet standard of any kind.
+   Discussion and suggestions for improvement are requested.
+   Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+Abstract
+
+   TCP's congestion window controls the number of packets a TCP flow may
+   have in the network at any time.  However, long periods when the
+   sender is idle or application-limited can lead to the invalidation of
+   the congestion window, in that the congestion window no longer
+   reflects current information about the state of the network.  This
+   document describes a simple modification to TCP's congestion control
+   algorithms to decay the congestion window cwnd after the transition
+   from a sufficiently-long application-limited period, while using the
+   slow-start threshold ssthresh to save information about the previous
+   value of the congestion window.
+
+   An invalid congestion window also results when the congestion window
+   is increased (i.e., in TCP's slow-start or congestion avoidance
+   phases) during application-limited periods, when the previous value
+   of the congestion window might never have been fully utilized.  We
+   propose that the TCP sender should not increase the congestion window
+   when the TCP sender has been application-limited (and therefore has
+   not fully used the current congestion window).  We have explored
+   these algorithms both with simulations and with experiments from an
+   implementation in FreeBSD.
+
+1.  Conventions and Acronyms
+
+   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
+   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
+   document, are to be interpreted as described in [B97].
+
+
+
+Handley, et al.               Experimental                      [Page 1]
+
+RFC 2861            TCP Congestion Window Validation           June 2000
+
+
+2. Introduction
+
+   TCP's congestion window controls the number of packets a TCP flow may
+   have in the network at any time.  The congestion window is set using
+   an Additive-Increase, Multiplicative-Decrease (AIMD) mechanism that
+   probes for available bandwidth, dynamically adapting to changing
+   network conditions.  This AIMD mechanism works well when the sender
+   continually has data to send, as is typically the case for TCP used
+   for bulk-data transfer.  In contrast, for TCP used with telnet
+   applications, the data sender often has little or no data to send,
+   and the sending rate is often determined by the rate at which data is
+   generated by the user.  With the advent of the web, including
+   developments such as TCP senders with dynamically-created data and
+   HTTP 1.1 with persistent-connection TCP, the interaction between
+   application-limited periods (when the sender sends less than is
+   allowed by the congestion or receiver windows) and network-limited
+   periods (when the sender is limited by the TCP window) becomes
+   increasingly important.  More precisely, we define a network-limited
+   period as any period when the sender is sending a full window of
+   data.
+
+   Long periods when the sender is application-limited can lead to the
+   invalidation of the congestion window.  During periods when the TCP
+   sender is network-limited, the value of the congestion window is
+   repeatedly "revalidated" by the successful transmission of a window
+   of data without loss.  When the TCP sender is network-limited, there
+   is an incoming stream of acknowledgements that "clocks out" new data,
+   giving concrete evidence of recent available bandwidth in the
+   network.  In contrast, during periods when the TCP sender is
+   application-limited, the estimate of available capacity represented
+   by the congestion window may become steadily less accurate over time.
+   In particular, capacity that had once been used by the network-
+   limited connection might now be used by other traffic.
+
+   Current TCP implementations have a range of behaviors for starting up
+   after an idle period.  Some current TCP implementations slow-start
+   after an idle period longer than the RTO estimate, as suggested in
+   [RFC2581] and in the appendix of [J88], while other implementations
+   don't reduce their congestion window after an idle period.  RFC 2581
+   [RFC2581] recommends the following: "a TCP SHOULD set cwnd to no more
+   than RW [the initial window] before beginning transmission if the TCP
+   has not sent data in an interval exceeding the retransmission
+   timeout."  A proposal for TCP's slow-start after idle has also been
+   discussed in [HTH98].  The issue of validation of congestion
+   information during idle periods has also been addressed in contexts
+   other than TCP and IP, for example in "Use-it or Lose-it" mechanisms
+   for ATM networks [JKBFL96, JKGFL95].
+
+   To address the revalidation of the congestion window after an
+   application-limited period, we propose a simple modification to TCP's
+   congestion control algorithms to decay the congestion window cwnd
+   after the transition from a sufficiently-long application-limited
+   period (i.e., at least one roundtrip time) to a network-limited
+   period.  In particular, we propose that after an idle period, the TCP
+   sender should reduce its congestion window by half for every RTT that
+   the flow has remained idle.
+
+   When the congestion window is reduced, the slow-start threshold
+   ssthresh remains as "memory" of the recent congestion window.
+   Specifically, ssthresh is never decreased when cwnd is reduced after
+   an application-limited period; before cwnd is reduced, ssthresh is
+   set to the maximum of its current value and the value half-way
+   between the old and the new values of cwnd.  This use of ssthresh
+   allows a TCP sender increasing its sending rate after an
+   application-limited period to quickly slow-start to recover most of
+   the previous value of the congestion window.  To be more precise, if
+   ssthresh is less than 3/4 cwnd when the congestion window is reduced
+   after an application-limited period, then ssthresh is increased to
+   3/4 cwnd before the reduction of the congestion window.
+
+   An invalid congestion window also results when the congestion window
+   is increased (i.e., in TCP's slow-start or congestion avoidance
+   phases) during application-limited periods, when the previous value
+   of the congestion window might never have been fully utilized.  As
+   far as we know, all current TCP implementations increase the
+   congestion window when an acknowledgement arrives, if allowed by the
+   receiver's advertised window and the slow-start or congestion
+   avoidance window increase algorithm, without checking to see if the
+   previous value of the congestion window has in fact been used.  This
+   document proposes that the window increase algorithm not be invoked
+   during application-limited periods [MSML99].  In particular, the TCP
+   sender should not increase the congestion window when the TCP sender
+   has been application-limited (and therefore has not fully used the
+   current congestion window).  This restriction prevents the congestion
+   window from growing arbitrarily large, in the absence of evidence
+   that the congestion window can be supported by the network.  From
+   [MSML99, Section 5.2]: "This restriction assures that [cwnd] only
+   grows as long as TCP actually succeeds in injecting enough data into
+   the network to test the path."
+
+   A somewhat-orthogonal problem associated with maintaining a large
+   congestion window after an application-limited period is that the
+   sender, with a sudden large amount of data to send after a quiescent
+   period, might immediately send a full congestion window of back-to-
+   back packets.  This problem of sending large bursts of packets back-
+   to-back can be effectively handled using rate-based pacing (RBP,
+   [VH97]), or using a maximum burst size control [FF96].  We would
+   contend that, even with mechanisms for limiting the sending of back-
+   to-back packets or pacing packets out over the period of a roundtrip
+   time, an old congestion window that has not been fully used for some
+   time cannot be trusted as an indication of the bandwidth currently
+   available for that flow.  We would also contend that the mechanisms
+   for pacing out the packets allowed by the congestion window are
+   largely orthogonal to the algorithms used to determine the
+   appropriate size of the congestion window.
+
+3. Description
+
+   When a TCP sender has sufficient data available to fill the available
+   network capacity for that flow, cwnd and ssthresh get set to
+   appropriate values for the network conditions.  When a TCP sender
+   stops sending, the flow stops sampling the network conditions, and so
+   the value of the congestion window may become inaccurate.  We believe
+   the correct conservative behavior under these circumstances is to
+   decay the congestion window by half for every RTT that the flow
+   remains inactive.  The value of half is a very conservative figure
+   based on how quickly multiplicative decrease would have decayed the
+   window in the presence of loss.
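+
+   As a rough illustration of how quickly this halving shrinks the
+   window, the Python sketch below applies one halving per idle RTT.
+   It is not part of the algorithm specification: the function name and
+   the 1460-byte MSS are assumptions made for the example, and the
+   one-MSS floor is borrowed from the pseudocode in Section 3.2.
+
+```python
+# Illustrative sketch only; the function name and MSS value are assumptions.
+def decay_after_idle(cwnd: int, idle_rtts: int, mss: int = 1460) -> int:
+    """Halve cwnd (in bytes) once per idle RTT, never below one MSS."""
+    for _ in range(idle_rtts):
+        cwnd = max(cwnd // 2, mss)
+    return cwnd
+
+
+if __name__ == "__main__":
+    # Example values: a 64-segment window and a 1460-byte MSS.
+    start = 64 * 1460
+    for rtts in range(5):
+        segments = decay_after_idle(start, rtts) // 1460
+        print(f"idle for {rtts} RTTs: cwnd = {segments} segments")
+```
+
+   Under these assumptions a 64-segment window falls to 32, 16, 8, and
+   then 4 segments over four idle RTTs.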
+
+   Another possibility is that the sender may not stop sending, but may
+   become application-limited rather than network-limited, and offer
+   less data to the network than the congestion window allows to be
+   sent.  In this case the TCP flow is still sampling network
+   conditions, but is not offering sufficient traffic to be sure that
+   there is still sufficient capacity in the network for that flow to
+   send a full congestion window.  Under these circumstances we believe
+   the correct conservative behavior is for the sender to keep track of
+   the maximum amount of the congestion window used during each RTT, and
+   to decay the congestion window each RTT to midway between the current
+   cwnd value and the maximum value used.
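+
+   The per-RTT rule for an application-limited sender can be sketched
+   as follows.  Here max_used is the largest amount of the window
+   actually in flight during the RTT (W_used in Section 3.1).  The
+   sketch assumes that the application keeps the same amount in flight
+   every RTT.  Windows are in segments, and the function name and
+   numbers are illustrative only.
+
+```python
+# Illustrative sketch only; names and example values are assumptions.
+def decay_app_limited(cwnd: int, max_used: int, rtts: int = 1) -> int:
+    """Move cwnd halfway toward the largest window used, once per RTT.
+
+    Assumes the application keeps max_used in flight every RTT.
+    """
+    for _ in range(rtts):
+        cwnd = (cwnd + max_used) // 2
+    return cwnd
+
+
+if __name__ == "__main__":
+    # cwnd of 40 segments, but the application never has more than 8 in flight.
+    print([decay_app_limited(40, 8, rtts) for rtts in range(5)])
+    # prints [40, 24, 16, 12, 10]
+```
+
+   The window converges toward what the application actually uses
+   rather than collapsing toward one segment, as it would when the
+   sender is completely idle.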
+
+   Before the congestion window is reduced, ssthresh is set to the
+   maximum of its current value and 3/4 cwnd.  If the sender then has
+   more data to send than the decayed cwnd allows, the TCP will slow-
+   start (perform exponential increase) at least half-way back up to the
+   old value of cwnd.
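+
+   The sketch below combines the ssthresh update with the window
+   reduction.  It then counts the RTTs of slow-start needed to climb
+   back to ssthresh.  It uses a simplified model in which slow-start
+   doubles cwnd once per RTT.  Windows are in segments, and the
+   function names are hypothetical.
+
+```python
+# Simplified model for illustration; function names are hypothetical.
+from __future__ import annotations
+
+
+def reduce_with_memory(
+    cwnd: float, ssthresh: float, halvings: int
+) -> tuple[float, float]:
+    """Raise ssthresh to at least 3/4 of the old cwnd, then halve cwnd."""
+    ssthresh = max(ssthresh, 3 * cwnd / 4)
+    for _ in range(halvings):
+        cwnd = max(cwnd / 2, 1.0)  # never below one segment
+    return cwnd, ssthresh
+
+
+def slow_start_rtts(cwnd: float, ssthresh: float) -> int:
+    """RTTs of slow-start (one doubling per RTT) until cwnd >= ssthresh."""
+    rtts = 0
+    while cwnd < ssthresh:
+        cwnd *= 2
+        rtts += 1
+    return rtts
+
+
+if __name__ == "__main__":
+    cwnd, ssthresh = reduce_with_memory(cwnd=64, ssthresh=20, halvings=3)
+    print(cwnd, ssthresh, slow_start_rtts(cwnd, ssthresh))  # prints 8.0 48.0 3
+```
+
+   Starting from cwnd = 64 and ssthresh = 20, three halvings leave cwnd
+   at 8 while ssthresh is raised to 48.  In this model, slow-start then
+   reaches ssthresh within three RTTs.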
+
+   The justification for this value of "3/4 cwnd" is that 3/4 cwnd is a
+   conservative estimate of the recent average value of the congestion
+   window, and the TCP should safely be able to slow-start at least up
+   to this point.  For a TCP in steady-state that has been reducing its
+   congestion window each time the congestion window reached some
+   maximum value `maxwin', the average congestion window has been 3/4
+   maxwin.  On average, when the connection becomes application-limited,
+   cwnd will be 3/4 maxwin, and in this case cwnd itself represents the
+   average value of the congestion window.  However, if the connection
+   happens to become application-limited when cwnd equals maxwin, then
+   the average value of the congestion window is given by 3/4 cwnd.
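+
+   The 3/4 figure follows from a short calculation.  Assume the
+   idealized steady state described above, in which cwnd grows linearly
+   from maxwin/2 back to maxwin between losses, and write W for maxwin.
+   The time-averaged window is then:
+
+```latex
+\bar{w} = \frac{1}{W/2}\int_{W/2}^{W} w\,dw
+        = \frac{2}{W}\left(\frac{W^2}{2} - \frac{W^2}{8}\right)
+        = \frac{3W}{4}
+```
+
+   If the connection becomes application-limited when cwnd = W, this
+   average equals 3/4 cwnd, which is the value saved in ssthresh.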
+
+   An alternate possibility would be to set ssthresh to the maximum of
+   the current value of ssthresh, and the old value of cwnd, allowing
+   TCP to slow-start all of the way back up to the old value of cwnd.
+   Further experimentation can be used to evaluate these two options for
+   setting ssthresh.
+
+   For the separate issue of the increase of the congestion window in
+   response to an acknowledgement, we believe the correct behavior is
+   for the sender to increase the congestion window only if the window
+   was full when the acknowledgment arrived.
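+
+   A minimal sketch of this rule follows.  It assumes the RFC 2581
+   per-ACK increases: one MSS per ACK in slow-start, and MSS*MSS/cwnd
+   bytes per ACK in congestion avoidance, rounded up to at least one
+   byte.  It also assumes a hypothetical window_full flag, which is
+   taken to be true when the window was full as the acknowledgement
+   arrived.  Sizes are in bytes.
+
+```python
+# Illustrative sketch; window_full and the function name are assumptions.
+def on_ack(cwnd: int, ssthresh: int, window_full: bool, mss: int = 1460) -> int:
+    """Return cwnd after an ACK, growing it only if the window was full."""
+    if not window_full:
+        # Application-limited: cwnd has not been fully tested, so keep it.
+        return cwnd
+    if cwnd < ssthresh:
+        return cwnd + mss  # slow-start: one MSS per ACK
+    # Congestion avoidance: about one MSS per RTT, at least one byte per ACK.
+    return cwnd + max(mss * mss // cwnd, 1)
+
+
+if __name__ == "__main__":
+    cwnd = 10 * 1460
+    for full in (True, False, False, True):
+        cwnd = on_ack(cwnd, ssthresh=64 * 1460, window_full=full)
+    print(cwnd // 1460)  # prints 12
+```
+
+   Only the two ACKs that arrive while the window is full enlarge cwnd.
+   The two application-limited ACKs leave it unchanged.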
+
+   We term this set of modifications to TCP Congestion Window Validation
+   (CWV) because they are related to ensuring the congestion window is
+   always a valid reflection of the current network state as probed by
+   the connection.
+
+3.1. The basic algorithm for reducing the congestion window
+
+   A key issue in the CWV algorithm is to determine how to apply the
+   guideline of reducing the congestion window once for every roundtrip
+   time that the flow is application-limited.  We use TCP's
+   retransmission timer (RTO) as a reasonable upper bound on the
+   roundtrip time, and reduce the congestion window roughly once per
+   RTO.
+
+   This basic algorithm could be implemented in TCP as follows: When TCP
+   sends a new packet it checks to see if more than RTO seconds have
+   elapsed since the previous packet was sent.  If RTO has elapsed,
+   ssthresh is set to the maximum of 3/4 cwnd and the current value of
+   ssthresh, and then the congestion window is halved for every RTO that
+   elapsed since the previous packet was sent.  In addition, T_prev is
+   set to the current time, and W_used is reset to zero.  T_prev will be
+   used to determine the elapsed time since the sender last was network-
+   limited or had reduced cwnd after an idle period.  When the sender is
+   application-limited, W_used holds the maximum congestion window
+   actually used since the sender was last network-limited.
+
+   The mechanism for determining the number of RTOs in the most recent
+   idle period could also be implemented by using a timer that expires
+   every RTO after the last packet was sent, instead of a check on each
+   packet sent; which of the two is more efficient will depend on the
+   operating system.
+
+   After TCP sends a packet, it also checks to see if that packet filled
+   the congestion window.  If so, the sender is network-limited, and
+   sets the variable T_prev to the current TCP clock time, and the
+   variable W_used to zero.
+
+   When TCP sends a packet that does not fill the congestion window, and
+   the TCP send queue is empty, then the sender is application-limited.
+   The sender checks to see if the amount of unacknowledged data is
+   greater than W_used; if so, W_used is set to the amount of
+   unacknowledged data.  In addition TCP checks to see if the elapsed
+   time since T_prev is greater than RTO.  If so, then the TCP has not
+   just reduced its congestion window following an idle period.  The TCP
+   has been application-limited rather than network-limited for at least
+   an entire RTO interval, but for less than two RTO intervals.  In this
+   case, TCP sets ssthresh to the maximum of 3/4 cwnd and the current
+   value of ssthresh, and reduces its congestion window to
+   (cwnd+W_used)/2.  W_used is then set to zero, and T_prev is set to
+   the current time, so a further reduction will not take place until at
+   least another RTO period has elapsed.  Thus, during an application-
+   limited period the CWV algorithm reduces the congestion window once
+   per RTO.
+
+3.2.  Pseudo-code for reducing the congestion window
+
+   Initially:
+       T_last = tcpnow, T_prev = tcpnow, W_used = 0
+
+   After sending a data segment:
+       If tcpnow - T_last >= RTO
+           (The sender has been idle.)
+           ssthresh =  max(ssthresh, 3*cwnd/4)
+           For i=1  To (tcpnow - T_last)/RTO
+               win =  min(cwnd, receiver's declared max window)
+               cwnd =  max(win/2, MSS)
+           T_prev = tcpnow
+           W_used = 0
+
+       T_last = tcpnow
+
+       If window is full
+           T_prev = tcpnow
+           W_used = 0
+       Else
+           If no more data is available to send
+               W_used =  max(W_used, amount of unacknowledged data)
+               If tcpnow - T_prev >= RTO
+                   (The sender has been application-limited.)
+                   ssthresh =  max(ssthresh, 3*cwnd/4)
+                   win =  min(cwnd, receiver's declared max window)
+                   cwnd = (win + W_used)/2
+                   T_prev = tcpnow
+                   W_used = 0
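+
+   For experimentation, the pseudocode above can be transcribed into
+   Python roughly as follows.  This is a sketch, not a reference
+   implementation.  Sizes are in bytes and times in seconds, the class
+   and method names are hypothetical, and "window is full" is
+   approximated as the unacknowledged data reaching cwnd.
+
+```python
+# Sketch of the Section 3.2 pseudocode; names and the "window is full"
+# test (unacked >= cwnd) are assumptions, not part of the specification.
+from dataclasses import dataclass
+
+
+@dataclass
+class CwvSender:
+    cwnd: int
+    ssthresh: int
+    rwnd: int            # receiver's declared max window
+    rto: float
+    mss: int = 1460
+    t_last: float = 0.0  # when the previous data segment was sent
+    t_prev: float = 0.0  # last network-limited time or cwnd reduction
+    w_used: int = 0      # most unacked data seen while app-limited
+
+    def after_send(self, now: float, unacked: int, more_data: bool) -> None:
+        """Run the CWV checks after sending a data segment at time `now`."""
+        if now - self.t_last >= self.rto:
+            # The sender has been idle: halve cwnd once per elapsed RTO.
+            self.ssthresh = max(self.ssthresh, 3 * self.cwnd // 4)
+            for _ in range(int((now - self.t_last) // self.rto)):
+                win = min(self.cwnd, self.rwnd)
+                self.cwnd = max(win // 2, self.mss)
+            self.t_prev = now
+            self.w_used = 0
+        self.t_last = now
+
+        if unacked >= self.cwnd:
+            # Window is full: the sender is network-limited.
+            self.t_prev = now
+            self.w_used = 0
+        elif not more_data:
+            # Application-limited: track the largest window actually used.
+            self.w_used = max(self.w_used, unacked)
+            if now - self.t_prev >= self.rto:
+                # Decay cwnd halfway toward w_used, at most once per RTO.
+                self.ssthresh = max(self.ssthresh, 3 * self.cwnd // 4)
+                win = min(self.cwnd, self.rwnd)
+                self.cwnd = (win + self.w_used) // 2
+                self.t_prev = now
+                self.w_used = 0
+
+
+if __name__ == "__main__":
+    seg = 1460
+    idle = CwvSender(cwnd=40 * seg, ssthresh=20 * seg, rwnd=64 * seg, rto=1.0)
+    idle.after_send(now=3.0, unacked=seg, more_data=True)
+    print(idle.cwnd // seg, idle.ssthresh // seg)  # prints 5 30
+
+    app = CwvSender(cwnd=40 * seg, ssthresh=20 * seg, rwnd=64 * seg, rto=1.0)
+    app.after_send(now=0.5, unacked=8 * seg, more_data=False)
+    app.after_send(now=1.2, unacked=6 * seg, more_data=False)
+    print(app.cwnd // seg, app.ssthresh // seg)  # prints 24 30
+```
+
+   In the first example, three RTOs of idleness halve a 40-segment
+   window three times, to 5 segments, and raise ssthresh to 30 segments
+   (3/4 of the old cwnd).  In the second example, the sender never has
+   more than 8 segments in flight.  Its window is cut to (40 + 8)/2 = 24
+   segments once an RTO has passed since T_prev.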
+
+4. Simulations
+
+   The CWV proposal has been implemented as an option in the network
+   simulator NS [NS].  The simulations in the validation test suite for
+   CWV can be run with the command "./test-all-tcp" in the directory
+   "tcl/test".  The simulations show the use of CWV to reduce the
+   congestion window after a period when the TCP connection was
+   application-limited, and to limit the increase in the congestion
+   window when a transfer is application-limited.  As the simulations
+   illustrate, the use of ssthresh to maintain connection history is a
+   critical part of the Congestion Window Validation algorithm.  [HPF99]
+   discusses these simulations in more detail.
+
+5. Experiments
+
+   We have implemented the CWV mechanism in the TCP implementation in
+   FreeBSD 3.2.  [HPF99] discusses these experiments in more detail.
+
+   The first experiment examines the effects of the Congestion Window
+   Validation mechanisms for limiting cwnd increases during
+   application-limited periods.  The experiment used a real ssh
+   connection through a modem link emulated using Dummynet [Dummynet].
+   The link speed is 30Kb/s and the link has five packet buffers
+   available.  Today most modem banks have more buffering available than
+   this, but the more buffer-limited situation sometimes occurs with
+   older modems.  In the first half of the transfer, the user is typing
+   away over the connection.  About half way through the time, the user
+   lists a moderately large file, which causes a large burst of traffic
+   to be transmitted.
+
+   For the unmodified TCP, every returning ACK during the first part of
+   the transfer results in an increase in cwnd.  As a result, the large
+   burst of data arriving from the application to the transport layer is
+   sent as many back-to-back packets, most of which get lost and
+   subsequently retransmitted.
+
+   For the modified TCP with Congestion Window Validation, the
+   congestion window is not increased when the window is not full, and
+   has been decreased during application-limited periods toward what
+   the user actually used.  The burst of traffic is now constrained by
+   the congestion window, resulting in a better-behaved flow with
+   minimal loss.  The end result is that the transfer happens
+   approximately 30% faster than the transfer without CWV, due to
+   avoiding retransmission timeouts.
+
+   The second experiment uses a real ssh connection over a real dialup
+   ppp connection, where the modem bank has much more buffering.  For
+   the unmodified TCP, the initial burst from the large file does not
+   cause loss, but does cause the RTT to increase to approximately 5
+   seconds, where the connection becomes bounded by the receiver's
+   window.
+
+   For the modified TCP with Congestion Window Validation, the flow is
+   much better behaved, and produces no large burst of traffic.  In this
+   case the linear increase for cwnd results in a slow increase in the
+   RTT as the buffer slowly fills.
+
+   For the second experiment, both the modified and the unmodified TCP
+   finish delivering the data at precisely the same time.  This is
+   because the link has been fully utilized in both cases due to the
+   modem buffer being larger than the receiver window.  Clearly a modem
+   buffer of this size is undesirable due to its effect on the RTT of
+   competing flows, but it is necessary with current TCP implementations
+   that produce bursts similar to those shown in the top graph.
+
+6. Conclusions
+
+   This document has presented several TCP algorithms for Congestion
+   Window Validation, to be employed after an idle period or a period in
+   which the sender was application-limited, and before an increase of
+   the congestion window.  The goal of these algorithms is for TCP's
+   congestion window to reflect recent knowledge of the TCP connection
+   about the state of the network path, while at the same time keeping
+   some memory (i.e., in ssthresh) about the earlier state of the path.
+   We believe that these modifications will be of benefit to both the
+   network and to the TCP flows themselves, by preventing unnecessary
+   packet drops due to the TCP sender's failure to update its
+   information (or lack of information) about current network
+   conditions.  Future work will document and investigate the benefit
+   provided by these algorithms, using both simulations and experiments.
+   Additional future work will describe a more complex version of the
+   CWV algorithm for TCP implementations where the sender does not have
+   an accurate estimate of the TCP roundtrip time.
+
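+7. References
+
+   [B97]      Bradner, S., Key words for use in RFCs to Indicate
+              Requirement Levels, BCP 14, RFC 2119, March 1997.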
+
+   [FF96]     Fall, K., and Floyd, S., Simulation-based Comparisons of
+              Tahoe, Reno, and SACK TCP, Computer Communication Review,
+              V. 26 N. 3, July 1996, pp. 5-21.  URL
+              "http://www.aciri.org/floyd/papers.html".
+
+   [HPF99]    Mark Handley, Jitendra Padhye, Sally Floyd, TCP Congestion
+              Window Validation, UMass CMPSCI Technical Report 99-77,
+              September 1999.  URL "ftp://www-
+              net.cs.umass.edu/pub/Handley99-tcpq-tr-99-77.ps.gz".
+
+   [HTH98]    Amy Hughes, Joe Touch, John Heidemann, "Issues in TCP
+              Slow-Start Restart After Idle", Work in Progress.
+
+   [J88]      Jacobson, V., Congestion Avoidance and Control, Originally
+              from Proceedings of SIGCOMM '88 (Palo Alto, CA, Aug.
+              1988), and revised in 1992.  URL "http://www-
+              nrg.ee.lbl.gov/nrg-papers.html".
+
+   [JKBFL96]  Raj Jain, Shiv Kalyanaraman, Rohit Goyal, Sonia Fahmy, and
+              Fang Lu, Comments on "Use-it or Lose-it", ATM Forum
+              Document Number:  ATM Forum/96-0178, URL
+              "http://www.netlab.ohio-
+              state.edu/~jain/atmf/af_rl5b2.htm".
+
+   [JKGFL95]  R. Jain, S. Kalyanaraman, R. Goyal, S. Fahmy, and F. Lu, A
+              Fix for Source End System Rule 5, AF-TM 95-1660, December
+              1995, URL "http://www.netlab.ohio-
+              state.edu/~jain/atmf/af_rl52.htm".
+
+   [MSML99]   Matt Mathis, Jeff Semke, Jamshid Mahdavi, and Kevin Lahey,
+              The Rate-Halving Algorithm for TCP Congestion Control,
+              June 1999.  URL
+              "http://www.psc.edu/networking/ftp/papers/draft-
+              ratehalving.txt".
+
+   [NS]       NS, the UCB/LBNL/VINT Network Simulator.  URL
+              "http://www-mash.cs.berkeley.edu/ns/".
+
+   [RFC2581]  Allman, M., Paxson, V. and W. Stevens, TCP Congestion
+              Control, RFC 2581, April 1999.
+
+   [VH97]     Vikram Visweswaraiah and John Heidemann. Improving Restart
+              of Idle TCP Connections, Technical Report 97-661,
+              University of Southern California, November, 1997.
+
+   [Dummynet] Luigi Rizzo, "Dummynet and Forward Error Correction",
+              Freenix 98, June 1998, New Orleans.  URL
+              "http://info.iet.unipi.it/~luigi/ip_dummynet/".
+
+8. Security Considerations
+
+   General security considerations concerning TCP congestion control are
+   discussed in RFC 2581.  This document describes an algorithm for one
+   aspect of those congestion control procedures, and so the
+   considerations described in RFC 2581 apply to this algorithm also.
+   There are no known additional security concerns for this specific
+   algorithm.
+
+9. Authors' Addresses
+
+   Mark Handley
+   AT&T Center for Internet Research at ICSI (ACIRI)
+
+   Phone: +1 510 666 2946
+   EMail: mjh@aciri.org
+   URL: http://www.aciri.org/mjh/
+
+
+   Jitendra Padhye
+   AT&T Center for Internet Research at ICSI (ACIRI)
+
+   Phone: +1 510 666 2887
+   EMail: padhye@aciri.org
+   URL: http://www-net.cs.umass.edu/~jitu/
+
+
+   Sally Floyd
+   AT&T Center for Internet Research at ICSI (ACIRI)
+
+   Phone: +1 510 666 2989
+   EMail: floyd@aciri.org
+   URL:  http://www.aciri.org/floyd/
+
+10. Full Copyright Statement
+
+   Copyright (C) The Internet Society (2000).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.