Internet Engineering Task Force (IETF)                P. Balasubramanian
Request for Comments: 9406                                      Confluent
Category: Standards Track                                        Y. Huang
ISSN: 2070-1721                                                  M. Olson
                                                                Microsoft
                                                                 May 2023


                 HyStart++: Modified Slow Start for TCP

Abstract

   This document describes HyStart++, a simple modification to the slow
   start phase of congestion control algorithms. Slow start can
   overshoot the ideal send rate in many cases, causing high packet loss
   and poor performance. HyStart++ uses an increase in round-trip delay
   as a heuristic to find an exit point before possible overshoot. It
   also adds a mitigation to prevent jitter from causing premature slow
   start exit.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF). It represents the consensus of the IETF community. It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG). Further information on
   Internet Standards is available in Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc9406.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must include Revised
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Definitions
   4.  HyStart++ Algorithm
     4.1.  Summary
     4.2.  Algorithm Details
     4.3.  Tuning Constants and Other Considerations
   5.  Deployments and Performance Evaluations
   6.  Security Considerations
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Acknowledgments
   Authors' Addresses

1.  Introduction

   [RFC5681] describes the slow start congestion control algorithm for
   TCP. The slow start algorithm is used when the congestion window
   (cwnd) is less than the slow start threshold (ssthresh). During slow
   start, in the absence of packet loss signals, TCP increases the cwnd
   exponentially to probe the network capacity. This fast growth can
   overshoot the ideal sending rate and cause significant packet loss
   that cannot always be recovered efficiently.

   HyStart++ builds upon Hybrid Start (HyStart), originally described in
   [HyStart]. HyStart++ uses an increase in round-trip delay as a signal
   to exit slow start before potential packet loss occurs as a result of
   overshoot. This is one of two algorithms specified in [HyStart] for
   finding a safe exit point for slow start. After the slow start exit,
   a new Conservative Slow Start (CSS) phase is used to determine
   whether the slow start exit was premature and, if so, to resume slow
   start. This mitigation improves performance in the presence of
   jitter. HyStart++ reduces packet loss and retransmissions, and
   improves goodput in lab measurements and real-world deployments.

   While this document describes HyStart++ for TCP, it can also be used
   for other transport protocols that use slow start, such as QUIC
   [RFC9002] or the Stream Control Transmission Protocol (SCTP)
   [RFC9260].

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Definitions

   To aid the reader, we repeat some definitions from [RFC5681]:

   SENDER MAXIMUM SEGMENT SIZE (SMSS):  The size of the largest segment
      that the sender can transmit. This value can be based on the
      maximum transmission unit of the network, the Path MTU Discovery
      algorithm [RFC1191] [RFC4821], RMSS (see next item), or other
      factors. The size does not include the TCP/IP headers and
      options.

   RECEIVER MAXIMUM SEGMENT SIZE (RMSS):  The size of the largest
      segment that the receiver is willing to accept. This is the value
      specified in the MSS option sent by the receiver during connection
      startup. Or, if the MSS option is not used, it is 536 bytes
      [RFC1122]. The size does not include the TCP/IP headers and
      options.

   RECEIVER WINDOW (rwnd):  The most recently advertised receiver
      window.

   CONGESTION WINDOW (cwnd):  A TCP state variable that limits the
      amount of data a TCP can send. At any given time, a TCP MUST NOT
      send data with a sequence number higher than the sum of the
      highest acknowledged sequence number and the minimum of the cwnd
      and rwnd.

4.  HyStart++ Algorithm

4.1.  Summary

   [HyStart] specifies two algorithms (a "Delay Increase" algorithm and
   an "Inter-Packet Arrival" algorithm) to be run in parallel to detect
   that the sending rate has reached capacity.
   In practice, the Inter-Packet Arrival algorithm does not perform well
   and is not able to detect congestion early, primarily due to ACK
   compression. The idea of the Delay Increase algorithm is to look for
   spikes in RTT (round-trip time), which suggest that the bottleneck
   buffer is filling up.

   In HyStart++, a TCP sender uses standard slow start and then uses the
   Delay Increase algorithm to trigger an exit from slow start. But
   instead of going straight from slow start to congestion avoidance,
   the sender spends a number of RTTs in a Conservative Slow Start (CSS)
   phase to determine whether the exit from slow start was premature.
   During CSS, the congestion window grows exponentially in a fashion
   similar to regular slow start, but with a smaller exponential base,
   resulting in less aggressive growth. If the RTT decreases during CSS,
   the sender concludes that the RTT spike was not caused by congestion
   resulting from the connection sending faster than the ideal send
   rate, and the connection resumes slow start. If the RTT inflation
   persists throughout CSS, the connection enters congestion avoidance.

4.2.  Algorithm Details

   The following pseudocode uses a limit, L, to control the
   aggressiveness of the cwnd increase during both standard slow start
   and CSS. While an arriving ACK may newly acknowledge an arbitrary
   number of bytes, the HyStart++ algorithm limits the number of those
   bytes applied to increase the cwnd to L*SMSS bytes.

   lastRoundMinRTT and currentRoundMinRTT are initially set to infinity.
   currRTT is the RTT sampled from the latest incoming ACK; it is also
   initially set to infinity.

      lastRoundMinRTT = infinity
      currentRoundMinRTT = infinity
      currRTT = infinity

   HyStart++ measures rounds using sequence numbers, as follows:

   *  Define windowEnd as a sequence number initialized to SND.NXT.

   *  When windowEnd is ACKed, the current round ends and windowEnd is
      set to SND.NXT.
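
   The round bookkeeping above could be sketched in Python roughly as
   follows. This is an illustration, not part of the specification: the
   names RoundTracker, seq_geq, and rounds_completed are hypothetical,
   and the sketch assumes that the caller supplies the cumulative ACK
   number and the current SND.NXT with each ACK. It also assumes that
   windowEnd counts as ACKed once the cumulative ACK number reaches it,
   and it compares 32-bit sequence numbers modulo 2**32 so that
   wraparound is handled.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_geq(a: int, b: int) -> bool:
    """Return True if sequence number a is at or after b, modulo 2**32."""
    return (a - b) % SEQ_MOD < SEQ_MOD // 2


class RoundTracker:
    """Hypothetical helper that delimits HyStart++ rounds by sequence number."""

    def __init__(self, snd_nxt: int) -> None:
        # Define windowEnd as a sequence number initialized to SND.NXT.
        self.window_end = snd_nxt % SEQ_MOD
        self.rounds_completed = 0

    def on_ack(self, ack_num: int, snd_nxt: int) -> bool:
        """Process a cumulative ACK; return True if it ends the current round."""
        if seq_geq(ack_num, self.window_end):
            # windowEnd has been ACKed: the round ends, and the next round
            # covers everything sent up to the current SND.NXT.
            self.window_end = snd_nxt % SEQ_MOD
            self.rounds_completed += 1
            return True
        return False


if __name__ == "__main__":
    tracker = RoundTracker(snd_nxt=14600)        # ten 1460-byte segments sent
    print(tracker.on_ack(1460, snd_nxt=17520))   # False: round still open
    print(tracker.on_ack(14600, snd_nxt=43800))  # True: the round ends
    print(tracker.window_end)                    # 43800
```

   Passing SND.NXT into on_ack, rather than reading it from a real
   socket, keeps the sketch independent of any particular TCP stack.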

   At the start of each round during standard slow start [RFC5681] and
   CSS, initialize the variables used to compute the last round's and
   current round's minimum RTT:

      lastRoundMinRTT = currentRoundMinRTT
      currentRoundMinRTT = infinity
      rttSampleCount = 0

   For each arriving ACK in slow start, where N is the number of
   previously unacknowledged bytes acknowledged in the arriving ACK:

      Update the cwnd:

         cwnd = cwnd + min(N, L * SMSS)

      Keep track of the minimum observed RTT:

         currentRoundMinRTT = min(currentRoundMinRTT, currRTT)
         rttSampleCount += 1

      For rounds where at least N_RTT_SAMPLE RTT samples have been
      obtained and currentRoundMinRTT and lastRoundMinRTT are valid,
      check whether a delay increase triggers slow start exit:

         if ((rttSampleCount >= N_RTT_SAMPLE) AND
             (currentRoundMinRTT != infinity) AND
             (lastRoundMinRTT != infinity))
           RttThresh = max(MIN_RTT_THRESH,
             min(lastRoundMinRTT / MIN_RTT_DIVISOR, MAX_RTT_THRESH))
           if (currentRoundMinRTT >= (lastRoundMinRTT + RttThresh))
             cssBaselineMinRtt = currentRoundMinRTT
             exit slow start and enter CSS

   For each arriving ACK in CSS, where N is the number of previously
   unacknowledged bytes acknowledged in the arriving ACK:

      Update the cwnd:

         cwnd = cwnd + (min(N, L * SMSS) / CSS_GROWTH_DIVISOR)

      Keep track of the minimum observed RTT:

         currentRoundMinRTT = min(currentRoundMinRTT, currRTT)
         rttSampleCount += 1

      For CSS rounds where at least N_RTT_SAMPLE RTT samples have been
      obtained, check whether the current round's minimum RTT has
      dropped below the baseline (cssBaselineMinRtt), which indicates
      that the slow start exit was spurious:

         if ((rttSampleCount >= N_RTT_SAMPLE) AND
             (currentRoundMinRTT < cssBaselineMinRtt))
           cssBaselineMinRtt = infinity
           resume slow start including HyStart++

   CSS lasts at most CSS_ROUNDS rounds. If the transition into CSS
   happens in the middle of a round, that partial round counts towards
   the limit.
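
   As a non-normative illustration, the per-round and per-ACK rules
   above might be combined into a small state machine like the following
   Python sketch. The class HyStartPP, its method names, the state
   strings, and the use of seconds for RTT values are assumptions made
   for this example. RTT sampling, round detection (for example, with
   windowEnd as described earlier), and the actual congestion avoidance
   and loss recovery logic are left to the surrounding implementation.
   The CSS check includes the N_RTT_SAMPLE condition stated in the prose.

```python
import math

# Constants RECOMMENDED in Section 4.3 (RTTs expressed in seconds in this sketch).
MIN_RTT_THRESH = 0.004
MAX_RTT_THRESH = 0.016
MIN_RTT_DIVISOR = 8
N_RTT_SAMPLE = 8
CSS_GROWTH_DIVISOR = 4
CSS_ROUNDS = 5

SLOW_START, CSS, CONG_AVOID = "slow_start", "css", "congestion_avoidance"


class HyStartPP:
    """Hypothetical, simplified HyStart++ sender state; cwnd is in bytes."""

    def __init__(self, cwnd: int, smss: int, paced: bool = False) -> None:
        self.cwnd = cwnd
        self.smss = smss
        self.limit = math.inf if paced else 8  # L
        self.ssthresh = math.inf
        self.state = SLOW_START
        self.last_round_min_rtt = math.inf
        self.current_round_min_rtt = math.inf
        self.rtt_sample_count = 0
        self.css_baseline_min_rtt = math.inf
        self.css_rounds = 0  # CSS rounds completed, counting a partial first round

    def start_round(self) -> None:
        """Call when windowEnd is ACKed, i.e., when a new round begins."""
        self.last_round_min_rtt = self.current_round_min_rtt
        self.current_round_min_rtt = math.inf
        self.rtt_sample_count = 0
        if self.state == CSS:
            self.css_rounds += 1  # the round that just ended was (partly) in CSS
            if self.css_rounds >= CSS_ROUNDS:
                self._enter_congestion_avoidance()

    def on_ack(self, newly_acked: int, rtt: float) -> None:
        """Process an ACK newly acknowledging `newly_acked` bytes, with RTT sample `rtt`."""
        if self.state == CONG_AVOID:
            return  # congestion avoidance itself is outside this sketch
        increase = min(newly_acked, self.limit * self.smss)
        if self.state == CSS:
            increase //= CSS_GROWTH_DIVISOR  # whole bytes; the remainder is dropped
        self.cwnd += increase
        self.current_round_min_rtt = min(self.current_round_min_rtt, rtt)
        self.rtt_sample_count += 1
        if self.rtt_sample_count < N_RTT_SAMPLE:
            return  # not enough RTT samples in this round yet
        if self.state == SLOW_START:
            if self.current_round_min_rtt != math.inf and self.last_round_min_rtt != math.inf:
                rtt_thresh = max(MIN_RTT_THRESH,
                                 min(self.last_round_min_rtt / MIN_RTT_DIVISOR,
                                     MAX_RTT_THRESH))
                if self.current_round_min_rtt >= self.last_round_min_rtt + rtt_thresh:
                    self.css_baseline_min_rtt = self.current_round_min_rtt
                    self.state, self.css_rounds = CSS, 0  # exit slow start, enter CSS
        elif self.current_round_min_rtt < self.css_baseline_min_rtt:
            # RTT fell below the CSS baseline: the exit was spurious, resume slow start.
            self.css_baseline_min_rtt = math.inf
            self.state = SLOW_START

    def on_congestion_signal(self) -> None:
        """Loss or ECN marking seen during slow start or CSS.

        Only the HyStart++ transition is modeled; the sender's normal loss
        response (for example, reducing cwnd) is left out.
        """
        if self.state != CONG_AVOID:
            self._enter_congestion_avoidance()

    def _enter_congestion_avoidance(self) -> None:
        self.ssthresh = self.cwnd
        self.state = CONG_AVOID
```

   Keeping start_round separate from on_ack lets a caller drive it from
   whatever round detection it already has, such as a windowEnd
   comparison.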

   If CSS_ROUNDS rounds are complete, enter congestion avoidance by
   setting the ssthresh to the current cwnd:

      ssthresh = cwnd

   If loss or Explicit Congestion Notification (ECN) marking is observed
   at any time during standard slow start or CSS, enter congestion
   avoidance by setting the ssthresh to the current cwnd:

      ssthresh = cwnd

4.3.  Tuning Constants and Other Considerations

   It is RECOMMENDED that a HyStart++ implementation use the following
   constants:

      MIN_RTT_THRESH = 4 msec
      MAX_RTT_THRESH = 16 msec
      MIN_RTT_DIVISOR = 8
      N_RTT_SAMPLE = 8
      CSS_GROWTH_DIVISOR = 4
      CSS_ROUNDS = 5
      L = infinity if paced, L = 8 if non-paced

   These constants have been determined with lab measurements and
   real-world deployments. An implementation MAY tune them for different
   network characteristics.

   The delay increase sensitivity is determined by MIN_RTT_THRESH and
   MAX_RTT_THRESH. Smaller values of MIN_RTT_THRESH may cause spurious
   exits from slow start. Larger values of MAX_RTT_THRESH may result in
   slow start not exiting until loss is encountered for connections on
   large RTT paths.

   MIN_RTT_DIVISOR sets the delay threshold as a fraction
   (1/MIN_RTT_DIVISOR) of the previous round's minimum RTT, before the
   threshold is clamped to the range [MIN_RTT_THRESH, MAX_RTT_THRESH]. A
   smaller value means a larger threshold and thus less sensitivity to
   delay increase, and vice versa.

   While all TCP implementations are REQUIRED to take at least one RTT
   sample each round, implementations of HyStart++ are RECOMMENDED to
   take at least N_RTT_SAMPLE RTT samples. Using lower values of
   N_RTT_SAMPLE will lower the accuracy of the measured RTT for the
   round; higher values will improve accuracy at the cost of more
   processing.

   CSS_GROWTH_DIVISOR MUST be at least 2. A value of 1 results in the
   same aggressive behavior as regular slow start. Values larger than 4
   will cause the algorithm to be less aggressive and possibly less
   performant.
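
   To make the effect of the recommended constants concrete, the
   following Python sketch evaluates the clamped delay threshold from
   Section 4.2 and an idealized per-round cwnd growth factor for a given
   CSS_GROWTH_DIVISOR. It is an illustration only; the function names
   rtt_thresh_ms, css_round_growth, and choose_l are hypothetical. The
   growth figure assumes that each round acknowledges roughly one cwnd
   worth of bytes and that the L cap never binds, as with a paced sender
   using L = infinity.

```python
import math

MIN_RTT_THRESH_MS = 4.0
MAX_RTT_THRESH_MS = 16.0
MIN_RTT_DIVISOR = 8


def rtt_thresh_ms(last_round_min_rtt_ms: float) -> float:
    """RttThresh = max(MIN_RTT_THRESH, min(lastRoundMinRTT / MIN_RTT_DIVISOR, MAX_RTT_THRESH))."""
    return max(MIN_RTT_THRESH_MS,
               min(last_round_min_rtt_ms / MIN_RTT_DIVISOR, MAX_RTT_THRESH_MS))


def css_round_growth(css_growth_divisor: int) -> float:
    """Idealized cwnd multiplier per round: the round's ACKs add cwnd / divisor."""
    return 1 + 1 / css_growth_divisor


def choose_l(paced: bool) -> float:
    """Recommended L: unlimited when paced, 8 (segments per ACK) otherwise."""
    return math.inf if paced else 8


if __name__ == "__main__":
    for rtt in (10.0, 32.0, 80.0, 128.0, 200.0):
        print(rtt, rtt_thresh_ms(rtt))  # thresholds: 4.0, 4.0, 10.0, 16.0, 16.0
    print(css_round_growth(1), css_round_growth(2), css_round_growth(4))  # 2.0 1.5 1.25
    print(css_round_growth(4) ** 5)     # 3.0517578125, about 3x over 5 rounds
```

   Because the threshold is lastRoundMinRTT / 8 clamped to [4, 16] msec,
   it scales with the RTT only for minimum RTTs between 32 and 128 msec;
   shorter paths all use 4 msec and longer paths all use 16 msec. Under
   the same idealized assumptions, CSS_ROUNDS = 5 rounds of CSS with
   CSS_GROWTH_DIVISOR = 4 grow cwnd by at most about 3x, compared with
   32x for five rounds of standard slow start.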

   Smaller values of CSS_ROUNDS may miss detecting jitter, and larger
   values may limit performance.

   Packet pacing [ASA00] is a possible mechanism to avoid large bursts
   and their associated harm. A paced TCP implementation SHOULD use
   L = infinity. Burst concerns are mitigated by pacing, and this
   setting allows for optimal cwnd growth on modern networks.

   For TCP implementations that pace to mitigate burst concerns, L
   values smaller than infinity may suffer performance problems due to
   slow cwnd growth in high-speed networks. For non-paced TCP
   implementations, L values smaller than 8 may suffer performance
   problems due to slow cwnd growth in high-speed networks; L values
   larger than 8 may cause an increase in burstiness and thereby loss
   rates, and result in poor performance.

   An implementation SHOULD use HyStart++ only for the initial slow
   start (when ssthresh is still at its initial, arbitrarily high value
   per [RFC5681]) and fall back to using standard slow start for the
   remainder of the connection lifetime. This is acceptable because
   subsequent slow starts will use the discovered ssthresh value to exit
   slow start and avoid the overshoot problem. An implementation MAY
   use HyStart++ to grow the restart window [RFC5681] after a long idle
   period.

   In application-limited scenarios, the amount of data in flight could
   fall below the bandwidth-delay product (BDP) and result in smaller
   RTT samples, which can trigger an exit back to slow start. It is
   expected that a connection might oscillate between CSS and slow start
   in such scenarios. But this behavior will neither result in a
   connection prematurely entering congestion avoidance nor cause
   overshooting compared to slow start.

5.  Deployments and Performance Evaluations

   At the time of this writing, HyStart++ as described in this document
   has been enabled by default for all TCP connections in the Windows
   operating system for over two years with pacing disabled and an
   actual L = 8.

   In lab measurements with Windows TCP, HyStart++ shows goodput
   improvements as well as reductions in packet loss and retransmissions
   compared to standard slow start. For example, across a variety of
   tests on a 100 Mbps link with a bottleneck buffer size of
   bandwidth-delay product, HyStart++ reduces bytes retransmitted by 50%
   and retransmission timeouts (RTOs) by 36%.

   In an A/B test where we compared an implementation of HyStart++
   (based on an earlier draft version of this document) to standard slow
   start across a large Windows device population, out of 52 billion TCP
   connections, 0.7% of connections move from 1 RTO to 0 RTOs and
   another 0.7% of connections move from 2 RTOs to 1 RTO with HyStart++.
   This test did not focus on send-heavy connections, and the impact on
   send-heavy connections is likely much higher. We plan to conduct
   more such production experiments to gather more data in the future.

6.  Security Considerations

   HyStart++ enhances slow start and inherits the general security
   considerations discussed in [RFC5681].

   An attacker can cause HyStart++ to exit slow start prematurely and
   impair the performance of a TCP connection by, for example, dropping
   data packets or their acknowledgments.

   The ACK division attack outlined in [SCWA99] does not affect
   HyStart++ because the congestion window increase in HyStart++ is
   based on the number of bytes newly acknowledged in each arriving ACK
   rather than on a fixed amount per arriving ACK.

7.  IANA Considerations

   This document has no IANA actions.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
              <https://www.rfc-editor.org/info/rfc5681>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

8.2.  Informative References

   [ASA00]    Aggarwal, A., Savage, S., and T. Anderson, "Understanding
              the performance of TCP pacing", Proceedings IEEE INFOCOM
              2000, DOI 10.1109/INFCOM.2000.832483, March 2000,
              <https://doi.org/10.1109/INFCOM.2000.832483>.

   [HyStart]  Ha, S. and I. Rhee, "Taming the elephants: New TCP slow
              start", Computer Networks vol. 55, no. 9, pp. 2092-2110,
              DOI 10.1016/j.comnet.2011.01.014, June 2011,
              <https://doi.org/10.1016/j.comnet.2011.01.014>.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122,
              DOI 10.17487/RFC1122, October 1989,
              <https://www.rfc-editor.org/info/rfc1122>.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              DOI 10.17487/RFC1191, November 1990,
              <https://www.rfc-editor.org/info/rfc1191>.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007,
              <https://www.rfc-editor.org/info/rfc4821>.

   [RFC9002]  Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection
              and Congestion Control", RFC 9002, DOI 10.17487/RFC9002,
              May 2021, <https://www.rfc-editor.org/info/rfc9002>.

   [RFC9260]  Stewart, R., Tüxen, M., and K. Nielsen, "Stream Control
              Transmission Protocol", RFC 9260, DOI 10.17487/RFC9260,
              June 2022, <https://www.rfc-editor.org/info/rfc9260>.

   [SCWA99]   Savage, S., Cardwell, N., Wetherall, D., and T. Anderson,
              "TCP congestion control with a misbehaving receiver", ACM
              SIGCOMM Computer Communication Review, vol. 29, issue 5,
              pp. 71-78, DOI 10.1145/505696.505704, October 1999,
              <https://doi.org/10.1145/505696.505704>.

Acknowledgments

   During the discussions of this work on the TCPM mailing list and in
   working group meetings, helpful comments, critiques, and reviews were
   received from (listed alphabetically by last name) Mark Allman, Bob
   Briscoe, Neal Cardwell, Yuchung Cheng, Junho Choi, Martin Duke, Reese
   Enghardt, Christian Huitema, Ilpo Järvinen, Yoshifumi Nishida,
   Randall Stewart, and Michael Tüxen.

Authors' Addresses

   Praveen Balasubramanian
   Confluent
   899 West Evelyn Ave
   Mountain View, CA 94041
   United States of America
   Email: pravb.ietf@gmail.com


   Yi Huang
   Microsoft
   One Microsoft Way
   Redmond, WA 98052
   United States of America
   Phone: +1 425 703 0447
   Email: huanyi@microsoft.com


   Matt Olson
   Microsoft
   One Microsoft Way
   Redmond, WA 98052
   United States of America
   Phone: +1 425 538 8598
   Email: maolson@microsoft.com