summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc9002.txt
diff options
context:
space:
mode:
authorThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committerThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
treee3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc9002.txt
parentea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc9002.txt')
-rw-r--r--doc/rfc/rfc9002.txt2070
1 files changed, 2070 insertions, 0 deletions
diff --git a/doc/rfc/rfc9002.txt b/doc/rfc/rfc9002.txt
new file mode 100644
index 0000000..abc7f80
--- /dev/null
+++ b/doc/rfc/rfc9002.txt
@@ -0,0 +1,2070 @@
+
+
+
+
+Internet Engineering Task Force (IETF) J. Iyengar, Ed.
+Request for Comments: 9002 Fastly
+Category: Standards Track I. Swett, Ed.
+ISSN: 2070-1721 Google
+ May 2021
+
+
+ QUIC Loss Detection and Congestion Control
+
+Abstract
+
+ This document describes loss detection and congestion control
+ mechanisms for QUIC.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ https://www.rfc-editor.org/info/rfc9002.
+
+Copyright Notice
+
+ Copyright (c) 2021 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (https://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction
+ 2. Conventions and Definitions
+ 3. Design of the QUIC Transmission Machinery
+ 4. Relevant Differences between QUIC and TCP
+ 4.1. Separate Packet Number Spaces
+ 4.2. Monotonically Increasing Packet Numbers
+ 4.3. Clearer Loss Epoch
+ 4.4. No Reneging
+ 4.5. More ACK Ranges
+ 4.6. Explicit Correction for Delayed Acknowledgments
+ 4.7. Probe Timeout Replaces RTO and TLP
+ 4.8. The Minimum Congestion Window Is Two Packets
+ 4.9. Handshake Packets Are Not Special
+ 5. Estimating the Round-Trip Time
+ 5.1. Generating RTT Samples
+ 5.2. Estimating min_rtt
+ 5.3. Estimating smoothed_rtt and rttvar
+ 6. Loss Detection
+ 6.1. Acknowledgment-Based Detection
+ 6.1.1. Packet Threshold
+ 6.1.2. Time Threshold
+ 6.2. Probe Timeout
+ 6.2.1. Computing PTO
+ 6.2.2. Handshakes and New Paths
+ 6.2.3. Speeding up Handshake Completion
+ 6.2.4. Sending Probe Packets
+ 6.3. Handling Retry Packets
+ 6.4. Discarding Keys and Packet State
+ 7. Congestion Control
+ 7.1. Explicit Congestion Notification
+ 7.2. Initial and Minimum Congestion Window
+ 7.3. Congestion Control States
+ 7.3.1. Slow Start
+ 7.3.2. Recovery
+ 7.3.3. Congestion Avoidance
+ 7.4. Ignoring Loss of Undecryptable Packets
+ 7.5. Probe Timeout
+ 7.6. Persistent Congestion
+ 7.6.1. Duration
+ 7.6.2. Establishing Persistent Congestion
+ 7.6.3. Example
+ 7.7. Pacing
+ 7.8. Underutilizing the Congestion Window
+ 8. Security Considerations
+ 8.1. Loss and Congestion Signals
+ 8.2. Traffic Analysis
+ 8.3. Misreporting ECN Markings
+ 9. References
+ 9.1. Normative References
+ 9.2. Informative References
+ Appendix A. Loss Recovery Pseudocode
+ A.1. Tracking Sent Packets
+ A.1.1. Sent Packet Fields
+ A.2. Constants of Interest
+ A.3. Variables of Interest
+ A.4. Initialization
+ A.5. On Sending a Packet
+ A.6. On Receiving a Datagram
+ A.7. On Receiving an Acknowledgment
+ A.8. Setting the Loss Detection Timer
+ A.9. On Timeout
+ A.10. Detecting Lost Packets
+ A.11. Upon Dropping Initial or Handshake Keys
+ Appendix B. Congestion Control Pseudocode
+ B.1. Constants of Interest
+ B.2. Variables of Interest
+ B.3. Initialization
+ B.4. On Packet Sent
+ B.5. On Packet Acknowledgment
+ B.6. On New Congestion Event
+ B.7. Process ECN Information
+ B.8. On Packets Lost
+ B.9. Removing Discarded Packets from Bytes in Flight
+ Contributors
+ Authors' Addresses
+
+1. Introduction
+
+ QUIC is a secure, general-purpose transport protocol, described in
+ [QUIC-TRANSPORT]. This document describes loss detection and
+ congestion control mechanisms for QUIC.
+
+2. Conventions and Definitions
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in BCP
+ 14 [RFC2119] [RFC8174] when, and only when, they appear in all
+ capitals, as shown here.
+
+ Definitions of terms that are used in this document:
+
+ Ack-eliciting frames: All frames other than ACK, PADDING, and
+ CONNECTION_CLOSE are considered ack-eliciting.
+
+ Ack-eliciting packets: Packets that contain ack-eliciting frames
+ elicit an ACK from the receiver within the maximum acknowledgment
+ delay and are called ack-eliciting packets.
+
+ In-flight packets: Packets are considered in flight when they are
+ ack-eliciting or contain a PADDING frame, and they have been sent
+ but are not acknowledged, declared lost, or discarded along with
+ old keys.
+
+3. Design of the QUIC Transmission Machinery
+
+ All transmissions in QUIC are sent with a packet-level header, which
+ indicates the encryption level and includes a packet sequence number
+ (referred to below as a packet number). The encryption level
+ indicates the packet number space, as described in Section 12.3 of
+ [QUIC-TRANSPORT]. Packet numbers never repeat within a packet number
+ space for the lifetime of a connection. Packet numbers are sent in
+ monotonically increasing order within a space, preventing ambiguity.
+ It is permitted for some packet numbers to never be used, leaving
+ intentional gaps.
+
+ This design obviates the need for disambiguating between
+ transmissions and retransmissions; this eliminates significant
+ complexity from QUIC's interpretation of TCP loss detection
+ mechanisms.
+
+ QUIC packets can contain multiple frames of different types. The
+ recovery mechanisms ensure that data and frames that need reliable
+ delivery are acknowledged or declared lost and sent in new packets as
+ necessary. The types of frames contained in a packet affect recovery
+ and congestion control logic:
+
+ * All packets are acknowledged, though packets that contain no ack-
+ eliciting frames are only acknowledged along with ack-eliciting
+ packets.
+
+ * Long header packets that contain CRYPTO frames are critical to the
+ performance of the QUIC handshake and use shorter timers for
+ acknowledgment.
+
+ * Packets containing frames besides ACK or CONNECTION_CLOSE frames
+ count toward congestion control limits and are considered to be in
+ flight.
+
+ * PADDING frames cause packets to contribute toward bytes in flight
+ without directly causing an acknowledgment to be sent.
+
+4. Relevant Differences between QUIC and TCP
+
+ Readers familiar with TCP's loss detection and congestion control
+ will find algorithms here that parallel well-known TCP ones.
+ However, protocol differences between QUIC and TCP contribute to
+ algorithmic differences. These protocol differences are briefly
+ described below.
+
+4.1. Separate Packet Number Spaces
+
+ QUIC uses separate packet number spaces for each encryption level,
+ except 0-RTT and all generations of 1-RTT keys use the same packet
+ number space. Separate packet number spaces ensures that the
+ acknowledgment of packets sent with one level of encryption will not
+ cause spurious retransmission of packets sent with a different
+ encryption level. Congestion control and round-trip time (RTT)
+ measurement are unified across packet number spaces.
+
+4.2. Monotonically Increasing Packet Numbers
+
+ TCP conflates transmission order at the sender with delivery order at
+ the receiver, resulting in the retransmission ambiguity problem
+ [RETRANSMISSION]. QUIC separates transmission order from delivery
+ order: packet numbers indicate transmission order, and delivery order
+ is determined by the stream offsets in STREAM frames.
+
+ QUIC's packet number is strictly increasing within a packet number
+ space and directly encodes transmission order. A higher packet
+ number signifies that the packet was sent later, and a lower packet
+ number signifies that the packet was sent earlier. When a packet
+ containing ack-eliciting frames is detected lost, QUIC includes
+ necessary frames in a new packet with a new packet number, removing
+ ambiguity about which packet is acknowledged when an ACK is received.
+ Consequently, more accurate RTT measurements can be made, spurious
+ retransmissions are trivially detected, and mechanisms such as Fast
+ Retransmit can be applied universally, based only on packet number.
+
+ This design point significantly simplifies loss detection mechanisms
+ for QUIC. Most TCP mechanisms implicitly attempt to infer
+ transmission ordering based on TCP sequence numbers -- a nontrivial
+ task, especially when TCP timestamps are not available.
+
+4.3. Clearer Loss Epoch
+
+ QUIC starts a loss epoch when a packet is lost. The loss epoch ends
+ when any packet sent after the start of the epoch is acknowledged.
+ TCP waits for the gap in the sequence number space to be filled, and
+ so if a segment is lost multiple times in a row, the loss epoch may
+ not end for several round trips. Because both should reduce their
+ congestion windows only once per epoch, QUIC will do it once for
+ every round trip that experiences loss, while TCP may only do it once
+ across multiple round trips.
+
+4.4. No Reneging
+
+ QUIC ACK frames contain information similar to that in TCP Selective
+ Acknowledgments (SACKs) [RFC2018]. However, QUIC does not allow a
+ packet acknowledgment to be reneged, greatly simplifying
+ implementations on both sides and reducing memory pressure on the
+ sender.
+
+4.5. More ACK Ranges
+
+ QUIC supports many ACK ranges, as opposed to TCP's three SACK ranges.
+ In high-loss environments, this speeds recovery, reduces spurious
+ retransmits, and ensures forward progress without relying on
+ timeouts.
+
+4.6. Explicit Correction for Delayed Acknowledgments
+
+ QUIC endpoints measure the delay incurred between when a packet is
+ received and when the corresponding acknowledgment is sent, allowing
+ a peer to maintain a more accurate RTT estimate; see Section 13.2 of
+ [QUIC-TRANSPORT].
+
+4.7. Probe Timeout Replaces RTO and TLP
+
+ QUIC uses a probe timeout (PTO; see Section 6.2), with a timer based
+ on TCP's retransmission timeout (RTO) computation; see [RFC6298].
+ QUIC's PTO includes the peer's maximum expected acknowledgment delay
+ instead of using a fixed minimum timeout.
+
+ Similar to the RACK-TLP loss detection algorithm for TCP [RFC8985],
+ QUIC does not collapse the congestion window when the PTO expires,
+ since a single packet loss at the tail does not indicate persistent
+ congestion. Instead, QUIC collapses the congestion window when
+ persistent congestion is declared; see Section 7.6. In doing this,
+ QUIC avoids unnecessary congestion window reductions, obviating the
+ need for correcting mechanisms such as Forward RTO-Recovery (F-RTO)
+ [RFC5682]. Since QUIC does not collapse the congestion window on a
+ PTO expiration, a QUIC sender is not limited from sending more in-
+ flight packets after a PTO expiration if it still has available
+ congestion window. This occurs when a sender is application limited
+ and the PTO timer expires. This is more aggressive than TCP's RTO
+ mechanism when application limited, but identical when not
+ application limited.
+
+ QUIC allows probe packets to temporarily exceed the congestion window
+ whenever the timer expires.
+
+4.8. The Minimum Congestion Window Is Two Packets
+
+ TCP uses a minimum congestion window of one packet. However, loss of
+ that single packet means that the sender needs to wait for a PTO to
+ recover (Section 6.2), which can be much longer than an RTT. Sending
+ a single ack-eliciting packet also increases the chances of incurring
+ additional latency when a receiver delays its acknowledgment.
+
+ QUIC therefore recommends that the minimum congestion window be two
+ packets. While this increases network load, it is considered safe
+ since the sender will still reduce its sending rate exponentially
+ under persistent congestion (Section 6.2).
+
+4.9. Handshake Packets Are Not Special
+
+ TCP treats the loss of SYN or SYN-ACK packet as persistent congestion
+ and reduces the congestion window to one packet; see [RFC5681]. QUIC
+ treats loss of a packet containing handshake data the same as other
+ losses.
+
+5. Estimating the Round-Trip Time
+
+ At a high level, an endpoint measures the time from when a packet was
+ sent to when it is acknowledged as an RTT sample. The endpoint uses
+ RTT samples and peer-reported host delays (see Section 13.2 of
+ [QUIC-TRANSPORT]) to generate a statistical description of the
+ network path's RTT. An endpoint computes the following three values
+ for each path: the minimum value over a period of time (min_rtt), an
+ exponentially weighted moving average (smoothed_rtt), and the mean
+ deviation (referred to as "variation" in the rest of this document)
+ in the observed RTT samples (rttvar).
+
+5.1. Generating RTT Samples
+
+ An endpoint generates an RTT sample on receiving an ACK frame that
+ meets the following two conditions:
+
+ * the largest acknowledged packet number is newly acknowledged, and
+
+ * at least one of the newly acknowledged packets was ack-eliciting.
+
+ The RTT sample, latest_rtt, is generated as the time elapsed since
+ the largest acknowledged packet was sent:
+
+ latest_rtt = ack_time - send_time_of_largest_acked
+
+ An RTT sample is generated using only the largest acknowledged packet
+ in the received ACK frame. This is because a peer reports
+ acknowledgment delays for only the largest acknowledged packet in an
+ ACK frame. While the reported acknowledgment delay is not used by
+ the RTT sample measurement, it is used to adjust the RTT sample in
+ subsequent computations of smoothed_rtt and rttvar (Section 5.3).
+
+ To avoid generating multiple RTT samples for a single packet, an ACK
+ frame SHOULD NOT be used to update RTT estimates if it does not newly
+ acknowledge the largest acknowledged packet.
+
+ An RTT sample MUST NOT be generated on receiving an ACK frame that
+ does not newly acknowledge at least one ack-eliciting packet. A peer
+ usually does not send an ACK frame when only non-ack-eliciting
+ packets are received. Therefore, an ACK frame that contains
+ acknowledgments for only non-ack-eliciting packets could include an
+ arbitrarily large ACK Delay value. Ignoring such ACK frames avoids
+ complications in subsequent smoothed_rtt and rttvar computations.
+
+ A sender might generate multiple RTT samples per RTT when multiple
+ ACK frames are received within an RTT. As suggested in [RFC6298],
+ doing so might result in inadequate history in smoothed_rtt and
+ rttvar. Ensuring that RTT estimates retain sufficient history is an
+ open research question.
+
+5.2. Estimating min_rtt
+
+ min_rtt is the sender's estimate of the minimum RTT observed for a
+ given network path over a period of time. In this document, min_rtt
+ is used by loss detection to reject implausibly small RTT samples.
+
+ min_rtt MUST be set to the latest_rtt on the first RTT sample.
+ min_rtt MUST be set to the lesser of min_rtt and latest_rtt
+ (Section 5.1) on all other samples.
+
+ An endpoint uses only locally observed times in computing the min_rtt
+ and does not adjust for acknowledgment delays reported by the peer.
+ Doing so allows the endpoint to set a lower bound for the
+ smoothed_rtt based entirely on what it observes (see Section 5.3) and
+ limits potential underestimation due to erroneously reported delays
+ by the peer.
+
+ The RTT for a network path may change over time. If a path's actual
+ RTT decreases, the min_rtt will adapt immediately on the first low
+ sample. If the path's actual RTT increases, however, the min_rtt
+ will not adapt to it, allowing future RTT samples that are smaller
+ than the new RTT to be included in smoothed_rtt.
+
+ Endpoints SHOULD set the min_rtt to the newest RTT sample after
+ persistent congestion is established. This avoids repeatedly
+ declaring persistent congestion when the RTT increases. This also
+ allows a connection to reset its estimate of min_rtt and smoothed_rtt
+ after a disruptive network event; see Section 5.3.
+
+ Endpoints MAY reestablish the min_rtt at other times in the
+ connection, such as when traffic volume is low and an acknowledgment
+ is received with a low acknowledgment delay. Implementations SHOULD
+ NOT refresh the min_rtt value too often since the actual minimum RTT
+ of the path is not frequently observable.
+
+5.3. Estimating smoothed_rtt and rttvar
+
+ smoothed_rtt is an exponentially weighted moving average of an
+ endpoint's RTT samples, and rttvar estimates the variation in the RTT
+ samples using a mean variation.
+
+ The calculation of smoothed_rtt uses RTT samples after adjusting them
+ for acknowledgment delays. These delays are decoded from the ACK
+ Delay field of ACK frames as described in Section 19.3 of
+ [QUIC-TRANSPORT].
+
+ The peer might report acknowledgment delays that are larger than the
+ peer's max_ack_delay during the handshake (Section 13.2.1 of
+ [QUIC-TRANSPORT]). To account for this, the endpoint SHOULD ignore
+ max_ack_delay until the handshake is confirmed, as defined in
+ Section 4.1.2 of [QUIC-TLS]. When they occur, these large
+ acknowledgment delays are likely to be non-repeating and limited to
+ the handshake. The endpoint can therefore use them without limiting
+ them to the max_ack_delay, avoiding unnecessary inflation of the RTT
+ estimate.
+
+ Note that a large acknowledgment delay can result in a substantially
+ inflated smoothed_rtt if there is an error either in the peer's
+ reporting of the acknowledgment delay or in the endpoint's min_rtt
+ estimate. Therefore, prior to handshake confirmation, an endpoint
+ MAY ignore RTT samples if adjusting the RTT sample for acknowledgment
+ delay causes the sample to be less than the min_rtt.
+
+ After the handshake is confirmed, any acknowledgment delays reported
+ by the peer that are greater than the peer's max_ack_delay are
+ attributed to unintentional but potentially repeating delays, such as
+ scheduler latency at the peer or loss of previous acknowledgments.
+ Excess delays could also be due to a noncompliant receiver.
+ Therefore, these extra delays are considered effectively part of path
+ delay and incorporated into the RTT estimate.
+
+ Therefore, when adjusting an RTT sample using peer-reported
+ acknowledgment delays, an endpoint:
+
+ * MAY ignore the acknowledgment delay for Initial packets, since
+ these acknowledgments are not delayed by the peer (Section 13.2.1
+ of [QUIC-TRANSPORT]);
+
+ * SHOULD ignore the peer's max_ack_delay until the handshake is
+ confirmed;
+
+ * MUST use the lesser of the acknowledgment delay and the peer's
+ max_ack_delay after the handshake is confirmed; and
+
+ * MUST NOT subtract the acknowledgment delay from the RTT sample if
+ the resulting value is smaller than the min_rtt. This limits the
+ underestimation of the smoothed_rtt due to a misreporting peer.
+
+ Additionally, an endpoint might postpone the processing of
+ acknowledgments when the corresponding decryption keys are not
+ immediately available. For example, a client might receive an
+ acknowledgment for a 0-RTT packet that it cannot decrypt because
+ 1-RTT packet protection keys are not yet available to it. In such
+ cases, an endpoint SHOULD subtract such local delays from its RTT
+ sample until the handshake is confirmed.
+
+ Similar to [RFC6298], smoothed_rtt and rttvar are computed as
+ follows.
+
+ An endpoint initializes the RTT estimator during connection
+ establishment and when the estimator is reset during connection
+ migration; see Section 9.4 of [QUIC-TRANSPORT]. Before any RTT
+ samples are available for a new path or when the estimator is reset,
+ the estimator is initialized using the initial RTT; see
+ Section 6.2.2.
+
+ smoothed_rtt and rttvar are initialized as follows, where kInitialRtt
+ contains the initial RTT value:
+
+ smoothed_rtt = kInitialRtt
+ rttvar = kInitialRtt / 2
+
+ RTT samples for the network path are recorded in latest_rtt; see
+ Section 5.1. On the first RTT sample after initialization, the
+ estimator is reset using that sample. This ensures that the
+ estimator retains no history of past samples. Packets sent on other
+ paths do not contribute RTT samples to the current path, as described
+ in Section 9.4 of [QUIC-TRANSPORT].
+
+ On the first RTT sample after initialization, smoothed_rtt and rttvar
+ are set as follows:
+
+ smoothed_rtt = latest_rtt
+ rttvar = latest_rtt / 2
+
+ On subsequent RTT samples, smoothed_rtt and rttvar evolve as follows:
+
+ ack_delay = decoded acknowledgment delay from ACK frame
+ if (handshake confirmed):
+ ack_delay = min(ack_delay, max_ack_delay)
+ adjusted_rtt = latest_rtt
+ if (latest_rtt >= min_rtt + ack_delay):
+ adjusted_rtt = latest_rtt - ack_delay
+ smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
+ rttvar_sample = abs(smoothed_rtt - adjusted_rtt)
+ rttvar = 3/4 * rttvar + 1/4 * rttvar_sample
+
+6. Loss Detection
+
+ QUIC senders use acknowledgments to detect lost packets and a PTO to
+ ensure acknowledgments are received; see Section 6.2. This section
+ provides a description of these algorithms.
+
+ If a packet is lost, the QUIC transport needs to recover from that
+ loss, such as by retransmitting the data, sending an updated frame,
+ or discarding the frame. For more information, see Section 13.3 of
+ [QUIC-TRANSPORT].
+
+ Loss detection is separate per packet number space, unlike RTT
+ measurement and congestion control, because RTT and congestion
+ control are properties of the path, whereas loss detection also
+ relies upon key availability.
+
+6.1. Acknowledgment-Based Detection
+
+ Acknowledgment-based loss detection implements the spirit of TCP's
+ Fast Retransmit [RFC5681], Early Retransmit [RFC5827], Forward
+ Acknowledgment [FACK], SACK loss recovery [RFC6675], and RACK-TLP
+ [RFC8985]. This section provides an overview of how these algorithms
+ are implemented in QUIC.
+
+ A packet is declared lost if it meets all of the following
+ conditions:
+
+ * The packet is unacknowledged, in flight, and was sent prior to an
+ acknowledged packet.
+
+ * The packet was sent kPacketThreshold packets before an
+ acknowledged packet (Section 6.1.1), or it was sent long enough in
+ the past (Section 6.1.2).
+
+ The acknowledgment indicates that a packet sent later was delivered,
+ and the packet and time thresholds provide some tolerance for packet
+ reordering.
+
+ Spuriously declaring packets as lost leads to unnecessary
+ retransmissions and may result in degraded performance due to the
+ actions of the congestion controller upon detecting loss.
+ Implementations can detect spurious retransmissions and increase the
+ packet or time reordering threshold to reduce future spurious
+ retransmissions and loss events. Implementations with adaptive time
+ thresholds MAY choose to start with smaller initial reordering
+ thresholds to minimize recovery latency.
+
+6.1.1. Packet Threshold
+
+ The RECOMMENDED initial value for the packet reordering threshold
+ (kPacketThreshold) is 3, based on best practices for TCP loss
+ detection [RFC5681] [RFC6675]. In order to remain similar to TCP,
+ implementations SHOULD NOT use a packet threshold less than 3; see
+ [RFC5681].
+
+ Some networks may exhibit higher degrees of packet reordering,
+ causing a sender to detect spurious losses. Additionally, packet
+ reordering could be more common with QUIC than TCP because network
+ elements that could observe and reorder TCP packets cannot do that
+ for QUIC and also because QUIC packet numbers are encrypted.
+ Algorithms that increase the reordering threshold after spuriously
+ detecting losses, such as RACK [RFC8985], have proven to be useful in
+ TCP and are expected to be at least as useful in QUIC.
+
+6.1.2. Time Threshold
+
+ Once a later packet within the same packet number space has been
+ acknowledged, an endpoint SHOULD declare an earlier packet lost if it
+ was sent a threshold amount of time in the past. To avoid declaring
+ packets as lost too early, this time threshold MUST be set to at
+ least the local timer granularity, as indicated by the kGranularity
+ constant. The time threshold is:
+
+ max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity)
+
+ If packets sent prior to the largest acknowledged packet cannot yet
+ be declared lost, then a timer SHOULD be set for the remaining time.
+
+ Using max(smoothed_rtt, latest_rtt) protects from the two following
+ cases:
+
+ * the latest RTT sample is lower than the smoothed RTT, perhaps due
+ to reordering where the acknowledgment encountered a shorter path;
+
+ * the latest RTT sample is higher than the smoothed RTT, perhaps due
+ to a sustained increase in the actual RTT, but the smoothed RTT
+ has not yet caught up.
+
+ The RECOMMENDED time threshold (kTimeThreshold), expressed as an RTT
+ multiplier, is 9/8. The RECOMMENDED value of the timer granularity
+ (kGranularity) is 1 millisecond.
+
+ | Note: TCP's RACK [RFC8985] specifies a slightly larger
+ | threshold, equivalent to 5/4, for a similar purpose.
+ | Experience with QUIC shows that 9/8 works well.
+
+ Implementations MAY experiment with absolute thresholds, thresholds
+ from previous connections, adaptive thresholds, or the including of
+ RTT variation. Smaller thresholds reduce reordering resilience and
+ increase spurious retransmissions, and larger thresholds increase
+ loss detection delay.
+
+6.2. Probe Timeout
+
+ A Probe Timeout (PTO) triggers the sending of one or two probe
+ datagrams when ack-eliciting packets are not acknowledged within the
+ expected period of time or the server may not have validated the
+ client's address. A PTO enables a connection to recover from loss of
+ tail packets or acknowledgments.
+
+ As with loss detection, the PTO is per packet number space. That is,
+ a PTO value is computed per packet number space.
+
+ A PTO timer expiration event does not indicate packet loss and MUST
+ NOT cause prior unacknowledged packets to be marked as lost. When an
+ acknowledgment is received that newly acknowledges packets, loss
+ detection proceeds as dictated by the packet and time threshold
+ mechanisms; see Section 6.1.
+
+ The PTO algorithm used in QUIC implements the reliability functions
+ of Tail Loss Probe [RFC8985], RTO [RFC5681], and F-RTO algorithms for
+ TCP [RFC5682]. The timeout computation is based on TCP's RTO period
+ [RFC6298].
+
+6.2.1. Computing PTO
+
+ When an ack-eliciting packet is transmitted, the sender schedules a
+ timer for the PTO period as follows:
+
+ PTO = smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay
+
+ The PTO period is the amount of time that a sender ought to wait for
+ an acknowledgment of a sent packet. This time period includes the
+ estimated network RTT (smoothed_rtt), the variation in the estimate
+ (4*rttvar), and max_ack_delay, to account for the maximum time by
+ which a receiver might delay sending an acknowledgment.
+
+ When the PTO is armed for Initial or Handshake packet number spaces,
+ the max_ack_delay in the PTO period computation is set to 0, since
+ the peer is expected to not delay these packets intentionally; see
+ Section 13.2.1 of [QUIC-TRANSPORT].
+
+ The PTO period MUST be at least kGranularity to avoid the timer
+ expiring immediately.
+
+ When ack-eliciting packets in multiple packet number spaces are in
+ flight, the timer MUST be set to the earlier value of the Initial and
+ Handshake packet number spaces.
+
+ An endpoint MUST NOT set its PTO timer for the Application Data
+ packet number space until the handshake is confirmed. Doing so
+ prevents the endpoint from retransmitting information in packets when
+ either the peer does not yet have the keys to process them or the
+ endpoint does not yet have the keys to process their acknowledgments.
+ For example, this can happen when a client sends 0-RTT packets to the
+ server; it does so without knowing whether the server will be able to
+ decrypt them. Similarly, this can happen when a server sends 1-RTT
+ packets before confirming that the client has verified the server's
+ certificate and can therefore read these 1-RTT packets.
+
+ A sender SHOULD restart its PTO timer every time an ack-eliciting
+ packet is sent or acknowledged, or when Initial or Handshake keys are
+ discarded (Section 4.9 of [QUIC-TLS]). This ensures the PTO is
+ always set based on the latest estimate of the RTT and for the
+ correct packet across packet number spaces.
+
+ When a PTO timer expires, the PTO backoff MUST be increased,
+ resulting in the PTO period being set to twice its current value.
+ The PTO backoff factor is reset when an acknowledgment is received,
+ except in the following case. A server might take longer to respond
+ to packets during the handshake than otherwise. To protect such a
+ server from repeated client probes, the PTO backoff is not reset at a
+ client that is not yet certain that the server has finished
+ validating the client's address. That is, a client does not reset
+ the PTO backoff factor on receiving acknowledgments in Initial
+ packets.
+
+ This exponential reduction in the sender's rate is important because
+ consecutive PTOs might be caused by loss of packets or
+ acknowledgments due to severe congestion. Even when there are ack-
+ eliciting packets in flight in multiple packet number spaces, the
+ exponential increase in PTO occurs across all spaces to prevent
+ excess load on the network. For example, a timeout in the Initial
+ packet number space doubles the length of the timeout in the
+ Handshake packet number space.
+
+ The total length of time over which consecutive PTOs expire is
+ limited by the idle timeout.
+
+ The PTO timer MUST NOT be set if a timer is set for time threshold
+ loss detection; see Section 6.1.2. A timer that is set for time
+ threshold loss detection will expire earlier than the PTO timer in
+ most cases and is less likely to spuriously retransmit data.
+
+6.2.2. Handshakes and New Paths
+
+ Resumed connections over the same network MAY use the previous
+ connection's final smoothed RTT value as the resumed connection's
+ initial RTT. When no previous RTT is available, the initial RTT
+ SHOULD be set to 333 milliseconds. This results in handshakes
+ starting with a PTO of 1 second, as recommended for TCP's initial
+ RTO; see Section 2 of [RFC6298].
+
+ A connection MAY use the delay between sending a PATH_CHALLENGE and
+ receiving a PATH_RESPONSE to set the initial RTT (see kInitialRtt in
+ Appendix A.2) for a new path, but the delay SHOULD NOT be considered
+ an RTT sample.
+
+ When the Initial keys and Handshake keys are discarded (see
+ Section 6.4), any Initial packets and Handshake packets can no longer
+ be acknowledged, so they are removed from bytes in flight. When
+ Initial or Handshake keys are discarded, the PTO and loss detection
+ timers MUST be reset, because discarding keys indicates forward
+ progress and the loss detection timer might have been set for a now-
+ discarded packet number space.
+
+6.2.2.1. Before Address Validation
+
+ Until the server has validated the client's address on the path, the
+ amount of data it can send is limited to three times the amount of
+ data received, as specified in Section 8.1 of [QUIC-TRANSPORT]. If
+ no additional data can be sent, the server's PTO timer MUST NOT be
+ armed until datagrams have been received from the client because
+ packets sent on PTO count against the anti-amplification limit.
+
+ When the server receives a datagram from the client, the
+ amplification limit is increased and the server resets the PTO timer.
+ If the PTO timer is then set to a time in the past, it is executed
+ immediately. Doing so avoids sending new 1-RTT packets prior to
+ packets critical to the completion of the handshake. In particular,
+ this can happen when 0-RTT is accepted but the server fails to
+ validate the client's address.
+
+ Since the server could be blocked until more datagrams are received
+ from the client, it is the client's responsibility to send packets to
+ unblock the server until it is certain that the server has finished
+ its address validation (see Section 8 of [QUIC-TRANSPORT]). That is,
+ the client MUST set the PTO timer if the client has not received an
+ acknowledgment for any of its Handshake packets and the handshake is
+ not confirmed (see Section 4.1.2 of [QUIC-TLS]), even if there are no
+ packets in flight. When the PTO fires, the client MUST send a
+ Handshake packet if it has Handshake keys, otherwise it MUST send an
+ Initial packet in a UDP datagram with a payload of at least 1200
+ bytes.
+
+6.2.3. Speeding up Handshake Completion
+
+ When a server receives an Initial packet containing duplicate CRYPTO
+ data, it can assume the client did not receive all of the server's
+ CRYPTO data sent in Initial packets, or the client's estimated RTT is
+ too small. When a client receives Handshake or 1-RTT packets prior
+ to obtaining Handshake keys, it may assume some or all of the
+ server's Initial packets were lost.
+
+ To speed up handshake completion under these conditions, an endpoint
+ MAY, for a limited number of times per connection, send a packet
+ containing unacknowledged CRYPTO data earlier than the PTO expiry,
+ subject to the address validation limits in Section 8.1 of
+ [QUIC-TRANSPORT]. Doing so at most once for each connection is
+ adequate to quickly recover from a single packet loss. An endpoint
+ that always retransmits packets in response to receiving packets that
+ it cannot process risks creating an infinite exchange of packets.
+
+ Endpoints can also use coalesced packets (see Section 12.2 of
+ [QUIC-TRANSPORT]) to ensure that each datagram elicits at least one
+ acknowledgment. For example, a client can coalesce an Initial packet
+ containing PING and PADDING frames with a 0-RTT data packet, and a
+ server can coalesce an Initial packet containing a PING frame with
+ one or more packets in its first flight.
+
+6.2.4. Sending Probe Packets
+
+ When a PTO timer expires, a sender MUST send at least one ack-
+ eliciting packet in the packet number space as a probe. An endpoint
+ MAY send up to two full-sized datagrams containing ack-eliciting
+ packets to avoid an expensive consecutive PTO expiration due to a
+ single lost datagram or to transmit data from multiple packet number
+ spaces. All probe packets sent on a PTO MUST be ack-eliciting.
+
+ In addition to sending data in the packet number space for which the
+ timer expired, the sender SHOULD send ack-eliciting packets from
+ other packet number spaces with in-flight data, coalescing packets if
+ possible. This is particularly valuable when the server has both
+ Initial and Handshake data in flight or when the client has both
+ Handshake and Application Data in flight because the peer might only
+ have receive keys for one of the two packet number spaces.
+
+ If the sender wants to elicit a faster acknowledgment on PTO, it can
+ skip a packet number to eliminate the acknowledgment delay.
+
+ An endpoint SHOULD include new data in packets that are sent on PTO
+ expiration. Previously sent data MAY be sent if no new data can be
+ sent. Implementations MAY use alternative strategies for determining
+ the content of probe packets, including sending new or retransmitted
+ data based on the application's priorities.
+
+ It is possible the sender has no new or previously sent data to send.
+ As an example, consider the following sequence of events: new
+ application data is sent in a STREAM frame, deemed lost, then
+ retransmitted in a new packet, and then the original transmission is
+ acknowledged. When there is no data to send, the sender SHOULD send
+ a PING or other ack-eliciting frame in a single packet, rearming the
+ PTO timer.
+
+ Alternatively, instead of sending an ack-eliciting packet, the sender
+ MAY mark any packets still in flight as lost. Doing so avoids
+ sending an additional packet but increases the risk that loss is
+ declared too aggressively, resulting in an unnecessary rate reduction
+ by the congestion controller.
+
+ Consecutive PTO periods increase exponentially, and as a result,
+ connection recovery latency increases exponentially as packets
+ continue to be dropped in the network. Sending two packets on PTO
+ expiration increases resilience to packet drops, thus reducing the
+ probability of consecutive PTO events.
+
+ When the PTO timer expires multiple times and new data cannot be
+ sent, implementations must choose between sending the same payload
+ every time or sending different payloads. Sending the same payload
+ may be simpler and ensures the highest priority frames arrive first.
+ Sending different payloads each time reduces the chances of spurious
+ retransmission.
+
+6.3. Handling Retry Packets
+
+ A Retry packet causes a client to send another Initial packet,
+ effectively restarting the connection process. A Retry packet
+ indicates that the Initial packet was received but not processed. A
+ Retry packet cannot be treated as an acknowledgment because it does
+ not indicate that a packet was processed or specify the packet
+ number.
+
+ Clients that receive a Retry packet reset congestion control and loss
+ recovery state, including resetting any pending timers. Other
+ connection state, in particular cryptographic handshake messages, is
+ retained; see Section 17.2.5 of [QUIC-TRANSPORT].
+
+ The client MAY compute an RTT estimate to the server as the time
+ period from when the first Initial packet was sent to when a Retry or
+ a Version Negotiation packet is received. The client MAY use this
+ value in place of its default for the initial RTT estimate.
+
+6.4. Discarding Keys and Packet State
+
+ When Initial and Handshake packet protection keys are discarded (see
+ Section 4.9 of [QUIC-TLS]), all packets that were sent with those
+ keys can no longer be acknowledged because their acknowledgments
+ cannot be processed. The sender MUST discard all recovery state
+ associated with those packets and MUST remove them from the count of
+ bytes in flight.
+
+ Endpoints stop sending and receiving Initial packets once they start
+ exchanging Handshake packets; see Section 17.2.2.1 of
+ [QUIC-TRANSPORT]. At this point, recovery state for all in-flight
+ Initial packets is discarded.
+
+ When 0-RTT is rejected, recovery state for all in-flight 0-RTT
+ packets is discarded.
+
+ If a server accepts 0-RTT, but does not buffer 0-RTT packets that
+ arrive before Initial packets, early 0-RTT packets will be declared
+ lost, but that is expected to be infrequent.
+
+ It is expected that keys are discarded at some time after the packets
+ encrypted with them are either acknowledged or declared lost.
+ However, Initial and Handshake secrets are discarded as soon as
+ Handshake and 1-RTT keys are proven to be available to both client
+ and server; see Section 4.9.1 of [QUIC-TLS].
+
+7. Congestion Control
+
+ This document specifies a sender-side congestion controller for QUIC
+ similar to TCP NewReno [RFC6582].
+
+ The signals QUIC provides for congestion control are generic and are
+ designed to support different sender-side algorithms. A sender can
+ unilaterally choose a different algorithm to use, such as CUBIC
+ [RFC8312].
+
+ If a sender uses a different controller than that specified in this
+ document, the chosen controller MUST conform to the congestion
+ control guidelines specified in Section 3.1 of [RFC8085].
+
+ Similar to TCP, packets containing only ACK frames do not count
+ toward bytes in flight and are not congestion controlled. Unlike
+ TCP, QUIC can detect the loss of these packets and MAY use that
+ information to adjust the congestion controller or the rate of ACK-
+ only packets being sent, but this document does not describe a
+ mechanism for doing so.
+
+ The congestion controller is per path, so packets sent on other paths
+ do not alter the current path's congestion controller, as described
+ in Section 9.4 of [QUIC-TRANSPORT].
+
+ The algorithm in this document specifies and uses the controller's
+ congestion window in bytes.
+
+ An endpoint MUST NOT send a packet if it would cause bytes_in_flight
+ (see Appendix B.2) to be larger than the congestion window, unless
+ the packet is sent on a PTO timer expiration (see Section 6.2) or
+ when entering recovery (see Section 7.3.2).
+
+7.1. Explicit Congestion Notification
+
+ If a path has been validated to support Explicit Congestion
+ Notification (ECN) [RFC3168] [RFC8311], QUIC treats a Congestion
+ Experienced (CE) codepoint in the IP header as a signal of
+ congestion. This document specifies an endpoint's response when the
+ peer-reported ECN-CE count increases; see Section 13.4.2 of
+ [QUIC-TRANSPORT].
+
+7.2. Initial and Minimum Congestion Window
+
+ QUIC begins every connection in slow start with the congestion window
+ set to an initial value. Endpoints SHOULD use an initial congestion
+ window of ten times the maximum datagram size (max_datagram_size),
+ while limiting the window to the larger of 14,720 bytes or twice the
+ maximum datagram size. This follows the analysis and recommendations
+ in [RFC6928], increasing the byte limit to account for the smaller
+ 8-byte overhead of UDP compared to the 20-byte overhead for TCP.
+
+ If the maximum datagram size changes during the connection, the
+ initial congestion window SHOULD be recalculated with the new size.
+ If the maximum datagram size is decreased in order to complete the
+ handshake, the congestion window SHOULD be set to the new initial
+ congestion window.
+
+ Prior to validating the client's address, the server can be further
+ limited by the anti-amplification limit as specified in Section 8.1
+ of [QUIC-TRANSPORT]. Though the anti-amplification limit can prevent
+ the congestion window from being fully utilized and therefore slow
+ down the increase in congestion window, it does not directly affect
+ the congestion window.
+
+ The minimum congestion window is the smallest value the congestion
+ window can attain in response to loss, an increase in the peer-
+ reported ECN-CE count, or persistent congestion. The RECOMMENDED
+ value is 2 * max_datagram_size.
+
+7.3. Congestion Control States
+
+ The NewReno congestion controller described in this document has
+ three distinct states, as shown in Figure 1.
+
+ New path or +------------+
+ persistent congestion | Slow |
+ (O)---------------------->| Start |
+ +------------+
+ |
+ Loss or |
+ ECN-CE increase |
+ v
+ +------------+ Loss or +------------+
+ | Congestion | ECN-CE increase | Recovery |
+ | Avoidance |------------------>| Period |
+ +------------+ +------------+
+ ^ |
+ | |
+ +----------------------------+
+ Acknowledgment of packet
+ sent during recovery
+
+ Figure 1: Congestion Control States and Transitions
+
+ These states and the transitions between them are described in
+ subsequent sections.
+
+7.3.1. Slow Start
+
+ A NewReno sender is in slow start any time the congestion window is
+ below the slow start threshold. A sender begins in slow start
+ because the slow start threshold is initialized to an infinite value.
+
+ While a sender is in slow start, the congestion window increases by
+ the number of bytes acknowledged when each acknowledgment is
+ processed. This results in exponential growth of the congestion
+ window.
+
+ The sender MUST exit slow start and enter a recovery period when a
+ packet is lost or when the ECN-CE count reported by its peer
+ increases.
+
+ A sender reenters slow start any time the congestion window is less
+ than the slow start threshold, which only occurs after persistent
+ congestion is declared.
+
+7.3.2. Recovery
+
+ A NewReno sender enters a recovery period when it detects the loss of
+ a packet or when the ECN-CE count reported by its peer increases. A
+ sender that is already in a recovery period stays in it and does not
+ reenter it.
+
+ On entering a recovery period, a sender MUST set the slow start
+ threshold to half the value of the congestion window when loss is
+ detected. The congestion window MUST be set to the reduced value of
+ the slow start threshold before exiting the recovery period.
+
+ Implementations MAY reduce the congestion window immediately upon
+ entering a recovery period or use other mechanisms, such as
+ Proportional Rate Reduction [PRR], to reduce the congestion window
+ more gradually. If the congestion window is reduced immediately, a
+ single packet can be sent prior to reduction. This speeds up loss
+ recovery if the data in the lost packet is retransmitted and is
+ similar to TCP as described in Section 5 of [RFC6675].
+
+ The recovery period aims to limit congestion window reduction to once
+ per round trip. Therefore, during a recovery period, the congestion
+ window does not change in response to new losses or increases in the
+ ECN-CE count.
+
+ A recovery period ends and the sender enters congestion avoidance
+ when a packet sent during the recovery period is acknowledged. This
+ is slightly different from TCP's definition of recovery, which ends
+ when the lost segment that started recovery is acknowledged
+ [RFC5681].
+
+7.3.3. Congestion Avoidance
+
+ A NewReno sender is in congestion avoidance any time the congestion
+ window is at or above the slow start threshold and not in a recovery
+ period.
+
+ A sender in congestion avoidance uses an Additive Increase
+ Multiplicative Decrease (AIMD) approach that MUST limit the increase
+ to the congestion window to at most one maximum datagram size for
+ each congestion window that is acknowledged.
+
+ The sender exits congestion avoidance and enters a recovery period
+ when a packet is lost or when the ECN-CE count reported by its peer
+ increases.
+
+7.4. Ignoring Loss of Undecryptable Packets
+
+ During the handshake, some packet protection keys might not be
+ available when a packet arrives, and the receiver can choose to drop
+ the packet. In particular, Handshake and 0-RTT packets cannot be
+ processed until the Initial packets arrive, and 1-RTT packets cannot
+ be processed until the handshake completes. Endpoints MAY ignore the
+ loss of Handshake, 0-RTT, and 1-RTT packets that might have arrived
+ before the peer had packet protection keys to process those packets.
+ Endpoints MUST NOT ignore the loss of packets that were sent after
+ the earliest acknowledged packet in a given packet number space.
+
+7.5. Probe Timeout
+
+ Probe packets MUST NOT be blocked by the congestion controller. A
+ sender MUST however count these packets as being additionally in
+ flight, since these packets add network load without establishing
+ packet loss. Note that sending probe packets might cause the
+ sender's bytes in flight to exceed the congestion window until an
+ acknowledgment is received that establishes loss or delivery of
+ packets.
+
+7.6. Persistent Congestion
+
+ When a sender establishes loss of all packets sent over a long enough
+ duration, the network is considered to be experiencing persistent
+ congestion.
+
+7.6.1. Duration
+
+ The persistent congestion duration is computed as follows:
+
+ (smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay) *
+ kPersistentCongestionThreshold
+
+ Unlike the PTO computation in Section 6.2, this duration includes the
+ max_ack_delay irrespective of the packet number spaces in which
+ losses are established.
+
+ This duration allows a sender to send as many packets before
+ establishing persistent congestion, including some in response to PTO
+ expiration, as TCP does with Tail Loss Probes [RFC8985] and an RTO
+ [RFC5681].
+
+ Larger values of kPersistentCongestionThreshold cause the sender to
+ become less responsive to persistent congestion in the network, which
+ can result in aggressive sending into a congested network. Too small
+ a value can result in a sender declaring persistent congestion
+ unnecessarily, resulting in reduced throughput for the sender.
+
+ The RECOMMENDED value for kPersistentCongestionThreshold is 3, which
+ results in behavior that is approximately equivalent to a TCP sender
+ declaring an RTO after two TLPs.
+
+ This design does not use consecutive PTO events to establish
+ persistent congestion, since application patterns impact PTO
+ expiration. For example, a sender that sends small amounts of data
+ with silence periods between them restarts the PTO timer every time
+ it sends, potentially preventing the PTO timer from expiring for a
+ long period of time, even when no acknowledgments are being received.
+ The use of a duration enables a sender to establish persistent
+ congestion without depending on PTO expiration.
+
+7.6.2. Establishing Persistent Congestion
+
+ A sender establishes persistent congestion after the receipt of an
+ acknowledgment if two packets that are ack-eliciting are declared
+ lost, and:
+
+ * across all packet number spaces, none of the packets sent between
+ the send times of these two packets are acknowledged;
+
+ * the duration between the send times of these two packets exceeds
+ the persistent congestion duration (Section 7.6.1); and
+
+ * a prior RTT sample existed when these two packets were sent.
+
+ These two packets MUST be ack-eliciting, since a receiver is required
+ to acknowledge only ack-eliciting packets within its maximum
+ acknowledgment delay; see Section 13.2 of [QUIC-TRANSPORT].
+
+ The persistent congestion period SHOULD NOT start until there is at
+ least one RTT sample. Before the first RTT sample, a sender arms its
+ PTO timer based on the initial RTT (Section 6.2.2), which could be
+ substantially larger than the actual RTT. Requiring a prior RTT
+ sample prevents a sender from establishing persistent congestion with
+ potentially too few probes.
+
+ Since network congestion is not affected by packet number spaces,
+ persistent congestion SHOULD consider packets sent across packet
+ number spaces. A sender that does not have state for all packet
+ number spaces or an implementation that cannot compare send times
+ across packet number spaces MAY use state for just the packet number
+ space that was acknowledged. This might result in erroneously
+ declaring persistent congestion, but it will not lead to a failure to
+ detect persistent congestion.
+
+ When persistent congestion is declared, the sender's congestion
+ window MUST be reduced to the minimum congestion window
+ (kMinimumWindow), similar to a TCP sender's response on an RTO
+ [RFC5681].
+
+7.6.3. Example
+
+ The following example illustrates how a sender might establish
+ persistent congestion. Assume:
+
+ smoothed_rtt + max(4*rttvar, kGranularity) + max_ack_delay = 2
+ kPersistentCongestionThreshold = 3
+
+ Consider the following sequence of events:
+
+ +========+===================================+
+ | Time | Action |
+ +========+===================================+
+ | t=0 | Send packet #1 (application data) |
+ +--------+-----------------------------------+
+ | t=1 | Send packet #2 (application data) |
+ +--------+-----------------------------------+
+ | t=1.2 | Receive acknowledgment of #1 |
+ +--------+-----------------------------------+
+ | t=2 | Send packet #3 (application data) |
+ +--------+-----------------------------------+
+ | t=3 | Send packet #4 (application data) |
+ +--------+-----------------------------------+
+ | t=4 | Send packet #5 (application data) |
+ +--------+-----------------------------------+
+ | t=5 | Send packet #6 (application data) |
+ +--------+-----------------------------------+
+ | t=6 | Send packet #7 (application data) |
+ +--------+-----------------------------------+
+ | t=8 | Send packet #8 (PTO 1) |
+ +--------+-----------------------------------+
+ | t=12 | Send packet #9 (PTO 2) |
+ +--------+-----------------------------------+
+ | t=12.2 | Receive acknowledgment of #9 |
+ +--------+-----------------------------------+
+
+ Table 1
+
+ Packets 2 through 8 are declared lost when the acknowledgment for
+ packet 9 is received at "t = 12.2".
+
+ The congestion period is calculated as the time between the oldest
+ and newest lost packets: "8 - 1 = 7". The persistent congestion
+ duration is "2 * 3 = 6". Because the threshold was reached and
+ because none of the packets between the oldest and the newest lost
+ packets were acknowledged, the network is considered to have
+ experienced persistent congestion.
+
+ While this example shows PTO expiration, they are not required for
+ persistent congestion to be established.
+
+7.7. Pacing
+
+ A sender SHOULD pace sending of all in-flight packets based on input
+ from the congestion controller.
+
+ Sending multiple packets into the network without any delay between
+ them creates a packet burst that might cause short-term congestion
+ and losses. Senders MUST either use pacing or limit such bursts.
+ Senders SHOULD limit bursts to the initial congestion window; see
+ Section 7.2. A sender with knowledge that the network path to the
+ receiver can absorb larger bursts MAY use a higher limit.
+
+ An implementation should take care to architect its congestion
+ controller to work well with a pacer. For instance, a pacer might
+ wrap the congestion controller and control the availability of the
+ congestion window, or a pacer might pace out packets handed to it by
+ the congestion controller.
+
+ Timely delivery of ACK frames is important for efficient loss
+ recovery. To avoid delaying their delivery to the peer, packets
+ containing only ACK frames SHOULD therefore not be paced.
+
+ Endpoints can implement pacing as they choose. A perfectly paced
+ sender spreads packets exactly evenly over time. For a window-based
+ congestion controller, such as the one in this document, that rate
+ can be computed by averaging the congestion window over the RTT.
+ Expressed as a rate in units of bytes per time, where
+ congestion_window is in bytes:
+
+ rate = N * congestion_window / smoothed_rtt
+
+ Or expressed as an inter-packet interval in units of time:
+
+ interval = ( smoothed_rtt * packet_size / congestion_window ) / N
+
+ Using a value for "N" that is small, but at least 1 (for example,
+ 1.25) ensures that variations in RTT do not result in
+ underutilization of the congestion window.
+
+ Practical considerations, such as packetization, scheduling delays,
+ and computational efficiency, can cause a sender to deviate from this
+ rate over time periods that are much shorter than an RTT.
+
+ One possible implementation strategy for pacing uses a leaky bucket
+ algorithm, where the capacity of the "bucket" is limited to the
+ maximum burst size and the rate the "bucket" fills is determined by
+ the above function.
+
+7.8. Underutilizing the Congestion Window
+
+ When bytes in flight is smaller than the congestion window and
+ sending is not pacing limited, the congestion window is
+ underutilized. This can happen due to insufficient application data
+ or flow control limits. When this occurs, the congestion window
+ SHOULD NOT be increased in either slow start or congestion avoidance.
+
+ A sender that paces packets (see Section 7.7) might delay sending
+ packets and not fully utilize the congestion window due to this
+ delay. A sender SHOULD NOT consider itself application limited if it
+ would have fully utilized the congestion window without pacing delay.
+
+ A sender MAY implement alternative mechanisms to update its
+ congestion window after periods of underutilization, such as those
+ proposed for TCP in [RFC7661].
+
+8. Security Considerations
+
+8.1. Loss and Congestion Signals
+
+ Loss detection and congestion control fundamentally involve the
+ consumption of signals, such as delay, loss, and ECN markings, from
+ unauthenticated entities. An attacker can cause endpoints to reduce
+ their sending rate by manipulating these signals: by dropping
+ packets, by altering path delay strategically, or by changing ECN
+ codepoints.
+
+8.2. Traffic Analysis
+
+ Packets that carry only ACK frames can be heuristically identified by
+ observing packet size. Acknowledgment patterns may expose
+ information about link characteristics or application behavior. To
+ reduce leaked information, endpoints can bundle acknowledgments with
+ other frames, or they can use PADDING frames at a potential cost to
+ performance.
+
+8.3. Misreporting ECN Markings
+
+ A receiver can misreport ECN markings to alter the congestion
+ response of a sender. Suppressing reports of ECN-CE markings could
+ cause a sender to increase their send rate. This increase could
+ result in congestion and loss.
+
+ A sender can detect suppression of reports by marking occasional
+ packets that it sends with an ECN-CE marking. If a packet sent with
+ an ECN-CE marking is not reported as having been CE marked when the
+ packet is acknowledged, then the sender can disable ECN for that path
+ by not setting ECN-Capable Transport (ECT) codepoints in subsequent
+ packets sent on that path [RFC3168].
+
+ Reporting additional ECN-CE markings will cause a sender to reduce
+ their sending rate, which is similar in effect to advertising reduced
+ connection flow control limits and so no advantage is gained by doing
+ so.
+
+ Endpoints choose the congestion controller that they use. Congestion
+ controllers respond to reports of ECN-CE by reducing their rate, but
+ the response may vary. Markings can be treated as equivalent to loss
+ [RFC3168], but other responses can be specified, such as [RFC8511] or
+ [RFC8311].
+
+9. References
+
+9.1. Normative References
+
+ [QUIC-TLS] Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure
+ QUIC", RFC 9001, DOI 10.17487/RFC9001, May 2021,
+ <https://www.rfc-editor.org/info/rfc9001>.
+
+ [QUIC-TRANSPORT]
+ Iyengar, J., Ed. and M. Thomson, Ed., "QUIC: A UDP-Based
+ Multiplexed and Secure Transport", RFC 9000,
+ DOI 10.17487/RFC9000, May 2021,
+ <https://www.rfc-editor.org/info/rfc9000>.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119,
+ DOI 10.17487/RFC2119, March 1997,
+ <https://www.rfc-editor.org/info/rfc2119>.
+
+ [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
+ of Explicit Congestion Notification (ECN) to IP",
+ RFC 3168, DOI 10.17487/RFC3168, September 2001,
+ <https://www.rfc-editor.org/info/rfc3168>.
+
+ [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
+ Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
+ March 2017, <https://www.rfc-editor.org/info/rfc8085>.
+
+ [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+ 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+ May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
+9.2. Informative References
+
+ [FACK] Mathis, M. and J. Mahdavi, "Forward acknowledgement:
+ Refining TCP Congestion Control", ACM SIGCOMM Computer
+ Communication Review, DOI 10.1145/248157.248181, August
+ 1996, <https://doi.org/10.1145/248157.248181>.
+
+ [PRR] Mathis, M., Dukkipati, N., and Y. Cheng, "Proportional
+ Rate Reduction for TCP", RFC 6937, DOI 10.17487/RFC6937,
+ May 2013, <https://www.rfc-editor.org/info/rfc6937>.
+
+ [RETRANSMISSION]
+ Karn, P. and C. Partridge, "Improving Round-Trip Time
+ Estimates in Reliable Transport Protocols", ACM
+ Transactions on Computer Systems,
+ DOI 10.1145/118544.118549, November 1991,
+ <https://doi.org/10.1145/118544.118549>.
+
+ [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
+ Selective Acknowledgment Options", RFC 2018,
+ DOI 10.17487/RFC2018, October 1996,
+ <https://www.rfc-editor.org/info/rfc2018>.
+
+ [RFC3465] Allman, M., "TCP Congestion Control with Appropriate Byte
+ Counting (ABC)", RFC 3465, DOI 10.17487/RFC3465, February
+ 2003, <https://www.rfc-editor.org/info/rfc3465>.
+
+ [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
+ Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
+ <https://www.rfc-editor.org/info/rfc5681>.
+
+ [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
+ "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
+ Spurious Retransmission Timeouts with TCP", RFC 5682,
+ DOI 10.17487/RFC5682, September 2009,
+ <https://www.rfc-editor.org/info/rfc5682>.
+
+ [RFC5827] Allman, M., Avrachenkov, K., Ayesta, U., Blanton, J., and
+ P. Hurtig, "Early Retransmit for TCP and Stream Control
+ Transmission Protocol (SCTP)", RFC 5827,
+ DOI 10.17487/RFC5827, May 2010,
+ <https://www.rfc-editor.org/info/rfc5827>.
+
+ [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent,
+ "Computing TCP's Retransmission Timer", RFC 6298,
+ DOI 10.17487/RFC6298, June 2011,
+ <https://www.rfc-editor.org/info/rfc6298>.
+
+ [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The
+ NewReno Modification to TCP's Fast Recovery Algorithm",
+ RFC 6582, DOI 10.17487/RFC6582, April 2012,
+ <https://www.rfc-editor.org/info/rfc6582>.
+
+ [RFC6675] Blanton, E., Allman, M., Wang, L., Jarvinen, I., Kojo, M.,
+ and Y. Nishida, "A Conservative Loss Recovery Algorithm
+ Based on Selective Acknowledgment (SACK) for TCP",
+ RFC 6675, DOI 10.17487/RFC6675, August 2012,
+ <https://www.rfc-editor.org/info/rfc6675>.
+
+ [RFC6928] Chu, J., Dukkipati, N., Cheng, Y., and M. Mathis,
+ "Increasing TCP's Initial Window", RFC 6928,
+ DOI 10.17487/RFC6928, April 2013,
+ <https://www.rfc-editor.org/info/rfc6928>.
+
+ [RFC7661] Fairhurst, G., Sathiaseelan, A., and R. Secchi, "Updating
+ TCP to Support Rate-Limited Traffic", RFC 7661,
+ DOI 10.17487/RFC7661, October 2015,
+ <https://www.rfc-editor.org/info/rfc7661>.
+
+ [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion
+ Notification (ECN) Experimentation", RFC 8311,
+ DOI 10.17487/RFC8311, January 2018,
+ <https://www.rfc-editor.org/info/rfc8311>.
+
+ [RFC8312] Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and
+ R. Scheffenegger, "CUBIC for Fast Long-Distance Networks",
+ RFC 8312, DOI 10.17487/RFC8312, February 2018,
+ <https://www.rfc-editor.org/info/rfc8312>.
+
+ [RFC8511] Khademi, N., Welzl, M., Armitage, G., and G. Fairhurst,
+ "TCP Alternative Backoff with ECN (ABE)", RFC 8511,
+ DOI 10.17487/RFC8511, December 2018,
+ <https://www.rfc-editor.org/info/rfc8511>.
+
+ [RFC8985] Cheng, Y., Cardwell, N., Dukkipati, N., and P. Jha, "The
+ RACK-TLP Loss Detection Algorithm for TCP", RFC 8985,
+ DOI 10.17487/RFC8985, February 2021,
+ <https://www.rfc-editor.org/info/rfc8985>.
+
+Appendix A. Loss Recovery Pseudocode
+
+ We now describe an example implementation of the loss detection
+ mechanisms described in Section 6.
+
+ The pseudocode segments in this section are licensed as Code
+ Components; see the copyright notice.
+
+A.1. Tracking Sent Packets
+
+ To correctly implement congestion control, a QUIC sender tracks every
+ ack-eliciting packet until the packet is acknowledged or lost. It is
+ expected that implementations will be able to access this information
+ by packet number and crypto context and store the per-packet fields
+ (Appendix A.1.1) for loss recovery and congestion control.
+
+ After a packet is declared lost, the endpoint can still maintain
+ state for it for an amount of time to allow for packet reordering;
+ see Section 13.3 of [QUIC-TRANSPORT]. This enables a sender to
+ detect spurious retransmissions.
+
+ Sent packets are tracked for each packet number space, and ACK
+ processing only applies to a single space.
+
+A.1.1. Sent Packet Fields
+
+ packet_number: The packet number of the sent packet.
+
+ ack_eliciting: A Boolean that indicates whether a packet is ack-
+ eliciting. If true, it is expected that an acknowledgment will be
+ received, though the peer could delay sending the ACK frame
+ containing it by up to the max_ack_delay.
+
+ in_flight: A Boolean that indicates whether the packet counts toward
+ bytes in flight.
+
+ sent_bytes: The number of bytes sent in the packet, not including
+ UDP or IP overhead, but including QUIC framing overhead.
+
+ time_sent: The time the packet was sent.
+
+A.2. Constants of Interest
+
+ Constants used in loss recovery are based on a combination of RFCs,
+ papers, and common practice.
+
+ kPacketThreshold: Maximum reordering in packets before packet
+ threshold loss detection considers a packet lost. The value
+ recommended in Section 6.1.1 is 3.
+
+ kTimeThreshold: Maximum reordering in time before time threshold
+ loss detection considers a packet lost. Specified as an RTT
+ multiplier. The value recommended in Section 6.1.2 is 9/8.
+
+ kGranularity: Timer granularity. This is a system-dependent value,
+ and Section 6.1.2 recommends a value of 1 ms.
+
+ kInitialRtt: The RTT used before an RTT sample is taken. The value
+ recommended in Section 6.2.2 is 333 ms.
+
+ kPacketNumberSpace: An enum to enumerate the three packet number
+ spaces:
+
+ enum kPacketNumberSpace {
+ Initial,
+ Handshake,
+ ApplicationData,
+ }
+
+A.3. Variables of Interest
+
+ Variables required to implement the congestion control mechanisms are
+ described in this section.
+
+ latest_rtt: The most recent RTT measurement made when receiving an
+ acknowledgment for a previously unacknowledged packet.
+
+ smoothed_rtt: The smoothed RTT of the connection, computed as
+ described in Section 5.3.
+
+ rttvar: The RTT variation, computed as described in Section 5.3.
+
+ min_rtt: The minimum RTT seen over a period of time, ignoring
+ acknowledgment delay, as described in Section 5.2.
+
+ first_rtt_sample: The time that the first RTT sample was obtained.
+
+ max_ack_delay: The maximum amount of time by which the receiver
+ intends to delay acknowledgments for packets in the Application
+ Data packet number space, as defined by the eponymous transport
+ parameter (Section 18.2 of [QUIC-TRANSPORT]). Note that the
+ actual ack_delay in a received ACK frame may be larger due to late
+ timers, reordering, or loss.
+
+ loss_detection_timer: Multi-modal timer used for loss detection.
+
+ pto_count: The number of times a PTO has been sent without receiving
+ an acknowledgment.
+
+ time_of_last_ack_eliciting_packet[kPacketNumberSpace]: The time the
+ most recent ack-eliciting packet was sent.
+
+ largest_acked_packet[kPacketNumberSpace]: The largest packet number
+ acknowledged in the packet number space so far.
+
+ loss_time[kPacketNumberSpace]: The time at which the next packet in
+ that packet number space can be considered lost based on exceeding
+ the reordering window in time.
+
+ sent_packets[kPacketNumberSpace]: An association of packet numbers
+ in a packet number space to information about them. Described in
+ detail above in Appendix A.1.
+
+A.4. Initialization
+
+ At the beginning of the connection, initialize the loss detection
+ variables as follows:
+
+ loss_detection_timer.reset()
+ pto_count = 0
+ latest_rtt = 0
+ smoothed_rtt = kInitialRtt
+ rttvar = kInitialRtt / 2
+ min_rtt = 0
+ first_rtt_sample = 0
+ for pn_space in [ Initial, Handshake, ApplicationData ]:
+ largest_acked_packet[pn_space] = infinite
+ time_of_last_ack_eliciting_packet[pn_space] = 0
+ loss_time[pn_space] = 0
+
+A.5. On Sending a Packet
+
+ After a packet is sent, information about the packet is stored. The
+ parameters to OnPacketSent are described in detail above in
+ Appendix A.1.1.
+
+ Pseudocode for OnPacketSent follows:
+
+ OnPacketSent(packet_number, pn_space, ack_eliciting,
+ in_flight, sent_bytes):
+ sent_packets[pn_space][packet_number].packet_number =
+ packet_number
+ sent_packets[pn_space][packet_number].time_sent = now()
+ sent_packets[pn_space][packet_number].ack_eliciting =
+ ack_eliciting
+ sent_packets[pn_space][packet_number].in_flight = in_flight
+ sent_packets[pn_space][packet_number].sent_bytes = sent_bytes
+ if (in_flight):
+ if (ack_eliciting):
+ time_of_last_ack_eliciting_packet[pn_space] = now()
+ OnPacketSentCC(sent_bytes)
+ SetLossDetectionTimer()
+
+A.6. On Receiving a Datagram
+
+ When a server is blocked by anti-amplification limits, receiving a
+ datagram unblocks it, even if none of the packets in the datagram are
+ successfully processed. In such a case, the PTO timer will need to
+ be rearmed.
+
+ Pseudocode for OnDatagramReceived follows:
+
+ OnDatagramReceived(datagram):
+ // If this datagram unblocks the server, arm the
+ // PTO timer to avoid deadlock.
+ if (server was at anti-amplification limit):
+ SetLossDetectionTimer()
+ if loss_detection_timer.timeout < now():
+ // Execute PTO if it would have expired
+ // while the amplification limit applied.
+ OnLossDetectionTimeout()
+
+A.7. On Receiving an Acknowledgment
+
+ When an ACK frame is received, it may newly acknowledge any number of
+ packets.
+
+ Pseudocode for OnAckReceived and UpdateRtt follow:
+
+ IncludesAckEliciting(packets):
+ for packet in packets:
+ if (packet.ack_eliciting):
+ return true
+ return false
+
+ OnAckReceived(ack, pn_space):
+ if (largest_acked_packet[pn_space] == infinite):
+ largest_acked_packet[pn_space] = ack.largest_acked
+ else:
+ largest_acked_packet[pn_space] =
+ max(largest_acked_packet[pn_space], ack.largest_acked)
+
+ // DetectAndRemoveAckedPackets finds packets that are newly
+ // acknowledged and removes them from sent_packets.
+ newly_acked_packets =
+ DetectAndRemoveAckedPackets(ack, pn_space)
+ // Nothing to do if there are no newly acked packets.
+ if (newly_acked_packets.empty()):
+ return
+
+ // Update the RTT if the largest acknowledged is newly acked
+ // and at least one ack-eliciting was newly acked.
+ if (newly_acked_packets.largest().packet_number ==
+ ack.largest_acked &&
+ IncludesAckEliciting(newly_acked_packets)):
+ latest_rtt =
+ now() - newly_acked_packets.largest().time_sent
+ UpdateRtt(ack.ack_delay)
+
+ // Process ECN information if present.
+ if (ACK frame contains ECN information):
+ ProcessECN(ack, pn_space)
+
+ lost_packets = DetectAndRemoveLostPackets(pn_space)
+ if (!lost_packets.empty()):
+ OnPacketsLost(lost_packets)
+ OnPacketsAcked(newly_acked_packets)
+
+ // Reset pto_count unless the client is unsure if
+ // the server has validated the client's address.
+ if (PeerCompletedAddressValidation()):
+ pto_count = 0
+ SetLossDetectionTimer()
+
+
+ UpdateRtt(ack_delay):
+ if (first_rtt_sample == 0):
+ min_rtt = latest_rtt
+ smoothed_rtt = latest_rtt
+ rttvar = latest_rtt / 2
+ first_rtt_sample = now()
+ return
+
+ // min_rtt ignores acknowledgment delay.
+ min_rtt = min(min_rtt, latest_rtt)
+ // Limit ack_delay by max_ack_delay after handshake
+ // confirmation.
+ if (handshake confirmed):
+ ack_delay = min(ack_delay, max_ack_delay)
+
+ // Adjust for acknowledgment delay if plausible.
+ adjusted_rtt = latest_rtt
+ if (latest_rtt >= min_rtt + ack_delay):
+ adjusted_rtt = latest_rtt - ack_delay
+
+ rttvar = 3/4 * rttvar + 1/4 * abs(smoothed_rtt - adjusted_rtt)
+ smoothed_rtt = 7/8 * smoothed_rtt + 1/8 * adjusted_rtt
+
+A.8. Setting the Loss Detection Timer
+
+ QUIC loss detection uses a single timer for all timeout loss
+ detection. The duration of the timer is based on the timer's mode,
+ which is set in the packet and timer events further below. The
+ function SetLossDetectionTimer defined below shows how the single
+ timer is set.
+
+ This algorithm may result in the timer being set in the past,
+ particularly if timers wake up late. Timers set in the past fire
+ immediately.
+
+ Pseudocode for SetLossDetectionTimer follows (where the "^" operator
+ represents exponentiation):
+
+ GetLossTimeAndSpace():
+ time = loss_time[Initial]
+ space = Initial
+ for pn_space in [ Handshake, ApplicationData ]:
+ if (time == 0 || loss_time[pn_space] < time):
+ time = loss_time[pn_space];
+ space = pn_space
+ return time, space
+
+ GetPtoTimeAndSpace():
+ duration = (smoothed_rtt + max(4 * rttvar, kGranularity))
+ * (2 ^ pto_count)
+ // Anti-deadlock PTO starts from the current time
+ if (no ack-eliciting packets in flight):
+ assert(!PeerCompletedAddressValidation())
+ if (has handshake keys):
+ return (now() + duration), Handshake
+ else:
+ return (now() + duration), Initial
+ pto_timeout = infinite
+ pto_space = Initial
+ for space in [ Initial, Handshake, ApplicationData ]:
+ if (no ack-eliciting packets in flight in space):
+ continue;
+ if (space == ApplicationData):
+ // Skip Application Data until handshake confirmed.
+ if (handshake is not confirmed):
+ return pto_timeout, pto_space
+ // Include max_ack_delay and backoff for Application Data.
+ duration += max_ack_delay * (2 ^ pto_count)
+
+ t = time_of_last_ack_eliciting_packet[space] + duration
+ if (t < pto_timeout):
+ pto_timeout = t
+ pto_space = space
+ return pto_timeout, pto_space
+
+ PeerCompletedAddressValidation():
+ // Assume clients validate the server's address implicitly.
+ if (endpoint is server):
+ return true
+ // Servers complete address validation when a
+ // protected packet is received.
+ return has received Handshake ACK ||
+ handshake confirmed
+
+ SetLossDetectionTimer():
+ earliest_loss_time, _ = GetLossTimeAndSpace()
+ if (earliest_loss_time != 0):
+ // Time threshold loss detection.
+ loss_detection_timer.update(earliest_loss_time)
+ return
+
+ if (server is at anti-amplification limit):
+ // The server's timer is not set if nothing can be sent.
+ loss_detection_timer.cancel()
+ return
+
+ if (no ack-eliciting packets in flight &&
+ PeerCompletedAddressValidation()):
+ // There is nothing to detect lost, so no timer is set.
+ // However, the client needs to arm the timer if the
+ // server might be blocked by the anti-amplification limit.
+ loss_detection_timer.cancel()
+ return
+
+ timeout, _ = GetPtoTimeAndSpace()
+ loss_detection_timer.update(timeout)
+
+A.9. On Timeout
+
+ When the loss detection timer expires, the timer's mode determines
+ the action to be performed.
+
+ Pseudocode for OnLossDetectionTimeout follows:
+
+ OnLossDetectionTimeout():
+ earliest_loss_time, pn_space = GetLossTimeAndSpace()
+ if (earliest_loss_time != 0):
+ // Time threshold loss Detection
+ lost_packets = DetectAndRemoveLostPackets(pn_space)
+ assert(!lost_packets.empty())
+ OnPacketsLost(lost_packets)
+ SetLossDetectionTimer()
+ return
+
+ if (no ack-eliciting packets in flight):
+ assert(!PeerCompletedAddressValidation())
+ // Client sends an anti-deadlock packet: Initial is padded
+ // to earn more anti-amplification credit,
+ // a Handshake packet proves address ownership.
+ if (has Handshake keys):
+ SendOneAckElicitingHandshakePacket()
+ else:
+ SendOneAckElicitingPaddedInitialPacket()
+ else:
+ // PTO. Send new data if available, else retransmit old data.
+ // If neither is available, send a single PING frame.
+ _, pn_space = GetPtoTimeAndSpace()
+ SendOneOrTwoAckElicitingPackets(pn_space)
+
+ pto_count++
+ SetLossDetectionTimer()
+
+A.10. Detecting Lost Packets
+
+ DetectAndRemoveLostPackets is called every time an ACK is received or
+ the time threshold loss detection timer expires. This function
+ operates on the sent_packets for that packet number space and returns
+ a list of packets newly detected as lost.
+
+ Pseudocode for DetectAndRemoveLostPackets follows:
+
+ DetectAndRemoveLostPackets(pn_space):
+ assert(largest_acked_packet[pn_space] != infinite)
+ loss_time[pn_space] = 0
+ lost_packets = []
+ loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt)
+
+ // Minimum time of kGranularity before packets are deemed lost.
+ loss_delay = max(loss_delay, kGranularity)
+
+ // Packets sent before this time are deemed lost.
+ lost_send_time = now() - loss_delay
+
+ foreach unacked in sent_packets[pn_space]:
+ if (unacked.packet_number > largest_acked_packet[pn_space]):
+ continue
+
+ // Mark packet as lost, or set time when it should be marked.
+ // Note: The use of kPacketThreshold here assumes that there
+ // were no sender-induced gaps in the packet number space.
+ if (unacked.time_sent <= lost_send_time ||
+ largest_acked_packet[pn_space] >=
+ unacked.packet_number + kPacketThreshold):
+ sent_packets[pn_space].remove(unacked.packet_number)
+ lost_packets.insert(unacked)
+ else:
+ if (loss_time[pn_space] == 0):
+ loss_time[pn_space] = unacked.time_sent + loss_delay
+ else:
+ loss_time[pn_space] = min(loss_time[pn_space],
+ unacked.time_sent + loss_delay)
+ return lost_packets
+
+A.11. Upon Dropping Initial or Handshake Keys
+
+ When Initial or Handshake keys are discarded, packets from the space
+ are discarded and loss detection state is updated.
+
+ Pseudocode for OnPacketNumberSpaceDiscarded follows:
+
+ OnPacketNumberSpaceDiscarded(pn_space):
+ assert(pn_space != ApplicationData)
+ RemoveFromBytesInFlight(sent_packets[pn_space])
+ sent_packets[pn_space].clear()
+ // Reset the loss detection and PTO timer
+ time_of_last_ack_eliciting_packet[pn_space] = 0
+ loss_time[pn_space] = 0
+ pto_count = 0
+ SetLossDetectionTimer()
+
+Appendix B. Congestion Control Pseudocode
+
+ We now describe an example implementation of the congestion
+ controller described in Section 7.
+
+ The pseudocode segments in this section are licensed as Code
+ Components; see the copyright notice.
+
+B.1. Constants of Interest
+
+ Constants used in congestion control are based on a combination of
+ RFCs, papers, and common practice.
+
+ kInitialWindow: Default limit on the initial bytes in flight as
+ described in Section 7.2.
+
+ kMinimumWindow: Minimum congestion window in bytes as described in
+ Section 7.2.
+
+ kLossReductionFactor: Scaling factor applied to reduce the
+ congestion window when a new loss event is detected. Section 7
+ recommends a value of 0.5.
+
+ kPersistentCongestionThreshold: Period of time for persistent
+ congestion to be established, specified as a PTO multiplier.
+ Section 7.6 recommends a value of 3.
+
+B.2. Variables of Interest
+
+ Variables required to implement the congestion control mechanisms are
+ described in this section.
+
+ max_datagram_size: The sender's current maximum payload size. This
+ does not include UDP or IP overhead. The max datagram size is
+ used for congestion window computations. An endpoint sets the
+ value of this variable based on its Path Maximum Transmission Unit
+ (PMTU; see Section 14.2 of [QUIC-TRANSPORT]), with a minimum value
+ of 1200 bytes.
+
+ ecn_ce_counters[kPacketNumberSpace]: The highest value reported for
+ the ECN-CE counter in the packet number space by the peer in an
+ ACK frame. This value is used to detect increases in the reported
+ ECN-CE counter.
+
+ bytes_in_flight: The sum of the size in bytes of all sent packets
+ that contain at least one ack-eliciting or PADDING frame and have
+ not been acknowledged or declared lost. The size does not include
+ IP or UDP overhead, but does include the QUIC header and
+ Authenticated Encryption with Associated Data (AEAD) overhead.
+ Packets only containing ACK frames do not count toward
+ bytes_in_flight to ensure congestion control does not impede
+ congestion feedback.
+
+ congestion_window: Maximum number of bytes allowed to be in flight.
+
+ congestion_recovery_start_time: The time the current recovery period
+ started due to the detection of loss or ECN. When a packet sent
+ after this time is acknowledged, QUIC exits congestion recovery.
+
+ ssthresh: Slow start threshold in bytes. When the congestion window
+ is below ssthresh, the mode is slow start and the window grows by
+ the number of bytes acknowledged.
+
+ The congestion control pseudocode also accesses some of the variables
+ from the loss recovery pseudocode.
+
+B.3. Initialization
+
+ At the beginning of the connection, initialize the congestion control
+ variables as follows:
+
+ congestion_window = kInitialWindow
+ bytes_in_flight = 0
+ congestion_recovery_start_time = 0
+ ssthresh = infinite
+ for pn_space in [ Initial, Handshake, ApplicationData ]:
+ ecn_ce_counters[pn_space] = 0
+
+B.4. On Packet Sent
+
+ Whenever a packet is sent and it contains non-ACK frames, the packet
+ increases bytes_in_flight.
+
+ OnPacketSentCC(sent_bytes):
+ bytes_in_flight += sent_bytes
+
+B.5. On Packet Acknowledgment
+
+ This is invoked from loss detection's OnAckReceived and is supplied
+ with the newly acked_packets from sent_packets.
+
+ In congestion avoidance, implementers that use an integer
+ representation for congestion_window should be careful with division
+ and can use the alternative approach suggested in Section 2.1 of
+ [RFC3465].
+
+ InCongestionRecovery(sent_time):
+ return sent_time <= congestion_recovery_start_time
+
+ OnPacketsAcked(acked_packets):
+ for acked_packet in acked_packets:
+ OnPacketAcked(acked_packet)
+
+ OnPacketAcked(acked_packet):
+ if (!acked_packet.in_flight):
+ return;
+ // Remove from bytes_in_flight.
+ bytes_in_flight -= acked_packet.sent_bytes
+ // Do not increase congestion_window if application
+ // limited or flow control limited.
+ if (IsAppOrFlowControlLimited())
+ return
+ // Do not increase congestion window in recovery period.
+ if (InCongestionRecovery(acked_packet.time_sent)):
+ return
+ if (congestion_window < ssthresh):
+ // Slow start.
+ congestion_window += acked_packet.sent_bytes
+ else:
+ // Congestion avoidance.
+ congestion_window +=
+ max_datagram_size * acked_packet.sent_bytes
+ / congestion_window
+
+B.6. On New Congestion Event
+
+ This is invoked from ProcessECN and OnPacketsLost when a new
+ congestion event is detected. If not already in recovery, this
+ starts a recovery period and reduces the slow start threshold and
+ congestion window immediately.
+
+ OnCongestionEvent(sent_time):
+ // No reaction if already in a recovery period.
+ if (InCongestionRecovery(sent_time)):
+ return
+
+ // Enter recovery period.
+ congestion_recovery_start_time = now()
+ ssthresh = congestion_window * kLossReductionFactor
+ congestion_window = max(ssthresh, kMinimumWindow)
+ // A packet can be sent to speed up loss recovery.
+ MaybeSendOnePacket()
+
+B.7. Process ECN Information
+
+ This is invoked when an ACK frame with an ECN section is received
+ from the peer.
+
+ ProcessECN(ack, pn_space):
+ // If the ECN-CE counter reported by the peer has increased,
+ // this could be a new congestion event.
+ if (ack.ce_counter > ecn_ce_counters[pn_space]):
+ ecn_ce_counters[pn_space] = ack.ce_counter
+ sent_time = sent_packets[ack.largest_acked].time_sent
+ OnCongestionEvent(sent_time)
+
+B.8. On Packets Lost
+
+ This is invoked when DetectAndRemoveLostPackets deems packets lost.
+
+ OnPacketsLost(lost_packets):
+ sent_time_of_last_loss = 0
+ // Remove lost packets from bytes_in_flight.
+ for lost_packet in lost_packets:
+ if lost_packet.in_flight:
+ bytes_in_flight -= lost_packet.sent_bytes
+ sent_time_of_last_loss =
+ max(sent_time_of_last_loss, lost_packet.time_sent)
+ // Congestion event if in-flight packets were lost
+ if (sent_time_of_last_loss != 0):
+ OnCongestionEvent(sent_time_of_last_loss)
+
+ // Reset the congestion window if the loss of these
+ // packets indicates persistent congestion.
+ // Only consider packets sent after getting an RTT sample.
+ if (first_rtt_sample == 0):
+ return
+ pc_lost = []
+ for lost in lost_packets:
+ if lost.time_sent > first_rtt_sample:
+ pc_lost.insert(lost)
+ if (InPersistentCongestion(pc_lost)):
+ congestion_window = kMinimumWindow
+ congestion_recovery_start_time = 0
+
+B.9. Removing Discarded Packets from Bytes in Flight
+
+ When Initial or Handshake keys are discarded, packets sent in that
+ space no longer count toward bytes in flight.
+
+ Pseudocode for RemoveFromBytesInFlight follows:
+
+ RemoveFromBytesInFlight(discarded_packets):
+ // Remove any unacknowledged packets from flight.
+ foreach packet in discarded_packets:
+ if packet.in_flight
+ bytes_in_flight -= size
+
+Contributors
+
+ The IETF QUIC Working Group received an enormous amount of support
+ from many people. The following people provided substantive
+ contributions to this document:
+
+ * Alessandro Ghedini
+ * Benjamin Saunders
+ * Gorry Fairhurst
+ * 山本和彦 (Kazu Yamamoto)
+ * 奥 一穂 (Kazuho Oku)
+ * Lars Eggert
+ * Magnus Westerlund
+ * Marten Seemann
+ * Martin Duke
+ * Martin Thomson
+ * Mirja Kühlewind
+ * Nick Banks
+ * Praveen Balasubramanian
+
+Authors' Addresses
+
+ Jana Iyengar (editor)
+ Fastly
+
+ Email: jri.ietf@gmail.com
+
+
+ Ian Swett (editor)
+ Google
+
+ Email: ianswett@google.com