summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc9000.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc9000.txt')
-rw-r--r--doc/rfc/rfc9000.txt8485
1 files changed, 8485 insertions, 0 deletions
diff --git a/doc/rfc/rfc9000.txt b/doc/rfc/rfc9000.txt
new file mode 100644
index 0000000..3ceabcf
--- /dev/null
+++ b/doc/rfc/rfc9000.txt
@@ -0,0 +1,8485 @@
+
+
+
+
+Internet Engineering Task Force (IETF) J. Iyengar, Ed.
+Request for Comments: 9000 Fastly
+Category: Standards Track M. Thomson, Ed.
+ISSN: 2070-1721 Mozilla
+ May 2021
+
+
+ QUIC: A UDP-Based Multiplexed and Secure Transport
+
+Abstract
+
+ This document defines the core of the QUIC transport protocol. QUIC
+ provides applications with flow-controlled streams for structured
+ communication, low-latency connection establishment, and network path
+ migration. QUIC includes security measures that ensure
+ confidentiality, integrity, and availability in a range of deployment
+ circumstances. Accompanying documents describe the integration of
+ TLS for key negotiation, loss detection, and an exemplary congestion
+ control algorithm.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ https://www.rfc-editor.org/info/rfc9000.
+
+Copyright Notice
+
+ Copyright (c) 2021 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (https://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Overview
+ 1.1. Document Structure
+ 1.2. Terms and Definitions
+ 1.3. Notational Conventions
+ 2. Streams
+ 2.1. Stream Types and Identifiers
+ 2.2. Sending and Receiving Data
+ 2.3. Stream Prioritization
+ 2.4. Operations on Streams
+ 3. Stream States
+ 3.1. Sending Stream States
+ 3.2. Receiving Stream States
+ 3.3. Permitted Frame Types
+ 3.4. Bidirectional Stream States
+ 3.5. Solicited State Transitions
+ 4. Flow Control
+ 4.1. Data Flow Control
+ 4.2. Increasing Flow Control Limits
+ 4.3. Flow Control Performance
+ 4.4. Handling Stream Cancellation
+ 4.5. Stream Final Size
+ 4.6. Controlling Concurrency
+ 5. Connections
+ 5.1. Connection ID
+ 5.1.1. Issuing Connection IDs
+ 5.1.2. Consuming and Retiring Connection IDs
+ 5.2. Matching Packets to Connections
+ 5.2.1. Client Packet Handling
+ 5.2.2. Server Packet Handling
+ 5.2.3. Considerations for Simple Load Balancers
+ 5.3. Operations on Connections
+ 6. Version Negotiation
+ 6.1. Sending Version Negotiation Packets
+ 6.2. Handling Version Negotiation Packets
+ 6.3. Using Reserved Versions
+ 7. Cryptographic and Transport Handshake
+ 7.1. Example Handshake Flows
+ 7.2. Negotiating Connection IDs
+ 7.3. Authenticating Connection IDs
+ 7.4. Transport Parameters
+ 7.4.1. Values of Transport Parameters for 0-RTT
+ 7.4.2. New Transport Parameters
+ 7.5. Cryptographic Message Buffering
+ 8. Address Validation
+ 8.1. Address Validation during Connection Establishment
+ 8.1.1. Token Construction
+ 8.1.2. Address Validation Using Retry Packets
+ 8.1.3. Address Validation for Future Connections
+ 8.1.4. Address Validation Token Integrity
+ 8.2. Path Validation
+ 8.2.1. Initiating Path Validation
+ 8.2.2. Path Validation Responses
+ 8.2.3. Successful Path Validation
+ 8.2.4. Failed Path Validation
+ 9. Connection Migration
+ 9.1. Probing a New Path
+ 9.2. Initiating Connection Migration
+ 9.3. Responding to Connection Migration
+ 9.3.1. Peer Address Spoofing
+ 9.3.2. On-Path Address Spoofing
+ 9.3.3. Off-Path Packet Forwarding
+ 9.4. Loss Detection and Congestion Control
+ 9.5. Privacy Implications of Connection Migration
+ 9.6. Server's Preferred Address
+ 9.6.1. Communicating a Preferred Address
+ 9.6.2. Migration to a Preferred Address
+ 9.6.3. Interaction of Client Migration and Preferred Address
+ 9.7. Use of IPv6 Flow Label and Migration
+ 10. Connection Termination
+ 10.1. Idle Timeout
+ 10.1.1. Liveness Testing
+ 10.1.2. Deferring Idle Timeout
+ 10.2. Immediate Close
+ 10.2.1. Closing Connection State
+ 10.2.2. Draining Connection State
+ 10.2.3. Immediate Close during the Handshake
+ 10.3. Stateless Reset
+ 10.3.1. Detecting a Stateless Reset
+ 10.3.2. Calculating a Stateless Reset Token
+ 10.3.3. Looping
+ 11. Error Handling
+ 11.1. Connection Errors
+ 11.2. Stream Errors
+ 12. Packets and Frames
+ 12.1. Protected Packets
+ 12.2. Coalescing Packets
+ 12.3. Packet Numbers
+ 12.4. Frames and Frame Types
+ 12.5. Frames and Number Spaces
+ 13. Packetization and Reliability
+ 13.1. Packet Processing
+ 13.2. Generating Acknowledgments
+ 13.2.1. Sending ACK Frames
+ 13.2.2. Acknowledgment Frequency
+ 13.2.3. Managing ACK Ranges
+ 13.2.4. Limiting Ranges by Tracking ACK Frames
+ 13.2.5. Measuring and Reporting Host Delay
+ 13.2.6. ACK Frames and Packet Protection
+ 13.2.7. PADDING Frames Consume Congestion Window
+ 13.3. Retransmission of Information
+ 13.4. Explicit Congestion Notification
+ 13.4.1. Reporting ECN Counts
+ 13.4.2. ECN Validation
+ 14. Datagram Size
+ 14.1. Initial Datagram Size
+ 14.2. Path Maximum Transmission Unit
+ 14.2.1. Handling of ICMP Messages by PMTUD
+ 14.3. Datagram Packetization Layer PMTU Discovery
+ 14.3.1. DPLPMTUD and Initial Connectivity
+ 14.3.2. Validating the Network Path with DPLPMTUD
+ 14.3.3. Handling of ICMP Messages by DPLPMTUD
+ 14.4. Sending QUIC PMTU Probes
+ 14.4.1. PMTU Probes Containing Source Connection ID
+ 15. Versions
+ 16. Variable-Length Integer Encoding
+ 17. Packet Formats
+ 17.1. Packet Number Encoding and Decoding
+ 17.2. Long Header Packets
+ 17.2.1. Version Negotiation Packet
+ 17.2.2. Initial Packet
+ 17.2.3. 0-RTT
+ 17.2.4. Handshake Packet
+ 17.2.5. Retry Packet
+ 17.3. Short Header Packets
+ 17.3.1. 1-RTT Packet
+ 17.4. Latency Spin Bit
+ 18. Transport Parameter Encoding
+ 18.1. Reserved Transport Parameters
+ 18.2. Transport Parameter Definitions
+ 19. Frame Types and Formats
+ 19.1. PADDING Frames
+ 19.2. PING Frames
+ 19.3. ACK Frames
+ 19.3.1. ACK Ranges
+ 19.3.2. ECN Counts
+ 19.4. RESET_STREAM Frames
+ 19.5. STOP_SENDING Frames
+ 19.6. CRYPTO Frames
+ 19.7. NEW_TOKEN Frames
+ 19.8. STREAM Frames
+ 19.9. MAX_DATA Frames
+ 19.10. MAX_STREAM_DATA Frames
+ 19.11. MAX_STREAMS Frames
+ 19.12. DATA_BLOCKED Frames
+ 19.13. STREAM_DATA_BLOCKED Frames
+ 19.14. STREAMS_BLOCKED Frames
+ 19.15. NEW_CONNECTION_ID Frames
+ 19.16. RETIRE_CONNECTION_ID Frames
+ 19.17. PATH_CHALLENGE Frames
+ 19.18. PATH_RESPONSE Frames
+ 19.19. CONNECTION_CLOSE Frames
+ 19.20. HANDSHAKE_DONE Frames
+ 19.21. Extension Frames
+ 20. Error Codes
+ 20.1. Transport Error Codes
+ 20.2. Application Protocol Error Codes
+ 21. Security Considerations
+ 21.1. Overview of Security Properties
+ 21.1.1. Handshake
+ 21.1.2. Protected Packets
+ 21.1.3. Connection Migration
+ 21.2. Handshake Denial of Service
+ 21.3. Amplification Attack
+ 21.4. Optimistic ACK Attack
+ 21.5. Request Forgery Attacks
+ 21.5.1. Control Options for Endpoints
+ 21.5.2. Request Forgery with Client Initial Packets
+ 21.5.3. Request Forgery with Preferred Addresses
+ 21.5.4. Request Forgery with Spoofed Migration
+ 21.5.5. Request Forgery with Version Negotiation
+ 21.5.6. Generic Request Forgery Countermeasures
+ 21.6. Slowloris Attacks
+ 21.7. Stream Fragmentation and Reassembly Attacks
+ 21.8. Stream Commitment Attack
+ 21.9. Peer Denial of Service
+ 21.10. Explicit Congestion Notification Attacks
+ 21.11. Stateless Reset Oracle
+ 21.12. Version Downgrade
+ 21.13. Targeted Attacks by Routing
+ 21.14. Traffic Analysis
+ 22. IANA Considerations
+ 22.1. Registration Policies for QUIC Registries
+ 22.1.1. Provisional Registrations
+ 22.1.2. Selecting Codepoints
+ 22.1.3. Reclaiming Provisional Codepoints
+ 22.1.4. Permanent Registrations
+ 22.2. QUIC Versions Registry
+ 22.3. QUIC Transport Parameters Registry
+ 22.4. QUIC Frame Types Registry
+ 22.5. QUIC Transport Error Codes Registry
+ 23. References
+ 23.1. Normative References
+ 23.2. Informative References
+ Appendix A. Pseudocode
+ A.1. Sample Variable-Length Integer Decoding
+ A.2. Sample Packet Number Encoding Algorithm
+ A.3. Sample Packet Number Decoding Algorithm
+ A.4. Sample ECN Validation Algorithm
+ Contributors
+ Authors' Addresses
+
+1. Overview
+
+ QUIC is a secure general-purpose transport protocol. This document
+ defines version 1 of QUIC, which conforms to the version-independent
+ properties of QUIC defined in [QUIC-INVARIANTS].
+
+ QUIC is a connection-oriented protocol that creates a stateful
+ interaction between a client and server.
+
+ The QUIC handshake combines negotiation of cryptographic and
+ transport parameters. QUIC integrates the TLS handshake [TLS13],
+ although using a customized framing for protecting packets. The
+ integration of TLS and QUIC is described in more detail in
+ [QUIC-TLS]. The handshake is structured to permit the exchange of
+ application data as soon as possible. This includes an option for
+ clients to send data immediately (0-RTT), which requires some form of
+ prior communication or configuration to enable.
+
+ Endpoints communicate in QUIC by exchanging QUIC packets. Most
+ packets contain frames, which carry control information and
+ application data between endpoints. QUIC authenticates the entirety
+ of each packet and encrypts as much of each packet as is practical.
+ QUIC packets are carried in UDP datagrams [UDP] to better facilitate
+ deployment in existing systems and networks.
+
+ Application protocols exchange information over a QUIC connection via
+ streams, which are ordered sequences of bytes. Two types of streams
+ can be created: bidirectional streams, which allow both endpoints to
+ send data; and unidirectional streams, which allow a single endpoint
+ to send data. A credit-based scheme is used to limit stream creation
+ and to bound the amount of data that can be sent.
+
+ QUIC provides the necessary feedback to implement reliable delivery
+ and congestion control. An algorithm for detecting and recovering
+ from loss of data is described in Section 6 of [QUIC-RECOVERY]. QUIC
+ depends on congestion control to avoid network congestion. An
+ exemplary congestion control algorithm is described in Section 7 of
+ [QUIC-RECOVERY].
+
+ QUIC connections are not strictly bound to a single network path.
+ Connection migration uses connection identifiers to allow connections
+ to transfer to a new network path. Only clients are able to migrate
+ in this version of QUIC. This design also allows connections to
+ continue after changes in network topology or address mappings, such
+ as might be caused by NAT rebinding.
+
+ Once established, multiple options are provided for connection
+ termination. Applications can manage a graceful shutdown, endpoints
+ can negotiate a timeout period, errors can cause immediate connection
+ teardown, and a stateless mechanism provides for termination of
+ connections after one endpoint has lost state.
+
+1.1. Document Structure
+
+ This document describes the core QUIC protocol and is structured as
+ follows:
+
+ * Streams are the basic service abstraction that QUIC provides.
+
+ - Section 2 describes core concepts related to streams,
+
+ - Section 3 provides a reference model for stream states, and
+
+ - Section 4 outlines the operation of flow control.
+
+ * Connections are the context in which QUIC endpoints communicate.
+
+ - Section 5 describes core concepts related to connections,
+
+ - Section 6 describes version negotiation,
+
+ - Section 7 details the process for establishing connections,
+
+ - Section 8 describes address validation and critical denial-of-
+ service mitigations,
+
+ - Section 9 describes how endpoints migrate a connection to a new
+ network path,
+
+ - Section 10 lists the options for terminating an open
+ connection, and
+
+ - Section 11 provides guidance for stream and connection error
+ handling.
+
+ * Packets and frames are the basic unit used by QUIC to communicate.
+
+ - Section 12 describes concepts related to packets and frames,
+
+ - Section 13 defines models for the transmission, retransmission,
+ and acknowledgment of data, and
+
+ - Section 14 specifies rules for managing the size of datagrams
+ carrying QUIC packets.
+
+ * Finally, encoding details of QUIC protocol elements are described
+ in:
+
+ - Section 15 (versions),
+
+ - Section 16 (integer encoding),
+
+ - Section 17 (packet headers),
+
+ - Section 18 (transport parameters),
+
+ - Section 19 (frames), and
+
+ - Section 20 (errors).
+
+ Accompanying documents describe QUIC's loss detection and congestion
+ control [QUIC-RECOVERY], and the use of TLS and other cryptographic
+ mechanisms [QUIC-TLS].
+
+ This document defines QUIC version 1, which conforms to the protocol
+ invariants in [QUIC-INVARIANTS].
+
+ To refer to QUIC version 1, cite this document. References to the
+ limited set of version-independent properties of QUIC can cite
+ [QUIC-INVARIANTS].
+
+1.2. Terms and Definitions
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in BCP
+ 14 [RFC2119] [RFC8174] when, and only when, they appear in all
+ capitals, as shown here.
+
+ Commonly used terms in this document are described below.
+
+ QUIC: The transport protocol described by this document. QUIC is a
+ name, not an acronym.
+
+ Endpoint: An entity that can participate in a QUIC connection by
+ generating, receiving, and processing QUIC packets. There are
+ only two types of endpoints in QUIC: client and server.
+
+ Client: The endpoint that initiates a QUIC connection.
+
+ Server: The endpoint that accepts a QUIC connection.
+
+ QUIC packet: A complete processable unit of QUIC that can be
+ encapsulated in a UDP datagram. One or more QUIC packets can be
+ encapsulated in a single UDP datagram.
+
+ Ack-eliciting packet: A QUIC packet that contains frames other than
+ ACK, PADDING, and CONNECTION_CLOSE. These cause a recipient to
+ send an acknowledgment; see Section 13.2.1.
+
+ Frame: A unit of structured protocol information. There are
+ multiple frame types, each of which carries different information.
+ Frames are contained in QUIC packets.
+
+ Address: When used without qualification, the tuple of IP version,
+ IP address, and UDP port number that represents one end of a
+ network path.
+
+ Connection ID: An identifier that is used to identify a QUIC
+ connection at an endpoint. Each endpoint selects one or more
+ connection IDs for its peer to include in packets sent towards the
+ endpoint. This value is opaque to the peer.
+
+ Stream: A unidirectional or bidirectional channel of ordered bytes
+ within a QUIC connection. A QUIC connection can carry multiple
+ simultaneous streams.
+
+ Application: An entity that uses QUIC to send and receive data.
+
+ This document uses the terms "QUIC packets", "UDP datagrams", and "IP
+ packets" to refer to the units of the respective protocols. That is,
+ one or more QUIC packets can be encapsulated in a UDP datagram, which
+ is in turn encapsulated in an IP packet.
+
+1.3. Notational Conventions
+
+ Packet and frame diagrams in this document use a custom format. The
+ purpose of this format is to summarize, not define, protocol
+ elements. Prose defines the complete semantics and details of
+ structures.
+
+ Complex fields are named and then followed by a list of fields
+ surrounded by a pair of matching braces. Each field in this list is
+ separated by commas.
+
+ Individual fields include length information, plus indications about
+ fixed value, optionality, or repetitions. Individual fields use the
+ following notational conventions, with all lengths in bits:
+
+ x (A): Indicates that x is A bits long
+
+ x (i): Indicates that x holds an integer value using the variable-
+ length encoding described in Section 16
+
+ x (A..B): Indicates that x can be any length from A to B; A can be
+ omitted to indicate a minimum of zero bits, and B can be omitted
+ to indicate no set upper limit; values in this format always end
+ on a byte boundary
+
+ x (L) = C: Indicates that x has a fixed value of C; the length of x
+ is described by L, which can use any of the length forms above
+
+ x (L) = C..D: Indicates that x has a value in the range from C to D,
+ inclusive, with the length described by L, as above
+
+ [x (L)]: Indicates that x is optional and has a length of L
+
+ x (L) ...: Indicates that x is repeated zero or more times and that
+ each instance has a length of L
+
+ This document uses network byte order (that is, big endian) values.
+ Fields are placed starting from the high-order bits of each byte.
+
+ By convention, individual fields reference a complex field by using
+ the name of the complex field.
+
+ Figure 1 provides an example:
+
+ Example Structure {
+ One-bit Field (1),
+ 7-bit Field with Fixed Value (7) = 61,
+ Field with Variable-Length Integer (i),
+ Arbitrary-Length Field (..),
+ Variable-Length Field (8..24),
+ Field With Minimum Length (16..),
+ Field With Maximum Length (..128),
+ [Optional Field (64)],
+ Repeated Field (8) ...,
+ }
+
+ Figure 1: Example Format
+
+ When a single-bit field is referenced in prose, the position of that
+ field can be clarified by using the value of the byte that carries
+ the field with the field's value set. For example, the value 0x80
+ could be used to refer to the single-bit field in the most
+ significant bit of the byte, such as One-bit Field in Figure 1.
+
+2. Streams
+
+ Streams in QUIC provide a lightweight, ordered byte-stream
+ abstraction to an application. Streams can be unidirectional or
+ bidirectional.
+
+ Streams can be created by sending data. Other processes associated
+ with stream management -- ending, canceling, and managing flow
+ control -- are all designed to impose minimal overheads. For
+ instance, a single STREAM frame (Section 19.8) can open, carry data
+ for, and close a stream. Streams can also be long-lived and can last
+ the entire duration of a connection.
+
+ Streams can be created by either endpoint, can concurrently send data
+ interleaved with other streams, and can be canceled. QUIC does not
+ provide any means of ensuring ordering between bytes on different
+ streams.
+
+ QUIC allows for an arbitrary number of streams to operate
+ concurrently and for an arbitrary amount of data to be sent on any
+ stream, subject to flow control constraints and stream limits; see
+ Section 4.
+
+2.1. Stream Types and Identifiers
+
+ Streams can be unidirectional or bidirectional. Unidirectional
+ streams carry data in one direction: from the initiator of the stream
+ to its peer. Bidirectional streams allow for data to be sent in both
+ directions.
+
+ Streams are identified within a connection by a numeric value,
+ referred to as the stream ID. A stream ID is a 62-bit integer (0 to
+ 2^62-1) that is unique for all streams on a connection. Stream IDs
+ are encoded as variable-length integers; see Section 16. A QUIC
+ endpoint MUST NOT reuse a stream ID within a connection.
+
+ The least significant bit (0x01) of the stream ID identifies the
+ initiator of the stream. Client-initiated streams have even-numbered
+ stream IDs (with the bit set to 0), and server-initiated streams have
+ odd-numbered stream IDs (with the bit set to 1).
+
+ The second least significant bit (0x02) of the stream ID
+ distinguishes between bidirectional streams (with the bit set to 0)
+ and unidirectional streams (with the bit set to 1).
+
+ The two least significant bits from a stream ID therefore identify a
+ stream as one of four types, as summarized in Table 1.
+
+ +======+==================================+
+ | Bits | Stream Type |
+ +======+==================================+
+ | 0x00 | Client-Initiated, Bidirectional |
+ +------+----------------------------------+
+ | 0x01 | Server-Initiated, Bidirectional |
+ +------+----------------------------------+
+ | 0x02 | Client-Initiated, Unidirectional |
+ +------+----------------------------------+
+ | 0x03 | Server-Initiated, Unidirectional |
+ +------+----------------------------------+
+
+ Table 1: Stream ID Types
+
+ The stream space for each type begins at the minimum value (0x00
+ through 0x03, respectively); successive streams of each type are
+ created with numerically increasing stream IDs. A stream ID that is
+ used out of order results in all streams of that type with lower-
+ numbered stream IDs also being opened.
+
+2.2. Sending and Receiving Data
+
+ STREAM frames (Section 19.8) encapsulate data sent by an application.
+ An endpoint uses the Stream ID and Offset fields in STREAM frames to
+ place data in order.
+
+ Endpoints MUST be able to deliver stream data to an application as an
+ ordered byte stream. Delivering an ordered byte stream requires that
+ an endpoint buffer any data that is received out of order, up to the
+ advertised flow control limit.
+
+ QUIC makes no specific allowances for delivery of stream data out of
+ order. However, implementations MAY choose to offer the ability to
+ deliver data out of order to a receiving application.
+
+ An endpoint could receive data for a stream at the same stream offset
+ multiple times. Data that has already been received can be
+ discarded. The data at a given offset MUST NOT change if it is sent
+ multiple times; an endpoint MAY treat receipt of different data at
+ the same offset within a stream as a connection error of type
+ PROTOCOL_VIOLATION.
+
+ Streams are an ordered byte-stream abstraction with no other
+ structure visible to QUIC. STREAM frame boundaries are not expected
+ to be preserved when data is transmitted, retransmitted after packet
+ loss, or delivered to the application at a receiver.
+
+ An endpoint MUST NOT send data on any stream without ensuring that it
+ is within the flow control limits set by its peer. Flow control is
+ described in detail in Section 4.
+
+2.3. Stream Prioritization
+
+ Stream multiplexing can have a significant effect on application
+ performance if resources allocated to streams are correctly
+ prioritized.
+
+ QUIC does not provide a mechanism for exchanging prioritization
+ information. Instead, it relies on receiving priority information
+ from the application.
+
+ A QUIC implementation SHOULD provide ways in which an application can
+ indicate the relative priority of streams. An implementation uses
+ information provided by the application to determine how to allocate
+ resources to active streams.
+
+2.4. Operations on Streams
+
+ This document does not define an API for QUIC; it instead defines a
+ set of functions on streams that application protocols can rely upon.
+ An application protocol can assume that a QUIC implementation
+ provides an interface that includes the operations described in this
+ section. An implementation designed for use with a specific
+ application protocol might provide only those operations that are
+ used by that protocol.
+
+ On the sending part of a stream, an application protocol can:
+
+ * write data, understanding when stream flow control credit
+ (Section 4.1) has successfully been reserved to send the written
+ data;
+
+ * end the stream (clean termination), resulting in a STREAM frame
+ (Section 19.8) with the FIN bit set; and
+
+ * reset the stream (abrupt termination), resulting in a RESET_STREAM
+ frame (Section 19.4) if the stream was not already in a terminal
+ state.
+
+ On the receiving part of a stream, an application protocol can:
+
+ * read data; and
+
+ * abort reading of the stream and request closure, possibly
+ resulting in a STOP_SENDING frame (Section 19.5).
+
+ An application protocol can also request to be informed of state
+ changes on streams, including when the peer has opened or reset a
+ stream, when a peer aborts reading on a stream, when new data is
+ available, and when data can or cannot be written to the stream due
+ to flow control.
+
+3. Stream States
+
+ This section describes streams in terms of their send or receive
+ components. Two state machines are described: one for the streams on
+ which an endpoint transmits data (Section 3.1) and another for
+ streams on which an endpoint receives data (Section 3.2).
+
+ Unidirectional streams use either the sending or receiving state
+ machine, depending on the stream type and endpoint role.
+ Bidirectional streams use both state machines at both endpoints. For
+ the most part, the use of these state machines is the same whether
+ the stream is unidirectional or bidirectional. The conditions for
+ opening a stream are slightly more complex for a bidirectional stream
+ because the opening of either the send or receive side causes the
+ stream to open in both directions.
+
+ The state machines shown in this section are largely informative.
+ This document uses stream states to describe rules for when and how
+ different types of frames can be sent and the reactions that are
+ expected when different types of frames are received. Though these
+ state machines are intended to be useful in implementing QUIC, these
+ states are not intended to constrain implementations. An
+ implementation can define a different state machine as long as its
+ behavior is consistent with an implementation that implements these
+ states.
+
+ | Note: In some cases, a single event or action can cause a
+ | transition through multiple states. For instance, sending
+ | STREAM with a FIN bit set can cause two state transitions for a
+ | sending stream: from the "Ready" state to the "Send" state, and
+ | from the "Send" state to the "Data Sent" state.
+
+3.1. Sending Stream States
+
+ Figure 2 shows the states for the part of a stream that sends data to
+ a peer.
+
+ o
+ | Create Stream (Sending)
+ | Peer Creates Bidirectional Stream
+ v
+ +-------+
+ | Ready | Send RESET_STREAM
+ | |-----------------------.
+ +-------+ |
+ | |
+ | Send STREAM / |
+ | STREAM_DATA_BLOCKED |
+ v |
+ +-------+ |
+ | Send | Send RESET_STREAM |
+ | |---------------------->|
+ +-------+ |
+ | |
+ | Send STREAM + FIN |
+ v v
+ +-------+ +-------+
+ | Data | Send RESET_STREAM | Reset |
+ | Sent |------------------>| Sent |
+ +-------+ +-------+
+ | |
+ | Recv All ACKs | Recv ACK
+ v v
+ +-------+ +-------+
+ | Data | | Reset |
+ | Recvd | | Recvd |
+ +-------+ +-------+
+
+ Figure 2: States for Sending Parts of Streams
+
+ The sending part of a stream that the endpoint initiates (types 0 and
+ 2 for clients, 1 and 3 for servers) is opened by the application.
+ The "Ready" state represents a newly created stream that is able to
+ accept data from the application. Stream data might be buffered in
+ this state in preparation for sending.
+
+ Sending the first STREAM or STREAM_DATA_BLOCKED frame causes a
+ sending part of a stream to enter the "Send" state. An
+ implementation might choose to defer allocating a stream ID to a
+ stream until it sends the first STREAM frame and enters this state,
+ which can allow for better stream prioritization.
+
+ The sending part of a bidirectional stream initiated by a peer (type
+ 0 for a server, type 1 for a client) starts in the "Ready" state when
+ the receiving part is created.
+
+ In the "Send" state, an endpoint transmits -- and retransmits as
+ necessary -- stream data in STREAM frames. The endpoint respects the
+ flow control limits set by its peer and continues to accept and
+ process MAX_STREAM_DATA frames. An endpoint in the "Send" state
+ generates STREAM_DATA_BLOCKED frames if it is blocked from sending by
+ stream flow control limits (Section 4.1).
+
+ After the application indicates that all stream data has been sent
+ and a STREAM frame containing the FIN bit is sent, the sending part
+ of the stream enters the "Data Sent" state. From this state, the
+ endpoint only retransmits stream data as necessary. The endpoint
+ does not need to check flow control limits or send
+ STREAM_DATA_BLOCKED frames for a stream in this state.
+ MAX_STREAM_DATA frames might be received until the peer receives the
+ final stream offset. The endpoint can safely ignore any
+ MAX_STREAM_DATA frames it receives from its peer for a stream in this
+ state.
+
+ Once all stream data has been successfully acknowledged, the sending
+ part of the stream enters the "Data Recvd" state, which is a terminal
+ state.
+
+ From any state that is one of "Ready", "Send", or "Data Sent", an
+ application can signal that it wishes to abandon transmission of
+ stream data. Alternatively, an endpoint might receive a STOP_SENDING
+ frame from its peer. In either case, the endpoint sends a
+ RESET_STREAM frame, which causes the stream to enter the "Reset Sent"
+ state.
+
+ An endpoint MAY send a RESET_STREAM as the first frame that mentions
+ a stream; this causes the sending part of that stream to open and
+ then immediately transition to the "Reset Sent" state.
+
+ Once a packet containing a RESET_STREAM has been acknowledged, the
+ sending part of the stream enters the "Reset Recvd" state, which is a
+ terminal state.
+
+3.2. Receiving Stream States
+
+ Figure 3 shows the states for the part of a stream that receives data
+ from a peer. The states for a receiving part of a stream mirror only
+ some of the states of the sending part of the stream at the peer.
+ The receiving part of a stream does not track states on the sending
+ part that cannot be observed, such as the "Ready" state. Instead,
+ the receiving part of a stream tracks the delivery of data to the
+ application, some of which cannot be observed by the sender.
+
+ o
+ | Recv STREAM / STREAM_DATA_BLOCKED / RESET_STREAM
+ | Create Bidirectional Stream (Sending)
+ | Recv MAX_STREAM_DATA / STOP_SENDING (Bidirectional)
+ | Create Higher-Numbered Stream
+ v
+ +-------+
+ | Recv | Recv RESET_STREAM
+ | |-----------------------.
+ +-------+ |
+ | |
+ | Recv STREAM + FIN |
+ v |
+ +-------+ |
+ | Size | Recv RESET_STREAM |
+ | Known |---------------------->|
+ +-------+ |
+ | |
+ | Recv All Data |
+ v v
+ +-------+ Recv RESET_STREAM +-------+
+ | Data |--- (optional) --->| Reset |
+ | Recvd | Recv All Data | Recvd |
+ +-------+<-- (optional) ----+-------+
+ | |
+ | App Read All Data | App Read Reset
+ v v
+ +-------+ +-------+
+ | Data | | Reset |
+ | Read | | Read |
+ +-------+ +-------+
+
+ Figure 3: States for Receiving Parts of Streams
+
+ The receiving part of a stream initiated by a peer (types 1 and 3 for
+ a client, or 0 and 2 for a server) is created when the first STREAM,
+ STREAM_DATA_BLOCKED, or RESET_STREAM frame is received for that
+ stream. For bidirectional streams initiated by a peer, receipt of a
+ MAX_STREAM_DATA or STOP_SENDING frame for the sending part of the
+ stream also creates the receiving part. The initial state for the
+ receiving part of a stream is "Recv".
+
+ For a bidirectional stream, the receiving part enters the "Recv"
+ state when the sending part initiated by the endpoint (type 0 for a
+ client, type 1 for a server) enters the "Ready" state.
+
+ An endpoint opens a bidirectional stream when a MAX_STREAM_DATA or
+ STOP_SENDING frame is received from the peer for that stream.
+ Receiving a MAX_STREAM_DATA frame for an unopened stream indicates
+ that the remote peer has opened the stream and is providing flow
+ control credit. Receiving a STOP_SENDING frame for an unopened
+ stream indicates that the remote peer no longer wishes to receive
+ data on this stream. Either frame might arrive before a STREAM or
+ STREAM_DATA_BLOCKED frame if packets are lost or reordered.
+
+ Before a stream is created, all streams of the same type with lower-
+ numbered stream IDs MUST be created. This ensures that the creation
+ order for streams is consistent on both endpoints.
+
+ In the "Recv" state, the endpoint receives STREAM and
+ STREAM_DATA_BLOCKED frames. Incoming data is buffered and can be
+ reassembled into the correct order for delivery to the application.
+ As data is consumed by the application and buffer space becomes
+ available, the endpoint sends MAX_STREAM_DATA frames to allow the
+ peer to send more data.
+
+ When a STREAM frame with a FIN bit is received, the final size of the
+ stream is known; see Section 4.5. The receiving part of the stream
+ then enters the "Size Known" state. In this state, the endpoint no
+ longer needs to send MAX_STREAM_DATA frames; it only receives any
+ retransmissions of stream data.
+
+ Once all data for the stream has been received, the receiving part
+ enters the "Data Recvd" state. This might happen as a result of
+ receiving the same STREAM frame that causes the transition to "Size
+ Known". After all data has been received, any STREAM or
+ STREAM_DATA_BLOCKED frames for the stream can be discarded.
+
+ The "Data Recvd" state persists until stream data has been delivered
+ to the application. Once stream data has been delivered, the stream
+ enters the "Data Read" state, which is a terminal state.
+
+ Receiving a RESET_STREAM frame in the "Recv" or "Size Known" state
+ causes the stream to enter the "Reset Recvd" state. This might cause
+ the delivery of stream data to the application to be interrupted.
+
+ It is possible that all stream data has already been received when a
+ RESET_STREAM is received (that is, in the "Data Recvd" state).
+ Similarly, it is possible for remaining stream data to arrive after
+ receiving a RESET_STREAM frame (the "Reset Recvd" state). An
+ implementation is free to manage this situation as it chooses.
+
+ Sending a RESET_STREAM means that an endpoint cannot guarantee
+ delivery of stream data; however, there is no requirement that stream
+ data not be delivered if a RESET_STREAM is received. An
+ implementation MAY interrupt delivery of stream data, discard any
+ data that was not consumed, and signal the receipt of the
+ RESET_STREAM. A RESET_STREAM signal might be suppressed or withheld
+ if stream data is completely received and is buffered to be read by
+ the application. If the RESET_STREAM is suppressed, the receiving
+ part of the stream remains in "Data Recvd".
+
+ Once the application receives the signal indicating that the stream
+ was reset, the receiving part of the stream transitions to the "Reset
+ Read" state, which is a terminal state.
+
+3.3. Permitted Frame Types
+
+ The sender of a stream sends just three frame types that affect the
+ state of a stream at either the sender or the receiver: STREAM
+ (Section 19.8), STREAM_DATA_BLOCKED (Section 19.13), and RESET_STREAM
+ (Section 19.4).
+
+ A sender MUST NOT send any of these frames from a terminal state
+ ("Data Recvd" or "Reset Recvd"). A sender MUST NOT send a STREAM or
+ STREAM_DATA_BLOCKED frame for a stream in the "Reset Sent" state or
+ any terminal state -- that is, after sending a RESET_STREAM frame. A
+ receiver could receive any of these three frames in any state, due to
+ the possibility of delayed delivery of packets carrying them.
+
+ The receiver of a stream sends MAX_STREAM_DATA frames (Section 19.10)
+ and STOP_SENDING frames (Section 19.5).
+
+ The receiver only sends MAX_STREAM_DATA frames in the "Recv" state.
+ A receiver MAY send a STOP_SENDING frame in any state where it has
+ not received a RESET_STREAM frame -- that is, states other than
+ "Reset Recvd" or "Reset Read". However, there is little value in
+ sending a STOP_SENDING frame in the "Data Recvd" state, as all stream
+ data has been received. A sender could receive either of these two
+ types of frames in any state as a result of delayed delivery of
+ packets.
+
+3.4. Bidirectional Stream States
+
+ A bidirectional stream is composed of sending and receiving parts.
+ Implementations can represent states of the bidirectional stream as
+ composites of sending and receiving stream states. The simplest
+ model presents the stream as "open" when either sending or receiving
+ parts are in a non-terminal state and "closed" when both sending and
+ receiving streams are in terminal states.
+
+ Table 2 shows a more complex mapping of bidirectional stream states
+ that loosely correspond to the stream states defined in HTTP/2
+ [HTTP2]. This shows that multiple states on sending or receiving
+ parts of streams are mapped to the same composite state. Note that
+ this is just one possibility for such a mapping; this mapping
+ requires that data be acknowledged before the transition to a
+ "closed" or "half-closed" state.
+
+ +===================+=======================+=================+
+ | Sending Part | Receiving Part | Composite State |
+ +===================+=======================+=================+
+ | No Stream / Ready | No Stream / Recv (*1) | idle |
+ +-------------------+-----------------------+-----------------+
+ | Ready / Send / | Recv / Size Known | open |
+ | Data Sent | | |
+ +-------------------+-----------------------+-----------------+
+ | Ready / Send / | Data Recvd / Data | half-closed |
+ | Data Sent | Read | (remote) |
+ +-------------------+-----------------------+-----------------+
+ | Ready / Send / | Reset Recvd / Reset | half-closed |
+ | Data Sent | Read | (remote) |
+ +-------------------+-----------------------+-----------------+
+ | Data Recvd | Recv / Size Known | half-closed |
+ | | | (local) |
+ +-------------------+-----------------------+-----------------+
+ | Reset Sent / | Recv / Size Known | half-closed |
+ | Reset Recvd | | (local) |
+ +-------------------+-----------------------+-----------------+
+ | Reset Sent / | Data Recvd / Data | closed |
+ | Reset Recvd | Read | |
+ +-------------------+-----------------------+-----------------+
+ | Reset Sent / | Reset Recvd / Reset | closed |
+ | Reset Recvd | Read | |
+ +-------------------+-----------------------+-----------------+
+ | Data Recvd | Data Recvd / Data | closed |
+ | | Read | |
+ +-------------------+-----------------------+-----------------+
+ | Data Recvd | Reset Recvd / Reset | closed |
+ | | Read | |
+ +-------------------+-----------------------+-----------------+
+
+ Table 2: Possible Mapping of Stream States to HTTP/2
+
+ | Note (*1): A stream is considered "idle" if it has not yet been
+ | created or if the receiving part of the stream is in the "Recv"
+ | state without yet having received any frames.
+
+3.5. Solicited State Transitions
+
+ If an application is no longer interested in the data it is receiving
+ on a stream, it can abort reading the stream and specify an
+ application error code.
+
+ If the stream is in the "Recv" or "Size Known" state, the transport
+ SHOULD signal this by sending a STOP_SENDING frame to prompt closure
+ of the stream in the opposite direction. This typically indicates
+ that the receiving application is no longer reading data it receives
+ from the stream, but it is not a guarantee that incoming data will be
+ ignored.
+
+ STREAM frames received after sending a STOP_SENDING frame are still
+ counted toward connection and stream flow control, even though these
+ frames can be discarded upon receipt.
+
+ A STOP_SENDING frame requests that the receiving endpoint send a
+ RESET_STREAM frame. An endpoint that receives a STOP_SENDING frame
+ MUST send a RESET_STREAM frame if the stream is in the "Ready" or
+ "Send" state. If the stream is in the "Data Sent" state, the
+ endpoint MAY defer sending the RESET_STREAM frame until the packets
+ containing outstanding data are acknowledged or declared lost. If
+ any outstanding data is declared lost, the endpoint SHOULD send a
+ RESET_STREAM frame instead of retransmitting the data.
+
+ An endpoint SHOULD copy the error code from the STOP_SENDING frame to
+ the RESET_STREAM frame it sends, but it can use any application error
+ code. An endpoint that sends a STOP_SENDING frame MAY ignore the
+ error code in any RESET_STREAM frames subsequently received for that
+ stream.
+
+ STOP_SENDING SHOULD only be sent for a stream that has not been reset
+ by the peer. STOP_SENDING is most useful for streams in the "Recv"
+ or "Size Known" state.
+
+ An endpoint is expected to send another STOP_SENDING frame if a
+ packet containing a previous STOP_SENDING is lost. However, once
+ either all stream data or a RESET_STREAM frame has been received for
+ the stream -- that is, the stream is in any state other than "Recv"
+ or "Size Known" -- sending a STOP_SENDING frame is unnecessary.
+
+ An endpoint that wishes to terminate both directions of a
+ bidirectional stream can terminate one direction by sending a
+ RESET_STREAM frame, and it can encourage prompt termination in the
+ opposite direction by sending a STOP_SENDING frame.
+
+4. Flow Control
+
+ Receivers need to limit the amount of data that they are required to
+ buffer, in order to prevent a fast sender from overwhelming them or a
+ malicious sender from consuming a large amount of memory. To enable
+ a receiver to limit memory commitments for a connection, streams are
+ flow controlled both individually and across a connection as a whole.
+ A QUIC receiver controls the maximum amount of data the sender can
+ send on a stream as well as across all streams at any time, as
+ described in Sections 4.1 and 4.2.
+
+ Similarly, to limit concurrency within a connection, a QUIC endpoint
+ controls the maximum cumulative number of streams that its peer can
+ initiate, as described in Section 4.6.
+
+ Data sent in CRYPTO frames is not flow controlled in the same way as
+ stream data. QUIC relies on the cryptographic protocol
+ implementation to avoid excessive buffering of data; see [QUIC-TLS].
+ To avoid excessive buffering at multiple layers, QUIC implementations
+ SHOULD provide an interface for the cryptographic protocol
+ implementation to communicate its buffering limits.
+
+4.1. Data Flow Control
+
+ QUIC employs a limit-based flow control scheme where a receiver
+ advertises the limit of total bytes it is prepared to receive on a
+ given stream or for the entire connection. This leads to two levels
+ of data flow control in QUIC:
+
+ * Stream flow control, which prevents a single stream from consuming
+ the entire receive buffer for a connection by limiting the amount
+ of data that can be sent on each stream.
+
+ * Connection flow control, which prevents senders from exceeding a
+ receiver's buffer capacity for the connection by limiting the
+ total bytes of stream data sent in STREAM frames on all streams.
+
+ Senders MUST NOT send data in excess of either limit.
+
+ A receiver sets initial limits for all streams through transport
+ parameters during the handshake (Section 7.4). Subsequently, a
+ receiver sends MAX_STREAM_DATA frames (Section 19.10) or MAX_DATA
+ frames (Section 19.9) to the sender to advertise larger limits.
+
+ A receiver can advertise a larger limit for a stream by sending a
+ MAX_STREAM_DATA frame with the corresponding stream ID. A
+ MAX_STREAM_DATA frame indicates the maximum absolute byte offset of a
+ stream. A receiver could determine the flow control offset to be
+ advertised based on the current offset of data consumed on that
+ stream.
+
+ A receiver can advertise a larger limit for a connection by sending a
+ MAX_DATA frame, which indicates the maximum of the sum of the
+ absolute byte offsets of all streams. A receiver maintains a
+ cumulative sum of bytes received on all streams, which is used to
+ check for violations of the advertised connection or stream data
+ limits. A receiver could determine the maximum data limit to be
+ advertised based on the sum of bytes consumed on all streams.
+
+ Once a receiver advertises a limit for the connection or a stream, it
+ is not an error to advertise a smaller limit, but the smaller limit
+ has no effect.
+
+ A receiver MUST close the connection with an error of type
+ FLOW_CONTROL_ERROR if the sender violates the advertised connection
+ or stream data limits; see Section 11 for details on error handling.
+
+ A sender MUST ignore any MAX_STREAM_DATA or MAX_DATA frames that do
+ not increase flow control limits.
+
+ If a sender has sent data up to the limit, it will be unable to send
+ new data and is considered blocked. A sender SHOULD send a
+ STREAM_DATA_BLOCKED or DATA_BLOCKED frame to indicate to the receiver
+ that it has data to write but is blocked by flow control limits. If
+ a sender is blocked for a period longer than the idle timeout
+ (Section 10.1), the receiver might close the connection even when the
+ sender has data that is available for transmission. To keep the
+ connection from closing, a sender that is flow control limited SHOULD
+ periodically send a STREAM_DATA_BLOCKED or DATA_BLOCKED frame when it
+ has no ack-eliciting packets in flight.
+
+4.2. Increasing Flow Control Limits
+
+ Implementations decide when and how much credit to advertise in
+ MAX_STREAM_DATA and MAX_DATA frames, but this section offers a few
+ considerations.
+
+ To avoid blocking a sender, a receiver MAY send a MAX_STREAM_DATA or
+ MAX_DATA frame multiple times within a round trip or send it early
+ enough to allow time for loss of the frame and subsequent recovery.
+
+ Control frames contribute to connection overhead. Therefore,
+ frequently sending MAX_STREAM_DATA and MAX_DATA frames with small
+ changes is undesirable. On the other hand, if updates are less
+ frequent, larger increments to limits are necessary to avoid blocking
+ a sender, requiring larger resource commitments at the receiver.
+ There is a trade-off between resource commitment and overhead when
+ determining how large a limit is advertised.
+
+ A receiver can use an autotuning mechanism to tune the frequency and
+ amount of advertised additional credit based on a round-trip time
+ estimate and the rate at which the receiving application consumes
+ data, similar to common TCP implementations. As an optimization, an
+ endpoint could send frames related to flow control only when there
+ are other frames to send, ensuring that flow control does not cause
+ extra packets to be sent.
+
+ A blocked sender is not required to send STREAM_DATA_BLOCKED or
+ DATA_BLOCKED frames. Therefore, a receiver MUST NOT wait for a
+ STREAM_DATA_BLOCKED or DATA_BLOCKED frame before sending a
+ MAX_STREAM_DATA or MAX_DATA frame; doing so could result in the
+ sender being blocked for the rest of the connection. Even if the
+ sender sends these frames, waiting for them will result in the sender
+ being blocked for at least an entire round trip.
+
+ When a sender receives credit after being blocked, it might be able
+ to send a large amount of data in response, resulting in short-term
+ congestion; see Section 7.7 of [QUIC-RECOVERY] for a discussion of
+ how a sender can avoid this congestion.
+
+4.3. Flow Control Performance
+
+ If an endpoint cannot ensure that its peer always has available flow
+ control credit that is greater than the peer's bandwidth-delay
+ product on this connection, its receive throughput will be limited by
+ flow control.
+
+ Packet loss can cause gaps in the receive buffer, preventing the
+ application from consuming data and freeing up receive buffer space.
+
+ Sending timely updates of flow control limits can improve
+ performance. Sending packets only to provide flow control updates
+ can increase network load and adversely affect performance. Sending
+ flow control updates along with other frames, such as ACK frames,
+ reduces the cost of those updates.
+
+4.4. Handling Stream Cancellation
+
+ Endpoints need to eventually agree on the amount of flow control
+ credit that has been consumed on every stream, to be able to account
+ for all bytes for connection-level flow control.
+
+ On receipt of a RESET_STREAM frame, an endpoint will tear down state
+ for the matching stream and ignore further data arriving on that
+ stream.
+
+ RESET_STREAM terminates one direction of a stream abruptly. For a
+ bidirectional stream, RESET_STREAM has no effect on data flow in the
+ opposite direction. Both endpoints MUST maintain flow control state
+ for the stream in the unterminated direction until that direction
+ enters a terminal state.
+
+4.5. Stream Final Size
+
+ The final size is the amount of flow control credit that is consumed
+ by a stream. Assuming that every contiguous byte on the stream was
+ sent once, the final size is the number of bytes sent. More
+ generally, this is one higher than the offset of the byte with the
+ largest offset sent on the stream, or zero if no bytes were sent.
+
+ A sender always communicates the final size of a stream to the
+ receiver reliably, no matter how the stream is terminated. The final
+ size is the sum of the Offset and Length fields of a STREAM frame
+ with a FIN flag, noting that these fields might be implicit.
+ Alternatively, the Final Size field of a RESET_STREAM frame carries
+ this value. This guarantees that both endpoints agree on how much
+ flow control credit was consumed by the sender on that stream.
+
+ An endpoint will know the final size for a stream when the receiving
+ part of the stream enters the "Size Known" or "Reset Recvd" state
+ (Section 3). The receiver MUST use the final size of the stream to
+ account for all bytes sent on the stream in its connection-level flow
+ controller.
+
+ An endpoint MUST NOT send data on a stream at or beyond the final
+ size.
+
+ Once a final size for a stream is known, it cannot change. If a
+ RESET_STREAM or STREAM frame is received indicating a change in the
+ final size for the stream, an endpoint SHOULD respond with an error
+ of type FINAL_SIZE_ERROR; see Section 11 for details on error
+ handling. A receiver SHOULD treat receipt of data at or beyond the
+ final size as an error of type FINAL_SIZE_ERROR, even after a stream
+ is closed. Generating these errors is not mandatory, because
+ requiring that an endpoint generate these errors also means that the
+ endpoint needs to maintain the final size state for closed streams,
+ which could mean a significant state commitment.
+
+4.6. Controlling Concurrency
+
+ An endpoint limits the cumulative number of incoming streams a peer
+ can open. Only streams with a stream ID less than "(max_streams * 4
+ + first_stream_id_of_type)" can be opened; see Table 1. Initial
+ limits are set in the transport parameters; see Section 18.2.
+ Subsequent limits are advertised using MAX_STREAMS frames; see
+ Section 19.11. Separate limits apply to unidirectional and
+ bidirectional streams.
+
+ If a max_streams transport parameter or a MAX_STREAMS frame is
+ received with a value greater than 2^60, this would allow a maximum
+ stream ID that cannot be expressed as a variable-length integer; see
+ Section 16. If either is received, the connection MUST be closed
+ immediately with a connection error of type TRANSPORT_PARAMETER_ERROR
+ if the offending value was received in a transport parameter or of
+ type FRAME_ENCODING_ERROR if it was received in a frame; see
+ Section 10.2.
+
+ Endpoints MUST NOT exceed the limit set by their peer. An endpoint
+ that receives a frame with a stream ID exceeding the limit it has
+ sent MUST treat this as a connection error of type
+ STREAM_LIMIT_ERROR; see Section 11 for details on error handling.
+
+ Once a receiver advertises a stream limit using the MAX_STREAMS
+ frame, advertising a smaller limit has no effect. MAX_STREAMS frames
+ that do not increase the stream limit MUST be ignored.
+
+ As with stream and connection flow control, this document leaves
+ implementations to decide when and how many streams should be
+ advertised to a peer via MAX_STREAMS. Implementations might choose
+ to increase limits as streams are closed, to keep the number of
+ streams available to peers roughly consistent.
+
+ An endpoint that is unable to open a new stream due to the peer's
+ limits SHOULD send a STREAMS_BLOCKED frame (Section 19.14). This
+ signal is considered useful for debugging. An endpoint MUST NOT wait
+ to receive this signal before advertising additional credit, since
+ doing so will mean that the peer will be blocked for at least an
+ entire round trip, and potentially indefinitely if the peer chooses
+ not to send STREAMS_BLOCKED frames.
+
+5. Connections
+
+ A QUIC connection is shared state between a client and a server.
+
+ Each connection starts with a handshake phase, during which the two
+ endpoints establish a shared secret using the cryptographic handshake
+ protocol [QUIC-TLS] and negotiate the application protocol. The
+ handshake (Section 7) confirms that both endpoints are willing to
+ communicate (Section 8.1) and establishes parameters for the
+ connection (Section 7.4).
+
+ An application protocol can use the connection during the handshake
+ phase with some limitations. 0-RTT allows application data to be
+ sent by a client before receiving a response from the server.
+ However, 0-RTT provides no protection against replay attacks; see
+ Section 9.2 of [QUIC-TLS]. A server can also send application data
+ to a client before it receives the final cryptographic handshake
+ messages that allow it to confirm the identity and liveness of the
+ client. These capabilities allow an application protocol to offer
+ the option of trading some security guarantees for reduced latency.
+
+ The use of connection IDs (Section 5.1) allows connections to migrate
+ to a new network path, both as a direct choice of an endpoint and
+ when forced by a change in a middlebox. Section 9 describes
+ mitigations for the security and privacy issues associated with
+ migration.
+
+ For connections that are no longer needed or desired, there are
+ several ways for a client and server to terminate a connection, as
+ described in Section 10.
+
+5.1. Connection ID
+
+ Each connection possesses a set of connection identifiers, or
+ connection IDs, each of which can identify the connection.
+ Connection IDs are independently selected by endpoints; each endpoint
+ selects the connection IDs that its peer uses.
+
+ The primary function of a connection ID is to ensure that changes in
+ addressing at lower protocol layers (UDP, IP) do not cause packets
+ for a QUIC connection to be delivered to the wrong endpoint. Each
+ endpoint selects connection IDs using an implementation-specific (and
+ perhaps deployment-specific) method that will allow packets with that
+ connection ID to be routed back to the endpoint and to be identified
+ by the endpoint upon receipt.
+
+ Multiple connection IDs are used so that endpoints can send packets
+ that cannot be identified by an observer as being for the same
+ connection without cooperation from an endpoint; see Section 9.5.
+
+ Connection IDs MUST NOT contain any information that can be used by
+ an external observer (that is, one that does not cooperate with the
+ issuer) to correlate them with other connection IDs for the same
+ connection. As a trivial example, this means the same connection ID
+ MUST NOT be issued more than once on the same connection.
+
+ Packets with long headers include Source Connection ID and
+ Destination Connection ID fields. These fields are used to set the
+ connection IDs for new connections; see Section 7.2 for details.
+
+ Packets with short headers (Section 17.3) only include the
+ Destination Connection ID and omit the explicit length. The length
+ of the Destination Connection ID field is expected to be known to
+ endpoints. Endpoints using a load balancer that routes based on
+ connection ID could agree with the load balancer on a fixed length
+ for connection IDs or agree on an encoding scheme. A fixed portion
+ could encode an explicit length, which allows the entire connection
+ ID to vary in length and still be used by the load balancer.
+
+ A Version Negotiation (Section 17.2.1) packet echoes the connection
+ IDs selected by the client, both to ensure correct routing toward the
+ client and to demonstrate that the packet is in response to a packet
+ sent by the client.
+
+ A zero-length connection ID can be used when a connection ID is not
+ needed to route to the correct endpoint. However, multiplexing
+ connections on the same local IP address and port while using zero-
+ length connection IDs will cause failures in the presence of peer
+ connection migration, NAT rebinding, and client port reuse. An
+ endpoint MUST NOT use the same IP address and port for multiple
+ concurrent connections with zero-length connection IDs, unless it is
+ certain that those protocol features are not in use.
+
+ When an endpoint uses a non-zero-length connection ID, it needs to
+ ensure that the peer has a supply of connection IDs from which to
+ choose for packets sent to the endpoint. These connection IDs are
+ supplied by the endpoint using the NEW_CONNECTION_ID frame
+ (Section 19.15).
+
+5.1.1. Issuing Connection IDs
+
+ Each connection ID has an associated sequence number to assist in
+ detecting when NEW_CONNECTION_ID or RETIRE_CONNECTION_ID frames refer
+ to the same value. The initial connection ID issued by an endpoint
+ is sent in the Source Connection ID field of the long packet header
+ (Section 17.2) during the handshake. The sequence number of the
+ initial connection ID is 0. If the preferred_address transport
+ parameter is sent, the sequence number of the supplied connection ID
+ is 1.
+
+ Additional connection IDs are communicated to the peer using
+ NEW_CONNECTION_ID frames (Section 19.15). The sequence number on
+ each newly issued connection ID MUST increase by 1. The connection
+ ID that a client selects for the first Destination Connection ID
+ field it sends and any connection ID provided by a Retry packet are
+ not assigned sequence numbers.
+
+ When an endpoint issues a connection ID, it MUST accept packets that
+ carry this connection ID for the duration of the connection or until
+ its peer invalidates the connection ID via a RETIRE_CONNECTION_ID
+ frame (Section 19.16). Connection IDs that are issued and not
+ retired are considered active; any active connection ID is valid for
+ use with the current connection at any time, in any packet type.
+ This includes the connection ID issued by the server via the
+ preferred_address transport parameter.
+
+ An endpoint SHOULD ensure that its peer has a sufficient number of
+ available and unused connection IDs. Endpoints advertise the number
+ of active connection IDs they are willing to maintain using the
+ active_connection_id_limit transport parameter. An endpoint MUST NOT
+ provide more connection IDs than the peer's limit. An endpoint MAY
+ send connection IDs that temporarily exceed a peer's limit if the
+ NEW_CONNECTION_ID frame also requires the retirement of any excess,
+ by including a sufficiently large value in the Retire Prior To field.
+
+ A NEW_CONNECTION_ID frame might cause an endpoint to add some active
+ connection IDs and retire others based on the value of the Retire
+ Prior To field. After processing a NEW_CONNECTION_ID frame and
+ adding and retiring active connection IDs, if the number of active
+ connection IDs exceeds the value advertised in its
+ active_connection_id_limit transport parameter, an endpoint MUST
+ close the connection with an error of type CONNECTION_ID_LIMIT_ERROR.
+
+ An endpoint SHOULD supply a new connection ID when the peer retires a
+ connection ID. If an endpoint provided fewer connection IDs than the
+ peer's active_connection_id_limit, it MAY supply a new connection ID
+ when it receives a packet with a previously unused connection ID. An
+ endpoint MAY limit the total number of connection IDs issued for each
+ connection to avoid the risk of running out of connection IDs; see
+ Section 10.3.2. An endpoint MAY also limit the issuance of
+ connection IDs to reduce the amount of per-path state it maintains,
+ such as path validation status, as its peer might interact with it
+ over as many paths as there are issued connection IDs.
+
+ An endpoint that initiates migration and requires non-zero-length
+ connection IDs SHOULD ensure that the pool of connection IDs
+ available to its peer allows the peer to use a new connection ID on
+ migration, as the peer will be unable to respond if the pool is
+ exhausted.
+
+ An endpoint that selects a zero-length connection ID during the
+ handshake cannot issue a new connection ID. A zero-length
+ Destination Connection ID field is used in all packets sent toward
+ such an endpoint over any network path.
+
+5.1.2. Consuming and Retiring Connection IDs
+
+ An endpoint can change the connection ID it uses for a peer to
+ another available one at any time during the connection. An endpoint
+ consumes connection IDs in response to a migrating peer; see
+ Section 9.5 for more details.
+
+ An endpoint maintains a set of connection IDs received from its peer,
+ any of which it can use when sending packets. When the endpoint
+ wishes to remove a connection ID from use, it sends a
+ RETIRE_CONNECTION_ID frame to its peer. Sending a
+ RETIRE_CONNECTION_ID frame indicates that the connection ID will not
+ be used again and requests that the peer replace it with a new
+ connection ID using a NEW_CONNECTION_ID frame.
+
+ As discussed in Section 9.5, endpoints limit the use of a connection
+ ID to packets sent from a single local address to a single
+ destination address. Endpoints SHOULD retire connection IDs when
+ they are no longer actively using either the local or destination
+ address for which the connection ID was used.
+
+ An endpoint might need to stop accepting previously issued connection
+ IDs in certain circumstances. Such an endpoint can cause its peer to
+ retire connection IDs by sending a NEW_CONNECTION_ID frame with an
+ increased Retire Prior To field. The endpoint SHOULD continue to
+ accept the previously issued connection IDs until they are retired by
+ the peer. If the endpoint can no longer process the indicated
+ connection IDs, it MAY close the connection.
+
+ Upon receipt of an increased Retire Prior To field, the peer MUST
+ stop using the corresponding connection IDs and retire them with
+ RETIRE_CONNECTION_ID frames before adding the newly provided
+ connection ID to the set of active connection IDs. This ordering
+ allows an endpoint to replace all active connection IDs without the
+ possibility of a peer having no available connection IDs and without
+ exceeding the limit the peer sets in the active_connection_id_limit
+ transport parameter; see Section 18.2. Failure to cease using the
+ connection IDs when requested can result in connection failures, as
+ the issuing endpoint might be unable to continue using the connection
+ IDs with the active connection.
+
+ An endpoint SHOULD limit the number of connection IDs it has retired
+ locally for which RETIRE_CONNECTION_ID frames have not yet been
+ acknowledged. An endpoint SHOULD allow for sending and tracking a
+ number of RETIRE_CONNECTION_ID frames of at least twice the value of
+ the active_connection_id_limit transport parameter. An endpoint MUST
+ NOT forget a connection ID without retiring it, though it MAY choose
+ to treat having connection IDs in need of retirement that exceed this
+ limit as a connection error of type CONNECTION_ID_LIMIT_ERROR.
+
+ Endpoints SHOULD NOT issue updates of the Retire Prior To field
+ before receiving RETIRE_CONNECTION_ID frames that retire all
+ connection IDs indicated by the previous Retire Prior To value.
+
+5.2. Matching Packets to Connections
+
+ Incoming packets are classified on receipt. Packets can either be
+ associated with an existing connection or -- for servers --
+ potentially create a new connection.
+
+ Endpoints try to associate a packet with an existing connection. If
+ the packet has a non-zero-length Destination Connection ID
+ corresponding to an existing connection, QUIC processes that packet
+ accordingly. Note that more than one connection ID can be associated
+ with a connection; see Section 5.1.
+
+ If the Destination Connection ID is zero length and the addressing
+ information in the packet matches the addressing information the
+ endpoint uses to identify a connection with a zero-length connection
+ ID, QUIC processes the packet as part of that connection. An
+ endpoint can use just destination IP and port or both source and
+ destination addresses for identification, though this makes
+ connections fragile as described in Section 5.1.
+
+ Endpoints can send a Stateless Reset (Section 10.3) for any packets
+ that cannot be attributed to an existing connection. A Stateless
+ Reset allows a peer to more quickly identify when a connection
+ becomes unusable.
+
+ Packets that are matched to an existing connection are discarded if
+ the packets are inconsistent with the state of that connection. For
+ example, packets are discarded if they indicate a different protocol
+ version than that of the connection or if the removal of packet
+ protection is unsuccessful once the expected keys are available.
+
+ Invalid packets that lack strong integrity protection, such as
+ Initial, Retry, or Version Negotiation, MAY be discarded. An
+ endpoint MUST generate a connection error if processing the contents
+ of these packets prior to discovering an error, or fully revert any
+ changes made during that processing.
+
+5.2.1. Client Packet Handling
+
+ Valid packets sent to clients always include a Destination Connection
+ ID that matches a value the client selects. Clients that choose to
+ receive zero-length connection IDs can use the local address and port
+ to identify a connection. Packets that do not match an existing
+ connection -- based on Destination Connection ID or, if this value is
+ zero length, local IP address and port -- are discarded.
+
+ Due to packet reordering or loss, a client might receive packets for
+ a connection that are encrypted with a key it has not yet computed.
+ The client MAY drop these packets, or it MAY buffer them in
+ anticipation of later packets that allow it to compute the key.
+
+ If a client receives a packet that uses a different version than it
+ initially selected, it MUST discard that packet.
+
+5.2.2. Server Packet Handling
+
+ If a server receives a packet that indicates an unsupported version
+ and if the packet is large enough to initiate a new connection for
+ any supported version, the server SHOULD send a Version Negotiation
+ packet as described in Section 6.1. A server MAY limit the number of
+ packets to which it responds with a Version Negotiation packet.
+ Servers MUST drop smaller packets that specify unsupported versions.
+
+ The first packet for an unsupported version can use different
+ semantics and encodings for any version-specific field. In
+ particular, different packet protection keys might be used for
+ different versions. Servers that do not support a particular version
+ are unlikely to be able to decrypt the payload of the packet or
+ properly interpret the result. Servers SHOULD respond with a Version
+ Negotiation packet, provided that the datagram is sufficiently long.
+
+ Packets with a supported version, or no Version field, are matched to
+ a connection using the connection ID or -- for packets with zero-
+ length connection IDs -- the local address and port. These packets
+ are processed using the selected connection; otherwise, the server
+ continues as described below.
+
+ If the packet is an Initial packet fully conforming with the
+ specification, the server proceeds with the handshake (Section 7).
+ This commits the server to the version that the client selected.
+
+ If a server refuses to accept a new connection, it SHOULD send an
+ Initial packet containing a CONNECTION_CLOSE frame with error code
+ CONNECTION_REFUSED.
+
+ If the packet is a 0-RTT packet, the server MAY buffer a limited
+ number of these packets in anticipation of a late-arriving Initial
+ packet. Clients are not able to send Handshake packets prior to
+ receiving a server response, so servers SHOULD ignore any such
+ packets.
+
+ Servers MUST drop incoming packets under all other circumstances.
+
+5.2.3. Considerations for Simple Load Balancers
+
+ A server deployment could load-balance among servers using only
+ source and destination IP addresses and ports. Changes to the
+ client's IP address or port could result in packets being forwarded
+ to the wrong server. Such a server deployment could use one of the
+ following methods for connection continuity when a client's address
+ changes.
+
+ * Servers could use an out-of-band mechanism to forward packets to
+ the correct server based on connection ID.
+
+ * If servers can use a dedicated server IP address or port, other
+ than the one that the client initially connects to, they could use
+ the preferred_address transport parameter to request that clients
+ move connections to that dedicated address. Note that clients
+ could choose not to use the preferred address.
+
+ A server in a deployment that does not implement a solution to
+ maintain connection continuity when the client address changes SHOULD
+ indicate that migration is not supported by using the
+ disable_active_migration transport parameter. The
+ disable_active_migration transport parameter does not prohibit
+ connection migration after a client has acted on a preferred_address
+ transport parameter.
+
+ Server deployments that use this simple form of load balancing MUST
+ avoid the creation of a stateless reset oracle; see Section 21.11.
+
+5.3. Operations on Connections
+
+ This document does not define an API for QUIC; it instead defines a
+ set of functions for QUIC connections that application protocols can
+ rely upon. An application protocol can assume that an implementation
+ of QUIC provides an interface that includes the operations described
+ in this section. An implementation designed for use with a specific
+ application protocol might provide only those operations that are
+ used by that protocol.
+
+ When implementing the client role, an application protocol can:
+
+ * open a connection, which begins the exchange described in
+ Section 7;
+
+ * enable Early Data when available; and
+
+ * be informed when Early Data has been accepted or rejected by a
+ server.
+
+ When implementing the server role, an application protocol can:
+
+ * listen for incoming connections, which prepares for the exchange
+ described in Section 7;
+
+ * if Early Data is supported, embed application-controlled data in
+ the TLS resumption ticket sent to the client; and
+
+ * if Early Data is supported, retrieve application-controlled data
+ from the client's resumption ticket and accept or reject Early
+ Data based on that information.
+
+ In either role, an application protocol can:
+
+ * configure minimum values for the initial number of permitted
+ streams of each type, as communicated in the transport parameters
+ (Section 7.4);
+
+ * control resource allocation for receive buffers by setting flow
+ control limits both for streams and for the connection;
+
+ * identify whether the handshake has completed successfully or is
+ still ongoing;
+
+ * keep a connection from silently closing, by either generating PING
+ frames (Section 19.2) or requesting that the transport send
+ additional frames before the idle timeout expires (Section 10.1);
+ and
+
+ * immediately close (Section 10.2) the connection.
+
+6. Version Negotiation
+
+ Version negotiation allows a server to indicate that it does not
+ support the version the client used. A server sends a Version
+ Negotiation packet in response to each packet that might initiate a
+ new connection; see Section 5.2 for details.
+
+ The size of the first packet sent by a client will determine whether
+ a server sends a Version Negotiation packet. Clients that support
+ multiple QUIC versions SHOULD ensure that the first UDP datagram they
+ send is sized to the largest of the minimum datagram sizes from all
+ versions they support, using PADDING frames (Section 19.1) as
+ necessary. This ensures that the server responds if there is a
+ mutually supported version. A server might not send a Version
+ Negotiation packet if the datagram it receives is smaller than the
+ minimum size specified in a different version; see Section 14.1.
+
+6.1. Sending Version Negotiation Packets
+
+ If the version selected by the client is not acceptable to the
+ server, the server responds with a Version Negotiation packet; see
+ Section 17.2.1. This includes a list of versions that the server
+ will accept. An endpoint MUST NOT send a Version Negotiation packet
+ in response to receiving a Version Negotiation packet.
+
+ This system allows a server to process packets with unsupported
+ versions without retaining state. Though either the Initial packet
+ or the Version Negotiation packet that is sent in response could be
+ lost, the client will send new packets until it successfully receives
+ a response or it abandons the connection attempt.
+
+ A server MAY limit the number of Version Negotiation packets it
+ sends. For instance, a server that is able to recognize packets as
+ 0-RTT might choose not to send Version Negotiation packets in
+ response to 0-RTT packets with the expectation that it will
+ eventually receive an Initial packet.
+
+6.2. Handling Version Negotiation Packets
+
+ Version Negotiation packets are designed to allow for functionality
+ to be defined in the future that allows QUIC to negotiate the version
+ of QUIC to use for a connection. Future Standards Track
+ specifications might change how implementations that support multiple
+ versions of QUIC react to Version Negotiation packets received in
+ response to an attempt to establish a connection using this version.
+
+ A client that supports only this version of QUIC MUST abandon the
+ current connection attempt if it receives a Version Negotiation
+ packet, with the following two exceptions. A client MUST discard any
+ Version Negotiation packet if it has received and successfully
+ processed any other packet, including an earlier Version Negotiation
+ packet. A client MUST discard a Version Negotiation packet that
+ lists the QUIC version selected by the client.
+
+ How to perform version negotiation is left as future work defined by
+ future Standards Track specifications. In particular, that future
+ work will ensure robustness against version downgrade attacks; see
+ Section 21.12.
+
+6.3. Using Reserved Versions
+
+ For a server to use a new version in the future, clients need to
+ correctly handle unsupported versions. Some version numbers
+ (0x?a?a?a?a, as defined in Section 15) are reserved for inclusion in
+ fields that contain version numbers.
+
+ Endpoints MAY add reserved versions to any field where unknown or
+ unsupported versions are ignored to test that a peer correctly
+ ignores the value. For instance, an endpoint could include a
+ reserved version in a Version Negotiation packet; see Section 17.2.1.
+ Endpoints MAY send packets with a reserved version to test that a
+ peer correctly discards the packet.
+
+7. Cryptographic and Transport Handshake
+
+ QUIC relies on a combined cryptographic and transport handshake to
+ minimize connection establishment latency. QUIC uses the CRYPTO
+ frame (Section 19.6) to transmit the cryptographic handshake. The
+ version of QUIC defined in this document is identified as 0x00000001
+ and uses TLS as described in [QUIC-TLS]; a different QUIC version
+ could indicate that a different cryptographic handshake protocol is
+ in use.
+
+ QUIC provides reliable, ordered delivery of the cryptographic
+ handshake data. QUIC packet protection is used to encrypt as much of
+ the handshake protocol as possible. The cryptographic handshake MUST
+ provide the following properties:
+
+ * authenticated key exchange, where
+
+ - a server is always authenticated,
+
+ - a client is optionally authenticated,
+
+ - every connection produces distinct and unrelated keys, and
+
+ - keying material is usable for packet protection for both 0-RTT
+ and 1-RTT packets.
+
+ * authenticated exchange of values for transport parameters of both
+ endpoints, and confidentiality protection for server transport
+ parameters (see Section 7.4).
+
+ * authenticated negotiation of an application protocol (TLS uses
+ Application-Layer Protocol Negotiation (ALPN) [ALPN] for this
+ purpose).
+
+ The CRYPTO frame can be sent in different packet number spaces
+ (Section 12.3). The offsets used by CRYPTO frames to ensure ordered
+ delivery of cryptographic handshake data start from zero in each
+ packet number space.
+
+ Figure 4 shows a simplified handshake and the exchange of packets and
+ frames that are used to advance the handshake. Exchange of
+ application data during the handshake is enabled where possible,
+ shown with an asterisk ("*"). Once the handshake is complete,
+ endpoints are able to exchange application data freely.
+
+ Client Server
+
+ Initial (CRYPTO)
+ 0-RTT (*) ---------->
+ Initial (CRYPTO)
+ Handshake (CRYPTO)
+ <---------- 1-RTT (*)
+ Handshake (CRYPTO)
+ 1-RTT (*) ---------->
+ <---------- 1-RTT (HANDSHAKE_DONE)
+
+ 1-RTT <=========> 1-RTT
+
+ Figure 4: Simplified QUIC Handshake
+
+ Endpoints can use packets sent during the handshake to test for
+ Explicit Congestion Notification (ECN) support; see Section 13.4. An
+ endpoint validates support for ECN by observing whether the ACK
+ frames acknowledging the first packets it sends carry ECN counts, as
+ described in Section 13.4.2.
+
+ Endpoints MUST explicitly negotiate an application protocol. This
+ avoids situations where there is a disagreement about the protocol
+ that is in use.
+
+7.1. Example Handshake Flows
+
+ Details of how TLS is integrated with QUIC are provided in
+ [QUIC-TLS], but some examples are provided here. An extension of
+ this exchange to support client address validation is shown in
+ Section 8.1.2.
+
+ Once any address validation exchanges are complete, the cryptographic
+ handshake is used to agree on cryptographic keys. The cryptographic
+ handshake is carried in Initial (Section 17.2.2) and Handshake
+ (Section 17.2.4) packets.
+
+ Figure 5 provides an overview of the 1-RTT handshake. Each line
+ shows a QUIC packet with the packet type and packet number shown
+ first, followed by the frames that are typically contained in those
+ packets. For instance, the first packet is of type Initial, with
+ packet number 0, and contains a CRYPTO frame carrying the
+ ClientHello.
+
+ Multiple QUIC packets -- even of different packet types -- can be
+ coalesced into a single UDP datagram; see Section 12.2. As a result,
+ this handshake could consist of as few as four UDP datagrams, or any
+ number more (subject to limits inherent to the protocol, such as
+ congestion control and anti-amplification). For instance, the
+ server's first flight contains Initial packets, Handshake packets,
+ and "0.5-RTT data" in 1-RTT packets.
+
+ Client Server
+
+ Initial[0]: CRYPTO[CH] ->
+
+ Initial[0]: CRYPTO[SH] ACK[0]
+ Handshake[0]: CRYPTO[EE, CERT, CV, FIN]
+ <- 1-RTT[0]: STREAM[1, "..."]
+
+ Initial[1]: ACK[0]
+ Handshake[0]: CRYPTO[FIN], ACK[0]
+ 1-RTT[0]: STREAM[0, "..."], ACK[0] ->
+
+ Handshake[1]: ACK[0]
+ <- 1-RTT[1]: HANDSHAKE_DONE, STREAM[3, "..."], ACK[0]
+
+ Figure 5: Example 1-RTT Handshake
+
+ Figure 6 shows an example of a connection with a 0-RTT handshake and
+ a single packet of 0-RTT data. Note that as described in
+ Section 12.3, the server acknowledges 0-RTT data in 1-RTT packets,
+ and the client sends 1-RTT packets in the same packet number space.
+
+ Client Server
+
+ Initial[0]: CRYPTO[CH]
+ 0-RTT[0]: STREAM[0, "..."] ->
+
+ Initial[0]: CRYPTO[SH] ACK[0]
+ Handshake[0] CRYPTO[EE, FIN]
+ <- 1-RTT[0]: STREAM[1, "..."] ACK[0]
+
+ Initial[1]: ACK[0]
+ Handshake[0]: CRYPTO[FIN], ACK[0]
+ 1-RTT[1]: STREAM[0, "..."] ACK[0] ->
+
+ Handshake[1]: ACK[0]
+ <- 1-RTT[1]: HANDSHAKE_DONE, STREAM[3, "..."], ACK[1]
+
+ Figure 6: Example 0-RTT Handshake
+
+7.2. Negotiating Connection IDs
+
+ A connection ID is used to ensure consistent routing of packets, as
+ described in Section 5.1. The long header contains two connection
+ IDs: the Destination Connection ID is chosen by the recipient of the
+ packet and is used to provide consistent routing; the Source
+ Connection ID is used to set the Destination Connection ID used by
+ the peer.
+
+ During the handshake, packets with the long header (Section 17.2) are
+ used to establish the connection IDs used by both endpoints. Each
+ endpoint uses the Source Connection ID field to specify the
+ connection ID that is used in the Destination Connection ID field of
+ packets being sent to them. After processing the first Initial
+ packet, each endpoint sets the Destination Connection ID field in
+ subsequent packets it sends to the value of the Source Connection ID
+ field that it received.
+
+ When an Initial packet is sent by a client that has not previously
+ received an Initial or Retry packet from the server, the client
+ populates the Destination Connection ID field with an unpredictable
+ value. This Destination Connection ID MUST be at least 8 bytes in
+ length. Until a packet is received from the server, the client MUST
+ use the same Destination Connection ID value on all packets in this
+ connection.
+
+ The Destination Connection ID field from the first Initial packet
+ sent by a client is used to determine packet protection keys for
+ Initial packets. These keys change after receiving a Retry packet;
+ see Section 5.2 of [QUIC-TLS].
+
+ The client populates the Source Connection ID field with a value of
+ its choosing and sets the Source Connection ID Length field to
+ indicate the length.
+
+ 0-RTT packets in the first flight use the same Destination Connection
+ ID and Source Connection ID values as the client's first Initial
+ packet.
+
+ Upon first receiving an Initial or Retry packet from the server, the
+ client uses the Source Connection ID supplied by the server as the
+ Destination Connection ID for subsequent packets, including any 0-RTT
+ packets. This means that a client might have to change the
+ connection ID it sets in the Destination Connection ID field twice
+ during connection establishment: once in response to a Retry packet
+ and once in response to an Initial packet from the server. Once a
+ client has received a valid Initial packet from the server, it MUST
+ discard any subsequent packet it receives on that connection with a
+ different Source Connection ID.
+
+ A client MUST change the Destination Connection ID it uses for
+ sending packets in response to only the first received Initial or
+ Retry packet. A server MUST set the Destination Connection ID it
+ uses for sending packets based on the first received Initial packet.
+ Any further changes to the Destination Connection ID are only
+ permitted if the values are taken from NEW_CONNECTION_ID frames; if
+ subsequent Initial packets include a different Source Connection ID,
+ they MUST be discarded. This avoids unpredictable outcomes that
+ might otherwise result from stateless processing of multiple Initial
+ packets with different Source Connection IDs.
+
+ The Destination Connection ID that an endpoint sends can change over
+ the lifetime of a connection, especially in response to connection
+ migration (Section 9); see Section 5.1.1 for details.
+
+7.3. Authenticating Connection IDs
+
+ The choice each endpoint makes about connection IDs during the
+ handshake is authenticated by including all values in transport
+ parameters; see Section 7.4. This ensures that all connection IDs
+ used for the handshake are also authenticated by the cryptographic
+ handshake.
+
+ Each endpoint includes the value of the Source Connection ID field
+ from the first Initial packet it sent in the
+ initial_source_connection_id transport parameter; see Section 18.2.
+ A server includes the Destination Connection ID field from the first
+ Initial packet it received from the client in the
+ original_destination_connection_id transport parameter; if the server
+ sent a Retry packet, this refers to the first Initial packet received
+ before sending the Retry packet. If it sends a Retry packet, a
+ server also includes the Source Connection ID field from the Retry
+ packet in the retry_source_connection_id transport parameter.
+
+ The values provided by a peer for these transport parameters MUST
+ match the values that an endpoint used in the Destination and Source
+ Connection ID fields of Initial packets that it sent (and received,
+ for servers). Endpoints MUST validate that received transport
+ parameters match received connection ID values. Including connection
+ ID values in transport parameters and verifying them ensures that an
+ attacker cannot influence the choice of connection ID for a
+ successful connection by injecting packets carrying attacker-chosen
+ connection IDs during the handshake.
+
+ An endpoint MUST treat the absence of the
+ initial_source_connection_id transport parameter from either endpoint
+ or the absence of the original_destination_connection_id transport
+ parameter from the server as a connection error of type
+ TRANSPORT_PARAMETER_ERROR.
+
+ An endpoint MUST treat the following as a connection error of type
+ TRANSPORT_PARAMETER_ERROR or PROTOCOL_VIOLATION:
+
+ * absence of the retry_source_connection_id transport parameter from
+ the server after receiving a Retry packet,
+
+ * presence of the retry_source_connection_id transport parameter
+ when no Retry packet was received, or
+
+ * a mismatch between values received from a peer in these transport
+ parameters and the value sent in the corresponding Destination or
+ Source Connection ID fields of Initial packets.
+
+ If a zero-length connection ID is selected, the corresponding
+ transport parameter is included with a zero-length value.
+
+ Figure 7 shows the connection IDs (with DCID=Destination Connection
+ ID, SCID=Source Connection ID) that are used in a complete handshake.
+ The exchange of Initial packets is shown, plus the later exchange of
+ 1-RTT packets that includes the connection ID established during the
+ handshake.
+
+ Client Server
+
+ Initial: DCID=S1, SCID=C1 ->
+ <- Initial: DCID=C1, SCID=S3
+ ...
+ 1-RTT: DCID=S3 ->
+ <- 1-RTT: DCID=C1
+
+ Figure 7: Use of Connection IDs in a Handshake
+
+ Figure 8 shows a similar handshake that includes a Retry packet.
+
+ Client Server
+
+ Initial: DCID=S1, SCID=C1 ->
+ <- Retry: DCID=C1, SCID=S2
+ Initial: DCID=S2, SCID=C1 ->
+ <- Initial: DCID=C1, SCID=S3
+ ...
+ 1-RTT: DCID=S3 ->
+ <- 1-RTT: DCID=C1
+
+ Figure 8: Use of Connection IDs in a Handshake with Retry
+
+ In both cases (Figures 7 and 8), the client sets the value of the
+ initial_source_connection_id transport parameter to "C1".
+
+ When the handshake does not include a Retry (Figure 7), the server
+ sets original_destination_connection_id to "S1" (note that this value
+ is chosen by the client) and initial_source_connection_id to "S3".
+ In this case, the server does not include a
+ retry_source_connection_id transport parameter.
+
+ When the handshake includes a Retry (Figure 8), the server sets
+ original_destination_connection_id to "S1",
+ retry_source_connection_id to "S2", and initial_source_connection_id
+ to "S3".
+
+7.4. Transport Parameters
+
+ During connection establishment, both endpoints make authenticated
+ declarations of their transport parameters. Endpoints are required
+ to comply with the restrictions that each parameter defines; the
+ description of each parameter includes rules for its handling.
+
+ Transport parameters are declarations that are made unilaterally by
+ each endpoint. Each endpoint can choose values for transport
+ parameters independent of the values chosen by its peer.
+
+ The encoding of the transport parameters is detailed in Section 18.
+
+ QUIC includes the encoded transport parameters in the cryptographic
+ handshake. Once the handshake completes, the transport parameters
+ declared by the peer are available. Each endpoint validates the
+ values provided by its peer.
+
+ Definitions for each of the defined transport parameters are included
+ in Section 18.2.
+
+ An endpoint MUST treat receipt of a transport parameter with an
+ invalid value as a connection error of type
+ TRANSPORT_PARAMETER_ERROR.
+
+ An endpoint MUST NOT send a parameter more than once in a given
+ transport parameters extension. An endpoint SHOULD treat receipt of
+ duplicate transport parameters as a connection error of type
+ TRANSPORT_PARAMETER_ERROR.
+
+ Endpoints use transport parameters to authenticate the negotiation of
+ connection IDs during the handshake; see Section 7.3.
+
+ ALPN (see [ALPN]) allows clients to offer multiple application
+ protocols during connection establishment. The transport parameters
+ that a client includes during the handshake apply to all application
+ protocols that the client offers. Application protocols can
+ recommend values for transport parameters, such as the initial flow
+ control limits. However, application protocols that set constraints
+ on values for transport parameters could make it impossible for a
+ client to offer multiple application protocols if these constraints
+ conflict.
+
+7.4.1. Values of Transport Parameters for 0-RTT
+
+ Using 0-RTT depends on both client and server using protocol
+ parameters that were negotiated from a previous connection. To
+ enable 0-RTT, endpoints store the values of the server transport
+ parameters with any session tickets it receives on the connection.
+ Endpoints also store any information required by the application
+ protocol or cryptographic handshake; see Section 4.6 of [QUIC-TLS].
+ The values of stored transport parameters are used when attempting
+ 0-RTT using the session tickets.
+
+ Remembered transport parameters apply to the new connection until the
+ handshake completes and the client starts sending 1-RTT packets.
+ Once the handshake completes, the client uses the transport
+ parameters established in the handshake. Not all transport
+ parameters are remembered, as some do not apply to future connections
+ or they have no effect on the use of 0-RTT.
+
+ The definition of a new transport parameter (Section 7.4.2) MUST
+ specify whether storing the transport parameter for 0-RTT is
+ mandatory, optional, or prohibited. A client need not store a
+ transport parameter it cannot process.
+
+ A client MUST NOT use remembered values for the following parameters:
+ ack_delay_exponent, max_ack_delay, initial_source_connection_id,
+ original_destination_connection_id, preferred_address,
+ retry_source_connection_id, and stateless_reset_token. The client
+ MUST use the server's new values in the handshake instead; if the
+ server does not provide new values, the default values are used.
+
+ A client that attempts to send 0-RTT data MUST remember all other
+ transport parameters used by the server that it is able to process.
+ The server can remember these transport parameters or can store an
+ integrity-protected copy of the values in the ticket and recover the
+ information when accepting 0-RTT data. A server uses the transport
+ parameters in determining whether to accept 0-RTT data.
+
+ If 0-RTT data is accepted by the server, the server MUST NOT reduce
+ any limits or alter any values that might be violated by the client
+ with its 0-RTT data. In particular, a server that accepts 0-RTT data
+ MUST NOT set values for the following parameters (Section 18.2) that
+ are smaller than the remembered values of the parameters.
+
+ * active_connection_id_limit
+
+ * initial_max_data
+
+ * initial_max_stream_data_bidi_local
+
+ * initial_max_stream_data_bidi_remote
+
+ * initial_max_stream_data_uni
+
+ * initial_max_streams_bidi
+
+ * initial_max_streams_uni
+
+ Omitting or setting a zero value for certain transport parameters can
+ result in 0-RTT data being enabled but not usable. The applicable
+ subset of transport parameters that permit the sending of application
+ data SHOULD be set to non-zero values for 0-RTT. This includes
+ initial_max_data and either (1) initial_max_streams_bidi and
+ initial_max_stream_data_bidi_remote or (2) initial_max_streams_uni
+ and initial_max_stream_data_uni.
+
+ A server might provide larger initial stream flow control limits for
+ streams than the remembered values that a client applies when sending
+ 0-RTT. Once the handshake completes, the client updates the flow
+ control limits on all sending streams using the updated values of
+ initial_max_stream_data_bidi_remote and initial_max_stream_data_uni.
+
+ A server MAY store and recover the previously sent values of the
+ max_idle_timeout, max_udp_payload_size, and disable_active_migration
+ parameters and reject 0-RTT if it selects smaller values. Lowering
+ the values of these parameters while also accepting 0-RTT data could
+ degrade the performance of the connection. Specifically, lowering
+ the max_udp_payload_size could result in dropped packets, leading to
+ worse performance compared to rejecting 0-RTT data outright.
+
+ A server MUST reject 0-RTT data if the restored values for transport
+ parameters cannot be supported.
+
+ When sending frames in 0-RTT packets, a client MUST only use
+ remembered transport parameters; importantly, it MUST NOT use updated
+ values that it learns from the server's updated transport parameters
+ or from frames received in 1-RTT packets. Updated values of
+ transport parameters from the handshake apply only to 1-RTT packets.
+ For instance, flow control limits from remembered transport
+ parameters apply to all 0-RTT packets even if those values are
+ increased by the handshake or by frames sent in 1-RTT packets. A
+ server MAY treat the use of updated transport parameters in 0-RTT as
+ a connection error of type PROTOCOL_VIOLATION.
+
+7.4.2. New Transport Parameters
+
+ New transport parameters can be used to negotiate new protocol
+ behavior. An endpoint MUST ignore transport parameters that it does
+ not support. The absence of a transport parameter therefore disables
+ any optional protocol feature that is negotiated using the parameter.
+ As described in Section 18.1, some identifiers are reserved in order
+ to exercise this requirement.
+
+ A client that does not understand a transport parameter can discard
+ it and attempt 0-RTT on subsequent connections. However, if the
+ client adds support for a discarded transport parameter, it risks
+ violating the constraints that the transport parameter establishes if
+ it attempts 0-RTT. New transport parameters can avoid this problem
+ by setting a default of the most conservative value. Clients can
+ avoid this problem by remembering all parameters, even those not
+ currently supported.
+
+ New transport parameters can be registered according to the rules in
+ Section 22.3.
+
+7.5. Cryptographic Message Buffering
+
+ Implementations need to maintain a buffer of CRYPTO data received out
+ of order. Because there is no flow control of CRYPTO frames, an
+ endpoint could potentially force its peer to buffer an unbounded
+ amount of data.
+
+ Implementations MUST support buffering at least 4096 bytes of data
+ received in out-of-order CRYPTO frames. Endpoints MAY choose to
+ allow more data to be buffered during the handshake. A larger limit
+ during the handshake could allow for larger keys or credentials to be
+ exchanged. An endpoint's buffer size does not need to remain
+ constant during the life of the connection.
+
+ Being unable to buffer CRYPTO frames during the handshake can lead to
+ a connection failure. If an endpoint's buffer is exceeded during the
+ handshake, it can expand its buffer temporarily to complete the
+ handshake. If an endpoint does not expand its buffer, it MUST close
+ the connection with a CRYPTO_BUFFER_EXCEEDED error code.
+
+ Once the handshake completes, if an endpoint is unable to buffer all
+ data in a CRYPTO frame, it MAY discard that CRYPTO frame and all
+ CRYPTO frames received in the future, or it MAY close the connection
+ with a CRYPTO_BUFFER_EXCEEDED error code. Packets containing
+ discarded CRYPTO frames MUST be acknowledged because the packet has
+ been received and processed by the transport even though the CRYPTO
+ frame was discarded.
+
+8. Address Validation
+
+ Address validation ensures that an endpoint cannot be used for a
+ traffic amplification attack. In such an attack, a packet is sent to
+ a server with spoofed source address information that identifies a
+ victim. If a server generates more or larger packets in response to
+ that packet, the attacker can use the server to send more data toward
+ the victim than it would be able to send on its own.
+
+ The primary defense against amplification attacks is verifying that a
+ peer is able to receive packets at the transport address that it
+ claims. Therefore, after receiving packets from an address that is
+ not yet validated, an endpoint MUST limit the amount of data it sends
+ to the unvalidated address to three times the amount of data received
+ from that address. This limit on the size of responses is known as
+ the anti-amplification limit.
+
+ Address validation is performed both during connection establishment
+ (see Section 8.1) and during connection migration (see Section 8.2).
+
+8.1. Address Validation during Connection Establishment
+
+ Connection establishment implicitly provides address validation for
+ both endpoints. In particular, receipt of a packet protected with
+ Handshake keys confirms that the peer successfully processed an
+ Initial packet. Once an endpoint has successfully processed a
+ Handshake packet from the peer, it can consider the peer address to
+ have been validated.
+
+ Additionally, an endpoint MAY consider the peer address validated if
+ the peer uses a connection ID chosen by the endpoint and the
+ connection ID contains at least 64 bits of entropy.
+
+ For the client, the value of the Destination Connection ID field in
+ its first Initial packet allows it to validate the server address as
+ a part of successfully processing any packet. Initial packets from
+ the server are protected with keys that are derived from this value
+ (see Section 5.2 of [QUIC-TLS]). Alternatively, the value is echoed
+ by the server in Version Negotiation packets (Section 6) or included
+ in the Integrity Tag in Retry packets (Section 5.8 of [QUIC-TLS]).
+
+ Prior to validating the client address, servers MUST NOT send more
+ than three times as many bytes as the number of bytes they have
+ received. This limits the magnitude of any amplification attack that
+ can be mounted using spoofed source addresses. For the purposes of
+ avoiding amplification prior to address validation, servers MUST
+ count all of the payload bytes received in datagrams that are
+ uniquely attributed to a single connection. This includes datagrams
+ that contain packets that are successfully processed and datagrams
+ that contain packets that are all discarded.
+
+ Clients MUST ensure that UDP datagrams containing Initial packets
+ have UDP payloads of at least 1200 bytes, adding PADDING frames as
+ necessary. A client that sends padded datagrams allows the server to
+ send more data prior to completing address validation.
+
+ Loss of an Initial or Handshake packet from the server can cause a
+ deadlock if the client does not send additional Initial or Handshake
+ packets. A deadlock could occur when the server reaches its anti-
+ amplification limit and the client has received acknowledgments for
+ all the data it has sent. In this case, when the client has no
+ reason to send additional packets, the server will be unable to send
+ more data because it has not validated the client's address. To
+ prevent this deadlock, clients MUST send a packet on a Probe Timeout
+ (PTO); see Section 6.2 of [QUIC-RECOVERY]. Specifically, the client
+ MUST send an Initial packet in a UDP datagram that contains at least
+ 1200 bytes if it does not have Handshake keys, and otherwise send a
+ Handshake packet.
+
+ A server might wish to validate the client address before starting
+ the cryptographic handshake. QUIC uses a token in the Initial packet
+ to provide address validation prior to completing the handshake.
+ This token is delivered to the client during connection establishment
+ with a Retry packet (see Section 8.1.2) or in a previous connection
+ using the NEW_TOKEN frame (see Section 8.1.3).
+
+ In addition to sending limits imposed prior to address validation,
+ servers are also constrained in what they can send by the limits set
+ by the congestion controller. Clients are only constrained by the
+ congestion controller.
+
+8.1.1. Token Construction
+
+ A token sent in a NEW_TOKEN frame or a Retry packet MUST be
+ constructed in a way that allows the server to identify how it was
+ provided to a client. These tokens are carried in the same field but
+ require different handling from servers.
+
+8.1.2. Address Validation Using Retry Packets
+
+ Upon receiving the client's Initial packet, the server can request
+ address validation by sending a Retry packet (Section 17.2.5)
+ containing a token. This token MUST be repeated by the client in all
+ Initial packets it sends for that connection after it receives the
+ Retry packet.
+
+ In response to processing an Initial packet containing a token that
+ was provided in a Retry packet, a server cannot send another Retry
+ packet; it can only refuse the connection or permit it to proceed.
+
+ As long as it is not possible for an attacker to generate a valid
+ token for its own address (see Section 8.1.4) and the client is able
+ to return that token, it proves to the server that it received the
+ token.
+
+ A server can also use a Retry packet to defer the state and
+ processing costs of connection establishment. Requiring the server
+ to provide a different connection ID, along with the
+ original_destination_connection_id transport parameter defined in
+ Section 18.2, forces the server to demonstrate that it, or an entity
+ it cooperates with, received the original Initial packet from the
+ client. Providing a different connection ID also grants a server
+ some control over how subsequent packets are routed. This can be
+ used to direct connections to a different server instance.
+
+ If a server receives a client Initial that contains an invalid Retry
+ token but is otherwise valid, it knows the client will not accept
+ another Retry token. The server can discard such a packet and allow
+ the client to time out to detect handshake failure, but that could
+ impose a significant latency penalty on the client. Instead, the
+ server SHOULD immediately close (Section 10.2) the connection with an
+ INVALID_TOKEN error. Note that a server has not established any
+ state for the connection at this point and so does not enter the
+ closing period.
+
+ A flow showing the use of a Retry packet is shown in Figure 9.
+
+ Client Server
+
+ Initial[0]: CRYPTO[CH] ->
+
+ <- Retry+Token
+
+ Initial+Token[1]: CRYPTO[CH] ->
+
+ Initial[0]: CRYPTO[SH] ACK[1]
+ Handshake[0]: CRYPTO[EE, CERT, CV, FIN]
+ <- 1-RTT[0]: STREAM[1, "..."]
+
+ Figure 9: Example Handshake with Retry
+
+8.1.3. Address Validation for Future Connections
+
+ A server MAY provide clients with an address validation token during
+ one connection that can be used on a subsequent connection. Address
+ validation is especially important with 0-RTT because a server
+ potentially sends a significant amount of data to a client in
+ response to 0-RTT data.
+
+ The server uses the NEW_TOKEN frame (Section 19.7) to provide the
+ client with an address validation token that can be used to validate
+ future connections. In a future connection, the client includes this
+ token in Initial packets to provide address validation. The client
+ MUST include the token in all Initial packets it sends, unless a
+ Retry replaces the token with a newer one. The client MUST NOT use
+ the token provided in a Retry for future connections. Servers MAY
+ discard any Initial packet that does not carry the expected token.
+
+ Unlike the token that is created for a Retry packet, which is used
+ immediately, the token sent in the NEW_TOKEN frame can be used after
+ some period of time has passed. Thus, a token SHOULD have an
+ expiration time, which could be either an explicit expiration time or
+ an issued timestamp that can be used to dynamically calculate the
+ expiration time. A server can store the expiration time or include
+ it in an encrypted form in the token.
+
+ A token issued with NEW_TOKEN MUST NOT include information that would
+ allow values to be linked by an observer to the connection on which
+ it was issued. For example, it cannot include the previous
+ connection ID or addressing information, unless the values are
+ encrypted. A server MUST ensure that every NEW_TOKEN frame it sends
+ is unique across all clients, with the exception of those sent to
+ repair losses of previously sent NEW_TOKEN frames. Information that
+ allows the server to distinguish between tokens from Retry and
+ NEW_TOKEN MAY be accessible to entities other than the server.
+
+ It is unlikely that the client port number is the same on two
+ different connections; validating the port is therefore unlikely to
+ be successful.
+
+ A token received in a NEW_TOKEN frame is applicable to any server
+ that the connection is considered authoritative for (e.g., server
+ names included in the certificate). When connecting to a server for
+ which the client retains an applicable and unused token, it SHOULD
+ include that token in the Token field of its Initial packet.
+ Including a token might allow the server to validate the client
+ address without an additional round trip. A client MUST NOT include
+ a token that is not applicable to the server that it is connecting
+ to, unless the client has the knowledge that the server that issued
+ the token and the server the client is connecting to are jointly
+ managing the tokens. A client MAY use a token from any previous
+ connection to that server.
+
+ A token allows a server to correlate activity between the connection
+ where the token was issued and any connection where it is used.
+ Clients that want to break continuity of identity with a server can
+ discard tokens provided using the NEW_TOKEN frame. In comparison, a
+ token obtained in a Retry packet MUST be used immediately during the
+ connection attempt and cannot be used in subsequent connection
+ attempts.
+
+ A client SHOULD NOT reuse a token from a NEW_TOKEN frame for
+ different connection attempts. Reusing a token allows connections to
+ be linked by entities on the network path; see Section 9.5.
+
+ Clients might receive multiple tokens on a single connection. Aside
+ from preventing linkability, any token can be used in any connection
+ attempt. Servers can send additional tokens to either enable address
+ validation for multiple connection attempts or replace older tokens
+ that might become invalid. For a client, this ambiguity means that
+ sending the most recent unused token is most likely to be effective.
+ Though saving and using older tokens have no negative consequences,
+ clients can regard older tokens as being less likely to be useful to
+ the server for address validation.
+
+ When a server receives an Initial packet with an address validation
+ token, it MUST attempt to validate the token, unless it has already
+ completed address validation. If the token is invalid, then the
+ server SHOULD proceed as if the client did not have a validated
+ address, including potentially sending a Retry packet. Tokens
+ provided with NEW_TOKEN frames and Retry packets can be distinguished
+ by servers (see Section 8.1.1), and the latter can be validated more
+ strictly. If the validation succeeds, the server SHOULD then allow
+ the handshake to proceed.
+
+ | Note: The rationale for treating the client as unvalidated
+ | rather than discarding the packet is that the client might have
+ | received the token in a previous connection using the NEW_TOKEN
+ | frame, and if the server has lost state, it might be unable to
+ | validate the token at all, leading to connection failure if the
+ | packet is discarded.
+
+ In a stateless design, a server can use encrypted and authenticated
+ tokens to pass information to clients that the server can later
+ recover and use to validate a client address. Tokens are not
+ integrated into the cryptographic handshake, and so they are not
+ authenticated. For instance, a client might be able to reuse a
+ token. To avoid attacks that exploit this property, a server can
+ limit its use of tokens to only the information needed to validate
+ client addresses.
+
+ Clients MAY use tokens obtained on one connection for any connection
+ attempt using the same version. When selecting a token to use,
+ clients do not need to consider other properties of the connection
+ that is being attempted, including the choice of possible application
+ protocols, session tickets, or other connection properties.
+
+8.1.4. Address Validation Token Integrity
+
+ An address validation token MUST be difficult to guess. Including a
+ random value with at least 128 bits of entropy in the token would be
+ sufficient, but this depends on the server remembering the value it
+ sends to clients.
+
+ A token-based scheme allows the server to offload any state
+ associated with validation to the client. For this design to work,
+ the token MUST be covered by integrity protection against
+ modification or falsification by clients. Without integrity
+ protection, malicious clients could generate or guess values for
+ tokens that would be accepted by the server. Only the server
+ requires access to the integrity protection key for tokens.
+
+ There is no need for a single well-defined format for the token
+ because the server that generates the token also consumes it. Tokens
+ sent in Retry packets SHOULD include information that allows the
+ server to verify that the source IP address and port in client
+ packets remain constant.
+
+ Tokens sent in NEW_TOKEN frames MUST include information that allows
+ the server to verify that the client IP address has not changed from
+ when the token was issued. Servers can use tokens from NEW_TOKEN
+ frames in deciding not to send a Retry packet, even if the client
+ address has changed. If the client IP address has changed, the
+ server MUST adhere to the anti-amplification limit; see Section 8.
+ Note that in the presence of NAT, this requirement might be
+ insufficient to protect other hosts that share the NAT from
+ amplification attacks.
+
+ Attackers could replay tokens to use servers as amplifiers in DDoS
+ attacks. To protect against such attacks, servers MUST ensure that
+ replay of tokens is prevented or limited. Servers SHOULD ensure that
+ tokens sent in Retry packets are only accepted for a short time, as
+ they are returned immediately by clients. Tokens that are provided
+ in NEW_TOKEN frames (Section 19.7) need to be valid for longer but
+ SHOULD NOT be accepted multiple times. Servers are encouraged to
+ allow tokens to be used only once, if possible; tokens MAY include
+ additional information about clients to further narrow applicability
+ or reuse.
+
+8.2. Path Validation
+
+ Path validation is used by both peers during connection migration
+ (see Section 9) to verify reachability after a change of address. In
+ path validation, endpoints test reachability between a specific local
+ address and a specific peer address, where an address is the 2-tuple
+ of IP address and port.
+
+ Path validation tests that packets sent on a path to a peer are
+ received by that peer. Path validation is used to ensure that
+ packets received from a migrating peer do not carry a spoofed source
+ address.
+
+ Path validation does not validate that a peer can send in the return
+ direction. Acknowledgments cannot be used for return path validation
+ because they contain insufficient entropy and might be spoofed.
+ Endpoints independently determine reachability on each direction of a
+ path, and therefore return reachability can only be established by
+ the peer.
+
+ Path validation can be used at any time by either endpoint. For
+ instance, an endpoint might check that a peer is still in possession
+ of its address after a period of quiescence.
+
+ Path validation is not designed as a NAT traversal mechanism. Though
+ the mechanism described here might be effective for the creation of
+ NAT bindings that support NAT traversal, the expectation is that one
+ endpoint is able to receive packets without first having sent a
+ packet on that path. Effective NAT traversal needs additional
+ synchronization mechanisms that are not provided here.
+
+ An endpoint MAY include other frames with the PATH_CHALLENGE and
+ PATH_RESPONSE frames used for path validation. In particular, an
+ endpoint can include PADDING frames with a PATH_CHALLENGE frame for
+ Path Maximum Transmission Unit Discovery (PMTUD); see Section 14.2.1.
+ An endpoint can also include its own PATH_CHALLENGE frame when
+ sending a PATH_RESPONSE frame.
+
+ An endpoint uses a new connection ID for probes sent from a new local
+ address; see Section 9.5. When probing a new path, an endpoint can
+ ensure that its peer has an unused connection ID available for
+ responses. Sending NEW_CONNECTION_ID and PATH_CHALLENGE frames in
+ the same packet, if the peer's active_connection_id_limit permits,
+ ensures that an unused connection ID will be available to the peer
+ when sending a response.
+
+ An endpoint can choose to simultaneously probe multiple paths. The
+ number of simultaneous paths used for probes is limited by the number
+ of extra connection IDs its peer has previously supplied, since each
+ new local address used for a probe requires a previously unused
+ connection ID.
+
+8.2.1. Initiating Path Validation
+
+ To initiate path validation, an endpoint sends a PATH_CHALLENGE frame
+ containing an unpredictable payload on the path to be validated.
+
+ An endpoint MAY send multiple PATH_CHALLENGE frames to guard against
+ packet loss. However, an endpoint SHOULD NOT send multiple
+ PATH_CHALLENGE frames in a single packet.
+
+ An endpoint SHOULD NOT probe a new path with packets containing a
+ PATH_CHALLENGE frame more frequently than it would send an Initial
+ packet. This ensures that connection migration is no more load on a
+ new path than establishing a new connection.
+
+ The endpoint MUST use unpredictable data in every PATH_CHALLENGE
+ frame so that it can associate the peer's response with the
+ corresponding PATH_CHALLENGE.
+
+ An endpoint MUST expand datagrams that contain a PATH_CHALLENGE frame
+ to at least the smallest allowed maximum datagram size of 1200 bytes,
+ unless the anti-amplification limit for the path does not permit
+ sending a datagram of this size. Sending UDP datagrams of this size
+ ensures that the network path from the endpoint to the peer can be
+ used for QUIC; see Section 14.
+
+ When an endpoint is unable to expand the datagram size to 1200 bytes
+ due to the anti-amplification limit, the path MTU will not be
+ validated. To ensure that the path MTU is large enough, the endpoint
+ MUST perform a second path validation by sending a PATH_CHALLENGE
+ frame in a datagram of at least 1200 bytes. This additional
+ validation can be performed after a PATH_RESPONSE is successfully
+ received or when enough bytes have been received on the path that
+ sending the larger datagram will not result in exceeding the anti-
+ amplification limit.
+
+ Unlike other cases where datagrams are expanded, endpoints MUST NOT
+ discard datagrams that appear to be too small when they contain
+ PATH_CHALLENGE or PATH_RESPONSE.
+
+8.2.2. Path Validation Responses
+
+ On receiving a PATH_CHALLENGE frame, an endpoint MUST respond by
+ echoing the data contained in the PATH_CHALLENGE frame in a
+ PATH_RESPONSE frame. An endpoint MUST NOT delay transmission of a
+ packet containing a PATH_RESPONSE frame unless constrained by
+ congestion control.
+
+ A PATH_RESPONSE frame MUST be sent on the network path where the
+ PATH_CHALLENGE frame was received. This ensures that path validation
+ by a peer only succeeds if the path is functional in both directions.
+ This requirement MUST NOT be enforced by the endpoint that initiates
+ path validation, as that would enable an attack on migration; see
+ Section 9.3.3.
+
+ An endpoint MUST expand datagrams that contain a PATH_RESPONSE frame
+ to at least the smallest allowed maximum datagram size of 1200 bytes.
+ This verifies that the path is able to carry datagrams of this size
+ in both directions. However, an endpoint MUST NOT expand the
+ datagram containing the PATH_RESPONSE if the resulting data exceeds
+ the anti-amplification limit. This is expected to only occur if the
+ received PATH_CHALLENGE was not sent in an expanded datagram.
+
+ An endpoint MUST NOT send more than one PATH_RESPONSE frame in
+ response to one PATH_CHALLENGE frame; see Section 13.3. The peer is
+ expected to send more PATH_CHALLENGE frames as necessary to evoke
+ additional PATH_RESPONSE frames.
+
+8.2.3. Successful Path Validation
+
+ Path validation succeeds when a PATH_RESPONSE frame is received that
+ contains the data that was sent in a previous PATH_CHALLENGE frame.
+ A PATH_RESPONSE frame received on any network path validates the path
+ on which the PATH_CHALLENGE was sent.
+
+ If an endpoint sends a PATH_CHALLENGE frame in a datagram that is not
+ expanded to at least 1200 bytes and if the response to it validates
+ the peer address, the path is validated but not the path MTU. As a
+ result, the endpoint can now send more than three times the amount of
+ data that has been received. However, the endpoint MUST initiate
+ another path validation with an expanded datagram to verify that the
+ path supports the required MTU.
+
+ Receipt of an acknowledgment for a packet containing a PATH_CHALLENGE
+ frame is not adequate validation, since the acknowledgment can be
+ spoofed by a malicious peer.
+
+8.2.4. Failed Path Validation
+
+ Path validation only fails when the endpoint attempting to validate
+ the path abandons its attempt to validate the path.
+
+ Endpoints SHOULD abandon path validation based on a timer. When
+ setting this timer, implementations are cautioned that the new path
+ could have a longer round-trip time than the original. A value of
+ three times the larger of the current PTO or the PTO for the new path
+ (using kInitialRtt, as defined in [QUIC-RECOVERY]) is RECOMMENDED.
+
+ This timeout allows for multiple PTOs to expire prior to failing path
+ validation, so that loss of a single PATH_CHALLENGE or PATH_RESPONSE
+ frame does not cause path validation failure.
+
+ Note that the endpoint might receive packets containing other frames
+ on the new path, but a PATH_RESPONSE frame with appropriate data is
+ required for path validation to succeed.
+
+ When an endpoint abandons path validation, it determines that the
+ path is unusable. This does not necessarily imply a failure of the
+ connection -- endpoints can continue sending packets over other paths
+ as appropriate. If no paths are available, an endpoint can wait for
+ a new path to become available or close the connection. An endpoint
+ that has no valid network path to its peer MAY signal this using the
+ NO_VIABLE_PATH connection error, noting that this is only possible if
+ the network path exists but does not support the required MTU
+ (Section 14).
+
+ A path validation might be abandoned for other reasons besides
+ failure. Primarily, this happens if a connection migration to a new
+ path is initiated while a path validation on the old path is in
+ progress.
+
+9. Connection Migration
+
+ The use of a connection ID allows connections to survive changes to
+ endpoint addresses (IP address and port), such as those caused by an
+ endpoint migrating to a new network. This section describes the
+ process by which an endpoint migrates to a new address.
+
+ The design of QUIC relies on endpoints retaining a stable address for
+ the duration of the handshake. An endpoint MUST NOT initiate
+ connection migration before the handshake is confirmed, as defined in
+ Section 4.1.2 of [QUIC-TLS].
+
+ If the peer sent the disable_active_migration transport parameter, an
+ endpoint also MUST NOT send packets (including probing packets; see
+ Section 9.1) from a different local address to the address the peer
+ used during the handshake, unless the endpoint has acted on a
+ preferred_address transport parameter from the peer. If the peer
+ violates this requirement, the endpoint MUST either drop the incoming
+ packets on that path without generating a Stateless Reset or proceed
+ with path validation and allow the peer to migrate. Generating a
+ Stateless Reset or closing the connection would allow third parties
+ in the network to cause connections to close by spoofing or otherwise
+ manipulating observed traffic.
+
+ Not all changes of peer address are intentional, or active,
+ migrations. The peer could experience NAT rebinding: a change of
+ address due to a middlebox, usually a NAT, allocating a new outgoing
+ port or even a new outgoing IP address for a flow. An endpoint MUST
+ perform path validation (Section 8.2) if it detects any change to a
+ peer's address, unless it has previously validated that address.
+
+ When an endpoint has no validated path on which to send packets, it
+ MAY discard connection state. An endpoint capable of connection
+ migration MAY wait for a new path to become available before
+ discarding connection state.
+
+ This document limits migration of connections to new client
+ addresses, except as described in Section 9.6. Clients are
+ responsible for initiating all migrations. Servers do not send non-
+ probing packets (see Section 9.1) toward a client address until they
+ see a non-probing packet from that address. If a client receives
+ packets from an unknown server address, the client MUST discard these
+ packets.
+
+9.1. Probing a New Path
+
+ An endpoint MAY probe for peer reachability from a new local address
+ using path validation (Section 8.2) prior to migrating the connection
+ to the new local address. Failure of path validation simply means
+ that the new path is not usable for this connection. Failure to
+ validate a path does not cause the connection to end unless there are
+ no valid alternative paths available.
+
+ PATH_CHALLENGE, PATH_RESPONSE, NEW_CONNECTION_ID, and PADDING frames
+ are "probing frames", and all other frames are "non-probing frames".
+ A packet containing only probing frames is a "probing packet", and a
+ packet containing any other frame is a "non-probing packet".
+
+9.2. Initiating Connection Migration
+
+ An endpoint can migrate a connection to a new local address by
+ sending packets containing non-probing frames from that address.
+
+ Each endpoint validates its peer's address during connection
+ establishment. Therefore, a migrating endpoint can send to its peer
+ knowing that the peer is willing to receive at the peer's current
+ address. Thus, an endpoint can migrate to a new local address
+ without first validating the peer's address.
+
+ To establish reachability on the new path, an endpoint initiates path
+ validation (Section 8.2) on the new path. An endpoint MAY defer path
+ validation until after a peer sends the next non-probing frame to its
+ new address.
+
+ When migrating, the new path might not support the endpoint's current
+ sending rate. Therefore, the endpoint resets its congestion
+ controller and RTT estimate, as described in Section 9.4.
+
+ The new path might not have the same ECN capability. Therefore, the
+ endpoint validates ECN capability as described in Section 13.4.
+
+9.3. Responding to Connection Migration
+
+ Receiving a packet from a new peer address containing a non-probing
+ frame indicates that the peer has migrated to that address.
+
+ If the recipient permits the migration, it MUST send subsequent
+ packets to the new peer address and MUST initiate path validation
+ (Section 8.2) to verify the peer's ownership of the address if
+ validation is not already underway. If the recipient has no unused
+ connection IDs from the peer, it will not be able to send anything on
+ the new path until the peer provides one; see Section 9.5.
+
+ An endpoint only changes the address to which it sends packets in
+ response to the highest-numbered non-probing packet. This ensures
+ that an endpoint does not send packets to an old peer address in the
+ case that it receives reordered packets.
+
+ An endpoint MAY send data to an unvalidated peer address, but it MUST
+ protect against potential attacks as described in Sections 9.3.1 and
+ 9.3.2. An endpoint MAY skip validation of a peer address if that
+ address has been seen recently. In particular, if an endpoint
+ returns to a previously validated path after detecting some form of
+ spurious migration, skipping address validation and restoring loss
+ detection and congestion state can reduce the performance impact of
+ the attack.
+
+ After changing the address to which it sends non-probing packets, an
+ endpoint can abandon any path validation for other addresses.
+
+ Receiving a packet from a new peer address could be the result of a
+ NAT rebinding at the peer.
+
+ After verifying a new client address, the server SHOULD send new
+ address validation tokens (Section 8) to the client.
+
+9.3.1. Peer Address Spoofing
+
+ It is possible that a peer is spoofing its source address to cause an
+ endpoint to send excessive amounts of data to an unwilling host. If
+ the endpoint sends significantly more data than the spoofing peer,
+ connection migration might be used to amplify the volume of data that
+ an attacker can generate toward a victim.
+
+ As described in Section 9.3, an endpoint is required to validate a
+ peer's new address to confirm the peer's possession of the new
+ address. Until a peer's address is deemed valid, an endpoint limits
+ the amount of data it sends to that address; see Section 8. In the
+ absence of this limit, an endpoint risks being used for a denial-of-
+ service attack against an unsuspecting victim.
+
+ If an endpoint skips validation of a peer address as described above,
+ it does not need to limit its sending rate.
+
+9.3.2. On-Path Address Spoofing
+
+ An on-path attacker could cause a spurious connection migration by
+ copying and forwarding a packet with a spoofed address such that it
+ arrives before the original packet. The packet with the spoofed
+ address will be seen to come from a migrating connection, and the
+ original packet will be seen as a duplicate and dropped. After a
+ spurious migration, validation of the source address will fail
+ because the entity at the source address does not have the necessary
+ cryptographic keys to read or respond to the PATH_CHALLENGE frame
+ that is sent to it even if it wanted to.
+
+ To protect the connection from failing due to such a spurious
+ migration, an endpoint MUST revert to using the last validated peer
+ address when validation of a new peer address fails. Additionally,
+ receipt of packets with higher packet numbers from the legitimate
+ peer address will trigger another connection migration. This will
+ cause the validation of the address of the spurious migration to be
+ abandoned, thus containing migrations initiated by the attacker
+ injecting a single packet.
+
+ If an endpoint has no state about the last validated peer address, it
+ MUST close the connection silently by discarding all connection
+ state. This results in new packets on the connection being handled
+ generically. For instance, an endpoint MAY send a Stateless Reset in
+ response to any further incoming packets.
+
+9.3.3. Off-Path Packet Forwarding
+
+ An off-path attacker that can observe packets might forward copies of
+ genuine packets to endpoints. If the copied packet arrives before
+ the genuine packet, this will appear as a NAT rebinding. Any genuine
+ packet will be discarded as a duplicate. If the attacker is able to
+ continue forwarding packets, it might be able to cause migration to a
+ path via the attacker. This places the attacker on-path, giving it
+ the ability to observe or drop all subsequent packets.
+
+ This style of attack relies on the attacker using a path that has
+ approximately the same characteristics as the direct path between
+ endpoints. The attack is more reliable if relatively few packets are
+ sent or if packet loss coincides with the attempted attack.
+
+ A non-probing packet received on the original path that increases the
+ maximum received packet number will cause the endpoint to move back
+ to that path. Eliciting packets on this path increases the
+ likelihood that the attack is unsuccessful. Therefore, mitigation of
+ this attack relies on triggering the exchange of packets.
+
+ In response to an apparent migration, endpoints MUST validate the
+ previously active path using a PATH_CHALLENGE frame. This induces
+ the sending of new packets on that path. If the path is no longer
+ viable, the validation attempt will time out and fail; if the path is
+ viable but no longer desired, the validation will succeed but only
+ results in probing packets being sent on the path.
+
+ An endpoint that receives a PATH_CHALLENGE on an active path SHOULD
+ send a non-probing packet in response. If the non-probing packet
+ arrives before any copy made by an attacker, this results in the
+ connection being migrated back to the original path. Any subsequent
+ migration to another path restarts this entire process.
+
+ This defense is imperfect, but this is not considered a serious
+ problem. If the path via the attack is reliably faster than the
+ original path despite multiple attempts to use that original path, it
+ is not possible to distinguish between an attack and an improvement
+ in routing.
+
+ An endpoint could also use heuristics to improve detection of this
+ style of attack. For instance, NAT rebinding is improbable if
+ packets were recently received on the old path; similarly, rebinding
+ is rare on IPv6 paths. Endpoints can also look for duplicated
+ packets. Conversely, a change in connection ID is more likely to
+ indicate an intentional migration rather than an attack.
+
+9.4. Loss Detection and Congestion Control
+
+ The capacity available on the new path might not be the same as the
+ old path. Packets sent on the old path MUST NOT contribute to
+ congestion control or RTT estimation for the new path.
+
+ On confirming a peer's ownership of its new address, an endpoint MUST
+ immediately reset the congestion controller and round-trip time
+ estimator for the new path to initial values (see Appendices A.3 and
+ B.3 of [QUIC-RECOVERY]) unless the only change in the peer's address
+ is its port number. Because port-only changes are commonly the
+ result of NAT rebinding or other middlebox activity, the endpoint MAY
+ instead retain its congestion control state and round-trip estimate
+ in those cases instead of reverting to initial values. In cases
+ where congestion control state retained from an old path is used on a
+ new path with substantially different characteristics, a sender could
+ transmit too aggressively until the congestion controller and the RTT
+ estimator have adapted. Generally, implementations are advised to be
+ cautious when using previous values on a new path.
+
+ There could be apparent reordering at the receiver when an endpoint
+ sends data and probes from/to multiple addresses during the migration
+ period, since the two resulting paths could have different round-trip
+ times. A receiver of packets on multiple paths will still send ACK
+ frames covering all received packets.
+
+ While multiple paths might be used during connection migration, a
+ single congestion control context and a single loss recovery context
+ (as described in [QUIC-RECOVERY]) could be adequate. For instance,
+ an endpoint might delay switching to a new congestion control context
+ until it is confirmed that an old path is no longer needed (such as
+ the case described in Section 9.3.3).
+
+ A sender can make exceptions for probe packets so that their loss
+ detection is independent and does not unduly cause the congestion
+ controller to reduce its sending rate. An endpoint might set a
+ separate timer when a PATH_CHALLENGE is sent, which is canceled if
+ the corresponding PATH_RESPONSE is received. If the timer fires
+ before the PATH_RESPONSE is received, the endpoint might send a new
+ PATH_CHALLENGE and restart the timer for a longer period of time.
+ This timer SHOULD be set as described in Section 6.2.1 of
+ [QUIC-RECOVERY] and MUST NOT be more aggressive.
+
+9.5. Privacy Implications of Connection Migration
+
+ Using a stable connection ID on multiple network paths would allow a
+ passive observer to correlate activity between those paths. An
+ endpoint that moves between networks might not wish to have their
+ activity correlated by any entity other than their peer, so different
+ connection IDs are used when sending from different local addresses,
+ as discussed in Section 5.1. For this to be effective, endpoints
+ need to ensure that connection IDs they provide cannot be linked by
+ any other entity.
+
+ At any time, endpoints MAY change the Destination Connection ID they
+ transmit with to a value that has not been used on another path.
+
+ An endpoint MUST NOT reuse a connection ID when sending from more
+ than one local address -- for example, when initiating connection
+ migration as described in Section 9.2 or when probing a new network
+ path as described in Section 9.1.
+
+ Similarly, an endpoint MUST NOT reuse a connection ID when sending to
+ more than one destination address. Due to network changes outside
+ the control of its peer, an endpoint might receive packets from a new
+ source address with the same Destination Connection ID field value,
+ in which case it MAY continue to use the current connection ID with
+ the new remote address while still sending from the same local
+ address.
+
+ These requirements regarding connection ID reuse apply only to the
+ sending of packets, as unintentional changes in path without a change
+ in connection ID are possible. For example, after a period of
+ network inactivity, NAT rebinding might cause packets to be sent on a
+ new path when the client resumes sending. An endpoint responds to
+ such an event as described in Section 9.3.
+
+ Using different connection IDs for packets sent in both directions on
+ each new network path eliminates the use of the connection ID for
+ linking packets from the same connection across different network
+ paths. Header protection ensures that packet numbers cannot be used
+ to correlate activity. This does not prevent other properties of
+ packets, such as timing and size, from being used to correlate
+ activity.
+
+ An endpoint SHOULD NOT initiate migration with a peer that has
+ requested a zero-length connection ID, because traffic over the new
+ path might be trivially linkable to traffic over the old one. If the
+ server is able to associate packets with a zero-length connection ID
+ to the right connection, it means that the server is using other
+ information to demultiplex packets. For example, a server might
+ provide a unique address to every client -- for instance, using HTTP
+ alternative services [ALTSVC]. Information that might allow correct
+ routing of packets across multiple network paths will also allow
+ activity on those paths to be linked by entities other than the peer.
+
+ A client might wish to reduce linkability by switching to a new
+ connection ID, source UDP port, or IP address (see [RFC8981]) when
+ sending traffic after a period of inactivity. Changing the address
+ from which it sends packets at the same time might cause the server
+ to detect a connection migration. This ensures that the mechanisms
+ that support migration are exercised even for clients that do not
+ experience NAT rebindings or genuine migrations. Changing address
+ can cause a peer to reset its congestion control state (see
+ Section 9.4), so addresses SHOULD only be changed infrequently.
+
+ An endpoint that exhausts available connection IDs cannot probe new
+ paths or initiate migration, nor can it respond to probes or attempts
+ by its peer to migrate. To ensure that migration is possible and
+ packets sent on different paths cannot be correlated, endpoints
+ SHOULD provide new connection IDs before peers migrate; see
+ Section 5.1.1. If a peer might have exhausted available connection
+ IDs, a migrating endpoint could include a NEW_CONNECTION_ID frame in
+ all packets sent on a new network path.
+
+9.6. Server's Preferred Address
+
+ QUIC allows servers to accept connections on one IP address and
+ attempt to transfer these connections to a more preferred address
+ shortly after the handshake. This is particularly useful when
+ clients initially connect to an address shared by multiple servers
+ but would prefer to use a unicast address to ensure connection
+ stability. This section describes the protocol for migrating a
+ connection to a preferred server address.
+
+ Migrating a connection to a new server address mid-connection is not
+ supported by the version of QUIC specified in this document. If a
+ client receives packets from a new server address when the client has
+ not initiated a migration to that address, the client SHOULD discard
+ these packets.
+
+9.6.1. Communicating a Preferred Address
+
+ A server conveys a preferred address by including the
+ preferred_address transport parameter in the TLS handshake.
+
+ Servers MAY communicate a preferred address of each address family
+ (IPv4 and IPv6) to allow clients to pick the one most suited to their
+ network attachment.
+
+ Once the handshake is confirmed, the client SHOULD select one of the
+ two addresses provided by the server and initiate path validation
+ (see Section 8.2). A client constructs packets using any previously
+ unused active connection ID, taken from either the preferred_address
+ transport parameter or a NEW_CONNECTION_ID frame.
+
+ As soon as path validation succeeds, the client SHOULD begin sending
+ all future packets to the new server address using the new connection
+ ID and discontinue use of the old server address. If path validation
+ fails, the client MUST continue sending all future packets to the
+ server's original IP address.
+
+9.6.2. Migration to a Preferred Address
+
+ A client that migrates to a preferred address MUST validate the
+ address it chooses before migrating; see Section 21.5.3.
+
+ A server might receive a packet addressed to its preferred IP address
+ at any time after it accepts a connection. If this packet contains a
+ PATH_CHALLENGE frame, the server sends a packet containing a
+ PATH_RESPONSE frame as per Section 8.2. The server MUST send non-
+ probing packets from its original address until it receives a non-
+ probing packet from the client at its preferred address and until the
+ server has validated the new path.
+
+ The server MUST probe on the path toward the client from its
+ preferred address. This helps to guard against spurious migration
+ initiated by an attacker.
+
+ Once the server has completed its path validation and has received a
+ non-probing packet with a new largest packet number on its preferred
+ address, the server begins sending non-probing packets to the client
+ exclusively from its preferred IP address. The server SHOULD drop
+ newer packets for this connection that are received on the old IP
+ address. The server MAY continue to process delayed packets that are
+ received on the old IP address.
+
+ The addresses that a server provides in the preferred_address
+ transport parameter are only valid for the connection in which they
+ are provided. A client MUST NOT use these for other connections,
+ including connections that are resumed from the current connection.
+
+9.6.3. Interaction of Client Migration and Preferred Address
+
+ A client might need to perform a connection migration before it has
+ migrated to the server's preferred address. In this case, the client
+ SHOULD perform path validation to both the original and preferred
+ server address from the client's new address concurrently.
+
+ If path validation of the server's preferred address succeeds, the
+ client MUST abandon validation of the original address and migrate to
+ using the server's preferred address. If path validation of the
+ server's preferred address fails but validation of the server's
+ original address succeeds, the client MAY migrate to its new address
+ and continue sending to the server's original address.
+
+ If packets received at the server's preferred address have a
+ different source address than observed from the client during the
+ handshake, the server MUST protect against potential attacks as
+ described in Sections 9.3.1 and 9.3.2. In addition to intentional
+ simultaneous migration, this might also occur because the client's
+ access network used a different NAT binding for the server's
+ preferred address.
+
+ Servers SHOULD initiate path validation to the client's new address
+ upon receiving a probe packet from a different address; see
+ Section 8.
+
+ A client that migrates to a new address SHOULD use a preferred
+ address from the same address family for the server.
+
+ The connection ID provided in the preferred_address transport
+ parameter is not specific to the addresses that are provided. This
+ connection ID is provided to ensure that the client has a connection
+ ID available for migration, but the client MAY use this connection ID
+ on any path.
+
+9.7. Use of IPv6 Flow Label and Migration
+
+ Endpoints that send data using IPv6 SHOULD apply an IPv6 flow label
+ in compliance with [RFC6437], unless the local API does not allow
+ setting IPv6 flow labels.
+
+ The flow label generation MUST be designed to minimize the chances of
+ linkability with a previously used flow label, as a stable flow label
+ would enable correlating activity on multiple paths; see Section 9.5.
+
+ [RFC6437] suggests deriving values using a pseudorandom function to
+ generate flow labels. Including the Destination Connection ID field
+ in addition to source and destination addresses when generating flow
+ labels ensures that changes are synchronized with changes in other
+ observable identifiers. A cryptographic hash function that combines
+ these inputs with a local secret is one way this might be
+ implemented.
+
+10. Connection Termination
+
+ An established QUIC connection can be terminated in one of three
+ ways:
+
+ * idle timeout (Section 10.1)
+
+ * immediate close (Section 10.2)
+
+ * stateless reset (Section 10.3)
+
+ An endpoint MAY discard connection state if it does not have a
+ validated path on which it can send packets; see Section 8.2.
+
+10.1. Idle Timeout
+
+ If a max_idle_timeout is specified by either endpoint in its
+ transport parameters (Section 18.2), the connection is silently
+ closed and its state is discarded when it remains idle for longer
+ than the minimum of the max_idle_timeout value advertised by both
+ endpoints.
+
+ Each endpoint advertises a max_idle_timeout, but the effective value
+ at an endpoint is computed as the minimum of the two advertised
+ values (or the sole advertised value, if only one endpoint advertises
+ a non-zero value). By announcing a max_idle_timeout, an endpoint
+ commits to initiating an immediate close (Section 10.2) if it
+ abandons the connection prior to the effective value.
+
+ An endpoint restarts its idle timer when a packet from its peer is
+ received and processed successfully. An endpoint also restarts its
+ idle timer when sending an ack-eliciting packet if no other ack-
+ eliciting packets have been sent since last receiving and processing
+ a packet. Restarting this timer when sending a packet ensures that
+ connections are not closed after new activity is initiated.
+
+ To avoid excessively small idle timeout periods, endpoints MUST
+ increase the idle timeout period to be at least three times the
+ current Probe Timeout (PTO). This allows for multiple PTOs to
+ expire, and therefore multiple probes to be sent and lost, prior to
+ idle timeout.
+
+10.1.1. Liveness Testing
+
+ An endpoint that sends packets close to the effective timeout risks
+ having them be discarded at the peer, since the idle timeout period
+ might have expired at the peer before these packets arrive.
+
+ An endpoint can send a PING or another ack-eliciting frame to test
+ the connection for liveness if the peer could time out soon, such as
+ within a PTO; see Section 6.2 of [QUIC-RECOVERY]. This is especially
+ useful if any available application data cannot be safely retried.
+ Note that the application determines what data is safe to retry.
+
+10.1.2. Deferring Idle Timeout
+
+ An endpoint might need to send ack-eliciting packets to avoid an idle
+ timeout if it is expecting response data but does not have or is
+ unable to send application data.
+
+ An implementation of QUIC might provide applications with an option
+ to defer an idle timeout. This facility could be used when the
+ application wishes to avoid losing state that has been associated
+ with an open connection but does not expect to exchange application
+ data for some time. With this option, an endpoint could send a PING
+ frame (Section 19.2) periodically, which will cause the peer to
+ restart its idle timeout period. Sending a packet containing a PING
+ frame restarts the idle timeout for this endpoint also if this is the
+ first ack-eliciting packet sent since receiving a packet. Sending a
+ PING frame causes the peer to respond with an acknowledgment, which
+ also restarts the idle timeout for the endpoint.
+
+ Application protocols that use QUIC SHOULD provide guidance on when
+ deferring an idle timeout is appropriate. Unnecessary sending of
+ PING frames could have a detrimental effect on performance.
+
+ A connection will time out if no packets are sent or received for a
+ period longer than the time negotiated using the max_idle_timeout
+ transport parameter; see Section 10. However, state in middleboxes
+ might time out earlier than that. Though REQ-5 in [RFC4787]
+ recommends a 2-minute timeout interval, experience shows that sending
+ packets every 30 seconds is necessary to prevent the majority of
+ middleboxes from losing state for UDP flows [GATEWAY].
+
+10.2. Immediate Close
+
+ An endpoint sends a CONNECTION_CLOSE frame (Section 19.19) to
+ terminate the connection immediately. A CONNECTION_CLOSE frame
+ causes all streams to immediately become closed; open streams can be
+ assumed to be implicitly reset.
+
+ After sending a CONNECTION_CLOSE frame, an endpoint immediately
+ enters the closing state; see Section 10.2.1. After receiving a
+ CONNECTION_CLOSE frame, endpoints enter the draining state; see
+ Section 10.2.2.
+
+ Violations of the protocol lead to an immediate close.
+
+ An immediate close can be used after an application protocol has
+ arranged to close a connection. This might be after the application
+ protocol negotiates a graceful shutdown. The application protocol
+ can exchange messages that are needed for both application endpoints
+ to agree that the connection can be closed, after which the
+ application requests that QUIC close the connection. When QUIC
+ consequently closes the connection, a CONNECTION_CLOSE frame with an
+ application-supplied error code will be used to signal closure to the
+ peer.
+
+ The closing and draining connection states exist to ensure that
+ connections close cleanly and that delayed or reordered packets are
+ properly discarded. These states SHOULD persist for at least three
+ times the current PTO interval as defined in [QUIC-RECOVERY].
+
+ Disposing of connection state prior to exiting the closing or
+ draining state could result in an endpoint generating a Stateless
+ Reset unnecessarily when it receives a late-arriving packet.
+ Endpoints that have some alternative means to ensure that late-
+ arriving packets do not induce a response, such as those that are
+ able to close the UDP socket, MAY end these states earlier to allow
+ for faster resource recovery. Servers that retain an open socket for
+ accepting new connections SHOULD NOT end the closing or draining
+ state early.
+
+ Once its closing or draining state ends, an endpoint SHOULD discard
+ all connection state. The endpoint MAY send a Stateless Reset in
+ response to any further incoming packets belonging to this
+ connection.
+
+10.2.1. Closing Connection State
+
+ An endpoint enters the closing state after initiating an immediate
+ close.
+
+ In the closing state, an endpoint retains only enough information to
+ generate a packet containing a CONNECTION_CLOSE frame and to identify
+ packets as belonging to the connection. An endpoint in the closing
+ state sends a packet containing a CONNECTION_CLOSE frame in response
+ to any incoming packet that it attributes to the connection.
+
+ An endpoint SHOULD limit the rate at which it generates packets in
+ the closing state. For instance, an endpoint could wait for a
+ progressively increasing number of received packets or amount of time
+ before responding to received packets.
+
+ An endpoint's selected connection ID and the QUIC version are
+ sufficient information to identify packets for a closing connection;
+ the endpoint MAY discard all other connection state. An endpoint
+ that is closing is not required to process any received frame. An
+ endpoint MAY retain packet protection keys for incoming packets to
+ allow it to read and process a CONNECTION_CLOSE frame.
+
+ An endpoint MAY drop packet protection keys when entering the closing
+ state and send a packet containing a CONNECTION_CLOSE frame in
+ response to any UDP datagram that is received. However, an endpoint
+ that discards packet protection keys cannot identify and discard
+ invalid packets. To avoid being used for an amplification attack,
+ such endpoints MUST limit the cumulative size of packets it sends to
+ three times the cumulative size of the packets that are received and
+ attributed to the connection. To minimize the state that an endpoint
+ maintains for a closing connection, endpoints MAY send the exact same
+ packet in response to any received packet.
+
+ | Note: Allowing retransmission of a closing packet is an
+ | exception to the requirement that a new packet number be used
+ | for each packet; see Section 12.3. Sending new packet numbers
+ | is primarily of advantage to loss recovery and congestion
+ | control, which are not expected to be relevant for a closed
+ | connection. Retransmitting the final packet requires less
+ | state.
+
+ While in the closing state, an endpoint could receive packets from a
+ new source address, possibly indicating a connection migration; see
+ Section 9. An endpoint in the closing state MUST either discard
+ packets received from an unvalidated address or limit the cumulative
+ size of packets it sends to an unvalidated address to three times the
+ size of packets it receives from that address.
+
+ An endpoint is not expected to handle key updates when it is closing
+ (Section 6 of [QUIC-TLS]). A key update might prevent the endpoint
+ from moving from the closing state to the draining state, as the
+ endpoint will not be able to process subsequently received packets,
+ but it otherwise has no impact.
+
+10.2.2. Draining Connection State
+
+ The draining state is entered once an endpoint receives a
+ CONNECTION_CLOSE frame, which indicates that its peer is closing or
+ draining. While otherwise identical to the closing state, an
+ endpoint in the draining state MUST NOT send any packets. Retaining
+ packet protection keys is unnecessary once a connection is in the
+ draining state.
+
+ An endpoint that receives a CONNECTION_CLOSE frame MAY send a single
+ packet containing a CONNECTION_CLOSE frame before entering the
+ draining state, using a NO_ERROR code if appropriate. An endpoint
+ MUST NOT send further packets. Doing so could result in a constant
+ exchange of CONNECTION_CLOSE frames until one of the endpoints exits
+ the closing state.
+
+ An endpoint MAY enter the draining state from the closing state if it
+ receives a CONNECTION_CLOSE frame, which indicates that the peer is
+ also closing or draining. In this case, the draining state ends when
+ the closing state would have ended. In other words, the endpoint
+ uses the same end time but ceases transmission of any packets on this
+ connection.
+
+10.2.3. Immediate Close during the Handshake
+
+ When sending a CONNECTION_CLOSE frame, the goal is to ensure that the
+ peer will process the frame. Generally, this means sending the frame
+ in a packet with the highest level of packet protection to avoid the
+ packet being discarded. After the handshake is confirmed (see
+ Section 4.1.2 of [QUIC-TLS]), an endpoint MUST send any
+ CONNECTION_CLOSE frames in a 1-RTT packet. However, prior to
+ confirming the handshake, it is possible that more advanced packet
+ protection keys are not available to the peer, so another
+ CONNECTION_CLOSE frame MAY be sent in a packet that uses a lower
+ packet protection level. More specifically:
+
+ * A client will always know whether the server has Handshake keys
+ (see Section 17.2.2.1), but it is possible that a server does not
+ know whether the client has Handshake keys. Under these
+ circumstances, a server SHOULD send a CONNECTION_CLOSE frame in
+ both Handshake and Initial packets to ensure that at least one of
+ them is processable by the client.
+
+ * A client that sends a CONNECTION_CLOSE frame in a 0-RTT packet
+ cannot be assured that the server has accepted 0-RTT. Sending a
+ CONNECTION_CLOSE frame in an Initial packet makes it more likely
+ that the server can receive the close signal, even if the
+ application error code might not be received.
+
+ * Prior to confirming the handshake, a peer might be unable to
+ process 1-RTT packets, so an endpoint SHOULD send a
+ CONNECTION_CLOSE frame in both Handshake and 1-RTT packets. A
+ server SHOULD also send a CONNECTION_CLOSE frame in an Initial
+ packet.
+
+ Sending a CONNECTION_CLOSE of type 0x1d in an Initial or Handshake
+ packet could expose application state or be used to alter application
+ state. A CONNECTION_CLOSE of type 0x1d MUST be replaced by a
+ CONNECTION_CLOSE of type 0x1c when sending the frame in Initial or
+ Handshake packets. Otherwise, information about the application
+ state might be revealed. Endpoints MUST clear the value of the
+ Reason Phrase field and SHOULD use the APPLICATION_ERROR code when
+ converting to a CONNECTION_CLOSE of type 0x1c.
+
+ CONNECTION_CLOSE frames sent in multiple packet types can be
+ coalesced into a single UDP datagram; see Section 12.2.
+
+ An endpoint can send a CONNECTION_CLOSE frame in an Initial packet.
+ This might be in response to unauthenticated information received in
+ Initial or Handshake packets. Such an immediate close might expose
+ legitimate connections to a denial of service. QUIC does not include
+ defensive measures for on-path attacks during the handshake; see
+ Section 21.2. However, at the cost of reducing feedback about errors
+ for legitimate peers, some forms of denial of service can be made
+ more difficult for an attacker if endpoints discard illegal packets
+ rather than terminating a connection with CONNECTION_CLOSE. For this
+ reason, endpoints MAY discard packets rather than immediately close
+ if errors are detected in packets that lack authentication.
+
+ An endpoint that has not established state, such as a server that
+ detects an error in an Initial packet, does not enter the closing
+ state. An endpoint that has no state for the connection does not
+ enter a closing or draining period on sending a CONNECTION_CLOSE
+ frame.
+
+10.3. Stateless Reset
+
+ A stateless reset is provided as an option of last resort for an
+ endpoint that does not have access to the state of a connection. A
+ crash or outage might result in peers continuing to send data to an
+ endpoint that is unable to properly continue the connection. An
+ endpoint MAY send a Stateless Reset in response to receiving a packet
+ that it cannot associate with an active connection.
+
+ A stateless reset is not appropriate for indicating errors in active
+ connections. An endpoint that wishes to communicate a fatal
+ connection error MUST use a CONNECTION_CLOSE frame if it is able.
+
+ To support this process, an endpoint issues a stateless reset token,
+ which is a 16-byte value that is hard to guess. If the peer
+ subsequently receives a Stateless Reset, which is a UDP datagram that
+ ends in that stateless reset token, the peer will immediately end the
+ connection.
+
+ A stateless reset token is specific to a connection ID. An endpoint
+ issues a stateless reset token by including the value in the
+ Stateless Reset Token field of a NEW_CONNECTION_ID frame. Servers
+ can also issue a stateless_reset_token transport parameter during the
+ handshake that applies to the connection ID that it selected during
+ the handshake. These exchanges are protected by encryption, so only
+ client and server know their value. Note that clients cannot use the
+ stateless_reset_token transport parameter because their transport
+ parameters do not have confidentiality protection.
+
+ Tokens are invalidated when their associated connection ID is retired
+ via a RETIRE_CONNECTION_ID frame (Section 19.16).
+
+ An endpoint that receives packets that it cannot process sends a
+ packet in the following layout (see Section 1.3):
+
+ Stateless Reset {
+ Fixed Bits (2) = 1,
+ Unpredictable Bits (38..),
+ Stateless Reset Token (128),
+ }
+
+ Figure 10: Stateless Reset
+
+ This design ensures that a Stateless Reset is -- to the extent
+ possible -- indistinguishable from a regular packet with a short
+ header.
+
+ A Stateless Reset uses an entire UDP datagram, starting with the
+ first two bits of the packet header. The remainder of the first byte
+ and an arbitrary number of bytes following it are set to values that
+ SHOULD be indistinguishable from random. The last 16 bytes of the
+ datagram contain a stateless reset token.
+
+ To entities other than its intended recipient, a Stateless Reset will
+ appear to be a packet with a short header. For the Stateless Reset
+ to appear as a valid QUIC packet, the Unpredictable Bits field needs
+ to include at least 38 bits of data (or 5 bytes, less the two fixed
+ bits).
+
+ The resulting minimum size of 21 bytes does not guarantee that a
+ Stateless Reset is difficult to distinguish from other packets if the
+ recipient requires the use of a connection ID. To achieve that end,
+ the endpoint SHOULD ensure that all packets it sends are at least 22
+ bytes longer than the minimum connection ID length that it requests
+ the peer to include in its packets, adding PADDING frames as
+ necessary. This ensures that any Stateless Reset sent by the peer is
+ indistinguishable from a valid packet sent to the endpoint. An
+ endpoint that sends a Stateless Reset in response to a packet that is
+ 43 bytes or shorter SHOULD send a Stateless Reset that is one byte
+ shorter than the packet it responds to.
+
+ These values assume that the stateless reset token is the same length
+ as the minimum expansion of the packet protection AEAD. Additional
+ unpredictable bytes are necessary if the endpoint could have
+ negotiated a packet protection scheme with a larger minimum
+ expansion.
+
+ An endpoint MUST NOT send a Stateless Reset that is three times or
+ more larger than the packet it receives to avoid being used for
+ amplification. Section 10.3.3 describes additional limits on
+ Stateless Reset size.
+
+ Endpoints MUST discard packets that are too small to be valid QUIC
+ packets. To give an example, with the set of AEAD functions defined
+ in [QUIC-TLS], short header packets that are smaller than 21 bytes
+ are never valid.
+
+ Endpoints MUST send Stateless Resets formatted as a packet with a
+ short header. However, endpoints MUST treat any packet ending in a
+ valid stateless reset token as a Stateless Reset, as other QUIC
+ versions might allow the use of a long header.
+
+ An endpoint MAY send a Stateless Reset in response to a packet with a
+ long header. Sending a Stateless Reset is not effective prior to the
+ stateless reset token being available to a peer. In this QUIC
+ version, packets with a long header are only used during connection
+ establishment. Because the stateless reset token is not available
+ until connection establishment is complete or near completion,
+ ignoring an unknown packet with a long header might be as effective
+ as sending a Stateless Reset.
+
+ An endpoint cannot determine the Source Connection ID from a packet
+ with a short header; therefore, it cannot set the Destination
+ Connection ID in the Stateless Reset. The Destination Connection ID
+ will therefore differ from the value used in previous packets. A
+ random Destination Connection ID makes the connection ID appear to be
+ the result of moving to a new connection ID that was provided using a
+ NEW_CONNECTION_ID frame; see Section 19.15.
+
+ Using a randomized connection ID results in two problems:
+
+ * The packet might not reach the peer. If the Destination
+ Connection ID is critical for routing toward the peer, then this
+ packet could be incorrectly routed. This might also trigger
+ another Stateless Reset in response; see Section 10.3.3. A
+ Stateless Reset that is not correctly routed is an ineffective
+ error detection and recovery mechanism. In this case, endpoints
+ will need to rely on other methods -- such as timers -- to detect
+ that the connection has failed.
+
+ * The randomly generated connection ID can be used by entities other
+ than the peer to identify this as a potential Stateless Reset. An
+ endpoint that occasionally uses different connection IDs might
+ introduce some uncertainty about this.
+
+ This stateless reset design is specific to QUIC version 1. An
+ endpoint that supports multiple versions of QUIC needs to generate a
+ Stateless Reset that will be accepted by peers that support any
+ version that the endpoint might support (or might have supported
+ prior to losing state). Designers of new versions of QUIC need to be
+ aware of this and either (1) reuse this design or (2) use a portion
+ of the packet other than the last 16 bytes for carrying data.
+
+10.3.1. Detecting a Stateless Reset
+
+ An endpoint detects a potential Stateless Reset using the trailing 16
+ bytes of the UDP datagram. An endpoint remembers all stateless reset
+ tokens associated with the connection IDs and remote addresses for
+ datagrams it has recently sent. This includes Stateless Reset Token
+ field values from NEW_CONNECTION_ID frames and the server's transport
+ parameters but excludes stateless reset tokens associated with
+ connection IDs that are either unused or retired. The endpoint
+ identifies a received datagram as a Stateless Reset by comparing the
+ last 16 bytes of the datagram with all stateless reset tokens
+ associated with the remote address on which the datagram was
+ received.
+
+ This comparison can be performed for every inbound datagram.
+ Endpoints MAY skip this check if any packet from a datagram is
+ successfully processed. However, the comparison MUST be performed
+ when the first packet in an incoming datagram either cannot be
+ associated with a connection or cannot be decrypted.
+
+ An endpoint MUST NOT check for any stateless reset tokens associated
+ with connection IDs it has not used or for connection IDs that have
+ been retired.
+
+ When comparing a datagram to stateless reset token values, endpoints
+ MUST perform the comparison without leaking information about the
+ value of the token. For example, performing this comparison in
+ constant time protects the value of individual stateless reset tokens
+ from information leakage through timing side channels. Another
+ approach would be to store and compare the transformed values of
+ stateless reset tokens instead of the raw token values, where the
+ transformation is defined as a cryptographically secure pseudorandom
+ function using a secret key (e.g., block cipher, Hashed Message
+ Authentication Code (HMAC) [RFC2104]). An endpoint is not expected
+ to protect information about whether a packet was successfully
+ decrypted or the number of valid stateless reset tokens.
+
+ If the last 16 bytes of the datagram are identical in value to a
+ stateless reset token, the endpoint MUST enter the draining period
+ and not send any further packets on this connection.
+
+10.3.2. Calculating a Stateless Reset Token
+
+ The stateless reset token MUST be difficult to guess. In order to
+ create a stateless reset token, an endpoint could randomly generate
+ [RANDOM] a secret for every connection that it creates. However,
+ this presents a coordination problem when there are multiple
+ instances in a cluster or a storage problem for an endpoint that
+ might lose state. Stateless reset specifically exists to handle the
+ case where state is lost, so this approach is suboptimal.
+
+ A single static key can be used across all connections to the same
+ endpoint by generating the proof using a pseudorandom function that
+ takes a static key and the connection ID chosen by the endpoint (see
+ Section 5.1) as input. An endpoint could use HMAC [RFC2104] (for
+ example, HMAC(static_key, connection_id)) or the HMAC-based Key
+ Derivation Function (HKDF) [RFC5869] (for example, using the static
+ key as input keying material, with the connection ID as salt). The
+ output of this function is truncated to 16 bytes to produce the
+ stateless reset token for that connection.
+
+ An endpoint that loses state can use the same method to generate a
+ valid stateless reset token. The connection ID comes from the packet
+ that the endpoint receives.
+
+ This design relies on the peer always sending a connection ID in its
+ packets so that the endpoint can use the connection ID from a packet
+ to reset the connection. An endpoint that uses this design MUST
+ either use the same connection ID length for all connections or
+ encode the length of the connection ID such that it can be recovered
+ without state. In addition, it cannot provide a zero-length
+ connection ID.
+
+ Revealing the stateless reset token allows any entity to terminate
+ the connection, so a value can only be used once. This method for
+ choosing the stateless reset token means that the combination of
+ connection ID and static key MUST NOT be used for another connection.
+ A denial-of-service attack is possible if the same connection ID is
+ used by instances that share a static key or if an attacker can cause
+ a packet to be routed to an instance that has no state but the same
+ static key; see Section 21.11. A connection ID from a connection
+ that is reset by revealing the stateless reset token MUST NOT be
+ reused for new connections at nodes that share a static key.
+
+ The same stateless reset token MUST NOT be used for multiple
+ connection IDs. Endpoints are not required to compare new values
+ against all previous values, but a duplicate value MAY be treated as
+ a connection error of type PROTOCOL_VIOLATION.
+
+ Note that Stateless Resets do not have any cryptographic protection.
+
+10.3.3. Looping
+
+ The design of a Stateless Reset is such that without knowing the
+ stateless reset token it is indistinguishable from a valid packet.
+ For instance, if a server sends a Stateless Reset to another server,
+ it might receive another Stateless Reset in response, which could
+ lead to an infinite exchange.
+
+ An endpoint MUST ensure that every Stateless Reset that it sends is
+ smaller than the packet that triggered it, unless it maintains state
+ sufficient to prevent looping. In the event of a loop, this results
+ in packets eventually being too small to trigger a response.
+
+ An endpoint can remember the number of Stateless Resets that it has
+ sent and stop generating new Stateless Resets once a limit is
+ reached. Using separate limits for different remote addresses will
+ ensure that Stateless Resets can be used to close connections when
+ other peers or connections have exhausted limits.
+
+ A Stateless Reset that is smaller than 41 bytes might be identifiable
+ as a Stateless Reset by an observer, depending upon the length of the
+ peer's connection IDs. Conversely, not sending a Stateless Reset in
+ response to a small packet might result in Stateless Resets not being
+ useful in detecting cases of broken connections where only very small
+ packets are sent; such failures might only be detected by other
+ means, such as timers.
+
+11. Error Handling
+
+ An endpoint that detects an error SHOULD signal the existence of that
+ error to its peer. Both transport-level and application-level errors
+ can affect an entire connection; see Section 11.1. Only application-
+ level errors can be isolated to a single stream; see Section 11.2.
+
+ The most appropriate error code (Section 20) SHOULD be included in
+ the frame that signals the error. Where this specification
+ identifies error conditions, it also identifies the error code that
+ is used; though these are worded as requirements, different
+ implementation strategies might lead to different errors being
+ reported. In particular, an endpoint MAY use any applicable error
+ code when it detects an error condition; a generic error code (such
+ as PROTOCOL_VIOLATION or INTERNAL_ERROR) can always be used in place
+ of specific error codes.
+
+ A stateless reset (Section 10.3) is not suitable for any error that
+ can be signaled with a CONNECTION_CLOSE or RESET_STREAM frame. A
+ stateless reset MUST NOT be used by an endpoint that has the state
+ necessary to send a frame on the connection.
+
+11.1. Connection Errors
+
+ Errors that result in the connection being unusable, such as an
+ obvious violation of protocol semantics or corruption of state that
+ affects an entire connection, MUST be signaled using a
+ CONNECTION_CLOSE frame (Section 19.19).
+
+ Application-specific protocol errors are signaled using the
+ CONNECTION_CLOSE frame with a frame type of 0x1d. Errors that are
+ specific to the transport, including all those described in this
+ document, are carried in the CONNECTION_CLOSE frame with a frame type
+ of 0x1c.
+
+ A CONNECTION_CLOSE frame could be sent in a packet that is lost. An
+ endpoint SHOULD be prepared to retransmit a packet containing a
+ CONNECTION_CLOSE frame if it receives more packets on a terminated
+ connection. Limiting the number of retransmissions and the time over
+ which this final packet is sent limits the effort expended on
+ terminated connections.
+
+ An endpoint that chooses not to retransmit packets containing a
+ CONNECTION_CLOSE frame risks a peer missing the first such packet.
+ The only mechanism available to an endpoint that continues to receive
+ data for a terminated connection is to attempt the stateless reset
+ process (Section 10.3).
+
+ As the AEAD for Initial packets does not provide strong
+ authentication, an endpoint MAY discard an invalid Initial packet.
+ Discarding an Initial packet is permitted even where this
+ specification otherwise mandates a connection error. An endpoint can
+ only discard a packet if it does not process the frames in the packet
+ or reverts the effects of any processing. Discarding invalid Initial
+ packets might be used to reduce exposure to denial of service; see
+ Section 21.2.
+
+11.2. Stream Errors
+
+ If an application-level error affects a single stream but otherwise
+ leaves the connection in a recoverable state, the endpoint can send a
+ RESET_STREAM frame (Section 19.4) with an appropriate error code to
+ terminate just the affected stream.
+
+ Resetting a stream without the involvement of the application
+ protocol could cause the application protocol to enter an
+ unrecoverable state. RESET_STREAM MUST only be instigated by the
+ application protocol that uses QUIC.
+
+ The semantics of the application error code carried in RESET_STREAM
+ are defined by the application protocol. Only the application
+ protocol is able to cause a stream to be terminated. A local
+ instance of the application protocol uses a direct API call, and a
+ remote instance uses the STOP_SENDING frame, which triggers an
+ automatic RESET_STREAM.
+
+ Application protocols SHOULD define rules for handling streams that
+ are prematurely canceled by either endpoint.
+
+12. Packets and Frames
+
+ QUIC endpoints communicate by exchanging packets. Packets have
+ confidentiality and integrity protection; see Section 12.1. Packets
+ are carried in UDP datagrams; see Section 12.2.
+
+ This version of QUIC uses the long packet header during connection
+ establishment; see Section 17.2. Packets with the long header are
+ Initial (Section 17.2.2), 0-RTT (Section 17.2.3), Handshake
+ (Section 17.2.4), and Retry (Section 17.2.5). Version negotiation
+ uses a version-independent packet with a long header; see
+ Section 17.2.1.
+
+ Packets with the short header are designed for minimal overhead and
+ are used after a connection is established and 1-RTT keys are
+ available; see Section 17.3.
+
+12.1. Protected Packets
+
+ QUIC packets have different levels of cryptographic protection based
+ on the type of packet. Details of packet protection are found in
+ [QUIC-TLS]; this section includes an overview of the protections that
+ are provided.
+
+ Version Negotiation packets have no cryptographic protection; see
+ [QUIC-INVARIANTS].
+
+ Retry packets use an AEAD function [AEAD] to protect against
+ accidental modification.
+
+ Initial packets use an AEAD function, the keys for which are derived
+ using a value that is visible on the wire. Initial packets therefore
+ do not have effective confidentiality protection. Initial protection
+ exists to ensure that the sender of the packet is on the network
+ path. Any entity that receives an Initial packet from a client can
+ recover the keys that will allow them to both read the contents of
+ the packet and generate Initial packets that will be successfully
+ authenticated at either endpoint. The AEAD also protects Initial
+ packets against accidental modification.
+
+ All other packets are protected with keys derived from the
+ cryptographic handshake. The cryptographic handshake ensures that
+ only the communicating endpoints receive the corresponding keys for
+ Handshake, 0-RTT, and 1-RTT packets. Packets protected with 0-RTT
+ and 1-RTT keys have strong confidentiality and integrity protection.
+
+ The Packet Number field that appears in some packet types has
+ alternative confidentiality protection that is applied as part of
+ header protection; see Section 5.4 of [QUIC-TLS] for details. The
+ underlying packet number increases with each packet sent in a given
+ packet number space; see Section 12.3 for details.
+
+12.2. Coalescing Packets
+
+ Initial (Section 17.2.2), 0-RTT (Section 17.2.3), and Handshake
+ (Section 17.2.4) packets contain a Length field that determines the
+ end of the packet. The length includes both the Packet Number and
+ Payload fields, both of which are confidentiality protected and
+ initially of unknown length. The length of the Payload field is
+ learned once header protection is removed.
+
+ Using the Length field, a sender can coalesce multiple QUIC packets
+ into one UDP datagram. This can reduce the number of UDP datagrams
+ needed to complete the cryptographic handshake and start sending
+ data. This can also be used to construct Path Maximum Transmission
+ Unit (PMTU) probes; see Section 14.4.1. Receivers MUST be able to
+ process coalesced packets.
+
+ Coalescing packets in order of increasing encryption levels (Initial,
+ 0-RTT, Handshake, 1-RTT; see Section 4.1.4 of [QUIC-TLS]) makes it
+ more likely that the receiver will be able to process all the packets
+ in a single pass. A packet with a short header does not include a
+ length, so it can only be the last packet included in a UDP datagram.
+ An endpoint SHOULD include multiple frames in a single packet if they
+ are to be sent at the same encryption level, instead of coalescing
+ multiple packets at the same encryption level.
+
+ Receivers MAY route based on the information in the first packet
+ contained in a UDP datagram. Senders MUST NOT coalesce QUIC packets
+ with different connection IDs into a single UDP datagram. Receivers
+ SHOULD ignore any subsequent packets with a different Destination
+ Connection ID than the first packet in the datagram.
+
+ Every QUIC packet that is coalesced into a single UDP datagram is
+ separate and complete. The receiver of coalesced QUIC packets MUST
+ individually process each QUIC packet and separately acknowledge
+ them, as if they were received as the payload of different UDP
+ datagrams. For example, if decryption fails (because the keys are
+ not available or for any other reason), the receiver MAY either
+ discard or buffer the packet for later processing and MUST attempt to
+ process the remaining packets.
+
+ Retry packets (Section 17.2.5), Version Negotiation packets
+ (Section 17.2.1), and packets with a short header (Section 17.3) do
+ not contain a Length field and so cannot be followed by other packets
+ in the same UDP datagram. Note also that there is no situation where
+ a Retry or Version Negotiation packet is coalesced with another
+ packet.
+
+12.3. Packet Numbers
+
+ The packet number is an integer in the range 0 to 2^62-1. This
+ number is used in determining the cryptographic nonce for packet
+ protection. Each endpoint maintains a separate packet number for
+ sending and receiving.
+
+ Packet numbers are limited to this range because they need to be
+ representable in whole in the Largest Acknowledged field of an ACK
+ frame (Section 19.3). When present in a long or short header,
+ however, packet numbers are reduced and encoded in 1 to 4 bytes; see
+ Section 17.1.
+
+ Version Negotiation (Section 17.2.1) and Retry (Section 17.2.5)
+ packets do not include a packet number.
+
+ Packet numbers are divided into three spaces in QUIC:
+
+ Initial space: All Initial packets (Section 17.2.2) are in this
+ space.
+
+ Handshake space: All Handshake packets (Section 17.2.4) are in this
+ space.
+
+ Application data space: All 0-RTT (Section 17.2.3) and 1-RTT
+ (Section 17.3.1) packets are in this space.
+
+ As described in [QUIC-TLS], each packet type uses different
+ protection keys.
+
+ Conceptually, a packet number space is the context in which a packet
+ can be processed and acknowledged. Initial packets can only be sent
+ with Initial packet protection keys and acknowledged in packets that
+ are also Initial packets. Similarly, Handshake packets are sent at
+ the Handshake encryption level and can only be acknowledged in
+ Handshake packets.
+
+ This enforces cryptographic separation between the data sent in the
+ different packet number spaces. Packet numbers in each space start
+ at packet number 0. Subsequent packets sent in the same packet
+ number space MUST increase the packet number by at least one.
+
+ 0-RTT and 1-RTT data exist in the same packet number space to make
+ loss recovery algorithms easier to implement between the two packet
+ types.
+
+ A QUIC endpoint MUST NOT reuse a packet number within the same packet
+ number space in one connection. If the packet number for sending
+ reaches 2^62-1, the sender MUST close the connection without sending
+ a CONNECTION_CLOSE frame or any further packets; an endpoint MAY send
+ a Stateless Reset (Section 10.3) in response to further packets that
+ it receives.
+
+ A receiver MUST discard a newly unprotected packet unless it is
+ certain that it has not processed another packet with the same packet
+ number from the same packet number space. Duplicate suppression MUST
+ happen after removing packet protection for the reasons described in
+ Section 9.5 of [QUIC-TLS].
+
+ Endpoints that track all individual packets for the purposes of
+ detecting duplicates are at risk of accumulating excessive state.
+ The data required for detecting duplicates can be limited by
+ maintaining a minimum packet number below which all packets are
+ immediately dropped. Any minimum needs to account for large
+ variations in round-trip time, which includes the possibility that a
+ peer might probe network paths with much larger round-trip times; see
+ Section 9.
+
+ Packet number encoding at a sender and decoding at a receiver are
+ described in Section 17.1.
+
+12.4. Frames and Frame Types
+
+ The payload of QUIC packets, after removing packet protection,
+ consists of a sequence of complete frames, as shown in Figure 11.
+ Version Negotiation, Stateless Reset, and Retry packets do not
+ contain frames.
+
+ Packet Payload {
+ Frame (8..) ...,
+ }
+
+ Figure 11: QUIC Payload
+
+ The payload of a packet that contains frames MUST contain at least
+ one frame, and MAY contain multiple frames and multiple frame types.
+ An endpoint MUST treat receipt of a packet containing no frames as a
+ connection error of type PROTOCOL_VIOLATION. Frames always fit
+ within a single QUIC packet and cannot span multiple packets.
+
+ Each frame begins with a Frame Type, indicating its type, followed by
+ additional type-dependent fields:
+
+ Frame {
+ Frame Type (i),
+ Type-Dependent Fields (..),
+ }
+
+ Figure 12: Generic Frame Layout
+
+ Table 3 lists and summarizes information about each frame type that
+ is defined in this specification. A description of this summary is
+ included after the table.
+
+ +============+======================+===============+======+======+
+ | Type Value | Frame Type Name | Definition | Pkts | Spec |
+ +============+======================+===============+======+======+
+ | 0x00 | PADDING | Section 19.1 | IH01 | NP |
+ +------------+----------------------+---------------+------+------+
+ | 0x01 | PING | Section 19.2 | IH01 | |
+ +------------+----------------------+---------------+------+------+
+ | 0x02-0x03 | ACK | Section 19.3 | IH_1 | NC |
+ +------------+----------------------+---------------+------+------+
+ | 0x04 | RESET_STREAM | Section 19.4 | __01 | |
+ +------------+----------------------+---------------+------+------+
+ | 0x05 | STOP_SENDING | Section 19.5 | __01 | |
+ +------------+----------------------+---------------+------+------+
+ | 0x06 | CRYPTO | Section 19.6 | IH_1 | |
+ +------------+----------------------+---------------+------+------+
+ | 0x07 | NEW_TOKEN | Section 19.7 | ___1 | |
+ +------------+----------------------+---------------+------+------+
+ | 0x08-0x0f | STREAM | Section 19.8 | __01 | F |
+ +------------+----------------------+---------------+------+------+
+ | 0x10 | MAX_DATA | Section 19.9 | __01 | |
+ +------------+----------------------+---------------+------+------+
+ | 0x11 | MAX_STREAM_DATA | Section 19.10 | __01 | |
+ +------------+----------------------+---------------+------+------+
+ | 0x12-0x13 | MAX_STREAMS | Section 19.11 | __01 | |
+ +------------+----------------------+---------------+------+------+
+ | 0x14 | DATA_BLOCKED | Section 19.12 | __01 | |
+ +------------+----------------------+---------------+------+------+
+ | 0x15 | STREAM_DATA_BLOCKED | Section 19.13 | __01 | |
+ +------------+----------------------+---------------+------+------+
+ | 0x16-0x17 | STREAMS_BLOCKED | Section 19.14 | __01 | |
+ +------------+----------------------+---------------+------+------+
+ | 0x18 | NEW_CONNECTION_ID | Section 19.15 | __01 | P |
+ +------------+----------------------+---------------+------+------+
+ | 0x19 | RETIRE_CONNECTION_ID | Section 19.16 | __01 | |
+ +------------+----------------------+---------------+------+------+
+ | 0x1a | PATH_CHALLENGE | Section 19.17 | __01 | P |
+ +------------+----------------------+---------------+------+------+
+ | 0x1b | PATH_RESPONSE | Section 19.18 | ___1 | P |
+ +------------+----------------------+---------------+------+------+
+ | 0x1c-0x1d | CONNECTION_CLOSE | Section 19.19 | ih01 | N |
+ +------------+----------------------+---------------+------+------+
+ | 0x1e | HANDSHAKE_DONE | Section 19.20 | ___1 | |
+ +------------+----------------------+---------------+------+------+
+
+ Table 3: Frame Types
+
+ The format and semantics of each frame type are explained in more
+ detail in Section 19. The remainder of this section provides a
+ summary of important and general information.
+
+ The Frame Type in ACK, STREAM, MAX_STREAMS, STREAMS_BLOCKED, and
+ CONNECTION_CLOSE frames is used to carry other frame-specific flags.
+ For all other frames, the Frame Type field simply identifies the
+ frame.
+
+ The "Pkts" column in Table 3 lists the types of packets that each
+ frame type could appear in, indicated by the following characters:
+
+ I: Initial (Section 17.2.2)
+
+ H: Handshake (Section 17.2.4)
+
+ 0: 0-RTT (Section 17.2.3)
+
+ 1: 1-RTT (Section 17.3.1)
+
+ ih: Only a CONNECTION_CLOSE frame of type 0x1c can appear in Initial
+ or Handshake packets.
+
+ For more details about these restrictions, see Section 12.5. Note
+ that all frames can appear in 1-RTT packets. An endpoint MUST treat
+ receipt of a frame in a packet type that is not permitted as a
+ connection error of type PROTOCOL_VIOLATION.
+
+ The "Spec" column in Table 3 summarizes any special rules governing
+ the processing or generation of the frame type, as indicated by the
+ following characters:
+
+ N: Packets containing only frames with this marking are not ack-
+ eliciting; see Section 13.2.
+
+ C: Packets containing only frames with this marking do not count
+ toward bytes in flight for congestion control purposes; see
+ [QUIC-RECOVERY].
+
+ P: Packets containing only frames with this marking can be used to
+ probe new network paths during connection migration; see
+ Section 9.1.
+
+ F: The contents of frames with this marking are flow controlled;
+ see Section 4.
+
+ The "Pkts" and "Spec" columns in Table 3 do not form part of the IANA
+ registry; see Section 22.4.
+
+ An endpoint MUST treat the receipt of a frame of unknown type as a
+ connection error of type FRAME_ENCODING_ERROR.
+
+ All frames are idempotent in this version of QUIC. That is, a valid
+ frame does not cause undesirable side effects or errors when received
+ more than once.
+
+ The Frame Type field uses a variable-length integer encoding (see
+ Section 16), with one exception. To ensure simple and efficient
+ implementations of frame parsing, a frame type MUST use the shortest
+ possible encoding. For frame types defined in this document, this
+ means a single-byte encoding, even though it is possible to encode
+ these values as a two-, four-, or eight-byte variable-length integer.
+ For instance, though 0x4001 is a legitimate two-byte encoding for a
+ variable-length integer with a value of 1, PING frames are always
+ encoded as a single byte with the value 0x01. This rule applies to
+ all current and future QUIC frame types. An endpoint MAY treat the
+ receipt of a frame type that uses a longer encoding than necessary as
+ a connection error of type PROTOCOL_VIOLATION.
+
+12.5. Frames and Number Spaces
+
+ Some frames are prohibited in different packet number spaces. The
+ rules here generalize those of TLS, in that frames associated with
+ establishing the connection can usually appear in packets in any
+ packet number space, whereas those associated with transferring data
+ can only appear in the application data packet number space:
+
+ * PADDING, PING, and CRYPTO frames MAY appear in any packet number
+ space.
+
+ * CONNECTION_CLOSE frames signaling errors at the QUIC layer (type
+ 0x1c) MAY appear in any packet number space. CONNECTION_CLOSE
+ frames signaling application errors (type 0x1d) MUST only appear
+ in the application data packet number space.
+
+ * ACK frames MAY appear in any packet number space but can only
+ acknowledge packets that appeared in that packet number space.
+ However, as noted below, 0-RTT packets cannot contain ACK frames.
+
+ * All other frame types MUST only be sent in the application data
+ packet number space.
+
+ Note that it is not possible to send the following frames in 0-RTT
+ packets for various reasons: ACK, CRYPTO, HANDSHAKE_DONE, NEW_TOKEN,
+ PATH_RESPONSE, and RETIRE_CONNECTION_ID. A server MAY treat receipt
+ of these frames in 0-RTT packets as a connection error of type
+ PROTOCOL_VIOLATION.
+
+13. Packetization and Reliability
+
+ A sender sends one or more frames in a QUIC packet; see Section 12.4.
+
+ A sender can minimize per-packet bandwidth and computational costs by
+ including as many frames as possible in each QUIC packet. A sender
+ MAY wait for a short period of time to collect multiple frames before
+ sending a packet that is not maximally packed, to avoid sending out
+ large numbers of small packets. An implementation MAY use knowledge
+ about application sending behavior or heuristics to determine whether
+ and for how long to wait. This waiting period is an implementation
+ decision, and an implementation should be careful to delay
+ conservatively, since any delay is likely to increase application-
+ visible latency.
+
+ Stream multiplexing is achieved by interleaving STREAM frames from
+ multiple streams into one or more QUIC packets. A single QUIC packet
+ can include multiple STREAM frames from one or more streams.
+
+ One of the benefits of QUIC is avoidance of head-of-line blocking
+ across multiple streams. When a packet loss occurs, only streams
+ with data in that packet are blocked waiting for a retransmission to
+ be received, while other streams can continue making progress. Note
+ that when data from multiple streams is included in a single QUIC
+ packet, loss of that packet blocks all those streams from making
+ progress. Implementations are advised to include as few streams as
+ necessary in outgoing packets without losing transmission efficiency
+ to underfilled packets.
+
+13.1. Packet Processing
+
+ A packet MUST NOT be acknowledged until packet protection has been
+ successfully removed and all frames contained in the packet have been
+ processed. For STREAM frames, this means the data has been enqueued
+ in preparation to be received by the application protocol, but it
+ does not require that data be delivered and consumed.
+
+ Once the packet has been fully processed, a receiver acknowledges
+ receipt by sending one or more ACK frames containing the packet
+ number of the received packet.
+
+ An endpoint SHOULD treat receipt of an acknowledgment for a packet it
+ did not send as a connection error of type PROTOCOL_VIOLATION, if it
+ is able to detect the condition. For further discussion of how this
+ might be achieved, see Section 21.4.
+
+13.2. Generating Acknowledgments
+
+ Endpoints acknowledge all packets they receive and process. However,
+ only ack-eliciting packets cause an ACK frame to be sent within the
+ maximum ack delay. Packets that are not ack-eliciting are only
+ acknowledged when an ACK frame is sent for other reasons.
+
+ When sending a packet for any reason, an endpoint SHOULD attempt to
+ include an ACK frame if one has not been sent recently. Doing so
+ helps with timely loss detection at the peer.
+
+ In general, frequent feedback from a receiver improves loss and
+ congestion response, but this has to be balanced against excessive
+ load generated by a receiver that sends an ACK frame in response to
+ every ack-eliciting packet. The guidance offered below seeks to
+ strike this balance.
+
+13.2.1. Sending ACK Frames
+
+ Every packet SHOULD be acknowledged at least once, and ack-eliciting
+ packets MUST be acknowledged at least once within the maximum delay
+ an endpoint communicated using the max_ack_delay transport parameter;
+ see Section 18.2. max_ack_delay declares an explicit contract: an
+ endpoint promises to never intentionally delay acknowledgments of an
+ ack-eliciting packet by more than the indicated value. If it does,
+ any excess accrues to the RTT estimate and could result in spurious
+ or delayed retransmissions from the peer. A sender uses the
+ receiver's max_ack_delay value in determining timeouts for timer-
+ based retransmission, as detailed in Section 6.2 of [QUIC-RECOVERY].
+
+ An endpoint MUST acknowledge all ack-eliciting Initial and Handshake
+ packets immediately and all ack-eliciting 0-RTT and 1-RTT packets
+ within its advertised max_ack_delay, with the following exception.
+ Prior to handshake confirmation, an endpoint might not have packet
+ protection keys for decrypting Handshake, 0-RTT, or 1-RTT packets
+ when they are received. It might therefore buffer them and
+ acknowledge them when the requisite keys become available.
+
+ Since packets containing only ACK frames are not congestion
+ controlled, an endpoint MUST NOT send more than one such packet in
+ response to receiving an ack-eliciting packet.
+
+ An endpoint MUST NOT send a non-ack-eliciting packet in response to a
+ non-ack-eliciting packet, even if there are packet gaps that precede
+ the received packet. This avoids an infinite feedback loop of
+ acknowledgments, which could prevent the connection from ever
+ becoming idle. Non-ack-eliciting packets are eventually acknowledged
+ when the endpoint sends an ACK frame in response to other events.
+
+ An endpoint that is only sending ACK frames will not receive
+ acknowledgments from its peer unless those acknowledgments are
+ included in packets with ack-eliciting frames. An endpoint SHOULD
+ send an ACK frame with other frames when there are new ack-eliciting
+ packets to acknowledge. When only non-ack-eliciting packets need to
+ be acknowledged, an endpoint MAY choose not to send an ACK frame with
+ outgoing frames until an ack-eliciting packet has been received.
+
+ An endpoint that is only sending non-ack-eliciting packets might
+ choose to occasionally add an ack-eliciting frame to those packets to
+ ensure that it receives an acknowledgment; see Section 13.2.4. In
+ that case, an endpoint MUST NOT send an ack-eliciting frame in all
+ packets that would otherwise be non-ack-eliciting, to avoid an
+ infinite feedback loop of acknowledgments.
+
+ In order to assist loss detection at the sender, an endpoint SHOULD
+ generate and send an ACK frame without delay when it receives an ack-
+ eliciting packet either:
+
+ * when the received packet has a packet number less than another
+ ack-eliciting packet that has been received, or
+
+ * when the packet has a packet number larger than the highest-
+ numbered ack-eliciting packet that has been received and there are
+ missing packets between that packet and this packet.
+
+ Similarly, packets marked with the ECN Congestion Experienced (CE)
+ codepoint in the IP header SHOULD be acknowledged immediately, to
+ reduce the peer's response time to congestion events.
+
+ The algorithms in [QUIC-RECOVERY] are expected to be resilient to
+ receivers that do not follow the guidance offered above. However, an
+ implementation should only deviate from these requirements after
+ careful consideration of the performance implications of a change,
+ for connections made by the endpoint and for other users of the
+ network.
+
+13.2.2. Acknowledgment Frequency
+
+ A receiver determines how frequently to send acknowledgments in
+ response to ack-eliciting packets. This determination involves a
+ trade-off.
+
+ Endpoints rely on timely acknowledgment to detect loss; see Section 6
+ of [QUIC-RECOVERY]. Window-based congestion controllers, such as the
+ one described in Section 7 of [QUIC-RECOVERY], rely on
+ acknowledgments to manage their congestion window. In both cases,
+ delaying acknowledgments can adversely affect performance.
+
+ On the other hand, reducing the frequency of packets that carry only
+ acknowledgments reduces packet transmission and processing cost at
+ both endpoints. It can improve connection throughput on severely
+ asymmetric links and reduce the volume of acknowledgment traffic
+ using return path capacity; see Section 3 of [RFC3449].
+
+ A receiver SHOULD send an ACK frame after receiving at least two ack-
+ eliciting packets. This recommendation is general in nature and
+ consistent with recommendations for TCP endpoint behavior [RFC5681].
+ Knowledge of network conditions, knowledge of the peer's congestion
+ controller, or further research and experimentation might suggest
+ alternative acknowledgment strategies with better performance
+ characteristics.
+
+ A receiver MAY process multiple available packets before determining
+ whether to send an ACK frame in response.
+
+13.2.3. Managing ACK Ranges
+
+ When an ACK frame is sent, one or more ranges of acknowledged packets
+ are included. Including acknowledgments for older packets reduces
+ the chance of spurious retransmissions caused by losing previously
+ sent ACK frames, at the cost of larger ACK frames.
+
+ ACK frames SHOULD always acknowledge the most recently received
+ packets, and the more out of order the packets are, the more
+ important it is to send an updated ACK frame quickly, to prevent the
+ peer from declaring a packet as lost and spuriously retransmitting
+ the frames it contains. An ACK frame is expected to fit within a
+ single QUIC packet. If it does not, then older ranges (those with
+ the smallest packet numbers) are omitted.
+
+ A receiver limits the number of ACK Ranges (Section 19.3.1) it
+ remembers and sends in ACK frames, both to limit the size of ACK
+ frames and to avoid resource exhaustion. After receiving
+ acknowledgments for an ACK frame, the receiver SHOULD stop tracking
+ those acknowledged ACK Ranges. Senders can expect acknowledgments
+ for most packets, but QUIC does not guarantee receipt of an
+ acknowledgment for every packet that the receiver processes.
+
+ It is possible that retaining many ACK Ranges could cause an ACK
+ frame to become too large. A receiver can discard unacknowledged ACK
+ Ranges to limit ACK frame size, at the cost of increased
+ retransmissions from the sender. This is necessary if an ACK frame
+ would be too large to fit in a packet. Receivers MAY also limit ACK
+ frame size further to preserve space for other frames or to limit the
+ capacity that acknowledgments consume.
+
+ A receiver MUST retain an ACK Range unless it can ensure that it will
+ not subsequently accept packets with numbers in that range.
+ Maintaining a minimum packet number that increases as ranges are
+ discarded is one way to achieve this with minimal state.
+
+ Receivers can discard all ACK Ranges, but they MUST retain the
+ largest packet number that has been successfully processed, as that
+ is used to recover packet numbers from subsequent packets; see
+ Section 17.1.
+
+ A receiver SHOULD include an ACK Range containing the largest
+ received packet number in every ACK frame. The Largest Acknowledged
+ field is used in ECN validation at a sender, and including a lower
+ value than what was included in a previous ACK frame could cause ECN
+ to be unnecessarily disabled; see Section 13.4.2.
+
+ Section 13.2.4 describes an exemplary approach for determining what
+ packets to acknowledge in each ACK frame. Though the goal of this
+ algorithm is to generate an acknowledgment for every packet that is
+ processed, it is still possible for acknowledgments to be lost.
+
+13.2.4. Limiting Ranges by Tracking ACK Frames
+
+ When a packet containing an ACK frame is sent, the Largest
+ Acknowledged field in that frame can be saved. When a packet
+ containing an ACK frame is acknowledged, the receiver can stop
+ acknowledging packets less than or equal to the Largest Acknowledged
+ field in the sent ACK frame.
+
+ A receiver that sends only non-ack-eliciting packets, such as ACK
+ frames, might not receive an acknowledgment for a long period of
+ time. This could cause the receiver to maintain state for a large
+ number of ACK frames for a long period of time, and ACK frames it
+ sends could be unnecessarily large. In such a case, a receiver could
+ send a PING or other small ack-eliciting frame occasionally, such as
+ once per round trip, to elicit an ACK from the peer.
+
+ In cases without ACK frame loss, this algorithm allows for a minimum
+ of 1 RTT of reordering. In cases with ACK frame loss and reordering,
+ this approach does not guarantee that every acknowledgment is seen by
+ the sender before it is no longer included in the ACK frame. Packets
+ could be received out of order, and all subsequent ACK frames
+ containing them could be lost. In this case, the loss recovery
+ algorithm could cause spurious retransmissions, but the sender will
+ continue making forward progress.
+
+13.2.5. Measuring and Reporting Host Delay
+
+ An endpoint measures the delays intentionally introduced between the
+ time the packet with the largest packet number is received and the
+ time an acknowledgment is sent. The endpoint encodes this
+ acknowledgment delay in the ACK Delay field of an ACK frame; see
+ Section 19.3. This allows the receiver of the ACK frame to adjust
+ for any intentional delays, which is important for getting a better
+ estimate of the path RTT when acknowledgments are delayed.
+
+ A packet might be held in the OS kernel or elsewhere on the host
+ before being processed. An endpoint MUST NOT include delays that it
+ does not control when populating the ACK Delay field in an ACK frame.
+ However, endpoints SHOULD include buffering delays caused by
+ unavailability of decryption keys, since these delays can be large
+ and are likely to be non-repeating.
+
+ When the measured acknowledgment delay is larger than its
+ max_ack_delay, an endpoint SHOULD report the measured delay. This
+ information is especially useful during the handshake when delays
+ might be large; see Section 13.2.1.
+
+13.2.6. ACK Frames and Packet Protection
+
+ ACK frames MUST only be carried in a packet that has the same packet
+ number space as the packet being acknowledged; see Section 12.1. For
+ instance, packets that are protected with 1-RTT keys MUST be
+ acknowledged in packets that are also protected with 1-RTT keys.
+
+ Packets that a client sends with 0-RTT packet protection MUST be
+ acknowledged by the server in packets protected by 1-RTT keys. This
+ can mean that the client is unable to use these acknowledgments if
+ the server cryptographic handshake messages are delayed or lost.
+ Note that the same limitation applies to other data sent by the
+ server protected by the 1-RTT keys.
+
+13.2.7. PADDING Frames Consume Congestion Window
+
+ Packets containing PADDING frames are considered to be in flight for
+ congestion control purposes [QUIC-RECOVERY]. Packets containing only
+ PADDING frames therefore consume congestion window but do not
+ generate acknowledgments that will open the congestion window. To
+ avoid a deadlock, a sender SHOULD ensure that other frames are sent
+ periodically in addition to PADDING frames to elicit acknowledgments
+ from the receiver.
+
+13.3. Retransmission of Information
+
+ QUIC packets that are determined to be lost are not retransmitted
+ whole. The same applies to the frames that are contained within lost
+ packets. Instead, the information that might be carried in frames is
+ sent again in new frames as needed.
+
+ New frames and packets are used to carry information that is
+ determined to have been lost. In general, information is sent again
+ when a packet containing that information is determined to be lost,
+ and sending ceases when a packet containing that information is
+ acknowledged.
+
+ * Data sent in CRYPTO frames is retransmitted according to the rules
+ in [QUIC-RECOVERY], until all data has been acknowledged. Data in
+ CRYPTO frames for Initial and Handshake packets is discarded when
+ keys for the corresponding packet number space are discarded.
+
+ * Application data sent in STREAM frames is retransmitted in new
+ STREAM frames unless the endpoint has sent a RESET_STREAM for that
+ stream. Once an endpoint sends a RESET_STREAM frame, no further
+ STREAM frames are needed.
+
+ * ACK frames carry the most recent set of acknowledgments and the
+ acknowledgment delay from the largest acknowledged packet, as
+ described in Section 13.2.1. Delaying the transmission of packets
+ containing ACK frames or resending old ACK frames can cause the
+ peer to generate an inflated RTT sample or unnecessarily disable
+ ECN.
+
+ * Cancellation of stream transmission, as carried in a RESET_STREAM
+ frame, is sent until acknowledged or until all stream data is
+ acknowledged by the peer (that is, either the "Reset Recvd" or
+ "Data Recvd" state is reached on the sending part of the stream).
+ The content of a RESET_STREAM frame MUST NOT change when it is
+ sent again.
+
+ * Similarly, a request to cancel stream transmission, as encoded in
+ a STOP_SENDING frame, is sent until the receiving part of the
+ stream enters either a "Data Recvd" or "Reset Recvd" state; see
+ Section 3.5.
+
+ * Connection close signals, including packets that contain
+ CONNECTION_CLOSE frames, are not sent again when packet loss is
+ detected. Resending these signals is described in Section 10.
+
+ * The current connection maximum data is sent in MAX_DATA frames.
+ An updated value is sent in a MAX_DATA frame if the packet
+ containing the most recently sent MAX_DATA frame is declared lost
+ or when the endpoint decides to update the limit. Care is
+ necessary to avoid sending this frame too often, as the limit can
+ increase frequently and cause an unnecessarily large number of
+ MAX_DATA frames to be sent; see Section 4.2.
+
+ * The current maximum stream data offset is sent in MAX_STREAM_DATA
+ frames. Like MAX_DATA, an updated value is sent when the packet
+ containing the most recent MAX_STREAM_DATA frame for a stream is
+ lost or when the limit is updated, with care taken to prevent the
+ frame from being sent too often. An endpoint SHOULD stop sending
+ MAX_STREAM_DATA frames when the receiving part of the stream
+ enters a "Size Known" or "Reset Recvd" state.
+
+ * The limit on streams of a given type is sent in MAX_STREAMS
+ frames. Like MAX_DATA, an updated value is sent when a packet
+ containing the most recent MAX_STREAMS for a stream type frame is
+ declared lost or when the limit is updated, with care taken to
+ prevent the frame from being sent too often.
+
+ * Blocked signals are carried in DATA_BLOCKED, STREAM_DATA_BLOCKED,
+ and STREAMS_BLOCKED frames. DATA_BLOCKED frames have connection
+ scope, STREAM_DATA_BLOCKED frames have stream scope, and
+ STREAMS_BLOCKED frames are scoped to a specific stream type. A
+ new frame is sent if a packet containing the most recent frame for
+ a scope is lost, but only while the endpoint is blocked on the
+ corresponding limit. These frames always include the limit that
+ is causing blocking at the time that they are transmitted.
+
+ * A liveness or path validation check using PATH_CHALLENGE frames is
+ sent periodically until a matching PATH_RESPONSE frame is received
+ or until there is no remaining need for liveness or path
+ validation checking. PATH_CHALLENGE frames include a different
+ payload each time they are sent.
+
+ * Responses to path validation using PATH_RESPONSE frames are sent
+ just once. The peer is expected to send more PATH_CHALLENGE
+ frames as necessary to evoke additional PATH_RESPONSE frames.
+
+ * New connection IDs are sent in NEW_CONNECTION_ID frames and
+ retransmitted if the packet containing them is lost.
+ Retransmissions of this frame carry the same sequence number
+ value. Likewise, retired connection IDs are sent in
+ RETIRE_CONNECTION_ID frames and retransmitted if the packet
+ containing them is lost.
+
+ * NEW_TOKEN frames are retransmitted if the packet containing them
+ is lost. No special support is made for detecting reordered and
+ duplicated NEW_TOKEN frames other than a direct comparison of the
+ frame contents.
+
+ * PING and PADDING frames contain no information, so lost PING or
+ PADDING frames do not require repair.
+
+ * The HANDSHAKE_DONE frame MUST be retransmitted until it is
+ acknowledged.
+
+ Endpoints SHOULD prioritize retransmission of data over sending new
+ data, unless priorities specified by the application indicate
+ otherwise; see Section 2.3.
+
+ Even though a sender is encouraged to assemble frames containing up-
+ to-date information every time it sends a packet, it is not forbidden
+ to retransmit copies of frames from lost packets. A sender that
+ retransmits copies of frames needs to handle decreases in available
+ payload size due to changes in packet number length, connection ID
+ length, and path MTU. A receiver MUST accept packets containing an
+ outdated frame, such as a MAX_DATA frame carrying a smaller maximum
+ data value than one found in an older packet.
+
+ A sender SHOULD avoid retransmitting information from packets once
+ they are acknowledged. This includes packets that are acknowledged
+ after being declared lost, which can happen in the presence of
+ network reordering. Doing so requires senders to retain information
+ about packets after they are declared lost. A sender can discard
+ this information after a period of time elapses that adequately
+ allows for reordering, such as a PTO (Section 6.2 of
+ [QUIC-RECOVERY]), or based on other events, such as reaching a memory
+ limit.
+
+ Upon detecting losses, a sender MUST take appropriate congestion
+ control action. The details of loss detection and congestion control
+ are described in [QUIC-RECOVERY].
+
+13.4. Explicit Congestion Notification
+
+ QUIC endpoints can use ECN [RFC3168] to detect and respond to network
+ congestion. ECN allows an endpoint to set an ECN-Capable Transport
+ (ECT) codepoint in the ECN field of an IP packet. A network node can
+ then indicate congestion by setting the ECN-CE codepoint in the ECN
+ field instead of dropping the packet [RFC8087]. Endpoints react to
+ reported congestion by reducing their sending rate in response, as
+ described in [QUIC-RECOVERY].
+
+ To enable ECN, a sending QUIC endpoint first determines whether a
+ path supports ECN marking and whether the peer reports the ECN values
+ in received IP headers; see Section 13.4.2.
+
+13.4.1. Reporting ECN Counts
+
+ The use of ECN requires the receiving endpoint to read the ECN field
+ from an IP packet, which is not possible on all platforms. If an
+ endpoint does not implement ECN support or does not have access to
+ received ECN fields, it does not report ECN counts for packets it
+ receives.
+
+ Even if an endpoint does not set an ECT field in packets it sends,
+ the endpoint MUST provide feedback about ECN markings it receives, if
+ these are accessible. Failing to report the ECN counts will cause
+ the sender to disable the use of ECN for this connection.
+
+ On receiving an IP packet with an ECT(0), ECT(1), or ECN-CE
+ codepoint, an ECN-enabled endpoint accesses the ECN field and
+ increases the corresponding ECT(0), ECT(1), or ECN-CE count. These
+ ECN counts are included in subsequent ACK frames; see Sections 13.2
+ and 19.3.
+
+ Each packet number space maintains separate acknowledgment state and
+ separate ECN counts. Coalesced QUIC packets (see Section 12.2) share
+ the same IP header so the ECN counts are incremented once for each
+ coalesced QUIC packet.
+
+ For example, if one each of an Initial, Handshake, and 1-RTT QUIC
+ packet are coalesced into a single UDP datagram, the ECN counts for
+ all three packet number spaces will be incremented by one each, based
+ on the ECN field of the single IP header.
+
+ ECN counts are only incremented when QUIC packets from the received
+ IP packet are processed. As such, duplicate QUIC packets are not
+ processed and do not increase ECN counts; see Section 21.10 for
+ relevant security concerns.
+
+13.4.2. ECN Validation
+
+ It is possible for faulty network devices to corrupt or erroneously
+ drop packets that carry a non-zero ECN codepoint. To ensure
+ connectivity in the presence of such devices, an endpoint validates
+ the ECN counts for each network path and disables the use of ECN on
+ that path if errors are detected.
+
+ To perform ECN validation for a new path:
+
+ * The endpoint sets an ECT(0) codepoint in the IP header of early
+ outgoing packets sent on a new path to the peer [RFC8311].
+
+ * The endpoint monitors whether all packets sent with an ECT
+ codepoint are eventually deemed lost (Section 6 of
+ [QUIC-RECOVERY]), indicating that ECN validation has failed.
+
+ If an endpoint has cause to expect that IP packets with an ECT
+ codepoint might be dropped by a faulty network element, the endpoint
+ could set an ECT codepoint for only the first ten outgoing packets on
+ a path, or for a period of three PTOs (see Section 6.2 of
+ [QUIC-RECOVERY]). If all packets marked with non-zero ECN codepoints
+ are subsequently lost, it can disable marking on the assumption that
+ the marking caused the loss.
+
+ An endpoint thus attempts to use ECN and validates this for each new
+ connection, when switching to a server's preferred address, and on
+ active connection migration to a new path. Appendix A.4 describes
+ one possible algorithm.
+
+ Other methods of probing paths for ECN support are possible, as are
+ different marking strategies. Implementations MAY use other methods
+ defined in RFCs; see [RFC8311]. Implementations that use the ECT(1)
+ codepoint need to perform ECN validation using the reported ECT(1)
+ counts.
+
+13.4.2.1. Receiving ACK Frames with ECN Counts
+
+ Erroneous application of ECN-CE markings by the network can result in
+ degraded connection performance. An endpoint that receives an ACK
+ frame with ECN counts therefore validates the counts before using
+ them. It performs this validation by comparing newly received counts
+ against those from the last successfully processed ACK frame. Any
+ increase in the ECN counts is validated based on the ECN markings
+ that were applied to packets that are newly acknowledged in the ACK
+ frame.
+
+ If an ACK frame newly acknowledges a packet that the endpoint sent
+ with either the ECT(0) or ECT(1) codepoint set, ECN validation fails
+ if the corresponding ECN counts are not present in the ACK frame.
+ This check detects a network element that zeroes the ECN field or a
+ peer that does not report ECN markings.
+
+ ECN validation also fails if the sum of the increase in ECT(0) and
+ ECN-CE counts is less than the number of newly acknowledged packets
+ that were originally sent with an ECT(0) marking. Similarly, ECN
+ validation fails if the sum of the increases to ECT(1) and ECN-CE
+ counts is less than the number of newly acknowledged packets sent
+ with an ECT(1) marking. These checks can detect remarking of ECN-CE
+ markings by the network.
+
+ An endpoint could miss acknowledgments for a packet when ACK frames
+ are lost. It is therefore possible for the total increase in ECT(0),
+ ECT(1), and ECN-CE counts to be greater than the number of packets
+ that are newly acknowledged by an ACK frame. This is why ECN counts
+ are permitted to be larger than the total number of packets that are
+ acknowledged.
+
+ Validating ECN counts from reordered ACK frames can result in
+ failure. An endpoint MUST NOT fail ECN validation as a result of
+ processing an ACK frame that does not increase the largest
+ acknowledged packet number.
+
+ ECN validation can fail if the received total count for either ECT(0)
+ or ECT(1) exceeds the total number of packets sent with each
+ corresponding ECT codepoint. In particular, validation will fail
+ when an endpoint receives a non-zero ECN count corresponding to an
+ ECT codepoint that it never applied. This check detects when packets
+ are remarked to ECT(0) or ECT(1) in the network.
+
+13.4.2.2. ECN Validation Outcomes
+
+ If validation fails, then the endpoint MUST disable ECN. It stops
+ setting the ECT codepoint in IP packets that it sends, assuming that
+ either the network path or the peer does not support ECN.
+
+ Even if validation fails, an endpoint MAY revalidate ECN for the same
+ path at any later time in the connection. An endpoint could continue
+ to periodically attempt validation.
+
+ Upon successful validation, an endpoint MAY continue to set an ECT
+ codepoint in subsequent packets it sends, with the expectation that
+ the path is ECN capable. Network routing and path elements can
+ change mid-connection; an endpoint MUST disable ECN if validation
+ later fails.
+
+14. Datagram Size
+
+ A UDP datagram can include one or more QUIC packets. The datagram
+ size refers to the total UDP payload size of a single UDP datagram
+ carrying QUIC packets. The datagram size includes one or more QUIC
+ packet headers and protected payloads, but not the UDP or IP headers.
+
+ The maximum datagram size is defined as the largest size of UDP
+ payload that can be sent across a network path using a single UDP
+ datagram. QUIC MUST NOT be used if the network path cannot support a
+ maximum datagram size of at least 1200 bytes.
+
+ QUIC assumes a minimum IP packet size of at least 1280 bytes. This
+ is the IPv6 minimum size [IPv6] and is also supported by most modern
+ IPv4 networks. Assuming the minimum IP header size of 40 bytes for
+ IPv6 and 20 bytes for IPv4 and a UDP header size of 8 bytes, this
+ results in a maximum datagram size of 1232 bytes for IPv6 and 1252
+ bytes for IPv4. Thus, modern IPv4 and all IPv6 network paths are
+ expected to be able to support QUIC.
+
+ | Note: This requirement to support a UDP payload of 1200 bytes
+ | limits the space available for IPv6 extension headers to 32
+ | bytes or IPv4 options to 52 bytes if the path only supports the
+ | IPv6 minimum MTU of 1280 bytes. This affects Initial packets
+ | and path validation.
+
+ Any maximum datagram size larger than 1200 bytes can be discovered
+ using Path Maximum Transmission Unit Discovery (PMTUD) (see
+ Section 14.2.1) or Datagram Packetization Layer PMTU Discovery
+ (DPLPMTUD) (see Section 14.3).
+
+ Enforcement of the max_udp_payload_size transport parameter
+ (Section 18.2) might act as an additional limit on the maximum
+ datagram size. A sender can avoid exceeding this limit, once the
+ value is known. However, prior to learning the value of the
+ transport parameter, endpoints risk datagrams being lost if they send
+ datagrams larger than the smallest allowed maximum datagram size of
+ 1200 bytes.
+
+ UDP datagrams MUST NOT be fragmented at the IP layer. In IPv4
+ [IPv4], the Don't Fragment (DF) bit MUST be set if possible, to
+ prevent fragmentation on the path.
+
+ QUIC sometimes requires datagrams to be no smaller than a certain
+ size; see Section 8.1 as an example. However, the size of a datagram
+ is not authenticated. That is, if an endpoint receives a datagram of
+ a certain size, it cannot know that the sender sent the datagram at
+ the same size. Therefore, an endpoint MUST NOT close a connection
+ when it receives a datagram that does not meet size constraints; the
+ endpoint MAY discard such datagrams.
+
+14.1. Initial Datagram Size
+
+ A client MUST expand the payload of all UDP datagrams carrying
+ Initial packets to at least the smallest allowed maximum datagram
+ size of 1200 bytes by adding PADDING frames to the Initial packet or
+ by coalescing the Initial packet; see Section 12.2. Initial packets
+ can even be coalesced with invalid packets, which a receiver will
+ discard. Similarly, a server MUST expand the payload of all UDP
+ datagrams carrying ack-eliciting Initial packets to at least the
+ smallest allowed maximum datagram size of 1200 bytes.
+
+ Sending UDP datagrams of this size ensures that the network path
+ supports a reasonable Path Maximum Transmission Unit (PMTU), in both
+ directions. Additionally, a client that expands Initial packets
+ helps reduce the amplitude of amplification attacks caused by server
+ responses toward an unverified client address; see Section 8.
+
+ Datagrams containing Initial packets MAY exceed 1200 bytes if the
+ sender believes that the network path and peer both support the size
+ that it chooses.
+
+ A server MUST discard an Initial packet that is carried in a UDP
+ datagram with a payload that is smaller than the smallest allowed
+ maximum datagram size of 1200 bytes. A server MAY also immediately
+ close the connection by sending a CONNECTION_CLOSE frame with an
+ error code of PROTOCOL_VIOLATION; see Section 10.2.3.
+
+ The server MUST also limit the number of bytes it sends before
+ validating the address of the client; see Section 8.
+
+14.2. Path Maximum Transmission Unit
+
+ The PMTU is the maximum size of the entire IP packet, including the
+ IP header, UDP header, and UDP payload. The UDP payload includes one
+ or more QUIC packet headers and protected payloads. The PMTU can
+ depend on path characteristics and can therefore change over time.
+ The largest UDP payload an endpoint sends at any given time is
+ referred to as the endpoint's maximum datagram size.
+
+ An endpoint SHOULD use DPLPMTUD (Section 14.3) or PMTUD
+ (Section 14.2.1) to determine whether the path to a destination will
+ support a desired maximum datagram size without fragmentation. In
+ the absence of these mechanisms, QUIC endpoints SHOULD NOT send
+ datagrams larger than the smallest allowed maximum datagram size.
+
+ Both DPLPMTUD and PMTUD send datagrams that are larger than the
+ current maximum datagram size, referred to as PMTU probes. All QUIC
+ packets that are not sent in a PMTU probe SHOULD be sized to fit
+ within the maximum datagram size to avoid the datagram being
+ fragmented or dropped [RFC8085].
+
+ If a QUIC endpoint determines that the PMTU between any pair of local
+ and remote IP addresses cannot support the smallest allowed maximum
+ datagram size of 1200 bytes, it MUST immediately cease sending QUIC
+ packets, except for those in PMTU probes or those containing
+ CONNECTION_CLOSE frames, on the affected path. An endpoint MAY
+ terminate the connection if an alternative path cannot be found.
+
+ Each pair of local and remote addresses could have a different PMTU.
+ QUIC implementations that implement any kind of PMTU discovery
+ therefore SHOULD maintain a maximum datagram size for each
+ combination of local and remote IP addresses.
+
+ A QUIC implementation MAY be more conservative in computing the
+ maximum datagram size to allow for unknown tunnel overheads or IP
+ header options/extensions.
+
+14.2.1. Handling of ICMP Messages by PMTUD
+
+ PMTUD [RFC1191] [RFC8201] relies on reception of ICMP messages (that
+ is, IPv6 Packet Too Big (PTB) messages) that indicate when an IP
+ packet is dropped because it is larger than the local router MTU.
+ DPLPMTUD can also optionally use these messages. This use of ICMP
+ messages is potentially vulnerable to attacks by entities that cannot
+ observe packets but might successfully guess the addresses used on
+ the path. These attacks could reduce the PMTU to a bandwidth-
+ inefficient value.
+
+ An endpoint MUST ignore an ICMP message that claims the PMTU has
+ decreased below QUIC's smallest allowed maximum datagram size.
+
+ The requirements for generating ICMP [RFC1812] [RFC4443] state that
+ the quoted packet should contain as much of the original packet as
+ possible without exceeding the minimum MTU for the IP version. The
+ size of the quoted packet can actually be smaller, or the information
+ unintelligible, as described in Section 1.1 of [DPLPMTUD].
+
+ QUIC endpoints using PMTUD SHOULD validate ICMP messages to protect
+ from packet injection as specified in [RFC8201] and Section 5.2 of
+ [RFC8085]. This validation SHOULD use the quoted packet supplied in
+ the payload of an ICMP message to associate the message with a
+ corresponding transport connection (see Section 4.6.1 of [DPLPMTUD]).
+ ICMP message validation MUST include matching IP addresses and UDP
+ ports [RFC8085] and, when possible, connection IDs to an active QUIC
+ session. The endpoint SHOULD ignore all ICMP messages that fail
+ validation.
+
+ An endpoint MUST NOT increase the PMTU based on ICMP messages; see
+ Item 6 in Section 3 of [DPLPMTUD]. Any reduction in QUIC's maximum
+ datagram size in response to ICMP messages MAY be provisional until
+ QUIC's loss detection algorithm determines that the quoted packet has
+ actually been lost.
+
+14.3. Datagram Packetization Layer PMTU Discovery
+
+ DPLPMTUD [DPLPMTUD] relies on tracking loss or acknowledgment of QUIC
+ packets that are carried in PMTU probes. PMTU probes for DPLPMTUD
+ that use the PADDING frame implement "Probing using padding data", as
+ defined in Section 4.1 of [DPLPMTUD].
+
+ Endpoints SHOULD set the initial value of BASE_PLPMTU (Section 5.1 of
+ [DPLPMTUD]) to be consistent with QUIC's smallest allowed maximum
+ datagram size. The MIN_PLPMTU is the same as the BASE_PLPMTU.
+
+ QUIC endpoints implementing DPLPMTUD maintain a DPLPMTUD Maximum
+ Packet Size (MPS) (Section 4.4 of [DPLPMTUD]) for each combination of
+ local and remote IP addresses. This corresponds to the maximum
+ datagram size.
+
+14.3.1. DPLPMTUD and Initial Connectivity
+
+ From the perspective of DPLPMTUD, QUIC is an acknowledged
+ Packetization Layer (PL). A QUIC sender can therefore enter the
+ DPLPMTUD BASE state (Section 5.2 of [DPLPMTUD]) when the QUIC
+ connection handshake has been completed.
+
+14.3.2. Validating the Network Path with DPLPMTUD
+
+ QUIC is an acknowledged PL; therefore, a QUIC sender does not
+ implement a DPLPMTUD CONFIRMATION_TIMER while in the SEARCH_COMPLETE
+ state; see Section 5.2 of [DPLPMTUD].
+
+14.3.3. Handling of ICMP Messages by DPLPMTUD
+
+ An endpoint using DPLPMTUD requires the validation of any received
+ ICMP PTB message before using the PTB information, as defined in
+ Section 4.6 of [DPLPMTUD]. In addition to UDP port validation, QUIC
+ validates an ICMP message by using other PL information (e.g.,
+ validation of connection IDs in the quoted packet of any received
+ ICMP message).
+
+ The considerations for processing ICMP messages described in
+ Section 14.2.1 also apply if these messages are used by DPLPMTUD.
+
+14.4. Sending QUIC PMTU Probes
+
+ PMTU probes are ack-eliciting packets.
+
+ Endpoints could limit the content of PMTU probes to PING and PADDING
+ frames, since packets that are larger than the current maximum
+ datagram size are more likely to be dropped by the network. Loss of
+ a QUIC packet that is carried in a PMTU probe is therefore not a
+ reliable indication of congestion and SHOULD NOT trigger a congestion
+ control reaction; see Item 7 in Section 3 of [DPLPMTUD]. However,
+ PMTU probes consume congestion window, which could delay subsequent
+ transmission by an application.
+
+14.4.1. PMTU Probes Containing Source Connection ID
+
+ Endpoints that rely on the Destination Connection ID field for
+ routing incoming QUIC packets are likely to require that the
+ connection ID be included in PMTU probes to route any resulting ICMP
+ messages (Section 14.2.1) back to the correct endpoint. However,
+ only long header packets (Section 17.2) contain the Source Connection
+ ID field, and long header packets are not decrypted or acknowledged
+ by the peer once the handshake is complete.
+
+ One way to construct a PMTU probe is to coalesce (see Section 12.2) a
+ packet with a long header, such as a Handshake or 0-RTT packet
+ (Section 17.2), with a short header packet in a single UDP datagram.
+ If the resulting PMTU probe reaches the endpoint, the packet with the
+ long header will be ignored, but the short header packet will be
+ acknowledged. If the PMTU probe causes an ICMP message to be sent,
+ the first part of the probe will be quoted in that message. If the
+ Source Connection ID field is within the quoted portion of the probe,
+ that could be used for routing or validation of the ICMP message.
+
+ | Note: The purpose of using a packet with a long header is only
+ | to ensure that the quoted packet contained in the ICMP message
+ | contains a Source Connection ID field. This packet does not
+ | need to be a valid packet, and it can be sent even if there is
+ | no current use for packets of that type.
+
+15. Versions
+
+ QUIC versions are identified using a 32-bit unsigned number.
+
+ The version 0x00000000 is reserved to represent version negotiation.
+ This version of the specification is identified by the number
+ 0x00000001.
+
+ Other versions of QUIC might have different properties from this
+ version. The properties of QUIC that are guaranteed to be consistent
+ across all versions of the protocol are described in
+ [QUIC-INVARIANTS].
+
+ Version 0x00000001 of QUIC uses TLS as a cryptographic handshake
+ protocol, as described in [QUIC-TLS].
+
+ Versions with the most significant 16 bits of the version number
+ cleared are reserved for use in future IETF consensus documents.
+
+ Versions that follow the pattern 0x?a?a?a?a are reserved for use in
+ forcing version negotiation to be exercised -- that is, any version
+ number where the low four bits of all bytes is 1010 (in binary). A
+ client or server MAY advertise support for any of these reserved
+ versions.
+
+ Reserved version numbers will never represent a real protocol; a
+ client MAY use one of these version numbers with the expectation that
+ the server will initiate version negotiation; a server MAY advertise
+ support for one of these versions and can expect that clients ignore
+ the value.
+
+16. Variable-Length Integer Encoding
+
+ QUIC packets and frames commonly use a variable-length encoding for
+ non-negative integer values. This encoding ensures that smaller
+ integer values need fewer bytes to encode.
+
+ The QUIC variable-length integer encoding reserves the two most
+ significant bits of the first byte to encode the base-2 logarithm of
+ the integer encoding length in bytes. The integer value is encoded
+ on the remaining bits, in network byte order.
+
+ This means that integers are encoded on 1, 2, 4, or 8 bytes and can
+ encode 6-, 14-, 30-, or 62-bit values, respectively. Table 4
+ summarizes the encoding properties.
+
+ +======+========+=============+=======================+
+ | 2MSB | Length | Usable Bits | Range |
+ +======+========+=============+=======================+
+ | 00 | 1 | 6 | 0-63 |
+ +------+--------+-------------+-----------------------+
+ | 01 | 2 | 14 | 0-16383 |
+ +------+--------+-------------+-----------------------+
+ | 10 | 4 | 30 | 0-1073741823 |
+ +------+--------+-------------+-----------------------+
+ | 11 | 8 | 62 | 0-4611686018427387903 |
+ +------+--------+-------------+-----------------------+
+
+ Table 4: Summary of Integer Encodings
+
+ An example of a decoding algorithm and sample encodings are shown in
+ Appendix A.1.
+
+ Values do not need to be encoded on the minimum number of bytes
+ necessary, with the sole exception of the Frame Type field; see
+ Section 12.4.
+
+ Versions (Section 15), packet numbers sent in the header
+ (Section 17.1), and the length of connection IDs in long header
+ packets (Section 17.2) are described using integers but do not use
+ this encoding.
+
+17. Packet Formats
+
+ All numeric values are encoded in network byte order (that is, big
+ endian), and all field sizes are in bits. Hexadecimal notation is
+ used for describing the value of fields.
+
+17.1. Packet Number Encoding and Decoding
+
+ Packet numbers are integers in the range 0 to 2^62-1 (Section 12.3).
+ When present in long or short packet headers, they are encoded in 1
+ to 4 bytes. The number of bits required to represent the packet
+ number is reduced by including only the least significant bits of the
+ packet number.
+
+ The encoded packet number is protected as described in Section 5.4 of
+ [QUIC-TLS].
+
+ Prior to receiving an acknowledgment for a packet number space, the
+ full packet number MUST be included; it is not to be truncated, as
+ described below.
+
+ After an acknowledgment is received for a packet number space, the
+ sender MUST use a packet number size able to represent more than
+ twice as large a range as the difference between the largest
+ acknowledged packet number and the packet number being sent. A peer
+ receiving the packet will then correctly decode the packet number,
+ unless the packet is delayed in transit such that it arrives after
+ many higher-numbered packets have been received. An endpoint SHOULD
+ use a large enough packet number encoding to allow the packet number
+ to be recovered even if the packet arrives after packets that are
+ sent afterwards.
+
+ As a result, the size of the packet number encoding is at least one
+ bit more than the base-2 logarithm of the number of contiguous
+ unacknowledged packet numbers, including the new packet. Pseudocode
+ and an example for packet number encoding can be found in
+ Appendix A.2.
+
+ At a receiver, protection of the packet number is removed prior to
+ recovering the full packet number. The full packet number is then
+ reconstructed based on the number of significant bits present, the
+ value of those bits, and the largest packet number received in a
+ successfully authenticated packet. Recovering the full packet number
+ is necessary to successfully complete the removal of packet
+ protection.
+
+ Once header protection is removed, the packet number is decoded by
+ finding the packet number value that is closest to the next expected
+ packet. The next expected packet is the highest received packet
+ number plus one. Pseudocode and an example for packet number
+ decoding can be found in Appendix A.3.
+
+17.2. Long Header Packets
+
+ Long Header Packet {
+ Header Form (1) = 1,
+ Fixed Bit (1) = 1,
+ Long Packet Type (2),
+ Type-Specific Bits (4),
+ Version (32),
+ Destination Connection ID Length (8),
+ Destination Connection ID (0..160),
+ Source Connection ID Length (8),
+ Source Connection ID (0..160),
+ Type-Specific Payload (..),
+ }
+
+ Figure 13: Long Header Packet Format
+
+ Long headers are used for packets that are sent prior to the
+ establishment of 1-RTT keys. Once 1-RTT keys are available, a sender
+ switches to sending packets using the short header (Section 17.3).
+ The long form allows for special packets -- such as the Version
+ Negotiation packet -- to be represented in this uniform fixed-length
+ packet format. Packets that use the long header contain the
+ following fields:
+
+ Header Form: The most significant bit (0x80) of byte 0 (the first
+ byte) is set to 1 for long headers.
+
+ Fixed Bit: The next bit (0x40) of byte 0 is set to 1, unless the
+ packet is a Version Negotiation packet. Packets containing a zero
+ value for this bit are not valid packets in this version and MUST
+ be discarded. A value of 1 for this bit allows QUIC to coexist
+ with other protocols; see [RFC7983].
+
+ Long Packet Type: The next two bits (those with a mask of 0x30) of
+ byte 0 contain a packet type. Packet types are listed in Table 5.
+
+ Type-Specific Bits: The semantics of the lower four bits (those with
+ a mask of 0x0f) of byte 0 are determined by the packet type.
+
+ Version: The QUIC Version is a 32-bit field that follows the first
+ byte. This field indicates the version of QUIC that is in use and
+ determines how the rest of the protocol fields are interpreted.
+
+ Destination Connection ID Length: The byte following the version
+ contains the length in bytes of the Destination Connection ID
+ field that follows it. This length is encoded as an 8-bit
+ unsigned integer. In QUIC version 1, this value MUST NOT exceed
+ 20 bytes. Endpoints that receive a version 1 long header with a
+ value larger than 20 MUST drop the packet. In order to properly
+ form a Version Negotiation packet, servers SHOULD be able to read
+ longer connection IDs from other QUIC versions.
+
+ Destination Connection ID: The Destination Connection ID field
+ follows the Destination Connection ID Length field, which
+ indicates the length of this field. Section 7.2 describes the use
+ of this field in more detail.
+
+ Source Connection ID Length: The byte following the Destination
+ Connection ID contains the length in bytes of the Source
+ Connection ID field that follows it. This length is encoded as an
+ 8-bit unsigned integer. In QUIC version 1, this value MUST NOT
+ exceed 20 bytes. Endpoints that receive a version 1 long header
+ with a value larger than 20 MUST drop the packet. In order to
+ properly form a Version Negotiation packet, servers SHOULD be able
+ to read longer connection IDs from other QUIC versions.
+
+ Source Connection ID: The Source Connection ID field follows the
+ Source Connection ID Length field, which indicates the length of
+ this field. Section 7.2 describes the use of this field in more
+ detail.
+
+ Type-Specific Payload: The remainder of the packet, if any, is type
+ specific.
+
+ In this version of QUIC, the following packet types with the long
+ header are defined:
+
+ +======+===========+================+
+ | Type | Name | Section |
+ +======+===========+================+
+ | 0x00 | Initial | Section 17.2.2 |
+ +------+-----------+----------------+
+ | 0x01 | 0-RTT | Section 17.2.3 |
+ +------+-----------+----------------+
+ | 0x02 | Handshake | Section 17.2.4 |
+ +------+-----------+----------------+
+ | 0x03 | Retry | Section 17.2.5 |
+ +------+-----------+----------------+
+
+ Table 5: Long Header Packet Types
+
+ The header form bit, Destination and Source Connection ID lengths,
+ Destination and Source Connection ID fields, and Version fields of a
+ long header packet are version independent. The other fields in the
+ first byte are version specific. See [QUIC-INVARIANTS] for details
+ on how packets from different versions of QUIC are interpreted.
+
+ The interpretation of the fields and the payload are specific to a
+ version and packet type. While type-specific semantics for this
+ version are described in the following sections, several long header
+ packets in this version of QUIC contain these additional fields:
+
+ Reserved Bits: Two bits (those with a mask of 0x0c) of byte 0 are
+ reserved across multiple packet types. These bits are protected
+ using header protection; see Section 5.4 of [QUIC-TLS]. The value
+ included prior to protection MUST be set to 0. An endpoint MUST
+ treat receipt of a packet that has a non-zero value for these bits
+ after removing both packet and header protection as a connection
+ error of type PROTOCOL_VIOLATION. Discarding such a packet after
+ only removing header protection can expose the endpoint to
+ attacks; see Section 9.5 of [QUIC-TLS].
+
+ Packet Number Length: In packet types that contain a Packet Number
+ field, the least significant two bits (those with a mask of 0x03)
+ of byte 0 contain the length of the Packet Number field, encoded
+ as an unsigned two-bit integer that is one less than the length of
+ the Packet Number field in bytes. That is, the length of the
+ Packet Number field is the value of this field plus one. These
+ bits are protected using header protection; see Section 5.4 of
+ [QUIC-TLS].
+
+ Length: This is the length of the remainder of the packet (that is,
+ the Packet Number and Payload fields) in bytes, encoded as a
+ variable-length integer (Section 16).
+
+ Packet Number: This field is 1 to 4 bytes long. The packet number
+ is protected using header protection; see Section 5.4 of
+ [QUIC-TLS]. The length of the Packet Number field is encoded in
+ the Packet Number Length bits of byte 0; see above.
+
+ Packet Payload: This is the payload of the packet -- containing a
+ sequence of frames -- that is protected using packet protection.
+
+17.2.1. Version Negotiation Packet
+
+ A Version Negotiation packet is inherently not version specific.
+ Upon receipt by a client, it will be identified as a Version
+ Negotiation packet based on the Version field having a value of 0.
+
+ The Version Negotiation packet is a response to a client packet that
+ contains a version that is not supported by the server. It is only
+ sent by servers.
+
+ The layout of a Version Negotiation packet is:
+
+ Version Negotiation Packet {
+ Header Form (1) = 1,
+ Unused (7),
+ Version (32) = 0,
+ Destination Connection ID Length (8),
+ Destination Connection ID (0..2040),
+ Source Connection ID Length (8),
+ Source Connection ID (0..2040),
+ Supported Version (32) ...,
+ }
+
+ Figure 14: Version Negotiation Packet
+
+ The value in the Unused field is set to an arbitrary value by the
+ server. Clients MUST ignore the value of this field. Where QUIC
+ might be multiplexed with other protocols (see [RFC7983]), servers
+ SHOULD set the most significant bit of this field (0x40) to 1 so that
+ Version Negotiation packets appear to have the Fixed Bit field. Note
+ that other versions of QUIC might not make a similar recommendation.
+
+ The Version field of a Version Negotiation packet MUST be set to
+ 0x00000000.
+
+ The server MUST include the value from the Source Connection ID field
+ of the packet it receives in the Destination Connection ID field.
+ The value for Source Connection ID MUST be copied from the
+ Destination Connection ID of the received packet, which is initially
+ randomly selected by a client. Echoing both connection IDs gives
+ clients some assurance that the server received the packet and that
+ the Version Negotiation packet was not generated by an entity that
+ did not observe the Initial packet.
+
+ Future versions of QUIC could have different requirements for the
+ lengths of connection IDs. In particular, connection IDs might have
+ a smaller minimum length or a greater maximum length. Version-
+ specific rules for the connection ID therefore MUST NOT influence a
+ decision about whether to send a Version Negotiation packet.
+
+ The remainder of the Version Negotiation packet is a list of 32-bit
+ versions that the server supports.
+
+ A Version Negotiation packet is not acknowledged. It is only sent in
+ response to a packet that indicates an unsupported version; see
+ Section 5.2.2.
+
+ The Version Negotiation packet does not include the Packet Number and
+ Length fields present in other packets that use the long header form.
+ Consequently, a Version Negotiation packet consumes an entire UDP
+ datagram.
+
+ A server MUST NOT send more than one Version Negotiation packet in
+ response to a single UDP datagram.
+
+ See Section 6 for a description of the version negotiation process.
+
+17.2.2. Initial Packet
+
+ An Initial packet uses long headers with a type value of 0x00. It
+ carries the first CRYPTO frames sent by the client and server to
+ perform key exchange, and it carries ACK frames in either direction.
+
+ Initial Packet {
+ Header Form (1) = 1,
+ Fixed Bit (1) = 1,
+ Long Packet Type (2) = 0,
+ Reserved Bits (2),
+ Packet Number Length (2),
+ Version (32),
+ Destination Connection ID Length (8),
+ Destination Connection ID (0..160),
+ Source Connection ID Length (8),
+ Source Connection ID (0..160),
+ Token Length (i),
+ Token (..),
+ Length (i),
+ Packet Number (8..32),
+ Packet Payload (8..),
+ }
+
+ Figure 15: Initial Packet
+
+ The Initial packet contains a long header as well as the Length and
+ Packet Number fields; see Section 17.2. The first byte contains the
+ Reserved and Packet Number Length bits; see also Section 17.2.
+ Between the Source Connection ID and Length fields, there are two
+ additional fields specific to the Initial packet.
+
+ Token Length: A variable-length integer specifying the length of the
+ Token field, in bytes. This value is 0 if no token is present.
+ Initial packets sent by the server MUST set the Token Length field
+ to 0; clients that receive an Initial packet with a non-zero Token
+ Length field MUST either discard the packet or generate a
+ connection error of type PROTOCOL_VIOLATION.
+
+ Token: The value of the token that was previously provided in a
+ Retry packet or NEW_TOKEN frame; see Section 8.1.
+
+ In order to prevent tampering by version-unaware middleboxes, Initial
+ packets are protected with connection- and version-specific keys
+ (Initial keys) as described in [QUIC-TLS]. This protection does not
+ provide confidentiality or integrity against attackers that can
+ observe packets, but it does prevent attackers that cannot observe
+ packets from spoofing Initial packets.
+
+ The client and server use the Initial packet type for any packet that
+ contains an initial cryptographic handshake message. This includes
+ all cases where a new packet containing the initial cryptographic
+ message needs to be created, such as the packets sent after receiving
+ a Retry packet; see Section 17.2.5.
+
+ A server sends its first Initial packet in response to a client
+ Initial. A server MAY send multiple Initial packets. The
+ cryptographic key exchange could require multiple round trips or
+ retransmissions of this data.
+
+ The payload of an Initial packet includes a CRYPTO frame (or frames)
+ containing a cryptographic handshake message, ACK frames, or both.
+ PING, PADDING, and CONNECTION_CLOSE frames of type 0x1c are also
+ permitted. An endpoint that receives an Initial packet containing
+ other frames can either discard the packet as spurious or treat it as
+ a connection error.
+
+ The first packet sent by a client always includes a CRYPTO frame that
+ contains the start or all of the first cryptographic handshake
+ message. The first CRYPTO frame sent always begins at an offset of
+ 0; see Section 7.
+
+ Note that if the server sends a TLS HelloRetryRequest (see
+ Section 4.7 of [QUIC-TLS]), the client will send another series of
+ Initial packets. These Initial packets will continue the
+ cryptographic handshake and will contain CRYPTO frames starting at an
+ offset matching the size of the CRYPTO frames sent in the first
+ flight of Initial packets.
+
+17.2.2.1. Abandoning Initial Packets
+
+ A client stops both sending and processing Initial packets when it
+ sends its first Handshake packet. A server stops sending and
+ processing Initial packets when it receives its first Handshake
+ packet. Though packets might still be in flight or awaiting
+ acknowledgment, no further Initial packets need to be exchanged
+ beyond this point. Initial packet protection keys are discarded (see
+ Section 4.9.1 of [QUIC-TLS]) along with any loss recovery and
+ congestion control state; see Section 6.4 of [QUIC-RECOVERY].
+
+ Any data in CRYPTO frames is discarded -- and no longer retransmitted
+ -- when Initial keys are discarded.
+
+17.2.3. 0-RTT
+
+ A 0-RTT packet uses long headers with a type value of 0x01, followed
+ by the Length and Packet Number fields; see Section 17.2. The first
+ byte contains the Reserved and Packet Number Length bits; see
+ Section 17.2. A 0-RTT packet is used to carry "early" data from the
+ client to the server as part of the first flight, prior to handshake
+ completion. As part of the TLS handshake, the server can accept or
+ reject this early data.
+
+ See Section 2.3 of [TLS13] for a discussion of 0-RTT data and its
+ limitations.
+
+ 0-RTT Packet {
+ Header Form (1) = 1,
+ Fixed Bit (1) = 1,
+ Long Packet Type (2) = 1,
+ Reserved Bits (2),
+ Packet Number Length (2),
+ Version (32),
+ Destination Connection ID Length (8),
+ Destination Connection ID (0..160),
+ Source Connection ID Length (8),
+ Source Connection ID (0..160),
+ Length (i),
+ Packet Number (8..32),
+ Packet Payload (8..),
+ }
+
+ Figure 16: 0-RTT Packet
+
+ Packet numbers for 0-RTT protected packets use the same space as
+ 1-RTT protected packets.
+
+ After a client receives a Retry packet, 0-RTT packets are likely to
+ have been lost or discarded by the server. A client SHOULD attempt
+ to resend data in 0-RTT packets after it sends a new Initial packet.
+ New packet numbers MUST be used for any new packets that are sent; as
+ described in Section 17.2.5.3, reusing packet numbers could
+ compromise packet protection.
+
+ A client only receives acknowledgments for its 0-RTT packets once the
+ handshake is complete, as defined in Section 4.1.1 of [QUIC-TLS].
+
+ A client MUST NOT send 0-RTT packets once it starts processing 1-RTT
+ packets from the server. This means that 0-RTT packets cannot
+ contain any response to frames from 1-RTT packets. For instance, a
+ client cannot send an ACK frame in a 0-RTT packet, because that can
+ only acknowledge a 1-RTT packet. An acknowledgment for a 1-RTT
+ packet MUST be carried in a 1-RTT packet.
+
+ A server SHOULD treat a violation of remembered limits
+ (Section 7.4.1) as a connection error of an appropriate type (for
+ instance, a FLOW_CONTROL_ERROR for exceeding stream data limits).
+
+17.2.4. Handshake Packet
+
+ A Handshake packet uses long headers with a type value of 0x02,
+ followed by the Length and Packet Number fields; see Section 17.2.
+ The first byte contains the Reserved and Packet Number Length bits;
+ see Section 17.2. It is used to carry cryptographic handshake
+ messages and acknowledgments from the server and client.
+
+ Handshake Packet {
+ Header Form (1) = 1,
+ Fixed Bit (1) = 1,
+ Long Packet Type (2) = 2,
+ Reserved Bits (2),
+ Packet Number Length (2),
+ Version (32),
+ Destination Connection ID Length (8),
+ Destination Connection ID (0..160),
+ Source Connection ID Length (8),
+ Source Connection ID (0..160),
+ Length (i),
+ Packet Number (8..32),
+ Packet Payload (8..),
+ }
+
+ Figure 17: Handshake Protected Packet
+
+ Once a client has received a Handshake packet from a server, it uses
+ Handshake packets to send subsequent cryptographic handshake messages
+ and acknowledgments to the server.
+
+ The Destination Connection ID field in a Handshake packet contains a
+ connection ID that is chosen by the recipient of the packet; the
+ Source Connection ID includes the connection ID that the sender of
+ the packet wishes to use; see Section 7.2.
+
+ Handshake packets have their own packet number space, and thus the
+ first Handshake packet sent by a server contains a packet number of
+ 0.
+
+ The payload of this packet contains CRYPTO frames and could contain
+ PING, PADDING, or ACK frames. Handshake packets MAY contain
+ CONNECTION_CLOSE frames of type 0x1c. Endpoints MUST treat receipt
+ of Handshake packets with other frames as a connection error of type
+ PROTOCOL_VIOLATION.
+
+ Like Initial packets (see Section 17.2.2.1), data in CRYPTO frames
+ for Handshake packets is discarded -- and no longer retransmitted --
+ when Handshake protection keys are discarded.
+
+17.2.5. Retry Packet
+
+ As shown in Figure 18, a Retry packet uses a long packet header with
+ a type value of 0x03. It carries an address validation token created
+ by the server. It is used by a server that wishes to perform a
+ retry; see Section 8.1.
+
+ Retry Packet {
+ Header Form (1) = 1,
+ Fixed Bit (1) = 1,
+ Long Packet Type (2) = 3,
+ Unused (4),
+ Version (32),
+ Destination Connection ID Length (8),
+ Destination Connection ID (0..160),
+ Source Connection ID Length (8),
+ Source Connection ID (0..160),
+ Retry Token (..),
+ Retry Integrity Tag (128),
+ }
+
+ Figure 18: Retry Packet
+
+ A Retry packet does not contain any protected fields. The value in
+ the Unused field is set to an arbitrary value by the server; a client
+ MUST ignore these bits. In addition to the fields from the long
+ header, it contains these additional fields:
+
+ Retry Token: An opaque token that the server can use to validate the
+ client's address.
+
+ Retry Integrity Tag: Defined in Section 5.8 ("Retry Packet
+ Integrity") of [QUIC-TLS].
+
+17.2.5.1. Sending a Retry Packet
+
+ The server populates the Destination Connection ID with the
+ connection ID that the client included in the Source Connection ID of
+ the Initial packet.
+
+ The server includes a connection ID of its choice in the Source
+ Connection ID field. This value MUST NOT be equal to the Destination
+ Connection ID field of the packet sent by the client. A client MUST
+ discard a Retry packet that contains a Source Connection ID field
+ that is identical to the Destination Connection ID field of its
+ Initial packet. The client MUST use the value from the Source
+ Connection ID field of the Retry packet in the Destination Connection
+ ID field of subsequent packets that it sends.
+
+ A server MAY send Retry packets in response to Initial and 0-RTT
+ packets. A server can either discard or buffer 0-RTT packets that it
+ receives. A server can send multiple Retry packets as it receives
+ Initial or 0-RTT packets. A server MUST NOT send more than one Retry
+ packet in response to a single UDP datagram.
+
+17.2.5.2. Handling a Retry Packet
+
+ A client MUST accept and process at most one Retry packet for each
+ connection attempt. After the client has received and processed an
+ Initial or Retry packet from the server, it MUST discard any
+ subsequent Retry packets that it receives.
+
+ Clients MUST discard Retry packets that have a Retry Integrity Tag
+ that cannot be validated; see Section 5.8 of [QUIC-TLS]. This
+ diminishes an attacker's ability to inject a Retry packet and
+ protects against accidental corruption of Retry packets. A client
+ MUST discard a Retry packet with a zero-length Retry Token field.
+
+ The client responds to a Retry packet with an Initial packet that
+ includes the provided Retry token to continue connection
+ establishment.
+
+ A client sets the Destination Connection ID field of this Initial
+ packet to the value from the Source Connection ID field in the Retry
+ packet. Changing the Destination Connection ID field also results in
+ a change to the keys used to protect the Initial packet. It also
+ sets the Token field to the token provided in the Retry packet. The
+ client MUST NOT change the Source Connection ID because the server
+ could include the connection ID as part of its token validation
+ logic; see Section 8.1.4.
+
+ A Retry packet does not include a packet number and cannot be
+ explicitly acknowledged by a client.
+
+17.2.5.3. Continuing a Handshake after Retry
+
+ Subsequent Initial packets from the client include the connection ID
+ and token values from the Retry packet. The client copies the Source
+ Connection ID field from the Retry packet to the Destination
+ Connection ID field and uses this value until an Initial packet with
+ an updated value is received; see Section 7.2. The value of the
+ Token field is copied to all subsequent Initial packets; see
+ Section 8.1.2.
+
+ Other than updating the Destination Connection ID and Token fields,
+ the Initial packet sent by the client is subject to the same
+ restrictions as the first Initial packet. A client MUST use the same
+ cryptographic handshake message it included in this packet. A server
+ MAY treat a packet that contains a different cryptographic handshake
+ message as a connection error or discard it. Note that including a
+ Token field reduces the available space for the cryptographic
+ handshake message, which might result in the client needing to send
+ multiple Initial packets.
+
+ A client MAY attempt 0-RTT after receiving a Retry packet by sending
+ 0-RTT packets to the connection ID provided by the server.
+
+ A client MUST NOT reset the packet number for any packet number space
+ after processing a Retry packet. In particular, 0-RTT packets
+ contain confidential information that will most likely be
+ retransmitted on receiving a Retry packet. The keys used to protect
+ these new 0-RTT packets will not change as a result of responding to
+ a Retry packet. However, the data sent in these packets could be
+ different than what was sent earlier. Sending these new packets with
+ the same packet number is likely to compromise the packet protection
+ for those packets because the same key and nonce could be used to
+ protect different content. A server MAY abort the connection if it
+ detects that the client reset the packet number.
+
+ The connection IDs used in Initial and Retry packets exchanged
+ between client and server are copied to the transport parameters and
+ validated as described in Section 7.3.
+
+17.3. Short Header Packets
+
+ This version of QUIC defines a single packet type that uses the short
+ packet header.
+
+17.3.1. 1-RTT Packet
+
+ A 1-RTT packet uses a short packet header. It is used after the
+ version and 1-RTT keys are negotiated.
+
+ 1-RTT Packet {
+ Header Form (1) = 0,
+ Fixed Bit (1) = 1,
+ Spin Bit (1),
+ Reserved Bits (2),
+ Key Phase (1),
+ Packet Number Length (2),
+ Destination Connection ID (0..160),
+ Packet Number (8..32),
+ Packet Payload (8..),
+ }
+
+ Figure 19: 1-RTT Packet
+
+ 1-RTT packets contain the following fields:
+
+ Header Form: The most significant bit (0x80) of byte 0 is set to 0
+ for the short header.
+
+ Fixed Bit: The next bit (0x40) of byte 0 is set to 1. Packets
+ containing a zero value for this bit are not valid packets in this
+ version and MUST be discarded. A value of 1 for this bit allows
+ QUIC to coexist with other protocols; see [RFC7983].
+
+ Spin Bit: The third most significant bit (0x20) of byte 0 is the
+ latency spin bit, set as described in Section 17.4.
+
+ Reserved Bits: The next two bits (those with a mask of 0x18) of byte
+ 0 are reserved. These bits are protected using header protection;
+ see Section 5.4 of [QUIC-TLS]. The value included prior to
+ protection MUST be set to 0. An endpoint MUST treat receipt of a
+ packet that has a non-zero value for these bits, after removing
+ both packet and header protection, as a connection error of type
+ PROTOCOL_VIOLATION. Discarding such a packet after only removing
+ header protection can expose the endpoint to attacks; see
+ Section 9.5 of [QUIC-TLS].
+
+ Key Phase: The next bit (0x04) of byte 0 indicates the key phase,
+ which allows a recipient of a packet to identify the packet
+ protection keys that are used to protect the packet. See
+ [QUIC-TLS] for details. This bit is protected using header
+ protection; see Section 5.4 of [QUIC-TLS].
+
+ Packet Number Length: The least significant two bits (those with a
+ mask of 0x03) of byte 0 contain the length of the Packet Number
+ field, encoded as an unsigned two-bit integer that is one less
+ than the length of the Packet Number field in bytes. That is, the
+ length of the Packet Number field is the value of this field plus
+ one. These bits are protected using header protection; see
+ Section 5.4 of [QUIC-TLS].
+
+ Destination Connection ID: The Destination Connection ID is a
+ connection ID that is chosen by the intended recipient of the
+ packet. See Section 5.1 for more details.
+
+ Packet Number: The Packet Number field is 1 to 4 bytes long. The
+ packet number is protected using header protection; see
+ Section 5.4 of [QUIC-TLS]. The length of the Packet Number field
+ is encoded in Packet Number Length field. See Section 17.1 for
+ details.
+
+ Packet Payload: 1-RTT packets always include a 1-RTT protected
+ payload.
+
+ The header form bit and the Destination Connection ID field of a
+ short header packet are version independent. The remaining fields
+ are specific to the selected QUIC version. See [QUIC-INVARIANTS] for
+ details on how packets from different versions of QUIC are
+ interpreted.
+
+17.4. Latency Spin Bit
+
+ The latency spin bit, which is defined for 1-RTT packets
+ (Section 17.3.1), enables passive latency monitoring from observation
+ points on the network path throughout the duration of a connection.
+ The server reflects the spin value received, while the client "spins"
+ it after one RTT. On-path observers can measure the time between two
+ spin bit toggle events to estimate the end-to-end RTT of a
+ connection.
+
+ The spin bit is only present in 1-RTT packets, since it is possible
+ to measure the initial RTT of a connection by observing the
+ handshake. Therefore, the spin bit is available after version
+ negotiation and connection establishment are completed. On-path
+ measurement and use of the latency spin bit are further discussed in
+ [QUIC-MANAGEABILITY].
+
+ The spin bit is an OPTIONAL feature of this version of QUIC. An
+ endpoint that does not support this feature MUST disable it, as
+ defined below.
+
+ Each endpoint unilaterally decides if the spin bit is enabled or
+ disabled for a connection. Implementations MUST allow administrators
+ of clients and servers to disable the spin bit either globally or on
+ a per-connection basis. Even when the spin bit is not disabled by
+ the administrator, endpoints MUST disable their use of the spin bit
+ for a random selection of at least one in every 16 network paths, or
+ for one in every 16 connection IDs, in order to ensure that QUIC
+ connections that disable the spin bit are commonly observed on the
+ network. As each endpoint disables the spin bit independently, this
+ ensures that the spin bit signal is disabled on approximately one in
+ eight network paths.
+
+ When the spin bit is disabled, endpoints MAY set the spin bit to any
+ value and MUST ignore any incoming value. It is RECOMMENDED that
+ endpoints set the spin bit to a random value either chosen
+ independently for each packet or chosen independently for each
+ connection ID.
+
+ If the spin bit is enabled for the connection, the endpoint maintains
+ a spin value for each network path and sets the spin bit in the
+ packet header to the currently stored value when a 1-RTT packet is
+ sent on that path. The spin value is initialized to 0 in the
+ endpoint for each network path. Each endpoint also remembers the
+ highest packet number seen from its peer on each path.
+
+ When a server receives a 1-RTT packet that increases the highest
+ packet number seen by the server from the client on a given network
+ path, it sets the spin value for that path to be equal to the spin
+ bit in the received packet.
+
+ When a client receives a 1-RTT packet that increases the highest
+ packet number seen by the client from the server on a given network
+ path, it sets the spin value for that path to the inverse of the spin
+ bit in the received packet.
+
+ An endpoint resets the spin value for a network path to 0 when
+ changing the connection ID being used on that network path.
+
+18. Transport Parameter Encoding
+
+ The extension_data field of the quic_transport_parameters extension
+ defined in [QUIC-TLS] contains the QUIC transport parameters. They
+ are encoded as a sequence of transport parameters, as shown in
+ Figure 20:
+
+ Transport Parameters {
+ Transport Parameter (..) ...,
+ }
+
+ Figure 20: Sequence of Transport Parameters
+
+ Each transport parameter is encoded as an (identifier, length, value)
+ tuple, as shown in Figure 21:
+
+ Transport Parameter {
+ Transport Parameter ID (i),
+ Transport Parameter Length (i),
+ Transport Parameter Value (..),
+ }
+
+ Figure 21: Transport Parameter Encoding
+
+ The Transport Parameter Length field contains the length of the
+ Transport Parameter Value field in bytes.
+
+ QUIC encodes transport parameters into a sequence of bytes, which is
+ then included in the cryptographic handshake.
+
+18.1. Reserved Transport Parameters
+
+ Transport parameters with an identifier of the form "31 * N + 27" for
+ integer values of N are reserved to exercise the requirement that
+ unknown transport parameters be ignored. These transport parameters
+ have no semantics and can carry arbitrary values.
+
+18.2. Transport Parameter Definitions
+
+ This section details the transport parameters defined in this
+ document.
+
+ Many transport parameters listed here have integer values. Those
+ transport parameters that are identified as integers use a variable-
+ length integer encoding; see Section 16. Transport parameters have a
+ default value of 0 if the transport parameter is absent, unless
+ otherwise stated.
+
+ The following transport parameters are defined:
+
+ original_destination_connection_id (0x00): This parameter is the
+ value of the Destination Connection ID field from the first
+ Initial packet sent by the client; see Section 7.3. This
+ transport parameter is only sent by a server.
+
+ max_idle_timeout (0x01): The maximum idle timeout is a value in
+ milliseconds that is encoded as an integer; see (Section 10.1).
+ Idle timeout is disabled when both endpoints omit this transport
+ parameter or specify a value of 0.
+
+ stateless_reset_token (0x02): A stateless reset token is used in
+ verifying a stateless reset; see Section 10.3. This parameter is
+ a sequence of 16 bytes. This transport parameter MUST NOT be sent
+ by a client but MAY be sent by a server. A server that does not
+ send this transport parameter cannot use stateless reset
+ (Section 10.3) for the connection ID negotiated during the
+ handshake.
+
+ max_udp_payload_size (0x03): The maximum UDP payload size parameter
+ is an integer value that limits the size of UDP payloads that the
+ endpoint is willing to receive. UDP datagrams with payloads
+ larger than this limit are not likely to be processed by the
+ receiver.
+
+ The default for this parameter is the maximum permitted UDP
+ payload of 65527. Values below 1200 are invalid.
+
+ This limit does act as an additional constraint on datagram size
+ in the same way as the path MTU, but it is a property of the
+ endpoint and not the path; see Section 14. It is expected that
+ this is the space an endpoint dedicates to holding incoming
+ packets.
+
+ initial_max_data (0x04): The initial maximum data parameter is an
+ integer value that contains the initial value for the maximum
+ amount of data that can be sent on the connection. This is
+ equivalent to sending a MAX_DATA (Section 19.9) for the connection
+ immediately after completing the handshake.
+
+ initial_max_stream_data_bidi_local (0x05): This parameter is an
+ integer value specifying the initial flow control limit for
+ locally initiated bidirectional streams. This limit applies to
+ newly created bidirectional streams opened by the endpoint that
+ sends the transport parameter. In client transport parameters,
+ this applies to streams with an identifier with the least
+ significant two bits set to 0x00; in server transport parameters,
+ this applies to streams with the least significant two bits set to
+ 0x01.
+
+ initial_max_stream_data_bidi_remote (0x06): This parameter is an
+ integer value specifying the initial flow control limit for peer-
+ initiated bidirectional streams. This limit applies to newly
+ created bidirectional streams opened by the endpoint that receives
+ the transport parameter. In client transport parameters, this
+ applies to streams with an identifier with the least significant
+ two bits set to 0x01; in server transport parameters, this applies
+ to streams with the least significant two bits set to 0x00.
+
+ initial_max_stream_data_uni (0x07): This parameter is an integer
+ value specifying the initial flow control limit for unidirectional
+ streams. This limit applies to newly created unidirectional
+ streams opened by the endpoint that receives the transport
+ parameter. In client transport parameters, this applies to
+ streams with an identifier with the least significant two bits set
+ to 0x03; in server transport parameters, this applies to streams
+ with the least significant two bits set to 0x02.
+
+ initial_max_streams_bidi (0x08): The initial maximum bidirectional
+ streams parameter is an integer value that contains the initial
+ maximum number of bidirectional streams the endpoint that receives
+ this transport parameter is permitted to initiate. If this
+ parameter is absent or zero, the peer cannot open bidirectional
+ streams until a MAX_STREAMS frame is sent. Setting this parameter
+ is equivalent to sending a MAX_STREAMS (Section 19.11) of the
+ corresponding type with the same value.
+
+ initial_max_streams_uni (0x09): The initial maximum unidirectional
+ streams parameter is an integer value that contains the initial
+ maximum number of unidirectional streams the endpoint that
+ receives this transport parameter is permitted to initiate. If
+ this parameter is absent or zero, the peer cannot open
+ unidirectional streams until a MAX_STREAMS frame is sent. Setting
+ this parameter is equivalent to sending a MAX_STREAMS
+ (Section 19.11) of the corresponding type with the same value.
+
+ ack_delay_exponent (0x0a): The acknowledgment delay exponent is an
+ integer value indicating an exponent used to decode the ACK Delay
+ field in the ACK frame (Section 19.3). If this value is absent, a
+ default value of 3 is assumed (indicating a multiplier of 8).
+ Values above 20 are invalid.
+
+ max_ack_delay (0x0b): The maximum acknowledgment delay is an integer
+ value indicating the maximum amount of time in milliseconds by
+ which the endpoint will delay sending acknowledgments. This value
+ SHOULD include the receiver's expected delays in alarms firing.
+ For example, if a receiver sets a timer for 5ms and alarms
+ commonly fire up to 1ms late, then it should send a max_ack_delay
+ of 6ms. If this value is absent, a default of 25 milliseconds is
+ assumed. Values of 2^14 or greater are invalid.
+
+ disable_active_migration (0x0c): The disable active migration
+ transport parameter is included if the endpoint does not support
+ active connection migration (Section 9) on the address being used
+ during the handshake. An endpoint that receives this transport
+ parameter MUST NOT use a new local address when sending to the
+ address that the peer used during the handshake. This transport
+ parameter does not prohibit connection migration after a client
+ has acted on a preferred_address transport parameter. This
+ parameter is a zero-length value.
+
+ preferred_address (0x0d): The server's preferred address is used to
+ effect a change in server address at the end of the handshake, as
+ described in Section 9.6. This transport parameter is only sent
+ by a server. Servers MAY choose to only send a preferred address
+ of one address family by sending an all-zero address and port
+ (0.0.0.0:0 or [::]:0) for the other family. IP addresses are
+ encoded in network byte order.
+
+ The preferred_address transport parameter contains an address and
+ port for both IPv4 and IPv6. The four-byte IPv4 Address field is
+ followed by the associated two-byte IPv4 Port field. This is
+ followed by a 16-byte IPv6 Address field and two-byte IPv6 Port
+ field. After address and port pairs, a Connection ID Length field
+ describes the length of the following Connection ID field.
+ Finally, a 16-byte Stateless Reset Token field includes the
+ stateless reset token associated with the connection ID. The
+ format of this transport parameter is shown in Figure 22 below.
+
+ The Connection ID field and the Stateless Reset Token field
+ contain an alternative connection ID that has a sequence number of
+ 1; see Section 5.1.1. Having these values sent alongside the
+ preferred address ensures that there will be at least one unused
+ active connection ID when the client initiates migration to the
+ preferred address.
+
+ The Connection ID and Stateless Reset Token fields of a preferred
+ address are identical in syntax and semantics to the corresponding
+ fields of a NEW_CONNECTION_ID frame (Section 19.15). A server
+ that chooses a zero-length connection ID MUST NOT provide a
+ preferred address. Similarly, a server MUST NOT include a zero-
+ length connection ID in this transport parameter. A client MUST
+ treat a violation of these requirements as a connection error of
+ type TRANSPORT_PARAMETER_ERROR.
+
+ Preferred Address {
+ IPv4 Address (32),
+ IPv4 Port (16),
+ IPv6 Address (128),
+ IPv6 Port (16),
+ Connection ID Length (8),
+ Connection ID (..),
+ Stateless Reset Token (128),
+ }
+
+ Figure 22: Preferred Address Format
+
+ active_connection_id_limit (0x0e): This is an integer value
+ specifying the maximum number of connection IDs from the peer that
+ an endpoint is willing to store. This value includes the
+ connection ID received during the handshake, that received in the
+ preferred_address transport parameter, and those received in
+ NEW_CONNECTION_ID frames. The value of the
+ active_connection_id_limit parameter MUST be at least 2. An
+ endpoint that receives a value less than 2 MUST close the
+ connection with an error of type TRANSPORT_PARAMETER_ERROR. If
+ this transport parameter is absent, a default of 2 is assumed. If
+ an endpoint issues a zero-length connection ID, it will never send
+ a NEW_CONNECTION_ID frame and therefore ignores the
+ active_connection_id_limit value received from its peer.
+
+ initial_source_connection_id (0x0f): This is the value that the
+ endpoint included in the Source Connection ID field of the first
+ Initial packet it sends for the connection; see Section 7.3.
+
+ retry_source_connection_id (0x10): This is the value that the server
+ included in the Source Connection ID field of a Retry packet; see
+ Section 7.3. This transport parameter is only sent by a server.
+
+ If present, transport parameters that set initial per-stream flow
+ control limits (initial_max_stream_data_bidi_local,
+ initial_max_stream_data_bidi_remote, and initial_max_stream_data_uni)
+ are equivalent to sending a MAX_STREAM_DATA frame (Section 19.10) on
+ every stream of the corresponding type immediately after opening. If
+ the transport parameter is absent, streams of that type start with a
+ flow control limit of 0.
+
+ A client MUST NOT include any server-only transport parameter:
+ original_destination_connection_id, preferred_address,
+ retry_source_connection_id, or stateless_reset_token. A server MUST
+ treat receipt of any of these transport parameters as a connection
+ error of type TRANSPORT_PARAMETER_ERROR.
+
+19. Frame Types and Formats
+
+ As described in Section 12.4, packets contain one or more frames.
+ This section describes the format and semantics of the core QUIC
+ frame types.
+
+19.1. PADDING Frames
+
+ A PADDING frame (type=0x00) has no semantic value. PADDING frames
+ can be used to increase the size of a packet. Padding can be used to
+ increase an Initial packet to the minimum required size or to provide
+ protection against traffic analysis for protected packets.
+
+ PADDING frames are formatted as shown in Figure 23, which shows that
+ PADDING frames have no content. That is, a PADDING frame consists of
+ the single byte that identifies the frame as a PADDING frame.
+
+ PADDING Frame {
+ Type (i) = 0x00,
+ }
+
+ Figure 23: PADDING Frame Format
+
+19.2. PING Frames
+
+ Endpoints can use PING frames (type=0x01) to verify that their peers
+ are still alive or to check reachability to the peer.
+
+ PING frames are formatted as shown in Figure 24, which shows that
+ PING frames have no content.
+
+ PING Frame {
+ Type (i) = 0x01,
+ }
+
+ Figure 24: PING Frame Format
+
+ The receiver of a PING frame simply needs to acknowledge the packet
+ containing this frame.
+
+ The PING frame can be used to keep a connection alive when an
+ application or application protocol wishes to prevent the connection
+ from timing out; see Section 10.1.2.
+
+19.3. ACK Frames
+
+ Receivers send ACK frames (types 0x02 and 0x03) to inform senders of
+ packets they have received and processed. The ACK frame contains one
+ or more ACK Ranges. ACK Ranges identify acknowledged packets. If
+ the frame type is 0x03, ACK frames also contain the cumulative count
+ of QUIC packets with associated ECN marks received on the connection
+ up until this point. QUIC implementations MUST properly handle both
+ types, and, if they have enabled ECN for packets they send, they
+ SHOULD use the information in the ECN section to manage their
+ congestion state.
+
+ QUIC acknowledgments are irrevocable. Once acknowledged, a packet
+ remains acknowledged, even if it does not appear in a future ACK
+ frame. This is unlike reneging for TCP Selective Acknowledgments
+ (SACKs) [RFC2018].
+
+ Packets from different packet number spaces can be identified using
+ the same numeric value. An acknowledgment for a packet needs to
+ indicate both a packet number and a packet number space. This is
+ accomplished by having each ACK frame only acknowledge packet numbers
+ in the same space as the packet in which the ACK frame is contained.
+
+ Version Negotiation and Retry packets cannot be acknowledged because
+ they do not contain a packet number. Rather than relying on ACK
+ frames, these packets are implicitly acknowledged by the next Initial
+ packet sent by the client.
+
+ ACK frames are formatted as shown in Figure 25.
+
+ ACK Frame {
+ Type (i) = 0x02..0x03,
+ Largest Acknowledged (i),
+ ACK Delay (i),
+ ACK Range Count (i),
+ First ACK Range (i),
+ ACK Range (..) ...,
+ [ECN Counts (..)],
+ }
+
+ Figure 25: ACK Frame Format
+
+ ACK frames contain the following fields:
+
+ Largest Acknowledged: A variable-length integer representing the
+ largest packet number the peer is acknowledging; this is usually
+ the largest packet number that the peer has received prior to
+ generating the ACK frame. Unlike the packet number in the QUIC
+ long or short header, the value in an ACK frame is not truncated.
+
+ ACK Delay: A variable-length integer encoding the acknowledgment
+ delay in microseconds; see Section 13.2.5. It is decoded by
+ multiplying the value in the field by 2 to the power of the
+ ack_delay_exponent transport parameter sent by the sender of the
+ ACK frame; see Section 18.2. Compared to simply expressing the
+ delay as an integer, this encoding allows for a larger range of
+ values within the same number of bytes, at the cost of lower
+ resolution.
+
+ ACK Range Count: A variable-length integer specifying the number of
+ ACK Range fields in the frame.
+
+ First ACK Range: A variable-length integer indicating the number of
+ contiguous packets preceding the Largest Acknowledged that are
+ being acknowledged. That is, the smallest packet acknowledged in
+ the range is determined by subtracting the First ACK Range value
+ from the Largest Acknowledged field.
+
+ ACK Ranges: Contains additional ranges of packets that are
+ alternately not acknowledged (Gap) and acknowledged (ACK Range);
+ see Section 19.3.1.
+
+ ECN Counts: The three ECN counts; see Section 19.3.2.
+
+19.3.1. ACK Ranges
+
+ Each ACK Range consists of alternating Gap and ACK Range Length
+ values in descending packet number order. ACK Ranges can be
+ repeated. The number of Gap and ACK Range Length values is
+ determined by the ACK Range Count field; one of each value is present
+ for each value in the ACK Range Count field.
+
+ ACK Ranges are structured as shown in Figure 26.
+
+ ACK Range {
+ Gap (i),
+ ACK Range Length (i),
+ }
+
+ Figure 26: ACK Ranges
+
+ The fields that form each ACK Range are:
+
+ Gap: A variable-length integer indicating the number of contiguous
+ unacknowledged packets preceding the packet number one lower than
+ the smallest in the preceding ACK Range.
+
+ ACK Range Length: A variable-length integer indicating the number of
+ contiguous acknowledged packets preceding the largest packet
+ number, as determined by the preceding Gap.
+
+ Gap and ACK Range Length values use a relative integer encoding for
+ efficiency. Though each encoded value is positive, the values are
+ subtracted, so that each ACK Range describes progressively lower-
+ numbered packets.
+
+ Each ACK Range acknowledges a contiguous range of packets by
+ indicating the number of acknowledged packets that precede the
+ largest packet number in that range. A value of 0 indicates that
+ only the largest packet number is acknowledged. Larger ACK Range
+ values indicate a larger range, with corresponding lower values for
+ the smallest packet number in the range. Thus, given a largest
+ packet number for the range, the smallest value is determined by the
+ following formula:
+
+ smallest = largest - ack_range
+
+ An ACK Range acknowledges all packets between the smallest packet
+ number and the largest, inclusive.
+
+ The largest value for an ACK Range is determined by cumulatively
+ subtracting the size of all preceding ACK Range Lengths and Gaps.
+
+ Each Gap indicates a range of packets that are not being
+ acknowledged. The number of packets in the gap is one higher than
+ the encoded value of the Gap field.
+
+ The value of the Gap field establishes the largest packet number
+ value for the subsequent ACK Range using the following formula:
+
+ largest = previous_smallest - gap - 2
+
+ If any computed packet number is negative, an endpoint MUST generate
+ a connection error of type FRAME_ENCODING_ERROR.
+
+19.3.2. ECN Counts
+
+ The ACK frame uses the least significant bit of the type value (that
+ is, type 0x03) to indicate ECN feedback and report receipt of QUIC
+ packets with associated ECN codepoints of ECT(0), ECT(1), or ECN-CE
+ in the packet's IP header. ECN counts are only present when the ACK
+ frame type is 0x03.
+
+ When present, there are three ECN counts, as shown in Figure 27.
+
+ ECN Counts {
+ ECT0 Count (i),
+ ECT1 Count (i),
+ ECN-CE Count (i),
+ }
+
+ Figure 27: ECN Count Format
+
+ The ECN count fields are:
+
+ ECT0 Count: A variable-length integer representing the total number
+ of packets received with the ECT(0) codepoint in the packet number
+ space of the ACK frame.
+
+ ECT1 Count: A variable-length integer representing the total number
+ of packets received with the ECT(1) codepoint in the packet number
+ space of the ACK frame.
+
+ ECN-CE Count: A variable-length integer representing the total
+ number of packets received with the ECN-CE codepoint in the packet
+ number space of the ACK frame.
+
+ ECN counts are maintained separately for each packet number space.
+
+19.4. RESET_STREAM Frames
+
+ An endpoint uses a RESET_STREAM frame (type=0x04) to abruptly
+ terminate the sending part of a stream.
+
+ After sending a RESET_STREAM, an endpoint ceases transmission and
+ retransmission of STREAM frames on the identified stream. A receiver
+ of RESET_STREAM can discard any data that it already received on that
+ stream.
+
+ An endpoint that receives a RESET_STREAM frame for a send-only stream
+ MUST terminate the connection with error STREAM_STATE_ERROR.
+
+ RESET_STREAM frames are formatted as shown in Figure 28.
+
+ RESET_STREAM Frame {
+ Type (i) = 0x04,
+ Stream ID (i),
+ Application Protocol Error Code (i),
+ Final Size (i),
+ }
+
+ Figure 28: RESET_STREAM Frame Format
+
+ RESET_STREAM frames contain the following fields:
+
+ Stream ID: A variable-length integer encoding of the stream ID of
+ the stream being terminated.
+
+ Application Protocol Error Code: A variable-length integer
+ containing the application protocol error code (see Section 20.2)
+ that indicates why the stream is being closed.
+
+ Final Size: A variable-length integer indicating the final size of
+ the stream by the RESET_STREAM sender, in units of bytes; see
+ Section 4.5.
+
+19.5. STOP_SENDING Frames
+
+ An endpoint uses a STOP_SENDING frame (type=0x05) to communicate that
+ incoming data is being discarded on receipt per application request.
+ STOP_SENDING requests that a peer cease transmission on a stream.
+
+ A STOP_SENDING frame can be sent for streams in the "Recv" or "Size
+ Known" states; see Section 3.2. Receiving a STOP_SENDING frame for a
+ locally initiated stream that has not yet been created MUST be
+ treated as a connection error of type STREAM_STATE_ERROR. An
+ endpoint that receives a STOP_SENDING frame for a receive-only stream
+ MUST terminate the connection with error STREAM_STATE_ERROR.
+
+ STOP_SENDING frames are formatted as shown in Figure 29.
+
+ STOP_SENDING Frame {
+ Type (i) = 0x05,
+ Stream ID (i),
+ Application Protocol Error Code (i),
+ }
+
+ Figure 29: STOP_SENDING Frame Format
+
+ STOP_SENDING frames contain the following fields:
+
+ Stream ID: A variable-length integer carrying the stream ID of the
+ stream being ignored.
+
+ Application Protocol Error Code: A variable-length integer
+ containing the application-specified reason the sender is ignoring
+ the stream; see Section 20.2.
+
+19.6. CRYPTO Frames
+
+ A CRYPTO frame (type=0x06) is used to transmit cryptographic
+ handshake messages. It can be sent in all packet types except 0-RTT.
+ The CRYPTO frame offers the cryptographic protocol an in-order stream
+ of bytes. CRYPTO frames are functionally identical to STREAM frames,
+ except that they do not bear a stream identifier; they are not flow
+ controlled; and they do not carry markers for optional offset,
+ optional length, and the end of the stream.
+
+ CRYPTO frames are formatted as shown in Figure 30.
+
+ CRYPTO Frame {
+ Type (i) = 0x06,
+ Offset (i),
+ Length (i),
+ Crypto Data (..),
+ }
+
+ Figure 30: CRYPTO Frame Format
+
+ CRYPTO frames contain the following fields:
+
+ Offset: A variable-length integer specifying the byte offset in the
+ stream for the data in this CRYPTO frame.
+
+ Length: A variable-length integer specifying the length of the
+ Crypto Data field in this CRYPTO frame.
+
+ Crypto Data: The cryptographic message data.
+
+ There is a separate flow of cryptographic handshake data in each
+ encryption level, each of which starts at an offset of 0. This
+ implies that each encryption level is treated as a separate CRYPTO
+ stream of data.
+
+ The largest offset delivered on a stream -- the sum of the offset and
+ data length -- cannot exceed 2^62-1. Receipt of a frame that exceeds
+ this limit MUST be treated as a connection error of type
+ FRAME_ENCODING_ERROR or CRYPTO_BUFFER_EXCEEDED.
+
+ Unlike STREAM frames, which include a stream ID indicating to which
+ stream the data belongs, the CRYPTO frame carries data for a single
+ stream per encryption level. The stream does not have an explicit
+ end, so CRYPTO frames do not have a FIN bit.
+
+19.7. NEW_TOKEN Frames
+
+ A server sends a NEW_TOKEN frame (type=0x07) to provide the client
+ with a token to send in the header of an Initial packet for a future
+ connection.
+
+ NEW_TOKEN frames are formatted as shown in Figure 31.
+
+ NEW_TOKEN Frame {
+ Type (i) = 0x07,
+ Token Length (i),
+ Token (..),
+ }
+
+ Figure 31: NEW_TOKEN Frame Format
+
+ NEW_TOKEN frames contain the following fields:
+
+ Token Length: A variable-length integer specifying the length of the
+ token in bytes.
+
+ Token: An opaque blob that the client can use with a future Initial
+ packet. The token MUST NOT be empty. A client MUST treat receipt
+ of a NEW_TOKEN frame with an empty Token field as a connection
+ error of type FRAME_ENCODING_ERROR.
+
+ A client might receive multiple NEW_TOKEN frames that contain the
+ same token value if packets containing the frame are incorrectly
+ determined to be lost. Clients are responsible for discarding
+ duplicate values, which might be used to link connection attempts;
+ see Section 8.1.3.
+
+ Clients MUST NOT send NEW_TOKEN frames. A server MUST treat receipt
+ of a NEW_TOKEN frame as a connection error of type
+ PROTOCOL_VIOLATION.
+
+19.8. STREAM Frames
+
+ STREAM frames implicitly create a stream and carry stream data. The
+ Type field in the STREAM frame takes the form 0b00001XXX (or the set
+ of values from 0x08 to 0x0f). The three low-order bits of the frame
+ type determine the fields that are present in the frame:
+
+ * The OFF bit (0x04) in the frame type is set to indicate that there
+ is an Offset field present. When set to 1, the Offset field is
+ present. When set to 0, the Offset field is absent and the Stream
+ Data starts at an offset of 0 (that is, the frame contains the
+ first bytes of the stream, or the end of a stream that includes no
+ data).
+
+ * The LEN bit (0x02) in the frame type is set to indicate that there
+ is a Length field present. If this bit is set to 0, the Length
+ field is absent and the Stream Data field extends to the end of
+ the packet. If this bit is set to 1, the Length field is present.
+
+ * The FIN bit (0x01) indicates that the frame marks the end of the
+ stream. The final size of the stream is the sum of the offset and
+ the length of this frame.
+
+ An endpoint MUST terminate the connection with error
+ STREAM_STATE_ERROR if it receives a STREAM frame for a locally
+ initiated stream that has not yet been created, or for a send-only
+ stream.
+
+ STREAM frames are formatted as shown in Figure 32.
+
+ STREAM Frame {
+ Type (i) = 0x08..0x0f,
+ Stream ID (i),
+ [Offset (i)],
+ [Length (i)],
+ Stream Data (..),
+ }
+
+ Figure 32: STREAM Frame Format
+
+ STREAM frames contain the following fields:
+
+ Stream ID: A variable-length integer indicating the stream ID of the
+ stream; see Section 2.1.
+
+ Offset: A variable-length integer specifying the byte offset in the
+ stream for the data in this STREAM frame. This field is present
+ when the OFF bit is set to 1. When the Offset field is absent,
+ the offset is 0.
+
+ Length: A variable-length integer specifying the length of the
+ Stream Data field in this STREAM frame. This field is present
+ when the LEN bit is set to 1. When the LEN bit is set to 0, the
+ Stream Data field consumes all the remaining bytes in the packet.
+
+ Stream Data: The bytes from the designated stream to be delivered.
+
+ When a Stream Data field has a length of 0, the offset in the STREAM
+ frame is the offset of the next byte that would be sent.
+
+ The first byte in the stream has an offset of 0. The largest offset
+ delivered on a stream -- the sum of the offset and data length --
+ cannot exceed 2^62-1, as it is not possible to provide flow control
+ credit for that data. Receipt of a frame that exceeds this limit
+ MUST be treated as a connection error of type FRAME_ENCODING_ERROR or
+ FLOW_CONTROL_ERROR.
+
+19.9. MAX_DATA Frames
+
+ A MAX_DATA frame (type=0x10) is used in flow control to inform the
+ peer of the maximum amount of data that can be sent on the connection
+ as a whole.
+
+ MAX_DATA frames are formatted as shown in Figure 33.
+
+ MAX_DATA Frame {
+ Type (i) = 0x10,
+ Maximum Data (i),
+ }
+
+ Figure 33: MAX_DATA Frame Format
+
+ MAX_DATA frames contain the following field:
+
+ Maximum Data: A variable-length integer indicating the maximum
+ amount of data that can be sent on the entire connection, in units
+ of bytes.
+
+ All data sent in STREAM frames counts toward this limit. The sum of
+ the final sizes on all streams -- including streams in terminal
+ states -- MUST NOT exceed the value advertised by a receiver. An
+ endpoint MUST terminate a connection with an error of type
+ FLOW_CONTROL_ERROR if it receives more data than the maximum data
+ value that it has sent. This includes violations of remembered
+ limits in Early Data; see Section 7.4.1.
+
+19.10. MAX_STREAM_DATA Frames
+
+ A MAX_STREAM_DATA frame (type=0x11) is used in flow control to inform
+ a peer of the maximum amount of data that can be sent on a stream.
+
+ A MAX_STREAM_DATA frame can be sent for streams in the "Recv" state;
+ see Section 3.2. Receiving a MAX_STREAM_DATA frame for a locally
+ initiated stream that has not yet been created MUST be treated as a
+ connection error of type STREAM_STATE_ERROR. An endpoint that
+ receives a MAX_STREAM_DATA frame for a receive-only stream MUST
+ terminate the connection with error STREAM_STATE_ERROR.
+
+ MAX_STREAM_DATA frames are formatted as shown in Figure 34.
+
+ MAX_STREAM_DATA Frame {
+ Type (i) = 0x11,
+ Stream ID (i),
+ Maximum Stream Data (i),
+ }
+
+ Figure 34: MAX_STREAM_DATA Frame Format
+
+ MAX_STREAM_DATA frames contain the following fields:
+
+ Stream ID: The stream ID of the affected stream, encoded as a
+ variable-length integer.
+
+ Maximum Stream Data: A variable-length integer indicating the
+ maximum amount of data that can be sent on the identified stream,
+ in units of bytes.
+
+ When counting data toward this limit, an endpoint accounts for the
+ largest received offset of data that is sent or received on the
+ stream. Loss or reordering can mean that the largest received offset
+ on a stream can be greater than the total size of data received on
+ that stream. Receiving STREAM frames might not increase the largest
+ received offset.
+
+ The data sent on a stream MUST NOT exceed the largest maximum stream
+ data value advertised by the receiver. An endpoint MUST terminate a
+ connection with an error of type FLOW_CONTROL_ERROR if it receives
+ more data than the largest maximum stream data that it has sent for
+ the affected stream. This includes violations of remembered limits
+ in Early Data; see Section 7.4.1.
+
+19.11. MAX_STREAMS Frames
+
+ A MAX_STREAMS frame (type=0x12 or 0x13) informs the peer of the
+ cumulative number of streams of a given type it is permitted to open.
+ A MAX_STREAMS frame with a type of 0x12 applies to bidirectional
+ streams, and a MAX_STREAMS frame with a type of 0x13 applies to
+ unidirectional streams.
+
+ MAX_STREAMS frames are formatted as shown in Figure 35.
+
+ MAX_STREAMS Frame {
+ Type (i) = 0x12..0x13,
+ Maximum Streams (i),
+ }
+
+ Figure 35: MAX_STREAMS Frame Format
+
+ MAX_STREAMS frames contain the following field:
+
+ Maximum Streams: A count of the cumulative number of streams of the
+ corresponding type that can be opened over the lifetime of the
+ connection. This value cannot exceed 2^60, as it is not possible
+ to encode stream IDs larger than 2^62-1. Receipt of a frame that
+ permits opening of a stream larger than this limit MUST be treated
+ as a connection error of type FRAME_ENCODING_ERROR.
+
+ Loss or reordering can cause an endpoint to receive a MAX_STREAMS
+ frame with a lower stream limit than was previously received.
+ MAX_STREAMS frames that do not increase the stream limit MUST be
+ ignored.
+
+ An endpoint MUST NOT open more streams than permitted by the current
+ stream limit set by its peer. For instance, a server that receives a
+ unidirectional stream limit of 3 is permitted to open streams 3, 7,
+ and 11, but not stream 15. An endpoint MUST terminate a connection
+ with an error of type STREAM_LIMIT_ERROR if a peer opens more streams
+ than was permitted. This includes violations of remembered limits in
+ Early Data; see Section 7.4.1.
+
+ Note that these frames (and the corresponding transport parameters)
+ do not describe the number of streams that can be opened
+ concurrently. The limit includes streams that have been closed as
+ well as those that are open.
+
+19.12. DATA_BLOCKED Frames
+
+ A sender SHOULD send a DATA_BLOCKED frame (type=0x14) when it wishes
+ to send data but is unable to do so due to connection-level flow
+ control; see Section 4. DATA_BLOCKED frames can be used as input to
+ tuning of flow control algorithms; see Section 4.2.
+
+ DATA_BLOCKED frames are formatted as shown in Figure 36.
+
+ DATA_BLOCKED Frame {
+ Type (i) = 0x14,
+ Maximum Data (i),
+ }
+
+ Figure 36: DATA_BLOCKED Frame Format
+
+ DATA_BLOCKED frames contain the following field:
+
+ Maximum Data: A variable-length integer indicating the connection-
+ level limit at which blocking occurred.
+
+19.13. STREAM_DATA_BLOCKED Frames
+
+ A sender SHOULD send a STREAM_DATA_BLOCKED frame (type=0x15) when it
+ wishes to send data but is unable to do so due to stream-level flow
+ control. This frame is analogous to DATA_BLOCKED (Section 19.12).
+
+ An endpoint that receives a STREAM_DATA_BLOCKED frame for a send-only
+ stream MUST terminate the connection with error STREAM_STATE_ERROR.
+
+ STREAM_DATA_BLOCKED frames are formatted as shown in Figure 37.
+
+ STREAM_DATA_BLOCKED Frame {
+ Type (i) = 0x15,
+ Stream ID (i),
+ Maximum Stream Data (i),
+ }
+
+ Figure 37: STREAM_DATA_BLOCKED Frame Format
+
+ STREAM_DATA_BLOCKED frames contain the following fields:
+
+ Stream ID: A variable-length integer indicating the stream that is
+ blocked due to flow control.
+
+ Maximum Stream Data: A variable-length integer indicating the offset
+ of the stream at which the blocking occurred.
+
+19.14. STREAMS_BLOCKED Frames
+
+ A sender SHOULD send a STREAMS_BLOCKED frame (type=0x16 or 0x17) when
+ it wishes to open a stream but is unable to do so due to the maximum
+ stream limit set by its peer; see Section 19.11. A STREAMS_BLOCKED
+ frame of type 0x16 is used to indicate reaching the bidirectional
+ stream limit, and a STREAMS_BLOCKED frame of type 0x17 is used to
+ indicate reaching the unidirectional stream limit.
+
+ A STREAMS_BLOCKED frame does not open the stream, but informs the
+ peer that a new stream was needed and the stream limit prevented the
+ creation of the stream.
+
+ STREAMS_BLOCKED frames are formatted as shown in Figure 38.
+
+ STREAMS_BLOCKED Frame {
+ Type (i) = 0x16..0x17,
+ Maximum Streams (i),
+ }
+
+ Figure 38: STREAMS_BLOCKED Frame Format
+
+ STREAMS_BLOCKED frames contain the following field:
+
+ Maximum Streams: A variable-length integer indicating the maximum
+ number of streams allowed at the time the frame was sent. This
+ value cannot exceed 2^60, as it is not possible to encode stream
+ IDs larger than 2^62-1. Receipt of a frame that encodes a larger
+ stream ID MUST be treated as a connection error of type
+ STREAM_LIMIT_ERROR or FRAME_ENCODING_ERROR.
+
+19.15. NEW_CONNECTION_ID Frames
+
+ An endpoint sends a NEW_CONNECTION_ID frame (type=0x18) to provide
+ its peer with alternative connection IDs that can be used to break
+ linkability when migrating connections; see Section 9.5.
+
+ NEW_CONNECTION_ID frames are formatted as shown in Figure 39.
+
+ NEW_CONNECTION_ID Frame {
+ Type (i) = 0x18,
+ Sequence Number (i),
+ Retire Prior To (i),
+ Length (8),
+ Connection ID (8..160),
+ Stateless Reset Token (128),
+ }
+
+ Figure 39: NEW_CONNECTION_ID Frame Format
+
+ NEW_CONNECTION_ID frames contain the following fields:
+
+ Sequence Number: The sequence number assigned to the connection ID
+ by the sender, encoded as a variable-length integer; see
+ Section 5.1.1.
+
+ Retire Prior To: A variable-length integer indicating which
+ connection IDs should be retired; see Section 5.1.2.
+
+ Length: An 8-bit unsigned integer containing the length of the
+ connection ID. Values less than 1 and greater than 20 are invalid
+ and MUST be treated as a connection error of type
+ FRAME_ENCODING_ERROR.
+
+ Connection ID: A connection ID of the specified length.
+
+ Stateless Reset Token: A 128-bit value that will be used for a
+ stateless reset when the associated connection ID is used; see
+ Section 10.3.
+
+ An endpoint MUST NOT send this frame if it currently requires that
+ its peer send packets with a zero-length Destination Connection ID.
+ Changing the length of a connection ID to or from zero length makes
+ it difficult to identify when the value of the connection ID changed.
+ An endpoint that is sending packets with a zero-length Destination
+ Connection ID MUST treat receipt of a NEW_CONNECTION_ID frame as a
+ connection error of type PROTOCOL_VIOLATION.
+
+ Transmission errors, timeouts, and retransmissions might cause the
+ same NEW_CONNECTION_ID frame to be received multiple times. Receipt
+ of the same frame multiple times MUST NOT be treated as a connection
+ error. A receiver can use the sequence number supplied in the
+ NEW_CONNECTION_ID frame to handle receiving the same
+ NEW_CONNECTION_ID frame multiple times.
+
+ If an endpoint receives a NEW_CONNECTION_ID frame that repeats a
+ previously issued connection ID with a different Stateless Reset
+ Token field value or a different Sequence Number field value, or if a
+ sequence number is used for different connection IDs, the endpoint
+ MAY treat that receipt as a connection error of type
+ PROTOCOL_VIOLATION.
+
+ The Retire Prior To field applies to connection IDs established
+ during connection setup and the preferred_address transport
+ parameter; see Section 5.1.2. The value in the Retire Prior To field
+ MUST be less than or equal to the value in the Sequence Number field.
+ Receiving a value in the Retire Prior To field that is greater than
+ that in the Sequence Number field MUST be treated as a connection
+ error of type FRAME_ENCODING_ERROR.
+
+ Once a sender indicates a Retire Prior To value, smaller values sent
+ in subsequent NEW_CONNECTION_ID frames have no effect. A receiver
+ MUST ignore any Retire Prior To fields that do not increase the
+ largest received Retire Prior To value.
+
+ An endpoint that receives a NEW_CONNECTION_ID frame with a sequence
+ number smaller than the Retire Prior To field of a previously
+ received NEW_CONNECTION_ID frame MUST send a corresponding
+ RETIRE_CONNECTION_ID frame that retires the newly received connection
+ ID, unless it has already done so for that sequence number.
+
+19.16. RETIRE_CONNECTION_ID Frames
+
+ An endpoint sends a RETIRE_CONNECTION_ID frame (type=0x19) to
+ indicate that it will no longer use a connection ID that was issued
+ by its peer. This includes the connection ID provided during the
+ handshake. Sending a RETIRE_CONNECTION_ID frame also serves as a
+ request to the peer to send additional connection IDs for future use;
+ see Section 5.1. New connection IDs can be delivered to a peer using
+ the NEW_CONNECTION_ID frame (Section 19.15).
+
+ Retiring a connection ID invalidates the stateless reset token
+ associated with that connection ID.
+
+ RETIRE_CONNECTION_ID frames are formatted as shown in Figure 40.
+
+ RETIRE_CONNECTION_ID Frame {
+ Type (i) = 0x19,
+ Sequence Number (i),
+ }
+
+ Figure 40: RETIRE_CONNECTION_ID Frame Format
+
+ RETIRE_CONNECTION_ID frames contain the following field:
+
+ Sequence Number: The sequence number of the connection ID being
+ retired; see Section 5.1.2.
+
+ Receipt of a RETIRE_CONNECTION_ID frame containing a sequence number
+ greater than any previously sent to the peer MUST be treated as a
+ connection error of type PROTOCOL_VIOLATION.
+
+ The sequence number specified in a RETIRE_CONNECTION_ID frame MUST
+ NOT refer to the Destination Connection ID field of the packet in
+ which the frame is contained. The peer MAY treat this as a
+ connection error of type PROTOCOL_VIOLATION.
+
+ An endpoint cannot send this frame if it was provided with a zero-
+ length connection ID by its peer. An endpoint that provides a zero-
+ length connection ID MUST treat receipt of a RETIRE_CONNECTION_ID
+ frame as a connection error of type PROTOCOL_VIOLATION.
+
+19.17. PATH_CHALLENGE Frames
+
+ Endpoints can use PATH_CHALLENGE frames (type=0x1a) to check
+ reachability to the peer and for path validation during connection
+ migration.
+
+ PATH_CHALLENGE frames are formatted as shown in Figure 41.
+
+ PATH_CHALLENGE Frame {
+ Type (i) = 0x1a,
+ Data (64),
+ }
+
+ Figure 41: PATH_CHALLENGE Frame Format
+
+ PATH_CHALLENGE frames contain the following field:
+
+ Data: This 8-byte field contains arbitrary data.
+
+ Including 64 bits of entropy in a PATH_CHALLENGE frame ensures that
+ it is easier to receive the packet than it is to guess the value
+ correctly.
+
+ The recipient of this frame MUST generate a PATH_RESPONSE frame
+ (Section 19.18) containing the same Data value.
+
+19.18. PATH_RESPONSE Frames
+
+ A PATH_RESPONSE frame (type=0x1b) is sent in response to a
+ PATH_CHALLENGE frame.
+
+ PATH_RESPONSE frames are formatted as shown in Figure 42. The format
+ of a PATH_RESPONSE frame is identical to that of the PATH_CHALLENGE
+ frame; see Section 19.17.
+
+ PATH_RESPONSE Frame {
+ Type (i) = 0x1b,
+ Data (64),
+ }
+
+ Figure 42: PATH_RESPONSE Frame Format
+
+ If the content of a PATH_RESPONSE frame does not match the content of
+ a PATH_CHALLENGE frame previously sent by the endpoint, the endpoint
+ MAY generate a connection error of type PROTOCOL_VIOLATION.
+
+19.19. CONNECTION_CLOSE Frames
+
+ An endpoint sends a CONNECTION_CLOSE frame (type=0x1c or 0x1d) to
+ notify its peer that the connection is being closed. The
+ CONNECTION_CLOSE frame with a type of 0x1c is used to signal errors
+ at only the QUIC layer, or the absence of errors (with the NO_ERROR
+ code). The CONNECTION_CLOSE frame with a type of 0x1d is used to
+ signal an error with the application that uses QUIC.
+
+ If there are open streams that have not been explicitly closed, they
+ are implicitly closed when the connection is closed.
+
+ CONNECTION_CLOSE frames are formatted as shown in Figure 43.
+
+ CONNECTION_CLOSE Frame {
+ Type (i) = 0x1c..0x1d,
+ Error Code (i),
+ [Frame Type (i)],
+ Reason Phrase Length (i),
+ Reason Phrase (..),
+ }
+
+ Figure 43: CONNECTION_CLOSE Frame Format
+
+ CONNECTION_CLOSE frames contain the following fields:
+
+ Error Code: A variable-length integer that indicates the reason for
+ closing this connection. A CONNECTION_CLOSE frame of type 0x1c
+ uses codes from the space defined in Section 20.1. A
+ CONNECTION_CLOSE frame of type 0x1d uses codes defined by the
+ application protocol; see Section 20.2.
+
+ Frame Type: A variable-length integer encoding the type of frame
+ that triggered the error. A value of 0 (equivalent to the mention
+ of the PADDING frame) is used when the frame type is unknown. The
+ application-specific variant of CONNECTION_CLOSE (type 0x1d) does
+ not include this field.
+
+ Reason Phrase Length: A variable-length integer specifying the
+ length of the reason phrase in bytes. Because a CONNECTION_CLOSE
+ frame cannot be split between packets, any limits on packet size
+ will also limit the space available for a reason phrase.
+
+ Reason Phrase: Additional diagnostic information for the closure.
+ This can be zero length if the sender chooses not to give details
+ beyond the Error Code value. This SHOULD be a UTF-8 encoded
+ string [RFC3629], though the frame does not carry information,
+ such as language tags, that would aid comprehension by any entity
+ other than the one that created the text.
+
+ The application-specific variant of CONNECTION_CLOSE (type 0x1d) can
+ only be sent using 0-RTT or 1-RTT packets; see Section 12.5. When an
+ application wishes to abandon a connection during the handshake, an
+ endpoint can send a CONNECTION_CLOSE frame (type 0x1c) with an error
+ code of APPLICATION_ERROR in an Initial or Handshake packet.
+
+19.20. HANDSHAKE_DONE Frames
+
+ The server uses a HANDSHAKE_DONE frame (type=0x1e) to signal
+ confirmation of the handshake to the client.
+
+ HANDSHAKE_DONE frames are formatted as shown in Figure 44, which
+ shows that HANDSHAKE_DONE frames have no content.
+
+ HANDSHAKE_DONE Frame {
+ Type (i) = 0x1e,
+ }
+
+ Figure 44: HANDSHAKE_DONE Frame Format
+
+ A HANDSHAKE_DONE frame can only be sent by the server. Servers MUST
+ NOT send a HANDSHAKE_DONE frame before completing the handshake. A
+ server MUST treat receipt of a HANDSHAKE_DONE frame as a connection
+ error of type PROTOCOL_VIOLATION.
+
+19.21. Extension Frames
+
+ QUIC frames do not use a self-describing encoding. An endpoint
+ therefore needs to understand the syntax of all frames before it can
+ successfully process a packet. This allows for efficient encoding of
+ frames, but it means that an endpoint cannot send a frame of a type
+ that is unknown to its peer.
+
+ An extension to QUIC that wishes to use a new type of frame MUST
+ first ensure that a peer is able to understand the frame. An
+ endpoint can use a transport parameter to signal its willingness to
+ receive extension frame types. One transport parameter can indicate
+ support for one or more extension frame types.
+
+ Extensions that modify or replace core protocol functionality
+ (including frame types) will be difficult to combine with other
+ extensions that modify or replace the same functionality unless the
+ behavior of the combination is explicitly defined. Such extensions
+ SHOULD define their interaction with previously defined extensions
+ modifying the same protocol components.
+
+ Extension frames MUST be congestion controlled and MUST cause an ACK
+ frame to be sent. The exception is extension frames that replace or
+ supplement the ACK frame. Extension frames are not included in flow
+ control unless specified in the extension.
+
+ An IANA registry is used to manage the assignment of frame types; see
+ Section 22.4.
+
+20. Error Codes
+
+ QUIC transport error codes and application error codes are 62-bit
+ unsigned integers.
+
+20.1. Transport Error Codes
+
+ This section lists the defined QUIC transport error codes that can be
+ used in a CONNECTION_CLOSE frame with a type of 0x1c. These errors
+ apply to the entire connection.
+
+ NO_ERROR (0x00): An endpoint uses this with CONNECTION_CLOSE to
+ signal that the connection is being closed abruptly in the absence
+ of any error.
+
+ INTERNAL_ERROR (0x01): The endpoint encountered an internal error
+ and cannot continue with the connection.
+
+ CONNECTION_REFUSED (0x02): The server refused to accept a new
+ connection.
+
+ FLOW_CONTROL_ERROR (0x03): An endpoint received more data than it
+ permitted in its advertised data limits; see Section 4.
+
+ STREAM_LIMIT_ERROR (0x04): An endpoint received a frame for a stream
+ identifier that exceeded its advertised stream limit for the
+ corresponding stream type.
+
+ STREAM_STATE_ERROR (0x05): An endpoint received a frame for a stream
+ that was not in a state that permitted that frame; see Section 3.
+
+ FINAL_SIZE_ERROR (0x06): (1) An endpoint received a STREAM frame
+ containing data that exceeded the previously established final
+ size, (2) an endpoint received a STREAM frame or a RESET_STREAM
+ frame containing a final size that was lower than the size of
+ stream data that was already received, or (3) an endpoint received
+ a STREAM frame or a RESET_STREAM frame containing a different
+ final size to the one already established.
+
+ FRAME_ENCODING_ERROR (0x07): An endpoint received a frame that was
+ badly formatted -- for instance, a frame of an unknown type or an
+ ACK frame that has more acknowledgment ranges than the remainder
+ of the packet could carry.
+
+ TRANSPORT_PARAMETER_ERROR (0x08): An endpoint received transport
+ parameters that were badly formatted, included an invalid value,
+ omitted a mandatory transport parameter, included a forbidden
+ transport parameter, or were otherwise in error.
+
+ CONNECTION_ID_LIMIT_ERROR (0x09): The number of connection IDs
+ provided by the peer exceeds the advertised
+ active_connection_id_limit.
+
+ PROTOCOL_VIOLATION (0x0a): An endpoint detected an error with
+ protocol compliance that was not covered by more specific error
+ codes.
+
+ INVALID_TOKEN (0x0b): A server received a client Initial that
+ contained an invalid Token field.
+
+ APPLICATION_ERROR (0x0c): The application or application protocol
+ caused the connection to be closed.
+
+ CRYPTO_BUFFER_EXCEEDED (0x0d): An endpoint has received more data in
+ CRYPTO frames than it can buffer.
+
+ KEY_UPDATE_ERROR (0x0e): An endpoint detected errors in performing
+ key updates; see Section 6 of [QUIC-TLS].
+
+ AEAD_LIMIT_REACHED (0x0f): An endpoint has reached the
+ confidentiality or integrity limit for the AEAD algorithm used by
+ the given connection.
+
+ NO_VIABLE_PATH (0x10): An endpoint has determined that the network
+ path is incapable of supporting QUIC. An endpoint is unlikely to
+ receive a CONNECTION_CLOSE frame carrying this code except when
+ the path does not support a large enough MTU.
+
+ CRYPTO_ERROR (0x0100-0x01ff): The cryptographic handshake failed. A
+ range of 256 values is reserved for carrying error codes specific
+ to the cryptographic handshake that is used. Codes for errors
+ occurring when TLS is used for the cryptographic handshake are
+ described in Section 4.8 of [QUIC-TLS].
+
+ See Section 22.5 for details on registering new error codes.
+
+ In defining these error codes, several principles are applied. Error
+ conditions that might require specific action on the part of a
+ recipient are given unique codes. Errors that represent common
+ conditions are given specific codes. Absent either of these
+ conditions, error codes are used to identify a general function of
+ the stack, like flow control or transport parameter handling.
+ Finally, generic errors are provided for conditions where
+ implementations are unable or unwilling to use more specific codes.
+
+20.2. Application Protocol Error Codes
+
+ The management of application error codes is left to application
+ protocols. Application protocol error codes are used for the
+ RESET_STREAM frame (Section 19.4), the STOP_SENDING frame
+ (Section 19.5), and the CONNECTION_CLOSE frame with a type of 0x1d
+ (Section 19.19).
+
+21. Security Considerations
+
+ The goal of QUIC is to provide a secure transport connection.
+ Section 21.1 provides an overview of those properties; subsequent
+ sections discuss constraints and caveats regarding these properties,
+ including descriptions of known attacks and countermeasures.
+
+21.1. Overview of Security Properties
+
+ A complete security analysis of QUIC is outside the scope of this
+ document. This section provides an informal description of the
+ desired security properties as an aid to implementers and to help
+ guide protocol analysis.
+
+ QUIC assumes the threat model described in [SEC-CONS] and provides
+ protections against many of the attacks that arise from that model.
+
+ For this purpose, attacks are divided into passive and active
+ attacks. Passive attackers have the ability to read packets from the
+ network, while active attackers also have the ability to write
+ packets into the network. However, a passive attack could involve an
+ attacker with the ability to cause a routing change or other
+ modification in the path taken by packets that comprise a connection.
+
+ Attackers are additionally categorized as either on-path attackers or
+ off-path attackers. An on-path attacker can read, modify, or remove
+ any packet it observes such that the packet no longer reaches its
+ destination, while an off-path attacker observes the packets but
+ cannot prevent the original packet from reaching its intended
+ destination. Both types of attackers can also transmit arbitrary
+ packets. This definition differs from that of Section 3.5 of
+ [SEC-CONS] in that an off-path attacker is able to observe packets.
+
+ Properties of the handshake, protected packets, and connection
+ migration are considered separately.
+
+21.1.1. Handshake
+
+ The QUIC handshake incorporates the TLS 1.3 handshake and inherits
+ the cryptographic properties described in Appendix E.1 of [TLS13].
+ Many of the security properties of QUIC depend on the TLS handshake
+ providing these properties. Any attack on the TLS handshake could
+ affect QUIC.
+
+ Any attack on the TLS handshake that compromises the secrecy or
+ uniqueness of session keys, or the authentication of the
+ participating peers, affects other security guarantees provided by
+ QUIC that depend on those keys. For instance, migration (Section 9)
+ depends on the efficacy of confidentiality protections, both for the
+ negotiation of keys using the TLS handshake and for QUIC packet
+ protection, to avoid linkability across network paths.
+
+ An attack on the integrity of the TLS handshake might allow an
+ attacker to affect the selection of application protocol or QUIC
+ version.
+
+ In addition to the properties provided by TLS, the QUIC handshake
+ provides some defense against DoS attacks on the handshake.
+
+21.1.1.1. Anti-Amplification
+
+ Address validation (Section 8) is used to verify that an entity that
+ claims a given address is able to receive packets at that address.
+ Address validation limits amplification attack targets to addresses
+ for which an attacker can observe packets.
+
+ Prior to address validation, endpoints are limited in what they are
+ able to send. Endpoints cannot send data toward an unvalidated
+ address in excess of three times the data received from that address.
+
+ | Note: The anti-amplification limit only applies when an
+ | endpoint responds to packets received from an unvalidated
+ | address. The anti-amplification limit does not apply to
+ | clients when establishing a new connection or when initiating
+ | connection migration.
+
+21.1.1.2. Server-Side DoS
+
+ Computing the server's first flight for a full handshake is
+ potentially expensive, requiring both a signature and a key exchange
+ computation. In order to prevent computational DoS attacks, the
+ Retry packet provides a cheap token exchange mechanism that allows
+ servers to validate a client's IP address prior to doing any
+ expensive computations at the cost of a single round trip. After a
+ successful handshake, servers can issue new tokens to a client, which
+ will allow new connection establishment without incurring this cost.
+
+21.1.1.3. On-Path Handshake Termination
+
+ An on-path or off-path attacker can force a handshake to fail by
+ replacing or racing Initial packets. Once valid Initial packets have
+ been exchanged, subsequent Handshake packets are protected with the
+ Handshake keys, and an on-path attacker cannot force handshake
+ failure other than by dropping packets to cause endpoints to abandon
+ the attempt.
+
+ An on-path attacker can also replace the addresses of packets on
+ either side and therefore cause the client or server to have an
+ incorrect view of the remote addresses. Such an attack is
+ indistinguishable from the functions performed by a NAT.
+
+21.1.1.4. Parameter Negotiation
+
+ The entire handshake is cryptographically protected, with the Initial
+ packets being encrypted with per-version keys and the Handshake and
+ later packets being encrypted with keys derived from the TLS key
+ exchange. Further, parameter negotiation is folded into the TLS
+ transcript and thus provides the same integrity guarantees as
+ ordinary TLS negotiation. An attacker can observe the client's
+ transport parameters (as long as it knows the version-specific salt)
+ but cannot observe the server's transport parameters and cannot
+ influence parameter negotiation.
+
+ Connection IDs are unencrypted but integrity protected in all
+ packets.
+
+ This version of QUIC does not incorporate a version negotiation
+ mechanism; implementations of incompatible versions will simply fail
+ to establish a connection.
+
+21.1.2. Protected Packets
+
+ Packet protection (Section 12.1) applies authenticated encryption to
+ all packets except Version Negotiation packets, though Initial and
+ Retry packets have limited protection due to the use of version-
+ specific keying material; see [QUIC-TLS] for more details. This
+ section considers passive and active attacks against protected
+ packets.
+
+ Both on-path and off-path attackers can mount a passive attack in
+ which they save observed packets for an offline attack against packet
+ protection at a future time; this is true for any observer of any
+ packet on any network.
+
+ An attacker that injects packets without being able to observe valid
+ packets for a connection is unlikely to be successful, since packet
+ protection ensures that valid packets are only generated by endpoints
+ that possess the key material established during the handshake; see
+ Sections 7 and 21.1.1. Similarly, any active attacker that observes
+ packets and attempts to insert new data or modify existing data in
+ those packets should not be able to generate packets deemed valid by
+ the receiving endpoint, other than Initial packets.
+
+ A spoofing attack, in which an active attacker rewrites unprotected
+ parts of a packet that it forwards or injects, such as the source or
+ destination address, is only effective if the attacker can forward
+ packets to the original endpoint. Packet protection ensures that the
+ packet payloads can only be processed by the endpoints that completed
+ the handshake, and invalid packets are ignored by those endpoints.
+
+ An attacker can also modify the boundaries between packets and UDP
+ datagrams, causing multiple packets to be coalesced into a single
+ datagram or splitting coalesced packets into multiple datagrams.
+ Aside from datagrams containing Initial packets, which require
+ padding, modification of how packets are arranged in datagrams has no
+ functional effect on a connection, although it might change some
+ performance characteristics.
+
+21.1.3. Connection Migration
+
+ Connection migration (Section 9) provides endpoints with the ability
+ to transition between IP addresses and ports on multiple paths, using
+ one path at a time for transmission and receipt of non-probing
+ frames. Path validation (Section 8.2) establishes that a peer is
+ both willing and able to receive packets sent on a particular path.
+ This helps reduce the effects of address spoofing by limiting the
+ number of packets sent to a spoofed address.
+
+ This section describes the intended security properties of connection
+ migration under various types of DoS attacks.
+
+21.1.3.1. On-Path Active Attacks
+
+ An attacker that can cause a packet it observes to no longer reach
+ its intended destination is considered an on-path attacker. When an
+ attacker is present between a client and server, endpoints are
+ required to send packets through the attacker to establish
+ connectivity on a given path.
+
+ An on-path attacker can:
+
+ * Inspect packets
+
+ * Modify IP and UDP packet headers
+
+ * Inject new packets
+
+ * Delay packets
+
+ * Reorder packets
+
+ * Drop packets
+
+ * Split and merge datagrams along packet boundaries
+
+ An on-path attacker cannot:
+
+ * Modify an authenticated portion of a packet and cause the
+ recipient to accept that packet
+
+ An on-path attacker has the opportunity to modify the packets that it
+ observes; however, any modifications to an authenticated portion of a
+ packet will cause it to be dropped by the receiving endpoint as
+ invalid, as packet payloads are both authenticated and encrypted.
+
+ QUIC aims to constrain the capabilities of an on-path attacker as
+ follows:
+
+ 1. An on-path attacker can prevent the use of a path for a
+ connection, causing the connection to fail if it cannot use a
+ different path that does not contain the attacker. This can be
+ achieved by dropping all packets, modifying them so that they
+ fail to decrypt, or other methods.
+
+ 2. An on-path attacker can prevent migration to a new path for which
+ the attacker is also on-path by causing path validation to fail
+ on the new path.
+
+ 3. An on-path attacker cannot prevent a client from migrating to a
+ path for which the attacker is not on-path.
+
+ 4. An on-path attacker can reduce the throughput of a connection by
+ delaying packets or dropping them.
+
+ 5. An on-path attacker cannot cause an endpoint to accept a packet
+ for which it has modified an authenticated portion of that
+ packet.
+
+21.1.3.2. Off-Path Active Attacks
+
+ An off-path attacker is not directly on the path between a client and
+ server but could be able to obtain copies of some or all packets sent
+ between the client and the server. It is also able to send copies of
+ those packets to either endpoint.
+
+ An off-path attacker can:
+
+ * Inspect packets
+
+ * Inject new packets
+
+ * Reorder injected packets
+
+ An off-path attacker cannot:
+
+ * Modify packets sent by endpoints
+
+ * Delay packets
+
+ * Drop packets
+
+ * Reorder original packets
+
+ An off-path attacker can create modified copies of packets that it
+ has observed and inject those copies into the network, potentially
+ with spoofed source and destination addresses.
+
+ For the purposes of this discussion, it is assumed that an off-path
+ attacker has the ability to inject a modified copy of a packet into
+ the network that will reach the destination endpoint prior to the
+ arrival of the original packet observed by the attacker. In other
+ words, an attacker has the ability to consistently "win" a race with
+ the legitimate packets between the endpoints, potentially causing the
+ original packet to be ignored by the recipient.
+
+ It is also assumed that an attacker has the resources necessary to
+ affect NAT state. In particular, an attacker can cause an endpoint
+ to lose its NAT binding and then obtain the same port for use with
+ its own traffic.
+
+ QUIC aims to constrain the capabilities of an off-path attacker as
+ follows:
+
+ 1. An off-path attacker can race packets and attempt to become a
+ "limited" on-path attacker.
+
+ 2. An off-path attacker can cause path validation to succeed for
+ forwarded packets with the source address listed as the off-path
+ attacker as long as it can provide improved connectivity between
+ the client and the server.
+
+ 3. An off-path attacker cannot cause a connection to close once the
+ handshake has completed.
+
+ 4. An off-path attacker cannot cause migration to a new path to fail
+ if it cannot observe the new path.
+
+ 5. An off-path attacker can become a limited on-path attacker during
+ migration to a new path for which it is also an off-path
+ attacker.
+
+ 6. An off-path attacker can become a limited on-path attacker by
+ affecting shared NAT state such that it sends packets to the
+ server from the same IP address and port that the client
+ originally used.
+
+21.1.3.3. Limited On-Path Active Attacks
+
+ A limited on-path attacker is an off-path attacker that has offered
+ improved routing of packets by duplicating and forwarding original
+ packets between the server and the client, causing those packets to
+ arrive before the original copies such that the original packets are
+ dropped by the destination endpoint.
+
+ A limited on-path attacker differs from an on-path attacker in that
+ it is not on the original path between endpoints, and therefore the
+ original packets sent by an endpoint are still reaching their
+ destination. This means that a future failure to route copied
+ packets to the destination faster than their original path will not
+ prevent the original packets from reaching the destination.
+
+ A limited on-path attacker can:
+
+ * Inspect packets
+
+ * Inject new packets
+
+ * Modify unencrypted packet headers
+
+ * Reorder packets
+
+ A limited on-path attacker cannot:
+
+ * Delay packets so that they arrive later than packets sent on the
+ original path
+
+ * Drop packets
+
+ * Modify the authenticated and encrypted portion of a packet and
+ cause the recipient to accept that packet
+
+ A limited on-path attacker can only delay packets up to the point
+ that the original packets arrive before the duplicate packets,
+ meaning that it cannot offer routing with worse latency than the
+ original path. If a limited on-path attacker drops packets, the
+ original copy will still arrive at the destination endpoint.
+
+ QUIC aims to constrain the capabilities of a limited off-path
+ attacker as follows:
+
+ 1. A limited on-path attacker cannot cause a connection to close
+ once the handshake has completed.
+
+ 2. A limited on-path attacker cannot cause an idle connection to
+ close if the client is first to resume activity.
+
+ 3. A limited on-path attacker can cause an idle connection to be
+ deemed lost if the server is the first to resume activity.
+
+ Note that these guarantees are the same guarantees provided for any
+ NAT, for the same reasons.
+
+21.2. Handshake Denial of Service
+
+ As an encrypted and authenticated transport, QUIC provides a range of
+ protections against denial of service. Once the cryptographic
+ handshake is complete, QUIC endpoints discard most packets that are
+ not authenticated, greatly limiting the ability of an attacker to
+ interfere with existing connections.
+
+ Once a connection is established, QUIC endpoints might accept some
+ unauthenticated ICMP packets (see Section 14.2.1), but the use of
+ these packets is extremely limited. The only other type of packet
+ that an endpoint might accept is a stateless reset (Section 10.3),
+ which relies on the token being kept secret until it is used.
+
+ During the creation of a connection, QUIC only provides protection
+ against attacks from off the network path. All QUIC packets contain
+ proof that the recipient saw a preceding packet from its peer.
+
+ Addresses cannot change during the handshake, so endpoints can
+ discard packets that are received on a different network path.
+
+ The Source and Destination Connection ID fields are the primary means
+ of protection against an off-path attack during the handshake; see
+ Section 8.1. These are required to match those set by a peer.
+ Except for Initial and Stateless Resets, an endpoint only accepts
+ packets that include a Destination Connection ID field that matches a
+ value the endpoint previously chose. This is the only protection
+ offered for Version Negotiation packets.
+
+ The Destination Connection ID field in an Initial packet is selected
+ by a client to be unpredictable, which serves an additional purpose.
+ The packets that carry the cryptographic handshake are protected with
+ a key that is derived from this connection ID and a salt specific to
+ the QUIC version. This allows endpoints to use the same process for
+ authenticating packets that they receive as they use after the
+ cryptographic handshake completes. Packets that cannot be
+ authenticated are discarded. Protecting packets in this fashion
+ provides a strong assurance that the sender of the packet saw the
+ Initial packet and understood it.
+
+ These protections are not intended to be effective against an
+ attacker that is able to receive QUIC packets prior to the connection
+ being established. Such an attacker can potentially send packets
+ that will be accepted by QUIC endpoints. This version of QUIC
+ attempts to detect this sort of attack, but it expects that endpoints
+ will fail to establish a connection rather than recovering. For the
+ most part, the cryptographic handshake protocol [QUIC-TLS] is
+ responsible for detecting tampering during the handshake.
+
+ Endpoints are permitted to use other methods to detect and attempt to
+ recover from interference with the handshake. Invalid packets can be
+ identified and discarded using other methods, but no specific method
+ is mandated in this document.
+
+21.3. Amplification Attack
+
+ An attacker might be able to receive an address validation token
+ (Section 8) from a server and then release the IP address it used to
+ acquire that token. At a later time, the attacker can initiate a
+ 0-RTT connection with a server by spoofing this same address, which
+ might now address a different (victim) endpoint. The attacker can
+ thus potentially cause the server to send an initial congestion
+ window's worth of data towards the victim.
+
+ Servers SHOULD provide mitigations for this attack by limiting the
+ usage and lifetime of address validation tokens; see Section 8.1.3.
+
+21.4. Optimistic ACK Attack
+
+ An endpoint that acknowledges packets it has not received might cause
+ a congestion controller to permit sending at rates beyond what the
+ network supports. An endpoint MAY skip packet numbers when sending
+ packets to detect this behavior. An endpoint can then immediately
+ close the connection with a connection error of type
+ PROTOCOL_VIOLATION; see Section 10.2.
+
+21.5. Request Forgery Attacks
+
+ A request forgery attack occurs where an endpoint causes its peer to
+ issue a request towards a victim, with the request controlled by the
+ endpoint. Request forgery attacks aim to provide an attacker with
+ access to capabilities of its peer that might otherwise be
+ unavailable to the attacker. For a networking protocol, a request
+ forgery attack is often used to exploit any implicit authorization
+ conferred on the peer by the victim due to the peer's location in the
+ network.
+
+ For request forgery to be effective, an attacker needs to be able to
+ influence what packets the peer sends and where these packets are
+ sent. If an attacker can target a vulnerable service with a
+ controlled payload, that service might perform actions that are
+ attributed to the attacker's peer but are decided by the attacker.
+
+ For example, cross-site request forgery [CSRF] exploits on the Web
+ cause a client to issue requests that include authorization cookies
+ [COOKIE], allowing one site access to information and actions that
+ are intended to be restricted to a different site.
+
+ As QUIC runs over UDP, the primary attack modality of concern is one
+ where an attacker can select the address to which its peer sends UDP
+ datagrams and can control some of the unprotected content of those
+ packets. As much of the data sent by QUIC endpoints is protected,
+ this includes control over ciphertext. An attack is successful if an
+ attacker can cause a peer to send a UDP datagram to a host that will
+ perform some action based on content in the datagram.
+
+ This section discusses ways in which QUIC might be used for request
+ forgery attacks.
+
+ This section also describes limited countermeasures that can be
+ implemented by QUIC endpoints. These mitigations can be employed
+ unilaterally by a QUIC implementation or deployment, without
+ potential targets for request forgery attacks taking action.
+ However, these countermeasures could be insufficient if UDP-based
+ services do not properly authorize requests.
+
+ Because the migration attack described in Section 21.5.4 is quite
+ powerful and does not have adequate countermeasures, QUIC server
+ implementations should assume that attackers can cause them to
+ generate arbitrary UDP payloads to arbitrary destinations. QUIC
+ servers SHOULD NOT be deployed in networks that do not deploy ingress
+ filtering [BCP38] and also have inadequately secured UDP endpoints.
+
+ Although it is not generally possible to ensure that clients are not
+ co-located with vulnerable endpoints, this version of QUIC does not
+ allow servers to migrate, thus preventing spoofed migration attacks
+ on clients. Any future extension that allows server migration MUST
+ also define countermeasures for forgery attacks.
+
+21.5.1. Control Options for Endpoints
+
+ QUIC offers some opportunities for an attacker to influence or
+ control where its peer sends UDP datagrams:
+
+ * initial connection establishment (Section 7), where a server is
+ able to choose where a client sends datagrams -- for example, by
+ populating DNS records;
+
+ * preferred addresses (Section 9.6), where a server is able to
+ choose where a client sends datagrams;
+
+ * spoofed connection migrations (Section 9.3.1), where a client is
+ able to use source address spoofing to select where a server sends
+ subsequent datagrams; and
+
+ * spoofed packets that cause a server to send a Version Negotiation
+ packet (Section 21.5.5).
+
+ In all cases, the attacker can cause its peer to send datagrams to a
+ victim that might not understand QUIC. That is, these packets are
+ sent by the peer prior to address validation; see Section 8.
+
+ Outside of the encrypted portion of packets, QUIC offers an endpoint
+ several options for controlling the content of UDP datagrams that its
+ peer sends. The Destination Connection ID field offers direct
+ control over bytes that appear early in packets sent by the peer; see
+ Section 5.1. The Token field in Initial packets offers a server
+ control over other bytes of Initial packets; see Section 17.2.2.
+
+ There are no measures in this version of QUIC to prevent indirect
+ control over the encrypted portions of packets. It is necessary to
+ assume that endpoints are able to control the contents of frames that
+ a peer sends, especially those frames that convey application data,
+ such as STREAM frames. Though this depends to some degree on details
+ of the application protocol, some control is possible in many
+ protocol usage contexts. As the attacker has access to packet
+ protection keys, they are likely to be capable of predicting how a
+ peer will encrypt future packets. Successful control over datagram
+ content then only requires that the attacker be able to predict the
+ packet number and placement of frames in packets with some amount of
+ reliability.
+
+ This section assumes that limiting control over datagram content is
+ not feasible. The focus of the mitigations in subsequent sections is
+ on limiting the ways in which datagrams that are sent prior to
+ address validation can be used for request forgery.
+
+21.5.2. Request Forgery with Client Initial Packets
+
+ An attacker acting as a server can choose the IP address and port on
+ which it advertises its availability, so Initial packets from clients
+ are assumed to be available for use in this sort of attack. The
+ address validation implicit in the handshake ensures that -- for a
+ new connection -- a client will not send other types of packets to a
+ destination that does not understand QUIC or is not willing to accept
+ a QUIC connection.
+
+ Initial packet protection (Section 5.2 of [QUIC-TLS]) makes it
+ difficult for servers to control the content of Initial packets sent
+ by clients. A client choosing an unpredictable Destination
+ Connection ID ensures that servers are unable to control any of the
+ encrypted portion of Initial packets from clients.
+
+ However, the Token field is open to server control and does allow a
+ server to use clients to mount request forgery attacks. The use of
+ tokens provided with the NEW_TOKEN frame (Section 8.1.3) offers the
+ only option for request forgery during connection establishment.
+
+ Clients, however, are not obligated to use the NEW_TOKEN frame.
+ Request forgery attacks that rely on the Token field can be avoided
+ if clients send an empty Token field when the server address has
+ changed from when the NEW_TOKEN frame was received.
+
+ Clients could avoid using NEW_TOKEN if the server address changes.
+ However, not including a Token field could adversely affect
+ performance. Servers could rely on NEW_TOKEN to enable the sending
+ of data in excess of the three-times limit on sending data; see
+ Section 8.1. In particular, this affects cases where clients use
+ 0-RTT to request data from servers.
+
+ Sending a Retry packet (Section 17.2.5) offers a server the option to
+ change the Token field. After sending a Retry, the server can also
+ control the Destination Connection ID field of subsequent Initial
+ packets from the client. This also might allow indirect control over
+ the encrypted content of Initial packets. However, the exchange of a
+ Retry packet validates the server's address, thereby preventing the
+ use of subsequent Initial packets for request forgery.
+
+21.5.3. Request Forgery with Preferred Addresses
+
+ Servers can specify a preferred address, which clients then migrate
+ to after confirming the handshake; see Section 9.6. The Destination
+ Connection ID field of packets that the client sends to a preferred
+ address can be used for request forgery.
+
+ A client MUST NOT send non-probing frames to a preferred address
+ prior to validating that address; see Section 8. This greatly
+ reduces the options that a server has to control the encrypted
+ portion of datagrams.
+
+ This document does not offer any additional countermeasures that are
+ specific to the use of preferred addresses and can be implemented by
+ endpoints. The generic measures described in Section 21.5.6 could be
+ used as further mitigation.
+
+21.5.4. Request Forgery with Spoofed Migration
+
+ Clients are able to present a spoofed source address as part of an
+ apparent connection migration to cause a server to send datagrams to
+ that address.
+
+ The Destination Connection ID field in any packets that a server
+ subsequently sends to this spoofed address can be used for request
+ forgery. A client might also be able to influence the ciphertext.
+
+ A server that only sends probing packets (Section 9.1) to an address
+ prior to address validation provides an attacker with only limited
+ control over the encrypted portion of datagrams. However,
+ particularly for NAT rebinding, this can adversely affect
+ performance. If the server sends frames carrying application data,
+ an attacker might be able to control most of the content of
+ datagrams.
+
+ This document does not offer specific countermeasures that can be
+ implemented by endpoints, aside from the generic measures described
+ in Section 21.5.6. However, countermeasures for address spoofing at
+ the network level -- in particular, ingress filtering [BCP38] -- are
+ especially effective against attacks that use spoofing and originate
+ from an external network.
+
+21.5.5. Request Forgery with Version Negotiation
+
+ Clients that are able to present a spoofed source address on a packet
+ can cause a server to send a Version Negotiation packet
+ (Section 17.2.1) to that address.
+
+ The absence of size restrictions on the connection ID fields for
+ packets of an unknown version increases the amount of data that the
+ client controls from the resulting datagram. The first byte of this
+ packet is not under client control and the next four bytes are zero,
+ but the client is able to control up to 512 bytes starting from the
+ fifth byte.
+
+ No specific countermeasures are provided for this attack, though
+ generic protections (Section 21.5.6) could apply. In this case,
+ ingress filtering [BCP38] is also effective.
+
+21.5.6. Generic Request Forgery Countermeasures
+
+ The most effective defense against request forgery attacks is to
+ modify vulnerable services to use strong authentication. However,
+ this is not always something that is within the control of a QUIC
+ deployment. This section outlines some other steps that QUIC
+ endpoints could take unilaterally. These additional steps are all
+ discretionary because, depending on circumstances, they could
+ interfere with or prevent legitimate uses.
+
+ Services offered over loopback interfaces often lack proper
+ authentication. Endpoints MAY prevent connection attempts or
+ migration to a loopback address. Endpoints SHOULD NOT allow
+ connections or migration to a loopback address if the same service
+ was previously available at a different interface or if the address
+ was provided by a service at a non-loopback address. Endpoints that
+ depend on these capabilities could offer an option to disable these
+ protections.
+
+ Similarly, endpoints could regard a change in address to a link-local
+ address [RFC4291] or an address in a private-use range [RFC1918] from
+ a global, unique-local [RFC4193], or non-private address as a
+ potential attempt at request forgery. Endpoints could refuse to use
+ these addresses entirely, but that carries a significant risk of
+ interfering with legitimate uses. Endpoints SHOULD NOT refuse to use
+ an address unless they have specific knowledge about the network
+ indicating that sending datagrams to unvalidated addresses in a given
+ range is not safe.
+
+ Endpoints MAY choose to reduce the risk of request forgery by not
+ including values from NEW_TOKEN frames in Initial packets or by only
+ sending probing frames in packets prior to completing address
+ validation. Note that this does not prevent an attacker from using
+ the Destination Connection ID field for an attack.
+
+ Endpoints are not expected to have specific information about the
+ location of servers that could be vulnerable targets of a request
+ forgery attack. However, it might be possible over time to identify
+ specific UDP ports that are common targets of attacks or particular
+ patterns in datagrams that are used for attacks. Endpoints MAY
+ choose to avoid sending datagrams to these ports or not send
+ datagrams that match these patterns prior to validating the
+ destination address. Endpoints MAY retire connection IDs containing
+ patterns known to be problematic without using them.
+
+ | Note: Modifying endpoints to apply these protections is more
+ | efficient than deploying network-based protections, as
+ | endpoints do not need to perform any additional processing when
+ | sending to an address that has been validated.
+
+21.6. Slowloris Attacks
+
+ The attacks commonly known as Slowloris [SLOWLORIS] try to keep many
+ connections to the target endpoint open and hold them open as long as
+ possible. These attacks can be executed against a QUIC endpoint by
+ generating the minimum amount of activity necessary to avoid being
+ closed for inactivity. This might involve sending small amounts of
+ data, gradually opening flow control windows in order to control the
+ sender rate, or manufacturing ACK frames that simulate a high loss
+ rate.
+
+ QUIC deployments SHOULD provide mitigations for the Slowloris
+ attacks, such as increasing the maximum number of clients the server
+ will allow, limiting the number of connections a single IP address is
+ allowed to make, imposing restrictions on the minimum transfer speed
+ a connection is allowed to have, and restricting the length of time
+ an endpoint is allowed to stay connected.
+
+21.7. Stream Fragmentation and Reassembly Attacks
+
+ An adversarial sender might intentionally not send portions of the
+ stream data, causing the receiver to commit resources for the unsent
+ data. This could cause a disproportionate receive buffer memory
+ commitment and/or the creation of a large and inefficient data
+ structure at the receiver.
+
+ An adversarial receiver might intentionally not acknowledge packets
+ containing stream data in an attempt to force the sender to store the
+ unacknowledged stream data for retransmission.
+
+ The attack on receivers is mitigated if flow control windows
+ correspond to available memory. However, some receivers will
+ overcommit memory and advertise flow control offsets in the aggregate
+ that exceed actual available memory. The overcommitment strategy can
+ lead to better performance when endpoints are well behaved, but
+ renders endpoints vulnerable to the stream fragmentation attack.
+
+ QUIC deployments SHOULD provide mitigations for stream fragmentation
+ attacks. Mitigations could consist of avoiding overcommitting
+ memory, limiting the size of tracking data structures, delaying
+ reassembly of STREAM frames, implementing heuristics based on the age
+ and duration of reassembly holes, or some combination of these.
+
+21.8. Stream Commitment Attack
+
+ An adversarial endpoint can open a large number of streams,
+ exhausting state on an endpoint. The adversarial endpoint could
+ repeat the process on a large number of connections, in a manner
+ similar to SYN flooding attacks in TCP.
+
+ Normally, clients will open streams sequentially, as explained in
+ Section 2.1. However, when several streams are initiated at short
+ intervals, loss or reordering can cause STREAM frames that open
+ streams to be received out of sequence. On receiving a higher-
+ numbered stream ID, a receiver is required to open all intervening
+ streams of the same type; see Section 3.2. Thus, on a new
+ connection, opening stream 4000000 opens 1 million and 1 client-
+ initiated bidirectional streams.
+
+ The number of active streams is limited by the
+ initial_max_streams_bidi and initial_max_streams_uni transport
+ parameters as updated by any received MAX_STREAMS frames, as
+ explained in Section 4.6. If chosen judiciously, these limits
+ mitigate the effect of the stream commitment attack. However,
+ setting the limit too low could affect performance when applications
+ expect to open a large number of streams.
+
+21.9. Peer Denial of Service
+
+ QUIC and TLS both contain frames or messages that have legitimate
+ uses in some contexts, but these frames or messages can be abused to
+ cause a peer to expend processing resources without having any
+ observable impact on the state of the connection.
+
+ Messages can also be used to change and revert state in small or
+ inconsequential ways, such as by sending small increments to flow
+ control limits.
+
+ If processing costs are disproportionately large in comparison to
+ bandwidth consumption or effect on state, then this could allow a
+ malicious peer to exhaust processing capacity.
+
+ While there are legitimate uses for all messages, implementations
+ SHOULD track cost of processing relative to progress and treat
+ excessive quantities of any non-productive packets as indicative of
+ an attack. Endpoints MAY respond to this condition with a connection
+ error or by dropping packets.
+
+21.10. Explicit Congestion Notification Attacks
+
+ An on-path attacker could manipulate the value of ECN fields in the
+ IP header to influence the sender's rate. [RFC3168] discusses
+ manipulations and their effects in more detail.
+
+ A limited on-path attacker can duplicate and send packets with
+ modified ECN fields to affect the sender's rate. If duplicate
+ packets are discarded by a receiver, an attacker will need to race
+ the duplicate packet against the original to be successful in this
+ attack. Therefore, QUIC endpoints ignore the ECN field in an IP
+ packet unless at least one QUIC packet in that IP packet is
+ successfully processed; see Section 13.4.
+
+21.11. Stateless Reset Oracle
+
+ Stateless resets create a possible denial-of-service attack analogous
+ to a TCP reset injection. This attack is possible if an attacker is
+ able to cause a stateless reset token to be generated for a
+ connection with a selected connection ID. An attacker that can cause
+ this token to be generated can reset an active connection with the
+ same connection ID.
+
+ If a packet can be routed to different instances that share a static
+ key -- for example, by changing an IP address or port -- then an
+ attacker can cause the server to send a stateless reset. To defend
+ against this style of denial of service, endpoints that share a
+ static key for stateless resets (see Section 10.3.2) MUST be arranged
+ so that packets with a given connection ID always arrive at an
+ instance that has connection state, unless that connection is no
+ longer active.
+
+ More generally, servers MUST NOT generate a stateless reset if a
+ connection with the corresponding connection ID could be active on
+ any endpoint using the same static key.
+
+ In the case of a cluster that uses dynamic load balancing, it is
+ possible that a change in load-balancer configuration could occur
+ while an active instance retains connection state. Even if an
+ instance retains connection state, the change in routing and
+ resulting stateless reset will result in the connection being
+ terminated. If there is no chance of the packet being routed to the
+ correct instance, it is better to send a stateless reset than wait
+ for the connection to time out. However, this is acceptable only if
+ the routing cannot be influenced by an attacker.
+
+21.12. Version Downgrade
+
+ This document defines QUIC Version Negotiation packets (Section 6),
+ which can be used to negotiate the QUIC version used between two
+ endpoints. However, this document does not specify how this
+ negotiation will be performed between this version and subsequent
+ future versions. In particular, Version Negotiation packets do not
+ contain any mechanism to prevent version downgrade attacks. Future
+ versions of QUIC that use Version Negotiation packets MUST define a
+ mechanism that is robust against version downgrade attacks.
+
+21.13. Targeted Attacks by Routing
+
+ Deployments should limit the ability of an attacker to target a new
+ connection to a particular server instance. Ideally, routing
+ decisions are made independently of client-selected values, including
+ addresses. Once an instance is selected, a connection ID can be
+ selected so that later packets are routed to the same instance.
+
+21.14. Traffic Analysis
+
+ The length of QUIC packets can reveal information about the length of
+ the content of those packets. The PADDING frame is provided so that
+ endpoints have some ability to obscure the length of packet content;
+ see Section 19.1.
+
+ Defeating traffic analysis is challenging and the subject of active
+ research. Length is not the only way that information might leak.
+ Endpoints might also reveal sensitive information through other side
+ channels, such as the timing of packets.
+
+22. IANA Considerations
+
+ This document establishes several registries for the management of
+ codepoints in QUIC. These registries operate on a common set of
+ policies as defined in Section 22.1.
+
+22.1. Registration Policies for QUIC Registries
+
+ All QUIC registries allow for both provisional and permanent
+ registration of codepoints. This section documents policies that are
+ common to these registries.
+
+22.1.1. Provisional Registrations
+
+ Provisional registrations of codepoints are intended to allow for
+ private use and experimentation with extensions to QUIC. Provisional
+ registrations only require the inclusion of the codepoint value and
+ contact information. However, provisional registrations could be
+ reclaimed and reassigned for another purpose.
+
+ Provisional registrations require Expert Review, as defined in
+ Section 4.5 of [RFC8126]. The designated expert or experts are
+ advised that only registrations for an excessive proportion of
+ remaining codepoint space or the very first unassigned value (see
+ Section 22.1.2) can be rejected.
+
+ Provisional registrations will include a Date field that indicates
+ when the registration was last updated. A request to update the date
+ on any provisional registration can be made without review from the
+ designated expert(s).
+
+ All QUIC registries include the following fields to support
+ provisional registration:
+
+ Value: The assigned codepoint.
+ Status: "permanent" or "provisional".
+ Specification: A reference to a publicly available specification for
+ the value.
+ Date: The date of the last update to the registration.
+ Change Controller: The entity that is responsible for the definition
+ of the registration.
+ Contact: Contact details for the registrant.
+ Notes: Supplementary notes about the registration.
+
+ Provisional registrations MAY omit the Specification and Notes
+ fields, plus any additional fields that might be required for a
+ permanent registration. The Date field is not required as part of
+ requesting a registration, as it is set to the date the registration
+ is created or updated.
+
+22.1.2. Selecting Codepoints
+
+ New requests for codepoints from QUIC registries SHOULD use a
+ randomly selected codepoint that excludes both existing allocations
+ and the first unallocated codepoint in the selected space. Requests
+ for multiple codepoints MAY use a contiguous range. This minimizes
+ the risk that differing semantics are attributed to the same
+ codepoint by different implementations.
+
+ The use of the first unassigned codepoint is reserved for allocation
+ using the Standards Action policy; see Section 4.9 of [RFC8126]. The
+ early codepoint assignment process [EARLY-ASSIGN] can be used for
+ these values.
+
+ For codepoints that are encoded in variable-length integers
+ (Section 16), such as frame types, codepoints that encode to four or
+ eight bytes (that is, values 2^14 and above) SHOULD be used unless
+ the usage is especially sensitive to having a longer encoding.
+
+ Applications to register codepoints in QUIC registries MAY include a
+ requested codepoint as part of the registration. IANA MUST allocate
+ the selected codepoint if the codepoint is unassigned and the
+ requirements of the registration policy are met.
+
+22.1.3. Reclaiming Provisional Codepoints
+
+ A request might be made to remove an unused provisional registration
+ from the registry to reclaim space in a registry, or a portion of the
+ registry (such as the 64-16383 range for codepoints that use
+ variable-length encodings). This SHOULD be done only for the
+ codepoints with the earliest recorded date, and entries that have
+ been updated less than a year prior SHOULD NOT be reclaimed.
+
+ A request to remove a codepoint MUST be reviewed by the designated
+ experts. The experts MUST attempt to determine whether the codepoint
+ is still in use. Experts are advised to contact the listed contacts
+ for the registration, plus as wide a set of protocol implementers as
+ possible in order to determine whether any use of the codepoint is
+ known. The experts are also advised to allow at least four weeks for
+ responses.
+
+ If any use of the codepoints is identified by this search or a
+ request to update the registration is made, the codepoint MUST NOT be
+ reclaimed. Instead, the date on the registration is updated. A note
+ might be added for the registration recording relevant information
+ that was learned.
+
+ If no use of the codepoint was identified and no request was made to
+ update the registration, the codepoint MAY be removed from the
+ registry.
+
+ This review and consultation process also applies to requests to
+ change a provisional registration into a permanent registration,
+ except that the goal is not to determine whether there is no use of
+ the codepoint but to determine that the registration is an accurate
+ representation of any deployed usage.
+
+22.1.4. Permanent Registrations
+
+ Permanent registrations in QUIC registries use the Specification
+ Required policy (Section 4.6 of [RFC8126]), unless otherwise
+ specified. The designated expert or experts verify that a
+ specification exists and is readily accessible. Experts are
+ encouraged to be biased towards approving registrations unless they
+ are abusive, frivolous, or actively harmful (not merely aesthetically
+ displeasing or architecturally dubious). The creation of a registry
+ MAY specify additional constraints on permanent registrations.
+
+ The creation of a registry MAY identify a range of codepoints where
+ registrations are governed by a different registration policy. For
+ instance, the "QUIC Frame Types" registry (Section 22.4) has a
+ stricter policy for codepoints in the range from 0 to 63.
+
+ Any stricter requirements for permanent registrations do not prevent
+ provisional registrations for affected codepoints. For instance, a
+ provisional registration for a frame type of 61 could be requested.
+
+ All registrations made by Standards Track publications MUST be
+ permanent.
+
+ All registrations in this document are assigned a permanent status
+ and list a change controller of the IETF and a contact of the QUIC
+ Working Group (quic@ietf.org).
+
+22.2. QUIC Versions Registry
+
+ IANA has added a registry for "QUIC Versions" under a "QUIC" heading.
+
+ The "QUIC Versions" registry governs a 32-bit space; see Section 15.
+ This registry follows the registration policy from Section 22.1.
+ Permanent registrations in this registry are assigned using the
+ Specification Required policy (Section 4.6 of [RFC8126]).
+
+ The codepoint of 0x00000001 for the protocol is assigned with
+ permanent status to the protocol defined in this document. The
+ codepoint of 0x00000000 is permanently reserved; the note for this
+ codepoint indicates that this version is reserved for version
+ negotiation.
+
+ All codepoints that follow the pattern 0x?a?a?a?a are reserved, MUST
+ NOT be assigned by IANA, and MUST NOT appear in the listing of
+ assigned values.
+
+22.3. QUIC Transport Parameters Registry
+
+ IANA has added a registry for "QUIC Transport Parameters" under a
+ "QUIC" heading.
+
+ The "QUIC Transport Parameters" registry governs a 62-bit space.
+ This registry follows the registration policy from Section 22.1.
+ Permanent registrations in this registry are assigned using the
+ Specification Required policy (Section 4.6 of [RFC8126]), except for
+ values between 0x00 and 0x3f (in hexadecimal), inclusive, which are
+ assigned using Standards Action or IESG Approval as defined in
+ Sections 4.9 and 4.10 of [RFC8126].
+
+ In addition to the fields listed in Section 22.1.1, permanent
+ registrations in this registry MUST include the following field:
+
+ Parameter Name: A short mnemonic for the parameter.
+
+ The initial contents of this registry are shown in Table 6.
+
+ +=======+=====================================+===============+
+ | Value | Parameter Name | Specification |
+ +=======+=====================================+===============+
+ | 0x00 | original_destination_connection_id | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x01 | max_idle_timeout | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x02 | stateless_reset_token | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x03 | max_udp_payload_size | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x04 | initial_max_data | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x05 | initial_max_stream_data_bidi_local | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x06 | initial_max_stream_data_bidi_remote | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x07 | initial_max_stream_data_uni | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x08 | initial_max_streams_bidi | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x09 | initial_max_streams_uni | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x0a | ack_delay_exponent | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x0b | max_ack_delay | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x0c | disable_active_migration | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x0d | preferred_address | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x0e | active_connection_id_limit | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x0f | initial_source_connection_id | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+ | 0x10 | retry_source_connection_id | Section 18.2 |
+ +-------+-------------------------------------+---------------+
+
+ Table 6: Initial QUIC Transport Parameters Registry Entries
+
+ Each value of the form "31 * N + 27" for integer values of N (that
+ is, 27, 58, 89, ...) are reserved; these values MUST NOT be assigned
+ by IANA and MUST NOT appear in the listing of assigned values.
+
+22.4. QUIC Frame Types Registry
+
+ IANA has added a registry for "QUIC Frame Types" under a "QUIC"
+ heading.
+
+ The "QUIC Frame Types" registry governs a 62-bit space. This
+ registry follows the registration policy from Section 22.1.
+ Permanent registrations in this registry are assigned using the
+ Specification Required policy (Section 4.6 of [RFC8126]), except for
+ values between 0x00 and 0x3f (in hexadecimal), inclusive, which are
+ assigned using Standards Action or IESG Approval as defined in
+ Sections 4.9 and 4.10 of [RFC8126].
+
+ In addition to the fields listed in Section 22.1.1, permanent
+ registrations in this registry MUST include the following field:
+
+ Frame Type Name: A short mnemonic for the frame type.
+
+ In addition to the advice in Section 22.1, specifications for new
+ permanent registrations SHOULD describe the means by which an
+ endpoint might determine that it can send the identified type of
+ frame. An accompanying transport parameter registration is expected
+ for most registrations; see Section 22.3. Specifications for
+ permanent registrations also need to describe the format and assigned
+ semantics of any fields in the frame.
+
+ The initial contents of this registry are tabulated in Table 3. Note
+ that the registry does not include the "Pkts" and "Spec" columns from
+ Table 3.
+
+22.5. QUIC Transport Error Codes Registry
+
+ IANA has added a registry for "QUIC Transport Error Codes" under a
+ "QUIC" heading.
+
+ The "QUIC Transport Error Codes" registry governs a 62-bit space.
+ This space is split into three ranges that are governed by different
+ policies. Permanent registrations in this registry are assigned
+ using the Specification Required policy (Section 4.6 of [RFC8126]),
+ except for values between 0x00 and 0x3f (in hexadecimal), inclusive,
+ which are assigned using Standards Action or IESG Approval as defined
+ in Sections 4.9 and 4.10 of [RFC8126].
+
+ In addition to the fields listed in Section 22.1.1, permanent
+ registrations in this registry MUST include the following fields:
+
+ Code: A short mnemonic for the parameter.
+
+ Description: A brief description of the error code semantics, which
+ MAY be a summary if a specification reference is provided.
+
+ The initial contents of this registry are shown in Table 7.
+
+ +=======+===========================+================+==============+
+ |Value | Code |Description |Specification |
+ +=======+===========================+================+==============+
+ |0x00 | NO_ERROR |No error |Section 20 |
+ +-------+---------------------------+----------------+--------------+
+ |0x01 | INTERNAL_ERROR |Implementation |Section 20 |
+ | | |error | |
+ +-------+---------------------------+----------------+--------------+
+ |0x02 | CONNECTION_REFUSED |Server refuses a|Section 20 |
+ | | |connection | |
+ +-------+---------------------------+----------------+--------------+
+ |0x03 | FLOW_CONTROL_ERROR |Flow control |Section 20 |
+ | | |error | |
+ +-------+---------------------------+----------------+--------------+
+ |0x04 | STREAM_LIMIT_ERROR |Too many streams|Section 20 |
+ | | |opened | |
+ +-------+---------------------------+----------------+--------------+
+ |0x05 | STREAM_STATE_ERROR |Frame received |Section 20 |
+ | | |in invalid | |
+ | | |stream state | |
+ +-------+---------------------------+----------------+--------------+
+ |0x06 | FINAL_SIZE_ERROR |Change to final |Section 20 |
+ | | |size | |
+ +-------+---------------------------+----------------+--------------+
+ |0x07 | FRAME_ENCODING_ERROR |Frame encoding |Section 20 |
+ | | |error | |
+ +-------+---------------------------+----------------+--------------+
+ |0x08 | TRANSPORT_PARAMETER_ERROR |Error in |Section 20 |
+ | | |transport | |
+ | | |parameters | |
+ +-------+---------------------------+----------------+--------------+
+ |0x09 | CONNECTION_ID_LIMIT_ERROR |Too many |Section 20 |
+ | | |connection IDs | |
+ | | |received | |
+ +-------+---------------------------+----------------+--------------+
+ |0x0a | PROTOCOL_VIOLATION |Generic protocol|Section 20 |
+ | | |violation | |
+ +-------+---------------------------+----------------+--------------+
+ |0x0b | INVALID_TOKEN |Invalid Token |Section 20 |
+ | | |received | |
+ +-------+---------------------------+----------------+--------------+
+ |0x0c | APPLICATION_ERROR |Application |Section 20 |
+ | | |error | |
+ +-------+---------------------------+----------------+--------------+
+ |0x0d | CRYPTO_BUFFER_EXCEEDED |CRYPTO data |Section 20 |
+ | | |buffer | |
+ | | |overflowed | |
+ +-------+---------------------------+----------------+--------------+
+ |0x0e | KEY_UPDATE_ERROR |Invalid packet |Section 20 |
+ | | |protection | |
+ | | |update | |
+ +-------+---------------------------+----------------+--------------+
+ |0x0f | AEAD_LIMIT_REACHED |Excessive use of|Section 20 |
+ | | |packet | |
+ | | |protection keys | |
+ +-------+---------------------------+----------------+--------------+
+ |0x10 | NO_VIABLE_PATH |No viable |Section 20 |
+ | | |network path | |
+ | | |exists | |
+ +-------+---------------------------+----------------+--------------+
+ |0x0100-| CRYPTO_ERROR |TLS alert code |Section 20 |
+ |0x01ff | | | |
+ +-------+---------------------------+----------------+--------------+
+
+ Table 7: Initial QUIC Transport Error Codes Registry Entries
+
+23. References
+
+23.1. Normative References
+
+ [BCP38] Ferguson, P. and D. Senie, "Network Ingress Filtering:
+ Defeating Denial of Service Attacks which employ IP Source
+ Address Spoofing", BCP 38, RFC 2827, May 2000.
+
+ <https://www.rfc-editor.org/info/bcp38>
+
+ [DPLPMTUD] Fairhurst, G., Jones, T., Tüxen, M., Rüngeler, I., and T.
+ Völker, "Packetization Layer Path MTU Discovery for
+ Datagram Transports", RFC 8899, DOI 10.17487/RFC8899,
+ September 2020, <https://www.rfc-editor.org/info/rfc8899>.
+
+ [EARLY-ASSIGN]
+ Cotton, M., "Early IANA Allocation of Standards Track Code
+ Points", BCP 100, RFC 7120, DOI 10.17487/RFC7120, January
+ 2014, <https://www.rfc-editor.org/info/rfc7120>.
+
+ [IPv4] Postel, J., "Internet Protocol", STD 5, RFC 791,
+ DOI 10.17487/RFC0791, September 1981,
+ <https://www.rfc-editor.org/info/rfc791>.
+
+ [QUIC-INVARIANTS]
+ Thomson, M., "Version-Independent Properties of QUIC",
+ RFC 8999, DOI 10.17487/RFC8999, May 2021,
+ <https://www.rfc-editor.org/info/rfc8999>.
+
+ [QUIC-RECOVERY]
+ Iyengar, J., Ed. and I. Swett, Ed., "QUIC Loss Detection
+ and Congestion Control", RFC 9002, DOI 10.17487/RFC9002,
+ May 2021, <https://www.rfc-editor.org/info/rfc9002>.
+
+ [QUIC-TLS] Thomson, M., Ed. and S. Turner, Ed., "Using TLS to Secure
+ QUIC", RFC 9001, DOI 10.17487/RFC9001, May 2021,
+ <https://www.rfc-editor.org/info/rfc9001>.
+
+ [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
+ DOI 10.17487/RFC1191, November 1990,
+ <https://www.rfc-editor.org/info/rfc1191>.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119,
+ DOI 10.17487/RFC2119, March 1997,
+ <https://www.rfc-editor.org/info/rfc2119>.
+
+ [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
+ of Explicit Congestion Notification (ECN) to IP",
+ RFC 3168, DOI 10.17487/RFC3168, September 2001,
+ <https://www.rfc-editor.org/info/rfc3168>.
+
+ [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
+ 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
+ 2003, <https://www.rfc-editor.org/info/rfc3629>.
+
+ [RFC6437] Amante, S., Carpenter, B., Jiang, S., and J. Rajahalme,
+ "IPv6 Flow Label Specification", RFC 6437,
+ DOI 10.17487/RFC6437, November 2011,
+ <https://www.rfc-editor.org/info/rfc6437>.
+
+ [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage
+ Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085,
+ March 2017, <https://www.rfc-editor.org/info/rfc8085>.
+
+ [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
+ Writing an IANA Considerations Section in RFCs", BCP 26,
+ RFC 8126, DOI 10.17487/RFC8126, June 2017,
+ <https://www.rfc-editor.org/info/rfc8126>.
+
+ [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+ 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+ May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
+ [RFC8201] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed.,
+ "Path MTU Discovery for IP version 6", STD 87, RFC 8201,
+ DOI 10.17487/RFC8201, July 2017,
+ <https://www.rfc-editor.org/info/rfc8201>.
+
+ [RFC8311] Black, D., "Relaxing Restrictions on Explicit Congestion
+ Notification (ECN) Experimentation", RFC 8311,
+ DOI 10.17487/RFC8311, January 2018,
+ <https://www.rfc-editor.org/info/rfc8311>.
+
+ [TLS13] Rescorla, E., "The Transport Layer Security (TLS) Protocol
+ Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018,
+ <https://www.rfc-editor.org/info/rfc8446>.
+
+ [UDP] Postel, J., "User Datagram Protocol", STD 6, RFC 768,
+ DOI 10.17487/RFC0768, August 1980,
+ <https://www.rfc-editor.org/info/rfc768>.
+
+23.2. Informative References
+
+ [AEAD] McGrew, D., "An Interface and Algorithms for Authenticated
+ Encryption", RFC 5116, DOI 10.17487/RFC5116, January 2008,
+ <https://www.rfc-editor.org/info/rfc5116>.
+
+ [ALPN] Friedl, S., Popov, A., Langley, A., and E. Stephan,
+ "Transport Layer Security (TLS) Application-Layer Protocol
+ Negotiation Extension", RFC 7301, DOI 10.17487/RFC7301,
+ July 2014, <https://www.rfc-editor.org/info/rfc7301>.
+
+ [ALTSVC] Nottingham, M., McManus, P., and J. Reschke, "HTTP
+ Alternative Services", RFC 7838, DOI 10.17487/RFC7838,
+ April 2016, <https://www.rfc-editor.org/info/rfc7838>.
+
+ [COOKIE] Barth, A., "HTTP State Management Mechanism", RFC 6265,
+ DOI 10.17487/RFC6265, April 2011,
+ <https://www.rfc-editor.org/info/rfc6265>.
+
+ [CSRF] Barth, A., Jackson, C., and J. Mitchell, "Robust defenses
+ for cross-site request forgery", Proceedings of the 15th
+ ACM conference on Computer and communications security -
+ CCS '08, DOI 10.1145/1455770.1455782, 2008,
+ <https://doi.org/10.1145/1455770.1455782>.
+
+ [EARLY-DESIGN]
+ Roskind, J., "QUIC: Multiplexed Stream Transport Over
+ UDP", 2 December 2013, <https://docs.google.com/document/
+ d/1RNHkx_VvKWyWg6Lr8SZ-saqsQx7rFV-ev2jRFUoVD34/
+ edit?usp=sharing>.
+
+ [GATEWAY] Hätönen, S., Nyrhinen, A., Eggert, L., Strowes, S.,
+ Sarolahti, P., and M. Kojo, "An experimental study of home
+ gateway characteristics", Proceedings of the 10th ACM
+ SIGCOMM conference on Internet measurement - IMC '10,
+ DOI 10.1145/1879141.1879174, November 2010,
+ <https://doi.org/10.1145/1879141.1879174>.
+
+ [HTTP2] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
+ Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
+ DOI 10.17487/RFC7540, May 2015,
+ <https://www.rfc-editor.org/info/rfc7540>.
+
+ [IPv6] Deering, S. and R. Hinden, "Internet Protocol, Version 6
+ (IPv6) Specification", STD 86, RFC 8200,
+ DOI 10.17487/RFC8200, July 2017,
+ <https://www.rfc-editor.org/info/rfc8200>.
+
+ [QUIC-MANAGEABILITY]
+ Kuehlewind, M. and B. Trammell, "Manageability of the QUIC
+ Transport Protocol", Work in Progress, Internet-Draft,
+ draft-ietf-quic-manageability-11, 21 April 2021,
+ <https://tools.ietf.org/html/draft-ietf-quic-
+ manageability-11>.
+
+ [RANDOM] Eastlake 3rd, D., Schiller, J., and S. Crocker,
+ "Randomness Requirements for Security", BCP 106, RFC 4086,
+ DOI 10.17487/RFC4086, June 2005,
+ <https://www.rfc-editor.org/info/rfc4086>.
+
+ [RFC1812] Baker, F., Ed., "Requirements for IP Version 4 Routers",
+ RFC 1812, DOI 10.17487/RFC1812, June 1995,
+ <https://www.rfc-editor.org/info/rfc1812>.
+
+ [RFC1918] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G.
+ J., and E. Lear, "Address Allocation for Private
+ Internets", BCP 5, RFC 1918, DOI 10.17487/RFC1918,
+ February 1996, <https://www.rfc-editor.org/info/rfc1918>.
+
+ [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
+ Selective Acknowledgment Options", RFC 2018,
+ DOI 10.17487/RFC2018, October 1996,
+ <https://www.rfc-editor.org/info/rfc2018>.
+
+ [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
+ Hashing for Message Authentication", RFC 2104,
+ DOI 10.17487/RFC2104, February 1997,
+ <https://www.rfc-editor.org/info/rfc2104>.
+
+ [RFC3449] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M.
+ Sooriyabandara, "TCP Performance Implications of Network
+ Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449,
+ December 2002, <https://www.rfc-editor.org/info/rfc3449>.
+
+ [RFC4193] Hinden, R. and B. Haberman, "Unique Local IPv6 Unicast
+ Addresses", RFC 4193, DOI 10.17487/RFC4193, October 2005,
+ <https://www.rfc-editor.org/info/rfc4193>.
+
+ [RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing
+ Architecture", RFC 4291, DOI 10.17487/RFC4291, February
+ 2006, <https://www.rfc-editor.org/info/rfc4291>.
+
+ [RFC4443] Conta, A., Deering, S., and M. Gupta, Ed., "Internet
+ Control Message Protocol (ICMPv6) for the Internet
+ Protocol Version 6 (IPv6) Specification", STD 89,
+ RFC 4443, DOI 10.17487/RFC4443, March 2006,
+ <https://www.rfc-editor.org/info/rfc4443>.
+
+ [RFC4787] Audet, F., Ed. and C. Jennings, "Network Address
+ Translation (NAT) Behavioral Requirements for Unicast
+ UDP", BCP 127, RFC 4787, DOI 10.17487/RFC4787, January
+ 2007, <https://www.rfc-editor.org/info/rfc4787>.
+
+ [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
+ Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
+ <https://www.rfc-editor.org/info/rfc5681>.
+
+ [RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand
+ Key Derivation Function (HKDF)", RFC 5869,
+ DOI 10.17487/RFC5869, May 2010,
+ <https://www.rfc-editor.org/info/rfc5869>.
+
+ [RFC7983] Petit-Huguenin, M. and G. Salgueiro, "Multiplexing Scheme
+ Updates for Secure Real-time Transport Protocol (SRTP)
+ Extension for Datagram Transport Layer Security (DTLS)",
+ RFC 7983, DOI 10.17487/RFC7983, September 2016,
+ <https://www.rfc-editor.org/info/rfc7983>.
+
+ [RFC8087] Fairhurst, G. and M. Welzl, "The Benefits of Using
+ Explicit Congestion Notification (ECN)", RFC 8087,
+ DOI 10.17487/RFC8087, March 2017,
+ <https://www.rfc-editor.org/info/rfc8087>.
+
+ [RFC8981] Gont, F., Krishnan, S., Narten, T., and R. Draves,
+ "Temporary Address Extensions for Stateless Address
+ Autoconfiguration in IPv6", RFC 8981,
+ DOI 10.17487/RFC8981, February 2021,
+ <https://www.rfc-editor.org/info/rfc8981>.
+
+ [SEC-CONS] Rescorla, E. and B. Korver, "Guidelines for Writing RFC
+ Text on Security Considerations", BCP 72, RFC 3552,
+ DOI 10.17487/RFC3552, July 2003,
+ <https://www.rfc-editor.org/info/rfc3552>.
+
+ [SLOWLORIS]
+ "RSnake" Hansen, R., "Welcome to Slowloris - the low
+ bandwidth, yet greedy and poisonous HTTP client!", June
+ 2009, <https://web.archive.org/web/20150315054838/
+ http://ha.ckers.org/slowloris/>.
+
+Appendix A. Pseudocode
+
+ The pseudocode in this section describes sample algorithms. These
+ algorithms are intended to be correct and clear, rather than being
+ optimally performant.
+
+ The pseudocode segments in this section are licensed as Code
+ Components; see the Copyright Notice.
+
+A.1. Sample Variable-Length Integer Decoding
+
+ The pseudocode in Figure 45 shows how a variable-length integer can
+ be read from a stream of bytes. The function ReadVarint takes a
+ single argument -- a sequence of bytes, which can be read in network
+ byte order.
+
+ ReadVarint(data):
+ // The length of variable-length integers is encoded in the
+ // first two bits of the first byte.
+ v = data.next_byte()
+ prefix = v >> 6
+ length = 1 << prefix
+
+ // Once the length is known, remove these bits and read any
+ // remaining bytes.
+ v = v & 0x3f
+ repeat length-1 times:
+ v = (v << 8) + data.next_byte()
+ return v
+
+ Figure 45: Sample Variable-Length Integer Decoding Algorithm
+
+ For example, the eight-byte sequence 0xc2197c5eff14e88c decodes to
+ the decimal value 151,288,809,941,952,652; the four-byte sequence
+ 0x9d7f3e7d decodes to 494,878,333; the two-byte sequence 0x7bbd
+ decodes to 15,293; and the single byte 0x25 decodes to 37 (as does
+ the two-byte sequence 0x4025).
+
+A.2. Sample Packet Number Encoding Algorithm
+
+ The pseudocode in Figure 46 shows how an implementation can select an
+ appropriate size for packet number encodings.
+
+ The EncodePacketNumber function takes two arguments:
+
+ * full_pn is the full packet number of the packet being sent.
+
+ * largest_acked is the largest packet number that has been
+ acknowledged by the peer in the current packet number space, if
+ any.
+
+ EncodePacketNumber(full_pn, largest_acked):
+
+ // The number of bits must be at least one more
+ // than the base-2 logarithm of the number of contiguous
+ // unacknowledged packet numbers, including the new packet.
+ if largest_acked is None:
+ num_unacked = full_pn + 1
+ else:
+ num_unacked = full_pn - largest_acked
+
+ min_bits = log(num_unacked, 2) + 1
+ num_bytes = ceil(min_bits / 8)
+
+ // Encode the integer value and truncate to
+ // the num_bytes least significant bytes.
+ return encode(full_pn, num_bytes)
+
+ Figure 46: Sample Packet Number Encoding Algorithm
+
+ For example, if an endpoint has received an acknowledgment for packet
+ 0xabe8b3 and is sending a packet with a number of 0xac5c02, there are
+ 29,519 (0x734f) outstanding packet numbers. In order to represent at
+ least twice this range (59,038 packets, or 0xe69e), 16 bits are
+ required.
+
+ In the same state, sending a packet with a number of 0xace8fe uses
+ the 24-bit encoding, because at least 18 bits are required to
+ represent twice the range (131,222 packets, or 0x020096).
+
+A.3. Sample Packet Number Decoding Algorithm
+
+ The pseudocode in Figure 47 includes an example algorithm for
+ decoding packet numbers after header protection has been removed.
+
+ The DecodePacketNumber function takes three arguments:
+
+ * largest_pn is the largest packet number that has been successfully
+ processed in the current packet number space.
+
+ * truncated_pn is the value of the Packet Number field.
+
+ * pn_nbits is the number of bits in the Packet Number field (8, 16,
+ 24, or 32).
+
+ DecodePacketNumber(largest_pn, truncated_pn, pn_nbits):
+ expected_pn = largest_pn + 1
+ pn_win = 1 << pn_nbits
+ pn_hwin = pn_win / 2
+ pn_mask = pn_win - 1
+ // The incoming packet number should be greater than
+ // expected_pn - pn_hwin and less than or equal to
+ // expected_pn + pn_hwin
+ //
+ // This means we cannot just strip the trailing bits from
+ // expected_pn and add the truncated_pn because that might
+ // yield a value outside the window.
+ //
+ // The following code calculates a candidate value and
+ // makes sure it's within the packet number window.
+ // Note the extra checks to prevent overflow and underflow.
+ candidate_pn = (expected_pn & ~pn_mask) | truncated_pn
+ if candidate_pn <= expected_pn - pn_hwin and
+ candidate_pn < (1 << 62) - pn_win:
+ return candidate_pn + pn_win
+ if candidate_pn > expected_pn + pn_hwin and
+ candidate_pn >= pn_win:
+ return candidate_pn - pn_win
+ return candidate_pn
+
+ Figure 47: Sample Packet Number Decoding Algorithm
+
+ For example, if the highest successfully authenticated packet had a
+ packet number of 0xa82f30ea, then a packet containing a 16-bit value
+ of 0x9b32 will be decoded as 0xa82f9b32.
+
+A.4. Sample ECN Validation Algorithm
+
+ Each time an endpoint commences sending on a new network path, it
+ determines whether the path supports ECN; see Section 13.4. If the
+ path supports ECN, the goal is to use ECN. Endpoints might also
+ periodically reassess a path that was determined to not support ECN.
+
+ This section describes one method for testing new paths. This
+ algorithm is intended to show how a path might be tested for ECN
+ support. Endpoints can implement different methods.
+
+ The path is assigned an ECN state that is one of "testing",
+ "unknown", "failed", or "capable". On paths with a "testing" or
+ "capable" state, the endpoint sends packets with an ECT marking --
+ ECT(0) by default; otherwise, the endpoint sends unmarked packets.
+
+ To start testing a path, the ECN state is set to "testing", and
+ existing ECN counts are remembered as a baseline.
+
+ The testing period runs for a number of packets or a limited time, as
+ determined by the endpoint. The goal is not to limit the duration of
+ the testing period but to ensure that enough marked packets are sent
+ for received ECN counts to provide a clear indication of how the path
+ treats marked packets. Section 13.4.2 suggests limiting this to ten
+ packets or three times the PTO.
+
+ After the testing period ends, the ECN state for the path becomes
+ "unknown". From the "unknown" state, successful validation of the
+ ECN counts in an ACK frame (see Section 13.4.2.1) causes the ECN
+ state for the path to become "capable", unless no marked packet has
+ been acknowledged.
+
+ If validation of ECN counts fails at any time, the ECN state for the
+ affected path becomes "failed". An endpoint can also mark the ECN
+ state for a path as "failed" if marked packets are all declared lost
+ or if they are all ECN-CE marked.
+
+ Following this algorithm ensures that ECN is rarely disabled for
+ paths that properly support ECN. Any path that incorrectly modifies
+ markings will cause ECN to be disabled. For those rare cases where
+ marked packets are discarded by the path, the short duration of the
+ testing period limits the number of losses incurred.
+
+Contributors
+
+ The original design and rationale behind this protocol draw
+ significantly from work by Jim Roskind [EARLY-DESIGN].
+
+ The IETF QUIC Working Group received an enormous amount of support
+ from many people. The following people provided substantive
+ contributions to this document:
+
+ * Alessandro Ghedini
+ * Alyssa Wilk
+ * Antoine Delignat-Lavaud
+ * Brian Trammell
+ * Christian Huitema
+ * Colin Perkins
+ * David Schinazi
+ * Dmitri Tikhonov
+ * Eric Kinnear
+ * Eric Rescorla
+ * Gorry Fairhurst
+ * Ian Swett
+ * Igor Lubashev
+ * 奥 一穂 (Kazuho Oku)
+ * Lars Eggert
+ * Lucas Pardue
+ * Magnus Westerlund
+ * Marten Seemann
+ * Martin Duke
+ * Mike Bishop
+ * Mikkel Fahnøe Jørgensen
+ * Mirja Kühlewind
+ * Nick Banks
+ * Nick Harper
+ * Patrick McManus
+ * Roberto Peon
+ * Ryan Hamilton
+ * Subodh Iyengar
+ * Tatsuhiro Tsujikawa
+ * Ted Hardie
+ * Tom Jones
+ * Victor Vasiliev
+
+Authors' Addresses
+
+ Jana Iyengar (editor)
+ Fastly
+
+ Email: jri.ietf@gmail.com
+
+
+ Martin Thomson (editor)
+ Mozilla
+
+ Email: mt@lowentropy.net