author Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit 4bfd864f10b68b71482b35c818559068ef8d5797
tree e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc9584.txt
parent ea76e11061bda059ae9f9ad130a9895cc85607db
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc9584.txt')
-rw-r--r-- doc/rfc/rfc9584.txt 2311
1 files changed, 2311 insertions, 0 deletions
diff --git a/doc/rfc/rfc9584.txt b/doc/rfc/rfc9584.txt
new file mode 100644
index 0000000..a5bbbe8
--- /dev/null
+++ b/doc/rfc/rfc9584.txt
@@ -0,0 +1,2311 @@
+
+
+
+
+Internet Engineering Task Force (IETF) S. Zhao
+Request for Comments: 9584 Intel
+Category: Standards Track S. Wenger
+ISSN: 2070-1721 Tencent
+ Y. Lim
+ Samsung Electronics
+ June 2024
+
+
+ RTP Payload Format for Essential Video Coding (EVC)
+
+Abstract
+
+ This document describes an RTP payload format for the Essential Video
+ Coding (EVC) standard, published as ISO/IEC International Standard
+ 23094-1. EVC was developed by MPEG. The RTP payload format
+ allows for the packetization of one or more Network Abstraction Layer
+ (NAL) units in each RTP packet payload and the fragmentation of a NAL
+ unit into multiple RTP packets. The payload format has broad
+ applicability in videoconferencing, Internet video streaming, and
+ high-bitrate entertainment-quality video, among other applications.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ https://www.rfc-editor.org/info/rfc9584.
+
+Copyright Notice
+
+ Copyright (c) 2024 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (https://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Revised BSD License text as described in Section 4.e of the
+ Trust Legal Provisions and are provided without warranty as described
+ in the Revised BSD License.
+
+Table of Contents
+
+ 1. Introduction
+ 1.1. Overview of the EVC Codec
+ 1.1.1. Coding-Tool Features (Informative)
+ 1.1.2. Systems and Transport Interfaces
+ 1.1.3. Parallel Processing Support (Informative)
+ 1.1.4. NAL Unit Header
+ 1.2. Overview of the Payload Format
+ 2. Conventions
+ 3. Definitions and Abbreviations
+ 3.1. Definitions
+ 3.1.1. Definitions from the EVC Standard
+ 3.1.2. Definitions Specific to This Document
+ 3.2. Abbreviations
+ 4. RTP Payload Format
+ 4.1. RTP Header Usage
+ 4.2. Payload Header Usage
+ 4.3. Payload Structures
+ 4.3.1. Single NAL Unit Packets
+ 4.3.2. Aggregation Packets (APs)
+ 4.3.3. Fragmentation Units (FUs)
+ 4.4. Decoding Order Number
+ 5. Packetization Rules
+ 6. De-packetization Process
+ 7. Payload Format Parameters
+ 7.1. Media Type Registration
+ 7.2. Optional Parameters Definition
+ 7.3. SDP Parameters
+ 7.3.1. Mapping of Payload Type Parameters to SDP
+ 7.3.2. Usage with SDP Offer/Answer Model
+ 7.3.3. Multicast
+ 7.3.4. Usage in Declarative Session Descriptions
+ 7.3.5. Considerations for Parameter Sets
+ 8. Use with Feedback Messages
+ 8.1. Picture Loss Indication (PLI)
+ 8.2. Full Intra Request (FIR)
+ 9. Security Considerations
+ 10. Congestion Control
+ 11. IANA Considerations
+ 12. References
+ 12.1. Normative References
+ 12.2. Informative References
+ Acknowledgements
+ Authors' Addresses
+
+1. Introduction
+
+ The Essential Video Coding [EVC] standard, which is formally
+ designated as ISO/IEC International Standard 23094-1 [EVC], was
+ published in 2020. One of MPEG's goals is to keep EVC's Baseline
+ profile essentially royalty-free by using technologies published more
+ than 20 years ago or otherwise known to be available for use without
+ a requirement for paying royalties, whereas more advanced profiles
+ follow a reasonable and non-discriminatory licensing terms policy.
+ Both the Baseline profile and higher profiles of EVC [EVC] are
+ reported to provide coding efficiency gains over High Efficiency
+ Video Coding [HEVC] and Advanced Video Coding [AVC] under certain
+ configurations.
+
+ This document describes an RTP payload format for EVC. It shares its
+ basic design with the NAL unit-based RTP payload formats of H.264
+ Video Coding [RFC6184], Scalable Video Coding (SVC) [RFC6190], High
+ Efficiency Video Coding (HEVC) [RFC7798], and Versatile Video Coding
+ (VVC) [RFC9328]. With respect to design philosophy, security,
+ congestion control, and overall implementation complexity, it has
+ similar properties to those earlier payload format specifications.
+ This is a conscious choice, as at least the RTP Payload Format for
+ H.264 video as described in [RFC6184] is widely deployed and
+ generally known in the relevant implementer communities. Certain
+ mechanisms described in [RFC6190] were incorporated, as EVC supports
+ temporal scalability. EVC currently does not offer higher forms of
+ scalability.
+
+1.1. Overview of the EVC Codec
+
+ The codecs described in [EVC], [AVC], [HEVC], and [VVC] share a
+ similar hybrid video codec design. In this document, we provide a
+ very brief overview of those features of EVC that are, in some form,
+ addressed by the payload format specified herein. Implementers have
+ to read, understand, and apply the ISO/IEC standard pertaining to EVC
+ [EVC] to arrive at interoperable, well-performing implementations.
+ The EVC standard has a Baseline profile and a Main profile, the
+ latter being a superset of the Baseline profile but including more
+ advanced features. EVC also includes still image variants of both
+ Baseline and Main profiles, in each of which the bitstream is
+ restricted to a single IDR picture. EVC facilitates certain walled
+ garden implementations under commercial constraints imposed by
+ intellectual property rights by including syntax elements that allow
+ encoders to indicate which of the many independent coding tools are
+ exercised in the bitstream, in a spirit similar to the
+ general_constraint_info of [VVC].
+
+ Conceptually, EVC, AVC, HEVC, and VVC all include a Video Coding
+ Layer (VCL), a term that is often used to refer to the coding-tool
+ features, and a Network Abstraction Layer (NAL), which usually refers
+ to the systems and transport interface aspects of the codecs.
+
+1.1.1. Coding-Tool Features (Informative)
+
+ Coding blocks and transform structure
+ EVC uses a traditional block-based coding structure, which divides
+ the encoded image into blocks of up to 64x64 luma samples for the
+ Baseline profile and 128x128 luma samples for the Main profile
+ that can be recursively divided into smaller blocks. The Baseline
+ profile utilizes HEVC-like quad-tree block partitioning that
+ allows a block to be divided horizontally and vertically into four
+ smaller square blocks. The Main profile adds two advanced coding
+ structure tools: 1) Binary Ternary Tree (BTT) partitioning that
+ allows non-square coding units and 2) Split Unit Coding Order
+ segmentation that changes the processing order of the blocks from
+ traditional left-to-right and top-to-bottom scanning order
+ processing to an alternative right-to-left and bottom-to-top
+ scanning order. In the Main profile, the picture can be divided
+ into slices and tiles, which can be independently encoded and/or
+ decoded in parallel.
+
+ EVC also uses a traditional video codec prediction model assuming
+ two general types of predictions: Intra (spatial) and Inter
+ (temporal) predictions. A residue block is calculated by
+ subtracting predicted data from the original (encoded) one. The
+ Baseline profile allows only discrete cosine transform (DCT-2) and
+ scalar quantization to transform and quantize residue data,
+ whereas the Main profile additionally has options to use discrete
+ sine transform (DST-7) and another type of discrete cosine
+ transform (DCT-8). In addition, for the Main profile, Improved
+ Quantization and Transform (IQT) uses a different mapping or
+ clipping function for quantization. An inverse zig-zag scanning
+ order is used for coefficient coding. Advanced Coefficient Coding
+ (ADCC) in the Main profile can code coefficient values more
+ efficiently, for example, by signaling the last non-zero
+ coefficient. The Baseline profile uses a straightforward
+ RLE-based approach to encode the quantized coefficients.
+
+ Entropy coding
+ EVC uses a binary arithmetic coding mechanism similar to the CABAC
+ (context-adaptive binary arithmetic coding) of HEVC and VVC. The
+ mechanism includes a binarization step and a probability update
+ defined by a lookup table. In the Main profile, the derivation
+ process of syntax elements based on adjacent blocks makes the
+ context modeling and initialization process more efficient.
+
+ In-loop filtering
+ The Baseline profile of EVC uses the deblocking filter defined in
+ H.263 Annex J [VIDEO-CODING]. In the Main profile, an Advanced
+ Deblocking Filter (ADDB) can be used as an alternative, which can
+ further reduce undesirable compression artifacts. The Main
+ profile also defines two additional in-loop filters that can be
+ used to improve the quality of decoded pictures before output and/
+ or for Inter prediction. A Hadamard Transform Domain Filter
+ (HTDF) is applied to the luma samples before deblocking, and a
+ lookup table is used to determine four adjacent samples for
+ filtering. An Adaptive Loop Filter (ALF) allows up to 25 different
+ filters to be signaled for the luma components; the best
+ filter can be selected through the classification process for each
+ 4x4 block. Similarly to VVC, the filter parameters of ALF are
+ signaled in the Adaptation Parameter Set (APS).
+
+ Inter prediction
+ The basis of EVC's Inter prediction is motion compensation using
+ interpolation filters with a quarter sample resolution. In the
+ Baseline profile, a motion vector is transmitted using one of
+ three spatially neighboring motion vectors and a temporally
+ collocated motion vector as a predictor. A motion vector
+ difference may be signaled relative to the selected predictor, but
+ there is a case where no motion vector difference is signaled, and
+ there is no remaining data in the block. This mode is called a
+ "skip" mode. The Main profile includes six additional tools to
+ provide improved Inter prediction. With Advanced Motion Vectors
+ Prediction (ADMVP), adjacent blocks can be conceptually merged to
+ indicate that they use the same motion, but more advanced schemes
+ can also be used to create predictions from the basic model list
+ of candidate predictors. The Merge with Motion Vector Difference
+ (MMVD) tool uses a process similar to the concept of merging
+ neighboring blocks but also allows the use of expressions that
+ include a starting point, motion amplitude, and direction of
+ motion to send a motion vector signal. Using Advanced Motion
+ Vector Prediction (AMVP), candidate motion vector predictions for
+ the block can be derived from its neighboring blocks in the same
+ picture and collocated blocks in the reference picture. The
+ Adaptive Motion Vector Resolution (AMVR) tool provides a way to
+ reduce the accuracy of a motion vector from a quarter sample to
+ half sample, full sample, double sample, or quad sample, which
+ provides an efficiency advantage, such as when sending large
+ motion vector differences. The Main profile also includes the
+ Decoder-side Motion Vector Refinement (DMVR), which uses a
+ bilateral template matching process to refine the motion vectors
+ without additional signaling.
+
+ Intra prediction and intra coding
+ Intra prediction in EVC is performed on adjacent samples of coding
+ units in a partitioned structure. For the Baseline profile, when
+ all coding units are square, there are five different prediction
+ modes: DC (mean value of the neighborhood), horizontal, vertical,
+ and two different diagonal directions. In the Main profile, intra
+ prediction can be applied to any rectangular coding unit, and 28
+ additional direction modes are available in the Enhanced Intra
+ Prediction Directions (EIPDs). In the Main profile, an encoder
+ can also use Intra Block Copy (IBC), where previously decoded
+ sample blocks of the same picture are used as a predictor. A
+ displacement vector in integer sample precision is signaled to
+ indicate the location in the current picture of the prediction
+ block used for this mode.
+
+ Reference frames management
+ In EVC, decoded pictures can be stored in a decoded picture buffer
+ (DPB) for predicting pictures that follow them in the decoding
+ order. In the Baseline profile, the management of the DPB (i.e.,
+ the process of adding and deleting reference pictures) is
+ controlled by a straightforward AVC-like sliding window approach
+ with very few parameters from the sequence parameter set (SPS).
+ For the Main profile, DPB management can be handled much more
+ flexibly using explicitly signaled Reference Picture Lists (RPLs)
+ in the SPS or slice level.
+
+1.1.2. Systems and Transport Interfaces
+
+ EVC inherits the basic systems and transport interface designs from
+ AVC and HEVC. These include the NAL-unit-based syntax, hierarchical
+ syntax and data unit structure, and Supplemental Enhancement
+ Information (SEI) message mechanism. The hierarchical syntax and
+ data unit structure consists of a sequence-level parameter set (i.e.,
+ SPS), two picture-level parameter sets (i.e., PPS and APS, each of
+ which can apply to one or more pictures), slice-level header
+ parameters, and lower-level parameters.
+
+ A number of key components that influenced the NAL design of EVC as
+ well as this document are described below:
+
+ Sequence parameter set
+ The Sequence Parameter Set (SPS) contains syntax elements
+ pertaining to a Coded Video Sequence (CVS), which is a group of
+ pictures, starting with a random access point picture and followed
+ by zero or more pictures that may depend on each other and the
+ random access point picture. In MPEG-2, the equivalent of a CVS
+ is a Group of Pictures (GOP), which generally starts with an I
+ frame and is followed by P and B frames. While more complex in
+ its options of random access points, EVC retains this basic
+ concept. In many TV-like applications, a CVS contains a few
+ hundred milliseconds to a few seconds of video. In video
+ conferencing (without switching Multipoint Control Units (MCUs)
+ involved), a CVS can be as long in duration as the whole session.
+
+ Picture and adaptation parameter set
+ The Picture Parameter Set (PPS) and the Adaptation Parameter Set
+ (APS) carry information pertaining to a single picture. The PPS
+ contains information that is likely to stay constant from picture
+ to picture, at least for pictures of a certain type; whereas the
+ APS contains information, such as adaptive loop filter
+ coefficients, that are likely to change from picture to picture.
+
+ Profile, level, and toolsets
+ Profiles and levels follow the same design considerations known
+ from AVC, HEVC, and video codecs as old as MPEG-1 Video. The
+ profile defines a set of tools (not to be confused with the
+ "toolset" discussed below) that a decoder compliant with this
+ profile has to support. In EVC, profiles are defined in Annex A
+ of [EVC]. Formally, they are defined as a set of constraints that
+ a bitstream needs to conform to. In EVC, the Baseline profile is
+ much more severely constrained than the Main profile, reducing
+ implementation complexity. Levels relate to bitstream complexity
+ in dimensions such as maximum sample decoding rate, maximum
+ picture size, and similar parameters directly related to
+ computational complexity and/or memory demands.
+
+ Profiles and levels are signaled in the highest parameter set
+ available, the SPS.
+
+ EVC contains another mechanism related to the use of coding tools,
+ known as the toolset syntax elements. These syntax elements,
+ toolset_idc_h and toolset_idc_l (located in the SPS), are bitmasks
+ that allow encoders to indicate which coding tools they are using
+ within the menu of tools offered by the profile that is also
+ signaled. No decoder conformance point is associated with the
+ toolset, but a bitstream that was using a coding tool that is
+ indicated as not being used in the toolset syntax element would be
+ non-compliant. While MPEG specifically rules out the use of the
+ toolset syntax element as a conformance point, walled garden
+ implementations could do so without incurring the interoperability
+ problems MPEG fears and create bitstreams and decoders that do not
+ support one or more given tools. That, in turn, may be useful to
+ mitigate certain intellectual property-related risks.
+
+ Bitstream and elementary stream
+ Above the Coded Video Sequence (CVS), EVC defines a video
+ bitstream that can be used as an elementary stream in the MPEG
+ systems context. For this document, the video bitstream syntax
+ level is not relevant.
+
+ Random access support
+ EVC supports random access mechanisms based on IDR and clean
+ random access (CRA) access units.
+
+ Temporal scalability support
+ EVC supports temporal scalability through the generalized
+ reference picture selection approach known since AVC/SVC. Up to
+ six temporal layers are supported. The temporal layer is signaled
+ in the NAL unit header (which co-serves as the payload header in
+ this document), in the nuh_temporal_id field.
+
+ Reference picture management
+ EVC's reference picture management is POC-based, similar to HEVC.
+ In the Main profile, substantially all reference picture list
+ manipulations available in HEVC are specified, including explicit
+ transmissions or updates of reference picture lists. For
+ reference picture management purposes, however, EVC uses a modern
+ VVC-like RPL approach, which is conceptually simpler than the HEVC
+ one. In the Baseline profile, reference picture management is more
+ restricted, allowing only comparatively simple group-of-pictures
+ structures.
+
+ SEI Message
+ EVC inherits many of HEVC's SEI messages, occasionally with syntax
+ and/or semantics changes, making them applicable to EVC. In
+ addition, some of the codec-agnostic SEI messages of the VSEI
+ specification [VSEI] are also mapped.
+
+1.1.3. Parallel Processing Support (Informative)
+
+ EVC's Baseline profile includes no tools specifically addressing
+ parallel-processing support. The Main profile includes independently
+ decodable slices for parallel processing. The slices are defined as
+ any rectangular region within a picture. They can be encoded to have
+ coding dependencies with other slices from the previous picture but
+ not with other slices in the same picture. No specific support for
+ parallel processing is specified in this RTP payload format.
+
+1.1.4. NAL Unit Header
+
+ EVC maintains the NAL unit concept of [VVC] with different parameter
+ options. EVC also uses a two-byte NAL unit header, as shown in
+ Figure 1. The payload of a NAL unit refers to the NAL unit excluding
+ the NAL unit header.
+
+     0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |F|   Type    | TID | Reserve |E|
+    +-------------+-----------------+
+
+ Figure 1: The Structure of the EVC NAL Unit Header
+
+ The semantics of the fields in the NAL unit header are as specified
+ in EVC and described briefly below for convenience. In addition to
+ the name and size of each field, the corresponding syntax element
+ name in EVC is also provided.
+
+ F: 1 bit
+
+ forbidden_zero_bit: Required to be zero in EVC. Note that this
+ bit was included in the NAL unit header to enable transport of
+ EVC video over MPEG-2 transport systems (avoidance of start
+ code emulations) [MPEG2S]. In this
+ document, the value 1 may be used to indicate a syntax
+ violation, e.g., for a NAL unit resulting from aggregating a
+ number of fragmented units of a NAL unit but missing the last
+ fragment, as described in Section 4.3.3.
+
+ Type: 6 bits
+
+ nal_unit_type_plus1: This field allows the NAL Unit Type to be
+ computed. The NAL Unit Type (NalUnitType) is equal to the
+ value found in this field, minus 1; in other words:
+
+ NalUnitType = nal_unit_type_plus1 - 1.
+
+ The NAL unit type is detailed in Table 4 of [EVC]. If the
+ value of NalUnitType is less than or equal to 23, the NAL unit
+ is a VCL NAL unit. Otherwise, the NAL unit is a non-VCL NAL
+ unit. For a reference of all currently defined NAL unit types
+ and their semantics, please refer to Section 7.4.2.2 of [EVC].
+ Note that nal_unit_type_plus1 MUST NOT be zero.
+
+ TID: 3 bits
+
+ nuh_temporal_id: This field specifies the temporal identifier of
+ the NAL unit. The value of TemporalId is equal to TID.
+ TemporalId shall be equal to 0 if it is an IDR NAL unit type
+ (NAL unit type 1).
+
+ Reserve: 5 bits
+
+ nuh_reserved_zero_5bits: This field shall be equal to 0 in this
+ version of the EVC standard. Values of nuh_reserved_zero_5bits greater
+ than 0 are reserved for future use by ISO/IEC. Decoders
+ conforming to a profile specified in Annex A of [EVC] shall
+ ignore (i.e., remove from the bitstream and discard) all NAL
+ units with values of nuh_reserved_zero_5bits greater than 0.
+
+ E: 1 bit
+
+ nuh_extension_flag: This field shall be equal to 0 in this version
+ of the EVC standard. The value of nuh_extension_flag equal to 1
+ is reserved for future use by ISO/IEC. Decoders conforming to
+ a profile specified in Annex A of [EVC] shall ignore (i.e.,
+ remove from the bitstream and discard) all NAL units with
+ values of nuh_extension_flag equal to 1.
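
The field layout above can be illustrated with a short parser. This is a minimal sketch, not a reference implementation: it assumes the two header bytes are read most-significant bit first, as drawn in Figure 1, and the names `EvcNalHeader`, `parse_evc_nal_header`, `is_vcl`, and `should_ignore` are hypothetical helpers introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvcNalHeader:
    """Fields of the two-byte EVC NAL unit header (hypothetical helper type)."""
    forbidden_zero_bit: int       # F, 1 bit
    nal_unit_type_plus1: int      # Type, 6 bits
    nuh_temporal_id: int          # TID, 3 bits
    nuh_reserved_zero_5bits: int  # Reserve, 5 bits
    nuh_extension_flag: int       # E, 1 bit

    @property
    def nal_unit_type(self) -> int:
        # NalUnitType = nal_unit_type_plus1 - 1
        return self.nal_unit_type_plus1 - 1

    @property
    def is_vcl(self) -> bool:
        # NalUnitType <= 23 marks a VCL NAL unit; anything else is non-VCL.
        return self.nal_unit_type <= 23

    @property
    def should_ignore(self) -> bool:
        # Reserved values: a conforming decoder discards such NAL units.
        return self.nuh_reserved_zero_5bits > 0 or self.nuh_extension_flag == 1


def parse_evc_nal_header(data: bytes) -> EvcNalHeader:
    """Split the first two bytes of a NAL unit into the header fields."""
    if len(data) < 2:
        raise ValueError("an EVC NAL unit header is two bytes long")
    word = (data[0] << 8) | data[1]  # assumption: MSB-first, as in Figure 1
    header = EvcNalHeader(
        forbidden_zero_bit=(word >> 15) & 0x01,
        nal_unit_type_plus1=(word >> 9) & 0x3F,
        nuh_temporal_id=(word >> 6) & 0x07,
        nuh_reserved_zero_5bits=(word >> 1) & 0x1F,
        nuh_extension_flag=word & 0x01,
    )
    if header.nal_unit_type_plus1 == 0:
        raise ValueError("nal_unit_type_plus1 must not be zero")
    return header


if __name__ == "__main__":
    # 0x04 0x00 encodes nal_unit_type_plus1 = 2, i.e. NAL unit type 1 (IDR).
    idr = parse_evc_nal_header(bytes([0x04, 0x00]))
    print(idr.nal_unit_type, idr.is_vcl, idr.nuh_temporal_id)
```

A receiver or MANE would typically check `forbidden_zero_bit` and `should_ignore` before acting on the other fields.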
+
+1.2. Overview of the Payload Format
+
+ This payload format defines the following processes required for
+ transport of EVC-coded data over RTP [RFC3550]:
+
+ * usage of RTP header with this payload format
+
+ * packetization of EVC-coded NAL units into RTP packets using three
+ types of payload structures: single NAL unit packets, aggregation
+ packets, and fragmentation units
+
+ * transmission of EVC NAL units of the same bitstream within a
+ single RTP stream
+
+ * usage of media type parameters to be used with the Session
+ Description Protocol (SDP) [RFC8866]
+
+ * usage of RTCP feedback messages
+
+2. Conventions
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in
+ BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
+ capitals, as shown here.
+
+3. Definitions and Abbreviations
+
+3.1. Definitions
+
+ This document uses the terms and definitions of EVC. Section 3.1.1
+ lists relevant definitions from [EVC] for convenience. Section 3.1.2
+ provides definitions specific to this document.
+
+3.1.1. Definitions from the EVC Standard
+
+ Access Unit (AU):
+ A set of NAL units that are associated with each other according
+ to a specified classification rule, are consecutive in decoding
+ order, and contain exactly one coded picture.
+
+ Adaptation Parameter Set (APS):
+ A syntax structure containing syntax elements that apply to zero
+ or more slices as determined by zero or more syntax elements found
+ in slice headers.
+
+ Bitstream:
+ A sequence of bits, in the form of a NAL unit stream or a byte
+ stream, that forms the representation of coded pictures and
+ associated data forming one or more CVSs.
+
+ Coded Picture:
+ A coded representation of a picture containing all CTUs of the
+ picture.
+
+ Coded Video Sequence (CVS):
+ A sequence of access units that consists, in decoding order, of an
+ IDR access unit, followed by zero or more access units that are
+ not IDR access units, including all subsequent access units up to
+ but not including any subsequent access unit that is an IDR access
+ unit.
+
+ Coding Tree Block (CTB):
+ An NxN block of samples for some value of N such that the division
+ of a component into CTBs is a partitioning.
+
+ Coding Tree Unit (CTU):
+ A CTB of luma samples, two corresponding CTBs of chroma samples of
+ a picture that has three sample arrays, or a CTB of samples of a
+ monochrome picture or a picture that is coded using three separate
+ color planes and syntax structures used to code the samples.
+
+ Decoded Picture:
+ A decoded picture is derived by decoding a coded picture.
+
+ Decoded Picture Buffer (DPB):
+ A buffer holding decoded pictures for reference, output
+ reordering, or output delay specified for the hypothetical
+ reference decoder in Annex C of the [EVC] standard.
+
+ Dynamic Range Adjustment (DRA):
+ A mapping process that is applied to the decoded picture prior to
+ cropping and output as part of the decoding process; it is
+ controlled by parameters conveyed in an Adaptation Parameter Set
+ (APS).
+
+ Hypothetical Reference Decoder (HRD):
+ A hypothetical decoder model that specifies constraints on the
+ variability of conforming NAL unit streams or conforming byte
+ streams that an encoding process may produce.
+
+ IDR Access Unit:
+ An access unit in which the coded picture is an IDR picture.
+
+ IDR Picture:
+ The coded picture for which each VCL NAL unit has NalUnitType
+ equal to IDR_NUT.
+
+ Level:
+ A defined set of constraints on the values that may be taken by
+ the syntax elements and variables of [EVC], or the value
+ of a transform coefficient prior to scaling.
+
+ Network Abstraction Layer (NAL) Unit:
+ A syntax structure containing an indication of the type of data to
+ follow and bytes containing that data in the form of an RBSP
+ interspersed as necessary.
+
+ Network Abstraction Layer (NAL) Unit Stream:
+ A sequence of NAL units.
+
+ Non-IDR Picture:
+ A coded picture that is not an IDR picture.
+
+ Non-VCL NAL Unit:
+ A NAL unit that is not a VCL NAL unit.
+
+ Picture Parameter Set (PPS):
+ A syntax structure containing syntax elements that apply to zero
+ or more entire coded pictures as determined by a syntax element
+ found in each slice header.
+
+ Picture Order Count (POC):
+ A variable that is associated with each picture, uniquely
+ identifies the associated picture among all pictures in the CVS,
+ and (when the associated picture is to be output from the DPB)
+ indicates the position of the associated picture in output order
+ relative to the output order positions of the other pictures in
+ the same CVS that are to be output from the DPB.
+
+ Raw Byte Sequence Payload (RBSP):
+ A syntax structure containing an integer number of bytes that is
+ encapsulated in a NAL unit and that is either empty or has the
+ form of a string of data bits containing syntax elements followed
+ by an RBSP stop bit and zero or more subsequent bits equal to 0.
+
+ Sequence Parameter Set (SPS):
+ A syntax structure containing syntax elements that apply to zero
+ or more entire CVSs as determined by the content of a syntax
+ element found in the PPS referred to by a syntax element found in
+ each slice header.
+
+ Slice:
+ An integer number of tiles of a picture in the tile scan of the
+ picture, exclusively contained in a single NAL unit.
+
+ Tile:
+ A rectangular region of CTUs within a particular tile column and a
+ particular tile row in a picture.
+
+ Tile Column:
+ A rectangular region of CTUs having a height equal to the height
+ of the picture and width specified by syntax elements in the PPS.
+
+ Tile Row:
+ A rectangular region of CTUs having a height specified by syntax
+ elements in the PPS and a width equal to the width of the picture.
+
+ Tile Scan:
+ A specific sequential ordering of CTUs partitioning a picture in
+ which the CTUs are ordered consecutively in CTU raster scan in a
+ tile, whereas tiles in a picture are ordered consecutively in a
+ raster scan of the tiles of the picture.
+
+ Video Coding Layer (VCL) NAL Unit:
+ A collective term for coded slice NAL units and the subset of NAL
+ units that have reserved values of NalUnitType that are classified
+ as VCL NAL units in [EVC].
+
+3.1.2. Definitions Specific to This Document
+
+ Media-Aware Network Element (MANE):
+ A network element, such as a middlebox, selective forwarding unit,
+ or application-layer gateway, that is capable of parsing certain
+ aspects of the RTP payload headers or the RTP payload and reacting
+ to their contents.
+
+ | Informative note: The concept of a MANE goes beyond normal
+ | routers or gateways in that a MANE has to be aware of the
+ | signaling (e.g., to learn about the payload type mappings of
+ | the media streams), and in that it has to be trusted when
+ | working with Secure RTP (SRTP). The advantage of using
+ | MANEs is that they allow packets to be dropped according to
+ | the needs of the media coding. For example, if a MANE has
+ | to drop packets due to congestion on a certain link, it can
+ | identify and remove those packets whose elimination produces
+ | the least adverse effect on the user experience. After
+ | dropping packets, MANEs must rewrite RTCP packets to match
+ | the changes to the RTP stream, as specified in Section 7 of
+ | [RFC3550].
+
+ NAL unit decoding order:
+ A NAL unit order that conforms to the constraints on NAL unit
+ order given in Section 7.4.2.3 of [EVC] and follows the order of
+ NAL units in the bitstream.
+
+ NALU-time:
+ The value that the RTP timestamp would have if the NAL unit were
+ transported in its own RTP packet.
+
+ NAL unit output order:
+ A NAL unit order in which NAL units of different access units are
+ in the output order of the decoded pictures corresponding to the
+ access units, as specified in [EVC], and in which NAL units within
+ an access unit are in their decoding order.
+
+ RTP stream:
+ See [RFC7656]. Within the scope of this document, one RTP stream
+ is utilized to transport an EVC bitstream, which may contain one
+ or more temporal sub-layers.
+
+ Transmission order:
+ The order of packets in ascending RTP sequence number order (in
+ modulo arithmetic). Within an Aggregation Packet (AP), the NAL
+ unit transmission order is the same as the order of appearance of
+ NAL units in the packet.
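
Because RTP sequence numbers are 16 bits wide, "ascending order in modulo arithmetic" means that 65535 is followed by 0. The sketch below shows one common way to compare and sort sequence numbers across that wraparound. It assumes the packets being compared lie within 2^15 of each other; the function names are hypothetical and not taken from any specification.

```python
from typing import Iterable, List

SEQ_MOD = 1 << 16   # RTP sequence numbers are 16-bit values
HALF = 1 << 15      # comparison window (assumption: packets are closer than this)


def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a in modulo-2^16 arithmetic.

    The result lies in [-32768, 32767]; a positive value means that a comes
    after b in transmission order.
    """
    return ((a - b + HALF) % SEQ_MOD) - HALF


def seq_precedes(a: int, b: int) -> bool:
    """True if sequence number a comes before b in transmission order."""
    return seq_delta(a, b) < 0


def in_transmission_order(seqs: Iterable[int], reference: int) -> List[int]:
    """Sort sequence numbers relative to a reference (e.g. the oldest packet held)."""
    return sorted(seqs, key=lambda s: seq_delta(s, reference))


if __name__ == "__main__":
    # Packets received out of order around the 65535 -> 0 wraparound.
    print(in_transmission_order([1, 65535, 0, 65534], reference=65534))
```

Sorting relative to a reference rather than with a pairwise comparator keeps the ordering well defined as long as the window assumption holds.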
+
+3.2. Abbreviations
+
+ AU Access Unit
+
+ AP Aggregation Packet
+
+ APS Adaptation Parameter Set
+
+ ATS Adaptive Transform Selection
+
+ B Bi-predictive
+
+ CBR Constant Bit Rate
+
+ CPB Coded Picture Buffer
+
+ CTB Coding Tree Block
+
+ CTU Coding Tree Unit
+
+ CVS Coded Video Sequence
+
+ DPB Decoded Picture Buffer
+
+ HRD Hypothetical Reference Decoder
+
+ HSS Hypothetical Stream Scheduler
+
+ I Intra
+
+ IDR Instantaneous Decoding Refresh
+
+ LSB Least Significant Bit
+
+ LTRP Long-Term Reference Picture
+
+ MMVD Merge with Motion Vector Difference
+
+ MSB Most Significant Bit
+
+ NAL Network Abstraction Layer
+
+ P Predictive
+
+ POC Picture Order Count
+
+ PPS Picture Parameter Set
+
+ QP Quantization Parameter
+
+ RBSP Raw Byte Sequence Payload
+
+ RGB Red, Green, and Blue
+
+ SAR Sample Aspect Ratio
+
+ SEI Supplemental Enhancement Information
+
+ SODB String Of Data Bits
+
+ SPS Sequence Parameter Set
+
+ STRP Short-Term Reference Picture
+
+ VBR Variable Bit Rate
+
+ VCL Video Coding Layer
+
+4. RTP Payload Format
+
+4.1. RTP Header Usage
+
+ The format of the RTP header is specified in [RFC3550] (included as
+ Figure 2 for convenience). This payload format uses the fields of
+ the header in a manner consistent with that specification.
+
+ The RTP payload (and the settings for some RTP header bits) for APs
+ and Fragmentation Units (FUs) are specified in Sections 4.3.2 and
+ 4.3.3, respectively.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |V=2|P|X|  CC   |M|     PT      |       sequence number         |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                           timestamp                           |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |           synchronization source (SSRC) identifier            |
+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+ |            contributing source (CSRC) identifiers             |
+ |                             ....                              |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 2: RTP Header According to RFC 3550
+
+ The RTP header fields are set as follows for this RTP payload
+ format:
+
+ Marker bit (M): 1 bit
+
+ Set for the last packet of the access unit carried in the current
+ RTP stream. This is in line with the normal use of the M bit in
+ video formats to allow efficient playout buffer handling.
+
+ Payload Type (PT): 7 bits
+
+ The assignment of an RTP payload type for this new payload format
+ is outside the scope of this document and will not be specified
+ here. The assignment of a payload type has to be performed either
+ through the profile used or in a dynamic way.
+
+ Sequence Number (SN): 16 bits
+
+ Set and used in accordance with [RFC3550].
+
+ Timestamp: 32 bits
+
+ The RTP timestamp is set to the sampling timestamp of the content.
+ A 90 kHz clock rate MUST be used. If the NAL unit has no timing
+ properties of its own (e.g., parameter sets or certain SEI NAL
+ units), the RTP timestamp MUST be set to the RTP timestamp of the
+ coded picture of the access unit in which the NAL unit is
+ included. For SEI messages, this information is specified in
+ Annex D of [EVC]. Receivers MUST use the RTP timestamp for the
+ display process, even when the bitstream contains picture timing
+ SEI messages or decoding unit information SEI messages as
+ specified in [EVC].
+
+ Synchronization source (SSRC): 32 bits
+
+ Used to identify the source of the RTP packets. According to this
+ document, a single SSRC is used for all parts of a single
+ bitstream.
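+
+ As a non-normative illustration of the header fields above, the
+ following Python sketch parses the fixed RTP header of Figure 2 and
+ maps a sampling instant to the 90 kHz RTP clock. The names
+ RtpHeader, parse_rtp_header, and to_rtp_timestamp are hypothetical
+ and chosen for this example only; header extensions and the random
+ initial timestamp offset of [RFC3550] are deliberately left out.
+
```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> Tuple[RtpHeader, int]:
    """Parse the fixed RTP header and the CSRC list.

    Returns the header and the offset of the first byte after the CSRC
    list. A header extension (X bit set) is not skipped in this sketch.
    """
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # CSRC count
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrcs = struct.unpack("!%dI" % cc, packet[12:end])
    header = RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),         # last packet of the access unit
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
    return header, end


def to_rtp_timestamp(sampling_time_seconds: float) -> int:
    """Map a sampling instant to the 90 kHz clock with 32-bit wraparound."""
    return round(sampling_time_seconds * 90000) & 0xFFFFFFFF
```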
+
+4.2. Payload Header Usage
+
+ The first two bytes of the payload of an RTP packet are referred to
+ as the payload header. The payload header consists of the same
+ fields (F, TID, Reserve, and E) as the NAL unit header, as shown in
+ Section 1.1.4, irrespective of the type of the payload structure.
+
+ The TID value indicates (among other things) the relative importance
+ of an RTP packet, for example, because NAL units with larger TID
+ values are not used to decode the ones with smaller TID values. A
+ lower value of TID indicates a higher importance. More important NAL
+ units MAY be better protected against transmission losses than less
+ important NAL units.
+
+4.3. Payload Structures
+
+ Three different types of RTP packet payload structures are specified.
+ A receiver can identify the type of an RTP packet payload through the
+ Type field in the payload header.
+
+ The three different payload structures are as follows:
+
+ * Single NAL unit packet: Contains a single NAL unit in the payload,
+ and the NAL unit header of the NAL unit also serves as the payload
+ header. This payload structure is specified in Section 4.3.1.
+
+ * Aggregation Packet (AP): Contains more than one NAL unit within
+ one access unit. This payload structure is specified in
+ Section 4.3.2.
+
+ * Fragmentation Unit (FU): Contains a subset of a single NAL unit.
+ This payload structure is specified in Section 4.3.3.
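+
+ The classification above can be sketched in Python as follows. The
+ sketch is illustrative only. It assumes that the two-byte header is
+ laid out, from the most significant bit, as F (1 bit), Type (6
+ bits), TID (3 bits), Reserve (5 bits), and E (1 bit); this layout is
+ taken to be the one shown in Section 1.1.4 and should be checked
+ against it. The names PayloadHdr, parse_payload_hdr,
+ pack_payload_hdr, and payload_structure are hypothetical.
+
```python
from typing import NamedTuple

AP_TYPE = 56   # Aggregation Packet (Section 4.3.2)
FU_TYPE = 57   # Fragmentation Unit (Section 4.3.3)


class PayloadHdr(NamedTuple):
    f: int
    nal_type: int
    tid: int
    reserve: int
    e: int


def parse_payload_hdr(payload: bytes) -> PayloadHdr:
    """Split the first two payload bytes into the assumed header fields."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the two-byte payload header")
    v = int.from_bytes(payload[:2], "big")
    return PayloadHdr(
        f=v >> 15,
        nal_type=(v >> 9) & 0x3F,
        tid=(v >> 6) & 0x07,
        reserve=(v >> 1) & 0x1F,
        e=v & 0x01,
    )


def pack_payload_hdr(hdr: PayloadHdr) -> bytes:
    """Inverse of parse_payload_hdr under the same assumed layout."""
    v = (hdr.f << 15) | (hdr.nal_type << 9) | (hdr.tid << 6) | (hdr.reserve << 1) | hdr.e
    return v.to_bytes(2, "big")


def payload_structure(payload: bytes) -> str:
    """Name the payload structure selected by the Type field."""
    nal_type = parse_payload_hdr(payload).nal_type
    if nal_type == AP_TYPE:
        return "AP"
    if nal_type == FU_TYPE:
        return "FU"
    if nal_type > FU_TYPE:
        # Values above 57 are not assigned a structure in this section;
        # reporting them as "unknown" is a choice made by this sketch.
        return "unknown"
    return "single NAL unit"
```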
+
+4.3.1. Single NAL Unit Packets
+
+ A single NAL unit packet contains exactly one NAL unit and consists
+ of a payload header as defined in Table 4 of [EVC] (denoted as
+ PayloadHdr), followed by a conditional 16-bit DONL field (in network
+ byte order), and the NAL unit payload data (the NAL unit excluding
+ its NAL unit header) of the contained NAL unit, as shown in Figure 3.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |           PayloadHdr          |      DONL (conditional)       |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                                                               |
+ |                  NAL unit payload data                        |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 3: The Structure of a Single NAL Unit Packet
+
+ The DONL field, when present, specifies the value of the 16 least
+ significant bits of the decoding order number of the contained NAL
+ unit. If sprop-max-don-diff (defined in Section 7.2) is greater than
+ 0, the DONL field MUST be present, and the variable DON for the
+ contained NAL unit is derived as equal to the value of the DONL
+ field. Otherwise (where sprop-max-don-diff is equal to 0), the DONL
+ field MUST NOT be present.
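+
+ A minimal Python sketch of this rule is given below for
+ illustration; it is not part of the payload format. The function
+ names pack_single_nal and unpack_single_nal are hypothetical. The
+ sketch relies only on the layout of Figure 3: a two-byte payload
+ header that doubles as the NAL unit header, an optional two-byte
+ DONL field, and the NAL unit payload data.
+
```python
from typing import Optional, Tuple


def pack_single_nal(nal_unit: bytes, don: Optional[int] = None) -> bytes:
    """Build a single NAL unit packet payload.

    Pass `don` only when sprop-max-don-diff is greater than 0; the DONL
    field is then inserted right after the two-byte NAL unit header.
    """
    if len(nal_unit) < 2:
        raise ValueError("NAL unit shorter than its two-byte header")
    if don is None:
        return nal_unit
    return nal_unit[:2] + (don & 0xFFFF).to_bytes(2, "big") + nal_unit[2:]


def unpack_single_nal(payload: bytes, sprop_max_don_diff: int) -> Tuple[bytes, Optional[int]]:
    """Return (NAL unit, DON); DON is None when no DONL field is present."""
    if sprop_max_don_diff > 0:
        if len(payload) < 4:
            raise ValueError("payload too short to carry a DONL field")
        don = int.from_bytes(payload[2:4], "big")
        # Drop the third and fourth bytes (DONL) to recover the NAL unit.
        return payload[:2] + payload[4:], don
    if len(payload) < 2:
        raise ValueError("payload shorter than the payload header")
    return payload, None
```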
+
+4.3.2. Aggregation Packets (APs)
+
+ Aggregation Packets (APs) enable the reduction of packetization
+ overhead for small NAL units, such as most of the non-VCL NAL units,
+ which are often only a few octets in size.
+
+ An AP aggregates NAL units of one access unit, and it MUST NOT
+ contain NAL units from more than one AU. Each NAL unit to be carried
+ in an AP is encapsulated in an aggregation unit. NAL units
+ aggregated in one AP are included in NAL-unit-decoding order.
+
+ An AP consists of a payload header, as defined in Table 4 of [EVC]
+ (denoted here as PayloadHdr with Type=56), followed by two or more
+ aggregation units, as shown in Figure 4.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |    PayloadHdr (Type=56)       |                               |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
+ |                                                               |
+ |             two or more aggregation units                     |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 4: The Structure of an Aggregation Packet
+
+ The fields in the payload header of an AP are set as follows. The F
+ bit MUST be equal to 0 if the F bit of each aggregated NAL unit is
+ equal to zero; otherwise, it MUST be equal to 1. The Type field MUST
+ be equal to 56.
+
+ The value of TID MUST be the smallest value of TID of all the
+ aggregated NAL units. The value of Reserve and E MUST be equal to 0
+ for this specification.
+
+ | Informative note: All VCL NAL units in an AP have the same TID
+ | value since they belong to the same access unit. However, an
+ | AP may contain non-VCL NAL units for which the TID value in the
+ | NAL unit header may be different from the TID value of the VCL
+ | NAL units in the same AP.
+
+ An AP MUST carry at least two aggregation units and can carry as many
+ aggregation units as necessary; however, the total amount of data in
+ an AP MUST fit into an IP packet, and the size SHOULD be chosen so
+ that the resulting IP packet is smaller than the path MTU size so as
+ to avoid IP layer fragmentation. An AP MUST NOT contain FUs specified
+ in Section 4.3.3. APs MUST NOT be nested; i.e., an AP cannot contain
+ another AP.
+
+ | Informative note: If a receiver encounters nested APs, which is
+ | against the aforementioned requirement, it has several options,
+ | listed in order of ease of implementation: 1) ignore the nested
+ | AP; 2) ignore the nested AP and report a "packet loss" to the
+ | decoder, if such functionality exists in the API; and 3)
+ | implement support for nested APs and extract the NAL units from
+ | these nested APs.
+
+ The first aggregation unit in an AP consists of a conditional 16-bit
+ DONL field (in network byte order) followed by a 16-bit unsigned size
+ information (in network byte order) that indicates the size of the
+ NAL unit in bytes (excluding these two octets but including the NAL
+ unit header), followed by the NAL unit itself, including its NAL unit
+ header, as shown in Figure 5.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |               :       DONL (conditional)      |   NALU size   |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |   NALU size   |                                               |
+ +-+-+-+-+-+-+-+-+         NAL unit                              |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 5: The Structure of the First Aggregation Unit in an AP
+
+ | Informative note: The first octet of Figure 5 (indicated by the
+ | first colon) belongs to a previous aggregation unit. It is
+ | depicted to emphasize that aggregation units are octet aligned
+ | only. Similarly, the NAL unit carried in the aggregation unit
+ | can terminate at the octet boundary.
+
+ The DONL field, when present, specifies the value of the 16 least
+ significant bits of the decoding order number of the aggregated NAL
+ unit.
+
+ If sprop-max-don-diff is greater than 0, the DONL field MUST be
+ present in an aggregation unit that is the first aggregation unit in
+ an AP. The variable DON for the NAL unit aggregated in that first
+ aggregation unit is derived as equal to the value of the DONL field,
+ and the variable DON for the NAL unit aggregated in each aggregation
+ unit that is not the first aggregation unit in the AP is derived as
+ equal to the DON of the preceding aggregated NAL unit in the same AP
+ plus 1 modulo 65536. Otherwise (where sprop-max-don-diff is equal to
+ 0), the DONL field MUST NOT be present in an aggregation unit that is
+ the first aggregation unit in an AP.
+
+ An aggregation unit that is not the first aggregation unit in an AP
+ consists of a 16-bit unsigned size field (in network byte order) that
+ indicates the size of the NAL unit in bytes (excluding these two
+ octets but including the NAL unit header), followed by the NAL unit
+ itself, including its NAL unit header, as shown in Figure 6.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |               :           NALU size           |   NAL unit    |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 6: The Structure of an Aggregation Unit That Is Not the First
+ Aggregation Unit in an AP
+
+ | Informative note: The first octet of Figure 6 (indicated by the
+ | first colon) belongs to a previous aggregation unit. It is
+ | depicted to emphasize that aggregation units are octet aligned
+ | only. Similarly, the NAL unit carried in the aggregation unit
+ | can terminate at the octet boundary.
+
+ Figure 7 presents an example of an AP that contains two aggregation
+ units, labeled "NALU 1" and "NALU 2", without the DONL field being
+ present.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                          RTP Header                           |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |    PayloadHdr (Type=56)       |         NALU 1 Size           |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |          NALU 1 HDR           |                               |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 1 Data          |
+ |                   . . .                                       |
+ |                                                               |
+ +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | NALU 2 HDR    |                                               |
+ +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
+ |                   . . .                                       |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 7: An Example of an AP Packet Containing Two Aggregation
+ Units without the DONL Field
+
+ Figure 8 presents an example of an AP that contains two aggregation
+ units, labeled "NALU 1" and "NALU 2", with the DONL field being
+ present.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                          RTP Header                           |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |    PayloadHdr (Type=56)       |        NALU 1 DONL            |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |         NALU 1 Size           |          NALU 1 HDR           |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                                                               |
+ |                 NALU 1 Data . . .                             |
+ |                                                               |
+ +        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :          NALU 2 Size          |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |          NALU 2 HDR           |                               |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
+ |                                                               |
+ |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 8: An Example of an AP Containing Two Aggregation Units
+ with the DONL Field
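+
+ The AP layout of this section can be sketched in Python as follows.
+ This is an illustration under stated assumptions, not a reference
+ implementation: the names build_ap and parse_ap are hypothetical,
+ RTP padding is assumed to have been removed already, and the F,
+ Type, and TID bit positions assume a header laid out as F (1 bit),
+ Type (6 bits), TID (3 bits), Reserve (5 bits), and E (1 bit).
+
```python
from typing import List, Optional, Tuple

AP_TYPE = 56
FU_TYPE = 57


def build_ap(nal_units: List[bytes], first_don: Optional[int] = None) -> bytes:
    """Aggregate NAL units of one access unit, given in decoding order.

    Pass `first_don` only when sprop-max-don-diff is greater than 0.
    """
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    f, tids = 0, []
    for nal in nal_units:
        if not 2 <= len(nal) <= 0xFFFF:
            raise ValueError("NAL unit size does not fit the 16-bit size field")
        v = int.from_bytes(nal[:2], "big")
        if (v >> 9) & 0x3F in (AP_TYPE, FU_TYPE):
            raise ValueError("APs and FUs must not be nested in an AP")
        f |= v >> 15                     # F is 1 if any aggregated F bit is 1
        tids.append((v >> 6) & 0x07)
    hdr = (f << 15) | (AP_TYPE << 9) | (min(tids) << 6)   # Reserve = E = 0
    out = bytearray(hdr.to_bytes(2, "big"))
    if first_don is not None:
        out += (first_don & 0xFFFF).to_bytes(2, "big")    # DONL, first unit only
    for nal in nal_units:
        out += len(nal).to_bytes(2, "big") + nal
    return bytes(out)


def parse_ap(payload: bytes, sprop_max_don_diff: int) -> List[Tuple[bytes, Optional[int]]]:
    """Return [(NAL unit, DON)] in order of appearance; DON is None if unused."""
    pos, don = 2, None
    if sprop_max_don_diff > 0:
        if len(payload) < 4:
            raise ValueError("missing DONL field")
        don = int.from_bytes(payload[2:4], "big")
        pos = 4
    units = []
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NALU size field")
        size = int.from_bytes(payload[pos:pos + 2], "big")
        pos += 2
        if size < 2 or pos + size > len(payload):
            raise ValueError("NALU size exceeds the remaining payload")
        units.append((payload[pos:pos + size], don))
        pos += size
        if don is not None:
            don = (don + 1) % 65536      # DON of the next aggregated NAL unit
    if len(units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    return units
```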
+
+4.3.3. Fragmentation Units (FUs)
+
+ FUs are introduced to enable fragmenting a single NAL unit into
+ multiple RTP packets, possibly without cooperation or knowledge of
+ the EVC encoder. A fragment of a NAL unit consists of an integer
+ number of consecutive octets of that NAL unit. Fragments of the same
+ NAL unit MUST be sent in consecutive order with ascending RTP
+ sequence numbers (with no other RTP packets within the same RTP
+ stream being sent between the first and last fragment).
+
+ When a NAL unit is fragmented and conveyed within FUs, it is referred
+ to as a fragmented NAL unit. APs MUST NOT be fragmented. FUs MUST
+ NOT be nested; i.e., an FU must not contain a subset of another FU.
+
+ The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
+ time of the fragmented NAL unit.
+
+ An FU consists of a payload header as defined in Table 4 of [EVC]
+ (denoted as PayloadHdr with Type=57), an FU header of one octet, a
+ conditional 16-bit DONL field (in network byte order), and an FU
+ payload, as shown in Figure 9.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |    PayloadHdr (Type=57)       |   FU header   | DONL (cond)   |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
+ | DONL (cond)   |                                               |
+ |-+-+-+-+-+-+-+-+                                               |
+ |                         FU payload                            |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 9: The Structure of an FU
+
+ The fields in the payload header are set as follows. The Type field
+ MUST be equal to 57. The fields F, TID, Reserve, and E MUST be equal
+ to the fields F, TID, Reserve, and E, respectively, of the fragmented
+ NAL unit.
+
+ The FU header consists of an S bit, an E bit, and a 6-bit FuType
+ field, as shown in Figure 10.
+
+  0 1 2 3 4 5 6 7
+ +-+-+-+-+-+-+-+-+
+ |S|E|  FuType   |
+ +---------------+
+
+ Figure 10: The Structure of FU Header
+
+ The semantics of the FU header fields are as follows:
+
+ S: 1 bit
+
+ When set to 1, the S bit indicates the start of a fragmented NAL
+ unit, i.e., the first byte of the FU payload is also the first
+ byte of the payload of the fragmented NAL unit. When the FU
+ payload is not the start of the fragmented NAL unit payload, the S
+ bit MUST be set to 0.
+
+ E: 1 bit
+
+ When set to 1, the E bit indicates the end of a fragmented NAL
+ unit, i.e., the last byte of the payload is also the last byte of
+ the fragmented NAL unit. When the FU payload is not the last
+ fragment of a fragmented NAL unit, the E bit MUST be set to 0.
+
+ FuType: 6 bits
+
+ The field FuType MUST be equal to the field Type of the fragmented
+ NAL unit.
+
+ The DONL field, when present, specifies the value of the 16 least
+ significant bits of the decoding order number of the fragmented NAL
+ unit.
+
+ If sprop-max-don-diff is greater than 0 and the S bit is equal to 1,
+ the DONL field MUST be present in the FU, and the variable DON for
+ the fragmented NAL unit is derived as equal to the value of the DONL
+ field. Otherwise (where sprop-max-don-diff is equal to 0, or where
+ the S bit is equal to 0), the DONL field MUST NOT be present in the
+ FU.
+
+ A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
+ the S bit and E bit MUST NOT both be set to 1 in the same FU header.
+
+ The FU payload consists of fragments of the payload of the fragmented
+ NAL unit so that if the FU payloads of consecutive FUs, starting with
+ an FU with the S bit equal to 1 and ending with an FU with the E bit
+ equal to 1, are sequentially concatenated, the payload of the
+ fragmented NAL unit can be reconstructed. The NAL unit header of the
+ fragmented NAL unit is not included as such in the FU payload.
+ Instead, the information of the NAL unit header of the fragmented NAL
+ unit is conveyed in F, TID, Reserve, and E fields of the FU payload
+ headers of the FUs and the FuType field of the FU header of the FUs.
+ An FU payload MUST NOT be empty.
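+
+ The sender side of this procedure might look like the Python sketch
+ below. It is illustrative only. The name fragment_nal and the
+ max_fu_payload parameter (the largest number of NAL unit payload
+ bytes placed in one FU, not counting the payload header, the FU
+ header, or DONL) are hypothetical, and the position of the Type
+ field assumes a header laid out as F (1 bit), Type (6 bits), TID (3
+ bits), Reserve (5 bits), and E (1 bit).
+
```python
from typing import List, Optional

FU_TYPE = 57


def fragment_nal(nal_unit: bytes, max_fu_payload: int,
                 don: Optional[int] = None) -> List[bytes]:
    """Split one NAL unit into FU packet payloads, in transmission order.

    Pass `don` only when sprop-max-don-diff is greater than 0; the DONL
    field is then written into the first FU (S bit set) only.
    """
    if max_fu_payload < 1:
        raise ValueError("an FU payload must not be empty")
    body = nal_unit[2:]                  # NAL unit header is not copied as such
    if len(body) < 2:
        raise ValueError("too small to split into two non-empty fragments")
    hdr = int.from_bytes(nal_unit[:2], "big")
    nal_type = (hdr >> 9) & 0x3F
    # Same F, TID, Reserve, and E as the NAL unit, with Type replaced by 57.
    fu_hdr = ((hdr & ~(0x3F << 9)) | (FU_TYPE << 9)).to_bytes(2, "big")
    # Cap the chunk size so that at least two FUs result; S and E must
    # never both be 1 in the same FU header.
    chunk = min(max_fu_payload, (len(body) + 1) // 2)
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]
    fus = []
    for index, piece in enumerate(pieces):
        s = 1 if index == 0 else 0
        e = 1 if index == len(pieces) - 1 else 0
        fu = fu_hdr + bytes([(s << 7) | (e << 6) | nal_type])
        if s and don is not None:
            fu += (don & 0xFFFF).to_bytes(2, "big")
        fus.append(fu + piece)
    return fus
```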
+
+ If an FU is lost, the receiver SHOULD discard all following
+ fragmentation units in transmission order corresponding to the same
+ fragmented NAL unit unless the decoder in the receiver is known to
+ gracefully handle incomplete NAL units.
+
+ A receiver in an endpoint or a MANE MAY aggregate the first n-1
+ fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
+ n of that NAL unit is not received. In this case, the
+ forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a
+ syntax violation.
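+
+ For illustration, the receiver side can be sketched in Python as
+ follows. The name reassemble_fus is hypothetical. The sketch assumes
+ that the caller passes the payloads of all FUs of one fragmented NAL
+ unit, in RTP sequence number order and with no loss; loss detection
+ and the handling of incomplete NAL units are outside its scope. The
+ position of the Type field assumes a header laid out as F (1 bit),
+ Type (6 bits), TID (3 bits), Reserve (5 bits), and E (1 bit).
+
```python
from typing import List, Optional, Tuple


def reassemble_fus(fus: List[bytes], sprop_max_don_diff: int) -> Tuple[bytes, Optional[int]]:
    """Rebuild a fragmented NAL unit; returns (NAL unit, DON or None)."""
    if len(fus) < 2:
        raise ValueError("a fragmented NAL unit spans at least two FUs")
    body = bytearray()
    don = None
    for index, fu in enumerate(fus):
        if len(fu) < 3:
            raise ValueError("FU shorter than payload header plus FU header")
        s = fu[2] >> 7
        e = (fu[2] >> 6) & 1
        if s != (index == 0) or e != (index == len(fus) - 1):
            raise ValueError("unexpected S or E bit for this position")
        pos = 3
        if s and sprop_max_don_diff > 0:
            if len(fu) < 5:
                raise ValueError("first FU too short to carry DONL")
            don = int.from_bytes(fu[3:5], "big")   # DONL only in the first FU
            pos = 5
        if len(fu) <= pos:
            raise ValueError("an FU payload must not be empty")
        body += fu[pos:]
    hdr = int.from_bytes(fus[0][:2], "big")
    fu_type = fus[0][2] & 0x3F
    # Restore the NAL unit header: keep F, TID, Reserve, and E from the
    # payload header and put FuType back into the Type field.
    nal_hdr = (hdr & ~(0x3F << 9)) | (fu_type << 9)
    return nal_hdr.to_bytes(2, "big") + bytes(body), don
```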
+
+4.4. Decoding Order Number
+
+ For each NAL unit, the variable AbsDon is derived; it represents the
+ decoding order number that is indicative of the NAL unit decoding
+ order.
+
+ Let NAL unit n be the n-th NAL unit in transmission order within an
+ RTP stream.
+
+ If sprop-max-don-diff is equal to 0, then AbsDon[n] (the value of
+ AbsDon for NAL unit n) is derived as equal to n.
+
+ Otherwise (where sprop-max-don-diff is greater than 0), AbsDon[n] is
+ derived as follows, where DON[n] is the value of the variable DON for
+ NAL unit n:
+
+ * If n is equal to 0 (i.e., NAL unit n is the very first NAL unit in
+ transmission order), AbsDon[0] is set equal to DON[0].
+
+ * Otherwise (where n is greater than 0), the following applies for
+ derivation of AbsDon[n]:
+
+ If DON[n] == DON[n-1],
+    AbsDon[n] = AbsDon[n-1]
+
+ If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
+    AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]
+
+ If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
+    AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]
+
+ If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
+    AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])
+
+ If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
+    AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
+
+ For any two NAL units (m and n), the following applies:
+
+ * When AbsDon[n] is greater than AbsDon[m], the NAL unit n follows
+ NAL unit m in NAL unit decoding order.
+
+ * When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
+ of the two NAL units can be in either order.
+
+ * When AbsDon[n] is less than AbsDon[m], the NAL unit n precedes NAL
+ unit m in decoding order.
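+
+ The derivation of AbsDon for sprop-max-don-diff greater than 0 can
+ be written as the following Python sketch, which mirrors the five
+ cases listed above. The function names next_abs_don and abs_dons
+ are hypothetical and used for illustration only.
+
```python
from typing import List


def next_abs_don(prev_abs_don: int, prev_don: int, don: int) -> int:
    """Derive AbsDon[n] from AbsDon[n-1], DON[n-1], and DON[n]."""
    if don == prev_don:
        return prev_abs_don
    if don > prev_don and don - prev_don < 32768:
        return prev_abs_don + don - prev_don              # small step forward
    if don < prev_don and prev_don - don >= 32768:
        return prev_abs_don + 65536 - prev_don + don      # forward across wrap
    if don > prev_don and don - prev_don >= 32768:
        return prev_abs_don - (prev_don + 65536 - don)    # backward across wrap
    return prev_abs_don - (prev_don - don)                # small step backward


def abs_dons(dons: List[int]) -> List[int]:
    """AbsDon for each NAL unit, given DON values in transmission order."""
    result: List[int] = []
    for n, don in enumerate(dons):
        if n == 0:
            result.append(don)           # AbsDon[0] = DON[0]
        else:
            result.append(next_abs_don(result[-1], dons[n - 1], don))
    return result
```
+
+ For example, the DON sequence 65534, 65535, 0, 1 yields AbsDon
+ values that keep increasing across the 16-bit wraparound.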
+
+ | Informative note: When two consecutive NAL units in the NAL
+ | unit decoding order have different values of AbsDon, the
+ | absolute difference between the two AbsDon values may be
+ | greater than or equal to 1.
+
+ | Informative note: There are multiple reasons to allow the
+ | absolute difference of the values of AbsDon for two consecutive
+ | NAL units in the NAL unit decoding order to be greater than
+ | one. An increment by one is not required as at the time of
+ | associating values of AbsDon to NAL units, it may not be known
+ | whether all NAL units are to be delivered to the receiver. For
+ | example, a gateway might not forward VCL NAL units of higher
+ | sub-layers or some SEI NAL units when there is congestion in
+ | the network. In another example, the first intra-coded picture
+ | of a pre-encoded clip is transmitted in advance to ensure that
+ | it is readily available in the receiver. When transmitting the
+ | first intra-coded picture, the originator does not yet know
+ | exactly how many NAL units will be encoded before the first
+ | intra-coded picture of the pre-encoded clip follows in decoding
+ | order.
+ | Thus, the values of AbsDon for the NAL units of the first
+ | intra-coded picture of the pre-encoded clip have to be
+ | estimated when they are transmitted and gaps in the values of
+ | AbsDon may occur.
+
+5. Packetization Rules
+
+ The following packetization rules apply:
+
+ * If sprop-max-don-diff is greater than 0, the transmission order of
+ NAL units carried in the RTP stream MAY be different from the NAL
+ unit decoding order. Otherwise (where sprop-max-don-diff equals
+ 0), the transmission order of NAL units carried in the RTP stream
+ MUST be the same as the NAL unit decoding order.
+
+ * A NAL unit of small size SHOULD be encapsulated in an AP together
+ with one or more other NAL units to avoid the unnecessary
+ packetization overhead for small NAL units. For example, non-VCL
+ NAL units, such as access unit delimiters, parameter sets, or SEI
+ NAL units, are typically small and can often be aggregated with
+ VCL NAL units without violating MTU size constraints.
+
+ * Each non-VCL NAL unit SHOULD, when possible from an MTU size match
+ viewpoint, be encapsulated in an AP with its associated VCL NAL
+ unit as, typically, a non-VCL NAL unit would be meaningless
+ without the associated VCL NAL unit being available.
+
+ * A single NAL unit packet MUST be used for carrying precisely one
+ NAL unit in an RTP packet.
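+
+ One possible way to apply these rules is the greedy planner sketched
+ below in Python. It is only one of many valid strategies and is not
+ mandated by this document. The name plan_packets and the
+ max_payload parameter (the largest RTP payload size, in bytes, that
+ avoids IP layer fragmentation on the path) are hypothetical, and the
+ sketch assumes that sprop-max-don-diff is 0, so that no DONL field
+ needs to be accounted for.
+
```python
from typing import List, Tuple

AP_PAYLOAD_HDR = 2   # two-byte payload header of an AP
NALU_SIZE_FIELD = 2  # 16-bit size field in front of each aggregated NAL unit


def plan_packets(nal_sizes: List[int], max_payload: int) -> List[Tuple[str, List[int]]]:
    """Choose a payload structure for the NAL units of one access unit.

    `nal_sizes` lists NAL unit sizes (header included) in decoding order.
    Returns ("single" | "ap" | "fu", [indices into nal_sizes]) entries in
    transmission order.
    """
    plan: List[Tuple[str, List[int]]] = []
    group: List[int] = []

    def flush() -> None:
        if group:
            kind = "single" if len(group) == 1 else "ap"
            plan.append((kind, list(group)))
            group.clear()

    for index, size in enumerate(nal_sizes):
        if size > max_payload:
            flush()
            plan.append(("fu", [index]))     # too large: fragment it
            continue
        ap_size = AP_PAYLOAD_HDR + sum(
            NALU_SIZE_FIELD + nal_sizes[i] for i in group + [index])
        if group and ap_size > max_payload:
            flush()                          # the AP would exceed the limit
        group.append(index)
    flush()
    return plan
```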
+
+6. De-packetization Process
+
+ The general concept behind de-packetization is to get the NAL units
+ out of the RTP packets in an RTP stream and pass them to the decoder
+ in the NAL unit decoding order.
+
+ The de-packetization process is implementation dependent. Therefore,
+ the following description should be seen as an example of a suitable
+ implementation. Other schemes may also be used as long as the output
+ for the same input is the same as the process described below. The
+ output is the same when the set of output NAL units and their order
+ are both identical. Optimizations relative to the described
+ algorithms are possible.
+
+ All normal RTP mechanisms related to buffer management apply. In
+ particular, duplicated or outdated RTP packets (as indicated by the
+ RTP sequence number and the RTP timestamp) are removed. To determine
+ the exact time for decoding, factors such as a possible intentional
+ delay to allow for proper inter-stream synchronization must be
+ considered.
+
+ NAL units with NAL unit type values in the range of 0 to 55,
+ inclusive, may be passed to the decoder. NAL-unit-like structures
+ with NAL unit type values in the range of 56 to 62, inclusive, MUST
+ NOT be passed to the decoder.
+
+ The receiver includes a receiver buffer, which is used to compensate
+ for transmission delay jitter within individual RTP streams and to
+ reorder NAL units from transmission order to the NAL unit decoding
+ order. In this section, the receiver operation is described under
+ the assumption that there is no transmission delay jitter within an
+ RTP stream. To clarify the distinction from a practical receiver
+ buffer, which is also used to compensate for transmission delay
+ jitter, the buffer in this section will henceforth be referred to as
+ the "de-packetization" buffer. Receivers should also prepare for
+ transmission delay jitter; that is, either reserve separate buffers
+ for transmission delay jitter buffering and de-packetization
+ buffering, or use a receiver buffer for both transmission delay
+ jitter and de-packetization. Moreover, receivers should take
+ transmission delay jitter into account in the buffering operation,
+ e.g., by additional initial buffering before starting decoding and
+ playback.
+
+ The de-packetization process extracts the NAL units from the RTP
+ packets in an RTP stream as follows. When an RTP packet carries a
+ single NAL unit packet, the payload of the RTP packet is extracted as
+ a single NAL unit, excluding the DONL field, i.e., third and fourth
+ bytes, when sprop-max-don-diff is greater than 0. When an RTP packet
+ carries an AP, several NAL units are extracted from the payload of
+ the RTP packet. In this case, each NAL unit corresponds to the part
+ of the payload of each aggregation unit that follows the NALU size
+ field, as described in Section 4.3.2. When an RTP packet carries a
+ Fragmentation Unit (FU), all RTP packets from the first FU (with the
+ S field equal to 1) of the fragmented NAL unit up to the last FU
+ (with the E field equal to 1) of the fragmented NAL unit are
+ collected. The NAL unit is extracted from these RTP packets by
+ concatenating all FU payloads in the same order as the corresponding
+ RTP packets and prepending the NAL unit header with the fields F and
+ TID set to equal the values of the fields F and TID in the payload
+ header of the FUs, respectively, and with the NAL unit type set equal
+ to the value of the field FuType in the FU header of the FUs, as
+ described in Section 4.3.3.
+
+ When sprop-max-don-diff is equal to 0, the de-packetization buffer
+ size is zero bytes, and the NAL units carried in the single RTP
+ stream are directly passed to the decoder in their transmission
+ order, which is identical to their decoding order.
+
+ When sprop-max-don-diff is greater than 0, the process described in
+ the remainder of this section applies.
+
+ The receiver has two buffering states: initial buffering and
+ buffering while playing. Initial buffering starts when the reception
+ is initialized. After initial buffering, decoding and playback are
+ started, and the buffering-while-playing mode is used.
+
+ Regardless of the buffering state, the receiver stores incoming NAL
+ units in reception order into the de-packetization buffer. NAL units
+ carried in RTP packets are stored in the de-packetization buffer
+ individually, and the value of AbsDon is calculated and stored for
+ each NAL unit.
+
+ Initial buffering lasts until the difference between the greatest and
+ smallest AbsDon values of the NAL units in the de-packetization
+ buffer is greater than or equal to the value of sprop-max-don-diff.
+
+ After initial buffering, whenever the difference between the greatest
+ and smallest AbsDon values of the NAL units in the de-packetization
+ buffer is greater than or equal to the value of sprop-max-don-diff,
+ the following operation is repeatedly applied until this difference
+ is smaller than sprop-max-don-diff:
+
+ The NAL unit in the de-packetization buffer with the smallest
+ value of AbsDon is removed from the de-packetization buffer and
+ passed to the decoder.
+
+ When no more NAL units are flowing into the de-packetization buffer,
+ all NAL units remaining in the de-packetization buffer are removed
+ from the buffer and passed to the decoder in the order of increasing
+ AbsDon values.
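+
+ The buffering rules above for sprop-max-don-diff greater than 0 can
+ be sketched in Python as follows. The class name DepackBuffer and
+ its methods are hypothetical, AbsDon values are assumed to have been
+ derived by the caller, and jitter compensation is left out, in line
+ with the assumption made in this section. Because the same threshold
+ test ends initial buffering and drives buffering while playing, the
+ sketch uses a single loop for both states.
+
```python
import heapq
from typing import Any, List, Optional, Tuple


class DepackBuffer:
    """De-packetization buffer keyed by AbsDon (illustrative sketch)."""

    def __init__(self, sprop_max_don_diff: int) -> None:
        if sprop_max_don_diff <= 0:
            raise ValueError("no de-packetization buffering when the value is 0")
        self.max_don_diff = sprop_max_don_diff
        self._heap: List[Tuple[int, int, Any]] = []   # (AbsDon, arrival, NAL)
        self._arrivals = 0
        self._greatest: Optional[int] = None          # greatest AbsDon buffered

    def push(self, abs_don: int, nal_unit: Any) -> List[Any]:
        """Store one NAL unit; return NAL units now due for the decoder."""
        heapq.heappush(self._heap, (abs_don, self._arrivals, nal_unit))
        self._arrivals += 1
        if self._greatest is None or abs_don > self._greatest:
            self._greatest = abs_don
        ready = []
        # Remove the smallest AbsDon while the spread reaches the threshold.
        while self._heap and self._greatest - self._heap[0][0] >= self.max_don_diff:
            ready.append(heapq.heappop(self._heap)[2])
        return ready

    def flush(self) -> List[Any]:
        """No more input: drain everything in increasing AbsDon order."""
        ready = [heapq.heappop(self._heap)[2] for _ in range(len(self._heap))]
        self._greatest = None
        return ready
```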
+
+7. Payload Format Parameters
+
+ This section specifies the optional parameters. A mapping of the
+ parameters with the Session Description Protocol (SDP) [RFC8866] is
+ also provided for applications that use SDP.
+
+ Parameters starting with the string "sprop" for stream properties can
+ be used by a sender to provide a receiver with the properties of the
+ stream that is or will be sent. The media sender (and not the
+ receiver) selects whether, and with what values, "sprop" parameters
+ are being sent. This uncommon characteristic of the "sprop"
+ parameters may not be intuitive in the context of some signaling
+ protocol concepts, especially with Offer/Answer. Please see
+ Section 7.3.2 for guidance specific to the use of sprop parameters in
+ the Offer/Answer case.
+
+7.1. Media Type Registration
+
+ The receiver MUST ignore any parameter unspecified in this document.
+
+ Type name: video
+
+ Subtype name: evc
+
+ Required parameters: N/A
+
+ Optional parameters: profile-id, level-id, toolset-id, max-recv-
+ level-id, sprop-sps, sprop-pps, sprop-sei, sprop-max-don-diff,
+ sprop-depack-buf-bytes, depack-buf-cap (refer to Section 7.2 for
+ definitions)
+
+ Encoding considerations: This type is only defined for transfer via
+ RTP [RFC3550].
+
+ Security considerations: See Section 9 of RFC 9584.
+
+ Interoperability considerations: N/A
+
+ Published specification: Please refer to RFC 9584 and EVC standard
+ [EVC].
+
+ Applications that use this media type: Any application that relies
+ on EVC-based video services over RTP
+
+ Fragment identifier considerations: N/A
+
+ Additional information: N/A
+
+ Person & email address to contact for further information:
+ Stephan Wenger (stewe@stewe.org)
+
+ Intended usage: COMMON
+
+ Restrictions on usage: N/A
+
+ Author: See Authors' Addresses section of RFC 9584.
+
+ Change controller: IETF <avtcore@ietf.org>
+
+7.2. Optional Parameters Definition
+
+ profile-id, level-id, toolset-id:
+ These parameters indicate the profile, the level, and constraints
+ of the bitstream carried by the RTP stream or a specific set of
+ the profile, the level, and constraints the receiver supports.
+
+ More specifications of these parameters, including how they relate
+ to syntax elements specified in [EVC] are provided below.
+
+ profile-id:
+ When profile-id is not present, a value of 0 (i.e., the Baseline
+ profile) MUST be inferred.
+
+ When used to indicate properties of a bitstream, profile-id MUST
+ be derived from the profile_idc in the SPS.
+
+ EVC bitstreams transported over RTP using the technologies of this
+ document SHOULD refer only to SPSs that have the same value in
+ profile_idc, unless the sender has a priori knowledge that a
+ receiver can correctly decode the EVC bitstream with different
+ profile_idc values (for example, in walled garden scenarios). As
+ exceptions to this rule, if the receiver is known to support a
+ Baseline profile, a bitstream could safely end with a CVS referring
+ to an SPS wherein profile_idc indicates the Baseline Still picture
+ profile. A similar exception can be made for Main profile and
+ Main Still picture profile.
+
+ level-id:
+ When level-id is not present, a value of 90 (corresponding to
+ level 3, which allows for approximately standard-definition
+ television (SD TV) resolution and frame rates; see Annex A of
+ [EVC]) MUST be inferred.
+
+ When used to indicate properties of a bitstream, level-id MUST be
+ derived from the level_idc in the SPS.
+
+ If the level-id parameter is used for capability exchange, the
+ following applies. If max-recv-level-id is not present, the
+ default level defined by level-id indicates the highest level the
+ codec wishes to support. Otherwise, max-recv-level-id indicates
+ the highest level the codec supports for receiving. For either
+ receiving or sending, all levels that are lower than the highest
+ level supported MUST also be supported.
+
+ toolset-id:
+ This parameter is a base64-encoding representation (Section 4 of
+ [RFC4648]) of a 64-bit unsigned integer bit mask derived from the
+ concatenation, in network byte order, of the syntax elements
+ toolset_idc_h and toolset_idc_l. When used to indicate properties
+ of a bitstream, its value MUST be derived from toolset_idc_h and
+ toolset_idc_l in the sequence parameter set.
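+
+ As an illustration, the Python sketch below forms and parses a
+ toolset-id value. It assumes that toolset_idc_h and toolset_idc_l
+ are each 32 bits wide, which is consistent with the 64-bit result
+ described above but should be confirmed against [EVC]. The function
+ names encode_toolset_id and decode_toolset_id are hypothetical.
+
```python
import base64
import struct
from typing import Tuple


def encode_toolset_id(toolset_idc_h: int, toolset_idc_l: int) -> str:
    """Concatenate both syntax elements in network byte order, then base64."""
    raw = struct.pack("!II", toolset_idc_h, toolset_idc_l)
    return base64.b64encode(raw).decode("ascii")


def decode_toolset_id(value: str) -> Tuple[int, int]:
    """Return (toolset_idc_h, toolset_idc_l) from a toolset-id value."""
    raw = base64.b64decode(value, validate=True)
    if len(raw) != 8:
        raise ValueError("toolset-id must decode to exactly 64 bits")
    h, l = struct.unpack("!II", raw)
    return h, l
```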
+
+ max-recv-level-id:
+ This parameter MAY be used to indicate the highest level a
+ receiver supports.
+
+ The value of max-recv-level-id MUST be in the range of 0 to 255,
+ inclusive.
+
+ When max-recv-level-id is not present, the value is inferred to be
+ equal to level-id.
+
+ max-recv-level-id MUST NOT be present when the highest level the
+ receiver supports is not higher than the default level.
+
+ sprop-sps:
+ This parameter MAY be used to convey sequence parameter set NAL
+ units of the bitstream for out-of-band transmission of sequence
+ parameter sets. The value of the parameter is a comma-separated
+ (',') list of base64-encoding representations (Section 4 of
+ [RFC4648]) of the sequence parameter set NAL units as specified in
+ Section 7.3.2.1 of [EVC].
+
+ sprop-pps:
+ This parameter MAY be used to convey picture parameter set NAL
+ units of the bitstream for out-of-band transmission of picture
+ parameter sets. The value of the parameter is a comma-separated
+ (',') list of base64-encoding representations (Section 4 of
+ [RFC4648]) of the picture parameter set NAL units as specified in
+ Section 7.3.2.2 of [EVC].
+
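The comma-separated base64 lists used by sprop-sps and sprop-pps (and, in the same form, sprop-sei) could be handled along the following lines. The helper names are hypothetical, and the byte strings in any test data are placeholders rather than real parameter set NAL units.

```python
import base64
from typing import Iterable, List


def parse_sprop_list(value: str) -> List[bytes]:
    """Split a comma-separated sprop-* value into raw NAL unit byte strings."""
    units = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            raise ValueError("empty entry in sprop list")
        # validate=True rejects characters outside the base64 alphabet.
        units.append(base64.b64decode(item, validate=True))
    return units


def format_sprop_list(nal_units: Iterable[bytes]) -> str:
    """Build a sprop-* value from raw NAL unit byte strings."""
    return ",".join(base64.b64encode(unit).decode("ascii") for unit in nal_units)
```

The decoded units would typically be handed to the video decoder before any NAL units carried in the RTP stream.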
+ sprop-sei:
+ This parameter MAY be used to convey one or more SEI messages that
+ describe bitstream characteristics. When present, a decoder can
+ rely on the bitstream characteristics that are described in the
+ SEI messages for the entire duration of the session, independently
+ from the persistence scopes of the SEI messages as specified in
+ [VSEI].
+
+ The value of the parameter is a comma-separated (',') list of
+ base64-encoding representations (Section 4 of [RFC4648]) of SEI
+ NAL units as specified in [VSEI].
+
+ | Informative note: Intentionally, no list of applicable or
+ | inapplicable SEI messages is specified here. Conveying
+ | certain SEI messages in sprop-sei may be sensible in some
+ | application scenarios and meaningless in others. However, a
+ | couple of examples are described below.
+ |
+ | 1. In an environment where the bitstream was created from
+ | film-based source material, and no splicing is going to
+ | occur during the lifetime of the session, the film grain
+ | characteristics SEI message is likely meaningful; and
+ | sending it in sprop-sei rather than in the bitstream at
+ | each entry point may help with saving bits and allow one
+ | to configure the renderer only once, avoiding unwanted
+ | artifacts.
+ |
+ | 2. Examples for SEI messages that would be meaningless to
+ | be conveyed in sprop-sei include the decoded picture
+ | hash SEI message (it is close to impossible that all
+ | decoded pictures have the same hash value) or the filler
+ | payload SEI message (as there is no point in just having
+ | more bits in SDP).
+
+ sprop-max-don-diff:
+ If there is no NAL unit naluA that is followed in transmission
+ order by any NAL unit preceding naluA in decoding order (i.e., the
+ transmission order of the NAL units is the same as the decoding
+ order), the value of this parameter MUST be equal to 0.
+
+ Otherwise, this parameter specifies the maximum absolute
+ difference between the decoding order number (i.e., AbsDon) values
+ of any two NAL units naluA and naluB, where naluA follows naluB in
+ decoding order and precedes naluB in transmission order.
+
+ The value of sprop-max-don-diff MUST be an integer in the range of
+ 0 to 32767, inclusive.
+
+ When not present, the value of sprop-max-don-diff is inferred to
+ be equal to 0.
+
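One way to compute a value for sprop-max-don-diff is sketched below in Python. The input is assumed to be the list of AbsDon values of the NAL units in transmission order; the function name is hypothetical. For each NAL unit naluB, the sketch compares its AbsDon with the largest AbsDon sent earlier, which is the worst-case naluA that follows naluB in decoding order but precedes it in transmission order.

```python
from typing import Iterable


def max_don_diff(abs_don_in_transmission_order: Iterable[int]) -> int:
    """Return the maximum AbsDon difference caused by interleaving."""
    largest_seen = None
    result = 0
    for abs_don in abs_don_in_transmission_order:
        if largest_seen is not None and largest_seen > abs_don:
            # An earlier-sent unit follows this one in decoding order.
            result = max(result, largest_seen - abs_don)
        if largest_seen is None or abs_don > largest_seen:
            largest_seen = abs_don
    if result > 32767:
        raise ValueError("value exceeds the sprop-max-don-diff range")
    return result
```

When transmission order equals decoding order, no earlier unit ever has a larger AbsDon, so the function returns 0 as required.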
+ sprop-depack-buf-bytes:
+ This parameter signals the required size of the de-packetization
+ buffer in units of bytes. The value of the parameter MUST be
+ greater than or equal to the maximum buffer occupancy (in units of
+ bytes) of the de-packetization buffer as specified in Section 6.
+
+ The value of sprop-depack-buf-bytes MUST be an integer in the
+ range of 0 to 4294967295, inclusive.
+
+ When sprop-max-don-diff is present and greater than 0, this
+ parameter MUST be present and the value MUST be greater than 0.
+ When not present, the value of sprop-depack-buf-bytes is inferred
+ to be equal to 0.
+
+ | Informative note: The value of sprop-depack-buf-bytes
+ | indicates the required size of the de-packetization buffer
+ | only. When network jitter can occur, an appropriately sized
+ | jitter buffer has to be available as well.
+
+ depack-buf-cap:
+ This parameter signals the capabilities of a receiver
+ implementation and indicates the amount of de-packetization buffer
+ space in units of bytes that the receiver has available for
+ reconstructing the NAL unit decoding order from NAL units carried
+ in the RTP stream. A receiver is able to handle any RTP stream
+ for which the value of the sprop-depack-buf-bytes parameter is
+ smaller than or equal to this parameter.
+
+ When not present, the value of depack-buf-cap is inferred to be
+ equal to 4294967295. The value of depack-buf-cap MUST be an
+ integer in the range of 1 to 4294967295, inclusive.
+
+ | Informative note: The value of depack-buf-cap indicates the
+ | maximum possible size of the de-packetization buffer of the
+ | receiver only, without allowing for network jitter. When
+ | network jitter occurs, an appropriately sized jitter buffer
+ | has to be available as well.
+
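The range and presence rules for sprop-depack-buf-bytes and depack-buf-cap can be expressed as a small consistency check. The following Python sketch uses hypothetical function names and assumes the parameters have already been converted to integers, with None standing for an absent parameter.

```python
UINT32_MAX = 4294967295


def check_sender_parameters(max_don_diff: int = 0, depack_buf_bytes=None) -> int:
    """Validate the sender-side pair; return the effective buffer size in bytes."""
    if not 0 <= max_don_diff <= 32767:
        raise ValueError("sprop-max-don-diff out of range")
    if depack_buf_bytes is None:
        if max_don_diff > 0:
            raise ValueError("sprop-depack-buf-bytes is required when interleaving")
        return 0  # inferred value when the parameter is absent
    if not 0 <= depack_buf_bytes <= UINT32_MAX:
        raise ValueError("sprop-depack-buf-bytes out of range")
    if max_don_diff > 0 and depack_buf_bytes == 0:
        raise ValueError("sprop-depack-buf-bytes must be greater than 0")
    return depack_buf_bytes


def receiver_can_handle(required_bytes: int, depack_buf_cap=None) -> bool:
    """True if a receiver with depack-buf-cap can handle the stream."""
    capacity = UINT32_MAX if depack_buf_cap is None else depack_buf_cap
    if not 1 <= capacity <= UINT32_MAX:
        raise ValueError("depack-buf-cap out of range")
    return required_bytes <= capacity
```

Neither value accounts for network jitter, so a jitter buffer would be sized separately.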
+7.3. SDP Parameters
+
+ The receiver MUST ignore any parameter unspecified in this document.
+
+7.3.1. Mapping of Payload Type Parameters to SDP
+
+ The media type video/evc string is mapped to fields in the Session
+ Description Protocol (SDP) [RFC8866] as follows:
+
+ * The media name in the "m=" line of SDP MUST be video.
+
+ * The encoding name in the "a=rtpmap" line of SDP MUST be evc (the
+ media subtype).
+
+ * The clock rate in the "a=rtpmap" line MUST be 90000.
+
+ * The OPTIONAL parameters profile-id, level-id, toolset-id, max-
+ recv-level-id, sprop-max-don-diff, sprop-depack-buf-bytes, and
+ depack-buf-cap, when present, MUST be included in the "a=fmtp"
+ line of SDP. The "a=fmtp" line is expressed as a media type
+ string, in the form of a semicolon-separated list of
+ parameter=value pairs.
+
+ * The OPTIONAL parameters sprop-sps, sprop-pps, and sprop-sei, when
+ present, MUST be included in the "a=fmtp" line of SDP or conveyed
+ using the "fmtp" source attribute as specified in Section 6.3 of
+ [RFC5576]. For a particular media format (i.e., RTP payload
+ type), sprop-sps, sprop-pps, or sprop-sei MUST NOT be both
+ included in the "a=fmtp" line of SDP and conveyed using the "fmtp"
+ source attribute. When included in the "a=fmtp" line of SDP,
+ those parameters are expressed as a media type string, in the form
+ of a semicolon-separated list of parameter=value pairs. When
+ conveyed in the "a=fmtp" line of SDP for a particular payload
+ type, the parameters sprop-sps, sprop-pps, and sprop-sei MUST be
+ applied to each SSRC with the payload type. When conveyed using
+ the "fmtp" source attribute, these parameters are only associated
+ with the given source and payload type as parts of the "fmtp"
+ source attribute.
+
+ | Informative note: Conveyance of sprop-sps and sprop-pps using
+ | the "fmtp" source attribute allows for out-of-band transport of
+ | parameter sets in topologies like Topo-Video-switch-MCU, as
+ | specified in [RFC7667].
+
+ A general usage of media representation in SDP is as follows:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 evc/90000
+ a=fmtp:98 profile-id=1;
+ sprop-sps=<sequence parameter set data>;
+ sprop-pps=<picture parameter set data>;
+
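A receiver-side parser for the "a=fmtp" line might look like the following Python sketch. It assumes the attribute arrives as a single logical line (the example above is wrapped for display only), keeps only the parameters named in this section, and silently drops anything else, in line with the rule that unspecified parameters are ignored. The function name and the returned shape are hypothetical.

```python
from typing import Dict, Tuple

KNOWN_PARAMETERS = {
    "profile-id", "level-id", "toolset-id", "max-recv-level-id",
    "sprop-max-don-diff", "sprop-depack-buf-bytes", "depack-buf-cap",
    "sprop-sps", "sprop-pps", "sprop-sei",
}


def parse_fmtp(line: str) -> Tuple[int, Dict[str, str]]:
    """Parse 'a=fmtp:<pt> name=value; ...' into (payload type, parameters)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: base64 values may end in '=' padding.
        name, separator, value = item.partition("=")
        name = name.strip()
        if separator and name in KNOWN_PARAMETERS:
            params[name] = value.strip()
    return int(payload_type), params
```

Splitting on the first "=" only is the important detail here, because the base64 values of the sprop-* parameters can themselves end with "=" characters.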
+ A SIP Offer/Answer exchange wherein both parties are expected to both
+ send and receive could look like the following. Only the media
+ codec-specific parts of the SDP are shown.
+
+ Offerer->Answerer:
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 evc/90000
+ a=fmtp:98 profile-id=1; level-id=90;
+
+ The above represents an offer for symmetric video communication using
+ [EVC] and its payload specification at the Main profile and level 3.
+ Informally speaking, this offer tells the receiver of the offer that
+ the sender is willing to receive up to xKpxx resolution at the
+ maximum bitrates specified in [EVC]. At the same time, if this offer
+ were accepted "as is", the Offerer can expect that the Answerer would
+ be able to receive and properly decode EVC media up to and including
+ level 3.
+
+ Answerer->Offerer:
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 evc/90000
+ a=fmtp:98 profile-id=1; level-id=60
+
+ | Informative note: The value of level-id is equal to 30 times
+ | the level number specified in Table A.1 of [EVC].
+
+ With this answer to the offer above, the system receiving the offer
+ advises the Offerer that it is incapable of handling EVC at level 3
+ but is capable of decoding level 2. As EVC video codecs must support
+ decoding at all levels below the maximum level they implement, the
+ resulting user experience would likely be that both systems send
+ video at level 2. However, nothing prevents an encoder from further
+ downgrading its sending to, for example, level 1 if it were short of
+ cycles or bandwidth or for other reasons.
+
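The relation between level numbers and level-id values in the exchange above (level 3 maps to 90, level 2 to 60) can be written down as a short Python sketch. The function names are hypothetical; Decimal is used so that a level number passed as text is multiplied exactly.

```python
from decimal import Decimal


def level_id_from_level_number(level_number) -> int:
    """Return level-id as 30 times the level number."""
    value = Decimal(str(level_number)) * 30
    if value != value.to_integral_value():
        raise ValueError("level number does not map to an integer level-id")
    return int(value)


def level_number_from_level_id(level_id: int) -> Decimal:
    """Inverse mapping, returned as an exact decimal."""
    return Decimal(level_id) / 30
```

For example, the offer above carries level_id_from_level_number(3) and the answer carries level_id_from_level_number(2).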
+7.3.2. Usage with SDP Offer/Answer Model
+
+ This section describes the negotiation of unicast messages using the
+ Offer/Answer model described in [RFC3264] and its updates.
+
+ This section applies to all profiles defined in [EVC], specifically
+ to Baseline, Main, and the associated still image profiles.
+
+ The following limitations and rules pertaining to the media
+ configuration apply:
+
+ The parameters identifying a media format configuration for EVC are
+ profile-id and level-id. The profile-id parameter MUST be used
+ symmetrically.
+
+ The Answerer MUST structure its answer according to one of the
+ following two options:
+
+ * maintain all configuration parameters with the values remaining
+ the same as in the offer for the media format (payload type), with
+ the exception that the value of level-id is changeable as long as
+ the highest level indicated by the answer is not higher than that
+ indicated by the offer; or
+
+ * remove the media format (payload type) completely (when one or
+ more of the parameter values are not supported).
+
+ | Informative note: The above requirement for symmetric use does
+ | not apply for level-id and does not apply for the other
+ | bitstream or RTP stream properties and capability parameters,
+ | as described in Section 7.3.2.1 ("Payload Format
+ | Configuration").
+
+ To simplify handling and matching of these configurations, the same
+ RTP payload type number used in the offer SHOULD also be used in the
+ answer, as specified in [RFC3264].
+
+ The answer MUST NOT contain a payload type number used in the offer
+ for the media subtype unless the configuration is the same as in the
+ offer or the configuration in the answer only differs from that in
+ the offer with a different value of level-id.
+
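The answer rules in this section can be approximated by a simple check, sketched here in Python. Several simplifications are assumed: both sides are given as dictionaries of parameter strings for the same payload type, profile-id and toolset-id (the parameters marked "C" in Figure 11) are compared literally without applying inferred defaults, and only level-id is consulted for the level comparison. The function name is hypothetical.

```python
DEFAULT_LEVEL_ID = 90  # inferred when level-id is absent

SYMMETRIC_PARAMETERS = ("profile-id", "toolset-id")  # marked "C" in Figure 11


def answer_is_acceptable(offer: dict, answer: dict) -> bool:
    """True if the answer keeps the configuration and does not raise the level."""
    for name in SYMMETRIC_PARAMETERS:
        # Literal comparison; a missing parameter only matches a missing one.
        if offer.get(name) != answer.get(name):
            return False
    offer_level = int(offer.get("level-id", DEFAULT_LEVEL_ID))
    answer_level = int(answer.get("level-id", DEFAULT_LEVEL_ID))
    return answer_level <= offer_level
```

An Answerer for which this check cannot succeed would take the second option and remove the media format from its answer.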
+7.3.2.1. Payload Format Configuration
+
+ The following limitations and rules pertain to the configuration of
+ the payload format buffer management.
+
+ * The parameters sprop-max-don-diff and sprop-depack-buf-bytes
+ describe the properties of an RTP stream that the Offerer or the
+ Answerer is sending for the media format configuration. This
+ differs from the normal usage of the Offer/Answer parameters;
+ normally, such parameters declare the properties of the bitstream
+ or RTP stream that the Offerer or the Answerer is able to receive.
+ When dealing with EVC, the Offerer assumes that the Answerer will
+ be able to receive media encoded using the configuration being
+ offered.
+
+ | Informative note: The above parameters apply for any RTP
+ | stream, when present, sent by a declaring entity with the same
+ | configuration. In other words, the applicability of the above
+ | parameters to RTP streams depends on the source endpoint.
+ | Rather than being bound to the payload type, the values may
+ | have to be applied to another payload type when being sent, as
+ | they apply for the configuration.
+
+ * When an Offerer offers an interleaved stream, indicated by the
+ presence of sprop-max-don-diff with a value larger than zero, the
+ Offerer MUST include the size of the de-packetization buffer
+ sprop-depack-buf-bytes.
+
+ * To enable the Offerer and Answerer to inform each other about
+ their capabilities for de-packetization buffering in receiving RTP
+ streams, both parties are RECOMMENDED to include depack-buf-cap.
+
+ * The parameters sprop-sps or sprop-pps, when present (included in
+ the "a=fmtp" line of SDP or conveyed using the "fmtp" source
+ attribute, as specified in Section 6.3 of [RFC5576]), are used for
+ out-of-band transport of the parameter sets (SPS or PPS,
+ respectively). The Answerer MAY use either out-of-band or in-band
+ transport of parameter sets for the bitstream it is sending,
+ regardless of whether out-of-band parameter sets transport has
+ been used in the Offerer-to-Answerer direction. Parameter sets
+ included in an answer are independent of those parameter sets
+ included in the offer, as they are used for decoding two different
+ bitstreams: one from the Answerer to the Offerer, and the other in
+ the opposite direction. In case some RTP packets are sent before
+ the SDP Offer/Answer settles down, in-band parameter sets MUST be
+ used for those RTP stream parts sent before the SDP Offer/Answer.
+
+ * The following rules apply to transport of parameter sets in the
+ Offerer-to-Answerer direction.
+
+ - An offer MAY include sprop-sps and/or sprop-pps. If none of
+ these parameters are present in the offer, then only in-band
+ transport of parameter sets is used.
+
+ - If the level to use in the Offerer-to-Answerer direction is
+ equal to the default level in the offer, the Answerer MUST be
+ prepared to use the parameter sets included in sprop-sps and
+ sprop-pps (either included in the "a=fmtp" line of SDP or
+ conveyed using the "fmtp" source attribute) for decoding the
+ incoming bitstream, e.g., by passing these parameter set NAL
+ units to the video decoder before passing any NAL units carried
+ in the RTP streams. Otherwise, the Answerer MUST ignore
+ sprop-sps and sprop-pps (either included in the "a=fmtp"
+ line of SDP or conveyed using the "fmtp" source attribute), and
+ the Offerer MUST transmit parameter sets in-band.
+
+ * The following rules apply to transport of parameter sets in the
+ Answerer-to-Offerer direction.
+
+ - An answer MAY include sprop-sps and/or sprop-pps. If none of
+ these parameters are present in the answer, then only in-band
+ transport of parameter sets is used.
+
+ - The Offerer MUST be prepared to use the parameter sets included
+ in sprop-sps and sprop-pps (either included in the "a=fmtp"
+ line of SDP or conveyed using the "fmtp" source attribute) for
+ decoding the incoming bitstream, e.g., by passing these
+ parameter set NAL units to the video decoder before passing any
+ NAL units carried in the RTP streams.
+
+ * When sprop-sps and/or sprop-pps are conveyed using the "fmtp"
+ source attribute, as specified in Section 6.3 of [RFC5576], the
+ receiver of the parameters MUST store the parameter sets included
+ in sprop-sps and/or sprop-pps and associate them with the source
+ given as part of the "fmtp" source attribute. Parameter sets
+ associated with one source (given as part of the "fmtp" source
+ attribute) MUST only be used to decode NAL units conveyed in RTP
+ packets from the same source (given as part of the "fmtp" source
+ attribute). When this mechanism is in use, SSRC collision
+ detection and resolution MUST be performed as specified in
+ [RFC5576].
+
+ Figure 11 lists the interpretation of all the parameters that MAY be
+ used for the various combinations of offer, answer, and direction
+ attributes.
+
+ sendonly --+
+ recvonly --+ |
+ sendrecv --+ | |
+ | | |
+ profile-id C C P
+ level-id D D P
+ toolset-id C C P
+ max-recv-level-id R R -
+ sprop-max-don-diff P - P
+ sprop-depack-buf-bytes P - P
+ depack-buf-cap R R -
+ sprop-sei P - P
+ sprop-sps P - P
+ sprop-pps P - P
+
+
+ Legend:
+
+ C: configuration for sending and receiving bitstreams
+ D: changeable configuration; same as C, except that it is possible
+ to answer with a different but consistent value (see the semantics
+ of the level-id parameter for what counts as consistent;
+ basically, downgrading the level is allowed)
+
+ P: properties of the bitstream to be sent
+ R: receiver capabilities
+ -: not usable; when present MUST be ignored
+
+ Figure 11: Interpretation of Parameters for Various Combinations
+ of Offers, Answers, and Direction Attributes
+
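Figure 11 lends itself to a lookup table. The Python sketch below transcribes the figure (columns in the order sendrecv, recvonly, sendonly) and drops every parameter marked "-" for the given direction attribute. The names FIGURE_11 and interpret are hypothetical, and the input is assumed to be a dictionary of already-parsed parameter strings.

```python
COLUMNS = ("sendrecv", "recvonly", "sendonly")

# Roles per direction attribute, transcribed from Figure 11.
FIGURE_11 = {
    "profile-id":             ("C", "C", "P"),
    "level-id":               ("D", "D", "P"),
    "toolset-id":             ("C", "C", "P"),
    "max-recv-level-id":      ("R", "R", "-"),
    "sprop-max-don-diff":     ("P", "-", "P"),
    "sprop-depack-buf-bytes": ("P", "-", "P"),
    "depack-buf-cap":         ("R", "R", "-"),
    "sprop-sei":              ("P", "-", "P"),
    "sprop-sps":              ("P", "-", "P"),
    "sprop-pps":              ("P", "-", "P"),
}


def interpret(params: dict, direction: str) -> dict:
    """Map each usable parameter to (role, value); ignore the rest."""
    column = COLUMNS.index(direction)  # raises ValueError for unknown directions
    result = {}
    for name, value in params.items():
        roles = FIGURE_11.get(name)
        if roles is None or roles[column] == "-":
            continue  # unspecified or not usable: ignored
        result[name] = (roles[column], value)
    return result
```

Keeping the table as data rather than branching logic makes it easy to compare against the figure line by line.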
+ Parameters used for declaring receiver capabilities are, in general,
+ downgradable, i.e., they express the upper limit for a sender's
+ possible behavior. Thus, a sender MAY select to set its encoder
+ using only lower/lesser or equal values of these parameters.
+
+ When a sender's capabilities are declared with the configuration
+ parameters, these parameters express a configuration that is
+ acceptable for the sender to receive bitstreams. In order to achieve
+ high interoperability levels, it is often advisable to offer multiple
+ alternative configurations. It is impossible to offer multiple
+ configurations in a single payload type. Thus, when multiple
+ configuration offers are made, each offer requires its own RTP
+ payload type associated with the offer.
+
+ An implementation SHOULD be able to understand all media type
+ parameters (including all optional media type parameters), even if it
+ doesn't support the functionality related to the parameter. This, in
+ conjunction with proper application logic in the implementation,
+ allows the implementation, after having received an offer, to create
+ an answer by potentially downgrading one or more of the optional
+ parameters to the point where the implementation can cope. This
+ leads to higher chances of interoperability beyond the most basic
+ interop points (for which, as described above, no optional parameters
+ are necessary).
+
+ | Informative note: In implementations of various H.26x video
+ | coding payload formats including those for [AVC] and [HEVC], it
+ | was occasionally observed that implementations were incapable
+ | of parsing most (or all) of the optional parameters and hence
+ | rejected offers other than the most basic offers. As a result,
+ | the Offer/Answer exchange resulted in a baseline performance
+ | (using the default values for the optional parameters) with the
+ | resulting suboptimal user experience. However, there are valid
+ | reasons to forgo the implementation complexity of implementing
+ | the parsing of some or all of the optional parameters, for
+ | example, when there is predetermined knowledge, not negotiated
+ | by an SDP-based Offer/Answer process, of the capabilities of
+ | the involved systems (walled gardens, baseline requirements
+ | defined in application standards higher up in the stack, and
+ | similar).
+
+ An Answerer MAY extend the offer with additional media format
+ configurations. However, to enable their usage, in most cases, a
+ second offer is required from the Offerer to provide the bitstream
+ property parameters that the media sender will use. This also has
+ the effect that the Offerer has to be able to receive this media
+ format configuration, and not only to send it.
+
+7.3.3. Multicast
+
+ For bitstreams being delivered over multicast, the following rules
+ apply:
+
+ * The media format configuration is identified by profile-id and
+ level-id. These media format configuration parameters, including
+ level-id, MUST be used symmetrically; that is, the Answerer MUST
+ either maintain all configuration parameters or remove the media
+ format (payload type) completely. Note that this implies that the
+ level-id for Offer/Answer in multicast is not changeable.
+
+ * To simplify the handling and matching of these configurations, the
+ same RTP payload type number used in the offer SHOULD also be used
+ in the answer, as specified in [RFC3264]. An answer MUST NOT
+ contain a payload type number used in the offer unless the
+ configuration is the same as in the offer.
+
+ * Parameter sets received MUST be associated with the originating
+ source and MUST only be used in decoding the incoming bitstream
+ from the same source.
+
+ * The rules for other parameters are the same as above for unicast
+ as long as the three above rules are obeyed.
+
+7.3.4. Usage in Declarative Session Descriptions
+
+ When EVC over RTP is offered with SDP in a declarative style, as in
+ the Real-Time Streaming Protocol (RTSP) [RFC7826] or Session
+ Announcement Protocol (SAP) [RFC2974], the following considerations
+ apply.
+
+ * All parameters capable of indicating both bitstream properties and
+ receiver capabilities are used to indicate only bitstream
+ properties. For example, in this case, the parameters profile-id
+ and level-id declare the values used by the bitstream, not the
+ capabilities for receiving bitstreams. As a result, the following
+ interpretation of the parameters MUST be used:
+
+ - Declaring actual configuration or bitstream properties:
+
+ o profile-id
+ o level-id
+ o sprop-sps
+ o sprop-pps
+ o sprop-max-don-diff
+ o sprop-depack-buf-bytes
+ o sprop-sei
+
+ - Not usable (when present, they MUST be ignored):
+
+ o depack-buf-cap
+ o recv-sublayer-id
+
+ - A receiver of the SDP is required to support all parameters and
+ values of the parameters provided; otherwise, the receiver MUST
+ reject (RTSP) or not participate in (SAP) the session. It
+ falls on the creator of the session to use values that are
+ expected to be supported by the receiving application.
+
+7.3.5. Considerations for Parameter Sets
+
+ When out-of-band transport of parameter sets is used, parameter sets
+ MAY still be additionally transported in-band unless explicitly
+ disallowed by an application, and some of these additional parameter
+ sets may update some of the out-of-band transported parameter sets.
+ An update of a parameter set refers to the sending of a parameter set
+ of the same type using the same parameter set ID but with different
+ values for at least one other parameter of the parameter set.
+
+8. Use with Feedback Messages
+
+ The following subsections define the use of the Picture Loss
+ Indication (PLI) [RFC4585] and Full Intra Request (FIR) [RFC5104]
+ feedback messages with [EVC].
+
+ In accordance with this document, a sender MUST NOT send Slice Loss
+ Indication (SLI) or Reference Picture Selection Indication (RPSI);
+ and a receiver MUST ignore RPSI and MUST treat a received SLI as a
+ received PLI, ignoring the "First", "Number", and "PictureID" fields
+ of the SLI.
+
+8.1. Picture Loss Indication (PLI)
+
+ As specified in Section 6.3.1 of [RFC4585], the reception of a PLI by
+ a media sender indicates "the loss of an undefined amount of coded
+ video data belonging to one or more pictures". Without having any
+ specific knowledge of the setup of the bitstream (such as use and
+ location of in-band parameter sets, IDR picture locations, picture
+ structures, and so forth), a reaction to the reception of a PLI by an
+ EVC sender SHOULD be to send an IDR picture and relevant parameter
+ sets, potentially with sufficient redundancy so as to ensure correct
+ reception. However, sometimes information about the bitstream
+ structure is known. For example, such information can be parameter
+ sets that have been conveyed out of band through mechanisms not
+ defined in this document and that are known to stay static for the
+ duration of the session. In that case, it is obviously unnecessary
+ to send them in-band as a result of the reception of a PLI. Other
+ examples could be devised based on a priori knowledge of different
+ aspects of the bitstream structure. In all cases, the timing and
+ congestion-control mechanisms of [RFC4585] MUST be observed.
+
+8.2. Full Intra Request (FIR)
+
+ The purpose of the FIR message is to force an encoder to send an
+ independent decoder refresh point as soon as possible while observing
+ applicable congestion-control-related constraints, such as those set
+ out in [RFC8082].
+
+ Upon reception of a FIR, a sender MUST send an IDR picture.
+ Parameter sets MUST also be sent, except when there is a priori
+ knowledge that the parameter sets have been correctly established. A
+ typical example for that is an understanding between the sender and
+ receiver, established by means outside this document, that parameter
+ sets are exclusively sent out of band.
+
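The feedback rules of this section can be summarized in a small dispatch function. The Python sketch below is illustrative only: the action strings and the function name are hypothetical, the flag parameter_sets_established stands for the a priori knowledge mentioned above, and the SHOULD-level reaction to a PLI is treated the same as the MUST-level reaction to a FIR.

```python
from typing import List


def sender_reaction(message: str, parameter_sets_established: bool = False) -> List[str]:
    """Return the actions a media sender takes for a received feedback message."""
    kind = message.upper()
    if kind == "RPSI":
        return []  # RPSI is ignored
    if kind == "SLI":
        kind = "PLI"  # a received SLI is treated as a received PLI
    if kind not in ("PLI", "FIR"):
        raise ValueError("unsupported feedback message: " + message)
    actions = ["send IDR picture"]
    if not parameter_sets_established:
        actions.append("send parameter sets in-band")
    return actions
```

Any real reaction would additionally have to respect the timing and congestion-control rules referenced above.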
+9. Security Considerations
+
+ The scope of this section is limited to the payload format itself and
+ to one feature of [EVC] that may pose a particularly serious security
+ risk if implemented naively. The payload format, in isolation, does
+ not form a complete system. Implementers are advised to read and
+ understand relevant security-related documents, especially those
+ pertaining to RTP (see the Security Considerations in Section 14 of
+ [RFC3550]) and the security of the call-control stack chosen (that
+ may make use of the media type registration of this document).
+ Implementers should also consider known security vulnerabilities of
+ video coding and decoding implementations in general and avoid those.
+
+ Within this RTP payload format, and with the exception of the user
+ data SEI message as described below, no security threats other than
+ those common to RTP payload formats are known. In other words,
+ neither the various media-plane-based mechanisms nor the signaling
+ part of this document seem to pose a security risk beyond those
+ common to all RTP-based systems.
+
+ RTP packets using the payload format defined in this specification
+ are subject to the security considerations discussed in the RTP
+ specification [RFC3550] and in any applicable RTP profile such as
+ RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/
+ SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP
+ Does Not Mandate a Single Media Security Solution" [RFC7202]
+ discusses, it is not an RTP payload format's responsibility to
+ discuss or mandate what solutions are used to meet the basic security
+ goals like confidentiality, integrity, and source authenticity for
+ RTP in general. This responsibility lies on anyone using RTP in an
+ application. They can find guidance on available security mechanisms
+ and important considerations in "Options for Securing RTP Sessions"
+ [RFC7201]. Applications SHOULD use one or more appropriate strong
+ security mechanisms. The rest of this section discusses the
+ security-impacting properties of the payload format itself.
+
+ Because the data compression used with this payload format is applied
+ end to end, any encryption needs to be performed after compression.
+ A potential denial-of-service threat exists for data encodings using
+ compression techniques that have non-uniform receiver-end
+ computational load. The attacker can inject pathological datagrams
+ into the bitstream that are complex to decode and that cause the
+ receiver to be overloaded.
+
+ EVC is particularly vulnerable to such attacks, as it is extremely
+ simple to generate datagrams containing NAL units that affect the
+ decoding process of many future NAL units. Therefore, the usage of
+ data origin authentication and data integrity protection of at least
+ the RTP packet is RECOMMENDED based on [RFC7202].
+
+ Like HEVC [RFC7798] and VVC [VVC], EVC [EVC] includes a user data
+ Supplemental Enhancement Information (SEI) message. This SEI message
+ allows inclusion of an arbitrary bitstring into the video bitstream.
+ Such a bitstring could include JavaScript, machine code, and other
+ active content.
+
+ EVC [EVC] leaves the handling of this SEI message to the receiving
+ system. In order to avoid harmful side effects of the user data SEI
+ message, decoder implementations cannot naively trust its content.
+ For example, forwarding all received JavaScript code detected by a
+ decoder implementation to a web browser unchecked would be a bad and
+ insecure implementation practice. The safest way to deal with user
+ data SEI messages is to simply discard them, but that can have
+ negative side effects on the quality of experience by the user.
+
+ End-to-end security with authentication, integrity, or
+ confidentiality protection will prevent a MANE from performing media-
+ aware operations other than discarding complete packets. In the case
+ of confidentiality protection, it will even be prevented from
+ discarding packets in a media-aware way. To be allowed to perform
+ such operations, a MANE is required to be a trusted entity that is
+ included in the security context establishment.
+
+10. Congestion Control
+
+ Congestion control for RTP SHALL be used in accordance with RTP
+ [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551] or
+ AVPF [RFC4585]. If best-effort service is being used, an additional
+ requirement is that users of this payload format MUST monitor packet
+ loss to ensure that the packet loss rate is within an acceptable
+ range. Packet loss is considered acceptable if a TCP flow across the
+ same network path and experiencing the same network conditions would
+ achieve an average throughput, measured on a reasonable timescale,
+ that is not less than the throughput that all RTP streams combined
+ are achieving. This condition can be satisfied by implementing
+ congestion-control mechanisms that adapt the transmission rate, by
+ adapting the number of layers subscribed for a layered multicast
+ session, or by arranging for a receiver to leave the session if the
+ loss rate is unacceptably high.
+
+ The bitrate adaptation necessary for obeying the congestion control
+ principle is easily achievable when real-time encoding is used, for
+ example, by adequately tuning the quantization parameter. However,
+ when pre-encoded content is being transmitted, bandwidth adaptation
+ requires the pre-encoded bitstream to be tailored for such adaptivity.
+
+ The key mechanism available in [EVC] is temporal scalability. A
+ media sender can remove NAL units belonging to higher temporal sub-
+ layers (i.e., those NAL units with a large value of TID) until the
+ sending bitrate drops to an acceptable range.
+
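The temporal-scalability mechanism described above can be sketched as a thinning loop. In the Python example below, each NAL unit in a measurement window is represented by a hypothetical (tid, size_in_bytes) pair, and the byte budget for the window stands in for the acceptable sending bitrate. The sketch never removes the lowest temporal sub-layer (TID 0), which is a design choice of this example.

```python
from typing import List, Tuple


def thin_to_budget(nal_units: List[Tuple[int, int]], budget_bytes: int) -> List[Tuple[int, int]]:
    """Drop the highest temporal sub-layers until the total size fits the budget."""
    kept = list(nal_units)
    while kept and sum(size for _, size in kept) > budget_bytes:
        highest_tid = max(tid for tid, _ in kept)
        if highest_tid == 0:
            break  # only the base sub-layer is left; this sketch keeps it
        # Remove every NAL unit of the current highest temporal sub-layer.
        kept = [(tid, size) for tid, size in kept if tid < highest_tid]
    return kept
```

Removing whole sub-layers, rather than individual NAL units, keeps the remaining bitstream decodable at a lower frame rate.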
+ The mechanisms mentioned above generally work within a defined
+ profile and level; therefore, no renegotiation of the channel is
+ required. Only when non-downgradable parameters (such as the
+ profile) are required to be changed does it become necessary to
+ terminate and restart the RTP streams. This may be accomplished by
+ using different RTP payload types.
+
+ MANEs MAY remove certain unusable packets from the RTP stream when
+ that RTP stream was damaged due to previous packet losses. This can
+ help reduce the network load in certain special cases. For example,
+ MANEs can remove those FUs where the leading FUs belonging to the
+ same NAL unit have been lost, because the trailing FUs are
+ meaningless to most decoders. MANEs can also remove higher temporal
+ sub-layers if the outbound transmission (from the MANE's
+ viewpoint) experiences congestion.
+
+11. IANA Considerations
+
+ The media type specified in Section 7.1 has been registered with
+ IANA.
+
+12. References
+
+12.1. Normative References
+
+ [EVC] "Information technology -- General video coding -- Part 1:
+ Essential video coding", ISO/IEC 23094-1:2020, October
+ 2020, <https://www.iso.org/standard/57797.html>.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119,
+ DOI 10.17487/RFC2119, March 1997,
+ <https://www.rfc-editor.org/info/rfc2119>.
+
+ [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
+ with Session Description Protocol (SDP)", RFC 3264,
+ DOI 10.17487/RFC3264, June 2002,
+ <https://www.rfc-editor.org/info/rfc3264>.
+
+ [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
+ Jacobson, "RTP: A Transport Protocol for Real-Time
+ Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
+ July 2003, <https://www.rfc-editor.org/info/rfc3550>.
+
+ [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
+ Video Conferences with Minimal Control", STD 65, RFC 3551,
+ DOI 10.17487/RFC3551, July 2003,
+ <https://www.rfc-editor.org/info/rfc3551>.
+
+ [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+ Norrman, "The Secure Real-time Transport Protocol (SRTP)",
+ RFC 3711, DOI 10.17487/RFC3711, March 2004,
+ <https://www.rfc-editor.org/info/rfc3711>.
+
+ [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
+ "Extended RTP Profile for Real-time Transport Control
+ Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
+ DOI 10.17487/RFC4585, July 2006,
+ <https://www.rfc-editor.org/info/rfc4585>.
+
+ [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
+ Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
+ <https://www.rfc-editor.org/info/rfc4648>.
+
+ [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
+ "Codec Control Messages in the RTP Audio-Visual Profile
+ with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
+ February 2008, <https://www.rfc-editor.org/info/rfc5104>.
+
+ [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for
+ Real-time Transport Control Protocol (RTCP)-Based Feedback
+ (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
+ 2008, <https://www.rfc-editor.org/info/rfc5124>.
+
+ [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific
+ Media Attributes in the Session Description Protocol
+ (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009,
+ <https://www.rfc-editor.org/info/rfc5576>.
+
+ [RFC7826] Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M.,
+ and M. Stiemerling, Ed., "Real-Time Streaming Protocol
+ Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December
+ 2016, <https://www.rfc-editor.org/info/rfc7826>.
+
+ [RFC8082] Wenger, S., Lennox, J., Burman, B., and M. Westerlund,
+ "Using Codec Control Messages in the RTP Audio-Visual
+ Profile with Feedback with Layered Codecs", RFC 8082,
+ DOI 10.17487/RFC8082, March 2017,
+ <https://www.rfc-editor.org/info/rfc8082>.
+
+ [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+ 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+ May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
+ [RFC8866] Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP:
+ Session Description Protocol", RFC 8866,
+ DOI 10.17487/RFC8866, January 2021,
+ <https://www.rfc-editor.org/info/rfc8866>.
+
+ [RFC9328] Zhao, S., Wenger, S., Sanchez, Y., Wang, Y.-K., and M. M.
+ Hannuksela, "RTP Payload Format for Versatile Video Coding
+ (VVC)", RFC 9328, DOI 10.17487/RFC9328, December 2022,
+ <https://www.rfc-editor.org/info/rfc9328>.
+
+ [VSEI] ITU-T, "Versatile supplemental enhancement information
+ messages for coded video bitstreams", ITU-T
+ Recommendation H.274, March 2024,
+ <https://www.itu.int/rec/T-REC-H.274>.
+
+12.2. Informative References
+
+ [AVC] ITU-T, "Part 10: Advanced video coding", ITU-T
+ Recommendation H.264, October 2014,
+ <https://www.iso.org/standard/66069.html>.
+
+ [HEVC] ITU-T, "High efficiency video coding", ITU-T
+ Recommendation H.265, November 2019,
+ <https://www.itu.int/rec/T-REC-H.265>.
+
+ [MPEG2S] ISO/IEC, "Information technology - Generic coding of
+ moving pictures and associated audio information - Part 1:
+ Systems", ISO/IEC 13818-1:2013, June 2013.
+
+ [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session
+ Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974,
+ October 2000, <https://www.rfc-editor.org/info/rfc2974>.
+
+ [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
+ Payload Format for H.264 Video", RFC 6184,
+ DOI 10.17487/RFC6184, May 2011,
+ <https://www.rfc-editor.org/info/rfc6184>.
+
+ [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A.
+ Eleftheriadis, "RTP Payload Format for Scalable Video
+ Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011,
+ <https://www.rfc-editor.org/info/rfc6190>.
+
+ [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP
+ Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
+ <https://www.rfc-editor.org/info/rfc7201>.
+
+ [RFC7202] Perkins, C. and M. Westerlund, "Securing the RTP
+ Framework: Why RTP Does Not Mandate a Single Media
+ Security Solution", RFC 7202, DOI 10.17487/RFC7202, April
+ 2014, <https://www.rfc-editor.org/info/rfc7202>.
+
+ [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
+ B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
+ for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
+ DOI 10.17487/RFC7656, November 2015,
+ <https://www.rfc-editor.org/info/rfc7656>.
+
+ [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
+ DOI 10.17487/RFC7667, November 2015,
+ <https://www.rfc-editor.org/info/rfc7667>.
+
+ [RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M.
+ M. Hannuksela, "RTP Payload Format for High Efficiency
+ Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798,
+ March 2016, <https://www.rfc-editor.org/info/rfc7798>.
+
+ [VIDEO-CODING]
+ ITU-T, "Video coding for low bit rate communication",
+ ITU-T Recommendation H.263, January 2005,
+ <https://www.itu.int/rec/T-REC-H.263>.
+
+ [VVC] ITU-T, "Versatile video coding", ITU-T
+ Recommendation H.266, August 2020,
+ <https://www.itu.int/rec/T-REC-H.266>.
+
+Acknowledgements
+
+ Large parts of this specification share text with the RTP payload
+ format for VVC [RFC9328]. We thank the authors of that specification
+ for their excellent work. We also thank Roman Chernyak for his
+ valuable review comments.
+
+Authors' Addresses
+
+ Shuai Zhao
+ Intel
+ 2200 Mission College Blvd
+ Santa Clara, California 95054
+ United States of America
+ Email: shuai.zhao@ieee.org
+
+
+ Stephan Wenger
+ Tencent
+ 2747 Park Blvd
+ Palo Alto, California 94588
+ United States of America
+ Email: stewe@stewe.org
+
+
+ Youngkwon Lim
+ Samsung Electronics
+ 6625 Excellence Way
+ Plano, Texas 75013
+ United States of America
+ Email: yklwhite@gmail.com