1 files changed, 3088 insertions, 0 deletions
diff --git a/doc/rfc/rfc9328.txt b/doc/rfc/rfc9328.txt
new file mode 100644
index 0000000..8a530a8
--- /dev/null
+++ b/doc/rfc/rfc9328.txt
@@ -0,0 +1,3088 @@
+
+
+
+
+Internet Engineering Task Force (IETF)                           S. Zhao
+Request for Comments: 9328                                         Intel
+Category: Standards Track                                      S. Wenger
+ISSN: 2070-1721                                                  Tencent
+                                                              Y. Sanchez
+                                                          Fraunhofer HHI
+                                                              Y.-K. Wang
+                                                          Bytedance Inc.
+                                                         M. M Hannuksela
+                                                      Nokia Technologies
+                                                           December 2022
+
+
+          RTP Payload Format for Versatile Video Coding (VVC)
+
+Abstract
+
+   This memo describes an RTP payload format for the Versatile Video
+   Coding (VVC) specification, which was published as both ITU-T
+   Recommendation H.266 and ISO/IEC International Standard 23090-3.  VVC
+   was developed by the Joint Video Experts Team (JVET).  The RTP
+   payload format allows for packetization of one or more Network
+   Abstraction Layer (NAL) units in each RTP packet payload, as well as
+   fragmentation of a NAL unit into multiple RTP packets.  The payload
+   format has wide applicability in videoconferencing, Internet video
+   streaming, and high-bitrate entertainment-quality video, among other
+   applications.
+
+Status of This Memo
+
+   This is an Internet Standards Track document.
+
+   This document is a product of the Internet Engineering Task Force
+   (IETF).  It represents the consensus of the IETF community.  It has
+   received public review and has been approved for publication by the
+   Internet Engineering Steering Group (IESG).  Further information on
+   Internet Standards is available in Section 2 of RFC 7841.
+
+   Information about the current status of this document, any errata,
+   and how to provide feedback on it may be obtained at
+   https://www.rfc-editor.org/info/rfc9328.
+
+Copyright Notice
+
+   Copyright (c) 2022 IETF Trust and the persons identified as the
+   document authors.  All rights reserved.
+
+   This document is subject to BCP 78 and the IETF Trust's Legal
+   Provisions Relating to IETF Documents
+   (https://trustee.ietf.org/license-info) in effect on the date of
+   publication of this document.  Please review these documents
+   carefully, as they describe your rights and restrictions with respect
+   to this document.  Code Components extracted from this document must
+   include Revised BSD License text as described in Section 4.e of the
+   Trust Legal Provisions and are provided without warranty as described
+   in the Revised BSD License.
+
+Table of Contents
+
+   1.  Introduction
+     1.1.  Overview of the VVC Codec
+       1.1.1.  Coding-Tool Features (Informative)
+       1.1.2.  Systems and Transport Interfaces (Informative)
+       1.1.3.  High-Level Picture Partitioning (Informative)
+       1.1.4.  NAL Unit Header
+     1.2.  Overview of the Payload Format
+   2.  Conventions
+   3.  Definitions and Abbreviations
+     3.1.  Definitions
+       3.1.1.  Definitions from the VVC Specification
+       3.1.2.  Definitions Specific to This Memo
+     3.2.  Abbreviations
+   4.  RTP Payload Format
+     4.1.  RTP Header Usage
+     4.2.  Payload Header Usage
+     4.3.  Payload Structures
+       4.3.1.  Single NAL Unit Packets
+       4.3.2.  Aggregation Packets (APs)
+       4.3.3.  Fragmentation Units
+     4.4.  Decoding Order Number
+   5.  Packetization Rules
+   6.  De-packetization Process
+   7.  Payload Format Parameters
+     7.1.  Media Type Registration
+     7.2.  Optional Parameters Definition
+     7.3.  SDP Parameters
+       7.3.1.  Mapping of Payload Type Parameters to SDP
+       7.3.2.  Usage with SDP Offer/Answer Model
+       7.3.3.  Multicast
+       7.3.4.  Usage in Declarative Session Descriptions
+       7.3.5.  Considerations for Parameter Sets
+   8.  Use with Feedback Messages
+     8.1.  Picture Loss Indication (PLI)
+     8.2.  Full Intra Request (FIR)
+   9.  Security Considerations
+   10. Congestion Control
+   11. IANA Considerations
+   12. References
+     12.1.  Normative References
+     12.2.  Informative References
+   Acknowledgements
+   Authors' Addresses
+
+1.  Introduction
+
+   The Versatile Video Coding specification was formally published as
+   both ITU-T Recommendation H.266 [VVC] and ISO/IEC International
+   Standard 23090-3 [ISO23090-3].  VVC is reported to provide
+   significant coding efficiency gains over High Efficiency Video Coding
+   [HEVC], also known as H.265, and other earlier video codecs.
+
+   This memo specifies an RTP payload format for VVC.  It shares its
+   basic design with the NAL-unit-based RTP payload formats of Advanced
+   Video Coding (AVC) [RFC6184], Scalable Video Coding (SVC) [RFC6190],
+   and High Efficiency Video Coding (HEVC) [RFC7798], as well as their
+   respective predecessors.  With respect to design philosophy,
+   security, congestion control, and overall implementation complexity,
+   it has similar properties to those earlier payload format
+   specifications.  This is a conscious choice, as at least [RFC6184] is
+   widely deployed and generally known in the relevant implementer
+   communities.  Certain scalability-related mechanisms known from
+   [RFC6190] were incorporated into this document, as VVC version 1
+   supports temporal, spatial, and signal-to-noise ratio (SNR)
+   scalability.
+
+1.1.  Overview of the VVC Codec
+
+   VVC and HEVC share a similar hybrid video codec design.  In this
+   memo, we provide a very brief overview of those features of VVC that
+   are, in some form, addressed by the payload format specified herein.
+   Implementers have to read, understand, and apply the ITU-T/ISO/IEC
+   specifications pertaining to VVC to arrive at interoperable, well-
+   performing implementations.
+
+   Conceptually, both VVC and HEVC include a Video Coding Layer (VCL),
+   which is often used to refer to the coding-tool features, and a NAL,
+   which is often used to refer to the systems and transport interface
+   aspects of the codecs.
+
+1.1.1.  Coding-Tool Features (Informative)
+
+   Coding-tool features are described below with occasional reference to
+   the coding-tool set of HEVC, which is well known in the community.
+
+   Similar to earlier hybrid-video-coding-based standards, including
+   HEVC, the following basic video coding design is employed by VVC.  A
+   prediction signal is first formed by either intra- or motion-
+   compensated prediction, and the residual (the difference between the
+   original and the prediction) is then coded.  The gains in coding
+   efficiency are achieved by redesigning and improving almost all parts
+   of the codec over earlier designs.  In addition, VVC includes several
+   tools to make the implementation on parallel architectures easier.
+
+   Finally, VVC includes temporal, spatial, and SNR scalability, as well
+   as multiview coding support.
+
+   Coding blocks and transform structure
+      Among major coding-tool differences between HEVC and VVC, one of
+      the important improvements is the more flexible coding tree
+      structure in VVC, i.e., multi-type tree.  In addition to quadtree,
+      binary and ternary trees are also supported, which contributes
+      significant improvement in coding efficiency.  Moreover, the
+      maximum size of a coding tree unit (CTU) is increased from 64x64
+      to 128x128.  To improve the coding efficiency of chroma signal,
+      luma-chroma-separated trees at CTU level may be employed for intra
+      slices.  The square transforms in HEVC are extended to non-square
+      transforms for rectangular blocks resulting from binary and
+      ternary tree splits.  Besides, VVC supports multiple transform
+      sets (MTSs), including DCT-2, DST-7, and DCT-8, as well as the
+      non-separable secondary transform.  The transforms used in VVC can
+      have different sizes with support for larger transform sizes.  For
+      DCT-2, the transform sizes range from 2x2 to 64x64, and for DST-7
+      and DCT-8, the transform sizes range from 4x4 to 32x32.  In
+      addition, VVC also support sub-block transform for both intra- and
+      inter-coded blocks.  For intra-coded blocks, intra sub-
+      partitioning (ISP) may be used to allow sub-block-based intra
+      prediction and transform.  For inter blocks, sub-block transform
+      may be used assuming that only a part of an inter block has non-
+      zero transform coefficients.
+
+   Entropy coding
+      Similar to HEVC, VVC uses a single entropy-coding engine, which is
+      based on context adaptive binary arithmetic coding [CABAC] but
+      with the support of multi-window sizes.  The window sizes can be
+      initialized differently for different context models.  Due to such
+      a design, it has more efficient adaptation speed and better coding
+      efficiency.  A joint chroma residual coding scheme is applied to
+      further exploit the correlation between the residuals of two color
+      components.  In VVC, different residual coding schemes are applied
+      for regular transform coefficients and residual samples generated
+      using transform-skip mode.
+
+   In-loop filtering
+      VVC has more feature support in loop filters than HEVC.  The
+      deblocking filter in VVC is similar to HEVC but operates at a
+      smaller grid.  After deblocking and sample adaptive offset (SAO),
+      an adaptive loop filter (ALF) may be used.  As a Wiener filter,
+      ALF reduces distortion of decoded pictures.  Besides, VVC
+      introduces a new module called luma mapping with chroma scaling to
+      fully utilize the dynamic range of signal so that rate-distortion
+      performance of both Standard Dynamic Range (SDR) and High Dynamic
+      Range (HDR) content is improved.
+
+   Motion prediction and coding
+      Compared to HEVC, VVC introduces several improvements in this
+      area.  First, there is the adaptive motion vector resolution
+      (AMVR), which can save bit cost for motion vectors by adaptively
+      signaling motion vector resolution.  Then, the affine motion
+      compensation is included to capture complicated motion-like
+      zooming and rotation.  Meanwhile, prediction refinement with the
+      optical flow (PROF) with affine mode is further deployed to mimic
+      affine motion at the pixel level.  Thirdly, the decoder-side
+      motion vector refinement (DMVR) is a method to derive the motion
+      vector at the decoder side based on block matching so that fewer
+      bits may be spent on motion vectors.  Bidirectional optical flow
+      (BDOF) is a similar method to PROF.  BDOF adds a sample-wise
+      offset at the 4x4 sub-block level that is derived with equations
+      based on gradients of the prediction samples and a motion
+      difference relative to coding-unit (CU) motion vectors.
+      Furthermore, merge with motion vector difference (MMVD) is a
+      special mode that further signals a limited set of motion vector
+      differences on top of merge mode.  In addition to MMVD, there are
+      another three types of special merge modes, i.e., sub-block merge,
+      triangle, and combined intra/inter prediction (CIIP).  The sub-
+      block merge list includes one candidate of sub-block temporal
+      motion vector prediction (SbTMVP) and up to four candidates of
+      affine motion vectors.  Triangle is based on triangular block
+      motion compensation.  CIIP combines intra and inter predictions
+      with weighting.  Adaptive weighting may be employed with a block-
+      level tool called bi-prediction with CU-based weighting (BCW),
+      which provides more flexibility than in HEVC.
+
+   Intra prediction and intra coding
+      To capture the diversified local image texture directions with
+      finer granularity, VVC supports 65 angular directions instead of
+      33 directions in HEVC.  The intra mode coding is based on a 6-
+      most-probable-modes scheme, and the 6 most probable modes are
+      derived using the neighboring intra prediction directions.  In
+      addition, to deal with the different distributions of intra
+      prediction angles for different block aspect ratios, a wide-angle-
+      intra-prediction (WAIP) scheme is applied in VVC by including
+      intra prediction angles beyond those present in HEVC.  Unlike
+      HEVC, which only allows using the most adjacent line of reference
+      samples for intra prediction, VVC also allows using two further
+      reference lines, known as multi-reference-line (MRL) intra
+      prediction.  The additional reference lines can be only used for
+      the 6 most probable intra prediction modes.  To capture the strong
+      correlation between different color components, in VVC, a cross-
+      component linear mode (CCLM) is utilized, which assumes a linear
+      relationship between the luma sample values and their associated
+      chroma samples.  For intra prediction, VVC also applies a
+      position-dependent prediction combination (PDPC) for refining the
+      prediction samples closer to the intra prediction block boundary.
+      Matrix-based intra prediction (MIP) modes are also used in VVC,
+      which generates an up to 8x8 intra prediction block using a
+      weighted sum of downsampled neighboring reference samples, and the
+      weights are hard-coded constants.
+
+   Other coding-tool features
+      VVC introduces dependent quantization (DQ) to reduce quantization
+      error by state-based switching between two quantizers.
+
+1.1.2.  Systems and Transport Interfaces (Informative)
+
+   VVC inherits the basic systems and transport interface designs from
+   HEVC and AVC.  These include the NAL-unit-based syntax structure, the
+   hierarchical syntax and data unit structure, the supplemental
+   enhancement information (SEI) message mechanism, and the video
+   buffering model based on the hypothetical reference decoder (HRD).
+   The scalability features of VVC are conceptually similar to the
+   scalable extension of HEVC, known as SHVC.  The hierarchical syntax
+   and data unit structure consists of parameter sets at various levels
+   (i.e., decoder, sequence (pertaining to all), sequence (pertaining to
+   a single), and picture), picture-level header parameters, slice-level
+   header parameters, and lower-level parameters.
+
+   A number of key components that influenced the network abstraction
+   layer design of VVC, as well as this memo, are described below
+
+   Decoding capability information
+      The decoding capability information (DCI) includes parameters that
+      stay constant for the lifetime of a VVC bitstream in the duration
+      of a video conference, continuous video stream, and similar, i.e.,
+      any video that is processed by a decoder between setup and
+      teardown.  For streaming, the requirement of constant parameters
+      pertains through splicing.  Such information includes profile,
+      level, and sub-profile information to determine a maximum
+      capability interop point that is guaranteed to never be exceeded,
+      even if splicing of video sequences occurs within a session.  It
+      further includes constraint fields (most of which are flags),
+      which can optionally be set to indicate that the video bitstream
+      will be constrained in the use of certain features, as indicated
+      by the values of those fields.  With this, a bitstream can be
+      labeled as not using certain tools, which allows, among other
+      things, for resource allocation in a decoder implementation.
+
+   Video parameter set
+      The video parameter set (VPS) pertains to one or more coded video
+      sequences (CVSs) of multiple layers covering the same range of
+      access units and includes, among other information, decoding
+      dependency expressed as information for reference-picture-list
+      construction of enhancement layers.  The VPS provides a "big
+      picture" of a scalable sequence, including what types of operation
+      points are provided; the profile, tier, and level of the operation
+      points; and some other high-level properties of the bitstream that
+      can be used as the basis for session negotiation and content
+      selection, etc.  One VPS may be referenced by one or more sequence
+      parameter sets.
+
+   Sequence parameter set
+      The sequence parameter set (SPS) contains syntax elements
+      pertaining to a coded layer video sequence (CLVS), which is a
+      group of pictures belonging to the same layer, starting with a
+      random access point, and followed by pictures that may depend on
+      each other until the next random access point picture.  In MPEG-2,
+      the equivalent of a CVS was a group of pictures (GOP), which
+      normally started with an I frame and was followed by P and B
+      frames.  While more complex in its options of random access
+      points, VVC retains this basic concept.  One remarkable difference
+      of VVC is that a CLVS may start with a Gradual Decoding Refresh
+      (GDR) picture without requiring presence of traditional random
+      access points in the bitstream, such as instantaneous decoding
+      refresh (IDR) or clean random access (CRA) pictures.  In many TV-
+      like applications, a CVS contains a few hundred milliseconds to a
+      few seconds of video.  In video conferencing (without switching
+      Multipoint Control Units (MCUs) involved), a CVS can be as long in
+      duration as the whole session.
+
+   Picture and adaptation parameter set
+      The picture parameter set (PPS) and the adaptation parameter set
+      (APS) carry information pertaining to zero or more pictures and
+      zero or more slices, respectively.  The PPS contains information
+      that is likely to stay constant from picture to picture, at least
+      for pictures for a certain type, whereas the APS contains
+      information, such as adaptive loop filter coefficients, that are
+      likely to change from picture to picture or even within a picture.
+      A single APS is referenced by all slices of the same picture if
+      that APS contains information about luma mapping with chroma
+      scaling (LMCS) or a scaling list.  Different APSs containing ALF
+      parameters can be referenced by slices of the same picture.
+
+   Picture header
+      A picture header (PH) contains information that is common to all
+      slices that belong to the same picture.  Being able to send that
+      information as a separate NAL unit when pictures are split into
+      several slices allows for saving bitrate, compared to repeating
+      the same information in all slices.  However, there might be
+      scenarios where low-bitrate video is transmitted using a single
+      slice per picture.  Having a separate NAL unit to convey that
+      information incurs in an overhead for such scenarios.  For such
+      scenarios, the picture header syntax structure is directly
+      included in the slice header, instead of its own NAL unit.  The
+      mode of the picture header syntax structure being included in its
+      own NAL unit or not can only be switched on/off for an entire CLVS
+      and can only be switched off when, in the entire CLVS, each
+      picture contains only one slice.
+
+   Profile, tier, and level
+      The profile, tier, and level syntax structures in DCI, VPS, and
+      SPS contain profile, tier, and level information for all layers
+      that refer to the DCI, for layers associated with one or more
+      output layer sets specified by the VPS, and for any layer that
+      refers to the SPS, respectively.
+
+   Sub-profiles
+      Within the VVC specification, a sub-profile is a 32-bit number,
+      coded according to ITU-T Recommendation T.35, that does not carry
+      semantics.  It is carried in the profile_tier_level structure and
+      hence is (potentially) present in the DCI, VPS, and SPS.  External
+      registration bodies can register a T.35 codepoint with ITU-T
+      registration authorities and associate with their registration a
+      description of bitstream restrictions beyond the profiles defined
+      by ITU-T and ISO/IEC.  This would allow encoder manufacturers to
+      label the bitstreams generated by their encoder as complying with
+      such sub-profile.  It is expected that upstream standardization
+      organizations (such as Digital Video Broadcasting (DVB) and
+      Advanced Television Systems Committee (ATSC)), as well as walled-
+      garden video services, will take advantage of this labeled system.
+      In contrast to "normal" profiles, it is expected that sub-profiles
+      may indicate encoder choices traditionally left open in the
+      (decoder-centric) video coding specifications, such as GOP
+      structures, minimum/maximum Quantizer Parameter (QP) values, and
+      the mandatory use of certain tools or SEI messages.
+
+   General constraint fields
+      The profile_tier_level structure carries a considerable number of
+      constraint fields (most of which are flags), which an encoder can
+      use to indicate to a decoder that it will not use a certain tool
+      or technology.  They were included in reaction to a perceived
+      market need to label a bitstream as not exercising a certain tool
+      that has become commercially unviable.
+
+   Temporal scalability support
+      VVC includes support of temporal scalability, by the inclusion of
+      the signaling of TemporalId in the NAL unit header, the
+      restriction that pictures of a particular temporal sublayer cannot
+      be used for inter prediction reference by pictures of a lower
+      temporal sublayer, the sub-bitstream extraction process, and the
+      requirement that each sub-bitstream extraction output be a
+      conforming bitstream.  Media-Aware Network Elements (MANEs) can
+      utilize the TemporalId in the NAL unit header for stream
+      adaptation purposes based on temporal scalability.
+
+   Reference picture resampling (RPR)
+      In AVC and HEVC, the spatial resolution of pictures cannot change
+      unless a new sequence using a new SPS starts, with an intra random
+      access point (IRAP) picture.  VVC enables picture resolution
+      change within a sequence at a position without encoding an IRAP
+      picture, which is always intra coded.  This feature is sometimes
+      referred to as reference picture resampling (RPR), as the feature
+      needs resampling of a reference picture used for inter prediction
+      when that reference picture has a different resolution than the
+      current picture being decoded.  RPR allows resolution change
+      without the need of coding an IRAP picture and hence avoids a
+      momentary bit rate spike caused by an IRAP picture in streaming or
+      video conferencing scenarios, e.g., to cope with network condition
+      changes.  RPR can also be used in application scenarios wherein
+      zooming of the entire video region or some region of interest is
+      needed.
+
+   Spatial, SNR, and multiview scalability
+      VVC includes support for spatial, SNR, and multiview scalability.
+      Scalable video coding is widely considered to have technical
+      benefits and enrich services for various video applications.
+      Until recently, however, the functionality has not been included
+      in the first version of specifications of the video codecs.  In
+      VVC, however, all those forms of scalability are supported in the
+      first version of VVC natively through the signaling of the
+      nuh_layer_id in the NAL unit header, the VPS that associates
+      layers with the given nuh_layer_id to each other, reference
+      picture selection, reference picture resampling for spatial
+      scalability, and a number of other mechanisms not relevant for
+      this memo.
+
+      Spatial scalability
+         With the existence of reference picture resampling (RPR), the
+         additional burden for scalability support is just a
+         modification of the high-level syntax (HLS).  The inter-layer
+         prediction is employed in a scalable system to improve the
+         coding efficiency of the enhancement layers.  In addition to
+         the spatial and temporal motion-compensated predictions that
+         are available in a single-layer codec, the inter-layer
+         prediction in VVC uses the possibly resampled video data of the
+         reconstructed reference picture from a reference layer to
+         predict the current enhancement layer.  The resampling process
+         for inter-layer prediction, when used, is performed at the
+         block level, reusing the existing interpolation process for
+         motion compensation in single-layer coding.  It means that no
+         additional resampling process is needed to support spatial
+         scalability.
+
+      SNR scalability
+         SNR scalability is similar to spatial scalability except that
+         the resampling factors are 1:1.  In other words, there is no
+         change in resolution, but there is inter-layer prediction.
+
+      Multiview scalability
+         The first version of VVC also supports multiview scalability,
+         wherein a multi-layer bitstream carries layers representing
+         multiple views, and one or more of the represented views can be
+         output at the same time.
+
+   SEI messages
+      Supplemental enhancement information (SEI) messages are
+      information in the bitstream that do not influence the decoding
+      process as specified in the VVC specification but address issues
+      of representation/rendering of the decoded bitstream, label the
+      bitstream for certain applications, and other, similar tasks.  The
+      overall concept of SEI messages and many of the messages
+      themselves has been inherited from the AVC and HEVC
+      specifications.  Except for the SEI messages that affect the
+      specification of the hypothetical reference decoder (HRD), other
+      SEI messages for use in the VVC environment, which are generally
+      useful also in other video coding technologies, are not included
+      in the main VVC specification but in a companion specification
+      [VSEI].
+
+1.1.3.  High-Level Picture Partitioning (Informative)
+
+   VVC inherited the concept of tiles and wavefront parallel processing
+   (WPP) from HEVC, with some minor to moderate differences.  The basic
+   concept of slices was kept in VVC but designed in an essentially
+   different form.  VVC is the first video coding standard that includes
+   subpictures as a feature, which provides the same functionality as
+   HEVC motion-constrained tile sets (MCTSs) but designed differently to
+   have better coding efficiency and to be friendlier for usage in
+   application systems.  More details of these differences are described
+   below.
+
+   Tiles and WPP
+      Same as in HEVC, a picture can be split into tile rows and tile
+      columns in VVC, in-picture prediction across tile boundaries is
+      disallowed, etc.  However, the syntax for signaling of tile
+      partitioning has been simplified by using a unified syntax design
+      for both the uniform and the non-uniform mode.  In addition,
+      signaling of entry point offsets for tiles in the slice header is
+      optional in VVC, while it is mandatory in HEVC.  The WPP design in
+      VVC has two differences compared to HEVC: i) the CTU row delay is
+      reduced from two CTUs to one CTU, and ii) signaling of entry point
+      offsets for WPP in the slice header is optional in VVC while it is
+      mandatory in HEVC.
+
+   Slices
+      In VVC, the conventional slices based on CTUs (as in HEVC) or
+      macroblocks (as in AVC) have been removed.  The main reasoning
+      behind this architectural change is as follows.  The advances in
+      video coding since 2003 (the publication year of AVC v1) have been
+      such that slice-based error concealment has become practically
+      impossible due to the ever-increasing number and efficiency of in-
+      picture and inter-picture prediction mechanisms.  An error-
+      concealed picture is the decoding result of a transmitted coded
+      picture for which there is some data loss (e.g., loss of some
+      slices) of the coded picture or a reference picture, as at least
+      some part of the coded picture is not error-free (e.g., that
+      reference picture was an error-concealed picture).  For example,
+      when one of the multiple slices of a picture is lost, it may be
+      error-concealed using an interpolation of the neighboring slices.
+      While advanced video coding prediction mechanisms provide
+      significantly higher coding efficiency, they also make it harder
+      for machines to estimate the quality of an error-concealed
+      picture, which was already a hard problem with the use of simpler
+      prediction mechanisms.  Advanced in-picture prediction mechanisms
+      also cause the coding efficiency loss due to splitting a picture
+      into multiple slices to be more significant.  Furthermore, network
+      conditions become significantly better while, at the same time,
+      techniques for dealing with packet losses have become
+      significantly improved.  As a result, very few implementations
+      have recently used slices for maximum-transmission-unit-size
+      matching.  Instead, substantially all applications where low-delay
+      error resilience is required (e.g., video telephony and video
+      conferencing) rely on system/transport-level error resilience
+      (e.g., retransmission or forward error correction) and/or picture-
+      based error resilience tools (e.g., feedback-based error
+      resilience, insertion of IRAPs, scalability with a higher
+      protection level of the base layer, and so on).  Considering all
+      the above, nowadays, it is very rare that a picture that cannot be
+      correctly decoded is passed to the decoder, and when such a rare
+      case occurs, the system can afford to wait for an error-free
+      picture to be decoded and available for display without resulting
+      in frequent and long periods of picture freezing seen by end
+      users.
+
+      Slices in VVC have two modes: rectangular slices and raster-scan
+      slices.  The rectangular slice, as indicated by its name, covers a
+      rectangular region of the picture.  Typically, a rectangular slice
+      consists of several complete tiles.  However, it is also possible
+      that a rectangular slice is a subset of a tile and consists of one
+      or more consecutive, complete CTU rows within a tile.  A raster-
+      scan slice consists of one or more complete tiles in a tile
+      raster-scan order; hence, the region covered by raster-scan slices
+      need not but could have a non-rectangular shape, but it may also
+      happen to have the shape of a rectangle.  The concept of slices in
+      VVC is therefore strongly linked to or based on tiles instead of
+      CTUs (as in HEVC) or macroblocks (as in AVC).
+
+   Subpictures
+      VVC is the first video coding standard that includes the support
+      of subpictures as a feature.  Each subpicture consists of one or
+      more complete rectangular slices that collectively cover a
+      rectangular region of the picture.  A subpicture may be either
+      specified to be extractable (i.e., coded independently of other
+      subpictures of the same picture and of earlier pictures in
+      decoding order) or not extractable.  Regardless of whether a
+      subpicture is extractable or not, the encoder can control whether
+      in-loop filtering (including deblocking, SAO, and ALF) is applied
+      across the subpicture boundaries individually for each subpicture.
+
+      Functionally, subpictures are similar to the motion-constrained
+      tile sets (MCTSs) in HEVC.  They both allow independent coding and
+      extraction of a rectangular subset of a sequence of coded pictures
+      for use cases like viewport-dependent 360-degree video streaming
+      optimization and region of interest (ROI) applications.
+
+      There are several important design differences between subpictures
+      and MCTSs.  First, the subpictures featured in VVC allow motion
+      vectors of a coding block to point outside of the subpicture, even
+      when the subpicture is extractable by applying sample padding at
+      the subpicture boundaries, in this case, similarly as at picture
+      boundaries.  Second, additional changes were introduced for the
+      selection and derivation of motion vectors in the merge mode and
+      in the decoder-side motion vector refinement process of VVC.  This
+      allows higher coding efficiency compared to the non-normative
+      motion constraints applied at the encoder-side for MCTSs.  Third,
+      rewriting of slice headers (SHs) (and PH NAL units, when present)
+      is not needed when extracting one or more extractable subpictures
+      from a sequence of pictures to create a sub-bitstream that is a
+      conforming bitstream.  In sub-bitstream extractions based on HEVC
+      MCTSs, rewriting of SHs is needed.  Note that, in both HEVC MCTSs
+      extraction and VVC subpictures extraction, rewriting of SPSs and
+      PPSs is needed.  However, typically, there are only a few
+      parameter sets in a bitstream, whereas each picture has at least
+      one slice; therefore, rewriting of SHs can be a significant burden
+      for application systems.  Fourth, slices of different subpictures
+      within a picture are allowed to have different NAL unit types.
+      Fifth, VVC specifies HRD and level definitions for subpicture
+      sequences, thus the conformance of the sub-bitstream of each
+      extractable subpicture sequence can be ensured by encoders.
+
+1.1.4.  NAL Unit Header
+
+   VVC maintains the NAL unit concept of HEVC with modifications.  VVC
+   uses a two-byte NAL unit header, as shown in Figure 1.  The payload
+   of a NAL unit refers to the NAL unit excluding the NAL unit header.
+
+                     +---------------+---------------+
+                     |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                     |F|Z| LayerID   |  Type   | TID |
+                     +---------------+---------------+
+
+             Figure 1: The Structure of the VVC NAL Unit Header
+
+   The semantics of the fields in the NAL unit header are as specified
+   in VVC and described briefly below for convenience.  In addition to
+   the name and size of each field, the corresponding syntax element
+   name in VVC is also provided.
+
+   F: 1 bit
+      forbidden_zero_bit.  This field is required to be zero in VVC.
+      Note that the inclusion of this bit in the NAL unit header was to
+      enable transport of VVC video over MPEG-2 transport systems
+      (avoidance of start code emulations) [MPEG2S].  In the context of
+      this payload format, the value 1 may be used to indicate a syntax
+      violation, e.g., for a NAL unit resulted from aggregating a number
+      of fragmented units of a NAL unit but missing the last fragment,
+      as described in the last sentence of Section 4.3.3.
+
+   Z: 1 bit
+      nuh_reserved_zero_bit.  This field is required to be zero in VVC,
+      and reserved for future extensions by ITU-T and ISO/IEC.
+      This memo does not overload the "Z" bit for local extensions a)
+      because overloading the "F" bit is sufficient and b) in order to
+      preserve the usefulness of this memo to possible future versions
+      of [VVC].
+
+   LayerId: 6 bits
+      nuh_layer_id.  This field identifies the layer a NAL unit belongs
+      to, wherein a layer may be, e.g., a spatial scalable layer, a
+      quality scalable layer, a layer containing a different view, etc.
+
+   Type: 5 bits
+      nal_unit_type.  This field specifies the NAL unit type, as defined
+      in Table 5 of [VVC].  For a reference of all currently defined NAL
+      unit types and their semantics, please refer to Section 7.4.2.2 in
+      [VVC].
+
+   TID: 3 bits
+      nuh_temporal_id_plus1.  This field specifies the temporal
+      identifier of the NAL unit plus 1.  The value of TemporalId is
+      equal to TID minus 1.  A TID value of 0 is illegal to ensure that
+      there is at least one bit in the NAL unit header equal to 1 in
+      order to enable the consideration of start code emulations in the
+      NAL unit payload data independent of the NAL unit header.
+
+1.2.  Overview of the Payload Format
+
+   This payload format defines the following processes required for
+   transport of VVC coded data over RTP [RFC3550]:
+
+   *  usage of the RTP header with this payload format
+
+   *  packetization of VVC coded NAL units into RTP packets using three
+      types of payload structures: a single NAL unit packet, aggregation
+      packet, and fragment unit
+
+   *  transmission of VVC NAL units of the same bitstream within a
+      single RTP stream
+
+   *  media type parameters to be used with the Session Description
+      Protocol (SDP) [RFC8866]
+
+   *  usage of RTCP feedback messages
+
+2.  Conventions
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+   "OPTIONAL" in this document are to be interpreted as described in BCP
+   14 [RFC2119] [RFC8174] when, and only when, they appear in all
+   capitals, as shown here.
+
+3.  Definitions and Abbreviations
+
+3.1.  Definitions
+
+   This document uses the terms and definitions of VVC.  Section 3.1.1
+   lists relevant definitions from [VVC] for convenience.  Section 3.1.2
+   provides definitions specific to this memo.  All the used terms and
+   definitions in this memo are verbatim copies from the [VVC]
+   specification.
+
+3.1.1.  Definitions from the VVC Specification
+
+   Access unit (AU):
+      A set of PUs that belong to different layers and contain coded
+      pictures associated with the same time for output from the DPB.
+
+   Adaptation parameter set (APS):
+      A syntax structure containing syntax elements that apply to zero
+      or more slices as determined by zero or more syntax elements found
+      in slice headers.
+
+   Bitstream:
+      A sequence of bits, in the form of a NAL unit stream or a byte
+      stream, that forms the representation of a sequence of AUs forming
+      one or more coded video sequences (CVSs).
+
+   Coded picture:
+      A coded representation of a picture comprising VCL NAL units with
+      a particular value of nuh_layer_id within an AU and containing all
+      CTUs of the picture.
+
+   Clean random access (CRA) PU:
+      A PU in which the coded picture is a CRA picture.
+
+   Clean random access (CRA) picture:
+      An IRAP picture for which each VCL NAL unit has nal_unit_type
+      equal to CRA_NUT.
+
+   Coded video sequence (CVS):
+      A sequence of AUs that consists, in decoding order, of a CVSS AU,
+      followed by zero or more AUs that are not CVSS AUs, including all
+      subsequent AUs up to but not including any subsequent AU that is a
+      CVSS AU.
+
+   Coded video sequence start (CVSS) AU:
+      An AU in which there is a PU for each layer in the CVS and the
+      coded picture in each PU is a CLVSS picture.
+
+   Coded layer video sequence (CLVS):
+      A sequence of PUs with the same value of nuh_layer_id that
+      consists, in decoding order, of a CLVSS PU, followed by zero or
+      more PUs that are not CLVSS PUs, including all subsequent PUs up
+      to but not including any subsequent PU that is a CLVSS PU.
+
+   Coded layer video sequence start (CLVSS) PU:
+      A PU in which the coded picture is a CLVSS picture.
+
+   Coded layer video sequence start (CLVSS) picture:
+      A coded picture that is an IRAP picture with
+      NoOutputBeforeRecoveryFlag equal to 1 or a GDR picture with
+      NoOutputBeforeRecoveryFlag equal to 1.
+
+   Coding Tree Block (CTB):
+      An NxN block of samples for some value of N such that the division
+      of a component into CTBs is a partitioning.
+
+   Coding tree unit (CTU):
+      A CTB of luma samples, two corresponding CTBs of chroma samples of
+      a picture that has three sample arrays, or a CTB of samples of a
+      monochrome picture or a picture that is coded using three separate
+      colour planes and syntax structures used to code the samples.
+
+   Coding Unit (CU):
+      A coding block of luma samples, two corresponding coding blocks of
+      chroma samples of a picture that has three sample arrays in the
+      single tree mode, or a coding block of luma samples of a picture
+      that has three sample arrays in the dual tree mode, or two coding
+      blocks of chroma samples of a picture that has three sample arrays
+      in the dual tree mode, or a coding block of samples of a
+      monochrome picture, and syntax structures used to code the
+      samples.
+
+   Decoding Capability Information (DCI):
+      A syntax structure containing syntax elements that apply to the
+      entire bitstream.
+
+   Decoded picture buffer (DPB):
+      A buffer holding decoded pictures for reference, output
+      reordering, or output delay specified for the hypothetical
+      reference decoder.
+
+   Gradual decoding refresh (GDR) picture:
+      A picture for which each VCL NAL unit has nal_unit_type equal to
+      GDR_NUT.
+
+   Instantaneous decoding refresh (IDR) PU:
+      A PU in which the coded picture is an IDR picture.
+
+   Instantaneous decoding refresh (IDR) picture:
+      An IRAP picture for which each VCL NAL unit has nal_unit_type
+      equal to IDR_W_RADL or IDR_N_LP.
+
+   Intra random access point (IRAP) AU:
+      An AU in which there is a PU for each layer in the CVS and the
+      coded picture in each PU is an IRAP picture.
+
+   Intra random access point (IRAP) PU:
+      A PU in which the coded picture is an IRAP picture.
+
+   Intra random access point (IRAP) picture:
+      A coded picture for which all VCL NAL units have the same value of
+      nal_unit_type in the range of IDR_W_RADL to CRA_NUT, inclusive.
+
+   Layer:
+      A set of VCL NAL units that all have a particular value of
+      nuh_layer_id and the associated non-VCL NAL units.
+
+   Network abstraction layer (NAL) unit:
+      A syntax structure containing an indication of the type of data to
+      follow and bytes containing that data in the form of an RBSP
+      interspersed as necessary with emulation prevention bytes.
+
+   Network abstraction layer (NAL) unit stream:
+      A sequence of NAL units.
+
+   Output Layer Set (OLS):
+      A set of layers for which one or more layers are specified as the
+      output layers.
+
+   Operation point (OP):
+      A temporal subset of an OLS, identified by an OLS index and a
+      highest value of TemporalId.
+
+   Picture Header (PH):
+      A syntax structure containing syntax elements that apply to all
+      slices of a coded picture.
+
+   Picture parameter set (PPS):
+      A syntax structure containing syntax elements that apply to zero
+      or more entire coded pictures as determined by a syntax element
+      found in each slice header.
+
+   Picture unit (PU):
+      A set of NAL units that are associated with each other according
+      to a specified classification rule, are consecutive in decoding
+      order, and contain exactly one coded picture.
+
+   Random access:
+      The act of starting the decoding process for a bitstream at a
+      point other than the beginning of the bitstream.
+
+   Raw Byte Sequence Payload (RBSP):
+      A syntax structure containing an integer number of bytes that is
+      encapsulated in a NAL unit and is either empty or has the form of
+      a string of data bits containing syntax elements followed by an
+      RBSP stop bit and zero or more subsequent bits equal to 0.
+
+   Sequence parameter set (SPS):
+      A syntax structure containing syntax elements that apply to zero
+      or more entire CLVSs as determined by the content of a syntax
+      element found in the PPS referred to by a syntax element found in
+      each picture header.
+
+   Slice:
+      An integer number of complete tiles or an integer number of
+      consecutive complete CTU rows within a tile of a picture that are
+      exclusively contained in a single NAL unit.
+
+   Slice header (SH):
+      A part of a coded slice containing the data elements pertaining to
+      all tiles or CTU rows within a tile represented in the slice.
+
+   Sublayer:
+      A temporal scalable layer of a temporal scalable bitstream
+      consisting of VCL NAL units with a particular value of the
+      TemporalId variable, and the associated non-VCL NAL units.
+
+   Subpicture:
+      A rectangular region of one or more slices within a picture.
+
+   Sublayer representation:
+      A subset of the bitstream consisting of NAL units of a particular
+      sublayer and the lower sublayers.
+
+   Tile:
+      A rectangular region of CTUs within a particular tile column and a
+      particular tile row in a picture.
+
+   Tile column:
+      A rectangular region of CTUs having a height equal to the height
+      of the picture and a width specified by syntax elements in the
+      picture parameter set.
+
+   Tile row:
+      A rectangular region of CTUs having a height specified by syntax
+      elements in the picture parameter set and a width equal to the
+      width of the picture.
+
+   Video coding layer (VCL) NAL unit:
+      A collective term for coded slice NAL units and the subset of NAL
+      units that have reserved values of nal_unit_type that are
+      classified as VCL NAL units in this Specification.
+
+3.1.2.  Definitions Specific to This Memo
+
+   Media-Aware Network Element (MANE):
+      A network element, such as a middlebox, selective forwarding unit,
+      or application-layer gateway that is capable of parsing certain
+      aspects of the RTP payload headers or the RTP payload and reacting
+      to their contents.
+
+         |  Informative note: The concept of a MANE goes beyond normal
+         |  routers or gateways in that a MANE has to be aware of the
+         |  signaling (e.g., to learn about the payload type mappings of
+         |  the media streams), and in that it has to be trusted when
+         |  working with Secure RTP (SRTP).  The advantage of using
+         |  MANEs is that they allow packets to be dropped according to
+         |  the needs of the media coding.  For example, if a MANE has
+         |  to drop packets due to congestion on a certain link, it can
+         |  identify and remove those packets whose elimination produces
+         |  the least adverse effect on the user experience.  After
+         |  dropping packets, MANEs must rewrite RTCP packets to match
+         |  the changes to the RTP stream, as specified in Section 7 of
+         |  [RFC3550].
+
+   NAL unit decoding order:
+      A NAL unit order that conforms to the constraints on NAL unit
+      order given in Section 7.4.2.4 in [VVC], follow the order of NAL
+      units in the bitstream.
+
+   RTP stream (see [RFC7656]):
+      Within the scope of this memo, one RTP stream is utilized to
+      transport a VVC bitstream, which may contain one or more layers,
+      and each layer may contain one or more temporal sublayers.
+
+   Transmission order:
+      The order of packets in ascending RTP sequence number order (in
+      modulo arithmetic).  Within an aggregation packet, the NAL unit
+      transmission order is the same as the order of appearance of NAL
+      units in the packet.
+
+3.2.  Abbreviations
+
+   AU      Access Unit
+
+   AP      Aggregation Packet
+
+   APS     Adaptation Parameter Set
+
+   CTU     Coding Tree Unit
+
+   CVS     Coded Video Sequence
+
+   DPB     Decoded Picture Buffer
+
+   DCI     Decoding Capability Information
+
+   DON     Decoding Order Number
+
+   FIR     Full Intra Request
+
+   FU      Fragmentation Unit
+
+   GDR     Gradual Decoding Refresh
+
+   HRD     Hypothetical Reference Decoder
+
+   IDR     Instantaneous Decoding Refresh
+
+   IRAP    Intra Random Access Point
+
+   MANE    Media-Aware Network Element
+
+   MTU     Maximum Transfer Unit
+
+   NAL     Network Abstraction Layer
+
+   NALU    Network Abstraction Layer Unit
+
+   OLS     Output Layer Set
+
+   PLI     Picture Loss Indication
+
+   PPS     Picture Parameter Set
+
+   RPSI    Reference Picture Selection Indication
+
+   SEI     Supplemental Enhancement Information
+
+   SLI     Slice Loss Indication
+
+   SPS     Sequence Parameter Set
+
+   VCL     Video Coding Layer
+
+   VPS     Video Parameter Set
+
+4.  RTP Payload Format
+
+4.1.  RTP Header Usage
+
+   The format of the RTP header is specified in [RFC3550] (reprinted as
+   Figure 2 for convenience).  This payload format uses the fields of
+   the header in a manner consistent with that specification.
+
+   The RTP payload (and the settings for some RTP header bits) for
+   aggregation packets and fragmentation units are specified in Sections
+   4.3.2 and 4.3.3, respectively.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                           timestamp                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |           synchronization source (SSRC) identifier            |
+      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+      |            contributing source (CSRC) identifiers             |
+      |                             ....                              |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+                 Figure 2: RTP Header According to RFC 3550
+
+   The RTP header information to be set according to this RTP payload
+   format is set as follows:
+
+   Marker bit (M): 1 bit
+      Set for the last packet, in transmission order, among each set of
+      packets that contain NAL units of one access unit.  This is in
+      line with the normal use of the M bit in video formats to allow an
+      efficient playout buffer handling.
+
+   Payload Type (PT): 7 bits
+      The assignment of an RTP payload type for this new packet format
+      is outside the scope of this document and will not be specified
+      here.  The assignment of a payload type has to be performed either
+      through the profile used or in a dynamic way.
+
+   Sequence Number (SN): 16 bits
+      Set and used in accordance with [RFC3550].
+
+   Timestamp: 32 bits
+      The RTP timestamp is set to the sampling timestamp of the content.
+      A 90 kHz clock rate MUST be used.  If the NAL unit has no timing
+      properties of its own (e.g., parameter set and SEI NAL units), the
+      RTP timestamp MUST be set to the RTP timestamp of the coded
+      pictures of the access unit in which the NAL unit (according to
+      Section 7.4.2.4 of [VVC]) is included.  Receivers MUST use the RTP
+      timestamp for the display process, even when the bitstream
+      contains picture timing SEI messages or decoding unit information
+      SEI messages, as specified in [VVC].
+
+         |  Informative note: When picture timing SEI messages are
+         |  present, the RTP sender is responsible to ensure that the
+         |  RTP timestamps are consistent with the timing information
+         |  carried in the picture timing SEI messages.
+
+   Synchronization source (SSRC): 32 bits
+      Used to identify the source of the RTP packets.  A single SSRC is
+      used for all parts of a single bitstream.
+
+4.2.  Payload Header Usage
+
+   The first two bytes of the payload of an RTP packet are referred to
+   as the payload header.  The payload header consists of the same
+   fields (F, Z, LayerId, Type, and TID) as the NAL unit header shown in
+   Section 1.1.4, irrespective of the type of the payload structure.
+
+   The TID value indicates (among other things) the relative importance
+   of an RTP packet, for example, because NAL units belonging to higher
+   temporal sublayers are not used for the decoding of lower temporal
+   sublayers.  A lower value of TID indicates a higher importance.  More
+   important NAL units MAY be better protected against transmission
+   losses than less-important NAL units.
+
+4.3.  Payload Structures
+
+   Three different types of RTP packet payload structures are specified.
+   A receiver can identify the type of an RTP packet payload through the
+   Type field in the payload header.
+
+   The three different payload structures are as follows:
+
+   *  Single NAL unit packet: Contains a single NAL unit in the payload,
+      and the NAL unit header of the NAL unit also serves as the payload
+      header.  This payload structure is specified in Section 4.3.1.
+
+   *  Aggregation Packet (AP): Contains more than one NAL unit within
+      one access unit.  This payload structure is specified in
+      Section 4.3.2.
+
+   *  Fragmentation Unit (FU): Contains a subset of a single NAL unit.
+      This payload structure is specified in Section 4.3.3.
+
+4.3.1.  Single NAL Unit Packets
+
+   A single NAL unit packet contains exactly one NAL unit and consists
+   of a payload header, as defined in Table 5 of [VVC] (denoted here as
+   PayloadHdr), following with a conditional 16-bit DONL field (in
+   network byte order), and the NAL unit payload data (the NAL unit
+   excluding its NAL unit header) of the contained NAL unit, as shown in
+   Figure 3.
+
+      0                   1                   2                   3
+      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+     |           PayloadHdr          |      DONL (conditional)       |
+     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+     |                                                               |
+     |                  NAL unit payload data                        |
+     |                                                               |
+     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+     |                               :...OPTIONAL RTP padding        |
+     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+            Figure 3: The Structure of a Single NAL Unit Packet
+
+   The DONL field, when present, specifies the value of the 16 least
+   significant bits of the decoding order number of the contained NAL
+   unit.  If sprop-max-don-diff (defined in Section 7.2) is greater than
+   0, the DONL field MUST be present, and the variable DON for the
+   contained NAL unit is derived as equal to the value of the DONL
+   field.  Otherwise (sprop-max-don-diff is equal to 0), the DONL field
+   MUST NOT be present.
+
+4.3.2.  Aggregation Packets (APs)
+
+   Aggregation packets (APs) can reduce packetization overhead for small
+   NAL units, such as most of the non-VCL NAL units, which are often
+   only a few octets in size.
+
+   An AP aggregates NAL units of one access unit, and it MUST NOT
+   contain NAL units from more than one AU.  Each NAL unit to be carried
+   in an AP is encapsulated in an aggregation unit.  NAL units
+   aggregated in one AP are included in NAL-unit-decoding order.
+
+   An AP consists of a payload header, as defined in Table 5 of [VVC]
+   (denoted here as PayloadHdr with Type=28), followed by two or more
+   aggregation units, as shown in Figure 4.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |    PayloadHdr (Type=28)       |                               |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
+    |                                                               |
+    |             two or more aggregation units                     |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :...OPTIONAL RTP padding        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+              Figure 4: The Structure of an Aggregation Packet
+
+   The fields in the payload header of an AP are set as follows.  The F
+   bit MUST be equal to 0 if the F bit of each aggregated NAL unit is
+   equal to zero; otherwise, it MUST be equal to 1.  The Type field MUST
+   be equal to 28.
+
+   The value of LayerId MUST be equal to the lowest value of LayerId of
+   all the aggregated NAL units.  The value of TID MUST be the lowest
+   value of TID of all the aggregated NAL units.
+
+      |  Informative note: All VCL NAL units in an AP have the same TID
+      |  value since they belong to the same access unit.  However, an
+      |  AP may contain non-VCL NAL units for which the TID value in the
+      |  NAL unit header may be different than the TID value of the VCL
+      |  NAL units in the same AP.
+
+      |  Informative note: If a system envisions subpicture-level or
+      |  picture-level modifications, for example, by removing
+      |  subpictures or pictures of a particular layer, a good design
+      |  choice on the sender's side would be to aggregate NAL units
+      |  belonging to only the same subpicture or picture of a
+      |  particular layer.
+
+   An AP MUST carry at least two aggregation units and can carry as many
+   aggregation units as necessary; however, the total amount of data in
+   an AP obviously MUST fit into an IP packet, and the size SHOULD be
+   chosen so that the resulting IP packet is smaller than the MTU size
+   in order to avoid IP layer fragmentation.  An AP MUST NOT contain the
+   FUs specified in Section 4.3.3.  APs MUST NOT be nested, i.e., an AP
+   cannot contain another AP.
+
+   The first aggregation unit in an AP consists of a conditional 16-bit
+   DONL field (in network byte order), followed by 16 bits of unsigned
+   size information (in network byte order) that indicate the size of
+   the NAL unit in bytes (excluding these two octets but including the
+   NAL unit header), followed by the NAL unit itself, including its NAL
+   unit header, as shown in Figure 5.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |               :       DONL (conditional)      |   NALU size   |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |   NALU size   |                                               |
+    +-+-+-+-+-+-+-+-+         NAL unit                              |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+       Figure 5: The Structure of the First Aggregation Unit in an AP
+
+      |  Informative note: The first octet of Figure 5 (indicated by the
+      |  first colon) belongs to a previous aggregation unit.  It is
+      |  depicted to emphasize that aggregation units are octet aligned
+      |  only.  Similarly, the NAL unit carried in the aggregation unit
+      |  can terminate at the octet boundary.
+
+   The DONL field, when present, specifies the value of the 16 least
+   significant bits of the decoding order number of the aggregated NAL
+   unit.
+
+   If sprop-max-don-diff is greater than 0, the DONL field MUST be
+   present in an aggregation unit that is the first aggregation unit in
+   an AP, and the variable DON for the aggregated NAL unit is derived as
+   equal to the value of the DONL field, and the variable DON for an
+   aggregation unit that is not the first aggregation unit in an AP-
+   aggregated NAL unit is derived as equal to the DON of the preceding
+   aggregated NAL unit in the same AP plus 1 modulo 65536.  Otherwise
+   (sprop-max-don-diff is equal to 0), the DONL field MUST NOT be
+   present in an aggregation unit that is the first aggregation unit in
+   an AP.
+
+   An aggregation unit that is not the first aggregation unit in an AP
+   will be followed immediately by 16 bits of unsigned size information
+   (in network byte order) that indicate the size of the NAL unit in
+   bytes (excluding these two octets but including the NAL unit header),
+   followed by the NAL unit itself, including its NAL unit header, as
+   shown in Figure 6.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |               :       NALU size               |   NAL unit    |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 6: The Structure of an Aggregation Unit That Is Not the First
+                         Aggregation Unit in an AP
+
+      |  Informative note: The first octet of Figure 6 (indicated by the
+      |  first colon) belongs to a previous aggregation unit.  It is
+      |  depicted to emphasize that aggregation units are octet aligned
+      |  only.  Similarly, the NAL unit carried in the aggregation unit
+      |  can terminate at the octet boundary.
+
+   Figure 7 presents an example of an AP that contains two aggregation
+   units, labeled as 1 and 2 in the figure, without the DONL field being
+   present.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                          RTP Header                           |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |   PayloadHdr (Type=28)        |         NALU 1 Size           |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |          NALU 1 HDR           |                               |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
+    |                   . . .                                       |
+    |                                                               |
+    +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    | NALU 2 HDR    |                                               |
+    +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
+    |                   . . .                                       |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :...OPTIONAL RTP padding        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 7: An Example of an AP Packet Containing Two Aggregation
+                        Units without the DONL Field
+
+   Figure 8 presents an example of an AP that contains two aggregation
+   units, labeled as 1 and 2 in the figure, with the DONL field being
+   present.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                          RTP Header                           |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |   PayloadHdr (Type=28)        |        NALU 1 DONL            |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |          NALU 1 Size          |            NALU 1 HDR         |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                                                               |
+    |                 NALU 1 Data   . . .                           |
+    |                                                               |
+    +        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :          NALU 2 Size          |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |          NALU 2 HDR           |                               |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
+    |                                                               |
+    |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :...OPTIONAL RTP padding        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+       Figure 8: An Example of an AP Containing Two Aggregation Units
+                            with the DONL Field
+
+4.3.3.  Fragmentation Units
+
+   Fragmentation Units (FUs) are introduced to enable fragmenting a
+   single NAL unit into multiple RTP packets, possibly without
+   cooperation or knowledge of the [VVC] encoder.  A fragment of a NAL
+   unit consists of an integer number of consecutive octets of that NAL
+   unit.  Fragments of the same NAL unit MUST be sent in consecutive
+   order with ascending RTP sequence numbers (with no other RTP packets
+   within the same RTP stream being sent between the first and last
+   fragment).
+
+   When a NAL unit is fragmented and conveyed within FUs, it is referred
+   to as a fragmented NAL unit.  APs MUST NOT be fragmented.  FUs MUST
+   NOT be nested, i.e., an FU cannot contain a subset of another FU.
+
+   The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
+   time of the fragmented NAL unit.
+
+   An FU consists of a payload header as defined in Table 5 of [VVC]
+   (denoted here as PayloadHdr with Type=29), an FU header of one octet,
+   a conditional 16-bit DONL field (in network byte order), and an FU
+   payload (as shown in Figure 9).
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |   PayloadHdr (Type=29)        |   FU header   | DONL (cond)   |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
+    |   DONL (cond) |                                               |
+    |-+-+-+-+-+-+-+-+                                               |
+    |                         FU payload                            |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :...OPTIONAL RTP padding        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+                      Figure 9: The Structure of an FU
+
+   The fields in the payload header are set as follows.  The Type field
+   MUST be equal to 29.  The fields F, LayerId, and TID MUST be equal to
+   the fields F, LayerId, and TID, respectively, of the fragmented NAL
+   unit.
+
+   The FU header consists of an S bit, an E bit, an R bit, and a 5-bit
+   FuType field, as shown in Figure 10.
+
+                             +---------------+
+                             |0|1|2|3|4|5|6|7|
+                             +-+-+-+-+-+-+-+-+
+                             |S|E|P|  FuType |
+                             +---------------+
+
+                 Figure 10: The Structure of the FU Header
+
+   The semantics of the FU header fields are as follows:
+
+   S: 1 bit
+      When set to 1, the S bit indicates the start of a fragmented NAL
+      unit, i.e., the first byte of the FU payload is also the first
+      byte of the payload of the fragmented NAL unit.  When the FU
+      payload is not the start of the fragmented NAL unit payload, the S
+      bit MUST be set to 0.
+
+   E: 1 bit
+      When set to 1, the E bit indicates the end of a fragmented NAL
+      unit, i.e., the last byte of the payload is also the last byte of
+      the fragmented NAL unit.  When the FU payload is not the last
+      fragment of a fragmented NAL unit, the E bit MUST be set to 0.
+
+   P: 1 bit
+      When set to 1, the P bit indicates the last FU of the last VCL NAL
+      unit of a coded picture, i.e., the last byte of the FU payload is
+      also the last byte of the last VCL NAL unit of the coded picture.
+      When the FU payload is not the last fragment of the last VCL NAL
+      unit of a coded picture, the P bit MUST be set to 0.
+
+   FuType: 5 bits
+      The field FuType MUST be equal to the field Type of the fragmented
+      NAL unit.
+
+   The DONL field, when present, specifies the value of the 16 least
+   significant bits of the decoding order number of the fragmented NAL
+   unit.
+
+   If sprop-max-don-diff is greater than 0, and the S bit is equal to 1,
+   the DONL field MUST be present in the FU, and the variable DON for
+   the fragmented NAL unit is derived as equal to the value of the DONL
+   field.  Otherwise (sprop-max-don-diff is equal to 0, or the S bit is
+   equal to 0), the DONL field MUST NOT be present in the FU.
+
+   A non-fragmented NAL unit MUST NOT be transmitted in one FU, i.e.,
+   the Start bit and End bit must not both be set to 1 in the same FU
+   header.
+
+   The FU payload consists of fragments of the payload of the fragmented
+   NAL unit so that, if the FU payloads of consecutive FUs, starting
+   with an FU with the S bit equal to 1 and ending with an FU with the E
+   bit equal to 1, are sequentially concatenated, the payload of the
+   fragmented NAL unit can be reconstructed.  The NAL unit header of the
+   fragmented NAL unit is not included as such in the FU payload, but
+   rather the information of the NAL unit header of the fragmented NAL
+   unit is conveyed in the F, LayerId, and TID fields of the FU payload
+   headers of the FUs and the FuType field of the FU header of the FUs.
+   An FU payload MUST NOT be empty.
+
+   If an FU is lost, the receiver SHOULD discard all following
+   fragmentation units in transmission order, corresponding to the same
+   fragmented NAL unit, unless the decoder in the receiver is known to
+   be prepared to gracefully handle incomplete NAL units.
+
+   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
+   fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
+   n of that NAL unit is not received.  In this case, the
+   forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a
+   syntax violation.
+
+4.4.  Decoding Order Number
+
+   For each NAL unit, the variable AbsDon is derived, representing the
+   decoding order number that is indicative of the NAL unit decoding
+   order.
+
+   Let NAL unit n be the n-th NAL unit in transmission order within an
+   RTP stream.
+
+   If sprop-max-don-diff is equal to 0, AbsDon[n], the value of AbsDon
+   for NAL unit n, is derived as equal to n.
+
+   Otherwise (sprop-max-don-diff is greater than 0), AbsDon[n] is
+   derived as follows, where DON[n] is the value of the variable DON for
+   NAL unit n:
+
+   *  If n is equal to 0 (i.e., NAL unit n is the very first NAL unit in
+      transmission order), AbsDon[0] is set equal to DON[0].
+
+   *  Otherwise (n is greater than 0), the following applies for
+      derivation of AbsDon[n]:
+
+         If DON[n] == DON[n-1],
+            AbsDon[n] = AbsDon[n-1]
+
+         If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
+            AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]
+
+         If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
+            AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]
+
+         If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
+            AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])
+
+         If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
+            AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
+
+   For any two NAL units (m and n), the following applies:
+
+   *  When AbsDon[n] is greater than AbsDon[m], this indicates that NAL
+      unit n follows NAL unit m in NAL unit decoding order.
+
+   *  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
+      of the two NAL units can be in either order.
+
+   *  When AbsDon[n] is less than AbsDon[m], this indicates that NAL
+      unit n precedes NAL unit m in decoding order.
+
+      |  Informative note: When two consecutive NAL units in the NAL
+      |  unit decoding order have different values of AbsDon, the
+      |  absolute difference between the two AbsDon values may be
+      |  greater than or equal to 1.
+
+      |  Informative note: There are multiple reasons to allow for the
+      |  absolute difference of the values of AbsDon for two consecutive
+      |  NAL units in the NAL unit decoding order to be greater than
+      |  one.  An increment by one is not required, as at the time of
+      |  associating values of AbsDon to NAL units, it may not be known
+      |  whether all NAL units are to be delivered to the receiver.  For
+      |  example, a gateway might not forward VCL NAL units of higher
+      |  sublayers or some SEI NAL units when there is congestion in the
+      |  network.  In another example, the first intra-coded picture of
+      |  a pre-encoded clip is transmitted in advance to ensure that it
+      |  is readily available in the receiver, and when transmitting the
+      |  first intra-coded picture, the originator does not exactly know
+      |  how many NAL units will be encoded before the first intra-coded
+      |  picture of the pre-encoded clip follows in decoding order.
+      |  Thus, the values of AbsDon for the NAL units of the first
+      |  intra-coded picture of the pre-encoded clip have to be
+      |  estimated when they are transmitted, and gaps in values of
+      |  AbsDon may occur.
+
+5.  Packetization Rules
+
+   The following packetization rules apply:
+
+   *  If sprop-max-don-diff is greater than 0, the transmission order of
+      NAL units carried in the RTP stream MAY be different than the NAL
+      unit decoding order.  Otherwise (sprop-max-don-diff is equal to
+      0), the transmission order of NAL units carried in the RTP stream
+      MUST be the same as the NAL unit decoding order.
+
+   *  A NAL unit of a small size SHOULD be encapsulated in an
+      aggregation packet together with one or more other NAL units in
+      order to avoid the unnecessary packetization overhead for small
+      NAL units.  For example, non-VCL NAL units, such as access unit
+      delimiters, parameter sets, or SEI NAL units, are typically small
+      and can often be aggregated with VCL NAL units without violating
+      MTU size constraints.
+
+   *  Each non-VCL NAL unit SHOULD, when possible from an MTU size match
+      viewpoint, be encapsulated in an aggregation packet together with
+      its associated VCL NAL unit, as typically a non-VCL NAL unit would
+      be meaningless without the associated VCL NAL unit being
+      available.
+
+   *  For carrying exactly one NAL unit in an RTP packet, a single NAL
+      unit packet MUST be used.
+
+6.  De-packetization Process
+
+   The general concept behind de-packetization is to get the NAL units
+   out of the RTP packets in an RTP stream and pass them to the decoder
+   in the NAL unit decoding order.
+
+   The de-packetization process is implementation dependent.  Therefore,
+   the following description should be seen as an example of a suitable
+   implementation.  Other schemes may be used as well, as long as the
+   output for the same input is the same as the process described below.
+   The output is the same when the set of output NAL units and their
+   order are both identical.  Optimizations relative to the described
+   algorithms are possible.
+
+   All normal RTP mechanisms related to buffer management apply.  In
+   particular, duplicated or outdated RTP packets (as indicated by the
+   RTP sequence number and the RTP timestamp) are removed.  To determine
+   the exact time for decoding, factors, such as a possible intentional
+   delay to allow for proper inter-stream synchronization, MUST be
+   factored in.
+
+   NAL units with NAL unit type values in the range of 0 to 27,
+   inclusive, may be passed to the decoder.  NAL-unit-like structures
+   with NAL unit type values in the range of 28 to 31, inclusive, MUST
+   NOT be passed to the decoder.
+
+   The receiver includes a receiver buffer, which is used to compensate
+   for transmission delay jitter within individual RTP streams and to
+   reorder NAL units from transmission order to the NAL unit decoding
+   order.  In this section, the receiver operation is described under
+   the assumption that there is no transmission delay jitter within an
+   RTP stream.  To make a difference from a practical receiver buffer
+   that is also used for compensation of transmission delay jitter, the
+   receiver buffer is hereafter called the de-packetization buffer in
+   this section.  Receivers should also prepare for transmission delay
+   jitter, that is, either reserve separate buffers for transmission
+   delay jitter buffering and de-packetization buffering or use a
+   receiver buffer for both transmission delay jitter and de-
+   packetization.  Moreover, receivers should take transmission delay
+   jitter into account in the buffering operation, e.g., by additional
+   initial buffering before starting of decoding and playback.
+
+   The de-packetization process extracts the NAL units from the RTP
+   packets in an RTP stream as follows.  When an RTP packet carries a
+   single NAL unit packet, the payload of the RTP packet is extracted as
+   a single NAL unit, excluding the DONL field, i.e., third and fourth
+   bytes, when sprop-max-don-diff is greater than 0.  When an RTP packet
+   carries an aggregation packet, several NAL units are extracted from
+   the payload of the RTP packet.  In this case, each NAL unit
+   corresponds to the part of the payload of each aggregation unit that
+   follows the NALU size field, as described in Section 4.3.2.  When an
+   RTP packet carries a Fragmentation Unit (FU), all RTP packets from
+   the first FU (with the S field equal to 1) of the fragmented NAL unit
+   up to the last FU (with the E field equal to 1) of the fragmented NAL
+   unit are collected.  The NAL unit is extracted from these RTP packets
+   by concatenating all FU payloads in the same order as the
+   corresponding RTP packets and appending the NAL unit header with the
+   fields F, LayerId, and TID set to equal the values of the fields F,
+   LayerId, and TID in the payload header of the FUs, respectively, and
+   with the NAL unit type set equal to the value of the field FuType in
+   the FU header of the FUs, as described in Section 4.3.3.
+
+   When sprop-max-don-diff is equal to 0, the de-packetization buffer
+   size is zero bytes, and the NAL units carried in the single RTP
+   stream are directly passed to the decoder in their transmission
+   order, which is identical to their decoding order.
+
+   When sprop-max-don-diff is greater than 0, the process described in
+   the remainder of this section applies.
+
+   There are two buffering states in the receiver: initial buffering and
+   buffering while playing.  Initial buffering starts when the reception
+   is initialized.  After initial buffering, decoding and playback are
+   started, and the buffering-while-playing mode is used.
+
+   Regardless of the buffering state, the receiver stores incoming NAL
+   units in reception order into the de-packetization buffer.  NAL units
+   carried in RTP packets are stored in the de-packetization buffer
+   individually, and the value of AbsDon is calculated and stored for
+   each NAL unit.
+
+   Initial buffering lasts until the difference between the greatest and
+   smallest AbsDon values of the NAL units in the de-packetization
+   buffer is greater than or equal to the value of sprop-max-don-diff.
+
+   After initial buffering, whenever the difference between the greatest
+   and smallest AbsDon values of the NAL units in the de-packetization
+   buffer is greater than or equal to the value of sprop-max-don-diff,
+   the following operation is repeatedly applied until this difference
+   is smaller than sprop-max-don-diff:
+
+      The NAL unit in the de-packetization buffer with the smallest
+      value of AbsDon is removed from the de-packetization buffer and
+      passed to the decoder.
+
+   When no more NAL units are flowing into the de-packetization buffer,
+   all NAL units remaining in the de-packetization buffer are removed
+   from the buffer and passed to the decoder in the order of increasing
+   AbsDon values.
+
+7.  Payload Format Parameters
+
+   This section specifies the optional parameters.  A mapping of the
+   parameters with Session Description Protocol (SDP) [RFC8866] is also
+   provided for applications that use SDP.
+
+   Parameters starting with the string "sprop" for stream properties can
+   be used by a sender to provide a receiver with the properties of the
+   stream that is or will be sent.  The media sender (and not the
+   receiver) selects whether, and with what values, "sprop" parameters
+   are being sent.  This uncommon characteristic of the "sprop"
+   parameters may not be intuitive in the context of some signaling
+   protocol concepts, especially with offer/answer.  Please see
+   Section 7.3.2 for guidance specific to the use of sprop parameters in
+   the offer/answer case.
+
+7.1.  Media Type Registration
+
+   The receiver MUST ignore any parameter unspecified in this memo.
+
+   Type name:  video
+
+   Subtype name:  H266
+
+   Required parameters:  N/A
+
+   Optional parameters:  profile-id, tier-flag, sub-profile-id, interop-
+      constraints, level-id, sprop-sublayer-id, sprop-ols-id, recv-
+      sublayer-id, recv-ols-id, max-recv-level-id, sprop-dci, sprop-vps,
+      sprop-sps, sprop-pps, sprop-sei, max-lsr, max-fps, sprop-max-don-
+      diff, sprop-depack-buf-bytes, depack-buf-cap (refer to Section 7.2
+      for definitions).
+
+   Encoding considerations:  This type is only defined for transfer via
+      RTP [RFC3550].
+
+   Security considerations:  See Section 9 of RFC 9328.
+
+   Interoperability considerations:  N/A
+
+   Published specification:  Please refer to RFC 9328 and VVC coding
+      specification [VVC].
+
+   Applications that use this media type:  Any application that relies
+      on VVC-based video services over RTP
+
+   Fragment identifier considerations:  N/A
+
+   Additional information:  N/A
+
+   Person & email address to contact for further information:
+      Stephan Wenger (stewe@stewe.org)
+
+   Intended usage:  COMMON
+
+   Restrictions on usage:  N/A
+
+   Author:  See Authors' Addresses section of RFC 9328.
+
+   Change controller:  IETF <avtcore@ietf.org>
+
+7.2.  Optional Parameters Definition
+
+   profile-id, tier-flag, sub-profile-id, interop-constraints, and
+   level-id:
+      These parameters indicate the profile, the tier, the default
+      level, the sub-profile, and some constraints of the bitstream
+      carried by the RTP stream, or a specific set of the profile, the
+      tier, the default level, the sub-profile, and some constraints the
+      receiver supports.
+
+      The subset of coding tools that may have been used to generate the
+      bitstream or that the receiver supports, as well as some
+      additional constraints, are indicated collectively by profile-id,
+      sub-profile-id, and interop-constraints.
+
+         |  Informative note: There are 128 values of profile-id.  The
+         |  subset of coding tools identified by profile-id can be
+         |  further constrained with up to 255 instances of sub-profile-
+         |  id.  In addition, 68 bits included in interop-constraints,
+         |  which can be extended up to 324 bits, provide means to
+         |  further restrict tools from existing profiles.  To be able
+         |  to support this fine-granular signaling of coding-tool
+         |  subsets with profile-id, sub-profile-id, and interop-
+         |  constraints, it would be safe to require symmetric use of
+         |  these parameters in SDP offer/answer unless recv-ols-id is
+         |  included in the SDP answer for choosing one of the layers
+         |  offered.
+
+      The tier is indicated by tier-flag.  The default level is
+      indicated by level-id.  The tier and the default level specify the
+      limits on values of syntax elements or arithmetic combinations of
+      values of syntax elements that are followed when generating the
+      bitstream or that the receiver supports.
+
+      In SDP offer/answer, when the SDP answer does not include the
+      recv-ols-id parameter that is less than the sprop-ols-id parameter
+      in the SDP offer, the following applies:
+
+      *  The tier-flag, profile-id, sub-profile-id, and interop-
+         constraints parameters MUST be used symmetrically, i.e., the
+         value of each of these parameters in the offer MUST be the same
+         as that in the answer, either explicitly signaled or implicitly
+         inferred.
+
+      *  The level-id parameter is changeable as long as the highest
+         level indicated by the answer is either equal to or lower than
+         that in the offer.  Note that the highest level higher than
+         level-id in the offer for receiving can be included as max-
+         recv-level-id.
+
+      In SDP offer/answer, when the SDP answer does include the recv-
+      ols-id parameter that is less than the sprop-ols-id parameter in
+      the SDP offer, the set of tier-flag, profile-id, sub-profile-id,
+      interop-constraints, and level-id parameters included in the
+      answer MUST be consistent with that for the chosen output layer
+      set as indicated in the SDP offer, with the exception that the
+      level-id parameter in the SDP answer is changeable as long as the
+      highest level indicated by the answer is either lower than or
+      equal to that in the offer.
+
+      More specifications of these parameters, including how they relate
+      to syntax elements specified in [VVC], are provided below.
+
+   profile-id:
+      When profile-id is not present, a value of 1 (i.e., the Main 10
+      profile) MUST be inferred.
+
+      When used to indicate properties of a bitstream, profile-id is
+      derived from the general_profile_idc syntax element that applies
+      to the bitstream in an instance of the profile_tier_level( )
+      syntax structure.
+
+      VVC bitstreams transported over RTP using the technologies of this
+      memo SHOULD contain only a single profile_tier_level( ) structure
+      in the DCI, unless the sender can assure that a receiver can
+      correctly decode the VVC bitstream, regardless of which
+      profile_tier_level( ) structure contained in the DCI was used for
+      deriving profile-id and other parameters for the SDP offer/answer
+      exchange.
+
+      As specified in [VVC], a profile_tier_level( ) syntax structure
+      may be contained in an SPS NAL unit, and one or more
+      profile_tier_level( ) syntax structures may be contained in a VPS
+      NAL unit and in a DCI NAL unit.  One of the following three cases
+      applies to the container NAL unit of the profile_tier_level( )
+      syntax structure containing syntax elements used to derive the
+      values of profile-id, tier-flag, level-id, sub-profile-id, or
+      interop-constraints:
+
+      1.  The container NAL unit is an SPS, the bitstream is a single-
+          layer bitstream, and the profile_tier_level( ) syntax
+          structures in all SPSs referenced by the CVSs in the bitstream
+          have the same values respectively for those
+          profile_tier_level( ) syntax elements.
+
+      2.  The container NAL unit is a VPS, the profile_tier_level( )
+          syntax structure is the one in the VPS that applies to the OLS
+          corresponding to the bitstream, and the profile_tier_level( )
+          syntax structures applicable to the OLS corresponding to the
+          bitstream in all VPSs referenced by the CVSs in the bitstream
+          have the same values respectively for those
+          profile_tier_level( ) syntax elements.
+
+      3.  The container NAL unit is a DCI NAL unit, and the
+          profile_tier_level( ) syntax structures in all DCI NAL units
+          in the bitstream have the same values respectively for those
+          profile_tier_level( ) syntax elements.
+
+      [VVC] allows for multiple profile_tier_level( ) structures in a
+      DCI NAL unit, which may contain different values for the syntax
+      elements used to derive the values of profile-id, tier-flag,
+      level-id, sub-profile-id, or interop-constraints in the different
+      entries.  However, herein defined is only a single profile-id,
+      tier-flag, level-id, sub-profile-id, or interop-constraints.  When
+      signaling these parameters and a DCI NAL unit is present with
+      multiple profile_tier_level( ) structures, these values SHOULD be
+      the same as the first profile_tier_level structure in the DCI,
+      unless the sender has ensured that the receiver can decode the
+      bitstream when a different value is chosen.
+
+   tier-flag, level-id:
+      The value of tier-flag MUST be in the range of 0 to 1, inclusive.
+      The value of level-id MUST be in the range of 0 to 255, inclusive.
+
+      If the tier-flag and level-id parameters are used to indicate
+      properties of a bitstream, they indicate the tier and the highest
+      level the bitstream complies with.
+
+      If the tier-flag and level-id parameters are used for capability
+      exchange, the following applies.  If max-recv-level-id is not
+      present, the default level defined by level-id indicates the
+      highest level the codec wishes to support.  Otherwise, max-recv-
+      level-id indicates the highest level the codec supports for
+      receiving.  For either receiving or sending, all levels that are
+      lower than the highest level supported MUST also be supported.
+
+      If no tier-flag is present, a value of 0 MUST be inferred; if no
+      level-id is present, a value of 51 (i.e., level 3.1) MUST be
+      inferred.
+
+         |  Informative note: The level values currently defined in the
+         |  VVC specification are in the form of "majorNum.minorNum",
+         |  and the value of the level-id for each of the levels is
+         |  equal to majorNum * 16 + minorNum * 3.  It is expected that,
+         |  if any levels are defined in the future, the same convention
+         |  will be used, but this cannot be guaranteed.
+
+      When used to indicate properties of a bitstream, the tier-flag and
+      level-id parameters are derived respectively from the syntax
+      element general_tier_flag, and the syntax element
+      general_level_idc or sub_layer_level_idc[j], that apply to the
+      bitstream in an instance of the profile_tier_level( ) syntax
+      structure.
+
+      If the tier-flag and level-id are derived from the
+      profile_tier_level( ) syntax structure in a DCI NAL unit, the
+      following applies:
+
+      *  tier-flag = general_tier_flag
+
+      *  level-id = general_level_idc
+
+      Otherwise, if the tier-flag and level-id are derived from the
+      profile_tier_level( ) syntax structure in an SPS or VPS NAL unit,
+      and the bitstream contains the highest sublayer representation in
+      the OLS corresponding to the bitstream, the following applies:
+
+      *  tier-flag = general_tier_flag
+
+      *  level-id = general_level_idc
+
+      Otherwise, if the tier-flag and level-id are derived from the
+      profile_tier_level( ) syntax structure in an SPS or VPS NAL unit,
+      and the bitstream does not contain the highest sublayer
+      representation in the OLS corresponding to the bitstream, the
+      following applies, with j being the value of the sprop-sublayer-id
+      parameter:
+
+      *  tier-flag = general_tier_flag
+
+      *  level-id = sub_layer_level_idc[j]
+
+   sub-profile-id:
+      The value of the parameter is a comma-separated (',') list of data
+      using base64 encoding (Section 4 of [RFC4648]) representation
+      without "==" padding.
+
+      When used to indicate properties of a bitstream, sub-profile-id is
+      derived from each of the ptl_num_sub_profiles
+      general_sub_profile_idc[i] syntax elements that apply to the
+      bitstream in a profile_tier_level( ) syntax structure.
+
+   interop-constraints:
+      A base64 encoding (Section 4 of [RFC4648]) representation of the
+      data that includes the ptl_frame_only_constraint_flag syntax
+      element, the ptl_multilayer_enabled_flag syntax element, and the
+      general_constraints_info( ) syntax structure that apply to the
+      bitstream in an instance of the profile_tier_level( ) syntax
+      structure.
+
+      If the interop-constraints parameter is not present, the following
+      MUST be inferred:
+
+      *  ptl_frame_only_constraint_flag = 1
+
+      *  ptl_multilayer_enabled_flag = 0
+
+      *  gci_present_flag in the general_constraints_info( ) syntax
+         structure = 0
+
+      Using interop-constraints for capability exchange results in a
+      requirement on any bitstream to be compliant with the interop-
+      constraints.
+
+   sprop-sublayer-id:
+      This parameter MAY be used to indicate the highest allowed value
+      of TID in the bitstream.  When not present, the value of sprop-
+      sublayer-id is inferred to be equal to 6.
+
+      The value of sprop-sublayer-id MUST be in the range of 0 to 6,
+      inclusive.
+
+   sprop-ols-id:
+      This parameter MAY be used to indicate the OLS that the bitstream
+      applies to.  When not present, the value of sprop-ols-id is
+      inferred to be equal to TargetOlsIdx, as specified in
+      Section 8.1.1 of [VVC].  If this optional parameter is present,
+      sprop-vps MUST also be present or its content MUST be known a
+      priori at the receiver.
+
+      The value of sprop-ols-id MUST be in the range of 0 to 256,
+      inclusive.
+
+         |  Informative note: VVC allows having up to 257 output layer
+         |  sets indicated in the VPS, as the number of output layer
+         |  sets minus 2 is indicated with a field of 8 bits.
+
+   recv-sublayer-id:
+      This parameter MAY be used to signal a receiver's choice of the
+      offered or declared sublayer representations in sprop-vps and
+      sprop-sps.  The value of recv-sublayer-id indicates the TID of the
+      highest sublayer that a receiver supports.  When not present, the
+      value of recv-sublayer-id is inferred to be equal to the value of
+      the sprop-sublayer-id parameter in the SDP offer.
+
+      The value of recv-sublayer-id MUST be in the range of 0 to 6,
+      inclusive.
+
+   recv-ols-id:
+      This parameter MAY be used to signal a receiver's choice of the
+      offered or declared output layer sets in sprop-vps.  The value of
+      recv-ols-id indicates the OLS index of the bitstream that a
+      receiver supports.  When not present, the value of recv-ols-id is
+      inferred to be equal to the value of the sprop-ols-id parameter
+      inferred from or indicated in the SDP offer.  When present, the
+      value of recv-ols-id must be included only when sprop-ols-id was
+      received and must refer to an output layer set in the VPS that
+      includes no layers other than all or a subset of the layers of the
+      OLS referred to by sprop-ols-id.  If this optional parameter is
+      present, sprop-vps must have been received or its content must be
+      known a priori at the receiver.
+
+      The value of recv-ols-id MUST be in the range of 0 to 256,
+      inclusive.
+
+   max-recv-level-id:
+      This parameter MAY be used to indicate the highest level a
+      receiver supports.
+
+      The value of max-recv-level-id MUST be in the range of 0 to 255,
+      inclusive.
+
+      When max-recv-level-id is not present, the value is inferred to be
+      equal to level-id.
+
+      max-recv-level-id MUST NOT be present when the highest level the
+      receiver supports is not higher than the default level.
+
+   sprop-dci:
+      This parameter MAY be used to convey a decoding capability
+      information NAL unit of the bitstream for out-of-band
+      transmission.  The parameter MAY also be used for capability
+      exchange.  The value of the parameter is a base64 encoding
+      (Section 4 of [RFC4648]) representation of the decoding capability
+      information NAL unit, as specified in Section 7.3.2.1 of [VVC].
+
+   sprop-vps:
+      This parameter MAY be used to convey any video parameter set to
+      the NAL unit of the bitstream for out-of-band transmission of
+      video parameter sets.  The parameter MAY also be used for
+      capability exchange and to indicate substream characteristics
+      (i.e., properties of output layer sets and sublayer
+      representations, as defined in [VVC]).  The value of the parameter
+      is a comma-separated (',') list of base64 encoding (Section 4 of
+      [RFC4648]) representations of the video parameter set NAL units,
+      as specified in Section 7.3.2.3 of [VVC].
+
+      The sprop-vps parameter MAY contain one or more than one video
+      parameter set NAL units.  However, all other video parameter sets
+      contained in the sprop-vps parameter MUST be consistent with the
+      first video parameter set in the sprop-vps parameter.  A video
+      parameter set vpsB is said to be consistent with another video
+      parameter set vpsA if the number of OLSs in vpsA and vpsB are the
+      same and any decoder that conforms to the profile, tier, level,
+      and constraints indicated by the data starting from the syntax
+      element general_profile_idc to the syntax structure
+      general_constraints_info(), inclusive, in the profile_tier_level(
+      ) syntax structure corresponding to any OLS with index olsIdx in
+      vpsA can decode any CVS(s) referencing vpsB when TargetOlsIdx is
+      equal to olsIdx that conforms to the profile, tier, level, and
+      constraints indicated by the data starting from the syntax element
+      general_profile_idc to the syntax structure
+      general_constraints_info(), inclusive, in the profile_tier_level(
+      ) syntax structure corresponding to the OLS with index
+      TargetOlsIdx in vpsB.
+
+   sprop-sps:
+      This parameter MAY be used to convey sequence parameter set NAL
+      units of the bitstream for out-of-band transmission of sequence
+      parameter sets.  The value of the parameter is a comma-separated
+      (',') list of base64 encoding (Section 4 of [RFC4648])
+      representations of the sequence parameter set NAL units, as
+      specified in Section 7.3.2.4 of [VVC].
+
+      A sequence parameter set spsB is said to be consistent with
+      another sequence parameter set spsA if any decoder that conforms
+      to the profile, tier, level, and constraints indicated by the data
+      starting from the syntax element general_profile_idc to the syntax
+      structure general_constraints_info(), inclusive, in the
+      profile_tier_level( ) syntax structure in spsA can decode any
+      CLVS(s) referencing spsB that conforms to the profile, tier,
+      level, and constraints indicated by the data starting from the
+      syntax element general_profile_idc to the syntax structure
+      general_constraints_info(), inclusive, in the profile_tier_level(
+      ) syntax structure in spsB.
+
+   sprop-pps:
+      This parameter MAY be used to convey picture parameter set NAL
+      units of the bitstream for out-of-band transmission of picture
+      parameter sets.  The value of the parameter is a comma-separated
+      (',') list of base64 encoding (Section 4 of [RFC4648])
+      representations of the picture parameter set NAL units, as
+      specified in Section 7.3.2.5 of [VVC].
+
+   sprop-sei:
+      This parameter MAY be used to convey one or more SEI messages that
+      describe bitstream characteristics.  When present, a decoder can
+      rely on the bitstream characteristics that are described in the
+      SEI messages for the entire duration of the session, independently
+      from the persistence scopes of the SEI messages, as specified in
+      [VSEI].
+
+      The value of the parameter is a comma-separated (',') list of
+      base64 encoding (Section 4 of [RFC4648]) representations of SEI
+      NAL units, as specified in [VSEI].
+
+         |  Informative note: Intentionally, no list of applicable or
+         |  inapplicable SEI messages is specified here.  Conveying
+         |  certain SEI messages in sprop-sei may be sensible in some
+         |  application scenarios and meaningless in others.  However, a
+         |  few examples are described below:
+         |  
+         |  In an environment where the bitstream was created from film-
+         |  based source material, and no splicing is going to occur
+         |  during the lifetime of the session, the film grain
+         |  characteristics SEI message is likely meaningful, and
+         |  sending it in sprop-sei, rather than in the bitstream at
+         |  each entry point, may help with saving bits and allows one
+         |  to configure the renderer only once, avoiding unwanted
+         |  artifacts.
+         |  
+         |  Examples for SEI messages that would be meaningless to be
+         |  conveyed in sprop-sei include the decoded picture hash SEI
+         |  message (it is close to impossible that all decoded pictures
+         |  have the same hashtag) or the filler payload SEI message (as
+         |  there is no point in just having more bits in SDP).
+
+   max-lsr:
+      The max-lsr MAY be used to signal the capabilities of a receiver
+      implementation and MUST NOT be used for any other purpose.  The
+      value of max-lsr is an integer indicating the maximum processing
+      rate in units of luma samples per second.  The max-lsr parameter
+      signals that the receiver is capable of decoding video at a higher
+      rate than is required by the highest level.
+
+         |  Informative note: When the OPTIONAL media type parameters
+         |  are used to signal the properties of a bitstream, and max-
+         |  lsr is not present, the values of tier-flag, profile-id,
+         |  sub-profile-id, interop-constraints, and level-id must
+         |  always be such that the bitstream complies fully with the
+         |  specified profile, sub-profile, tier, level, and interop-
+         |  constraints.
+
+      When max-lsr is signaled, the receiver MUST be able to decode
+      bitstreams that conform to the highest level, with the exception
+      that the MaxLumaSr value in Table A.3 of [VVC] for the highest
+      level is replaced with the value of max-lsr.  Senders MAY use this
+      knowledge to send pictures of a given size at a higher picture
+      rate than is indicated in the highest level.
+
+      When not present, the value of max-lsr is inferred to be equal to
+      the value of MaxLumaSr given in Table A.3 of [VVC] for the highest
+      level.
+
+      The value of max-lsr MUST be in the range of MaxLumaSr to 16 *
+      MaxLumaSr, inclusive, where MaxLumaSr is given in Table A.3 of
+      [VVC] for the highest level.
+
+   max-fps:
+      The value of max-fps is an integer indicating the maximum picture
+      rate in units of pictures per 100 seconds that can be effectively
+      processed by the receiver.  The max-fps parameter MAY be used to
+      signal that the receiver has a constraint in that it is not
+      capable of processing video effectively at the full picture rate
+      that is implied by the highest level and, when present, max-lsr.
+
+      The value of max-fps is not necessarily the picture rate at which
+      the maximum picture size can be sent; it constitutes a constraint
+      on maximum picture rate for all resolutions.
+
+         |  Informative note: The max-fps parameter is semantically
+         |  different from max-lsr in that max-fps is used to signal a
+         |  constraint, lowering the maximum picture rate from what is
+         |  implied by other parameters.
+
+      The encoder MUST use a picture rate equal to or less than this
+      value.  In cases where the max-fps parameter is absent, the
+      encoder is free to choose any picture rate according to the
+      highest level and any signaled optional parameters.
+
+      The value of max-fps MUST be smaller than or equal to the full
+      picture rate that is implied by the highest level and, when
+      present, max-lsr.
+
+   sprop-max-don-diff:
+      If there is no NAL unit naluA that is followed in transmission
+      order by any NAL unit preceding naluA in decoding order (i.e., the
+      transmission order of the NAL units is the same as the decoding
+      order), the value of this parameter MUST be equal to 0.
+
+      Otherwise, this parameter specifies the maximum absolute
+      difference between the decoding order number (i.e., AbsDon) values
+      of any two NAL units naluA and naluB, where naluA follows naluB in
+      decoding order and precedes naluB in transmission order.
+
+      The value of sprop-max-don-diff MUST be an integer in the range of
+      0 to 32767, inclusive.
+
+      When not present, the value of sprop-max-don-diff is inferred to
+      be equal to 0.
+
+   sprop-depack-buf-bytes:
+      This parameter signals the required size of the de-packetization
+      buffer in units of bytes.  The value of the parameter MUST be
+      greater than or equal to the maximum buffer occupancy (in units of
+      bytes) of the de-packetization buffer, as specified in Section 6.
+
+      The value of sprop-depack-buf-bytes MUST be an integer in the
+      range of 0 to 4294967295, inclusive.
+
+      When sprop-max-don-diff is present and greater than 0, this
+      parameter MUST be present and the value MUST be greater than 0.
+      When not present, the value of sprop-depack-buf-bytes is inferred
+      to be equal to 0.
+
+         |  Informative note: The value of sprop-depack-buf-bytes
+         |  indicates the required size of the de-packetization buffer
+         |  only.  When network jitter can occur, an appropriately sized
+         |  jitter buffer has to be available as well.
+
+   depack-buf-cap:
+      This parameter signals the capabilities of a receiver
+      implementation and indicates the amount of de-packetization buffer
+      space in units of bytes that the receiver has available for
+      reconstructing the NAL unit decoding order from NAL units carried
+      in the RTP stream.  A receiver is able to handle any RTP stream
+      for which the value of the sprop-depack-buf-bytes parameter is
+      smaller than or equal to this parameter.
+
+      When not present, the value of depack-buf-cap is inferred to be
+      equal to 4294967295.  The value of depack-buf-cap MUST be an
+      integer in the range of 1 to 4294967295, inclusive.
+
+         |  Informative note: depack-buf-cap indicates the maximum
+         |  possible size of the de-packetization buffer of the receiver
+         |  only, without allowing for network jitter.
+
+7.3.  SDP Parameters
+
+   The receiver MUST ignore any parameter unspecified in this memo.
+
+7.3.1.  Mapping of Payload Type Parameters to SDP
+
+   The media type video/H266 string is mapped to fields in the Session
+   Description Protocol (SDP) [RFC8866] as follows:
+
+   *  The media name in the "m=" line of SDP MUST be video.
+
+   *  The encoding name in the "a=rtpmap" line of SDP MUST be H266 (the
+      media subtype).
+
+   *  The clock rate in the "a=rtpmap" line MUST be 90000.
+
+   *  The OPTIONAL parameters profile-id, tier-flag, sub-profile-id,
+      interop-constraints, level-id, sprop-sublayer-id, sprop-ols-id,
+      recv-sublayer-id, recv-ols-id, max-recv-level-id, max-lsr, max-
+      fps, sprop-max-don-diff, sprop-depack-buf-bytes, and depack-buf-
+      cap, when present, MUST be included in the "a=fmtp" line of SDP.
+      The fmtp line is expressed as a media type string, in the form of
+      a semicolon-separated list of parameter=value pairs.
+
+   *  The OPTIONAL parameters sprop-vps, sprop-sps, sprop-pps, sprop-
+      sei, and sprop-dci, when present, MUST be included in the "a=fmtp"
+      line of SDP or conveyed using the "fmtp" source attribute as
+      specified in Section 6.3 of [RFC5576].  For a particular media
+      format (i.e., RTP payload type), sprop-vps, sprop-sps, sprop-pps,
+      sprop-sei, or sprop-dci MUST NOT be both included in the "a=fmtp"
+      line of SDP and conveyed using the "fmtp" source attribute.  When
+      included in the "a=fmtp" line of SDP, those parameters are
+      expressed as a media type string, in the form of a semicolon-
+      separated list of parameter=value pairs.  When conveyed in the
+      "a=fmtp" line of SDP for a particular payload type, the parameters
+      sprop-vps, sprop-sps, sprop-pps, sprop-sei, and sprop-dci MUST be
+      applied to each SSRC with the payload type.  When conveyed using
+      the "fmtp" source attribute, these parameters are only associated
+      with the given source and payload type as parts of the "fmtp"
+      source attribute.
+
+      |  Informative note: Conveyance of sprop-vps, sprop-sps, and
+      |  sprop-pps using the "fmtp" source attribute allows for out-of-
+      |  band transport of parameter sets in topologies like Topo-Video-
+      |  switch-MCU, as specified in [RFC7667].
+
+   A general usage of media representation in SDP is as follows:
+
+           m=video 49170 RTP/AVP 98
+           a=rtpmap:98 H266/90000
+           a=fmtp:98 profile-id=1;
+             sprop-vps=<video parameter sets data>;
+             sprop-sps=<sequence parameter set data>;
+             sprop-pps=<picture parameter set data>;
+
+   A SIP offer/answer exchange wherein both parties are expected to both
+   send and receive could look like the following.  Only the media
+   codec-specific parts of the SDP are shown.  Some lines are wrapped
+   due to text constraints.
+
+     Offerer->Answerer:
+           m=video 49170 RTP/AVP 98
+           a=rtpmap:98 H266/90000
+           a=fmtp:98 profile-id=1; level_id=83;
+
+   The above represents an offer for symmetric video communication using
+   [VVC] and its payload specification at the main profile and level 5.1
+   (and as the levels are downgradable, all lower levels).  Informally
+   speaking, this offer tells the receiver of the offer that the sender
+   is willing to receive up to 4Kp60 resolution at the maximum bitrates
+   specified in [VVC].  At the same time, if this offer were accepted
+   "as is", the offer can expect that the answerer would be able to
+   receive and properly decode H.266 media up to and including level
+   5.1.
+
+     Answerer->Offerer:
+           m=video 49170 RTP/AVP 98
+           a=rtpmap:98 H266/90000
+           a=fmtp:98 profile-id=1; level_id=67
+
+   With this answer to the offer above, the system receiving the offer
+   advises the offerer that it is incapable of handing H.266 at level
+   5.1 but is capable of decoding 1080p60.  As H.266 video codecs must
+   support decoding at all levels below the maximum level they
+   implement, the resulting user experience would likely be that both
+   systems send video at 1080p60.  However, nothing prevents an encoder
+   from further downgrading its sending to, for example, 720p30 if it
+   were short of cycles or bandwidth or for other reasons.
+
+7.3.2.  Usage with SDP Offer/Answer Model
+
+   This section describes the negotiation of unicast messages using the
+   offer/answer model as described in [RFC3264] and its updates.  The
+   section is split into subsections, covering a) media format
+   configurations not involving non-temporal scalability; b) scalable
+   media format configurations; c) the description of the use of those
+   parameters not involving the media configuration itself but rather
+   the parameters of the payload format design; and d) multicast.
+
+7.3.2.1.  Non-scalable Media Format Configuration
+
+   A non-scalable VVC media configuration is such a configuration where
+   no non-temporal scalability mechanisms are allowed.  In [VVC] version
+   1, it is implied that general_profile_idc indicates one of the
+   following profiles: Main 10, Main 10 Still Picture, Main 10 4:4:4, or
+   Main 10 4:4:4 Still Picture, with general_profile_idc values of 1,
+   65, 33, and 97, respectively.  Note that non-scalable media
+   configurations include temporal scalability inline with VVC's design
+   philosophy and profile structure.
+
+   The following limitations and rules pertaining to the media
+   configuration apply:
+
+   *  The parameters identifying a media format configuration for VVC
+      are profile-id, tier-flag, sub-profile-id, level-id, and interop-
+      constraints.  These media configuration parameters, except level-
+      id, MUST be used symmetrically.
+
+      The answerer MUST structure its answer according to one of the
+      following three options:
+
+      1.  maintain all configuration parameters with the values
+          remaining the same as in the offer for the media format
+          (payload type), with the exception that the value of level-id
+          is changeable as long as the highest level indicated by the
+          answer is not higher than that indicated by the offer;
+
+      2.  include in the answer the recv-sublayer-id parameter, with a
+          value less than the sprop-sublayer-id parameter in the offer,
+          for the media format (payload type), and maintain all
+          configuration parameters with the values remaining the same as
+          in the offer for the media format (payload type), with the
+          exception that the value of level-id is changeable as long as
+          the highest level indicated by the answer is not higher than
+          the level indicated by sprop-sps or sprop-vps in offer for the
+          chosen sublayer representation; or
+
+      3.  remove the media format (payload type) completely (when one or
+          more of the parameter values are not supported).
+
+      |  Informative note: The above requirement for symmetric use does
+      |  not apply for level-id and does not apply for the other
+      |  bitstream or RTP stream properties and capability parameters,
+      |  as described in Section 7.3.2.3 below.
+
+   *  To simplify handling and matching of these configurations, the
+      same RTP payload type number used in the offer SHOULD also be used
+      in the answer, as specified in [RFC3264].
+
+   *  The same RTP payload type number used in the offer for the media
+      subtype H266 MUST be used in the answer when the answer includes
+      recv-sublayer-id.  When the answer does not include recv-sublayer-
+      id, the answer MUST NOT contain a payload type number used in the
+      offer for the media subtype H266 unless the configuration is
+      exactly the same as in the offer or the configuration in the
+      answer only differs from that in the offer with a different value
+      of level-id.  The answer MAY contain the recv-sublayer-id
+      parameter if a VVC bitstream contains multiple operation points
+      (using temporal scalability and sublayers) and sprop-sps or sprop-
+      vps is included in the offer where information of sublayers are
+      present in the first sequence parameter set or video parameter set
+      contained in sprop-sps or sprop-vps, respectively.  If sprop-sps
+      or sprop-vps is provided in an offer, an answerer MAY select a
+      particular operation point indicated in the first sequence
+      parameter set or video parameter set contained in sprop-sps or
+      sprop-vps, respectively.  When the answer includes a recv-
+      sublayer-id that is less than a sprop-sublayer-id in the offer,
+      the following applies:
+
+      1.  When the sprop-sps parameter is present, all sequence
+          parameter sets contained in the sprop-sps parameter in the SDP
+          answer and all sequence parameter sets sent in-band for either
+          the offerer-to-answerer direction or the answerer-to-offerer
+          direction MUST be consistent with the first sequence parameter
+          set in the sprop-sps parameter of the offer (see the semantics
+          of sprop-sps in Section 7.1 of this document on one sequence
+          parameter set being consistent with another sequence parameter
+          set).
+
+      2.  When the sprop-vps parameter is present, all video parameter
+          sets contained in the sprop-vps parameter in the SDP answer
+          and all video parameter sets sent in-band for either the
+          offerer-to-answerer direction or the answerer-to-offerer
+          direction MUST be consistent with the first video parameter
+          set in the sprop-vps parameter of the offer (see the semantics
+          of sprop-vps in Section 7.1 of this document on one video
+          parameter set being consistent with another video parameter
+          set).
+
+      3.  The bitstream sent in either direction MUST conform to the
+          profile, tier, level, and constraints of the chosen sublayer
+          representation, as indicated by the profile_tier_level( )
+          syntax structure in the first sequence parameter set in the
+          sprop-sps parameter or by the first profile_tier_level( )
+          syntax structure in the first video parameter set in the
+          sprop-vps parameter of the offer.
+
+      |  Informative note: When an offerer receives an answer that does
+      |  not include recv-sublayer-id, it has to compare payload types
+      |  not declared in the offer based on the media type (i.e., video/
+      |  H266) and the above media configuration parameters with any
+      |  payload types it has already declared.  This will enable it to
+      |  determine whether the configuration in question is new or if it
+      |  is equivalent to configuration already offered, since a
+      |  different payload type number may be used in the answer.  The
+      |  ability to perform operation point selection enables a receiver
+      |  to utilize the temporal scalable nature of a VVC bitstream.
+
+7.3.2.2.  Scalable Media Format Configuration
+
+   A scalable VVC media configuration is such a configuration where non-
+   temporal scalability mechanisms are allowed.  In [VVC] version 1, it
+   is implied that general_profile_idc indicates one of the following
+   profiles: Multilayer Main 10 and Multilayer Main 10 4:4:4, with
+   general_profile_idc values of 17 and 49, respectively.
+
+   The following limitations and rules pertaining to the media
+   configuration apply.  They are listed in an order that would be
+   logical for an implementation to follow:
+
+   *  The parameters identifying a media format configuration for
+      scalable VVC are profile-id, tier-flag, sub-profile-id, level-id,
+      interop-constraints, and sprop-vps.  These media configuration
+      parameters, except level-id, MUST be used symmetrically, except as
+      noted below.
+
+   *  The answerer MAY include a level-id that MUST be lower than or
+      equal to the level-id indicated in the offer (either expressed by
+      level-id in the offer or implied by the default level, as
+      specified in Section 7.1).
+
+   *  When sprop-ols-id is present in an offer, sprop-vps MUST also be
+      present in the same offer and include at least one valid VPS so to
+      allow the answerer to meaningfully interpret sprop-ols-id and
+      select recv-ols-id (see below).
+
+   *  The answerer MUST NOT include recv-ols-id unless the offer
+      includes sprop-ols-id.  When present, recv-ols-id MUST indicate a
+      supported output layer set in the VPS that includes no layers
+      other than all or a subset of the layers of the OLS referred to by
+      sprop-ols-id.  If unable, the answerer MUST remove the media
+      format.
+
+      |  Informative note: If an offerer wants to offer more than one
+      |  output layer set, it can do so by offering multiple VVC media
+      |  with different payload types.
+
+   *  The offerer MAY include sprop-sublayer-id, which indicates the
+      highest allowed value of TID in the bitstream.  The answerer MAY
+      include recv-sublayer-id, which can be used to reduce the number
+      of sublayers from the value of sprop-sublayer-id.
+
+   *  When the answerer includes recv-ols-id and configuration
+      parameters profile-id, tier-flag, sub-profile-id, level-id, and
+      interop-constraints, it MUST use the configuration parameter
+      values as signaled in the sprop-vps for the operating point with
+      the largest number of sublayers for the chosen output layer set,
+      with the exception that the value of level-id is changeable as
+      long as the highest level indicated by the answer is not higher
+      than the level indicated by sprop-vps in offer for the operating
+      point with the largest number of sublayers for the chosen output
+      layer set.
+
+7.3.2.3.  Payload Format Configuration
+
+   The following limitations and rules pertain to the configuration of
+   the payload format buffer management mostly and apply to both
+   scalable and non-scalable VVC.
+
+   *  The parameters sprop-max-don-diff and sprop-depack-buf-bytes
+      describe the properties of an RTP stream that the offerer or the
+      answerer is sending for the media format configuration.  This
+      differs from the normal usage of the offer/answer parameters;
+      normally, such parameters declare the properties of the bitstream
+      or RTP stream that the offerer or the answerer is able to receive.
+      When dealing with VVC, the offerer assumes that the answerer will
+      be able to receive media encoded using the configuration being
+      offered.
+
+      |  Informative note: The above parameters apply for any RTP
+      |  stream, when present, sent by a declaring entity with the same
+      |  configuration.  In other words, the applicability of the above
+      |  parameters to RTP streams depends on the source endpoint.
+      |  Rather than being bound to the payload type, the values may
+      |  have to be applied to another payload type when being sent, as
+      |  they apply for the configuration.
+
+   *  The capability parameter max-lsr MAY be used to declare further
+      capabilities of the offerer or answerer for receiving.  It MUST
+      NOT be present when the direction attribute is sendonly.
+
+   *  The capability parameter max-fps MAY be used to declare lower
+      capabilities of the offerer or answerer for receiving.  It MUST
+      NOT be present when the direction attribute is sendonly.
+
+   *  When an offerer offers an interleaved stream, indicated by the
+      presence of sprop-max-don-diff with a value larger than zero, the
+      offerer MUST include the size of the de-packetization buffer
+      sprop-depack-buf-bytes.
+
+   *  To enable the offerer and answerer to inform each other about
+      their capabilities for de-packetization buffering in receiving RTP
+      streams, both parties are RECOMMENDED to include depack-buf-cap.
+
+   *  The parameters sprop-dci, sprop-vps, sprop-sps, or sprop-pps, when
+      present (included in the "a=fmtp" line of SDP or conveyed using
+      the "fmtp" source attribute, as specified in Section 6.3 of
+      [RFC5576]), are used for out-of-band transport of the parameter
+      sets (DCI, VPS, SPS, or PPS, respectively).
+
+   *  The answerer MAY use either out-of-band or in-band transport of
+      parameter sets for the bitstream it is sending, regardless of
+      whether out-of-band parameter sets transport has been used in the
+      offerer-to-answerer direction.  Parameter sets included in an
+      answer are independent of those parameter sets included in the
+      offer, as they are used for decoding two different bitstreams; one
+      from the answerer to the offerer and the other in the opposite
+      direction.  In case some RTP packets are sent before the SDP
+      offer/answer settles down, in-band parameter sets MUST be used for
+      those RTP stream parts sent before the SDP offer/answer.
+
+   *  The following rules apply to transport of parameter sets in the
+      offerer-to-answerer direction.
+
+      -  An offer MAY include sprop-dci, sprop-vps, sprop-sps, and/or
+         sprop-pps.  If none of these parameters are present in the
+         offer, then only in-band transport of parameter sets is used.
+
+      -  If the level to use in the offerer-to-answerer direction is
+         equal to the default level in the offer, the answerer MUST be
+         prepared to use the parameter sets included in sprop-vps,
+         sprop-sps, and sprop-pps (either included in the "a=fmtp" line
+         of SDP or conveyed using the "fmtp" source attribute) for
+         decoding the incoming bitstream, e.g., by passing these
+         parameter set NAL units to the video decoder before passing any
+         NAL units carried in the RTP streams.  Otherwise, the answerer
+         MUST ignore sprop-vps, sprop-sps, and sprop-pps (either
+         included in the "a=fmtp" line of SDP or conveyed using the
+         "fmtp" source attribute) and the offerer MUST transmit
+         parameter sets in-band.
+
+   *  The following rules apply to transport of parameter sets in the
+      answerer-to-offerer direction.
+
+      -  An answer MAY include sprop-dci, sprop-vps, sprop-sps, and/or
+         sprop-pps.  If none of these parameters are present in the
+         answer, then only in-band transport of parameter sets is used.
+
+      -  The offerer MUST be prepared to use the parameter sets included
+         in sprop-vps, sprop-sps, and sprop-pps (either included in the
+         "a=fmtp" line of SDP or conveyed using the "fmtp" source
+         attribute) for decoding the incoming bitstream, e.g., by
+         passing these parameter set NAL units to the video decoder
+         before passing any NAL units carried in the RTP streams.
+
+   *  When sprop-dci, sprop-vps, sprop-sps, and/or sprop-pps are
+      conveyed using the "fmtp" source attribute, as specified in
+      Section 6.3 of [RFC5576], the receiver of the parameters MUST
+      store the parameter sets included in sprop-dci, sprop-vps, sprop-
+      sps, and/or sprop-pps and associate them with the source given as
+      part of the "fmtp" source attribute.  Parameter sets associated
+      with one source (given as part of the "fmtp" source attribute)
+      MUST only be used to decode NAL units conveyed in RTP packets from
+      the same source (given as part of the "fmtp" source attribute).
+      When this mechanism is in use, SSRC collision detection and
+      resolution MUST be performed as specified in [RFC5576].
+
+   Figure 11 lists the interpretation of all the parameters that MAY be
+   used for the various combinations of offer, answer, and direction
+   attributes.
+
+                                       sendonly --+
+               answer: recvonly, recv-ols-id --+  |
+                 recvonly w/o recv-ols-id --+  |  |
+         answer: sendrecv, recv-ols-id --+  |  |  |
+           sendrecv w/o recv-ols-id --+  |  |  |  |
+                                      |  |  |  |  |
+   profile-id                         C  D  C  D  P
+   tier-flag                          C  D  C  D  P
+   level-id                           D  D  D  D  P
+   sub-profile-id                     C  D  C  D  P
+   interop-constraints                C  D  C  D  P
+   max-recv-level-id                  R  R  R  R  -
+   sprop-max-don-diff                 P  P  -  -  P
+   sprop-depack-buf-bytes             P  P  -  -  P
+   depack-buf-cap                     R  R  R  R  -
+   max-lsr                            R  R  R  R  -
+   max-fps                            R  R  R  R  -
+   sprop-dci                          P  P  -  -  P
+   sprop-sei                          P  P  -  -  P
+   sprop-vps                          P  P  -  -  P
+   sprop-sps                          P  P  -  -  P
+   sprop-pps                          P  P  -  -  P
+   sprop-sublayer-id                  P  P  -  -  P
+   recv-sublayer-id                   O  O  O  O  -
+   sprop-ols-id                       P  P  -  -  P
+   recv-ols-id                        X  O  X  O  -
+
+   Legend:
+
+    C: configuration for sending and receiving bitstreams
+    D: changeable configuration, same as C, except possible
+       to answer with a different but consistent value (see the
+       semantics of the six parameters related to profile, tier,
+       and level on these parameters being consistent)
+    P: properties of the bitstream to be sent
+    R: receiver capabilities
+    O: operation point selection
+    X: MUST NOT be present
+    -: not usable, when present MUST be ignored
+
+      Figure 11: Interpretation of Parameters for Various Combinations
+       of Offers, Answers, and Direction Attributes, with and without
+                                recv-ols-id.
+
+   Parameters used for declaring receiver capabilities are, in general,
+   downgradable, i.e., they express the upper limit for a sender's
+   possible behavior.  Thus, a sender MAY select to set its encoder
+   using only lower/lesser or equal values of these parameters.
+
+   When the answer does not include a recv-ols-id that is less than the
+   sprop-ols-id in the offer, parameters declaring a configuration point
+   are not changeable, with the exception of the level-id parameter for
+   unicast usage, and these parameters express values a receiver expects
+   to be used and MUST be used verbatim in the answer as in the offer.
+
+   When a sender's capabilities are declared with the configuration
+   parameters, these parameters express a configuration that is
+   acceptable for the sender to receive bitstreams.  In order to achieve
+   high interoperability levels, it is often advisable to offer multiple
+   alternative configurations.  It is impossible to offer multiple
+   configurations in a single payload type.  Thus, when multiple
+   configuration offers are made, each offer requires its own RTP
+   payload type associated with the offer.  However, it is possible to
+   offer multiple operation points using one configuration in a single
+   payload type by including sprop-vps in the offer and recv-ols-id in
+   the answer.
+
+   An implementation SHOULD be able to understand all media type
+   parameters (including all optional media type parameters), even if it
+   doesn't support the functionality related to the parameter.  This, in
+   conjunction with proper application logic in the implementation,
+   allows the implementation, after having received an offer, to create
+   an answer by potentially downgrading one or more of the optional
+   parameters to the point where the implementation can cope, leading to
+   higher chances of interoperability beyond the most basic interop
+   points (for which, as described above, no optional parameters are
+   necessary).
+
+      |  Informative note: In implementations of previous H.26x payload
+      |  formats, it was occasionally observed that implementations were
+      |  incapable of parsing most (or all) of the optional parameters.
+      |  As a result, the offer/answer exchange resulted in a baseline
+      |  performance (using the default values for the optional
+      |  parameters) with the resulting suboptimal user experience.
+      |  However, there are valid reasons to forego the implementation
+      |  complexity of implementing the parsing of some or all of the
+      |  optional parameters, for example, when there is predetermined
+      |  knowledge, not negotiated by an SDP-based offer/answer process,
+      |  of the capabilities of the involved systems (walled gardens,
+      |  baseline requirements defined in application standards higher
+      |  up in the stack, and similar).
+
+   An answerer MAY extend the offer with additional media format
+   configurations.  However, to enable their usage, in most cases, a
+   second offer is required from the offerer to provide the bitstream
+   property parameters that the media sender will use.  This also has
+   the effect that the offerer has to be able to receive this media
+   format configuration, not only to send it.
+
+7.3.3.  Multicast
+
+   For bitstreams being delivered over multicast, the following rules
+   apply:
+
+   *  The media format configuration is identified by profile-id, tier-
+      flag, sub-profile-id, level-id, and interop-constraints.  These
+      media format configuration parameters, including level-id, MUST be
+      used symmetrically; that is, the answerer MUST either maintain all
+      configuration parameters or remove the media format (payload type)
+      completely.  Note that this implies that the level-id for offer/
+      answer in multicast is not changeable.
+
+   *  To simplify the handling and matching of these configurations, the
+      same RTP payload type number used in the offer SHOULD also be used
+      in the answer, as specified in [RFC3264].  An answer MUST NOT
+      contain a payload type number used in the offer unless the
+      configuration is the same as in the offer.
+
+   *  Parameter sets received MUST be associated with the originating
+      source and MUST only be used in decoding the incoming bitstream
+      from the same source.
+
+   *  The rules for other parameters are the same as above for unicast
+      as long as the three above rules are obeyed.
+
+7.3.4.  Usage in Declarative Session Descriptions
+
+   When VVC over RTP is offered with SDP in a declarative style, as in
+   Real Time Streaming Protocol (RTSP) [RFC7826] or Session Announcement
+   Protocol (SAP) [RFC2974], the following considerations are necessary.
+
+   *  All parameters capable of indicating both bitstream properties and
+      receiver capabilities are used to indicate only bitstream
+      properties.  For example, in this case, the parameters profile-id,
+      tier-id, and level-id declare the values used by the bitstream,
+      not the capabilities for receiving bitstreams.  As a result, the
+      following interpretation of the parameters MUST be used:
+
+      -  Declaring actual configuration or bitstream properties:
+
+         o  profile-id
+
+         o  tier-flag
+
+         o  level-id
+
+         o  interop-constraints
+
+         o  sub-profile-id
+
+         o  sprop-dci
+
+         o  sprop-vps
+
+         o  sprop-sps
+
+         o  sprop-pps
+
+         o  sprop-max-don-diff
+
+         o  sprop-depack-buf-bytes
+
+         o  sprop-sublayer-id
+
+         o  sprop-ols-id
+
+         o  sprop-sei
+
+      -  Not usable (when present, they MUST be ignored):
+
+         o  max-lsr
+
+         o  max-fps
+
+         o  max-recv-level-id
+
+         o  depack-buf-cap
+
+         o  recv-sublayer-id
+
+         o  recv-ols-id
+
+      -  A receiver of the SDP is required to support all parameters and
+         values of the parameters provided; otherwise, the receiver MUST
+         reject (RTSP) or not participate in (SAP) the session.  It
+         falls on the creator of the session to use values that are
+         expected to be supported by the receiving application.
+
+7.3.5.  Considerations for Parameter Sets
+
+   When out-of-band transport of parameter sets is used, parameter sets
+   MAY still be additionally transported in-band unless explicitly
+   disallowed by an application, and some of these additional parameter
+   sets may update some of the out-of-band transported parameter sets.
+   An update of a parameter set refers to the sending of a parameter set
+   of the same type using the same parameter set ID but with different
+   values for at least one other parameter of the parameter set.
+
+8.  Use with Feedback Messages
+
+   The following subsections define the use of the Picture Loss
+   Indication (PLI) and Full Intra Request (FIR) feedback messages with
+   [VVC].  The PLI is defined in [RFC4585], and the FIR message is
+   defined in [RFC5104].  In accordance with this memo, unlike [HEVC], a
+   sender MUST NOT send Slice Loss Indication (SLI) or Reference Picture
+   Selection Indication (RPSI), and a receiver SHOULD ignore RPSI and
+   treat a received SLI as a PLI.
+
+8.1.  Picture Loss Indication (PLI)
+
+   As specified in Section 6.3.1 of [RFC4585], the reception of a PLI by
+   a media sender indicates "the loss of an undefined amount of coded
+   video data belonging to one or more pictures".  Without having any
+   specific knowledge of the setup of the bitstream (such as use and
+   location of in-band parameter sets, non-IRAP decoder refresh points,
+   picture structures, and so forth), a reaction to the reception of a
+   PLI by a VVC sender SHOULD be to send an IRAP picture and relevant
+   parameter sets, potentially with sufficient redundancy so to ensure
+   correct reception.  However, sometimes information about the
+   bitstream structure is known.  For example, such information can be
+   parameter sets that have been conveyed out of band through mechanisms
+   not defined in this document and that are known to stay static for
+   the duration of the session.  In that case, it is obviously
+   unnecessary to send them in-band as a result of the reception of a
+   PLI.  Other examples could be devised based on a priori knowledge of
+   different aspects of the bitstream structure.  In all cases, the
+   timing and congestion control mechanisms of [RFC4585] MUST be
+   observed.
+
+8.2.  Full Intra Request (FIR)
+
+   The purpose of the FIR message is to force an encoder to send an
+   independent decoder refresh point as soon as possible while observing
+   applicable congestion-control-related constraints, such as those set
+   out in [RFC8082].
+
+   Upon reception of a FIR, a sender MUST send an IDR picture.
+   Parameter sets MUST also be sent, except when there is a priori
+   knowledge that the parameter sets have been correctly established.  A
+   typical example for that is an understanding between the sender and
+   receiver, established by means outside this document, that parameter
+   sets are exclusively sent out of band.
+
+9.  Security Considerations
+
+   The scope of this section is limited to the payload format itself and
+   to one feature of [VVC] that may pose a particularly serious security
+   risk if implemented naively.  The payload format, in isolation, does
+   not form a complete system.  Implementers are advised to read and
+   understand relevant security-related documents, especially those
+   pertaining to RTP (see the Security Considerations section in
+   [RFC3550]) and the security of the call-control stack chosen (that
+   may make use of the media type registration of this memo).
+   Implementers should also consider known security vulnerabilities of
+   video coding and decoding implementations in general and avoid those.
+
+   Within this RTP payload format, and with the exception of the user
+   data SEI message as described below, no security threats other than
+   those common to RTP payload formats are known.  In other words,
+   neither the various media-plane-based mechanisms nor the signaling
+   part of this memo seem to pose a security risk beyond those common to
+   all RTP-based systems.
+
+   RTP packets using the payload format defined in this specification
+   are subject to the security considerations discussed in the RTP
+   specification [RFC3550] and in any applicable RTP profile, such as
+   RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/
+   SAVPF [RFC5124].  However, as "Securing the RTP Framework: Why RTP
+   Does Not Mandate a Single Media Security Solution" [RFC7202]
+   discusses, it is not an RTP payload format's responsibility to
+   discuss or mandate what solutions are used to meet the basic security
+   goals, like confidentiality, integrity, and source authenticity for
+   RTP in general.  This responsibility lays on anyone using RTP in an
+   application.  They can find guidance on available security mechanisms
+   and important considerations in "Options for Securing RTP Sessions"
+   [RFC7201].  The rest of this section discusses the security impacting
+   properties of the payload format itself.
+
+   Because the data compression used with this payload format is applied
+   end to end, any encryption needs to be performed after compression.
+   A potential denial-of-service threat exists for data encodings using
+   compression techniques that have non-uniform receiver-end
+   computational load.  The attacker can inject pathological datagrams
+   into the bitstream that are complex to decode and that cause the
+   receiver to be overloaded.  [VVC] is particularly vulnerable to such
+   attacks, as it is extremely simple to generate datagrams containing
+   NAL units that affect the decoding process of many future NAL units.
+   Therefore, the usage of data origin authentication and data integrity
+   protection of at least the RTP packet is RECOMMENDED but NOT REQUIRED
+   based on the thoughts of [RFC7202].
+
+   Like HEVC [RFC7798], [VVC] includes a user data Supplemental
+   Enhancement Information (SEI) message.  This SEI message allows
+   inclusion of an arbitrary bitstring into the video bitstream.  Such a
+   bitstring could include JavaScript, machine code, and other active
+   content.  [VVC] leaves the handling of this SEI message to the
+   receiving system.  In order to avoid harmful side effects of the user
+   data SEI message, decoder implementations cannot naively trust its
+   content.  For example, it would be a bad and insecure implementation
+   practice to forward any JavaScript a decoder implementation detects
+   to a web browser.  The safest way to deal with user data SEI messages
+   is to simply discard them, but that can have negative side effects on
+   the quality of experience by the user.
+
+   End-to-end security with authentication, integrity, or
+   confidentiality protection will prevent a MANE from performing media-
+   aware operations other than discarding complete packets.  In the case
+   of confidentiality protection, it will even be prevented from
+   discarding packets in a media-aware way.  To be allowed to perform
+   such operations, a MANE is required to be a trusted entity that is
+   included in the security context establishment.  This on-path
+   inclusion of the MANE forgoes end-to-end security guarantees for the
+   end points.
+
+10.  Congestion Control
+
+   Congestion control for RTP SHALL be used in accordance with RTP
+   [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551] or
+   AVPF [RFC4585].  If best-effort service is being used, an additional
+   requirement is that users of this payload format MUST monitor packet
+   loss to ensure that the packet loss rate is within an acceptable
+   range.  Packet loss is considered acceptable if a TCP flow across the
+   same network path and experiencing the same network conditions would
+   achieve an average throughput, measured on a reasonable timescale,
+   that is not less than all RTP streams combined are achieved.  This
+   condition can be satisfied by implementing congestion-control
+   mechanisms to adapt the transmission rate, by implementing the number
+   of layers subscribed for a layered multicast session, or by arranging
+   for a receiver to leave the session if the loss rate is unacceptably
+   high.
+
+   The bitrate adaptation necessary for obeying the congestion control
+   principle is easily achievable when real-time encoding is used, for
+   example, by adequately tuning the quantization parameter.  However,
+   when pre-encoded content is being transmitted, bandwidth adaptation
+   requires the pre-coded bitstream to be tailored for such adaptivity.
+   The key mechanisms available in [VVC] are temporal scalability and
+   spatial/SNR scalability.  A media sender can remove NAL units
+   belonging to higher temporal sublayers (i.e., those NAL units with a
+   high value of TID) or higher spatio-SNR layers until the sending
+   bitrate drops to an acceptable range.
+
+   The mechanisms mentioned above generally work within a defined
+   profile and level; therefore no renegotiation of the channel is
+   required.  Only when non-downgradable parameters (such as profile)
+   are required to be changed does it become necessary to terminate and
+   restart the RTP stream(s).  This may be accomplished by using
+   different RTP payload types.
+
+   MANEs MAY remove certain unusable packets from the RTP stream when
+   that RTP stream was damaged due to previous packet losses.  This can
+   help reduce the network load in certain special cases.  For example,
+   MANEs can remove those FUs where the leading FUs belonging to the
+   same NAL unit have been lost or those dependent slice segments when
+   the leading slice segments belonging to the same slice have been
+   lost, because the trailing FUs or dependent slice segments are
+   meaningless to most decoders.  MANE can also remove higher temporal
+   scalable layers if the outbound transmission (from the MANE's
+   viewpoint) experiences congestion.
+
+11.  IANA Considerations
+
+   A new media type has been registered with IANA; see Section 7.1.
+
+12.  References
+
+12.1.  Normative References
+
+   [ISO23090-3]
+              International Organization for Standardization,
+              "Information technology - Coded representation of
+              immersive media - Part 3: Versatile video coding", ISO/
+              IEC 23090-3:2022, September 2022,
+              <https://www.iso.org/standard/73022.html>.
+
+   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
+              Requirement Levels", BCP 14, RFC 2119,
+              DOI 10.17487/RFC2119, March 1997,
+              <https://www.rfc-editor.org/info/rfc2119>.
+
+   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
+              with Session Description Protocol (SDP)", RFC 3264,
+              DOI 10.17487/RFC3264, June 2002,
+              <https://www.rfc-editor.org/info/rfc3264>.
+
+   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
+              Jacobson, "RTP: A Transport Protocol for Real-Time
+              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
+              July 2003, <https://www.rfc-editor.org/info/rfc3550>.
+
+   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
+              Video Conferences with Minimal Control", STD 65, RFC 3551,
+              DOI 10.17487/RFC3551, July 2003,
+              <https://www.rfc-editor.org/info/rfc3551>.
+
+   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
+              RFC 3711, DOI 10.17487/RFC3711, March 2004,
+              <https://www.rfc-editor.org/info/rfc3711>.
+
+   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
+              "Extended RTP Profile for Real-time Transport Control
+              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
+              DOI 10.17487/RFC4585, July 2006,
+              <https://www.rfc-editor.org/info/rfc4585>.
+
+   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
+              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
+              <https://www.rfc-editor.org/info/rfc4648>.
+
+   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
+              "Codec Control Messages in the RTP Audio-Visual Profile
+              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
+              February 2008, <https://www.rfc-editor.org/info/rfc5104>.
+
+   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
+              Real-time Transport Control Protocol (RTCP)-Based Feedback
+              (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
+              2008, <https://www.rfc-editor.org/info/rfc5124>.
+
+   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
+              Media Attributes in the Session Description Protocol
+              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009,
+              <https://www.rfc-editor.org/info/rfc5576>.
+
+   [RFC8082]  Wenger, S., Lennox, J., Burman, B., and M. Westerlund,
+              "Using Codec Control Messages in the RTP Audio-Visual
+              Profile with Feedback with Layered Codecs", RFC 8082,
+              DOI 10.17487/RFC8082, March 2017,
+              <https://www.rfc-editor.org/info/rfc8082>.
+
+   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+              May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
+   [RFC8866]  Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP:
+              Session Description Protocol", RFC 8866,
+              DOI 10.17487/RFC8866, January 2021,
+              <https://www.rfc-editor.org/info/rfc8866>.
+
+   [VSEI]     ITU-T, "Versatile supplemental enhancement information
+              messages for coded video bitstreams", ITU-T
+              Recommendation H.274, May 2022,
+              <https://www.itu.int/rec/T-REC-H.274>.
+
+   [VVC]      ITU-T, "Versatile Video Coding", ITU-T
+              Recommendation H.266, April 2022,
+              <http://www.itu.int/rec/T-REC-H.266>.
+
+12.2.  Informative References
+
+   [CABAC]    Sole, J., et al., "Transform coefficient coding in HEVC",
+              IEEE Transactions on Circuits and Systems for Video
+              Technology, DOI 10.1109/TCSVT.2012.2223055, December 2012,
+              <https://doi.org/10.1109/TCSVT.2012.2223055>.
+
+   [HEVC]     ITU-T, "High efficiency video coding", ITU-T
+              Recommendation H.265, August 2021,
+              <https://www.itu.int/rec/T-REC-H.265>.
+
+   [MPEG2S]   International Organization for Standardization,
+              "Information technology - Generic coding of moving
+              pictures and associated audio information - Part 1:
+              Systems", ISO/IEC 13818-1:2022, September 2022.
+
+   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
+              Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974,
+              October 2000, <https://www.rfc-editor.org/info/rfc2974>.
+
+   [RFC6184]  Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
+              Payload Format for H.264 Video", RFC 6184,
+              DOI 10.17487/RFC6184, May 2011,
+              <https://www.rfc-editor.org/info/rfc6184>.
+
+   [RFC6190]  Wenger, S., Wang, Y.-K., Schierl, T., and A.
+              Eleftheriadis, "RTP Payload Format for Scalable Video
+              Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011,
+              <https://www.rfc-editor.org/info/rfc6190>.
+
+   [RFC7201]  Westerlund, M. and C. Perkins, "Options for Securing RTP
+              Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
+              <https://www.rfc-editor.org/info/rfc7201>.
+
+   [RFC7202]  Perkins, C. and M. Westerlund, "Securing the RTP
+              Framework: Why RTP Does Not Mandate a Single Media
+              Security Solution", RFC 7202, DOI 10.17487/RFC7202, April
+              2014, <https://www.rfc-editor.org/info/rfc7202>.
+
+   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
+              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
+              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
+              DOI 10.17487/RFC7656, November 2015,
+              <https://www.rfc-editor.org/info/rfc7656>.
+
+   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
+              DOI 10.17487/RFC7667, November 2015,
+              <https://www.rfc-editor.org/info/rfc7667>.
+
+   [RFC7798]  Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M.
+              M. Hannuksela, "RTP Payload Format for High Efficiency
+              Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798,
+              March 2016, <https://www.rfc-editor.org/info/rfc7798>.
+
+   [RFC7826]  Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M.,
+              and M. Stiemerling, Ed., "Real-Time Streaming Protocol
+              Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December
+              2016, <https://www.rfc-editor.org/info/rfc7826>.
+
+Acknowledgements
+
+   Dr. Byeongdoo Choi is thanked for the video-codec-related technical
+   discussion and other aspects in this memo.  Xin Zhao and Dr. Xiang Li
+   are thanked for their contributions on [VVC] specification
+   descriptive content.  Spencer Dawkins is thanked for his valuable
+   review comments that led to great improvements of this memo.  Some
+   parts of this specification share text with the RTP payload format
+   for HEVC [RFC7798].  We thank the authors of that specification for
+   their excellent work.
+
+Authors' Addresses
+
+   Shuai Zhao
+   Intel
+   2200 Mission College Blvd
+   Santa Clara,  95054
+   United States of America
+   Email: shuai.zhao@ieee.org
+
+
+   Stephan Wenger
+   Tencent
+   2747 Park Blvd
+   Palo Alto,  94588
+   United States of America
+   Email: stewe@stewe.org
+
+
+   Yago Sanchez
+   Fraunhofer HHI
+   Einsteinufer 37
+   10587 Berlin
+   Germany
+   Email: yago.sanchez@hhi.fraunhofer.de
+
+
+   Ye-Kui Wang
+   Bytedance Inc.
+   8910 University Center Lane
+   San Diego,  92122
+   United States of America
+   Email: yekui.wang@bytedance.com
+
+
+   Miska M. Hannuksela
+   Nokia Technologies
+   Hatanpään valtatie 30
+   FI-33100 Tampere
+   Finland
+   Email: miska.hannuksela@nokia.com