1 files changed, 5659 insertions, 0 deletions
diff --git a/doc/rfc/rfc6184.txt b/doc/rfc/rfc6184.txt
new file mode 100644
index 0000000..ef748fe
--- /dev/null
+++ b/doc/rfc/rfc6184.txt
@@ -0,0 +1,5659 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF)                        Y.-K. Wang
+Request for Comments: 6184                                       R. Even
+Obsoletes: 3984                                      Huawei Technologies
+Category: Standards Track                                  T. Kristensen
+ISSN: 2070-1721                                                 Tandberg
+                                                                R. Jesup
+                                                WorldGate Communications
+                                                                May 2011
+
+
+                   RTP Payload Format for H.264 Video
+
+Abstract
+
+   This memo describes an RTP Payload format for the ITU-T
+   Recommendation H.264 video codec and the technically identical
+   ISO/IEC International Standard 14496-10 video codec, excluding the
+   Scalable Video Coding (SVC) extension and the Multiview Video Coding
+   extension, for which the RTP payload formats are defined elsewhere.
+   The RTP payload format allows for packetization of one or more
+   Network Abstraction Layer Units (NALUs), produced by an H.264 video
+   encoder, in each RTP payload.  The payload format has wide
+   applicability, as it supports applications from simple low bitrate
+   conversational usage, to Internet video streaming with interleaved
+   transmission, to high bitrate video-on-demand.
+
+   This memo obsoletes RFC 3984.  Changes from RFC 3984 are summarized
+   in Section 14.  Issues on backward compatibility to RFC 3984 are
+   discussed in Section 15.
+
+Status of This Memo
+
+   This is an Internet Standards Track document.
+
+   This document is a product of the Internet Engineering Task Force
+   (IETF).  It represents the consensus of the IETF community.  It has
+   received public review and has been approved for publication by the
+   Internet Engineering Steering Group (IESG).  Further information on
+   Internet Standards is available in Section 2 of RFC 5741.
+
+   Information about the current status of this document, any errata,
+   and how to provide feedback on it may be obtained at
+   http://www.rfc-editor.org/info/rfc6184.
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                    [Page 1]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+Copyright Notice
+
+   Copyright (c) 2011 IETF Trust and the persons identified as the
+   document authors.  All rights reserved.
+
+   This document is subject to BCP 78 and the IETF Trust's Legal
+   Provisions Relating to IETF Documents
+   (http://trustee.ietf.org/license-info) in effect on the date of
+   publication of this document.  Please review these documents
+   carefully, as they describe your rights and restrictions with respect
+   to this document.  Code Components extracted from this document must
+   include Simplified BSD License text as described in Section 4.e of
+   the Trust Legal Provisions and are provided without warranty as
+   described in the Simplified BSD License.
+
+Table of Contents
+
+   1. Introduction ....................................................4
+      1.1. The H.264 Codec ............................................4
+      1.2. Parameter Set Concept ......................................5
+      1.3. Network Abstraction Layer Unit Types .......................6
+   2. Conventions .....................................................7
+   3. Scope ...........................................................7
+   4. Definitions and Abbreviations ...................................7
+      4.1. Definitions ................................................7
+      4.2. Abbreviations ..............................................9
+   5. RTP Payload Format .............................................10
+      5.1. RTP Header Usage ..........................................10
+      5.2. Payload Structures ........................................12
+      5.3. NAL Unit Header Usage .....................................13
+      5.4. Packetization Modes .......................................16
+      5.5. Decoding Order Number (DON) ...............................17
+      5.6. Single NAL Unit Packet ....................................19
+      5.7. Aggregation Packets .......................................20
+           5.7.1. Single-Time Aggregation Packet (STAP) ..............22
+           5.7.2. Multi-Time Aggregation Packets (MTAPs) .............25
+      5.8. Fragmentation Units (FUs) .................................29
+   6. Packetization Rules ............................................33
+      6.1. Common Packetization Rules ................................33
+      6.2. Single NAL Unit Mode ......................................34
+      6.3. Non-Interleaved Mode ......................................34
+      6.4. Interleaved Mode ..........................................34
+   7. De-Packetization Process .......................................35
+      7.1. Single NAL Unit and Non-Interleaved Mode ..................35
+      7.2. Interleaved Mode ..........................................35
+           7.2.1. Size of the De-Interleaving Buffer .................36
+           7.2.2. De-Interleaving Process ............................36
+      7.3. Additional De-Packetization Guidelines ....................38
+
+
+
+Wang, et al.                 Standards Track                    [Page 2]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   8. Payload Format Parameters ......................................39
+      8.1. Media Type Registration ...................................39
+      8.2. SDP Parameters ............................................57
+           8.2.1. Mapping of Payload Type Parameters to SDP ..........57
+           8.2.2. Usage with the SDP Offer/Answer Model ..............58
+           8.2.3. Usage in Declarative Session Descriptions ..........66
+      8.3. Examples ..................................................68
+      8.4. Parameter Set Considerations ..............................75
+      8.5. Decoder Refresh Point Procedure Using In-Band
+           Transport of Parameter Sets (Informative)..................78
+           8.5.1. IDR Procedure to Respond to a Request for
+                  a Decoder Refresh Point ............................78
+           8.5.2. Gradual Recovery Procedure to Respond to
+                  a Request for a Decoder Refresh Point ..............79
+   9. Security Considerations ........................................79
+   10. Congestion Control ............................................80
+   11. IANA Considerations ...........................................81
+   12. Informative Appendix: Application Examples ....................81
+      12.1. Video Telephony According to Annex A of ITU-T
+            Recommendation H.241 .....................................81
+      12.2. Video Telephony, No Slice Data Partitioning, No
+            NAL Unit Aggregation .....................................82
+      12.3. Video Telephony, Interleaved Packetization Using
+            NAL Unit Aggregation .....................................82
+      12.4. Video Telephony with Data Partitioning ...................83
+      12.5. Video Telephony or Streaming with FUs and Forward
+            Error Correction .........................................83
+      12.6. Low Bitrate Streaming ....................................86
+      12.7. Robust Packet Scheduling in Video Streaming ..............86
+   13. Informative Appendix: Rationale for Decoding Order Number .....87
+      13.1. Introduction .............................................87
+      13.2. Example of Multi-Picture Slice Interleaving ..............88
+      13.3. Example of Robust Packet Scheduling ......................89
+      13.4. Robust Transmission Scheduling of Redundant Coded
+            Slices ...................................................93
+      13.5. Remarks on Other Design Possibilities ....................94
+   14. Changes from RFC 3984 .........................................94
+   15. Backward Compatibility to RFC 3984 ............................96
+   16. Acknowledgements ..............................................98
+   17. References ....................................................98
+      17.1. Normative References .....................................98
+      17.2. Informative References ...................................99
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                    [Page 3]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+1.  Introduction
+
+   This memo specifies an RTP payload specification for the video coding
+   standard known as ITU-T Recommendation H.264 [1] and ISO/IEC
+   International Standard 14496-10 [2] (both also known as Advanced
+   Video Coding (AVC)).  In this memo, the name H.264 is used for the
+   codec and the standard, but this memo is equally applicable to the
+   ISO/IEC counterpart of the coding standard.
+
+   This memo obsoletes RFC 3984.  Changes from RFC 3984 are summarized
+   in Section 14.  Issues on backward compatibility to RFC 3984 are
+   discussed in Section 15.
+
+1.1.  The H.264 Codec
+
+   The H.264 video codec has a very broad application range that covers
+   all forms of digital compressed video, from low bitrate Internet
+   streaming applications to HDTV broadcast and Digital Cinema
+   applications with nearly lossless coding.  Compared to the current
+   state of technology, the overall performance of H.264 is such that
+   bitrate savings of 50% or more are reported.  Digital Satellite TV
+   quality, for example, was reported to be achievable at 1.5 Mbit/s,
+   compared to the current operation point of MPEG 2 video at around 3.5
+   Mbit/s [10].
+
+   The codec specification [1] itself conceptually distinguishes between
+   a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL).
+   The VCL contains the signal processing functionality of the codec;
+   mechanisms such as transform, quantization, and motion-compensated
+   prediction; and a loop filter.  It follows the general concept of
+   most of today's video codecs, a macroblock-based coder that uses
+   inter picture prediction with motion compensation and transform
+   coding of the residual signal.  The VCL encoder outputs slices: a bit
+   string that contains the macroblock data of an integer number of
+   macroblocks and the information of the slice header (containing the
+   spatial address of the first macroblock in the slice, the initial
+   quantization parameter, and similar information).  Macroblocks in
+   slices are arranged in scan order unless a different macroblock
+   allocation is specified using the syntax of slice groups.  In-picture
+   prediction is used only within a slice.  More information is provided
+   in [10].
+
+   The NAL encoder encapsulates the slice output of the VCL encoder into
+   Network Abstraction Layer Units (NALUs), which are suitable for
+   transmission over packet networks or for use in packet-oriented
+
+
+
+
+
+
+Wang, et al.                 Standards Track                    [Page 4]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   multiplex environments.  Annex B of H.264 defines an encapsulation
+   process to transmit such NALUs over bytestream-oriented networks.  In
+   the scope of this memo, Annex B is not relevant.
+
+   Internally, the NAL uses NAL units.  A NAL unit consists of a one-
+   byte header and the payload byte string.  The header indicates the
+   type of the NAL unit, the (potential) presence of bit errors or
+   syntax violations in the NAL unit payload, and information regarding
+   the relative importance of the NAL unit for the decoding process.
+   This RTP payload specification is designed to be unaware of the bit
+   string in the NAL unit payload.
+
+   One of the main properties of H.264 is the complete decoupling of the
+   transmission time, the decoding time, and the sampling or
+   presentation time of slices and pictures.  The decoding process
+   specified in H.264 is unaware of time, and the H.264 syntax does not
+   carry information such as the number of skipped frames (as is common
+   in the form of the Temporal Reference in earlier video compression
+   standards).  Also, there are NAL units that affect many pictures and
+   that are, therefore, inherently timeless.  For this reason, the
+   handling of the RTP timestamp requires some special considerations
+   for NAL units for which the sampling or presentation time is not
+   defined or, at transmission time, is unknown.
+
+1.2.  Parameter Set Concept
+
+   One very fundamental design concept of H.264 is to generate self-
+   contained packets, to make mechanisms such as the header duplication
+   of RFC 4629 [11] or MPEG-4 Visual's Header Extension Code (HEC) [12]
+   unnecessary.  This was achieved by decoupling information relevant to
+   more than one slice from the media stream.  This higher-layer meta
+   information should be sent reliably, asynchronously, and in advance
+   from the RTP packet stream that contains the slice packets.
+   (Provisions for sending this information in-band are also available
+   for applications that do not have an out-of-band transport channel
+   appropriate for the purpose).  The combination of the higher-level
+   parameters is called a parameter set.  The H.264 specification
+   includes two types of parameter sets: sequence parameter sets and
+   picture parameter sets.  An active sequence parameter set remains
+   unchanged throughout a coded video sequence, and an active picture
+   parameter set remains unchanged within a coded picture.  The sequence
+   and picture parameter set structures contain information such as
+   picture size, optional coding modes employed, and macroblock to slice
+   group map.
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                    [Page 5]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   To be able to change picture parameters (such as the picture size)
+   without having to transmit parameter set updates synchronously to the
+   slice packet stream, the encoder and decoder can maintain a list of
+   more than one sequence and picture parameter set.  Each slice header
+   contains a codeword that indicates the sequence and picture parameter
+   set to be used.
+
+   This mechanism allows the decoupling of the transmission of parameter
+   sets from the packet stream and the transmission of them by external
+   means (e.g., as a side effect of the capability exchange) or through
+   a (reliable or unreliable) control protocol.  It may even be possible
+   that they are never transmitted but are fixed by an application
+   design specification.
+
+1.3.  Network Abstraction Layer Unit Types
+
+   Tutorial information on the NAL design can be found in [13], [14],
+   and [15].
+
+   All NAL units consist of a single NAL unit type octet, which also
+   co-serves as the payload header of this RTP payload format.  A
+   description of the payload of a NAL unit follows.
+
+   The syntax and semantics of the NAL unit type octet are specified in
+   [1], but the essential properties of the NAL unit type octet are
+   summarized below.  The NAL unit type octet has the following format:
+
+      +---------------+
+      |0|1|2|3|4|5|6|7|
+      +-+-+-+-+-+-+-+-+
+      |F|NRI|  Type   |
+      +---------------+
+
+   The semantics of the components of the NAL unit type octet, as
+   specified in the H.264 specification, are described briefly below.
+
+   F:       1 bit
+            forbidden_zero_bit.  The H.264 specification declares a
+            value of 1 as a syntax violation.
+
+   NRI:     2 bits
+            nal_ref_idc.  A value of 00 indicates that the content of
+            the NAL unit is not used to reconstruct reference pictures
+            for inter picture prediction.  Such NAL units can be
+            discarded without risking the integrity of the reference
+            pictures.  Values greater than 00 indicate that the decoding
+            of the NAL unit is required to maintain the integrity of the
+            reference pictures.
+
+
+
+Wang, et al.                 Standards Track                    [Page 6]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   Type:    5 bits
+            nal_unit_type.  This component specifies the NAL unit
+            payload type as defined in Table 7-1 of [1] and later within
+            this memo.  For a reference of all currently defined NAL
+            unit types and their semantics, please refer to Section
+            7.4.1 in [1].
+
+   This memo introduces new NAL unit types, which are presented in
+   Section 5.2.  The NAL unit types defined in this memo are marked as
+   unspecified in [1].  Moreover, this specification extends the
+   semantics of F and NRI as described in Section 5.3.
+
+2.  Conventions
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in RFC 2119 [4].
+
+   This specification uses the notion of setting and clearing a bit when
+   bit fields are handled.  Setting a bit is the same as assigning that
+   bit the value of 1 (On).  Clearing a bit is the same as assigning
+   that bit the value of 0 (Off).
+
+3.  Scope
+
+   This payload specification can only be used to carry the "naked"
+   H.264 NAL unit stream over RTP and not the bitstream format discussed
+   in Annex B of H.264.  Likely, the first applications of this
+   specification will be in the conversational multimedia field, video
+   telephony or video conferencing, but the payload format also covers
+   other applications, such as Internet streaming and TV over IP.
+
+4.  Definitions and Abbreviations
+
+4.1.  Definitions
+
+   This document uses the definitions of [1].  The following terms,
+   defined in [1], are summed up for convenience:
+
+      access unit: A set of NAL units always containing a primary coded
+      picture.  In addition to the primary coded picture, an access unit
+      may also contain one or more redundant coded pictures or other NAL
+      units not containing slices or slice data partitions of a coded
+      picture.  The decoding of an access unit always results in a
+      decoded picture.
+
+
+
+
+
+
+Wang, et al.                 Standards Track                    [Page 7]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      coded video sequence: A sequence of access units that consists, in
+      decoding order, of an instantaneous decoding refresh (IDR) access
+      unit followed by zero or more non-IDR access units including all
+      subsequent access units up to but not including any subsequent IDR
+      access unit.
+
+      IDR access unit: An access unit in which the primary coded picture
+      is an IDR picture.
+
+      IDR picture: A coded picture containing only slices with I or SI
+      slice types that causes a "reset" in the decoding process.  After
+      the decoding of an IDR picture, all following coded pictures in
+      decoding order can be decoded without inter prediction from any
+      picture decoded prior to the IDR picture.
+
+      primary coded picture: The coded representation of a picture to be
+      used by the decoding process for a bitstream conforming to H.264.
+      The primary coded picture contains all macroblocks of the picture.
+
+      redundant coded picture: A coded representation of a picture or a
+      part of a picture.  The content of a redundant coded picture shall
+      not be used by the decoding process for a bitstream conforming to
+      H.264.  The content of a redundant coded picture may be used by
+      the decoding process for a bitstream that contains errors or
+      losses.
+
+      VCL NAL unit: A collective term used to refer to coded slice and
+      coded data partition NAL units.
+
+   In addition, the following definitions apply:
+
+      decoding order number (DON): A field in the payload structure or a
+      derived variable indicating NAL unit decoding order.  Values of
+      DON are in the range of 0 to 65535, inclusive.  After reaching the
+      maximum value, the value of DON wraps around to 0.
+
+      NAL unit decoding order: A NAL unit order that conforms to the
+      constraints on NAL unit order given in Section 7.4.1.2 in [1].
+
+      NALU-time: The value that the RTP timestamp would have if the NAL
+      unit would be transported in its own RTP packet.
+
+      transmission order: The order of packets in ascending RTP sequence
+      number order (in modulo arithmetic).  Within an aggregation
+      packet, the NAL unit transmission order is the same as the order
+      of appearance of NAL units in the packet.
+
+
+
+
+
+Wang, et al.                 Standards Track                    [Page 8]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      media-aware network element (MANE): A network element, such as a
+      middlebox or application layer gateway that is capable of parsing
+      certain aspects of the RTP payload headers or the RTP payload and
+      reacting to the contents.
+
+         Informative note: The concept of a MANE goes beyond normal
+         routers or gateways in that a MANE has to be aware of the
+         signaling (e.g., to learn about the payload type mappings of
+         the media streams) and that it has to be trusted when working
+         with Secure Real-time Transport Protocol (SRTP).  The advantage
+         of using MANEs is that they allow packets to be dropped
+         according to the needs of the media coding.  For example, if a
+         MANE has to drop packets due to congestion on a certain link,
+         it can identify and remove those packets whose elimination
+         produces the least adverse effect on the user experience.
+
+      static macroblock: A certain amount of macroblocks in the video
+      stream can be defined as static, as defined in Section 8.3.2.8 in
+      [3].  Static macroblocks free up additional processing cycles for
+      the handling of non-static macroblocks.  Based on a given amount
+      of video processing resources and a given resolution, a higher
+      number of static macroblocks enables a correspondingly higher
+      frame rate.
+
+      default sub-profile: The subset of coding tools, which may be all
+      coding tools of one profile or the common subset of coding tools
+      of more than one profile, indicated by the profile-level-id
+      parameter.
+
+      default level: The level indicated by the profile-level-id
+      parameter, which consists of three octets, profile_idc, profile-
+      iop, and level_idc.  The default level is indicated by level_idc
+      in most cases, and, in some cases, additionally by profile-iop.
+
+4.2.  Abbreviations
+
+      DON:        Decoding Order Number
+      DONB:       Decoding Order Number Base
+      DOND:       Decoding Order Number Difference
+      FEC:        Forward Error Correction
+      FU:         Fragmentation Unit
+      IDR:        Instantaneous Decoding Refresh
+      IEC:        International Electrotechnical Commission
+      ISO:        International Organization for Standardization
+      ITU-T:      International Telecommunication Union,
+                  Telecommunication Standardization Sector
+      MANE:       Media-Aware Network Element
+      MTAP:       Multi-Time Aggregation Packet
+
+
+
+Wang, et al.                 Standards Track                    [Page 9]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      MTAP16:     MTAP with 16-bit timestamp offset
+      MTAP24:     MTAP with 24-bit timestamp offset
+      NAL:        Network Abstraction Layer
+      NALU:       NAL Unit
+      SAR:        Sample Aspect Ratio
+      SEI:        Supplemental Enhancement Information
+      STAP:       Single-Time Aggregation Packet
+      STAP-A:     STAP type A
+      STAP-B:     STAP type B
+      TS:         Timestamp
+      VCL:        Video Coding Layer
+      VUI:        Video Usability Information
+
+5.  RTP Payload Format
+
+5.1.  RTP Header Usage
+
+   The format of the RTP header is specified in RFC 3550 [5] and
+   reprinted in Figure 1 for convenience.  This payload format uses the
+   fields of the header in a manner consistent with that specification.
+
+   When one NAL unit is encapsulated per RTP packet, the RECOMMENDED RTP
+   payload format is specified in Section 5.6.  The RTP payload (and the
+   settings for some RTP header bits) for aggregation packets and
+   fragmentation units are specified in Sections 5.7.2 and 5.8,
+   respectively.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                           timestamp                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |           synchronization source (SSRC) identifier            |
+      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+      |            contributing source (CSRC) identifiers             |
+      |                             ....                              |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 1.  RTP header according to RFC 3550
+
+   The RTP header information to be set according to this RTP payload
+   format is set as follows:
+
+   Marker bit (M): 1 bit
+      Set for the very last packet of the access unit indicated by the
+      RTP timestamp, in line with the normal use of the M bit in video
+
+
+
+Wang, et al.                 Standards Track                   [Page 10]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      formats, to allow an efficient playout buffer handling.  For
+      aggregation packets (STAP and MTAP), the marker bit in the RTP
+      header MUST be set to the value that the marker bit of the last
+      NAL unit of the aggregation packet would have been if it were
+      transported in its own RTP packet.  Decoders MAY use this bit as
+      an early indication of the last packet of an access unit but MUST
+      NOT rely on this property.
+
+         Informative note: Only one M bit is associated with an
+         aggregation packet carrying multiple NAL units.  Thus, if a
+         gateway has re-packetized an aggregation packet into several
+         packets, it cannot reliably set the M bit of those packets.
+
+   Payload type (PT): 7 bits
+      The assignment of an RTP payload type for this new packet format
+      is outside the scope of this document and will not be specified
+      here.  The assignment of a payload type has to be performed either
+      through the profile used or in a dynamic way.
+
+   Sequence number (SN): 16 bits
+      Set and used in accordance with RFC 3550.  For the single NALU and
+      non-interleaved packetization mode, the sequence number is used to
+      determine decoding order for the NALU.
+
+   Timestamp: 32 bits
+      The RTP timestamp is set to the sampling timestamp of the content.
+      A 90 kHz clock rate MUST be used.
+
+      If the NAL unit has no timing properties of its own (e.g.,
+      parameter set and SEI NAL units), the RTP timestamp is set to the
+      RTP timestamp of the primary coded picture of the access unit in
+      which the NAL unit is included, according to Section 7.4.1.2 of
+      [1].
+
+      The setting of the RTP timestamp for MTAPs is defined in Section
+      5.7.2.
+
+      Receivers SHOULD ignore any picture timing SEI messages included
+      in access units that have only one display timestamp.  Instead,
+      receivers SHOULD use the RTP timestamp for synchronizing the
+      display process.
+
+      If one access unit has more than one display timestamp carried in
+      a picture timing SEI message, then the information in the SEI
+      message SHOULD be treated as relative to the RTP timestamp, with
+      the earliest event occurring at the time given by the RTP
+      timestamp and subsequent events later, as given by the difference
+      in picture time values carried in the picture timing SEI message.
+
+
+
+Wang, et al.                 Standards Track                   [Page 11]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      Let tSEI1, tSEI2, ..., tSEIn be the display timestamps carried in
+      the SEI message of an access unit, where tSEI1 is the earliest of
+      all such timestamps.  Let tmadjst() be a function that adjusts the
+      SEI messages time scale to a 90-kHz time scale.  Let TS be the RTP
+      timestamp.  Then, the display time for the event associated with
+      tSEI1 is TS.  The display time for the event with tSEIx, where x
+      is [2..n], is TS + tmadjst (tSEIx - tSEI1).
+
+         Informative note: Displaying coded frames as fields is needed
+         commonly in an operation known as 3:2 pulldown, in which film
+         content that consists of coded frames is displayed on a display
+         using interlaced scanning.  The picture timing SEI message
+         enables carriage of multiple timestamps for the same coded
+         picture, and therefore the 3:2 pulldown process is perfectly
+         controlled.  The picture timing SEI message mechanism is
+         necessary because only one timestamp per coded frame can be
+         conveyed in the RTP timestamp.
+
+5.2.  Payload Structures
+
+   The payload format defines three different basic payload structures.
+   A receiver can identify the payload structure by the first byte of
+   the RTP packet payload, which co-serves as the RTP payload header
+   and, in some cases, as the first byte of the payload.  This byte is
+   always structured as a NAL unit header.  The NAL unit type field
+   indicates which structure is present.  The possible structures are as
+   follows.
+
+   Single NAL Unit Packet: Contains only a single NAL unit in the
+   payload.  The NAL header type field is equal to the original NAL unit
+   type, i.e., in the range of 1 to 23, inclusive.  Specified in Section
+   5.6.
+
+   Aggregation Packet: Packet type used to aggregate multiple NAL units
+   into a single RTP payload.  This packet exists in four versions, the
+   Single-Time Aggregation Packet type A (STAP-A), the Single-Time
+   Aggregation Packet type B (STAP-B), Multi-Time Aggregation Packet
+   (MTAP) with 16-bit offset (MTAP16), and Multi-Time Aggregation Packet
+   (MTAP) with 24-bit offset (MTAP24).  The NAL unit type numbers
+   assigned for STAP-A, STAP-B, MTAP16, and MTAP24 are 24, 25, 26, and
+   27, respectively.  Specified in Section 5.7.
+
+   Fragmentation Unit: Used to fragment a single NAL unit over multiple
+   RTP packets.  Exists with two versions, FU-A and FU-B, identified
+   with the NAL unit type numbers 28 and 29, respectively.  Specified in
+   Section 5.8.
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 12]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      Informative note: This specification does not limit the size of
+      NAL units encapsulated in single NAL unit packets and
+      fragmentation units.  The maximum size of a NAL unit encapsulated
+      in any aggregation packet is 65535 bytes.
+
+   Table 1 summarizes NAL unit types and the corresponding RTP packet
+   types when each of these NAL units is directly used as a packet
+   payload, and where the types are described in this memo.
+
+      Table 1.  Summary of NAL unit types and the corresponding packet
+                types
+
+      NAL Unit  Packet    Packet Type Name               Section
+      Type      Type
+      -------------------------------------------------------------
+      0        reserved                                     -
+      1-23     NAL unit  Single NAL unit packet             5.6
+      24       STAP-A    Single-time aggregation packet     5.7.1
+      25       STAP-B    Single-time aggregation packet     5.7.1
+      26       MTAP16    Multi-time aggregation packet      5.7.2
+      27       MTAP24    Multi-time aggregation packet      5.7.2
+      28       FU-A      Fragmentation unit                 5.8
+      29       FU-B      Fragmentation unit                 5.8
+      30-31    reserved                                     -
+
+5.3.  NAL Unit Header Usage
+
+   The structure and semantics of the NAL unit header were introduced in
+   Section 1.3.  For convenience, the format of the NAL unit header is
+   reprinted below:
+
+      +---------------+
+      |0|1|2|3|4|5|6|7|
+      +-+-+-+-+-+-+-+-+
+      |F|NRI|  Type   |
+      +---------------+
+
+   This section specifies the semantics of F and NRI according to this
+   specification.
+
+   F:    1 bit
+         forbidden_zero_bit.  A value of 0 indicates that the NAL unit
+         type octet and payload should not contain bit errors or other
+         syntax violations.  A value of 1 indicates that the NAL unit
+         type octet and payload may contain bit errors or other syntax
+         violations.
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 13]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         MANEs SHOULD set the F bit to indicate detected bit errors in
+         the NAL unit.  The H.264 specification requires that the F bit
+         be equal to 0.  When the F bit is set, the decoder is advised
+         that bit errors or any other syntax violations may be present
+         in the payload or in the NAL unit type octet.  The simplest
+         decoder reaction to a NAL unit in which the F bit is equal to 1
+         is to discard such a NAL unit and to conceal the lost data in
+         the discarded NAL unit.
+
+   NRI:  2 bits
+         nal_ref_idc.  The semantics of value 00 and a non-zero value
+         remain unchanged from the H.264 specification.  In other words,
+         a value of 00 indicates that the content of the NAL unit is not
+         used to reconstruct reference pictures for inter picture
+         prediction.  Such NAL units can be discarded without risking
+         the integrity of the reference pictures.  Values greater than
+         00 indicate that the decoding of the NAL unit is required to
+         maintain the integrity of the reference pictures.
+
+         In addition to the specification above, according to this RTP
+         payload specification, values of NRI indicate the relative
+         transport priority, as determined by the encoder.  MANEs can
+         use this information to protect more important NAL units better
+         than they do less important NAL units.  The highest transport
+         priority is 11, followed by 10, and then by 01; finally, 00 is
+         the lowest.
+
+            Informative note: Any non-zero value of NRI is handled
+            identically in H.264 decoders.  Therefore, receivers need
+            not manipulate the value of NRI when passing NAL units to
+            the decoder.
+
+         An H.264 encoder MUST set the value of NRI according to the
+         H.264 specification (Subclause 7.4.1) when the value of
+         nal_unit_type is in the range of 1 to 12, inclusive.  In
+         particular, the H.264 specification requires that the value of
+         NRI SHALL be equal to 0 for all NAL units having nal_unit_type
+         equal to 6, 9, 10, 11, or 12.
+
+         For NAL units having nal_unit_type equal to 7 or 8 (indicating
+         a sequence parameter set or a picture parameter set,
+         respectively), an H.264 encoder SHOULD set the value of NRI to
+         11 (in binary format).  For coded slice NAL units of a primary
+         coded picture having nal_unit_type equal to 5 (indicating a
+         coded slice belonging to an IDR picture), an H.264 encoder
+         SHOULD set the value of NRI to 11 (in binary format).
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 14]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         For a mapping of the remaining nal_unit_types to NRI values,
+         the following example MAY be used and has been shown to be
+         efficient in a certain environment [14].  Other mappings MAY
+         also be desirable, depending on the application and the H.264
+         profile in use.
+
+            Informative note: Data partitioning is not available in
+            certain profiles, e.g., in the Main or Baseline profiles.
+            Consequently, the NAL unit types 2, 3, and 4 can occur only
+            if the video bitstream conforms to a profile in which data
+            partitioning is allowed and not in streams that conform to
+            the Main or Baseline profiles.
+
+      Table 2.  Example of NRI values for coded slices and coded slice
+                data partitions of primary coded reference pictures
+
+       NAL Unit Type     Content of NAL Unit              NRI (binary)
+       ----------------------------------------------------------------
+        1              non-IDR coded slice                         10
+        2              Coded slice data partition A                10
+        3              Coded slice data partition B                01
+        4              Coded slice data partition C                01
+
+            Informative note: As mentioned before, the NRI value of non-
+            reference pictures is 00 as mandated by H.264.
+
+         An H.264 encoder SHOULD set the value of NRI for coded slice
+         and coded slice data partition NAL units of redundant coded
+         reference pictures equal to 01 (in binary format).
+
+         Definitions of the values for NRI for NAL unit types 24 to 29,
+         inclusive, are given in Sections 5.7 and 5.8 of this memo.
+
+         No recommendation for the value of NRI is given for NAL units
+         having nal_unit_type in the range of 13 to 23, inclusive,
+         because these values are reserved for ITU-T and ISO/IEC.  No
+         recommendation for the value of NRI is given for NAL units
+         having nal_unit_type equal to 0 or in the range of 30 to 31,
+         inclusive, as the semantics of these values are not specified
+         in this memo.
+
+
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 15]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+5.4.  Packetization Modes
+
+   This memo specifies three cases of packetization modes:
+
+   o  Single NAL unit mode
+
+   o  Non-interleaved mode
+
+   o  Interleaved mode
+
+   The single NAL unit mode is targeted for conversational systems that
+   comply with ITU-T Recommendation H.241 [3]  (see Section 12.1).  The
+   non-interleaved mode is targeted for conversational systems that may
+   not comply with ITU-T Recommendation H.241.  In the non-interleaved
+   mode, NAL units are transmitted in NAL unit decoding order.  The
+   interleaved mode is targeted for systems that do not require very low
+   end-to-end latency.  The interleaved mode allows transmission of NAL
+   units out of NAL unit decoding order.
+
+   The packetization mode in use MAY be signaled by the value of the
+   OPTIONAL packetization-mode media type parameter.  The used
+   packetization mode governs which NAL unit types are allowed in RTP
+   payloads.  Table 3 summarizes the allowed packet payload types for
+   each packetization mode.  Packetization modes are explained in more
+   detail in Section 6.
+
+      Table 3.  Summary of allowed NAL unit types for each packetization
+                mode (yes = allowed, no = disallowed, ig = ignore)
+
+      Payload Packet    Single NAL    Non-Interleaved    Interleaved
+      Type    Type      Unit Mode           Mode             Mode
+      -------------------------------------------------------------
+      0      reserved      ig               ig               ig
+      1-23   NAL unit     yes              yes               no
+      24     STAP-A        no              yes               no
+      25     STAP-B        no               no              yes
+      26     MTAP16        no               no              yes
+      27     MTAP24        no               no              yes
+      28     FU-A          no              yes              yes
+      29     FU-B          no               no              yes
+      30-31  reserved      ig               ig               ig
+
+   Some NAL unit or payload type values (indicated as reserved in Table
+   3) are reserved for future extensions.  NAL units of those types
+   SHOULD NOT be sent by a sender (direct as packet payloads, as
+   aggregation units in aggregation packets, or as fragmented units in
+   FU packets) and MUST be ignored by a receiver.  For example, the
+   payload types 1-23, with the associated packet type "NAL unit", are
+
+
+
+Wang, et al.                 Standards Track                   [Page 16]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   allowed in "Single NAL Unit Mode" and in "Non-Interleaved Mode" but
+   disallowed in "Interleaved Mode".  However, NAL units of NAL unit
+   types 1-23 can be used in "Interleaved Mode" as aggregation units in
+   STAP-B, MTAP16, and MTAP24 packets as well as fragmented units in FU-
+   A and FU-B packets.  Similarly, NAL units of NAL unit types 1-23 can
+   also be used in the "Non-Interleaved Mode" as aggregation units in
+   STAP-A packets or fragmented units in FU-A packets, in addition to
+   being directly used as packet payloads.
+
+5.5.  Decoding Order Number (DON)
+
+   In the interleaved packetization mode, the transmission order of NAL
+   units is allowed to differ from the decoding order of the NAL units.
+   Decoding order number (DON) is a field in the payload structure or a
+   derived variable that indicates the NAL unit decoding order.
+   Rationale and examples of use cases for transmission out of decoding
+   order and for the use of DON are given in Section 13.
+
+   The coupling of transmission and decoding order is controlled by the
+   OPTIONAL sprop-interleaving-depth media type parameter as follows.
+   When the value of the OPTIONAL sprop-interleaving-depth media type
+   parameter is equal to 0 (explicitly or per default), the transmission
+   order of NAL units MUST conform to the NAL unit decoding order.  When
+   the value of the OPTIONAL sprop-interleaving-depth media type
+   parameter is greater than 0:
+
+   o  the order of NAL units in an MTAP16 and an MTAP24 is not required
+      to be the NAL unit decoding order, and
+
+   o  the order of NAL units generated by de-packetizing STAP-Bs, MTAPs,
+      and FUs in two consecutive packets is not required to be the NAL
+      unit decoding order.
+
+   The RTP payload structures for a single NAL unit packet, an STAP-A,
+   and an FU-A do not include DON.  STAP-B and FU-B structures include
+   DON, and the structure of MTAPs enables derivation of DON, as
+   specified in Section 5.7.2.
+
+      Informative note: When an FU-A occurs in interleaved mode, it
+      always follows an FU-B, which sets its DON.
+
+      Informative note: If a transmitter wants to encapsulate a single
+      NAL unit per packet and transmit packets out of their decoding
+      order, STAP-B packet type can be used.
+
+   In the single NAL unit packetization mode, the transmission order of
+   NAL units, determined by the RTP sequence number, MUST be the same as
+   their NAL unit decoding order.  In the non-interleaved packetization
+
+
+
+Wang, et al.                 Standards Track                   [Page 17]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   mode, the transmission order of NAL units in single NAL unit packets,
+   STAP-As, and FU-As MUST be the same as their NAL unit decoding order.
+   The NAL units within an STAP MUST appear in the NAL unit decoding
+   order.  Thus, the decoding order is first provided through the
+   implicit order within an STAP and then provided through the RTP
+   sequence number for the order between STAPs, FUs, and single NAL unit
+   packets.
+
+   The signaling of the value of DON for NAL units carried in STAP-B,
+   MTAP, and a series of fragmentation units starting with an FU-B is
+   specified in Sections 5.7.1, 5.7.2, and 5.8, respectively.  The DON
+   value of the first NAL unit in transmission order MAY be set to any
+   value.  Values of DON are in the range of 0 to 65535, inclusive.
+   After reaching the maximum value, the value of DON wraps around to 0.
+
+   The decoding order of two NAL units contained in any STAP-B, MTAP, or
+   a series of fragmentation units starting with an FU-B is determined
+   as follows.  Let DON(i) be the decoding order number of the NAL unit
+   having index i in the transmission order.  Function don_diff(m,n) is
+   specified as follows:
+
+      If DON(m) == DON(n), don_diff(m,n) = 0
+
+      If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
+      don_diff(m,n) = DON(n) - DON(m)
+
+      If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
+      don_diff(m,n) = 65536 - DON(m) + DON(n)
+
+      If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
+      don_diff(m,n) = - (DON(m) + 65536 - DON(n))
+
+      If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
+      don_diff(m,n) = - (DON(m) - DON(n))
+
+   A positive value of don_diff(m,n) indicates that the NAL unit having
+   transmission order index n follows, in decoding order, the NAL unit
+   having transmission order index m.  When don_diff(m,n) is equal to 0,
+   the NAL unit decoding order of the two NAL units can be in either
+   order.  A negative value of don_diff(m,n) indicates that the NAL unit
+   having transmission order index n precedes, in decoding order, the
+   NAL unit having transmission order index m.
+
+   Values of DON-related fields (DON, DONB, and DOND; see Section 5.7)
+   MUST be such that the decoding order determined by the values of DON,
+   as specified above, conforms to the NAL unit decoding order.
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 18]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   If the order of two NAL units in NAL unit decoding order is switched
+   and the new order does not conform to the NAL unit decoding order,
+   the NAL units MUST NOT have the same value of DON.  If the order of
+   two consecutive NAL units in the NAL unit stream is switched and the
+   new order still conforms to the NAL unit decoding order, the NAL
+   units MAY have the same value of DON.  For example, when arbitrary
+   slice order is allowed by the video coding profile in use, all the
+   coded slice NAL units of a coded picture are allowed to have the same
+   value of DON.  Consequently, NAL units having the same value of DON
+   can be decoded in any order, and two NAL units having a different
+   value of DON should be passed to the decoder in the order specified
+   above.  When two consecutive NAL units in the NAL unit decoding order
+   have a different value of DON, the value of DON for the second NAL
+   unit in decoding order SHOULD be the value of DON for the first,
+   incremented by one.
+
+   An example of the de-packetization process to recover the NAL unit
+   decoding order is given in Section 7.
+
+      Informative note: Receivers should not expect that the absolute
+      difference of values of DON for two consecutive NAL units in the
+      NAL unit decoding order will be equal to one, even in error-free
+      transmission.  An increment by one is not required, as at the time
+      of associating values of DON to NAL units, it may not be known
+      whether all NAL units are delivered to the receiver.  For example,
+      a gateway may not forward coded slice NAL units of non-reference
+      pictures or SEI NAL units when there is a shortage of bitrate in
+      the network to which the packets are forwarded.  In another
+      example, a live broadcast is interrupted by pre-encoded content,
+      such as commercials, from time to time.  The first intra picture
+      of a pre-encoded clip is transmitted in advance to ensure that it
+      is readily available in the receiver.  When transmitting the first
+      intra picture, the originator does not exactly know how many NAL
+      units will be encoded before the first intra picture of the pre-
+      encoded clip follows in decoding order.  Thus, the values of DON
+      for the NAL units of the first intra picture of the pre-encoded
+      clip have to be estimated when they are transmitted, and gaps in
+      values of DON may occur.
+
+5.6.  Single NAL Unit Packet
+
+   The single NAL unit packet defined here MUST contain only one NAL
+   unit of the types defined in [1].  This means that neither an
+   aggregation packet nor a fragmentation unit can be used within a
+   single NAL unit packet.  A NAL unit stream composed by de-packetizing
+   single NAL unit packets in RTP sequence number order MUST conform to
+   the NAL unit decoding order.  The structure of the single NAL unit
+   packet is shown in Figure 2.
+
+
+
+Wang, et al.                 Standards Track                   [Page 19]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      Informative note: The first byte of a NAL unit co-serves as the
+      RTP payload header.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |F|NRI|  Type   |                                               |
+    +-+-+-+-+-+-+-+-+                                               |
+    |                                                               |
+    |               Bytes 2..n of a single NAL unit                 |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :...OPTIONAL RTP padding        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 2.  RTP payload format for single NAL unit packet
+
+5.7.  Aggregation Packets
+
+   Aggregation packets are the NAL unit aggregation scheme of this
+   payload specification.  The scheme is introduced to reflect the
+   dramatically different MTU sizes of two key target networks: wireline
+   IP networks (with an MTU size that is often limited by the Ethernet
+   MTU size, roughly 1500 bytes) and IP-based or non-IP-based (e.g.,
+   ITU-T H.324/M) wireless communication systems with preferred
+   transmission unit sizes of 254 bytes or less.  To prevent media
+   transcoding between the two worlds, and to avoid undesirable
+   packetization overhead, a NAL unit aggregation scheme is introduced.
+
+   Two types of aggregation packets are defined by this specification:
+
+   o  Single-time aggregation packet (STAP): aggregates NAL units with
+      identical NALU-times.  Two types of STAPs are defined, one without
+      DON (STAP-A) and another including DON (STAP-B).
+
+   o  Multi-time aggregation packet (MTAP): aggregates NAL units with
+      potentially differing NALU-times.  Two different MTAPs are
+      defined, differing in the length of the NAL unit timestamp offset.
+
+   Each NAL unit to be carried in an aggregation packet is encapsulated
+   in an aggregation unit.  Please see below for the four different
+   aggregation units and their characteristics.
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 20]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   The structure of the RTP payload format for aggregation packets is
+   presented in Figure 3.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |F|NRI|  Type   |                                               |
+    +-+-+-+-+-+-+-+-+                                               |
+    |                                                               |
+    |             one or more aggregation units                     |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :...OPTIONAL RTP padding        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 3.  RTP payload format for aggregation packets
+
+   MTAPs and STAPs share the following packetization rules:
+
+   o  The RTP timestamp MUST be set to the earliest of the NALU-times of
+      all the NAL units to be aggregated.
+
+   o  The type field of the NAL unit type octet MUST be set to the
+      appropriate value, as indicated in Table 4.
+
+   o  The F bit MUST be cleared if all F bits of the aggregated NAL
+      units are zero; otherwise, it MUST be set.
+
+   o  The value of NRI MUST be the maximum of all the NAL units carried
+      in the aggregation packet.
+
+                 Table 4.  Type field for STAPs and MTAPs
+
+      Type   Packet    Timestamp offset   DON-related fields
+                       field length       (DON, DONB, DOND)
+                       (in bits)          present
+      --------------------------------------------------------
+      24     STAP-A       0                 no
+      25     STAP-B       0                 yes
+      26     MTAP16      16                 yes
+      27     MTAP24      24                 yes
+
+   The marker bit in the RTP header is set to the value that the marker
+   bit of the last NAL unit of the aggregated packet would have if it
+   were transported in its own RTP packet.
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 21]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   The payload of an aggregation packet consists of one or more
+   aggregation units.  See Sections 5.7.1 and 5.7.2 for the four
+   different types of aggregation units.  An aggregation packet can
+   carry as many aggregation units as necessary; however, the total
+   amount of data in an aggregation packet obviously MUST fit into an IP
+   packet, and the size SHOULD be chosen so that the resulting IP packet
+   is smaller than the MTU size.  An aggregation packet MUST NOT contain
+   fragmentation units, as specified in Section 5.8.  Aggregation
+   packets MUST NOT be nested; that is, an aggregation packet MUST NOT
+   contain another aggregation packet.
+
+5.7.1.  Single-Time Aggregation Packet (STAP)
+
+   A single-time aggregation packet (STAP) SHOULD be used whenever NAL
+   units are aggregated that all share the same NALU-time.  The payload
+   of an STAP-A does not include DON and consists of at least one
+   single-time aggregation unit, as presented in Figure 4.  The payload
+   of an STAP-B consists of a 16-bit unsigned decoding order number
+   (DON) (in network byte order) followed by at least one single-time
+   aggregation unit, as presented in Figure 5.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                    :                                               |
+    +-+-+-+-+-+-+-+-+                                               |
+    |                                                               |
+    |                single-time aggregation units                  |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 4.  Payload format for STAP-A
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 22]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                    :  decoding order number (DON)  |               |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+    |                                                               |
+    |                single-time aggregation units                  |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 5.  Payload format for STAP-B
+
+   The DON field specifies the value of DON for the first NAL unit in an
+   STAP-B in transmission order.  For each successive NAL unit in
+   appearance order in an STAP-B, the value of DON is equal to (the
+   value of DON of the previous NAL unit in the STAP-B + 1) % 65536, in
+   which '%' stands for the modulo operation.
+
+   A single-time aggregation unit consists of 16-bit unsigned size
+   information (in network byte order) that indicates the size of the
+   following NAL unit in bytes (excluding these two octets, but
+   including the NAL unit type octet of the NAL unit), followed by the
+   NAL unit itself, including its NAL unit type byte.  A single-time
+   aggregation unit is byte aligned within the RTP payload, but it may
+   not be aligned on a 32-bit word boundary.  Figure 6 presents the
+   structure of the single-time aggregation unit.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                    :        NAL unit size          |               |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+    |                                                               |
+    |                           NAL unit                            |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 6.  Structure for single-time aggregation unit
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 23]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   Figure 7 presents an example of an RTP packet that contains an STAP-
+   A.  The STAP contains two single-time aggregation units, labeled as 1
+   and 2 in the figure.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                          RTP Header                           |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                         NALU 1 Data                           |
+    :                                                               :
+    +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |               | NALU 2 Size                   | NALU 2 HDR    |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                         NALU 2 Data                           |
+    :                                                               :
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :...OPTIONAL RTP padding        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 7.  An example of an RTP packet including an STAP-A
+               containing two single-time aggregation units
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 24]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   Figure 8 presents an example of an RTP packet that contains an STAP-
+   B.  The STAP contains two single-time aggregation units, labeled as 1
+   and 2 in the figure.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                          RTP Header                           |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |STAP-B NAL HDR | DON                           | NALU 1 Size   |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    | NALU 1 Size   | NALU 1 HDR    | NALU 1 Data                   |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
+    :                                                               :
+    +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |               | NALU 2 Size                   | NALU 2 HDR    |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                       NALU 2 Data                             |
+    :                                                               :
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :...OPTIONAL RTP padding        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 8.  An example of an RTP packet including an STAP-B
+               containing two single-time aggregation units
+
+5.7.2.  Multi-Time Aggregation Packets (MTAPs)
+
+   The NAL unit payload of MTAPs consists of a 16-bit unsigned decoding
+   order number base (DONB) (in network byte order) and one or more
+   multi-time aggregation units, as presented in Figure 9.  DONB MUST
+   contain the value of DON for the first NAL unit in the NAL unit
+   decoding order among the NAL units of the MTAP.
+
+      Informative note: The first NAL unit in the NAL unit decoding
+      order is not necessarily the first NAL unit in the order in which
+      the NAL units are encapsulated in an MTAP.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 25]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                    :  decoding order number base   |               |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+    |                                                               |
+    |                 multi-time aggregation units                  |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 9.  NAL unit payload format for MTAPs
+
+   Two different multi-time aggregation units are defined in this
+   specification.  Both of them consist of 16 bits of unsigned size
+   information of the following NAL unit (in network byte order), an
+   8-bit unsigned decoding order number difference (DOND), and n bits
+   (in network byte order) of timestamp offset (TS offset) for this NAL
+   unit, whereby n can be 16 or 24.  The choice between the different
+   MTAP types (MTAP16 and MTAP24) is application dependent: the larger
+   the timestamp offset is, the higher the flexibility of the MTAP, but
+   the overhead is also higher.
+
+   The structure of the multi-time aggregation units for MTAP16 and
+   MTAP24 are presented in Figures 10 and 11, respectively.  The
+   starting or ending position of an aggregation unit within a packet is
+   not required to be on a 32-bit word boundary.  The DON of the NAL
+   unit contained in a multi-time aggregation unit is equal to (DONB +
+   DOND) % 65536, in which % denotes the modulo operation.  This memo
+   does not specify how the NAL units within an MTAP are ordered, but,
+   in most cases, NAL unit decoding order SHOULD be used.
+
+   The timestamp offset field MUST be set to a value equal to the value
+   of the following formula: if the NALU-time is larger than or equal to
+   the RTP timestamp of the packet, then the timestamp offset equals
+   (the NALU-time of the NAL unit - the RTP timestamp of the packet).
+   If the NALU-time is smaller than the RTP timestamp of the packet,
+   then the timestamp offset is equal to the NALU-time + (2^32 - the RTP
+   timestamp of the packet).
+
+
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 26]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    :        NAL unit size          |      DOND     |  TS offset    |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |  TS offset    |                                               |
+    +-+-+-+-+-+-+-+-+              NAL unit                         |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 10.  Multi-time aggregation unit for MTAP16
+
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    :        NAL unit size         |      DOND     |  TS offset    |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |         TS offset             |                               |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
+    |                              NAL unit                         |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 11.  Multi-time aggregation unit for MTAP24
+
+   For the "earliest" multi-time aggregation unit in an MTAP, the
+   timestamp offset MUST be zero.  Hence, the RTP timestamp of the MTAP
+   itself is identical to the earliest NALU-time.
+
+      Informative note: The "earliest" multi-time aggregation unit is
+      the one that would have the smallest extended RTP timestamp among
+      all the aggregation units of an MTAP if the NAL units contained in
+      the aggregation units were encapsulated in single NAL unit
+      packets.  An extended timestamp is a timestamp that has more than
+      32 bits and is capable of counting the wraparound of the timestamp
+      field, thus enabling one to determine the smallest value if the
+      timestamp wraps.  Such an "earliest" aggregation unit may not be
+      the first one in the order in which the aggregation units are
+      encapsulated in an MTAP.  The "earliest" NAL unit need not be the
+      same as the first NAL unit in the NAL unit decoding order either.
+
+   Figure 12 presents an example of an RTP packet that contains a multi-
+   time aggregation packet of type MTAP16 that contains two multi-time
+   aggregation units, labeled as 1 and 2 in the figure.
+
+
+
+Wang, et al.                 Standards Track                   [Page 27]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                          RTP Header                           |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |MTAP16 NAL HDR |  decoding order number base   | NALU 1 Size   |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offset        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |  NALU 1 HDR   |  NALU 1 DATA                                  |
+    +-+-+-+-+-+-+-+-+                                               +
+    :                                                               :
+    +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |               | NALU 2 SIZE                   |  NALU 2 DOND  |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |       NALU 2 TS offset        |  NALU 2 HDR   |  NALU 2 DATA  |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+    :                                                               :
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :...OPTIONAL RTP padding        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 12.  An RTP packet including a multi-time aggregation
+                packet of type MTAP16 containing two multi-time
+                aggregation units
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 28]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   Figure 13 presents an example of an RTP packet that contains a multi-
+   time aggregation packet of type MTAP24 that contains two multi-time
+   aggregation units, labeled as 1 and 2 in the figure.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                          RTP Header                           |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |MTAP24 NAL HDR |  decoding order number base   | NALU 1 Size   |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offs          |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |NALU 1 TS offs |  NALU 1 HDR   |  NALU 1 DATA                  |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
+    :                                                               :
+    +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |               | NALU 2 SIZE                   |  NALU 2 DOND  |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |       NALU 2 TS offset                        |  NALU 2 HDR   |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |  NALU 2 DATA                                                  |
+    :                                                               :
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :...OPTIONAL RTP padding        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 13.  An RTP packet including a multi-time aggregation
+                packet of type MTAP24 containing two multi-time
+                aggregation units
+
+5.8.  Fragmentation Units (FUs)
+
+   This payload type allows fragmenting a NAL unit into several RTP
+   packets.  Doing so on the application layer instead of relying on
+   lower-layer fragmentation (e.g., by IP) has the following advantages:
+
+   o  The payload format is capable of transporting NAL units bigger
+      than 64 kbytes over an IPv4 network that may be present in pre-
+      recorded video, particularly in High-Definition formats (there is
+      a limit of the number of slices per picture, which results in a
+      limit of NAL units per picture, which may result in big NAL
+      units).
+
+   o  The fragmentation mechanism allows fragmenting a single NAL unit
+      and applying generic forward error correction as described in
+      Section 12.5.
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 29]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   Fragmentation is defined only for a single NAL unit and not for any
+   aggregation packets.  A fragment of a NAL unit consists of an integer
+   number of consecutive octets of that NAL unit.  Each octet of the NAL
+   unit MUST be part of exactly one fragment of that NAL unit.
+   Fragments of the same NAL unit MUST be sent in consecutive order with
+   ascending RTP sequence numbers (with no other RTP packets within the
+   same RTP packet stream being sent between the first and last
+   fragment).  Similarly, a NAL unit MUST be reassembled in RTP sequence
+   number order.
+
+   When a NAL unit is fragmented and conveyed within fragmentation units
+   (FUs), it is referred to as a fragmented NAL unit.  STAPs and MTAPs
+   MUST NOT be fragmented.  FUs MUST NOT be nested; that is, an FU MUST
+   NOT contain another FU.
+
+   The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
+   time of the fragmented NAL unit.
+
+   Figure 14 presents the RTP payload format for FU-As.  An FU-A
+   consists of a fragmentation unit indicator of one octet, a
+   fragmentation unit header of one octet, and a fragmentation unit
+   payload.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    | FU indicator  |   FU header   |                               |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
+    |                                                               |
+    |                         FU payload                            |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :...OPTIONAL RTP padding        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 14.  RTP payload format for FU-A
+
+   Figure 15 presents the RTP payload format for FU-Bs.  An FU-B
+   consists of a fragmentation unit indicator of one octet, a
+   fragmentation unit header of one octet, a decoding order number (DON)
+   (in network byte order), and a fragmentation unit payload.  In other
+   words, the structure of FU-B is the same as the structure of FU-A,
+   except for the additional DON field.
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 30]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    | FU indicator  |   FU header   |               DON             |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
+    |                                                               |
+    |                         FU payload                            |
+    |                                                               |
+    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                               :...OPTIONAL RTP padding        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+    Figure 15.  RTP payload format for FU-B
+
+   NAL unit type FU-B MUST be used in the interleaved packetization mode
+   for the first fragmentation unit of a fragmented NAL unit.  NAL unit
+   type FU-B MUST NOT be used in any other case.  In other words, in the
+   interleaved packetization mode, each NALU that is fragmented has an
+   FU-B as the first fragment, followed by one or more FU-A fragments.
+
+   The FU indicator octet has the following format:
+
+       +---------------+
+       |0|1|2|3|4|5|6|7|
+       +-+-+-+-+-+-+-+-+
+       |F|NRI|  Type   |
+       +---------------+
+
+   Values equal to 28 and 29 in the type field of the FU indicator octet
+   identify an FU-A and an FU-B, respectively.  The use of the F bit is
+   described in Section 5.3.  The value of the NRI field MUST be set
+   according to the value of the NRI field in the fragmented NAL unit.
+
+   The FU header has the following format:
+
+      +---------------+
+      |0|1|2|3|4|5|6|7|
+      +-+-+-+-+-+-+-+-+
+      |S|E|R|  Type   |
+      +---------------+
+
+   S:     1 bit
+          When set to one, the Start bit indicates the start of a
+          fragmented NAL unit.  When the following FU payload is not the
+          start of a fragmented NAL unit payload, the Start bit is set
+          to zero.
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 31]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   E:     1 bit
+          When set to one, the End bit indicates the end of a fragmented
+          NAL unit, i.e., the last byte of the payload is also the last
+          byte of the fragmented NAL unit.  When the following FU
+          payload is not the last fragment of a fragmented NAL unit, the
+          End bit is set to zero.
+
+   R:     1 bit
+          The Reserved bit MUST be equal to 0 and MUST be ignored by the
+          receiver.
+
+   Type:  5 bits
+          The NAL unit payload type as defined in Table 7-1 of [1].
+
+   The value of DON in FU-Bs is selected as described in Section 5.5.
+
+      Informative note: The DON field in FU-Bs allows gateways to
+      fragment NAL units to FU-Bs without organizing the incoming NAL
+      units to the NAL unit decoding order.
+
+   A fragmented NAL unit MUST NOT be transmitted in one FU; that is, the
+   Start bit and End bit MUST NOT both be set to one in the same FU
+   header.
+
+   The FU payload consists of fragments of the payload of the fragmented
+   NAL unit so that if the fragmentation unit payloads of consecutive
+   FUs are sequentially concatenated, the payload of the fragmented NAL
+   unit can be reconstructed.  The NAL unit type octet of the fragmented
+   NAL unit is not included as such in the fragmentation unit payload,
+   but rather the information of the NAL unit type octet of the
+   fragmented NAL unit is conveyed in the F and NRI fields of the FU
+   indicator octet of the fragmentation unit and in the type field of
+   the FU header.  An FU payload MAY have any number of octets and MAY
+   be empty.
+
+      Informative note: Empty FUs are allowed to reduce the latency of a
+      certain class of senders in nearly lossless environments.  These
+      senders can be characterized in that they packetize NALU fragments
+      before the NALU is completely generated and, hence, before the
+      NALU size is known.  If zero-length NALU fragments were not
+      allowed, the sender would have to generate at least one bit of
+      data of the following fragment before the current fragment could
+      be sent.  Due to the characteristics of H.264, where sometimes
+      several macroblocks occupy zero bits, this is undesirable and can
+      add delay.  However, the (potential) use of zero-length NALU
+      fragments should be carefully weighed against the increased risk
+      of the loss of at least a part of the NALU because of the
+      additional packets employed for its transmission.
+
+
+
+Wang, et al.                 Standards Track                   [Page 32]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   If a fragmentation unit is lost, the receiver SHOULD discard all
+   following fragmentation units in transmission order corresponding to
+   the same fragmented NAL unit.
+
+   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
+   fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
+   n of that NAL unit is not received.  In this case, the
+   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
+   syntax violation.
+
+6.  Packetization Rules
+
+   The packetization modes are introduced in Section 5.2.  The
+   packetization rules common to more than one of the packetization
+   modes are specified in Section 6.1.  The packetization rules for the
+   single NAL unit mode, the non-interleaved mode, and the interleaved
+   mode are specified in Sections 6.2, 6.3, and 6.4, respectively.
+
+6.1.  Common Packetization Rules
+
+   All senders MUST enforce the following packetization rules,
+   regardless of the packetization mode in use:
+
+   o  Coded slice NAL units or coded slice data partition NAL units
+      belonging to the same coded picture (and thus sharing the same RTP
+      timestamp value) MAY be sent in any order; however, for delay-
+      critical systems, they SHOULD be sent in their original decoding
+      order to minimize the delay.  Note that the decoding order is the
+      order of the NAL units in the bitstream.
+
+   o  Parameter sets are handled in accordance with the rules and
+      recommendations given in Section 8.4.
+
+   o  MANEs MUST NOT duplicate any NAL unit except for sequence or
+      picture parameter set NAL units, as neither this memo nor the
+      H.264 specification provides means to identify duplicated NAL
+      units.  Sequence and picture parameter set NAL units MAY be
+      duplicated to make their correct reception more probable, but any
+      such duplication MUST NOT affect the contents of any active
+      sequence or picture parameter set.  Duplication SHOULD be
+      performed on the application layer and not by duplicating RTP
+      packets (with identical sequence numbers).
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 33]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   Senders using the non-interleaved mode and the interleaved mode MUST
+   enforce the following packetization rule:
+
+   o  In an RTP translator, MANEs MAY convert single NAL unit packets
+      into one aggregation packet, convert an aggregation packet into
+      several single NAL unit packets, or mix both concepts.  The RTP
+      translator SHOULD take into account at least the following
+      parameters: path MTU size, unequal protection mechanisms (e.g.,
+      through packet-based FEC according to RFC 5109 [18], especially
+      for sequence and picture parameter set NAL units and coded slice
+      data partition A NAL units), bearable latency of the system, and
+      buffering capabilities of the receiver.
+
+         Informative note: An RTP translator is required to handle RTP
+         Control Protocol (RTCP) as per RFC 3550.
+
+6.2.  Single NAL Unit Mode
+
+   This mode is in use when the value of the OPTIONAL packetization-mode
+   media type parameter is equal to 0 or the packetization-mode is not
+   present.  All receivers MUST support this mode.  It is primarily
+   intended for low-delay applications that are compatible with systems
+   using ITU-T Recommendation H.241 [3] (see Section 12.1).  Only single
+   NAL unit packets MAY be used in this mode.  STAPs, MTAPs, and FUs
+   MUST NOT be used.  The transmission order of single NAL unit packets
+   MUST comply with the NAL unit decoding order.
+
+6.3.  Non-Interleaved Mode
+
+   This mode is in use when the value of the OPTIONAL packetization-mode
+   media type parameter is equal to 1.  This mode SHOULD be supported.
+   It is primarily intended for low-delay applications.  Only single NAL
+   unit packets, STAP-As, and FU-As MAY be used in this mode.  STAP-Bs,
+   MTAPs, and FU-Bs MUST NOT be used.  The transmission order of NAL
+   units MUST comply with the NAL unit decoding order.
+
+6.4.  Interleaved Mode
+
+   This mode is in use when the value of the OPTIONAL packetization-mode
+   media type parameter is equal to 2.  Some receivers MAY support this
+   mode.  STAP-Bs, MTAPs, FU-As, and FU-Bs MAY be used.  STAP-As and
+   single NAL unit packets MUST NOT be used.  The transmission order of
+   packets and NAL units is constrained as specified in Section 5.5.
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 34]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+7.  De-Packetization Process
+
+   The de-packetization process is implementation dependent.  Therefore,
+   the following description should be seen as an example of a suitable
+   implementation.  Other schemes may also be used as long as the output
+   for the same input is the same as the process described below.  The
+   same output means that the resulting NAL units and their order are
+   identical.  Optimizations relative to the described algorithms are
+   likely possible.  Section 7.1 presents the de-packetization process
+   for the single NAL unit and non-interleaved packetization modes,
+   whereas Section 7.2 describes the process for the interleaved mode.
+   Section 7.3 includes additional de-packetization guidelines for
+   intelligent receivers.
+
+   All normal RTP mechanisms related to buffer management apply.  In
+   particular, duplicated or outdated RTP packets (as indicated by the
+   RTP sequence number and the RTP timestamp) are removed.  To determine
+   the exact time for decoding, factors such as a possible intentional
+   delay to allow for proper inter-stream synchronization must be
+   factored in.
+
+7.1.  Single NAL Unit and Non-Interleaved Mode
+
+   The receiver includes a receiver buffer to compensate for
+   transmission delay jitter.  The receiver stores incoming packets in
+   reception order into the receiver buffer.  Packets are de-packetized
+   in RTP sequence number order.  If a de-packetized packet is a single
+   NAL unit packet, the NAL unit contained in the packet is passed
+   directly to the decoder.  If a de-packetized packet is an STAP-A, the
+   NAL units contained in the packet are passed to the decoder in the
+   order in which they are encapsulated in the packet.  For all the FU-A
+   packets containing fragments of a single NAL unit, the de-packetized
+   fragments are concatenated in their sending order to recover the NAL
+   unit, which is then passed to the decoder.
+
+      Informative note: If the decoder supports arbitrary slice order,
+      coded slices of a picture can be passed to the decoder in any
+      order, regardless of their reception and transmission order.
+
+7.2.  Interleaved Mode
+
+   The general concept behind these de-packetization rules is to reorder
+   NAL units from transmission order to the NAL unit decoding order.
+
+   The receiver includes a receiver buffer, which is used to compensate
+   for transmission delay jitter and to reorder NAL units from
+   transmission order to the NAL unit decoding order.  In this section,
+   the receiver operation is described under the assumption that there
+
+
+
+Wang, et al.                 Standards Track                   [Page 35]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   is no transmission delay jitter.  To differentiate the receiver
+   buffer from a practical receiver buffer that is also used for
+   compensation of transmission delay jitter, the receiver buffer is
+   hereafter called the de-interleaving buffer in this section.
+   Receivers SHOULD also prepare for transmission delay jitter, i.e.,
+   either reserve separate buffers for transmission delay jitter
+   buffering and de-interleaving buffering or use a receiver buffer for
+   both transmission delay jitter and de-interleaving.  Moreover,
+   receivers SHOULD take transmission delay jitter into account in the
+   buffering operation, e.g., by additional initial buffering before
+   starting of decoding and playback.
+
+   This section is organized as follows: Subsection 7.2.1 presents how
+   to calculate the size of the de-interleaving buffer.  Subsection
+   7.2.2 specifies the receiver process on how to organize received NAL
+   units to the NAL unit decoding order.
+
+7.2.1.  Size of the De-Interleaving Buffer
+
+   In either Offer/Answer or declarative Session Description Protocol
+   (SDP) usage, the sprop-deint-buf-req media type parameter signals the
+   requirement for the de-interleaving buffer size.  Therefore, it is
+   RECOMMENDED to set the de-interleaving buffer size, in terms of
+   number of bytes, equal to or greater than the value of the sprop-
+   deint-buf-req media type parameter.
+
+   When the SDP Offer/Answer model or any other capability exchange
+   procedure is used in session setup, the properties of the received
+   stream SHOULD be such that the receiver capabilities are not
+   exceeded.  In the SDP Offer/Answer model, the receiver can indicate
+   its capabilities to allocate a de-interleaving buffer with the deint-
+   buf-cap media type parameter.  See Section 8.1 for further
+   information on the deint-buf-cap and sprop-deint-buf-req media type
+   parameters and Section 8.2.2 for further information on their use in
+   the SDP Offer/Answer model.
+
+7.2.2.  De-Interleaving Process
+
+   There are two buffering states in the receiver: initial buffering and
+   buffering while playing.  Initial buffering occurs when the RTP
+   session is initialized.  After initial buffering, decoding and
+   playback are started, and the buffering-while-playing mode is used.
+
+   Regardless of the buffering state, the receiver stores incoming NAL
+   units, in reception order, in the de-interleaving buffer as follows.
+   NAL units of aggregation packets are stored in the de-interleaving
+   buffer individually.  The value of DON is calculated and stored for
+   each NAL unit.
+
+
+
+Wang, et al.                 Standards Track                   [Page 36]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   The receiver operation is described below with the help of the
+   following functions and constants:
+
+   o  Function AbsDON is specified in Section 8.1.
+
+   o  Function don_diff is specified in Section 5.5.
+
+   o  Constant N is the value of the OPTIONAL sprop-interleaving-depth
+      media type parameter (see Section 8.1) incremented by 1.
+
+   Initial buffering lasts until one of the following conditions is
+   fulfilled:
+
+   o  There are N or more VCL NAL units in the de-interleaving buffer.
+
+   o  If sprop-max-don-diff is present, don_diff(m,n) is greater than
+      the value of sprop-max-don-diff, in which n corresponds to the NAL
+      unit having the greatest value of AbsDON among the received NAL
+      units and m corresponds to the NAL unit having the smallest value
+      of AbsDON among the received NAL units.
+
+   o  Initial buffering has lasted for the duration equal to or greater
+      than the value of the OPTIONAL sprop-init-buf-time media type
+      parameter.
+
+   The NAL units to be removed from the de-interleaving buffer are
+   determined as follows:
+
+   o  If the de-interleaving buffer contains at least N VCL NAL units,
+      NAL units are removed from the de-interleaving buffer and passed
+      to the decoder in the order specified below until the buffer
+      contains N-1 VCL NAL units.
+
+   o  If sprop-max-don-diff is present, all NAL units m for which
+      don_diff(m,n) is greater than sprop-max-don-diff are removed from
+      the de-interleaving buffer and passed to the decoder in the order
+      specified below.  Herein, n corresponds to the NAL unit having the
+      greatest value of AbsDON among the NAL units in the de-
+      interleaving buffer.
+
+
+
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 37]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   The order in which NAL units are passed to the decoder is specified
+   as follows:
+
+   o  Let PDON be a variable that is initialized to 0 at the beginning
+      of the RTP session.
+
+   o  For each NAL unit associated with a value of DON, a DON distance
+      is calculated as follows.  If the value of DON of the NAL unit is
+      larger than the value of PDON, the DON distance is equal to DON -
+      PDON.  Otherwise, the DON distance is equal to 65535 - PDON + DON
+      + 1.
+
+   o  NAL units are delivered to the decoder in ascending order of DON
+      distance.  If several NAL units share the same value of DON
+      distance, they can be passed to the decoder in any order.
+
+   o  When a desired number of NAL units have been passed to the
+      decoder, the value of PDON is set to the value of DON for the last
+      NAL unit passed to the decoder.
+
+7.3.  Additional De-Packetization Guidelines
+
+   The following additional de-packetization rules may be used to
+   implement an operational H.264 de-packetizer:
+
+   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
+      coded slice data partitions A (DPAs).  If a lost DPA is detected,
+      after taking into account possible retransmission and FEC, a
+      gateway may decide not to send the corresponding coded slice data
+      partitions B and C, as their information is meaningless for H.264
+      decoders.  In this way, a MANE can reduce network load by
+      discarding useless packets without parsing a complex bitstream.
+
+   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
+      FUs.  If a lost FU is found, a gateway may decide not to send the
+      following FUs of the same fragmented NAL unit, as their
+      information is meaningless for H.264 decoders.  In this way, a
+      MANE can reduce network load by discarding useless packets without
+      parsing a complex bitstream.
+
+   o  Intelligent receivers having to discard packets or NALUs should
+      first discard all packets/NALUs in which the value of the NRI
+      field of the NAL unit type octet is equal to 0.  This will
+      minimize the impact on user experience and keep the reference
+      pictures intact.  If more packets have to be discarded, then
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 38]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      packets with a numerically lower NRI value should be discarded
+      before packets with a numerically higher NRI value.  However,
+      discarding any packets with an NRI bigger than 0 very likely leads
+      to decoder drift and SHOULD be avoided.
+
+8.  Payload Format Parameters
+
+   This section specifies the parameters that MAY be used to select
+   optional features of the payload format and certain features of the
+   bitstream.  The parameters are specified here as part of the media
+   subtype registration for the ITU-T H.264 | ISO/IEC 14496-10 codec.  A
+   mapping of the parameters into the Session Description Protocol (SDP)
+   [6] is also provided for applications that use SDP.  Equivalent
+   parameters could be defined elsewhere for use with control protocols
+   that do not use SDP.
+
+   Some parameters provide a receiver with the properties of the stream
+   that will be sent.  The names of all these parameters start with
+   "sprop" for stream properties.  Some of these "sprop" parameters are
+   limited by other payload or codec configuration parameters.  For
+   example, the sprop-parameter-sets parameter is constrained by the
+   profile-level-id parameter.
+
+8.1.  Media Type Registration
+
+   The media subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec has
+   been allocated from the IETF tree.
+
+   Media Type name:     video
+
+   Media subtype name:  H264
+
+   Required parameters: none
+
+   OPTIONAL parameters:
+
+      profile-level-id:
+         A base16 [7] (hexadecimal) representation of the following
+         three bytes in the sequence parameter set NAL unit is specified
+         in [1]: 1) profile_idc, 2) a byte herein referred to as
+         profile-iop, composed of the values of constraint_set0_flag,
+         constraint_set1_flag, constraint_set2_flag,
+         constraint_set3_flag, constraint_set4_flag,
+         constraint_set5_flag, and reserved_zero_2bits in bit-
+         significance order, starting from the most-significant bit, and
+         3) level_idc.  Note that reserved_zero_2bits is required to be
+         equal to 0 in [1], but other values for it may be specified in
+         the future by ITU-T or ISO/IEC.
+
+
+
+Wang, et al.                 Standards Track                   [Page 39]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         The profile-level-id parameter indicates the default sub-
+         profile (i.e., the subset of coding tools that may have been
+         used to generate the stream or that the receiver supports) and
+         the default level of the stream or the receiver supports.
+
+         The default sub-profile is indicated collectively by the
+         profile_idc byte and some fields in the profile-iop byte.
+         Depending on the values of the fields in the profile-iop byte,
+         the default sub-profile may be the set of coding tools
+         supported by one profile, or a common subset of coding tools of
+         multiple profiles, as specified in Section 7.4.2.1.1 of [1].
+         The default level is indicated by the level_idc byte, and, when
+         profile_idc is equal to 66, 77, or 88 (the Baseline, Main, or
+         Extended profile) and level_idc is equal to 11, additionally by
+         bit 4 (constraint_set3_flag) of the profile-iop byte.  When
+         profile_idc is equal to 66, 77, or 88 (the Baseline, Main, or
+         Extended profile), level_idc is equal to 11, and bit 4
+         (constraint_set3_flag) of the profile-iop byte is equal to 1,
+         the default level is Level 1b.
+
+         Table 5 lists all profiles defined in Annex A of [1] and, for
+         each of the profiles, the possible combinations of profile_idc
+         and profile-iop that represent the same sub-profile.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 40]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+            Table 5.  Combinations of profile_idc and profile-iop
+            representing the same sub-profile corresponding to the full
+            set of coding tools supported by one profile.  In the
+            following, x may be either 0 or 1, while the profile names
+            are indicated as follows.  CB: Constrained Baseline profile,
+            B: Baseline profile, M: Main profile, E: Extended profile,
+            H: High profile, H10: High 10 profile, H42: High 4:2:2
+            profile, H44: High 4:4:4 Predictive profile, H10I: High 10
+            Intra profile, H42I: High 4:2:2 Intra profile, H44I: High
+            4:4:4 Intra profile, and C44I: CAVLC 4:4:4 Intra profile.
+
+              Profile     profile_idc        profile-iop
+                          (hexadecimal)      (binary)
+
+              CB          42 (B)             x1xx0000
+                 same as: 4D (M)             1xxx0000
+                 same as: 58 (E)             11xx0000
+              B           42 (B)             x0xx0000
+                 same as: 58 (E)             10xx0000
+              M           4D (M)             0x0x0000
+              E           58                 00xx0000
+              H           64                 00000000
+              H10         6E                 00000000
+              H42         7A                 00000000
+              H44         F4                 00000000
+              H10I        6E                 00010000
+              H42I        7A                 00010000
+              H44I        F4                 00010000
+              C44I        2C                 00010000
+
+         For example, in the table above, profile_idc equal to 58
+         (Extended) with profile-iop equal to 11xx0000 indicates the
+         same sub-profile corresponding to profile_idc equal to 42
+         (Baseline) with profile-iop equal to x1xx0000.  Note that other
+         combinations of profile_idc and profile-iop (not listed in
+         Table 5) may represent a sub-profile equivalent to the common
+         subset of coding tools for more than one profile.  Note also
+         that a decoder conforming to a certain profile may be able to
+         decode bitstreams conforming to other profiles.
+
+         If the profile-level-id parameter is used to indicate
+         properties of a NAL unit stream, it indicates that, to decode
+         the stream, the minimum subset of coding tools a decoder has to
+         support is the default sub-profile, and the lowest level the
+         decoder has to support is the default level.
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 41]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         If the profile-level-id parameter is used for capability
+         exchange or session setup, it indicates the subset of coding
+         tools, which is equal to the default sub-profile, that the
+         codec supports for both receiving and sending.  If max-recv-
+         level is not present, the default level from profile-level-id
+         indicates the highest level the codec wishes to support.  If
+         max-recv-level is present, it indicates the highest level the
+         codec supports for receiving.  For either receiving or sending,
+         all levels that are lower than the highest level supported MUST
+         also be supported.
+
+            Informative note: Capability exchange and session setup
+            procedures should provide means to list the capabilities for
+            each supported sub-profile separately.  For example, the
+            one-of-N codec selection procedure of the SDP Offer/Answer
+            model can be used (Section 10.2 of [8]).  The one-of-N codec
+            selection procedure may also be used to provide different
+            combinations of profile_idc and profile-iop that represent
+            the same sub-profile.  When there are many different
+            combinations of profile_idc and profile-iop that represent
+            the same sub-profile, using the one-of-N codec selection
+            procedure may result in a fairly large SDP message.
+            Therefore, a receiver should understand the different
+            equivalent combinations of profile_idc and profile-iop that
+            represent the same sub-profile and be ready to accept an
+            offer using any of the equivalent combinations.
+
+         If no profile-level-id is present, the Baseline profile,
+         without additional constraints at Level 1, MUST be inferred.
+
+      max-recv-level:
+         This parameter MAY be used to indicate the highest level a
+         receiver supports when the highest level is higher than the
+         default level (the level indicated by profile-level-id).  The
+         value of max-recv-level is a base16 (hexadecimal)
+         representation of the two bytes after the syntax element
+         profile_idc in the sequence parameter set NAL unit specified in
+         [1]: profile-iop (as defined above) and level_idc.  If the
+         level_idc byte of max-recv-level is equal to 11 and bit 4 of
+         the profile-iop byte of max-recv-level is equal to 1 or if the
+         level_idc byte of max-recv-level is equal to 9 and bit 4 of the
+         profile-iop byte of max-recv-level is equal to 0, the highest
+         level the receiver supports is Level 1b.  Otherwise, the
+         highest level the receiver supports is equal to the level_idc
+         byte of max-recv-level divided by 10.
+
+         max-recv-level MUST NOT be present if the highest level the
+         receiver supports is not higher than the default level.
+
+
+
+Wang, et al.                 Standards Track                   [Page 42]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br:
+         These parameters MAY be used to signal the capabilities of a
+         receiver implementation.  These parameters MUST NOT be used for
+         any other purpose.  The highest level conveyed in the value of
+         the profile-level-id parameter or the max-recv-level parameter
+         MUST be such that the receiver is fully capable of supporting.
+         max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br MAY
+         be used to indicate capabilities of the receiver that extend
+         the required capabilities of the signaled highest level, as
+         specified below.
+
+         When more than one parameter from the set (max-mbps, max-smbps,
+         max-fs, max-cpb, max-dpb, max-br) is present, the receiver MUST
+         support all signaled capabilities simultaneously.  For example,
+         if both max-mbps and max-br are present, the signaled highest
+         level with the extension of both the frame rate and bitrate is
+         supported.  That is, the receiver is able to decode NAL unit
+         streams in which the macroblock processing rate is up to max-
+         mbps (inclusive), the bitrate is up to max-br (inclusive), the
+         coded picture buffer size is derived as specified in the
+         semantics of the max-br parameter below, and the other
+         properties comply with the highest level specified in the value
+         of the profile-level-id parameter or the max-recv-level
+         parameter.
+
+         If a receiver can support all the properties of Level A, the
+         highest level specified in the value of the profile-level-id
+         parameter or the max-recv-level parameter MUST be Level A
+         (i.e., MUST NOT be lower than Level A).  In other words, a
+         receiver MUST NOT signal values of max-mbps, max-fs, max-cpb,
+         max-dpb, and max-br that taken together meet the requirements
+         of a higher level compared to the highest level specified in
+         the value of the profile-level-id parameter or the max-recv-
+         level parameter.
+
+            Informative note: When the OPTIONAL media type parameters
+            are used to signal the properties of a NAL unit stream, max-
+            mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br are
+            not present, and the value of profile-level-id must always
+            be such that the NAL unit stream complies fully with the
+            specified profile and level.
+
+      max-mbps: The value of max-mbps is an integer indicating the
+         maximum macroblock processing rate in units of macroblocks per
+         second.  The max-mbps parameter signals that the receiver is
+         capable of decoding video at a higher rate than is required by
+         the signaled highest level conveyed in the value of the
+         profile-level-id parameter or the max-recv-level parameter.
+
+
+
+Wang, et al.                 Standards Track                   [Page 43]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         When max-mbps is signaled, the receiver MUST be able to decode
+         NAL unit streams that conform to the signaled highest level,
+         with the exception that the MaxMBPS value in Table A-1 of [1]
+         for the signaled highest level is replaced with the value of
+         max-mbps.  The value of max-mbps MUST be greater than or equal
+         to the value of MaxMBPS given in Table A-1 of [1] for the
+         highest level.  Senders MAY use this knowledge to send pictures
+         of a given size at a higher picture rate than is indicated in
+         the signaled highest level.
+
+      max-smbps: The value of max-smbps is an integer indicating the
+         maximum static macroblock processing rate in units of static
+         macroblocks per second, under the hypothetical assumption that
+         all macroblocks are static macroblocks.  When max-smbps is
+         signaled, the MaxMBPS value in Table A-1 of [1] should be
+         replaced with the result of the following computation:
+
+         o  If the parameter max-mbps is signaled, set a variable
+            MaxMacroblocksPerSecond to the value of max-mbps.
+            Otherwise, set MaxMacroblocksPerSecond equal to the value of
+            MaxMBPS in Table A-1 [1] for the signaled highest level
+            conveyed in the value of the profile-level-id parameter or
+            the max-recv-level parameter.
+
+         o  Set a variable P_non-static to the proportion of non-static
+            macroblocks in picture n.
+
+         o  Set a variable P_static to the proportion of static
+            macroblocks in picture n.
+
+         o  The value of MaxMBPS in Table A-1 of [1] should be
+            considered by the encoder to be equal to:
+
+            MaxMacroblocksPerSecond * max-smbps / (P_non-static *
+            max-smbps + P_static * MaxMacroblocksPerSecond)
+
+         The encoder should recompute this value for each picture.  The
+         value of max-smbps MUST be greater than or equal to the value
+         of MaxMBPS given explicitly as the value of the max-mbps
+         parameter or implicitly in Table A-1 of [1] for the signaled
+         highest level.  Senders MAY use this knowledge to send pictures
+         of a given size at a higher picture rate than is indicated in
+         the signaled highest level.
+
+      max-fs: The value of max-fs is an integer indicating the maximum
+         frame size in units of macroblocks.  The max-fs parameter
+         signals that the receiver is capable of decoding larger picture
+         sizes than are required by the signaled highest level conveyed
+
+
+
+Wang, et al.                 Standards Track                   [Page 44]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         in the value of the profile-level-id parameter or the max-recv-
+         level parameter.  When max-fs is signaled, the receiver MUST be
+         able to decode NAL unit streams that conform to the signaled
+         highest level, with the exception that the MaxFS value in Table
+         A-1 of [1] for the signaled highest level is replaced with the
+         value of max-fs.  The value of max-fs MUST be greater than or
+         equal to the value of MaxFS given in Table A-1 of [1] for the
+         highest level.  Senders MAY use this knowledge to send larger
+         pictures at a proportionally lower frame rate than is indicated
+         in the signaled highest level.
+
+      max-cpb: The value of max-cpb is an integer indicating the maximum
+         coded picture buffer size in units of 1000 bits for the VCL HRD
+         parameters and in units of 1200 bits for the NAL HRD
+         parameters.  Note that this parameter does not use units of
+         cpbBrVclFactor and cpbBrNALFactor (see Table A-1 of [1]).  The
+         max-cpb parameter signals that the receiver has more memory
+         than the minimum amount of coded picture buffer memory required
+         by the signaled highest level conveyed in the value of the
+         profile-level-id parameter or the max-recv-level parameter.
+         When max-cpb is signaled, the receiver MUST be able to decode
+         NAL unit streams that conform to the signaled highest level,
+         with the exception that the MaxCPB value in Table A-1 of [1]
+         for the signaled highest level is replaced with the value of
+         max-cpb (after taking cpbBrVclFactor and cpbBrNALFactor into
+         consideration when needed).  The value of max-cpb (after taking
+         cpbBrVclFactor and cpbBrNALFactor into consideration when
+         needed) MUST be greater than or equal to the value of MaxCPB
+         given in Table A-1 of [1] for the highest level.  Senders MAY
+         use this knowledge to construct coded video streams with
+         greater variation of bitrate than can be achieved with the
+         MaxCPB value in Table A-1 of [1].
+
+            Informative note: The coded picture buffer is used in the
+            hypothetical reference decoder (Annex C of H.264).  The use
+            of the hypothetical reference decoder is recommended in
+            H.264 encoders to verify that the produced bitstream
+            conforms to the standard and to control the output bitrate.
+            Thus, the coded picture buffer is conceptually independent
+            of any other potential buffers in the receiver, including
+            de-interleaving and de-jitter buffers.  The coded picture
+            buffer need not be implemented in decoders as specified in
+            Annex C of H.264, but rather standard-compliant decoders can
+            have any buffering arrangements provided that they can
+            decode standard-compliant bitstreams.  Thus, in practice,
+            the input buffer for a video decoder can be integrated with
+            de-interleaving and de-jitter buffers of the receiver.
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 45]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      max-dpb: The value of max-dpb is an integer indicating the maximum
+         decoded picture buffer size in units of 8/3 macroblocks.  The
+         max-dpb parameter signals that the receiver has more memory
+         than the minimum amount of decoded picture buffer memory
+         required by the signaled highest level conveyed in the value of
+         the profile-level-id parameter or the max-recv-level parameter.
+         When max-dpb is signaled, the receiver MUST be able to decode
+         NAL unit streams that conform to the signaled highest level,
+         with the exception that the MaxDpbMbs value in Table A-1 of [1]
+         for the signaled highest level is replaced with the value of
+         max-dpb * 3 / 8.  Consequently, a receiver that signals max-dpb
+         MUST be capable of storing the following number of decoded
+         frames, complementary field pairs, and non-paired fields in its
+         decoded picture buffer:
+
+            Min(max-dpb * 3 / 8 / ( PicWidthInMbs * FrameHeightInMbs),
+            16)
+
+         Wherein PicWidthInMbs and FrameHeightInMbs are defined in [1].
+
+         The value of max-dpb MUST be greater than or equal to the value
+         of MaxDpbMbs * 3 / 8, wherein the value of MaxDpbMbs is given
+         in Table A-1 of [1] for the highest level.  Senders MAY use
+         this knowledge to construct coded video streams with improved
+         compression.
+
+            Informative note: This parameter was added primarily to
+            complement a similar codepoint in the ITU-T Recommendation
+            H.245, so as to facilitate signaling gateway designs.  The
+            decoded picture buffer stores reconstructed samples.  There
+            is no relationship between the size of the decoded picture
+            buffer and the buffers used in RTP, especially
+            de-interleaving and de-jitter buffers.
+
+            Informative note: In RFC 3984, which this document
+            obsoletes, the unit of this parameter was 1024 bytes.  The
+            unit has been changed to 8/3 macroblocks in this document.
+            The reason for this change was due to the changes from the
+            2003 version of the H.264 specification referenced by RFC
+            3984 to the 2010 version of the H.264 specification
+            referenced by this document, particularly the changes to
+            Table A-1 in the H.264 specification due to addition of
+            color formats and bit depths not supported earlier.  The
+            changed semantics of this parameter keeps backward
+            compatibility to RFC 3984 and supports all profiles defined
+            in the 2010 version of the H.264 specification.
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 46]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      max-br: The value of max-br is an integer indicating the maximum
+         video bitrate in units of 1000 bits per second for the VCL HRD
+         parameters and in units of 1200 bits per second for the NAL HRD
+         parameters.  Note that this parameter does not use units of
+         cpbBrVclFactor and cpbBrNALFactor (see Table A-1 of [1]).
+
+         The max-br parameter signals that the video decoder of the
+         receiver is capable of decoding video at a higher bitrate than
+         is required by the signaled highest level conveyed in the value
+         of the profile-level-id parameter or the max-recv-level
+         parameter.
+
+         When max-br is signaled, the video codec of the receiver MUST
+         be able to decode NAL unit streams that conform to the signaled
+         highest level, with the following exceptions in the limits
+         specified by the highest level:
+
+         o  The value of max-br (after taking cpbBrVclFactor and
+            cpbBrNALFactor into consideration when needed) replaces the
+            MaxBR value in Table A-1 of [1] for the highest level.
+
+         o  When the max-cpb parameter is not present, the result of the
+            following formula replaces the value of MaxCPB in Table A-1
+            of [1]: (MaxCPB of the signaled level) * max-br / (MaxBR of
+            the signaled highest level).
+
+         For example, if a receiver signals capability for Main profile
+         Level 1.2 with max-br equal to 1550, this indicates a maximum
+         video bitrate of 1550 kbits/sec for VCL HRD parameters, a
+         maximum video bitrate of 1860 kbits/sec for NAL HRD parameters,
+         and a CPB size of 4036458 bits (1550000 / 384000 * 1000 *
+         1000).
+
+         The value of max-br (after taking cpbBrVclFactor and
+         cpbBrNALFactor into consideration when needed) MUST be greater
+         than or equal to the value MaxBR given in Table A-1 of [1] for
+         the signaled highest level.
+
+         Senders MAY use this knowledge to send higher bitrate video as
+         allowed in the level definition of Annex A of H.264 to achieve
+         improved video quality.
+
+            Informative note: This parameter was added primarily to
+            complement a similar codepoint in the ITU-T Recommendation
+            H.245, so as to facilitate signaling gateway designs.  The
+            assumption that the network is capable of handling such
+            bitrates at any given time cannot be made from the value of
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 47]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+            this parameter.  In particular, no conclusion can be drawn
+            that the signaled bitrate is possible under congestion
+            control constraints.
+
+      redundant-pic-cap:
+         This parameter signals the capabilities of a receiver
+         implementation.  When equal to 0, the parameter indicates that
+         the receiver makes no attempt to use redundant coded pictures
+         to correct incorrectly decoded primary coded pictures.  When
+         equal to 0, the receiver is not capable of using redundant
+         slices; therefore, a sender SHOULD avoid sending redundant
+         slices to save bandwidth.  When equal to 1, the receiver is
+         capable of decoding any such redundant slice that covers a
+         corrupted area in a primary decoded picture (at least partly),
+         and therefore a sender MAY send redundant slices.  When the
+         parameter is not present, a value of 0 MUST be used for
+         redundant-pic-cap.  When present, the value of redundant-pic-
+         cap MUST be either 0 or 1.
+
+         When the profile-level-id parameter is present in the same
+         signaling as the redundant-pic-cap parameter and the profile
+         indicated in profile-level-id is such that it disallows the use
+         of redundant coded pictures (e.g., Main profile), the value of
+         redundant-pic-cap MUST be equal to 0.  When a receiver
+         indicates redundant-pic-cap equal to 0, the received stream
+         SHOULD NOT contain redundant coded pictures.
+
+            Informative note: Even if redundant-pic-cap is equal to 0,
+            the decoder is able to ignore redundant codec pictures
+            provided that the decoder supports a profile (Baseline,
+            Extended) in which redundant coded pictures are allowed.
+
+            Informative note: Even if redundant-pic-cap is equal to 1,
+            the receiver may also choose other error concealment
+            strategies to replace or complement decoding of redundant
+            slices.
+
+      sprop-parameter-sets:
+         This parameter MAY be used to convey any sequence and picture
+         parameter set NAL units (herein referred to as the initial
+         parameter set NAL units) that can be placed in the NAL unit
+         stream to precede any other NAL units in decoding order.  The
+         parameter MUST NOT be used to indicate codec capability in any
+         capability exchange procedure.  The value of the parameter is a
+         comma-separated (',') list of base64 [7] representations of
+         parameter set NAL units as specified in Sections 7.3.2.1 and
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 48]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         7.3.2.2 of [1].  Note that the number of bytes in a parameter
+         set NAL unit is typically less than 10, but a picture parameter
+         set NAL unit can contain several hundred bytes.
+
+            Informative note: When several payload types are offered in
+            the SDP Offer/Answer model, each with its own sprop-
+            parameter-sets parameter, the receiver cannot assume that
+            those parameter sets do not use conflicting storage
+            locations (i.e., identical values of parameter set
+            identifiers).  Therefore, a receiver should buffer all
+            sprop-parameter-sets and make them available to the decoder
+            instance that decodes a certain payload type.
+
+         The sprop-parameter-sets parameter MUST only contain parameter
+         sets that are conforming to the profile-level-id, i.e., the
+         subset of coding tools indicated by any of the parameter sets
+         MUST be equal to the default sub-profile, and the level
+         indicated by any of the parameter sets MUST be equal to the
+         default level.
+
+      sprop-level-parameter-sets:
+         This parameter MAY be used to convey any sequence and picture
+         parameter set NAL units (herein referred to as the initial
+         parameter set NAL units) that can be placed in the NAL unit
+         stream to precede any other NAL units in decoding order and
+         that are associated with one or more levels different than the
+         default level.  The parameter MUST NOT be used to indicate
+         codec capability in any capability exchange procedure.
+
+         The sprop-level-parameter-sets parameter contains parameter
+         sets for one or more levels that are different than the default
+         level.  All parameter sets associated with one level are
+         clustered and prefixed with a three-byte field that has the
+         same syntax as profile-level-id.  This enables the receiver to
+         install the parameter sets for one level and discard the rest.
+         The three-byte field is named PLId, and all parameter sets
+         associated with one level are named PSL, which has the same
+         syntax as sprop-parameter-sets.  Parameter sets for each level
+         are represented in the form of PLId:PSL, i.e., PLId followed by
+         a colon (':') and the base64 [7] representation of the initial
+         parameter set NAL units for the level.  Each pair of PLId:PSLs
+         is also separated by a colon.  Note that a PSL can contain
+         multiple parameter sets for that level, separated with commas
+         (',').
+
+         The subset of coding tools indicated by each PLId field MUST be
+         equal to the default sub-profile, and the level indicated by
+         each PLId field MUST be different than the default level.  All
+
+
+
+Wang, et al.                 Standards Track                   [Page 49]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         sequence parameter sets contained in each PSL MUST have the
+         three bytes from profile_idc to level_idc, inclusive, equal to
+         the preceding PLId.
+
+            Informative note: This parameter allows for efficient level
+            downgrade or upgrade in SDP Offer/Answer and out-of-band
+            transport of parameter sets simultaneously.
+
+      use-level-src-parameter-sets:
+         This parameter MAY be used to indicate a receiver capability.
+         The value MAY be equal to either 0 or 1.  When the parameter is
+         not present, the value MUST be inferred to be equal to 0.  The
+         value 0 indicates that the receiver does not understand the
+         sprop-level-parameter-sets parameter, does not understand the
+         "fmtp" source attribute as specified in Section 6.3 of [9],
+         will ignore sprop-level-parameter-sets when present, and will
+         ignore sprop-parameter-sets when conveyed using the "fmtp"
+         source attribute.  The value 1 indicates that the receiver
+         understands the sprop-level-parameter-sets parameter,
+         understands the "fmtp" source attribute as specified in Section
+         6.3 of [9], and is capable of using parameter sets contained in
+         the sprop-level-parameter-sets or contained in the sprop-
+         parameter-sets that is conveyed using the "fmtp" source
+         attribute.
+
+            Informative note: An RFC 3984 receiver does not understand
+            sprop-level-parameter-sets, use-level-src-parameter-sets, or
+            the "fmtp" source attribute as specified in Section 6.3 of
+            [9].  Therefore, during SDP Offer/Answer, an RFC 3984
+            receiver as the answerer will simply ignore sprop-level-
+            parameter-sets when present in an offer and sprop-parameter-
+            sets conveyed using the "fmtp" source attribute, as
+            specified in Section 6.3 of [9].  Assume that the offered
+            payload type was accepted at a level lower than the default
+            level.  If the offered payload type included sprop-level-
+            parameter-sets or included sprop-parameter-sets conveyed
+            using the "fmtp" source attribute and if the offerer sees
+            that the answerer has not included use-level-src-parameter-
+            sets equal to 1 in the answer, the offerer knows that
+            in-band transport of parameter sets is needed.
+
+      in-band-parameter-sets:
+         This parameter MAY be used to indicate a receiver capability.
+         The value MAY be equal to either 0 or 1.  The value 1 indicates
+         that the receiver discards out-of-band parameter sets in sprop-
+         parameter-sets and sprop-level-parameter-sets; therefore, the
+         sender MUST transmit all parameter sets in-band.  The value 0
+         indicates that the receiver utilizes out-of-band parameter sets
+
+
+
+Wang, et al.                 Standards Track                   [Page 50]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         included in sprop-parameter-sets and/or sprop-level-parameter-
+         sets.  However, in this case, the sender MAY still choose to
+         send parameter sets in-band.  When in-band-parameter-sets is
+         equal to 1, use-level-src-parameter-sets MUST NOT be present or
+         MUST be equal to 0.  When the parameter is not present, this
+         receiver capability is not specified, and therefore the sender
+         MAY send out-of-band parameter sets only, it MAY send in-band-
+         parameter-sets only, or it MAY send both.
+
+      level-asymmetry-allowed:
+         This parameter MAY be used in SDP Offer/Answer to indicate
+         whether level asymmetry, i.e., sending media encoded at a
+         different level in the offerer-to-answerer direction than the
+         level in the answerer-to-offerer direction, is allowed.  The
+         value MAY be equal to either 0 or 1.  When the parameter is not
+         present, the value MUST be inferred to be equal to 0.  The
+         value 1 in both the offer and the answer indicates that level
+         asymmetry is allowed.  The value of 0 in either the offer or
+         the answer indicates that level asymmetry is not allowed.
+
+         If level-asymmetry-allowed is equal to 0 (or not present) in
+         either the offer or the answer, level asymmetry is not allowed.
+         In this case, the level to use in the direction from the
+         offerer to the answerer MUST be the same as the level to use in
+         the opposite direction.
+
+      packetization-mode:
+         This parameter signals the properties of an RTP payload type or
+         the capabilities of a receiver implementation.  Only a single
+         configuration point can be indicated; thus, when capabilities
+         to support more than one packetization-mode are declared,
+         multiple configuration points (RTP payload types) must be used.
+
+         When the value of packetization-mode is equal to 0 or
+         packetization-mode is not present, the single NAL mode MUST be
+         used.  This mode is in use in standards using ITU-T
+         Recommendation H.241 [3] (see Section 12.1).  When the value of
+         packetization-mode is equal to 1, the non-interleaved mode MUST
+         be used.  When the value of packetization-mode is equal to 2,
+         the interleaved mode MUST be used.  The value of packetization-
+         mode MUST be an integer in the range of 0 to 2, inclusive.
+
+      sprop-interleaving-depth:
+         This parameter MUST NOT be present when packetization-mode is
+         not present or the value of packetization-mode is equal to 0 or
+         1.  This parameter MUST be present when the value of
+         packetization-mode is equal to 2.
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 51]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         This parameter signals the properties of an RTP packet stream.
+         It specifies the maximum number of VCL NAL units that precede
+         any VCL NAL unit in the RTP packet stream in transmission order
+         and that follow the VCL NAL unit in decoding order.
+         Consequently, it is guaranteed that receivers can reconstruct
+         NAL unit decoding order when the buffer size for NAL unit
+         decoding order recovery is at least the value of sprop-
+         interleaving-depth + 1 in terms of VCL NAL units.
+
+         The value of sprop-interleaving-depth MUST be an integer in the
+         range of 0 to 32767, inclusive.
+
+      sprop-deint-buf-req:
+         This parameter MUST NOT be present when packetization-mode is
+         not present or the value of packetization-mode is equal to 0 or
+         1.  It MUST be present when the value of packetization-mode is
+         equal to 2.
+
+         sprop-deint-buf-req signals the required size of the
+         de-interleaving buffer for the RTP packet stream.  The value of
+         the parameter MUST be greater than or equal to the maximum
+         buffer occupancy (in units of bytes) required in such a
+         de-interleaving buffer that is specified in Section 7.2.  It is
+         guaranteed that receivers can perform the de-interleaving of
+         interleaved NAL units into NAL unit decoding order, when the
+         de-interleaving buffer size is at least the value of sprop-
+         deint-buf-req in terms of bytes.
+
+         The value of sprop-deint-buf-req MUST be an integer in the
+         range of 0 to 4294967295, inclusive.
+
+            Informative note: sprop-deint-buf-req indicates the required
+            size of the de-interleaving buffer only.  When network
+            jitter can occur, an appropriately sized jitter buffer has
+            to be provisioned for as well.
+
+      deint-buf-cap:
+         This parameter signals the capabilities of a receiver
+         implementation and indicates the amount of de-interleaving
+         buffer space in units of bytes that the receiver has available
+         for reconstructing the NAL unit decoding order.  A receiver is
+         able to handle any stream for which the value of the sprop-
+         deint-buf-req parameter is smaller than or equal to this
+         parameter.
+
+         If the parameter is not present, then a value of 0 MUST be used
+         for deint-buf-cap.  The value of deint-buf-cap MUST be an
+         integer in the range of 0 to 4294967295, inclusive.
+
+
+
+Wang, et al.                 Standards Track                   [Page 52]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+            Informative note: deint-buf-cap indicates the maximum
+            possible size of the de-interleaving buffer of the receiver
+            only.  When network jitter can occur, an appropriately sized
+            jitter buffer has to be provisioned for as well.
+
+      sprop-init-buf-time:
+         This parameter MAY be used to signal the properties of an RTP
+         packet stream.  The parameter MUST NOT be present if the value
+         of packetization-mode is equal to 0 or 1.
+
+         The parameter signals the initial buffering time that a
+         receiver MUST wait before starting decoding to recover the NAL
+         unit decoding order from the transmission order.  The parameter
+         is the maximum value of (decoding time of the NAL unit -
+         transmission time of a NAL unit), assuming reliable and
+         instantaneous transmission, the same timeline for transmission
+         and decoding, and commencement of decoding when the first
+         packet arrives.
+
+         An example of specifying the value of sprop-init-buf-time
+         follows.  A NAL unit stream is sent in the following
+         interleaved order, in which the value corresponds to the
+         decoding time and the transmission order is from left to right:
+
+               0  2  1  3  5  4  6  8  7 ...
+
+         Assuming a steady transmission rate of NAL units, the
+         transmission times are:
+
+               0  1  2  3  4  5  6  7  8 ...
+
+         Subtracting the decoding time from the transmission time
+         column-wise results in the following series:
+
+               0 -1  1  0 -1  1  0 -1  1 ...
+
+         Thus, in terms of intervals of NAL unit transmission times, the
+         value of sprop-init-buf-time in this example is 1.  The
+         parameter is coded as a non-negative base10 integer
+         representation in clock ticks of a 90-kHz clock.  If the
+         parameter is not present, then no initial buffering time value
+         is defined.  Otherwise, the value of sprop-init-buf-time MUST
+         be an integer in the range of 0 to 4294967295, inclusive.
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 53]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         In addition to the signaled sprop-init-buf-time, receivers
+         SHOULD take into account the transmission delay jitter
+         buffering, including buffering for the delay jitter caused by
+         mixers, translators, gateways, proxies, traffic-shapers, and
+         other network elements.
+
+      sprop-max-don-diff:
+         This parameter MAY be used to signal the properties of an RTP
+         packet stream.  It MUST NOT be used to signal transmitter,
+         receiver, or codec capabilities.  The parameter MUST NOT be
+         present if the value of packetization-mode is equal to 0 or 1.
+         sprop-max-don-diff is an integer in the range of 0 to 32767,
+         inclusive.  If sprop-max-don-diff is not present, the value of
+         the parameter is unspecified.  sprop-max-don-diff is calculated
+         as follows:
+
+            sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)},
+            for any i and any j>i,
+
+         where i and j indicate the index of the NAL unit in the
+         transmission order and AbsDON denotes a decoding order number
+         of the NAL unit that does not wrap around to 0 after 65535.  In
+         other words, AbsDON is calculated as follows: let m and n be
+         consecutive NAL units in transmission order.  For the very
+         first NAL unit in transmission order (whose index is 0),
+         AbsDON(0) = DON(0).  For other NAL units, AbsDON is calculated
+         as follows:
+
+            If DON(m) == DON(n), AbsDON(n) = AbsDON(m)
+
+            If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
+              AbsDON(n) = AbsDON(m) + DON(n) - DON(m)
+
+            If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
+              AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)
+
+            If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
+              AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n))
+
+            If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
+              AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))
+
+         where DON(i) is the decoding order number of the NAL unit
+         having index i in the transmission order.  The decoding order
+         number is specified in Section 5.5.
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 54]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+            Informative note: Receivers may use sprop-max-don-diff to
+            trigger which NAL units in the receiver buffer can be passed
+            to the decoder.
+
+      max-rcmd-nalu-size:
+         This parameter MAY be used to signal the capabilities of a
+         receiver.  The parameter MUST NOT be used for any other
+         purposes.  The value of the parameter indicates the largest
+         NALU size in bytes that the receiver can handle efficiently.
+         The parameter value is a recommendation, not a strict upper
+         boundary.  The sender MAY create larger NALUs but must be aware
+         that the handling of these may come at a higher cost than NALUs
+         conforming to the limitation.
+
+         The value of max-rcmd-nalu-size MUST be an integer in the range
+         of 0 to 4294967295, inclusive.  If this parameter is not
+         specified, no known limitation to the NALU size exists.
+         Senders still have to consider the MTU size available between
+         the sender and the receiver and SHOULD run MTU discovery for
+         this purpose.
+
+         This parameter is motivated by, for example, an IP to H.223
+         video telephony gateway, where NALUs smaller than the H.223
+         transport data unit will be more efficient.  A gateway may
+         terminate IP; thus, MTU discovery will normally not work beyond
+         the gateway.
+
+            Informative note: Setting this parameter to a lower than
+            necessary value may have a negative impact.
+
+      sar-understood:
+         This parameter MAY be used to indicate a receiver capability
+         and nothing else.  The parameter indicates the maximum value of
+         aspect_ratio_idc (specified in [1]) smaller than 255 that the
+         receiver understands.  Table E-1 of [1] specifies
+         aspect_ratio_idc equal to 0 as "unspecified"; 1 to 16,
+         inclusive, as specific Sample Aspect Ratios (SARs); 17 to 254,
+         inclusive, as "reserved"; and 255 as the Extended SAR, for
+         which SAR width and SAR height are explicitly signaled.
+         Therefore, a receiver with a decoder according to [1]
+         understands aspect_ratio_idc in the range of 1 to 16,
+         inclusive, and aspect_ratio_idc equal to 255, in the sense that
+         the receiver knows exactly what the SAR is.  For such a
+         receiver, the value of sar-understood is 16.  In the future, if
+         Table E-1 of [1] is extended, e.g., such that the SAR for
+         aspect_ratio_idc equal to 17 is specified, then for a receiver
+         with a decoder that understands the extension, the value of
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 55]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         sar-understood is 17.  For a receiver with a decoder according
+         to the 2003 version of [1], the value of sar-understood is 13,
+         as the minimum reserved aspect_ratio_idc therein is 14.
+
+         When sar-understood is not present, the value MUST be inferred
+         to be equal to 13.
+
+      sar-supported:
+         This parameter MAY be used to indicate a receiver capability
+         and nothing else.  The value of this parameter is an integer in
+         the range of 1 to sar-understood, inclusive, equal to 255.  The
+         value of sar-supported equal to N smaller than 255 indicates
+         that the receiver supports all the SARs corresponding to H.264
+         aspect_ratio_idc values (see Table E-1 of [1]) in the range
+         from 1 to N, inclusive, without geometric distortion.  The
+         value of sar-supported equal to 255 indicates that the receiver
+         supports all sample aspect ratios that are expressible using
+         two 16-bit integer values as the numerator and denominator,
+         i.e., those that are expressible using the H.264
+         aspect_ratio_idc value of 255 (Extended_SAR, see Table E-1 of
+         [1]), without geometric distortion.
+
+         H.264-compliant encoders SHOULD NOT send an aspect_ratio_idc
+         equal to 0 or an aspect_ratio_idc larger than sar-understood
+         and smaller than 255.  H.264-compliant encoders SHOULD send an
+         aspect_ratio_idc that the receiver is able to display without
+         geometrical distortion.  However, H.264-compliant encoders MAY
+         choose to send pictures using any SAR.
+
+         Note that the actual sample aspect ratio or extended sample
+         aspect ratio, when present, of the stream is conveyed in the
+         Video Usability Information (VUI) part of the sequence
+         parameter set.
+
+      Encoding considerations:
+         This type is only defined for transfer via RTP (RFC 3550).
+
+      Security considerations:
+         See Section 9 of RFC 6184.
+
+      Public specification:
+         Please refer to RFC 6184 and its Section 17.
+
+      Additional information:
+         None
+
+      File extensions:  none
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 56]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      Macintosh file type code:  none
+
+      Object identifier or OID:  none
+
+      Person & email address to contact for further information:
+         Ye-Kui Wang, yekui.wang@huawei.com
+
+      Intended usage:  COMMON
+
+      Author:
+         Ye-Kui Wang, yekui.wang@huawei.com
+
+      Change controller:
+         IETF Audio/Video Transport working group delegated from the
+         IESG.
+
+8.2.  SDP Parameters
+
+   The receiver MUST ignore any parameter unspecified in this memo.
+
+8.2.1.  Mapping of Payload Type Parameters to SDP
+
+   The media type video/H264 string is mapped to fields in the Session
+   Description Protocol (SDP) [6] as follows:
+
+   o  The media name in the "m=" line of SDP MUST be video.
+
+   o  The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the
+      media subtype).
+
+   o  The clock rate in the "a=rtpmap" line MUST be 90000.
+
+   o  The OPTIONAL parameters profile-level-id, max-recv-level, max-
+      mbps, max-smbps, max-fs, max-cpb, max-dpb, max-br, redundant-pic-
+      cap, use-level-src-parameter-sets, in-band-parameter-sets, level-
+      asymmetry-allowed, packetization-mode, sprop-interleaving-depth,
+      sprop-deint-buf-req, deint-buf-cap, sprop-init-buf-time, sprop-
+      max-don-diff, max-rcmd-nalu-size, sar-understood, and sar-
+      supported, when present, MUST be included in the "a=fmtp" line of
+      SDP.  These parameters are expressed as a media type string, in
+      the form of a semicolon-separated list of parameter=value pairs.
+
+   o  The OPTIONAL parameters sprop-parameter-sets and sprop-level-
+      parameter-sets, when present, MUST be included in the "a=fmtp"
+      line of SDP or conveyed using the "fmtp" source attribute as
+      specified in Section 6.3 of [9].  For a particular media format
+      (i.e., RTP payload type), a sprop-parameter-sets or sprop-level-
+      parameter-sets MUST NOT be both included in the "a=fmtp" line of
+
+
+
+Wang, et al.                 Standards Track                   [Page 57]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      SDP and conveyed using the "fmtp" source attribute.  When included
+      in the "a=fmtp" line of SDP, these parameters are expressed as a
+      media type string, in the form of a semicolon-separated list of
+      parameter=value pairs.  When conveyed using the "fmtp" source
+      attribute, these parameters are only associated with the given
+      source and payload type as parts of the "fmtp" source attribute.
+
+         Informative note: Conveyance of sprop-parameter-sets and sprop-
+         level-parameter-sets using the "fmtp" source attribute allows
+         for out-of-band transport of parameter sets in topologies like
+         Topo-Video-switch-MCU [29].
+
+   An example of media representation in SDP is as follows (Baseline
+   profile, Level 3.0, some of the constraints of the Main profile may
+   not be obeyed):
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E;
+                packetization-mode=1;
+                sprop-parameter-sets=<parameter sets data>
+
+8.2.2.  Usage with the SDP Offer/Answer Model
+
+   When H.264 is offered over RTP using SDP in an Offer/Answer model [8]
+   for negotiation for unicast usage, the following limitations and
+   rules apply:
+
+   o  The parameters identifying a media format configuration for H.264
+      are profile-level-id and packetization-mode.  These media format
+      configuration parameters (except for the level part of profile-
+      level-id) MUST be used symmetrically; that is, the answerer MUST
+      either maintain all configuration parameters or remove the media
+      format (payload type) completely if one or more of the parameter
+      values are not supported.  Note that the level part of profile-
+      level-id includes level_idc, and, for indication of Level 1b when
+      profile_idc is equal to 66, 77, or 88, bit 4
+      (constraint_set3_flag) of profile-iop.  The level part of profile-
+      level-id is changeable.
+
+         Informative note: The requirement for symmetric use does not
+         apply for the level part of profile-level-id and does not apply
+         for the other stream properties and capability parameters.
+
+         Informative note: In H.264 [1], all the levels except for Level
+         1b are equal to the value of level_idc divided by 10.  Level 1b
+         is a level higher than Level 1.0 but lower than Level 1.1 and
+         is signaled in an ad hoc manner, because the level was
+
+
+
+Wang, et al.                 Standards Track                   [Page 58]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+         specified after Level 1.0 and Level 1.1.  For the Baseline,
+         Main, and Extended profiles (with profile_idc equal to 66, 77,
+         and 88, respectively), Level 1b is indicated by level_idc equal
+         to 11 (i.e., same as Level 1.1) and constraint_set3_flag equal
+         to 1.  For other profiles, Level 1b is indicated by level_idc
+         equal to 9 (but note that Level 1b for these profiles are still
+         higher than Level 1, which has level_idc equal to 10 and lower
+         than Level 1.1).  In SDP Offer/Answer, an answer to an offer
+         may indicate a level equal to or lower than the level indicated
+         in the offer.  Due to the ad hoc indication of Level 1b,
+         offerers and answerers must check the value of bit 4
+         (constraint_set3_flag) of the middle octet of the parameter
+         profile-level-id, when profile_idc is equal to 66, 77, or 88
+         and level_idc is equal to 11.
+
+      To simplify the handling and matching of these configurations, the
+      same RTP payload type number used in the offer SHOULD also be used
+      in the answer, as specified in [8].  An answer MUST NOT contain
+      the payload type number used in the offer unless the configuration
+      is exactly the same as in the offer.
+
+         Informative note: When an offerer receives an answer, it has to
+         compare payload types not declared in the offer based on the
+         media type (i.e., video/H264) and the above media configuration
+         parameters with any payload types it has already declared.
+         This will enable it to determine whether the configuration in
+         question is new or if it is equivalent to configuration already
+         offered, since a different payload type number may be used in
+         the answer.
+
+   o  When present, the parameter max-recv-level declares the highest
+      level supported for receiving.  In case max-recv-level is not
+      present, the highest level supported for receiving is equal to the
+      default level indicated by the level part of profile-level-id.
+      When present, max-recv-level MUST be higher than the default
+      level.
+
+   o  The parameter level-asymmetry-allowed indicates whether level
+      asymmetry is allowed.
+
+      If level-asymmetry-allowed is equal to 0 (or not present) in
+      either the offer or the answer, level asymmetry is not allowed.
+      In this case, the level to use in the direction from the offerer
+      to the answerer MUST be the same as the level to use in the
+      opposite direction, and the common level to use is equal to the
+      lower value of the default level in the offer and the default
+      level in the answer.
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 59]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      Otherwise, level-asymmetry-allowed equals 1 in both the offer and
+      the answer, and level asymmetry is allowed.  In this case, the
+      level to use in the offerer-to-answerer direction MUST be equal to
+      the highest level the answerer supports for receiving, and the
+      level to use in the answerer-to-offerer direction MUST be equal to
+      the highest level the offerer supports for receiving.
+
+      When level asymmetry is not allowed, level upgrade is not allowed,
+      i.e., the default level in the answer MUST be equal to or lower
+      than the default level in the offer.
+
+   o  The parameters sprop-deint-buf-req, sprop-interleaving-depth,
+      sprop-max-don-diff, and sprop-init-buf-time describe the
+      properties of the RTP packet stream that the offerer or answerer
+      is sending for the media format configuration.  This differs from
+      the normal usage of the Offer/Answer parameters: normally such
+      parameters declare the properties of the stream that the offerer
+      or the answerer is able to receive.  When dealing with H.264, the
+      offerer assumes that the answerer will be able to receive media
+      encoded using the configuration being offered.
+
+         Informative note: The above parameters apply for any stream
+         sent by a declaring entity with the same configuration; i.e.,
+         they are dependent on their source.  Rather than being bound to
+         the payload type, the values may have to be applied to another
+         payload type when being sent, as they apply for the
+         configuration.
+
+   o  The capability parameters max-mbps, max-smbps, max-fs, max-cpb,
+      max-dpb, max-br, redundant-pic-cap, max-rcmd-nalu-size, sar-
+      understood, and sar-supported MAY be used to declare further
+      capabilities of the offerer or answerer for receiving.  These
+      parameters MUST NOT be present when the direction attribute is
+      "sendonly" and when the parameters describe the limitations of
+      what the offerer or answerer accepts for receiving streams.
+
+   o  An offerer has to include the size of the de-interleaving buffer,
+      sprop-deint-buf-req, in the offer for an interleaved H.264 stream.
+      To enable the offerer and answerer to inform each other about
+      their capabilities for de-interleaving buffering in receiving
+      streams, both parties are RECOMMENDED to include deint-buf-cap.
+      For interleaved streams, it is also RECOMMENDED to consider
+      offering multiple payload types with different buffering
+      requirements when the capabilities of the receiver are unknown.
+
+   o  The sprop-parameter-sets or sprop-level-parameter-sets parameter,
+      when present (included in the "a=fmtp" line of SDP or conveyed
+      using the "fmtp" source attribute as specified in Section 6.3 of
+
+
+
+Wang, et al.                 Standards Track                   [Page 60]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      [9]), is used for out-of-band transport of parameter sets.
+      However, when out-of-band transport of parameter sets is used,
+      parameter sets MAY still be additionally transported in-band.
+
+      The answerer MAY use either out-of-band or in-band transport of
+      parameter sets for the stream it is sending, regardless of whether
+      out-of-band parameter sets transport has been used in the offerer-
+      to-answerer direction.  Parameter sets included in an answer are
+      independent of those parameter sets included in the offer, as they
+      are used for decoding two different video streams, one from the
+      answerer to the offerer and the other in the opposite direction.
+
+      The following rules apply to transport of parameter sets in the
+      offerer-to-answerer direction.
+
+         o  An offer MAY include either or both of sprop-parameter-sets
+            and sprop-level-parameter-sets.  If neither sprop-parameter-
+            sets nor sprop-level-parameter-sets is present in the offer,
+            then only in-band transport of parameter sets is used.
+
+         o  If the answer includes in-band-parameter-sets equal to 1,
+            then the offerer MUST transmit parameter sets in-band.
+            Otherwise, the following applies.
+
+               o  If the level to use in the offerer-to-answerer
+                  direction is equal to the default level in the offer,
+                  the following applies.
+
+                     When there is a sprop-parameter-sets included in
+                     the "a=fmtp" line in the offer, the answerer MUST
+                     be prepared to use the parameter sets included in
+                     the sprop-parameter-sets for decoding the incoming
+                     NAL unit stream.
+
+                     When there is a sprop-parameter-sets conveyed using
+                     the "fmtp" source attribute in the offer, the
+                     following applies.  If the answer includes use-
+                     level-src-parameter-sets equal to 1 or the "fmtp"
+                     source attribute, the answerer MUST be prepared to
+                     use the parameter sets included in the sprop-
+                     parameter-sets for decoding the incoming NAL unit
+                     stream;  otherwise, the offerer MUST transmit
+                     parameter sets in-band.
+
+                     When sprop-parameter-sets is not present in the
+                     offer, the offerer MUST transmit parameter sets in-
+                     band.
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 61]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+                     The answerer MUST ignore sprop-level-parameter-
+                     sets, when present (either included in the "a=fmtp"
+                     line or conveyed using the "fmtp" source attribute)
+                     in the offer.
+
+               o  Otherwise, the level to use in the offerer-to-answerer
+                  direction is not equal to the default level in the
+                  offer, and the following applies.
+
+                     The answerer MUST ignore sprop-parameter-sets, when
+                     present (either included in the "a=fmtp" line or
+                     conveyed using the "fmtp" source attribute) in the
+                     offer.
+
+                     When neither use-level-src-parameter-sets is equal
+                     to 1 nor the "fmtp" source attribute is present in
+                     the answer, the answerer MUST ignore sprop-level-
+                     parameter-sets, when present in the offer, and the
+                     offerer MUST transmit parameter sets in-band.
+
+                     When either use-level-src-parameter-sets is equal
+                     to 1 or the "fmtp" source attribute is present in
+                     the answer, the answerer MUST be prepared to use
+                     the parameter sets that are included in sprop-
+                     level-parameter-sets for the accepted level (i.e.,
+                     the default level in the answer), when present in
+                     the offer, for decoding the incoming NAL unit
+                     stream, and ignore all other parameter sets
+                     included in sprop-level-parameter-sets.
+
+                     When no parameter sets for the level to use in the
+                     offerer-to-answerer direction are present in sprop-
+                     level-parameter-sets in the offer, the offerer MUST
+                     transmit parameter sets in-band.
+
+      The following rules apply to the transport of parameter sets in
+      the answerer-to-offerer direction.
+
+         o  An answer MAY include either sprop-parameter-sets or sprop-
+            level-parameter-sets but MUST NOT include both.  If neither
+            sprop-parameter-sets nor sprop-level-parameter-sets is
+            present in the answer, then only in-band transport of
+            parameter sets is used.
+
+         o  If the offer includes in-band-parameter-sets equal to 1, the
+            answerer MUST NOT include sprop-parameter-sets or sprop-
+            level-parameter-sets in the answer and MUST transmit
+            parameter sets in-band.  Otherwise, the following applies.
+
+
+
+Wang, et al.                 Standards Track                   [Page 62]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+               o  If the level to use in the answerer-to-offerer
+                  direction is equal to the default level in the answer,
+                  the following applies.
+
+                     When there is a sprop-parameter-sets included in
+                     the "a=fmtp" line in the answer, the offerer MUST
+                     be prepared to use the parameter sets included in
+                     the sprop-parameter-sets for decoding the incoming
+                     NAL unit stream.
+
+                     When there is a sprop-parameter-sets conveyed using
+                     the "fmtp" source attribute in the answer, the
+                     following applies.  If the offer includes use-
+                     level-src-parameter-sets equal to 1 or the "fmtp"
+                     source attribute, the offerer MUST be prepared to
+                     use the parameter sets included in the sprop-
+                     parameter-sets for decoding the incoming NAL unit
+                     stream;  otherwise, the answerer MUST transmit
+                     parameter sets in-band.
+
+                     When sprop-parameter-sets is not present in the
+                     answer, the answerer MUST transmit parameter sets
+                     in-band.
+
+                     The offerer MUST ignore sprop-level-parameter-sets,
+                     when present (either included in the "a=fmtp" line
+                     or conveyed using the "fmtp" source attribute) in
+                     the answer.
+
+               o  Otherwise, the level to use in the answerer-to-offerer
+                  direction is not equal to the default level in the
+                  answer, and the following applies.
+
+                     The offerer MUST ignore sprop-parameter-sets when
+                     present (either included in the "a=fmtp" line of
+                     SDP or conveyed using the "fmtp" source attribute)
+                     in the answer.
+
+                     When neither use-level-src-parameter-sets is equal
+                     to 1 nor the "fmtp" source attribute is present in
+                     the offer, the offerer MUST ignore sprop-level-
+                     parameter-sets, when present, and the answerer MUST
+                     transmit parameter sets in-band.
+
+                     When either use-level-src-parameter-sets is equal
+                     to 1 or the "fmtp" source attribute is present in
+                     the offer, the offerer MUST be prepared to use the
+                     parameter sets that are included in sprop-level-
+
+
+
+Wang, et al.                 Standards Track                   [Page 63]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+                     parameter-sets for the level to use in the
+                     answerer-to-offerer direction, when present in the
+                     answer, for decoding the incoming NAL unit stream,
+                     and ignore all other parameter sets included in
+                     sprop-level-parameter-sets in the answer.
+
+                     When no parameter sets for the level to use in the
+                     answerer-to-offerer direction are present in sprop-
+                     level-parameter-sets in the answer, the answerer
+                     MUST transmit parameter sets in-band.
+
+      When sprop-parameter-sets or sprop-level-parameter-sets is
+      conveyed using the "fmtp" source attribute as specified in Section
+      6.3 of [9], the receiver of the parameters MUST store the
+      parameter sets included in the sprop-parameter-sets or sprop-
+      level-parameter-sets for the accepted level and associate them
+      with the source given as a part of the "fmtp" source attribute.
+      Parameter sets associated with one source MUST only be used to
+      decode NAL units conveyed in RTP packets from the same source.
+      When this mechanism is in use, SSRC collision detection and
+      resolution MUST be performed as specified in [9].
+
+         Informative note: Conveyance of sprop-parameter-sets and sprop-
+         level-parameter-sets using the "fmtp" source attribute may be
+         used in topologies like Topo-Video-switch-MCU [29] to enable
+         out-of-band transport of parameter sets.
+
+   For streams being delivered over multicast, the following rules
+   apply:
+
+   o  The media format configuration is identified by "profile-level-
+      id", including the level part, and packetization-mode.  These
+      media format configuration parameters (including the level part of
+      profile-level-id) MUST be used symmetrically; that is, the
+      answerer MUST either maintain all configuration parameters or
+      remove the media format (payload type) completely.  Note that this
+      implies that the level part of profile-level-id for Offer/Answer
+      in multicast is not changeable.
+
+      To simplify the handling and matching of these configurations, the
+      same RTP payload type number used in the offer SHOULD also be used
+      in the answer, as specified in [8].  An answer MUST NOT contain a
+      payload type number used in the offer unless the configuration is
+      the same as in the offer.
+
+   o  Parameter sets received MUST be associated with the originating
+      source and MUST only be used in decoding the incoming NAL unit
+      stream from the same source.
+
+
+
+Wang, et al.                 Standards Track                   [Page 64]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   o  The rules for other parameters are the same as above for unicast
+      as long as the above rules are obeyed.
+
+   Table 6 lists the interpretation of all the media type parameters
+   that MUST be used for the different direction attributes.
+
+       Table 6.  Interpretation of parameters for different direction
+                 attributes
+
+                                              sendonly --+
+                                           recvonly --+  |
+                                        sendrecv --+  |  |
+                                                   |  |  |
+                profile-level-id                   C  C  P
+                max-recv-level                     R  R  -
+                packetization-mode                 C  C  P
+                sprop-deint-buf-req                P  -  P
+                sprop-interleaving-depth           P  -  P
+                sprop-max-don-diff                 P  -  P
+                sprop-init-buf-time                P  -  P
+                max-mbps                           R  R  -
+                max-smbps                          R  R  -
+                max-fs                             R  R  -
+                max-cpb                            R  R  -
+                max-dpb                            R  R  -
+                max-br                             R  R  -
+                redundant-pic-cap                  R  R  -
+                deint-buf-cap                      R  R  -
+                max-rcmd-nalu-size                 R  R  -
+                sar-understood                     R  R  -
+                sar-supported                      R  R  -
+                in-band-parameter-sets             R  R  -
+                use-level-src-parameter-sets       R  R  -
+                level-asymmetry-allowed            O  -  -
+                sprop-parameter-sets               S  -  S
+                sprop-level-parameter-sets         S  -  S
+
+             Legend:
+
+             C: configuration for sending and receiving streams
+             O: offer/answer mode
+             P: properties of the stream to be sent
+             R: receiver capabilities
+             S: out-of-band parameter sets
+             -: not usable (when present, SHOULD be ignored)
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 65]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   Parameters used for declaring receiver capabilities are in general
+   downgradable; that is, they express the upper limit for a sender's
+   possible behavior.  Thus, a sender MAY select to set its encoder
+   using only lower/less or equal values of these parameters.
+
+   Parameters declaring a configuration point are not changeable, with
+   the exception of the level part of the profile-level-id parameter for
+   unicast usage.
+
+   When a sender's capabilities are declared and non-downgradable
+   parameters are used in this declaration, these parameters express a
+   configuration that is acceptable for the sender to receive streams.
+   In order to achieve high interoperability levels, it is often
+   advisable to offer multiple alternative configurations, e.g., for the
+   packetization mode.  It is impossible to offer multiple
+   configurations in a single payload type.  Thus, when multiple
+   configuration offers are made, each offer requires its own RTP
+   payload type associated with the offer.
+
+   A receiver SHOULD understand all media type parameters, even if it
+   only supports a subset of the payload format's functionality.  This
+   ensures that a receiver is capable of understanding when an offer to
+   receive media can be downgraded to what is supported by the receiver
+   of the offer.
+
+   An answerer MAY extend the offer with additional media format
+   configurations.  However, to enable their usage, in most cases, a
+   second offer is required from the offerer to provide the stream
+   property parameters that the media sender will use.  This also has
+   the effect that the offerer has to be able to receive this media
+   format configuration, not only to send it.
+
+   If an offerer wishes to have non-symmetric capabilities between
+   sending and receiving, the offerer can allow asymmetric levels via
+   level-asymmetry-allowed being equal to 1.  Alternatively, the offerer
+   could offer different RTP sessions, i.e., different media lines
+   declared as "recvonly" and "sendonly", respectively.  This may have
+   further implications on the system and may require additional
+   external semantics to associate the two media lines.
+
+8.2.3.  Usage in Declarative Session Descriptions
+
+   When H.264 over RTP is offered with SDP in a declarative style, as in
+   Real Time Streaming Protocol (RTSP) [27] or Session Announcement
+   Protocol (SAP) [28], the following considerations are necessary.
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 66]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   o  All parameters capable of indicating both stream properties and
+      receiver capabilities are used to indicate only stream properties.
+      For example, in this case, the parameter profile-level-id declares
+      only the values used by the stream, not the capabilities for
+      receiving streams.  The result of this is that the following
+      interpretation of the parameters MUST be used:
+
+      Declaring actual configuration or stream properties:
+
+         - profile-level-id
+         - packetization-mode
+         - sprop-interleaving-depth
+         - sprop-deint-buf-req
+         - sprop-max-don-diff
+         - sprop-init-buf-time
+
+      Out-of-band transporting of parameter sets:
+
+         - sprop-parameter-sets
+         - sprop-level-parameter-sets
+
+      Not usable (when present, they SHOULD be ignored):
+
+         - max-mbps
+         - max-smbps
+         - max-fs
+         - max-cpb
+         - max-dpb
+         - max-br
+         - max-recv-level
+         - redundant-pic-cap
+         - max-rcmd-nalu-size
+         - deint-buf-cap
+         - sar-understood
+         - sar-supported
+         - in-band-parameter-sets
+         - level-asymmetry-allowed
+         - use-level-src-parameter-sets
+
+   o  A receiver of the SDP is required to support all parameters and
+      values of the parameters provided; otherwise, the receiver MUST
+      reject (RTSP) or not participate in (SAP) the session.  It falls
+      on the creator of the session to use values that are expected to
+      be supported by the receiving application.
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 67]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+8.3.  Examples
+
+   An SDP Offer/Answer exchange wherein both parties are expected to
+   both send and receive could look like the following.  Only the media-
+   codec-specific parts of the SDP are shown.  Some lines are wrapped
+   due to text constraints.
+
+      Offerer -> Answerer SDP message:
+
+      m=video 49170 RTP/AVP 100 99 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; packetization-mode=0;
+        sprop-parameter-sets=<parameter sets data#0>
+      a=rtpmap:99 H264/90000
+      a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
+        sprop-parameter-sets=<parameter sets data#1>
+      a=rtpmap:100 H264/90000
+      a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
+        sprop-parameter-sets=<parameter sets data#2>;
+        sprop-interleaving-depth=45; sprop-deint-buf-req=64000;
+        sprop-init-buf-time=102478; deint-buf-cap=128000
+
+   The above offer presents the same codec configuration in three
+   different packetization formats.  Payload type 98 represents single
+   NALU mode, payload type 99 represents non-interleaved mode, and
+   payload type 100 indicates the interleaved mode.  In the interleaved
+   mode case, the interleaving parameters that the offerer would use if
+   the answer indicates support for payload type 100 are also included.
+   In all three cases, the parameter sprop-parameter-sets conveys the
+   initial parameter sets that are required by the answerer when
+   receiving a stream from the offerer when this configuration is
+   accepted.  Note that the value for sprop-parameter-sets could be
+   different for each payload type.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 68]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      Answerer -> Offerer SDP message:
+
+      m=video 49170 RTP/AVP 100 99 97
+      a=rtpmap:97 H264/90000
+      a=fmtp:97 profile-level-id=42A01E; packetization-mode=0;
+        sprop-parameter-sets=<parameter sets data#3>
+      a=rtpmap:99 H264/90000
+      a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
+        sprop-parameter-sets=<parameter sets data#4>;
+        max-rcmd-nalu-size=3980
+      a=rtpmap:100 H264/90000
+      a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
+        sprop-parameter-sets=<parameter sets data#5>;
+        sprop-interleaving-depth=60;
+        sprop-deint-buf-req=86000; sprop-init-buf-time=156320;
+        deint-buf-cap=128000; max-rcmd-nalu-size=3980
+
+   As the Offer/Answer negotiation covers both sending and receiving
+   streams, an offer indicates the exact parameters for what the offerer
+   is willing to receive, whereas the answer indicates the same for what
+   the answerer is willing to receive.  In this case, the offerer
+   declared that it is willing to receive payload type 98.  The answerer
+   accepts this by declaring an equivalent payload type 97; that is, it
+   has identical values for the two parameters profile-level-id and
+   packetization-mode (since packetization-mode is equal to 0 and sprop-
+   deint-buf-req is not present).  As the offered payload type 98 is
+   accepted, the answerer needs to store parameter sets included in
+   sprop-parameter-sets=<parameter sets data#0> in case the offer
+   finally decides to use this configuration.  In the answer, the
+   answerer includes the parameter sets in sprop-parameter-
+   sets=<parameter sets data#3> that the answerer would use in the
+   stream sent from the answerer if this configuration is finally used.
+
+   The answerer also accepts the reception of the two configurations
+   that payload types 99 and 100 represent.  Again, the answerer needs
+   to store parameter sets included in sprop-parameter-sets=<parameter
+   sets data#1> and sprop-parameter-sets=<parameter sets data#2> in case
+   the offer finally decides to use either of these two configurations.
+   The answerer provides the initial parameter sets for the answerer-to-
+   offerer direction, i.e., the parameter sets in sprop-parameter-
+   sets=<parameter sets data#4> and sprop-parameter-sets=<parameter sets
+   data#5>, for payload types 99 and 100, respectively, that it will use
+   to send the payload types.  The answerer also provides the offerer
+   with its memory limit for de-interleaving operations by providing a
+   deint-buf-cap parameter.  This is only useful if the offerer decides
+   on making a second offer, where it can take the new value into
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 69]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   account.  The max-rcmd-nalu-size indicates that the answerer can
+   efficiently process NALUs up to the size of 3980 bytes.  However,
+   there is no guarantee that the network supports this size.
+
+   In the following example, the offer is accepted without level
+   downgrading (i.e., the default level, Level 3.0, is accepted), and
+   both sprop-parameter-sets and sprop-level-parameter-sets are present
+   in the offer.  The answerer must ignore sprop-level-parameter-
+   sets=<parameter sets data#1> and store parameter sets in sprop-
+   parameter-sets=<parameter sets data#0> for decoding the incoming NAL
+   unit stream.  The offerer must store the parameter sets in sprop-
+   parameter-sets=<parameter sets data#2> in the answer for decoding the
+   incoming NAL unit stream.  Note that in this example, parameter sets
+   in sprop-parameter-sets=<parameter sets data#2> must be associated
+   with Level 3.0.
+
+      Offer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+        packetization-mode=1;
+        sprop-parameter-sets=<parameter sets data#0>;
+        sprop-level-parameter-sets=<parameter sets data#1>
+
+      Answer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+        packetization-mode=1;
+        sprop-parameter-sets=<parameter sets data#2>
+
+   In the following example, the offer (Baseline profile, Level 1.1) is
+   accepted with level downgrading (the accepted level is Level 1b), and
+   both sprop-parameter-sets and sprop-level-parameter-sets are present
+   in the offer.  The answerer must ignore sprop-parameter-
+   sets=<parameter sets data#0> and all parameter sets not for the
+   accepted level (Level 1b) in sprop-level-parameter-sets=<parameter
+   sets data#1> and must store parameter sets for the accepted level
+   (Level 1b) in sprop-level-parameter-sets=<parameter sets data#1> for
+   decoding the incoming NAL unit stream.  The offerer must store the
+   parameter sets in sprop-parameter-sets=<parameter sets data#2> in the
+   answer for decoding the incoming NAL unit stream.  Note that in this
+   example, parameter sets in sprop-parameter-sets=<parameter sets
+   data#2> must be associated with Level 1b.
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 70]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      Offer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
+        packetization-mode=1;
+        sprop-parameter-sets=<parameter sets data#0>;
+        sprop-level-parameter-sets=<parameter sets data#1>
+
+      Answer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
+        packetization-mode=1;
+        sprop-parameter-sets=<parameter sets data#2>;
+        use-level-src-parameter-sets=1
+
+   In the following example, the offer (Baseline profile, Level 1.1) is
+   accepted with level downgrading (the accepted level is Level 1b), and
+   both sprop-parameter-sets and sprop-level-parameter-sets are present
+   in the offer.  However, the answerer is a legacy RFC 3984
+   implementation and does not understand sprop-level-parameter-sets;
+   hence, it does not include use-level-src-parameter-sets (which the
+   answerer does not understand either) in the answer.  Therefore, the
+   answerer must ignore both sprop-parameter-sets=<parameter sets
+   data#0> and sprop-level-parameter-sets=<parameter sets data#1>, and
+   the offerer must transport parameter sets in-band.
+
+      Offer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
+        packetization-mode=1;
+        sprop-parameter-sets=<parameter sets data#0>;
+        sprop-level-parameter-sets=<parameter sets data#1>
+
+      Answer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
+        packetization-mode=1
+
+   In the following example, the offer is accepted without level
+   downgrading, and sprop-parameter-sets is present in the offer.
+   Parameter sets in sprop-parameter-sets=<parameter sets data#0> must
+
+
+
+Wang, et al.                 Standards Track                   [Page 71]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   be stored and used by the encoder of the offerer and the decoder of
+   the answerer, and parameter sets in sprop-parameter-sets=<parameter
+   sets data#1> must be used by the encoder of the answerer and the
+   decoder of the offerer.  Note that sprop-parameter-sets=<parameter
+   sets data#0> is basically independent of sprop-parameter-
+   sets=<parameter sets data#1>.
+
+      Offer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+        packetization-mode=1;
+        sprop-parameter-sets=<parameter sets data#0>
+
+      Answer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+        packetization-mode=1;
+        sprop-parameter-sets=<parameter sets data#1>
+
+   In the following example, the offer is accepted without level
+   downgrading, and neither sprop-parameter-sets nor sprop-level-
+   parameter-sets is present in the offer, meaning that there is no out-
+   of-band transmission of parameter sets, which then have to be
+   transported in-band.
+
+      Offer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+        packetization-mode=1
+
+      Answer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+        packetization-mode=1
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 72]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   In the following example, the offer is accepted with level
+   downgrading and sprop-parameter-sets is present in the offer.  As
+   sprop-parameter-sets=<parameter sets data#0> contains level_idc
+   indicating Level 3.0, it therefore cannot be used, as the answerer
+   wants Level 2.0, and must be ignored by the answerer, and in-band
+   parameter sets must be used.
+
+      Offer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+        packetization-mode=1;
+        sprop-parameter-sets=<parameter sets data#0>
+
+      Answer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
+        packetization-mode=1
+
+   In the following example, the offer is also accepted with level
+   downgrading, and neither sprop-parameter-sets nor sprop-level-
+   parameter-sets is present in the offer, meaning that there is no out-
+   of-band transmission of parameter sets, which then have to be
+   transported in-band.
+
+      Offer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+        packetization-mode=1
+
+      Answer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
+        packetization-mode=1
+
+   In the following example, the offer is accepted with level upgrading,
+   and neither sprop-parameter-sets nor sprop-level-parameter-sets is
+   present in the offer or the answer, meaning that there is no out-of-
+   band transmission of parameter sets, which then have to be
+   transported in-band.  The level to use in the offerer-to-answerer
+   direction is Level 3.0, and the level to use in the answerer-to-
+
+
+
+Wang, et al.                 Standards Track                   [Page 73]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   offerer direction is Level 2.0.  The answerer is allowed to send at
+   any level up to and including Level 2.0, and the offerer is allowed
+   to send at any level up to and including Level 3.0.
+
+      Offer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
+        packetization-mode=1; level-asymmetry-allowed=1
+
+      Answer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+        packetization-mode=1; level-asymmetry-allowed=1
+
+   In the following example, the offerer is a Multipoint Control Unit
+   (MCU) in a topology like Topo-Video-switch-MCU [29], offering
+   parameter sets received (using out-of-band transport) from three
+   other participants (B, C, and D) and receiving parameter sets from
+   the participant A, which is the answerer.  The participants are
+   identified by their values of canonical name (CNAME), which are
+   mapped to different SSRC values.  The same codec configuration is
+   used by all four participants.  The participant A stores and
+   associates the parameter sets included in <parameter sets data#B>,
+   <parameter sets data#C>, and <parameter sets data#D> to participants
+   B, C, and D, respectively, and uses <parameter sets data#B> for
+   decoding NAL units carried in RTP packets originating from
+   participant B only, uses <parameter sets data#C> for decoding NAL
+   units carried in RTP packets originating from participant C only, and
+   uses <parameter sets data#D> for decoding NAL units carried in RTP
+   packets originating from participant D only.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 74]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      Offer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=ssrc:SSRC-B cname:CNAME-B
+      a=ssrc:SSRC-C cname:CNAME-C
+      a=ssrc:SSRC-D cname:CNAME-D
+      a=ssrc:SSRC-B fmtp:98
+        sprop-parameter-sets=<parameter sets data#B>
+      a=ssrc:SSRC-C fmtp:98
+        sprop-parameter-sets=<parameter sets data#C>
+      a=ssrc:SSRC-D fmtp:98
+        sprop-parameter-sets=<parameter sets data#D>
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+        packetization-mode=1
+
+      Answer SDP:
+
+      m=video 49170 RTP/AVP 98
+      a=ssrc:SSRC-A cname:CNAME-A
+      a=ssrc:SSRC-A fmtp:98
+        sprop-parameter-sets=<parameter sets data#A>
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+        packetization-mode=1
+
+8.4.  Parameter Set Considerations
+
+   The H.264 parameter sets are a fundamental part of the video codec
+   and vital to its operation (see Section 1.2).  Due to their
+   characteristics and their importance for the decoding process, lost
+   or erroneously transmitted parameter sets can hardly be concealed
+   locally at the receiver.  A reference to a corrupt parameter set
+   normally has fatal results to the decoding process.  Corruption could
+   occur, for example, due to the erroneous transmission or loss of a
+   parameter set NAL unit but also due to the untimely transmission of a
+   parameter set update.  A parameter set update refers to a change of
+   at least one parameter in a picture parameter set or sequence
+   parameter set for which the picture parameter set or sequence
+   parameter set identifier remains unchanged.  Therefore, the following
+   recommendations are provided as a guideline for the implementer of
+   the RTP sender.
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 75]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   Parameter set NALUs can be transported using three different
+   principles:
+
+   A.  Using a session control protocol (out-of-band) prior to the
+       actual RTP session.
+
+   B.  Using a session control protocol (out-of-band) during an ongoing
+       RTP session.
+
+   C.  Within the RTP packet stream in the payload (in-band) during an
+       ongoing RTP session.
+
+   It is recommended to implement principles A and B within a session
+   control protocol.  SIP and SDP can be used as described in the SDP
+   Offer/Answer model and in the previous sections of this memo.
+   Section 8.2.2 includes a detailed discussion on transport of
+   parameter sets in-band or out-of-band in SDP Offer/Answer using media
+   type parameters sprop-parameter-sets, sprop-level-parameter-sets,
+   use-level-src-parameter-sets, and in-band-parameter-sets.  This
+   section contains guidelines on how principles A and B should be
+   implemented within session control protocols.  It is independent of
+   the particular protocol used.  Principle C is supported by the RTP
+   payload format defined in this specification.  There are topologies
+   like Topo-Video-switch-MCU [29] for which the use of principle C may
+   be desirable.
+
+   If in-band signaling of parameter sets is used, the picture and
+   sequence parameter set NALUs SHOULD be transmitted in the RTP payload
+   using a reliable method of delivering of RTP (see below), as a loss
+   of a parameter set of either type will likely prevent decoding of a
+   considerable portion of the corresponding RTP packet stream.
+
+   If in-band signaling of parameter sets is used, the sender SHOULD
+   take the error characteristics into account and use mechanisms to
+   provide a high probability for delivering the parameter sets
+   correctly.  Mechanisms that increase the probability for a correct
+   reception include packet repetition, FEC, and retransmission.  The
+   use of an unreliable, out-of-band control protocol has similar
+   disadvantages as the in-band signaling (possible loss) and, in
+   addition, may also lead to difficulties in the synchronization (see
+   below).  Therefore, it is NOT RECOMMENDED.
+
+   Parameter sets MAY be added or updated during the lifetime of a
+   session using principles B and C.  It is required that parameter sets
+   be present at the decoder prior to the NAL units that refer to them.
+   Update or addition of parameter sets can result in further problems;
+   therefore, the following recommendations should be considered.
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 76]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   -  When parameter sets are added or updated, care SHOULD be taken to
+      ensure that any parameter set is delivered prior to its usage.
+      When new parameter sets are added, previously unused parameter set
+      identifiers are used.  It is common that no synchronization is
+      present between out-of-band signaling and in-band traffic.  If
+      out-of-band signaling is used, it is RECOMMENDED that a sender not
+      start sending NALUs requiring the added or updated parameter sets
+      prior to acknowledgement of delivery from the signaling protocol.
+
+   -  When parameter sets are updated, the following synchronization
+      issue should be taken into account.  When overwriting a parameter
+      set at the receiver, the sender has to ensure that the parameter
+      set in question is not needed by any NALU present in the network
+      or receiver buffers.  Otherwise, decoding with a wrong parameter
+      set may occur.  To lessen this problem, it is RECOMMENDED either
+      to overwrite only those parameter sets that have not been used for
+      a sufficiently long time (to ensure that all related NALUs have
+      been consumed) or to add a new parameter set instead (which may
+      have negative consequences for the efficiency of the video
+      coding).
+
+         Informative note: In some topologies like Topo-Video-switch-
+         MCU [29], the origin of the whole set of parameter sets may
+         come from multiple sources that may use non-unique parameter
+         set identifiers.  In this case, an offer may overwrite an
+         existing parameter set if no other mechanism that enables
+         uniqueness of the parameter sets in the out-of-band channel
+         exists.
+
+   -  In a multiparty session, one participant MUST associate parameter
+      sets coming from different sources with the source identification
+      whenever possible, e.g., by conveying out-of-band transported
+      parameter sets, as different sources typically use independent
+      parameter set identifier value spaces.
+
+   -  Adding or modifying parameter sets by using both principles B and
+      C in the same RTP session may lead to inconsistencies of the
+      parameter sets because of the lack of synchronization between the
+      control and the RTP channel.  Therefore, principles B and C MUST
+      NOT both be used in the same session unless sufficient
+      synchronization can be provided.
+
+   In some scenarios (e.g., when only the subset of this payload format
+   specification corresponding to H.241 is used) or topologies, it is
+   not possible to employ out-of-band parameter set transmission.  In
+   this case, parameter sets have to be transmitted in-band.  Here, the
+   synchronization with the non-parameter-set-data in the bitstream is
+   implicit, but the possibility of a loss has to be taken into account.
+
+
+
+Wang, et al.                 Standards Track                   [Page 77]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   The loss probability should be reduced using the mechanisms discussed
+   above.  In case a loss of a parameter set is detected, recovery may
+   be achieved using a Decoder Refresh Point procedure, for example,
+   using RTCP feedback Full Intra Request (FIR) [30].  Two example
+   Decoder Refresh Point procedures are provided in the informative
+   Section 8.5.
+
+   -  When parameter sets are initially provided using principle A and
+      then later added or updated in-band (principle C), there is a risk
+      associated with updating the parameter sets delivered out-of-band.
+      If receivers miss some in-band updates (for example, because of a
+      loss or a late tune-in), those receivers attempt to decode the
+      bitstream using outdated parameters.  It is therefore RECOMMENDED
+      that parameter set IDs be partitioned between the out-of-band and
+      in-band parameter sets.
+
+8.5.  Decoder Refresh Point Procedure Using In-Band Transport of
+      Parameter Sets (Informative)
+
+   When a sender with a video encoder according to [1] receives a
+   request for a decoder refresh point, the encoder shall enter the fast
+   update mode by using one of the procedures specified in Sections
+   8.5.1 or 8.5.2.  The procedure in Section 8.5.1 is the preferred
+   response in a lossless transmission environment.  Both procedures
+   satisfy the requirement to enter the fast update mode for H.264 video
+   encoding.
+
+8.5.1.  IDR Procedure to Respond to a Request for a Decoder Refresh
+        Point
+
+   This section gives one possible way to respond to a request for a
+   decoder refresh point.
+
+   The encoder shall, in the order presented here:
+
+   1) Immediately prepare to send an IDR picture.
+
+   2) Send a sequence parameter set to be used by the IDR picture to be
+      sent.  The encoder may optionally also send other sequence
+      parameter sets.
+
+   3) Send a picture parameter set to be used by the IDR picture to be
+      sent.  The encoder may optionally also send other picture
+      parameter sets.
+
+   4) Send the IDR picture.
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 78]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   5) From this point forward in time, send any other sequence or
+      picture parameter sets that have not yet been sent in this
+      procedure, prior to their reference by any NAL unit, regardless of
+      whether such parameter sets were previously sent prior to
+      receiving the request for a decoder refresh point.  As needed,
+      such parameter sets may be sent in a batch, one at a time, or in
+      any combination of these two methods.  Parameter sets may be
+      re-sent at any time for redundancy.  Caution should be taken when
+      parameter set updates are present, as described above in Section
+      8.4.
+
+8.5.2.  Gradual Recovery Procedure to Respond to a Request for a Decoder
+        Refresh Point
+
+   This section gives another possible way to respond to a request for a
+   decoder refresh point.
+
+   The encoder shall, in the order presented here:
+
+   1) Send a recovery point SEI message (see Sections D.1.7 and D.2.7 of
+      [1]).
+
+   2) Repeat any sequence and picture parameter sets that were sent
+      before the recovery point SEI message, prior to their reference by
+      a NAL unit.
+
+   The encoder shall ensure that the decoder has access to all reference
+   pictures for inter prediction of pictures at or after the recovery
+   point, which is indicated by the recovery point SEI message, in
+   output order, assuming that the transmission from now on is error-
+   free.
+
+   The value of the recovery_frame_cnt syntax element in the recovery
+   point SEI message should be small enough to ensure a fast recovery.
+
+   As needed, such parameter sets may be re-sent in a batch, one at a
+   time, or in any combination of these two methods.  Parameter sets may
+   be re-sent at any time for redundancy.  Caution should be taken when
+   parameter set updates are present, as described above in Section 8.4.
+
+9.  Security Considerations
+
+   RTP packets using the payload format defined in this specification
+   are subject to the security considerations discussed in the RTP
+   specification [5] and in any appropriate RTP profile (for example,
+   [16]).  This implies that confidentiality of the media streams is
+   achieved by encryption, for example, through the application of SRTP
+   [26].  Because the data compression used with this payload format is
+
+
+
+Wang, et al.                 Standards Track                   [Page 79]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   applied end-to-end, any encryption needs to be performed after
+   compression.  A potential denial-of-service threat exists for data
+   encodings using compression techniques that have non-uniform
+   receiver-end computational load.  The attacker can inject
+   pathological datagrams into the stream that are complex to decode and
+   that cause the receiver to be overloaded.  H.264 is particularly
+   vulnerable to such attacks, as it is extremely simple to generate
+   datagrams containing NAL units that affect the decoding process of
+   many future NAL units.  Therefore, the usage of data origin
+   authentication and data integrity protection of at least the RTP
+   packet is RECOMMENDED, for example, with SRTP [26].
+
+   Note that the appropriate mechanism to ensure confidentiality and
+   integrity of RTP packets and their payloads is very dependent on the
+   application and on the transport and signaling protocols employed.
+   Thus, although SRTP is given as an example above, other possible
+   choices exist.
+
+   Decoders MUST exercise caution with respect to the handling of user
+   data SEI messages, particularly if they contain active elements, and
+   MUST restrict their domain of applicability to the presentation
+   containing the stream.
+
+   End-to-end security with either authentication, integrity, or
+   confidentiality protection will prevent a MANE from performing media-
+   aware operations other than discarding complete packets.  In the case
+   of confidentiality protection, it will even be prevented from
+   discarding packets in a media-aware way.  To be allowed to perform
+   its operations, a MANE is required to be a trusted entity that is
+   included in the security context establishment.
+
+10.  Congestion Control
+
+   Congestion control for RTP SHALL be used in accordance with RFC 3550
+   [5] and with any applicable RTP profile, e.g., RFC 3551 [16].  If
+   best-effort service is being used, an additional requirement is that
+   users of this payload format MUST monitor packet loss to ensure that
+   the packet loss rate is within acceptable parameters.  Packet loss is
+   considered acceptable if a TCP flow across the same network path, and
+   experiencing the same network conditions, would achieve an average
+   throughput, measured on a reasonable timescale, that is not less than
+   the RTP flow is achieving.  This condition can be satisfied by
+   implementing congestion control mechanisms to adapt the transmission
+   rate (or the number of layers subscribed for a layered multicast
+   session) or by arranging for a receiver to leave the session if the
+   loss rate is unacceptably high.
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 80]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   The bitrate adaptation necessary for obeying the congestion control
+   principle is easily achievable when real-time encoding is used.
+   However, when pre-encoded content is being transmitted, bandwidth
+   adaptation requires the availability of more than one coded
+   representation of the same content, at different bitrates, or the
+   existence of non-reference pictures or sub-sequences [22] in the
+   bitstream.  The switching between the different representations can
+   normally be performed in the same RTP session, e.g., by employing a
+   concept known as SI/SP slices of the Extended profile or by switching
+   streams at IDR picture boundaries.  Only when non-downgradable
+   parameters (such as the profile part of the profile/level ID) are
+   required to be changed does it become necessary to terminate and
+   restart the media stream.  This may be accomplished by using a
+   different RTP payload type.
+
+   MANEs MAY follow the suggestions outlined in Section 7.3 and remove
+   certain unusable packets from the packet stream when that stream was
+   damaged due to previous packet losses.  This can help reduce the
+   network load in certain special cases.
+
+11.  IANA Considerations
+
+   The H264 media subtype name specified by RFC 3984 has been updated as
+   defined in Section 8.1 of this memo.
+
+12.  Informative Appendix: Application Examples
+
+   This payload specification is very flexible in its use, in order to
+   cover the extremely wide application space anticipated for H.264.
+   However, this great flexibility also makes it difficult for an
+   implementer to decide on a reasonable packetization scheme.  Some
+   information on how to apply this specification to real-world
+   scenarios is likely to appear in the form of academic publications
+   and a test model software and description in the near future.
+   However, some preliminary usage scenarios are described here as well.
+
+12.1.  Video Telephony According to Annex A of ITU-T Recommendation
+       H.241
+
+   H.323-based video telephony systems that use H.264 as an optional
+   video compression scheme are required to support Annex A of H.241 [3]
+   as a packetization scheme.  The packetization mechanism defined in
+   this Annex is technically identical with a small subset of this
+   specification.
+
+   When a system operates according to Annex A of H.241, parameter set
+   NAL units are sent in-band.  Only single NAL unit packets are used.
+   Many such systems are not sending IDR pictures regularly, but only
+
+
+
+Wang, et al.                 Standards Track                   [Page 81]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   when required by user interaction or by control protocol means, e.g.,
+   when switching between video channels in a Multipoint Control Unit or
+   for error recovery requested by feedback.
+
+12.2.  Video Telephony, No Slice Data Partitioning, No NAL Unit
+       Aggregation
+
+   The RTP part of this scheme is implemented and tested (though not the
+   control-protocol part; see below).
+
+   In most real-world video telephony applications, picture parameters
+   such as picture size or optional modes never change during the
+   lifetime of a connection.  Therefore, all necessary parameter sets
+   (usually only one) are sent as a side effect of the capability
+   exchange/announcement process, e.g., according to the SDP syntax
+   specified in Section 8.2 of this document.  As all necessary
+   parameter set information is established before the RTP session
+   starts, there is no need for sending any parameter set NAL units.
+   Slice data partitioning is not used either.  Thus, the RTP packet
+   stream basically consists of NAL units that carry single coded
+   slices.
+
+   The encoder chooses the size of coded slice NAL units so that they
+   offer the best performance.  Often, this is done by adapting the
+   coded slice size to the MTU size of the IP network.  For small
+   picture sizes, this may result in a one-picture-per-one-packet
+   strategy.  Intra refresh algorithms clean up the loss of packets and
+   the resulting drift-related artifacts.
+
+12.3.  Video Telephony, Interleaved Packetization Using NAL Unit
+       Aggregation
+
+   This scheme allows better error concealment and is used in
+   H.263-based designs using RFC 4629 packetization [11].  It has been
+   implemented, and good results were reported [13].
+
+   The VCL encoder codes the source picture so that all macroblocks
+   (MBs) of one MB line are assigned to one slice.  All slices with even
+   MB row addresses are combined into one STAP, and all slices with odd
+   MB row addresses are combined into another.  Those STAPs are
+   transmitted as RTP packets.  The establishment of the parameter sets
+   is performed as discussed above.
+
+   Note that the use of STAPs is essential here, as the high number of
+   individual slices (18 for a Common Intermediate Format (CIF) picture)
+   would lead to unacceptably high IP/UDP/RTP header overhead (unless
+   the source coding tool FMO is used, which is not assumed in this
+   scenario).  Furthermore, some wireless video transmission systems,
+
+
+
+Wang, et al.                 Standards Track                   [Page 82]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   such as H.324M and the IP-based video telephony specified in 3GPP,
+   are likely to use relatively small transport packet size.  For
+   example, a typical MTU size of H.223 AL3 SDU is around 100 bytes
+   [17].  Coding individual slices according to this packetization
+   scheme provides further advantage in communication between wired and
+   wireless networks, as individual slices are likely to be smaller than
+   the preferred maximum packet size of wireless systems.  Consequently,
+   a gateway can convert the STAPs used in a wired network into several
+   RTP packets with only one NAL unit, which are preferred in a wireless
+   network, and vice versa.
+
+12.4.  Video Telephony with Data Partitioning
+
+   This scheme has been implemented and has been shown to offer good
+   performance, especially at higher packet loss rates [13].
+
+   Data partitioning is known to be useful only when some form of
+   unequal error protection is available.  Normally, in single-session
+   RTP environments, even error characteristics are assumed; that is,
+   the packet loss probability of all packets of the session is the same
+   statistically.  However, there are means to reduce the packet loss
+   probability of individual packets in an RTP session.  A FEC packet
+   according to RFC 5109 [18], for example, specifies which media
+   packets are associated with the FEC packet.
+
+   In all cases, the incurred overhead is substantial but is in the same
+   order of magnitude as the number of bits that have otherwise been
+   spent for intra information.  However, this mechanism does not add
+   any delay to the system.
+
+   Again, the complete parameter set establishment is performed through
+   control protocol means.
+
+12.5.  Video Telephony or Streaming with FUs and Forward Error
+       Correction
+
+   This scheme has been implemented and has been shown to provide good
+   performance, especially at higher packet loss rates [19].
+
+   The most efficient means to combat packet losses for scenarios where
+   retransmissions are not applicable is forward error correction (FEC).
+   Although application layer, end-to-end use of FEC is often less
+   efficient than a FEC-based protection of individual links (especially
+   when links of different characteristics are in the transmission
+   path), application layer, end-to-end FEC is unavoidable in some
+   scenarios.  RFC 5109 [18] provides means to use generic, application
+   layer, end-to-end FEC in packet loss environments.  A binary forward
+   error correcting code is generated by applying the XOR operation to
+
+
+
+Wang, et al.                 Standards Track                   [Page 83]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   the bits at the same bit position in different packets.  The binary
+   code can be specified by the parameters (n,k), in which k is the
+   number of information packets used in the connection and n is the
+   total number of packets generated for k information packets; that is,
+   n-k parity packets are generated for k information packets.
+
+   When a code is used with parameters (n,k) within the RFC 5109
+   framework, the following properties are well known:
+
+   a) If applied over one RTP packet, RFC 5109 provides only packet
+      repetition.
+
+   b) RFC 5109 is most bitrate efficient if XOR-connected packets have
+      equal length.
+
+   c) At the same packet loss probability p and for a fixed k, the
+      greater the value of n, the smaller the residual error probability
+      becomes.  For example, for a packet loss probability of 10%, k=1,
+      and n=2, the residual error probability is about 1%, whereas for
+      n=3, the residual error probability is about 0.1%.
+
+   d) At the same packet loss probability p and for a fixed code rate
+      k/n, the greater the value of n, the smaller the residual error
+      probability becomes.  For example, at a packet loss probability of
+      p=10%, k=1, and n=2, the residual error rate is about 1%, whereas
+      for an extended Golay code with k=12 and n=24, the residual error
+      rate is about 0.01%.
+
+   For applying RFC 5109 in combination with H.264 baseline-coded video
+   without using FUs, several options might be considered:
+
+   1) The video encoder produces NAL units for which each video frame is
+      coded in a single slice.  Applying FEC, one could use a simple
+      code, e.g., (n=2, k=1).  That is, each NAL unit would basically
+      just be repeated.  The disadvantage is obviously the bad code
+      performance according to d), above, and the low flexibility, as
+      only (n, k=1) codes can be used.
+
+   2) The video encoder produces NAL units for which each video frame is
+      encoded in one or more consecutive slices.  Applying FEC, one
+      could use a better code, e.g., (n=24, k=12), over a sequence of
+      NAL units.  Depending on the number of RTP packets per frame, a
+      loss may introduce a significant delay, which is reduced when more
+      RTP packets are used per frame.  Packets of completely different
+      lengths might also be connected, which decreases bitrate
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 84]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+      efficiency according to b), above.  However, with some care and
+      for slices of 1 kb or larger, similar length (100-200 bytes
+      difference) may be produced, which will not lower the bit
+      efficiency catastrophically.
+
+   3) The video encoder produces NAL units, for which a certain frame
+      contains k slices of possibly almost equal length.  Then, applying
+      FEC, a better code, e.g., (n=24, k=12), can be used over the
+      sequence of NAL units for each frame.  The delay compared to that
+      of 2), above, may be reduced, but several disadvantages are
+      obvious.  First, the coding efficiency of the encoded video is
+      lowered significantly, as slice-structured coding reduces intra-
+      frame prediction and additional slice overhead is necessary.
+      Second, pre-encoded content or, when operating over a gateway, the
+      video is usually not appropriately coded with k slices such that
+      FEC can be applied.  Finally, the encoding of video producing k
+      slices of equal length is not straightforward and might require
+      more than one encoding pass.
+
+   Many of the mentioned disadvantages can be avoided by applying FUs in
+   combination with FEC.  Each NAL unit can be split into any number of
+   FUs of basically equal length; therefore, FEC, with a reasonable k
+   and n, can be applied, even if the encoder made no effort to produce
+   slices of equal length.  For example, a coded slice NAL unit
+   containing an entire frame can be split to k FUs, and a parity check
+   code (n=k+1, k) can be applied.  However, this has the disadvantage
+   that unless all created fragments can be recovered, the whole slice
+   will be lost.  Thus, a larger section is lost than would be if the
+   frame had been split into several slices.
+
+   The presented technique makes it possible to achieve good
+   transmission error tolerance, even if no additional source coding
+   layer redundancy (such as periodic intra frames) is present.
+   Consequently, the same coded video sequence can be used to achieve
+   the maximum compression efficiency and quality over error-free
+   transmission and for transmission over error-prone networks.
+   Furthermore, the technique allows the application of FEC to pre-
+   encoded sequences without adding delay.  In this case, pre-encoded
+   sequences that are not encoded for error-prone networks can still be
+   transmitted almost reliably without adding extensive delays.  In
+   addition, FUs of equal length result in a bitrate efficient use of
+   RFC 5109.
+
+   If the error probability depends on the length of the transmitted
+   packet (e.g., in case of mobile transmission [15]), the benefits of
+   applying FUs with FEC are even more obvious.  Basically, the
+   flexibility of the size of FUs allows appropriate FEC to be applied
+   for each NAL unit and unequal error protection of NAL units.
+
+
+
+Wang, et al.                 Standards Track                   [Page 85]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   When FUs and FEC are used, the incurred overhead is substantial but
+   is in the same order of magnitude as the number of bits that have to
+   be spent for intra-coded macroblocks if no FEC is applied.  In [19],
+   it was shown that the overall performance of the FEC-based approach
+   enhanced quality when using the same error rate and same overall
+   bitrate, including the overhead.
+
+12.6.  Low Bitrate Streaming
+
+   This scheme has been implemented with H.263 and non-standard RTP
+   packetization and has given good results [20].  There is no technical
+   reason why similarly good results could not be achievable with H.264.
+
+   In today's Internet streaming, some of the offered bitrates are
+   relatively low in order to allow terminals with dial-up modems to
+   access the content.  In wired IP networks, relatively large packets,
+   say 500 - 1500 bytes, are preferred to smaller and more frequently
+   occurring packets in order to reduce network congestion.  Moreover,
+   use of large packets decreases the amount of RTP/UDP/IP header
+   overhead.  For low bitrate video, the use of large packets means that
+   sometimes up to few pictures should be encapsulated in one packet.
+
+   However, the loss of a packet including many coded pictures would
+   have drastic consequences for visual quality, as there is practically
+   no way to conceal the loss of an entire picture other than repeating
+   the previous one.  One way to construct relatively large packets and
+   maintain possibilities for successful loss concealment is to
+   construct MTAPs that contain interleaved slices from several
+   pictures.  An MTAP should not contain spatially adjacent slices from
+   the same picture or spatially overlapping slices from any picture.
+   If a packet is lost, it is likely that a lost slice is surrounded by
+   spatially adjacent slices of the same picture and spatially
+   corresponding slices of the temporally previous and succeeding
+   pictures.  Consequently, concealment of the lost slice is likely to
+   be relatively successful.
+
+12.7.  Robust Packet Scheduling in Video Streaming
+
+   Robust packet scheduling has been implemented with MPEG-4 Part 2 and
+   simulated in a wireless streaming environment [21].  There is no
+   technical reason why similar or better results could not be
+   achievable with H.264.
+
+   Streaming clients typically have a receiver buffer that is capable of
+   storing a relatively large amount of data.  Initially, when a
+   streaming session is established, a client does not start playing the
+   stream back immediately.  Rather, it typically buffers the incoming
+   data for a few seconds.  This buffering helps maintain continuous
+
+
+
+Wang, et al.                 Standards Track                   [Page 86]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   playback, as, in case of occasional increased transmission delays or
+   network throughput drops, the client can decode and play buffered
+   data.  Otherwise, without initial buffering, the client has to freeze
+   the display, stop decoding, and wait for incoming data.  The
+   buffering is also necessary for either automatic or selective
+   retransmission in any protocol level.  If any part of a picture is
+   lost, a retransmission mechanism may be used to resend the lost data.
+   If the retransmitted data is received before its scheduled decoding
+   or playback time, the loss is recovered perfectly.  Coded pictures
+   can be ranked according to their importance in the subjective quality
+   of the decoded sequence.  For example, non-reference pictures, such
+   as conventional B pictures, are subjectively least important, as
+   their absence does not affect decoding of any other pictures.  In
+   addition to non-reference pictures, the ITU-T H.264 | ISO/IEC
+   14496-10 standard includes a temporal scalability method called sub-
+   sequences [22].  Subjective ranking can also be made on coded slice
+   data partition or slice group basis.  Coded slices and coded slice
+   data partitions that are subjectively the most important can be sent
+   earlier than their decoding order indicates, whereas coded slices and
+   coded slice data partitions that are subjectively the least important
+   can be sent later than their natural coding order indicates.
+   Consequently, any retransmitted parts of the most important slices
+   and coded slice data partitions are more likely to be received before
+   their scheduled decoding or playback time compared to the least
+   important slices and slice data partitions.
+
+13.  Informative Appendix: Rationale for Decoding Order Number
+
+13.1.  Introduction
+
+   The Decoding Order Number (DON) concept was introduced mainly to
+   enable efficient multi-picture slice interleaving (see Section 12.6)
+   and robust packet scheduling (see Section 12.7).  In both of these
+   applications, NAL units are transmitted out of decoding order.  DON
+   indicates the decoding order of NAL units and should be used in the
+   receiver to recover the decoding order.  Example use cases for
+   efficient multi-picture slice interleaving and for robust packet
+   scheduling are given in Sections 13.2 and 13.3, respectively.
+   Section 13.4 describes the benefits of the DON concept in error
+   resiliency achieved by redundant coded pictures.  Section 13.5
+   summarizes considered alternatives to DON and justifies why DON was
+   chosen for this RTP payload specification.
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 87]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+13.2.  Example of Multi-Picture Slice Interleaving
+
+   An example of multi-picture slice interleaving follows.  A subset of
+   a coded video sequence is depicted below in output order.  R denotes
+   a reference picture, N denotes a non-reference picture, and the
+   number indicates a relative output time.
+
+      ... R1 N2 R3 N4 R5 ...
+
+   The decoding order of these pictures from left to right is as
+   follows:
+
+      ... R1 R3 N2 R5 N4 ...
+
+   The NAL units of pictures R1, R3, N2, R5, and N4 are marked with a
+   DON equal to 1, 2, 3, 4, and 5, respectively.
+
+   Each reference picture consists of three slice groups that are
+   scattered as follows (a number denotes the slice group number for
+   each macroblock in a Quarter Common Intermediate Format (QCIF)
+   frame):
+
+      0 1 2 0 1 2 0 1 2 0 1
+      2 0 1 2 0 1 2 0 1 2 0
+      1 2 0 1 2 0 1 2 0 1 2
+      0 1 2 0 1 2 0 1 2 0 1
+      2 0 1 2 0 1 2 0 1 2 0
+      1 2 0 1 2 0 1 2 0 1 2
+      0 1 2 0 1 2 0 1 2 0 1
+      2 0 1 2 0 1 2 0 1 2 0
+      1 2 0 1 2 0 1 2 0 1 2
+
+   For the sake of simplicity, we assume that all the macroblocks of a
+   slice group are included in one slice.  Three MTAPs are constructed
+   from three consecutive reference pictures so that each MTAP contains
+   three aggregation units, each of which contains all the macroblocks
+   from one slice group.  The first MTAP contains slice group 0 of
+   picture R1, slice group 1 of picture R3, and slice group 2 of picture
+   R5.  The second MTAP contains slice group 1 of picture R1, slice
+   group 2 of picture R3, and slice group 0 of picture R5.  The third
+   MTAP contains slice group 2 of picture R1, slice group 0 of picture
+   R3, and slice group 1 of picture R5.  Each non-reference picture is
+   encapsulated into an STAP-B.
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 88]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   Consequently, the transmission order of NAL units is the following:
+
+      R1, slice group 0, DON 1, carried in MTAP,RTP SN: N
+      R3, slice group 1, DON 2, carried in MTAP,RTP SN: N
+      R5, slice group 2, DON 4, carried in MTAP,RTP SN: N
+      R1, slice group 1, DON 1, carried in MTAP,RTP SN: N+1
+      R3, slice group 2, DON 2, carried in MTAP,RTP SN: N+1
+      R5, slice group 0, DON 4, carried in MTAP,RTP SN: N+1
+      R1, slice group 2, DON 1, carried in MTAP,RTP SN: N+2
+      R3, slice group 1, DON 2, carried in MTAP,RTP SN: N+2
+      R5, slice group 0, DON 4, carried in MTAP,RTP SN: N+2
+      N2, DON 3, carried in STAP-B, RTP SN: N+3
+      N4, DON 5, carried in STAP-B, RTP SN: N+4
+
+   The receiver is able to organize the NAL units back in decoding order
+   based on the value of DON associated with each NAL unit.
+
+   If one of the MTAPs is lost, the spatially adjacent and temporally
+   co-located macroblocks are received and can be used to conceal the
+   loss efficiently.  If one of the STAPs is lost, the effect of the
+   loss does not propagate temporally.
+
+13.3.  Example of Robust Packet Scheduling
+
+   An example of robust packet scheduling follows.  The communication
+   system used in the example consists of the following components in
+   the order that the video is processed from source to sink:
+
+   o camera and capturing
+   o pre-encoding buffer
+   o encoder
+   o encoded picture buffer
+   o transmitter
+   o transmission channel
+   o receiver
+   o receiver buffer
+   o decoder
+   o decoded picture buffer
+   o display
+
+   The video communication system used in this example operates as
+   follows.  Note that processing of the video stream happens gradually
+   and at the same time in all components of the system.  The source
+   video sequence is shot and captured to a pre-encoding buffer.  The
+   pre-encoding buffer can be used to order pictures from sampling order
+   to encoding order or to analyze multiple uncompressed frames for
+   bitrate control purposes, for example.  In some cases, the pre-
+   encoding buffer may not exist; instead, the sampled pictures are
+
+
+
+Wang, et al.                 Standards Track                   [Page 89]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   encoded right away.  The encoder encodes pictures from the pre-
+   encoding buffer and stores the output (i.e., coded pictures) to the
+   encoded picture buffer.  The transmitter encapsulates the coded
+   pictures from the encoded picture buffer to transmission packets and
+   sends them to a receiver through a transmission channel.  The
+   receiver stores the received packets to the receiver buffer.  The
+   receiver buffering process typically includes buffering for
+   transmission delay jitter.  The receiver buffer can also be used to
+   recover correct decoding order of coded data.  The decoder reads
+   coded data from the receiver buffer and produces decoded pictures as
+   output into the decoded picture buffer.  The decoded picture buffer
+   is used to recover the output (or display) order of pictures.
+   Finally, pictures are displayed.
+
+   In the following example figures, I denotes an IDR picture, R denotes
+   a reference picture, N denotes a non-reference picture, and the
+   number after I, R, or N indicates the sampling time relative to the
+   previous IDR picture in decoding order.  Values below the sequence of
+   pictures indicate scaled system clock timestamps.  The system clock
+   is initialized arbitrarily in this example, and time runs from left
+   to right.  Each I, R, and N picture is mapped into the same timeline
+   compared to the previous processing step, if any, assuming that
+   encoding, transmission, and decoding take no time.  Thus, events
+   happening at the same time are located in the same column throughout
+   all example figures.
+
+   A subset of a sequence of coded pictures is depicted below in
+   sampling order.
+
+       ...  N58 N59 I00 N01 N02 R03 N04 N05 R06 ... N58 N59 I00 N01 ...
+       ... --|---|---|---|---|---|---|---|---|- ... -|---|---|---|- ...
+       ...  58  59  60  61  62  63  64  65  66  ... 128 129 130 131 ...
+
+       Figure 16.  Sequence of pictures in sampling order
+
+   The sampled pictures are buffered in the pre-encoding buffer to
+   arrange them in encoding order.  In this example, we assume that the
+   non-reference pictures are predicted from both the previous and the
+   next reference picture in output order, except for the non-reference
+   pictures immediately preceding an IDR picture, which are predicted
+   only from the previous reference picture in output order.  Thus, the
+   pre-encoding buffer has to contain at least two pictures, and the
+   buffering causes a delay of two picture intervals.  The output of the
+   pre-encoding buffering process and the encoding (and decoding) order
+   of the pictures are as follows:
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 90]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
+       ... -|---|---|---|---|---|---|---|---|- ...
+       ... 60  61  62  63  64  65  66  67  68  ...
+
+       Figure 17.  Reordered pictures in the pre-encoding buffer
+
+   The encoder or the transmitter can set the value of DON for each
+   picture to a value of DON for the previous picture in decoding order
+   plus one.
+
+   For the sake of simplicity, let us assume that:
+
+   o  the frame rate of the sequence is constant,
+   o  each picture consists of only one slice,
+   o  each slice is encapsulated in a single NAL unit packet,
+   o  there is no transmission delay, and
+   o  pictures are transmitted at constant intervals (that is, 1 /
+      (frame rate)).
+
+   When pictures are transmitted in decoding order, they are received as
+   follows:
+
+       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
+       ... -|---|---|---|---|---|---|---|---|- ...
+       ... 60  61  62  63  64  65  66  67  68  ...
+
+       Figure 18.  Received pictures in decoding order
+
+   The OPTIONAL sprop-interleaving-depth media type parameter is set to
+   0, as the transmission (or reception) order is identical to the
+   decoding order.
+
+   Initially, the decoder has to buffer for one picture interval in its
+   decoded picture buffer to organize pictures from decoding order to
+   output order, as depicted below:
+
+       ... N58 N59 I00 N01 N02 R03 N04 N05 R06 ...
+       ... -|---|---|---|---|---|---|---|---|- ...
+       ... 61  62  63  64  65  66  67  68  69  ...
+
+       Figure 19.  Output order
+
+   The amount of required initial buffering in the decoded picture
+   buffer can be signaled in the buffering period SEI message or with
+   the num_reorder_frames syntax element of H.264 video usability
+   information.  num_reorder_frames indicates the maximum number of
+   frames, complementary field pairs, or non-paired fields that precede
+   any frame, complementary field pair, or non-paired field in the
+
+
+
+Wang, et al.                 Standards Track                   [Page 91]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   sequence in decoding order and that follow it in output order.  For
+   the sake of simplicity, we assume that num_reorder_frames is used to
+   indicate the initial buffer in the decoded picture buffer.  In this
+   example, num_reorder_frames is equal to 1.
+
+   It can be observed that if the IDR picture I00 is lost during
+   transmission and a retransmission request is issued when the value of
+   the system clock is 62, there is one picture interval of time (until
+   the system clock reaches timestamp 63) to receive the retransmitted
+   IDR picture I00.
+
+   Let us then assume that IDR pictures are transmitted two frame
+   intervals earlier than their decoding position; that is, the pictures
+   are transmitted as follows:
+
+       ...  I00 N58 N59 R03 N01 N02 R06 N04 N05 ...
+       ... --|---|---|---|---|---|---|---|---|- ...
+       ...  62  63  64  65  66  67  68  69  70  ...
+
+       Figure 20.  Interleaving: Early IDR pictures in sending order
+
+   The OPTIONAL sprop-interleaving-depth media type parameter is set
+   equal to 1 according to its definition.  (The value of sprop-
+   interleaving-depth in this example can be derived as follows: picture
+   I00 is the only picture preceding picture N58 or N59 in transmission
+   order and following it in decoding order.  Except for pictures I00,
+   N58, and N59, the transmission order is the same as the decoding
+   order of pictures.  As a coded picture is encapsulated into exactly
+   one NAL unit, the value of sprop-interleaving-depth is equal to the
+   maximum number of pictures preceding any picture in transmission
+   order and following the picture in decoding order).
+
+   The receiver buffering process contains two pictures at a time
+   according to the value of the sprop-interleaving-depth parameter and
+   orders pictures from the reception order to the correct decoding
+   order based on the value of DON associated with each picture.  The
+   output of the receiver buffering process is as follows:
+
+       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
+       ... -|---|---|---|---|---|---|---|---|- ...
+       ... 63  64  65  66  67  68  69  70  71  ...
+
+       Figure 21.  Interleaving: Receiver buffer
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 92]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   Again, an initial buffering delay of one picture interval is needed
+   to organize pictures from decoding order to output order, as depicted
+   below:
+
+        ... N58 N59 I00 N01 N02 R03 N04 N05 ...
+        ... -|---|---|---|---|---|---|---|- ...
+        ... 64  65  66  67  68  69  70  71  ...
+
+        Figure 22.  Interleaving: Receiver buffer after reordering
+
+   Note that the maximum delay that IDR pictures can undergo during
+   transmission, including possible application, transport, or link
+   layer retransmission, is equal to three picture intervals.  Thus, the
+   loss resiliency of IDR pictures is improved in systems supporting
+   retransmission compared to the case in which pictures are transmitted
+   in their decoding order.
+
+13.4.  Robust Transmission Scheduling of Redundant Coded Slices
+
+   A redundant coded picture is a coded representation of a picture or a
+   part of a picture that is not used in the decoding process if the
+   corresponding primary coded picture is correctly decoded.  There
+   should be no noticeable difference between any area of the decoded
+   primary picture and a corresponding area that would result from
+   application of the H.264 decoding process for any redundant picture
+   in the same access unit.  A redundant coded slice is a coded slice
+   that is a part of a redundant coded picture.
+
+   Redundant coded pictures can be used to provide unequal error
+   protection in error-prone video transmission.  If a primary coded
+   representation of a picture is decoded incorrectly, a corresponding
+   redundant coded picture can be decoded.  Examples of applications and
+   coding techniques using the redundant codec picture feature include
+   the video redundancy coding [23] and the protection of "key pictures"
+   in multicast streaming [24].
+
+   One property of many error-prone video communications systems is that
+   transmission errors are often bursty.  Therefore, they may affect
+   more than one consecutive transmission packet in transmission order.
+   In low bitrate video communication, it is relatively common for an
+   entire coded picture to be encapsulated into one transmission packet.
+   Consequently, a primary coded picture and the corresponding redundant
+   coded pictures may be transmitted in consecutive packets in
+   transmission order.  To make the transmission scheme more tolerant of
+   bursty transmission errors, it is beneficial to transmit the primary
+   coded picture and redundant coded picture separated by more than a
+   single packet.  The DON concept enables this.
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 93]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+13.5.  Remarks on Other Design Possibilities
+
+   The slice header syntax structure of the H.264 coding standard
+   contains the frame_num syntax element that can indicate the decoding
+   order of coded frames.  However, the usage of the frame_num syntax
+   element is not feasible or desirable to recover the decoding order,
+   due to the following reasons:
+
+   o  The receiver is required to parse at least one slice header per
+      coded picture (before passing the coded data to the decoder).
+
+   o  Coded slices from multiple coded video sequences cannot be
+      interleaved, as the frame number syntax element is reset to 0 in
+      each IDR picture.
+
+   o  The coded fields of a complementary field pair share the same
+      value of the frame_num syntax element.  Thus, the decoding order
+      of the coded fields of a complementary field pair cannot be
+      recovered based on the frame_num syntax element or any other
+      syntax element of the H.264 coding syntax.
+
+   The RTP payload format for transport of MPEG-4 elementary streams
+   [25] enables interleaving of access units and transmission of
+   multiple access units in the same RTP packet.  An access unit is
+   specified in the H.264 coding standard to comprise all NAL units
+   associated with a primary coded picture according to Subclause
+   7.4.1.2 of [1].  Consequently, slices of different pictures cannot be
+   interleaved, and the multi-picture slice interleaving technique (see
+   Section 12.6) for improved error resilience cannot be used.
+
+14.  Changes from RFC 3984
+
+   Following is the list of technical changes (including bug fixes) from
+   RFC 3984.  Besides this list of technical changes, numerous editorial
+   changes have been made, but not documented in this section.  Note
+   that Section 8.2.2 is where much of the important changes in this
+   memo occurs and deserves particular attention.
+
+   1)  In Sections 5.4, 5.5, 6.2, 6.3, and 6.4, removed that the
+       packetization mode in use may be signaled by external means.
+
+   2)  In Section 7.2.2, changed the sentence
+
+       There are N VCL NAL units in the de-interleaving buffer.
+
+       to
+
+       There are N or more VCL NAL units in the de-interleaving buffer.
+
+
+
+Wang, et al.                 Standards Track                   [Page 94]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   3)  In Section 8.1, the semantics of sprop-init-buf-time (paragraph
+       2), changed the sentence
+
+       The parameter is the maximum value of (transmission time of a NAL
+       unit - decoding time of the NAL unit), assuming reliable and
+       instantaneous transmission, the same timeline for transmission
+       and decoding, and that decoding starts when the first packet
+       arrives.
+
+       to
+
+       The parameter is the maximum value of (decoding time of the NAL
+       unit - transmission time of a NAL unit), assuming reliable and
+       instantaneous transmission, the same timeline for transmission
+       and decoding, and that decoding starts when the first packet
+       arrives.
+
+   4)  Added media type parameters max-smbps, sprop-level-parameter-
+       sets, use-level-src-parameter-sets, in-band-parameter-sets, sar-
+       understood, and sar-supported.
+
+   5)  In Section 8.1, removed the specification of parameter-add.
+       Other descriptions of parameter-add (in Sections 8.2 and 8.4)
+       were also removed.
+
+   6)  In Section 8.1, added a constraint to sprop-parameter-sets such
+       that it can only contain parameter sets for the same profile and
+       level as indicated by profile-level-id.
+
+   7)  In Section 8.2.1, added that sprop-parameter-sets and sprop-
+       level-parameter-sets may be either included in the "a=fmtp" line
+       of SDP or conveyed using the "fmtp" source attribute as specified
+       in Section 6.3 of [9].
+
+   8)  In Section 8.2.2, removed sprop-deint-buf-req from being part of
+       the media format configuration in usage with the SDP Offer/Answer
+       model.
+
+   9)  In Section 8.2.2, made it clear that level is downgradable in the
+       SDP Offer/Answer model, i.e., the use of the level part of
+       profile-level-id does not need to be symmetric (the level
+       included in the answer can be lower than or equal to the level
+       included in the offer).
+
+   10) In Section 8.2.2, removed that the capability parameters may be
+       used to declare encoding capabilities.
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 95]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   11) In Section 8.2.2, added rules on how to use sprop-parameter-sets
+       and sprop-level-parameter-sets for out-of-band transport of
+       parameter sets, with or without level downgrading.
+
+   12) In Section 8.2.2, clarified the rules of using the media type
+       parameters with SDP Offer/Answer for multicast.
+
+   13) In Section 8.2.2, completed and corrected the list of how
+       different media type parameters shall be interpreted in the
+       different combinations of offer or answer and direction
+       attribute.
+
+   14) In Section 8.4, changed the text such that both out-of-band and
+       in-band transport of parameter sets are allowed, and neither is
+       recommended or required.
+
+   15) Added Section 8.5 (informative) providing example methods for
+       decoder refresh to handle parameter set losses.
+
+   16) Added media type parameters max-recv-level and level-asymmetry-
+       allowed and adjusted associated text and examples for level
+       upgrade and asymmetry.
+
+15.  Backward Compatibility to RFC 3984
+
+   The current document is a revision of RFC 3984 and obsoletes it.  The
+   technical changes relative to RFC 3984 are listed in Section 14.
+   This section addresses the backward compatibility issues.
+
+   It should be noted that for the majority of cases, there will be no
+   compatibility issues for legacy implementations per RFC 3984 and new
+   implementations per this document to interwork.  Compatibility issues
+   may only occur when both of the following conditions are true: 1)
+   legacy implementations and new implementations are interworking, and
+   2) parameter sets are transported out-of-band.  When such
+   compatibility issues occur, it is easy to debug and find the reason
+   for the incompatibility using the following analyses.
+
+   Items 1, 2, 3, 7, 9, 10, 12, and 13 are bug-fix types of changes and
+   do not incur any backward compatibility issues.
+
+   Item 4 (addition of six new media type parameters) does not incur any
+   backward compatibility issues for SDP Offer/Answer-based
+   applications, as legacy RFC 3984 receivers ignore these parameters,
+   and it is fine for legacy RFC 3984 senders not to use these
+   parameters as they are optional.  However, there is a backward
+   compatibility issue for declarative-usage-based applications (only
+   for the parameter sprop-level-parameter-sets as the other five
+
+
+
+Wang, et al.                 Standards Track                   [Page 96]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   parameters are not usable in declarative usage).  For example,
+   declarative-usage-based applications using RTSP and SAP have a
+   backward compatibility issue because the SDP receiver per RFC 3984
+   cannot accept a session for which the SDP includes an unrecognized
+   parameter.  Therefore, the RTSP or SAP server may have to prepare two
+   sets of streams, one for legacy RFC 3984 receivers and one for
+   receivers according to this memo.
+
+   Items 5, 6, and 11 are related to out-of-band transport of parameter
+   sets.  There are following backward compatibility issues.
+
+   1)  When a legacy sender per RFC 3984 includes parameter sets for a
+       level different than the default level indicated by profile-
+       level-id to sprop-parameter-sets, the parameter value of sprop-
+       parameter-sets is invalid to the receiver per this memo;
+       therefore, the session may be rejected.
+
+   2)  In SDP Offer/Answer between a legacy offerer per RFC 3984 and an
+       answerer per this memo, when the answerer includes in the answer
+       parameter sets that are not a superset of the parameter sets
+       included in the offer, the parameter value of sprop-parameter-
+       sets is invalid to the offerer, and the session may not be
+       initiated properly (related to change item 11).
+
+   3)  When one endpoint A per this memo includes in-band-parameter-sets
+       equal to 1, the other side B per RFC 3984 does not understand
+       that it must transmit parameter sets in-band, and B may still
+       exclude parameter sets in the in-band stream it is sending.
+       Consequently, endpoint A cannot decode the stream it receives.
+
+   Item 7 (allowance of conveying sprop-parameter-sets and sprop-level-
+   parameter-sets using the "fmtp" source attribute as specified in
+   Section 6.3 of [9]) is similar to item 4.  It does not incur any
+   backward compatibility issues for SDP Offer/Answer-based
+   applications, as legacy RFC 3984 receivers ignore the "fmtp" source
+   attribute, and it is fine for legacy RFC 3984 senders not to use the
+   "fmtp" source attribute as it is optional.  However, there is a
+   backward compatibility issue for SDP declarative-usage-based
+   applications, e.g., those using RTSP and SAP, because the SDP
+   receiver per RFC 3984 cannot accept a session for which the SDP
+   includes an unrecognized parameter (i.e., the "fmtp" source
+   attribute).  Therefore, the RTSP or SAP server may have to prepare
+   two sets of streams, one for legacy RFC 3984 receivers and one for
+   receivers according to this memo.
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 97]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   Item 14 does not incur any backward compatibility issues, as out-of-
+   band transport of parameter sets is still allowed.
+
+   Item 15 does not incur any backward compatibility issues, as the
+   added Section 8.5 is informative.
+
+   Item 16 does not create any backward compatibility issues as the
+   handling of the default level is the same if either end is RFC 3984
+   compliant, and, furthermore, RFC-3984-compliant ends would simply
+   ignore the new media type parameters, if present.
+
+16.  Acknowledgements
+
+   Stephan Wenger, Miska Hannuksela, Thomas Stockhammer, Magnus
+   Westerlund, and David Singer are thanked as the authors of RFC 3984.
+   Dave Lindbergh, Philippe Gentric, Gonzalo Camarillo, Gary Sullivan,
+   Joerg Ott, and Colin Perkins are thanked for careful review during
+   the development of RFC 3984. Stephen Botzko, Magnus Westerlund, Alex
+   Eleftheriadis, Thomas Schierl, Tom Taylor, Ali Begen, Aaron Wells,
+   Stuart Taylor, Robert Sparks, Dan Romascanu, and Niclas Comstedt are
+   thanked for their valuable comments and input during the development
+   of this memo.
+
+17.  References
+
+17.1.  Normative References
+
+   [1]   ITU-T Recommendation H.264, "Advanced video coding for generic
+         audiovisual services", March 2010.
+
+   [2]   ISO/IEC International Standard 14496-10:2008.
+
+   [3]   ITU-T Recommendation H.241, "Extended video procedures and
+         control signals for H.300-series terminals", May 2006.
+
+   [4]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
+         Levels", BCP 14, RFC 2119, March 1997.
+
+   [5]   Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
+         "RTP: A Transport Protocol for Real-Time Applications", STD 64,
+         RFC 3550, July 2003.
+
+   [6]   Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
+         Description Protocol", RFC 4566, July 2006.
+
+   [7]   Josefsson, S., "The Base16, Base32, and Base64 Data Encodings",
+         RFC 4648, October 2006.
+
+
+
+
+Wang, et al.                 Standards Track                   [Page 98]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   [8]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
+         Session Description Protocol (SDP)", RFC 3264, June 2002.
+
+   [9]   Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media
+         Attributes in the Session Description Protocol (SDP)", RFC
+         5576, June 2009.
+
+17.2.  Informative References
+
+   [10]  Luthra, A., Sullivan, G.J., and T. Wiegand (eds.),
+         "Introduction to the special issue on the H.264/AVC video
+         coding standard", IEEE Transactions on Circuits and Systems for
+         Video Technology, Vol. 13, No. 7, July 2003.
+
+   [11]  Ott, J., Bormann, C., Sullivan, G., Wenger, S., and R. Even,
+         Ed., "RTP Payload Format for ITU-T Rec. H.263 Video", RFC 4629,
+         January 2007.
+
+   [12]  ISO/IEC International Standard 14496-2:2004.
+
+   [13]  Wenger, S., "H.264/AVC over IP", IEEE Transaction on Circuits
+         and Systems for Video Technology, Vol. 13, No. 7, July 2003.
+
+   [14]  Wenger, S., "H.26L over IP: The IP-Network Adaptation Layer",
+         Proceedings Packet Video Workshop, April 2002.
+
+   [15]  Stockhammer, T., Hannuksela, M.M., and S. Wenger, "H.26L/JVT
+         Coding Network Abstraction Layer and IP-Based Transport", IEEE
+         International Conference on Image Processing (ICIP 2002),
+         Rochester, NY, September 2002.
+
+   [16]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
+         Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
+
+   [17]  ITU-T Recommendation H.223, "Multiplexing protocol for low bit
+         rate multimedia communication", July 2001.
+
+   [18]  Li, A., Ed., "RTP Payload Format for Generic Forward Error
+         Correction", RFC 5109, December 2007.
+
+   [19]  Stockhammer, T., Wiegand, T., Oelbaum, T., and F. Obermeier,
+         "Video Coding and Transport Layer Techniques for H.264/AVC-
+         Based Transmission over Packet-Lossy Networks", IEEE
+         International Conference on Image Processing (ICIP 2003),
+         Barcelona, Spain, September 2003.
+
+   [20]  Varsa, V. and M. Karczewicz, "Slice interleaving in compressed
+         video packetization", Packet Video Workshop 2000.
+
+
+
+Wang, et al.                 Standards Track                   [Page 99]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+   [21]  Kang, S.H. and A. Zakhor, "Packet scheduling algorithm for
+         wireless video streaming", Packet Video Workshop 2002.
+
+   [22]  Hannuksela, M.M., "Enhanced Concept of GOP", JVT-B042,
+         available http://ftp3.itu.int/av-arch/video-site/0201_Gen/JVT-
+         B042.doc, January 2002.
+
+   [23]  Wenger, S., "Video Redundancy Coding in H.263+", 1997
+         International Workshop on Audio-Visual Services over Packet
+         Networks, September 1997.
+
+   [24]  Wang, Y.-K., Hannuksela, M.M., and M. Gabbouj, "Error Resilient
+         Video Coding Using Unequally Protected Key Pictures", in Proc.
+         International Workshop VLBV03, September 2003.
+
+   [25]  van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and
+         P. Gentric, "RTP Payload Format for Transport of MPEG-4
+         Elementary Streams", RFC 3640, November 2003.
+
+   [26]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+         Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
+         3711, March 2004.
+
+   [27]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
+         Protocol (RTSP)", RFC 2326, April 1998.
+
+   [28]  Handley, M., Perkins, C., and E. Whelan, "Session Announcement
+         Protocol", RFC 2974, October 2000.
+
+   [29]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
+         January 2008.
+
+   [30]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman, "Codec
+         Control Messages in the RTP Audio-Visual Profile with Feedback
+         (AVPF)", RFC 5104, February 2008.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                  [Page 100]
+
+RFC 6184           RTP Payload Format for H.264 Video           May 2011
+
+
+Authors' Addresses
+
+   Ye-Kui Wang
+   Huawei Technologies
+   400 Crossing Blvd, 2nd Floor
+   Bridgewater, NJ 08807
+   USA
+
+   Phone: +1-908-541-3518
+   EMail: yekui.wang@huawei.com
+
+
+   Roni Even
+   Huawei Technologies
+   14 David Hamelech
+   Tel Aviv 64953
+   Israel
+
+   Phone: +972-545481099
+   EMail: even.roni@huawei.com
+
+
+   Tom Kristensen
+   TANDBERG
+   Philip Pedersens vei 22
+   N-1366 Lysaker
+   Norway
+
+   Phone: +47 67125125
+   EMail: tom.kristensen@tandberg.com, tomkri@ifi.uio.no
+
+
+   Randell Jesup
+   WorldGate Communications
+   3800 Horizon Blvd, Suite #103
+   Trevose, PA 19053-4947
+   USA
+
+   Phone: +1-215-354-5166
+   EMail: rjesup@wgate.com, randell_ietf@jesup.org
+
+
+
+
+
+
+
+
+
+
+
+Wang, et al.                 Standards Track                  [Page 101]
+