1 files changed, 4651 insertions, 0 deletions
diff --git a/doc/rfc/rfc3984.txt b/doc/rfc/rfc3984.txt
new file mode 100644
index 0000000..f84e338
--- /dev/null
+++ b/doc/rfc/rfc3984.txt
@@ -0,0 +1,4651 @@
+
+
+
+
+
+
+Network Working Group                                          S. Wenger
+Request for Comments: 3984                               M.M. Hannuksela
+Category: Standards Track                                 T. Stockhammer
+                                                           M. Westerlund
+                                                               D. Singer
+                                                           February 2005
+
+
+                   RTP Payload Format for H.264 Video
+
+Status of This Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2005).
+
+Abstract
+
+   This memo describes an RTP Payload format for the ITU-T
+   Recommendation H.264 video codec and the technically identical
+   ISO/IEC International Standard 14496-10 video codec.  The RTP payload
+   format allows for packetization of one or more Network Abstraction
+   Layer Units (NALUs), produced by an H.264 video encoder, in each RTP
+   payload.  The payload format has wide applicability, as it supports
+   applications from simple low bit-rate conversational usage, to
+   Internet video streaming with interleaved transmission, to high bit-
+   rate video-on-demand.
+
+Table of Contents
+
+   1.  Introduction..................................................  3
+       1.1.  The H.264 Codec.........................................  3
+       1.2.  Parameter Set Concept...................................  4
+       1.3.  Network Abstraction Layer Unit Types....................  5
+   2.  Conventions...................................................  6
+   3.  Scope.........................................................  6
+   4.  Definitions and Abbreviations.................................  6
+       4.1.  Definitions.............................................  6
+   5.  RTP Payload Format............................................  8
+       5.1.  RTP Header Usage........................................  8
+       5.2.  Common Structure of the RTP Payload Format.............. 11
+       5.3.  NAL Unit Octet Usage.................................... 12
+
+
+
+Wenger, et al.              Standards Track                     [Page 1]
+
+RFC 3984           RTP Payload Format for H.264 Video      February 2005
+
+
+       5.4.  Packetization Modes..................................... 14
+       5.5.  Decoding Order Number (DON)............................. 15
+       5.6.  Single NAL Unit Packet.................................. 18
+       5.7.  Aggregation Packets..................................... 18
+       5.8.  Fragmentation Units (FUs)............................... 27
+   6.  Packetization Rules........................................... 31
+       6.1.  Common Packetization Rules.............................. 31
+       6.2.  Single NAL Unit Mode.................................... 32
+       6.3.  Non-Interleaved Mode.................................... 32
+       6.4.  Interleaved Mode........................................ 33
+   7.  De-Packetization Process (Informative)........................ 33
+       7.1.  Single NAL Unit and Non-Interleaved Mode................ 33
+       7.2.  Interleaved Mode........................................ 34
+       7.3.  Additional De-Packetization Guidelines.................. 36
+   8.  Payload Format Parameters..................................... 37
+       8.1.  MIME Registration....................................... 37
+       8.2.  SDP Parameters.......................................... 52
+       8.3.  Examples................................................ 58
+       8.4.  Parameter Set Considerations............................ 60
+   9.  Security Considerations....................................... 62
+   10. Congestion Control............................................ 63
+   11. IANA Considerations........................................... 64
+   12. Informative Appendix: Application Examples.................... 65
+       12.1. Video Telephony according to ITU-T Recommendation H.241
+             Annex A................................................. 65
+       12.2. Video Telephony, No Slice Data Partitioning, No NAL
+             Unit Aggregation........................................ 65
+       12.3. Video Telephony, Interleaved Packetization Using NAL
+             Unit Aggregation........................................ 66
+       12.4. Video Telephony with Data Partitioning.................. 66
+       12.5. Video Telephony or Streaming with FUs and Forward
+             Error Correction........................................ 67
+       12.6. Low Bit-Rate Streaming.................................. 69
+       12.7. Robust Packet Scheduling in Video Streaming............. 70
+   13. Informative Appendix: Rationale for Decoding Order Number..... 71
+       13.1. Introduction............................................ 71
+       13.2. Example of Multi-Picture Slice Interleaving............. 71
+       13.3. Example of Robust Packet Scheduling..................... 73
+       13.4. Robust Transmission Scheduling of Redundant Coded
+             Slices.................................................. 77
+       13.5. Remarks on Other Design Possibilities................... 77
+   14. Acknowledgements.............................................. 78
+   15. References.................................................... 78
+       15.1. Normative References.................................... 78
+       15.2. Informative References.................................. 79
+   Authors' Addresses................................................ 81
+   Full Copyright Statement.......................................... 83
+
+1.  Introduction
+
+1.1.  The H.264 Codec
+
+   This memo specifies an RTP payload format for the video coding
+   standard known as ITU-T Recommendation H.264 [1] and ISO/IEC
+   International Standard 14496 Part 10 [2] (both also known as Advanced
+   Video Coding, or AVC).  Recommendation H.264 was approved by ITU-T in
+   May 2003, and the approved draft specification is available for
+   public review [8].  In this memo the H.264 acronym is used for the
+   codec and the standard, but the memo is equally applicable to the
+   ISO/IEC counterpart of the coding standard.
+
+   The H.264 video codec has a very broad application range that covers
+   all forms of digital compressed video, from low bit-rate Internet
+   streaming applications to HDTV broadcast and Digital Cinema
+   applications with nearly lossless coding.  Compared to the current
+   state of technology, the overall performance of H.264 is such that
+   bit rate savings of 50% or more are reported.  Digital Satellite TV
+   quality, for example, was reported to be achievable at 1.5 Mbit/s,
+   compared to the current operation point of MPEG 2 video at around 3.5
+   Mbit/s [9].
+
+   The codec specification [1] itself distinguishes conceptually between
+   a video coding layer (VCL) and a network abstraction layer (NAL).
+   The VCL contains the signal processing functionality of the codec;
+   mechanisms such as transform, quantization, and motion compensated
+   prediction; and a loop filter.  It follows the general concept of
+   most of today's video codecs, a macroblock-based coder that uses
+   inter picture prediction with motion compensation and transform
+   coding of the residual signal.  The VCL encoder outputs slices: a bit
+   string that contains the macroblock data of an integer number of
+   macroblocks, and the information of the slice header (containing the
+   spatial address of the first macroblock in the slice, the initial
+   quantization parameter, and similar information).  Macroblocks in
+   slices are arranged in scan order unless a different macroblock
+   allocation is specified, by using the so-called Flexible Macroblock
+   Ordering syntax.  In-picture prediction is used only within a slice.
+   More information is provided in [9].
+
+   The Network Abstraction Layer (NAL) encoder encapsulates the slice
+   output of the VCL encoder into Network Abstraction Layer Units (NAL
+   units), which are suitable for transmission over packet networks or
+   use in packet oriented multiplex environments.  Annex B of H.264
+   defines an encapsulation process to transmit such NAL units over
+   byte-stream oriented networks.  In the scope of this memo, Annex B is
+   not relevant.
+
+   Internally, the NAL uses NAL units.  A NAL unit consists of a one-
+   byte header and the payload byte string.  The header indicates the
+   type of the NAL unit, the (potential) presence of bit errors or
+   syntax violations in the NAL unit payload, and information regarding
+   the relative importance of the NAL unit for the decoding process.
+   This RTP payload specification is designed to be unaware of the bit
+   string in the NAL unit payload.
+
+   One of the main properties of H.264 is the complete decoupling of the
+   transmission time, the decoding time, and the sampling or
+   presentation time of slices and pictures.  The decoding process
+   specified in H.264 is unaware of time, and the H.264 syntax does not
+   carry information such as the number of skipped frames (as is common
+   in the form of the Temporal Reference in earlier video compression
+   standards).  Also, there are NAL units that affect many pictures and
+   that are, therefore, inherently timeless.  For this reason, the
+   handling of the RTP timestamp requires some special considerations
+   for NAL units for which the sampling or presentation time is not
+   defined or, at transmission time, unknown.
+
+1.2.  Parameter Set Concept
+
+   One very fundamental design concept of H.264 is to generate self-
+   contained packets, to make mechanisms such as the header duplication
+   of RFC 2429 [10] or MPEG-4's Header Extension Code (HEC) [11]
+   unnecessary.  This was achieved by decoupling information relevant to
+   more than one slice from the media stream.  This higher layer meta
+   information should be sent reliably, asynchronously, and in advance
+   of the RTP packet stream that contains the slice packets.
+   (Provisions for sending this information in-band are also available
+   for applications that do not have an out-of-band transport channel
+   appropriate for the purpose.)  The combination of the higher-level
+   parameters is called a parameter set.  The H.264 specification
+   includes two types of parameter sets: sequence parameter set and
+   picture parameter set.  An active sequence parameter set remains
+   unchanged throughout a coded video sequence, and an active picture
+   parameter set remains unchanged within a coded picture.  The sequence
+   and picture parameter set structures contain information such as
+   picture size, optional coding modes employed, and macroblock to slice
+   group map.
+
+   To be able to change picture parameters (such as the picture size)
+   without having to transmit parameter set updates synchronously to the
+   slice packet stream, the encoder and decoder can maintain a list of
+   more than one sequence and picture parameter set.  Each slice header
+   contains a codeword that indicates the sequence and picture parameter
+   set to be used.
+
+   This mechanism allows the decoupling of the transmission of parameter
+   sets from the packet stream, and the transmission of them by external
+   means (e.g., as a side effect of the capability exchange), or through
+   a (reliable or unreliable) control protocol.  It may even be possible
+   that they are never transmitted but are fixed by an application
+   design specification.
+
+1.3.  Network Abstraction Layer Unit Types
+
+   Tutorial information on the NAL design can be found in [12], [13],
+   and [14].
+
+   Every NAL unit begins with a single NAL unit type octet, which also
+   serves as the payload header of this RTP payload format.  The
+   payload of the NAL unit follows immediately.
+
+   The syntax and semantics of the NAL unit type octet are specified in
+   [1], but the essential properties of the NAL unit type octet are
+   summarized below.  The NAL unit type octet has the following format:
+
+      +---------------+
+      |0|1|2|3|4|5|6|7|
+      +-+-+-+-+-+-+-+-+
+      |F|NRI|  Type   |
+      +---------------+
+
+   The semantics of the components of the NAL unit type octet, as
+   specified in the H.264 specification, are described briefly below.
+
+   F: 1 bit
+      forbidden_zero_bit.  The H.264 specification declares a value of
+      1 as a syntax violation.
+
+   NRI: 2 bits
+      nal_ref_idc.  A value of 00 indicates that the content of the NAL
+      unit is not used to reconstruct reference pictures for inter
+      picture prediction.  Such NAL units can be discarded without
+      risking the integrity of the reference pictures.  Values greater
+      than 00 indicate that the decoding of the NAL unit is required to
+      maintain the integrity of the reference pictures.
+
+   Type: 5 bits
+      nal_unit_type.  This component specifies the NAL unit payload type
+      as defined in table 7-1 of [1], and later within this memo.  For a
+      reference of all currently defined NAL unit types and their
+      semantics, please refer to section 7.4.1 in [1].
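+
+   The bit layout above can be illustrated with a minimal sketch,
+   assuming bit 0 in the diagram is the most significant bit of the
+   octet (the usual RFC drawing convention).  The names NalHeader and
+   parse_nal_header are hypothetical and are not defined by H.264 or by
+   this memo.
+

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    """Fields of the one-octet NAL unit header (hypothetical container)."""
    f: int              # forbidden_zero_bit, 1 bit
    nri: int            # nal_ref_idc, 2 bits
    nal_unit_type: int  # nal_unit_type, 5 bits


def parse_nal_header(octet: int) -> NalHeader:
    """Split a NAL unit type octet into F, NRI, and Type."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("a NAL unit header is exactly one octet")
    return NalHeader(
        f=octet >> 7,                # bit 0 of the diagram (MSB)
        nri=(octet >> 5) & 0x03,     # bits 1-2
        nal_unit_type=octet & 0x1F,  # bits 3-7
    )


if __name__ == "__main__":
    # 0x67 = 0 11 00111: F=0, NRI=3, Type=7 (sequence parameter set)
    print(parse_nal_header(0x67))
```

+
+   A value such as 0x67 therefore decodes to F=0, NRI=3, and Type=7.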
+
+   This memo introduces new NAL unit types, which are presented in
+   section 5.2.  The NAL unit types defined in this memo are marked as
+   unspecified in [1].  Moreover, this specification extends the
+   semantics of F and NRI as described in section 5.3.
+
+2.  Conventions
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in BCP 14, RFC 2119 [3].
+
+   This specification uses the notion of setting and clearing a bit when
+   bit fields are handled.  Setting a bit is the same as assigning that
+   bit the value of 1 (On).  Clearing a bit is the same as assigning
+   that bit the value of 0 (Off).
+
+3.  Scope
+
+   This payload specification can only be used to carry the "naked"
+   H.264 NAL unit stream over RTP, and not the bitstream format
+   discussed in Annex B of H.264.  Likely, the first applications of
+   this specification will be in the conversational multimedia field,
+   video telephony or video conferencing, but the payload format also
+   covers other applications, such as Internet streaming and TV over IP.
+
+4.  Definitions and Abbreviations
+
+4.1.  Definitions
+
+   This document uses the definitions of [1].  The following terms,
+   defined in [1], are summarized for convenience:
+
+      access unit: A set of NAL units always containing a primary coded
+      picture.  In addition to the primary coded picture, an access unit
+      may also contain one or more redundant coded pictures or other NAL
+      units not containing slices or slice data partitions of a coded
+      picture.  The decoding of an access unit always results in a
+      decoded picture.
+
+      coded video sequence: A sequence of access units that consists, in
+      decoding order, of an instantaneous decoding refresh (IDR) access
+      unit followed by zero or more non-IDR access units including all
+      subsequent access units up to but not including any subsequent IDR
+      access unit.
+
+      IDR access unit: An access unit in which the primary coded picture
+      is an IDR picture.
+
+      IDR picture: A coded picture containing only slices with I or SI
+      slice types that causes a "reset" in the decoding process.  After
+      the decoding of an IDR picture, all following coded pictures in
+      decoding order can be decoded without inter prediction from any
+      picture decoded prior to the IDR picture.
+
+      primary coded picture: The coded representation of a picture to be
+      used by the decoding process for a bitstream conforming to H.264.
+      The primary coded picture contains all macroblocks of the picture.
+
+      redundant coded picture: A coded representation of a picture or a
+      part of a picture.  The content of a redundant coded picture shall
+      not be used by the decoding process for a bitstream conforming to
+      H.264.  The content of a redundant coded picture may be used by
+      the decoding process for a bitstream that contains errors or
+      losses.
+
+      VCL NAL unit: A collective term used to refer to coded slice and
+      coded data partition NAL units.
+
+   In addition, the following definitions apply:
+
+      decoding order number (DON): A field in the payload structure, or
+      a derived variable indicating NAL unit decoding order.  Values of
+      DON are in the range of 0 to 65535, inclusive.  After reaching the
+      maximum value, the value of DON wraps around to 0.
+
+      NAL unit decoding order: A NAL unit order that conforms to the
+      constraints on NAL unit order given in section 7.4.1.2 in [1].
+
+      transmission order: The order of packets in ascending RTP sequence
+      number order (in modulo arithmetic).  Within an aggregation
+      packet, the NAL unit transmission order is the same as the order
+      of appearance of NAL units in the packet.
+
+      media aware network element (MANE): A network element, such as a
+      middlebox or application layer gateway that is capable of parsing
+      certain aspects of the RTP payload headers or the RTP payload and
+      reacting to the contents.
+
+         Informative note: The concept of a MANE goes beyond normal
+         routers or gateways in that a MANE has to be aware of the
+         signaling (e.g., to learn about the payload type mappings of
+         the media streams), and in that it has to be trusted when
+         working with SRTP.  The advantage of using MANEs is that they
+         allow packets to be dropped according to the needs of the media
+         coding.  For example, if a MANE has to drop packets due to
+         congestion on a certain link, it can identify those packets
+         whose dropping has the smallest negative impact on the user
+         experience and remove them in order to remove the congestion
+         and/or keep the delay low.
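+
+   The wrap-around of DON values and the modulo ordering of RTP
+   sequence numbers defined above can be sketched as follows.  The
+   function names are hypothetical.  The rule that one value precedes
+   another when the forward distance is less than half the number
+   space is an assumption borrowed from common serial-number
+   arithmetic; it is not stated in the definitions above.
+

```python
MODULUS = 1 << 16  # DON and RTP sequence numbers are both 16-bit values


def next_don(don: int) -> int:
    """Return the DON that follows `don`, wrapping from 65535 to 0."""
    return (don + 1) % MODULUS


def precedes(a: int, b: int) -> bool:
    """True if `a` comes before `b` in modulo-65536 order.

    Assumption: the two values are less than 32768 apart, so the
    shorter way around the circle gives the intended order.
    """
    return a != b and (b - a) % MODULUS < MODULUS // 2


if __name__ == "__main__":
    print(next_don(65535))     # wraps to 0
    print(precedes(65535, 0))  # 65535 is sent just before 0
```
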
+
+   Abbreviations
+
+      DON:        Decoding Order Number
+      DONB:       Decoding Order Number Base
+      DOND:       Decoding Order Number Difference
+      FEC:        Forward Error Correction
+      FU:         Fragmentation Unit
+      IDR:        Instantaneous Decoding Refresh
+      IEC:        International Electrotechnical Commission
+      ISO:        International Organization for Standardization
+      ITU-T:      International Telecommunication Union,
+                  Telecommunication Standardization Sector
+      MANE:       Media Aware Network Element
+      MTAP:       Multi-Time Aggregation Packet
+      MTAP16:     MTAP with 16-bit timestamp offset
+      MTAP24:     MTAP with 24-bit timestamp offset
+      NAL:        Network Abstraction Layer
+      NALU:       NAL Unit
+      SEI:        Supplemental Enhancement Information
+      STAP:       Single-Time Aggregation Packet
+      STAP-A:     STAP type A
+      STAP-B:     STAP type B
+      TS:         Timestamp
+      VCL:        Video Coding Layer
+
+5.  RTP Payload Format
+
+5.1.  RTP Header Usage
+
+   The format of the RTP header is specified in RFC 3550 [4] and
+   reprinted in Figure 1 for convenience.  This payload format uses the
+   fields of the header in a manner consistent with that specification.
+
+   When one NAL unit is encapsulated per RTP packet, the RECOMMENDED RTP
+   payload format is specified in section 5.6.  The RTP payload (and the
+   settings for some RTP header bits) for aggregation packets and
+   fragmentation units are specified in sections 5.7 and 5.8,
+   respectively.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                           timestamp                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |           synchronization source (SSRC) identifier            |
+      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+      |            contributing source (CSRC) identifiers             |
+      |                             ....                              |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 1.  RTP header according to RFC 3550
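+
+   As an illustration of Figure 1, the following minimal sketch
+   unpacks the fixed RTP header and the CSRC list from a received
+   packet.  The function name and the dictionary keys are
+   hypothetical, and the sketch does not process an RTP header
+   extension (X bit) or padding (P bit); it only reports those bits.
+

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Unpack the 12-octet fixed RTP header plus any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Network byte order: two octets, 16-bit SN, 32-bit TS, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    header_length = 12 + 4 * cc
    if len(packet) < header_length:
        raise ValueError("packet truncated inside the CSRC list")
    csrc = list(struct.unpack("!%dI" % cc, packet[12:header_length]))
    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 1,
        "extension": (b0 >> 4) & 1,
        "cc": cc,
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
        "header_length": header_length,  # offset of the payload if X == 0
    }


if __name__ == "__main__":
    pkt = bytes([0x80, 0xE0, 0x12, 0x34]) + (90000).to_bytes(4, "big") \
        + (0xDEADBEEF).to_bytes(4, "big") + b"\x67"
    print(parse_rtp_header(pkt))
```
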
+
+   The RTP header information to be set according to this RTP payload
+   format is set as follows:
+
+   Marker bit (M): 1 bit
+      Set for the very last packet of the access unit indicated by the
+      RTP timestamp, in line with the normal use of the M bit in video
+      formats, to allow efficient playout buffer handling.  For
+      aggregation packets (STAP and MTAP), the marker bit in the RTP
+      header MUST be set to the value that the marker bit of the last
+      NAL unit of the aggregation packet would have been if it were
+      transported in its own RTP packet.  Decoders MAY use this bit as
+      an early indication of the last packet of an access unit, but MUST
+      NOT rely on this property.
+
+         Informative note: Only one M bit is associated with an
+         aggregation packet carrying multiple NAL units.  Thus, if a
+         gateway has re-packetized an aggregation packet into several
+         packets, it cannot reliably set the M bit of those packets.
+
+   Payload type (PT): 7 bits
+      The assignment of an RTP payload type for this new packet format
+      is outside the scope of this document and will not be specified
+      here.  The assignment of a payload type has to be performed either
+      through the profile used or in a dynamic way.
+
+   Sequence number (SN): 16 bits
+      Set and used in accordance with RFC 3550.  For the single NALU and
+      non-interleaved packetization mode, the sequence number is used to
+      determine decoding order for the NALU.
+
+   Timestamp: 32 bits
+      The RTP timestamp is set to the sampling timestamp of the content.
+      A 90 kHz clock rate MUST be used.
+
+      If the NAL unit has no timing properties of its own (e.g.,
+      parameter set and SEI NAL units), the RTP timestamp is set to the
+      RTP timestamp of the primary coded picture of the access unit in
+      which the NAL unit is included, according to section 7.4.1.2 of
+      [1].
+
+      The setting of the RTP Timestamp for MTAPs is defined in section
+      5.7.2.
+
+      Receivers SHOULD ignore any picture timing SEI messages included
+      in access units that have only one display timestamp.  Instead,
+      receivers SHOULD use the RTP timestamp for synchronizing the
+      display process.
+
+      RTP senders SHOULD NOT transmit picture timing SEI messages for
+      pictures that are not supposed to be displayed as multiple fields.
+
+      If one access unit has more than one display timestamp carried in
+      a picture timing SEI message, then the information in the SEI
+      message SHOULD be treated as relative to the RTP timestamp, with
+      the earliest event occurring at the time given by the RTP
+      timestamp, and subsequent events later, as given by the difference
+      in SEI message picture timing values.  Let tSEI1, tSEI2, ...,
+      tSEIn be the display timestamps carried in the SEI message of an
+      access unit, where tSEI1 is the earliest of all such timestamps.
+      Let tmadjst() be a function that adjusts the SEI message's time
+      scale to a 90-kHz time scale.  Let TS be the RTP timestamp.  Then,
+      the display time for the event associated with tSEI1 is TS.  The
+      display time for the event with tSEIx, where x is in [2..n], is
+      TS + tmadjst(tSEIx - tSEI1).
+
+         Informative note: Displaying coded frames as fields is needed
+         commonly in an operation known as 3:2 pulldown, in which film
+         content that consists of coded frames is displayed on a display
+         using interlaced scanning.  The picture timing SEI message
+         enables carriage of multiple timestamps for the same coded
+         picture, and therefore the 3:2 pulldown process is perfectly
+         controlled.  The picture timing SEI message mechanism is
+         necessary because only one timestamp per coded frame can be
+         conveyed in the RTP timestamp.
+
+         Informative note: Because H.264 allows the decoding order to be
+         different from the display order, values of RTP timestamps may
+         not be monotonically non-decreasing as a function of RTP
+         sequence numbers.  Furthermore, the value for interarrival
+         jitter reported in the RTCP reports may not be a trustworthy
+         indication of the network performance, as the calculation rules
+         for interarrival jitter (section 6.4.1 of RFC 3550) assume that
+         the RTP timestamp of a packet is directly proportional to its
+         transmission time.
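+
+   The timestamp rules above can be sketched as follows: a sampling
+   instant is mapped to the 90 kHz RTP clock, and the display times of
+   an access unit with several picture timing SEI timestamps are
+   derived as TS + tmadjst(tSEIx - tSEI1).  This is a sketch under
+   stated assumptions: the parameter sei_time_scale (ticks per second
+   of the SEI timestamps) and the function names other than tmadjst
+   are hypothetical, and results are reduced modulo 2^32 because the
+   RTP timestamp field is 32 bits wide.
+

```python
RTP_CLOCK_HZ = 90_000     # clock rate required by this payload format
TS_MODULUS = 1 << 32      # the RTP timestamp is a 32-bit field


def rtp_timestamp(sampling_seconds: float, offset: int = 0) -> int:
    """Map a sampling instant in seconds to a 90 kHz RTP timestamp.

    `offset` stands for the initial timestamp value chosen by the sender.
    """
    return (offset + round(sampling_seconds * RTP_CLOCK_HZ)) % TS_MODULUS


def tmadjst(sei_ticks: int, sei_time_scale: int) -> int:
    """Rescale a duration in SEI ticks to 90 kHz ticks (integer math)."""
    return sei_ticks * RTP_CLOCK_HZ // sei_time_scale


def display_times(ts: int, sei_timestamps: list, sei_time_scale: int) -> list:
    """Display time of each SEI event, relative to the RTP timestamp `ts`."""
    t_sei1 = min(sei_timestamps)  # tSEI1 is the earliest timestamp
    return [(ts + tmadjst(t - t_sei1, sei_time_scale)) % TS_MODULUS
            for t in sei_timestamps]


if __name__ == "__main__":
    print(rtp_timestamp(0.5))
    print(display_times(90000, [1000, 1500, 2000], 30000))
```
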
+
+5.2.  Common Structure of the RTP Payload Format
+
+   The payload format defines three different basic payload structures.
+   A receiver can identify the payload structure by the first byte of
+   the RTP payload, which co-serves as the RTP payload header and, in
+   some cases, as the first byte of the payload.  This byte is always
+   structured as a NAL unit header.  The NAL unit type field indicates
+   which structure is present.  The possible structures are as follows:
+
+   Single NAL Unit Packet: Contains only a single NAL unit in the
+   payload.  The NAL header type field will be equal to the original NAL
+   unit type; i.e., in the range of 1 to 23, inclusive.  Specified in
+   section 5.6.
+
+   Aggregation packet: Packet type used to aggregate multiple NAL units
+   into a single RTP payload.  This packet exists in four versions: the
+   Single-Time Aggregation Packet type A (STAP-A), the Single-Time
+   Aggregation Packet type B (STAP-B), Multi-Time Aggregation Packet
+   (MTAP) with 16-bit offset (MTAP16), and Multi-Time Aggregation Packet
+   (MTAP) with 24-bit offset (MTAP24).  The NAL unit type numbers
+   assigned for STAP-A, STAP-B, MTAP16, and MTAP24 are 24, 25, 26, and
+   27, respectively.  Specified in section 5.7.
+
+   Fragmentation unit: Used to fragment a single NAL unit over multiple
+   RTP packets.  Exists in two versions, FU-A and FU-B, identified
+   with the NAL unit type numbers 28 and 29, respectively.  Specified in
+   section 5.8.
+
+   Table 1.  Summary of NAL unit types and their payload structures
+
+      Type   Packet    Type name                        Section
+      ---------------------------------------------------------
+      0      undefined                                    -
+      1-23   NAL unit  Single NAL unit packet per H.264   5.6
+      24     STAP-A    Single-time aggregation packet     5.7.1
+      25     STAP-B    Single-time aggregation packet     5.7.1
+      26     MTAP16    Multi-time aggregation packet      5.7.2
+      27     MTAP24    Multi-time aggregation packet      5.7.2
+      28     FU-A      Fragmentation unit                 5.8
+      29     FU-B      Fragmentation unit                 5.8
+      30-31  undefined                                    -
+
+      Informative note: This specification does not limit the size of
+      NAL units encapsulated in single NAL unit packets and
+      fragmentation units.  The maximum size of a NAL unit encapsulated
+      in any aggregation packet is 65535 bytes.
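+
+   A receiver's dispatch on the first payload byte, as summarized in
+   Table 1, might look like the following minimal sketch.  The
+   function name and the returned strings are hypothetical; only the
+   type numbers come from the table.
+

```python
AGGREGATION_AND_FU_TYPES = {
    24: "STAP-A",
    25: "STAP-B",
    26: "MTAP16",
    27: "MTAP24",
    28: "FU-A",
    29: "FU-B",
}


def payload_structure(payload: bytes) -> str:
    """Name the payload structure indicated by the first payload byte."""
    if not payload:
        raise ValueError("empty RTP payload")
    nal_unit_type = payload[0] & 0x1F  # low five bits of the header octet
    if 1 <= nal_unit_type <= 23:
        return "single NAL unit packet"
    # Types 0, 30, and 31 are listed as undefined in Table 1.
    return AGGREGATION_AND_FU_TYPES.get(nal_unit_type, "undefined")


if __name__ == "__main__":
    for first_byte in (0x67, 0x78, 0x7C, 0x00):
        print(hex(first_byte), payload_structure(bytes([first_byte])))
```
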
+
+5.3.  NAL Unit Octet Usage
+
+   The structure and semantics of the NAL unit octet were introduced in
+   section 1.3.  For convenience, the format of the NAL unit type octet
+   is reprinted below:
+
+      +---------------+
+      |0|1|2|3|4|5|6|7|
+      +-+-+-+-+-+-+-+-+
+      |F|NRI|  Type   |
+      +---------------+
+
+   This section specifies the semantics of F and NRI according to this
+   specification.
+
+   F: 1 bit
+      forbidden_zero_bit.  A value of 0 indicates that the NAL unit type
+      octet and payload should not contain bit errors or other syntax
+      violations.  A value of 1 indicates that the NAL unit type octet
+      and payload may contain bit errors or other syntax violations.
+
+      MANEs SHOULD set the F bit to indicate detected bit errors in the
+      NAL unit.  The H.264 specification requires that the F bit is
+      equal to 0.  When the F bit is set, the decoder is advised that
+      bit errors or any other syntax violations may be present in the
+      payload or in the NAL unit type octet.  The simplest decoder
+      reaction to a NAL unit in which the F bit is equal to 1 is to
+      discard such a NAL unit and to conceal the lost data in the
+      discarded NAL unit.
+
+   NRI: 2 bits
+      nal_ref_idc.  The semantics of value 00 and a non-zero value
+      remain unchanged from the H.264 specification.  In other words, a
+      value of 00 indicates that the content of the NAL unit is not used
+      to reconstruct reference pictures for inter picture prediction.
+      Such NAL units can be discarded without risking the integrity of
+      the reference pictures.  Values greater than 00 indicate that the
+      decoding of the NAL unit is required to maintain the integrity of
+      the reference pictures.
+
+      In addition to the specification above, according to this RTP
+      payload specification, values of NRI greater than 00 indicate the
+      relative transport priority, as determined by the encoder.  MANEs
+      can use this information to protect more important NAL units
+      better than they do less important NAL units.  The highest
+      transport priority is 11, followed by 10, and then by 01; finally,
+      00 is the lowest.
+
+         Informative note: Any non-zero value of NRI is handled
+         identically in H.264 decoders.  Therefore, receivers need not
+         manipulate the value of NRI when passing NAL units to the
+         decoder.
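Non-normative sketch (not part of RFC 3984): the F, NRI, and Type fields of the NAL unit type octet can be unpacked with shifts and masks. The names NalHeader, parse_nal_header, and should_discard are illustrative assumptions, and the discard policy shown is only the simplest decoder reaction mentioned above.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int         # forbidden_zero_bit, 1 bit
    nri: int       # nal_ref_idc, 2 bits (11 is the highest transport priority)
    nal_type: int  # nal_unit_type, 5 bits


def parse_nal_header(octet: int) -> NalHeader:
    """Split the NAL unit type octet into its F, NRI, and Type fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return NalHeader(f=octet >> 7, nri=(octet >> 5) & 0x3, nal_type=octet & 0x1F)


def should_discard(header: NalHeader) -> bool:
    """Simplest reaction: drop a NAL unit whose F bit is set."""
    return header.f == 1
```

For instance, the octet 0x65 yields F = 0, NRI = 11 (binary), and Type = 5.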
+
+      An H.264 encoder MUST set the value of NRI according to the H.264
+      specification (subclause 7.4.1) when the value of nal_unit_type is
+      in the range of 1 to 12, inclusive.  In particular, the H.264
+      specification requires that the value of NRI SHALL be equal to 0
+      for all NAL units having nal_unit_type equal to 6, 9, 10, 11, or
+      12.
+
+      For NAL units having nal_unit_type equal to 7 or 8 (indicating a
+      sequence parameter set or a picture parameter set, respectively),
+      an H.264 encoder SHOULD set the value of NRI to 11 (in binary
+      format).  For coded slice NAL units of a primary coded picture
+      having nal_unit_type equal to 5 (indicating a coded slice
+      belonging to an IDR picture), an H.264 encoder SHOULD set the
+      value of NRI to 11 (in binary format).
+
+      For a mapping of the remaining nal_unit_types to NRI values, the
+      following example MAY be used and has been shown to be efficient
+      in a certain environment [13].  Other mappings MAY also be
+      desirable, depending on the application and the H.264/AVC Annex A
+      profile in use.
+
+         Informative note: Data Partitioning is not available in certain
+         profiles; e.g., in the Main or Baseline profiles.
+         Consequently, the NAL unit types 2, 3, and 4 can occur only if
+         the video bitstream conforms to a profile in which data
+         partitioning is allowed and not in streams that conform to the
+         Main or Baseline profiles.
+
+      Table 2.  Example of NRI values for coded slices and coded slice
+      data partitions of primary coded reference pictures
+
+      NAL Unit Type     Content of NAL unit              NRI (binary)
+      ----------------------------------------------------------------
+       1              non-IDR coded slice                         10
+       2              Coded slice data partition A                10
+       3              Coded slice data partition B                01
+       4              Coded slice data partition C                01
+
+         Informative note: As mentioned before, the NRI value of non-
+         reference pictures is 00 as mandated by H.264/AVC.
+
+      An H.264 encoder SHOULD set the value of NRI for coded slice and
+      coded slice data partition NAL units of redundant coded reference
+      pictures equal to 01 (in binary format).
+
+      Definitions of the values for NRI for NAL unit types 24 to 29,
+      inclusive, are given in sections 5.7 and 5.8 of this memo.
+
+      No recommendation for the value of NRI is given for NAL units
+      having nal_unit_type in the range of 13 to 23, inclusive, because
+      these values are reserved for ITU-T and ISO/IEC.  No
+      recommendation for the value of NRI is given for NAL units having
+      nal_unit_type equal to 0 or in the range of 30 to 31, inclusive,
+      as the semantics of these values are not specified in this memo.
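Non-normative sketch: the NRI guidance above, including the example mapping of Table 2, can be collected in one lookup. The function name example_nri and its is_reference and is_redundant parameters are assumptions made for illustration. The Table 2 values are only the example mapping given in this memo, and other mappings may suit other applications. The function returns None where this section gives no recommendation.

```python
from typing import Optional

# Example mapping from Table 2 (primary coded reference pictures only).
_TABLE_2 = {1: 0b10, 2: 0b10, 3: 0b01, 4: 0b01}


def example_nri(nal_unit_type: int,
                is_reference: bool = True,
                is_redundant: bool = False) -> Optional[int]:
    """Return an NRI value following section 5.3, or None if none is given."""
    if nal_unit_type in (6, 9, 10, 11, 12):
        return 0b00  # required by the H.264 specification
    if nal_unit_type in (7, 8):
        return 0b11  # parameter sets: SHOULD be 11
    if 1 <= nal_unit_type <= 5:
        if nal_unit_type != 5 and not is_reference:
            return 0b00  # non-reference pictures carry NRI 00
        if is_redundant:
            return 0b01  # redundant coded reference pictures: SHOULD be 01
        if nal_unit_type == 5:
            return 0b11  # IDR slice of a primary coded picture: SHOULD be 11
        return _TABLE_2[nal_unit_type]
    return None  # types 0, 13-23, 24-29, 30-31: not covered here
```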
+
+5.4.  Packetization Modes
+
+   This memo specifies three packetization modes:
+
+      o Single NAL unit mode
+      o Non-interleaved mode
+      o Interleaved mode
+
+   The single NAL unit mode is targeted for conversational systems that
+   comply with ITU-T Recommendation H.241 [15] (see section 12.1).  The
+   non-interleaved mode is targeted for conversational systems that may
+   not comply with ITU-T Recommendation H.241.  In the non-interleaved
+   mode, NAL units are transmitted in NAL unit decoding order.  The
+   interleaved mode is targeted for systems that do not require very low
+   end-to-end latency.  The interleaved mode allows transmission of NAL
+   units out of NAL unit decoding order.
+
+   The packetization mode in use MAY be signaled by the value of the
+   OPTIONAL packetization-mode MIME parameter or by external means.  The
+   packetization mode in use governs which NAL unit types are allowed in
+   RTP payloads.  Table 3 summarizes the allowed NAL unit types for each
+   packetization mode.  For example, Types 1-23, with the associated
+   packet type "NAL unit", are allowed in "Single NAL Unit Mode" and in
+   "Non-Interleaved Mode", but disallowed in "Interleaved Mode".  Some
+   NAL unit type values (indicated as undefined in Table 3) are reserved
+   for future extensions.  NAL units of those types SHOULD NOT be sent
+   by a sender and MUST be ignored by a receiver.  Packetization modes
+   are explained in more detail in section 6.
+
+   Table 3.  Summary of allowed NAL unit types for each packetization
+   mode (yes = allowed, no = disallowed, ig = ignore)
+
+      Type   Packet    Single NAL    Non-Interleaved    Interleaved
+                       Unit Mode           Mode             Mode
+      -------------------------------------------------------------
+
+      0      undefined     ig               ig               ig
+      1-23   NAL unit     yes              yes               no
+      24     STAP-A        no              yes               no
+      25     STAP-B        no               no              yes
+      26     MTAP16        no               no              yes
+      27     MTAP24        no               no              yes
+      28     FU-A          no              yes              yes
+      29     FU-B          no               no              yes
+      30-31  undefined     ig               ig               ig
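Non-normative sketch: Table 3 can be expressed as a small lookup that a receiver might consult before processing a payload. The mode strings and the function name type_status are illustrative assumptions and are not identifiers defined by this memo.

```python
# Allowed payload types per packetization mode, transcribed from Table 3.
ALLOWED_TYPES = {
    "single-nal-unit": frozenset(range(1, 24)),
    "non-interleaved": frozenset(range(1, 24)) | {24, 28},
    "interleaved": frozenset({25, 26, 27, 28, 29}),
}
UNDEFINED_TYPES = frozenset({0, 30, 31})


def type_status(mode: str, nal_type: int) -> str:
    """Return "yes", "no", or "ig" (ignore), as in Table 3."""
    if mode not in ALLOWED_TYPES:
        raise ValueError("unknown packetization mode: %r" % (mode,))
    if not 0 <= nal_type <= 31:
        raise ValueError("type must be in the range 0..31")
    if nal_type in UNDEFINED_TYPES:
        return "ig"
    return "yes" if nal_type in ALLOWED_TYPES[mode] else "no"
```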
+
+5.5.  Decoding Order Number (DON)
+
+   In the interleaved packetization mode, the transmission order of NAL
+   units is allowed to differ from the decoding order of the NAL units.
+   Decoding order number (DON) is a field in the payload structure or a
+   derived variable that indicates the NAL unit decoding order.
+   Rationale and examples of use cases for transmission out of decoding
+   order and for the use of DON are given in section 13.
+
+   The coupling of transmission and decoding order is controlled by the
+   OPTIONAL sprop-interleaving-depth MIME parameter as follows.  When
+   the value of the OPTIONAL sprop-interleaving-depth MIME parameter is
+   equal to 0 (explicitly or per default) or transmission of NAL units
+   out of their decoding order is disallowed by external means, the
+   transmission order of NAL units MUST conform to the NAL unit decoding
+   order.  When the value of the OPTIONAL sprop-interleaving-depth MIME
+   parameter is greater than 0 or transmission of NAL units out of their
+   decoding order is allowed by external means,
+
+   o  the order of NAL units in an MTAP16 and an MTAP24 is NOT REQUIRED
+      to be the NAL unit decoding order, and
+
+   o  the order of NAL units generated by decapsulating STAP-Bs, MTAPs,
+      and FUs in two consecutive packets is NOT REQUIRED to be the NAL
+      unit decoding order.
+
+   The RTP payload structures for a single NAL unit packet, an STAP-A,
+   and an FU-A do not include DON.  STAP-B and FU-B structures include
+   DON, and the structure of MTAPs enables derivation of DON as
+   specified in section 5.7.2.
+
+      Informative note: When an FU-A occurs in interleaved mode, it
+      always follows an FU-B, which sets its DON.
+
+      Informative note: If a transmitter wants to encapsulate a single
+      NAL unit per packet and transmit packets out of their decoding
+      order, STAP-B packet type can be used.
+
+   In the single NAL unit packetization mode, the transmission order of
+   NAL units, determined by the RTP sequence number, MUST be the same as
+   their NAL unit decoding order.  In the non-interleaved packetization
+   mode, the transmission order of NAL units in single NAL unit packets,
+   STAP-As, and FU-As MUST be the same as their NAL unit decoding order.
+   The NAL units within an STAP MUST appear in the NAL unit decoding
+   order.  Thus, the decoding order is first provided through the
+   implicit order within an STAP, and second provided through the RTP
+   sequence number for the order between STAPs, FUs, and single NAL unit
+   packets.
+
+   Signaling of the value of DON for NAL units carried in STAP-B, MTAP,
+   and a series of fragmentation units starting with an FU-B is
+   specified in sections 5.7.1, 5.7.2, and 5.8, respectively.  The DON
+   value of the first NAL unit in transmission order MAY be set to any
+   value.  Values of DON are in the range of 0 to 65535, inclusive.
+   After reaching the maximum value, the value of DON wraps around to 0.
+
+   The decoding order of two NAL units contained in any STAP-B, MTAP, or
+   a series of fragmentation units starting with an FU-B is determined
+   as follows.  Let DON(i) be the decoding order number of the NAL unit
+   having index i in the transmission order.  Function don_diff(m,n) is
+   specified as follows:
+
+      If DON(m) == DON(n), don_diff(m,n) = 0
+
+      If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
+      don_diff(m,n) = DON(n) - DON(m)
+
+      If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
+      don_diff(m,n) = 65536 - DON(m) + DON(n)
+
+      If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
+      don_diff(m,n) = - (DON(m) + 65536 - DON(n))
+
+      If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
+      don_diff(m,n) = - (DON(m) - DON(n))
+
+   A positive value of don_diff(m,n) indicates that the NAL unit having
+   transmission order index n follows, in decoding order, the NAL unit
+   having transmission order index m.  When don_diff(m,n) is equal to 0,
+   then the NAL unit decoding order of the two NAL units can be in
+   either order.  A negative value of don_diff(m,n) indicates that the
+   NAL unit having transmission order index n precedes, in decoding
+   order, the NAL unit having transmission order index m.
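Non-normative sketch: the five cases of don_diff translate directly into code. For simplicity, this version takes the two DON values themselves (DON(m) and DON(n)) rather than the transmission order indices m and n, which is an assumption about how a caller would store them.

```python
def don_diff(don_m: int, don_n: int) -> int:
    """Signed decoding order distance from DON(m) to DON(n), with wraparound."""
    for don in (don_m, don_n):
        if not 0 <= don <= 65535:
            raise ValueError("DON must be in the range 0..65535")
    if don_m == don_n:
        return 0
    if don_m < don_n and don_n - don_m < 32768:
        return don_n - don_m
    if don_m > don_n and don_m - don_n >= 32768:
        return 65536 - don_m + don_n
    if don_m < don_n and don_n - don_m >= 32768:
        return -(don_m + 65536 - don_n)
    # Remaining case: don_m > don_n and don_m - don_n < 32768
    return -(don_m - don_n)
```

With these definitions, don_diff(65535, 0) is 1 (the second NAL unit follows the first across the wraparound), and don_diff(0, 65535) is -1.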
+
+   Values of DON related fields (DON, DONB, and DOND; see section 5.7)
+   MUST be such that the decoding order determined by the values of DON,
+   as specified above, conforms to the NAL unit decoding order.  If the
+   order of two NAL units in NAL unit decoding order is switched and the
+   new order does not conform to the NAL unit decoding order, the NAL
+   units MUST NOT have the same value of DON.  If the order of two
+   consecutive NAL units in the NAL unit stream is switched and the new
+   order still conforms to the NAL unit decoding order, the NAL units
+   MAY have the same value of DON.  For example, when arbitrary slice
+   order is allowed by the video coding profile in use, all the coded
+   slice NAL units of a coded picture are allowed to have the same value
+   of DON.  Consequently, NAL units having the same value of DON can be
+   decoded in any order, and two NAL units having a different value of
+   DON should be passed to the decoder in the order specified above.
+   When two consecutive NAL units in the NAL unit decoding order have a
+   different value of DON, the value of DON for the second NAL unit in
+   decoding order SHOULD be the value of DON for the first, incremented
+   by one.
+
+   An example of the decapsulation process to recover the NAL unit
+   decoding order is given in section 7.
+
+      Informative note: Receivers should not expect that the absolute
+      difference of values of DON for two consecutive NAL units in the
+      NAL unit decoding order will be equal to one, even in error-free
+      transmission.  An increment by one is not required, as at the time
+      of associating values of DON to NAL units, it may not be known
+      whether all NAL units are delivered to the receiver.  For example,
+      a gateway may not forward coded slice NAL units of non-reference
+      pictures or SEI NAL units when there is a shortage of bit rate in
+      the network to which the packets are forwarded.  In another
+      example, a live broadcast is interrupted by pre-encoded content,
+      such as commercials, from time to time.  The first intra picture
+      of a pre-encoded clip is transmitted in advance to ensure that it
+      is readily available in the receiver.  When transmitting the first
+      intra picture, the originator does not exactly know how many NAL
+      units will be encoded before the first intra picture of the pre-
+      encoded clip follows in decoding order.  Thus, the values of DON
+      for the NAL units of the first intra picture of the pre-encoded
+      clip have to be estimated when they are transmitted, and gaps in
+      values of DON may occur.
+
+5.6.  Single NAL Unit Packet
+
+   The single NAL unit packet defined here MUST contain only one NAL
+   unit, of the types defined in [1].  This means that neither an
+   aggregation packet nor a fragmentation unit can be used within a
+   single NAL unit packet.  A NAL unit stream composed by decapsulating
+   single NAL unit packets in RTP sequence number order MUST conform to
+   the NAL unit decoding order.  The structure of the single NAL unit
+   packet is shown in Figure 2.
+
+      Informative note: The first byte of a NAL unit co-serves as the
+      RTP payload header.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |F|NRI|  type   |                                               |
+      +-+-+-+-+-+-+-+-+                                               |
+      |                                                               |
+      |               Bytes 2..n of a Single NAL unit                 |
+      |                                                               |
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :...OPTIONAL RTP padding        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 2.  RTP payload format for single NAL unit packet
+
+5.7.  Aggregation Packets
+
+   Aggregation packets are the NAL unit aggregation scheme of this
+   payload specification.  The scheme is introduced to reflect the
+   dramatically different MTU sizes of two key target networks:
+   wireline IP networks (with an MTU size that is often limited by the
+   Ethernet MTU size; roughly 1500 bytes), and IP or non-IP (e.g., ITU-T
+   H.324/M) based wireless communication systems with preferred
+   transmission unit sizes of 254 bytes or less.  To prevent media
+   transcoding between the two worlds, and to avoid undesirable
+   packetization overhead, a NAL unit aggregation scheme is introduced.
+
+   Two types of aggregation packets are defined by this specification:
+
+   o  Single-time aggregation packet (STAP): aggregates NAL units with
+      identical NALU-time.  Two types of STAPs are defined, one without
+      DON (STAP-A) and another including DON (STAP-B).
+
+   o  Multi-time aggregation packet (MTAP): aggregates NAL units with
+      potentially differing NALU-time.  Two different MTAPs are defined,
+      differing in the length of the NAL unit timestamp offset.
+
+   The term NALU-time is defined as the value that the RTP timestamp
+   would have if the NAL unit were transported in its own RTP packet.
+
+   Each NAL unit to be carried in an aggregation packet is encapsulated
+   in an aggregation unit.  Please see below for the four different
+   aggregation units and their characteristics.
+
+   The structure of the RTP payload format for aggregation packets is
+   presented in Figure 3.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |F|NRI|  type   |                                               |
+      +-+-+-+-+-+-+-+-+                                               |
+      |                                                               |
+      |             one or more aggregation units                     |
+      |                                                               |
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :...OPTIONAL RTP padding        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 3.  RTP payload format for aggregation packets
+
+   MTAPs and STAPs share the following packetization rules:  The RTP
+   timestamp MUST be set to the earliest of the NALU-times of all the
+   NAL units to be aggregated.  The type field of the NAL unit type
+   octet MUST be set to the appropriate value, as indicated in Table 4.
+   The F bit MUST be cleared if all F bits of the aggregated NAL units
+   are zero; otherwise, it MUST be set.  The value of NRI MUST be the
+   maximum of all the NAL units carried in the aggregation packet.
+
+      Table 4.  Type field for STAPs and MTAPs
+
+      Type   Packet    Timestamp offset   DON related fields
+                       field length       (DON, DONB, DOND)
+                       (in bits)          present
+      --------------------------------------------------------
+      24     STAP-A       0                 no
+      25     STAP-B       0                 yes
+      26     MTAP16      16                 yes
+      27     MTAP24      24                 yes
+
+   The marker bit in the RTP header is set to the value that the marker
+   bit of the last NAL unit of the aggregation packet would have if it
+   were transported in its own RTP packet.
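Non-normative sketch: the F, NRI, and type rules shared by STAPs and MTAPs can be applied by scanning the first octet of every NAL unit to be aggregated. The function name aggregation_header_octet is an assumption. The RTP timestamp and marker bit rules are not covered here.

```python
from typing import Sequence

AGGREGATION_TYPES = (24, 25, 26, 27)  # STAP-A, STAP-B, MTAP16, MTAP24 (Table 4)


def aggregation_header_octet(nal_units: Sequence[bytes], packet_type: int) -> int:
    """Build the first payload octet of an aggregation packet."""
    if packet_type not in AGGREGATION_TYPES:
        raise ValueError("not an aggregation packet type")
    if not nal_units or any(len(nalu) == 0 for nalu in nal_units):
        raise ValueError("at least one non-empty NAL unit is required")
    f = 0
    nri = 0
    for nalu in nal_units:
        f |= nalu[0] >> 7                     # set if any aggregated F bit is set
        nri = max(nri, (nalu[0] >> 5) & 0x3)  # maximum NRI of all NAL units
    return (f << 7) | (nri << 5) | packet_type
```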
+
+   The payload of an aggregation packet consists of one or more
+   aggregation units.  See sections 5.7.1 and 5.7.2 for the four
+   different types of aggregation units.  An aggregation packet can
+   carry as many aggregation units as necessary; however, the total
+   amount of data in an aggregation packet MUST fit into an IP packet,
+   and the size SHOULD be chosen so that the resulting IP packet is
+   smaller than the MTU size.  An aggregation packet MUST NOT contain
+   fragmentation units specified in section 5.8.  Aggregation packets
+   MUST NOT be nested; i.e., an aggregation packet MUST NOT contain
+   another aggregation packet.
+
+5.7.1.  Single-Time Aggregation Packet
+
+   A single-time aggregation packet (STAP) SHOULD be used whenever the
+   aggregated NAL units all share the same NALU-time.  The payload
+   of an STAP-A does not include DON and consists of at least one
+   single-time aggregation unit, as presented in Figure 4.  The payload
+   of an STAP-B consists of a 16-bit unsigned decoding order number
+   (DON) (in network byte order) followed by at least one single-time
+   aggregation unit, as presented in Figure 5.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                      :                                               |
+      +-+-+-+-+-+-+-+-+                                               |
+      |                                                               |
+      |                single-time aggregation units                  |
+      |                                                               |
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 4.  Payload format for STAP-A
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                      :  decoding order number (DON)  |               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+      |                                                               |
+      |                single-time aggregation units                  |
+      |                                                               |
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 5.  Payload format for STAP-B
+
+   The DON field specifies the value of DON for the first NAL unit in an
+   STAP-B in transmission order.  For each successive NAL unit in
+   appearance order in an STAP-B, the value of DON is equal to (the
+   value of DON of the previous NAL unit in the STAP-B + 1) % 65536, in
+   which '%' stands for the modulo operation.
+
+   A single-time aggregation unit consists of 16-bit unsigned size
+   information (in network byte order) that indicates the size of the
+   following NAL unit in bytes (excluding these two octets, but
+   including the NAL unit type octet of the NAL unit), followed by the
+   NAL unit itself, including its NAL unit type byte.  A single-time
+   aggregation unit is byte aligned within the RTP payload, but it may
+   not be aligned on a 32-bit word boundary.  Figure 6 presents the
+   structure of the single-time aggregation unit.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                      :        NAL unit size          |               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+      |                                                               |
+      |                           NAL unit                            |
+      |                                                               |
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 6.  Structure for single-time aggregation unit
+
+   Figure 7 presents an example of an RTP packet that contains an
+   STAP-A.  The STAP contains two single-time aggregation units,
+   labeled as 1 and 2 in the figure.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                          RTP Header                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                         NALU 1 Data                           |
+      :                                                               :
+      +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |               | NALU 2 Size                   | NALU 2 HDR    |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                         NALU 2 Data                           |
+      :                                                               :
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :...OPTIONAL RTP padding        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 7.  An example of an RTP packet including an STAP-A and two
+                 single-time aggregation units
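Non-normative sketch of the sender side: an STAP-A payload as in Figure 7 is the STAP-A header octet followed by one 16-bit size field and NAL unit per aggregation unit. The function names are assumptions. The header octet is taken as an argument and is assumed to have been derived according to the shared packetization rules of section 5.7. RTP header and padding are outside the scope of the sketch.

```python
import struct
from typing import Sequence


def pack_single_time_units(nal_units: Sequence[bytes]) -> bytes:
    """Serialize NAL units as single-time aggregation units (Figure 6)."""
    out = bytearray()
    for nalu in nal_units:
        if not 1 <= len(nalu) <= 0xFFFF:
            raise ValueError("NAL unit size must be in the range 1..65535")
        out += struct.pack("!H", len(nalu))  # size in network byte order
        out += nalu                          # includes the NAL unit type octet
    return bytes(out)


def build_stap_a(header_octet: int, nal_units: Sequence[bytes]) -> bytes:
    """Return the RTP payload of an STAP-A (without RTP header or padding)."""
    if header_octet & 0x1F != 24:
        raise ValueError("header octet does not carry type 24 (STAP-A)")
    if not nal_units:
        raise ValueError("an STAP-A carries at least one aggregation unit")
    return bytes([header_octet]) + pack_single_time_units(nal_units)
```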
+
+   Figure 8 presents an example of an RTP packet that contains an
+   STAP-B.  The STAP contains two single-time aggregation units,
+   labeled as 1 and 2 in the figure.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                          RTP Header                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |STAP-B NAL HDR | DON                           | NALU 1 Size   |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      | NALU 1 Size   | NALU 1 HDR    | NALU 1 Data                   |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
+      :                                                               :
+      +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |               | NALU 2 Size                   | NALU 2 HDR    |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                       NALU 2 Data                             |
+      :                                                               :
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :...OPTIONAL RTP padding        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 8.  An example of an RTP packet including an STAP-B and two
+                 single-time aggregation units
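Non-normative sketch of the receiver side: an STAP-B payload as in Figure 8 can be walked unit by unit, assigning the DON field to the first NAL unit and incrementing it modulo 65536 for each successive one. The function name parse_stap_b is an assumption, and any RTP padding is assumed to have been removed already.

```python
import struct
from typing import List, Tuple


def parse_stap_b(payload: bytes) -> List[Tuple[int, bytes]]:
    """Return (DON, NAL unit) pairs in the order they appear in the STAP-B."""
    if len(payload) < 3 or payload[0] & 0x1F != 25:
        raise ValueError("not an STAP-B payload")
    (don,) = struct.unpack_from("!H", payload, 1)  # DON of the first NAL unit
    pos = 3
    units = []
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if size == 0 or pos + size > len(payload):
            raise ValueError("NAL unit size exceeds the payload")
        units.append((don, payload[pos:pos + size]))
        pos += size
        don = (don + 1) % 65536  # DON of the next NAL unit in the STAP-B
    if not units:
        raise ValueError("an STAP-B carries at least one aggregation unit")
    return units
```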
+
+5.7.2.  Multi-Time Aggregation Packets (MTAPs)
+
+   The NAL unit payload of MTAPs consists of a 16-bit unsigned decoding
+   order number base (DONB) (in network byte order) and one or more
+   multi-time aggregation units, as presented in Figure 9.  DONB MUST
+   contain the value of DON for the first NAL unit in the NAL unit
+   decoding order among the NAL units of the MTAP.
+
+      Informative note: The first NAL unit in the NAL unit decoding
+      order is not necessarily the first NAL unit in the order in which
+      the NAL units are encapsulated in an MTAP.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                      :  decoding order number base   |               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+      |                                                               |
+      |                 multi-time aggregation units                  |
+      |                                                               |
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 9.  NAL unit payload format for MTAPs
+
+   Two different multi-time aggregation units are defined in this
+   specification.  Both of them consist of 16-bit unsigned size
+   information of the following NAL unit (in network byte order), an 8-
+   bit unsigned decoding order number difference (DOND), and n bits (in
+   network byte order) of timestamp offset (TS offset) for this NAL
+   unit, whereby n can be 16 or 24.  The choice between the different
+   MTAP types (MTAP16 and MTAP24) is application dependent: the larger
+   the timestamp offset is, the higher the flexibility of the MTAP, but
+   the overhead is also higher.
+
+   The structures of the multi-time aggregation units for MTAP16 and
+   MTAP24 are presented in Figures 10 and 11, respectively.  The
+   starting or ending position of an aggregation unit within a packet is
+   NOT REQUIRED to be on a 32-bit word boundary.  The DON of the
+   following NAL unit is equal to (DONB + DOND) % 65536, in which %
+   denotes the modulo operation.  This memo does not specify how the NAL
+   units within an MTAP are ordered, but, in most cases, NAL unit
+   decoding order SHOULD be used.
+
+   The timestamp offset field MUST be set as follows.  If the NALU-time
+   is larger than or equal to the RTP timestamp of the packet, then the
+   timestamp offset equals (the NALU-time of the NAL unit - the RTP
+   timestamp of the packet).  If the NALU-time is smaller than the RTP
+   timestamp of the packet, then the timestamp offset is equal to the
+   NALU-time + (2^32 - the RTP timestamp of the packet).
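Non-normative sketch: the two cases of the timestamp offset rule amount to a subtraction modulo 2^32. The function names timestamp_offset and fits_mtap are assumptions. The second helper checks whether an offset fits in the 16-bit or 24-bit field of an MTAP16 or MTAP24.

```python
def timestamp_offset(nalu_time: int, rtp_timestamp: int) -> int:
    """Offset of a NAL unit's NALU-time from the packet's RTP timestamp."""
    for value in (nalu_time, rtp_timestamp):
        if not 0 <= value < 2**32:
            raise ValueError("timestamps are 32-bit unsigned values")
    if nalu_time >= rtp_timestamp:
        return nalu_time - rtp_timestamp
    # The NALU-time has wrapped around relative to the RTP timestamp.
    return nalu_time + (2**32 - rtp_timestamp)


def fits_mtap(offset: int, field_bits: int) -> bool:
    """True if the offset fits in a field of 16 (MTAP16) or 24 (MTAP24) bits."""
    if field_bits not in (16, 24):
        raise ValueError("field length must be 16 or 24 bits")
    return 0 <= offset < (1 << field_bits)
```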
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      :        NAL unit size          |      DOND     |  TS offset    |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |  TS offset    |                                               |
+      +-+-+-+-+-+-+-+-+              NAL unit                         |
+      |                                                               |
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 10.  Multi-time aggregation unit for MTAP16
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      :        NAL unit size          |      DOND     |  TS offset    |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |         TS offset             |                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
+      |                              NAL unit                         |
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 11.  Multi-time aggregation unit for MTAP24
+
+   For the "earliest" multi-time aggregation unit in an MTAP, the
+   timestamp offset MUST be zero.  Hence, the RTP timestamp of the MTAP
+   itself is identical to the earliest NALU-time.
+
+      Informative note: The "earliest" multi-time aggregation unit is
+      the one that would have the smallest extended RTP timestamp among
+      all the aggregation units of an MTAP if the aggregation units were
+      encapsulated in single NAL unit packets.  An extended timestamp is
+      a timestamp that has more than 32 bits and is capable of counting
+      the wraparound of the timestamp field, thus enabling one to
+      determine the smallest value if the timestamp wraps.  Such an
+      "earliest" aggregation unit may not be the first one in the order
+      in which the aggregation units are encapsulated in an MTAP.  The
+      "earliest" NAL unit need not be the same as the first NAL unit in
+      the NAL unit decoding order either.
+
+   Figure 12 presents an example of an RTP packet that contains a
+   multi-time aggregation packet of type MTAP16 that contains two
+   multi-time aggregation units, labeled as 1 and 2 in the figure.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                          RTP Header                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |MTAP16 NAL HDR |  decoding order number base   | NALU 1 Size   |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offset        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |  NALU 1 HDR   |  NALU 1 DATA                                  |
+      +-+-+-+-+-+-+-+-+                                               +
+      :                                                               :
+      +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |               | NALU 2 Size                   |  NALU 2 DOND  |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |       NALU 2 TS offset        |  NALU 2 HDR   |  NALU 2 DATA  |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+      :                                                               :
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :...OPTIONAL RTP padding        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 12.  An RTP packet including a multi-time aggregation
+                  packet of type MTAP16 and two multi-time aggregation
+                  units
+
+   Figure 13 presents an example of an RTP packet that contains a
+   multi-time aggregation packet of type MTAP24 that contains two
+   multi-time aggregation units, labeled as 1 and 2 in the figure.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                          RTP Header                           |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |MTAP24 NAL HDR |  decoding order number base   | NALU 1 Size   |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offs          |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |NALU 1 TS offs |  NALU 1 HDR   |  NALU 1 DATA                  |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
+      :                                                               :
+      +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |               | NALU 2 Size                   |  NALU 2 DOND  |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |       NALU 2 TS offset                        |  NALU 2 HDR   |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |  NALU 2 DATA                                                  |
+      :                                                               :
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :...OPTIONAL RTP padding        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 13.  An RTP packet including a multi-time aggregation
+                  packet of type MTAP24 and two multi-time aggregation
+                  units
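+
+      Informative example: a receiver could split an MTAP16 or MTAP24
+      payload into its aggregation units roughly as sketched below in
+      Python.  The names parse_mtap and MtapUnit are hypothetical.  The
+      sketch assumes the NAL unit type values 26 (MTAP16) and 27
+      (MTAP24), the rule DON = (DONB + DOND) % 65536, and a size field
+      that counts only the octets of the NAL unit itself; these are
+      defined outside this section and should be checked against the
+      definitions there.
+
```python
import struct
from typing import List, NamedTuple


class MtapUnit(NamedTuple):
    don: int          # assumed rule: (DONB + DOND) % 65536
    ts_offset: int    # offset from the RTP timestamp of the packet
    nal_unit: bytes   # NAL unit header octet followed by NAL unit data


def parse_mtap(payload: bytes) -> List[MtapUnit]:
    """Split an MTAP payload (RTP header and padding already removed)."""
    if len(payload) < 3:
        raise ValueError("payload too short for an MTAP header")
    nal_type = payload[0] & 0x1F
    if nal_type == 26:      # MTAP16: 16-bit TS offset (Figure 10)
        ts_len = 2
    elif nal_type == 27:    # MTAP24: 24-bit TS offset (Figure 11)
        ts_len = 3
    else:
        raise ValueError(f"not an MTAP: type {nal_type}")
    donb = struct.unpack_from("!H", payload, 1)[0]
    units = []
    pos = 3
    while pos < len(payload):
        if pos + 3 + ts_len > len(payload):
            raise ValueError("truncated aggregation unit header")
        size, dond = struct.unpack_from("!HB", payload, pos)
        pos += 3
        ts_offset = int.from_bytes(payload[pos:pos + ts_len], "big")
        pos += ts_len
        if pos + size > len(payload):
            raise ValueError("truncated NAL unit")
        units.append(MtapUnit((donb + dond) % 65536, ts_offset,
                              payload[pos:pos + size]))
        pos += size
    return units
```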
+
+5.8.  Fragmentation Units (FUs)
+
+   This payload type allows fragmenting a NAL unit into several RTP
+   packets.  Doing so on the application layer instead of relying on
+   lower layer fragmentation (e.g., by IP) has the following advantages:
+
+   o  The payload format is capable of transporting, over an IPv4
+      network, NAL units bigger than 64 kbytes.  Such NAL units may be
+      present in pre-recorded video, particularly in High Definition
+      formats (there is a limit on the number of slices per picture,
+      which results in a limit on the number of NAL units per picture,
+      which in turn may result in big NAL units).
+
+   o  The fragmentation mechanism allows fragmenting a single picture
+      and applying generic forward error correction as described in
+      section 12.5.
+
+   Fragmentation is defined only for a single NAL unit and not for any
+   aggregation packets.  A fragment of a NAL unit consists of an integer
+   number of consecutive octets of that NAL unit.  Each octet of the NAL
+   unit MUST be part of exactly one fragment of that NAL unit.
+   Fragments of the same NAL unit MUST be sent in consecutive order with
+   ascending RTP sequence numbers (with no other RTP packets within the
+   same RTP packet stream being sent between the first and last
+   fragment).  Similarly, a NAL unit MUST be reassembled in RTP sequence
+   number order.
+
+   When a NAL unit is fragmented and conveyed within fragmentation units
+   (FUs), it is referred to as a fragmented NAL unit.  STAPs and MTAPs
+   MUST NOT be fragmented.  FUs MUST NOT be nested; i.e., an FU MUST NOT
+   contain another FU.
+
+   The RTP timestamp of an RTP packet carrying an FU is set to the
+   NALU-time of the fragmented NAL unit.
+
+   Figure 14 presents the RTP payload format for FU-As.  An FU-A
+   consists of a fragmentation unit indicator of one octet, a
+   fragmentation unit header of one octet, and a fragmentation unit
+   payload.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      | FU indicator  |   FU header   |                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
+      |                                                               |
+      |                         FU payload                            |
+      |                                                               |
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :...OPTIONAL RTP padding        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 14.  RTP payload format for FU-A
+
+   Figure 15 presents the RTP payload format for FU-Bs.  An FU-B
+   consists of a fragmentation unit indicator of one octet, a
+   fragmentation unit header of one octet, a decoding order number (DON)
+   (in network byte order), and a fragmentation unit payload.  In other
+   words, the structure of FU-B is the same as the structure of FU-A,
+   except for the additional DON field.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      | FU indicator  |   FU header   |               DON             |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
+      |                                                               |
+      |                         FU payload                            |
+      |                                                               |
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               :...OPTIONAL RTP padding        |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Figure 15.  RTP payload format for FU-B
+
+   NAL unit type FU-B MUST be used in the interleaved packetization mode
+   for the first fragmentation unit of a fragmented NAL unit.  NAL unit
+   type FU-B MUST NOT be used in any other case.  In other words, in the
+   interleaved packetization mode, each NALU that is fragmented has an
+   FU-B as the first fragment, followed by one or more FU-A fragments.
+
+   The FU indicator octet has the following format:
+
+      +---------------+
+      |0|1|2|3|4|5|6|7|
+      +-+-+-+-+-+-+-+-+
+      |F|NRI|  Type   |
+      +---------------+
+
+   Values equal to 28 and 29 in the Type field of the FU indicator octet
+   identify an FU-A and an FU-B, respectively.  The use of the F bit is
+   described in section 5.3.  The value of the NRI field MUST be set
+   according to the value of the NRI field in the fragmented NAL unit.
+
+   The FU header has the following format:
+
+      +---------------+
+      |0|1|2|3|4|5|6|7|
+      +-+-+-+-+-+-+-+-+
+      |S|E|R|  Type   |
+      +---------------+
+
+   S: 1 bit
+      When set to one, the Start bit indicates the start of a fragmented
+      NAL unit.  When the following FU payload is not the start of a
+      fragmented NAL unit payload, the Start bit is set to zero.
+
+   E: 1 bit
+      When set to one, the End bit indicates the end of a fragmented NAL
+      unit, i.e., the last byte of the payload is also the last byte of
+      the fragmented NAL unit.  When the following FU payload is not the
+      last fragment of a fragmented NAL unit, the End bit is set to
+      zero.
+
+   R: 1 bit
+      The Reserved bit MUST be equal to 0 and MUST be ignored by the
+      receiver.
+
+   Type: 5 bits
+      The NAL unit payload type as defined in table 7-1 of [1].
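+
+      Informative example: the FU indicator and FU header octets could
+      be decoded as in the following Python sketch.  The names
+      FuHeaders and parse_fu_headers are hypothetical.  Bit 0 in the
+      diagrams above is the most significant bit of each octet.  The
+      sketch also rejects a header with both the Start and End bits
+      set, which this section forbids.
+
```python
from typing import NamedTuple


class FuHeaders(NamedTuple):
    f: int          # F bit of the FU indicator
    nri: int        # NRI field of the FU indicator
    fu_type: int    # 28 = FU-A, 29 = FU-B
    start: bool     # S bit of the FU header
    end: bool       # E bit of the FU header
    nal_type: int   # type of the fragmented NAL unit


def parse_fu_headers(payload: bytes) -> FuHeaders:
    """Decode the first two octets of an FU-A or FU-B payload."""
    if len(payload) < 2:
        raise ValueError("payload too short for an FU")
    indicator, header = payload[0], payload[1]
    fu_type = indicator & 0x1F
    if fu_type not in (28, 29):
        raise ValueError(f"not a fragmentation unit: type {fu_type}")
    start = bool(header & 0x80)
    end = bool(header & 0x40)
    # The R bit (0x20) is ignored, as required of receivers.
    if start and end:
        raise ValueError("Start and End bits both set in one FU header")
    return FuHeaders(indicator >> 7, (indicator >> 5) & 0x03, fu_type,
                     start, end, header & 0x1F)
```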
+
+   The value of DON in FU-Bs is selected as described in section 5.5.
+
+      Informative note: The DON field in FU-Bs allows gateways to
+      fragment NAL units to FU-Bs without organizing the incoming NAL
+      units to the NAL unit decoding order.
+
+   A fragmented NAL unit MUST NOT be transmitted in one FU; i.e., the
+   Start bit and End bit MUST NOT both be set to one in the same FU
+   header.
+
+   The FU payload consists of fragments of the payload of the fragmented
+   NAL unit so that if the fragmentation unit payloads of consecutive
+   FUs are sequentially concatenated, the payload of the fragmented NAL
+   unit can be reconstructed.  The NAL unit type octet of the fragmented
+   NAL unit is not included as such in the fragmentation unit payload,
+   but rather the information of the NAL unit type octet of the
+   fragmented NAL unit is conveyed in F and NRI fields of the FU
+   indicator octet of the fragmentation unit and in the type field of
+   the FU header.  An FU payload MAY have any number of octets and MAY
+   be empty.
+
+      Informative note: Empty FUs are allowed to reduce the latency of a
+      certain class of senders in nearly lossless environments.  These
+      senders can be characterized in that they packetize NALU fragments
+      before the NALU is completely generated and, hence, before the
+      NALU size is known.  If zero-length NALU fragments were not
+      allowed, the sender would have to generate at least one bit of
+      data of the following fragment before the current fragment could
+      be sent.  Due to the characteristics of H.264, where sometimes
+      several macroblocks occupy zero bits, this is undesirable and can
+      add delay.  However, the (potential) use of zero-length NALU
+      fragments should be carefully weighed against the increased risk
+      of the loss of the NALU because of the additional packets employed
+      for its transmission.
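+
+      Informative example: a sender that already holds a complete NAL
+      unit might produce FU-As along the lines of the Python sketch
+      below.  The function name and the max_fragment parameter are
+      hypothetical; empty FUs and FU-Bs are not covered.
+
```python
from typing import List

FU_A = 28  # Type value of an FU-A in the FU indicator


def fragment_fu_a(nal_unit: bytes, max_fragment: int) -> List[bytes]:
    """Split one NAL unit into FU-A payloads.

    max_fragment (hypothetical parameter) is the largest number of
    FU payload octets carried per packet.
    """
    if max_fragment < 1:
        raise ValueError("max_fragment must be positive")
    nal_header, body = nal_unit[0], nal_unit[1:]
    if len(body) <= max_fragment:
        # A single FU would need both Start and End set, which is forbidden.
        raise ValueError("NAL unit fits in one packet; do not fragment")
    # F and NRI are copied from the NAL unit type octet; Type becomes 28.
    fu_indicator = (nal_header & 0xE0) | FU_A
    nal_type = nal_header & 0x1F
    chunks = [body[i:i + max_fragment]
              for i in range(0, len(body), max_fragment)]
    payloads = []
    for index, chunk in enumerate(chunks):
        fu_header = nal_type
        if index == 0:
            fu_header |= 0x80   # Start bit
        if index == len(chunks) - 1:
            fu_header |= 0x40   # End bit
        # The NAL unit type octet itself is not part of the FU payload.
        payloads.append(bytes([fu_indicator, fu_header]) + chunk)
    return payloads
```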
+
+   If a fragmentation unit is lost, the receiver SHOULD discard all
+   following fragmentation units in transmission order corresponding to
+   the same fragmented NAL unit.
+
+   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
+   fragments of a NAL unit into an (incomplete) NAL unit, even if
+   fragment n of that NAL unit is not received.  In this case, the
+   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
+   syntax violation.
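+
+      Informative example: the loss handling described above might be
+      sketched in Python as follows.  The function name, the
+      keep_incomplete flag, and the (sequence number, payload) input
+      shape are assumptions made for illustration; the input is assumed
+      to hold well-formed FU-A payloads of one NAL unit in ascending
+      RTP sequence number order.
+
```python
from typing import List, Optional, Tuple


def reassemble_fu_a(packets: List[Tuple[int, bytes]],
                    keep_incomplete: bool = False) -> Optional[bytes]:
    """Rebuild a NAL unit from (RTP sequence number, FU-A payload) pairs."""
    if not packets or not (packets[0][1][1] & 0x80):
        return None  # the first fragment (Start bit set) is missing
    first_seq, first = packets[0]
    # Rebuild the NAL unit type octet: F and NRI from the FU indicator,
    # Type from the FU header.
    nal_header = (first[0] & 0xE0) | (first[1] & 0x1F)
    body = bytearray()
    expected = first_seq
    complete = False
    for seq, payload in packets:
        if seq != expected:
            break  # a fragment was lost: discard all following fragments
        body += payload[2:]
        if payload[1] & 0x40:  # End bit
            complete = True
            break
        expected = (seq + 1) & 0xFFFF  # sequence numbers wrap at 16 bits
    if complete:
        return bytes([nal_header]) + bytes(body)
    if keep_incomplete:
        # Set forbidden_zero_bit to flag the syntax violation.
        return bytes([nal_header | 0x80]) + bytes(body)
    return None
```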
+
+6.  Packetization Rules
+
+   The packetization modes are introduced in section 5.2.  The
+   packetization rules common to more than one of the packetization
+   modes are specified in section 6.1.  The packetization rules for the
+   single NAL unit mode, the non-interleaved mode, and the interleaved
+   mode are specified in sections 6.2, 6.3, and 6.4, respectively.
+
+6.1.  Common Packetization Rules
+
+   All senders MUST enforce the following packetization rules regardless
+   of the packetization mode in use:
+
+   o  Coded slice NAL units or coded slice data partition NAL units
+      belonging to the same coded picture (and thus sharing the same RTP
+      timestamp value) MAY be sent in any order permitted by the
+      applicable profile defined in [1]; however, for delay-critical
+      systems, they SHOULD be sent in their original coding order to
+      minimize the delay.  Note that the coding order is not necessarily
+      the scan order, but the order in which the NAL units become
+      available to the RTP stack.
+
+   o  Parameter sets are handled in accordance with the rules and
+      recommendations given in section 8.4.
+
+   o  MANEs MUST NOT duplicate any NAL unit except for sequence or
+      picture parameter set NAL units, as neither this memo nor the
+      H.264 specification provides means to identify duplicated NAL
+      units.  Sequence and picture parameter set NAL units MAY be
+      duplicated to make their correct reception more probable, but any
+      such duplication MUST NOT affect the contents of any active
+      sequence or picture parameter set.  Duplication SHOULD be
+      performed on the application layer and not by duplicating RTP
+      packets (with identical sequence numbers).
+
+   Senders using the non-interleaved mode and the interleaved mode MUST
+   enforce the following packetization rule:
+
+   o  MANEs MAY convert single NAL unit packets into one aggregation
+      packet, convert an aggregation packet into several single NAL unit
+      packets, or mix both concepts, in an RTP translator.  The RTP
+      translator SHOULD take into account at least the following
+      parameters: path MTU size, unequal protection mechanisms (e.g.,
+      through packet-based FEC according to RFC 2733 [18], especially
+      for sequence and picture parameter set NAL units and coded slice
+      data partition A NAL units), bearable latency of the system, and
+      buffering capabilities of the receiver.
+
+      Informative note: An RTP translator is required to handle RTCP as
+      per RFC 3550.
+
+6.2.  Single NAL Unit Mode
+
+   This mode is in use when the value of the OPTIONAL packetization-mode
+   MIME parameter is equal to 0, the packetization-mode parameter is not
+   present, or no other packetization mode is signaled by external
+   means.  All receivers MUST support this mode.  It is primarily
+   intended for low-delay applications that are compatible with systems
+   using ITU-T Recommendation H.241 [15] (see section 12.1).  Only
+   single NAL unit packets MAY be used in this mode.  STAPs, MTAPs, and
+   FUs MUST NOT be used.  The transmission order of single NAL unit
+   packets MUST comply with the NAL unit decoding order.
+
+6.3.  Non-Interleaved Mode
+
+   This mode is in use when the value of the OPTIONAL packetization-mode
+   MIME parameter is equal to 1 or the mode is turned on by external
+   means.  This mode SHOULD be supported.  It is primarily intended for
+   low-delay applications.  Only single NAL unit packets, STAP-As, and
+   FU-As MAY be used in this mode.  STAP-Bs, MTAPs, and FU-Bs MUST NOT
+   be used.  The transmission order of NAL units MUST comply with the
+   NAL unit decoding order.
+
+6.4.  Interleaved Mode
+
+   This mode is in use when the value of the OPTIONAL packetization-mode
+   MIME parameter is equal to 2 or the mode is turned on by external
+   means.  Some receivers MAY support this mode.  STAP-Bs, MTAPs, FU-As,
+   and FU-Bs MAY be used.  STAP-As and single NAL unit packets MUST NOT
+   be used.  The transmission order of packets and NAL units is
+   constrained as specified in section 5.5.
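+
+      Informative example: the packet types permitted by sections 6.2
+      to 6.4 can be summarized as a lookup, sketched here in Python.
+      The names are hypothetical, and the type values 24 to 27 for
+      STAP-A, STAP-B, MTAP16, and MTAP24 are assumed from the NAL unit
+      type table given elsewhere in this memo.
+
```python
SINGLE_NAL = range(1, 24)  # single NAL unit packets (types 1-23)
STAP_A, STAP_B, MTAP16, MTAP24, FU_A, FU_B = 24, 25, 26, 27, 28, 29

# Packet types each packetization-mode value permits.
ALLOWED_TYPES = {
    0: set(SINGLE_NAL),                        # single NAL unit mode
    1: set(SINGLE_NAL) | {STAP_A, FU_A},       # non-interleaved mode
    2: {STAP_B, MTAP16, MTAP24, FU_A, FU_B},   # interleaved mode
}


def packet_allowed(packetization_mode: int, first_payload_octet: int) -> bool:
    """True if the packet type in the first payload octet may be used
    in the given packetization mode."""
    return (first_payload_octet & 0x1F) in ALLOWED_TYPES[packetization_mode]
```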
+
+7.  De-Packetization Process (Informative)
+
+   The de-packetization process is implementation dependent.  Therefore,
+   the following description should be seen as an example of a suitable
+   implementation.  Other schemes may be used as well.  Optimizations
+   relative to the described algorithms are likely possible.  Section
+   7.1 presents the de-packetization process for the single NAL unit and
+   non-interleaved packetization modes, whereas section 7.2 describes
+   the process for the interleaved mode.  Section 7.3 includes
+   additional decapsulation guidelines for intelligent receivers.
+
+   All normal RTP mechanisms related to buffer management apply.  In
+   particular, duplicated or outdated RTP packets (as indicated by the
+   RTP sequence number and the RTP timestamp) are removed.  To
+   determine the exact time for decoding, factors such as a possible
+   intentional delay to allow for proper inter-stream synchronization
+   must be taken into account.
+
+7.1.  Single NAL Unit and Non-Interleaved Mode
+
+   The receiver includes a receiver buffer to compensate for
+   transmission delay jitter.  The receiver stores incoming packets in
+   reception order into the receiver buffer.  Packets are decapsulated
+   in RTP sequence number order.  If a decapsulated packet is a single
+   NAL unit packet, the NAL unit contained in the packet is passed
+   directly to the decoder.  If a decapsulated packet is an STAP-A, the
+   NAL units contained in the packet are passed to the decoder in the
+   order in which they are encapsulated in the packet.  If a
+   decapsulated packet is an FU-A, all the fragments of the fragmented
+   NAL unit are concatenated and passed to the decoder.
+
+      Informative note: If the decoder supports Arbitrary Slice Order,
+      coded slices of a picture can be passed to the decoder in any
+      order regardless of their reception and transmission order.
+
+7.2.  Interleaved Mode
+
+   The general concept behind these de-packetization rules is to reorder
+   NAL units from transmission order to the NAL unit decoding order.
+
+   The receiver includes a receiver buffer, which is used to compensate
+   for transmission delay jitter and to reorder packets from
+   transmission order to the NAL unit decoding order.  In this section,
+   the receiver operation is described under the assumption that there
+   is no transmission delay jitter.  To distinguish it from a practical
+   receiver buffer that is also used for compensation of transmission
+   delay jitter, the receiver buffer is hereafter called the
+   deinterleaving buffer in this section.  Receivers SHOULD also
+   prepare for transmission delay jitter; i.e., either reserve separate
+   buffers for transmission delay jitter buffering and deinterleaving
+   buffering or use a receiver buffer for both transmission delay jitter
+   and deinterleaving.  Moreover, receivers SHOULD take transmission
+   delay jitter into account in the buffering operation; e.g., by
+   additional initial buffering before starting decoding and playback.
+
+   This section is organized as follows: subsection 7.2.1 presents how
+   to calculate the size of the deinterleaving buffer.  Subsection 7.2.2
+   specifies the receiver process for organizing received NAL units
+   into the NAL unit decoding order.
+
+7.2.1.  Size of the Deinterleaving Buffer
+
+   When the SDP Offer/Answer model or any other capability exchange
+   procedure is used in session setup, the properties of the received
+   stream SHOULD be such that the receiver capabilities are not
+   exceeded.  In the SDP Offer/Answer model, the receiver can indicate
+   its capabilities to allocate a deinterleaving buffer with the
+   deint-buf-cap MIME parameter.  The sender indicates the requirement
+   for the deinterleaving buffer size with the sprop-deint-buf-req MIME
+   parameter.  It is therefore RECOMMENDED to set the deinterleaving
+   buffer size, in terms of number of bytes, equal to or greater than
+   the value of the sprop-deint-buf-req MIME parameter.  See section 8.1
+   for further information on the deint-buf-cap and sprop-deint-buf-req
+   MIME parameters and section 8.2.2 for further information on their
+   use in the SDP Offer/Answer model.
+
+   When a declarative session description is used in session setup, the
+   sprop-deint-buf-req MIME parameter signals the requirement for the
+   deinterleaving buffer size.  It is therefore RECOMMENDED to set the
+   deinterleaving buffer size, in terms of number of bytes, equal to or
+   greater than the value of the sprop-deint-buf-req MIME parameter.
+
+7.2.2.  Deinterleaving Process
+
+   There are two buffering states in the receiver: initial buffering and
+   buffering while playing.  Initial buffering occurs when the RTP
+   session is initialized.  After initial buffering, decoding and
+   playback are started, and the buffering-while-playing mode is used.
+
+   Regardless of the buffering state, the receiver stores incoming NAL
+   units, in reception order, in the deinterleaving buffer as follows.
+   NAL units of aggregation packets are stored in the deinterleaving
+   buffer individually.  The value of DON is calculated and stored for
+   all NAL units.
+
+   The receiver operation is described below with the help of the
+   following functions and constants:
+
+   o  Function AbsDON is specified in section 8.1.
+
+   o  Function don_diff is specified in section 5.5.
+
+   o  Constant N is the value of the OPTIONAL sprop-interleaving-depth
+      MIME type parameter (see section 8.1) incremented by 1.
+
+   Initial buffering lasts until one of the following conditions is
+   fulfilled:
+
+   o  There are N VCL NAL units in the deinterleaving buffer.
+
+   o  If sprop-max-don-diff is present, don_diff(m,n) is greater than
+      the value of sprop-max-don-diff, in which n corresponds to the NAL
+      unit having the greatest value of AbsDON among the received NAL
+      units and m corresponds to the NAL unit having the smallest value
+      of AbsDON among the received NAL units.
+
+   o  Initial buffering has lasted for the duration equal to or greater
+      than the value of the OPTIONAL sprop-init-buf-time MIME parameter.
+
+   The NAL units to be removed from the deinterleaving buffer are
+   determined as follows:
+
+   o  If the deinterleaving buffer contains at least N VCL NAL units,
+      NAL units are removed from the deinterleaving buffer and passed to
+      the decoder in the order specified below until the buffer contains
+      N-1 VCL NAL units.
+
+   o  If sprop-max-don-diff is present, all NAL units m for which
+      don_diff(m,n) is greater than sprop-max-don-diff are removed from
+      the deinterleaving buffer and passed to the decoder in the order
+      specified below.  Herein, n corresponds to the NAL unit having the
+      greatest value of AbsDON among the received NAL units.
+
+   The order in which NAL units are passed to the decoder is specified
+   as follows:
+
+   o  Let PDON be a variable that is initialized to 0 at the beginning
+      of an RTP session.
+
+   o  For each NAL unit associated with a value of DON, a DON distance
+      is calculated as follows.  If the value of DON of the NAL unit is
+      larger than the value of PDON, the DON distance is equal to DON -
+      PDON.  Otherwise, the DON distance is equal to 65535 - PDON + DON
+      + 1.
+
+   o  NAL units are delivered to the decoder in ascending order of DON
+      distance.  If several NAL units share the same value of DON
+      distance, they can be passed to the decoder in any order.
+
+   o  When a desired number of NAL units have been passed to the
+      decoder, the value of PDON is set to the value of DON for the last
+      NAL unit passed to the decoder.
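+
+      Informative example: the DON distance rule above might be
+      sketched in Python as follows.  The function names are
+      hypothetical, and each buffered NAL unit is assumed to be held as
+      a (DON, NAL unit) pair.
+
```python
from typing import List, Tuple


def don_distance(don: int, pdon: int) -> int:
    """DON distance of a NAL unit relative to PDON (16-bit wraparound)."""
    if don > pdon:
        return don - pdon
    return 65535 - pdon + don + 1


def decoder_order(units: List[Tuple[int, bytes]],
                  pdon: int) -> List[Tuple[int, bytes]]:
    """Sort buffered (DON, NAL unit) pairs by ascending DON distance.

    Units with equal distance may be delivered in any order; the stable
    sort used here simply keeps their reception order.
    """
    return sorted(units, key=lambda unit: don_distance(unit[0], pdon))
```
+
+      With PDON equal to 65533, units carrying DON values 1, 65535, 0,
+      and 65534 would be delivered in the order 65534, 65535, 0, 1.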
+
+7.3.  Additional De-Packetization Guidelines
+
+   The following additional de-packetization rules may be used to
+   implement an operational H.264 de-packetizer:
+
+   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
+      coded slice data partitions A (DPAs).  If a lost DPA is found, a
+      gateway may decide not to send the corresponding coded slice data
+      partitions B and C, as their information is meaningless for H.264
+      decoders.  In this way a MANE can reduce network load by
+      discarding useless packets without parsing a complex bitstream.
+
+   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
+      FUs.  If a lost FU is found, a gateway may decide not to send the
+      following FUs of the same fragmented NAL unit, as their
+      information is meaningless for H.264 decoders.  In this way a MANE
+      can reduce network load by discarding useless packets without
+      parsing a complex bitstream.
+
+   o  Intelligent receivers having to discard packets or NALUs should
+      first discard all packets/NALUs in which the value of the NRI
+      field of the NAL unit type octet is equal to 0.  This will
+      minimize the impact on user experience and keep the reference
+      pictures intact.  If more packets have to be discarded, then
+      packets with a numerically lower NRI value should be discarded
+      before packets with a numerically higher NRI value.  However,
+      discarding any packets with an NRI bigger than 0 very likely leads
+      to decoder drift and SHOULD be avoided.
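+
+      Informative example: the discard preference above might be
+      sketched in Python as follows.  The function name is
+      hypothetical; NAL units are assumed to be byte strings whose
+      first octet is the NAL unit type octet.
+
```python
from typing import List


def choose_discards(nal_units: List[bytes], count: int) -> List[bytes]:
    """Pick up to `count` NAL units to drop, lowest NRI value first.

    NRI occupies bits 1-2 of the NAL unit type octet (mask 0x60).
    The sort is stable, so units with equal NRI keep their input order.
    """
    by_nri = sorted(nal_units, key=lambda nal: (nal[0] >> 5) & 0x03)
    return by_nri[:count]
```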
+
+8.  Payload Format Parameters
+
+   This section specifies the parameters that MAY be used to select
+   optional features of the payload format and certain features of the
+   bitstream.  The parameters are specified here as part of the MIME
+   subtype registration for the ITU-T H.264 | ISO/IEC 14496-10 codec.  A
+   mapping of the parameters into the Session Description Protocol (SDP)
+   [5] is also provided for applications that use SDP.  Equivalent
+   parameters could be defined elsewhere for use with control protocols
+   that do not use MIME or SDP.
+
+   Some parameters provide a receiver with the properties of the stream
+   that will be sent.  The names of all these parameters start with
+   "sprop" for stream properties.  Some of these "sprop" parameters are
+   limited by other payload or codec configuration parameters.  For
+   example, the sprop-parameter-sets parameter is constrained by the
+   profile-level-id parameter.  The media sender selects all "sprop"
+   parameters rather than the receiver.  This uncommon characteristic of
+   the "sprop" parameters may not be compatible with some signaling
+   protocol concepts, in which case the use of these parameters SHOULD
+   be avoided.
+
+8.1.  MIME Registration
+
+   The MIME subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec is
+   allocated from the IETF tree.
+
+   The receiver MUST ignore any unspecified parameter.
+
+   Media Type name:     video
+
+   Media subtype name:  H264
+
+   Required parameters: none
+
+   OPTIONAL parameters:
+       profile-level-id:
+                        A base16 [6] (hexadecimal) representation of
+                        the following three bytes in the sequence
+                        parameter set NAL unit specified in [1]: 1)
+                        profile_idc, 2) a byte herein referred to as
+                        profile-iop, composed of the values of
+                        constraint_set0_flag, constraint_set1_flag,
+                        constraint_set2_flag, and reserved_zero_5bits
+                        in bit-significance order, starting from the
+                        most significant bit, and 3) level_idc.  Note
+                        that reserved_zero_5bits is required to be
+                        equal to 0 in [1], but other values for it may
+                        be specified in the future by ITU-T or ISO/IEC.
+
+                        If the profile-level-id parameter is used to
+                        indicate properties of a NAL unit stream, it
+                        indicates the profile and level that a decoder
+                        has to support in order to comply with [1] when
+                        it decodes the stream.  The profile-iop byte
+                        indicates whether the NAL unit stream also
+                        obeys all constraints of the indicated profiles
+                        as follows.  If bit 7 (the most significant
+                        bit), bit 6, or bit 5 of profile-iop is equal
+                        to 1, all constraints of the Baseline profile,
+                        the Main profile, or the Extended profile,
+                        respectively, are obeyed in the NAL unit
+                        stream.
+
+                        If the profile-level-id parameter is used for
+                        capability exchange or session setup procedure,
+                        it indicates the profile that the codec
+                        supports and the highest level
+                        supported for the signaled profile.  The
+                        profile-iop byte indicates whether the codec
+                        has additional limitations whereby only the
+                        common subset of the algorithmic features and
+                        limitations of the profiles signaled with the
+                        profile-iop byte and of the profile indicated
+                        by profile_idc is supported by the codec.  For
+                        example, if a codec supports only the common
+                        subset of the coding tools of the Baseline
+                        profile and the Main profile at level 2.1 and
+                        below, the profile-level-id becomes 42E015, in
+                        which 42 stands for the Baseline profile, E0
+                        indicates that only the common subset for all
+                        profiles is supported, and 15 indicates level
+                        2.1.
+
+                            Informative note: Capability exchange and
+                            session setup procedures should provide
+                            means to list the capabilities for each
+                            supported codec profile separately.  For
+                            example, the one-of-N codec selection
+                            procedure of the SDP Offer/Answer model can
+                            be used (section 10.2 of [7]).
+
+                        If no profile-level-id is present, the Baseline
+                        Profile without additional constraints at Level
+                        1 MUST be implied.
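+
+                        The byte layout described above can be
+                        illustrated with a short, non-normative Python
+                        sketch.  The function name and the dictionary
+                        keys are assumptions made for this example and
+                        are not defined in this memo or in [1].
+
```python
def parse_profile_level_id(value):
    """Split a 6-hex-digit profile-level-id into its three bytes.

    Hypothetical helper for illustration only; the key names are
    assumptions, not identifiers taken from the memo.
    """
    if len(value) != 6:
        raise ValueError("profile-level-id must be exactly 6 hex digits")
    # base16 decoding: byte 0 = profile_idc, byte 1 = profile-iop,
    # byte 2 = level_idc
    profile_idc, profile_iop, level_idc = bytes.fromhex(value)
    return {
        "profile_idc": profile_idc,
        "profile_iop": profile_iop,
        # Bits 7, 6, and 5 of profile-iop refer to the Baseline, Main,
        # and Extended profiles, respectively.
        "baseline_bit": bool(profile_iop & 0x80),
        "main_bit": bool(profile_iop & 0x40),
        "extended_bit": bool(profile_iop & 0x20),
        "level_idc": level_idc,
    }
```
+
+                        For the value 42E015 used in the example
+                        above, the sketch yields profile_idc 0x42,
+                        profile-iop 0xE0 (bits 7, 6, and 5 set), and
+                        level_idc 0x15.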
+
+       max-mbps, max-fs, max-cpb, max-dpb, and max-br:
+                        These parameters MAY be used to signal the
+                        capabilities of a receiver implementation.
+                        These parameters MUST NOT be used for any other
+                        purpose.  The profile-level-id parameter MUST
+                        be present in the same receiver capability
+                        description that contains any of these
+                        parameters.  The level conveyed in the value of
+                        the profile-level-id parameter MUST be one that
+                        the receiver is fully capable of supporting.
+                        max-mbps, max-fs, max-cpb, max-dpb, and max-br
+                        MAY be used to indicate capabilities of the
+                        receiver that extend the required capabilities
+                        of the signaled level, as specified below.
+
+                        When more than one parameter from the set (max-
+                        mbps, max-fs, max-cpb, max-dpb, max-br) is
+                        present, the receiver MUST support all signaled
+                        capabilities simultaneously.  For example, if
+                        both max-mbps and max-br are present, the
+                        signaled level with the extension of both the
+                        frame rate and bit rate is supported.  That is,
+                        the receiver is able to decode NAL unit
+                        streams in which the macroblock processing rate
+                        is up to max-mbps (inclusive), the bit rate is
+                        up to max-br (inclusive), the coded picture
+                        buffer size is derived as specified in the
+                        semantics of the max-br parameter below, and
+                        other properties comply with the level
+                        specified in the value of the profile-level-id
+                        parameter.
+
+                        A receiver MUST NOT signal values of max-
+                        mbps, max-fs, max-cpb, max-dpb, and max-br that
+                        meet the requirements of a higher level,
+                        referred to as level A herein, compared to the
+                        level specified in the value of the profile-
+                        level-id parameter, if the receiver can support
+                        all the properties of level A.
+
+                            Informative note: When the OPTIONAL MIME
+                            type parameters are used to signal the
+                            properties of a NAL unit stream, max-mbps,
+                            max-fs, max-cpb, max-dpb, and max-br are
+                            not present, and the value of profile-
+                            level-id must always be such that the NAL
+                            unit stream complies fully with the
+                            specified profile and level.
+
+       max-mbps:        The value of max-mbps is an integer indicating
+                        the maximum macroblock processing rate in units
+                        of macroblocks per second.  The max-mbps
+                        parameter signals that the receiver is capable
+                        of decoding video at a higher rate than is
+                        required by the signaled level conveyed in the
+                        value of the profile-level-id parameter.  When
+                        max-mbps is signaled, the receiver MUST be able
+                        to decode NAL unit streams that conform to the
+                        signaled level, with the exception that the
+                        MaxMBPS value in Table A-1 of [1] for the
+                        signaled level is replaced with the value of
+                        max-mbps.  The value of max-mbps MUST be
+                        greater than or equal to the value of MaxMBPS
+                        for the level given in Table A-1 of [1].
+                        Senders MAY use this knowledge to send pictures
+                        of a given size at a higher picture rate than
+                        is indicated in the signaled level.
+
+       max-fs:          The value of max-fs is an integer indicating
+                        the maximum frame size in units of macroblocks.
+                        The max-fs parameter signals that the receiver
+                        is capable of decoding larger picture sizes
+                        than are required by the signaled level conveyed
+                        in the value of the profile-level-id parameter.
+                        When max-fs is signaled, the receiver MUST be
+                        able to decode NAL unit streams that conform to
+                        the signaled level, with the exception that the
+                        MaxFS value in Table A-1 of [1] for the
+                        signaled level is replaced with the value of
+                        max-fs.  The value of max-fs MUST be greater
+                        than or equal to the value of MaxFS for the
+                        level given in Table A-1 of [1].  Senders MAY
+                        use this knowledge to send larger pictures at a
+                        proportionally lower frame rate than is
+                        indicated in the signaled level.
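+
+                        As a non-normative illustration, the Python
+                        sketch below shows how a sender might combine
+                        max-fs and max-mbps when choosing a picture
+                        size and picture rate.  It assumes a simplified
+                        reading of [1] in which the picture rate is
+                        bounded by the macroblock processing rate
+                        divided by the picture size in macroblocks;
+                        the function name and the sample numbers in
+                        the comments are hypothetical.
+
```python
def max_picture_rate(width_mbs, height_mbs, max_mbps, max_fs):
    """Return the highest picture rate (pictures per second) allowed
    for a picture of width_mbs x height_mbs macroblocks, or None if
    the picture is larger than max_fs macroblocks.

    Hypothetical helper; max_mbps and max_fs stand for the signaled
    max-mbps and max-fs values (or the level's own limits when the
    parameters are absent).
    """
    frame_size = width_mbs * height_mbs  # picture size in macroblocks
    if frame_size > max_fs:
        return None  # picture too large for this receiver
    # Simplified assumption: rate * frame_size must not exceed max_mbps.
    return max_mbps / frame_size


# Hypothetical numbers: a 22x18 macroblock picture (396 macroblocks)
# with max-mbps = 11880 allows up to 30 pictures per second.
rate = max_picture_rate(22, 18, 11880, 396)
```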
+
+       max-cpb:         The value of max-cpb is an integer indicating
+                        the maximum coded picture buffer size in units
+                        of 1000 bits for the VCL HRD parameters (see
+                        A.3.1 item i of [1]) and in units of 1200 bits
+                        for the NAL HRD parameters (see A.3.1 item j of
+                        [1]).  The max-cpb parameter signals that the
+                        receiver has more memory than the minimum
+                        amount of coded picture buffer memory required
+                        by the signaled level conveyed in the value of
+                        the profile-level-id parameter.  When max-cpb
+                        is signaled, the receiver MUST be able to
+                        decode NAL unit streams that conform to the
+                        signaled level, with the exception that the
+                        MaxCPB value in Table A-1 of [1] for the
+                        signaled level is replaced with the value of
+                        max-cpb.  The value of max-cpb MUST be greater
+                        than or equal to the value of MaxCPB for the
+                        level given in Table A-1 of [1].  Senders MAY
+                        use this knowledge to construct coded video
+                        streams with greater variation of bit rate
+                        than can be achieved with the
+                        MaxCPB value in Table A-1 of [1].
+
+                            Informative note: The coded picture buffer
+                            is used in the hypothetical reference
+                            decoder (Annex C) of H.264.  The use of the
+                            hypothetical reference decoder is
+                            recommended in H.264 encoders to verify
+                            that the produced bitstream conforms to the
+                            standard and to control the output bitrate.
+                            Thus, the coded picture buffer is
+                            conceptually independent of any other
+                            potential buffers in the receiver,
+                            including de-interleaving and de-jitter
+                            buffers.  The coded picture buffer need not
+                            be implemented in decoders as specified in
+                            Annex C of H.264, but rather standard-
+                            compliant decoders can have any buffering
+                            arrangements provided that they can decode
+                            standard-compliant bitstreams.  Thus, in
+                            practice, the input buffer for the video
+                            decoder can be integrated with the
+                            de-interleaving and de-jitter buffers of the
+                            receiver.
+
+       max-dpb:         The value of max-dpb is an integer indicating
+                        the maximum decoded picture buffer size in
+                        units of 1024 bytes.  The max-dpb parameter
+                        signals that the receiver has more memory than
+                        the minimum amount of decoded picture buffer
+                        memory required by the signaled level conveyed
+                        in the value of the profile-level-id parameter.
+                        When max-dpb is signaled, the receiver MUST be
+                        able to decode NAL unit streams that conform to
+                        the signaled level, with the exception that the
+                        MaxDPB value in Table A-1 of [1] for the
+                        signaled level is replaced with the value of
+                        max-dpb.  Consequently, a receiver that signals
+                        max-dpb MUST be capable of storing the
+                        following number of decoded frames,
+                        complementary field pairs, and non-paired
+                        fields in its decoded picture buffer:
+
+                        Min(1024 * max-dpb / ( PicWidthInMbs *
+                        FrameHeightInMbs * 256 * ChromaFormatFactor ),
+                        16)
+
+                        PicWidthInMbs, FrameHeightInMbs, and
+                        ChromaFormatFactor are defined in [1].
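+
+                        The formula above can be sketched in Python as
+                        follows.  The sketch is non-normative: the
+                        function name is hypothetical, the default
+                        ChromaFormatFactor of 1.5 assumes 4:2:0 video,
+                        and rounding the quotient down to a whole
+                        number of frames is an assumption of this
+                        example.
+
```python
import math


def dpb_frame_capacity(max_dpb, pic_width_in_mbs, frame_height_in_mbs,
                       chroma_format_factor=1.5):
    """Number of decoded frames a receiver signaling max-dpb must be
    able to store, following
    Min(1024 * max-dpb / (PicWidthInMbs * FrameHeightInMbs * 256 *
    ChromaFormatFactor), 16).

    max_dpb is in units of 1024 bytes.  The 1.5 default (4:2:0) and the
    floor() are assumptions of this sketch.
    """
    bytes_per_frame = (pic_width_in_mbs * frame_height_in_mbs
                       * 256 * chroma_format_factor)
    frames = (1024 * max_dpb) / bytes_per_frame
    return min(math.floor(frames), 16)
```
+
+                        With a hypothetical max-dpb of 891, a picture
+                        of 22 x 18 macroblocks gives 6 frames, whereas
+                        a picture of 11 x 9 macroblocks gives 24 before
+                        the cap and therefore 16.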
+
+                        The value of max-dpb MUST be greater than or
+                        equal to the value of MaxDPB for the level
+                        given in Table A-1 of [1].  Senders MAY use
+                        this knowledge to construct coded video streams
+                        with improved compression.
+
+                            Informative note: This parameter was added
+                            primarily to complement a similar codepoint
+                            in the ITU-T Recommendation H.245, so as to
+                            facilitate signaling gateway designs.  The
+                            decoded picture buffer stores reconstructed
+                            samples and is a property of the video
+                            decoder only.  There is no relationship
+                            between the size of the decoded picture
+                            buffer and the buffers used in RTP,
+                            especially de-interleaving and de-jitter
+                            buffers.
+
+       max-br:          The value of max-br is an integer indicating
+                        the maximum video bit rate in units of 1000
+                        bits per second for the VCL HRD parameters (see
+                        A.3.1 item i of [1]) and in units of 1200 bits
+                        per second for the NAL HRD parameters (see
+                        A.3.1 item j of [1]).
+
+                        The max-br parameter signals that the video
+                        decoder of the receiver is capable of decoding
+                        video at a higher bit rate than is required by
+                        the signaled level conveyed in the value of the
+                        profile-level-id parameter.  The value of max-
+                        br MUST be greater than or equal to the value
+                        of MaxBR for the level given in Table A-1 of
+                        [1].
+
+                        When max-br is signaled, the video codec of the
+                        receiver MUST be able to decode NAL unit
+                        streams that conform to the signaled level,
+                        conveyed in the profile-level-id parameter,
+                        with the following exceptions in the limits
+                        specified by the level:
+                        o The value of max-br replaces the MaxBR value
+                          of the signaled level (in Table A-1 of [1]).
+                        o When the max-cpb parameter is not present,
+                          the result of the following formula replaces
+                          the value of MaxCPB in Table A-1 of [1]:
+                          (MaxCPB of the signaled level) * max-br /
+                          (MaxBR of the signaled level).
+
+                        For example, if a receiver signals capability
+                        for Level 1.2 with max-br equal to 1550, this
+                        indicates a maximum video bitrate of 1550
+                        kbits/sec for VCL HRD parameters, a maximum
+                        video bitrate of 1860 kbits/sec for NAL HRD
+                        parameters, and a CPB size of 4036458 bits
+                        (1550000 / 384000 * 1000 * 1000).
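+
+                        The arithmetic of this example can be
+                        reproduced with the following non-normative
+                        Python sketch.  The function name is
+                        hypothetical; the Level 1.2 values MaxBR = 384
+                        and MaxCPB = 1000 are those implied by the
+                        calculation in the example itself.
+
```python
def max_br_limits(max_br, level_max_br, level_max_cpb, max_cpb=None):
    """Return (VCL bit rate in bits/s, NAL bit rate in bits/s,
    VCL CPB size in bits) for a receiver that signals max-br.

    level_max_br and level_max_cpb are the MaxBR and MaxCPB values of
    the signaled level; max_cpb is the signaled max-cpb, if any.
    Hypothetical helper for illustration only.
    """
    vcl_bps = 1000 * max_br   # units of 1000 bits/s for VCL HRD
    nal_bps = 1200 * max_br   # units of 1200 bits/s for NAL HRD
    if max_cpb is None:
        # No max-cpb: scale the level's MaxCPB by max-br / MaxBR.
        # Integer division truncates the result to whole bits.
        cpb_bits = (1000 * level_max_cpb * max_br) // level_max_br
    else:
        cpb_bits = 1000 * max_cpb
    return vcl_bps, nal_bps, cpb_bits


# Level 1.2 with max-br = 1550, as in the example above.
limits = max_br_limits(1550, level_max_br=384, level_max_cpb=1000)
```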
+
+                        The value of max-br MUST be greater than or
+                        equal to the value of MaxBR for the signaled
+                        level given in Table A-1 of [1].
+
+                        Senders MAY use this knowledge to send higher
+                        bitrate video as allowed in the level
+                        definition of Annex A of H.264, to achieve
+                        improved video quality.
+
+                            Informative note: This parameter was added
+                            primarily to complement a similar codepoint
+                            in the ITU-T Recommendation H.245, so as to
+                            facilitate signaling gateway designs.  No
+                            assumption can be made from the value of
+                            this parameter that the network is capable
+                            of handling such bit rates at any given
+                            time.  In particular, no conclusion can be
+                            drawn that the signaled bit rate is
+                            possible under congestion control
+                            constraints.
+
+       redundant-pic-cap:
+                        This parameter signals the capabilities of a
+                        receiver implementation.  When equal to 0, the
+                        parameter indicates that the receiver makes no
+                        attempt to use redundant coded pictures to
+                        correct incorrectly decoded primary coded
+                        pictures; that is, the receiver is not capable
+                        of using redundant slices, and a sender
+                        therefore SHOULD avoid sending redundant slices
+                        to save bandwidth.  When equal to 1, the
+                        receiver is capable of decoding any such
+                        redundant slice that covers a corrupted area in
+                        a primary decoded picture (at least partly), and
+                        therefore a sender MAY send redundant slices.
+                        When the parameter is not present, then a value
+                        of 0 MUST be used for redundant-pic-cap.  When
+                        present, the value of redundant-pic-cap MUST be
+                        either 0 or 1.
+
+                        When the profile-level-id parameter is present
+                        in the same capability signaling as the
+                        redundant-pic-cap parameter, and the profile
+                        indicated in profile-level-id is such that it
+                        disallows the use of redundant coded pictures
+                        (e.g., Main Profile), the value of redundant-
+                        pic-cap MUST be equal to 0.  When a receiver
+                        indicates redundant-pic-cap equal to 0, the
+                        received stream SHOULD NOT contain redundant
+                        coded pictures.
+
+                            Informative note: Even if redundant-pic-cap
+                            is equal to 0, the decoder is able to
+                            ignore redundant coded pictures provided
+                            that the decoder supports such a profile
+                            (Baseline, Extended) in which redundant
+                            coded pictures are allowed.
+
+                            Informative note: Even if redundant-pic-cap
+                            is equal to 1, the receiver may also choose
+                            other error concealment strategies to
+                            replace or complement decoding of redundant
+                            slices.
+
+       sprop-parameter-sets:
+                        This parameter MAY be used to convey
+                        any sequence and picture parameter set NAL
+                        units (herein referred to as the initial
+                        parameter set NAL units) that MUST precede any
+                        other NAL units in decoding order.  The
+                        parameter MUST NOT be used to indicate codec
+                        capability in any capability exchange
+                        procedure.  The value of the parameter is the
+                        base64 [6] representation of the initial
+                        parameter set NAL units as specified in
+                        sections 7.3.2.1 and 7.3.2.2 of [1].  The
+                        parameter sets are conveyed in decoding order,
+                        and no framing of the parameter set NAL units
+                        takes place.  A comma is used to separate any
+                        pair of parameter sets in the list.  Note that
+                        the number of bytes in a parameter set NAL unit
+                        is typically less than 10, but a picture
+                        parameter set NAL unit can contain several
+                        hundreds of bytes.
+
+                            Informative note: When several payload
+                            types are offered in the SDP Offer/Answer
+                            model, each with its own sprop-parameter-
+                            sets parameter, then the receiver cannot
+                            assume that those parameter sets do not use
+                            conflicting storage locations (i.e.,
+                            identical values of parameter set
+                            identifiers).  Therefore, a receiver should
+                            double-buffer all sprop-parameter-sets and
+                            make them available to the decoder instance
+                            that decodes a certain payload type.
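+
+                        A non-normative Python sketch of how a
+                        receiver might unpack a sprop-parameter-sets
+                        value follows.  The function name is
+                        hypothetical, and the sample string is used
+                        purely as illustrative input.  The sketch
+                        relies on the NAL unit header layout of [1], in
+                        which the low five bits of the first byte carry
+                        nal_unit_type (7 for a sequence parameter set,
+                        8 for a picture parameter set).
+
```python
import base64


def decode_sprop_parameter_sets(value):
    """Split a sprop-parameter-sets value on commas and base64-decode
    each item into a raw parameter set NAL unit.

    Returns a list of (nal_unit_type, nal_bytes) tuples in the order
    given, which is decoding order.  Hypothetical helper.
    """
    nal_units = []
    for item in value.split(","):
        nal = base64.b64decode(item)
        if not nal:
            raise ValueError("empty parameter set in list")
        nal_unit_type = nal[0] & 0x1F  # low 5 bits of the NAL header
        nal_units.append((nal_unit_type, nal))
    return nal_units


# Illustrative input: one sequence and one picture parameter set.
sets = decode_sprop_parameter_sets("Z0IACpZTBYmI,aMljiA==")
```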
+
+       parameter-add:   This parameter MAY be used to signal whether
+                        the receiver of this parameter is allowed to
+                        add parameter sets in its signaling response
+                        using the sprop-parameter-sets MIME parameter.
+                        The value of this parameter is either 0 or 1.
+                        A value of 0 means false; i.e., the receiver is
+                        not allowed to add parameter sets.  A value of
+                        1 means true; i.e., the receiver is allowed to
+                        add parameter sets.  If the parameter is not
+                        present, its value MUST be 1.
+
+       packetization-mode:
+                        This parameter signals the properties of an
+                        RTP payload type or the capabilities of a
+                        receiver implementation.  Only a single
+                        configuration point can be indicated; thus,
+                        when capabilities to support more than one
+                        packetization-mode are declared, multiple
+                        configuration points (RTP payload types) must
+                        be used.
+
+                        When the value of packetization-mode is equal
+                        to 0 or packetization-mode is not present, the
+                        single NAL mode, as defined in section 6.2 of
+                        RFC 3984, MUST be used.  This mode is in use in
+                        standards using ITU-T Recommendation H.241 [15]
+                        (see section 12.1).  When the value of
+                        packetization-mode is equal to 1, the non-
+                        interleaved mode, as defined in section 6.3 of
+                        RFC 3984, MUST be used.  When the value of
+                        packetization-mode is equal to 2, the
+                        interleaved mode, as defined in section 6.4 of
+                        RFC 3984, MUST be used.  The value of
+                        packetization-mode MUST be an integer in the
+                        range of 0 to 2, inclusive.
+
+       sprop-interleaving-depth:
+                        This parameter MUST NOT be present
+                        when packetization-mode is not present or the
+                        value of packetization-mode is equal to 0 or 1.
+                        This parameter MUST be present when the value
+                        of packetization-mode is equal to 2.
+
+                        This parameter signals the properties of a NAL
+                        unit stream.  It specifies the maximum number
+                        of VCL NAL units that precede any VCL NAL unit
+                        in the NAL unit stream in transmission order
+                        and follow the VCL NAL unit in decoding order.
+                        Consequently, it is guaranteed that receivers
+                        can reconstruct NAL unit decoding order when
+                        the buffer size for NAL unit decoding order
+                        recovery is at least the value of sprop-
+                        interleaving-depth + 1 in terms of VCL NAL
+                        units.
+
+                        The value of sprop-interleaving-depth MUST be
+                        an integer in the range of 0 to 32767,
+                        inclusive.
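+
+                        The definition above can be illustrated with a
+                        non-normative Python sketch.  The function name
+                        and the sample transmission order are
+                        hypothetical; the input lists, for each VCL NAL
+                        unit in transmission order, its position in
+                        decoding order.
+
```python
def interleaving_depth(decoding_positions):
    """Return the maximum number of VCL NAL units that precede some
    VCL NAL unit in transmission order and follow it in decoding order.

    decoding_positions[k] is the decoding-order position of the k-th
    VCL NAL unit sent.  Hypothetical helper, quadratic for clarity.
    """
    depth = 0
    for k, pos in enumerate(decoding_positions):
        # Units sent earlier (index < k) that are decoded later.
        count = sum(1 for earlier in decoding_positions[:k]
                    if earlier > pos)
        depth = max(depth, count)
    return depth


# Hypothetical order: units 3 and 4 are both sent before unit 2.
depth = interleaving_depth([0, 3, 1, 4, 2, 5])
```
+
+                        For this order the sketch returns 2, so a
+                        buffer of 3 VCL NAL units suffices to recover
+                        the decoding order.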
+
+       sprop-deint-buf-req:
+                        This parameter MUST NOT be present when
+                        packetization-mode is not present or the value
+                        of packetization-mode is equal to 0 or 1.  It
+                        MUST be present when the value of
+                        packetization-mode is equal to 2.
+
+                        sprop-deint-buf-req signals the required size
+                        of the deinterleaving buffer for the NAL unit
+                        stream.  The value of the parameter MUST be
+                        greater than or equal to the maximum buffer
+                        occupancy (in units of bytes) required in the
+                        deinterleaving buffer that is specified in
+                        section 7.2 of RFC 3984.  It is guaranteed that
+                        receivers can perform the deinterleaving of
+                        interleaved NAL units into NAL unit decoding
+                        order, when the deinterleaving buffer size is
+                        at least the value of sprop-deint-buf-req in
+                        terms of bytes.
+
+                        The value of sprop-deint-buf-req MUST be an
+                        integer in the range of 0 to 4294967295,
+                        inclusive.
+
+                            Informative note: sprop-deint-buf-req
+                            indicates the required size of the
+                            deinterleaving buffer only.  When network
+                            jitter can occur, an appropriately sized
+                            jitter buffer has to be provisioned as
+                            well.
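+
+                        The presence and range rules stated above for
+                        packetization-mode, sprop-interleaving-depth,
+                        and sprop-deint-buf-req can be checked
+                        mechanically.  The following Python sketch is
+                        non-normative; the function name, the input
+                        dictionary, and the error strings are
+                        assumptions of this example.
+
```python
def check_interleaving_params(params):
    """Check a dict of parameter name -> string value against the
    presence and range rules for the interleaving-related parameters.

    Returns a list of human-readable problems (empty if none).
    Hypothetical helper for illustration only.
    """
    errors = []
    # An absent packetization-mode is treated like the value 0.
    mode = int(params.get("packetization-mode", "0"))
    if mode not in (0, 1, 2):
        return ["packetization-mode must be 0, 1, or 2"]
    upper_bounds = {
        "sprop-interleaving-depth": 32767,
        "sprop-deint-buf-req": 4294967295,
    }
    for name, upper in upper_bounds.items():
        if mode == 2:
            if name not in params:
                errors.append(name + " MUST be present in mode 2")
            elif not 0 <= int(params[name]) <= upper:
                errors.append(name + " is out of range")
        elif name in params:
            errors.append(name + " MUST NOT be present in mode "
                          + str(mode))
    return errors
```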
+
+       deint-buf-cap:   This parameter signals the capabilities of a
+                        receiver implementation and indicates the
+                        amount of deinterleaving buffer space in units
+                        of bytes that the receiver has available for
+                        reconstructing the NAL unit decoding order.  A
+                        receiver is able to handle any stream for which
+                        the value of the sprop-deint-buf-req parameter
+                        is smaller than or equal to this parameter.
+
+                        If the parameter is not present, then a value
+                        of 0 MUST be used for deint-buf-cap.  The value
+                        of deint-buf-cap MUST be an integer in the
+                        range of 0 to 4294967295, inclusive.
+
+                            Informative note: deint-buf-cap indicates
+                            the maximum possible size of the
+                            deinterleaving buffer of the receiver only.
+                            When network jitter can occur, an
+                            appropriately sized jitter buffer has to
+                            be provisioned as well.
+
+       sprop-init-buf-time:
+                        This parameter MAY be used to signal the
+                        properties of a NAL unit stream.  The parameter
+                        MUST NOT be present if the value of
+                        packetization-mode is equal to 0 or 1.
+
+                        The parameter signals the initial buffering
+                        time that a receiver MUST buffer before
+                        starting decoding to recover the NAL unit
+                        decoding order from the transmission order.
+                        The parameter is the maximum value of
+                        (transmission time of a NAL unit - decoding
+                        time of the NAL unit), assuming reliable and
+                        instantaneous transmission, the same
+                        timeline for transmission and decoding, and
+                        that decoding starts when the first packet
+                        arrives.
+
+                        An example of specifying the value of sprop-
+                        init-buf-time follows.  A NAL unit stream is
+                        sent in the following interleaved order, in
+                        which the value corresponds to the decoding
+                        time and the transmission order is from left to
+                        right:
+
+                        0  2  1  3  5  4  6  8  7 ...
+
+                        Assuming a steady transmission rate of NAL
+                        units, the transmission times are:
+
+                        0  1  2  3  4  5  6  7  8 ...
+
+                        Subtracting the decoding time from the
+                        transmission time column-wise results in the
+                        following series:
+
+                        0 -1  1  0 -1  1  0 -1  1 ...
+
+                        Thus, in terms of intervals of NAL unit
+                        transmission times, the value of
+                        sprop-init-buf-time in this
+                        example is 1.
+
+                        The parameter is coded as a non-negative base10
+                        integer representation in clock ticks of a 90-
+                        kHz clock.  If the parameter is not present,
+                        then no initial buffering time value is
+                        defined.  Otherwise the value of sprop-init-
+                        buf-time MUST be an integer in the range of 0
+                        to 4294967295, inclusive.
+
+                        In addition to the signaled sprop-init-buf-
+                        time, receivers SHOULD take into account the
+                        transmission delay jitter buffering, including
+                        buffering for the delay jitter caused by
+                        mixers, translators, gateways, proxies,
+                        traffic-shapers, and other network elements.
+
+       sprop-max-don-diff:
+                        This parameter MAY be used to signal the
+                        properties of a NAL unit stream.  It MUST NOT
+                        be used to signal transmitter or receiver or
+                        codec capabilities.  The parameter MUST NOT be
+                        present if the value of packetization-mode is
+                        equal to 0 or 1.  sprop-max-don-diff is an
+                        integer in the range of 0 to 32767, inclusive.
+                        If sprop-max-don-diff is not present, the value
+                        of the parameter is unspecified.  sprop-max-
+                        don-diff is calculated as follows:
+
+                        sprop-max-don-diff =
+                            max{AbsDON(i) - AbsDON(j)},
+                        for any i and any j>i,
+
+                        where i and j indicate the index of the NAL
+                        unit in the transmission order and AbsDON
+                        denotes a decoding order number of the NAL
+                        unit that does not wrap around to 0 after
+                        65535.  In other words, AbsDON is calculated as
+                        follows: Let m and n be consecutive NAL units
+                        in transmission order.  For the very first NAL
+                        unit in transmission order (whose index is 0),
+                        AbsDON(0) = DON(0).  For other NAL units,
+                        AbsDON is calculated as follows:
+
+                        If DON(m) == DON(n), AbsDON(n) = AbsDON(m)
+
+                        If (DON(m) < DON(n) and DON(n) - DON(m) <
+                        32768),
+                        AbsDON(n) = AbsDON(m) + DON(n) - DON(m)
+
+                        If (DON(m) > DON(n) and DON(m) - DON(n) >=
+                        32768),
+                        AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)
+
+                        If (DON(m) < DON(n) and DON(n) - DON(m) >=
+                        32768),
+                        AbsDON(n) = AbsDON(m) - (DON(m) + 65536 -
+                        DON(n))
+
+                        If (DON(m) > DON(n) and DON(m) - DON(n) <
+                        32768),
+                        AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))
+
+                        where DON(i) is the decoding order number of
+                        the NAL unit having index i in the transmission
+                        order.  The decoding order number is specified
+                        in section 5.5 of RFC 3984.
+
+                            Informative note: Receivers may use sprop-
+                            max-don-diff to determine which NAL units
+                            in the receiver buffer can be passed to
+                            the decoder.
+
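+                        The AbsDON rules above can be sketched in
+                        Python as follows.  This is an illustration
+                        only; the function names are hypothetical,
+                        and flooring the result at 0 is an
+                        assumption based on the stated value range.
+
```python
def abs_don_sequence(dons):
    """Unwrap 16-bit DON values, given in transmission order."""
    abs_dons = []
    for n, don_n in enumerate(dons):
        if n == 0:
            abs_dons.append(don_n)  # AbsDON(0) = DON(0)
            continue
        don_m, abs_m = dons[n - 1], abs_dons[-1]
        if don_m == don_n:
            abs_n = abs_m
        elif don_m < don_n and don_n - don_m < 32768:
            abs_n = abs_m + don_n - don_m
        elif don_m > don_n and don_m - don_n >= 32768:
            abs_n = abs_m + 65536 - don_m + don_n
        elif don_m < don_n and don_n - don_m >= 32768:
            abs_n = abs_m - (don_m + 65536 - don_n)
        else:  # don_m > don_n and don_m - don_n < 32768
            abs_n = abs_m - (don_m - don_n)
        abs_dons.append(abs_n)
    return abs_dons


def sprop_max_don_diff(dons):
    """Return max{AbsDON(i) - AbsDON(j)} over all i and all j > i.

    Assumption: the result is floored at 0, since the parameter
    is defined as an integer in the range 0 to 32767.
    """
    best = 0
    highest_so_far = None
    for abs_don in abs_don_sequence(dons):
        if highest_so_far is not None:
            best = max(best, highest_so_far - abs_don)
            highest_so_far = max(highest_so_far, abs_don)
        else:
            highest_so_far = abs_don
    return best


# DON values that wrap around 65535 while being interleaved.
dons = [65534, 0, 65535, 1]
print(abs_don_sequence(dons))    # [65534, 65536, 65535, 65537]
print(sprop_max_don_diff(dons))  # 1
```
+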
+     max-rcmd-nalu-size:
+                        This parameter MAY be used to signal the
+                        capabilities of a receiver.  The parameter MUST
+                        NOT be used for any other purposes.  The value
+                        of the parameter indicates the largest NALU
+                        size in bytes that the receiver can handle
+                        efficiently.  The parameter value is a
+                        recommendation, not a strict upper boundary.
+                        The sender MAY create larger NALUs but must be
+                        aware that the handling of these may come at a
+                        higher cost than NALUs conforming to the
+                        limitation.
+
+                        The value of max-rcmd-nalu-size MUST be an
+                        integer in the range of 0 to 4294967295,
+                        inclusive.  If this parameter is not specified,
+                        no known limitation to the NALU size exists.
+                        Senders still have to consider the MTU size
+                        available between the sender and the receiver
+                        and SHOULD run MTU discovery for this purpose.
+
+                        This parameter is motivated by, for example, an
+                        IP to H.223 video telephony gateway, where
+                        NALUs smaller than the H.223 transport data
+                        unit will be more efficient.  A gateway may
+                        terminate IP; thus, MTU discovery will normally
+                        not work beyond the gateway.
+
+                            Informative note: Setting this parameter to
+                            a lower than necessary value may have a
+                            negative impact.
+
+   Encoding considerations:
+                        This type is only defined for transfer via RTP
+                        (RFC 3550).
+
+                        A file format of H.264/AVC video is defined in
+                        [29].  This definition is utilized by other
+                        file formats, such as the 3GPP multimedia file
+                        format (MIME type video/3gpp) [30] or the MP4
+                        file format (MIME type video/mp4).
+
+   Security considerations:
+                        See section 9 of RFC 3984.
+
+   Public specification:
+                        Please refer to RFC 3984 and its section 15.
+
+   Additional information:
+                        None
+
+   File extensions:     none
+   Macintosh file type code: none
+   Object identifier or OID: none
+
+   Person & email address to contact for further information:
+                        stewe@stewe.org
+
+   Intended usage:      COMMON
+
+   Author:
+                        stewe@stewe.org
+   Change controller:
+                        IETF Audio/Video Transport working group
+                        delegated from the IESG.
+
+8.2.  SDP Parameters
+
+8.2.1.  Mapping of MIME Parameters to SDP
+
+   The MIME media type video/H264 string is mapped to fields in the
+   Session Description Protocol (SDP) [5] as follows:
+
+   o  The media name in the "m=" line of SDP MUST be video.
+
+   o  The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the
+      MIME subtype).
+
+   o  The clock rate in the "a=rtpmap" line MUST be 90000.
+
+   o  The OPTIONAL parameters "profile-level-id", "max-mbps", "max-fs",
+      "max-cpb", "max-dpb", "max-br", "redundant-pic-cap", "sprop-
+      parameter-sets", "parameter-add", "packetization-mode", "sprop-
+      interleaving-depth", "deint-buf-cap", "sprop-deint-buf-req",
+      "sprop-init-buf-time", "sprop-max-don-diff", and "max-rcmd-nalu-
+      size", when present, MUST be included in the "a=fmtp" line of SDP.
+      These parameters are expressed as a MIME media type string, in the
+      form of a semicolon separated list of parameter=value pairs.
+
+   An example of media representation in SDP is as follows (Baseline
+   Profile, Level 3.0, some of the constraints of the Main profile may
+   not be obeyed):
+
+      m=video 49170 RTP/AVP 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E;
+                sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==
+
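+   As an illustration, the "a=fmtp" value above can be taken apart with
+   a few lines of Python.  The helper names are hypothetical; the bit
+   layout of "profile-level-id" (profile_idc, the constraint flags, and
+   level_idc) is the one defined for that parameter in this memo.
+
```python
import base64


def parse_fmtp(value):
    """Split 'name=value; name=value' into a dict of strings."""
    params = {}
    for item in value.split(";"):
        name, sep, val = item.strip().partition("=")
        if sep:
            params[name.strip()] = val.strip()
    return params


def parse_profile_level_id(hex_string):
    """Decode the three bytes carried in profile-level-id."""
    profile_idc, profile_iop, level_idc = bytes.fromhex(hex_string)
    return {
        "profile_idc": profile_idc,
        "constraint_set0_flag": (profile_iop >> 7) & 1,
        "constraint_set1_flag": (profile_iop >> 6) & 1,
        "constraint_set2_flag": (profile_iop >> 5) & 1,
        "level_idc": level_idc,
    }


def nal_unit_types(sprop_parameter_sets):
    """Return the nal_unit_type of each base64-coded parameter set."""
    return [base64.b64decode(s)[0] & 0x1F
            for s in sprop_parameter_sets.split(",")]


fmtp = ("profile-level-id=42A01E; "
        "sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==")
params = parse_fmtp(fmtp)
print(parse_profile_level_id(params["profile-level-id"]))
print(nal_unit_types(params["sprop-parameter-sets"]))  # [7, 8]
```
+
+   For the example above, this yields profile_idc 66 (Baseline),
+   constraint_set1_flag 0 (Main profile constraints not necessarily
+   obeyed), and level_idc 30 (Level 3.0).  The two parameter sets carry
+   NAL unit types 7 and 8, i.e., a sequence and a picture parameter set.
+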
+8.2.2.  Usage with the SDP Offer/Answer Model
+
+   When H.264 is offered over RTP using SDP in an Offer/Answer model [7]
+   for negotiation for unicast usage, the following limitations and
+   rules apply:
+
+   o  The parameters identifying a media format configuration for H.264
+      are "profile-level-id", "packetization-mode", and, if required by
+      "packetization-mode", "sprop-deint-buf-req".  These three
+      parameters MUST be used symmetrically; i.e., the answerer MUST
+      either maintain all configuration parameters or remove the media
+      format (payload type) completely, if one or more of the parameter
+      values are not supported.
+
+         Informative note: The requirement for symmetric use applies
+         only for the above three parameters and not for the other
+         stream properties and capability parameters.
+
+      To simplify handling and matching of these configurations, the
+      same RTP payload type number used in the offer SHOULD also be used
+      in the answer, as specified in [7].  An answer MUST NOT contain a
+      payload type number used in the offer unless the configuration
+      ("profile-level-id", "packetization-mode", and, if present,
+      "sprop-deint-buf-req") is the same as in the offer.
+
+         Informative note: An offerer, when receiving the answer, has to
+         compare payload types not declared in the offer based on media
+         type (i.e., video/h264) and the above three parameters with any
+         payload types it has already declared, in order to determine
+         whether the configuration in question is new or equivalent to a
+         configuration already offered.
+
+   o  The parameters "sprop-parameter-sets", "sprop-deint-buf-req",
+      "sprop-interleaving-depth", "sprop-max-don-diff", and "sprop-
+      init-buf-time" describe the properties of the NAL unit stream that
+      the offerer or answerer is sending for this media format
+      configuration.  This differs from the normal usage of the
+      Offer/Answer parameters: normally such parameters declare the
+      properties of the stream that the offerer or the answerer is able
+      to receive.  When dealing with H.264, the offerer assumes that the
+      answerer will be able to receive media encoded using the
+      configuration being offered.
+
+         Informative note: The above parameters apply for any stream
+         sent by the declaring entity with the same configuration; i.e.,
+         they are dependent on their source.  Rather than being bound to
+         the payload type, the values may have to be applied to another
+         payload type when being sent, as they apply for the
+         configuration.
+
+   o  The capability parameters ("max-mbps", "max-fs", "max-cpb", "max-
+      dpb", "max-br", "redundant-pic-cap", "max-rcmd-nalu-size") MAY be
+      used to declare further capabilities.  Their interpretation
+      depends on the direction attribute.  When the direction attribute
+      is sendonly, then the parameters describe the limits of the RTP
+      packets and the NAL unit stream that the sender is capable of
+      producing.  When the direction attribute is sendrecv or recvonly,
+      then the parameters describe the limitations of what the receiver
+      accepts.
+
+   o  As specified above, an offerer has to include the size of the
+      deinterleaving buffer in the offer for an interleaved H.264
+      stream.  To enable the offerer and answerer to inform each other
+      about their capabilities for deinterleaving buffering, both
+      parties are RECOMMENDED to include "deint-buf-cap".  This
+      information MAY be used when the value for "sprop-deint-buf-req"
+      is selected in a second round of offer and answer.  For
+      interleaved streams, it is also RECOMMENDED to consider offering
+      multiple payload types with different buffering requirements when
+      the capabilities of the receiver are unknown.
+
+   o  The "sprop-parameter-sets" parameter is used as described above.
+      In addition, an answerer MUST maintain all parameter sets received
+      in the offer in its answer.  Depending on the value of the
+      "parameter-add" parameter, different rules apply: If "parameter-
+      add" is false (0), the answer MUST NOT add any additional
+      parameter sets.  If "parameter-add" is true (1), the answerer, in
+      its answer, MAY add additional parameter sets to the "sprop-
+      parameter-sets" parameter.  The answerer MUST also, independent of
+      the value of "parameter-add", accept reception of a video stream
+      using the sprop-parameter-sets it declared in the answer.
+
+         Informative note: Care must be taken when parameter sets are
+         added not to cause overwriting of already transmitted parameter
+         sets by using conflicting parameter set identifiers.
+
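+   The "sprop-parameter-sets" rules in the last bullet above can be
+   checked mechanically.  The following Python sketch is illustrative
+   only: the function name is hypothetical, and comparing parameter
+   sets by their base64 strings is a simplifying assumption.
+
```python
def check_answer_parameter_sets(offered, answered, parameter_add):
    """Return a list of rule violations (empty if the answer is valid).

    offered and answered are the comma-separated values of
    "sprop-parameter-sets"; parameter_add is True or False.
    """
    offered_sets = offered.split(",")
    answered_sets = answered.split(",")
    problems = []
    # The answerer MUST maintain all parameter sets from the offer.
    missing = [s for s in offered_sets if s not in answered_sets]
    if missing:
        problems.append("answer dropped offered parameter sets")
    # Additional sets are only allowed if parameter-add is true (1).
    added = [s for s in answered_sets if s not in offered_sets]
    if added and not parameter_add:
        problems.append("answer added sets although parameter-add=0")
    return problems


offered = "Z0IACpZTBYmI,aMljiA=="
answered = "Z0IACpZTBYmI,aMljiA==,As0DEWlsIOp==,KyzFGleR"
print(check_answer_parameter_sets(offered, answered, True))   # []
print(check_answer_parameter_sets(offered, answered, False))
```
+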
+   For streams being delivered over multicast, the following rules apply
+   in addition:
+
+   o  The stream properties parameters ("sprop-parameter-sets", "sprop-
+      deint-buf-req", "sprop-interleaving-depth", "sprop-max-don-diff",
+      and "sprop-init-buf-time") MUST NOT be changed by the answerer.
+      Thus, a payload type can either be accepted unaltered or removed.
+
+   o  The receiver capability parameters "max-mbps", "max-fs", "max-
+      cpb", "max-dpb", "max-br", and "max-rcmd-nalu-size" MUST be
+      supported by the answerer for all streams declared as sendrecv or
+      recvonly; otherwise, one of the following actions MUST be
+      performed: the media format is removed, or the session rejected.
+
+   o  The receiver capability parameter redundant-pic-cap SHOULD be
+      supported by the answerer for all streams declared as sendrecv or
+      recvonly as follows:  The answerer SHOULD NOT include redundant
+      coded pictures in the transmitted stream if the offerer indicated
+      redundant-pic-cap equal to 0.  Otherwise (when redundant-pic-cap
+      is equal to 1), it is beyond the scope of this memo to recommend
+      how the answerer should use redundant coded pictures.
+
+   Below are the complete lists of how the different parameters shall be
+   interpreted in the different combinations of offer or answer and
+   direction attribute.
+
+   o  In offers and answers for which "a=sendrecv" or no direction
+      attribute is used, or in offers and answers for which "a=recvonly"
+      is used, the following interpretation of the parameters MUST be
+      used.
+
+      Declaring actual configuration or properties for receiving:
+
+         - profile-level-id
+         - packetization-mode
+
+      Declaring actual properties of the stream to be sent (applicable
+      only when "a=sendrecv" or no direction attribute is used):
+
+         - sprop-deint-buf-req
+         - sprop-interleaving-depth
+         - sprop-parameter-sets
+         - sprop-max-don-diff
+         - sprop-init-buf-time
+
+      Declaring receiver implementation capabilities:
+
+         - max-mbps
+         - max-fs
+         - max-cpb
+         - max-dpb
+         - max-br
+         - redundant-pic-cap
+         - deint-buf-cap
+         - max-rcmd-nalu-size
+
+      Declaring how Offer/Answer negotiation shall be performed:
+
+         - parameter-add
+
+   o  In an offer or answer for which the direction attribute
+      "a=sendonly" is included for the media stream, the following
+      interpretation of the parameters MUST be used:
+
+      Declaring actual configuration and properties of stream proposed
+      to be sent:
+
+         - profile-level-id
+         - packetization-mode
+         - sprop-deint-buf-req
+         - sprop-max-don-diff
+         - sprop-init-buf-time
+         - sprop-parameter-sets
+         - sprop-interleaving-depth
+
+      Declaring the capabilities of the sender when it receives a
+      stream:
+
+         - max-mbps
+         - max-fs
+         - max-cpb
+         - max-dpb
+         - max-br
+         - redundant-pic-cap
+         - deint-buf-cap
+         - max-rcmd-nalu-size
+
+      Declaring how Offer/Answer negotiation shall be performed:
+
+         - parameter-add
+
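+   The two lists above can be condensed into a small lookup, sketched
+   here in Python.  The function name and the returned labels are
+   hypothetical shorthand for the list headings; a missing direction
+   attribute is treated as "sendrecv".
+
```python
CONFIGURATION = {"profile-level-id", "packetization-mode"}
STREAM_PROPERTIES = {
    "sprop-deint-buf-req", "sprop-interleaving-depth",
    "sprop-parameter-sets", "sprop-max-don-diff", "sprop-init-buf-time",
}
CAPABILITIES = {
    "max-mbps", "max-fs", "max-cpb", "max-dpb", "max-br",
    "redundant-pic-cap", "deint-buf-cap", "max-rcmd-nalu-size",
}


def parameter_role(name, direction="sendrecv"):
    """Classify a parameter for the given direction attribute.

    Returns None if the parameter has no role for that direction.
    """
    if name == "parameter-add":
        return "negotiation"
    if name in CAPABILITIES:
        return "receive capability"
    if direction == "sendonly":
        if name in CONFIGURATION or name in STREAM_PROPERTIES:
            return "stream to be sent"
    elif name in CONFIGURATION:
        return "receive configuration"
    elif name in STREAM_PROPERTIES and direction == "sendrecv":
        return "stream to be sent"
    return None


print(parameter_role("profile-level-id"))                  # receive configuration
print(parameter_role("sprop-parameter-sets", "recvonly"))  # None
print(parameter_role("sprop-parameter-sets", "sendonly"))  # stream to be sent
```
+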
+   Furthermore, the following considerations are necessary:
+
+   o  Parameters used for declaring receiver capabilities are in general
+      downgradable; i.e., they express the upper limit for a sender's
+      possible behavior.  Thus, a sender MAY configure its encoder
+      using values that are lower than or equal to these parameters.
+      "sprop-parameter-sets" MUST NOT be used in a sender's declaration
+      of its capabilities, as the limits of the values that are carried
+      inside the parameter sets are implicit with the profile and level
+      used.
+
+   o  Parameters declaring a configuration point are not downgradable,
+      with the exception of the level part of the "profile-level-id"
+      parameter.  Such parameters express values that a receiver
+      expects to be used, and the sender must use them verbatim.
+
+   o  When a sender's capabilities are declared, and non-downgradable
+      parameters are used in this declaration, then these parameters
+      express a configuration that is acceptable.  In order to achieve
+      high interoperability levels, it is often advisable to offer
+      multiple alternative configurations; e.g., for the packetization
+      mode.  It is impossible to offer multiple configurations in a
+      single payload type.  Thus, when multiple configuration offers are
+      made, each offer requires its own RTP payload type associated with
+      the offer.
+
+   o  A receiver SHOULD understand all MIME parameters, even if it only
+      supports a subset of the payload format's functionality.  This
+      ensures that a receiver is capable of understanding when an offer
+      to receive media can be downgraded to what is supported by the
+      receiver of the offer.
+
+   o  An answerer MAY extend the offer with additional media format
+      configurations.  However, to enable their usage, in most cases a
+      second offer is required from the offerer to provide the stream
+      properties parameters that the media sender will use.  This also
+      has the effect that the offerer has to be able to receive this
+      media format configuration, not only to send it.
+
+   o  If an offerer wishes to have non-symmetric capabilities between
+      sending and receiving, the offerer has to offer different RTP
+      sessions; i.e., different media lines declared as "recvonly" and
+      "sendonly", respectively.  This may have further implications on
+      the system.
+
+8.2.3.  Usage in Declarative Session Descriptions
+
+   When H.264 over RTP is offered with SDP in a declarative style, as in
+   RTSP [27] or SAP [28], the following considerations are necessary.
+
+   o  All parameters capable of indicating the properties of both a NAL
+      unit stream and a receiver are used to indicate the properties of
+      a NAL unit stream.  For example, in this case, the parameter
+      "profile-level-id" declares the values used by the stream, instead
+      of the capabilities of the sender.  Consequently, the following
+      interpretation of the parameters MUST be used:
+
+      Declaring actual configuration or properties:
+
+         - profile-level-id
+         - sprop-parameter-sets
+         - packetization-mode
+         - sprop-interleaving-depth
+         - sprop-deint-buf-req
+         - sprop-max-don-diff
+         - sprop-init-buf-time
+
+      Not usable:
+
+         - max-mbps
+         - max-fs
+         - max-cpb
+         - max-dpb
+         - max-br
+         - redundant-pic-cap
+         - max-rcmd-nalu-size
+         - parameter-add
+         - deint-buf-cap
+
+   o  A receiver of the SDP is required to support all parameters and
+      values of the parameters provided; otherwise, the receiver MUST
+      reject (RTSP) or not participate in (SAP) the session.  It falls
+      on the creator of the session to use values that are expected to
+      be supported by the receiving application.
+
+8.3.  Examples
+
+   A SIP Offer/Answer exchange wherein both parties are expected to both
+   send and receive could look like the following.  Only the media codec
+   specific parts of the SDP are shown.  Some lines are wrapped due to
+   text constraints.
+
+      Offerer -> Answerer SDP message:
+
+      m=video 49170 RTP/AVP 100 99 98
+      a=rtpmap:98 H264/90000
+      a=fmtp:98 profile-level-id=42A01E; packetization-mode=0;
+                sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==
+      a=rtpmap:99 H264/90000
+      a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
+                sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==
+      a=rtpmap:100 H264/90000
+      a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
+                 sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==;
+                 sprop-interleaving-depth=45; sprop-deint-buf-req=64000;
+                 sprop-init-buf-time=102478; deint-buf-cap=128000
+
+   The above offer presents the same codec configuration in three
+   different packetization formats.  PT 98 represents single NALU mode,
+   PT 99 the non-interleaved mode, and PT 100 the interleaved mode.
+   In the interleaved mode case, the interleaving parameters that the
+   offerer would use if the answer indicates support for PT 100 are also
+   included.  In all three cases the parameter "sprop-parameter-sets"
+   conveys the initial parameter sets that are required for the answerer
+   when receiving a stream from the offerer when this configuration
+   (profile-level-id and packetization mode) is accepted.  Note that the
+   value for "sprop-parameter-sets", although identical in the example
+   above, could be different for each payload type.
+
+     Answerer -> Offerer SDP message:
+
+     m=video 49170 RTP/AVP 100 99 97
+     a=rtpmap:97 H264/90000
+     a=fmtp:97 profile-level-id=42A01E; packetization-mode=0;
+               sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==,As0DEWlsIOp==,
+               KyzFGleR
+     a=rtpmap:99 H264/90000
+     a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
+               sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==,As0DEWlsIOp==,
+               KyzFGleR; max-rcmd-nalu-size=3980
+     a=rtpmap:100 H264/90000
+     a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
+               sprop-parameter-sets=Z0IACpZTBYmI,aMljiA==,As0DEWlsIOp==,
+               KyzFGleR; sprop-interleaving-depth=60;
+               sprop-deint-buf-req=86000; sprop-init-buf-time=156320;
+               deint-buf-cap=128000; max-rcmd-nalu-size=3980
+
+   As the Offer/Answer negotiation covers both sending and receiving
+   streams, an offer indicates the exact parameters for what the offerer
+   is willing to receive, whereas the answer indicates the same for what
+   the answerer accepts to receive.  In this case the offerer declared
+   that it is willing to receive payload type 98.  The answerer accepts
+   this by declaring an equivalent payload type 97; i.e., it has
+   identical values for the three parameters "profile-level-id",
+   "packetization-mode", and "sprop-deint-buf-req".  This has the
+   following implications for both the offerer and the answerer
+   concerning the parameters that declare properties.  The offerer
+   initially declared a certain value of the "sprop-parameter-sets" in
+   the payload definition for PT=98.  However, as the answerer accepted
+   this as PT=97, the values of "sprop-parameter-sets" in PT=98 must now
+   be used instead when the offerer sends PT=97.  Similarly, when the
+   answerer sends PT=98 to the offerer, it has to use the properties
+   parameters it declared in PT=97.
+
+   The answerer also accepts the reception of the two configurations
+   that payload types 99 and 100 represent.  It provides the initial
+   parameter sets for the answerer-to-offerer direction, and for
+   buffering related parameters that it will use to send the payload
+   types.  It also provides the offerer with its memory limit for
+   deinterleaving operations by providing a "deint-buf-cap" parameter.
+   This is only useful if the offerer decides on making a second offer,
+   where it can take the new value into account.  The "max-rcmd-nalu-
+   size" indicates that the answerer can efficiently process NALUs up to
+   the size of 3980 bytes.  However, there is no guarantee that the
+   network supports this size.
+
+   Please note that the parameter sets in the above example do not
+   represent a legal operation point of an H.264 codec.  The base64
+   strings are only used for illustration.
+
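+   The payload type matching described above can be sketched in Python.
+   The helper names are hypothetical, and the data is limited to the
+   non-interleaved payload types of the example.
+
```python
CONFIG_KEYS = ("profile-level-id", "packetization-mode",
               "sprop-deint-buf-req")


def configuration(params):
    """Return the tuple that identifies a media format configuration."""
    return tuple(params.get(key) for key in CONFIG_KEYS)


def match_payload_types(offer, answer):
    """Map each answer payload type to the equivalent offer type.

    A value of None means the answer introduced a new configuration.
    """
    by_config = {configuration(p): pt for pt, p in offer.items()}
    return {pt: by_config.get(configuration(p))
            for pt, p in answer.items()}


offer = {
    98: {"profile-level-id": "42A01E", "packetization-mode": "0"},
    99: {"profile-level-id": "42A01E", "packetization-mode": "1"},
}
answer = {
    97: {"profile-level-id": "42A01E", "packetization-mode": "0"},
    99: {"profile-level-id": "42A01E", "packetization-mode": "1"},
}
# Payload type 97 of the answer is equivalent to 98 of the offer.
print(match_payload_types(offer, answer))  # {97: 98, 99: 99}
```
+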
+8.4.  Parameter Set Considerations
+
+   The H.264 parameter sets are a fundamental part of the video codec
+   and vital to its operation; see section 1.2.  Due to their
+   characteristics and their importance for the decoding process, lost
+   or erroneously transmitted parameter sets can hardly be concealed
+   locally at the receiver.  A reference to a corrupt parameter set is
+   normally fatal to the decoding process.  Corruption could
+   occur, for example, due to the erroneous transmission or loss of a
+   parameter set data structure, but also due to the untimely
+   transmission of a parameter set update.  Therefore, the following
+   recommendations are provided as a guideline for the implementer of
+   the RTP sender.
+
+   Parameter set NALUs can be transported using three different
+   principles:
+
+   A. Using a session control protocol (out-of-band) prior to the actual
+      RTP session.
+
+   B. Using a session control protocol (out-of-band) during an ongoing
+      RTP session.
+
+   C. Within the RTP stream in the payload (in-band) during an ongoing
+      RTP session.
+
+   It is necessary to implement principles A and B within a session
+   control protocol.  SIP and SDP can be used as described in the SDP
+   Offer/Answer model and in the previous sections of this memo.  This
+   section contains guidelines on how principles A and B must be
+   implemented within session control protocols.  It is independent of
+   the particular protocol used.  Principle C is supported by the RTP
+   payload format defined in this specification.
+
+   The picture and sequence parameter set NALUs SHOULD NOT be
+   transmitted in the RTP payload unless reliable transport is provided
+   for RTP, as a loss of a parameter set of either type will likely
+   prevent decoding of a considerable portion of the corresponding RTP
+   stream.  Thus, the transmission of parameter sets using a reliable
+   session control protocol (i.e., usage of principle A or B above) is
+   RECOMMENDED.
+
+   In the rest of the section it is assumed that out-of-band signaling
+   provides reliable transport of parameter set NALUs and that in-band
+   transport does not.  If in-band signaling of parameter sets is used,
+   the sender SHOULD take the error characteristics into account and use
+   mechanisms to provide a high probability for delivering the parameter
+   sets correctly.  Mechanisms that increase the probability for a
+   correct reception include packet repetition, FEC, and retransmission.
+   The use of an unreliable, out-of-band control protocol has
+   disadvantages similar to those of in-band signaling (possible loss)
+   and, in addition, may lead to difficulties in synchronization (see
+   below).  Therefore, it is NOT RECOMMENDED.
+
+   Parameter sets MAY be added or updated during the lifetime of a
+   session using principles B and C.  It is required that parameter sets
+   are present at the decoder prior to the NAL units that refer to them.
+   Updating or adding of parameter sets can result in further problems,
+   and therefore the following recommendations should be considered.
+
+   -  When parameter sets are added or updated, principle C is
+      vulnerable to transmission errors as described above, and
+      therefore principle B is RECOMMENDED.
+
+   -  When parameter sets are added or updated, care SHOULD be taken to
+      ensure that any parameter set is delivered prior to its usage.  It
+      is common that no synchronization is present between out-of-band
+      signaling and in-band traffic.  If out-of-band signaling is used,
+      it is RECOMMENDED that a sender does not start sending NALUs
+      requiring the updated parameter sets prior to acknowledgement of
+      delivery from the signaling protocol.
+
+   -  When parameter sets are updated, the following synchronization
+      issue should be taken into account.  When overwriting a parameter
+      set at the receiver, the sender has to ensure that the parameter
+      set in question is not needed by any NALU present in the network
+      or receiver buffers.  Otherwise, decoding with a wrong parameter
+      set may occur.  To lessen this problem, it is RECOMMENDED either
+      to overwrite only those parameter sets that have not been used for
+      a sufficiently long time (to ensure that all related NALUs have
+      been consumed), or to add a new parameter set instead (which may
+      have negative consequences for the efficiency of the video
+      coding).
+
+   -  When new parameter sets are added, previously unused parameter set
+      identifiers are used.  This avoids the problem identified in the
+      previous paragraph.  However, in a multiparty session, unless a
+      synchronized control protocol is used, there is a risk that
+      multiple entities try to add different parameter sets for the same
+      identifier, which has to be avoided.
+
+   -  Adding or modifying parameter sets by using both principles B and
+      C in the same RTP session may lead to inconsistencies of the
+      parameter sets because of the lack of synchronization between the
+      control and the RTP channel.  Therefore, principles B and C MUST
+      NOT both be used in the same session unless sufficient
+      synchronization can be provided.
+
+   In some scenarios (e.g., when only the subset of this payload format
+   specification corresponding to H.241 is used), it is not possible to
+   employ out-of-band parameter set transmission.  In this case,
+   parameter sets have to be transmitted in-band.  Here, the
+   synchronization with the non-parameter-set-data in the bitstream is
+   implicit, but the possibility of a loss has to be taken into account.
+   The loss probability should be reduced using the mechanisms discussed
+   above.
+
+   -  When parameter sets are initially provided using principle A and
+      then later added or updated in-band (principle C), there is a risk
+      associated with updating the parameter sets delivered out-of-band.
+      If receivers miss some in-band updates (for example, because of a
+      loss or a late tune-in), those receivers attempt to decode the
+      bitstream using out-dated parameters.  It is RECOMMENDED that
+      parameter set IDs be partitioned between the out-of-band and in-
+      band parameter sets.
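+
+   The partitioning recommended above can be sketched as follows.  The
+   split point, the constant names, and the helper function are
+   assumptions made for illustration and are not defined by this
+   specification; the sketch relies only on the fact that
+   pic_parameter_set_id takes values from 0 to 255 in H.264.
+
```python
# Illustrative sketch only: the split point and all names are assumptions.
PPS_ID_RANGE = range(0, 256)         # pic_parameter_set_id is 0..255
OUT_OF_BAND_PPS_IDS = range(0, 128)  # conveyed by signaling (principle A)
IN_BAND_PPS_IDS = range(128, 256)    # added or updated in-band (principle C)


def may_update_in_band(pps_id: int) -> bool:
    """Return True if a sender may add or overwrite this PPS in-band."""
    if pps_id not in PPS_ID_RANGE:
        raise ValueError(f"pic_parameter_set_id out of range: {pps_id}")
    return pps_id in IN_BAND_PPS_IDS


print(may_update_in_band(3))    # False: reserved for out-of-band delivery
print(may_update_in_band(200))  # True: in-band partition
```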
+
+   To allow for maximum flexibility and best performance from the H.264
+   coder, it is recommended, if possible, to allow any sender to add its
+   own parameter sets to be used in a session.  Setting the
+   "parameter-add" parameter to false should only be done in cases
+   where the session topology prevents a participant from adding its
+   own parameter sets.
+
+9.  Security Considerations
+
+   RTP packets using the payload format defined in this specification
+   are subject to the security considerations discussed in the RTP
+   specification [4], and in any appropriate RTP profile (for example,
+   [16]).  This implies that confidentiality of the media streams is
+   achieved by encryption; for example, through the application of SRTP
+   [26].  Because the data compression used with this payload format is
+   applied end-to-end, any encryption needs to be performed after
+   compression.
+
+   A potential denial-of-service threat exists for data encodings using
+   compression techniques that have non-uniform receiver-end
+   computational load.  The attacker can inject pathological datagrams
+   into the stream that are complex to decode and that cause the
+   receiver to be overloaded.  H.264 is particularly vulnerable to such
+   attacks, as it is extremely simple to generate datagrams containing
+   NAL units that affect the decoding process of many future NAL units.
+   Therefore, the usage of data origin authentication and data integrity
+   protection of at least the RTP packet is RECOMMENDED; for example,
+   with SRTP [26].
+
+   Note that the appropriate mechanism to ensure confidentiality and
+   integrity of RTP packets and their payloads is very dependent on the
+   application and on the transport and signaling protocols employed.
+   Thus, although SRTP is given as an example above, other possible
+   choices exist.
+
+   Decoders MUST exercise caution with respect to the handling of user
+   data SEI messages, particularly if they contain active elements, and
+   MUST restrict their domain of applicability to the presentation
+   containing the stream.
+
+   End-to-end security with authentication, integrity, or
+   confidentiality protection will prevent a MANE from performing
+   media-aware operations other than discarding complete packets.  With
+   confidentiality protection, the MANE cannot even discard packets in
+   a media-aware way.  For a MANE to perform its operations, it has to
+   be a trusted entity that is included in the security context
+   establishment.
+
+10.  Congestion Control
+
+   Congestion control for RTP SHALL be used in accordance with RFC 3550
+   [4], and with any applicable RTP profile; e.g., RFC 3551 [16].  An
+   additional requirement if best-effort service is being used is:
+   users of this payload format MUST monitor packet loss to ensure that
+   the packet loss rate is within acceptable parameters.  Packet loss is
+   considered acceptable if a TCP flow across the same network path, and
+   experiencing the same network conditions, would achieve an average
+   throughput, measured on a reasonable timescale, that is not less than
+   the RTP flow is achieving.  This condition can be satisfied by
+   implementing congestion control mechanisms to adapt the transmission
+   rate (or the number of layers subscribed for a layered multicast
+   session), or by arranging for a receiver to leave the session if the
+   loss rate is unacceptably high.
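+
+   One way to check the acceptability condition above is to compare
+   the RTP sending rate with an estimate of the throughput a TCP flow
+   would achieve.  The sketch below uses the simplified steady-state
+   approximation MSS / (RTT * sqrt(2p/3)); that formula, the function
+   names, and the sample path values are assumptions introduced here
+   and are not part of this specification.
+
```python
import math


def tcp_throughput_estimate(mss_bytes: float, rtt_s: float,
                            loss: float) -> float:
    """Simplified steady-state TCP throughput estimate, in bit/s."""
    if not 0.0 < loss <= 1.0:
        raise ValueError("loss must be in (0, 1]")
    return 8.0 * mss_bytes / (rtt_s * math.sqrt(2.0 * loss / 3.0))


def rtp_rate_is_acceptable(rtp_bps: float, mss_bytes: float,
                           rtt_s: float, loss: float) -> bool:
    """True if the TCP estimate is not less than the RTP flow's rate."""
    return tcp_throughput_estimate(mss_bytes, rtt_s, loss) >= rtp_bps


# Hypothetical path: 1460-byte segments, 100 ms RTT, 1% packet loss.
estimate = tcp_throughput_estimate(1460, 0.1, 0.01)
print(round(estimate / 1000), "kbit/s")
print(rtp_rate_is_acceptable(384_000, 1460, 0.1, 0.01))
```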
+
+   The bit rate adaptation necessary for obeying the congestion control
+   principle is easily achievable when real-time encoding is used.
+   However, when pre-encoded content is being transmitted, bandwidth
+   adaptation requires the availability of more than one coded
+   representation of the same content, at different bit rates, or the
+   existence of non-reference pictures or sub-sequences [22] in the
+   bitstream.  The switching between the different representations can
+   normally be performed in the same RTP session; e.g., by employing a
+   concept known as SI/SP slices of the Extended Profile, or by
+   switching streams at IDR picture boundaries.  Only when non-
+   downgradable parameters (such as the profile part of the
+   profile/level ID) are required to be changed does it become necessary
+   to terminate and re-start the media stream.  This may be accomplished
+   by using a different RTP payload type.
+
+   MANEs MAY follow the suggestions outlined in section 7.3 and remove
+   certain unusable packets from the packet stream when that stream was
+   damaged due to previous packet losses.  This can help reduce the
+   network load in certain special cases.
+
+11.  IANA Consideration
+
+   IANA has registered one new MIME type; see section 8.1.
+
+12.  Informative Appendix: Application Examples
+
+   This payload specification is very flexible in its use, in order to
+   cover the extremely wide application space anticipated for H.264.
+   However, this great flexibility also makes it difficult for an
+   implementer to decide on a reasonable packetization scheme.  Some
+   information on how to apply this specification to real-world
+   scenarios is likely to appear in the near future in the form of
+   academic publications and of test model software with an
+   accompanying description.  However, some preliminary usage scenarios
+   are described here as well.
+
+12.1.  Video Telephony according to ITU-T Recommendation H.241
+       Annex A
+
+   H.323-based video telephony systems that use H.264 as an optional
+   video compression scheme are required to support H.241 Annex A [15]
+   as a packetization scheme.  The packetization mechanism defined in
+   this Annex is technically identical with a small subset of this
+   specification.
+
+   When a system operates according to H.241 Annex A, parameter set NAL
+   units are sent in-band.  Only Single NAL unit packets are used.  Many
+   such systems do not send IDR pictures regularly, but only when
+   required by user interaction or by control protocol means; e.g., when
+   switching between video channels in a Multipoint Control Unit or for
+   error recovery requested by feedback.
+
+12.2.  Video Telephony, No Slice Data Partitioning, No NAL Unit
+       Aggregation
+
+   The RTP part of this scheme is implemented and tested (though not the
+   control-protocol part; see below).
+
+   In most real-world video telephony applications, picture parameters
+   such as picture size or optional modes never change during the
+   lifetime of a connection.  Therefore, all necessary parameter sets
+   (usually only one) are sent as a side effect of the capability
+   exchange/announcement process, e.g., according to the SDP syntax
+   specified in section 8.2 of this document.  As all necessary
+   parameter set information is established before the RTP session
+   starts, there is no need for sending any parameter set NAL units.
+   Slice data partitioning is not used, either.  Thus, the RTP packet
+   stream basically consists of NAL units that carry single coded
+   slices.
+
+   The encoder chooses the size of coded slice NAL units so that they
+   offer the best performance.  Often, this is done by adapting the
+   coded slice size to the MTU size of the IP network.  For small
+   picture sizes, this may result in a one-picture-per-one-packet
+   strategy.  Intra refresh algorithms clean up the loss of packets and
+   the resulting drift-related artifacts.
+
+12.3.  Video Telephony, Interleaved Packetization Using NAL Unit
+       Aggregation
+
+   This scheme allows better error concealment and is used in
+   H.263-based designs using RFC 2429 packetization [10].  It has been
+   implemented, and good results were reported [12].
+
+   The VCL encoder codes the source picture so that all macroblocks
+   (MBs) of one MB line are assigned to one slice.  All slices with even
+   MB row addresses are combined into one STAP, and all slices with odd
+   MB row addresses into another.  Those STAPs are transmitted as RTP
+   packets.  The establishment of the parameter sets is performed as
+   discussed above.
+
+   Note that the use of STAPs is essential here, as the high number of
+   individual slices (18 for a CIF picture) would lead to unacceptably
+   high IP/UDP/RTP header overhead (unless the source coding tool FMO is
+   used, which is not assumed in this scenario).  Furthermore, some
+   wireless video transmission systems, such as H.324M and the IP-based
+   video telephony specified in 3GPP, are likely to use relatively small
+   transport packet sizes.  For example, a typical MTU size of an H.223
+   AL3 SDU is around 100 bytes [17].  Coding individual slices
+   according to this packetization scheme provides a further advantage
+   in communication between wired and wireless networks, as individual
+   slices are likely to be smaller than the preferred maximum packet
+   size of wireless systems.  Consequently, a gateway can convert the
+   STAPs used in a wired network into several RTP packets with only one
+   NAL unit, which are preferred in a wireless network, and vice versa.
+
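+   The aggregation and the gateway conversion described in this
+   section might look as follows.  This is a minimal sketch: the slice
+   contents are dummy data, the function names are hypothetical, and
+   the 40-byte IPv4/UDP/RTP header size assumes no IP options, header
+   extensions, or CSRC entries.
+
```python
import struct

STAP_A_TYPE = 24                 # NAL unit type of STAP-A
IP_UDP_RTP_HEADER = 20 + 8 + 12  # bytes per packet (assumption, see above)


def build_stap_a(nalus: list[bytes]) -> bytes:
    """Aggregate NAL units into one STAP-A payload."""
    f_bit = max(n[0] & 0x80 for n in nalus)   # set if any unit has F set
    nri = max(n[0] & 0x60 for n in nalus)     # highest NRI of the units
    out = bytearray([f_bit | nri | STAP_A_TYPE])
    for n in nalus:
        out += struct.pack("!H", len(n)) + n  # 16-bit size, then the unit
    return bytes(out)


def split_stap_a(payload: bytes) -> list[bytes]:
    """Gateway direction: recover the single NAL units from a STAP-A."""
    if payload[0] & 0x1F != STAP_A_TYPE:
        raise ValueError("not a STAP-A payload")
    nalus, pos = [], 1
    while pos < len(payload):
        (size,) = struct.unpack_from("!H", payload, pos)
        nalus.append(payload[pos + 2:pos + 2 + size])
        pos += 2 + size
    return nalus


# 18 dummy slices, one per macroblock row of a CIF picture.  0x41 is the
# NAL unit header of a non-IDR coded slice with NRI equal to 2.
slices = [bytes([0x41]) + bytes([row]) * 60 for row in range(18)]
even_rows, odd_rows = slices[0::2], slices[1::2]
even_stap = build_stap_a(even_rows)
odd_stap = build_stap_a(odd_rows)

# Header overhead in bytes: 18 single NAL unit packets versus 2 STAP-As
# (each STAP-A adds 1 header byte plus 2 size bytes per aggregated unit).
single_overhead = len(slices) * IP_UDP_RTP_HEADER
stap_overhead = sum(IP_UDP_RTP_HEADER + 1 + 2 * len(group)
                    for group in (even_rows, odd_rows))
print(single_overhead, stap_overhead)
```
+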
+12.4.  Video Telephony with Data Partitioning
+
+   This scheme has been implemented and has been shown to offer good
+   performance, especially at higher packet loss rates [12].
+
+   Data Partitioning is known to be useful only when some form of
+   unequal error protection is available.  Normally, in single-session
+   RTP environments, even error characteristics are assumed; i.e., the
+   packet loss probability of all packets of the session is the same
+   statistically.  However, there are means to reduce the packet loss
+   probability of individual packets in an RTP session.  A FEC packet
+   according to RFC 2733 [18], for example, specifies which media
+   packets are associated with the FEC packet.
+
+   In all cases, the incurred overhead is substantial but is of the
+   same order of magnitude as the number of bits that would otherwise
+   have been spent on intra information.  However, this mechanism does
+   not add any delay to the system.
+
+   Again, the complete parameter set establishment is performed through
+   control protocol means.
+
+12.5.  Video Telephony or Streaming with FUs and Forward Error
+       Correction
+
+   This scheme has been implemented and has been shown to provide good
+   performance, especially at higher packet loss rates [19].
+
+   The most efficient means to combat packet losses for scenarios where
+   retransmissions are not applicable is forward error correction (FEC).
+   Although application layer, end-to-end use of FEC is often less
+   efficient than an FEC-based protection of individual links
+   (especially when links of different characteristics are in the
+   transmission path), application layer, end-to-end FEC is unavoidable
+   in some scenarios.  RFC 2733 [18] provides means to use generic,
+   application layer, end-to-end FEC in packet-loss environments.  A
+   binary forward error correcting code is generated by applying the XOR
+   operation to the bits at the same bit position in different packets.
+   The binary code can be specified by the parameters (n,k) in which k
+   is the number of information packets used in the connection and n is
+   the total number of packets generated for k information packets;
+   i.e., n-k parity packets are generated for k information packets.
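+
+   The XOR construction can be illustrated with a single parity packet
+   over k information packets, i.e., an (n=k+1, k) code.  The sketch
+   below operates on bare payloads only; it omits the RFC 2733 FEC
+   header, and the function names are hypothetical.  The length of the
+   lost packet is passed in explicitly, whereas RFC 2733 conveys it in
+   a length recovery field.
+
```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, padding the shorter one with zero bytes."""
    size = max(len(a), len(b))
    a, b = a.ljust(size, b"\x00"), b.ljust(size, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: list[bytes]) -> bytes:
    """Build one parity packet over k information packets."""
    return reduce(xor_bytes, packets)


def recover(survivors: list[bytes], parity: bytes, lost_len: int) -> bytes:
    """Rebuild the single lost packet from the parity and the survivors."""
    return reduce(xor_bytes, survivors, parity)[:lost_len]


packets = [b"slice-one", b"slice-two!!", b"s3"]   # k = 3, dummy payloads
parity = make_parity(packets)                     # n = 4
rebuilt = recover([packets[0], packets[2]], parity, len(packets[1]))
print(rebuilt == packets[1])
```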
+
+   When a code is used with parameters (n,k) within the RFC 2733
+   framework, the following properties are well known:
+
+   a) If applied over one RTP packet, RFC 2733 provides only packet
+      repetition.
+
+   b) RFC 2733 is most bit rate efficient if XOR-connected packets have
+      equal length.
+
+   c) At the same packet loss probability p and for a fixed k, the
+      greater the value of n is, the smaller the residual error
+      probability becomes.  For example, for a packet loss probability
+      of 10%, k=1, and n=2, the residual error probability is about 1%,
+      whereas for n=3, the residual error probability is about 0.1%.
+
+   d) At the same packet loss probability p and for a fixed code rate
+      k/n, the greater the value of n is, the smaller the residual error
+      probability becomes.  For example, at a packet loss probability of
+      p=10%, k=1 and n=2, the residual error rate is about 1%, whereas
+      for an extended Golay code with k=12 and n=24, the residual error
+      rate is about 0.01%.
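+
+   The figures given in c) for k=1 can be reproduced with a short
+   calculation, assuming that packet losses are independent: a packet
+   is lost only if all n copies are lost.  The function name is
+   hypothetical, and the Golay figure in d) is not derived here.
+
```python
def residual_loss_repetition(p: float, n: int) -> float:
    """Residual loss of an (n, k=1) code under independent losses."""
    return p ** n


for n in (2, 3):
    # Prints "2 1.00%" and then "3 0.10%".
    print(n, f"{residual_loss_repetition(0.10, n):.2%}")
```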
+
+   For applying RFC 2733 in combination with H.264 baseline coded video
+   without using FUs, several options might be considered:
+
+   1) The video encoder produces NAL units for which each video frame is
+      coded in a single slice.  Applying FEC, one could use a simple
+      code; e.g., (n=2, k=1).  That is, each NAL unit would basically
+      just be repeated.  The disadvantage is obviously the bad code
+      performance according to d), above, and the low flexibility, as
+      only (n, k=1) codes can be used.
+
+   2) The video encoder produces NAL units for which each video frame is
+      encoded in one or more consecutive slices.  Applying FEC, one
+      could use a better code, e.g., (n=24, k=12), over a sequence of
+      NAL units.  Depending on the number of RTP packets per frame, a
+      loss may introduce a significant delay, which is reduced when more
+      RTP packets are used per frame.  Packets of completely different
+      length might also be connected, which decreases bit rate
+      efficiency according to b), above.  However, with some care and
+      for slices of 1kb or larger, similar length (100-200 bytes
+      difference) may be produced, which will not lower the bit
+      efficiency catastrophically.
+
+   3) The video encoder produces NAL units, for which a certain frame
+      contains k slices of possibly almost equal length.  Then, applying
+      FEC, a better code, e.g., (n=24, k=12), can be used over the
+      sequence of NAL units for each frame.  The delay compared to that
+      of 2), above, may be reduced, but several disadvantages are
+      obvious.  First, the coding efficiency of the encoded video is
+      lowered significantly, as slice-structured coding reduces intra-
+      frame prediction and additional slice overhead is necessary.
+      Second, pre-encoded content or, when operating over a gateway, the
+      video is usually not appropriately coded with k slices such that
+      FEC can be applied.  Finally, the encoding of video producing k
+      slices of equal length is not straightforward and might require
+      more than one encoding pass.
+
+   Many of the mentioned disadvantages can be avoided by applying FUs in
+   combination with FEC.  Each NAL unit can be split into any number of
+   FUs of basically equal length; therefore, FEC with a reasonable k and
+   n can be applied, even if the encoder made no effort to produce
+   slices of equal length.  For example, a coded slice NAL unit
+   containing an entire frame can be split into k FUs, and a parity
+   check code (n=k+1, k) can be applied.  However, this has the
+   disadvantage that unless all created fragments can be recovered, the
+   whole slice will be lost.  Thus, a larger section is lost than would
+   be if the frame had been split into several slices.
+
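+   A minimal sketch of splitting one NAL unit into k FU-As of nearly
+   equal length follows.  The function name and the dummy NAL unit are
+   assumptions; the FU indicator and FU header layout follow the FU-A
+   definition of this payload format, and FEC would then be applied
+   over the resulting fragments.
+
```python
FU_A_TYPE = 28   # NAL unit type of FU-A


def fragment_nal_unit(nalu: bytes, k: int) -> list[bytes]:
    """Split one NAL unit into k FU-A payloads of nearly equal length."""
    header, payload = nalu[0], nalu[1:]
    if not 2 <= k <= len(payload):
        raise ValueError("need 2 <= k <= payload length")
    fu_indicator = (header & 0xE0) | FU_A_TYPE   # keep F and NRI bits
    base, extra = divmod(len(payload), k)
    fus, pos = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)    # lengths differ by <= 1
        start_bit = 0x80 if i == 0 else 0
        end_bit = 0x40 if i == k - 1 else 0
        fu_header = start_bit | end_bit | (header & 0x1F)
        fus.append(bytes([fu_indicator, fu_header])
                   + payload[pos:pos + size])
        pos += size
    return fus


# Dummy IDR slice: header 0x65 followed by 1024 payload bytes.
nalu = bytes([0x65]) + bytes(range(256)) * 4
fus = fragment_nal_unit(nalu, 12)
print(sorted({len(f) for f in fus}))   # [87, 88]
```
+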
+   The presented technique makes it possible to achieve good
+   transmission error tolerance, even if no additional source coding
+   layer redundancy (such as periodic intra frames) is present.
+   Consequently, the same coded video sequence can be used to achieve
+   the maximum compression efficiency and quality over error-free
+   transmission and for transmission over error-prone networks.
+   Furthermore, the technique allows the application of FEC to pre-
+   encoded sequences without adding delay.  In this case, pre-encoded
+   sequences that are not encoded for error-prone networks can still be
+   transmitted almost reliably without adding extensive delays.  In
+   addition, FUs of equal length result in a bit rate efficient use of
+   RFC 2733.
+
+   If the error probability depends on the length of the transmitted
+   packet (e.g., in case of mobile transmission [14]), the benefits of
+   applying FUs with FEC are even more obvious.  Basically, the
+   flexibility of the size of FUs allows appropriate FEC to be applied
+   for each NAL unit and unequal error protection of NAL units.
+
+   When FUs and FEC are used, the incurred overhead is substantial but
+   is in the same order of magnitude as the number of bits that have to
+   be spent for intra-coded macroblocks if no FEC is applied.  In [19],
+   it was shown that the FEC-based approach improved the overall
+   quality at the same error rate and the same overall bit rate,
+   including the overhead.
+
+12.6.  Low Bit-Rate Streaming
+
+   This scheme has been implemented with H.263 and non-standard RTP
+   packetization and has given good results [20].  There is no technical
+   reason why similarly good results could not be achieved with H.264.
+
+   In today's Internet streaming, some of the offered bit rates are
+   relatively low in order to allow terminals with dial-up modems to
+   access the content.  In wired IP networks, relatively large packets,
+   say 500 - 1500 bytes, are preferred to smaller and more frequently
+   occurring packets in order to reduce network congestion.  Moreover,
+   use of large packets decreases the amount of RTP/UDP/IP header
+   overhead.  For low bit-rate video, the use of large packets means
+   that sometimes up to a few pictures should be encapsulated in one
+   packet.
+
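+   As a worked example with hypothetical numbers, the average number
+   of coded pictures that fit into one packet can be estimated from
+   the bit rate and the picture rate.  The function name and the
+   stream parameters are assumptions chosen for illustration.
+
```python
def pictures_per_packet(bit_rate_bps: float, frame_rate: float,
                        packet_payload_bytes: int) -> int:
    """Average number of whole coded pictures that fit in one packet."""
    avg_picture_bytes = bit_rate_bps / 8.0 / frame_rate
    return int(packet_payload_bytes // avg_picture_bytes)


# Hypothetical stream: 20 kbit/s video at 10 pictures per second, so an
# average coded picture is 250 bytes.
print(pictures_per_packet(20_000, 10, 1000))   # 4
```
+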
+   However, loss of a packet including many coded pictures would have
+   drastic consequences for visual quality, as there is practically no
+   other way to conceal a loss of an entire picture than to repeat the
+   previous one.  One way to construct relatively large packets and
+   maintain possibilities for successful loss concealment is to
+   construct MTAPs that contain interleaved slices from several
+   pictures.  An MTAP should not contain spatially adjacent slices from
+   the same picture or spatially overlapping slices from any picture.
+   If a packet is lost, it is likely that a lost slice is surrounded by
+   spatially adjacent slices of the same picture and spatially
+   corresponding slices of the temporally previous and succeeding
+   pictures.  Consequently, concealment of the lost slice is likely to
+   be relatively successful.
+
+12.7.  Robust Packet Scheduling in Video Streaming
+
+   Robust packet scheduling has been implemented with MPEG-4 Part 2 and
+   simulated in a wireless streaming environment [21].  There is no
+   technical reason why similar or better results could not be
+   achieved with H.264.
+
+   Streaming clients typically have a receiver buffer that is capable of
+   storing a relatively large amount of data.  Initially, when a
+   streaming session is established, a client does not start playing the
+   stream back immediately.  Rather, it typically buffers the incoming
+   data for a few seconds.  This buffering helps maintain continuous
+   playback, as, in case of occasional increased transmission delays or
+   network throughput drops, the client can decode and play buffered
+   data.  Otherwise, without initial buffering, the client has to freeze
+   the display, stop decoding, and wait for incoming data.  The
+   buffering is also necessary for either automatic or selective
+   retransmission at any protocol level.  If any part of a picture is
+   lost, a retransmission mechanism may be used to resend the lost data.
+   If the retransmitted data is received before its scheduled decoding
+   or playback time, the loss is recovered perfectly.  Coded pictures
+   can be ranked according to their importance in the subjective quality
+   of the decoded sequence.  For example, non-reference pictures, such
+   as conventional B pictures, are subjectively least important, as
+   their absence does not affect decoding of any other pictures.  In
+   addition to non-reference pictures, the ITU-T H.264 | ISO/IEC
+   14496-10 standard includes a temporal scalability method called
+   sub-sequences [22].  Subjective ranking can also be made on a coded
+   slice data partition or slice group basis.  Coded slices and coded
+   slice data partitions that are subjectively the most important can
+   be sent earlier than their decoding order indicates, whereas coded
+   slices and coded slice data partitions that are subjectively the
+   least important can be sent later than their natural coding order
+   indicates.  Consequently, any retransmitted parts of the most
+   important slices and coded slice data partitions are more likely to
+   be received before their scheduled decoding or playback time than
+   those of the least important slices and slice data partitions.
+
+13.  Informative Appendix: Rationale for Decoding Order Number
+
+13.1.  Introduction
+
+   The Decoding Order Number (DON) concept was introduced mainly to
+   enable efficient multi-picture slice interleaving (see section 12.6)
+   and robust packet scheduling (see section 12.7).  In both of these
+   applications, NAL units are transmitted out of decoding order.  DON
+   indicates the decoding order of NAL units and should be used in the
+   receiver to recover the decoding order.  Example use cases for
+   efficient multi-picture slice interleaving and for robust packet
+   scheduling are given in sections 13.2 and 13.3, respectively.
+   Section 13.4 describes the benefits of the DON concept in error
+   resiliency achieved by redundant coded pictures.  Section 13.5
+   summarizes considered alternatives to DON and justifies why DON was
+   chosen for this RTP payload specification.
+
+13.2.  Example of Multi-Picture Slice Interleaving
+
+   An example of multi-picture slice interleaving follows.  A subset of
+   a coded video sequence is depicted below in output order.  R denotes
+   a reference picture, N denotes a non-reference picture, and the
+   number indicates a relative output time.
+
+      ... R1 N2 R3 N4 R5 ...
+
+   The decoding order of these pictures from left to right is as
+   follows:
+
+      ... R1 R3 N2 R5 N4 ...
+
+   The NAL units of pictures R1, R3, N2, R5, and N4 are marked with a
+   DON equal to 1, 2, 3, 4, and 5, respectively.
+
+   Each reference picture consists of three slice groups that are
+   scattered as follows (a number denotes the slice group number for
+   each macroblock in a QCIF frame):
+
+      0 1 2 0 1 2 0 1 2 0 1
+      2 0 1 2 0 1 2 0 1 2 0
+      1 2 0 1 2 0 1 2 0 1 2
+      0 1 2 0 1 2 0 1 2 0 1
+      2 0 1 2 0 1 2 0 1 2 0
+      1 2 0 1 2 0 1 2 0 1 2
+      0 1 2 0 1 2 0 1 2 0 1
+      2 0 1 2 0 1 2 0 1 2 0
+      1 2 0 1 2 0 1 2 0 1 2
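+
+   The map shown above happens to be reproduced by the expression
+   (x + 2y) mod 3, where x and y are the macroblock column and row.
+   This is an observation about the figure and not a formula taken
+   from H.264; the names in the sketch are hypothetical.
+
```python
QCIF_MB_WIDTH, QCIF_MB_HEIGHT = 11, 9   # 176x144 luma samples / 16
NUM_SLICE_GROUPS = 3


def slice_group(x: int, y: int) -> int:
    """Slice group of the macroblock in column x, row y of the figure."""
    return (x + 2 * y) % NUM_SLICE_GROUPS


rows = [" ".join(str(slice_group(x, y)) for x in range(QCIF_MB_WIDTH))
        for y in range(QCIF_MB_HEIGHT)]
print("\n".join(rows))
```

+   With this pattern, no macroblock shares a slice group with its
+   left, right, upper, or lower neighbor, which is what makes the
+   concealment described below possible.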
+
+
+   For the sake of simplicity, we assume that all the macroblocks of a
+   slice group are included in one slice.  Three MTAPs are constructed
+   from three consecutive reference pictures so that each MTAP contains
+   three aggregation units, each of which contains all the macroblocks
+   from one slice group.  The first MTAP contains slice group 0 of
+   picture R1, slice group 1 of picture R3, and slice group 2 of
+   picture R5.  The second MTAP contains slice group 1 of picture R1,
+   slice group 2 of picture R3, and slice group 0 of picture R5.  The
+   third MTAP contains slice group 2 of picture R1, slice group 0 of
+   picture R3, and slice group 1 of picture R5.  Each non-reference
+   picture is encapsulated into an STAP-B.
+
+   Consequently, the transmission order of NAL units is the following:
+
+      R1, slice group 0, DON 1, carried in MTAP,   RTP SN: N
+      R3, slice group 1, DON 2, carried in MTAP,   RTP SN: N
+      R5, slice group 2, DON 4, carried in MTAP,   RTP SN: N
+      R1, slice group 1, DON 1, carried in MTAP,   RTP SN: N+1
+      R3, slice group 2, DON 2, carried in MTAP,   RTP SN: N+1
+      R5, slice group 0, DON 4, carried in MTAP,   RTP SN: N+1
+      R1, slice group 2, DON 1, carried in MTAP,   RTP SN: N+2
+      R3, slice group 0, DON 2, carried in MTAP,   RTP SN: N+2
+      R5, slice group 1, DON 4, carried in MTAP,   RTP SN: N+2
+      N2,                DON 3, carried in STAP-B, RTP SN: N+3
+      N4,                DON 5, carried in STAP-B, RTP SN: N+4
+
+   The receiver is able to organize the NAL units back in decoding order
+   based on the value of DON associated with each NAL unit.
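+
+   The construction of the three MTAPs and the DON-based re-ordering
+   at the receiver can be sketched as follows.  The data structure and
+   function names are hypothetical, RTP sequence numbers are given as
+   offsets from N, and DON wrap-around is ignored for brevity.
+
```python
from typing import NamedTuple, Optional


class Nalu(NamedTuple):
    picture: str
    slice_group: Optional[int]   # None for a whole non-reference picture
    don: int
    rtp_sn: int                  # offset from N


DON = {"R1": 1, "R3": 2, "N2": 3, "R5": 4, "N4": 5}
REFERENCE_PICTURES = ["R1", "R3", "R5"]


def transmission_order() -> list[Nalu]:
    """NAL units in the order in which the example transmits them."""
    sent = []
    for mtap in range(3):                       # RTP SN N, N+1, N+2
        for i, pic in enumerate(REFERENCE_PICTURES):
            # Each MTAP takes a different slice group from each picture.
            sent.append(Nalu(pic, (mtap + i) % 3, DON[pic], mtap))
    sent.append(Nalu("N2", None, DON["N2"], 3))  # carried in an STAP-B
    sent.append(Nalu("N4", None, DON["N4"], 4))  # carried in an STAP-B
    return sent


def decoding_order(received: list[Nalu]) -> list[Nalu]:
    """A stable sort on DON restores the decoding order."""
    return sorted(received, key=lambda nalu: nalu.don)


sent = transmission_order()
for nalu in decoding_order(sent):
    print(nalu.picture, nalu.slice_group, nalu.don)
```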
+
+   If one of the MTAPs is lost, the spatially adjacent and temporally
+   co-located macroblocks are received and can be used to conceal the
+   loss efficiently.  If one of the STAPs is lost, the effect of the
+   loss does not propagate temporally.
+
+13.3.  Example of Robust Packet Scheduling
+
+   An example of robust packet scheduling follows.  The communication
+   system used in the example consists of the following components in
+   the order that the video is processed from source to sink:
+
+      o camera and capturing
+      o pre-encoding buffer
+      o encoder
+      o encoded picture buffer
+      o transmitter
+      o transmission channel
+      o receiver
+      o receiver buffer
+      o decoder
+      o decoded picture buffer
+      o display
+
+   The video communication system used in the example operates as
+   follows.  Note that processing of the video stream happens gradually
+   and at the same time in all components of the system.  The source
+   video sequence is shot and captured to a pre-encoding buffer.  The
+   pre-encoding buffer can be used to order pictures from sampling order
+   to encoding order or to analyze multiple uncompressed frames for bit
+   rate control purposes, for example.  In some cases, the pre-encoding
+   buffer may not exist; instead, the sampled pictures are encoded right
+   away.  The encoder encodes pictures from the pre-encoding buffer and
+   stores the output; i.e., coded pictures, to the encoded picture
+   buffer.  The transmitter encapsulates the coded pictures from the
+   encoded picture buffer to transmission packets and sends them to a
+   receiver through a transmission channel.  The receiver stores the
+   received packets to the receiver buffer.  The receiver buffering
+   process typically includes buffering for transmission delay jitter.
+   The receiver buffer can also be used to recover correct decoding
+   order of coded data.  The decoder reads coded data from the receiver
+   buffer and produces decoded pictures as output into the decoded
+   picture buffer.  The decoded picture buffer is used to recover the
+   output (or display) order of pictures.  Finally, pictures are
+   displayed.
+
+   In the following example figures, I denotes an IDR picture, R denotes
+   a reference picture, N denotes a non-reference picture, and the
+   number after I, R, or N indicates the sampling time relative to the
+   previous IDR picture in decoding order.  Values below the sequence of
+   pictures indicate scaled system clock timestamps.  The system clock
+   is initialized arbitrarily in this example, and time runs from left
+   to right.  Each I, R, and N picture is mapped into the same timeline
+   compared to the previous processing step, if any, assuming that
+   encoding, transmission, and decoding take no time.  Thus, events
+   happening at the same time are located in the same column throughout
+   all example figures.
+
+   A subset of a sequence of coded pictures is depicted below in
+   sampling order.
+
+       ...  N58 N59 I00 N01 N02 R03 N04 N05 R06 ... N58 N59 I00 N01 ...
+       ... --|---|---|---|---|---|---|---|---|- ... -|---|---|---|- ...
+       ...  58  59  60  61  62  63  64  65  66  ... 128 129 130 131 ...
+
+      Figure 16.  Sequence of pictures in sampling order
+
+   The sampled pictures are buffered in the pre-encoding buffer to
+   arrange them in encoding order.  In this example, we assume that the
+   non-reference pictures are predicted from both the previous and the
+   next reference picture in output order, except for the non-reference
+   pictures immediately preceding an IDR picture, which are predicted
+   only from the previous reference picture in output order.  Thus, the
+   pre-encoding buffer has to contain at least two pictures, and the
+   buffering causes a delay of two picture intervals.  The output of the
+   pre-encoding buffering process and the encoding (and decoding) order
+   of the pictures are as follows:
+
+                ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
+                ... -|---|---|---|---|---|---|---|---|- ...
+                ... 60  61  62  63  64  65  66  67  68  ...
+
+      Figure 17.  Re-ordered pictures in the pre-encoding buffer
+
+   The encoder or the transmitter can set the value of DON for each
+   picture to the value of DON of the previous picture in decoding
+   order plus one.
+
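+   The re-ordering of Figure 17 and the DON assignment just described
+   can be sketched as follows.  The function names are hypothetical,
+   the re-ordering rule encodes only the prediction structure assumed
+   in this example, and the modulo reflects the 16-bit DON fields of
+   this payload format.
+
```python
def encoding_order(sampled: list[str]) -> list[str]:
    """Reorder pictures named I.., R.., N.. from sampling order to
    encoding (and decoding) order."""
    out, held = [], []
    for pic in sampled:
        if pic.startswith("N"):
            held.append(pic)       # wait for the next reference picture
        elif pic.startswith("I"):
            out += held + [pic]    # N pictures before an IDR go first
            held = []
        else:
            out += [pic] + held    # reference picture, then the held Ns
            held = []
    return out + held


def assign_don(pictures: list[str],
               first_don: int = 0) -> list[tuple[str, int]]:
    """DON of each picture is the DON of the previous one plus one."""
    return [(pic, (first_don + i) % 65536)
            for i, pic in enumerate(pictures)]


sampled = ["N58", "N59", "I00", "N01", "N02", "R03", "N04", "N05", "R06"]
ordered = encoding_order(sampled)
print(" ".join(ordered))
print(assign_don(ordered, first_don=65534)[:3])
```
+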
+   For the sake of simplicity, let us assume that:
+
+   o  the frame rate of the sequence is constant,
+   o  each picture consists of only one slice,
+   o  each slice is encapsulated in a single NAL unit packet,
+   o  there is no transmission delay, and
+   o  pictures are transmitted at constant intervals (that is, 1 / frame
+      rate).
+
+   When pictures are transmitted in decoding order, they are received as
+   follows:
+
+                ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
+                ... -|---|---|---|---|---|---|---|---|- ...
+                ... 60  61  62  63  64  65  66  67  68  ...
+
+      Figure 18.  Received pictures in decoding order
+
+   The OPTIONAL sprop-interleaving-depth MIME type parameter is set to
+   0, as the transmission (or reception) order is identical to the
+   decoding order.
+
+   The decoder has to buffer for one picture interval initially in its
+   decoded picture buffer to organize pictures from decoding order to
+   output order as depicted below:
+
+                    ... N58 N59 I00 N01 N02 R03 N04 N05 R06 ...
+                    ... -|---|---|---|---|---|---|---|---|- ...
+                    ... 61  62  63  64  65  66  67  68  69  ...
+
+      Figure 19.  Output order
+
+   The amount of required initial buffering in the decoded picture
+   buffer can be signaled in the buffering period SEI message or with
+   the num_reorder_frames syntax element of H.264 video usability
+   information.  num_reorder_frames indicates the maximum number of
+   frames, complementary field pairs, or non-paired fields that precede
+   any frame, complementary field pair, or non-paired field in the
+   sequence in decoding order and that follow it in output order.  For
+   the sake of simplicity, we assume that num_reorder_frames is used to
+   indicate the initial buffer in the decoded picture buffer.  In this
+   example, num_reorder_frames is equal to 1.
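+
+   As a minimal sketch of this definition, the following Python
+   fragment counts, for each picture of Figures 17 and 19, the pictures
+   that precede it in decoding order and follow it in output order.
+   The function name max_reorder is hypothetical, and every picture is
+   assumed to be a frame with a unique label within the window.
+
```python
def max_reorder(decoding_order, output_order):
    """Largest number of pictures that precede some picture in
    decoding order but follow it in output order."""
    out_pos = {pic: i for i, pic in enumerate(output_order)}
    worst = 0
    for i, pic in enumerate(decoding_order):
        # Pictures decoded earlier that are nevertheless output later.
        count = sum(1 for p in decoding_order[:i]
                    if out_pos[p] > out_pos[pic])
        worst = max(worst, count)
    return worst


# Figure 17 (decoding order) and Figure 19 (output order).
decoding = ["N58", "N59", "I00", "R03", "N01", "N02", "R06", "N04", "N05"]
output = ["N58", "N59", "I00", "N01", "N02", "R03", "N04", "N05", "R06"]

num_reorder_frames = max_reorder(decoding, output)
print(num_reorder_frames)
```
+
+   For the orders in this example the function yields 1, in agreement
+   with the value of num_reorder_frames stated above.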
+
+   It can be observed that if the IDR picture I00 is lost during
+   transmission and a retransmission request is issued when the value of
+   the system clock is 62, there is one picture interval of time (until
+   the system clock reaches timestamp 63) to receive the retransmitted
+   IDR picture I00.
+
+   Let us then assume that IDR pictures are transmitted two frame
+   intervals earlier than their decoding position; i.e., the pictures
+   are transmitted as follows:
+
+                       ...  I00 N58 N59 R03 N01 N02 R06 N04 N05 ...
+                       ... --|---|---|---|---|---|---|---|---|- ...
+                       ...  62  63  64  65  66  67  68  69  70  ...
+
+      Figure 20.  Interleaving: Early IDR pictures in sending order
+
+   The OPTIONAL sprop-interleaving-depth MIME type parameter is set
+   equal to 1 according to its definition.  (The value of sprop-
+   interleaving-depth in this example can be derived as follows:
+   Picture I00 is the only picture preceding picture N58 or N59 in
+   transmission order and following it in decoding order.  Except for
+   pictures I00, N58, and N59, the transmission order is the same as the
+   decoding order of pictures.  As a coded picture is encapsulated into
+   exactly one NAL unit, the value of sprop-interleaving-depth is equal
+   to the maximum number of pictures preceding any picture in
+   transmission order and following the picture in decoding order.)
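+
+   The derivation in the parenthetical note can be sketched in Python
+   as follows.  This is an illustration only: the names don and
+   interleaving_depth are hypothetical, DON values start at 0 for the
+   first picture shown, DON wrap-around is ignored, and each picture is
+   assumed to occupy exactly one NAL unit, as in this example.
+
```python
# DON is assigned in decoding order: previous picture's DON plus one.
decoding_order = ["N58", "N59", "I00", "R03", "N01", "N02", "R06", "N04", "N05"]
don = {pic: n for n, pic in enumerate(decoding_order)}

# Sending order of Figure 20 (the IDR picture is sent early).
transmission_order = ["I00", "N58", "N59", "R03", "N01", "N02", "R06", "N04", "N05"]


def interleaving_depth(sent, don):
    """Maximum number of pictures that precede a picture in
    transmission order and follow it in decoding order."""
    depth = 0
    for i, pic in enumerate(sent):
        ahead = sum(1 for earlier in sent[:i] if don[earlier] > don[pic])
        depth = max(depth, ahead)
    return depth


depth = interleaving_depth(transmission_order, don)
print(depth)
```
+
+   Only I00 is sent ahead of pictures that precede it in decoding
+   order (N58 and N59), so the computed depth is 1.  Applying the same
+   function to the decoding order itself gives 0, the value used when
+   no interleaving takes place.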
+
+   The receiver buffering process contains two pictures at a time
+   according to the value of the sprop-interleaving-depth parameter and
+   orders pictures from the reception order to the correct decoding
+   order based on the value of DON associated with each picture.  The
+   output of the receiver buffering process is as follows:
+
+                            ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
+                            ... -|---|---|---|---|---|---|---|---|- ...
+                            ... 63  64  65  66  67  68  69  70  71  ...
+
+      Figure 21.  Interleaving: Receiver buffer
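+
+   The receiver buffering step can be illustrated with the following
+   simplified Python sketch.  It assumes one NAL unit per picture,
+   ignores DON wrap-around, and releases the picture with the smallest
+   DON as soon as the buffer holds more pictures than the value of
+   sprop-interleaving-depth.  The function name deinterleave and the
+   DON values are assumptions made for this illustration.
+
```python
import heapq


def deinterleave(received, depth):
    """Yield (don, name) pairs in DON order.

    received: iterable of (don, name) in reception order.
    depth:    value of sprop-interleaving-depth.
    """
    heap = []
    for item in received:
        heapq.heappush(heap, item)
        # Once more than `depth` pictures are buffered, the one with
        # the smallest DON is passed on to the decoder.
        if len(heap) > depth:
            yield heapq.heappop(heap)
    # Flush whatever is left at the end of the stream.
    while heap:
        yield heapq.heappop(heap)


# Reception order of Figure 20; DON follows the decoding order.
received = [(2, "I00"), (0, "N58"), (1, "N59"), (3, "R03"), (4, "N01"),
            (5, "N02"), (6, "R06"), (7, "N04"), (8, "N05")]

restored = [name for _, name in deinterleave(received, depth=1)]
print(restored)
```
+
+   With a depth of 1 the buffer holds two pictures whenever a picture
+   is released, and the restored list has the order shown in Figure 21.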
+
+   Again, an initial buffering delay of one picture interval is needed
+   to organize pictures from decoding order to output order, as depicted
+   below:
+
+                                ... N58 N59 I00 N01 N02 R03 N04 N05 ...
+                                ... -|---|---|---|---|---|---|---|- ...
+                                ... 64  65  66  67  68  69  70  71  ...
+
+      Figure 22.  Interleaving: Receiver buffer after reordering
+
+   Note that the maximum delay that IDR pictures can undergo during
+   transmission, including possible application, transport, or link
+   layer retransmission, is equal to three picture intervals.  Thus, the
+   loss resiliency of IDR pictures is improved in systems supporting
+   retransmission compared to the case in which pictures were
+   transmitted in their decoding order.
+
+13.4.  Robust Transmission Scheduling of Redundant Coded Slices
+
+   A redundant coded picture is a coded representation of a picture or a
+   part of a picture that is not used in the decoding process if the
+   corresponding primary coded picture is correctly decoded.  There
+   should be no noticeable difference between any area of the decoded
+   primary picture and a corresponding area that would result from
+   application of the H.264 decoding process for any redundant picture
+   in the same access unit.  A redundant coded slice is a coded slice
+   that is a part of a redundant coded picture.
+
+   Redundant coded pictures can be used to provide unequal error
+   protection in error-prone video transmission.  If a primary coded
+   representation of a picture is decoded incorrectly, a corresponding
+   redundant coded picture can be decoded.  Examples of applications and
+   coding techniques using the redundant coded picture feature include
+   the video redundancy coding [23] and the protection of "key pictures"
+   in multicast streaming [24].
+
+   One property of many error-prone video communications systems is that
+   transmission errors are often bursty.  Therefore, they may affect
+   several consecutive transmission packets in transmission order.
+   In low bit-rate video communication, it is relatively common that an
+   entire coded picture can be encapsulated into one transmission
+   packet.  Consequently, a primary coded picture and the corresponding
+   redundant coded pictures may be transmitted in consecutive packets in
+   transmission order.  To make the transmission scheme more tolerant of
+   bursty transmission errors, it is beneficial to transmit the primary
+   coded picture and redundant coded picture separated by more than a
+   single packet.  The DON concept enables this.
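+
+   One possible sending schedule is sketched below in Python.  It is
+   not taken from this specification: the function name schedule, the
+   parameter gap, and the assumption that each primary or redundant
+   coded picture fits in one packet are all hypothetical.  The
+   redundant copy of picture i is sent right after the primary coded
+   picture i + gap, and the receiver would rely on DON to restore the
+   decoding order.
+
```python
def schedule(n_pictures, gap):
    """Return the transmission order as a list of (kind, index) pairs,
    where kind is "P" (primary) or "R" (redundant)."""
    order = []
    for i in range(n_pictures + gap):
        if i < n_pictures:
            order.append(("P", i))
        if i - gap >= 0:
            # Redundant copy trails its primary picture by `gap` pictures.
            order.append(("R", i - gap))
    return order


order = schedule(n_pictures=4, gap=2)
print(order)
```
+
+   With gap equal to 2, at least two other packets are sent between a
+   primary coded picture and its redundant copy, so a burst that
+   destroys two consecutive packets cannot remove both.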
+
+13.5.  Remarks on Other Design Possibilities
+
+   The slice header syntax structure of the H.264 coding standard
+   contains the frame_num syntax element that can indicate the decoding
+   order of coded frames.  However, using the frame_num syntax element
+   to recover the decoding order is neither feasible nor desirable, for
+   the following reasons:
+
+   o  The receiver is required to parse at least one slice header per
+      coded picture (before passing the coded data to the decoder).
+
+   o  Coded slices from multiple coded video sequences cannot be
+      interleaved, as the frame_num syntax element is reset to 0 in
+      each IDR picture.
+
+   o  The coded fields of a complementary field pair share the same
+      value of the frame_num syntax element.  Thus, the decoding order
+      of the coded fields of a complementary field pair cannot be
+      recovered based on the frame_num syntax element or any other
+      syntax element of the H.264 coding syntax.
+
+   The RTP payload format for transport of MPEG-4 elementary streams
+   [25] enables interleaving of access units and transmission of
+   multiple access units in the same RTP packet.  An access unit is
+   specified in the H.264 coding standard to comprise all NAL units
+   associated with a primary coded picture according to subclause
+   7.4.1.2 of [1].  Consequently, slices of different pictures cannot be
+   interleaved, and the multi-picture slice interleaving technique (see
+   section 12.6) for improved error resilience cannot be used.
+
+14.  Acknowledgements
+
+   The authors thank Roni Even, Dave Lindbergh, Philippe Gentric,
+   Gonzalo Camarillo, Gary Sullivan, Joerg Ott, and Colin Perkins for
+   careful review.
+
+15.  References
+
+15.1.  Normative References
+
+   [1]  ITU-T Recommendation H.264, "Advanced video coding for generic
+        audiovisual services", May 2003.
+
+   [2]  ISO/IEC International Standard 14496-10:2003.
+
+   [3]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
+        Levels", BCP 14, RFC 2119, March 1997.
+
+   [4]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
+        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
+        RFC 3550, July 2003.
+
+   [5]  Handley, M. and V. Jacobson, "SDP: Session Description
+        Protocol", RFC 2327, April 1998.
+
+   [6]  Josefsson, S., "The Base16, Base32, and Base64 Data Encodings",
+        RFC 3548, July 2003.
+
+   [7]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
+        Session Description Protocol (SDP)", RFC 3264, June 2002.
+
+15.2.  Informative References
+
+   [8]  "Draft ITU-T Recommendation and Final Draft International
+        Standard of Joint Video Specification (ITU-T Rec. H.264 |
+        ISO/IEC 14496-10 AVC)", available from http://ftp3.itu.int/av-
+        arch/jvt-site/2003_03_Pattaya/JVT-G050r1.zip, May 2003.
+
+   [9]  Luthra, A., Sullivan, G.J., and T. Wiegand (eds.), Special Issue
+        on H.264/AVC. IEEE Transactions on Circuits and Systems for
+        Video Technology, July 2003.
+
+   [10] Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C.,
+        Newell, D., Ott, J., Sullivan, G., Wenger, S., and C. Zhu, "RTP
+        Payload Format for the 1998 Version of ITU-T Rec. H.263 Video
+        (H.263+)", RFC 2429, October 1998.
+
+   [11] ISO/IEC IS 14496-2.
+
+   [12] Wenger, S., "H.26L over IP", IEEE Transactions on Circuits and
+        Systems for Video Technology, Vol. 13, No. 7, July 2003.
+
+   [13] Wenger, S., "H.26L over IP: The IP Network Adaptation Layer",
+        Proceedings Packet Video Workshop 02, April 2002.
+
+   [14] Stockhammer, T., Hannuksela, M.M., and S. Wenger, "H.26L/JVT
+        Coding Network Abstraction Layer and IP-based Transport" in
+        Proc. ICIP 2002, Rochester, NY, September 2002.
+
+   [15] ITU-T Recommendation H.241, "Extended video procedures and
+        control signals for H.300 series terminals", 2004.
+
+   [16] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
+        Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
+
+   [17] ITU-T Recommendation H.223, "Multiplexing protocol for low bit
+        rate multimedia communication", July 2001.
+
+   [18] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
+        Generic Forward Error Correction", RFC 2733, December 1999.
+
+   [19] Stockhammer, T., Wiegand, T., Oelbaum, T., and F. Obermeier,
+        "Video Coding and Transport Layer Techniques for H.264/AVC-Based
+        Transmission over Packet-Lossy Networks", IEEE International
+        Conference on Image Processing (ICIP 2003), Barcelona, Spain,
+        September 2003.
+
+   [20] Varsa, V. and M. Karczewicz, "Slice interleaving in compressed
+        video packetization", Packet Video Workshop 2000.
+
+   [21] Kang, S.H. and A. Zakhor, "Packet scheduling algorithm for
+        wireless video streaming", International Packet Video Workshop
+        2002.
+
+   [22] Hannuksela, M.M., "Enhanced concept of GOP", JVT-B042,
+        available from
+        http://ftp3.itu.int/av-arch/video-site/0201_Gen/JVT-B042.doc,
+        January 2002.
+
+   [23] Wenger, S., "Video Redundancy Coding in H.263+", 1997
+        International Workshop on Audio-Visual Services over Packet
+        Networks, September 1997.
+
+   [24] Wang, Y.-K., Hannuksela, M.M., and M. Gabbouj, "Error Resilient
+        Video Coding Using Unequally Protected Key Pictures", in Proc.
+        International Workshop VLBV03, September 2003.
+
+   [25] van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and
+        P. Gentric, "RTP Payload Format for Transport of MPEG-4
+        Elementary Streams", RFC 3640, November 2003.
+
+   [26] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+        Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
+        3711, March 2004.
+
+   [27] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
+        Protocol (RTSP)", RFC 2326, April 1998.
+
+   [28] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
+        Protocol", RFC 2974, October 2000.
+
+   [29] ISO/IEC 14496-15: "Information technology - Coding of audio-
+        visual objects - Part 15: Advanced Video Coding (AVC) file
+        format".
+
+   [30] Castagno, R. and D. Singer, "MIME Type Registrations for 3rd
+        Generation Partnership Project (3GPP) Multimedia files", RFC
+        3839, July 2004.
+
+Authors' Addresses
+
+   Stephan Wenger
+   TU Berlin / Teles AG
+   Franklinstr. 28-29
+   D-10587 Berlin
+   Germany
+
+   Phone: +49-172-300-0813
+   EMail: stewe@stewe.org
+
+
+   Miska M. Hannuksela
+   Nokia Corporation
+   P.O. Box 100
+   33721 Tampere
+   Finland
+
+   Phone: +358-7180-73151
+   EMail: miska.hannuksela@nokia.com
+
+
+   Thomas Stockhammer
+   Nomor Research
+   D-83346 Bergen
+   Germany
+
+   Phone: +49-8662-419407
+   EMail: stockhammer@nomor.de
+
+
+   Magnus Westerlund
+   Multimedia Technologies
+   Ericsson Research EAB/TVA/A
+   Ericsson AB
+   Torshamsgatan 23
+   SE-164 80 Stockholm
+   Sweden
+
+   Phone: +46-8-7190000
+   EMail: magnus.westerlund@ericsson.com
+
+
+   David Singer
+   QuickTime Engineering
+   Apple
+   1 Infinite Loop MS 302-3MT
+   Cupertino
+   CA 95014
+   USA
+
+   Phone: +1 408 974-3162
+   EMail: singer@apple.com
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2005).
+
+   This document is subject to the rights, licenses and restrictions
+   contained in BCP 78, and except as set forth therein, the authors
+   retain all their rights.
+
+   This document and the information contained herein are provided on an
+   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+   The IETF takes no position regarding the validity or scope of any
+   Intellectual Property Rights or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; nor does it represent that it has
+   made any independent effort to identify any such rights.  Information
+   on the IETF's procedures with respect to rights in IETF Documents can
+   be found in BCP 78 and BCP 79.
+
+   Copies of IPR disclosures made to the IETF Secretariat and any
+   assurances of licenses to be made available, or the result of an
+   attempt made to obtain a general license or permission for the use of
+   such proprietary rights by implementers or users of this
+   specification can be obtained from the IETF on-line IPR repository at
+   http://www.ietf.org/ipr.
+
+   The IETF invites any interested party to bring to its attention any
+   copyrights, patents or patent applications, or other proprietary
+   rights that may cover technology that may be required to implement
+   this standard.  Please address the information to the IETF at ietf-
+   ipr@ietf.org.
+
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+