Diffstat (limited to 'doc/rfc/rfc6184.txt')
-rw-r--r-- doc/rfc/rfc6184.txt 5659
1 files changed, 5659 insertions, 0 deletions
diff --git a/doc/rfc/rfc6184.txt b/doc/rfc/rfc6184.txt
new file mode 100644
index 0000000..ef748fe
--- /dev/null
+++ b/doc/rfc/rfc6184.txt
@@ -0,0 +1,5659 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF)                        Y.-K. Wang
+Request for Comments: 6184                                       R. Even
+Obsoletes: 3984                                      Huawei Technologies
+Category: Standards Track                                  T. Kristensen
+ISSN: 2070-1721                                                 Tandberg
+                                                                R. Jesup
+                                                WorldGate Communications
+                                                                May 2011
+
+
+ RTP Payload Format for H.264 Video
+
+Abstract
+
+ This memo describes an RTP payload format for the ITU-T
+ Recommendation H.264 video codec and the technically identical
+ ISO/IEC International Standard 14496-10 video codec, excluding the
+ Scalable Video Coding (SVC) extension and the Multiview Video Coding
+ extension, for which the RTP payload formats are defined elsewhere.
+ The RTP payload format allows for packetization of one or more
+ Network Abstraction Layer Units (NALUs), produced by an H.264 video
+ encoder, in each RTP payload. The payload format has wide
+ applicability, as it supports applications from simple low bitrate
+ conversational usage, to Internet video streaming with interleaved
+ transmission, to high bitrate video-on-demand.
+
+ This memo obsoletes RFC 3984. Changes from RFC 3984 are summarized
+ in Section 14. Issues on backward compatibility to RFC 3984 are
+ discussed in Section 15.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc6184.
+
+
+
+
+
+
+
+
+Wang, et al. Standards Track [Page 1]
+
+RFC 6184 RTP Payload Format for H.264 Video May 2011
+
+
+Copyright Notice
+
+ Copyright (c) 2011 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction ....................................................4
+ 1.1. The H.264 Codec ............................................4
+ 1.2. Parameter Set Concept ......................................5
+ 1.3. Network Abstraction Layer Unit Types .......................6
+ 2. Conventions .....................................................7
+ 3. Scope ...........................................................7
+ 4. Definitions and Abbreviations ...................................7
+ 4.1. Definitions ................................................7
+ 4.2. Abbreviations ..............................................9
+ 5. RTP Payload Format .............................................10
+ 5.1. RTP Header Usage ..........................................10
+ 5.2. Payload Structures ........................................12
+ 5.3. NAL Unit Header Usage .....................................13
+ 5.4. Packetization Modes .......................................16
+ 5.5. Decoding Order Number (DON) ...............................17
+ 5.6. Single NAL Unit Packet ....................................19
+ 5.7. Aggregation Packets .......................................20
+ 5.7.1. Single-Time Aggregation Packet (STAP) ..............22
+ 5.7.2. Multi-Time Aggregation Packets (MTAPs) .............25
+ 5.8. Fragmentation Units (FUs) .................................29
+ 6. Packetization Rules ............................................33
+ 6.1. Common Packetization Rules ................................33
+ 6.2. Single NAL Unit Mode ......................................34
+ 6.3. Non-Interleaved Mode ......................................34
+ 6.4. Interleaved Mode ..........................................34
+ 7. De-Packetization Process .......................................35
+ 7.1. Single NAL Unit and Non-Interleaved Mode ..................35
+ 7.2. Interleaved Mode ..........................................35
+ 7.2.1. Size of the De-Interleaving Buffer .................36
+ 7.2.2. De-Interleaving Process ............................36
+ 7.3. Additional De-Packetization Guidelines ....................38
+
+ 8. Payload Format Parameters ......................................39
+ 8.1. Media Type Registration ...................................39
+ 8.2. SDP Parameters ............................................57
+ 8.2.1. Mapping of Payload Type Parameters to SDP ..........57
+ 8.2.2. Usage with the SDP Offer/Answer Model ..............58
+ 8.2.3. Usage in Declarative Session Descriptions ..........66
+ 8.3. Examples ..................................................68
+ 8.4. Parameter Set Considerations ..............................75
+ 8.5. Decoder Refresh Point Procedure Using In-Band
+ Transport of Parameter Sets (Informative)..................78
+ 8.5.1. IDR Procedure to Respond to a Request for
+ a Decoder Refresh Point ............................78
+ 8.5.2. Gradual Recovery Procedure to Respond to
+ a Request for a Decoder Refresh Point ..............79
+ 9. Security Considerations ........................................79
+ 10. Congestion Control ............................................80
+ 11. IANA Considerations ...........................................81
+ 12. Informative Appendix: Application Examples ....................81
+ 12.1. Video Telephony According to Annex A of ITU-T
+ Recommendation H.241 .....................................81
+ 12.2. Video Telephony, No Slice Data Partitioning, No
+ NAL Unit Aggregation .....................................82
+ 12.3. Video Telephony, Interleaved Packetization Using
+ NAL Unit Aggregation .....................................82
+ 12.4. Video Telephony with Data Partitioning ...................83
+ 12.5. Video Telephony or Streaming with FUs and Forward
+ Error Correction .........................................83
+ 12.6. Low Bitrate Streaming ....................................86
+ 12.7. Robust Packet Scheduling in Video Streaming ..............86
+ 13. Informative Appendix: Rationale for Decoding Order Number .....87
+ 13.1. Introduction .............................................87
+ 13.2. Example of Multi-Picture Slice Interleaving ..............88
+ 13.3. Example of Robust Packet Scheduling ......................89
+ 13.4. Robust Transmission Scheduling of Redundant Coded
+ Slices ...................................................93
+ 13.5. Remarks on Other Design Possibilities ....................94
+ 14. Changes from RFC 3984 .........................................94
+ 15. Backward Compatibility to RFC 3984 ............................96
+ 16. Acknowledgements ..............................................98
+ 17. References ....................................................98
+ 17.1. Normative References .....................................98
+ 17.2. Informative References ...................................99
+
+1. Introduction
+
+ This memo specifies an RTP payload format for the video coding
+ standard known as ITU-T Recommendation H.264 [1] and ISO/IEC
+ International Standard 14496-10 [2] (both also known as Advanced
+ Video Coding (AVC)). In this memo, the name H.264 is used for the
+ codec and the standard, but this memo is equally applicable to the
+ ISO/IEC counterpart of the coding standard.
+
+ This memo obsoletes RFC 3984. Changes from RFC 3984 are summarized
+ in Section 14. Issues on backward compatibility to RFC 3984 are
+ discussed in Section 15.
+
+1.1. The H.264 Codec
+
+ The H.264 video codec has a very broad application range that covers
+ all forms of digital compressed video, from low bitrate Internet
+ streaming applications to HDTV broadcast and Digital Cinema
+ applications with nearly lossless coding. Compared to the current
+ state of technology, the overall performance of H.264 is such that
+ bitrate savings of 50% or more are reported. Digital Satellite TV
+ quality, for example, was reported to be achievable at 1.5 Mbit/s,
+ compared to the current operation point of MPEG-2 video at around 3.5
+ Mbit/s [10].
+
+ The codec specification [1] itself conceptually distinguishes between
+ a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL).
+ The VCL contains the signal processing functionality of the codec;
+ mechanisms such as transform, quantization, and motion-compensated
+ prediction; and a loop filter. It follows the general concept of
+ most of today's video codecs, a macroblock-based coder that uses
+ inter picture prediction with motion compensation and transform
+ coding of the residual signal. The VCL encoder outputs slices: a bit
+ string that contains the macroblock data of an integer number of
+ macroblocks and the information of the slice header (containing the
+ spatial address of the first macroblock in the slice, the initial
+ quantization parameter, and similar information). Macroblocks in
+ slices are arranged in scan order unless a different macroblock
+ allocation is specified using the syntax of slice groups. In-picture
+ prediction is used only within a slice. More information is provided
+ in [10].
+
+ The NAL encoder encapsulates the slice output of the VCL encoder into
+ Network Abstraction Layer Units (NALUs), which are suitable for
+ transmission over packet networks or for use in packet-oriented
+ multiplex environments. Annex B of H.264 defines an encapsulation
+ process to transmit such NALUs over bytestream-oriented networks. In
+ the scope of this memo, Annex B is not relevant.
+
+ Internally, the NAL uses NAL units. A NAL unit consists of a one-
+ byte header and the payload byte string. The header indicates the
+ type of the NAL unit, the (potential) presence of bit errors or
+ syntax violations in the NAL unit payload, and information regarding
+ the relative importance of the NAL unit for the decoding process.
+ This RTP payload specification is designed to be unaware of the bit
+ string in the NAL unit payload.
+
+ One of the main properties of H.264 is the complete decoupling of the
+ transmission time, the decoding time, and the sampling or
+ presentation time of slices and pictures. The decoding process
+ specified in H.264 is unaware of time, and the H.264 syntax does not
+ carry information such as the number of skipped frames (as is common
+ in the form of the Temporal Reference in earlier video compression
+ standards). Also, there are NAL units that affect many pictures and
+ that are, therefore, inherently timeless. For this reason, the
+ handling of the RTP timestamp requires some special considerations
+ for NAL units for which the sampling or presentation time is not
+ defined or, at transmission time, is unknown.
+
+1.2. Parameter Set Concept
+
+ One very fundamental design concept of H.264 is to generate self-
+ contained packets, to make mechanisms such as the header duplication
+ of RFC 4629 [11] or MPEG-4 Visual's Header Extension Code (HEC) [12]
+ unnecessary. This was achieved by decoupling information relevant to
+ more than one slice from the media stream. This higher-layer meta
+ information should be sent reliably, asynchronously, and in advance
+ from the RTP packet stream that contains the slice packets.
+ (Provisions for sending this information in-band are also available
+ for applications that do not have an out-of-band transport channel
+ appropriate for the purpose). The combination of the higher-level
+ parameters is called a parameter set. The H.264 specification
+ includes two types of parameter sets: sequence parameter sets and
+ picture parameter sets. An active sequence parameter set remains
+ unchanged throughout a coded video sequence, and an active picture
+ parameter set remains unchanged within a coded picture. The sequence
+ and picture parameter set structures contain information such as
+ picture size, optional coding modes employed, and macroblock to slice
+ group map.
+
+ To be able to change picture parameters (such as the picture size)
+ without having to transmit parameter set updates synchronously to the
+ slice packet stream, the encoder and decoder can maintain a list of
+ more than one sequence and picture parameter set. Each slice header
+ contains a codeword that indicates the sequence and picture parameter
+ set to be used.
+
+ This mechanism allows the decoupling of the transmission of parameter
+ sets from the packet stream and the transmission of them by external
+ means (e.g., as a side effect of the capability exchange) or through
+ a (reliable or unreliable) control protocol. It may even be possible
+ that they are never transmitted but are fixed by an application
+ design specification.
+
+1.3. Network Abstraction Layer Unit Types
+
+ Tutorial information on the NAL design can be found in [13], [14],
+ and [15].
+
+ All NAL units begin with a single NAL unit type octet, which also
+ co-serves as the payload header of this RTP payload format. The
+ payload of the NAL unit follows this octet.
+
+ The syntax and semantics of the NAL unit type octet are specified in
+ [1], but the essential properties of the NAL unit type octet are
+ summarized below. The NAL unit type octet has the following format:
+
+ +---------------+
+ |0|1|2|3|4|5|6|7|
+ +-+-+-+-+-+-+-+-+
+ |F|NRI|  Type   |
+ +---------------+
+
+ The semantics of the components of the NAL unit type octet, as
+ specified in the H.264 specification, are described briefly below.
+
+ F: 1 bit
+ forbidden_zero_bit. The H.264 specification declares a
+ value of 1 as a syntax violation.
+
+ NRI: 2 bits
+ nal_ref_idc. A value of 00 indicates that the content of
+ the NAL unit is not used to reconstruct reference pictures
+ for inter picture prediction. Such NAL units can be
+ discarded without risking the integrity of the reference
+ pictures. Values greater than 00 indicate that the decoding
+ of the NAL unit is required to maintain the integrity of the
+ reference pictures.
+
+ Type: 5 bits
+ nal_unit_type. This component specifies the NAL unit
+ payload type as defined in Table 7-1 of [1] and later within
+ this memo. For a reference of all currently defined NAL
+ unit types and their semantics, please refer to Section
+ 7.4.1 in [1].
+
+ This memo introduces new NAL unit types, which are presented in
+ Section 5.2. The NAL unit types defined in this memo are marked as
+ unspecified in [1]. Moreover, this specification extends the
+ semantics of F and NRI as described in Section 5.3.
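+
+ As an illustration of the octet layout above, the following is a
+ minimal Python sketch that splits a NAL unit type octet into its
+ F, NRI, and Type components and reassembles it. The names
+ NalHeader, parse_nal_header, and build_nal_header are hypothetical
+ and are not defined by this memo or by [1].
+

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    """Fields of the one-octet NAL unit header (hypothetical helper type)."""
    f: int     # forbidden_zero_bit, 1 bit
    nri: int   # nal_ref_idc, 2 bits
    type: int  # nal_unit_type, 5 bits


def parse_nal_header(octet: int) -> NalHeader:
    """Split the NAL unit type octet into F, NRI, and Type."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("NAL unit header must be a single octet")
    return NalHeader(
        f=(octet >> 7) & 0x01,    # bit 0 in the figure (most significant)
        nri=(octet >> 5) & 0x03,  # bits 1-2
        type=octet & 0x1F,        # bits 3-7
    )


def build_nal_header(f: int, nri: int, nal_type: int) -> int:
    """Pack F, NRI, and Type back into a single octet."""
    return ((f & 0x01) << 7) | ((nri & 0x03) << 5) | (nal_type & 0x1F)
```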
+
+2. Conventions
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [4].
+
+ This specification uses the notion of setting and clearing a bit when
+ bit fields are handled. Setting a bit is the same as assigning that
+ bit the value of 1 (On). Clearing a bit is the same as assigning
+ that bit the value of 0 (Off).
+
+3. Scope
+
+ This payload specification can only be used to carry the "naked"
+ H.264 NAL unit stream over RTP and not the bitstream format discussed
+ in Annex B of H.264. Likely, the first applications of this
+ specification will be in the conversational multimedia field, video
+ telephony or video conferencing, but the payload format also covers
+ other applications, such as Internet streaming and TV over IP.
+
+4. Definitions and Abbreviations
+
+4.1. Definitions
+
+ This document uses the definitions of [1]. The following terms,
+ defined in [1], are summed up for convenience:
+
+ access unit: A set of NAL units always containing a primary coded
+ picture. In addition to the primary coded picture, an access unit
+ may also contain one or more redundant coded pictures or other NAL
+ units not containing slices or slice data partitions of a coded
+ picture. The decoding of an access unit always results in a
+ decoded picture.
+
+ coded video sequence: A sequence of access units that consists, in
+ decoding order, of an instantaneous decoding refresh (IDR) access
+ unit followed by zero or more non-IDR access units including all
+ subsequent access units up to but not including any subsequent IDR
+ access unit.
+
+ IDR access unit: An access unit in which the primary coded picture
+ is an IDR picture.
+
+ IDR picture: A coded picture containing only slices with I or SI
+ slice types that causes a "reset" in the decoding process. After
+ the decoding of an IDR picture, all following coded pictures in
+ decoding order can be decoded without inter prediction from any
+ picture decoded prior to the IDR picture.
+
+ primary coded picture: The coded representation of a picture to be
+ used by the decoding process for a bitstream conforming to H.264.
+ The primary coded picture contains all macroblocks of the picture.
+
+ redundant coded picture: A coded representation of a picture or a
+ part of a picture. The content of a redundant coded picture shall
+ not be used by the decoding process for a bitstream conforming to
+ H.264. The content of a redundant coded picture may be used by
+ the decoding process for a bitstream that contains errors or
+ losses.
+
+ VCL NAL unit: A collective term used to refer to coded slice and
+ coded data partition NAL units.
+
+ In addition, the following definitions apply:
+
+ decoding order number (DON): A field in the payload structure or a
+ derived variable indicating NAL unit decoding order. Values of
+ DON are in the range of 0 to 65535, inclusive. After reaching the
+ maximum value, the value of DON wraps around to 0.
+
+ NAL unit decoding order: A NAL unit order that conforms to the
+ constraints on NAL unit order given in Section 7.4.1.2 in [1].
+
+ NALU-time: The value that the RTP timestamp would have if the NAL
+ unit were transported in its own RTP packet.
+
+ transmission order: The order of packets in ascending RTP sequence
+ number order (in modulo arithmetic). Within an aggregation
+ packet, the NAL unit transmission order is the same as the order
+ of appearance of NAL units in the packet.
+
+ media-aware network element (MANE): A network element, such as a
+ middlebox or application layer gateway, that is capable of parsing
+ certain aspects of the RTP payload headers or the RTP payload and
+ reacting to the contents.
+
+ Informative note: The concept of a MANE goes beyond normal
+ routers or gateways in that a MANE has to be aware of the
+ signaling (e.g., to learn about the payload type mappings of
+ the media streams) and that it has to be trusted when working
+ with Secure Real-time Transport Protocol (SRTP). The advantage
+ of using MANEs is that they allow packets to be dropped
+ according to the needs of the media coding. For example, if a
+ MANE has to drop packets due to congestion on a certain link,
+ it can identify and remove those packets whose elimination
+ produces the least adverse effect on the user experience.
+
+ static macroblock: A certain number of macroblocks in the video
+ stream can be defined as static, as defined in Section 8.3.2.8 in
+ [3]. Static macroblocks free up additional processing cycles for
+ the handling of non-static macroblocks. Based on a given amount
+ of video processing resources and a given resolution, a higher
+ number of static macroblocks enables a correspondingly higher
+ frame rate.
+
+ default sub-profile: The subset of coding tools, which may be all
+ coding tools of one profile or the common subset of coding tools
+ of more than one profile, indicated by the profile-level-id
+ parameter.
+
+ default level: The level indicated by the profile-level-id
+ parameter, which consists of three octets, profile_idc, profile-
+ iop, and level_idc. The default level is indicated by level_idc
+ in most cases, and, in some cases, additionally by profile-iop.
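+
+ The three-octet structure of profile-level-id can be sketched as
+ follows. This is only an illustration: it assumes the parameter
+ value is available as a string of six hexadecimal digits, and the
+ function name split_profile_level_id is hypothetical. Deriving
+ the default sub-profile and default level from the returned
+ octets is not shown.
+

```python
from typing import Tuple


def split_profile_level_id(value: str) -> Tuple[int, int, int]:
    """Split a profile-level-id value into its three octets.

    Assumption: `value` is a string of six hexadecimal digits.
    Returns (profile_idc, profile_iop, level_idc) as integers.
    """
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be exactly three octets")
    profile_idc, profile_iop, level_idc = raw
    return profile_idc, profile_iop, level_idc
```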
+
+4.2. Abbreviations
+
+ DON: Decoding Order Number
+ DONB: Decoding Order Number Base
+ DOND: Decoding Order Number Difference
+ FEC: Forward Error Correction
+ FU: Fragmentation Unit
+ IDR: Instantaneous Decoding Refresh
+ IEC: International Electrotechnical Commission
+ ISO: International Organization for Standardization
+ ITU-T: International Telecommunication Union,
+ Telecommunication Standardization Sector
+ MANE: Media-Aware Network Element
+ MTAP: Multi-Time Aggregation Packet
+ MTAP16: MTAP with 16-bit timestamp offset
+ MTAP24: MTAP with 24-bit timestamp offset
+ NAL: Network Abstraction Layer
+ NALU: NAL Unit
+ SAR: Sample Aspect Ratio
+ SEI: Supplemental Enhancement Information
+ STAP: Single-Time Aggregation Packet
+ STAP-A: STAP type A
+ STAP-B: STAP type B
+ TS: Timestamp
+ VCL: Video Coding Layer
+ VUI: Video Usability Information
+
+5. RTP Payload Format
+
+5.1. RTP Header Usage
+
+ The format of the RTP header is specified in RFC 3550 [5] and
+ reprinted in Figure 1 for convenience. This payload format uses the
+ fields of the header in a manner consistent with that specification.
+
+ When one NAL unit is encapsulated per RTP packet, the RECOMMENDED RTP
+ payload format is specified in Section 5.6. The RTP payload (and the
+ settings for some RTP header bits) for aggregation packets and
+ fragmentation units are specified in Sections 5.7 and 5.8,
+ respectively.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |V=2|P|X|  CC   |M|     PT      |       sequence number         |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                           timestamp                           |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |           synchronization source (SSRC) identifier            |
+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+ |            contributing source (CSRC) identifiers             |
+ |                             ....                              |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 1. RTP header according to RFC 3550
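+
+ The header layout in Figure 1 can be read with a few lines of
+ code. The following Python sketch is one possible way to do so,
+ not a reference implementation: RtpHeader and parse_rtp_header
+ are hypothetical names, and the sketch deliberately ignores the
+ RTP header extension and padding handling described in RFC 3550.
+

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    """Fields of the RTP header in Figure 1 (hypothetical helper type)."""
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> Tuple[RtpHeader, int]:
    """Return the parsed header and the offset just past the CSRC list.

    Simplification: a header extension (X bit set) is not skipped.
    """
    if len(packet) < 12:
        raise ValueError("RTP fixed header needs at least 12 bytes")
    # Network byte order: two octets, 16-bit SN, 32-bit TS, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                      # CSRC count
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet too short for CSRC list")
    csrc = struct.unpack("!%dI" % cc, packet[12:end])
    header = RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
    return header, end
```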
+
+ The RTP header information to be set according to this RTP payload
+ format is set as follows:
+
+ Marker bit (M): 1 bit
+ Set for the very last packet of the access unit indicated by the
+ RTP timestamp, in line with the normal use of the M bit in video
+
+ formats, to allow efficient playout buffer handling. For
+ aggregation packets (STAP and MTAP), the marker bit in the RTP
+ header MUST be set to the value that the marker bit of the last
+ NAL unit of the aggregation packet would have been if it were
+ transported in its own RTP packet. Decoders MAY use this bit as
+ an early indication of the last packet of an access unit but MUST
+ NOT rely on this property.
+
+ Informative note: Only one M bit is associated with an
+ aggregation packet carrying multiple NAL units. Thus, if a
+ gateway has re-packetized an aggregation packet into several
+ packets, it cannot reliably set the M bit of those packets.
+
+ Payload type (PT): 7 bits
+ The assignment of an RTP payload type for this new packet format
+ is outside the scope of this document and will not be specified
+ here. The assignment of a payload type has to be performed either
+ through the profile used or in a dynamic way.
+
+ Sequence number (SN): 16 bits
+ Set and used in accordance with RFC 3550. For the single NALU and
+ non-interleaved packetization mode, the sequence number is used to
+ determine decoding order for the NALU.
+
+ Timestamp: 32 bits
+ The RTP timestamp is set to the sampling timestamp of the content.
+ A 90 kHz clock rate MUST be used.
+
+ If the NAL unit has no timing properties of its own (e.g.,
+ parameter set and SEI NAL units), the RTP timestamp is set to the
+ RTP timestamp of the primary coded picture of the access unit in
+ which the NAL unit is included, according to Section 7.4.1.2 of
+ [1].
+
+ The setting of the RTP timestamp for MTAPs is defined in Section
+ 5.7.2.
+
+ Receivers SHOULD ignore any picture timing SEI messages included
+ in access units that have only one display timestamp. Instead,
+ receivers SHOULD use the RTP timestamp for synchronizing the
+ display process.
+
+ If one access unit has more than one display timestamp carried in
+ a picture timing SEI message, then the information in the SEI
+ message SHOULD be treated as relative to the RTP timestamp, with
+ the earliest event occurring at the time given by the RTP
+ timestamp and subsequent events later, as given by the difference
+ in picture time values carried in the picture timing SEI message.
+
+ Let tSEI1, tSEI2, ..., tSEIn be the display timestamps carried in
+ the SEI message of an access unit, where tSEI1 is the earliest of
+ all such timestamps. Let tmadjst() be a function that converts a
+ value from the SEI message's time scale to a 90-kHz time scale.
+ Let TS be the RTP timestamp. Then, the display time for the event
+ associated with tSEI1 is TS. The display time for the event with
+ tSEIx, where x is in the range 2 to n, is
+ TS + tmadjst(tSEIx - tSEI1).
+
+ Informative note: Displaying coded frames as fields is needed
+ commonly in an operation known as 3:2 pulldown, in which film
+ content that consists of coded frames is displayed on a display
+ using interlaced scanning. The picture timing SEI message
+ enables carriage of multiple timestamps for the same coded
+ picture, and therefore the 3:2 pulldown process is perfectly
+ controlled. The picture timing SEI message mechanism is
+ necessary because only one timestamp per coded frame can be
+ conveyed in the RTP timestamp.
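+
+ The display-time rule above, TS + tmadjst(tSEIx - tSEI1), can be
+ sketched in Python as follows. Only the name tmadjst comes from
+ the text; the sei_time_scale parameter (ticks per second of the
+ SEI time scale), the rounding to the nearest tick, and the
+ wraparound at 32 bits are assumptions made for this illustration.
+

```python
from typing import List, Sequence

RTP_CLOCK_HZ = 90_000  # clock rate required by this payload format


def tmadjst(ticks: int, sei_time_scale: int) -> int:
    """Rescale a tick difference from the SEI time scale to 90 kHz.

    Assumption: rounds to the nearest 90-kHz tick.
    """
    return (ticks * RTP_CLOCK_HZ + sei_time_scale // 2) // sei_time_scale


def display_times(ts: int, t_sei: Sequence[int],
                  sei_time_scale: int) -> List[int]:
    """Map the SEI display timestamps of one access unit to RTP time.

    The earliest SEI timestamp is displayed at TS; every other event
    is offset by its rescaled distance from the earliest one.
    """
    t_first = min(t_sei)
    return [(ts + tmadjst(t - t_first, sei_time_scale)) % 2**32
            for t in t_sei]
```

+ For example, with a 30000-Hz SEI time scale, an offset of 500
+ ticks corresponds to 1500 ticks of the 90-kHz RTP clock.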
+
+5.2. Payload Structures
+
+ The payload format defines three different basic payload structures.
+ A receiver can identify the payload structure by the first byte of
+ the RTP packet payload, which co-serves as the RTP payload header
+ and, in some cases, as the first byte of the payload. This byte is
+ always structured as a NAL unit header. The NAL unit type field
+ indicates which structure is present. The possible structures are as
+ follows.
+
+ Single NAL Unit Packet: Contains only a single NAL unit in the
+ payload. The NAL header type field is equal to the original NAL unit
+ type, i.e., in the range of 1 to 23, inclusive. Specified in Section
+ 5.6.
+
+ Aggregation Packet: Packet type used to aggregate multiple NAL units
+ into a single RTP payload. This packet exists in four versions: the
+ Single-Time Aggregation Packet type A (STAP-A), the Single-Time
+ Aggregation Packet type B (STAP-B), Multi-Time Aggregation Packet
+ (MTAP) with 16-bit offset (MTAP16), and Multi-Time Aggregation Packet
+ (MTAP) with 24-bit offset (MTAP24). The NAL unit type numbers
+ assigned for STAP-A, STAP-B, MTAP16, and MTAP24 are 24, 25, 26, and
+ 27, respectively. Specified in Section 5.7.
+
+ Fragmentation Unit: Used to fragment a single NAL unit over multiple
+ RTP packets. Exists in two versions, FU-A and FU-B, identified
+ with the NAL unit type numbers 28 and 29, respectively. Specified in
+ Section 5.8.
+
+ Informative note: This specification does not limit the size of
+ NAL units encapsulated in single NAL unit packets and
+ fragmentation units. The maximum size of a NAL unit encapsulated
+ in any aggregation packet is 65535 bytes.
+
+ Table 1 summarizes NAL unit types and the corresponding RTP packet
+ types when each of these NAL units is directly used as a packet
+ payload, and where the types are described in this memo.
+
+ Table 1. Summary of NAL unit types and the corresponding packet
+ types
+
+ NAL Unit  Packet    Packet Type Name                Section
+ Type      Type
+ -------------------------------------------------------------
+ 0         reserved                                  -
+ 1-23      NAL unit  Single NAL unit packet          5.6
+ 24        STAP-A    Single-time aggregation packet  5.7.1
+ 25        STAP-B    Single-time aggregation packet  5.7.1
+ 26        MTAP16    Multi-time aggregation packet   5.7.2
+ 27        MTAP24    Multi-time aggregation packet   5.7.2
+ 28        FU-A      Fragmentation unit              5.8
+ 29        FU-B      Fragmentation unit              5.8
+ 30-31     reserved                                  -
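+
+ A receiver's first dispatch step, based on Table 1, might look
+ like the following Python sketch. The function name
+ payload_structure and the returned labels are hypothetical; the
+ type ranges are taken directly from the table.
+

```python
# Packet types assigned by this memo (Table 1).
_PACKET_TYPES = {
    24: "STAP-A",
    25: "STAP-B",
    26: "MTAP16",
    27: "MTAP24",
    28: "FU-A",
    29: "FU-B",
}


def payload_structure(first_payload_byte: int) -> str:
    """Classify an RTP payload by the Type field of its first byte."""
    nal_type = first_payload_byte & 0x1F  # low five bits
    if 1 <= nal_type <= 23:
        return "single NAL unit packet"
    # Types 0, 30, and 31 are reserved.
    return _PACKET_TYPES.get(nal_type, "reserved")
```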
+
+5.3. NAL Unit Header Usage
+
+ The structure and semantics of the NAL unit header were introduced in
+ Section 1.3. For convenience, the format of the NAL unit header is
+ reprinted below:
+
+ +---------------+
+ |0|1|2|3|4|5|6|7|
+ +-+-+-+-+-+-+-+-+
+ |F|NRI|  Type   |
+ +---------------+
+
+ This section specifies the semantics of F and NRI according to this
+ specification.
+
+ F: 1 bit
+ forbidden_zero_bit. A value of 0 indicates that the NAL unit
+ type octet and payload should not contain bit errors or other
+ syntax violations. A value of 1 indicates that the NAL unit
+ type octet and payload may contain bit errors or other syntax
+ violations.
+
+ MANEs SHOULD set the F bit to indicate detected bit errors in
+ the NAL unit. The H.264 specification requires that the F bit
+ be equal to 0. When the F bit is set, the decoder is advised
+ that bit errors or any other syntax violations may be present
+ in the payload or in the NAL unit type octet. The simplest
+ decoder reaction to a NAL unit in which the F bit is equal to 1
+ is to discard such a NAL unit and to conceal the lost data in
+ the discarded NAL unit.
+
+ NRI: 2 bits
+ nal_ref_idc. The semantics of value 00 and a non-zero value
+ remain unchanged from the H.264 specification. In other words,
+ a value of 00 indicates that the content of the NAL unit is not
+ used to reconstruct reference pictures for inter picture
+ prediction. Such NAL units can be discarded without risking
+ the integrity of the reference pictures. Values greater than
+ 00 indicate that the decoding of the NAL unit is required to
+ maintain the integrity of the reference pictures.
+
+ In addition to the specification above, according to this RTP
+ payload specification, values of NRI indicate the relative
+ transport priority, as determined by the encoder. MANEs can
+ use this information to protect more important NAL units better
+ than they do less important NAL units. The highest transport
+ priority is 11, followed by 10, and then by 01; finally, 00 is
+ the lowest.
+
+ Informative note: Any non-zero value of NRI is handled
+ identically in H.264 decoders. Therefore, receivers need
+ not manipulate the value of NRI when passing NAL units to
+ the decoder.
+
+ An H.264 encoder MUST set the value of NRI according to the
+ H.264 specification (Subclause 7.4.1) when the value of
+ nal_unit_type is in the range of 1 to 12, inclusive. In
+ particular, the H.264 specification requires that the value of
+ NRI SHALL be equal to 0 for all NAL units having nal_unit_type
+ equal to 6, 9, 10, 11, or 12.
+
+ For NAL units having nal_unit_type equal to 7 or 8 (indicating
+ a sequence parameter set or a picture parameter set,
+ respectively), an H.264 encoder SHOULD set the value of NRI to
+ 11 (in binary format). For coded slice NAL units of a primary
+ coded picture having nal_unit_type equal to 5 (indicating a
+ coded slice belonging to an IDR picture), an H.264 encoder
+ SHOULD set the value of NRI to 11 (in binary format).
+
+ For a mapping of the remaining nal_unit_types to NRI values,
+ the following example MAY be used and has been shown to be
+ efficient in a certain environment [14]. Other mappings MAY
+ also be desirable, depending on the application and the H.264
+ profile in use.
+
+ Informative note: Data partitioning is not available in
+ certain profiles, e.g., in the Main or Baseline profiles.
+ Consequently, the NAL unit types 2, 3, and 4 can occur only
+ if the video bitstream conforms to a profile in which data
+ partitioning is allowed and not in streams that conform to
+ the Main or Baseline profiles.
+
+ Table 2. Example of NRI values for coded slices and coded slice
+ data partitions of primary coded reference pictures
+
+ NAL Unit Type     Content of NAL Unit              NRI (binary)
+ ----------------------------------------------------------------
+ 1                 non-IDR coded slice              10
+ 2                 Coded slice data partition A     10
+ 3                 Coded slice data partition B     01
+ 4                 Coded slice data partition C     01
+
+ Informative note: As mentioned before, the NRI value of non-
+ reference pictures is 00 as mandated by H.264.
+
+ An H.264 encoder SHOULD set the value of NRI for coded slice
+ and coded slice data partition NAL units of redundant coded
+ reference pictures equal to 01 (in binary format).
+
+ Definitions of the values for NRI for NAL unit types 24 to 29,
+ inclusive, are given in Sections 5.7 and 5.8 of this memo.
+
+ No recommendation for the value of NRI is given for NAL units
+ having nal_unit_type in the range of 13 to 23, inclusive,
+ because these values are reserved for ITU-T and ISO/IEC. No
+ recommendation for the value of NRI is given for NAL units
+ having nal_unit_type equal to 0 or in the range of 30 to 31,
+ inclusive, as the semantics of these values are not specified
+ in this memo.
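+
+ The field layout and the NRI recommendations above can be sketched
+ as follows. This is a minimal illustration, not part of the payload
+ format: the names NalHeader, parse_nal_header, build_nal_header, and
+ EXAMPLE_NRI are hypothetical, and the mapping covers only primary
+ coded reference pictures (non-reference pictures use NRI 00).

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int         # forbidden_zero_bit, 1 bit
    nri: int       # nal_ref_idc, 2 bits
    nal_type: int  # nal_unit_type, 5 bits


def parse_nal_header(octet: int) -> NalHeader:
    """Split the NAL unit type octet into its F, NRI, and Type fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("the NAL unit type octet must fit in one byte")
    return NalHeader(f=octet >> 7, nri=(octet >> 5) & 0x3, nal_type=octet & 0x1F)


# Illustrative NRI choices for primary coded reference pictures: the
# SHOULD values for types 5, 7, and 8, the Table 2 example for types 1-4,
# and the value 00 required by H.264 for types 6 and 9-12.
EXAMPLE_NRI = {
    1: 0b10, 2: 0b10, 3: 0b01, 4: 0b01,
    5: 0b11, 7: 0b11, 8: 0b11,
    6: 0b00, 9: 0b00, 10: 0b00, 11: 0b00, 12: 0b00,
}


def build_nal_header(nal_type: int, f: int = 0) -> int:
    """Compose a NAL unit type octet using the example NRI mapping."""
    return (f << 7) | (EXAMPLE_NRI[nal_type] << 5) | nal_type
```

+ For instance, a sequence parameter set (type 7) with this mapping
+ yields the octet 0x67, and an IDR slice (type 5) yields 0x65.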
+
+5.4. Packetization Modes
+
+ This memo specifies three packetization modes:
+
+ o Single NAL unit mode
+
+ o Non-interleaved mode
+
+ o Interleaved mode
+
+ The single NAL unit mode is targeted for conversational systems that
+ comply with ITU-T Recommendation H.241 [3] (see Section 12.1). The
+ non-interleaved mode is targeted for conversational systems that may
+ not comply with ITU-T Recommendation H.241. In the non-interleaved
+ mode, NAL units are transmitted in NAL unit decoding order. The
+ interleaved mode is targeted for systems that do not require very low
+ end-to-end latency. The interleaved mode allows transmission of NAL
+ units out of NAL unit decoding order.
+
+ The packetization mode in use MAY be signaled by the value of the
+ OPTIONAL packetization-mode media type parameter. The used
+ packetization mode governs which NAL unit types are allowed in RTP
+ payloads. Table 3 summarizes the allowed packet payload types for
+ each packetization mode. Packetization modes are explained in more
+ detail in Section 6.
+
+ Table 3. Summary of allowed NAL unit types for each packetization
+ mode (yes = allowed, no = disallowed, ig = ignore)
+
+ Payload   Packet     Single NAL   Non-Interleaved   Interleaved
+ Type      Type       Unit Mode    Mode              Mode
+ -------------------------------------------------------------
+ 0         reserved   ig           ig                ig
+ 1-23      NAL unit   yes          yes               no
+ 24        STAP-A     no           yes               no
+ 25        STAP-B     no           no                yes
+ 26        MTAP16     no           no                yes
+ 27        MTAP24     no           no                yes
+ 28        FU-A       no           yes               yes
+ 29        FU-B       no           no                yes
+ 30-31     reserved   ig           ig                ig
+
+ Some NAL unit or payload type values (indicated as reserved in Table
+ 3) are reserved for future extensions. NAL units of those types
+ SHOULD NOT be sent by a sender (directly as packet payloads, as
+ aggregation units in aggregation packets, or as fragmented units in
+ FU packets) and MUST be ignored by a receiver. For example, the
+ payload types 1-23, with the associated packet type "NAL unit", are
+ allowed in "Single NAL Unit Mode" and in "Non-Interleaved Mode" but
+ disallowed in "Interleaved Mode". However, NAL units of NAL unit
+ types 1-23 can be used in "Interleaved Mode" as aggregation units in
+ STAP-B, MTAP16, and MTAP24 packets as well as fragmented units in FU-
+ A and FU-B packets. Similarly, NAL units of NAL unit types 1-23 can
+ also be used in the "Non-Interleaved Mode" as aggregation units in
+ STAP-A packets or fragmented units in FU-A packets, in addition to
+ being directly used as packet payloads.
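+
+ Table 3 can be expressed as a small lookup, which a receiver might
+ use to decide whether to process, reject, or ignore a packet. The
+ sketch below is illustrative only; the mode strings and the function
+ name payload_type_status are assumptions made for this example.

```python
# Payload types allowed directly as RTP payloads in each mode (Table 3).
_ALLOWED = {
    "single-nal": set(range(1, 24)),
    "non-interleaved": set(range(1, 24)) | {24, 28},
    "interleaved": {25, 26, 27, 28, 29},
}
_RESERVED = {0, 30, 31}


def payload_type_status(mode: str, payload_type: int) -> str:
    """Return "yes", "no", or "ig" (ignore) for a payload type in a mode."""
    if mode not in _ALLOWED:
        raise ValueError("unknown packetization mode: " + mode)
    if not 0 <= payload_type <= 31:
        raise ValueError("the type field is 5 bits wide")
    if payload_type in _RESERVED:
        return "ig"
    return "yes" if payload_type in _ALLOWED[mode] else "no"
```

+ Note that the lookup concerns only the type of the packet payload
+ itself; NAL units of types 1-23 carried inside aggregation packets
+ or FUs are governed by the type of the enclosing packet.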
+
+5.5. Decoding Order Number (DON)
+
+ In the interleaved packetization mode, the transmission order of NAL
+ units is allowed to differ from the decoding order of the NAL units.
+ Decoding order number (DON) is a field in the payload structure or a
+ derived variable that indicates the NAL unit decoding order.
+ Rationale and examples of use cases for transmission out of decoding
+ order and for the use of DON are given in Section 13.
+
+ The coupling of transmission and decoding order is controlled by the
+ OPTIONAL sprop-interleaving-depth media type parameter as follows.
+ When the value of the OPTIONAL sprop-interleaving-depth media type
+ parameter is equal to 0 (explicitly or per default), the transmission
+ order of NAL units MUST conform to the NAL unit decoding order. When
+ the value of the OPTIONAL sprop-interleaving-depth media type
+ parameter is greater than 0:
+
+ o the order of NAL units in an MTAP16 and an MTAP24 is not required
+ to be the NAL unit decoding order, and
+
+ o the order of NAL units generated by de-packetizing STAP-Bs, MTAPs,
+ and FUs in two consecutive packets is not required to be the NAL
+ unit decoding order.
+
+ The RTP payload structures for a single NAL unit packet, an STAP-A,
+ and an FU-A do not include DON. STAP-B and FU-B structures include
+ DON, and the structure of MTAPs enables derivation of DON, as
+ specified in Section 5.7.2.
+
+ Informative note: When an FU-A occurs in interleaved mode, it
+ always follows an FU-B, which sets its DON.
+
+ Informative note: If a transmitter wants to encapsulate a single
+ NAL unit per packet and transmit packets out of their decoding
+ order, STAP-B packet type can be used.
+
+ In the single NAL unit packetization mode, the transmission order of
+ NAL units, determined by the RTP sequence number, MUST be the same as
+ their NAL unit decoding order. In the non-interleaved packetization
+ mode, the transmission order of NAL units in single NAL unit packets,
+ STAP-As, and FU-As MUST be the same as their NAL unit decoding order.
+ The NAL units within an STAP MUST appear in the NAL unit decoding
+ order. Thus, the decoding order is first provided through the
+ implicit order within an STAP and then provided through the RTP
+ sequence number for the order between STAPs, FUs, and single NAL unit
+ packets.
+
+ The signaling of the value of DON for NAL units carried in STAP-B,
+ MTAP, and a series of fragmentation units starting with an FU-B is
+ specified in Sections 5.7.1, 5.7.2, and 5.8, respectively. The DON
+ value of the first NAL unit in transmission order MAY be set to any
+ value. Values of DON are in the range of 0 to 65535, inclusive.
+ After reaching the maximum value, the value of DON wraps around to 0.
+
+ The decoding order of two NAL units contained in any STAP-B, MTAP, or
+ a series of fragmentation units starting with an FU-B is determined
+ as follows. Let DON(i) be the decoding order number of the NAL unit
+ having index i in the transmission order. Function don_diff(m,n) is
+ specified as follows:
+
+ If DON(m) == DON(n), don_diff(m,n) = 0
+
+ If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
+ don_diff(m,n) = DON(n) - DON(m)
+
+ If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
+ don_diff(m,n) = 65536 - DON(m) + DON(n)
+
+ If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
+ don_diff(m,n) = - (DON(m) + 65536 - DON(n))
+
+ If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
+ don_diff(m,n) = - (DON(m) - DON(n))
+
+ A positive value of don_diff(m,n) indicates that the NAL unit having
+ transmission order index n follows, in decoding order, the NAL unit
+ having transmission order index m. When don_diff(m,n) is equal to 0,
+ the NAL unit decoding order of the two NAL units can be in either
+ order. A negative value of don_diff(m,n) indicates that the NAL unit
+ having transmission order index n precedes, in decoding order, the
+ NAL unit having transmission order index m.
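+
+ The five cases of don_diff can be transcribed directly. The sketch
+ below takes the two DON values as arguments rather than transmission
+ order indices; the helper follows_in_decoding_order is a
+ hypothetical convenience and not part of this specification.

```python
def don_diff(don_m: int, don_n: int) -> int:
    """Signed distance from DON(m) to DON(n) with 16-bit wraparound."""
    for don in (don_m, don_n):
        if not 0 <= don <= 65535:
            raise ValueError("DON must be in the range 0 to 65535")
    if don_m == don_n:
        return 0
    if don_m < don_n:
        gap = don_n - don_m
        # A small forward gap is positive; a large one means n wrapped
        # around and actually precedes m.
        return gap if gap < 32768 else -(don_m + 65536 - don_n)
    gap = don_m - don_n
    # A large backward gap means m is just before the wrap and n just after.
    return 65536 - don_m + don_n if gap >= 32768 else -gap


def follows_in_decoding_order(don_m: int, don_n: int) -> bool:
    """True when NAL unit n follows NAL unit m in decoding order."""
    return don_diff(don_m, don_n) > 0
```

+ For example, don_diff(65535, 0) evaluates to 1, so a NAL unit with
+ DON 0 follows one with DON 65535 across the wraparound.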
+
+ Values of DON-related fields (DON, DONB, and DOND; see Section 5.7)
+ MUST be such that the decoding order determined by the values of DON,
+ as specified above, conforms to the NAL unit decoding order.
+
+ If the order of two NAL units in NAL unit decoding order is switched
+ and the new order does not conform to the NAL unit decoding order,
+ the NAL units MUST NOT have the same value of DON. If the order of
+ two consecutive NAL units in the NAL unit stream is switched and the
+ new order still conforms to the NAL unit decoding order, the NAL
+ units MAY have the same value of DON. For example, when arbitrary
+ slice order is allowed by the video coding profile in use, all the
+ coded slice NAL units of a coded picture are allowed to have the same
+ value of DON. Consequently, NAL units having the same value of DON
+ can be decoded in any order, and two NAL units having a different
+ value of DON should be passed to the decoder in the order specified
+ above. When two consecutive NAL units in the NAL unit decoding order
+ have a different value of DON, the value of DON for the second NAL
+ unit in decoding order SHOULD be the value of DON for the first,
+ incremented by one.
+
+ An example of the de-packetization process to recover the NAL unit
+ decoding order is given in Section 7.
+
+ Informative note: Receivers should not expect that the absolute
+ difference of values of DON for two consecutive NAL units in the
+ NAL unit decoding order will be equal to one, even in error-free
+ transmission. An increment by one is not required, as at the time
+ of associating values of DON to NAL units, it may not be known
+ whether all NAL units are delivered to the receiver. For example,
+ a gateway may not forward coded slice NAL units of non-reference
+ pictures or SEI NAL units when there is a shortage of bitrate in
+ the network to which the packets are forwarded. In another
+ example, a live broadcast is interrupted by pre-encoded content,
+ such as commercials, from time to time. The first intra picture
+ of a pre-encoded clip is transmitted in advance to ensure that it
+ is readily available in the receiver. When transmitting the first
+ intra picture, the originator does not exactly know how many NAL
+ units will be encoded before the first intra picture of the pre-
+ encoded clip follows in decoding order. Thus, the values of DON
+ for the NAL units of the first intra picture of the pre-encoded
+ clip have to be estimated when they are transmitted, and gaps in
+ values of DON may occur.
+
+5.6. Single NAL Unit Packet
+
+ The single NAL unit packet defined here MUST contain only one NAL
+ unit of the types defined in [1]. This means that neither an
+ aggregation packet nor a fragmentation unit can be used within a
+ single NAL unit packet. A NAL unit stream composed by de-packetizing
+ single NAL unit packets in RTP sequence number order MUST conform to
+ the NAL unit decoding order. The structure of the single NAL unit
+ packet is shown in Figure 2.
+
+ Informative note: The first byte of a NAL unit co-serves as the
+ RTP payload header.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |F|NRI|  Type   |                                               |
+ +-+-+-+-+-+-+-+-+                                               |
+ |                                                               |
+ |                Bytes 2..n of a single NAL unit                |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 2. RTP payload format for single NAL unit packet
+
+5.7. Aggregation Packets
+
+ Aggregation packets are the NAL unit aggregation scheme of this
+ payload specification. The scheme is introduced to reflect the
+ dramatically different MTU sizes of two key target networks: wireline
+ IP networks (with an MTU size that is often limited by the Ethernet
+ MTU size, roughly 1500 bytes) and IP-based or non-IP-based (e.g.,
+ ITU-T H.324/M) wireless communication systems with preferred
+ transmission unit sizes of 254 bytes or less. To prevent media
+ transcoding between the two worlds, and to avoid undesirable
+ packetization overhead, a NAL unit aggregation scheme is introduced.
+
+ Two types of aggregation packets are defined by this specification:
+
+ o Single-time aggregation packet (STAP): aggregates NAL units with
+ identical NALU-times. Two types of STAPs are defined, one without
+ DON (STAP-A) and another including DON (STAP-B).
+
+ o Multi-time aggregation packet (MTAP): aggregates NAL units with
+ potentially differing NALU-times. Two different MTAPs are
+ defined, differing in the length of the NAL unit timestamp offset.
+
+ Each NAL unit to be carried in an aggregation packet is encapsulated
+ in an aggregation unit. The four different aggregation units and
+ their characteristics are described below.
+
+ The structure of the RTP payload format for aggregation packets is
+ presented in Figure 3.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |F|NRI|  Type   |                                               |
+ +-+-+-+-+-+-+-+-+                                               |
+ |                                                               |
+ |                 one or more aggregation units                 |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 3. RTP payload format for aggregation packets
+
+ MTAPs and STAPs share the following packetization rules:
+
+ o The RTP timestamp MUST be set to the earliest of the NALU-times of
+ all the NAL units to be aggregated.
+
+ o The type field of the NAL unit type octet MUST be set to the
+ appropriate value, as indicated in Table 4.
+
+ o The F bit MUST be cleared if all F bits of the aggregated NAL
+ units are zero; otherwise, it MUST be set.
+
+ o The value of NRI MUST be the maximum of the NRI values of all the
+ NAL units carried in the aggregation packet.
+
+ Table 4. Type field for STAPs and MTAPs
+
+ Type   Packet   Timestamp offset   DON-related fields
+                 field length       (DON, DONB, DOND)
+                 (in bits)          present
+ --------------------------------------------------------
+ 24     STAP-A   0                  no
+ 25     STAP-B   0                  yes
+ 26     MTAP16   16                 yes
+ 27     MTAP24   24                 yes
+
+ The marker bit in the RTP header is set to the value that the marker
+ bit of the last NAL unit of the aggregated packet would have if it
+ were transported in its own RTP packet.
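+
+ The shared rules for the F bit, NRI, and type field can be sketched
+ as a single function that derives the first octet of an aggregation
+ packet from the NAL units to be aggregated. The function name
+ aggregation_header is hypothetical, and each NAL unit is assumed to
+ be given as a byte string starting with its NAL unit type octet.

```python
from typing import Sequence


def aggregation_header(packet_type: int, nal_units: Sequence[bytes]) -> int:
    """Compute the NAL unit type octet of an STAP or MTAP (types 24-27)."""
    if packet_type not in (24, 25, 26, 27):
        raise ValueError("not an aggregation packet type")
    if not nal_units or any(len(nalu) == 0 for nalu in nal_units):
        raise ValueError("at least one non-empty NAL unit is required")
    f = 0
    nri = 0
    for nalu in nal_units:
        f |= nalu[0] >> 7                    # set if any aggregated F bit is set
        nri = max(nri, (nalu[0] >> 5) & 0x3)  # maximum NRI of all NAL units
    return (f << 7) | (nri << 5) | packet_type
```

+ Aggregating a parameter set (NRI 11) with an SEI NAL unit (NRI 00)
+ in an STAP-A therefore produces the octet 0x78.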
+
+ The payload of an aggregation packet consists of one or more
+ aggregation units. See Sections 5.7.1 and 5.7.2 for the four
+ different types of aggregation units. An aggregation packet can
+ carry as many aggregation units as necessary; however, the total
+ amount of data in an aggregation packet MUST fit into an IP
+ packet, and the size SHOULD be chosen so that the resulting IP packet
+ is smaller than the MTU size. An aggregation packet MUST NOT contain
+ fragmentation units, as specified in Section 5.8. Aggregation
+ packets MUST NOT be nested; that is, an aggregation packet MUST NOT
+ contain another aggregation packet.
+
+5.7.1. Single-Time Aggregation Packet (STAP)
+
+ A single-time aggregation packet (STAP) SHOULD be used whenever NAL
+ units are aggregated that all share the same NALU-time. The payload
+ of an STAP-A does not include DON and consists of at least one
+ single-time aggregation unit, as presented in Figure 4. The payload
+ of an STAP-B consists of a 16-bit unsigned decoding order number
+ (DON) (in network byte order) followed by at least one single-time
+ aggregation unit, as presented in Figure 5.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                 :                                               |
+ +-+-+-+-+-+-+-+-+                                               |
+ |                                                               |
+ |                 single-time aggregation units                 |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 4. Payload format for STAP-A
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                 :  decoding order number (DON)  |               |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+ |                                                               |
+ |                 single-time aggregation units                 |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 5. Payload format for STAP-B
+
+ The DON field specifies the value of DON for the first NAL unit in an
+ STAP-B in transmission order. For each successive NAL unit in
+ appearance order in an STAP-B, the value of DON is equal to (the
+ value of DON of the previous NAL unit in the STAP-B + 1) % 65536, in
+ which '%' stands for the modulo operation.
+
+ A single-time aggregation unit consists of 16-bit unsigned size
+ information (in network byte order) that indicates the size of the
+ following NAL unit in bytes (excluding these two octets, but
+ including the NAL unit type octet of the NAL unit), followed by the
+ NAL unit itself, including its NAL unit type byte. A single-time
+ aggregation unit is byte aligned within the RTP payload, but it may
+ not be aligned on a 32-bit word boundary. Figure 6 presents the
+ structure of the single-time aggregation unit.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                 :         NAL unit size         |               |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+ |                                                               |
+ |                           NAL unit                            |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 6. Structure for single-time aggregation unit
+
+ Figure 7 presents an example of an RTP packet that contains an STAP-
+ A. The STAP contains two single-time aggregation units, labeled as 1
+ and 2 in the figure.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                          RTP Header                           |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |STAP-A NAL HDR |          NALU 1 Size          |  NALU 1 HDR   |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                          NALU 1 Data                          |
+ :                                                               :
+ +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |               |          NALU 2 Size          |  NALU 2 HDR   |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                          NALU 2 Data                          |
+ :                                                               :
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 7. An example of an RTP packet including an STAP-A
+ containing two single-time aggregation units
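+
+ The STAP-A layout of Figure 7 can be produced and consumed with a
+ few lines of code. The following is a minimal sketch under stated
+ assumptions: pack_stap_a and unpack_stap_a are hypothetical names,
+ the RTP header is handled elsewhere, and any RTP padding has already
+ been removed from the payload passed to unpack_stap_a.

```python
import struct
from typing import List, Sequence

STAP_A = 24


def pack_stap_a(nal_units: Sequence[bytes]) -> bytes:
    """Build an STAP-A payload: header octet, then size-prefixed NAL units."""
    if not nal_units:
        raise ValueError("an STAP-A carries at least one aggregation unit")
    f = 0
    nri = 0
    for nalu in nal_units:
        if not 1 <= len(nalu) <= 0xFFFF:
            raise ValueError("NAL unit size must fit in 16 bits")
        f |= nalu[0] >> 7
        nri = max(nri, (nalu[0] >> 5) & 0x3)
    out = bytearray([(f << 7) | (nri << 5) | STAP_A])
    for nalu in nal_units:
        out += struct.pack("!H", len(nalu))  # 16-bit size, network byte order
        out += nalu                          # includes the NAL unit type octet
    return bytes(out)


def unpack_stap_a(payload: bytes) -> List[bytes]:
    """Recover the NAL units of an STAP-A in their order of appearance."""
    if not payload or payload[0] & 0x1F != STAP_A:
        raise ValueError("not an STAP-A payload")
    units = []
    pos = 1
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if size == 0 or pos + size > len(payload):
            raise ValueError("NAL unit size exceeds the payload")
        units.append(payload[pos:pos + size])
        pos += size
    return units
```

+ Because the NAL units within an STAP appear in decoding order, the
+ list returned by unpack_stap_a can be handed to the decoder as is.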
+
+ Figure 8 presents an example of an RTP packet that contains an STAP-
+ B. The STAP contains two single-time aggregation units, labeled as 1
+ and 2 in the figure.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                          RTP Header                           |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |STAP-B NAL HDR |              DON              |  NALU 1 Size  |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |  NALU 1 Size  |  NALU 1 HDR   |          NALU 1 Data          |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
+ :                                                               :
+ +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |               |          NALU 2 Size          |  NALU 2 HDR   |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                          NALU 2 Data                          |
+ :                                                               :
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 8. An example of an RTP packet including an STAP-B
+ containing two single-time aggregation units
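+
+ Parsing an STAP-B differs from parsing an STAP-A only in the leading
+ DON field and in the DON assigned to each NAL unit. The sketch below
+ illustrates this; unpack_stap_b is a hypothetical name, and the
+ payload is assumed to have had any RTP padding removed.

```python
import struct
from typing import List, Tuple

STAP_B = 25


def unpack_stap_b(payload: bytes) -> List[Tuple[int, bytes]]:
    """Return (DON, NAL unit) pairs in order of appearance in the STAP-B."""
    if len(payload) < 3 or payload[0] & 0x1F != STAP_B:
        raise ValueError("not an STAP-B payload")
    (don,) = struct.unpack_from("!H", payload, 1)  # DON of the first NAL unit
    units = []
    pos = 3
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if size == 0 or pos + size > len(payload):
            raise ValueError("NAL unit size exceeds the payload")
        units.append((don, payload[pos:pos + size]))
        pos += size
        don = (don + 1) % 65536  # each successive NAL unit increments DON
    return units
```

+ A DON field of 65535 followed by two aggregation units thus yields
+ the DON values 65535 and 0.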
+
+5.7.2. Multi-Time Aggregation Packets (MTAPs)
+
+ The NAL unit payload of MTAPs consists of a 16-bit unsigned decoding
+ order number base (DONB) (in network byte order) and one or more
+ multi-time aggregation units, as presented in Figure 9. DONB MUST
+ contain the value of DON for the first NAL unit in the NAL unit
+ decoding order among the NAL units of the MTAP.
+
+ Informative note: The first NAL unit in the NAL unit decoding
+ order is not necessarily the first NAL unit in the order in which
+ the NAL units are encapsulated in an MTAP.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                 :  decoding order number base   |               |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+ |                                                               |
+ |                 multi-time aggregation units                  |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 9. NAL unit payload format for MTAPs
+
+ Two different multi-time aggregation units are defined in this
+ specification. Both of them consist of 16 bits of unsigned size
+ information of the following NAL unit (in network byte order), an
+ 8-bit unsigned decoding order number difference (DOND), and n bits
+ (in network byte order) of timestamp offset (TS offset) for this NAL
+ unit, whereby n can be 16 or 24. The choice between the different
+ MTAP types (MTAP16 and MTAP24) is application dependent: the larger
+ the timestamp offset is, the higher the flexibility of the MTAP, but
+ the overhead is also higher.
+
+ The structures of the multi-time aggregation units for MTAP16 and
+ MTAP24 are presented in Figures 10 and 11, respectively. The
+ starting or ending position of an aggregation unit within a packet is
+ not required to be on a 32-bit word boundary. The DON of the NAL
+ unit contained in a multi-time aggregation unit is equal to (DONB +
+ DOND) % 65536, in which % denotes the modulo operation. This memo
+ does not specify how the NAL units within an MTAP are ordered, but,
+ in most cases, NAL unit decoding order SHOULD be used.
+
+ The timestamp offset field MUST be set to a value equal to the value
+ of the following formula: if the NALU-time is larger than or equal to
+ the RTP timestamp of the packet, then the timestamp offset equals
+ (the NALU-time of the NAL unit - the RTP timestamp of the packet).
+ If the NALU-time is smaller than the RTP timestamp of the packet,
+ then the timestamp offset is equal to the NALU-time + (2^32 - the RTP
+ timestamp of the packet).
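+
+ The DON derivation and the timestamp offset formula above can be
+ sketched as follows. The function names are hypothetical, NALU-time
+ and the RTP timestamp are assumed to be 32-bit values, and the range
+ check against the field width is an addition for illustration.

```python
def mtap_don(donb: int, dond: int) -> int:
    """DON of a NAL unit in an MTAP: (DONB + DOND) % 65536."""
    return (donb + dond) % 65536


def mtap_ts_offset(nalu_time: int, rtp_timestamp: int, bits: int) -> int:
    """Timestamp offset for an MTAP16 (bits=16) or MTAP24 (bits=24)."""
    if bits not in (16, 24):
        raise ValueError("the timestamp offset is 16 or 24 bits long")
    if nalu_time >= rtp_timestamp:
        offset = nalu_time - rtp_timestamp
    else:
        # The NALU-time has wrapped around the 32-bit timestamp range.
        offset = nalu_time + (2**32 - rtp_timestamp)
    if offset >= 1 << bits:
        raise ValueError("offset does not fit in the timestamp offset field")
    return offset
```

+ With a 90 kHz clock, an offset of 3000 corresponds to one frame
+ interval at 30 frames per second, which fits in either field width.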
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ :         NAL unit size         |     DOND      |   TS offset   |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |   TS offset   |                                               |
+ +-+-+-+-+-+-+-+-+                   NAL unit                    |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 10. Multi-time aggregation unit for MTAP16
+
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ :         NAL unit size         |     DOND      |   TS offset   |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |           TS offset           |                               |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
+ |                           NAL unit                            |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 11. Multi-time aggregation unit for MTAP24
+
+ For the "earliest" multi-time aggregation unit in an MTAP, the
+ timestamp offset MUST be zero. Hence, the RTP timestamp of the MTAP
+ itself is identical to the earliest NALU-time.
+
+ Informative note: The "earliest" multi-time aggregation unit is
+ the one that would have the smallest extended RTP timestamp among
+ all the aggregation units of an MTAP if the NAL units contained in
+ the aggregation units were encapsulated in single NAL unit
+ packets. An extended timestamp is a timestamp that has more than
+ 32 bits and is capable of counting the wraparound of the timestamp
+ field, thus enabling one to determine the smallest value if the
+ timestamp wraps. Such an "earliest" aggregation unit may not be
+ the first one in the order in which the aggregation units are
+ encapsulated in an MTAP. The "earliest" NAL unit need not be the
+ same as the first NAL unit in the NAL unit decoding order either.
+
+ Figure 12 presents an example of an RTP packet that contains a multi-
+ time aggregation packet of type MTAP16 that contains two multi-time
+ aggregation units, labeled as 1 and 2 in the figure.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                          RTP Header                           |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |MTAP16 NAL HDR |  decoding order number base   |  NALU 1 Size  |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offset        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |  NALU 1 HDR   |  NALU 1 DATA                                  |
+ +-+-+-+-+-+-+-+-+                                               +
+ :                                                               :
+ +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |               |          NALU 2 SIZE          |  NALU 2 DOND  |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |       NALU 2 TS offset        |  NALU 2 HDR   |  NALU 2 DATA  |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
+ :                                                               :
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 12. An RTP packet including a multi-time aggregation
+ packet of type MTAP16 containing two multi-time
+ aggregation units
+
+ Figure 13 presents an example of an RTP packet that contains a multi-
+ time aggregation packet of type MTAP24 that contains two multi-time
+ aggregation units, labeled as 1 and 2 in the figure.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                          RTP Header                           |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |MTAP24 NAL HDR |  decoding order number base   |  NALU 1 Size  |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |  NALU 1 Size  |  NALU 1 DOND  |        NALU 1 TS offs         |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |NALU 1 TS offs |  NALU 1 HDR   |  NALU 1 DATA                  |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
+ :                                                               :
+ +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |               |          NALU 2 SIZE          |  NALU 2 DOND  |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |               NALU 2 TS offset                |  NALU 2 HDR   |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                          NALU 2 DATA                          |
+ :                                                               :
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 13. An RTP packet including a multi-time aggregation
+ packet of type MTAP24 containing two multi-time
+ aggregation units
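+
+ A receiver-side sketch for both MTAP types follows. It is
+ illustrative only: unpack_mtap is a hypothetical name, RTP padding
+ is assumed to have been removed, and the NAL unit size field is read
+ as covering the NAL unit alone (not the DOND and TS offset fields),
+ which is this example's reading of the text above.

```python
import struct
from typing import List, Tuple

MTAP16, MTAP24 = 26, 27


def unpack_mtap(payload: bytes, rtp_timestamp: int) -> List[Tuple[int, int, bytes]]:
    """Return (DON, NALU-time, NAL unit) triples in order of appearance."""
    if len(payload) < 3:
        raise ValueError("payload too short for an MTAP")
    packet_type = payload[0] & 0x1F
    if packet_type == MTAP16:
        ts_len = 2
    elif packet_type == MTAP24:
        ts_len = 3
    else:
        raise ValueError("not an MTAP payload")
    (donb,) = struct.unpack_from("!H", payload, 1)
    units = []
    pos = 3
    while pos < len(payload):
        if pos + 3 + ts_len > len(payload):
            raise ValueError("truncated multi-time aggregation unit")
        (size,) = struct.unpack_from("!H", payload, pos)
        dond = payload[pos + 2]
        ts_offset = int.from_bytes(payload[pos + 3:pos + 3 + ts_len], "big")
        pos += 3 + ts_len
        if size == 0 or pos + size > len(payload):
            raise ValueError("NAL unit size exceeds the payload")
        don = (donb + dond) % 65536
        nalu_time = (rtp_timestamp + ts_offset) % 2**32
        units.append((don, nalu_time, payload[pos:pos + size]))
        pos += size
    return units
```

+ The triples are returned in encapsulation order; since that order is
+ not required to be the decoding order, a receiver would still sort
+ them by DON as described in Section 5.5.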
+
+5.8. Fragmentation Units (FUs)
+
+ This payload type allows fragmenting a NAL unit into several RTP
+ packets. Doing so on the application layer instead of relying on
+ lower-layer fragmentation (e.g., by IP) has the following advantages:
+
+ o The payload format is capable of transporting NAL units bigger
+ than 64 kbytes over an IPv4 network. Such NAL units may be present
+ in pre-recorded video, particularly in High-Definition formats
+ (there is a limit on the number of slices per picture, which
+ results in a limit on the number of NAL units per picture, which
+ may result in big NAL units).
+
+ o The fragmentation mechanism allows fragmenting a single NAL unit
+ and applying generic forward error correction as described in
+ Section 12.5.
+
+ Fragmentation is defined only for a single NAL unit and not for any
+ aggregation packets. A fragment of a NAL unit consists of an integer
+ number of consecutive octets of that NAL unit. Each octet of the NAL
+ unit MUST be part of exactly one fragment of that NAL unit.
+ Fragments of the same NAL unit MUST be sent in consecutive order with
+ ascending RTP sequence numbers (with no other RTP packets within the
+ same RTP packet stream being sent between the first and last
+ fragment). Similarly, a NAL unit MUST be reassembled in RTP sequence
+ number order.
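+
+ The fragment rules above (consecutive octets, each octet in exactly
+ one fragment, reassembly in RTP sequence number order) can be
+ sketched as follows. The names split_into_fragments and reassemble
+ are hypothetical, the FU indicator and FU header octets are not
+ built here, and the maximum fragment size is an application choice.

```python
from typing import List, Tuple


def split_into_fragments(data: bytes, max_fragment_size: int) -> List[bytes]:
    """Cut data into consecutive, non-overlapping runs of octets."""
    if max_fragment_size < 1:
        raise ValueError("fragments must hold at least one octet")
    if not data:
        raise ValueError("nothing to fragment")
    return [data[i:i + max_fragment_size]
            for i in range(0, len(data), max_fragment_size)]


def reassemble(fragments: List[Tuple[int, bytes]]) -> bytes:
    """Join (RTP sequence number, fragment) pairs given in sequence order."""
    for (prev_seq, _), (seq, _) in zip(fragments, fragments[1:]):
        # Sequence numbers are 16 bits wide and must be consecutive.
        if seq != (prev_seq + 1) % 65536:
            raise ValueError("fragment missing or out of order")
    return b"".join(chunk for _, chunk in fragments)
```

+ A gap in the sequence numbers indicates a lost fragment, in which
+ case this sketch refuses to reassemble the NAL unit.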
+
+ When a NAL unit is fragmented and conveyed within fragmentation units
+ (FUs), it is referred to as a fragmented NAL unit. STAPs and MTAPs
+ MUST NOT be fragmented. FUs MUST NOT be nested; that is, an FU MUST
+ NOT contain another FU.
+
+ The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
+ time of the fragmented NAL unit.
+
+ Figure 14 presents the RTP payload format for FU-As. An FU-A
+ consists of a fragmentation unit indicator of one octet, a
+ fragmentation unit header of one octet, and a fragmentation unit
+ payload.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | FU indicator  |   FU header   |                               |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
+ |                                                               |
+ |                          FU payload                           |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 14. RTP payload format for FU-A
+
+ Figure 15 presents the RTP payload format for FU-Bs. An FU-B
+ consists of a fragmentation unit indicator of one octet, a
+ fragmentation unit header of one octet, a decoding order number (DON)
+ (in network byte order), and a fragmentation unit payload. In other
+ words, the structure of FU-B is the same as the structure of FU-A,
+ except for the additional DON field.
+
+
+
+
+
+
+
+
+Wang, et al. Standards Track [Page 30]
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | FU indicator  |   FU header   |               DON             |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                                                               |
+ |                         FU payload                            |
+ |                                                               |
+ |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |                               :...OPTIONAL RTP padding        |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 15. RTP payload format for FU-B
+
+ NAL unit type FU-B MUST be used in the interleaved packetization mode
+ for the first fragmentation unit of a fragmented NAL unit. NAL unit
+ type FU-B MUST NOT be used in any other case. In other words, in the
+ interleaved packetization mode, each NALU that is fragmented has an
+ FU-B as the first fragment, followed by one or more FU-A fragments.
+
+ The FU indicator octet has the following format:
+
+ +---------------+
+ |0|1|2|3|4|5|6|7|
+ +-+-+-+-+-+-+-+-+
+ |F|NRI|  Type   |
+ +---------------+
+
+ Values equal to 28 and 29 in the type field of the FU indicator octet
+ identify an FU-A and an FU-B, respectively. The use of the F bit is
+ described in Section 5.3. The value of the NRI field MUST be set
+ according to the value of the NRI field in the fragmented NAL unit.
+
+ The FU header has the following format:
+
+ +---------------+
+ |0|1|2|3|4|5|6|7|
+ +-+-+-+-+-+-+-+-+
+ |S|E|R|  Type   |
+ +---------------+
+
+ S: 1 bit
+ When set to one, the Start bit indicates the start of a
+ fragmented NAL unit. When the following FU payload is not the
+ start of a fragmented NAL unit payload, the Start bit is set
+ to zero.
+
+ E: 1 bit
+ When set to one, the End bit indicates the end of a fragmented
+ NAL unit, i.e., the last byte of the payload is also the last
+ byte of the fragmented NAL unit. When the following FU
+ payload is not the last fragment of a fragmented NAL unit, the
+ End bit is set to zero.
+
+ R: 1 bit
+ The Reserved bit MUST be equal to 0 and MUST be ignored by the
+ receiver.
+
+ Type: 5 bits
+ The NAL unit payload type as defined in Table 7-1 of [1].
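The two header octets described above can be decoded with a few bit masks. The following is a minimal Python sketch, not part of the memo; the names FuHeaders and parse_fu_headers are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple

FU_A, FU_B = 28, 29  # type values carried in the FU indicator octet


class FuHeaders(NamedTuple):
    """Decoded FU indicator and FU header fields (hypothetical helper type)."""
    f: int         # F bit of the FU indicator
    nri: int       # 2-bit NRI field of the FU indicator
    fu_type: int   # 28 (FU-A) or 29 (FU-B)
    start: bool    # S bit of the FU header
    end: bool      # E bit of the FU header
    nal_type: int  # 5-bit type of the fragmented NAL unit


def parse_fu_headers(payload: bytes) -> FuHeaders:
    """Decode the first two octets of an FU-A or FU-B payload."""
    if len(payload) < 2:
        raise ValueError("an FU needs an indicator octet and a header octet")
    indicator, header = payload[0], payload[1]
    fu_type = indicator & 0x1F
    if fu_type not in (FU_A, FU_B):
        raise ValueError("not a fragmentation unit")
    start = bool(header & 0x80)
    end = bool(header & 0x40)
    if start and end:
        # A fragmented NAL unit must span more than one FU.
        raise ValueError("Start and End bits are both set")
    # The R bit (0x20) is deliberately ignored, as required of receivers.
    return FuHeaders(indicator >> 7, (indicator >> 5) & 0x03, fu_type,
                     start, end, header & 0x1F)
```

The sketch rejects a header with both S and E set because a fragmented NAL unit is never carried in a single FU.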
+
+ The value of DON in FU-Bs is selected as described in Section 5.5.
+
+ Informative note: The DON field in FU-Bs allows gateways to
+ fragment NAL units to FU-Bs without organizing the incoming NAL
+ units to the NAL unit decoding order.
+
+ A fragmented NAL unit MUST NOT be transmitted in one FU; that is, the
+ Start bit and End bit MUST NOT both be set to one in the same FU
+ header.
+
+ The FU payload consists of fragments of the payload of the fragmented
+ NAL unit so that if the fragmentation unit payloads of consecutive
+ FUs are sequentially concatenated, the payload of the fragmented NAL
+ unit can be reconstructed. The NAL unit type octet of the fragmented
+ NAL unit is not included as such in the fragmentation unit payload,
+ but rather the information of the NAL unit type octet of the
+ fragmented NAL unit is conveyed in the F and NRI fields of the FU
+ indicator octet of the fragmentation unit and in the type field of
+ the FU header. An FU payload MAY have any number of octets and MAY
+ be empty.
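As an illustration of how the NAL unit type octet is redistributed over the FU indicator and FU header, here is a minimal Python sketch of an FU-A fragmenter. It is an assumption-laden example rather than a reference implementation: the function name fragment_fu_a and the max_fu_payload parameter are hypothetical, and the sketch simply refuses input that would fit in one packet instead of emitting empty FUs.

```python
from typing import List

FU_A = 28  # type value identifying an FU-A in the FU indicator


def fragment_fu_a(nal_unit: bytes, max_fu_payload: int) -> List[bytes]:
    """Split one NAL unit into FU-A payloads of at most max_fu_payload octets."""
    if max_fu_payload < 1:
        raise ValueError("max_fu_payload must be positive")
    nal_header, body = nal_unit[0], nal_unit[1:]
    if len(body) <= max_fu_payload:
        # One FU would need both S and E set, which is not allowed.
        raise ValueError("NAL unit fits in one packet; do not fragment it")
    indicator = (nal_header & 0xE0) | FU_A   # keep F and NRI, replace the type
    nal_type = nal_header & 0x1F             # original type moves to FU header
    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    packets = []
    for index, chunk in enumerate(chunks):
        header = nal_type
        if index == 0:
            header |= 0x80                   # S bit on the first fragment
        if index == len(chunks) - 1:
            header |= 0x40                   # E bit on the last fragment
        packets.append(bytes([indicator, header]) + chunk)
    return packets
```

Concatenating the octets after the first two of each returned payload, in order, yields the original NAL unit payload without its type octet.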
+
+ Informative note: Empty FUs are allowed to reduce the latency of a
+ certain class of senders in nearly lossless environments. These
+ senders can be characterized in that they packetize NALU fragments
+ before the NALU is completely generated and, hence, before the
+ NALU size is known. If zero-length NALU fragments were not
+ allowed, the sender would have to generate at least one bit of
+ data of the following fragment before the current fragment could
+ be sent. Due to the characteristics of H.264, where sometimes
+ several macroblocks occupy zero bits, this is undesirable and can
+ add delay. However, the (potential) use of zero-length NALU
+ fragments should be carefully weighed against the increased risk
+ of the loss of at least a part of the NALU because of the
+ additional packets employed for its transmission.
+
+ If a fragmentation unit is lost, the receiver SHOULD discard all
+ following fragmentation units in transmission order corresponding to
+ the same fragmented NAL unit.
+
+ A receiver in an endpoint or in a MANE MAY aggregate the first n-1
+ fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
+ n of that NAL unit is not received. In this case, the
+ forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
+ syntax violation.
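The loss-handling behaviour above can be sketched as follows. This is a hedged Python example under stated assumptions: the caller has already grouped the FU-A payloads of one fragmented NAL unit and sorted them by RTP sequence number, and the names reassemble_fu_a and allow_partial are hypothetical.

```python
from typing import List, Optional, Tuple


def reassemble_fu_a(packets: List[Tuple[int, bytes]],
                    allow_partial: bool = False) -> Optional[bytes]:
    """Rebuild a NAL unit from (rtp_seq, fu_a_payload) pairs in sequence order.

    Returns None when the unit cannot be delivered. With allow_partial set,
    the fragments received before the first gap are returned with the
    forbidden_zero_bit set to one.
    """
    if not packets:
        return None
    first_seq, first = packets[0]
    if not first[1] & 0x80:
        return None                      # the start fragment itself was lost
    nal_header = (first[0] & 0xE0) | (first[1] & 0x1F)
    body = bytearray()
    expected = first_seq
    complete = False
    for seq, payload in packets:
        if seq != expected:
            break                        # gap: drop this and later fragments
        body += payload[2:]
        if payload[1] & 0x40:
            complete = True
            break
        expected = (expected + 1) & 0xFFFF   # sequence numbers wrap at 2^16
    if complete:
        return bytes([nal_header]) + bytes(body)
    if allow_partial:
        return bytes([nal_header | 0x80]) + bytes(body)  # flag syntax violation
    return None
```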
+
+6. Packetization Rules
+
+ The packetization modes are introduced in Section 5.2. The
+ packetization rules common to more than one of the packetization
+ modes are specified in Section 6.1. The packetization rules for the
+ single NAL unit mode, the non-interleaved mode, and the interleaved
+ mode are specified in Sections 6.2, 6.3, and 6.4, respectively.
+
+6.1. Common Packetization Rules
+
+ All senders MUST enforce the following packetization rules,
+ regardless of the packetization mode in use:
+
+ o Coded slice NAL units or coded slice data partition NAL units
+ belonging to the same coded picture (and thus sharing the same RTP
+ timestamp value) MAY be sent in any order; however, for delay-
+ critical systems, they SHOULD be sent in their original decoding
+ order to minimize the delay. Note that the decoding order is the
+ order of the NAL units in the bitstream.
+
+ o Parameter sets are handled in accordance with the rules and
+ recommendations given in Section 8.4.
+
+ o MANEs MUST NOT duplicate any NAL unit except for sequence or
+ picture parameter set NAL units, as neither this memo nor the
+ H.264 specification provides means to identify duplicated NAL
+ units. Sequence and picture parameter set NAL units MAY be
+ duplicated to make their correct reception more probable, but any
+ such duplication MUST NOT affect the contents of any active
+ sequence or picture parameter set. Duplication SHOULD be
+ performed on the application layer and not by duplicating RTP
+ packets (with identical sequence numbers).
+
+ Senders using the non-interleaved mode and the interleaved mode MUST
+ enforce the following packetization rule:
+
+ o In an RTP translator, MANEs MAY convert single NAL unit packets
+ into one aggregation packet, convert an aggregation packet into
+ several single NAL unit packets, or mix both concepts. The RTP
+ translator SHOULD take into account at least the following
+ parameters: path MTU size, unequal protection mechanisms (e.g.,
+ through packet-based FEC according to RFC 5109 [18], especially
+ for sequence and picture parameter set NAL units and coded slice
+ data partition A NAL units), bearable latency of the system, and
+ buffering capabilities of the receiver.
+
+ Informative note: An RTP translator is required to handle RTP
+ Control Protocol (RTCP) as per RFC 3550.
+
+6.2. Single NAL Unit Mode
+
+ This mode is in use when the value of the OPTIONAL packetization-mode
+ media type parameter is equal to 0 or the packetization-mode is not
+ present. All receivers MUST support this mode. It is primarily
+ intended for low-delay applications that are compatible with systems
+ using ITU-T Recommendation H.241 [3] (see Section 12.1). Only single
+ NAL unit packets MAY be used in this mode. STAPs, MTAPs, and FUs
+ MUST NOT be used. The transmission order of single NAL unit packets
+ MUST comply with the NAL unit decoding order.
+
+6.3. Non-Interleaved Mode
+
+ This mode is in use when the value of the OPTIONAL packetization-mode
+ media type parameter is equal to 1. This mode SHOULD be supported.
+ It is primarily intended for low-delay applications. Only single NAL
+ unit packets, STAP-As, and FU-As MAY be used in this mode. STAP-Bs,
+ MTAPs, and FU-Bs MUST NOT be used. The transmission order of NAL
+ units MUST comply with the NAL unit decoding order.
+
+6.4. Interleaved Mode
+
+ This mode is in use when the value of the OPTIONAL packetization-mode
+ media type parameter is equal to 2. Some receivers MAY support this
+ mode. STAP-Bs, MTAPs, FU-As, and FU-Bs MAY be used. STAP-As and
+ single NAL unit packets MUST NOT be used. The transmission order of
+ packets and NAL units is constrained as specified in Section 5.5.
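The per-mode restrictions in Sections 6.2 through 6.4 reduce to a lookup on the type field of the first payload octet. The Python sketch below is illustrative only; the type values 24 to 29 are those the memo assigns to STAP-A, STAP-B, MTAP16, MTAP24, FU-A, and FU-B, while the names ALLOWED_TYPES and packet_type_allowed are hypothetical.

```python
STAP_A, STAP_B, MTAP16, MTAP24, FU_A, FU_B = 24, 25, 26, 27, 28, 29
SINGLE_NAL = frozenset(range(1, 24))     # single NAL unit packets, types 1-23

# Packet types permitted for each value of packetization-mode.
ALLOWED_TYPES = {
    0: SINGLE_NAL,                                        # single NAL unit mode
    1: SINGLE_NAL | {STAP_A, FU_A},                       # non-interleaved mode
    2: frozenset({STAP_B, MTAP16, MTAP24, FU_A, FU_B}),   # interleaved mode
}


def packet_type_allowed(payload: bytes, packetization_mode: int = 0) -> bool:
    """Return True if the payload's packet type may be used in the given mode."""
    if packetization_mode not in ALLOWED_TYPES:
        raise ValueError("packetization-mode must be 0, 1, or 2")
    return (payload[0] & 0x1F) in ALLOWED_TYPES[packetization_mode]
```

The default of 0 mirrors the rule that an absent packetization-mode parameter selects the single NAL unit mode.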
+
+7. De-Packetization Process
+
+ The de-packetization process is implementation dependent. Therefore,
+ the following description should be seen as an example of a suitable
+ implementation. Other schemes may also be used as long as the output
+ for the same input is the same as that of the process described below. The
+ same output means that the resulting NAL units and their order are
+ identical. Optimizations relative to the described algorithms are
+ likely possible. Section 7.1 presents the de-packetization process
+ for the single NAL unit and non-interleaved packetization modes,
+ whereas Section 7.2 describes the process for the interleaved mode.
+ Section 7.3 includes additional de-packetization guidelines for
+ intelligent receivers.
+
+ All normal RTP mechanisms related to buffer management apply. In
+ particular, duplicated or outdated RTP packets (as indicated by the
+ RTP sequence number and the RTP timestamp) are removed. To determine
+ the exact time for decoding, factors such as a possible intentional
+ delay to allow for proper inter-stream synchronization must be
+ factored in.
+
+7.1. Single NAL Unit and Non-Interleaved Mode
+
+ The receiver includes a receiver buffer to compensate for
+ transmission delay jitter. The receiver stores incoming packets in
+ reception order into the receiver buffer. Packets are de-packetized
+ in RTP sequence number order. If a de-packetized packet is a single
+ NAL unit packet, the NAL unit contained in the packet is passed
+ directly to the decoder. If a de-packetized packet is an STAP-A, the
+ NAL units contained in the packet are passed to the decoder in the
+ order in which they are encapsulated in the packet. For all the FU-A
+ packets containing fragments of a single NAL unit, the de-packetized
+ fragments are concatenated in their sending order to recover the NAL
+ unit, which is then passed to the decoder.
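A minimal Python sketch of this process is given below. It assumes the caller has already ordered payloads by RTP sequence number and removed duplicates, and it relies on the STAP-A layout defined earlier in the memo (a one-octet header followed by units that are each preceded by a 16-bit size in network byte order). The class name Depacketizer is hypothetical, and packet loss inside a fragmented NAL unit is not handled here.

```python
from typing import List, Optional

STAP_A, FU_A = 24, 28


class Depacketizer:
    """Turns RTP payloads, fed in sequence-number order, into NAL units."""

    def __init__(self) -> None:
        self._fu: Optional[bytearray] = None   # NAL unit being reassembled

    def push(self, payload: bytes) -> List[bytes]:
        """Return the NAL units completed by this payload, in decoding order."""
        nal_type = payload[0] & 0x1F
        if 1 <= nal_type <= 23:                # single NAL unit packet
            return [payload]
        if nal_type == STAP_A:
            units, offset = [], 1
            while offset < len(payload):
                size = int.from_bytes(payload[offset:offset + 2], "big")
                offset += 2
                if offset + size > len(payload):
                    raise ValueError("truncated STAP-A")
                units.append(payload[offset:offset + size])
                offset += size
            return units
        if nal_type == FU_A:
            header = payload[1]
            if header & 0x80:                  # S bit: rebuild the type octet
                self._fu = bytearray([(payload[0] & 0xE0) | (header & 0x1F)])
            if self._fu is None:
                return []                      # no start fragment was seen
            self._fu += payload[2:]
            if header & 0x40:                  # E bit: the unit is complete
                unit, self._fu = bytes(self._fu), None
                return [unit]
            return []
        raise ValueError("packet type not allowed in these modes")
```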
+
+ Informative note: If the decoder supports arbitrary slice order,
+ coded slices of a picture can be passed to the decoder in any
+ order, regardless of their reception and transmission order.
+
+7.2. Interleaved Mode
+
+ The general concept behind these de-packetization rules is to reorder
+ NAL units from transmission order to the NAL unit decoding order.
+
+ The receiver includes a receiver buffer, which is used to compensate
+ for transmission delay jitter and to reorder NAL units from
+ transmission order to the NAL unit decoding order. In this section,
+ the receiver operation is described under the assumption that there
+ is no transmission delay jitter. To differentiate the receiver
+ buffer from a practical receiver buffer that is also used for
+ compensation of transmission delay jitter, the receiver buffer is
+ hereafter called the de-interleaving buffer in this section.
+ Receivers SHOULD also prepare for transmission delay jitter, i.e.,
+ either reserve separate buffers for transmission delay jitter
+ buffering and de-interleaving buffering or use a receiver buffer for
+ both transmission delay jitter and de-interleaving. Moreover,
+ receivers SHOULD take transmission delay jitter into account in the
+ buffering operation, e.g., by additional initial buffering before
+ starting of decoding and playback.
+
+ This section is organized as follows: Subsection 7.2.1 presents how
+ to calculate the size of the de-interleaving buffer. Subsection
+ 7.2.2 specifies the receiver process on how to organize received NAL
+ units to the NAL unit decoding order.
+
+7.2.1. Size of the De-Interleaving Buffer
+
+ In either Offer/Answer or declarative Session Description Protocol
+ (SDP) usage, the sprop-deint-buf-req media type parameter signals the
+ requirement for the de-interleaving buffer size. Therefore, it is
+ RECOMMENDED to set the de-interleaving buffer size, in terms of
+ number of bytes, equal to or greater than the value of the sprop-
+ deint-buf-req media type parameter.
+
+ When the SDP Offer/Answer model or any other capability exchange
+ procedure is used in session setup, the properties of the received
+ stream SHOULD be such that the receiver capabilities are not
+ exceeded. In the SDP Offer/Answer model, the receiver can indicate
+ its capabilities to allocate a de-interleaving buffer with the deint-
+ buf-cap media type parameter. See Section 8.1 for further
+ information on the deint-buf-cap and sprop-deint-buf-req media type
+ parameters and Section 8.2.2 for further information on their use in
+ the SDP Offer/Answer model.
+
+7.2.2. De-Interleaving Process
+
+ There are two buffering states in the receiver: initial buffering and
+ buffering while playing. Initial buffering occurs when the RTP
+ session is initialized. After initial buffering, decoding and
+ playback are started, and the buffering-while-playing mode is used.
+
+ Regardless of the buffering state, the receiver stores incoming NAL
+ units, in reception order, in the de-interleaving buffer as follows.
+ NAL units of aggregation packets are stored in the de-interleaving
+ buffer individually. The value of DON is calculated and stored for
+ each NAL unit.
+
+ The receiver operation is described below with the help of the
+ following functions and constants:
+
+ o Function AbsDON is specified in Section 8.1.
+
+ o Function don_diff is specified in Section 5.5.
+
+ o Constant N is the value of the OPTIONAL sprop-interleaving-depth
+ media type parameter (see Section 8.1) incremented by 1.
+
+ Initial buffering lasts until one of the following conditions is
+ fulfilled:
+
+ o There are N or more VCL NAL units in the de-interleaving buffer.
+
+ o If sprop-max-don-diff is present, don_diff(m,n) is greater than
+ the value of sprop-max-don-diff, in which n corresponds to the NAL
+ unit having the greatest value of AbsDON among the received NAL
+ units and m corresponds to the NAL unit having the smallest value
+ of AbsDON among the received NAL units.
+
+ o Initial buffering has lasted for the duration equal to or greater
+ than the value of the OPTIONAL sprop-init-buf-time media type
+ parameter.
+
+ The NAL units to be removed from the de-interleaving buffer are
+ determined as follows:
+
+ o If the de-interleaving buffer contains at least N VCL NAL units,
+ NAL units are removed from the de-interleaving buffer and passed
+ to the decoder in the order specified below until the buffer
+ contains N-1 VCL NAL units.
+
+ o If sprop-max-don-diff is present, all NAL units m for which
+ don_diff(m,n) is greater than sprop-max-don-diff are removed from
+ the de-interleaving buffer and passed to the decoder in the order
+ specified below. Herein, n corresponds to the NAL unit having the
+ greatest value of AbsDON among the NAL units in the de-
+ interleaving buffer.
+
+ The order in which NAL units are passed to the decoder is specified
+ as follows:
+
+ o Let PDON be a variable that is initialized to 0 at the beginning
+ of the RTP session.
+
+ o For each NAL unit associated with a value of DON, a DON distance
+ is calculated as follows. If the value of DON of the NAL unit is
+ larger than the value of PDON, the DON distance is equal to DON -
+ PDON. Otherwise, the DON distance is equal to 65535 - PDON + DON
+ + 1.
+
+ o NAL units are delivered to the decoder in ascending order of DON
+ distance. If several NAL units share the same value of DON
+ distance, they can be passed to the decoder in any order.
+
+ o When a desired number of NAL units have been passed to the
+ decoder, the value of PDON is set to the value of DON for the last
+ NAL unit passed to the decoder.
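The DON distance rule handles wrap-around of the 16-bit DON. A short Python sketch follows; the function names don_distance and decoding_order are hypothetical, and the NAL units are represented as (DON, data) pairs purely for illustration.

```python
from typing import List, Tuple


def don_distance(don: int, pdon: int) -> int:
    """DON distance of a NAL unit relative to PDON, as defined above."""
    if don > pdon:
        return don - pdon
    return 65535 - pdon + don + 1        # DON wrapped past 65535


def decoding_order(nal_units: List[Tuple[int, bytes]],
                   pdon: int) -> List[Tuple[int, bytes]]:
    """Sort (don, nal_unit) pairs into ascending DON distance."""
    return sorted(nal_units, key=lambda item: don_distance(item[0], pdon))
```

After the selected units are passed to the decoder, the caller would set PDON to the DON of the last unit delivered.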
+
+7.3. Additional De-Packetization Guidelines
+
+ The following additional de-packetization rules may be used to
+ implement an operational H.264 de-packetizer:
+
+ o Intelligent RTP receivers (e.g., in gateways) may identify lost
+ coded slice data partitions A (DPAs). If a lost DPA is detected,
+ after taking into account possible retransmission and FEC, a
+ gateway may decide not to send the corresponding coded slice data
+ partitions B and C, as their information is meaningless for H.264
+ decoders. In this way, a MANE can reduce network load by
+ discarding useless packets without parsing a complex bitstream.
+
+ o Intelligent RTP receivers (e.g., in gateways) may identify lost
+ FUs. If a lost FU is found, a gateway may decide not to send the
+ following FUs of the same fragmented NAL unit, as their
+ information is meaningless for H.264 decoders. In this way, a
+ MANE can reduce network load by discarding useless packets without
+ parsing a complex bitstream.
+
+ o Intelligent receivers having to discard packets or NALUs should
+ first discard all packets/NALUs in which the value of the NRI
+ field of the NAL unit type octet is equal to 0. This will
+ minimize the impact on user experience and keep the reference
+ pictures intact. If more packets have to be discarded, then
+ packets with a numerically lower NRI value should be discarded
+ before packets with a numerically higher NRI value. However,
+ discarding any packets with an NRI bigger than 0 very likely leads
+ to decoder drift and SHOULD be avoided.
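The NRI-based discard preference amounts to sorting by the 2-bit NRI field. The Python sketch below is one possible reading of the guideline; the function name discard_order is hypothetical.

```python
from typing import List


def discard_order(nal_units: List[bytes]) -> List[bytes]:
    """Order NAL units so that the best discard candidates come first.

    Units with NRI equal to 0 are listed first, then units with
    numerically higher NRI values.
    """
    return sorted(nal_units, key=lambda nal: (nal[0] >> 5) & 0x03)
```

A receiver that must shed load would drop units from the front of the returned list and stop, where possible, before reaching units with a nonzero NRI.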
+
+8. Payload Format Parameters
+
+ This section specifies the parameters that MAY be used to select
+ optional features of the payload format and certain features of the
+ bitstream. The parameters are specified here as part of the media
+ subtype registration for the ITU-T H.264 | ISO/IEC 14496-10 codec. A
+ mapping of the parameters into the Session Description Protocol (SDP)
+ [6] is also provided for applications that use SDP. Equivalent
+ parameters could be defined elsewhere for use with control protocols
+ that do not use SDP.
+
+ Some parameters provide a receiver with the properties of the stream
+ that will be sent. The names of all these parameters start with
+ "sprop" for stream properties. Some of these "sprop" parameters are
+ limited by other payload or codec configuration parameters. For
+ example, the sprop-parameter-sets parameter is constrained by the
+ profile-level-id parameter.
+
+8.1. Media Type Registration
+
+ The media subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec has
+ been allocated from the IETF tree.
+
+ Media Type name: video
+
+ Media subtype name: H264
+
+ Required parameters: none
+
+ OPTIONAL parameters:
+
+ profile-level-id:
+ A base16 [7] (hexadecimal) representation of the following
+ three bytes in the sequence parameter set NAL unit specified
+ in [1]: 1) profile_idc, 2) a byte herein referred to as
+ profile-iop, composed of the values of constraint_set0_flag,
+ constraint_set1_flag, constraint_set2_flag,
+ constraint_set3_flag, constraint_set4_flag,
+ constraint_set5_flag, and reserved_zero_2bits in bit-
+ significance order, starting from the most-significant bit, and
+ 3) level_idc. Note that reserved_zero_2bits is required to be
+ equal to 0 in [1], but other values for it may be specified in
+ the future by ITU-T or ISO/IEC.
+
+ The profile-level-id parameter indicates the default sub-
+ profile (i.e., the subset of coding tools that may have been
+ used to generate the stream or that the receiver supports) and
+ the default level that the stream uses or that the receiver supports.
+
+ The default sub-profile is indicated collectively by the
+ profile_idc byte and some fields in the profile-iop byte.
+ Depending on the values of the fields in the profile-iop byte,
+ the default sub-profile may be the set of coding tools
+ supported by one profile, or a common subset of coding tools of
+ multiple profiles, as specified in Section 7.4.2.1.1 of [1].
+ The default level is indicated by the level_idc byte, and, when
+ profile_idc is equal to 66, 77, or 88 (the Baseline, Main, or
+ Extended profile) and level_idc is equal to 11, additionally by
+ bit 4 (constraint_set3_flag) of the profile-iop byte. When
+ profile_idc is equal to 66, 77, or 88 (the Baseline, Main, or
+ Extended profile), level_idc is equal to 11, and bit 4
+ (constraint_set3_flag) of the profile-iop byte is equal to 1,
+ the default level is Level 1b.
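The three bytes of profile-level-id and the Level 1b rule above can be decoded as in the following Python sketch. The names ProfileLevelId and parse_profile_level_id are hypothetical; the sketch applies only the Level 1b condition stated in this passage and does not interpret level_idc any further.

```python
from typing import NamedTuple


class ProfileLevelId(NamedTuple):
    """Decoded profile-level-id (hypothetical helper type)."""
    profile_idc: int
    profile_iop: int
    level_idc: int
    is_level_1b: bool


def parse_profile_level_id(value: str) -> ProfileLevelId:
    """Decode the six hexadecimal digits of a profile-level-id value."""
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must encode exactly three bytes")
    profile_idc, profile_iop, level_idc = raw
    # constraint_set3_flag is the fourth flag counted from the
    # most-significant bit of profile-iop, i.e. mask 0x10.
    is_level_1b = (profile_idc in (66, 77, 88)
                   and level_idc == 11
                   and bool(profile_iop & 0x10))
    return ProfileLevelId(profile_idc, profile_iop, level_idc, is_level_1b)
```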
+
+ Table 5 lists all profiles defined in Annex A of [1] and, for
+ each of the profiles, the possible combinations of profile_idc
+ and profile-iop that represent the same sub-profile.
+
+ Table 5. Combinations of profile_idc and profile-iop
+ representing the same sub-profile corresponding to the full
+ set of coding tools supported by one profile. In the
+ following, x may be either 0 or 1, while the profile names
+ are indicated as follows. CB: Constrained Baseline profile,
+ B: Baseline profile, M: Main profile, E: Extended profile,
+ H: High profile, H10: High 10 profile, H42: High 4:2:2
+ profile, H44: High 4:4:4 Predictive profile, H10I: High 10
+ Intra profile, H42I: High 4:2:2 Intra profile, H44I: High
+ 4:4:4 Intra profile, and C44I: CAVLC 4:4:4 Intra profile.
+
+ Profile     profile_idc        profile-iop
+             (hexadecimal)      (binary)
+
+ CB          42 (B)             x1xx0000
+    same as: 4D (M)             1xxx0000
+    same as: 58 (E)             11xx0000
+ B           42 (B)             x0xx0000
+    same as: 58 (E)             10xx0000
+ M           4D (M)             0x0x0000
+ E           58                 00xx0000
+ H           64                 00000000
+ H10         6E                 00000000
+ H42         7A                 00000000
+ H44         F4                 00000000
+ H10I        6E                 00010000
+ H42I        7A                 00010000
+ H44I        F4                 00010000
+ C44I        2C                 00010000
+
+ For example, in the table above, profile_idc equal to 58
+ (Extended) with profile-iop equal to 11xx0000 indicates the
+ same sub-profile corresponding to profile_idc equal to 42
+ (Baseline) with profile-iop equal to x1xx0000. Note that other
+ combinations of profile_idc and profile-iop (not listed in
+ Table 5) may represent a sub-profile equivalent to the common
+ subset of coding tools for more than one profile. Note also
+ that a decoder conforming to a certain profile may be able to
+ decode bitstreams conforming to other profiles.
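Table 5 can be applied mechanically by treating each profile-iop entry as a bit pattern in which x matches either value. The Python sketch below transcribes the table rows; the names TABLE_5 and sub_profile are hypothetical, and combinations not listed in the table return None rather than being mapped to a common subset of tools.

```python
from typing import Optional

# (profile abbreviation, profile_idc, profile-iop pattern) rows from Table 5.
TABLE_5 = [
    ("CB", 0x42, "x1xx0000"), ("CB", 0x4D, "1xxx0000"), ("CB", 0x58, "11xx0000"),
    ("B", 0x42, "x0xx0000"), ("B", 0x58, "10xx0000"),
    ("M", 0x4D, "0x0x0000"), ("E", 0x58, "00xx0000"),
    ("H", 0x64, "00000000"), ("H10", 0x6E, "00000000"),
    ("H42", 0x7A, "00000000"), ("H44", 0xF4, "00000000"),
    ("H10I", 0x6E, "00010000"), ("H42I", 0x7A, "00010000"),
    ("H44I", 0xF4, "00010000"), ("C44I", 0x2C, "00010000"),
]


def sub_profile(profile_idc: int, profile_iop: int) -> Optional[str]:
    """Return the Table 5 abbreviation for the combination, or None."""
    bits = format(profile_iop, "08b")    # most-significant bit first
    for name, idc, pattern in TABLE_5:
        if idc == profile_idc and all(p in ("x", b)
                                      for p, b in zip(pattern, bits)):
            return name
    return None
```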
+
+ If the profile-level-id parameter is used to indicate
+ properties of a NAL unit stream, it indicates that, to decode
+ the stream, the minimum subset of coding tools a decoder has to
+ support is the default sub-profile, and the lowest level the
+ decoder has to support is the default level.
+
+ If the profile-level-id parameter is used for capability
+ exchange or session setup, it indicates the subset of coding
+ tools, which is equal to the default sub-profile, that the
+ codec supports for both receiving and sending. If max-recv-
+ level is not present, the default level from profile-level-id
+ indicates the highest level the codec wishes to support. If
+ max-recv-level is present, it indicates the highest level the
+ codec supports for receiving. For either receiving or sending,
+ all levels that are lower than the highest level supported MUST
+ also be supported.
+
+ Informative note: Capability exchange and session setup
+ procedures should provide means to list the capabilities for
+ each supported sub-profile separately. For example, the
+ one-of-N codec selection procedure of the SDP Offer/Answer
+ model can be used (Section 10.2 of [8]). The one-of-N codec
+ selection procedure may also be used to provide different
+ combinations of profile_idc and profile-iop that represent
+ the same sub-profile. When there are many different
+ combinations of profile_idc and profile-iop that represent
+ the same sub-profile, using the one-of-N codec selection
+ procedure may result in a fairly large SDP message.
+ Therefore, a receiver should understand the different
+ equivalent combinations of profile_idc and profile-iop that
+ represent the same sub-profile and be ready to accept an
+ offer using any of the equivalent combinations.
+
+ If no profile-level-id is present, the Baseline profile,
+ without additional constraints at Level 1, MUST be inferred.
+
+ max-recv-level:
+ This parameter MAY be used to indicate the highest level a
+ receiver supports when the highest level is higher than the
+ default level (the level indicated by profile-level-id). The
+ value of max-recv-level is a base16 (hexadecimal)
+ representation of the two bytes after the syntax element
+ profile_idc in the sequence parameter set NAL unit specified in
+ [1]: profile-iop (as defined above) and level_idc. If the
+ level_idc byte of max-recv-level is equal to 11 and bit 4 of
+ the profile-iop byte of max-recv-level is equal to 1 or if the
+ level_idc byte of max-recv-level is equal to 9 and bit 4 of the
+ profile-iop byte of max-recv-level is equal to 0, the highest
+ level the receiver supports is Level 1b. Otherwise, the
+ highest level the receiver supports is equal to the level_idc
+ byte of max-recv-level divided by 10.
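The mapping from a max-recv-level value to a level can be sketched in Python as follows. The function name max_recv_level is hypothetical, and returning the level as a string such as "3.1" or "1b" is a presentation choice made for this example.

```python
def max_recv_level(value: str) -> str:
    """Map the four hexadecimal digits of max-recv-level to a level name."""
    raw = bytes.fromhex(value)
    if len(raw) != 2:
        raise ValueError("max-recv-level must encode exactly two bytes")
    profile_iop, level_idc = raw
    bit4 = bool(profile_iop & 0x10)      # constraint_set3_flag
    if (level_idc == 11 and bit4) or (level_idc == 9 and not bit4):
        return "1b"
    # Otherwise the level is level_idc divided by 10.
    return "{}.{}".format(level_idc // 10, level_idc % 10)
```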
+
+ max-recv-level MUST NOT be present if the highest level the
+ receiver supports is not higher than the default level.
+
+ max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br:
+ These parameters MAY be used to signal the capabilities of a
+ receiver implementation. These parameters MUST NOT be used for
+ any other purpose. The highest level conveyed in the value of
+ the profile-level-id parameter or the max-recv-level parameter
+ MUST be one that the receiver is fully capable of supporting.
+ max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br MAY
+ be used to indicate capabilities of the receiver that extend
+ the required capabilities of the signaled highest level, as
+ specified below.
+
+ When more than one parameter from the set (max-mbps, max-smbps,
+ max-fs, max-cpb, max-dpb, max-br) is present, the receiver MUST
+ support all signaled capabilities simultaneously. For example,
+ if both max-mbps and max-br are present, the signaled highest
+ level with the extension of both the frame rate and bitrate is
+ supported. That is, the receiver is able to decode NAL unit
+ streams in which the macroblock processing rate is up to max-
+ mbps (inclusive), the bitrate is up to max-br (inclusive), the
+ coded picture buffer size is derived as specified in the
+ semantics of the max-br parameter below, and the other
+ properties comply with the highest level specified in the value
+ of the profile-level-id parameter or the max-recv-level
+ parameter.
+
+ If a receiver can support all the properties of Level A, the
+ highest level specified in the value of the profile-level-id
+ parameter or the max-recv-level parameter MUST be Level A
+ (i.e., MUST NOT be lower than Level A). In other words, a
+ receiver MUST NOT signal values of max-mbps, max-fs, max-cpb,
+ max-dpb, and max-br that taken together meet the requirements
+ of a higher level compared to the highest level specified in
+ the value of the profile-level-id parameter or the max-recv-
+ level parameter.
+
+ Informative note: When the OPTIONAL media type parameters
+ are used to signal the properties of a NAL unit stream, max-
+ mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br are
+ not present, and the value of profile-level-id must always
+ be such that the NAL unit stream complies fully with the
+ specified profile and level.
+
+ max-mbps: The value of max-mbps is an integer indicating the
+ maximum macroblock processing rate in units of macroblocks per
+ second. The max-mbps parameter signals that the receiver is
+ capable of decoding video at a higher rate than is required by
+ the signaled highest level conveyed in the value of the
+ profile-level-id parameter or the max-recv-level parameter.
+ When max-mbps is signaled, the receiver MUST be able to decode
+ NAL unit streams that conform to the signaled highest level,
+ with the exception that the MaxMBPS value in Table A-1 of [1]
+ for the signaled highest level is replaced with the value of
+ max-mbps. The value of max-mbps MUST be greater than or equal
+ to the value of MaxMBPS given in Table A-1 of [1] for the
+ highest level. Senders MAY use this knowledge to send pictures
+ of a given size at a higher picture rate than is indicated in
+ the signaled highest level.
+
+ max-smbps: The value of max-smbps is an integer indicating the
+ maximum static macroblock processing rate in units of static
+ macroblocks per second, under the hypothetical assumption that
+ all macroblocks are static macroblocks. When max-smbps is
+ signaled, the MaxMBPS value in Table A-1 of [1] should be
+ replaced with the result of the following computation:
+
+ o If the parameter max-mbps is signaled, set a variable
+ MaxMacroblocksPerSecond to the value of max-mbps.
+ Otherwise, set MaxMacroblocksPerSecond equal to the value of
+ MaxMBPS in Table A-1 [1] for the signaled highest level
+ conveyed in the value of the profile-level-id parameter or
+ the max-recv-level parameter.
+
+ o Set a variable P_non-static to the proportion of non-static
+ macroblocks in picture n.
+
+ o Set a variable P_static to the proportion of static
+ macroblocks in picture n.
+
+ o The value of MaxMBPS in Table A-1 of [1] should be
+ considered by the encoder to be equal to:
+
+ MaxMacroblocksPerSecond * max-smbps / (P_non-static *
+ max-smbps + P_static * MaxMacroblocksPerSecond)
+
+ The encoder should recompute this value for each picture. The
+ value of max-smbps MUST be greater than or equal to the value
+ of MaxMBPS given explicitly as the value of the max-mbps
+ parameter or implicitly in Table A-1 of [1] for the signaled
+ highest level. Senders MAY use this knowledge to send pictures
+ of a given size at a higher picture rate than is indicated in
+ the signaled highest level.
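The per-picture computation above can be written out as a small Python function. This is a sketch under one assumption that the text does not state explicitly: every macroblock is either static or non-static, so P_non-static is taken as 1 - P_static. The function name effective_max_mbps is hypothetical.

```python
def effective_max_mbps(max_macroblocks_per_second: float,
                       max_smbps: float,
                       p_static: float) -> float:
    """MaxMBPS value an encoder may assume for a picture.

    max_macroblocks_per_second is max-mbps if signaled, otherwise the
    MaxMBPS of the signaled highest level; p_static is the proportion of
    static macroblocks in the picture.
    """
    if not 0.0 <= p_static <= 1.0:
        raise ValueError("p_static must lie between 0 and 1")
    p_non_static = 1.0 - p_static        # assumption: proportions sum to 1
    return (max_macroblocks_per_second * max_smbps) / (
        p_non_static * max_smbps + p_static * max_macroblocks_per_second)
```

With no static macroblocks the result equals MaxMacroblocksPerSecond, and with only static macroblocks it equals max-smbps; intermediate pictures fall between the two.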
+
+ max-fs: The value of max-fs is an integer indicating the maximum
+ frame size in units of macroblocks. The max-fs parameter
+ signals that the receiver is capable of decoding larger picture
+ sizes than are required by the signaled highest level conveyed
+ in the value of the profile-level-id parameter or the max-recv-
+ level parameter. When max-fs is signaled, the receiver MUST be
+ able to decode NAL unit streams that conform to the signaled
+ highest level, with the exception that the MaxFS value in Table
+ A-1 of [1] for the signaled highest level is replaced with the
+ value of max-fs. The value of max-fs MUST be greater than or
+ equal to the value of MaxFS given in Table A-1 of [1] for the
+ highest level. Senders MAY use this knowledge to send larger
+ pictures at a proportionally lower frame rate than is indicated
+ in the signaled highest level.
+
+ max-cpb: The value of max-cpb is an integer indicating the maximum
+ coded picture buffer size in units of 1000 bits for the VCL HRD
+ parameters and in units of 1200 bits for the NAL HRD
+ parameters. Note that this parameter does not use units of
+ cpbBrVclFactor and cpbBrNALFactor (see Table A-1 of [1]). The
+ max-cpb parameter signals that the receiver has more memory
+ than the minimum amount of coded picture buffer memory required
+ by the signaled highest level conveyed in the value of the
+ profile-level-id parameter or the max-recv-level parameter.
+ When max-cpb is signaled, the receiver MUST be able to decode
+ NAL unit streams that conform to the signaled highest level,
+ with the exception that the MaxCPB value in Table A-1 of [1]
+ for the signaled highest level is replaced with the value of
+ max-cpb (after taking cpbBrVclFactor and cpbBrNALFactor into
+ consideration when needed). The value of max-cpb (after taking
+ cpbBrVclFactor and cpbBrNALFactor into consideration when
+ needed) MUST be greater than or equal to the value of MaxCPB
+ given in Table A-1 of [1] for the highest level. Senders MAY
+ use this knowledge to construct coded video streams with
+ greater variation of bitrate than can be achieved with the
+ MaxCPB value in Table A-1 of [1].
+
+ Informative note: The coded picture buffer is used in the
+ hypothetical reference decoder (Annex C of H.264). The use
+ of the hypothetical reference decoder is recommended in
+ H.264 encoders to verify that the produced bitstream
+ conforms to the standard and to control the output bitrate.
+ Thus, the coded picture buffer is conceptually independent
+ of any other potential buffers in the receiver, including
+ de-interleaving and de-jitter buffers. The coded picture
+ buffer need not be implemented in decoders as specified in
+ Annex C of H.264, but rather standard-compliant decoders can
+ have any buffering arrangements provided that they can
+ decode standard-compliant bitstreams. Thus, in practice,
+ the input buffer for a video decoder can be integrated with
+ de-interleaving and de-jitter buffers of the receiver.
+
+ max-dpb: The value of max-dpb is an integer indicating the maximum
+ decoded picture buffer size in units of 8/3 macroblocks. The
+ max-dpb parameter signals that the receiver has more memory
+ than the minimum amount of decoded picture buffer memory
+ required by the signaled highest level conveyed in the value of
+ the profile-level-id parameter or the max-recv-level parameter.
+ When max-dpb is signaled, the receiver MUST be able to decode
+ NAL unit streams that conform to the signaled highest level,
+ with the exception that the MaxDpbMbs value in Table A-1 of [1]
+ for the signaled highest level is replaced with the value of
+ max-dpb * 3 / 8. Consequently, a receiver that signals max-dpb
+ MUST be capable of storing the following number of decoded
+ frames, complementary field pairs, and non-paired fields in its
+ decoded picture buffer:
+
+ Min(max-dpb * 3 / 8 / ( PicWidthInMbs * FrameHeightInMbs),
+ 16)
+
+ Wherein PicWidthInMbs and FrameHeightInMbs are defined in [1].
+
+ The value of max-dpb MUST be greater than or equal to the value
+ of MaxDpbMbs * 3 / 8, wherein the value of MaxDpbMbs is given
+ in Table A-1 of [1] for the highest level. Senders MAY use
+ this knowledge to construct coded video streams with improved
+ compression.
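+
As a rough sketch of the frame-count formula above, the following helper converts a max-dpb value into the number of frames the receiver must be able to store. The function name and the sample picture sizes are assumptions, and truncating the division to a whole number of frames is an interpretation rather than something this text states.

```python
def dpb_capacity_frames(max_dpb, pic_width_in_mbs, frame_height_in_mbs):
    """Frames a receiver signaling max-dpb must be able to store.

    max_dpb is in units of 8/3 macroblocks, so max_dpb * 3 / 8 is the
    replacement MaxDpbMbs value. Integer division (an assumption) keeps
    the result a whole number of frames; the cap of 16 is from the text.
    """
    frame_size_in_mbs = pic_width_in_mbs * frame_height_in_mbs
    if frame_size_in_mbs <= 0:
        raise ValueError("picture dimensions must be positive")
    return min((max_dpb * 3) // (8 * frame_size_in_mbs), 16)


# Illustrative values: max-dpb=48000 corresponds to 18000 macroblocks.
print(dpb_capacity_frames(48000, 80, 45))  # 1280x720 picture: 5
print(dpb_capacity_frames(48000, 22, 18))  # 352x288 picture: capped at 16
```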
+
+ Informative note: This parameter was added primarily to
+ complement a similar codepoint in the ITU-T Recommendation
+ H.245, so as to facilitate signaling gateway designs. The
+ decoded picture buffer stores reconstructed samples. There
+ is no relationship between the size of the decoded picture
+ buffer and the buffers used in RTP, especially
+ de-interleaving and de-jitter buffers.
+
+ Informative note: In RFC 3984, which this document
+ obsoletes, the unit of this parameter was 1024 bytes. The
+ unit has been changed to 8/3 macroblocks in this document.
+ This change was made because of the differences between the
+ 2003 version of the H.264 specification referenced by RFC
+ 3984 and the 2010 version of the H.264 specification
+ referenced by this document, particularly the changes to
+ Table A-1 in the H.264 specification due to the addition of
+ color formats and bit depths not supported earlier. The
+ changed semantics of this parameter keep backward
+ compatibility with RFC 3984 and support all profiles defined
+ in the 2010 version of the H.264 specification.
+
+ max-br: The value of max-br is an integer indicating the maximum
+ video bitrate in units of 1000 bits per second for the VCL HRD
+ parameters and in units of 1200 bits per second for the NAL HRD
+ parameters. Note that this parameter does not use units of
+ cpbBrVclFactor and cpbBrNALFactor (see Table A-1 of [1]).
+
+ The max-br parameter signals that the video decoder of the
+ receiver is capable of decoding video at a higher bitrate than
+ is required by the signaled highest level conveyed in the value
+ of the profile-level-id parameter or the max-recv-level
+ parameter.
+
+ When max-br is signaled, the video codec of the receiver MUST
+ be able to decode NAL unit streams that conform to the signaled
+ highest level, with the following exceptions in the limits
+ specified by the highest level:
+
+ o The value of max-br (after taking cpbBrVclFactor and
+ cpbBrNALFactor into consideration when needed) replaces the
+ MaxBR value in Table A-1 of [1] for the highest level.
+
+ o When the max-cpb parameter is not present, the result of the
+ following formula replaces the value of MaxCPB in Table A-1
+ of [1]: (MaxCPB of the signaled level) * max-br / (MaxBR of
+ the signaled highest level).
+
+ For example, if a receiver signals capability for Main profile
+ Level 1.2 with max-br equal to 1550, this indicates a maximum
+ video bitrate of 1550 kbits/sec for VCL HRD parameters, a
+ maximum video bitrate of 1860 kbits/sec for NAL HRD parameters,
+ and a CPB size of 4036458 bits (1550000 / 384000 * 1000 *
+ 1000).
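+
The arithmetic in this example can be reproduced with a short sketch. The function name and return layout are assumptions; the level limits (MaxBR 384 and MaxCPB 1000 for Level 1.2) are the ones implied by the calculation in the text, and truncating the CPB size to whole bits is an assumption that matches the quoted figure.

```python
def max_br_limits(max_br, level_max_br, level_max_cpb, max_cpb=None):
    """Derive bitrate and CPB limits from a signaled max-br value.

    max_br: signaled max-br (1000 bits/s for VCL, 1200 bits/s for NAL).
    level_max_br / level_max_cpb: MaxBR and MaxCPB from Table A-1 for
        the signaled highest level.
    max_cpb: signaled max-cpb, or None when the parameter is absent.
    Returns (VCL bitrate in bits/s, NAL bitrate in bits/s,
             VCL CPB size in bits).
    """
    if max_br < level_max_br:
        raise ValueError("max-br must be >= MaxBR of the signaled level")
    vcl_bitrate = max_br * 1000
    nal_bitrate = max_br * 1200
    if max_cpb is None:
        # MaxCPB is scaled by max-br / MaxBR when max-cpb is not present.
        vcl_cpb_bits = level_max_cpb * max_br * 1000 // level_max_br
    else:
        vcl_cpb_bits = max_cpb * 1000
    return vcl_bitrate, nal_bitrate, vcl_cpb_bits


# Main profile, Level 1.2, max-br=1550, as in the example above.
print(max_br_limits(1550, 384, 1000))  # (1550000, 1860000, 4036458)
```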
+
+ The value of max-br (after taking cpbBrVclFactor and
+ cpbBrNALFactor into consideration when needed) MUST be greater
+ than or equal to the value MaxBR given in Table A-1 of [1] for
+ the signaled highest level.
+
+ Senders MAY use this knowledge to send higher bitrate video as
+ allowed in the level definition of Annex A of H.264 to achieve
+ improved video quality.
+
+ Informative note: This parameter was added primarily to
+ complement a similar codepoint in the ITU-T Recommendation
+ H.245, so as to facilitate signaling gateway designs. The
+ assumption that the network is capable of handling such
+ bitrates at any given time cannot be made from the value of
+ this parameter. In particular, no conclusion can be drawn
+ that the signaled bitrate is possible under congestion
+ control constraints.
+
+ redundant-pic-cap:
+ This parameter signals the capabilities of a receiver
+ implementation. When equal to 0, the parameter indicates that
+ the receiver makes no attempt to use redundant coded pictures
+ to correct incorrectly decoded primary coded pictures. When
+ equal to 0, the receiver is not capable of using redundant
+ slices; therefore, a sender SHOULD avoid sending redundant
+ slices to save bandwidth. When equal to 1, the receiver is
+ capable of decoding any such redundant slice that covers a
+ corrupted area in a primary decoded picture (at least partly),
+ and therefore a sender MAY send redundant slices. When the
+ parameter is not present, a value of 0 MUST be used for
+ redundant-pic-cap. When present, the value of redundant-pic-
+ cap MUST be either 0 or 1.
+
+ When the profile-level-id parameter is present in the same
+ signaling as the redundant-pic-cap parameter and the profile
+ indicated in profile-level-id is such that it disallows the use
+ of redundant coded pictures (e.g., Main profile), the value of
+ redundant-pic-cap MUST be equal to 0. When a receiver
+ indicates redundant-pic-cap equal to 0, the received stream
+ SHOULD NOT contain redundant coded pictures.
+
+ Informative note: Even if redundant-pic-cap is equal to 0,
+ the decoder is able to ignore redundant coded pictures
+ provided that the decoder supports a profile (Baseline,
+ Extended) in which redundant coded pictures are allowed.
+
+ Informative note: Even if redundant-pic-cap is equal to 1,
+ the receiver may also choose other error concealment
+ strategies to replace or complement decoding of redundant
+ slices.
+
+ sprop-parameter-sets:
+ This parameter MAY be used to convey any sequence and picture
+ parameter set NAL units (herein referred to as the initial
+ parameter set NAL units) that can be placed in the NAL unit
+ stream to precede any other NAL units in decoding order. The
+ parameter MUST NOT be used to indicate codec capability in any
+ capability exchange procedure. The value of the parameter is a
+ comma-separated (',') list of base64 [7] representations of
+ parameter set NAL units as specified in Sections 7.3.2.1 and
+ 7.3.2.2 of [1]. Note that the number of bytes in a parameter
+ set NAL unit is typically less than 10, but a picture parameter
+ set NAL unit can contain several hundred bytes.
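+
A receiver-side sketch of splitting a sprop-parameter-sets value might look like the following. The function name is hypothetical, and the byte strings in the usage example are shortened placeholders rather than decodable parameter sets; the only H.264 facts relied on are that the low five bits of the first NAL unit byte carry nal_unit_type and that types 7 and 8 denote sequence and picture parameter sets.

```python
import base64

SPS_NAL_TYPE = 7  # sequence parameter set
PPS_NAL_TYPE = 8  # picture parameter set


def parse_sprop_parameter_sets(value):
    """Split a sprop-parameter-sets value into (nal_unit_type, bytes)."""
    nal_units = []
    for item in value.split(","):
        nal = base64.b64decode(item, validate=True)
        if not nal:
            raise ValueError("empty parameter set NAL unit")
        nal_type = nal[0] & 0x1F  # low five bits of the NAL unit header
        if nal_type not in (SPS_NAL_TYPE, PPS_NAL_TYPE):
            raise ValueError("not a parameter set NAL unit: %d" % nal_type)
        nal_units.append((nal_type, nal))
    return nal_units


# Placeholder bytes, NOT real parameter sets: only the header bytes
# (0x67 = SPS, 0x68 = PPS) are meaningful here.
fake_sps = bytes([0x67, 0x42, 0xA0, 0x1E])
fake_pps = bytes([0x68, 0xCE])
value = ",".join(base64.b64encode(n).decode("ascii")
                 for n in (fake_sps, fake_pps))
print(parse_sprop_parameter_sets(value))
```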
+
+ Informative note: When several payload types are offered in
+ the SDP Offer/Answer model, each with its own sprop-
+ parameter-sets parameter, the receiver cannot assume that
+ those parameter sets do not use conflicting storage
+ locations (i.e., identical values of parameter set
+ identifiers). Therefore, a receiver should buffer all
+ sprop-parameter-sets and make them available to the decoder
+ instance that decodes a certain payload type.
+
+ The sprop-parameter-sets parameter MUST only contain parameter
+ sets that are conforming to the profile-level-id, i.e., the
+ subset of coding tools indicated by any of the parameter sets
+ MUST be equal to the default sub-profile, and the level
+ indicated by any of the parameter sets MUST be equal to the
+ default level.
+
+ sprop-level-parameter-sets:
+ This parameter MAY be used to convey any sequence and picture
+ parameter set NAL units (herein referred to as the initial
+ parameter set NAL units) that can be placed in the NAL unit
+ stream to precede any other NAL units in decoding order and
+ that are associated with one or more levels different than the
+ default level. The parameter MUST NOT be used to indicate
+ codec capability in any capability exchange procedure.
+
+ The sprop-level-parameter-sets parameter contains parameter
+ sets for one or more levels that are different than the default
+ level. All parameter sets associated with one level are
+ clustered and prefixed with a three-byte field that has the
+ same syntax as profile-level-id. This enables the receiver to
+ install the parameter sets for one level and discard the rest.
+ The three-byte field is named PLId, and all parameter sets
+ associated with one level are named PSL, which has the same
+ syntax as sprop-parameter-sets. Parameter sets for each level
+ are represented in the form of PLId:PSL, i.e., PLId followed by
+ a colon (':') and the base64 [7] representation of the initial
+ parameter set NAL units for the level. Each pair of PLId:PSLs
+ is also separated by a colon. Note that a PSL can contain
+ multiple parameter sets for that level, separated with commas
+ (',').
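+
The PLId:PSL layout described above could be parsed along these lines. This is a sketch under assumptions: the function name is invented, PLId is taken to be six hexadecimal digits (the same textual form as profile-level-id), and the sample NAL unit bytes are placeholders.

```python
import base64


def parse_sprop_level_parameter_sets(value):
    """Map each PLId (upper-case hex) to its list of parameter set NALs."""
    fields = value.split(":")
    if len(fields) % 2 != 0:
        raise ValueError("expected PLId:PSL pairs separated by colons")
    levels = {}
    # Even positions hold PLId fields, odd positions hold the PSLs.
    for plid, psl in zip(fields[0::2], fields[1::2]):
        if len(bytes.fromhex(plid)) != 3:
            raise ValueError("PLId must be three bytes: %r" % plid)
        levels[plid.upper()] = [base64.b64decode(item, validate=True)
                                for item in psl.split(",")]
    return levels


def b64(data):
    """Helper for the example: base64 text of a byte string."""
    return base64.b64encode(data).decode("ascii")


# Placeholder NAL unit bytes for two hypothetical levels.
example = ":".join([
    "42A00D", b64(b"\x67\x42\xa0\x0d") + "," + b64(b"\x68\xce"),
    "42A014", b64(b"\x67\x42\xa0\x14"),
])
print(parse_sprop_level_parameter_sets(example))
```

Because neither ':' nor ',' belongs to the base64 alphabet, plain string splitting is enough to separate the fields; a receiver can then keep the entry for the level it accepted and discard the rest.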
+
+ The subset of coding tools indicated by each PLId field MUST be
+ equal to the default sub-profile, and the level indicated by
+ each PLId field MUST be different than the default level. All
+ sequence parameter sets contained in each PSL MUST have the
+ three bytes from profile_idc to level_idc, inclusive, equal to
+ the preceding PLId.
+
+ Informative note: This parameter allows for efficient level
+ downgrade or upgrade in SDP Offer/Answer and out-of-band
+ transport of parameter sets simultaneously.
+
+ use-level-src-parameter-sets:
+ This parameter MAY be used to indicate a receiver capability.
+ The value MAY be equal to either 0 or 1. When the parameter is
+ not present, the value MUST be inferred to be equal to 0. The
+ value 0 indicates that the receiver does not understand the
+ sprop-level-parameter-sets parameter, does not understand the
+ "fmtp" source attribute as specified in Section 6.3 of [9],
+ will ignore sprop-level-parameter-sets when present, and will
+ ignore sprop-parameter-sets when conveyed using the "fmtp"
+ source attribute. The value 1 indicates that the receiver
+ understands the sprop-level-parameter-sets parameter,
+ understands the "fmtp" source attribute as specified in Section
+ 6.3 of [9], and is capable of using parameter sets contained in
+ the sprop-level-parameter-sets or contained in the sprop-
+ parameter-sets that is conveyed using the "fmtp" source
+ attribute.
+
+ Informative note: An RFC 3984 receiver does not understand
+ sprop-level-parameter-sets, use-level-src-parameter-sets, or
+ the "fmtp" source attribute as specified in Section 6.3 of
+ [9]. Therefore, during SDP Offer/Answer, an RFC 3984
+ receiver as the answerer will simply ignore sprop-level-
+ parameter-sets when present in an offer and sprop-parameter-
+ sets conveyed using the "fmtp" source attribute, as
+ specified in Section 6.3 of [9]. Assume that the offered
+ payload type was accepted at a level lower than the default
+ level. If the offered payload type included sprop-level-
+ parameter-sets or included sprop-parameter-sets conveyed
+ using the "fmtp" source attribute and if the offerer sees
+ that the answerer has not included use-level-src-parameter-
+ sets equal to 1 in the answer, the offerer knows that
+ in-band transport of parameter sets is needed.
+
+ in-band-parameter-sets:
+ This parameter MAY be used to indicate a receiver capability.
+ The value MAY be equal to either 0 or 1. The value 1 indicates
+ that the receiver discards out-of-band parameter sets in sprop-
+ parameter-sets and sprop-level-parameter-sets; therefore, the
+ sender MUST transmit all parameter sets in-band. The value 0
+ indicates that the receiver utilizes out-of-band parameter sets
+ included in sprop-parameter-sets and/or sprop-level-parameter-
+ sets. However, in this case, the sender MAY still choose to
+ send parameter sets in-band. When in-band-parameter-sets is
+ equal to 1, use-level-src-parameter-sets MUST NOT be present or
+ MUST be equal to 0. When the parameter is not present, this
+ receiver capability is not specified; therefore, the sender
+ MAY send parameter sets out-of-band only, in-band only, or
+ both.
+
+ level-asymmetry-allowed:
+ This parameter MAY be used in SDP Offer/Answer to indicate
+ whether level asymmetry, i.e., sending media encoded at a
+ different level in the offerer-to-answerer direction than the
+ level in the answerer-to-offerer direction, is allowed. The
+ value MAY be equal to either 0 or 1. When the parameter is not
+ present, the value MUST be inferred to be equal to 0. The
+ value 1 in both the offer and the answer indicates that level
+ asymmetry is allowed. The value of 0 in either the offer or
+ the answer indicates that level asymmetry is not allowed.
+
+ If level-asymmetry-allowed is equal to 0 (or not present) in
+ either the offer or the answer, level asymmetry is not allowed.
+ In this case, the level to use in the direction from the
+ offerer to the answerer MUST be the same as the level to use in
+ the opposite direction.
+
+ packetization-mode:
+ This parameter signals the properties of an RTP payload type or
+ the capabilities of a receiver implementation. Only a single
+ configuration point can be indicated; thus, when capabilities
+ to support more than one packetization-mode are declared,
+ multiple configuration points (RTP payload types) must be used.
+
+ When the value of packetization-mode is equal to 0 or
+ packetization-mode is not present, the single NAL mode MUST be
+ used. This mode is in use in standards using ITU-T
+ Recommendation H.241 [3] (see Section 12.1). When the value of
+ packetization-mode is equal to 1, the non-interleaved mode MUST
+ be used. When the value of packetization-mode is equal to 2,
+ the interleaved mode MUST be used. The value of packetization-
+ mode MUST be an integer in the range of 0 to 2, inclusive.
+
+ sprop-interleaving-depth:
+ This parameter MUST NOT be present when packetization-mode is
+ not present or the value of packetization-mode is equal to 0 or
+ 1. This parameter MUST be present when the value of
+ packetization-mode is equal to 2.
+
+ This parameter signals the properties of an RTP packet stream.
+ It specifies the maximum number of VCL NAL units that precede
+ any VCL NAL unit in the RTP packet stream in transmission order
+ and that follow the VCL NAL unit in decoding order.
+ Consequently, it is guaranteed that receivers can reconstruct
+ NAL unit decoding order when the buffer size for NAL unit
+ decoding order recovery is at least the value of sprop-
+ interleaving-depth + 1 in terms of VCL NAL units.
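+
One way to read this definition is as a count, for each VCL NAL unit, of the units sent before it that are decoded after it. The sketch below follows that reading; the function name and the list-of-positions input format are assumptions.

```python
def interleaving_depth(decoding_positions):
    """Compute sprop-interleaving-depth for a sequence of VCL NAL units.

    decoding_positions lists, in transmission order, the position of
    each VCL NAL unit in decoding order.
    """
    depth = 0
    for index, position in enumerate(decoding_positions):
        # Units transmitted earlier that follow this one in decoding order.
        sent_before_decoded_after = sum(
            1 for earlier in decoding_positions[:index] if earlier > position)
        depth = max(depth, sent_before_decoded_after)
    return depth


print(interleaving_depth([0, 2, 1, 3, 5, 4, 6, 8, 7]))  # 1
print(interleaving_depth([0, 1, 2, 3]))                 # 0 (no interleaving)
```

A receiver that holds sprop-interleaving-depth + 1 VCL NAL units can therefore always find the next unit in decoding order among the units it has buffered.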
+
+ The value of sprop-interleaving-depth MUST be an integer in the
+ range of 0 to 32767, inclusive.
+
+ sprop-deint-buf-req:
+ This parameter MUST NOT be present when packetization-mode is
+ not present or the value of packetization-mode is equal to 0 or
+ 1. It MUST be present when the value of packetization-mode is
+ equal to 2.
+
+ sprop-deint-buf-req signals the required size of the
+ de-interleaving buffer for the RTP packet stream. The value of
+ the parameter MUST be greater than or equal to the maximum
+ buffer occupancy (in units of bytes) required in such a
+ de-interleaving buffer that is specified in Section 7.2. It is
+ guaranteed that receivers can perform the de-interleaving of
+ interleaved NAL units into NAL unit decoding order, when the
+ de-interleaving buffer size is at least the value of sprop-
+ deint-buf-req in terms of bytes.
+
+ The value of sprop-deint-buf-req MUST be an integer in the
+ range of 0 to 4294967295, inclusive.
+
+ Informative note: sprop-deint-buf-req indicates the required
+ size of the de-interleaving buffer only. When network
+ jitter can occur, an appropriately sized jitter buffer must
+ be provisioned as well.
+
+ deint-buf-cap:
+ This parameter signals the capabilities of a receiver
+ implementation and indicates the amount of de-interleaving
+ buffer space in units of bytes that the receiver has available
+ for reconstructing the NAL unit decoding order. A receiver is
+ able to handle any stream for which the value of the sprop-
+ deint-buf-req parameter is smaller than or equal to this
+ parameter.
+
+ If the parameter is not present, then a value of 0 MUST be used
+ for deint-buf-cap. The value of deint-buf-cap MUST be an
+ integer in the range of 0 to 4294967295, inclusive.
+
+ Informative note: deint-buf-cap indicates the maximum
+ possible size of the de-interleaving buffer of the receiver
+ only. When network jitter can occur, an appropriately sized
+ jitter buffer must be provisioned as well.
+
+ sprop-init-buf-time:
+ This parameter MAY be used to signal the properties of an RTP
+ packet stream. The parameter MUST NOT be present if the value
+ of packetization-mode is equal to 0 or 1.
+
+ The parameter signals the initial buffering time that a
+ receiver MUST wait before starting decoding to recover the NAL
+ unit decoding order from the transmission order. The parameter
+ is the maximum value of (transmission time of a NAL unit -
+ decoding time of the NAL unit), assuming reliable and
+ instantaneous transmission, the same timeline for transmission
+ and decoding, and commencement of decoding when the first
+ packet arrives.
+
+ An example of specifying the value of sprop-init-buf-time
+ follows. A NAL unit stream is sent in the following
+ interleaved order, in which the value corresponds to the
+ decoding time and the transmission order is from left to right:
+
+ 0 2 1 3 5 4 6 8 7 ...
+
+ Assuming a steady transmission rate of NAL units, the
+ transmission times are:
+
+ 0 1 2 3 4 5 6 7 8 ...
+
+ Subtracting the decoding time from the transmission time
+ column-wise results in the following series:
+
+ 0 -1 1 0 -1 1 0 -1 1 ...
+
+ Thus, in terms of intervals of NAL unit transmission times, the
+ value of sprop-init-buf-time in this example is 1. The
+ parameter is coded as a non-negative base10 integer
+ representation in clock ticks of a 90-kHz clock. If the
+ parameter is not present, then no initial buffering time value
+ is defined. Otherwise, the value of sprop-init-buf-time MUST
+ be an integer in the range of 0 to 4294967295, inclusive.
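+
The worked example can be checked with a few lines of code. The sketch assumes, as the example does, one NAL unit per transmission interval; the function names and the 30-units-per-second rate used for the tick conversion are illustrative assumptions.

```python
def init_buf_time_intervals(decoding_times):
    """Largest (transmission time - decoding time) over all NAL units.

    decoding_times is listed in transmission order; the i-th NAL unit
    is assumed to be transmitted at time i (steady rate).
    """
    return max(tx_time - dec_time
               for tx_time, dec_time in enumerate(decoding_times))


def to_90khz_ticks(intervals, ticks_per_interval):
    """Convert a number of transmission intervals to 90-kHz clock ticks."""
    return intervals * ticks_per_interval


intervals = init_buf_time_intervals([0, 2, 1, 3, 5, 4, 6, 8, 7])
print(intervals)  # 1

# Assuming 30 NAL units per second, one interval is 90000 / 30 ticks.
print(to_90khz_ticks(intervals, 90000 // 30))  # 3000
```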
+
+ In addition to the signaled sprop-init-buf-time, receivers
+ SHOULD take into account the transmission delay jitter
+ buffering, including buffering for the delay jitter caused by
+ mixers, translators, gateways, proxies, traffic-shapers, and
+ other network elements.
+
+ sprop-max-don-diff:
+ This parameter MAY be used to signal the properties of an RTP
+ packet stream. It MUST NOT be used to signal transmitter,
+ receiver, or codec capabilities. The parameter MUST NOT be
+ present if the value of packetization-mode is equal to 0 or 1.
+ sprop-max-don-diff is an integer in the range of 0 to 32767,
+ inclusive. If sprop-max-don-diff is not present, the value of
+ the parameter is unspecified. sprop-max-don-diff is calculated
+ as follows:
+
+ sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)},
+ for any i and any j>i,
+
+ where i and j indicate the index of the NAL unit in the
+ transmission order and AbsDON denotes a decoding order number
+ of the NAL unit that does not wrap around to 0 after 65535. In
+ other words, AbsDON is calculated as follows: let m and n be
+ consecutive NAL units in transmission order. For the very
+ first NAL unit in transmission order (whose index is 0),
+ AbsDON(0) = DON(0). For other NAL units, AbsDON is calculated
+ as follows:
+
+ If DON(m) == DON(n), AbsDON(n) = AbsDON(m)
+
+ If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
+ AbsDON(n) = AbsDON(m) + DON(n) - DON(m)
+
+ If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
+ AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)
+
+ If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
+ AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n))
+
+ If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
+ AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))
+
+ where DON(i) is the decoding order number of the NAL unit
+ having index i in the transmission order. The decoding order
+ number is specified in Section 5.5.
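+
The AbsDON rules and the sprop-max-don-diff formula above translate fairly directly into code. The following is a sketch with hypothetical function names; treating a stream that is already in decoding order as having a difference of 0 is an assumption made so the result stays within the stated range of 0 to 32767.

```python
def abs_dons(dons):
    """Unwrap 16-bit DON values (in transmission order) into AbsDON."""
    if not dons:
        return []
    result = [dons[0]]  # AbsDON(0) = DON(0)
    for m, n in zip(dons, dons[1:]):
        prev = result[-1]
        if m == n:
            cur = prev
        elif m < n and n - m < 32768:
            cur = prev + n - m                 # forward, no wrap
        elif m > n and m - n >= 32768:
            cur = prev + 65536 - m + n         # forward across the wrap
        elif m < n and n - m >= 32768:
            cur = prev - (m + 65536 - n)       # backward across the wrap
        else:
            cur = prev - (m - n)               # backward, no wrap
        result.append(cur)
    return result


def sprop_max_don_diff(dons):
    """max{AbsDON(i) - AbsDON(j)} for any i and any j > i (floored at 0)."""
    best = 0
    highest_so_far = None
    for value in abs_dons(dons):
        if highest_so_far is not None:
            best = max(best, highest_so_far - value)
            highest_so_far = max(highest_so_far, value)
        else:
            highest_so_far = value
    return best


print(abs_dons([65534, 0, 65535, 1]))      # [65534, 65536, 65535, 65537]
print(sprop_max_don_diff([0, 2, 1, 3, 5, 4]))  # 1
```

Tracking the highest AbsDON seen so far gives the maximum difference in a single pass instead of comparing every pair of NAL units.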
+
+ Informative note: Receivers may use sprop-max-don-diff to
+ determine which NAL units in the receiver buffer can be
+ passed to the decoder.
+
+ max-rcmd-nalu-size:
+ This parameter MAY be used to signal the capabilities of a
+ receiver. The parameter MUST NOT be used for any other
+ purposes. The value of the parameter indicates the largest
+ NALU size in bytes that the receiver can handle efficiently.
+ The parameter value is a recommendation, not a strict upper
+ boundary. The sender MAY create larger NALUs but must be aware
+ that the handling of these may come at a higher cost than NALUs
+ conforming to the limitation.
+
+ The value of max-rcmd-nalu-size MUST be an integer in the range
+ of 0 to 4294967295, inclusive. If this parameter is not
+ specified, no known limitation to the NALU size exists.
+ Senders still have to consider the MTU size available between
+ the sender and the receiver and SHOULD run MTU discovery for
+ this purpose.
+
+ This parameter is motivated by, for example, an IP to H.223
+ video telephony gateway, where NALUs smaller than the H.223
+ transport data unit will be more efficient. A gateway may
+ terminate IP; thus, MTU discovery will normally not work beyond
+ the gateway.
+
+ Informative note: Setting this parameter to a lower than
+ necessary value may have a negative impact.
+
+ sar-understood:
+ This parameter MAY be used to indicate a receiver capability
+ and nothing else. The parameter indicates the maximum value of
+ aspect_ratio_idc (specified in [1]) smaller than 255 that the
+ receiver understands. Table E-1 of [1] specifies
+ aspect_ratio_idc equal to 0 as "unspecified"; 1 to 16,
+ inclusive, as specific Sample Aspect Ratios (SARs); 17 to 254,
+ inclusive, as "reserved"; and 255 as the Extended SAR, for
+ which SAR width and SAR height are explicitly signaled.
+ Therefore, a receiver with a decoder according to [1]
+ understands aspect_ratio_idc in the range of 1 to 16,
+ inclusive, and aspect_ratio_idc equal to 255, in the sense that
+ the receiver knows exactly what the SAR is. For such a
+ receiver, the value of sar-understood is 16. In the future, if
+ Table E-1 of [1] is extended, e.g., such that the SAR for
+ aspect_ratio_idc equal to 17 is specified, then for a receiver
+ with a decoder that understands the extension, the value of
+ sar-understood is 17. For a receiver with a decoder according
+ to the 2003 version of [1], the value of sar-understood is 13,
+ as the minimum reserved aspect_ratio_idc therein is 14.
+
+ When sar-understood is not present, the value MUST be inferred
+ to be equal to 13.
+
+ sar-supported:
+ This parameter MAY be used to indicate a receiver capability
+ and nothing else. The value of this parameter is an integer in
+ the range of 1 to sar-understood, inclusive, or equal to 255. The
+ value of sar-supported equal to N smaller than 255 indicates
+ that the receiver supports all the SARs corresponding to H.264
+ aspect_ratio_idc values (see Table E-1 of [1]) in the range
+ from 1 to N, inclusive, without geometric distortion. The
+ value of sar-supported equal to 255 indicates that the receiver
+ supports all sample aspect ratios that are expressible using
+ two 16-bit integer values as the numerator and denominator,
+ i.e., those that are expressible using the H.264
+ aspect_ratio_idc value of 255 (Extended_SAR, see Table E-1 of
+ [1]), without geometric distortion.
+
+ H.264-compliant encoders SHOULD NOT send an aspect_ratio_idc
+ equal to 0 or an aspect_ratio_idc larger than sar-understood
+ and smaller than 255. H.264-compliant encoders SHOULD send an
+ aspect_ratio_idc that the receiver is able to display without
+ geometrical distortion. However, H.264-compliant encoders MAY
+ choose to send pictures using any SAR.
+
+ Note that the actual sample aspect ratio or extended sample
+ aspect ratio, when present, of the stream is conveyed in the
+ Video Usability Information (VUI) part of the sequence
+ parameter set.
+
+ Encoding considerations:
+ This type is only defined for transfer via RTP (RFC 3550).
+
+ Security considerations:
+ See Section 9 of RFC 6184.
+
+ Public specification:
+ Please refer to RFC 6184 and its Section 17.
+
+ Additional information:
+ None
+
+ File extensions: none
+
+ Macintosh file type code: none
+
+ Object identifier or OID: none
+
+ Person & email address to contact for further information:
+ Ye-Kui Wang, yekui.wang@huawei.com
+
+ Intended usage: COMMON
+
+ Author:
+ Ye-Kui Wang, yekui.wang@huawei.com
+
+ Change controller:
+ IETF Audio/Video Transport working group delegated from the
+ IESG.
+
+8.2. SDP Parameters
+
+ The receiver MUST ignore any parameter unspecified in this memo.
+
+8.2.1. Mapping of Payload Type Parameters to SDP
+
+ The media type video/H264 string is mapped to fields in the Session
+ Description Protocol (SDP) [6] as follows:
+
+ o The media name in the "m=" line of SDP MUST be video.
+
+ o The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the
+ media subtype).
+
+ o The clock rate in the "a=rtpmap" line MUST be 90000.
+
+ o The OPTIONAL parameters profile-level-id, max-recv-level, max-
+ mbps, max-smbps, max-fs, max-cpb, max-dpb, max-br, redundant-pic-
+ cap, use-level-src-parameter-sets, in-band-parameter-sets, level-
+ asymmetry-allowed, packetization-mode, sprop-interleaving-depth,
+ sprop-deint-buf-req, deint-buf-cap, sprop-init-buf-time, sprop-
+ max-don-diff, max-rcmd-nalu-size, sar-understood, and sar-
+ supported, when present, MUST be included in the "a=fmtp" line of
+ SDP. These parameters are expressed as a media type string, in
+ the form of a semicolon-separated list of parameter=value pairs.
+
+ o The OPTIONAL parameters sprop-parameter-sets and sprop-level-
+ parameter-sets, when present, MUST be included in the "a=fmtp"
+ line of SDP or conveyed using the "fmtp" source attribute as
+ specified in Section 6.3 of [9]. For a particular media format
+ (i.e., RTP payload type), a sprop-parameter-sets or sprop-level-
+ parameter-sets MUST NOT be both included in the "a=fmtp" line of
+ SDP and conveyed using the "fmtp" source attribute. When included
+ in the "a=fmtp" line of SDP, these parameters are expressed as a
+ media type string, in the form of a semicolon-separated list of
+ parameter=value pairs. When conveyed using the "fmtp" source
+ attribute, these parameters are only associated with the given
+ source and payload type as parts of the "fmtp" source attribute.
+
+ Informative note: Conveyance of sprop-parameter-sets and sprop-
+ level-parameter-sets using the "fmtp" source attribute allows
+ for out-of-band transport of parameter sets in topologies like
+ Topo-Video-switch-MCU [29].
+
+ An example of media representation in SDP is as follows (Baseline
+ profile, Level 3.0, some of the constraints of the Main profile may
+ not be obeyed):
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E;
+ packetization-mode=1;
+ sprop-parameter-sets=<parameter sets data>
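+
For illustration, an "a=fmtp" line such as the one above could be split into its parameter=value pairs as sketched below. The function names are hypothetical, the sketch assumes the attribute has already been joined onto a single line, and the base64 value in the usage example is a placeholder rather than real parameter set data.

```python
def parse_h264_fmtp(line):
    """Return (payload type, {parameter: value}) for an a=fmtp line."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute")
    payload_type, _, params_text = line[len(prefix):].partition(" ")
    params = {}
    for item in params_text.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first '=' only: base64 values may end in '='.
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip()
    return int(payload_type), params


def split_profile_level_id(value):
    """Return (profile_idc, profile-iop, level_idc) as integers."""
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be three bytes")
    return raw[0], raw[1], raw[2]


line = ("a=fmtp:98 profile-level-id=42A01E; packetization-mode=1; "
        "sprop-parameter-sets=Z0KgHg==,aM4=")
payload_type, params = parse_h264_fmtp(line)
print(payload_type, params["packetization-mode"])
print(split_profile_level_id(params["profile-level-id"]))  # (66, 160, 30)
```

Here profile_idc 66 is the Baseline profile and level_idc 30 is Level 3.0, matching the description of the example; unknown parameters are simply kept in the dictionary so that the caller can ignore them.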
+
+8.2.2. Usage with the SDP Offer/Answer Model
+
+ When H.264 is offered over RTP using SDP in an Offer/Answer model [8]
+ for negotiation for unicast usage, the following limitations and
+ rules apply:
+
+ o The parameters identifying a media format configuration for H.264
+ are profile-level-id and packetization-mode. These media format
+ configuration parameters (except for the level part of profile-
+ level-id) MUST be used symmetrically; that is, the answerer MUST
+ either maintain all configuration parameters or remove the media
+ format (payload type) completely if one or more of the parameter
+ values are not supported. Note that the level part of profile-
+ level-id includes level_idc, and, for indication of Level 1b when
+ profile_idc is equal to 66, 77, or 88, bit 4
+ (constraint_set3_flag) of profile-iop. The level part of profile-
+ level-id is changeable.
+
+ Informative note: The requirement for symmetric use does not
+ apply for the level part of profile-level-id and does not apply
+ for the other stream properties and capability parameters.
+
+ Informative note: In H.264 [1], all the levels except for Level
+ 1b are equal to the value of level_idc divided by 10. Level 1b
+ is a level higher than Level 1.0 but lower than Level 1.1 and
+ is signaled in an ad hoc manner, because the level was
+ specified after Level 1.0 and Level 1.1. For the Baseline,
+ Main, and Extended profiles (with profile_idc equal to 66, 77,
+ and 88, respectively), Level 1b is indicated by level_idc equal
+ to 11 (i.e., same as Level 1.1) and constraint_set3_flag equal
+ to 1. For other profiles, Level 1b is indicated by level_idc
+ equal to 9 (but note that Level 1b for these profiles is still
+ higher than Level 1, which has level_idc equal to 10, and lower
+ than Level 1.1). In SDP Offer/Answer, an answer to an offer
+ may indicate a level equal to or lower than the level indicated
+ in the offer. Due to the ad hoc indication of Level 1b,
+ offerers and answerers must check the value of bit 4
+ (constraint_set3_flag) of the middle octet of the parameter
+ profile-level-id, when profile_idc is equal to 66, 77, or 88
+ and level_idc is equal to 11.
+
+ To simplify the handling and matching of these configurations, the
+ same RTP payload type number used in the offer SHOULD also be used
+ in the answer, as specified in [8]. An answer MUST NOT contain
+ the payload type number used in the offer unless the configuration
+ is exactly the same as in the offer.
+
+ Informative note: When an offerer receives an answer, it has to
+ compare payload types not declared in the offer based on the
+ media type (i.e., video/H264) and the above media configuration
+ parameters with any payload types it has already declared.
+ This will enable it to determine whether the configuration in
+ question is new or is equivalent to a configuration already
+ offered, since a different payload type number may be used in
+ the answer.
+
+ o When present, the parameter max-recv-level declares the highest
+ level supported for receiving. In case max-recv-level is not
+ present, the highest level supported for receiving is equal to the
+ default level indicated by the level part of profile-level-id.
+ When present, max-recv-level MUST be higher than the default
+ level.
+
+ o The parameter level-asymmetry-allowed indicates whether level
+ asymmetry is allowed.
+
+ If level-asymmetry-allowed is equal to 0 (or not present) in
+ either the offer or the answer, level asymmetry is not allowed.
+ In this case, the level to use in the direction from the offerer
+ to the answerer MUST be the same as the level to use in the
+ opposite direction, and the common level to use is equal to the
+ lower value of the default level in the offer and the default
+ level in the answer.
+
+ Otherwise, level-asymmetry-allowed equals 1 in both the offer and
+ the answer, and level asymmetry is allowed. In this case, the
+ level to use in the offerer-to-answerer direction MUST be equal to
+ the highest level the answerer supports for receiving, and the
+ level to use in the answerer-to-offerer direction MUST be equal to
+ the highest level the offerer supports for receiving.
+
+ When level asymmetry is not allowed, level upgrade is not allowed,
+ i.e., the default level in the answer MUST be equal to or lower
+ than the default level in the offer.
+
+ o The parameters sprop-deint-buf-req, sprop-interleaving-depth,
+ sprop-max-don-diff, and sprop-init-buf-time describe the
+ properties of the RTP packet stream that the offerer or answerer
+ is sending for the media format configuration. This differs from
+ the normal usage of the Offer/Answer parameters: normally such
+ parameters declare the properties of the stream that the offerer
+ or the answerer is able to receive. When dealing with H.264, the
+ offerer assumes that the answerer will be able to receive media
+ encoded using the configuration being offered.
+
+ Informative note: The above parameters apply for any stream
+ sent by a declaring entity with the same configuration; i.e.,
+ they are dependent on their source. Rather than being bound to
+ the payload type, the values may have to be applied to another
+ payload type when being sent, as they apply for the
+ configuration.
+
+ o The capability parameters max-mbps, max-smbps, max-fs, max-cpb,
+ max-dpb, max-br, redundant-pic-cap, max-rcmd-nalu-size, sar-
+ understood, and sar-supported MAY be used to declare further
+ capabilities of the offerer or answerer for receiving. Because
+ these parameters describe the limitations of what the offerer or
+ answerer accepts for receiving streams, they MUST NOT be present
+ when the direction attribute is "sendonly".
+
+ o An offerer has to include the size of the de-interleaving buffer,
+ sprop-deint-buf-req, in the offer for an interleaved H.264 stream.
+ To enable the offerer and answerer to inform each other about
+ their capabilities for de-interleaving buffering in receiving
+ streams, both parties are RECOMMENDED to include deint-buf-cap.
+ For interleaved streams, it is also RECOMMENDED to consider
+ offering multiple payload types with different buffering
+ requirements when the capabilities of the receiver are unknown.
+
+ o The sprop-parameter-sets or sprop-level-parameter-sets parameter,
+ when present (included in the "a=fmtp" line of SDP or conveyed
+ using the "fmtp" source attribute as specified in Section 6.3 of
+ [9]), is used for out-of-band transport of parameter sets.
+ However, when out-of-band transport of parameter sets is used,
+ parameter sets MAY still be additionally transported in-band.
+
+ The answerer MAY use either out-of-band or in-band transport of
+ parameter sets for the stream it is sending, regardless of whether
+ out-of-band parameter sets transport has been used in the offerer-
+ to-answerer direction. Parameter sets included in an answer are
+ independent of those parameter sets included in the offer, as they
+ are used for decoding two different video streams, one from the
+ answerer to the offerer and the other in the opposite direction.
+
+ The following rules apply to transport of parameter sets in the
+ offerer-to-answerer direction.
+
+ o An offer MAY include either or both of sprop-parameter-sets
+ and sprop-level-parameter-sets. If neither sprop-parameter-
+ sets nor sprop-level-parameter-sets is present in the offer,
+ then only in-band transport of parameter sets is used.
+
+ o If the answer includes in-band-parameter-sets equal to 1,
+ then the offerer MUST transmit parameter sets in-band.
+ Otherwise, the following applies.
+
+ o If the level to use in the offerer-to-answerer
+ direction is equal to the default level in the offer,
+ the following applies.
+
+ When there is a sprop-parameter-sets included in
+ the "a=fmtp" line in the offer, the answerer MUST
+ be prepared to use the parameter sets included in
+ the sprop-parameter-sets for decoding the incoming
+ NAL unit stream.
+
+ When there is a sprop-parameter-sets conveyed using
+ the "fmtp" source attribute in the offer, the
+ following applies. If the answer includes use-
+ level-src-parameter-sets equal to 1 or the "fmtp"
+ source attribute, the answerer MUST be prepared to
+ use the parameter sets included in the sprop-
+ parameter-sets for decoding the incoming NAL unit
+ stream; otherwise, the offerer MUST transmit
+ parameter sets in-band.
+
+ When sprop-parameter-sets is not present in the
+ offer, the offerer MUST transmit parameter sets in-
+ band.
+
+ The answerer MUST ignore sprop-level-parameter-
+ sets, when present (either included in the "a=fmtp"
+ line or conveyed using the "fmtp" source attribute)
+ in the offer.
+
+ o Otherwise, the level to use in the offerer-to-answerer
+ direction is not equal to the default level in the
+ offer, and the following applies.
+
+ The answerer MUST ignore sprop-parameter-sets, when
+ present (either included in the "a=fmtp" line or
+ conveyed using the "fmtp" source attribute) in the
+ offer.
+
+ When neither use-level-src-parameter-sets is equal
+ to 1 nor the "fmtp" source attribute is present in
+ the answer, the answerer MUST ignore sprop-level-
+ parameter-sets, when present in the offer, and the
+ offerer MUST transmit parameter sets in-band.
+
+ When either use-level-src-parameter-sets is equal
+ to 1 or the "fmtp" source attribute is present in
+ the answer, the answerer MUST be prepared to use
+ the parameter sets that are included in sprop-
+ level-parameter-sets for the accepted level (i.e.,
+ the default level in the answer), when present in
+ the offer, for decoding the incoming NAL unit
+ stream, and ignore all other parameter sets
+ included in sprop-level-parameter-sets.
+
+ When no parameter sets for the level to use in the
+ offerer-to-answerer direction are present in sprop-
+ level-parameter-sets in the offer, the offerer MUST
+ transmit parameter sets in-band.
+
+ The following rules apply to the transport of parameter sets in
+ the answerer-to-offerer direction.
+
+ o An answer MAY include either sprop-parameter-sets or sprop-
+ level-parameter-sets but MUST NOT include both. If neither
+ sprop-parameter-sets nor sprop-level-parameter-sets is
+ present in the answer, then only in-band transport of
+ parameter sets is used.
+
+ o If the offer includes in-band-parameter-sets equal to 1, the
+ answerer MUST NOT include sprop-parameter-sets or sprop-
+ level-parameter-sets in the answer and MUST transmit
+ parameter sets in-band. Otherwise, the following applies.
+
+ o If the level to use in the answerer-to-offerer
+ direction is equal to the default level in the answer,
+ the following applies.
+
+ When there is a sprop-parameter-sets included in
+ the "a=fmtp" line in the answer, the offerer MUST
+ be prepared to use the parameter sets included in
+ the sprop-parameter-sets for decoding the incoming
+ NAL unit stream.
+
+ When there is a sprop-parameter-sets conveyed using
+ the "fmtp" source attribute in the answer, the
+ following applies. If the offer includes use-
+ level-src-parameter-sets equal to 1 or the "fmtp"
+ source attribute, the offerer MUST be prepared to
+ use the parameter sets included in the sprop-
+ parameter-sets for decoding the incoming NAL unit
+ stream; otherwise, the answerer MUST transmit
+ parameter sets in-band.
+
+ When sprop-parameter-sets is not present in the
+ answer, the answerer MUST transmit parameter sets
+ in-band.
+
+ The offerer MUST ignore sprop-level-parameter-sets,
+ when present (either included in the "a=fmtp" line
+ or conveyed using the "fmtp" source attribute) in
+ the answer.
+
+ o Otherwise, the level to use in the answerer-to-offerer
+ direction is not equal to the default level in the
+ answer, and the following applies.
+
+ The offerer MUST ignore sprop-parameter-sets when
+ present (either included in the "a=fmtp" line of
+ SDP or conveyed using the "fmtp" source attribute)
+ in the answer.
+
+ When neither use-level-src-parameter-sets is equal
+ to 1 nor the "fmtp" source attribute is present in
+ the offer, the offerer MUST ignore sprop-level-
+ parameter-sets, when present, and the answerer MUST
+ transmit parameter sets in-band.
+
+ When either use-level-src-parameter-sets is equal
+ to 1 or the "fmtp" source attribute is present in
+ the offer, the offerer MUST be prepared to use the
+ parameter sets that are included in sprop-level-
+ parameter-sets for the level to use in the
+ answerer-to-offerer direction, when present in the
+ answer, for decoding the incoming NAL unit stream,
+ and ignore all other parameter sets included in
+ sprop-level-parameter-sets in the answer.
+
+ When no parameter sets for the level to use in the
+ answerer-to-offerer direction are present in sprop-
+ level-parameter-sets in the answer, the answerer
+ MUST transmit parameter sets in-band.
+
+ When sprop-parameter-sets or sprop-level-parameter-sets is
+ conveyed using the "fmtp" source attribute as specified in Section
+ 6.3 of [9], the receiver of the parameters MUST store the
+ parameter sets included in the sprop-parameter-sets or sprop-
+ level-parameter-sets for the accepted level and associate them
+ with the source given as a part of the "fmtp" source attribute.
+ Parameter sets associated with one source MUST only be used to
+ decode NAL units conveyed in RTP packets from the same source.
+ When this mechanism is in use, SSRC collision detection and
+ resolution MUST be performed as specified in [9].
+
+ Informative note: Conveyance of sprop-parameter-sets and sprop-
+ level-parameter-sets using the "fmtp" source attribute may be
+ used in topologies like Topo-Video-switch-MCU [29] to enable
+ out-of-band transport of parameter sets.
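+
+ Informative sketch, not part of the RFC text: the rules above for
+ the offerer-to-answerer direction can be read as a small decision
+ function. The function name, argument names, and return strings
+ are hypothetical; levels are compared only for equality, so any
+ consistent labels work. The answerer-to-offerer direction follows
+ the same pattern with the roles swapped, plus the restrictions on
+ what the answer may include.

```python
IN_BAND = "in-band"
SPROP = "sprop-parameter-sets"
SPROP_LEVEL = "sprop-level-parameter-sets"


def offerer_parameter_set_source(
    offer_default_level,
    level_to_use,
    sprop_in_fmtp_line=False,       # offer: sprop-parameter-sets in "a=fmtp"
    sprop_in_source_attr=False,     # offer: sprop-parameter-sets via "fmtp" source attribute
    sprop_level_set_levels=(),      # offer: levels covered by sprop-level-parameter-sets
    answer_in_band=False,           # answer: in-band-parameter-sets=1
    answer_use_level_src=False,     # answer: use-level-src-parameter-sets=1
    answer_has_source_attr=False,   # answer: "fmtp" source attribute present
):
    """Which parameter sets the answerer decodes the offerer's stream with."""
    if answer_in_band:
        return IN_BAND
    answer_opted_in = answer_use_level_src or answer_has_source_attr
    if level_to_use == offer_default_level:
        # sprop-level-parameter-sets is ignored in this branch.
        if sprop_in_fmtp_line:
            return SPROP
        if sprop_in_source_attr and answer_opted_in:
            return SPROP
        return IN_BAND
    # Level changed: sprop-parameter-sets is ignored in this branch.
    if answer_opted_in and level_to_use in sprop_level_set_levels:
        return SPROP_LEVEL
    return IN_BAND
```

+ Every path that does not find usable out-of-band parameter sets
+ falls through to in-band transport, which matches the "offerer MUST
+ transmit parameter sets in-band" clauses in the list above.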
+
+ For streams being delivered over multicast, the following rules
+ apply:
+
+ o The media format configuration is identified by "profile-level-
+ id", including the level part, and packetization-mode. These
+ media format configuration parameters (including the level part of
+ profile-level-id) MUST be used symmetrically; that is, the
+ answerer MUST either maintain all configuration parameters or
+ remove the media format (payload type) completely. Note that this
+ implies that the level part of profile-level-id for Offer/Answer
+ in multicast is not changeable.
+
+ To simplify the handling and matching of these configurations, the
+ same RTP payload type number used in the offer SHOULD also be used
+ in the answer, as specified in [8]. An answer MUST NOT contain a
+ payload type number used in the offer unless the configuration is
+ the same as in the offer.
+
+ o Parameter sets received MUST be associated with the originating
+ source and MUST only be used in decoding the incoming NAL unit
+ stream from the same source.
+
+ o The rules for other parameters are the same as above for unicast
+ as long as the above rules are obeyed.
+
+ Table 6 lists the interpretation of all the media type parameters
+ that MUST be used for the different direction attributes.
+
+ Table 6. Interpretation of parameters for different direction
+ attributes
+
+                           sendonly --+
+                        recvonly --+  |
+                     sendrecv --+  |  |
+                                |  |  |
+ profile-level-id               C  C  P
+ max-recv-level                 R  R  -
+ packetization-mode             C  C  P
+ sprop-deint-buf-req            P  -  P
+ sprop-interleaving-depth       P  -  P
+ sprop-max-don-diff             P  -  P
+ sprop-init-buf-time            P  -  P
+ max-mbps                       R  R  -
+ max-smbps                      R  R  -
+ max-fs                         R  R  -
+ max-cpb                        R  R  -
+ max-dpb                        R  R  -
+ max-br                         R  R  -
+ redundant-pic-cap              R  R  -
+ deint-buf-cap                  R  R  -
+ max-rcmd-nalu-size             R  R  -
+ sar-understood                 R  R  -
+ sar-supported                  R  R  -
+ in-band-parameter-sets         R  R  -
+ use-level-src-parameter-sets   R  R  -
+ level-asymmetry-allowed        O  -  -
+ sprop-parameter-sets           S  -  S
+ sprop-level-parameter-sets     S  -  S
+
+ Legend:
+
+ C: configuration for sending and receiving streams
+ O: offer/answer mode
+ P: properties of the stream to be sent
+ R: receiver capabilities
+ S: out-of-band parameter sets
+ -: not usable (when present, SHOULD be ignored)
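+
+ Informative sketch, not part of the RFC text: Table 6 can be held
+ as a lookup table and used to drop parameters that are not usable
+ for a given direction attribute. The names TABLE_6, DIRECTIONS, and
+ usable_parameters are hypothetical, and dropping parameters that
+ do not appear in Table 6 is an assumption of this sketch.

```python
DIRECTIONS = ("sendrecv", "recvonly", "sendonly")

# One letter per direction, in the column order of Table 6.
TABLE_6 = {
    "profile-level-id": "CCP",
    "max-recv-level": "RR-",
    "packetization-mode": "CCP",
    "sprop-deint-buf-req": "P-P",
    "sprop-interleaving-depth": "P-P",
    "sprop-max-don-diff": "P-P",
    "sprop-init-buf-time": "P-P",
    "max-mbps": "RR-",
    "max-smbps": "RR-",
    "max-fs": "RR-",
    "max-cpb": "RR-",
    "max-dpb": "RR-",
    "max-br": "RR-",
    "redundant-pic-cap": "RR-",
    "deint-buf-cap": "RR-",
    "max-rcmd-nalu-size": "RR-",
    "sar-understood": "RR-",
    "sar-supported": "RR-",
    "in-band-parameter-sets": "RR-",
    "use-level-src-parameter-sets": "RR-",
    "level-asymmetry-allowed": "O--",
    "sprop-parameter-sets": "S-S",
    "sprop-level-parameter-sets": "S-S",
}


def usable_parameters(params, direction):
    """Keep only the parameters that Table 6 marks usable for `direction`."""
    column = DIRECTIONS.index(direction)  # ValueError for an unknown direction
    kept = {}
    for name, value in params.items():
        role = TABLE_6.get(name, "---")[column]  # unknown names are dropped
        if role != "-":
            kept[name] = value
    return kept
```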
+
+ Parameters used for declaring receiver capabilities are in general
+ downgradable; that is, they express the upper limit for a sender's
+ possible behavior. Thus, a sender MAY configure its encoder using
+ only values that are lower than or equal to the declared values of
+ these parameters.
+
+ Parameters declaring a configuration point are not changeable, with
+ the exception of the level part of the profile-level-id parameter for
+ unicast usage.
+
+ When a sender's capabilities are declared and non-downgradable
+ parameters are used in this declaration, these parameters express a
+ configuration that is acceptable for the sender to receive streams.
+ In order to achieve high interoperability levels, it is often
+ advisable to offer multiple alternative configurations, e.g., for the
+ packetization mode. It is impossible to offer multiple
+ configurations in a single payload type. Thus, when multiple
+ configuration offers are made, each offer requires its own RTP
+ payload type associated with the offer.
+
+ A receiver SHOULD understand all media type parameters, even if it
+ only supports a subset of the payload format's functionality. This
+ ensures that a receiver is capable of understanding when an offer to
+ receive media can be downgraded to what is supported by the receiver
+ of the offer.
+
+ An answerer MAY extend the offer with additional media format
+ configurations. However, to enable their usage, in most cases, a
+ second offer is required from the offerer to provide the stream
+ property parameters that the media sender will use. This also has
+ the effect that the offerer has to be able to receive this media
+ format configuration, not only to send it.
+
+ If an offerer wishes to have non-symmetric capabilities between
+ sending and receiving, the offerer can allow asymmetric levels via
+ level-asymmetry-allowed being equal to 1. Alternatively, the offerer
+ could offer different RTP sessions, i.e., different media lines
+ declared as "recvonly" and "sendonly", respectively. This may have
+ further implications on the system and may require additional
+ external semantics to associate the two media lines.
+
+8.2.3. Usage in Declarative Session Descriptions
+
+ When H.264 over RTP is offered with SDP in a declarative style, as in
+ Real Time Streaming Protocol (RTSP) [27] or Session Announcement
+ Protocol (SAP) [28], the following considerations are necessary.
+
+ o All parameters capable of indicating both stream properties and
+ receiver capabilities are used to indicate only stream properties.
+ For example, in this case, the parameter profile-level-id declares
+ only the values used by the stream, not the capabilities for
+ receiving streams. The result of this is that the following
+ interpretation of the parameters MUST be used:
+
+ Declaring actual configuration or stream properties:
+
+ - profile-level-id
+ - packetization-mode
+ - sprop-interleaving-depth
+ - sprop-deint-buf-req
+ - sprop-max-don-diff
+ - sprop-init-buf-time
+
+ Out-of-band transporting of parameter sets:
+
+ - sprop-parameter-sets
+ - sprop-level-parameter-sets
+
+ Not usable (when present, they SHOULD be ignored):
+
+ - max-mbps
+ - max-smbps
+ - max-fs
+ - max-cpb
+ - max-dpb
+ - max-br
+ - max-recv-level
+ - redundant-pic-cap
+ - max-rcmd-nalu-size
+ - deint-buf-cap
+ - sar-understood
+ - sar-supported
+ - in-band-parameter-sets
+ - level-asymmetry-allowed
+ - use-level-src-parameter-sets
+
+ o A receiver of the SDP is required to support all parameters and
+ values of the parameters provided; otherwise, the receiver MUST
+ reject (RTSP) or not participate in (SAP) the session. It falls
+ on the creator of the session to use values that are expected to
+ be supported by the receiving application.
+
+8.3. Examples
+
+ An SDP Offer/Answer exchange wherein both parties are expected to
+ send and receive could look like the following. Only the media-
+ codec-specific parts of the SDP are shown. Some lines are wrapped
+ due to text constraints.
+
+ Offerer -> Answerer SDP message:
+
+ m=video 49170 RTP/AVP 100 99 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E; packetization-mode=0;
+ sprop-parameter-sets=<parameter sets data#0>
+ a=rtpmap:99 H264/90000
+ a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
+ sprop-parameter-sets=<parameter sets data#1>
+ a=rtpmap:100 H264/90000
+ a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
+ sprop-parameter-sets=<parameter sets data#2>;
+ sprop-interleaving-depth=45; sprop-deint-buf-req=64000;
+ sprop-init-buf-time=102478; deint-buf-cap=128000
+
+ The above offer presents the same codec configuration in three
+ different packetization formats. Payload type 98 represents single
+ NALU mode, payload type 99 represents non-interleaved mode, and
+ payload type 100 indicates the interleaved mode. In the interleaved
+ mode case, the interleaving parameters that the offerer would use if
+ the answer indicates support for payload type 100 are also included.
+ In all three cases, the parameter sprop-parameter-sets conveys the
+ initial parameter sets that are required by the answerer when
+ receiving a stream from the offerer when this configuration is
+ accepted. Note that the value for sprop-parameter-sets could be
+ different for each payload type.
+
+ Answerer -> Offerer SDP message:
+
+ m=video 49170 RTP/AVP 100 99 97
+ a=rtpmap:97 H264/90000
+ a=fmtp:97 profile-level-id=42A01E; packetization-mode=0;
+ sprop-parameter-sets=<parameter sets data#3>
+ a=rtpmap:99 H264/90000
+ a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
+ sprop-parameter-sets=<parameter sets data#4>;
+ max-rcmd-nalu-size=3980
+ a=rtpmap:100 H264/90000
+ a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
+ sprop-parameter-sets=<parameter sets data#5>;
+ sprop-interleaving-depth=60;
+ sprop-deint-buf-req=86000; sprop-init-buf-time=156320;
+ deint-buf-cap=128000; max-rcmd-nalu-size=3980
+
+ As the Offer/Answer negotiation covers both sending and receiving
+ streams, an offer indicates the exact parameters for what the offerer
+ is willing to receive, whereas the answer indicates the same for what
+ the answerer is willing to receive. In this case, the offerer
+ declared that it is willing to receive payload type 98. The answerer
+ accepts this by declaring an equivalent payload type 97; that is, it
+ has identical values for the two parameters profile-level-id and
+ packetization-mode (since packetization-mode is equal to 0 and sprop-
+ deint-buf-req is not present). As the offered payload type 98 is
+ accepted, the answerer needs to store parameter sets included in
+ sprop-parameter-sets=<parameter sets data#0> in case the offerer
+ finally decides to use this configuration. In the answer, the
+ answerer includes the parameter sets in sprop-parameter-
+ sets=<parameter sets data#3> that the answerer would use in the
+ stream sent from the answerer if this configuration is finally used.
+
+ The answerer also accepts the reception of the two configurations
+ that payload types 99 and 100 represent. Again, the answerer needs
+ to store parameter sets included in sprop-parameter-sets=<parameter
+ sets data#1> and sprop-parameter-sets=<parameter sets data#2> in case
+ the offerer finally decides to use either of these two configurations.
+ The answerer provides the initial parameter sets for the answerer-to-
+ offerer direction, i.e., the parameter sets in sprop-parameter-
+ sets=<parameter sets data#4> and sprop-parameter-sets=<parameter sets
+ data#5>, for payload types 99 and 100, respectively, that it will use
+ to send the payload types. The answerer also provides the offerer
+ with its memory limit for de-interleaving operations by providing a
+ deint-buf-cap parameter. This is only useful if the offerer decides
+ on making a second offer, where it can take the new value into
+ account. The max-rcmd-nalu-size indicates that the answerer can
+ efficiently process NALUs up to the size of 3980 bytes. However,
+ there is no guarantee that the network supports this size.
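+
+ Informative sketch, not part of the RFC text: the matching of
+ payload type 97 in the answer to payload type 98 in the offer can
+ be expressed as a comparison of configuration keys. The function
+ names are hypothetical. The comparison is simplified: it assumes
+ profile-level-id is present, treats an absent packetization-mode as
+ 0, and compares profile-iop bit for bit apart from the bit that
+ belongs to the level part.

```python
def config_key(params):
    """Configuration identity: profile part of profile-level-id plus mode."""
    profile_idc, profile_iop, _level_idc = bytes.fromhex(params["profile-level-id"])
    if profile_idc in (66, 77, 88):
        # constraint_set3_flag (bit 4) belongs to the level part for these profiles.
        profile_iop &= 0xEF
    mode = int(params.get("packetization-mode", "0"))
    return profile_idc, profile_iop, mode


def match_answer_to_offer(offer, answer):
    """Map each answered payload type to the equivalent offered one, or None."""
    offered = {}
    for payload_type, params in offer.items():
        offered.setdefault(config_key(params), payload_type)
    return {
        payload_type: offered.get(config_key(params))
        for payload_type, params in answer.items()
    }


offer = {
    98: {"profile-level-id": "42A01E", "packetization-mode": "0"},
    99: {"profile-level-id": "42A01E", "packetization-mode": "1"},
    100: {"profile-level-id": "42A01E", "packetization-mode": "2"},
}
answer = {
    97: {"profile-level-id": "42A01E", "packetization-mode": "0"},
    99: {"profile-level-id": "42A01E", "packetization-mode": "1"},
    100: {"profile-level-id": "42A01E", "packetization-mode": "2"},
}
matches = match_answer_to_offer(offer, answer)
```

+ A value of None in the result would mark a configuration that the
+ answerer added and that was not in the offer.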
+
+ In the following example, the offer is accepted without level
+ downgrading (i.e., the default level, Level 3.0, is accepted), and
+ both sprop-parameter-sets and sprop-level-parameter-sets are present
+ in the offer. The answerer must ignore sprop-level-parameter-
+ sets=<parameter sets data#1> and store parameter sets in sprop-
+ parameter-sets=<parameter sets data#0> for decoding the incoming NAL
+ unit stream. The offerer must store the parameter sets in sprop-
+ parameter-sets=<parameter sets data#2> in the answer for decoding the
+ incoming NAL unit stream. Note that in this example, parameter sets
+ in sprop-parameter-sets=<parameter sets data#2> must be associated
+ with Level 3.0.
+
+ Offer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+ packetization-mode=1;
+ sprop-parameter-sets=<parameter sets data#0>;
+ sprop-level-parameter-sets=<parameter sets data#1>
+
+ Answer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+ packetization-mode=1;
+ sprop-parameter-sets=<parameter sets data#2>
+
+ In the following example, the offer (Baseline profile, Level 1.1) is
+ accepted with level downgrading (the accepted level is Level 1b), and
+ both sprop-parameter-sets and sprop-level-parameter-sets are present
+ in the offer. The answerer must ignore sprop-parameter-
+ sets=<parameter sets data#0> and all parameter sets not for the
+ accepted level (Level 1b) in sprop-level-parameter-sets=<parameter
+ sets data#1> and must store parameter sets for the accepted level
+ (Level 1b) in sprop-level-parameter-sets=<parameter sets data#1> for
+ decoding the incoming NAL unit stream. The offerer must store the
+ parameter sets in sprop-parameter-sets=<parameter sets data#2> in the
+ answer for decoding the incoming NAL unit stream. Note that in this
+ example, parameter sets in sprop-parameter-sets=<parameter sets
+ data#2> must be associated with Level 1b.
+
+ Offer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
+ packetization-mode=1;
+ sprop-parameter-sets=<parameter sets data#0>;
+ sprop-level-parameter-sets=<parameter sets data#1>
+
+ Answer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
+ packetization-mode=1;
+ sprop-parameter-sets=<parameter sets data#2>;
+ use-level-src-parameter-sets=1
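+
+ Informative sketch, not part of the RFC text: the profile-level-id
+ values in the example above (42A00B for Level 1.1, 42B00B for Level
+ 1b) differ only in bit 4 of the middle octet. The helper names
+ parse_profile_level_id and level_rank are hypothetical; the Level 1b
+ rules follow the informative note in Section 8.2.2.

```python
def parse_profile_level_id(value):
    """Return (profile_idc, profile_iop, level_idc, level_name)."""
    if len(value) != 6:
        raise ValueError("profile-level-id must be six hex digits")
    profile_idc, profile_iop, level_idc = bytes.fromhex(value)
    constraint_set3 = bool(profile_iop & 0x10)  # bit 4 of profile-iop
    if profile_idc in (66, 77, 88):
        # Baseline, Main, Extended: Level 1b shares level_idc 11 with Level 1.1.
        is_1b = level_idc == 11 and constraint_set3
    else:
        # Other profiles signal Level 1b with level_idc 9.
        is_1b = level_idc == 9
    name = "1b" if is_1b else "%d.%d" % divmod(level_idc, 10)
    return profile_idc, profile_iop, level_idc, name


def level_rank(level_name):
    """Sortable rank: Level 1b sits between Level 1.0 and Level 1.1."""
    return 10.5 if level_name == "1b" else float(level_name) * 10


offered = parse_profile_level_id("42A00B")
answered = parse_profile_level_id("42B00B")
is_downgrade = level_rank(answered[3]) < level_rank(offered[3])
```

+ Comparing level_idc alone would miss this downgrade, because both
+ values carry level_idc equal to 11.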
+
+ In the following example, the offer (Baseline profile, Level 1.1) is
+ accepted with level downgrading (the accepted level is Level 1b), and
+ both sprop-parameter-sets and sprop-level-parameter-sets are present
+ in the offer. However, the answerer is a legacy RFC 3984
+ implementation and does not understand sprop-level-parameter-sets;
+ hence, it does not include use-level-src-parameter-sets (which the
+ answerer does not understand either) in the answer. Therefore, the
+ answerer must ignore both sprop-parameter-sets=<parameter sets
+ data#0> and sprop-level-parameter-sets=<parameter sets data#1>, and
+ the offerer must transport parameter sets in-band.
+
+ Offer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
+ packetization-mode=1;
+ sprop-parameter-sets=<parameter sets data#0>;
+ sprop-level-parameter-sets=<parameter sets data#1>
+
+ Answer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
+ packetization-mode=1
+
+ In the following example, the offer is accepted without level
+ downgrading, and sprop-parameter-sets is present in the offer.
+ Parameter sets in sprop-parameter-sets=<parameter sets data#0> must
+ be stored and used by the encoder of the offerer and the decoder of
+ the answerer, and parameter sets in sprop-parameter-sets=<parameter
+ sets data#1> must be used by the encoder of the answerer and the
+ decoder of the offerer. Note that sprop-parameter-sets=<parameter
+ sets data#0> is basically independent of sprop-parameter-
+ sets=<parameter sets data#1>.
+
+ Offer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+ packetization-mode=1;
+ sprop-parameter-sets=<parameter sets data#0>
+
+ Answer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+ packetization-mode=1;
+ sprop-parameter-sets=<parameter sets data#1>
+
+ In the following example, the offer is accepted without level
+ downgrading, and neither sprop-parameter-sets nor sprop-level-
+ parameter-sets is present in the offer, meaning that there is no out-
+ of-band transmission of parameter sets, which then have to be
+ transported in-band.
+
+ Offer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+ packetization-mode=1
+
+ Answer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+ packetization-mode=1
+
+ In the following example, the offer is accepted with level
+ downgrading, and sprop-parameter-sets is present in the offer.
+ Because sprop-parameter-sets=<parameter sets data#0> contains
+ level_idc indicating Level 3.0 and the answerer wants Level 2.0, the
+ answerer must ignore it, and parameter sets must be transported
+ in-band.
+
+ Offer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+ packetization-mode=1;
+ sprop-parameter-sets=<parameter sets data#0>
+
+ Answer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
+ packetization-mode=1
+
+ In the following example, the offer is also accepted with level
+ downgrading, and neither sprop-parameter-sets nor sprop-level-
+ parameter-sets is present in the offer, meaning that there is no out-
+ of-band transmission of parameter sets, which then have to be
+ transported in-band.
+
+ Offer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+ packetization-mode=1
+
+ Answer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
+ packetization-mode=1
+
+ In the following example, the offer is accepted with level upgrading,
+ and neither sprop-parameter-sets nor sprop-level-parameter-sets is
+ present in the offer or the answer, meaning that there is no out-of-
+ band transmission of parameter sets, which then have to be
+ transported in-band. The level to use in the offerer-to-answerer
+ direction is Level 3.0, and the level to use in the answerer-to-
+ offerer direction is Level 2.0. The answerer is allowed to send at
+ any level up to and including Level 2.0, and the offerer is allowed
+ to send at any level up to and including Level 3.0.
+
+ Offer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
+ packetization-mode=1; level-asymmetry-allowed=1
+
+ Answer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+ packetization-mode=1; level-asymmetry-allowed=1
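+
+ Informative sketch, not part of the RFC text: the level selection
+ rules of Section 8.2.2 that produce the result in the example above
+ can be written as one function. The function and argument names
+ are hypothetical. Levels are passed as comparable numbers (here the
+ level_idc values 20 and 30); Level 1b would need a rank between 10
+ and 11, which this sketch leaves to the caller.

```python
def levels_to_use(offer_default, answer_default,
                  offer_asymmetry=False, answer_asymmetry=False,
                  offer_max_recv=None, answer_max_recv=None):
    """Return (offerer_to_answerer_level, answerer_to_offerer_level)."""
    if not (offer_asymmetry and answer_asymmetry):
        # Asymmetry must be allowed on both sides; otherwise one common level.
        if answer_default > offer_default:
            raise ValueError("level upgrade requires level asymmetry")
        common = min(offer_default, answer_default)
        return common, common
    # Highest level each side supports for receiving: max-recv-level when
    # present, otherwise the default level from profile-level-id.
    offerer_recv = offer_default if offer_max_recv is None else offer_max_recv
    answerer_recv = answer_default if answer_max_recv is None else answer_max_recv
    return answerer_recv, offerer_recv


# Offer at Level 2.0, answer at Level 3.0, asymmetry allowed by both.
to_answerer, to_offerer = levels_to_use(20, 30, True, True)
```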
+
+ In the following example, the offerer is a Multipoint Control Unit
+ (MCU) in a topology like Topo-Video-switch-MCU [29], offering
+ parameter sets received (using out-of-band transport) from three
+ other participants (B, C, and D) and receiving parameter sets from
+ the participant A, which is the answerer. The participants are
+ identified by their values of canonical name (CNAME), which are
+ mapped to different SSRC values. The same codec configuration is
+ used by all four participants. The participant A stores and
+ associates the parameter sets included in <parameter sets data#B>,
+ <parameter sets data#C>, and <parameter sets data#D> to participants
+ B, C, and D, respectively, and uses <parameter sets data#B> for
+ decoding NAL units carried in RTP packets originating from
+ participant B only, uses <parameter sets data#C> for decoding NAL
+ units carried in RTP packets originating from participant C only, and
+ uses <parameter sets data#D> for decoding NAL units carried in RTP
+ packets originating from participant D only.
+
+ Offer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=ssrc:SSRC-B cname:CNAME-B
+ a=ssrc:SSRC-C cname:CNAME-C
+ a=ssrc:SSRC-D cname:CNAME-D
+ a=ssrc:SSRC-B fmtp:98
+ sprop-parameter-sets=<parameter sets data#B>
+ a=ssrc:SSRC-C fmtp:98
+ sprop-parameter-sets=<parameter sets data#C>
+ a=ssrc:SSRC-D fmtp:98
+ sprop-parameter-sets=<parameter sets data#D>
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+ packetization-mode=1
+
+ Answer SDP:
+
+ m=video 49170 RTP/AVP 98
+ a=ssrc:SSRC-A cname:CNAME-A
+ a=ssrc:SSRC-A fmtp:98
+ sprop-parameter-sets=<parameter sets data#A>
+ a=rtpmap:98 H264/90000
+ a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
+ packetization-mode=1
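+
+ The per-source bookkeeping that participant A performs in this
+ example could be sketched as follows. This is only an illustration:
+ the class and function names are hypothetical, the sample bytes are
+ placeholders rather than valid parameter sets, and the sketch assumes
+ the sprop-parameter-sets value is a comma-separated list of
+ base64-encoded NAL units.
+
```python
import base64

# NAL unit types for sequence and picture parameter sets in H.264.
NAL_TYPE_NAMES = {7: "SPS", 8: "PPS"}


def decode_sprop_parameter_sets(value):
    """Decode a comma-separated base64 list into (kind, nal_bytes) pairs."""
    nal_units = []
    for item in value.split(","):
        nal = base64.b64decode(item)
        if not nal:
            raise ValueError("empty NAL unit in sprop-parameter-sets")
        nal_type = nal[0] & 0x1F  # low five bits of the NAL unit header
        nal_units.append((NAL_TYPE_NAMES.get(nal_type, "other"), nal))
    return nal_units


class ParameterSetStore:
    """Keeps parameter sets separate for each source (keyed by SSRC)."""

    def __init__(self):
        self._by_source = {}

    def store(self, ssrc, sprop_value):
        self._by_source[ssrc] = decode_sprop_parameter_sets(sprop_value)

    def for_source(self, ssrc):
        return self._by_source.get(ssrc, [])


# Placeholder bytes: only the first byte (the NAL header) is meaningful.
sps = bytes([0x67, 0x42, 0xA0, 0x1E])
pps = bytes([0x68, 0xCE])
value = ",".join(base64.b64encode(n).decode("ascii") for n in (sps, pps))

store = ParameterSetStore()
store.store("SSRC-B", value)
```
+
+ Keying the store by source keeps the parameter sets of B, C, and D
+ apart even when they reuse the same parameter set identifiers.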
+
+8.4. Parameter Set Considerations
+
+ The H.264 parameter sets are a fundamental part of the video codec
+ and vital to its operation (see Section 1.2). Due to their
+ characteristics and their importance for the decoding process, lost
+ or erroneously transmitted parameter sets can hardly be concealed
+ locally at the receiver. A reference to a corrupt parameter set
+ normally has fatal results for the decoding process. Corruption could
+ occur, for example, due to the erroneous transmission or loss of a
+ parameter set NAL unit but also due to the untimely transmission of a
+ parameter set update. A parameter set update refers to a change of
+ at least one parameter in a picture parameter set or sequence
+ parameter set for which the picture parameter set or sequence
+ parameter set identifier remains unchanged. Therefore, the following
+ recommendations are provided as a guideline for the implementer of
+ the RTP sender.
+
+ Parameter set NALUs can be transported using three different
+ principles:
+
+ A. Using a session control protocol (out-of-band) prior to the
+ actual RTP session.
+
+ B. Using a session control protocol (out-of-band) during an ongoing
+ RTP session.
+
+ C. Within the RTP packet stream in the payload (in-band) during an
+ ongoing RTP session.
+
+ It is recommended to implement principles A and B within a session
+ control protocol. SIP and SDP can be used as described in the SDP
+ Offer/Answer model and in the previous sections of this memo.
+ Section 8.2.2 includes a detailed discussion on transport of
+ parameter sets in-band or out-of-band in SDP Offer/Answer using media
+ type parameters sprop-parameter-sets, sprop-level-parameter-sets,
+ use-level-src-parameter-sets, and in-band-parameter-sets. This
+ section contains guidelines on how principles A and B should be
+ implemented within session control protocols. It is independent of
+ the particular protocol used. Principle C is supported by the RTP
+ payload format defined in this specification. There are topologies
+ like Topo-Video-switch-MCU [29] for which the use of principle C may
+ be desirable.
+
+ If in-band signaling of parameter sets is used, the picture and
+ sequence parameter set NALUs SHOULD be transmitted in the RTP payload
+ using a reliable method of delivering RTP (see below), as a loss
+ of a parameter set of either type will likely prevent decoding of a
+ considerable portion of the corresponding RTP packet stream.
+
+ If in-band signaling of parameter sets is used, the sender SHOULD
+ take the error characteristics into account and use mechanisms to
+ provide a high probability for delivering the parameter sets
+ correctly. Mechanisms that increase the probability for a correct
+ reception include packet repetition, FEC, and retransmission. The
+ use of an unreliable, out-of-band control protocol has disadvantages
+ similar to those of in-band signaling (possible loss) and, in
+ addition, may lead to difficulties in synchronization (see below).
+ Therefore, it is NOT RECOMMENDED.
+
+ Parameter sets MAY be added or updated during the lifetime of a
+ session using principles B and C. It is required that parameter sets
+ be present at the decoder prior to the NAL units that refer to them.
+ Update or addition of parameter sets can result in further problems;
+ therefore, the following recommendations should be considered.
+
+ - When parameter sets are added or updated, care SHOULD be taken to
+ ensure that any parameter set is delivered prior to its usage.
+ When new parameter sets are added, previously unused parameter set
+ identifiers are used. It is common that no synchronization is
+ present between out-of-band signaling and in-band traffic. If
+ out-of-band signaling is used, it is RECOMMENDED that a sender not
+ start sending NALUs requiring the added or updated parameter sets
+ prior to acknowledgement of delivery from the signaling protocol.
+
+ - When parameter sets are updated, the following synchronization
+ issue should be taken into account. When overwriting a parameter
+ set at the receiver, the sender has to ensure that the parameter
+ set in question is not needed by any NALU present in the network
+ or receiver buffers. Otherwise, decoding with a wrong parameter
+ set may occur. To lessen this problem, it is RECOMMENDED either
+ to overwrite only those parameter sets that have not been used for
+ a sufficiently long time (to ensure that all related NALUs have
+ been consumed) or to add a new parameter set instead (which may
+ have negative consequences for the efficiency of the video
+ coding).
+
+ Informative note: In some topologies like Topo-Video-switch-
+ MCU [29], the origin of the whole set of parameter sets may
+ come from multiple sources that may use non-unique parameter
+ set identifiers. In this case, an offer may overwrite an
+ existing parameter set if no other mechanism that enables
+ uniqueness of the parameter sets in the out-of-band channel
+ exists.
+
+ - In a multiparty session, one participant MUST associate parameter
+ sets coming from different sources with the source identification
+ whenever possible, e.g., by conveying out-of-band transported
+ parameter sets, as different sources typically use independent
+ parameter set identifier value spaces.
+
+ - Adding or modifying parameter sets by using both principles B and
+ C in the same RTP session may lead to inconsistencies of the
+ parameter sets because of the lack of synchronization between the
+ control and the RTP channel. Therefore, principles B and C MUST
+ NOT both be used in the same session unless sufficient
+ synchronization can be provided.
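+
+ The first recommendation above (do not send NALUs that need an added
+ or updated parameter set before the signaling protocol has
+ acknowledged its delivery) might be sketched on the sender side as
+ follows. The class and method names are hypothetical, and how the
+ acknowledgement is obtained depends on the signaling protocol in use.
+
```python
class OutOfBandUpdateGate:
    """Tracks parameter sets announced out-of-band but not yet acknowledged."""

    def __init__(self):
        self._pending = set()  # entries are (kind, parameter_set_id)

    def announce(self, kind, ps_id):
        """Record that an add/update was sent via the signaling protocol."""
        self._pending.add((kind, ps_id))

    def acknowledge(self, kind, ps_id):
        """Record that the signaling protocol confirmed delivery."""
        self._pending.discard((kind, ps_id))

    def may_send(self, referenced):
        """True if none of the referenced parameter sets is still pending."""
        return self._pending.isdisjoint(referenced)


gate = OutOfBandUpdateGate()
gate.announce("PPS", 3)
blocked = not gate.may_send([("SPS", 0), ("PPS", 3)])
gate.acknowledge("PPS", 3)
released = gate.may_send([("SPS", 0), ("PPS", 3)])
```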
+
+ In some scenarios (e.g., when only the subset of this payload format
+ specification corresponding to H.241 is used) or topologies, it is
+ not possible to employ out-of-band parameter set transmission. In
+ this case, parameter sets have to be transmitted in-band. Here, the
+ synchronization with the non-parameter-set-data in the bitstream is
+ implicit, but the possibility of a loss has to be taken into account.
+ The loss probability should be reduced using the mechanisms discussed
+ above. In case a loss of a parameter set is detected, recovery may
+ be achieved using a Decoder Refresh Point procedure, for example,
+ using RTCP feedback Full Intra Request (FIR) [30]. Two example
+ Decoder Refresh Point procedures are provided in the informative
+ Section 8.5.
+
+ - When parameter sets are initially provided using principle A and
+ then later added or updated in-band (principle C), there is a risk
+ associated with updating the parameter sets delivered out-of-band.
+ If receivers miss some in-band updates (for example, because of a
+ loss or a late tune-in), those receivers attempt to decode the
+ bitstream using outdated parameters. It is therefore RECOMMENDED
+ that parameter set IDs be partitioned between the out-of-band and
+ in-band parameter sets.
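+
+ One possible way to partition identifiers is sketched below for
+ picture parameter sets. The split point of 128 is an arbitrary
+ assumption made for illustration; only the overall identifier range
+ of 0 to 255 comes from H.264. The helper names are hypothetical.
+
```python
# pic_parameter_set_id ranges from 0 to 255 in H.264.
OUT_OF_BAND_PPS_IDS = range(0, 128)   # assumed: delivered via principle A
IN_BAND_PPS_IDS = range(128, 256)     # assumed: added or updated in-band


def channel_for_pps_id(pps_id):
    """Return which channel owns a picture parameter set identifier."""
    if pps_id in OUT_OF_BAND_PPS_IDS:
        return "out-of-band"
    if pps_id in IN_BAND_PPS_IDS:
        return "in-band"
    raise ValueError("pic_parameter_set_id must be in 0..255")


def may_update_in_band(pps_id):
    """In-band updates never touch identifiers delivered out-of-band."""
    return channel_for_pps_id(pps_id) == "in-band"
```
+
+ With such a partition, a receiver that misses an in-band update
+ still holds valid parameter sets for every identifier that was
+ delivered out-of-band.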
+
+8.5. Decoder Refresh Point Procedure Using In-Band Transport of
+ Parameter Sets (Informative)
+
+ When a sender with a video encoder according to [1] receives a
+ request for a decoder refresh point, the encoder shall enter the fast
+ update mode by using one of the procedures specified in Sections
+ 8.5.1 or 8.5.2. The procedure in Section 8.5.1 is the preferred
+ response in a lossless transmission environment. Both procedures
+ satisfy the requirement to enter the fast update mode for H.264 video
+ encoding.
+
+8.5.1. IDR Procedure to Respond to a Request for a Decoder Refresh
+ Point
+
+ This section gives one possible way to respond to a request for a
+ decoder refresh point.
+
+ The encoder shall, in the order presented here:
+
+ 1) Immediately prepare to send an IDR picture.
+
+ 2) Send a sequence parameter set to be used by the IDR picture to be
+ sent. The encoder may optionally also send other sequence
+ parameter sets.
+
+ 3) Send a picture parameter set to be used by the IDR picture to be
+ sent. The encoder may optionally also send other picture
+ parameter sets.
+
+ 4) Send the IDR picture.
+
+ 5) From this point forward in time, send any other sequence or
+ picture parameter sets that have not yet been sent in this
+ procedure, prior to their reference by any NAL unit, regardless of
+ whether such parameter sets were previously sent prior to
+ receiving the request for a decoder refresh point. As needed,
+ such parameter sets may be sent in a batch, one at a time, or in
+ any combination of these two methods. Parameter sets may be
+ re-sent at any time for redundancy. Caution should be taken when
+ parameter set updates are present, as described above in Section
+ 8.4.
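+
+ The sending order of steps 2) to 4) can be sketched as a simple
+ queue-building function. The function name and its arguments are
+ hypothetical; the NAL unit type values (7 for a sequence parameter
+ set, 8 for a picture parameter set, 5 for a coded slice of an IDR
+ picture) are those defined by H.264.
+
```python
NAL_IDR_SLICE = 5
NAL_SPS = 7
NAL_PPS = 8


def idr_refresh_sequence(active_sps, active_pps, idr_slices,
                         extra_sps=(), extra_pps=()):
    """Return (nal_type, payload) pairs in the order they should be sent."""
    sequence = [(NAL_SPS, active_sps)]                   # step 2
    sequence += [(NAL_SPS, sps) for sps in extra_sps]    # optional extras
    sequence.append((NAL_PPS, active_pps))               # step 3
    sequence += [(NAL_PPS, pps) for pps in extra_pps]    # optional extras
    sequence += [(NAL_IDR_SLICE, s) for s in idr_slices]  # step 4
    return sequence


order = [t for t, _ in idr_refresh_sequence(b"sps", b"pps", [b"s0", b"s1"])]
```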
+
+8.5.2. Gradual Recovery Procedure to Respond to a Request for a Decoder
+ Refresh Point
+
+ This section gives another possible way to respond to a request for a
+ decoder refresh point.
+
+ The encoder shall, in the order presented here:
+
+ 1) Send a recovery point SEI message (see Sections D.1.7 and D.2.7 of
+ [1]).
+
+ 2) Repeat any sequence and picture parameter sets that were sent
+ before the recovery point SEI message, prior to their reference by
+ a NAL unit.
+
+ The encoder shall ensure that the decoder has access to all reference
+ pictures for inter prediction of pictures at or after the recovery
+ point, which is indicated by the recovery point SEI message, in
+ output order, assuming that the transmission from now on is error-
+ free.
+
+ The value of the recovery_frame_cnt syntax element in the recovery
+ point SEI message should be small enough to ensure a fast recovery.
+
+ As needed, such parameter sets may be re-sent in a batch, one at a
+ time, or in any combination of these two methods. Parameter sets may
+ be re-sent at any time for redundancy. Caution should be taken when
+ parameter set updates are present, as described above in Section 8.4.
+
+9. Security Considerations
+
+ RTP packets using the payload format defined in this specification
+ are subject to the security considerations discussed in the RTP
+ specification [5] and in any appropriate RTP profile (for example,
+ [16]). This implies that confidentiality of the media streams is
+ achieved by encryption, for example, through the application of SRTP
+ [26]. Because the data compression used with this payload format is
+ applied end-to-end, any encryption needs to be performed after
+ compression. A potential denial-of-service threat exists for data
+ encodings using compression techniques that have non-uniform
+ receiver-end computational load. The attacker can inject
+ pathological datagrams into the stream that are complex to decode and
+ that cause the receiver to be overloaded. H.264 is particularly
+ vulnerable to such attacks, as it is extremely simple to generate
+ datagrams containing NAL units that affect the decoding process of
+ many future NAL units. Therefore, the usage of data origin
+ authentication and data integrity protection of at least the RTP
+ packet is RECOMMENDED, for example, with SRTP [26].
+
+ Note that the appropriate mechanism to ensure confidentiality and
+ integrity of RTP packets and their payloads is very dependent on the
+ application and on the transport and signaling protocols employed.
+ Thus, although SRTP is given as an example above, other possible
+ choices exist.
+
+ Decoders MUST exercise caution with respect to the handling of user
+ data SEI messages, particularly if they contain active elements, and
+ MUST restrict their domain of applicability to the presentation
+ containing the stream.
+
+ End-to-end security with either authentication, integrity, or
+ confidentiality protection will prevent a MANE from performing media-
+ aware operations other than discarding complete packets. In the case
+ of confidentiality protection, it will even be prevented from
+ discarding packets in a media-aware way. To be allowed to perform
+ its operations, a MANE is required to be a trusted entity that is
+ included in the security context establishment.
+
+10. Congestion Control
+
+ Congestion control for RTP SHALL be used in accordance with RFC 3550
+ [5] and with any applicable RTP profile, e.g., RFC 3551 [16]. If
+ best-effort service is being used, an additional requirement is that
+ users of this payload format MUST monitor packet loss to ensure that
+ the packet loss rate is within acceptable parameters. Packet loss is
+ considered acceptable if a TCP flow across the same network path, and
+ experiencing the same network conditions, would achieve an average
+ throughput, measured on a reasonable timescale, that is not less than
+ the throughput the RTP flow is achieving. This condition can be
+ satisfied by implementing congestion control mechanisms to adapt the
+ transmission rate (or the number of layers subscribed for a layered
+ multicast session) or by arranging for a receiver to leave the
+ session if the loss rate is unacceptably high.
+
+ The bitrate adaptation necessary for obeying the congestion control
+ principle is easily achievable when real-time encoding is used.
+ However, when pre-encoded content is being transmitted, bandwidth
+ adaptation requires the availability of more than one coded
+ representation of the same content, at different bitrates, or the
+ existence of non-reference pictures or sub-sequences [22] in the
+ bitstream. The switching between the different representations can
+ normally be performed in the same RTP session, e.g., by employing a
+ concept known as SI/SP slices of the Extended profile or by switching
+ streams at IDR picture boundaries. Only when non-downgradable
+ parameters (such as the profile part of the profile/level ID) are
+ required to be changed does it become necessary to terminate and
+ restart the media stream. This may be accomplished by using a
+ different RTP payload type.
+
+ MANEs MAY follow the suggestions outlined in Section 7.3 and remove
+ certain unusable packets from the packet stream when that stream was
+ damaged due to previous packet losses. This can help reduce the
+ network load in certain special cases.
+
+11. IANA Considerations
+
+ The H264 media subtype name specified by RFC 3984 has been updated as
+ defined in Section 8.1 of this memo.
+
+12. Informative Appendix: Application Examples
+
+ This payload specification is very flexible in its use, in order to
+ cover the extremely wide application space anticipated for H.264.
+ However, this great flexibility also makes it difficult for an
+ implementer to decide on a reasonable packetization scheme. Some
+ information on how to apply this specification to real-world
+ scenarios is likely to appear in the form of academic publications
+ and test model software with an accompanying description in the near
+ future. However, some preliminary usage scenarios are described here
+ as well.
+
+12.1. Video Telephony According to Annex A of ITU-T Recommendation
+ H.241
+
+ H.323-based video telephony systems that use H.264 as an optional
+ video compression scheme are required to support Annex A of H.241 [3]
+ as a packetization scheme. The packetization mechanism defined in
+ this Annex is technically identical with a small subset of this
+ specification.
+
+ When a system operates according to Annex A of H.241, parameter set
+ NAL units are sent in-band. Only single NAL unit packets are used.
+ Many such systems are not sending IDR pictures regularly, but only
+ when required by user interaction or by control protocol means, e.g.,
+ when switching between video channels in a Multipoint Control Unit or
+ for error recovery requested by feedback.
+
+12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit
+ Aggregation
+
+ The RTP part of this scheme is implemented and tested (though not the
+ control-protocol part; see below).
+
+ In most real-world video telephony applications, picture parameters
+ such as picture size or optional modes never change during the
+ lifetime of a connection. Therefore, all necessary parameter sets
+ (usually only one) are sent as a side effect of the capability
+ exchange/announcement process, e.g., according to the SDP syntax
+ specified in Section 8.2 of this document. As all necessary
+ parameter set information is established before the RTP session
+ starts, there is no need for sending any parameter set NAL units.
+ Slice data partitioning is not used either. Thus, the RTP packet
+ stream basically consists of NAL units that carry single coded
+ slices.
+
+ The encoder chooses the size of coded slice NAL units so that they
+ offer the best performance. Often, this is done by adapting the
+ coded slice size to the MTU size of the IP network. For small
+ picture sizes, this may result in a one-picture-per-one-packet
+ strategy. Intra refresh algorithms clean up the loss of packets and
+ the resulting drift-related artifacts.
+
+12.3. Video Telephony, Interleaved Packetization Using NAL Unit
+ Aggregation
+
+ This scheme allows better error concealment and is used in
+ H.263-based designs using RFC 4629 packetization [11]. It has been
+ implemented, and good results were reported [13].
+
+ The VCL encoder codes the source picture so that all macroblocks
+ (MBs) of one MB line are assigned to one slice. All slices with even
+ MB row addresses are combined into one STAP, and all slices with odd
+ MB row addresses are combined into another. Those STAPs are
+ transmitted as RTP packets. The establishment of the parameter sets
+ is performed as discussed above.
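+
+ The grouping of slices into two STAPs can be sketched as follows.
+ The function name is hypothetical, and the sketch only decides which
+ slices go into which aggregation packet; building the actual STAP
+ payload is omitted.
+
```python
def split_rows_into_staps(row_slices):
    """Group one-slice-per-MB-row slices by even and odd row address."""
    even = [s for row, s in enumerate(row_slices) if row % 2 == 0]
    odd = [s for row, s in enumerate(row_slices) if row % 2 == 1]
    return even, odd


# A CIF picture has 18 macroblock rows, hence 18 slices in this scheme.
slices = ["slice-%d" % row for row in range(18)]
even_stap, odd_stap = split_rows_into_staps(slices)
```
+
+ If one of the two packets is lost, every missing MB row still has
+ received neighbors above and below it, which is what makes the
+ concealment effective.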
+
+ Note that the use of STAPs is essential here, as the high number of
+ individual slices (18 for a Common Intermediate Format (CIF) picture)
+ would lead to unacceptably high IP/UDP/RTP header overhead (unless
+ the source coding tool FMO is used, which is not assumed in this
+ scenario). Furthermore, some wireless video transmission systems,
+ such as H.324M and the IP-based video telephony specified in 3GPP,
+ are likely to use a relatively small transport packet size. For
+ example, a typical MTU size of an H.223 AL3 SDU is around 100 bytes
+ [17]. Coding individual slices according to this packetization
+ scheme provides further advantage in communication between wired and
+ wireless networks, as individual slices are likely to be smaller than
+ the preferred maximum packet size of wireless systems. Consequently,
+ a gateway can convert the STAPs used in a wired network into several
+ RTP packets with only one NAL unit, which are preferred in a wireless
+ network, and vice versa.
+
+12.4. Video Telephony with Data Partitioning
+
+ This scheme has been implemented and has been shown to offer good
+ performance, especially at higher packet loss rates [13].
+
+ Data partitioning is known to be useful only when some form of
+ unequal error protection is available. Normally, in single-session
+ RTP environments, even error characteristics are assumed; that is,
+ the packet loss probability of all packets of the session is the same
+ statistically. However, there are means to reduce the packet loss
+ probability of individual packets in an RTP session. A FEC packet
+ according to RFC 5109 [18], for example, specifies which media
+ packets are associated with the FEC packet.
+
+ In all cases, the incurred overhead is substantial but is of the same
+ order of magnitude as the number of bits that would otherwise have
+ been spent for intra information. However, this mechanism does not
+ add any delay to the system.
+
+ Again, the complete parameter set establishment is performed through
+ control protocol means.
+
+12.5. Video Telephony or Streaming with FUs and Forward Error
+ Correction
+
+ This scheme has been implemented and has been shown to provide good
+ performance, especially at higher packet loss rates [19].
+
+ The most efficient means to combat packet losses for scenarios where
+ retransmissions are not applicable is forward error correction (FEC).
+ Although application layer, end-to-end use of FEC is often less
+ efficient than a FEC-based protection of individual links (especially
+ when links of different characteristics are in the transmission
+ path), application layer, end-to-end FEC is unavoidable in some
+ scenarios. RFC 5109 [18] provides means to use generic, application
+ layer, end-to-end FEC in packet loss environments. A binary forward
+ error correcting code is generated by applying the XOR operation to
+ the bits at the same bit position in different packets. The binary
+ code can be specified by the parameters (n,k), in which k is the
+ number of information packets used in the connection and n is the
+ total number of packets generated for k information packets; that is,
+ n-k parity packets are generated for k information packets.
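+
+ For the simplest case of a single parity packet (n=k+1), the XOR
+ construction described above can be sketched as follows. This is a
+ minimal illustration and not the RFC 5109 packet format: the function
+ names are hypothetical, headers are ignored, and shorter packets are
+ treated as if padded with zero bytes.
+
```python
def xor_parity(packets):
    """XOR the bytes at the same position across all packets."""
    length = max(len(p) for p in packets)
    parity = bytearray(length)  # shorter packets act as zero-padded
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single missing packet from the survivors and the parity."""
    return xor_parity(list(received) + [parity])


packets = [b"abcd", b"efgh", b"ijkl"]   # k = 3 information packets
parity = xor_parity(packets)            # n - k = 1 parity packet
rebuilt = recover_missing([packets[0], packets[2]], parity)
```
+
+ Because all three packets have the same length here, the rebuilt
+ packet equals the lost one exactly; with unequal lengths, the
+ original length would also have to be conveyed.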
+
+ When a code is used with parameters (n,k) within the RFC 5109
+ framework, the following properties are well known:
+
+ a) If applied over one RTP packet, RFC 5109 provides only packet
+ repetition.
+
+ b) RFC 5109 is most bitrate efficient if XOR-connected packets have
+ equal length.
+
+ c) At the same packet loss probability p and for a fixed k, the
+ greater the value of n, the smaller the residual error probability
+ becomes. For example, for a packet loss probability of 10%, k=1,
+ and n=2, the residual error probability is about 1%, whereas for
+ n=3, the residual error probability is about 0.1%.
+
+ d) At the same packet loss probability p and for a fixed code rate
+ k/n, the greater the value of n, the smaller the residual error
+ probability becomes. For example, at a packet loss probability of
+ p=10%, k=1, and n=2, the residual error rate is about 1%, whereas
+ for an extended Golay code with k=12 and n=24, the residual error
+ rate is about 0.01%.
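+
+ The figures quoted in c) for k=1 follow from a simple calculation,
+ sketched below under the assumption that packet losses are
+ independent: with k=1, the information packet is lost only if all n
+ copies are lost. The function name is hypothetical.
+
```python
def repetition_residual_loss(p, n):
    """Residual loss probability of an (n, k=1) code with independent loss."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return p ** n


residual_n2 = repetition_residual_loss(0.10, 2)  # about 0.01, i.e., 1%
residual_n3 = repetition_residual_loss(0.10, 3)  # about 0.001, i.e., 0.1%
```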
+
+ For applying RFC 5109 in combination with H.264 baseline-coded video
+ without using FUs, several options might be considered:
+
+ 1) The video encoder produces NAL units for which each video frame is
+ coded in a single slice. Applying FEC, one could use a simple
+ code, e.g., (n=2, k=1). That is, each NAL unit would basically
+ just be repeated. The disadvantage is obviously the bad code
+ performance according to d), above, and the low flexibility, as
+ only (n, k=1) codes can be used.
+
+ 2) The video encoder produces NAL units for which each video frame is
+ encoded in one or more consecutive slices. Applying FEC, one
+ could use a better code, e.g., (n=24, k=12), over a sequence of
+ NAL units. Depending on the number of RTP packets per frame, a
+ loss may introduce a significant delay, which is reduced when more
+ RTP packets are used per frame. Packets of completely different
+ lengths might also be connected, which decreases bitrate
+ efficiency according to b), above. However, with some care and
+ for slices of 1 kb or larger, slices of similar length (differing
+ by 100-200 bytes) may be produced, which will not lower the bitrate
+ efficiency catastrophically.
+
+ 3) The video encoder produces NAL units, for which a certain frame
+ contains k slices of possibly almost equal length. Then, applying
+ FEC, a better code, e.g., (n=24, k=12), can be used over the
+ sequence of NAL units for each frame. The delay compared to that
+ of 2), above, may be reduced, but several disadvantages are
+ obvious. First, the coding efficiency of the encoded video is
+ lowered significantly, as slice-structured coding reduces intra-
+ frame prediction and additional slice overhead is necessary.
+ Second, pre-encoded content or, when operating over a gateway, the
+ incoming video is usually not coded with k slices in a way that
+ allows FEC to be applied. Finally, the encoding of video producing k
+ slices of equal length is not straightforward and might require
+ more than one encoding pass.
+
+ Many of the mentioned disadvantages can be avoided by applying FUs in
+ combination with FEC. Each NAL unit can be split into any number of
+ FUs of basically equal length; therefore, FEC, with a reasonable k
+ and n, can be applied, even if the encoder made no effort to produce
+ slices of equal length. For example, a coded slice NAL unit
+ containing an entire frame can be split into k FUs, and a parity check
+ code (n=k+1, k) can be applied. However, this has the disadvantage
+ that unless all created fragments can be recovered, the whole slice
+ will be lost. Thus, a larger section is lost than would be if the
+ frame had been split into several slices.
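+
+ Splitting a NAL unit payload into k fragments of nearly equal length
+ can be sketched as follows. The function name is hypothetical, and
+ the FU indicator and FU header that a real fragmentation unit carries
+ are left out; only the length balancing is shown.
+
```python
def split_into_fragments(payload, k):
    """Split payload into k pieces whose lengths differ by at most one byte."""
    if k < 1 or k > len(payload):
        raise ValueError("k must be between 1 and len(payload)")
    base, extra = divmod(len(payload), k)
    fragments, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` get one more
        fragments.append(payload[start:start + size])
        start += size
    return fragments


fragments = split_into_fragments(bytes(range(10)), 3)
```
+
+ Fragments of nearly equal length keep the parity packet close to the
+ size of each information packet, which matches property b) above.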
+
+ The presented technique makes it possible to achieve good
+ transmission error tolerance, even if no additional source coding
+ layer redundancy (such as periodic intra frames) is present.
+ Consequently, the same coded video sequence can be used to achieve
+ the maximum compression efficiency and quality over error-free
+ transmission and for transmission over error-prone networks.
+ Furthermore, the technique allows the application of FEC to pre-
+ encoded sequences without adding delay. In this case, pre-encoded
+ sequences that are not encoded for error-prone networks can still be
+ transmitted almost reliably without adding extensive delays. In
+ addition, FUs of equal length result in a bitrate efficient use of
+ RFC 5109.
+
+ If the error probability depends on the length of the transmitted
+ packet (e.g., in case of mobile transmission [15]), the benefits of
+ applying FUs with FEC are even more obvious. Basically, the
+ flexibility of the size of FUs allows appropriate FEC to be applied
+ for each NAL unit and unequal error protection of NAL units.
+
+ When FUs and FEC are used, the incurred overhead is substantial but
+ is of the same order of magnitude as the number of bits that have to
+ be spent for intra-coded macroblocks if no FEC is applied. In [19],
+ it was shown that the FEC-based approach improved overall quality at
+ the same error rate and the same overall bitrate, including the
+ overhead.
+
+12.6. Low Bitrate Streaming
+
+ This scheme has been implemented with H.263 and non-standard RTP
+ packetization and has given good results [20]. There is no technical
+ reason why similarly good results could not be achievable with H.264.
+
+ In today's Internet streaming, some of the offered bitrates are
+ relatively low in order to allow terminals with dial-up modems to
+ access the content. In wired IP networks, relatively large packets,
+ say 500 - 1500 bytes, are preferred to smaller and more frequently
+ occurring packets in order to reduce network congestion. Moreover,
+ use of large packets decreases the amount of RTP/UDP/IP header
+ overhead. For low bitrate video, the use of large packets means that
+ sometimes up to a few pictures should be encapsulated in one packet.
+
+ However, the loss of a packet including many coded pictures would
+ have drastic consequences for visual quality, as there is practically
+ no way to conceal the loss of an entire picture other than repeating
+ the previous one. One way to construct relatively large packets and
+ maintain possibilities for successful loss concealment is to
+ construct MTAPs that contain interleaved slices from several
+ pictures. An MTAP should not contain spatially adjacent slices from
+ the same picture or spatially overlapping slices from any picture.
+ If a packet is lost, it is likely that a lost slice is surrounded by
+ spatially adjacent slices of the same picture and spatially
+ corresponding slices of the temporally previous and succeeding
+ pictures. Consequently, concealment of the lost slice is likely to
+ be relatively successful.
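+
+ One possible interleaving pattern that satisfies these constraints
+ is sketched below. It is only an illustration, not a pattern
+ prescribed by this memo: the function name is hypothetical, and it
+ assumes every picture has the same number of slices, covering the
+ same spatial regions, with at least as many slices as there are
+ pictures.
+
```python
def interleave_slices(num_pictures, slices_per_picture):
    """Return packets as lists of (picture_index, slice_index) pairs."""
    if num_pictures > slices_per_picture:
        raise ValueError("need at least as many slices as pictures")
    packets = []
    for j in range(slices_per_picture):
        # Each picture contributes one slice per packet, and the slice
        # index is shifted per picture so that no two slices overlap.
        packets.append([(i, (j + i) % slices_per_picture)
                        for i in range(num_pictures)])
    return packets


packets = interleave_slices(3, 6)
```
+
+ Each packet holds one slice per picture, so it never carries two
+ adjacent slices of the same picture, and the shifted slice indices
+ ensure that no two slices in a packet cover the same region.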
+
+12.7. Robust Packet Scheduling in Video Streaming
+
+ Robust packet scheduling has been implemented with MPEG-4 Part 2 and
+ simulated in a wireless streaming environment [21]. There is no
+ technical reason why similar or better results could not be
+ achievable with H.264.
+
+ Streaming clients typically have a receiver buffer that is capable of
+ storing a relatively large amount of data. Initially, when a
+ streaming session is established, a client does not start playing the
+ stream back immediately. Rather, it typically buffers the incoming
+ data for a few seconds. This buffering helps maintain continuous
+ playback, as, in case of occasional increased transmission delays or
+ network throughput drops, the client can decode and play buffered
+ data. Otherwise, without initial buffering, the client has to freeze
+ the display, stop decoding, and wait for incoming data. The
+ buffering is also necessary for either automatic or selective
+ retransmission at any protocol level. If any part of a picture is
+ lost, a retransmission mechanism may be used to resend the lost data.
+ If the retransmitted data is received before its scheduled decoding
+ or playback time, the loss is recovered perfectly. Coded pictures
+ can be ranked according to their importance in the subjective quality
+ of the decoded sequence. For example, non-reference pictures, such
+ as conventional B pictures, are subjectively least important, as
+ their absence does not affect decoding of any other pictures. In
+ addition to non-reference pictures, the ITU-T H.264 | ISO/IEC
+ 14496-10 standard includes a temporal scalability method called sub-
+ sequences [22]. Subjective ranking can also be made on coded slice
+ data partition or slice group basis. Coded slices and coded slice
+ data partitions that are subjectively the most important can be sent
+ earlier than their decoding order indicates, whereas coded slices and
+ coded slice data partitions that are subjectively the least important
+ can be sent later than their natural coding order indicates.
+ Consequently, any retransmitted parts of the most important slices
+ and coded slice data partitions are more likely to be received before
+ their scheduled decoding or playback time compared to the least
+ important slices and slice data partitions.
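+
+ As an illustration only, the scheduling idea above can be sketched in
+ Python as follows. The Unit type, the importance ranks, and the
+ advance table are assumptions made for this example and are not
+ defined by this memo; DON wrap-around is ignored. The picture labels
+ are borrowed from the example in Section 13.3.
+

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    label: str       # picture, slice, or data partition label
    don: int         # decoding order number (wrap-around ignored)
    importance: int  # assumed rank: larger means more important


def transmission_order(units, advance):
    """Sort units for sending.

    advance maps an importance rank to the number of decoding positions
    by which units of that rank are sent early (negative sends late).
    """
    def key(unit):
        target = unit.don - advance.get(unit.importance, 0)
        # On a tie, the more important unit goes first, then lower DON.
        return (target, -unit.importance, unit.don)
    return sorted(units, key=key)


def decoding_order(received):
    """Receiver side: recover decoding order from DON."""
    return sorted(received, key=lambda unit: unit.don)


RANK = {"I": 2, "R": 1, "N": 0}  # assumed ranking by picture type
LABELS = ["N58", "N59", "I00", "R03", "N01", "N02", "R06", "N04", "N05"]
UNITS = [Unit(name, don, RANK[name[0]]) for don, name in enumerate(LABELS, 1)]

# Send IDR pictures (rank 2) two positions earlier than decoding order.
SENT = transmission_order(UNITS, advance={2: 2})
```
+
+ With these assumed ranks, the IDR picture is placed ahead of the two
+ non-reference pictures that precede it in decoding order, and
+ sorting the received units by DON restores the decoding order.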
+
+13. Informative Appendix: Rationale for Decoding Order Number
+
+13.1. Introduction
+
+ The Decoding Order Number (DON) concept was introduced mainly to
+ enable efficient multi-picture slice interleaving (see Section 12.6)
+ and robust packet scheduling (see Section 12.7). In both of these
+ applications, NAL units are transmitted out of decoding order. DON
+ indicates the decoding order of NAL units and should be used in the
+ receiver to recover the decoding order. Example use cases for
+ efficient multi-picture slice interleaving and for robust packet
+ scheduling are given in Sections 13.2 and 13.3, respectively.
+ Section 13.4 describes the benefits of the DON concept in error
+ resiliency achieved by redundant coded pictures. Section 13.5
+ summarizes considered alternatives to DON and justifies why DON was
+ chosen for this RTP payload specification.
+
+13.2. Example of Multi-Picture Slice Interleaving
+
+ An example of multi-picture slice interleaving follows. A subset of
+ a coded video sequence is depicted below in output order. R denotes
+ a reference picture, N denotes a non-reference picture, and the
+ number indicates a relative output time.
+
+ ... R1 N2 R3 N4 R5 ...
+
+ The decoding order of these pictures from left to right is as
+ follows:
+
+ ... R1 R3 N2 R5 N4 ...
+
+ The NAL units of pictures R1, R3, N2, R5, and N4 are marked with a
+ DON equal to 1, 2, 3, 4, and 5, respectively.
+
+ Each reference picture consists of three slice groups that are
+ scattered as follows (a number denotes the slice group number for
+ each macroblock in a Quarter Common Intermediate Format (QCIF)
+ frame):
+
+ 0 1 2 0 1 2 0 1 2 0 1
+ 2 0 1 2 0 1 2 0 1 2 0
+ 1 2 0 1 2 0 1 2 0 1 2
+ 0 1 2 0 1 2 0 1 2 0 1
+ 2 0 1 2 0 1 2 0 1 2 0
+ 1 2 0 1 2 0 1 2 0 1 2
+ 0 1 2 0 1 2 0 1 2 0 1
+ 2 0 1 2 0 1 2 0 1 2 0
+ 1 2 0 1 2 0 1 2 0 1 2
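+
+ The map above can be reproduced with a short Python sketch. The rule
+ used here (macroblock address modulo 3) is an assumption that merely
+ matches the numbers shown; it is not claimed to be one of the slice
+ group map types of H.264. The function names are hypothetical.
+

```python
MB_COLS, MB_ROWS = 11, 9  # QCIF: 176x144 luma samples, 16x16 macroblocks
NUM_GROUPS = 3


def slice_group(row, col):
    """Slice group of the macroblock at (row, col), in raster-scan order."""
    return (row * MB_COLS + col) % NUM_GROUPS


def slice_group_map():
    """The full map as a list of rows."""
    return [[slice_group(r, c) for c in range(MB_COLS)]
            for r in range(MB_ROWS)]


def neighbours_in_other_groups(row, col):
    """True if no horizontal or vertical neighbour shares the slice group."""
    own = slice_group(row, col)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < MB_ROWS and 0 <= c < MB_COLS and slice_group(r, c) == own:
            return False
    return True


if __name__ == "__main__":
    for line in slice_group_map():
        print(" ".join(str(group) for group in line))
```
+
+ In this map, every macroblock has its horizontal and vertical
+ neighbors in other slice groups, so the loss of one slice group
+ leaves spatially adjacent macroblocks available.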
+
+ For the sake of simplicity, we assume that all the macroblocks of a
+ slice group are included in one slice. Three MTAPs are constructed
+ from three consecutive reference pictures so that each MTAP contains
+ three aggregation units, each of which contains all the macroblocks
+ from one slice group. The first MTAP contains slice group 0 of
+ picture R1, slice group 1 of picture R3, and slice group 2 of picture
+ R5. The second MTAP contains slice group 1 of picture R1, slice
+ group 2 of picture R3, and slice group 0 of picture R5. The third
+ MTAP contains slice group 2 of picture R1, slice group 0 of picture
+ R3, and slice group 1 of picture R5. Each non-reference picture is
+ encapsulated into an STAP-B.
+
+ Consequently, the transmission order of NAL units is the following:
+
+ R1, slice group 0, DON 1, carried in MTAP,   RTP SN: N
+ R3, slice group 1, DON 2, carried in MTAP,   RTP SN: N
+ R5, slice group 2, DON 4, carried in MTAP,   RTP SN: N
+ R1, slice group 1, DON 1, carried in MTAP,   RTP SN: N+1
+ R3, slice group 2, DON 2, carried in MTAP,   RTP SN: N+1
+ R5, slice group 0, DON 4, carried in MTAP,   RTP SN: N+1
+ R1, slice group 2, DON 1, carried in MTAP,   RTP SN: N+2
+ R3, slice group 0, DON 2, carried in MTAP,   RTP SN: N+2
+ R5, slice group 1, DON 4, carried in MTAP,   RTP SN: N+2
+ N2,                DON 3, carried in STAP-B, RTP SN: N+3
+ N4,                DON 5, carried in STAP-B, RTP SN: N+4
+
+ The receiver is able to organize the NAL units back in decoding order
+ based on the value of DON associated with each NAL unit.
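+
+ A minimal Python sketch of this example follows. It builds the three
+ MTAPs as described in the prose of this section, recovers the
+ decoding order from DON, and checks what remains after one MTAP is
+ lost. Packets are modeled as plain tuples; no real RTP or NAL unit
+ syntax is produced, and all names are hypothetical.
+

```python
REFERENCE_DON = {"R1": 1, "R3": 2, "R5": 4}
NON_REFERENCE_DON = {"N2": 3, "N4": 5}


def build_packets():
    """Return packets in transmission order as (kind, [(pic, group, don)])."""
    pictures = list(REFERENCE_DON)
    packets = []
    for k in range(3):  # MTAP k takes slice group (i + k) mod 3 of picture i
        units = [(pic, (i + k) % 3, REFERENCE_DON[pic])
                 for i, pic in enumerate(pictures)]
        packets.append(("MTAP", units))
    for pic, don in NON_REFERENCE_DON.items():
        packets.append(("STAP-B", [(pic, None, don)]))
    return packets


def decoding_order(packets):
    """Flatten received packets and order the NAL units by DON.

    The sort is stable, so units sharing a DON keep their arrival order.
    """
    units = [unit for _, payload in packets for unit in payload]
    return sorted(units, key=lambda unit: unit[2])


def surviving_groups(packets, lost_index):
    """Slice groups still received per reference picture after one loss."""
    groups = {}
    for index, (_, payload) in enumerate(packets):
        if index == lost_index:
            continue
        for pic, group, _ in payload:
            if group is not None:
                groups.setdefault(pic, set()).add(group)
    return groups


PACKETS = build_packets()
```
+
+ In this sketch, losing any single MTAP leaves two of the three slice
+ groups of each of R1, R3, and R5 available for concealment.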
+
+ If one of the MTAPs is lost, the spatially adjacent and temporally
+ co-located macroblocks are received and can be used to conceal the
+ loss efficiently. If one of the STAPs is lost, the effect of the
+ loss does not propagate temporally.
+
+13.3. Example of Robust Packet Scheduling
+
+ An example of robust packet scheduling follows. The communication
+ system used in the example consists of the following components in
+ the order that the video is processed from source to sink:
+
+ o camera and capturing
+ o pre-encoding buffer
+ o encoder
+ o encoded picture buffer
+ o transmitter
+ o transmission channel
+ o receiver
+ o receiver buffer
+ o decoder
+ o decoded picture buffer
+ o display
+
+ The video communication system used in this example operates as
+ follows. Note that processing of the video stream happens gradually
+ and at the same time in all components of the system. The source
+ video sequence is shot and captured to a pre-encoding buffer. The
+ pre-encoding buffer can be used to order pictures from sampling order
+ to encoding order or to analyze multiple uncompressed frames for
+ bitrate control purposes, for example. In some cases, the pre-
+ encoding buffer may not exist; instead, the sampled pictures are
+ encoded right away. The encoder encodes pictures from the pre-
+ encoding buffer and stores the output (i.e., coded pictures) to the
+ encoded picture buffer. The transmitter encapsulates the coded
+ pictures from the encoded picture buffer to transmission packets and
+ sends them to a receiver through a transmission channel. The
+ receiver stores the received packets to the receiver buffer. The
+ receiver buffering process typically includes buffering for
+ transmission delay jitter. The receiver buffer can also be used to
+ recover correct decoding order of coded data. The decoder reads
+ coded data from the receiver buffer and produces decoded pictures as
+ output into the decoded picture buffer. The decoded picture buffer
+ is used to recover the output (or display) order of pictures.
+ Finally, pictures are displayed.
+
+ In the following example figures, I denotes an IDR picture, R denotes
+ a reference picture, N denotes a non-reference picture, and the
+ number after I, R, or N indicates the sampling time relative to the
+ previous IDR picture in decoding order. Values below the sequence of
+ pictures indicate scaled system clock timestamps. The system clock
+ is initialized arbitrarily in this example, and time runs from left
+ to right. Each I, R, and N picture is mapped into the same timeline
+ compared to the previous processing step, if any, assuming that
+ encoding, transmission, and decoding take no time. Thus, events
+ happening at the same time are located in the same column throughout
+ all example figures.
+
+ A subset of a sequence of coded pictures is depicted below in
+ sampling order.
+
+ ... N58 N59 I00 N01 N02 R03 N04 N05 R06 ... N58 N59 I00 N01 ...
+ ... --|---|---|---|---|---|---|---|---|- ... -|---|---|---|- ...
+ ... 58  59  60  61  62  63  64  65  66  ... 118 119 120 121 ...
+
+ Figure 16. Sequence of pictures in sampling order
+
+ The sampled pictures are buffered in the pre-encoding buffer to
+ arrange them in encoding order. In this example, we assume that the
+ non-reference pictures are predicted from both the previous and the
+ next reference picture in output order, except for the non-reference
+ pictures immediately preceding an IDR picture, which are predicted
+ only from the previous reference picture in output order. Thus, the
+ pre-encoding buffer has to contain at least two pictures, and the
+ buffering causes a delay of two picture intervals. The output of the
+ pre-encoding buffering process and the encoding (and decoding) order
+ of the pictures are as follows:
+
+ ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
+ ... -|---|---|---|---|---|---|---|---|- ...
+ ... 60  61  62  63  64  65  66  67  68  ...
+
+ Figure 17. Reordered pictures in the pre-encoding buffer
+
+ The encoder or the transmitter can set the value of DON for each
+ picture to the value of DON for the previous picture in decoding
+ order plus one.
+
+ For the sake of simplicity, let us assume that:
+
+ o the frame rate of the sequence is constant,
+ o each picture consists of only one slice,
+ o each slice is encapsulated in a single NAL unit packet,
+ o there is no transmission delay, and
+ o pictures are transmitted at constant intervals (that is, 1 /
+ (frame rate)).
+
+ When pictures are transmitted in decoding order, they are received as
+ follows:
+
+ ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
+ ... -|---|---|---|---|---|---|---|---|- ...
+ ... 60  61  62  63  64  65  66  67  68  ...
+
+ Figure 18. Received pictures in decoding order
+
+ The OPTIONAL sprop-interleaving-depth media type parameter is set to
+ 0, as the transmission (or reception) order is identical to the
+ decoding order.
+
+ Initially, the decoder has to buffer for one picture interval in its
+ decoded picture buffer to organize pictures from decoding order to
+ output order, as depicted below:
+
+ ... N58 N59 I00 N01 N02 R03 N04 N05 R06 ...
+ ... -|---|---|---|---|---|---|---|---|- ...
+ ... 61  62  63  64  65  66  67  68  69  ...
+
+ Figure 19. Output order
+
+ The amount of required initial buffering in the decoded picture
+ buffer can be signaled in the buffering period SEI message or with
+ the num_reorder_frames syntax element of H.264 video usability
+ information. num_reorder_frames indicates the maximum number of
+ frames, complementary field pairs, or non-paired fields that precede
+ any frame, complementary field pair, or non-paired field in the
+ sequence in decoding order and that follow it in output order. For
+ the sake of simplicity, we assume that num_reorder_frames is used to
+ indicate the initial buffer in the decoded picture buffer. In this
+ example, num_reorder_frames is equal to 1.
+
+ It can be observed that if the IDR picture I00 is lost during
+ transmission and a retransmission request is issued when the value of
+ the system clock is 62, there is one picture interval of time (until
+ the system clock reaches timestamp 63) to receive the retransmitted
+ IDR picture I00.
+
+ Let us then assume that IDR pictures are transmitted two frame
+ intervals earlier than their decoding position; that is, the pictures
+ are transmitted as follows:
+
+ ... I00 N58 N59 R03 N01 N02 R06 N04 N05 ...
+ ... --|---|---|---|---|---|---|---|---|- ...
+ ... 62  63  64  65  66  67  68  69  70  ...
+
+ Figure 20. Interleaving: Early IDR pictures in sending order
+
+ The OPTIONAL sprop-interleaving-depth media type parameter is set
+ equal to 1 according to its definition. (The value of sprop-
+ interleaving-depth in this example can be derived as follows: picture
+ I00 is the only picture preceding picture N58 or N59 in transmission
+ order and following it in decoding order. Except for pictures I00,
+ N58, and N59, the transmission order is the same as the decoding
+ order of pictures. As a coded picture is encapsulated into exactly
+ one NAL unit, the value of sprop-interleaving-depth is equal to the
+ maximum number of pictures preceding any picture in transmission
+ order and following the picture in decoding order).
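+
+ The derivation above can be checked with a small Python sketch. It
+ assumes one NAL unit per picture, as in this example, assigns DON
+ as the previous value plus one in decoding order, and ignores DON
+ wrap-around. The function names are hypothetical.
+

```python
def assign_don(decoding_order, first_don=1):
    """DON of each picture: DON of the previous picture plus one."""
    return {pic: first_don + i for i, pic in enumerate(decoding_order)}


def interleaving_depth(transmission_order, don):
    """Maximum number of pictures that precede any picture in
    transmission order and follow it in decoding order."""
    depth = 0
    for position, pic in enumerate(transmission_order):
        sent_before = transmission_order[:position]
        ahead = sum(1 for other in sent_before if don[other] > don[pic])
        depth = max(depth, ahead)
    return depth


DECODING = ["N58", "N59", "I00", "R03", "N01", "N02", "R06", "N04", "N05"]
SENT = ["I00", "N58", "N59", "R03", "N01", "N02", "R06", "N04", "N05"]
DON = assign_don(DECODING)

if __name__ == "__main__":
    print(interleaving_depth(SENT, DON))      # early IDR sending order
    print(interleaving_depth(DECODING, DON))  # sending in decoding order
```
+
+ For the sending order with early IDR pictures, the sketch gives 1;
+ for transmission in decoding order, it gives 0.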
+
+ The receiver buffering process contains two pictures at a time
+ according to the value of the sprop-interleaving-depth parameter and
+ orders pictures from the reception order to the correct decoding
+ order based on the value of DON associated with each picture. The
+ output of the receiver buffering process is as follows:
+
+ ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
+ ... -|---|---|---|---|---|---|---|---|- ...
+ ... 63  64  65  66  67  68  69  70  71  ...
+
+ Figure 21. Interleaving: Receiver buffer
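+
+ The receiver buffering process of this example can be sketched in
+ Python as follows. The sketch assumes a buffer of
+ sprop-interleaving-depth + 1 pictures (two in this example), one
+ picture per arrival, and no DON wrap-around; it is a simplified
+ model and not the normative de-interleaving process. The function
+ name and the tuple layout are hypothetical.
+

```python
import heapq


def deinterleave(received, interleaving_depth):
    """Reorder pictures from reception order to decoding order.

    received is a list of (arrival_time, don, label) tuples in reception
    order.  Returns a list of (output_time, label) tuples.
    """
    capacity = interleaving_depth + 1  # pictures held before output starts
    buffer = []                        # min-heap keyed on DON
    output = []
    now = None
    for now, don, label in received:
        heapq.heappush(buffer, (don, label))
        while len(buffer) >= capacity:
            output.append((now, heapq.heappop(buffer)[1]))
    # End of stream: drain what is left, one picture per interval.
    while buffer:
        now += 1
        output.append((now, heapq.heappop(buffer)[1]))
    return output


DECODING = ["N58", "N59", "I00", "R03", "N01", "N02", "R06", "N04", "N05"]
DON = {label: i + 1 for i, label in enumerate(DECODING)}
SENT = ["I00", "N58", "N59", "R03", "N01", "N02", "R06", "N04", "N05"]
# Arrival times of Figure 20 (no transmission delay): 62, 63, 64, ...
RECEIVED = [(62 + i, DON[label], label) for i, label in enumerate(SENT)]
```
+
+ Run on the sending order of Figure 20, the sketch outputs N58 at
+ time 63, N59 at 64, I00 at 65, and so on, as in Figure 21.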
+
+ Again, an initial buffering delay of one picture interval is needed
+ to organize pictures from decoding order to output order, as depicted
+ below:
+
+ ... N58 N59 I00 N01 N02 R03 N04 N05 ...
+ ... -|---|---|---|---|---|---|---|- ...
+ ... 64  65  66  67  68  69  70  71  ...
+
+ Figure 22. Interleaving: Receiver buffer after reordering
+
+ Note that the maximum delay that IDR pictures can undergo during
+ transmission, including possible application, transport, or link
+ layer retransmission, is equal to three picture intervals. Thus, the
+ loss resiliency of IDR pictures is improved in systems supporting
+ retransmission compared to the case in which pictures are transmitted
+ in their decoding order.
+
+13.4. Robust Transmission Scheduling of Redundant Coded Slices
+
+ A redundant coded picture is a coded representation of a picture or a
+ part of a picture that is not used in the decoding process if the
+ corresponding primary coded picture is correctly decoded. There
+ should be no noticeable difference between any area of the decoded
+ primary picture and a corresponding area that would result from
+ application of the H.264 decoding process for any redundant picture
+ in the same access unit. A redundant coded slice is a coded slice
+ that is a part of a redundant coded picture.
+
+ Redundant coded pictures can be used to provide unequal error
+ protection in error-prone video transmission. If a primary coded
+ representation of a picture is decoded incorrectly, a corresponding
+ redundant coded picture can be decoded. Examples of applications and
+ coding techniques using the redundant coded picture feature include
+ the video redundancy coding [23] and the protection of "key pictures"
+ in multicast streaming [24].
+
+ One property of many error-prone video communications systems is that
+ transmission errors are often bursty. Therefore, they may affect
+ more than one consecutive transmission packet in transmission order.
+ In low bitrate video communication, it is relatively common for an
+ entire coded picture to be encapsulated into one transmission packet.
+ Consequently, a primary coded picture and the corresponding redundant
+ coded pictures may be transmitted in consecutive packets in
+ transmission order. To make the transmission scheme more tolerant of
+ bursty transmission errors, it is beneficial to transmit the primary
+ coded picture and redundant coded picture separated by more than a
+ single packet. The DON concept enables this.
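+
+ As a rough illustration, the following Python sketch delays each
+ redundant coded picture by a configurable number of pictures and
+ checks which pictures survive a burst of consecutive packet losses.
+ It assumes one coded picture per packet. The DON numbering used
+ here (primary before redundant for the same picture) and all
+ function names are assumptions made for the example.
+

```python
def schedule(num_pictures, gap):
    """Transmission order with each redundant picture sent `gap`
    pictures later than its primary picture."""
    order = []
    for i in range(num_pictures + gap):
        if i < num_pictures:
            order.append(("primary", i))
        if i >= gap:
            order.append(("redundant", i - gap))
    return order


def don_of(unit):
    """Assumed DON: primary first, then redundant, picture by picture."""
    kind, picture = unit
    return 2 * picture + (1 if kind == "redundant" else 0)


def decodable_after_burst(order, start, length):
    """Pictures for which at least one representation is received when
    packets start .. start+length-1 are lost."""
    return {picture for position, (_, picture) in enumerate(order)
            if not start <= position < start + length}


def survives_all_bursts(order, num_pictures, length):
    """True if no burst of the given length removes both representations
    of any picture."""
    everything = set(range(num_pictures))
    return all(decodable_after_burst(order, start, length) == everything
               for start in range(len(order)))
```
+
+ With gap equal to 0, the two representations of a picture travel in
+ consecutive packets, and a burst of two packets can remove both.
+ With gap equal to 2, every burst of up to three packets leaves at
+ least one representation of each picture, and sorting by don_of
+ restores the decoding order.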
+
+13.5. Remarks on Other Design Possibilities
+
+ The slice header syntax structure of the H.264 coding standard
+ contains the frame_num syntax element that can indicate the decoding
+ order of coded frames. However, the usage of the frame_num syntax
+ element is not feasible or desirable to recover the decoding order,
+ due to the following reasons:
+
+ o The receiver is required to parse at least one slice header per
+ coded picture (before passing the coded data to the decoder).
+
+ o Coded slices from multiple coded video sequences cannot be
+ interleaved, as the frame number syntax element is reset to 0 in
+ each IDR picture.
+
+ o The coded fields of a complementary field pair share the same
+ value of the frame_num syntax element. Thus, the decoding order
+ of the coded fields of a complementary field pair cannot be
+ recovered based on the frame_num syntax element or any other
+ syntax element of the H.264 coding syntax.
+
+ The RTP payload format for transport of MPEG-4 elementary streams
+ [25] enables interleaving of access units and transmission of
+ multiple access units in the same RTP packet. An access unit is
+ specified in the H.264 coding standard to comprise all NAL units
+ associated with a primary coded picture according to Subclause
+ 7.4.1.2 of [1]. Consequently, slices of different pictures cannot be
+ interleaved, and the multi-picture slice interleaving technique (see
+ Section 12.6) for improved error resilience cannot be used.
+
+14. Changes from RFC 3984
+
+ Following is the list of technical changes (including bug fixes) from
+ RFC 3984. Besides this list of technical changes, numerous editorial
+ changes have been made, but not documented in this section. Note
+ that Section 8.2.2 is where many of the important changes in this
+ memo occur, and it deserves particular attention.
+
+ 1) In Sections 5.4, 5.5, 6.2, 6.3, and 6.4, removed that the
+ packetization mode in use may be signaled by external means.
+
+ 2) In Section 7.2.2, changed the sentence
+
+ There are N VCL NAL units in the de-interleaving buffer.
+
+ to
+
+ There are N or more VCL NAL units in the de-interleaving buffer.
+
+ 3) In Section 8.1, the semantics of sprop-init-buf-time (paragraph
+ 2), changed the sentence
+
+ The parameter is the maximum value of (transmission time of a NAL
+ unit - decoding time of the NAL unit), assuming reliable and
+ instantaneous transmission, the same timeline for transmission
+ and decoding, and that decoding starts when the first packet
+ arrives.
+
+ to
+
+ The parameter is the maximum value of (decoding time of the NAL
+ unit - transmission time of a NAL unit), assuming reliable and
+ instantaneous transmission, the same timeline for transmission
+ and decoding, and that decoding starts when the first packet
+ arrives.
+
+ 4) Added media type parameters max-smbps, sprop-level-parameter-
+ sets, use-level-src-parameter-sets, in-band-parameter-sets, sar-
+ understood, and sar-supported.
+
+ 5) In Section 8.1, removed the specification of parameter-add.
+ Other descriptions of parameter-add (in Sections 8.2 and 8.4)
+ were also removed.
+
+ 6) In Section 8.1, added a constraint to sprop-parameter-sets such
+ that it can only contain parameter sets for the same profile and
+ level as indicated by profile-level-id.
+
+ 7) In Section 8.2.1, added that sprop-parameter-sets and sprop-
+ level-parameter-sets may be either included in the "a=fmtp" line
+ of SDP or conveyed using the "fmtp" source attribute as specified
+ in Section 6.3 of [9].
+
+ 8) In Section 8.2.2, removed sprop-deint-buf-req from being part of
+ the media format configuration in usage with the SDP Offer/Answer
+ model.
+
+ 9) In Section 8.2.2, made it clear that level is downgradable in the
+ SDP Offer/Answer model, i.e., the use of the level part of
+ profile-level-id does not need to be symmetric (the level
+ included in the answer can be lower than or equal to the level
+ included in the offer).
+
+ 10) In Section 8.2.2, removed that the capability parameters may be
+ used to declare encoding capabilities.
+
+ 11) In Section 8.2.2, added rules on how to use sprop-parameter-sets
+ and sprop-level-parameter-sets for out-of-band transport of
+ parameter sets, with or without level downgrading.
+
+ 12) In Section 8.2.2, clarified the rules of using the media type
+ parameters with SDP Offer/Answer for multicast.
+
+ 13) In Section 8.2.2, completed and corrected the list of how
+ different media type parameters shall be interpreted in the
+ different combinations of offer or answer and direction
+ attribute.
+
+ 14) In Section 8.4, changed the text such that both out-of-band and
+ in-band transport of parameter sets are allowed, and neither is
+ recommended or required.
+
+ 15) Added Section 8.5 (informative) providing example methods for
+ decoder refresh to handle parameter set losses.
+
+ 16) Added media type parameters max-recv-level and level-asymmetry-
+ allowed and adjusted associated text and examples for level
+ upgrade and asymmetry.
+
+15. Backward Compatibility to RFC 3984
+
+ The current document is a revision of RFC 3984 and obsoletes it. The
+ technical changes relative to RFC 3984 are listed in Section 14.
+ This section addresses the backward compatibility issues.
+
+ In the majority of cases, legacy implementations per RFC 3984 and
+ new implementations per this document interwork without
+ compatibility issues. Compatibility issues may occur only when both
+ of the following conditions are true: 1) legacy implementations and
+ new implementations are interworking, and 2) parameter sets are
+ transported out-of-band. When such compatibility issues occur, the
+ following analyses can be used to find the reason for the
+ incompatibility.
+
+ Items 1, 2, 3, 8, 9, 10, 12, and 13 are bug-fix types of changes and
+ do not incur any backward compatibility issues.
+
+ Item 4 (addition of six new media type parameters) does not incur any
+ backward compatibility issues for SDP Offer/Answer-based
+ applications, as legacy RFC 3984 receivers ignore these parameters,
+ and it is fine for legacy RFC 3984 senders not to use these
+ parameters as they are optional. However, there is a backward
+ compatibility issue for declarative-usage-based applications (only
+ for the parameter sprop-level-parameter-sets as the other five
+ parameters are not usable in declarative usage). For example,
+ declarative-usage-based applications using RTSP and SAP have a
+ backward compatibility issue because the SDP receiver per RFC 3984
+ cannot accept a session for which the SDP includes an unrecognized
+ parameter. Therefore, the RTSP or SAP server may have to prepare two
+ sets of streams, one for legacy RFC 3984 receivers and one for
+ receivers according to this memo.
+
+ Items 5, 6, and 11 are related to out-of-band transport of parameter
+ sets. There are the following backward compatibility issues:
+
+ 1) When a legacy sender per RFC 3984 includes in sprop-parameter-sets
+ parameter sets for a level different from the default level
+ indicated by profile-level-id, the parameter value of
+ sprop-parameter-sets is invalid to the receiver per this memo;
+ therefore, the session may be rejected.
+
+ 2) In SDP Offer/Answer between a legacy offerer per RFC 3984 and an
+ answerer per this memo, when the answerer includes in the answer
+ parameter sets that are not a superset of the parameter sets
+ included in the offer, the parameter value of sprop-parameter-
+ sets is invalid to the offerer, and the session may not be
+ initiated properly (related to change item 11).
+
+ 3) When one endpoint A per this memo includes in-band-parameter-sets
+ equal to 1, the other side B per RFC 3984 does not understand
+ that it must transmit parameter sets in-band, and B may still
+ exclude parameter sets in the in-band stream it is sending.
+ Consequently, endpoint A cannot decode the stream it receives.
+
+ Item 7 (allowance of conveying sprop-parameter-sets and sprop-level-
+ parameter-sets using the "fmtp" source attribute as specified in
+ Section 6.3 of [9]) is similar to item 4. It does not incur any
+ backward compatibility issues for SDP Offer/Answer-based
+ applications, as legacy RFC 3984 receivers ignore the "fmtp" source
+ attribute, and it is fine for legacy RFC 3984 senders not to use the
+ "fmtp" source attribute as it is optional. However, there is a
+ backward compatibility issue for SDP declarative-usage-based
+ applications, e.g., those using RTSP and SAP, because the SDP
+ receiver per RFC 3984 cannot accept a session for which the SDP
+ includes an unrecognized parameter (i.e., the "fmtp" source
+ attribute). Therefore, the RTSP or SAP server may have to prepare
+ two sets of streams, one for legacy RFC 3984 receivers and one for
+ receivers according to this memo.
+
+ Item 14 does not incur any backward compatibility issues, as out-of-
+ band transport of parameter sets is still allowed.
+
+ Item 15 does not incur any backward compatibility issues, as the
+ added Section 8.5 is informative.
+
+ Item 16 does not create any backward compatibility issues as the
+ handling of the default level is the same if either end is RFC 3984
+ compliant, and, furthermore, RFC-3984-compliant ends would simply
+ ignore the new media type parameters, if present.
+
+16. Acknowledgements
+
+ Stephan Wenger, Miska Hannuksela, Thomas Stockhammer, Magnus
+ Westerlund, and David Singer are thanked as the authors of RFC 3984.
+ Dave Lindbergh, Philippe Gentric, Gonzalo Camarillo, Gary Sullivan,
+ Joerg Ott, and Colin Perkins are thanked for careful review during
+ the development of RFC 3984. Stephen Botzko, Magnus Westerlund, Alex
+ Eleftheriadis, Thomas Schierl, Tom Taylor, Ali Begen, Aaron Wells,
+ Stuart Taylor, Robert Sparks, Dan Romascanu, and Niclas Comstedt are
+ thanked for their valuable comments and input during the development
+ of this memo.
+
+17. References
+
+17.1. Normative References
+
+ [1] ITU-T Recommendation H.264, "Advanced video coding for generic
+ audiovisual services", March 2010.
+
+ [2] ISO/IEC International Standard 14496-10:2008.
+
+ [3] ITU-T Recommendation H.241, "Extended video procedures and
+ control signals for H.300-series terminals", May 2006.
+
+ [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [5] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
+ "RTP: A Transport Protocol for Real-Time Applications", STD 64,
+ RFC 3550, July 2003.
+
+ [6] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
+ Description Protocol", RFC 4566, July 2006.
+
+ [7] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings",
+ RFC 4648, October 2006.
+
+ [8] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
+ Session Description Protocol (SDP)", RFC 3264, June 2002.
+
+ [9] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media
+ Attributes in the Session Description Protocol (SDP)", RFC
+ 5576, June 2009.
+
+17.2. Informative References
+
+ [10] Luthra, A., Sullivan, G.J., and T. Wiegand (eds.),
+ "Introduction to the special issue on the H.264/AVC video
+ coding standard", IEEE Transactions on Circuits and Systems for
+ Video Technology, Vol. 13, No. 7, July 2003.
+
+ [11] Ott, J., Bormann, C., Sullivan, G., Wenger, S., and R. Even,
+ Ed., "RTP Payload Format for ITU-T Rec. H.263 Video", RFC 4629,
+ January 2007.
+
+ [12] ISO/IEC International Standard 14496-2:2004.
+
+ [13] Wenger, S., "H.264/AVC over IP", IEEE Transactions on Circuits
+ and Systems for Video Technology, Vol. 13, No. 7, July 2003.
+
+ [14] Wenger, S., "H.26L over IP: The IP-Network Adaptation Layer",
+ Proceedings Packet Video Workshop, April 2002.
+
+ [15] Stockhammer, T., Hannuksela, M.M., and S. Wenger, "H.26L/JVT
+ Coding Network Abstraction Layer and IP-Based Transport", IEEE
+ International Conference on Image Processing (ICIP 2002),
+ Rochester, NY, September 2002.
+
+ [16] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
+ Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
+
+ [17] ITU-T Recommendation H.223, "Multiplexing protocol for low bit
+ rate multimedia communication", July 2001.
+
+ [18] Li, A., Ed., "RTP Payload Format for Generic Forward Error
+ Correction", RFC 5109, December 2007.
+
+ [19] Stockhammer, T., Wiegand, T., Oelbaum, T., and F. Obermeier,
+ "Video Coding and Transport Layer Techniques for H.264/AVC-
+ Based Transmission over Packet-Lossy Networks", IEEE
+ International Conference on Image Processing (ICIP 2003),
+ Barcelona, Spain, September 2003.
+
+ [20] Varsa, V. and M. Karczewicz, "Slice interleaving in compressed
+ video packetization", Packet Video Workshop 2000.
+
+ [21] Kang, S.H. and A. Zakhor, "Packet scheduling algorithm for
+ wireless video streaming", Packet Video Workshop 2002.
+
+ [22] Hannuksela, M.M., "Enhanced Concept of GOP", JVT-B042,
+ available http://ftp3.itu.int/av-arch/video-site/0201_Gen/JVT-
+ B042.doc, January 2002.
+
+ [23] Wenger, S., "Video Redundancy Coding in H.263+", 1997
+ International Workshop on Audio-Visual Services over Packet
+ Networks, September 1997.
+
+ [24] Wang, Y.-K., Hannuksela, M.M., and M. Gabbouj, "Error Resilient
+ Video Coding Using Unequally Protected Key Pictures", in Proc.
+ International Workshop VLBV03, September 2003.
+
+ [25] van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and
+ P. Gentric, "RTP Payload Format for Transport of MPEG-4
+ Elementary Streams", RFC 3640, November 2003.
+
+ [26] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+ Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
+ 3711, March 2004.
+
+ [27] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
+ Protocol (RTSP)", RFC 2326, April 1998.
+
+ [28] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
+ Protocol", RFC 2974, October 2000.
+
+ [29] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
+ January 2008.
+
+ [30] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, "Codec
+ Control Messages in the RTP Audio-Visual Profile with Feedback
+ (AVPF)", RFC 5104, February 2008.
+
+Authors' Addresses
+
+ Ye-Kui Wang
+ Huawei Technologies
+ 400 Crossing Blvd, 2nd Floor
+ Bridgewater, NJ 08807
+ USA
+
+ Phone: +1-908-541-3518
+ EMail: yekui.wang@huawei.com
+
+
+ Roni Even
+ Huawei Technologies
+ 14 David Hamelech
+ Tel Aviv 64953
+ Israel
+
+ Phone: +972-545481099
+ EMail: even.roni@huawei.com
+
+
+ Tom Kristensen
+ TANDBERG
+ Philip Pedersens vei 22
+ N-1366 Lysaker
+ Norway
+
+ Phone: +47 67125125
+ EMail: tom.kristensen@tandberg.com, tomkri@ifi.uio.no
+
+
+ Randell Jesup
+ WorldGate Communications
+ 3800 Horizon Blvd, Suite #103
+ Trevose, PA 19053-4947
+ USA
+
+ Phone: +1-215-354-5166
+ EMail: rjesup@wgate.com, randell_ietf@jesup.org