summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc3640.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc3640.txt')
-rw-r--r--doc/rfc/rfc3640.txt2411
1 files changed, 2411 insertions, 0 deletions
diff --git a/doc/rfc/rfc3640.txt b/doc/rfc/rfc3640.txt
new file mode 100644
index 0000000..d0cf1ee
--- /dev/null
+++ b/doc/rfc/rfc3640.txt
@@ -0,0 +1,2411 @@
+
+
+
+
+
+
+Network Working Group J. van der Meer
+Request for Comments: 3640 Philips Electronics
+Category: Standards Track D. Mackie
+ Apple Computer
+ V. Swaminathan
+ Sun Microsystems Inc.
+ D. Singer
+ Apple Computer
+ P. Gentric
+ Philips Electronics
+ November 2003
+
+
+ RTP Payload Format for Transport of MPEG-4 Elementary Streams
+
+Status of this Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2003). All Rights Reserved.
+
+Abstract
+
+ The Motion Picture Experts Group (MPEG) Committee (ISO/IEC JTC1/SC29
+ WG11) is a working group in ISO that produced the MPEG-4 standard.
+ MPEG defines tools to compress content such as audio-visual
+ information into elementary streams. This specification defines a
+ simple, but generic RTP payload format for transport of any non-
+ multiplexed MPEG-4 elementary stream.
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
+ 2. Carriage of MPEG-4 Elementary Streams Over RTP . . . . . . . . 4
+ 2.1. Signaling by MIME Format Parameters . . . . . . . . . . 4
+ 2.2. MPEG Access Units . . . . . . . . . . . . . . . . . . . 5
+ 2.3. Concatenation of Access Units . . . . . . . . . . . . . 5
+ 2.4. Fragmentation of Access Units . . . . . . . . . . . . . 6
+ 2.5. Interleaving . . . . . . . . . . . . . . . . . . . . . . 6
+ 2.6. Time Stamp Information . . . . . . . . . . . . . . . . . 7
+ 2.7. State Indication of MPEG-4 System Streams . . . . . . . 8
+ 2.8. Random Access Indication . . . . . . . . . . . . . . . . 8
+
+
+
+van der Meer, et al. Standards Track [Page 1]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ 2.9. Carriage of Auxiliary Information . . . . . . . . . . . 8
+ 2.10. MIME Format Parameters and Configuring Conditional Field 8
+ 2.11. Global Structure of Payload Format . . . . . . . . . . . 9
+ 2.12. Modes to Transport MPEG-4 Streams . . . . . . . . . . . 9
+ 2.13. Alignment with RFC 3016 . . . . . . . . . . . . . . . . 10
+ 3. Payload Format . . . . . . . . . . . . . . . . . . . . . . . . 10
+ 3.1. Usage of RTP Header Fields and RTCP . . . . . . . . . . 10
+ 3.2. RTP Payload Structure . . . . . . . . . . . . . . . . . 11
+ 3.2.1. The AU Header Section . . . . . . . . . . . . . 11
+ 3.2.1.1. The AU-header . . . . . . . . . . . . 12
+ 3.2.2. The Auxiliary Section . . . . . . . . . . . . . 14
+ 3.2.3. The Access Unit Data Section . . . . . . . . . . 15
+ 3.2.3.1. Fragmentation. . . . . . . . . . . . . 16
+ 3.2.3.2. Interleaving . . . . . . . . . . . . . 16
+ 3.2.3.3. Constraints for Interleaving . . . . . 17
+ 3.2.3.4. Crucial and Non-Crucial AUs with
+ MPEG-4 System Data . . . . . . . . . . 20
+ 3.3. Usage of this Specification. . . . . . . . . . . . . . . 21
+ 3.3.1. General. . . . . . . . . . . . . . . . . . . . . 21
+ 3.3.2. The Generic Mode . . . . . . . . . . . . . . . . 22
+ 3.3.3. Constant Bit Rate CELP . . . . . . . . . . . . . 22
+ 3.3.4. Variable Bit Rate CELP . . . . . . . . . . . . . 23
+ 3.3.5. Low Bit Rate AAC . . . . . . . . . . . . . . . . 24
+ 3.3.6. High Bit Rate AAC. . . . . . . . . . . . . . . . 25
+ 3.3.7. Additional Modes . . . . . . . . . . . . . . . . 26
+ 4. IANA Considerations. . . . . . . . . . . . . . . . . . . . . . 27
+ 4.1. MIME Type Registration . . . . . . . . . . . . . . . . . 27
+ 4.2. Registration of Mode Definitions with IANA . . . . . . . 33
+ 4.3. Concatenation of Parameters. . . . . . . . . . . . . . . 33
+ 4.4. Usage of SDP . . . . . . . . . . . . . . . . . . . . . . 34
+ 4.4.1. The a=fmtp Keyword . . . . . . . . . . . . . . . 34
+ 5. Security Considerations. . . . . . . . . . . . . . . . . . . . 34
+ 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 35
+ APPENDIX: Usage of this Payload Format. . . . . . . . . . . . . . 36
+ Appendix A. Interleave Analysis . . . . . . . . . . . . . . . . . 36
+ A. Examples of Delay Analysis with Interleave. . . . . . . . . . 36
+ A.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 36
+ A.2. De-interleaving and Error Concealment . . . . . . . . . 36
+ A.3. Simple Group Interleave . . . . . . . . . . . . . . . . 36
+ A.3.1. Introduction . . . . . . . . . . . . . . . . . . 36
+ A.3.2. Determining the De-interleave Buffer Size . . . 37
+ A.3.3. Determining the Maximum Displacement . . . . . . 37
+ A.4. More Subtle Group Interleave . . . . . . . . . . . . . . 38
+ A.4.1. Introduction . . . . . . . . . . . . . . . . . . 38
+ A.4.2. Determining the De-interleave Buffer Size. . . . 38
+ A.4.3. Determining the Maximum Displacement . . . . . . 39
+ A.5. Continuous Interleave . . . . . . . . . . . . . . . . . 39
+ A.5.1. Introduction . . . . . . . . . . . . . . . . . . 39
+
+
+
+van der Meer, et al. Standards Track [Page 2]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ A.5.2. Determining the De-interleave Buffer Size . . . 40
+ A.5.3. Determining the Maximum Displacement . . . . . . 40
+ References . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
+ Normative References . . . . . . . . . . . . . . . . . . . . . . . 41
+ Informative References . . . . . . . . . . . . . . . . . . . . . . 41
+ Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 42
+ Full Copyright Statement . . . . . . . . . . . . . . . . . . . . . 43
+
+1. Introduction
+
+ The MPEG Committee is Working Group 11 (WG11) in ISO/IEC JTC1 SC29
+ that specified the MPEG-1, MPEG-2 and, more recently, the MPEG-4
+ standards [1]. The MPEG-4 standard specifies compression of audio-
+ visual data into, for example an audio or video elementary stream.
+ In the MPEG-4 standard, these streams take the form of audio-visual
+ objects that may be arranged into an audio-visual scene by means of a
+ scene description. Each MPEG-4 elementary stream consists of a
+ sequence of Access Units; examples of an Access Unit (AU) are an
+ audio frame and a video picture.
+
+ This specification defines a general and configurable payload
+ structure to transport MPEG-4 elementary streams, in particular
+ MPEG-4 audio (including speech) streams, MPEG-4 video streams and
+ also MPEG-4 systems streams, such as BIFS (BInary Format for Scenes),
+ OCI (Object Content Information), OD (Object Descriptor) and IPMP
+ (Intellectual Property Management and Protection) streams. The RTP
+ payload defined in this document is simple to implement and
+ reasonably efficient. It allows for optional interleaving of Access
+ Units (such as audio frames) to increase error resiliency in packet
+ loss.
+
+ Some types of MPEG-4 elementary streams include "crucial" information
+ whose loss cannot be tolerated. However, RTP does not provide
+ reliable transmission, so receipt of that crucial information is not
+ assured. Section 3.2.3.4 specifies how stream state is conveyed so
+ that the receiver can detect the loss of crucial information and
+ cease decoding until the next random access point has been received.
+ Applications transmitting streams that include crucial information,
+ such as OD commands, BIFS commands, or programmatic content such as
+ MPEG-J (Java) and ECMAScript, should include random access points, at
+ a suitable periodicity depending upon the probability of loss, in
+ order to reduce stream corruption to an acceptable level. An example
+ is the carousel mechanism as defined by MPEG in ISO/IEC 14496-1 [1].
+
+ Such applications may also employ additional protocols or services to
+ reduce the probability of loss. At the RTP layer, these measures
+ include payload formats and profiles for retransmission or forward
+ error correction (such as in RFC 2733 [10]), that must be employed
+
+
+
+van der Meer, et al. Standards Track [Page 3]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ with due consideration to congestion control. Another solution that
+ may be appropriate for some applications is to carry RTP over TCP
+ (such as in RFC 2326 [8], section 10.12). At the network layer,
+ resource allocation or preferential service may be available to
+ reduce the probability of loss. For a general description of methods
+ to repair streaming media, see RFC 2354 [9].
+
+ Though the RTP payload format defined in this document is capable of
+ transporting any MPEG-4 stream, other, more specific, formats may
+ exist, such as RFC 3016 [12] for transport of MPEG-4 video (ISO/IEC
+ 14496 [1] part 2).
+
+ Configuration of the payload is provided to accommodate the
+ transportation of any MPEG-4 stream at any possible bit rate.
+ However, for a specific MPEG-4 elementary stream typically only very
+ few configurations are needed. So as to allow for the design of
+ simplified, but dedicated receivers, this specification requires that
+ specific modes be defined for transport of MPEG-4 streams. This
+ document defines modes for MPEG-4 CELP and AAC streams, as well as a
+ generic mode that can be used to transport any MPEG-4 stream. In the
+ future, new RFCs are expected to specify additional modes for the
+ transportation of MPEG-4 streams.
+
+ The RTP payload format defined in this document specifies carriage of
+ system-related information that is often equivalent to the
+ information that may be contained in the MPEG-4 Sync Layer (SL) as
+ defined in MPEG-4 Systems [1]. This document does not prescribe how
+ to transcode or map information from the SL to fields defined in the
+ RTP payload format. Such processing, if any, is left to the
+ discretion of the application. However, to anticipate the need for
+ the transportation of any additional system-related information in
+ the future, an auxiliary field can be configured that may carry any
+ such data.
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in BCP 14, RFC 2119 [4].
+
+2. Carriage of MPEG-4 Elementary Streams over RTP
+
+2.1. Signaling by MIME Format Parameters
+
+ With this payload format, a single MPEG-4 elementary stream can be
+ transported. Information on the type of MPEG-4 stream carried in the
+ payload is conveyed by MIME format parameters, as in an SDP [5]
+ message or by other means (see section 4). These MIME format
+ parameters specify the configuration of the payload. To allow for
+ simplified and dedicated receivers, a MIME format parameter is
+
+
+
+van der Meer, et al. Standards Track [Page 4]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ available to signal a specific mode of using this payload. A mode
+ definition MAY include the type of MPEG-4 elementary stream, as well
+ as the applied configuration, so as to avoid the need for receivers
+ to parse all MIME format parameters. The applied mode MUST be
+ signaled.
+
+2.2. MPEG Access Units
+
+ For carriage of compressed audio-visual data, MPEG defines Access
+ Units. An MPEG Access Unit (AU) is the smallest data entity to which
+ timing information is attributed. In the case of audio, an Access
+ Unit may represent an audio frame and in the case of video, a
+ picture. MPEG Access Units are octet-aligned by definition. If, for
+ example, an audio frame is not octet-aligned, up to 7 zero-padding
+ bits MUST be inserted at the end of the frame to achieve the octet-
+ aligned Access Units, as required by the MPEG-4 specification.
+ MPEG-4 decoders MUST be able to decode AUs in which such padding is
+ applied.
+
+ Consistent with the MPEG-4 specification, this document requires that
+ each MPEG-4 part 2 video Access Unit include all the coded data of a
+ picture, any video stream headers that may precede the coded picture
+ data, and any video stream stuffing that may follow it, up to but not
+ including the startcode indicating the start of a new video stream or
+ the next Access Unit.
+
+2.3. Concatenation of Access Units
+
+ Frequently it is possible to carry multiple Access Units in one RTP
+ packet. This is particularly useful for audio; for example, when AAC
+ is used for encoding a stereo signal at 64 kbits/sec, AAC frames
+ contain on average, approximately 200 octets. On a LAN with a 1500
+ octet MTU, this would allow an average of 7 complete AAC frames to be
+ carried per RTP packet.
+
+ Access Units may have a fixed size in octets, but a variable size is
+ also possible. To facilitate parsing in the case of multiple
+ concatenated AUs in one RTP packet, the size of each AU is made known
+ to the receiver. When concatenating in the case of a constant AU
+ size, this size is communicated "out of band" through a MIME format
+ parameter. When concatenating in case of variable size AUs, the RTP
+ payload carries "in band" an AU size field for each contained AU.
+
+ In combination with the RTP payload length, the size information
+ allows the RTP payload to be split by the receiver back into the
+ individual AUs.
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 5]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ To simplify the implementation of RTP receivers, it is required that
+ when multiple AUs are carried in an RTP packet, each AU MUST be
+ complete, i.e., the number of AUs in an RTP packet MUST be integral.
+
+ In addition, an AU MUST NOT be repeated in other RTP packets; hence
+ repetition of an AU is only possible when using a duplicate RTP
+ packet.
+
+2.4. Fragmentation of Access Units
+
+ MPEG allows for very large Access Units. Since most IP networks have
+ significantly smaller MTU sizes, this payload format allows for the
+ fragmentation of an Access Unit over multiple RTP packets. Hence,
+ when an IP packet is lost after IP-level fragmentation, only an AU
+ fragment may get lost instead of the entire AU. To simplify the
+ implementation of RTP receivers, an RTP packet SHALL either carry one
+ or more complete Access Units or a single fragment of one AU, i.e.,
+ packets MUST NOT contain fragments of multiple Access Units.
+
+2.5. Interleaving
+
+ When an RTP packet carries a contiguous sequence of Access Units, the
+ loss of such a packet can result in a "decoding gap" for the user.
+ One method of alleviating this problem is to allow for the Access
+ Units to be interleaved in the RTP packets. For a modest cost in
+ latency and implementation complexity, significant error resiliency
+ to packet loss can be achieved.
+
+ To support optional interleaving of Access Units, this payload format
+ allows for index information to be sent for each Access Unit. After
+ informing receivers about buffer resources to allocate for de-
+ interleaving, the RTP sender is free to choose the interleaving
+ pattern without propagating this information a priori to the
+ receiver(s). Indeed, the sender could dynamically adjust the
+ interleaving pattern based on the Access Unit size, error rates, etc.
+ The RTP receiver does not need to know the interleaving pattern used;
+ it only needs to extract the index information of the Access Unit and
+ insert the Access Unit into the appropriate sequence in the decoding
+ or rendering queue. An example of interleaving is given below.
+
+
+
+
+
+
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 6]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ For example, if we assume that an RTP packet contains 3 AUs, and that
+ the AUs are numbered 0, 1, 2, 3, 4, and so forth, and if an
+ interleaving group length of 9 is chosen, then RTP packet(i) contains
+ the following AU(n):
+
+ RTP packet(0): AU(0), AU(3), AU(6)
+ RTP packet(1): AU(1), AU(4), AU(7)
+ RTP packet(2): AU(2), AU(5), AU(8)
+ RTP packet(3): AU(9), AU(12), AU(15)
+ RTP packet(4): AU(10), AU(13), AU(16) Etc.
+
+2.6. Time Stamp Information
+
+ The RTP time stamp MUST carry the sampling instant of the first AU
+ (fragment) in the RTP packet. When multiple AUs are carried within
+ an RTP packet, the time stamps of subsequent AUs can be calculated if
+ the frame period of each AU is known. For audio and video, this is
+ possible if the frame rate is constant. However, in some cases it is
+ not possible to make such a calculation (for example, for variable
+ frame rate video, or for MPEG-4 BIFS streams carrying composition
+ information). To support such cases, this payload format can be
+ configured to carry a time stamp in the RTP payload for each
+ contained Access Unit. A time stamp MAY be conveyed in the RTP
+ payload only for non-first AUs in the RTP packet, and SHALL NOT be
+ conveyed for the first AU (fragment), as the time stamp for the first
+ AU in the RTP packet is carried by the RTP time stamp.
+
+ MPEG-4 defines two types of time stamps: the composition time stamp
+ (CTS) and the decoding time stamp (DTS). The CTS represents the
+ sampling instant of an AU, and hence the CTS is equivalent to the RTP
+ time stamp. The DTS may be used in MPEG-4 video streams that use
+ bi-directional coding, i.e., when pictures are predicted in both
+ forward and backward direction by using either a reference picture in
+ the past, or a reference picture in the future. The DTS cannot be
+ carried in the RTP header. In some cases, the DTS can be derived
+ from the RTP time stamp using frame rate information; this requires
+ deep parsing in the video stream, which may be considered
+ objectionable. If the video frame rate is variable, the required
+ information may not even be present in the video stream. For both
+ reasons, the capability has been defined to optionally carry the DTS
+ in the RTP payload for each contained Access Unit.
+
+ To keep the coding of time stamps efficient, each time stamp
+ contained in the RTP payload is coded as a difference. For the CTS,
+ the offset from the RTP time stamps is provided, and for the DTS, the
+ offset from the CTS.
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 7]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+2.7. State Indication of MPEG-4 System Streams
+
+ ISO/IEC 14496-1 defines states for MPEG-4 system streams. So as to
+ convey state information when transporting MPEG-4 system streams,
+ this payload format allows for the optional carriage in the RTP
+ payload of the stream state for each contained Access Unit. Stream
+ states are used to signal "crucial" AUs that carry information whose
+ loss cannot be tolerated and are also useful when repeating AUs
+ according to the carousel mechanism defined in ISO/IEC 14496-1.
+
+2.8. Random Access Indication
+
+ Random access to the content of MPEG-4 elementary streams may be
+ possible at some but not all Access Units. To signal Access Units
+ where random access is possible, a random access point flag can
+ optionally be carried in the RTP payload for each contained Access
+ Unit. Carriage of random access points is particularly useful for
+ MPEG-4 system streams in combination with the stream state.
+
+2.9. Carriage of Auxiliary Information
+
+ This payload format defines a specific field to carry auxiliary data.
+ The auxiliary data field is preceded by a field that specifies the
+ length of the auxiliary data, so as to facilitate the skipping of
+ data without parsing it. The coding of the auxiliary data is not
+ defined in this document; instead, the format, meaning and signaling
+ of auxiliary information is expected to be specified in one or more
+ future RFCs. Auxiliary information MUST NOT be transmitted until its
+ format, meaning and signaling have been specified and its use has
+ been signaled. Receivers that have knowledge of the auxiliary data
+ MAY decode the auxiliary data, but receivers without knowledge of
+ such data MUST skip the auxiliary data field.
+
+2.10. MIME Format Parameters and Configuring Conditional Fields
+
+ To support the features described in the previous sections, several
+ fields are defined for carriage in the RTP payload. However, their
+ use strongly depends on the type of MPEG-4 elementary stream that is
+ carried. Sometimes a specific field is needed with a certain length,
+ while in other cases such a field is not needed. To be efficient in
+ either case, the fields to support these features are configurable by
+ means of MIME format parameters. In general, a MIME format parameter
+ defines the presence and length of the associated field. A length of
+ zero indicates absence of the field. As a consequence, parsing of
+ the payload requires knowledge of MIME format parameters. The MIME
+ format parameters are conveyed to the receiver via SDP [5] messages,
+ as specified in section 4.4.1, or through other means.
+
+
+
+
+van der Meer, et al. Standards Track [Page 8]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+2.11. Global Structure of Payload Format
+
+ The RTP payload following the RTP header, contains three octet-
+ aligned data sections, of which the first two MAY be empty, see
+ Figure 1.
+
+ +---------+-----------+-----------+---------------+
+ | RTP | AU Header | Auxiliary | Access Unit |
+ | Header | Section | Section | Data Section |
+ +---------+-----------+-----------+---------------+
+
+ <----------RTP Packet Payload----------->
+
+ Figure 1: Data sections within an RTP packet
+
+ The first data section is the AU (Access Unit) Header Section, that
+ contains one or more AU-headers; however, each AU-header MAY be
+ empty, in which case the entire AU Header Section is empty. The
+ second section is the Auxiliary Section, containing auxiliary data;
+ this section MAY also be configured empty. The third section is the
+ Access Unit Data Section, containing either a single fragment of one
+ Access Unit or one or more complete Access Units. The Access Unit
+ Data Section MUST NOT be empty.
+
+2.12. Modes to Transport MPEG-4 Streams
+
+ While it is possible to build fully configurable receivers capable of
+ receiving any MPEG-4 stream, this specification also allows for the
+ design of simplified, but dedicated receivers, that are for example,
+ capable of receiving only one type of MPEG-4 stream. This is
+ achieved by requiring that specific modes be defined in order to use
+ this specification. Each mode may define constraints for transport
+ of one or more types of MPEG-4 streams, for instance on the payload
+ configuration.
+
+ The applied mode MUST be signaled. Signaling the mode is
+ particularly important for receivers that are only capable of
+ decoding one or more specific modes. Such receivers need to
+ determine whether the applied mode is supported, so as to avoid
+ problems with processing of payloads that are beyond the capabilities
+ of the receiver.
+
+ In this document several modes are defined for the transportation of
+ MPEG-4 CELP and AAC streams, as well as a generic mode that can be
+ used for any MPEG-4 stream. In the future, new RFCs may specify
+ other modes of using this specification. However, each mode MUST be
+ in full compliance with this specification (see section 3.3.7).
+
+
+
+
+van der Meer, et al. Standards Track [Page 9]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+2.13. Alignment with RFC 3016
+
+ This payload can be configured as nearly identical to the payload
+ format defined in RFC 3016 [12] for the MPEG-4 video configurations
+ recommended in RFC 3016. Hence, receivers that comply with RFC 3016
+ can decode such RTP payload, provided that additional packets
+ containing video decoder configuration (VO, VOL, VOSH) are inserted
+ in the stream, as required by RFC 3016 [12]. Conversely, receivers
+ that comply with the specification in this document SHOULD be able to
+ decode payloads, names and parameters defined for MPEG-4 video in RFC
+ 3016 [12]. In this respect, it is strongly RECOMMENDED that the
+ implementation provide the ability to ignore "in band" video decoder
+ configuration packets that may be found in streams conforming to the
+ RFC 3016 video payload.
+
+ Note the "out of band" availability of the video decoder
+ configuration is optional in RFC 3016 [12]. To achieve maximum
+ interoperability with the RTP payload format defined in this
+ document, applications that use RFC 3016 to transport MPEG-4 video
+ (part 2) are recommended to make the video decoder configuration
+ available as a MIME parameter.
+
+3. Payload Format
+
+3.1. Usage of RTP Header Fields and RTCP
+
+ Payload Type (PT): The assignment of an RTP payload type for this
+ packet format is outside the scope of this document; it is
+ specified by the RTP profile under which this payload format is
+ used, or signaled dynamically out-of-band (e.g., using SDP).
+
+ Marker (M) bit: The M bit is set to 1 to indicate that the RTP packet
+ payload contains either the final fragment of a fragmented Access
+ Unit or one or more complete Access Units.
+
+ Extension (X) bit: Defined by the RTP profile used.
+
+ Sequence Number: The RTP sequence number SHOULD be generated by the
+ sender in the usual manner with a constant random offset.
+
+ Timestamp: Indicates the sampling instant of the first AU contained
+ in the RTP payload. This sampling instant is equivalent to the
+ CTS in the MPEG-4 time domain. When using SDP, the clock rate of
+ the RTP time stamp MUST be expressed using the "rtpmap" attribute.
+ If an MPEG-4 audio stream is transported, the rate SHOULD be set
+ to the same value as the sampling rate of the audio stream. If an
+ MPEG-4 video stream is transported, it is RECOMMENDED that the
+ rate be set to 90 kHz.
+
+
+
+van der Meer, et al. Standards Track [Page 10]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ In all cases, the sender SHALL make sure that RTP time stamps are
+ identical only if the RTP time stamp refers to fragments of the same
+ Access Unit.
+
+ According to RFC 3550 [2] (section 5.1), it is RECOMMENDED that RTP
+ time stamps start at a random value for security reasons. This is
+ not an issue for synchronization of multiple RTP streams. However,
+ when streams from multiple sources are to be synchronized (for
+ example one stream from local storage, another from an RTP streaming
+ server), synchronization may become impossible if the receiver only
+ knows the original time stamp relationships. In such cases the time
+ stamp relationship required for obtaining synchronization may be
+ provided by out of band means. The format of such information, as
+ well as methods to convey such information, are beyond the scope of
+ this specification.
+
+ SSRC: set as described in RFC 3550 [2].
+
+ CC and CSRC fields are used as described in RFC 3550 [2].
+
+ RTCP SHOULD be used as defined in RFC 3550 [2]. Note that time
+ stamps in RTCP Sender Reports may be used to synchronize multiple
+ MPEG-4 elementary streams and also to synchronize MPEG-4 streams with
+ non-MPEG-4 streams, in case the delivery of these streams uses RTP.
+
+3.2. RTP Payload Structure
+
+3.2.1. The AU Header Section
+
+ When present, the AU Header Section consists of the AU-headers-length
+ field, followed by a number of AU-headers, see Figure 2.
+
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
+ |AU-headers-length|AU-header|AU-header| |AU-header|padding|
+ | | (1) | (2) | | (n) | bits |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+-+
+
+ Figure 2: The AU Header Section
+
+ The AU-headers are configured using MIME format parameters and MAY be
+ empty. If the AU-header is configured empty, the AU-headers-length
+ field SHALL NOT be present and consequently the AU Header Section is
+ empty. If the AU-header is not configured empty, then the AU-
+ headers-length is a two octet field that specifies the length in bits
+ of the immediately following AU-headers, excluding the padding bits.
+
+ Each AU-header is associated with a single Access Unit (fragment)
+ contained in the Access Unit Data Section in the same RTP packet.
+
+
+
+van der Meer, et al. Standards Track [Page 11]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ For each contained Access Unit (fragment), there is exactly one AU-
+ header. Within the AU Header Section, the AU-headers are bit-wise
+ concatenated in the order in which the Access Units are contained in
+ the Access Unit Data Section. Hence, the n-th AU-header refers to
+ the n-th AU (fragment). If the concatenated AU-headers consume a
+ non-integer number of octets, up to 7 zero-padding bits MUST be
+ inserted at the end in order to achieve octet-alignment of the AU
+ Header Section.
+
+3.2.1.1. The AU-header
+
+ Each AU-header may contain the fields given in Figure 3. The length
+ in bits of the fields, with the exception of the CTS-flag, the
+ DTS-flag and the RAP-flag fields, is defined by MIME format
+ parameters; see section 4.1. If a MIME format parameter has the
+ default value of zero, then the associated field is not present. The
+ number of bits for fields that are present and that represent the
+ value of a parameter MUST be chosen large enough to correctly encode
+ the largest value of that parameter during the session.
+
+ If present, the fields MUST occur in the mutual order given in Figure
+ 3. In the general case, a receiver can only discover the size of an
+ AU-header by parsing it since the presence of the CTS-delta and DTS-
+ delta fields is signaled by the value of the CTS-flag and DTS-flag,
+ respectively.
+
+ +---------------------------------------+
+ | AU-size |
+ +---------------------------------------+
+ | AU-Index / AU-Index-delta |
+ +---------------------------------------+
+ | CTS-flag |
+ +---------------------------------------+
+ | CTS-delta |
+ +---------------------------------------+
+ | DTS-flag |
+ +---------------------------------------+
+ | DTS-delta |
+ +---------------------------------------+
+ | RAP-flag |
+ +---------------------------------------+
+ | Stream-state |
+ +---------------------------------------+
+
+ Figure 3: The fields in the AU-header. If used, the AU-Index field
+ only occurs in the first AU-header within an AU Header
+ Section; in any other AU-header, the AU-Index-delta field
+ occurs instead.
+
+
+
+van der Meer, et al. Standards Track [Page 12]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ AU-size: Indicates the size in octets of the associated Access Unit
+ in the Access Unit Data Section in the same RTP packet. When the
+ AU-size is associated with an AU fragment, the AU size indicates
+ the size of the entire AU and not the size of the fragment. In
+ this case, the size of the fragment is known from the size of the
+ AU data section. This can be exploited to determine whether a
+ packet contains an entire AU or a fragment, which is particularly
+ useful after losing a packet carrying the last fragment of an AU.
+
+ AU-Index: Indicates the serial number of the associated Access Unit
+ (fragment). For each (in decoding order) consecutive AU or AU
+ fragment, the serial number is incremented by 1. When present,
+ the AU-Index field occurs in the first AU-header in the AU Header
+ Section, but MUST NOT occur in any subsequent (non-first) AU-
+ header in that Section. To encode the serial number in any such
+ non-first AU-header, the AU-Index-delta field is used.
+
+ AU-Index-delta: The AU-Index-delta field is an unsigned integer that
+ specifies the serial number of the associated AU as the difference
+ with respect to the serial number of the previous Access Unit.
+ Hence, for the n-th (n>1) AU, the serial number is found from:
+
+ AU-Index(n) = AU-Index(n-1) + AU-Index-delta(n) + 1
+
+ If the AU-Index field is present in the first AU-header in the AU
+ Header Section, then the AU-Index-delta field MUST be present in
+ any subsequent (non-first) AU-header. When the AU-Index-delta is
+ coded with the value 0, it indicates that the Access Units are
+ consecutive in decoding order. An AU-Index-delta value larger
+ than 0 signals that interleaving is applied.
+
+ CTS-flag: Indicates whether the CTS-delta field is present. A value
+ of 1 indicates that the field is present, a value of 0 indicates
+ that it is not present.
+
+ The CTS-flag field MUST be present in each AU-header if the length
+ of the CTS-delta field is signaled to be larger than zero. In
+ that case, the CTS-flag field MUST have the value 0 in the first
+ AU-header and MAY have the value 1 in all non-first AU-headers.
+ The CTS-flag field SHOULD be 0 for any non-first fragment of an
+ Access Unit.
+
+
+
+
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 13]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ CTS-delta: Encodes the CTS by specifying the value of CTS as a 2's
+ complement offset (delta) from the time stamp in the RTP header of
+ this RTP packet. The CTS MUST use the same clock rate as the time
+ stamp in the RTP header.
+
+ DTS-flag: Indicates whether the DTS-delta field is present. A value
+ of 1 indicates that DTS-delta is present, a value of 0 indicates
+ that it is not present.
+
+ The DTS-flag field MUST be present in each AU-header if the length
+ of the DTS-delta field is signaled to be larger than zero. The
+ DTS-flag field MUST have the same value for all fragments of an
+ Access Unit.
+
+ DTS-delta: Specifies the value of the DTS as a 2's complement offset
+ (delta) from the CTS. The DTS MUST use the same clock rate as the
+ time stamp in the RTP header. The DTS-delta field MUST have the
+ same value for all fragments of an Access Unit.
+
+ RAP-flag: When set to 1, indicates that the associated Access Unit
+ provides a random access point to the content of the stream. If
+ an Access Unit is fragmented, the RAP flag, if present, MUST be
+ set to 0 for each non-first fragment of the AU.
+
+ Stream-state: Specifies the state of the stream for an AU of an
+ MPEG-4 system stream; each state is identified by a value of a
+ modulo counter. In ISO/IEC 14496-1, MPEG-4 system streams use the
+ AU_SequenceNumber to signal stream states. When the stream state
+ changes, the value of the stream-state MUST be incremented by one.
+
+ Note: no relation is required between stream-states of different
+ streams.
+
+3.2.2. The Auxiliary Section
+
+ The Auxiliary Section consists of the auxiliary-data-size field
+ followed by the auxiliary-data field. Receivers MAY (but are not
+ required to) parse the auxiliary-data field; to facilitate skipping
+ of the auxiliary-data field by receivers, the auxiliary-data-size
+ field indicates the length in bits of the auxiliary-data. If the
+ concatenation of the auxiliary-data-size and the auxiliary-data
+ fields consume a non-integer number of octets, up to 7 zero padding
+ bits MUST be inserted immediately after the auxiliary data in order
+ to achieve octet-alignment. See Figure 4.
+
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 14]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
+ | auxiliary-data-size | auxiliary-data |padding bits |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. -+-+-+-+-+-+-+-+-+
+
+ Figure 4: The fields in the Auxiliary Section
+
+ The length in bits of the auxiliary-data-size field is configurable
+ by a MIME format parameter; see section 4.1. The default length of
+ zero indicates that the entire Auxiliary Section is absent.
+
+ auxiliary-data-size: specifies the length in bits of the immediately
+ following auxiliary-data field;
+
+ auxiliary-data: the auxiliary-data field contains data of a format
+ not defined by this specification.
+
+3.2.3. The Access Unit Data Section
+
+ The Access Unit Data Section contains an integer number of complete
+ Access Units or a single fragment of one AU. The Access Unit Data
+ Section is never empty. If data of more than one Access Unit is
+ present, then the AUs are concatenated into a contiguous string of
+ octets. See Figure 5. The AUs inside the Access Unit Data Section
+ MUST be in decoding order, though not necessarily contiguous in the
+ case of interleaving.
+
+ The size and number of Access Units SHOULD be adjusted such that the
+ resulting RTP packet is not larger than the path MTU. To handle
+ larger packets, this payload format relies on lower layers for
+ fragmentation, which may result in reduced performance.
+
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |AU(1) |
+ + |
+ | |
+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |AU(2) |
+ +-+-+-+-+-+-+-+-+ |
+ | |
+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | | AU(n) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |AU(n) continued|
+ |-+-+-+-+-+-+-+-+
+
+ Figure 5: Access Unit Data Section; each AU is octet-aligned.
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 15]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ When multiple Access Units are carried, the size of each AU MUST be
+ made available to the receiver. If the AU size is variable, then the
+ size of each AU MUST be indicated in the AU-size field of the
+ corresponding AU-header. However, if the AU size is constant for a
+ stream, this mechanism SHOULD NOT be used; instead, the fixed size
+ SHOULD be signaled by the MIME format parameter "constantSize"; see
+ section 4.1.
+
+ The absence of both AU-size in the AU-header and the constantSize
+ MIME format parameter indicates the carriage of a single AU
+ (fragment), i.e., that a single Access Unit (fragment) is transported
+ in each RTP packet for that stream.
+
+3.2.3.1. Fragmentation
+
+ A packet SHALL carry either one or more complete Access Units, or a
+ single fragment of an Access Unit. Fragments of the same Access Unit
+ have the same time stamp but different RTP sequence numbers. The
+ marker bit in the RTP header is 1 on the last fragment of an Access
+ Unit, and 0 on all other fragments.
+
+3.2.3.2. Interleaving
+
+ Unless prohibited by the signaled mode, a sender MAY interleave
+ Access Units. Receivers that are capable of receiving modes that
+ support interleaving MUST be able to decode interleaved Access Units.
+
+ When a sender interleaves Access Units, it needs to provide
+ sufficient information to enable a receiver to unambiguously
+ reconstruct the original order, even in the case of out-of-order
+ packets, packet loss or duplication. The information that senders
+ need to provide depends on whether or not the Access Units have a
+ constant time duration. Access Units have a constant time duration,
+ if:
+
+ TS(i+1) - TS(i) = constant
+
+ for any i, where:
+ i indicates the index of the AU in the original order, and
+ TS(i) denotes the time stamp of AU(i)
+
+ The MIME parameter "constantDuration" SHOULD be used to signal that
+ Access Units have a constant time duration; see section 4.1.
+
+ If the "constantDuration" parameter is present, the receiver can
+ reconstruct the original Access Unit timing based solely on the RTP
+ timestamp and AU-Index-delta. Accordingly, when transmitting Access
+ Units of constant duration, the AU-Index, if present, MUST be set to
+
+
+
+van der Meer, et al. Standards Track [Page 16]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ the value 0. Receivers of constant duration Access Units MUST use
+ the RTP timestamp to determine the index of the first AU in the RTP
+ packet. The AU-Index-delta header and the signaled
+ "constantDuration" are used to reconstruct AU timing.
+
+ If the "constantDuration" parameter is not present, then senders MAY
+ signal AUs of constant duration by coding the AU-Index with zero in
+ each RTP packet. In the absence of the constantDuration parameter
+ receivers MUST conclude that the AUs have constant duration if the
+ AU-index is zero in two consecutive RTP packets.
+
+ When transmitting Access Units of variable duration, then the
+ "constantDuration" parameter MUST NOT be present, and the transmitter
+ MUST use the AU-Index to encode the index information required for
+ re-ordering, and the receiver MUST use that value to determine the
+ index of each AU in the RTP packet. The number of bits of the AU-
+ Index field MUST be chosen so that valid index information is
+ provided at the applied interleaving scheme, without causing problems
+ due to roll-over of the AU-Index field. In addition, the CTS-delta
+ MUST be coded in the AU header for each non-first AU in the RTP
+ packet, so that receivers can place the AUs correctly in time.
+
+ When interleaving is applied, a de-interleave buffer is needed in
+ receivers to put the Access Units in their correct logical
+ consecutive decoding order. This requires the computation of the
+ time stamp for each Access Unit. In case of a constant time duration
+ per Access Unit, the time stamp of the i-th access unit in an RTP
+ packet with RTP time stamp T is calculated as follows:
+
+ Timestamp[0] = T
+ Timestamp[i, i > 0] = T +(Sum(for k=1 to i of (AU-Index-delta[k]
+ + 1))) * access-unit-duration
+
+ When AU-Index-delta is always 0, this reduces to T + i * (access-
+ unit-duration). This is the non-interleaved case, where the frames
+ are consecutive in decoding order. Note that the AU-Index field
+ (present for the first Access Unit) is indeed not needed in this
+ calculation.
+
+3.2.3.3. Constraints for Interleaving
+
+ The size of the packets should be suitably chosen to be appropriate
+ to both the path MTU and the capacity of the receiver's de-interleave
+ buffer. The maximum packet size for a session SHOULD be chosen to
+ not exceed the path MTU.
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 17]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ To allow receivers to allocate sufficient resources for de-
+ interleaving, senders MUST provide the information to receivers as
+ specified in this section.
+
+ AUs enter the decoder in decoding order. The de-interleave buffer is
+ used to re-order a stream of interleaved AUs back into decoding
+ order. When interleaving is applied, the decoding of "early" AUs has
+ to be postponed until all AUs that precede it in decoding order are
+ present. Therefore, these "early" AUs are stored in the de-
+ interleave buffer. As an example in Figure 6, the interleaving
+ pattern from section 2.5 is considered.
+
+ +--+--+--+--+--+--+--+--+--+--+--+-
+ Interleaved AUs | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
+ +--+--+--+--+--+--+--+--+--+--+--+-
+ Storage of "early" AUs 3 3 3 3 3 3
+ 6 6 6 6 6 6
+ 4 4 4
+ 7 7 7
+ 12 12
+
+ Figure 6: Storage of "early" AUs in the de-interleave buffer per
+ interleaved AU.
+
+ AU(3) is to be delivered to the decoder after AU(0), AU(1) and AU(2);
+ of these AUs, AU(2) arrives from the network last and hence AU(3)
+ needs to be stored until AU(2) is present in the pattern. Similarly,
+ AU(6) is to be stored until AU(5) is present, while AU(4) and AU(7)
+ are to be stored until AU(2) and AU(5) are present, respectively.
+ Note that the fullness of the de-interleave buffer varies in time.
+ In Figure 6, the de-interleave buffer contains at most 4, but often
+ less AUs.
+
+ So as to give a rough indication of the resources needed in the
+ receiver for de-interleaving, the maximum displacement in time of an
+ AU is defined. For any AU(j) in the pattern, each AU(i) with i<j
+ that is not yet present can be determined. The maximum displacement
+ in time of an AU is the maximum difference between the time stamp of
+ an AU in the pattern and the time stamp of the earliest AU that is
+ not yet present. In other words, when considering a sequence of
+ interleaved AUs, then:
+
+ Maximum displacement = max{TS(i) - TS(j)}
+
+ for any i and any j>i, where:
+ i and j indicate the index of the AU in the interleaving
+ pattern, and
+ TS denotes the time stamp of the AU.
+
+
+
+van der Meer, et al. Standards Track [Page 18]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ As an example in Figure 7, the interleaving pattern from section 2.5
+ is considered. For each AU in the pattern, the index is given of the
+ earliest of any earlier AUs not yet present. Hence for each AU(n) in
+ the interleaving pattern the smallest index k (with k<n) of not yet
+ delivered AUs is indicated. A "-" indicates that all previous AUs
+ are present. If the AU period is constant, the maximum displacement
+ equals 5 AU periods, as found for AU(6) and AU(7).
+
+ +--+--+--+--+--+--+--+--+--+--+--+-
+ Interleaved AUs | 0| 3| 6| 1| 4| 7| 2| 5| 8| 9|12|..
+ +--+--+--+--+--+--+--+--+--+--+--+-
+
+ Earliest not yet present AU - 1 1 - 2 2 - - - - 10
+
+ Figure 7: For each AU in the interleaving pattern, the earliest of
+ any earlier AUs not yet present
+
+ When interleaving, senders MUST signal the maximum displacement in
+ time during the session via the MIME format parameter
+ "maxDisplacement"; see section 4.1.
+
+ An estimate of the size of the de-interleave buffer is found by
+ multiplying the maximum displacement by the maximum bit rate:
+
+ size(de-interleave buffer) = {(maxDisplacement) * Rate(max)} / (RTP
+ clock frequency),
+
+ where:
+ Rate(max) is the maximum bit-rate of the transported stream.
+
+ Note that receivers can derive Rate(max) from the MIME format
+ parameters streamType, profile-level-id, and config.
+
+ However, this calculation estimates the size of the de-interleave
+ buffer and the required size may differ from the calculated value.
+ If this calculation under-estimates the size of the
+ de-interleave buffer, then senders, when interleaving, MUST signal a
+ size of the de-interleave buffer via the MIME format parameter
+ "de-interleaveBufferSize"; see section 4.1. If the calculation
+ over-estimates the size of the de-interleave buffer, then senders,
+ when interleaving, MAY signal a size of the de-interleave buffer via
+ the MIME format parameter "de-interleaveBufferSize".
+
+
+
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 19]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ The signaled size of the de-interleave buffer MUST be large enough to
+ contain all "early" AUs at any point in time during the session.
+ That is:
+
+ minimum de-interleave buffer size = max [sum {if TS(i) > TS(j) then
+ AU-size(i) else 0}]
+
+ for any j and any i<j, where:
+ i and j indicate the index of an AU in the interleaving
+ pattern,
+ TS(i) denotes the time stamp of AU(i), and
+ AU-size(i) denotes the size of AU(i) in number of octets.
+
+ If the "de-interleaveBufferSize" parameter is present, then the
+ applied buffer for de-interleaving in a receiver MUST have a size
+ that is at least equal to the signaled size of the de-interleave
+ buffer, else a size that is at least equal to the calculated size of
+ the de-interleave buffer.
+
+ No matter what interleaving scheme is used, the scheme must be
+ analyzed to calculate the applicable maxDisplacement value, as well
+ as the required size of the de-interleave buffer. Senders SHOULD
+ signal values that are not larger than the strictly required values;
+ if larger values are signaled, the receiver will buffer excessively.
+
+ Note that for low bit-rate material, the applied interleaving may
+ make packets shorter than the MTU size.
+
+3.2.3.4. Crucial and Non-Crucial AUs with MPEG-4 System Data
+
+ Some Access Units with MPEG-4 system data, called "crucial" AUs,
+ carry information whose loss cannot be tolerated, either in the
+ presentation or in the decoder. At each crucial AU in an MPEG-4
+ system stream, the stream state changes. The stream-state MAY remain
+ constant at non-crucial AUs. In ISO/IEC 14496-1, MPEG-4 system
+ streams use the AU_SequenceNumber to signal stream states.
+
+ Example: Given three AUs, AU1 = "Insertion of node X", AU2 = "Set
+ position of node X", AU3 = "Set position of node X". AU1 is crucial,
+ since if it is lost, AU2 cannot be executed. However, AU2 is not
+ crucial, since AU3 can be executed even if AU2 is lost.
+
+ When a crucial AU is (possibly) lost, the stream is corrupted. For
+ example, when an AU is lost and the stream state has changed at the
+ next received AU, then it is possible that the lost AU was crucial.
+ Once corrupted, the stream remains corrupted until the next random
+ access point. Note that loss of non-crucial AUs does not corrupt the
+ stream. When a decoder starts receiving a stream, the decoder MUST
+
+
+
+van der Meer, et al. Standards Track [Page 20]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ consider the stream corrupted until an AU is received that provides a
+ random access point.
+
+ An AU that provides a random access point, as signaled by the RAP-
+ flag, may or may not be crucial. Non-crucial RAP AUs provide a
+ "repeated" random access point for use by decoders that recently
+ joined the stream or that need to re-start decoding after a stream
+ corruption. Non-crucial RAP AUs MUST include all updates since the
+ last crucial RAP AU.
+
+ Upon receiving AUs, decoders are to react as follows:
+
+ a) if the RAP-flag is set to 1 and the stream-state changes, then the
+ AU is a crucial RAP AU, and the AU MUST be decoded.
+
+ b) if the RAP-flag is set to 1 and the stream state does not change,
+ then the AU is a non-crucial RAP AU, and the receiver SHOULD
+ decode it if the stream is corrupted. Otherwise, the decoder MUST
+ ignore the AU.
+
+ c) if the RAP-flag is set to 0, then the AU MUST be decoded, unless
+ the stream is corrupted, in which case the AU MUST be ignored.
+
+3.3. Usage of this Specification
+
+3.3.1. General
+
+ Usage of this specification requires definition of a mode. A mode
+ defines how to use this specification, as deemed appropriate.
+ Senders MUST signal the applied mode via the MIME format parameter
+ "mode", as specified in section 4.1. This specification defines a
+ generic mode that can be used for any MPEG-4 stream, as well as
+ specific modes for the transportation of MPEG-4 CELP and MPEG-4 AAC
+ streams, defined in ISO/IEC 14496-3 [1].
+
+ When use of this payload format is signaled using SDP [5], an
+ "rtpmap" attribute is part of that signaling. The same requirements
+ apply for the rtpmap attribute in any mode compliant to this
+ specification. The general form of an rtpmap attribute is:
+
+ a=rtpmap:<payload type> <encoding name>/<clock rate>[/<encoding
+ parameters>]
+
+ For audio streams, <encoding parameters> specifies the number of
+ audio channels: 2 for stereo material (see RFC 2327 [5]) and 1 for
+ mono. Provided no additional parameters are needed, this parameter
+ may be omitted for mono material, hence its default value is 1.
+
+
+
+
+van der Meer, et al. Standards Track [Page 21]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+3.3.2. The Generic Mode
+
+ The generic mode can be used for any MPEG-4 stream. In this mode, no
+ mode-specific constraints are applied; hence, in the generic mode,
+ the full flexibility of this specification can be exploited. The
+ generic mode is signaled by mode=generic.
+
+ An example is given below for the transportation of a BIFS-Anim
+ stream. In this example carriage of multiple BIFS-Anim Access Units
+ is allowed in one RTP packet. The AU-header contains the AU-size
+ field, the CTS-flag and, if the CTS flag is set to 1, the CTS-delta
+ field. The number of bits of the AU-size and the CTS-delta fields
+ are 10 and 16, respectively. The AU-header also contains the RAP-
+ flag and the Stream-state of 4 bits. This results in an AU-header
+ with a total size of two or four octets per BIFS-Anim AU. The RTP
+ time stamp uses a 1 kHz clock. Note that the media type name is
+ video, because the BIFS-Anim stream is part of an audio-visual
+ presentation. For conventions on media type names, see section 4.1.
+
+ In detail:
+
+ m=video 49230 RTP/AVP 96
+ a=rtpmap:96 mpeg4-generic/1000
+ a=fmtp:96 streamtype=3; profile-level-id=1807; mode=generic;
+ objectType=2; config=0842237F24001FB400094002C0; sizeLength=10;
+ CTSDeltaLength=16; randomAccessIndication=1;
+ streamStateIndication=4
+
+ Note: The a=fmtp line has been wrapped to fit the page, it comprises
+ a single line in the SDP file.
+
+ The hexadecimal value of the "config" parameter is the
+ BIFSConfiguration() as defined in ISO/IEC 14496-1. The
+ BIFSConfiguration() specifies that the BIFS stream is a BIFS-Anim
+ stream. For the description of MIME parameters, see section 4.1.
+
+3.3.3. Constant Bit-rate CELP
+
+ This mode is signaled by mode=CELP-cbr. In this mode, one or more
+ complete CELP frames of fixed size can be transported in one RTP
+ packet; interleaving MUST NOT be used with this mode. The RTP
+ payload consists of one or more concatenated CELP frames, each of
+ equal size. CELP frames MUST NOT be fragmented when using this mode.
+ Both the AU Header Section and the Auxiliary Section MUST be empty.
+
+ The MIME format parameter constantSize MUST be provided to specify
+ the length of each CELP frame.
+
+
+
+
+van der Meer, et al. Standards Track [Page 22]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ For example:
+
+ m=audio 49230 RTP/AVP 96
+ a=rtpmap:96 mpeg4-generic/16000/1
+ a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-cbr; config=
+ 440E00; constantSize=27; constantDuration=240
+
+ Note: The a=fmtp line has been wrapped to fit the page, it comprises
+ a single line in the SDP file.
+
+ The hexadecimal value of the "config" parameter is the
+ AudioSpecificConfig()as defined in ISO/IEC 14496-3.
+ AudioSpecificConfig() specifies a mono CELP stream with a sampling
+ rate of 16 kHz at a fixed bitrate of 14.4 kb/s and 6 sub-frames per
+ CELP frame. For the description of MIME parameters, see section 4.1.
+
+3.3.4. Variable Bit-rate CELP
+
+ This mode is signaled by mode=CELP-vbr. With this mode, one or more
+ complete CELP frames of variable size can be transported in one RTP
+ packet with OPTIONAL interleaving. In this mode, the largest
+ possible value for AU-size is greater than the maximum CELP frame
+ size. Because CELP frames are very small, there is no support for
+ fragmentation of CELP frames. Hence, CELP frames MUST NOT be
+ fragmented when using this mode.
+
+ In this mode, the RTP payload consists of the AU Header Section,
+ followed by one or more concatenated CELP frames. The Auxiliary
+ Section MUST be empty. For each CELP frame contained in the payload,
+ there MUST be a one octet AU-header in the AU Header Section to
+ provide:
+
+ a) the size of each CELP frame in the payload and
+
+ b) index information for computing the sequence (and hence timing) of
+ each CELP frame.
+
+ Transport of CELP frames requires that the AU-size field be coded
+ with 6 bits. Therefore, in this mode 6 bits are allocated to the
+ AU-size field, and 2 bits to the AU-Index(-delta) field. Each AU-
+ Index field MUST be coded with the value 0. In the AU Header
+ Section, the concatenated AU-headers are preceded by the 16-bit AU-
+ headers-length field, as specified in section 3.2.1.
+
+ In addition to the required MIME format parameters, the following
+ parameters MUST be present: sizeLength, indexLength, and
+ indexDeltaLength. CELP frames always have a fixed duration per
+ Access Unit; when interleaving in this mode, this specific duration
+
+
+
+van der Meer, et al. Standards Track [Page 23]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ MUST be signaled by the MIME format parameter constantDuration. In
+ addition, the parameter maxDisplacement MUST be present when
+ interleaving.
+
+ For example:
+
+ m=audio 49230 RTP/AVP 96
+ a=rtpmap:96 mpeg4-generic/16000/1
+ a=fmtp:96 streamtype=5; profile-level-id=14; mode=CELP-vbr; config=
+ 440F20; sizeLength=6; indexLength=2; indexDeltaLength=2;
+ constantDuration=160; maxDisplacement=5
+
+ Note: The a=fmtp line has been wrapped to fit the page; it comprises
+ a single line in the SDP file.
+
+ The hexadecimal value of the "config" parameter is the
+ AudioSpecificConfig() as defined in ISO/IEC 14496-3.
+ AudioSpecificConfig() specifies a mono CELP stream with a sampling
+ rate of 16 kHz, at a bitrate that varies between 13.9 and 16.2 kb/s
+ and with 4 sub-frames per CELP frame. For the description of MIME
+ parameters, see section 4.1.
+
+3.3.5. Low Bit-rate AAC
+
+ This mode is signaled by mode=AAC-lbr. This mode supports the
+ transportation of one or more complete AAC frames of variable size.
+ In this mode, the AAC frames are allowed to be interleaved and hence
+ receivers MUST support de-interleaving. The maximum size of an AAC
+ frame in this mode is 63 octets. AAC frames MUST NOT be fragmented
+ when using this mode. Hence, when using this mode, encoders MUST
+ ensure that the size of each AAC frame is at most 63 octets.
+
+ The payload configuration in this mode is the same as in the variable
+ bit-rate CELP mode as defined in 3.3.4. The RTP payload consists of
+ the AU Header Section, followed by concatenated AAC frames. The
+ Auxiliary Section MUST be empty. For each AAC frame contained in the
+ payload, the one octet AU-header MUST provide:
+
+ a) the size of each AAC frame in the payload and
+
+ b) index information for computing the sequence (and hence timing) of
+ each AAC frame.
+
+ In the AU-header Section, the concatenated AU-headers MUST be
+ preceded by the 16-bit AU-headers-length field, as specified in
+ section 3.2.1.
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 24]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ In addition to the required MIME format parameters, the following
+ parameters MUST be present: sizeLength, indexLength, and
+ indexDeltaLength. AAC frames always have a fixed duration per Access
+ Unit; when interleaving in this mode, this specific duration MUST be
+ signaled by the MIME format parameter constantDuration. In addition,
+ the parameter maxDisplacement MUST be present when interleaving.
+
+ For example:
+
+ m=audio 49230 RTP/AVP 96
+ a=rtpmap:96 mpeg4-generic/22050/1
+ a=fmtp:96 streamtype=5; profile-level-id=14; mode=AAC-lbr; config=
+ 1388; sizeLength=6; indexLength=2; indexDeltaLength=2;
+ constantDuration=1024; maxDisplacement=5
+
+ Note: The a=fmtp line has been wrapped to fit the page; it comprises
+ a single line in the SDP file.
+
+ The hexadecimal value of the "config" parameter is the
+ AudioSpecificConfig(), as defined in ISO/IEC 14496-3.
+ AudioSpecificConfig() specifies a mono AAC stream with a sampling
+ rate of 22.05 kHz. For the description of MIME parameters, see
+ section 4.1.
+
+3.3.6. High Bit-rate AAC
+
+ This mode is signaled by mode=AAC-hbr. This mode supports the
+ transportation of variable size AAC frames. In one RTP packet,
+ either one or more complete AAC frames are carried, or a single
+ fragment of an AAC frame is carried. In this mode, the AAC frames
+ are allowed to be interleaved and hence receivers MUST support de-
+ interleaving. The maximum size of an AAC frame in this mode is 8191
+ octets.
+
+ In this mode, the RTP payload consists of the AU Header Section,
+ followed by either one AAC frame, several concatenated AAC frames or
+ one fragmented AAC frame. The Auxiliary Section MUST be empty. For
+ each AAC frame contained in the payload, there MUST be an AU-header
+ in the AU Header Section to provide:
+
+ a) the size of each AAC frame in the payload and
+
+ b) index information for computing the sequence (and hence timing) of
+ each AAC frame.
+
+ To code the maximum size of an AAC frame requires 13 bits.
+ Therefore, in this configuration 13 bits are allocated to the AU-
+ size, and 3 bits to the AU-Index(-delta) field. Thus, each AU-header
+
+
+
+van der Meer, et al. Standards Track [Page 25]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ has a size of 2 octets. Each AU-Index field MUST be coded with the
+ value 0. In the AU Header Section, the concatenated AU-headers MUST
+ be preceded by the 16-bit AU-headers-length field, as specified in
+ section 3.2.1.
+
+ In addition to the required MIME format parameters, the following
+ parameters MUST be present: sizeLength, indexLength, and
+ indexDeltaLength. AAC frames always have a fixed duration per Access
+ Unit; when interleaving in this mode, this specific duration MUST be
+ signaled by the MIME format parameter constantDuration. In addition,
+ the parameter maxDisplacement MUST be present when interleaving.
+
+ For example:
+
+ m=audio 49230 RTP/AVP 96
+ a=rtpmap:96 mpeg4-generic/48000/6
+ a=fmtp:96 streamtype=5; profile-level-id=16; mode=AAC-hbr;
+ config=11B0; sizeLength=13; indexLength=3;
+ indexDeltaLength=3; constantDuration=1024
+
+ Note: The a=fmtp line has been wrapped to fit the page; it comprises
+ a single line in the SDP file.
+
+ The hexadecimal value of the "config" parameter is the
+ AudioSpecificConfig(), as defined in ISO/IEC 14496-3.
+ AudioSpecificConfig() specifies a 5.1 channel AAC stream with a
+ sampling rate of 48 kHz. For the description of MIME parameters, see
+ section 4.1.
+
+3.3.7. Additional Modes
+
+ This specification only defines the modes specified in sections 3.3.2
+ through 3.3.6. Additional modes are expected to be defined in future
+ RFCs. Each additional mode MUST be in full compliance with this
+ specification.
+
+ Any new mode MUST be defined such that an implementation including
+ all the features of this specification can decode the payload format
+ corresponding to this new mode. For this reason, a mode MUST NOT
+ specify new default values for MIME parameters. In particular, MIME
+ parameters that configure the RTP payload MUST be present (unless
+ they have the default value), even if its presence is redundant in
+ case the mode assigns a fixed value to a parameter. A mode may
+ additionally define that some MIME parameters are required instead of
+ optional, that some MIME parameters have fixed values (or ranges),
+ and that there are rules restricting its usage.
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 26]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+4. IANA Considerations
+
+ This section describes the MIME types and names associated with this
+ payload format. Section 4.1 registers the MIME types, as per RFC
+ 2048 [3].
+
+ This format may require additional information about the mapping to
+ be made available to the receiver. This is done using parameters
+ described in the next section.
+
+4.1. MIME Type Registration
+
+ MIME media type name: "video" or "audio" or "application"
+
+ "video" MUST be used for MPEG-4 Visual streams (ISO/IEC 14496-2) or
+ MPEG-4 Systems streams (ISO/IEC 14496-1) that convey information
+ needed for an audio/visual presentation.
+
+ "audio" MUST be used for MPEG-4 Audio streams (ISO/IEC 14496-3) or
+ MPEG-4 Systems streams that convey information needed for an audio
+ only presentation.
+
+ "application" MUST be used for MPEG-4 Systems streams (ISO/IEC
+ 14496-1) that serve purposes other than audio/visual presentation,
+ e.g., in some cases when MPEG-J (Java) streams are transmitted.
+
+ Depending on the required payload configuration, MIME format
+ parameters may need to be available to the receiver. This is done
+ using the parameters described in the next section. There are
+ required and optional parameters.
+
+ Optional parameters are of two types: general parameters and
+ configuration parameters. The configuration parameters are used to
+ configure the fields in the AU Header section and in the auxiliary
+ section. The absence of any configuration parameter is equivalent to
+ the associated field set to its default value, which is always zero.
+ The absence of all configuration parameters results in a default
+ "basic" configuration with an empty AU-header section and an empty
+ auxiliary section in each RTP packet.
+
+ MIME subtype name: mpeg4-generic
+
+ Required parameters:
+
+ MIME format parameters are not case dependent; for clarity however,
+ both upper and lower case are used in the names of the parameters
+ described in this specification.
+
+
+
+
+van der Meer, et al. Standards Track [Page 27]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ streamType:
+ The integer value that indicates the type of MPEG-4 stream that is
+ carried; its coding corresponds to the values of the streamType,
+ as defined in Table 9 (streamType Values) in ISO/IEC 14496-1.
+
+ profile-level-id:
+ A decimal representation of the MPEG-4 Profile Level indication.
+ This parameter MUST be used in the capability exchange or session
+ set-up procedure to indicate the MPEG-4 Profile and Level
+ combination of which the relevant MPEG-4 media codec is capable.
+
+ For MPEG-4 Audio streams, this parameter is the decimal value from
+ Table 5 (audioProfileLevelIndication Values) in ISO/IEC 14496-
+ 1, indicating which MPEG-4 Audio tool subsets are required to
+ decode the audio stream.
+
+ For MPEG-4 Visual streams, this parameter is the decimal value
+ from Table G-1 (FLC table for profile and level indication) of
+ ISO/IEC 14496-2 [1], indicating which MPEG-4 Visual tool
+ subsets are required to decode the visual stream.
+
+ For BIFS streams, this parameter is the decimal value obtained
+ from (SPLI + 256*GPLI), where:
+ SPLI is the decimal value from Table 4 in ISO/IEC 14496-1 with
+ the applied sceneProfileLevelIndication;
+ GPLI is the decimal value from Table 7 in ISO/IEC 14496-1 with
+ the applied graphicsProfileLevelIndication.
+
+ For MPEG-J streams, this parameter is the decimal value from table
+ 13 (MPEGJProfileLevelIndication) in ISO/IEC 14496-1, indicating
+ the profile and level of the MPEG-J stream.
+
+ For OD streams, this parameter is the decimal value from table 3
+ (ODProfileLevelIndication) in ISO/IEC 14496-1, indicating the
+ profile and level of the OD stream.
+
+ For IPMP streams, this parameter has either the decimal value 0,
+ indicating an unspecified profile and level, or a value larger
+ than zero, indicating an MPEG-4 IPMP profile and level as
+ defined in a future MPEG-4 specification.
+
+ For Clock Reference streams and Object Content Info streams, this
+ parameter has the decimal value zero, indicating that profile
+ and level information is conveyed through the OD framework.
+
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 28]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ config:
+ A hexadecimal representation of an octet string that expresses the
+ media payload configuration. Configuration data is mapped onto
+ the hexadecimal octet string in an MSB-first basis. The first bit
+ of the configuration data SHALL be located at the MSB of the first
+ octet. In the last octet, if necessary to achieve octet-
+ alignment, up to 7 zero-valued padding bits shall follow the
+ configuration data.
+
+ For MPEG-4 Audio streams, config is the audio object type specific
+ decoder configuration data AudioSpecificConfig(), as defined in
+ ISO/IEC 14496-3. For Structured Audio, the
+ AudioSpecificConfig() may be conveyed by other means, not
+ defined by this specification. If the AudioSpecificConfig() is
+ conveyed by other means for Structured Audio, then the config
+ MUST be a quoted empty hexadecimal octet string, as follows:
+ config="".
+
+ Note that a future mode of using this RTP payload format for
+ Structured Audio may define such other means.
+
+ For MPEG-4 Visual streams, config is the MPEG-4 Visual
+ configuration information as defined in subclause 6.2.1, Start
+ codes of ISO/IEC 14496-2. The configuration information
+ indicated by this parameter SHALL be the same as the
+ configuration information in the corresponding MPEG-4 Visual
+ stream, except for first-half-vbv-occupancy and latter-half-
+ vbv-occupancy, if it exists, which may vary in the repeated
+ configuration information inside an MPEG-4 Visual stream (See
+ 6.2.1 Start codes of ISO/IEC 14496-2).
+
+ For BIFS streams, this is the BIFSConfig() information as defined
+ in ISO/IEC 14496-1. Version 1 of BIFSConfig is defined in
+ section 9.3.5.2, and version 2 is defined in section 9.3.5.3.
+ The MIME format parameter objectType signals the version of
+ BIFSConfig.
+
+ For IPMP streams, this is either a quoted empty hexadecimal octet
+ string, indicating the absence of any decoder configuration
+ information (config=""), or the IPMPConfiguration() as will be
+ defined in a future MPEG-4 IPMP specification.
+
+ For Object Content Info (OCI) streams, this is the
+ OCIDecoderConfiguration() information of the OCI stream, as
+ defined in section 8.4.2.4 in ISO/IEC 14496-1.
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 29]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ For OD streams, Clock Reference streams and MPEG-J streams, this
+ is a quoted empty hexadecimal octet string (config=""), as no
+ information on the decoder configuration is required.
+
+ mode:
+ The mode in which this specification is used. The following modes
+ can be signaled:
+
+ mode=generic,
+ mode=CELP-cbr,
+ mode=CELP-vbr,
+ mode=AAC-lbr and
+ mode=AAC-hbr.
+
+ Other modes are expected to be defined in future RFCs. See also
+ section 3.3.7 and 4.2 of RFC 3640.
+
+ Optional general parameters:
+
+ objectType:
+ The decimal value from Table 8 in ISO/IEC 14496-1, indicating the
+ value of the objectTypeIndication of the transported stream. For
+ BIFS streams, this parameter MUST be present to signal the version
+ of BIFSConfiguration(). Note that objectTypeIndication may signal
+ a non-MPEG-4 stream and that the RTP payload format defined in
+ this document may not be suitable for carrying a stream that is
+ not defined by MPEG-4. The objectType parameter SHOULD NOT be set
+ to a value that signals a stream that cannot be carried by this
+ payload format.
+
+ constantSize:
+ The constant size in octets of each Access Unit for this stream.
+ The constantSize and the sizeLength parameters MUST NOT be
+ simultaneously present.
+
+ constantDuration:
+ The constant duration of each Access Unit for this stream,
+ measured with the same units as the RTP time stamp.
+
+ maxDisplacement:
+ The decimal representation of the maximum displacement in time of
+ an interleaved AU, as defined in section 3.2.3.3, expressed in
+ units of the RTP time stamp clock.
+
+ This parameter MUST be present when interleaving is applied.
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 30]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ de-interleaveBufferSize:
+ The decimal representation in number of octets of the size of the
+ de-interleave buffer, described in section 3.2.3.3. When
+ interleaving, this parameter MUST be present if the calculation of
+ the de-interleave buffer size given in 3.2.3.3 and based on
+ maxDisplacement and rate(max) under-estimates the size of the
+ de-interleave buffer. If this calculation does not under-estimate
+ the size of the de-interleave buffer, then the
+ de-interleaveBufferSize parameter SHOULD NOT be present.
+
+ Optional configuration parameters:
+
+ sizeLength:
+ The number of bits on which the AU-size field is encoded in the
+ AU-header. The sizeLength and the constantSize parameters MUST
+ NOT be simultaneously present.
+
+ indexLength:
+ The number of bits on which the AU-Index is encoded in the first
+ AU-header. The default value of zero indicates the absence of the
+ AU-Index field in each first AU-header.
+
+ indexDeltaLength:
+ The number of bits on which the AU-Index-delta field is encoded in
+ any non-first AU-header. The default value of zero indicates the
+ absence of the AU-Index-delta field in each non-first AU-header.
+
+ CTSDeltaLength:
+ The number of bits on which the CTS-delta field is encoded in the
+ AU-header.
+
+ DTSDeltaLength:
+ The number of bits on which the DTS-delta field is encoded in the
+ AU-header.
+
+ randomAccessIndication:
+ A decimal value of zero or one, indicating whether the RAP-flag is
+ present in the AU-header. The decimal value of one indicates
+ presence of the RAP-flag, the default value zero indicates its
+ absence.
+
+ streamStateIndication:
+ The number of bits on which the Stream-state field is encoded in
+ the AU-header. This parameter MAY be present when transporting
+ MPEG-4 system streams, and SHALL NOT be present for MPEG-4 audio
+ and MPEG-4 video streams.
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 31]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ auxiliaryDataSizeLength:
+ The number of bits that is used to encode the auxiliary-data-size
+ field.
+
+ Applications MAY use more parameters, in addition to those defined
+ above. Each additional parameter MUST be registered with IANA to
+ ensure that there is not a clash of names. Each additional parameter
+ MUST be accompanied by a specification in the form of an RFC, MPEG
+ standard, or other permanent and readily available reference (the
+ "Specification Required" policy defined in RFC 2434 [6]). Receivers
+ MUST tolerate the presence of such additional parameters, but these
+ parameters SHALL NOT impact the decoding of receivers that comply
+ with this specification.
+
+ Encoding considerations:
+ This MIME subtype is defined for RTP transport only. System
+ bitstreams MUST be generated according to MPEG-4 Systems
+ specifications (ISO/IEC 14496-1). Video bitstreams MUST be generated
+ according to MPEG-4 Visual specifications (ISO/IEC 14496-2). Audio
+ bitstreams MUST be generated according to MPEG-4 Audio specifications
+ (ISO/IEC 14496-3). The RTP packets MUST be packetized according to
+ the RTP payload format defined in RFC 3640.
+
+ Security considerations:
+ As defined in section 5 of RFC 3640.
+
+ Interoperability considerations:
+ MPEG-4 provides a large and rich set of tools for the coding of
+ visual objects. For effective implementation of the standard,
+ subsets of the MPEG-4 tool sets have been provided for use in
+ specific applications. These subsets, called 'Profiles', limit the
+ size of the tool set a decoder is required to implement. In order to
+ restrict computational complexity, one or more 'Levels' are set for
+ each Profile. A Profile@Level combination allows:
+
+ . a codec builder to implement only the subset of the standard
+ he needs, while maintaining interworking with other MPEG-4
+ devices that implement the same combination, and
+
+ . checking whether MPEG-4 devices comply with the standard
+ ('conformance testing').
+
+ A stream SHALL be compliant with the MPEG-4 Profile@Level specified
+ by the parameter "profile-level-id". Interoperability between a
+ sender and a receiver is achieved by specifying the parameter
+ "profile-level-id" in MIME content. In the capability
+ exchange/announcement procedure, this parameter may mutually be set
+ to the same value.
+
+
+
+van der Meer, et al. Standards Track [Page 32]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ Published specification:
+ The specifications for MPEG-4 streams are presented in ISO/IEC
+ 14496-1, 14496-2, and 14496-3. The RTP payload format is described
+ in RFC 3640.
+
+ Applications which use this media type:
+ Multimedia streaming and conferencing tools.
+
+ Additional information: none
+
+ Magic number(s): none
+
+ File extension(s):
+ None. A file format with the extension .mp4 has been defined for
+ MPEG-4 content but is not directly correlated with this MIME type for
+ which the sole purpose is RTP transport.
+
+ Macintosh File Type Code(s): none
+
+ Person & email address to contact for further information:
+ Authors of RFC 3640, IETF Audio/Video Transport working group.
+
+ Intended usage: COMMON
+
+ Author/Change controller:
+ Authors of RFC 3640, IETF Audio/Video Transport working group.
+
+4.2. Registration of Mode Definitions with IANA
+
+ This specification can be used in a number of modes. The mode of
+ operation is signaled using the "mode" MIME parameter, with the
+ initial set of values specified in section 4.1. New modes may be
+ defined at any time, as described in section 3.3.7. These modes MUST
+ be registered with IANA, to ensure that there is not a clash of
+ names.
+
+ A new mode registration MUST be accompanied by a specification in the
+ form of an RFC, MPEG standard, or other permanent and readily
+ available reference (the "Specification Required" policy defined in
+ RFC 2434 [6]).
+
+4.3. Concatenation of Parameters
+
+ Multiple parameters SHOULD be expressed as a MIME media type string,
+ in the form of a semicolon-separated list of parameter=value pairs
+ (for parameter usage examples see sections 3.3.2 up to 3.3.6).
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 33]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+4.4. Usage of SDP
+
+4.4.1. The a=fmtp Keyword
+
+ It is assumed that one typical way to transport the above-described
+ parameters associated with this payload format is via an SDP message
+ [5] for example transported to the client in reply to an RTSP
+ DESCRIBE [8] or via SAP [11]. In that case, the (a=fmtp) keyword
+ MUST be used as described in RFC 2327 [5], section 6, the syntax then
+ being:
+
+ a=fmtp:<format> <parameter name>=<value>[; <parameter name>=<value>]
+
+5. Security Considerations
+
+ RTP packets using the payload format defined in this specification
+ are subject to the security considerations discussed in the RTP
+ specification [2]. This implies that confidentiality of the media
+ streams is achieved by encryption. Because the data compression used
+ with this payload format is applied end-to-end, encryption may be
+ performed on the compressed data so there is no conflict between the
+ two operations. The packet processing complexity of this payload
+ type (i.e., excluding media data processing) does not exhibit any
+ significant non-uniformity in the receiver side to cause a denial-
+ of-service threat.
+
+ However, it is possible to inject non-compliant MPEG streams (Audio,
+ Video, and Systems) so that the receiver/decoder's buffers are
+ overloaded, which might compromise the functionality of the receiver
+ or even crash it. This is especially true for end-to-end systems
+ like MPEG, where the buffer models are precisely defined.
+
+ MPEG-4 Systems support stream types including commands that are
+ executed on the terminal, like OD commands, BIFS commands, etc. and
+ programmatic content like MPEG-J (Java(TM) Byte Code) and MPEG-4
+ scripts. It is possible to use one or more of the above in a manner
+ non-compliant to MPEG to crash the receiver or make it temporarily
+ unavailable. Senders that transport MPEG-4 content SHOULD ensure
+ that such content is MPEG compliant, as defined in the compliance
+ part of IEC/ISO 14496 [1]. Receivers that support MPEG-4 content
+ should prevent malfunctioning of the receiver in case of non MPEG
+ compliant content.
+
+ Authentication mechanisms can be used to validate the sender and the
+ data to prevent security problems due to non-compliant malignant
+ MPEG-4 streams.
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 34]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ In ISO/IEC 14496-1, a security model is defined for MPEG-4 Systems
+ streams carrying MPEG-J access units that comprise Java(TM) classes
+ and objects. MPEG-J defines a set of Java APIs and a secure
+ execution model. MPEG-J content can call this set of APIs and
+ Java(TM) methods from a set of Java packages supported in the
+ receiver within the defined security model. According to this
+ security model, downloaded byte code is forbidden to load libraries,
+ define native methods, start programs, read or write files, or read
+ system properties. Receivers can implement intelligent filters to
+ validate the buffer requirements or parametric (OD, BIFS, etc.) or
+ programmatic (MPEG-J, MPEG-4 scripts) commands in the streams.
+ However, this can increase the complexity significantly.
+
+ Implementors of MPEG-4 streaming over RTP who also implement MPEG-4
+ scripts (subset of ECMAScript) MUST ensure that the action of such
+ scripts is limited solely to the domain of the single presentation in
+ which they reside (thus disallowing session to session communication,
+ access to local resources and storage, etc). Though loading static
+ network-located resources (such as media) into the presentation
+ should be permitted, network access by scripts MUST be restricted to
+ such a (media) download.
+
+6. Acknowledgements
+
+ This document evolved into RFC 3640 after several revisions. Thanks
+ to contributions from people in the ISMA forum, the IETF AVT Working
+ Group and the 4-on-IP ad-hoc group within MPEG. The authors wish to
+ thank all people involved, particularly Andrea Basso, Stephen Casner,
+ M. Reha Civanlar, Carsten Herpel, John Lazaro, Zvi Lifshitz, Young-
+ kwon Lim, Alex MacAulay, Bill May, Colin Perkins, Dorairaj V and
+ Stephan Wenger for their valuable comments and support.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 35]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+APPENDIX: Usage of this Payload Format
+
+Appendix A. Interleave Analysis
+
+A. Examples of Delay Analysis with Interleave
+
+A.1. Introduction
+
+ Interleaving issues are discussed in this appendix. Some general
+ notes are provided on de-interleaving and error concealment, while a
+ number of interleaving patterns are examined, in particular for
+ determining the size of the de-interleave buffer and the maximum
+ displacement of access units in time. In these examples, the maximum
+ displacement is cited in terms of an access unit count, for ease of
+ reading. In actual streams, it is signaled in units of the RTP time
+ stamp clock.
+
+A.2. De-interleaving and Error Concealment
+
+ This appendix does not describe any details on de-interleaving and
+ error concealment, as the control of the AU decoding and error
+ concealment process has little to do with interleaving. If the next
+ AU to be decoded is present and there is sufficient storage available
+ for the decoded AU, then decode it immediately. If not, wait. When
+ the decoding deadline is reached (i.e., the time when decoding must
+ begin in order to be completed by the time the AU is to be
+ presented), or if the decoder is some hardware that presents a
+ constant delay between initiation of decoding of an AU and
+ presentation of that AU, then decoding must begin at that deadline
+ time.
+
+ If the next AU to be decoded is not present when the decoding
+ deadline is reached, then that AU is lost so the receiver must take
+ whatever error concealment measures are deemed appropriate. The
+ play-out delay may need to be adjusted at that point (especially if
+ other AUs have also missed their deadline recently). Or, if it was a
+ momentary delay, and maintaining the latency is important, then the
+ receiver should minimize the glitch and continue processing with the
+ next AU.
+
+A.3. Simple Group Interleave
+
+A.3.1. Introduction
+
+ An example of regular interleave is when packets are formed into
+ groups. If the 'stride' of the interleave (the distance between
+ interleaved AUs) is N, packet 0 could contain AU(0), AU(N), AU(2N),
+ and so on; packet 1 could contain AU(1), AU(1+N), AU(1+2N), and so
+
+
+
+van der Meer, et al. Standards Track [Page 36]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ on. If there are M access units in a packet, then there are M*N
+ access units in the group.
+
+ An example with N=M=3 follows; note that this is the same example as
+ given in section 2.5 and that a fixed time duration per Access Unit
+ is assumed:
+
+ Packet Time stamp Carried AUs AU-Index, AU-Index-delta
+ P(0) T[0] 0, 3, 6 0, 2, 2
+ P(1) T[1] 1, 4, 7 0, 2, 2
+ P(2) T[2] 2, 5, 8 0, 2, 2
+ P(3) T[9] 9,12,15 0, 2, 2
+
+ In this example, the AU-Index is present in the first AU-header and
+ coded with the value 0, as required for fixed duration AUs. The
+ position of the first AU of each packet within the group is defined
+ by the RTP time stamp, while the AU-Index-delta field indicates the
+ position of subsequent AUs relative to the first AU in the packet.
+ All AU-Index-delta fields are coded with the value N-1, equal to 2 in
+ this example. Hence the RTP time stamp and the AU-Index-delta are
+ used to reconstruct the original order. See also section 3.2.3.2.
+
+A.3.2. Determining the De-interleave Buffer Size
+
+ For the regular pattern as in this example, Figure 6 in section
+ 3.2.3.3 shows that the de-interleave buffer stores at most 4 AUs. A
+ de-interleaveBufferSize value that is at least equal to the total
+ number of octets of any 4 "early" AUs that are stored at the same
+ time may be signaled.
+
+A.3.3. Determining the Maximum Displacement
+
+ For the regular pattern as in this example, Figure 7 in section 3.3
+ shows that the maximum displacement in time equals 5 AU periods.
+ Hence, the minimum maxDisplacement value that must be signaled is 5
+ AU periods. In case each AU has the same size, this maxDisplacement
+ value over-estimates the de-interleave buffer size with one AU.
+ However, note that in case of variable AU sizes, the total size of
+ any 4 "early" AUs that must be stored at the same time may exceed
+ maxDisplacement times the maximum bitrate, in which case the de-
+ interleaveBufferSize must be signaled.
+
+
+
+
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 37]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+A.4. More Subtle Group Interleave
+
+A.4.1. Introduction
+
+ Another example of forming packets with group interleave is given
+ below. In this example, the packets are formed such that the loss of
+ two subsequent RTP packets does not cause the loss of two subsequent
+ AUs. Note that in this example, the RTP time stamps of packet 3 and
+ packet 4 are earlier than the RTP time stamps of packets 1 and 2,
+ respectively; a fixed time duration per Access Unit is assumed.
+
+ Packet Time stamp Carried AUs AU-Index, AU-Index-delta
+ 0 T[0] 0, 5 0, 4
+ 1 T[2] 2, 7 0, 4
+ 2 T[4] 4, 9 0, 4
+ 3 T[1] 1, 6 0, 4
+ 4 T[3] 3, 8 0, 4
+ 5 T[10] 10, 15 0, 4
+ and so on ..
+
+ In this example, the AU-Index is present in the first AU-header and
+ coded with the value 0, as required for AUs with a fixed duration.
+ To reconstruct the original order, the RTP time stamp and the AU-
+ Index-delta (coded with the value 4) are used. See also section
+ 3.2.3.2.
+
+A.4.2. Determining the De-interleave Buffer Size
+
+ From Figure 8, it can be to determined that at most 5 "early" AUs are
+ to be stored. If the AUs are of constant size, then this value
+ equals 5 times the AU size. The minimum size of the de-interleave
+ buffer equals the maximum total number of octets of the "early" AUs
+ that are to be stored at the same time. This gives the minimum value
+ of the de-interleaveBufferSize that may be signaled.
+
+ +--+--+--+--+--+--+--+--+--+--+
+ Interleaved AUs | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
+ +--+--+--+--+--+--+--+--+--+--+
+ - - 5 - 5 - 2 7 4 9
+ 7 4 9 5
+ "Early" AUs 5 6
+ 7 7
+ 9 9
+
+ Figure 8: Storage of "early" AUs in the de-interleave buffer per
+ interleaved AU.
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 38]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+A.4.3. Determining the Maximum Displacement
+
+ From Figure 9, it can be seen that the maximum displacement in time
+ equals 8 AU periods. Hence the minimum maxDisplacement value to be
+ signaled is 8 AU periods.
+
+ +--+--+--+--+--+--+--+--+--+--+
+ Interleaved AUs | 0| 5| 2| 7| 4| 9| 1| 6| 3| 8|
+ +--+--+--+--+--+--+--+--+--+--+
+
+ Earliest not yet present AU - 1 1 1 1 1 - 3 - -
+
+ Figure 9: For each AU in the interleaving pattern, the earliest of
+ any earlier AUs not yet present
+
+ In case each AU has the same size, the found maxDisplacement value
+ over-estimates the de-interleave buffer size with three AUs.
+ However, in case of variable AU sizes, the total size of any 5
+ "early" AUs stored at the same time may exceed maxDisplacement times
+ the maximum bitrate, in which case de-interleaveBufferSize must be
+ signaled.
+
+A.5. Continuous Interleave
+
+A.5.1. Introduction
+
+ In continuous interleave, once the scheme is 'primed', the number of
+ AUs in a packet exceeds the 'stride' (the distance between them).
+ This shortens the buffering needed, smoothes the data-flow, and gives
+ slightly larger packets -- and thus lower overhead -- for the same
+ interleave. For example, here is a continuous interleave also over a
+ stride of 3 AUs, but with 4 AUs per packet, for a run of 20 AUs.
+ This shows both how the scheme 'starts up' and how it finishes. Once
+ again, the example assumes fixed time duration per Access Unit.
+
+ Packet Time-stamp Carried AUs AU-Index, AU-Index-delta
+ 0 T[0] 0 0
+ 1 T[1] 1 4 0 2
+ 2 T[2] 2 5 8 0 2 2
+ 3 T[3] 3 6 9 12 0 2 2 2
+ 4 T[7] 7 10 13 16 0 2 2 2
+ 5 T[11] 11 14 17 20 0 2 2 2
+ 6 T[15] 15 18 0 2
+ 7 T[19] 19 0
+
+ In this example, the AU-Index is present in the first AU-header and
+ coded with the value 0, as required for AUs with a fixed duration.
+ To reconstruct the original order, the RTP time stamp and the
+
+
+
+van der Meer, et al. Standards Track [Page 39]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+ AU-Index-delta (coded with the value 2) are used. See also 3.2.3.2.
+ Note that this example has RTP time-stamps in increasing order.
+
+A.5.2. Determining the De-interleave Buffer Size
+
+ For this example the de-interleave buffer size can be derived from
+ Figure 10. The maximum number of "early" AUs is 3. If the AUs are
+ of constant size, then the de-interleave buffer size equals 3 times
+ the AU size. Compared to the example in A.2, for constant size AUs
+ the de-interleave buffer size is reduced from 4 to 3 times the AU
+ size, while maintaining the same 'stride'.
+
+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
+ Interleaved AUs | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
+ - - - 4 - - 4 8 - - 8 12 - -
+ 5 9
+ "Early" AUs 8 12
+
+
+ Figure 10: Storage of "early" AUs in the de-interleave buffer per
+ interleaved AU.
+
+A.5.3. Determining the Maximum Displacement
+
+ For this example, the maximum displacement has a value of 5 AU
+ periods. See Figure 11. Compared to the example in A.2, the maximum
+ displacement does not decrease, though in fact less de-interleave
+ buffering is required.
+
+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
+ Interleaved AUs | 0| 1| 4| 2| 5| 8| 3| 6| 9|12| 7|10|13|16|
+ +--+--+--+--+--+--+--+--+--+--+--+--+--+--+-
+ Earliest not yet
+ present AU - - 2 - 3 3 - - 7 7 - - 11 11
+
+
+ Figure 11: For each AU in the interleaving pattern, the earliest of
+ any earlier AUs not yet present
+
+
+
+
+
+
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 40]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+References
+
+Normative References
+
+ [1] ISO/IEC International Standard 14496 (MPEG-4); "Information
+ technology - Coding of audio-visual objects", January 2000
+
+ [2] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson,
+ "RTP: A Transport Protocol for Real-Time Applications", RFC
+ 3550, July 2003.
+
+ [3] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet
+ Mail Extensions (MIME) Part Four: Registration Procedures", BCP
+ 13, RFC 2048, November 1996.
+
+ [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [5] Handley, M. and V. Jacobson, "SDP: Session Description
+ Protocol", RFC 2327, April 1998.
+
+ [6] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
+ Considerations Section in RFCs", BCP 26, RFC 2434, October 1998.
+
+Informative References
+
+ [7] Hoffman, D., Fernando, G., Goyal, V. and M. Civanlar, "RTP
+ Payload Format for MPEG1/MPEG2 Video", RFC 2250, January 1998.
+
+ [8] Schulzrinne, H., Rao, A. and R. Lanphier, "Real-Time Session
+ Protocol (RTSP)", RFC 2326, April 1998.
+
+ [9] Perkins, C. and O. Hodson, "Options for Repair of Streaming
+ Media", RFC 2354, June 1998.
+
+ [10] Schulzrinne, H. and J. Rosenberg, "An RTP Payload Format for
+ Generic Forward Error Correction", RFC 2733, December 1999.
+
+ [11] Handley, M., Perkins, C. and E. Whelan, "Session Announcement
+ Protocol", RFC 2974, October 2000.
+
+ [12] Kikuchi, Y., Nomura, T., Fukunaga, S., Matsui, Y. and H. Kimata,
+ "RTP Payload Format for MPEG-4 Audio/Visual Streams", RFC 3016,
+ November 2000.
+
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 41]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+Authors' Addresses
+
+ Jan van der Meer
+ Philips Electronics
+ Prof Holstlaan 4
+ Building WAH-1
+ 5600 JZ Eindhoven
+ Netherlands
+
+ EMail: jan.vandermeer@philips.com
+
+
+ David Mackie
+ Apple Computer, Inc.
+ One Infinite Loop, MS:302-3KS
+ Cupertino CA 95014
+
+ EMail: dmackie@apple.com
+
+
+ Viswanathan Swaminathan
+ Sun Microsystems Inc.
+ 2600 Casey Avenue
+ Mountain View, CA 94043
+
+ EMail: viswanathan.swaminathan@sun.com
+
+
+ David Singer
+ Apple Computer, Inc.
+ One Infinite Loop, MS:302-3MT
+ Cupertino CA 95014
+
+ EMail: singer@apple.com
+
+
+ Philippe Gentric
+ Philips Electronics
+ 51 rue Carnot
+ 92156 Suresnes
+ France
+
+ EMail: philippe.gentric@philips.com
+
+
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 42]
+
+RFC 3640 Transport of MPEG-4 Elementary Streams November 2003
+
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2003). All Rights Reserved.
+
+ This document and translations of it may be copied and furnished to
+ others, and derivative works that comment on or otherwise explain it
+ or assist in its implementation may be prepared, copied, published
+ and distributed, in whole or in part, without restriction of any
+ kind, provided that the above copyright notice and this paragraph are
+ included on all such copies and derivative works. However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assignees.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is currently provided by the
+ Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+van der Meer, et al. Standards Track [Page 43]
+