diff options
Diffstat (limited to 'doc/rfc/rfc3016.txt')
-rw-r--r-- | doc/rfc/rfc3016.txt | 1179 |
1 files changed, 1179 insertions, 0 deletions
diff --git a/doc/rfc/rfc3016.txt b/doc/rfc/rfc3016.txt new file mode 100644 index 0000000..3328c1a --- /dev/null +++ b/doc/rfc/rfc3016.txt @@ -0,0 +1,1179 @@ + + + + + + +Network Working Group Y. Kikuchi +Request for Comments: 3016 Toshiba +Category: Standards Track T. Nomura + NEC + S. Fukunaga + Oki + Y. Matsui + Matsushita + H. Kimata + NTT + November 2000 + + + RTP Payload Format for MPEG-4 Audio/Visual Streams + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2000). All Rights Reserved. + +Abstract + + This document describes Real-Time Transport Protocol (RTP) payload + formats for carrying each of MPEG-4 Audio and MPEG-4 Visual + bitstreams without using MPEG-4 Systems. For the purpose of directly + mapping MPEG-4 Audio/Visual bitstreams onto RTP packets, it provides + specifications for the use of RTP header fields and also specifies + fragmentation rules. It also provides specifications for + Multipurpose Internet Mail Extensions (MIME) type registrations and + the use of Session Description Protocol (SDP). + +1. Introduction + + The RTP payload formats described in this document specify how MPEG-4 + Audio [3][5] and MPEG-4 Visual streams [2][4] are to be fragmented + and mapped directly onto RTP packets. + + These RTP payload formats enable transport of MPEG-4 Audio/Visual + streams without using the synchronization and stream management + functionality of MPEG-4 Systems [6]. Such RTP payload formats will + be used in systems that have intrinsic stream management + + + +Kikuchi, et al. Standards Track [Page 1] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + functionality and thus require no such functionality from MPEG-4 + Systems. H.323 terminals are an example of such systems, where + MPEG-4 Audio/Visual streams are not managed by MPEG-4 Systems Object + Descriptors but by H.245. The streams are directly mapped onto RTP + packets without using MPEG-4 Systems Sync Layer. Other examples are + SIP and RTSP where MIME and SDP are used. MIME types and SDP usages + of the RTP payload formats described in this document are defined to + directly specify the attribute of Audio/Visual streams (e.g., media + type, packetization format and codec configuration) without using + MPEG-4 Systems. The obvious benefit is that these MPEG-4 + Audio/Visual RTP payload formats can be handled in an unified way + together with those formats defined for non-MPEG-4 codecs. The + disadvantage is that interoperability with environments using MPEG-4 + Systems may be difficult, other payload formats may be better suited + to those applications. + + The semantics of RTP headers in such cases need to be clearly + defined, including the association with MPEG-4 Audio/Visual data + elements. In addition, it is beneficial to define the fragmentation + rules of RTP packets for MPEG-4 Video streams so as to enhance error + resiliency by utilizing the error resilience tools provided inside + the MPEG-4 Video stream. + +1.1 MPEG-4 Visual RTP payload format + + MPEG-4 Visual is a visual coding standard with many new features: + high coding efficiency; high error resiliency; multiple, arbitrary + shape object-based coding; etc. [2]. It covers a wide range of + bitrates from scores of Kbps to several Mbps. It also covers a wide + variety of networks, ranging from those guaranteed to be almost + error-free to mobile networks with high error rates. + + With respect to the fragmentation rules for an MPEG-4 Visual + bitstream defined in this document, since MPEG-4 Visual is used for a + wide variety of networks, it is desirable not to apply too much + restriction on fragmentation, and a fragmentation rule such as "a + single video packet shall always be mapped on a single RTP packet" + may be inappropriate. On the other hand, careless, media unaware + fragmentation may cause degradation in error resiliency and bandwidth + efficiency. The fragmentation rules described in this document are + flexible but manage to define the minimum rules for preventing + meaningless fragmentation while utilizing the error resilience + functionalities of MPEG-4 Visual. + + The fragmentation rule recommends not to map more than one VOP in an + RTP packet so that the RTP timestamp uniquely indicates the VOP time + framing. On the other hand, MPEG-4 video may generate VOPs of very + small size, in cases with an empty VOP (vop_coded=0) containing only + + + +Kikuchi, et al. Standards Track [Page 2] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + VOP header or an arbitrary shaped VOP with a small number of coding + blocks. To reduce the overhead for such cases, the fragmentation + rule permits concatenating multiple VOPs in an RTP packet. (See + fragmentation rule (4) in section 3.2 and marker bit and timestamp in + section 3.1.) + + While the additional media specific RTP header defined for such video + coding tools as H.261 or MPEG-1/2 is effective in helping to recover + picture headers corrupted by packet losses, MPEG-4 Visual has already + error resilience functionalities for recovering corrupt headers, and + these can be used on RTP/IP networks as well as on other networks + (H.223/mobile, MPEG-2/TS, etc.). Therefore, no extra RTP header + fields are defined in this MPEG-4 Visual RTP payload format. + +1.2 MPEG-4 Audio RTP payload format + + MPEG-4 Audio is a new kind of audio standard that integrates many + different types of audio coding tools. Low-overhead MPEG-4 Audio + Transport Multiplex (LATM) manages the sequences of audio data with + relatively small overhead. In audio-only applications, then, it is + desirable for LATM-based MPEG-4 Audio bitstreams to be directly + mapped onto the RTP packets without using MPEG-4 Systems. + + While LATM has several multiplexing features as follows; + + - Carrying configuration information with audio data, + - Concatenation of multiple audio frames in one audio stream, + - Multiplexing multiple objects (programs), + - Multiplexing scalable layers, + + in RTP transmission there is no need for the last two features. + Therefore, these two features MUST NOT be used in applications based + on RTP packetization specified by this document. Since LATM has been + developed for only natural audio coding tools, i.e., not for + synthesis tools, it seems difficult to transmit Structured Audio (SA) + data and Text to Speech Interface (TTSI) data by LATM. Therefore, SA + data and TTSI data MUST NOT be transported by the RTP packetization + in this document. + + For transmission of scalable streams, audio data of each layer SHOULD + be packetized onto different RTP packets allowing for the different + layers to be treated differently at the IP level, for example via + some means of differentiated service. On the other hand, all + configuration data of the scalable streams are contained in one LATM + configuration data "StreamMuxConfig" and every scalable layer shares + the StreamMuxConfig. The mapping between each layer and its + configuration data is achieved by LATM header information attached to + + + + +Kikuchi, et al. Standards Track [Page 3] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + the audio data. In order to indicate the dependency information of + the scalable streams, a restriction is applied to the dynamic + assignment rule of payload type (PT) values (see section 4.2). + + For MPEG-4 Audio coding tools, as is true for other audio coders, if + the payload is a single audio frame, packet loss will not impair the + decodability of adjacent packets. Therefore, the additional media + specific header for recovering errors will not be required for MPEG-4 + Audio. Existing RTP protection mechanisms, such as Generic Forward + Error Correction (RFC 2733) and Redundant Audio Data (RFC 2198), MAY + be applied to improve error resiliency. + +2. Conventions used in this document + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in RFC-2119 [7]. + +3. RTP Packetization of MPEG-4 Visual bitstream + + This section specifies RTP packetization rules for MPEG-4 Visual + content. An MPEG-4 Visual bitstream is mapped directly onto RTP + packets without the addition of extra header fields or any removal of + Visual syntax elements. The Combined Configuration/Elementary stream + mode MUST be used so that configuration information will be carried + to the same RTP port as the elementary stream. (see 6.2.1 "Start + codes" of ISO/IEC 14496-2 [2][9][4]) The configuration information + MAY additionally be specified by some out-of-band means. If needed + for an H.323 terminal, H.245 codepoint + "decoderConfigurationInformation" MUST be used for this purpose. If + needed by systems using MIME content type and SDP parameters, e.g., + SIP and RTSP, the optional parameter "config" MUST be used to specify + the configuration information (see 5.1 and 5.2). + + When the short video header mode is used, the RTP payload format for + H.263 SHOULD be used (the format defined in RFC 2429 is RECOMMENDED, + but the RFC 2190 format MAY be used for compatibility with older + implementations). + + + + + + + + + + + + + +Kikuchi, et al. Standards Track [Page 4] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + +0 1 2 3 +0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +|V=2|P|X| CC |M| PT | sequence number | RTP ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| timestamp | Header ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| synchronization source (SSRC) identifier | ++=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +| contributing source (CSRC) identifiers | +| .... | ++=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +| | RTP +| MPEG-4 Visual stream (byte aligned) | Pay- +| | load +| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| :...OPTIONAL RTP padding | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 1 - An RTP packet for MPEG-4 Visual stream + +3.1 Use of RTP header fields for MPEG-4 Visual + + Payload Type (PT): The assignment of an RTP payload type for this new + packet format is outside the scope of this document, and will not be + specified here. It is expected that the RTP profile for a particular + class of applications will assign a payload type for this encoding, + or if that is not done then a payload type in the dynamic range SHALL + be chosen by means of an out of band signaling protocol (e.g., H.245, + SIP, etc). + + Extension (X) bit: Defined by the RTP profile used. + + Sequence Number: Incremented by one for each RTP data packet sent, + starting, for security reasons, with a random initial value. + + Marker (M) bit: The marker bit is set to one to indicate the last RTP + packet (or only RTP packet) of a VOP. When multiple VOPs are carried + in the same RTP packet, the marker bit is set to one. + + Timestamp: The timestamp indicates the sampling instance of the VOP + contained in the RTP packet. A constant offset, which is random, is + added for security reasons. + + - When multiple VOPs are carried in the same RTP packet, the + timestamp indicates the earliest of the VOP times within the VOPs + carried in the RTP packet. Timestamp information of the rest of + + + + +Kikuchi, et al. Standards Track [Page 5] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + the VOPs are derived from the timestamp fields in the VOP header + (modulo_time_base and vop_time_increment). + - If the RTP packet contains only configuration information and/or + Group_of_VideoObjectPlane() fields, the timestamp of the next VOP + in the coding order is used. + - If the RTP packet contains only visual_object_sequence_end_code + information, the timestamp of the immediately preceding VOP in the + coding order is used. + + The resolution of the timestamp is set to its default value of 90kHz, + unless specified by an out-of-band means (e.g., SDP parameter or MIME + parameter as defined in section 5). + + Other header fields are used as described in RFC 1889 [8]. + +3.2 Fragmentation of MPEG-4 Visual bitstream + + A fragmented MPEG-4 Visual bitstream is mapped directly onto the RTP + payload without any addition of extra header fields or any removal of + Visual syntax elements. The Combined Configuration/Elementary + streams mode is used. The following rules apply for the + fragmentation. + + In the following, header means one of the following: + + - Configuration information (Visual Object Sequence Header, Visual + Object Header and Video Object Layer Header) + - visual_object_sequence_end_code + - The header of the entry point function for an elementary stream + (Group_of_VideoObjectPlane() or the header of VideoObjectPlane(), + video_plane_with_short_header(), MeshObject() or FaceObject()) + - The video packet header (video_packet_header() excluding + next_resync_marker()) + - The header of gob_layer() + See 6.2.1 "Start codes" of ISO/IEC 14496-2 [2][9][4] for the + definition of the configuration information and the entry point + functions. + + (1) Configuration information and Group_of_VideoObjectPlane() fields + SHALL be placed at the beginning of the RTP payload (just after the + RTP header) or just after the header of the syntactically upper layer + function. + + (2) If one or more headers exist in the RTP payload, the RTP payload + SHALL begin with the header of the syntactically highest function. + Note: The visual_object_sequence_end_code is regarded as the lowest + function. + + + + +Kikuchi, et al. Standards Track [Page 6] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + (3) A header SHALL NOT be split into a plurality of RTP packets. + + (4) Different VOPs SHOULD be fragmented into different RTP packets so + that one RTP packet consists of the data bytes associated with a + unique VOP time instance (that is indicated in the timestamp field in + the RTP packet header), with the exception that multiple consecutive + VOPs MAY be carried within one RTP packet in the decoding order if + the size of the VOPs is small. + + Note: When multiple VOPs are carried in one RTP payload, the + timestamp of the VOPs after the first one may be calculated by the + decoder. This operation is necessary only for RTP packets in which + the marker bit equals to one and the beginning of RTP payload + corresponds to a start code. (See timestamp and marker bit in section + 3.1.) + + (5) It is RECOMMENDED that a single video packet is sent as a single + RTP packet. The size of a video packet SHOULD be adjusted in such a + way that the resulting RTP packet is not larger than the path-MTU. + Note: Rule (5) does not apply when the video packet is disabled by + the coder configuration (by setting resync_marker_disable in the VOL + header to 1), or in coding tools where the video packet is not + supported. In this case, a VOP MAY be split at arbitrary byte- + positions. + + The video packet starts with the VOP header or the video packet + header, followed by motion_shape_texture(), and ends with + next_resync_marker() or next_start_code(). + +3.3 Examples of packetized MPEG-4 Visual bitstream + + Figure 2 shows examples of RTP packets generated based on the + criteria described in 3.2 + + (a) is an example of the first RTP packet or the random access point + of an MPEG-4 Visual bitstream containing the configuration + information. According to criterion (1), the Visual Object Sequence + Header(VS header) is placed at the beginning of the RTP payload, + preceding the Visual Object Header and the Video Object Layer + Header(VO header, VOL header). Since the fragmentation rule defined + in 3.2 guarantees that the configuration information, starting with + visual_object_sequence_start_code, is always placed at the beginning + of the RTP payload, RTP receivers can detect the random access point + by checking if the first 32-bit field of the RTP payload is + visual_object_sequence_start_code. + + + + + + +Kikuchi, et al. Standards Track [Page 7] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + (b) is another example of the RTP packet containing the configuration + information. It differs from example (a) in that the RTP packet also + contains a video packet in the VOP following the configuration + information. Since the length of the configuration information is + relatively short (typically scores of bytes) and an RTP packet + containing only the configuration information may thus increase the + overhead, the configuration information and the immediately following + GOV and/or (a part of) VOP can be packetized into a single RTP packet + as in this example. + + (c) is an example of an RTP packet that contains + Group_of_VideoObjectPlane(GOV). Following criterion (1), the GOV is + placed at the beginning of the RTP payload. It would be a waste of + RTP/IP header overhead to generate an RTP packet containing only a + GOV whose length is 7 bytes. Therefore, (a part of) the following + VOP can be placed in the same RTP packet as shown in (c). + + (d) is an example of the case where one video packet is packetized + into one RTP packet. When the packet-loss rate of the underlying + network is high, this kind of packetization is recommended. Even + when the RTP packet containing the VOP header is discarded by a + packet loss, the other RTP packets can be decoded by using the + HEC(Header Extension Code) information in the video packet header. + No extra RTP header field is necessary. + + (e) is an example of the case where more than one video packet is + packetized into one RTP packet. This kind of packetization is + effective to save the overhead of RTP/IP headers when the bit-rate of + the underlying network is low. However, it will decrease the + packet-loss resiliency because multiple video packets are discarded + by a single RTP packet loss. The optimal number of video packets in + an RTP packet and the length of the RTP packet can be determined + considering the packet-loss rate and the bit-rate of the underlying + network. + + (f) is an example of the case when the video packet is disabled by + setting resync_marker_disable in the VOL header to 1. In this case, + a VOP may be split into a plurality of RTP packets at arbitrary + byte-positions. For example, it is possible to split a VOP into + fixed-length packets. This kind of coder configuration and RTP + packet fragmentation may be used when the underlying network is + guaranteed to be error-free. On the other hand, it is not + recommended to use it in error-prone environment since it provides + only poor packet loss resiliency. + + Figure 3 shows examples of RTP packets prohibited by the criteria of + 3.2. + + + + +Kikuchi, et al. Standards Track [Page 8] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + Fragmentation of a header into multiple RTP packets, as in (a), will + not only increase the overhead of RTP/IP headers but also decrease + the error resiliency. Therefore, it is prohibited by the criterion + (3). + + When concatenating more than one video packets into an RTP packet, + VOP header or video_packet_header() shall not be placed in the middle + of the RTP payload. The packetization as in (b) is not allowed by + criterion (2) due to the aspect of the error resiliency. Comparing + this example with Figure 2(d), although two video packets are mapped + onto two RTP packets in both cases, the packet-loss resiliency is not + identical. Namely, if the second RTP packet is lost, both video + packets 1 and 2 are lost in the case of Figure 3(b) whereas only + video packet 2 is lost in the case of Figure 2(d). + + +------+------+------+------+ +(a) | RTP | VS | VO | VOL | + |header|header|header|header| + +------+------+------+------+ + + +------+------+------+------+------------+ +(b) | RTP | VS | VO | VOL |Video Packet| + |header|header|header|header| | + +------+------+------+------+------------+ + + +------+-----+------------------+ +(c) | RTP | GOV |Video Object Plane| + |header| | | + +------+-----+------------------+ + + +------+------+------------+ +------+------+------------+ +(d) | RTP | VOP |Video Packet| | RTP | VP |Video Packet| + |header|header| (1) | |header|header| (2) | + +------+------+------------+ +------+------+------------+ + + +------+------+------------+------+------------+------+------------+ +(e) | RTP | VP |Video Packet| VP |Video Packet| VP |Video Packet| + |header|header| (1) |header| (2) |header| (3) | + +------+------+------------+------+------------+------+------------+ + + +------+------+------------+ +------+------------+ +(f) | RTP | VOP |VOP fragment| | RTP |VOP fragment| + |header|header| (1) | |header| (2) | ___ + +------+------+------------+ +------+------------+ + + Figure 2 - Examples of RTP packetized MPEG-4 Visual bitstream + + + + + +Kikuchi, et al. Standards Track [Page 9] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + +------+-------------+ +------+------------+------------+ +(a) | RTP |First half of| | RTP |Last half of|Video Packet| + |header| VP header | |header| VP header | | + +------+-------------+ +------+------------+------------+ + + +------+------+----------+ +------+---------+------+------------+ +(b) | RTP | VOP |First half| | RTP |Last half| VP |Video Packet| + |header|header| of VP(1) | |header| of VP(1)|header| (2) | + +------+------+----------+ +------+---------+------+------------+ + + Figure 3 - Examples of prohibited RTP packetization for MPEG-4 Visual + bitstream + +4. RTP Packetization of MPEG-4 Audio bitstream + + This section specifies RTP packetization rules for MPEG-4 Audio + bitstreams. MPEG-4 Audio streams MUST be formatted by LATM (Low- + overhead MPEG-4 Audio Transport Multiplex) tool [5], and the LATM- + based streams are then mapped onto RTP packets as described the three + sections below. + +4.1 RTP Packet Format + + LATM-based streams consist of a sequence of audioMuxElements that + include one or more audio frames. A complete audioMuxElement or a + part of one SHALL be mapped directly onto an RTP payload without any + removal of audioMuxElement syntax elements (see Figure 4). The first + byte of each audioMuxElement SHALL be located at the first payload + location in an RTP packet. + + + + + + + + + + + + + + + + + + + + + + +Kikuchi, et al. Standards Track [Page 10] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + +0 1 2 3 +0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +|V=2|P|X| CC |M| PT | sequence number |RTP ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| timestamp |Header ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| synchronization source (SSRC) identifier | ++=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +| contributing source (CSRC) identifiers | +| .... | ++=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ +| |RTP +: audioMuxElement (byte aligned) :Payload +| | +| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +| :...OPTIONAL RTP padding | ++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 4 - An RTP packet for MPEG-4 Audio + + In order to decode the audioMuxElement, the following + muxConfigPresent information is required to be indicated by an out- + of-band means. When SDP is utilized for this indication, MIME + parameter "cpresent" corresponds to the muxConfigPresent information + (see section 5.3). + + muxConfigPresent: If this value is set to 1 (in-band mode), the + audioMuxElement SHALL include an indication bit "useSameStreamMux" + and MAY include the configuration information for audio compression + "StreamMuxConfig". The useSameStreamMux bit indicates whether the + StreamMuxConfig element in the previous frame is applied in the + current frame. If the useSameStreamMux bit indicates to use the + StreamMuxConfig from the previous frame, but if the previous frame + has been lost, the current frame may not be decodable. Therefore, in + case of in-band mode, the StreamMuxConfig element SHOULD be + transmitted repeatedly depending on the network condition. On the + other hand, if muxConfigPresent is set to 0 (out-band mode), the + StreamMuxConfig element is required to be transmitted by an out-of- + band means. In case of SDP, MIME parameter "config" is utilized (see + section 5.3). + +4.2 Use of RTP Header Fields for MPEG-4 Audio + + Payload Type (PT): The assignment of an RTP payload type for this new + packet format is outside the scope of this document, and will not be + specified here. It is expected that the RTP profile for a particular + class of applications will assign a payload type for this encoding, + + + +Kikuchi, et al. Standards Track [Page 11] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + or if that is not done then a payload type in the dynamic range shall + be chosen by means of an out of band signaling protocol (e.g., H.245, + SIP, etc). In the dynamic assignment of RTP payload types for + scalable streams, a different value SHOULD be assigned to each layer. + The assigned values SHOULD be in order of enhance layer dependency, + where the base layer has the smallest value. + + Marker (M) bit: The marker bit indicates audioMuxElement boundaries. + It is set to one to indicate that the RTP packet contains a complete + audioMuxElement or the last fragment of an audioMuxElement. + + Timestamp: The timestamp indicates the sampling instance of the first + audio frame contained in the RTP packet. Timestamps are recommended + to start at a random value for security reasons. + + Unless specified by an out-of-band means, the resolution of the + timestamp is set to its default value of 90 kHz. + + Sequence Number: Incremented by one for each RTP packet sent, + starting, for security reasons, with a random value. + + Other header fields are used as described in RFC 1889 [8]. + +4.3 Fragmentation of MPEG-4 Audio bitstream + + It is RECOMMENDED to put one audioMuxElement in each RTP packet. If + the size of an audioMuxElement can be kept small enough that the size + of the RTP packet containing it does not exceed the size of the + path-MTU, this will be no problem. If it cannot, the audioMuxElement + MAY be fragmented and spread across multiple packets. + +5. MIME type registration for MPEG-4 Audio/Visual streams + + The following sections describe the MIME type registrations for + MPEG-4 Audio/Visual streams. MIME type registration and SDP usage + for the MPEG-4 Visual stream are described in Sections 5.1 and 5.2, + respectively, while MIME type registration and SDP usage for MPEG-4 + Audio stream are described in Sections 5.3 and 5.4, respectively. + +5.1 MIME type registration for MPEG-4 Visual + + MIME media type name: video + + MIME subtype name: MP4V-ES + + Required parameters: none + + Optional parameters: + + + +Kikuchi, et al. Standards Track [Page 12] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + rate: This parameter is used only for RTP transport. It indicates + the resolution of the timestamp field in the RTP header. If this + parameter is not specified, its default value of 90000 (90kHz) is + used. + + profile-level-id: A decimal representation of MPEG-4 Visual + Profile and Level indication value (profile_and_level_indication) + defined in Table G-1 of ISO/IEC 14496-2 [2][4]. This parameter + MAY be used in the capability exchange or session setup procedure + to indicate MPEG-4 Visual Profile and Level combination of which + the MPEG-4 Visual codec is capable. If this parameter is not + specified by the procedure, its default value of 1 (Simple + Profile/Level 1) is used. + + config: This parameter SHALL be used to indicate the configuration + of the corresponding MPEG-4 Visual bitstream. It SHALL NOT be + used to indicate the codec capability in the capability exchange + procedure. It is a hexadecimal representation of an octet string + that expresses the MPEG-4 Visual configuration information, as + defined in subclause 6.2.1 Start codes of ISO/IEC14496-2 + [2][4][9]. The configuration information is mapped onto the octet + string in an MSB-first basis. The first bit of the configuration + information SHALL be located at the MSB of the first octet. The + configuration information indicated by this parameter SHALL be the + same as the configuration information in the corresponding MPEG-4 + Visual stream, except for first_half_vbv_occupancy and + latter_half_vbv_occupancy, if exist, which may vary in the + repeated configuration information inside an MPEG-4 Visual stream + (See 6.2.1 Start codes of ISO/IEC14496-2). + + Example usages for these parameters are: + + - MPEG-4 Visual Simple Profile/Level 1: + Content-type: video/mp4v-es; profile-level-id=1 + + - MPEG-4 Visual Core Profile/Level 2: + Content-type: video/mp4v-es; profile-level-id=34 + + - MPEG-4 Visual Advanced Real Time Simple Profile/Level 1: + Content-type: video/mp4v-es; profile-level-id=145 + + Published specification: + The specifications for MPEG-4 Visual streams are presented in + ISO/IEC 14469-2 [2][4][9]. The RTP payload format is described in + RFC 3016. + + + + + + +Kikuchi, et al. Standards Track [Page 13] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + Encoding considerations: + Video bitstreams MUST be generated according to MPEG-4 Visual + specifications (ISO/IEC 14496-2). A video bitstream is binary + data and MUST be encoded for non-binary transport (for Email, the + Base64 encoding is sufficient). This type is also defined for + transfer via RTP. The RTP packets MUST be packetized according to + the MPEG-4 Visual RTP payload format defined in RFC 3016. + + Security considerations: + See section 6 of RFC 3016. + + Interoperability considerations: + MPEG-4 Visual provides a large and rich set of tools for the + coding of visual objects. For effective implementation of the + standard, subsets of the MPEG-4 Visual tool sets have been + provided for use in specific applications. These subsets, called + 'Profiles', limit the size of the tool set a decoder is required + to implement. In order to restrict computational complexity, one + or more Levels are set for each Profile. A Profile@Level + combination allows: + + o a codec builder to implement only the subset of the standard he + needs, while maintaining interworking with other MPEG-4 devices + included in the same combination, and + + o checking whether MPEG-4 devices comply with the standard (' + conformance testing'). + + The visual stream SHALL be compliant with the MPEG-4 Visual + Profile@Level specified by the parameter "profile-level-id". + Interoperability between a sender and a receiver may be achieved + by specifying the parameter "profile-level-id" in MIME content, or + by arranging in the capability exchange/announcement procedure to + set this parameter mutually to the same value. + + Applications which use this media type: + Audio and visual streaming and conferencing tools, Internet + messaging and Email applications. + + Additional information: none + + Person & email address to contact for further information: + The authors of RFC 3016. (See section 8.) + + Intended usage: COMMON + + Author/Change controller: + The authors of RFC 3016. (See section 8.) + + + +Kikuchi, et al. Standards Track [Page 14] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + +5.2 SDP usage of MPEG-4 Visual + + The MIME media type video/MP4V-ES string is mapped to fields in the + Session Description Protocol (SDP), RFC 2327, as follows: + + o The MIME type (video) goes in SDP "m=" as the media name. + + o The MIME subtype (MP4V-ES) goes in SDP "a=rtpmap" as the encoding + name. + + o The optional parameter "rate" goes in "a=rtpmap" as the clock + rate. + + o The optional parameter "profile-level-id" and "config" go in the + "a=fmtp" line to indicate the coder capability and configuration, + respectively. These parameters are expressed as a MIME media type + string, in the form of as a semicolon separated list of + parameter=value pairs. + + The following are some examples of media representation in SDP: + +Simple Profile/Level 1, rate=90000(90kHz), "profile-level-id" and +"config" are present in "a=fmtp" line: + m=video 49170/2 RTP/AVP 98 + a=rtpmap:98 MP4V-ES/90000 + a=fmtp:98 profile-level-id=1;config=000001B001000001B509000001000000012 + 0008440FA282C2090A21F + +Core Profile/Level 2, rate=90000(90kHz), "profile-level-id" is present in +"a=fmtp" line: + m=video 49170/2 RTP/AVP 98 + a=rtpmap:98 MP4V-ES/90000 + a=fmtp:98 profile-level-id=34 + +Advance Real Time Simple Profile/Level 1, rate=90000(90kHz), +"profile-level-id" is present in "a=fmtp" line: + m=video 49170/2 RTP/AVP 98 + a=rtpmap:98 MP4V-ES/90000 + a=fmtp:98 profile-level-id=145 + +5.3 MIME type registration of MPEG-4 Audio + + MIME media type name: audio + + MIME subtype name: MP4A-LATM + + + + + + +Kikuchi, et al. Standards Track [Page 15] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + Required parameters: + rate: the rate parameter indicates the RTP time stamp clock rate. + The default value is 90000. Other rates MAY be specified only if + they are set to the same value as the audio sampling rate (number + of samples per second). + + Optional parameters: + profile-level-id: a decimal representation of MPEG-4 Audio Profile + Level indication value defined in ISO/IEC 14496-1 ([6] and its + amendments). This parameter indicates which MPEG-4 Audio tool + subsets the decoder is capable of using. If this parameter is not + specified in the capability exchange or session setup procedure, + its default value of 30 (Natural Audio Profile/Level 1) is used. + + object: a decimal representation of the MPEG-4 Audio Object Type + value defined in ISO/IEC 14496-3 [5]. This parameter specifies + the tool to be used by the coder. It CAN be used to limit the + capability within the specified "profile-level-id". + + bitrate: the data rate for the audio bit stream. + + cpresent: a boolean parameter indicates whether audio payload + configuration data has been multiplexed into an RTP payload (see + section 4.1). A 0 indicates the configuration data has not been + multiplexed into an RTP payload, a 1 indicates that it has. The + default if the parameter is omitted is 1. + + config: a hexadecimal representation of an octet string that + expresses the audio payload configuration data "StreamMuxConfig", + as defined in ISO/IEC 14496-3 [5] (see section 4.1). + Configuration data is mapped onto the octet string in an MSB-first + basis. The first bit of the configuration data SHALL be located + at the MSB of the first octet. In the last octet, zero-padding + bits, if necessary, SHALL follow the configuration data. + + ptime: RECOMMENDED duration of each packet in milliseconds. + + Published specification: + Payload format specifications are described in this document. + Encoding specifications are provided in ISO/IEC 14496-3 [3][5]. + + Encoding considerations: + This type is only defined for transfer via RTP. + + Security considerations: + See Section 6 of RFC 3016. + + + + + +Kikuchi, et al. Standards Track [Page 16] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + Interoperability considerations: + MPEG-4 Audio provides a large and rich set of tools for the coding + of audio objects. For effective implementation of the standard, + subsets of the MPEG-4 Audio tool sets similar to those used in + MPEG-4 Visual have been provided (see section 5.1). + + The audio stream SHALL be compliant with the MPEG-4 Audio + Profile@Level specified by the parameter "profile-level-id". + Interoperability between a sender and a receiver may be achieved + by specifying the parameter "profile-level-id" in MIME content, or + by arranging in the capability exchange procedure to set this + parameter mutually to the same value. Furthermore, the "object" + parameter can be used to limit the capability within the specified + Profile@Level in capability exchange. + + Applications which use this media type: + Audio and video streaming and conferencing tools. + + Additional information: none + + Personal & email address to contact for further information: + See Section 8 of RFC 3016. + + Intended usage: COMMON + + Author/Change controller: + See Section 8 of RFC 3016. + +5.4 SDP usage of MPEG-4 Audio + + The MIME media type audio/MP4A-LATM string is mapped to fields in the + Session Description Protocol (SDP), RFC 2327, as follows: + + o The MIME type (audio) goes in SDP "m=" as the media name. + + o The MIME subtype (MP4A-LATM) goes in SDP "a=rtpmap" as the + encoding name. + + o The required parameter "rate" goes in "a=rtpmap" as the clock + rate. + + o The optional parameter "ptime" goes in SDP "a=ptime" attribute. + + o The optional parameter "profile-level-id" goes in the "a=fmtp" + line to indicate the coder capability. The "object" parameter + goes in the "a=fmtp" attribute. The payload-format-specific + parameters + + + + +Kikuchi, et al. Standards Track [Page 17] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + "bitrate", "cpresent" and "config" go in the "a=fmtp" line. These + parameters are expressed as a MIME media type string, in the form + of as a semicolon separated list of parameter=value pairs. + + The following are some examples of the media representation in SDP: + +For 6 kb/s CELP bitstreams (with an audio sampling rate of 8 kHz), + m=audio 49230 RTP/AVP 96 + a=rtpmap:96 MP4A-LATM/8000 + a=fmtp:96 profile-level-id=9;object=8;cpresent=0;config=9128B1071070 + a=ptime:20 + + For 64 kb/s AAC LC stereo bitstreams (with an audio sampling rate of + 24 kHz), + + m=audio 49230 RTP/AVP 96 + a=rtpmap:96 MP4A-LATM/24000 + a=fmtp:96 profile-level-id=1; bitrate=64000; cpresent=0; + config=9122620000 + + In the above two examples, audio configuration data is not + multiplexed into the RTP payload and is described only in SDP. + Furthermore, the "clock rate" is set to the audio sampling rate. + + If the clock rate has been set to its default value and it is + necessary to obtain the audio sampling rate, this can be done by + parsing the "config" parameter (see the following example). + + m=audio 49230 RTP/AVP 96 + a=rtpmap:96 MP4A-LATM/90000 + a=fmtp:96 object=8; cpresent=0; config=9128B1071070 + + The following example shows that the audio configuration data appears + in the RTP payload. + + m=audio 49230 RTP/AVP 96 + a=rtpmap:96 MP4A-LATM/90000 + a=fmtp:96 object=2; cpresent=1 + +6. Security Considerations + + RTP packets using the payload format defined in this specification + are subject to the security considerations discussed in the RTP + specification [8]. This implies that confidentiality of the media + streams is achieved by encryption. Because the data compression used + with this payload format is applied end-to-end, encryption may be + performed on the compressed data so there is no conflict between the + two operations. + + + +Kikuchi, et al. Standards Track [Page 18] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + + The complete MPEG-4 system allows for transport of a wide range of + content, including Java applets (MPEG-J) and scripts. Since this + payload format is restricted to audio and video streams, it is not + possible to transport such active content in this format. + +7. References + + 1 Bradner, S., "The Internet Standards Process -- Revision 3", BCP + 9, RFC 2026, October 1996. + + 2 ISO/IEC 14496-2:1999, "Information technology - Coding of audio- + visual objects - Part2: Visual". + + 3 ISO/IEC 14496-3:1999, "Information technology - Coding of audio- + visual objects - Part3: Audio". + + 4 ISO/IEC 14496-2:1999/Amd.1:2000, "Information technology - Coding + of audio-visual objects - Part 2: Visual, Amendment 1: Visual + extensions". + + 5 ISO/IEC 14496-3:1999/Amd.1:2000, "Information technology - Coding + of audio-visual objects - Part3: Audio, Amendment 1: Audio + extensions". + + 6 ISO/IEC 14496-1:1999, "Information technology - Coding of audio- + visual objects - Part1: Systems". + + 7 Bradner, S., "Key words for use in RFCs to Indicate Requirement + Levels", BCP 14, RFC 2119, March 1997. + + 8 Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson "RTP: A + Transport Protocol for Real Time Applications", RFC 1889, January + 1996. + + 9 ISO/IEC 14496-2:1999/Cor.1:2000, "Information technology - Coding + of audio-visual objects - Part2: Visual, Technical corrigendum 1". + + + + + + + + + + + + + + + +Kikuchi, et al. Standards Track [Page 19] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + +8. Authors' Addresses + + Yoshihiro Kikuchi + Toshiba corporation + 1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki, 212-8582, Japan + + EMail: yoshihiro.kikuchi@toshiba.co.jp + + + Yoshinori Matsui + Matsushita Electric Industrial Co., LTD. + 1006, Kadoma, Kadoma-shi, Osaka, Japan + + EMail: matsui@drl.mei.co.jp + + + Toshiyuki Nomura + NEC Corporation + 4-1-1,Miyazaki,Miyamae-ku,Kawasaki,JAPAN + + EMail: t-nomura@ccm.cl.nec.co.jp + + + Shigeru Fukunaga + Oki Electric Industry Co., Ltd. + 1-2-27 Shiromi, Chuo-ku, Osaka 540-6025 Japan. + + EMail: fukunaga444@oki.co.jp + + + Hideaki Kimata + Nippon Telegraph and Telephone Corporation + 1-1, Hikari-no-oka, Yokosuka-shi, Kanagawa, Japan + + EMail: kimata@nttvdt.hil.ntt.co.jp + + + + + + + + + + + + + + + + +Kikuchi, et al. Standards Track [Page 20] + +RFC 3016 RTP Payload Format for MPEG-4 Audio/Visual November 2000 + + +9. Full Copyright Statement + + Copyright (C) The Internet Society (2000). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + + + + + + + + + + + + + +Kikuchi, et al. Standards Track [Page 21] + |