Diffstat (limited to 'doc/rfc/rfc2038.txt')
-rw-r--r-- | doc/rfc/rfc2038.txt | 619 |
1 files changed, 619 insertions, 0 deletions
diff --git a/doc/rfc/rfc2038.txt b/doc/rfc/rfc2038.txt
new file mode 100644
index 0000000..0a4673d
--- /dev/null
+++ b/doc/rfc/rfc2038.txt
@@ -0,0 +1,619 @@
+
+
+
+
+
+
+Network Working Group                                         D. Hoffman
+Request for Comments: 2038                                   G. Fernando
+Category: Standards Track                         Sun Microsystems, Inc.
+                                                                V. Goyal
+                                                  Precept Software, Inc.
+                                                            October 1996
+
+
+                RTP Payload Format for MPEG1/MPEG2 Video
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Abstract
+
+   This memo describes a packetization scheme for MPEG video and audio
+   streams.  The scheme proposed can be used to transport such a video
+   or audio flow over the transport protocols supported by RTP.  Two
+   approaches are described.  The first is designed to support maximum
+   interoperability with MPEG System environments.  The second is
+   designed to provide maximum compatibility with other RTP-encapsulated
+   media streams and future conference control work of the IETF.
+
+1. Introduction
+
+   ISO/IEC JTC1/SC29 WG11 (also referred to as the MPEG committee) has
+   defined the MPEG1 standard (ISO/IEC 11172)[1] and the MPEG2 standard
+   (ISO/IEC 13818)[2].  This memo describes a packetization scheme to
+   transport MPEG video and audio streams using the Real-time Transport
+   Protocol (RTP), version 2 [3, 4].
+
+   The MPEG1 specification is defined in three parts: System, Video and
+   Audio.  It is designed primarily for CD-ROM-based applications, and
+   is optimized for approximately 1.5 Mbits/sec combined data rates.  The
+   video and audio portions of the specification describe the basic
+   format of the video or audio stream.  These formats define the
+   Elementary Streams (ES).
+   The MPEG1 System specification defines an
+   encapsulation of the ES that contains Presentation Time Stamps (PTS),
+   Decoding Time Stamps and System Clock references, and performs
+   multiplexing of MPEG1 compressed video and audio ES's with user data.
+
+
+
+
+
+
+Hoffman, et. al.            Standards Track                     [Page 1]
+
+RFC 2038        RTP Payload Format for MPEG1/MPEG2 Video    October 1996
+
+
+   The MPEG2 specification is structured in a similar way.  However, it
+   hasn't been restricted only to CD-ROM applications.  The MPEG2 System
+   specification defines two system stream formats: the MPEG2 Transport
+   Stream (MTS) and the MPEG2 Program Stream (MPS).  The MTS is tailored
+   for communicating or storing one or more programs of MPEG2 compressed
+   data and also other data in relatively error-prone environments.  The
+   MPS is tailored for relatively error-free environments.
+
+   We seek to achieve interoperability among 4 types of end-systems in
+   the following specification.  The 4 types are:
+
+      1. Transmitting Interworking Unit (TIU)
+
+         Receives MPEG information from a native MTS system for
+         distribution over packet networks using a native RTP-based
+         system layer (such as an IP-based internetwork).  Examples:
+         real-time encoder, MTS satellite link to Internet, video
+         server with MTS-encoded source material.
+
+      2. Receiving Interworking Unit (RIU)
+
+         Receives MPEG information in real time from an RTP-based
+         network for forwarding to a native MTS environment.
+         Examples: Internet-based video server to MTS-based cable
+         distribution plant.
+
+      3. Transmitting Internet End-System (TAES)
+
+         Transmits MPEG information generated or stored within the
+         internet end-system itself, or received from internet-based
+         computer networks.  Example: video server.
+
+      4. Receiving Internet End-System (RAES)
+
+         Receives MPEG information over an RTP-based internet for
+         consumption at the internet end-system or forwarding to
+         traditional computer network.
+         Example: desktop PC or workstation viewing training video.
+
+   Each of the 2 types of transmitters must work with each of the 2
+   types of receivers.  Because it is probable that the TAES, and
+   certain that the RAES, will be based on existing and planned
+   internet-connected computers, it is highly desirable for the
+   interoperable protocol to be based on RTP.
+
+   Because of the range of applications that might employ MPEG streams,
+   we propose to define two payload formats.
+
+   Much interest in the MPEG community is in the use of one of the MPEG
+   System encodings, and hence, in Section 2 we propose encapsulations
+   of MPEG1 System streams and MPEG2 Transport and Program Streams with
+   RTP.  This profile supports the full semantics of MPEG System and
+   offers basic interoperability among all four end-system types.
+
+   When operating only among internet-based end-systems (i.e., TAES and
+   RAES) a payload format that provides greater compatibility with the
+   Internet architecture is desired, deferring some of the system issues
+   to other protocols being defined in the Internet community (such as
+   the MMUSIC WG).  In Section 3 we propose an encapsulation of
+   compressed video and audio data (referred to in MPEG documentation as
+   "Elementary Streams" (ES)) complying with either MPEG1 or MPEG2.
+   Here, neither of the System standards of MPEG1 or MPEG2 is utilized.
+   The ES's are directly encapsulated with RTP.
+
+   Throughout this specification, we make extensive use of MPEG
+   terminology.  The reader should consult the primary MPEG references
+   for definitive descriptions of this terminology.
+
+2. Encapsulation of MPEG System and Transport Streams
+
+   Each RTP packet will contain a timestamp derived from the sender's
+   90 kHz clock reference.
+   This clock is synchronized to the system
+   stream Program Clock Reference (PCR) or System Clock Reference (SCR)
+   and represents the target transmission time of the first byte of the
+   packet payload.  The RTP timestamp will not be passed to the MPEG
+   decoder.  This use of the timestamp is somewhat different than
+   normally is the case in RTP, in that it is not considered to be the
+   media display or presentation timestamp.  The primary purposes of the
+   RTP timestamp will be to estimate and reduce any network-induced
+   jitter and to synchronize relative time drift between the transmitter
+   and receiver.
+
+   For MPEG2 Transport Streams the RTP payload will contain an integral
+   number of MPEG transport packets.  To avoid end system
+   inefficiencies, data from multiple small MTS packets (normally fixed
+   in size at 188 bytes) are aggregated into a single RTP packet.  The
+   number of transport packets contained is computed by dividing RTP
+   payload length by the length of an MTS packet (188).
+
+   For MPEG2 Program streams and MPEG1 system streams there are no
+   packetization restrictions; these streams are treated as a packetized
+   stream of bytes.
+
+2.1 RTP header usage
+
+   The RTP header fields are used as follows:
+
+      Payload Type: Distinct payload types should be assigned for
+      MPEG1 System Streams, MPEG2 Program Streams and MPEG2
+      Transport Streams.  See [4] for payload type assignments.
+
+      M bit:  Set to 1 whenever the timestamp is discontinuous
+      (such as might happen when a sender switches from one data
+      source to another).  This allows the receiver and any
+      intervening RTP mixers or translators that are synchronizing
+      to the flow to ignore the difference between this timestamp
+      and any previous timestamp in their clock phase detectors.
+
+      timestamp: 32-bit 90 kHz timestamp representing the target
+      transmission time for the first byte of the packet.
+
+3. Encapsulation of MPEG Elementary Streams
+
+   The following ES types may be encapsulated directly in RTP:
+
+      (a) MPEG1 Video (ISO/IEC 11172-2)
+      (b) MPEG2 Video (ISO/IEC 13818-2)
+      (c) MPEG1 Audio (ISO/IEC 11172-3)
+      (d) MPEG2 Audio (ISO/IEC 13818-3)
+
+   A distinct RTP payload type is assigned to MPEG1/MPEG2 Video and
+   MPEG1/MPEG2 Audio, respectively.  Further indication as to whether the
+   data is MPEG1 or MPEG2 need not be provided in the RTP or
+   MPEG-specific headers of this encapsulation, as this information is
+   available in the ES headers.
+
+   Presentation Time Stamps (PTS) of 32 bits with an accuracy of 90 kHz
+   shall be carried in the fixed RTP header.  All packets that make up an
+   audio or video frame shall have the same time stamp.
+
+3.1 MPEG Video elementary streams
+
+   MPEG1 Video can be distinguished from MPEG2 Video at the video
+   sequence header, i.e. for MPEG2 Video a sequence_header() is followed
+   by sequence_extension().  The particular profile and level of MPEG2
+   Video (MAIN_Profile@MAIN_Level, HIGH_Profile@HIGH_Level, etc) are
+   determined by the profile_and_level_indicator field of the
+   sequence_extension header of MPEG2 Video.
+
+   The MPEG bit-stream semantics were designed for relatively error-free
+   environments, and there is a significant amount of dependency (both
+   temporal and spatial) within the stream such that loss of some data
+   makes other uncorrupted data useless.  The format as defined in this
+   encapsulation uses application layer framing information plus
+   additional information in the RTP stream-specific header to allow for
+   certain recovery mechanisms.  Appendix 1 suggests several recovery
+   strategies based on the properties of this encapsulation.
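
Editor's illustration (not part of the patch): the MPEG1/MPEG2 test described at the start of Section 3.1 can be sketched as a check on the first start code that follows the sequence header. This is a minimal sketch under stated assumptions: the start-code values (0x000001B3 for the sequence header, 0x000001B5 for an extension) come from the MPEG video specifications rather than from this memo, and `classify_video_es` and the synthetic test buffers are hypothetical names and data introduced only for this example.

```python
from typing import Optional

# Start-code values taken from the MPEG video specifications
# (ISO/IEC 11172-2 and 13818-2); this memo does not define them.
START_CODE_PREFIX = b"\x00\x00\x01"
SEQUENCE_HEADER_CODE = START_CODE_PREFIX + b"\xb3"
EXTENSION_START_CODE_ID = 0xB5


def classify_video_es(data: bytes) -> Optional[str]:
    """Return "MPEG2", "MPEG1", or None if the buffer is inconclusive.

    Hypothetical helper: an extension start code directly after the
    sequence header is treated as the sequence_extension() of MPEG2.
    """
    seq = data.find(SEQUENCE_HEADER_CODE)
    if seq < 0:
        return None  # no sequence header in this buffer
    # Locate the first start code after the sequence header's own start code.
    nxt = data.find(START_CODE_PREFIX, seq + len(SEQUENCE_HEADER_CODE))
    if nxt < 0 or nxt + 3 >= len(data):
        return None  # the following start code is not fully present
    return "MPEG2" if data[nxt + 3] == EXTENSION_START_CODE_ID else "MPEG1"


# Synthetic buffers: the filler bytes only stand in for real header fields.
filler = b"\x12\x34\x56\x78" * 2
mpeg2_like = SEQUENCE_HEADER_CODE + filler + b"\x00\x00\x01\xb5" + b"\x14"
mpeg1_like = SEQUENCE_HEADER_CODE + filler + b"\x00\x00\x01\xb8" + b"\x00"
print(classify_video_es(mpeg2_like), classify_video_es(mpeg1_like))
```

A receiver would typically run such a check only on packets whose Sequence-header-present bit is set, since only those carry a sequence header.
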
+
+   Since MPEG pictures can be large, they will normally be fragmented
+   into packets of size less than a typical LAN/WAN MTU.  The following
+   fragmentation rules apply:
+
+      1. The MPEG Video_Sequence_Header, when present, will always
+         be at the beginning of an RTP payload.
+      2. An MPEG GOP_header, when present, will always be at the
+         beginning of the RTP payload, or will follow a
+         Video_Sequence_Header.
+      3. An MPEG Picture_Header, when present, will always be at the
+         beginning of an RTP payload, or will follow a GOP_header.
+
+   Each ES header must be completely contained within the packet.
+   Consequently, a minimum RTP payload size of 261 bytes must be
+   supported to contain the largest single header defined in the ES
+   (that is, the extension_data() header containing the
+   quant_matrix_extension()).  Otherwise, there are no restrictions on
+   where headers may appear within packet payloads.
+
+   In MPEG, each picture is made up of one or more "slices," and a slice
+   is intended to be the unit of recovery from data loss or corruption.
+   An MPEG-compliant decoder will normally advance to the beginning of
+   the next slice whenever an error is encountered in the stream.  MPEG
+   slice begin and end bits are provided in the encapsulation header to
+   facilitate this.
+
+   The beginning of a slice must either be the first data in a packet
+   (after any MPEG ES headers) or must follow after some integral number
+   of slices in a packet.  This requirement ensures that the beginning
+   of the next slice after one with a missing packet can be found
+   without requiring that the receiver scan the packet contents.  Slices
+   may be fragmented across packets as long as all the above rules are
+   met.
+
+   An implementation based on this encapsulation assumes that the
+   Video_Sequence_Header is repeated periodically in the MPEG
+   bit-stream.
+   In practice (though not required by the MPEG standard) this is
+   used to allow channel switching and to receive and start decoding a
+   continuously relayed MPEG bit-stream at arbitrary points in the media
+   stream.  It is suggested that when playing back from an MPEG stream
+   from a file format (where the Video_Sequence_Header may only be
+   represented at the beginning of the stream) that the first
+   Video_Sequence_Header (preceded by an end-of-stream indicator) be
+   saved by the packetizer for periodic injection into the network
+   stream.
+
+3.2 MPEG Audio elementary streams
+
+   MPEG1 Audio can be distinguished from MPEG2 Audio from the MPEG
+   ancillary_data() header.  For either MPEG1 or MPEG2 Audio, distinct
+   Presentation Time Stamps may be present for frames which correspond
+   to either 384 samples for Layer-I, or 1152 samples for Layer-II or
+   Layer-III.  The actual number of bytes required to represent this
+   number of samples will vary depending on the encoder parameters.
+
+   Multiple audio frames may be encapsulated within one RTP packet.  In
+   this case, an integral number of audio frames must be contained
+   within the packet and the fragmentation header defined in Section 3.5
+   shall be set to 0.
+
+   Also, if relatively short packets are to be used, one frame may be so
+   large that it may straddle multiple RTP packets.  For example, for
+   Layer-II MPEG audio sampled at a rate of 44.1 kHz each frame would
+   represent a time slot of 26.1 msec.  At this sampling rate if the
+   compressed bit-rate is 384 kbits/sec (i.e. 48 kBytes/sec) then the
+   average audio frame size would be 1.25 KBytes.  If packets were to be
+   500 Bytes long, then each audio frame would straddle 3 RTP packets.
+   The audio fragmentation indicator header (See Section 3.5) shall be
+   present for an MPEG1/2 Audio payload type to provide for this
+   fragmentation.
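
Editor's illustration (not part of the patch): the Layer-II arithmetic in the last paragraph of Section 3.2 can be reproduced as follows. This is a sketch only; the function names are hypothetical, and the per-packet byte offsets assume that each fragment is filled to the 500-byte payload limit before the next one starts, which the memo does not require.

```python
import math

# Samples per audio frame, as given in Section 3.2.
SAMPLES_PER_FRAME = {"I": 384, "II": 1152, "III": 1152}


def frame_duration_s(layer: str, sample_rate_hz: float) -> float:
    """Time slot covered by one audio frame, in seconds."""
    return SAMPLES_PER_FRAME[layer] / sample_rate_hz


def average_frame_bytes(layer: str, sample_rate_hz: float, bit_rate_bps: float) -> float:
    """Average frame size in bytes at a constant compressed bit rate."""
    return bit_rate_bps / 8 * frame_duration_s(layer, sample_rate_hz)


def fragment_offsets(frame_len: int, max_payload: int) -> list:
    """Byte offset of each fragment within the frame (candidate Frag_offset values)."""
    return list(range(0, frame_len, max_payload))


duration = frame_duration_s("II", 44100)           # about 0.0261 s
avg = average_frame_bytes("II", 44100, 384000)     # about 1254 bytes
offsets = fragment_offsets(math.ceil(avg), 500)    # one offset per RTP packet
print(round(duration * 1000, 1), round(avg), offsets)
```

The three offsets correspond to the three RTP packets the memo mentions for a 500-byte payload.
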
+
+3.3 RTP Fixed Header for MPEG ES encapsulation
+
+   The RTP header fields are used as follows:
+
+      Payload Type: Distinct payload types should be assigned
+      for video elementary streams and audio elementary streams.
+      See [4] for payload type assignments.
+
+      M bit:  For video, set to 1 on packet containing MPEG frame
+      end code, 0 otherwise.  For audio, set to 1 on first packet
+      of a "talk-spurt," 0 otherwise.
+
+      PT:  MPEG video or audio stream ID.
+
+      timestamp: 32-bit 90 kHz timestamp representing presentation
+      time of MPEG picture or audio frame.  Same for all packets
+      that make up a picture or audio frame.  May not be
+      monotonically increasing in video stream if B pictures
+      present in stream.  For packets that contain only a video
+      sequence and/or GOP header, the timestamp is that of the
+      subsequent picture.
+
+3.4 MPEG Video-specific header
+
+   This header shall be attached to each RTP packet after the RTP fixed
+   header.
+
+ 0                   1                   2                   3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|    MBZ    |        TR         |MBZ|S|B|E|  P  | | BFC | | FFC |
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                                                FBV     FFV
+
+      MBZ: Unused.  Must be set to zero in current
+      specification.  This space is reserved for future use.
+
+      TR: Temporal-Reference (10 bits).  The temporal reference of
+      the current picture within the current GOP.  This value
+      ranges from 0-1023 and is constant for all RTP packets of a
+      given picture.
+
+      MBZ: Unused.  Must be set to zero in current
+      specification.  This space is reserved for future use.
+
+      S: Sequence-header-present (1 bit).  Normally 0 and set to 1 at
+      the occurrence of each MPEG sequence header.  Used to
+      detect presence of sequence header in RTP packet.
+
+      B: Beginning-of-slice (BS) (1 bit).
+      Set when the start of the
+      packet payload is a slice start code, or when a slice start
+      code is preceded only by one or more of a
+      Video_Sequence_Header, GOP_header and/or Picture_Header.
+
+      E: End-of-slice (ES) (1 bit).  Set when the last byte of the
+      payload is the end of an MPEG slice.
+
+      P: Picture-Type (3 bits).  I (1), P (2), B (3) or D (4).  This
+      value is constant for each RTP packet of a given picture.
+      Value 000B is forbidden and 101B - 111B are reserved to
+      support future extensions to the MPEG ES specification.
+
+      FBV: full_pel_backward_vector
+      BFC: backward_f_code
+      FFV: full_pel_forward_vector
+      FFC: forward_f_code
+
+      Obtained from the most recent picture header, and are
+      constant for each RTP packet of a given picture.  None of
+      these values are used for I frames and must be set to zero
+      in the RTP header.  For P frames only the last two values
+      are present and FBV and BFC must be set to zero in the RTP
+      header.  For B frames all the four values are present.
+
+3.5 MPEG Audio-specific header
+
+   This header shall be attached to each RTP packet at the start of the
+   payload and after any RTP headers for an MPEG1/2 Audio payload type.
+
+ 0                   1                   2                   3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|              MBZ              |          Frag_offset          |
++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      Frag_offset: Byte offset into the audio frame for the data
+      in this packet.
+
+Appendix 1. Error Recovery and Resynchronization Strategies.
+
+   The following error recovery and resynchronization strategies are
+   intended to be guidelines only.
+   A compliant receiver is free to
+   employ alternative (or no) strategies.
+
+   When initially decoding an RTP-encapsulated MPEG Elementary Stream,
+   the receiver may discard all packets until the
+   Sequence-header-present bit is set to 1.  At this point, sufficient
+   state information is contained in the stream to allow processing by
+   an MPEG decoder.
+
+   Loss of packets containing the GOP_header and/or Picture_Header is
+   detected by an unexpected change in the Temporal-Reference and
+   Picture-Type values.  Consider the following example GOP sequence:
+
+   In display order: 0B 1B 2I 3B 4B 5P 6B 7B 8P GOP_HDR 0B ...
+   In stream order:  2I 0B 1B 5P 3B 4B 8P 6B 7B GOP_HDR 2I ...
+
+   Consider also two counters:
+
+      ref_pic_temp (Reference Picture (I,P) Temporal Reference)
+      dep_pic_temp (Dependent Picture (B) Temporal Reference)
+
+   At each GOP beginning, set these counters to the temporal reference
+   value of the corresponding picture type.  For our example GOP
+   sequence, ref_pic_temp = 2 and dep_pic_temp = 0.  Keep incrementing
+   BOTH counters by unity with each following picture.  Ref_pic_temp
+   should match the temporal references of the I and P frames, and
+   dep_pic_temp should match the temporal references of the B frames.
+
+   dep_pic_temp:     -  0  1  2  3  4  5  6  7        8  9
+   In stream order: 2I 0B 1B 5P 3B 4B 8P 6B 7B GOP_H 2I 0B 1B ...
+   ref_pic_temp:     2  3  4  5  6  7  8  9 10   ^   11
+                    --------------------------   |    ^
+                              Match            Drop   |
+                                                  Mismatch
+                                               in ref_pic_temp
+
+   The loss of a GOP header can be detected by matching the appropriate
+   counter (based on picture type) to the temporal reference value.  A
+   mismatch indicates a lost GOP header.  If desired, a GOP header can be
+   re-constructed using a "null" time_code, repeating the closed_gop
+   flag from previous GOP headers, and setting the broken_link flag to
+   1.
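
Editor's illustration (not part of the patch): the two-counter check described above can be sketched as follows. This is one possible reading of the appendix, not a normative implementation. The function name, the tuple layout, and the modulo-1024 wrap of the counters (chosen because Temporal-Reference is a 10-bit field) are assumptions, and D pictures are not handled.

```python
from typing import Iterable, List, Optional, Tuple

TR_MODULO = 1024  # assumption: counters wrap like the 10-bit Temporal-Reference


def find_lost_gop_headers(pictures: Iterable[Tuple[str, int, bool]]) -> List[int]:
    """Return the indices of pictures at which a GOP header appears to be lost.

    Each item is (picture_type, temporal_reference, gop_header_seen) in
    stream order, with picture_type one of "I", "P" or "B".
    """
    ref_pic_temp: Optional[int] = None  # expected TR of the next I or P picture
    dep_pic_temp: Optional[int] = None  # expected TR of the next B picture
    lost: List[int] = []
    for index, (ptype, tr, gop_header_seen) in enumerate(pictures):
        if gop_header_seen:
            # A received GOP header restarts both counters.
            ref_pic_temp = dep_pic_temp = None
        is_ref = ptype in ("I", "P")
        expected = ref_pic_temp if is_ref else dep_pic_temp
        if expected is not None and expected != tr:
            # Mismatch: assume a GOP header was dropped just before this picture.
            lost.append(index)
            ref_pic_temp = dep_pic_temp = None
            expected = None
        if expected is None:
            # First picture of this type in the GOP seeds its counter.
            if is_ref:
                ref_pic_temp = tr
            else:
                dep_pic_temp = tr
        # Both counters advance by one with every picture.
        if ref_pic_temp is not None:
            ref_pic_temp = (ref_pic_temp + 1) % TR_MODULO
        if dep_pic_temp is not None:
            dep_pic_temp = (dep_pic_temp + 1) % TR_MODULO
    return lost


# The example stream from the appendix; the second GOP starts at index 9.
gop = [("I", 2), ("B", 0), ("B", 1), ("P", 5), ("B", 3),
       ("B", 4), ("P", 8), ("B", 6), ("B", 7)]
intact = [(t, tr, i == 0) for i, (t, tr) in enumerate(gop)] * 2
dropped = [(t, tr, i == 0) for i, (t, tr) in enumerate(gop + gop)]
print(find_lost_gop_headers(intact), find_lost_gop_headers(dropped))
```

With the second GOP header dropped, ref_pic_temp has reached 11 when the second "2I" picture arrives, which is the mismatch marked in the diagram.
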
+
+   The loss of a Picture_Header can also be detected by a mismatch in
+   the Temporal Reference contained in the RTP packet from the
+   appropriate dep_pic_temp or ref_pic_temp counters at the receiver.
+
+   After scanning to the next Beginning-of-slice the Picture_Header is
+   reconstructed from the P, TR, FBV, BFC, FFV and FFC contained in that
+   packet, and from stream-dependent default values.
+
+   Any time an RTP packet is lost (as indicated by a gap in the RTP
+   sequence number), the receiver may discard all packets until the
+   Beginning-of-slice bit is set.  At this point, sufficient state
+   information is contained in the stream to allow processing by an MPEG
+   decoder starting at the next slice boundary (possibly after
+   reconstruction of the GOP_header and/or Picture_Header as described
+   above).
+
+References
+
+   [1] ISO/IEC International Standard 11172; "Coding of moving pictures
+       and associated audio for digital storage media up to about 1,5
+       Mbits/s", November 1993.
+
+   [2] ISO/IEC International Standard 13818; "Generic coding of moving
+       pictures and associated audio information", November 1994.
+
+   [3] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson,
+       "RTP: A Transport Protocol for Real-Time Applications",
+       RFC 1889, January 1996.
+
+   [4] H. Schulzrinne, "RTP Profile for Audio and Video Conferences
+       with Minimal Control", RFC 1890, January 1996.
+
+Authors' Addresses
+
+   Gerard Fernando
+   Sun Microsystems, Inc.
+   Mail-stop UMPK14-305
+   2550 Garcia Avenue
+   Mountain View, California 94043-1100
+   USA
+
+   Phone: +1 415-786-6373
+   EMail: gerard.fernando@eng.sun.com
+
+
+   Vivek Goyal
+   Precept Software, Inc.
+   1072 Arastradero Rd,
+   Palo Alto, CA 94304
+   USA
+
+   Phone: +1 415-845-5200
+   EMail: goyal@precept.com
+
+
+   Don Hoffman
+   Sun Microsystems, Inc.
+   Mail-stop UMPK14-305
+   2550 Garcia Avenue
+   Mountain View, California 94043-1100
+   USA
+
+   Phone: +1 503-297-1580
+   EMail: don.hoffman@eng.sun.com