Diffstat (limited to 'doc/rfc/rfc2038.txt')
-rw-r--r-- doc/rfc/rfc2038.txt 619
1 files changed, 619 insertions, 0 deletions
diff --git a/doc/rfc/rfc2038.txt b/doc/rfc/rfc2038.txt
new file mode 100644
index 0000000..0a4673d
--- /dev/null
+++ b/doc/rfc/rfc2038.txt
@@ -0,0 +1,619 @@
+
+
+
+
+
+
+Network Working Group D. Hoffman
+Request for Comments: 2038 G. Fernando
+Category: Standards Track Sun Microsystems, Inc.
+ V. Goyal
+ Precept Software, Inc.
+ October 1996
+
+
+ RTP Payload Format for MPEG1/MPEG2 Video
+
+Status of this Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Abstract
+
+ This memo describes a packetization scheme for MPEG video and audio
+ streams. The scheme proposed can be used to transport such a video
+ or audio flow over the transport protocols supported by RTP. Two
+ approaches are described. The first is designed to support maximum
+ interoperability with MPEG System environments. The second is
+ designed to provide maximum compatibility with other RTP-encapsulated
+ media streams and future conference control work of the IETF.
+
+1. Introduction
+
+ ISO/IEC JTC1/SC29 WG11 (also referred to as the MPEG committee) has
+ defined the MPEG1 standard (ISO/IEC 11172)[1] and the MPEG2 standard
+ (ISO/IEC 13818)[2]. This memo describes a packetization scheme to
+ transport MPEG video and audio streams using the Real-time Transport
+ Protocol (RTP), version 2 [3, 4].
+
+ The MPEG1 specification is defined in three parts: System, Video and
+ Audio. It is designed primarily for CD-ROM-based applications, and
+ is optimized for approximately 1.5 Mbits/sec combined data rates. The
+ video and audio portions of the specification describe the basic
+ format of the video or audio stream. These formats define the
+ Elementary Streams (ES). The MPEG1 System specification defines an
+ encapsulation of the ES that contains Presentation Time Stamps (PTS),
+ Decoding Time Stamps and System Clock references, and performs
+ multiplexing of MPEG1 compressed video and audio ES's with user data.
+
+
+
+
+
+
+Hoffman, et. al. Standards Track [Page 1]
+
+RFC 2038 RTP Payload Format for MPEG1/MPEG2 Video October 1996
+
+
+ The MPEG2 specification is structured in a similar way. However, it
+ hasn't been restricted only to CD-ROM applications. The MPEG2 System
+ specification defines two system stream formats: the MPEG2 Transport
+ Stream (MTS) and the MPEG2 Program Stream (MPS). The MTS is tailored
+ for communicating or storing one or more programs of MPEG2 compressed
+ data and also other data in relatively error-prone environments. The
+ MPS is tailored for relatively error-free environments.
+
+ We seek to achieve interoperability among 4 types of end-systems in
+ the following specification. The 4 types are:
+
+ 1. Transmitting Interworking Unit (TIU)
+
+ Receives MPEG information from a native MTS system for
+ distribution over packet networks using a native RTP-based
+ system layer (such as an IP-based internetwork). Examples:
+ real-time encoder, MTS satellite link to Internet, video
+ server with MTS-encoded source material.
+
+ 2. Receiving Interworking Unit (RIU)
+
+ Receives MPEG information in real time from an RTP-based
+ network for forwarding to a native MTS environment.
+ Examples: Internet-based video server to MTS-based cable
+ distribution plant.
+
+ 3. Transmitting Internet End-System (TAES)
+
+ Transmits MPEG information generated or stored within the
+ internet end-system itself, or received from internet-based
+ computer networks. Example: video server.
+
+ 4. Receiving Internet End-System (RAES)
+
+ Receives MPEG information over an RTP-based internet for
+ consumption at the internet end-system or forwarding to a
+ traditional computer network. Example: desktop PC or
+ workstation viewing training video.
+
+ Each of the 2 types of transmitters must work with each of the 2
+ types of receivers. Because it is probable that the TAES, and
+ certain that the RAES, will be based on existing and planned
+ internet-connected computers, it is highly desirable for the
+ interoperable protocol to be based on RTP.
+
+ Because of the range of applications that might employ MPEG streams,
+ we propose to define two payload formats.
+
+ Much interest in the MPEG community is in the use of one of the MPEG
+ System encodings, and hence, in Section 2 we propose encapsulations
+ of MPEG1 System streams and MPEG2 Transport and Program Streams with
+ RTP. This profile supports the full semantics of MPEG System and
+ offers basic interoperability among all four end-system types.
+
+ When operating only among internet-based end-systems (i.e., TAES and
+ RAES) a payload format that provides greater compatibility with the
+ Internet architecture is desired, deferring some of the system issues
+ to other protocols being defined in the Internet community (such as
+ the MMUSIC WG). In Section 3 we propose an encapsulation of
+ compressed video and audio data (referred to in MPEG documentation as
+ "Elementary Streams" (ES)) complying with either MPEG1 or MPEG2.
+ Here, neither of the System standards of MPEG1 or MPEG2 is utilized.
+ The ES's are directly encapsulated with RTP.
+
+ Throughout this specification, we make extensive use of MPEG
+ terminology. The reader should consult the primary MPEG references
+ for definitive descriptions of this terminology.
+
+2. Encapsulation of MPEG System and Transport Streams
+
+ Each RTP packet will contain a timestamp derived from the sender's
+ 90 kHz clock reference. This clock is synchronized to the system
+ stream Program Clock Reference (PCR) or System Clock Reference (SCR)
+ and represents the target transmission time of the first byte of the
+ packet payload. The RTP timestamp will not be passed to the MPEG
+ decoder. This use of the timestamp is somewhat different from what
+ is normally the case in RTP, in that it is not considered to be the
+ media display or presentation timestamp. The primary purposes of the
+ RTP timestamp will be to estimate and reduce any network-induced
+ jitter and to synchronize relative time drift between the transmitter
+ and receiver.
+
+ For MPEG2 Transport Streams the RTP payload will contain an integral
+ number of MPEG transport packets. To avoid end system
+ inefficiencies, data from multiple small MTS packets (normally fixed
+ in size at 188 bytes) are aggregated into a single RTP packet. The
+ number of transport packets contained is computed by dividing RTP
+ payload length by the length of an MTS packet (188).
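
The division described above can be sketched as follows. This is a minimal illustration, not part of the memo: the function name is hypothetical, and the check for the 0x47 sync byte comes from the MPEG2 Systems specification rather than from this document.

```python
from typing import List

TS_PACKET_SIZE = 188  # fixed MTS packet length in bytes, as stated in the text
TS_SYNC_BYTE = 0x47   # assumption: sync byte defined by MPEG2 Systems, not by this memo


def split_transport_packets(rtp_payload: bytes) -> List[bytes]:
    """Split an RTP payload into its integral number of 188-byte MTS packets."""
    if len(rtp_payload) % TS_PACKET_SIZE != 0:
        raise ValueError("payload is not an integral number of MTS packets")
    # Number of transport packets = RTP payload length / 188.
    count = len(rtp_payload) // TS_PACKET_SIZE
    packets = [
        rtp_payload[i * TS_PACKET_SIZE:(i + 1) * TS_PACKET_SIZE]
        for i in range(count)
    ]
    for packet in packets:
        if packet[0] != TS_SYNC_BYTE:
            raise ValueError("MTS packet does not begin with the sync byte")
    return packets
```

A payload of 1316 bytes, for instance, yields seven transport packets; a payload whose length is not a multiple of 188 is rejected.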
+
+ For MPEG2 Program streams and MPEG1 system streams there are no
+ packetization restrictions; these streams are treated as a packetized
+ stream of bytes.
+
+2.1 RTP header usage
+
+ The RTP header fields are used as follows:
+
+ Payload Type: Distinct payload types should be assigned for
+ MPEG1 System Streams, MPEG2 Program Streams and MPEG2
+ Transport Streams. See [4] for payload type assignments.
+
+ M bit: Set to 1 whenever the timestamp is discontinuous
+ (such as might happen when a sender switches from one data
+ source to another). This allows the receiver and any
+ intervening RTP mixers or translators that are synchronizing
+ to the flow to ignore the difference between this timestamp
+ and any previous timestamp in their clock phase detectors.
+
+ timestamp: 32-bit 90 kHz timestamp representing the target
+ transmission time for the first byte of the packet.
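
As a rough sketch of the timestamp arithmetic, the following maps a target transmission time onto the 32-bit 90 kHz scale and compares two timestamps across wrap-around. The function names and the `base` offset parameter are hypothetical and only serve the illustration.

```python
RTP_CLOCK_HZ = 90_000   # 90 kHz clock reference
WRAP = 1 << 32          # the timestamp field is 32 bits wide


def rtp_timestamp(seconds: float, base: int = 0) -> int:
    """Map a target transmission time (seconds) to a 32-bit 90 kHz timestamp.

    `base` is a hypothetical starting offset chosen by the sender.
    """
    return (base + round(seconds * RTP_CLOCK_HZ)) % WRAP


def timestamp_delta(newer: int, older: int) -> int:
    """Signed difference newer - older in 90 kHz ticks, modulo 2**32."""
    diff = (newer - older) % WRAP
    return diff - WRAP if diff >= WRAP // 2 else diff
```

A receiver estimating jitter would work with `timestamp_delta` rather than with raw subtraction, so that a wrap of the 32-bit counter is not mistaken for a large jump. When the M bit signals a discontinuity, the difference to the previous timestamp is simply ignored.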
+
+3. Encapsulation of MPEG Elementary Streams
+
+ The following ES types may be encapsulated directly in RTP:
+
+ (a) MPEG1 Video (ISO/IEC 11172-2)
+ (b) MPEG2 Video (ISO/IEC 13818-2)
+ (c) MPEG1 Audio (ISO/IEC 11172-3)
+ (d) MPEG2 Audio (ISO/IEC 13818-3)
+
+ A distinct RTP payload type is assigned to MPEG1/MPEG2 Video and
+ MPEG1/MPEG2 Audio, respectively. Further indication as to whether the
+ data is MPEG1 or MPEG2 need not be provided in the RTP or MPEG-
+ specific headers of this encapsulation, as this information is
+ available in the ES headers.
+
+ Presentation Time Stamps (PTS) of 32 bits with an accuracy of 90 kHz
+ shall be carried in the fixed RTP header. All packets that make up an
+ audio or video frame shall have the same time stamp.
+
+3.1 MPEG Video elementary streams
+
+ MPEG1 Video can be distinguished from MPEG2 Video at the video
+ sequence header, i.e. for MPEG2 Video a sequence_header() is followed
+ by sequence_extension(). The particular profile and level of MPEG2
+ Video (MAIN_Profile@MAIN_Level, HIGH_Profile@HIGH_Level, etc) are
+ determined by the profile_and_level_indicator field of the
+ sequence_extension header of MPEG2 Video.
+
+ The MPEG bit-stream semantics were designed for relatively error-free
+ environments, and there is a significant amount of dependency (both
+ temporal and spatial) within the stream such that loss of some data
+ makes other uncorrupted data useless. The format as defined in this
+ encapsulation uses application layer framing information plus
+ additional information in the RTP stream-specific header to allow for
+ certain recovery mechanisms. Appendix 1 suggests several recovery
+ strategies based on the properties of this encapsulation.
+
+ Since MPEG pictures can be large, they will normally be fragmented
+ into packets of size less than a typical LAN/WAN MTU. The following
+ fragmentation rules apply:
+
+ 1. The MPEG Video_Sequence_Header, when present, will always
+ be at the beginning of an RTP payload.
+ 2. An MPEG GOP_header, when present, will always be at the
+ beginning of the RTP payload, or will follow a
+ Video_Sequence_Header.
+ 3. An MPEG Picture_Header, when present, will always be at the
+ beginning of an RTP payload, or will follow a GOP_header.
+
+ Each ES header must be completely contained within the packet.
+ Consequently, a minimum RTP payload size of 261 bytes must be
+ supported to contain the largest single header defined in the ES
+ (that is, the extension_data() header containing the
+ quant_matrix_extension()). Otherwise, there are no restrictions on
+ where headers may appear within packet payloads.
+
+ In MPEG, each picture is made up of one or more "slices," and a slice
+ is intended to be the unit of recovery from data loss or corruption.
+ An MPEG-compliant decoder will normally advance to the beginning of
+ the next slice whenever an error is encountered in the stream. MPEG
+ slice begin and end bits are provided in the encapsulation header to
+ facilitate this.
+
+ The beginning of a slice must either be the first data in a packet
+ (after any MPEG ES headers) or must follow after some integral number
+ of slices in a packet. This requirement ensures that the beginning
+ of the next slice after one with a missing packet can be found
+ without requiring that the receiver scan the packet contents. Slices
+ may be fragmented across packets as long as all the above rules are
+ met.
+
+ An implementation based on this encapsulation assumes that the
+ Video_Sequence_Header is repeated periodically in the MPEG
+ bit-stream. In practice (though not required by the MPEG standard)
+ this is used to allow channel switching and to receive and start
+ decoding a continuously relayed MPEG bit-stream at arbitrary points
+ in the media stream. It is suggested that when playing back an MPEG
+ stream from a file format (where the Video_Sequence_Header may only
+ be represented at the beginning of the stream) the first
+ Video_Sequence_Header (preceded by an end-of-stream indicator) be
+ saved by the packetizer for periodic injection into the network
+ stream.
+
+3.2 MPEG Audio elementary streams
+
+ MPEG1 Audio can be distinguished from MPEG2 Audio by means of the MPEG
+ ancillary_data() header. For either MPEG1 or MPEG2 Audio, distinct
+ Presentation Time Stamps may be present for frames which correspond
+ to either 384 samples for Layer-I, or 1152 samples for Layer-II or
+ Layer-III. The actual number of bytes required to represent this
+ number of samples will vary depending on the encoder parameters.
+
+ Multiple audio frames may be encapsulated within one RTP packet. In
+ this case, an integral number of audio frames must be contained
+ within the packet and the fragmentation header defined in Section 3.5
+ shall be set to 0.
+
+ Also, if relatively short packets are to be used, one frame may be so
+ large that it may straddle multiple RTP packets. For example, for
+ Layer-II MPEG audio sampled at a rate of 44.1 kHz each frame would
+ represent a time slot of 26.1 msec. At this sampling rate if the
+ compressed bit-rate is 384 kbits/sec (i.e. 48 kBytes/sec) then the
+ average audio frame size would be 1.25 KBytes. If packets were to be
+ 500 Bytes long, then each audio frame would straddle 3 RTP packets.
+ The audio fragmentation indicator header (See Section 3.5) shall be
+ present for an MPEG1/2 Audio payload type to provide for this
+ fragmentation.
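
The arithmetic in the example above can be reproduced with a short sketch. The function name and argument names are hypothetical, and the packet size is taken to count audio data only, ignoring the RTP and audio-specific headers.

```python
import math

# Samples per audio frame for each layer, as given in Section 3.2.
SAMPLES_PER_FRAME = {1: 384, 2: 1152, 3: 1152}


def audio_frame_stats(layer, sample_rate_hz, bit_rate_bps, packet_data_bytes):
    """Return (frame duration in seconds, average frame size in bytes,
    number of RTP packets one frame straddles)."""
    duration_s = SAMPLES_PER_FRAME[layer] / sample_rate_hz
    frame_bytes = (bit_rate_bps / 8) * duration_s
    packets = math.ceil(frame_bytes / packet_data_bytes)
    return duration_s, frame_bytes, packets


# Layer-II at 44.1 kHz and 384 kbits/sec, carried in 500-byte packets.
duration_s, frame_bytes, packets = audio_frame_stats(2, 44_100, 384_000, 500)
```

With these inputs the frame lasts roughly 26.1 msec, averages about 1254 bytes, and straddles 3 packets, matching the figures in the text.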
+
+3.3 RTP Fixed Header for MPEG ES encapsulation
+
+ The RTP header fields are used as follows:
+
+ Payload Type: Distinct payload types should be assigned
+ for video elementary streams and audio elementary streams.
+ See [4] for payload type assignments.
+
+ M bit: For video, set to 1 on packet containing MPEG frame
+ end code, 0 otherwise. For audio, set to 1 on first packet
+ of a "talk-spurt," 0 otherwise.
+
+ PT: MPEG video or audio stream ID.
+
+ timestamp: 32-bit 90 kHz timestamp representing presentation
+ time of MPEG picture or audio frame. Same for all packets
+ that make up a picture or audio frame. May not be
+ monotonically increasing in video stream if B pictures are
+ present in stream. For packets that contain only a video
+ sequence and/or GOP header, the timestamp is that of the
+ subsequent picture.
+
+3.4 MPEG Video-specific header
+
+ This header shall be attached to each RTP packet after the RTP fixed
+ header.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |    MBZ    |        TR         |MBZ|S|B|E|  P  | | BFC | | FFC |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+                                                   FBV     FFV
+
+ MBZ: Unused. Must be set to zero in current
+ specification. This space is reserved for future use.
+
+ TR: Temporal-Reference (10 bits). The temporal reference of
+ the current picture within the current GOP. This value
+ ranges from 0-1023 and is constant for all RTP packets of a
+ given picture.
+
+ MBZ: Unused. Must be set to zero in current
+ specification. This space is reserved for future use.
+
+ S: Sequence-header-present (1 bit). Normally 0 and set to 1 at
+ the occurrence of each MPEG sequence header. Used to
+ detect presence of sequence header in RTP packet.
+
+ B: Beginning-of-slice (BS) (1 bit). Set when the start of the
+ packet payload is a slice start code, or when a slice start
+ code is preceded only by one or more of a
+ Video_Sequence_Header, GOP_header and/or Picture_Header.
+
+ E: End-of-slice (ES) (1 bit). Set when the last byte of the
+ payload is the end of an MPEG slice.
+
+ P: Picture-Type (3 bits). I (1), P (2), B (3) or D (4). This
+ value is constant for each RTP packet of a given picture.
+ Value 000B is forbidden and 101B - 111B are reserved to
+ support future extensions to the MPEG ES specification.
+
+ FBV: full_pel_backward_vector
+ BFC: backward_f_code
+ FFV: full_pel_forward_vector
+ FFC: forward_f_code
+
+ Obtained from the most recent picture header, and are
+ constant for each RTP packet of a given picture. None of
+ these values is used for I frames, and all four must be set
+ to zero in the RTP header. For P frames only the last two
+ values are present and FBV and BFC must be set to zero in the
+ RTP header. For B frames all four values are present.
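
A minimal sketch of packing and parsing this 32-bit header follows. It assumes the field order shown in the diagram with widths MBZ (6), TR (10), MBZ (2), S, B, E (1 each), P (3), FBV (1), BFC (3), FFV (1), FFC (3); the two MBZ widths are inferred from the diagram and the 32-bit total, not stated in the field descriptions. The type and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class VideoHeader(NamedTuple):
    tr: int    # Temporal-Reference, 0-1023
    s: int     # Sequence-header-present
    b: int     # Beginning-of-slice
    e: int     # End-of-slice
    p: int     # Picture-Type: I (1), P (2), B (3) or D (4)
    fbv: int   # full_pel_backward_vector
    bfc: int   # backward_f_code
    ffv: int   # full_pel_forward_vector
    ffc: int   # forward_f_code


def pack_video_header(h: VideoHeader) -> bytes:
    """Build the 4-byte header in network byte order; MBZ bits are zero."""
    if not 0 <= h.tr <= 1023:
        raise ValueError("TR must fit in 10 bits")
    if not 1 <= h.p <= 4:
        raise ValueError("Picture-Type 0 is forbidden, 5-7 are reserved")
    word = (
        (h.tr << 16)
        | ((h.s & 1) << 13) | ((h.b & 1) << 12) | ((h.e & 1) << 11)
        | (h.p << 8)
        | ((h.fbv & 1) << 7) | ((h.bfc & 7) << 4)
        | ((h.ffv & 1) << 3) | (h.ffc & 7)
    )
    return struct.pack("!I", word)


def unpack_video_header(data: bytes) -> VideoHeader:
    """Parse the first 4 bytes of the payload; MBZ bits are ignored."""
    (word,) = struct.unpack("!I", data[:4])
    return VideoHeader(
        tr=(word >> 16) & 0x3FF,
        s=(word >> 13) & 1, b=(word >> 12) & 1, e=(word >> 11) & 1,
        p=(word >> 8) & 7,
        fbv=(word >> 7) & 1, bfc=(word >> 4) & 7,
        ffv=(word >> 3) & 1, ffc=word & 7,
    )
```

For an I picture the sender would pass zeros for all four vector fields, and for a P picture zeros for FBV and BFC, as required above.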
+
+3.5 MPEG Audio-specific header
+
+ This header shall be attached to each RTP packet at the start of the
+ payload and after any RTP headers for an MPEG1/2 Audio payload type.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |             MBZ               |          Frag_offset          |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Frag_offset: Byte offset into the audio frame for the data
+ in this packet.
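
The following sketch fragments one audio frame across several RTP payloads and parses them back. It assumes the header is 16 bits of MBZ followed by a 16-bit Frag_offset, which is inferred from the diagram; the function names and the `max_data` parameter are hypothetical.

```python
import struct
from typing import List, Tuple


def fragment_audio_frame(frame: bytes, max_data: int) -> List[bytes]:
    """Split one audio frame into payloads, each led by the 4-byte header."""
    if len(frame) > 0x10000:
        raise ValueError("frame too large for a 16-bit Frag_offset")
    payloads = []
    for offset in range(0, len(frame), max_data):
        header = struct.pack("!HH", 0, offset)  # MBZ = 0, then Frag_offset
        payloads.append(header + frame[offset:offset + max_data])
    return payloads


def parse_audio_payload(payload: bytes) -> Tuple[int, bytes]:
    """Return (Frag_offset, audio data) from one RTP payload."""
    _mbz, frag_offset = struct.unpack("!HH", payload[:4])
    return frag_offset, payload[4:]
```

A 1254-byte frame sent with 500 bytes of audio data per packet produces offsets 0, 500 and 1000. When a packet carries whole frames, Frag_offset is 0 as required by Section 3.2.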
+
+Appendix 1. Error Recovery and Resynchronization Strategies.
+
+ The following error recovery and resynchronization strategies are
+ intended to be guidelines only. A compliant receiver is free to
+ employ alternative (or no) strategies.
+
+ When initially decoding an RTP-encapsulated MPEG Elementary Stream,
+ the receiver may discard all packets until the Sequence-header-
+ present bit is set to 1. At this point, sufficient state information
+ is contained in the stream to allow processing by an MPEG decoder.
+
+ Loss of packets containing the GOP_header and/or Picture_Header is
+ detected by an unexpected change in the Temporal-Reference and
+ Picture-Type values. Consider the following example GOP sequence:
+
+ In display order: 0B 1B 2I 3B 4B 5P 6B 7B 8P GOP_HDR 0B ...
+ In stream order: 2I 0B 1B 5P 3B 4B 8P 6B 7B GOP_HDR 2I ...
+
+ Consider also two counters:
+
+ ref_pic_temp (Reference Picture (I,P) Temporal Reference)
+ dep_pic_temp (Dependent Picture (B) Temporal Reference)
+
+ At each GOP beginning, set these counters to the temporal reference
+ value of the corresponding picture type. For our example GOP
+ sequence, ref_pic_temp = 2 and dep_pic_temp = 0. Keep incrementing
+ BOTH counters by unity with each following picture. Ref_pic_temp
+ should match the temporal references of the I and P frames, and
+ dep_pic_temp should match the temporal references of the B frames.
+
+   dep_pic_temp:     -  0  1  2  3  4  5  6  7        8  9
+   In stream order: 2I 0B 1B 5P 3B 4B 8P 6B 7B GOP_H 2I 0B 1B ...
+   ref_pic_temp:     2  3  4  5  6  7  8  9 10   ^   11
+                    --------------------------   |    ^
+                              Match            Drop   |
+                                                  Mismatch
+                                               in ref_pic_temp
+
+ The loss of a GOP header can be detected by matching the appropriate
+ counter (based on picture type) to the temporal reference value. A
+ mismatch indicates a lost GOP header. If desired, a GOP header can be
+ re-constructed using a "null" time_code, repeating the closed_gop
+ flag from previous GOP headers, and setting the broken_link flag to
+ 1.
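
The counter scheme can be sketched as below, assuming the regular GOP structure of the example. The class and method names are hypothetical, a mismatch is treated as a lost GOP header, and the counters wrap modulo 1024 because the Temporal-Reference field is 10 bits wide.

```python
class GopLossDetector:
    """Track ref_pic_temp and dep_pic_temp as described in the appendix."""

    def __init__(self):
        self.ref_pic_temp = None  # expected TR of the next I or P picture
        self.dep_pic_temp = None  # expected TR of the next B picture

    def on_gop_header(self):
        """A GOP header arrived: both counters restart with the new GOP."""
        self.ref_pic_temp = None
        self.dep_pic_temp = None

    def on_picture(self, tr, picture_type):
        """Return True if TR matches the counter for this picture type."""
        is_ref = picture_type in ("I", "P")
        expected = self.ref_pic_temp if is_ref else self.dep_pic_temp
        matched = expected is None or expected == tr
        if not matched:
            # Mismatch: assume the GOP header was lost and restart.
            self.on_gop_header()
        if is_ref:
            self.ref_pic_temp = tr
        else:
            self.dep_pic_temp = tr
        # Increment BOTH counters by unity with each picture.
        if self.ref_pic_temp is not None:
            self.ref_pic_temp = (self.ref_pic_temp + 1) % 1024
        if self.dep_pic_temp is not None:
            self.dep_pic_temp = (self.dep_pic_temp + 1) % 1024
        return matched
```

Feeding the example stream 2I 0B 1B 5P 3B 4B 8P 6B 7B and then 2I without an intervening call to `on_gop_header` makes the last call return False, since ref_pic_temp has reached 11 while the picture carries a temporal reference of 2.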
+
+ The loss of a Picture_Header can also be detected by a mismatch in
+ the Temporal Reference contained in the RTP packet from the
+ appropriate dep_pic_temp or ref_pic_temp counters at the receiver.
+
+ After scanning to the next Beginning-of-slice the Picture_Header is
+ reconstructed from the P, TR, FBV, BFC, FFV and FFC contained in that
+ packet, and from stream-dependent default values.
+
+ Any time an RTP packet is lost (as indicated by a gap in the RTP
+ sequence number), the receiver may discard all packets until the
+ Beginning-of-slice bit is set. At this point, sufficient state
+ information is contained in the stream to allow processing by an MPEG
+ decoder starting at the next slice boundary (possibly after
+ reconstruction of the GOP_header and/or Picture_Header as described
+ above).
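
The discard rule above might look like the following sketch. It assumes packets arrive in order as hypothetical `(sequence number, B bit, payload)` tuples and ignores reordering; the 16-bit width of the RTP sequence number comes from the RTP specification [3].

```python
def filter_after_loss(packets):
    """Return the payloads that would be passed on to the MPEG decoder.

    After a gap in the RTP sequence numbers, packets are discarded until
    one arrives with the Beginning-of-slice (B) bit set.
    """
    passed = []
    expected = None       # next sequence number we expect to see
    discarding = False
    for seq, b_bit, payload in packets:
        if expected is not None and seq != expected:
            discarding = True           # gap detected: a packet was lost
        expected = (seq + 1) & 0xFFFF   # sequence numbers wrap at 16 bits
        if discarding:
            if not b_bit:
                continue                # still inside a damaged slice
            discarding = False          # slice boundary: resume decoding
        passed.append(payload)
    return passed
```

A fuller receiver would also reconstruct the GOP_header or Picture_Header where needed before resuming, which this sketch leaves out.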
+
+References
+
+ [1] ISO/IEC International Standard 11172; "Coding of moving pictures
+ and associated audio for digital storage media up to about 1,5
+ Mbits/s", November 1993.
+
+ [2] ISO/IEC International Standard 13818; "Generic coding of moving
+ pictures and associated audio information", November 1994.
+
+ [3] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson,
+ "RTP: A Transport Protocol for Real-Time Applications",
+ RFC 1889, January 1996.
+
+ [4] H. Schulzrinne, "RTP Profile for Audio and Video Conferences
+ with Minimal Control", RFC 1890, January 1996.
+
+Authors' Addresses
+
+ Gerard Fernando
+ Sun Microsystems, Inc.
+ Mail-stop UMPK14-305
+ 2550 Garcia Avenue
+ Mountain View, California 94043-1100
+ USA
+
+ Phone: +1 415-786-6373
+ EMail: gerard.fernando@eng.sun.com
+
+
+ Vivek Goyal
+ Precept Software, Inc.
+ 1072 Arastradero Rd,
+ Palo Alto, CA 94304
+ USA
+
+ Phone: +1 415-845-5200
+ EMail: goyal@precept.com
+
+
+ Don Hoffman
+ Sun Microsystems, Inc.
+ Mail-stop UMPK14-305
+ 2550 Garcia Avenue
+ Mountain View, California 94043-1100
+ USA
+
+ Phone: +1 503-297-1580
+ EMail: don.hoffman@eng.sun.com
+