diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc2032.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc2032.txt')
-rw-r--r-- | doc/rfc/rfc2032.txt | 619 |
1 files changed, 619 insertions, 0 deletions
diff --git a/doc/rfc/rfc2032.txt b/doc/rfc/rfc2032.txt new file mode 100644 index 0000000..3d19e9c --- /dev/null +++ b/doc/rfc/rfc2032.txt @@ -0,0 +1,619 @@ + + + + + + +Network Working Group T. Turletti +Request for Comments: 2032 MIT +Category: Standards Track C. Huitema + Bellcore + October 1996 + + + RTP Payload Format for H.261 Video Streams + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Table of Contents + + 1. Abstract ............................................. 1 + 2. Purpose of this document ............................. 2 + 3. Structure of the packet stream ....................... 2 + 3.1 Overview of the ITU-T recommendation H.261 .......... 2 + 3.2 Considerations for packetization .................... 3 + 4. Specification of the packetization scheme ............ 4 + 4.1 Usage of RTP ........................................ 4 + 4.2 Recommendations for operation with hardware codecs .. 6 + 5. Packet loss issues ................................... 7 + 5.1 Use of optional H.261-specific control packets ...... 8 + 5.2 H.261 control packets definition .................... 9 + 5.2.1 Full INTRA-frame Request (FIR) packet ............. 9 + 5.2.2 Negative ACKnowledgements (NACK) packet ........... 9 + 6. Security Considerations .............................. 10 + Authors' Addresses ..................................... 10 + Acknowledgements ....................................... 10 + References ............................................. 11 + +1. Abstract + + This memo describes a scheme to packetize an H.261 video stream for + transport using the Real-time Transport Protocol, RTP, with any of + the underlying protocols that carry RTP. + + This specification is a product of the Audio/Video Transport working + group within the Internet Engineering Task Force. Comments are + solicited and should be addressed to the working group's mailing list + at rem-conf@es.net and/or the authors. + + + + +Turletti & Huitema Standards Track [Page 1] + +RFC 2032 RTP Payload Format for H.261 Video October 1996 + + +2. Purpose of this document + + The ITU-T recommendation H.261 [6] specifies the encodings used by + ITU-T compliant video-conference codecs. Although these encodings + were originally specified for fixed data rate ISDN circuits, + experiments [3],[8] have shown that they can also be used over + packet-switched networks such as the Internet. + + The purpose of this memo is to specify the RTP payload format for + encapsulating H.261 video streams in RTP [1]. + +3. Structure of the packet stream + +3.1. Overview of the ITU-T recommendation H.261 + + The H.261 coding is organized as a hierarchy of groupings. The video + stream is composed of a sequence of images, or frames, which are + themselves organized as a set of Groups of Blocks (GOB). Note that + H.261 "pictures" are referred as "frames" in this document. Each GOB + holds a set of 3 lines of 11 macro blocks (MB). Each MB carries + information on a group of 16x16 pixels: luminance information is + specified for 4 blocks of 8x8 pixels, while chrominance information + is given by two "red" and "blue" color difference components at a + resolution of only 8x8 pixels. These components and the codes + representing their sampled values are as defined in the ITU-R + Recommendation 601 [7]. + + This grouping is used to specify information at each level of the + hierarchy: + + - At the frame level, one specifies information such as the + delay from the previous frame, the image format, and + various indicators. + + - At the GOB level, one specifies the GOB number and the + default quantifier that will be used for the MBs. + + - At the MB level, one specifies which blocks are present + and which did not change, and optionally a quantifier and + motion vectors. + + Blocks which have changed are encoded by computing the discrete + cosine transform (DCT) of their coefficients, which are then + quantized and Huffman encoded (Variable Length Codes). + + The H.261 Huffman encoding includes a special "GOB start" pattern, + composed of 15 zeroes followed by a single 1, that cannot be imitated + by any other code words. This pattern is included at the beginning of + + + +Turletti & Huitema Standards Track [Page 2] + +RFC 2032 RTP Payload Format for H.261 Video October 1996 + + + each GOB header (and also at the beginning of each frame header) to + mark the separation between two GOBs, and is in fact used as an + indicator that the current GOB is terminated. The encoding also + includes a stuffing pattern, composed of seven zeroes followed by + four ones; that stuffing pattern can only be entered between the + encoding of MBs, or just before the GOB separator. + +3.2. Considerations for packetization + + H.261 codecs designed for operation over ISDN circuits produce a bit + stream composed of several levels of encoding specified by H.261 and + companion recommendations. The bits resulting from the Huffman + encoding are arranged in 512-bit frames, containing 2 bits of + synchronization, 492 bits of data and 18 bits of error correcting + code. The 512-bit frames are then interlaced with an audio stream + and transmitted over px64 kbps circuits according to specification + H.221 [5]. + + When transmitting over the Internet, we will directly consider the + output of the Huffman encoding. All the bits produced by the Huffman + encoding stage will be included in the packet. We will not carry the + 512-bit frames, as protection against bit errors can be obtained by + other means. Similarly, we will not attempt to multiplex audio and + video signals in the same packets, as UDP and RTP provide a much more + efficient way to achieve multiplexing. + + Directly transmitting the result of the Huffman encoding over an + unreliable stream of UDP datagrams would, however, have poor error + resistance characteristics. The result of the hierachical structure + of H.261 bit stream is that one needs to receive the information + present in the frame header to decode the GOBs, as well as the + information present in the GOB header to decode the MBs. Without + precautions, this would mean that one has to receive all the packets + that carry an image in order to properly decode its components. + + If each image could be carried in a single packet, this requirement + would not create a problem. However, a video image or even one GOB by + itself can sometimes be too large to fit in a single packet. + Therefore, the MB is taken as the unit of fragmentation. Packets + must start and end on a MB boundary, i.e. a MB cannot be split across + multiple packets. Multiple MBs may be carried in a single packet + when they will fit within the maximal packet size allowed. This + practice is recommended to reduce the packet send rate and packet + overhead. + + To allow each packet to be processed independently for efficient + resynchronization in the presence of packet losses, some state + information from the frame header and GOB header is carried with each + + + +Turletti & Huitema Standards Track [Page 3] + +RFC 2032 RTP Payload Format for H.261 Video October 1996 + + + packet to allow the MBs in that packet to be decoded. This state + information includes the GOB number in effect at the start of the + packet, the macroblock address predictor (i.e. the last MBA encoded + in the previous packet), the quantizer value in effect prior to the + start of this packet (GQUANT, MQUANT or zero in case of a beginning + of GOB) and the reference motion vector data (MVD) for computing the + true MVDs contained within this packet. The bit stream cannot be + fragmented between a GOB header and MB 1 of that GOB. + + Moreover, since the compressed MB may not fill an integer number of + octets, the data header contains two three-bit integers, SBIT and + EBIT, to indicate the number of unused bits in the first and last + octets of the H.261 data, respectively. + +4. Specification of the packetization scheme + +4.1. Usage of RTP + + The H.261 information is carried as payload data within the RTP + protocol. The following fields of the RTP header are specified: + + - The payload type should specify H.261 payload format (see + the companion RTP profile document RFC 1890). + + - The RTP timestamp encodes the sampling instant of the + first video image contained in the RTP data packet. If a + video image occupies more than one packet, the timestamp + will be the same on all of those packets. Packets from + different video images must have different timestamps so + that frames may be distinguished by the timestamp. For + H.261 video streams, the RTP timestamp is based on a + 90kHz clock. This clock rate is a multiple of the natural + H.261 frame rate (i.e. 30000/1001 or approx. 29.97 Hz). + That way, for each frame time, the clock is just + incremented by the multiple and this removes inaccuracy + in calculating the timestamp. Furthermore, the initial + value of the timestamp is random (unpredictable) to make + known-plaintext attacks on encryption more difficult, see + RTP [1]. Note that if multiple frames are encoded in a + packet (e.g. when there are very little changes between + two images), it is necessary to calculate display times + for the frames after the first using the timing + information in the H.261 frame header. This is required + because the RTP timestamp only gives the display time of + the first frame in the packet. + + - The marker bit of the RTP header is set to one in the + last packet of a video frame, and otherwise, must be + + + +Turletti & Huitema Standards Track [Page 4] + +RFC 2032 RTP Payload Format for H.261 Video October 1996 + + + zero. Thus, it is not necessary to wait for a following + packet (which contains the start code that terminates the + current frame) to detect that a new frame should be + displayed. + + The H.261 data will follow the RTP header, as in: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + . . + . RTP header . + . . + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | H.261 header | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | H.261 stream ... . + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + The H.261 header is defined as following: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |SBIT |EBIT |I|V| GOBN | MBAP | QUANT | HMVD | VMVD | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + The fields in the H.261 header have the following meanings: + + Start bit position (SBIT): 3 bits + Number of most significant bits that should be ignored + in the first data octet. + + End bit position (EBIT): 3 bits + Number of least significant bits that should be ignored + in the last data octet. + + INTRA-frame encoded data (I): 1 bit + Set to 1 if this stream contains only INTRA-frame coded + blocks. Set to 0 if this stream may or may not contain + INTRA-frame coded blocks. The sense of this bit may not + change during the course of the RTP session. + + Motion Vector flag (V): 1 bit + Set to 0 if motion vectors are not used in this stream. + Set to 1 if motion vectors may or may not be used in + this stream. The sense of this bit may not change during + the course of the session. + + + +Turletti & Huitema Standards Track [Page 5] + +RFC 2032 RTP Payload Format for H.261 Video October 1996 + + + GOB number (GOBN): 4 bits + Encodes the GOB number in effect at the start of the + packet. Set to 0 if the packet begins with a GOB header. + + Macroblock address predictor (MBAP): 5 bits + Encodes the macroblock address predictor (i.e. the last + MBA encoded in the previous packet). This predictor ranges + from 0-32 (to predict the valid MBAs 1-33), but because + the bit stream cannot be fragmented between a GOB header + and MB 1, the predictor at the start of the packet can + never be 0. Therefore, the range is 1-32, which is biased + by -1 to fit in 5 bits. For example, if MBAP is 0, the + value of the MBA predictor is 1. Set to 0 if the packet + begins with a GOB header. + + Quantizer (QUANT): 5 bits + Quantizer value (MQUANT or GQUANT) in effect prior to the + start of this packet. Set to 0 if the packet begins with + a GOB header. + + Horizontal motion vector data (HMVD): 5 bits + Reference horizontal motion vector data (MVD). Set to 0 + if V flag is 0 or if the packet begins with a GOB header, + or when the MTYPE of the last MB encoded in the previous + packet was not MC. HMVD is encoded as a 2's complement + number, and `10000' corresponding to the value -16 is + forbidden (motion vector fields range from +/-15). + + Vertical motion vector data (VMVD): 5 bits + Reference vertical motion vector data (MVD). Set to 0 if + V flag is 0 or if the packet begins with a GOB header, or + when the MTYPE of the last MB encoded in the previous + packet was not MC. VMVD is encoded as a 2's complement + number, and `10000' corresponding to the value -16 is + forbidden (motion vector fields range from +/-15). + + Note that the I and V flags are hint flags, i.e. they can be inferred + from the bit stream. They are included to allow decoders to make + optimizations that would not be possible if these hints were not + provided before bit stream was decoded. Therefore, these bits cannot + change for the duration of the stream. A conformant implementation + can always set V=1 and I=0. + +4.2. Recommendations for operation with hardware codecs + + Packetizers for hardware codecs can trivially figure out GOB + boundaries using the GOB-start pattern included in the H.261 data. + (Note that software encoders already know the boundaries.) The + + + +Turletti & Huitema Standards Track [Page 6] + +RFC 2032 RTP Payload Format for H.261 Video October 1996 + + + cheapest packetization implementation is to packetize at the GOB + level all the GOBs that fit in a packet. But when a GOB is too + large, the packetizer has to parse it to do MB fragmentation. (Note + that only the Huffman encoding must be parsed and that it is not + necessary to fully decompress the stream, so this requires relatively + little processing; example implementations can be found in some + public H.261 codecs such as IVS [4] and VIC [9].) It is recommended + that MB level fragmentation be used when feasible in order to obtain + more efficient packetization. Using this fragmentation scheme reduces + the output packet rate and therefore reduces the overhead. + + At the receiver, the data stream can be depacketized and directed to + a hardware codec's input. If the hardware decoder operates at a + fixed bit rate, synchronization may be maintained by inserting the + stuffing pattern between MBs (i.e., between packets) when the packet + arrival rate is slower than the bit rate. + +5. Packet loss issues + + On the Internet, most packet losses are due to network congestion + rather than transmission errors. Using UDP, no mechanism is available + at the sender to know if a packet has been successfully received. It + is up to the application, i.e. coder and decoder, to handle the + packet loss. Each RTP packet includes a a sequence number field which + can be used to detect packet loss. + + H.261 uses the temporal redundancy of video to perform compression. + This differential coding (or INTER-frame coding) is sensitive to + packet loss. After a packet loss, parts of the image may remain + corrupt until all corresponding MBs have been encoded in INTRA-frame + mode (i.e. encoded independently of past frames). There are several + ways to mitigate packet loss: + + (1) One way is to use only INTRA-frame encoding and MB level + conditional replenishment. That is, only MBs that change + (beyond some threshold) are transmitted. + + (2) Another way is to adjust the INTRA-frame encoding + refreshment rate according to the packet loss observed by + the receivers. The H.261 recommendation specifies that a + MB is INTRA-frame encoded at least every 132 times it is + transmitted. However, the INTRA-frame refreshment rate + can be raised in order to speed the recovery when the + measured loss rate is significant. + + (3) The fastest way to repair a corrupted image is to request + an INTRA-frame coded image refreshment after a packet + loss is detected. One means to accomplish this is for the + + + +Turletti & Huitema Standards Track [Page 7] + +RFC 2032 RTP Payload Format for H.261 Video October 1996 + + + decoder to send to the coder a list of packets lost. The + coder can decide to encode every MB of every GOB of the + following video frame in INTRA-frame mode (i.e. Full + INTRA-frame encoded), or if the coder can deduce from the + packet sequence numbers which MBs were affected by the + loss, it can save bandwidth by sending only those MBs in + INTRA-frame mode. This mode is particularly efficient in + point-to-point connection or when the number of decoders + is low. The next section specifies how the refresh + function may be implemented. + + Note that the method (1) is currently implemented in the VIC + videoconferencing software [9]. Methods (2) and (3) are currently + implemented in the IVS videoconferencing software [4]. + +5.1. Use of optional H.261-specific control packets + + This specification defines two H.261-specific RTCP control packets, + "Full INTRA-frame Request" and "Negative Acknowledgement", described + in the next section. Their purpose is to speed up refreshment of the + video in those situations where their use is feasible. Support of + these H.261-specific control packets by the H.261 sender is optional; + in particular, early experiments have shown that the usage of this + feature could have very negative effects when the number of sites is + very large. Thus, these control packets should be used with caution. + + The H.261-specific control packets differ from normal RTCP packets in + that they are not transmitted to the normal RTCP destination + transport address for the RTP session (which is often a multicast + address). Instead, these control packets are sent directly via + unicast from the decoder to the coder. The destination port for + these control packets is the same port that the coder uses as a + source port for transmitting RTP (data) packets. Therefore, these + packets may be considered "reverse" control packets. + + As a consequence, these control packets may only be used when no RTP + mixers or translators intervene in the path from the coder to the + decoder. If such intermediate systems do intervene, the address of + the coder would no longer be present as the network-level source + address in packets received by the decoder, and in fact, it might not + be possible for the decoder to send packets directly to the coder. + + Some reliable multicast protocols use similar NACK control packets + transmitted over the normal multicast distribution channel, but they + typically use random delays to prevent a NACK implosion problem [2]. + The goal of such protocols is to provide reliable multicast packet + delivery at the expense of delay, which is appropriate for + applications such as a shared whiteboard. + + + +Turletti & Huitema Standards Track [Page 8] + +RFC 2032 RTP Payload Format for H.261 Video October 1996 + + + On the other hand, interactive video transmission is more sensitive + to delay and does not require full reliability. For video + applications it is more effective to send the NACK control packets as + soon as possible, i.e. as soon as a loss is detected, without adding + any random delays. In this case, multicasting the NACK control + packets would generate useless traffic between receivers since only + the coder will use them. But this method is only effective when the + number of receivers is small. e.g. in IVS [4] the H.261 specific + control packets are used only in point-to-point connections or in + point-to-multipoint connections when there are less than 10 + participants in the conference. + +5.2. H.261 control packets definition + +5.2.1. Full INTRA-frame Request (FIR) packet + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |V=2|P| MBZ | PT=RTCP_FIR | length | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | SSRC | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + This packet indicates that a receiver requires a full encoded image + in order to either start decoding with an entire image or to refresh + its image and speed the recovery after a burst of lost packets. The + receiver requests the source to force the next image in full "INTRA- + frame" coding mode, i.e. without using differential coding. The + various fields are defined in the RTP specification [1]. SSRC is the + synchronization source identifier for the sender of this packet. The + value of the packet type (PT) identifier is the constant RTCP_FIR + (192). + +5.2.2. Negative ACKnowledgements (NACK) packet + + The format of the NACK packet is as follow: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |V=2|P| MBZ | PT=RTCP_NACK | length | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | SSRC | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | FSN | BLP | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + + + +Turletti & Huitema Standards Track [Page 9] + +RFC 2032 RTP Payload Format for H.261 Video October 1996 + + + The various fields T, P, PT, length and SSRC are defined in the RTP + specification [1]. The value of the packet type (PT) identifier is + the constant RTCP_NACK (193). SSRC is the synchronization source + identifier for the sender of this packet. + + The two remaining fields have the following meanings: + + First Sequence Number (FSN): 16 bits + Identifies the first sequence number lost. + + Bitmask of following lost packets (BLP): 16 bits + A bit is set to 1 if the corresponding packet has been lost, + and set to 0 otherwise. BLP is set to 0 only if no packet + other than that being NACKed (using the FSN field) has been + lost. BLP is set to 0x00001 if the packet corresponding to + the FSN and the following packet have been lost, etc. + +6. Security Considerations + + Security issues are not discussed in this memo. + +Authors' Addresses + + Thierry Turletti + INRIA - RODEO Project + 2004 route des Lucioles + BP 93, 06902 Sophia Antipolis + FRANCE + + EMail: turletti@sophia.inria.fr + + + Christian Huitema + MCC 1J236B Bellcore + 445 South Street + Morristown, NJ 07960-6438 + + EMail: huitema@bellcore.com + +Acknowledgements + + This memo is based on discussion within the AVT working group chaired + by Stephen Casner. Steve McCanne, Stephen Casner, Ronan Flood, Mark + Handley, Van Jacobson, Henning G. Schulzrinne and John Wroclawski + provided valuable comments. Stephen Casner and Steve McCanne also + helped greatly with getting this document into readable form. + + + + + +Turletti & Huitema Standards Track [Page 10] + +RFC 2032 RTP Payload Format for H.261 Video October 1996 + + +References + + [1] Schulzrinne, H., Casner, S., Frederick, R., and + V. Jacobson, "RTP: A Transport Protocol for Real-Time + Applications", RFC 1889, January 1996. + + [2] Sridhar Pingali, Don Towsley and James F. Kurose, A + comparison of sender-initiated and receiver-initiated + reliable multicast protocols, IEEE GLOBECOM '94. + + [3] Thierry Turletti, H.261 software codec for + videoconferencing over the Internet INRIA Research Report + no 1834, January 1993. + + [4] Thierry Turletti, INRIA Videoconferencing tool (IVS), + available by anonymous ftp from zenon.inria.fr in the + "rodeo/ivs/last_version" directory. See also URL + <http://www.inria.fr/rodeo/ivs.html>. + + [5] Frame structure for Audiovisual Services for a 64 to 1920 + kbps Channel in Audiovisual Services ITU-T (International + Telecommunication Union - Telecommunication + Standardisation Sector) Recommendation H.221, 1990. + + [6] Video codec for audiovisual services at p x 64 kbit/s + ITU-T (International Telecommunication Union - + Telecommunication Standardisation Sector) Recommendation + H.261, 1993. + + [7] Digital Methods of Transmitting Television Information + ITU-R (International Telecommunication Union - + Radiocommunication Standardisation Sector) Recommendation + 601, 1986. + + [8] M.A Sasse, U. Bilting, C-D Schulz, T. Turletti, Remote + Seminars through MultiMedia Conferencing: Experiences + from the MICE project, Proc. INET'94/JENC5, Prague, June + 1994, pp. 251/1-251/8. + + [9] Steve MacCanne, Van Jacobson, VIC Videoconferencing tool, + available by anonymous ftp from ee.lbl.gov in the + "conferencing/vic" directory. + + + + + + + + + +Turletti & Huitema Standards Track [Page 11] + |