diff options
Diffstat (limited to 'doc/rfc/rfc3551.txt')
-rw-r--r-- | doc/rfc/rfc3551.txt | 2467 |
1 files changed, 2467 insertions, 0 deletions
diff --git a/doc/rfc/rfc3551.txt b/doc/rfc/rfc3551.txt new file mode 100644 index 0000000..c43ff34 --- /dev/null +++ b/doc/rfc/rfc3551.txt @@ -0,0 +1,2467 @@ + + + + + + +Network Working Group H. Schulzrinne +Request for Comments: 3551 Columbia University +Obsoletes: 1890 S. Casner +Category: Standards Track Packet Design + July 2003 + + + RTP Profile for Audio and Video Conferences + with Minimal Control + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2003). All Rights Reserved. + +Abstract + + This document describes a profile called "RTP/AVP" for the use of the + real-time transport protocol (RTP), version 2, and the associated + control protocol, RTCP, within audio and video multiparticipant + conferences with minimal control. It provides interpretations of + generic fields within the RTP specification suitable for audio and + video conferences. In particular, this document defines a set of + default mappings from payload type numbers to encodings. + + This document also describes how audio and video data may be carried + within RTP. It defines a set of standard encodings and their names + when used within RTP. The descriptions provide pointers to reference + implementations and the detailed standards. This document is meant + as an aid for implementors of audio, video and other real-time + multimedia applications. + + This memorandum obsoletes RFC 1890. It is mostly backwards- + compatible except for functions removed because two interoperable + implementations were not found. The additions to RFC 1890 codify + existing practice in the use of payload formats under this profile + and include new payload formats defined since RFC 1890 was published. + + + + + + + +Schulzrinne & Casner Standards Track [Page 1] + +RFC 3551 RTP A/V Profile July 2003 + + +Table of Contents + + 1. Introduction ................................................. 3 + 1.1 Terminology ............................................. 3 + 2. RTP and RTCP Packet Forms and Protocol Behavior .............. 4 + 3. Registering Additional Encodings ............................. 6 + 4. Audio ........................................................ 8 + 4.1 Encoding-Independent Rules .............................. 8 + 4.2 Operating Recommendations ............................... 9 + 4.3 Guidelines for Sample-Based Audio Encodings ............. 10 + 4.4 Guidelines for Frame-Based Audio Encodings .............. 11 + 4.5 Audio Encodings ......................................... 12 + 4.5.1 DVI4 ............................................ 13 + 4.5.2 G722 ............................................ 14 + 4.5.3 G723 ............................................ 14 + 4.5.4 G726-40, G726-32, G726-24, and G726-16 .......... 18 + 4.5.5 G728 ............................................ 19 + 4.5.6 G729 ............................................ 20 + 4.5.7 G729D and G729E ................................. 22 + 4.5.8 GSM ............................................. 24 + 4.5.9 GSM-EFR ......................................... 27 + 4.5.10 L8 .............................................. 27 + 4.5.11 L16 ............................................. 27 + 4.5.12 LPC ............................................. 27 + 4.5.13 MPA ............................................. 28 + 4.5.14 PCMA and PCMU ................................... 28 + 4.5.15 QCELP ........................................... 28 + 4.5.16 RED ............................................. 29 + 4.5.17 VDVI ............................................ 29 + 5. Video ........................................................ 30 + 5.1 CelB .................................................... 30 + 5.2 JPEG .................................................... 30 + 5.3 H261 .................................................... 30 + 5.4 H263 .................................................... 31 + 5.5 H263-1998 ............................................... 31 + 5.6 MPV ..................................................... 31 + 5.7 MP2T .................................................... 31 + 5.8 nv ...................................................... 32 + 6. Payload Type Definitions ..................................... 32 + 7. RTP over TCP and Similar Byte Stream Protocols ............... 34 + 8. Port Assignment .............................................. 34 + 9. Changes from RFC 1890 ........................................ 35 + 10. Security Considerations ...................................... 38 + 11. IANA Considerations .......................................... 39 + 12. References ................................................... 39 + 12.1 Normative References .................................... 39 + 12.2 Informative References .................................. 39 + 13. Current Locations of Related Resources ....................... 41 + + + +Schulzrinne & Casner Standards Track [Page 2] + +RFC 3551 RTP A/V Profile July 2003 + + + 14. Acknowledgments .............................................. 42 + 15. Intellectual Property Rights Statement ....................... 43 + 16. Authors' Addresses ........................................... 43 + 17. Full Copyright Statement ..................................... 44 + +1. Introduction + + This profile defines aspects of RTP left unspecified in the RTP + Version 2 protocol definition (RFC 3550) [1]. This profile is + intended for the use within audio and video conferences with minimal + session control. In particular, no support for the negotiation of + parameters or membership control is provided. The profile is + expected to be useful in sessions where no negotiation or membership + control are used (e.g., using the static payload types and the + membership indications provided by RTCP), but this profile may also + be useful in conjunction with a higher-level control protocol. + + Use of this profile may be implicit in the use of the appropriate + applications; there may be no explicit indication by port number, + protocol identifier or the like. Applications such as session + directories may use the name for this profile specified in Section + 11. + + Other profiles may make different choices for the items specified + here. + + This document also defines a set of encodings and payload formats for + audio and video. These payload format descriptions are included here + only as a matter of convenience since they are too small to warrant + separate documents. Use of these payload formats is NOT REQUIRED to + use this profile. Only the binding of some of the payload formats to + static payload type numbers in Tables 4 and 5 is normative. + +1.1 Terminology + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in RFC 2119 [2] and + indicate requirement levels for implementations compliant with this + RTP profile. + + This document defines the term media type as dividing encodings of + audio and video content into three classes: audio, video and + audio/video (interleaved). + + + + + + + +Schulzrinne & Casner Standards Track [Page 3] + +RFC 3551 RTP A/V Profile July 2003 + + +2. RTP and RTCP Packet Forms and Protocol Behavior + + The section "RTP Profiles and Payload Format Specifications" of RFC + 3550 enumerates a number of items that can be specified or modified + in a profile. This section addresses these items. Generally, this + profile follows the default and/or recommended aspects of the RTP + specification. + + RTP data header: The standard format of the fixed RTP data + header is used (one marker bit). + + Payload types: Static payload types are defined in Section 6. + + RTP data header additions: No additional fixed fields are + appended to the RTP data header. + + RTP data header extensions: No RTP header extensions are + defined, but applications operating under this profile MAY use + such extensions. Thus, applications SHOULD NOT assume that the + RTP header X bit is always zero and SHOULD be prepared to ignore + the header extension. If a header extension is defined in the + future, that definition MUST specify the contents of the first 16 + bits in such a way that multiple different extensions can be + identified. + + RTCP packet types: No additional RTCP packet types are defined + by this profile specification. + + RTCP report interval: The suggested constants are to be used for + the RTCP report interval calculation. Sessions operating under + this profile MAY specify a separate parameter for the RTCP traffic + bandwidth rather than using the default fraction of the session + bandwidth. The RTCP traffic bandwidth MAY be divided into two + separate session parameters for those participants which are + active data senders and those which are not. Following the + recommendation in the RTP specification [1] that 1/4 of the RTCP + bandwidth be dedicated to data senders, the RECOMMENDED default + values for these two parameters would be 1.25% and 3.75%, + respectively. For a particular session, the RTCP bandwidth for + non-data-senders MAY be set to zero when operating on + unidirectional links or for sessions that don't require feedback + on the quality of reception. The RTCP bandwidth for data senders + SHOULD be kept non-zero so that sender reports can still be sent + for inter-media synchronization and to identify the source by + CNAME. The means by which the one or two session parameters for + RTCP bandwidth are specified is beyond the scope of this memo. + + + + + +Schulzrinne & Casner Standards Track [Page 4] + +RFC 3551 RTP A/V Profile July 2003 + + + SR/RR extension: No extension section is defined for the RTCP SR + or RR packet. + + SDES use: Applications MAY use any of the SDES items described + in the RTP specification. While CNAME information MUST be sent + every reporting interval, other items SHOULD only be sent every + third reporting interval, with NAME sent seven out of eight times + within that slot and the remaining SDES items cyclically taking up + the eighth slot, as defined in Section 6.2.2 of the RTP + specification. In other words, NAME is sent in RTCP packets 1, 4, + 7, 10, 13, 16, 19, while, say, EMAIL is used in RTCP packet 22. + + Security: The RTP default security services are also the default + under this profile. + + String-to-key mapping: No mapping is specified by this profile. + + Congestion: RTP and this profile may be used in the context of + enhanced network service, for example, through Integrated Services + (RFC 1633) [4] or Differentiated Services (RFC 2475) [5], or they + may be used with best effort service. + + If enhanced service is being used, RTP receivers SHOULD monitor + packet loss to ensure that the service that was requested is + actually being delivered. If it is not, then they SHOULD assume + that they are receiving best-effort service and behave + accordingly. + + If best-effort service is being used, RTP receivers SHOULD monitor + packet loss to ensure that the packet loss rate is within + acceptable parameters. Packet loss is considered acceptable if a + TCP flow across the same network path and experiencing the same + network conditions would achieve an average throughput, measured + on a reasonable timescale, that is not less than the RTP flow is + achieving. This condition can be satisfied by implementing + congestion control mechanisms to adapt the transmission rate (or + the number of layers subscribed for a layered multicast session), + or by arranging for a receiver to leave the session if the loss + rate is unacceptably high. + + The comparison to TCP cannot be specified exactly, but is intended + as an "order-of-magnitude" comparison in timescale and throughput. + The timescale on which TCP throughput is measured is the round- + trip time of the connection. In essence, this requirement states + that it is not acceptable to deploy an application (using RTP or + any other transport protocol) on the best-effort Internet which + consumes bandwidth arbitrarily and does not compete fairly with + TCP within an order of magnitude. + + + +Schulzrinne & Casner Standards Track [Page 5] + +RFC 3551 RTP A/V Profile July 2003 + + + Underlying protocol: The profile specifies the use of RTP over + unicast and multicast UDP as well as TCP. (This does not preclude + the use of these definitions when RTP is carried by other lower- + layer protocols.) + + Transport mapping: The standard mapping of RTP and RTCP to + transport-level addresses is used. + + Encapsulation: This profile leaves to applications the + specification of RTP encapsulation in protocols other than UDP. + +3. Registering Additional Encodings + + This profile lists a set of encodings, each of which is comprised of + a particular media data compression or representation plus a payload + format for encapsulation within RTP. Some of those payload formats + are specified here, while others are specified in separate RFCs. It + is expected that additional encodings beyond the set listed here will + be created in the future and specified in additional payload format + RFCs. + + This profile also assigns to each encoding a short name which MAY be + used by higher-level control protocols, such as the Session + Description Protocol (SDP), RFC 2327 [6], to identify encodings + selected for a particular RTP session. + + In some contexts it may be useful to refer to these encodings in the + form of a MIME content-type. To facilitate this, RFC 3555 [7] + provides registrations for all of the encodings names listed here as + MIME subtype names under the "audio" and "video" MIME types through + the MIME registration procedure as specified in RFC 2048 [8]. + + Any additional encodings specified for use under this profile (or + others) may also be assigned names registered as MIME subtypes with + the Internet Assigned Numbers Authority (IANA). This registry + provides a means to insure that the names assigned to the additional + encodings are kept unique. RFC 3555 specifies the information that + is required for the registration of RTP encodings. + + In addition to assigning names to encodings, this profile also + assigns static RTP payload type numbers to some of them. However, + the payload type number space is relatively small and cannot + accommodate assignments for all existing and future encodings. + During the early stages of RTP development, it was necessary to use + statically assigned payload types because no other mechanism had been + specified to bind encodings to payload types. It was anticipated + that non-RTP means beyond the scope of this memo (such as directory + services or invitation protocols) would be specified to establish a + + + +Schulzrinne & Casner Standards Track [Page 6] + +RFC 3551 RTP A/V Profile July 2003 + + + dynamic mapping between a payload type and an encoding. Now, + mechanisms for defining dynamic payload type bindings have been + specified in the Session Description Protocol (SDP) and in other + protocols such as ITU-T Recommendation H.323/H.245. These mechanisms + associate the registered name of the encoding/payload format, along + with any additional required parameters, such as the RTP timestamp + clock rate and number of channels, with a payload type number. This + association is effective only for the duration of the RTP session in + which the dynamic payload type binding is made. This association + applies only to the RTP session for which it is made, thus the + numbers can be re-used for different encodings in different sessions + so the number space limitation is avoided. + + This profile reserves payload type numbers in the range 96-127 + exclusively for dynamic assignment. Applications SHOULD first use + values in this range for dynamic payload types. Those applications + which need to define more than 32 dynamic payload types MAY bind + codes below 96, in which case it is RECOMMENDED that unassigned + payload type numbers be used first. However, the statically assigned + payload types are default bindings and MAY be dynamically bound to + new encodings if needed. Redefining payload types below 96 may cause + incorrect operation if an attempt is made to join a session without + obtaining session description information that defines the dynamic + payload types. + + Dynamic payload types SHOULD NOT be used without a well-defined + mechanism to indicate the mapping. Systems that expect to + interoperate with others operating under this profile SHOULD NOT make + their own assignments of proprietary encodings to particular, fixed + payload types. + + This specification establishes the policy that no additional static + payload types will be assigned beyond the ones defined in this + document. Establishing this policy avoids the problem of trying to + create a set of criteria for accepting static assignments and + encourages the implementation and deployment of the dynamic payload + type mechanisms. + + The final set of static payload type assignments is provided in + Tables 4 and 5. + + + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 7] + +RFC 3551 RTP A/V Profile July 2003 + + +4. Audio + +4.1 Encoding-Independent Rules + + Since the ability to suppress silence is one of the primary + motivations for using packets to transmit voice, the RTP header + carries both a sequence number and a timestamp to allow a receiver to + distinguish between lost packets and periods of time when no data was + transmitted. Discontiguous transmission (silence suppression) MAY be + used with any audio payload format. Receivers MUST assume that + senders may suppress silence unless this is restricted by signaling + specified elsewhere. (Even if the transmitter does not suppress + silence, the receiver should be prepared to handle periods when no + data is present since packets may be lost.) + + Some payload formats (see Sections 4.5.3 and 4.5.6) define a "silence + insertion descriptor" or "comfort noise" frame to specify parameters + for artificial noise that may be generated during a period of silence + to approximate the background noise at the source. For other payload + formats, a generic Comfort Noise (CN) payload format is specified in + RFC 3389 [9]. When the CN payload format is used with another + payload format, different values in the RTP payload type field + distinguish comfort-noise packets from those of the selected payload + format. + + For applications which send either no packets or occasional comfort- + noise packets during silence, the first packet of a talkspurt, that + is, the first packet after a silence period during which packets have + not been transmitted contiguously, SHOULD be distinguished by setting + the marker bit in the RTP data header to one. The marker bit in all + other packets is zero. The beginning of a talkspurt MAY be used to + adjust the playout delay to reflect changing network delays. + Applications without silence suppression MUST set the marker bit to + zero. + + The RTP clock rate used for generating the RTP timestamp is + independent of the number of channels and the encoding; it usually + equals the number of sampling periods per second. For N-channel + encodings, each sampling period (say, 1/8,000 of a second) generates + N samples. (This terminology is standard, but somewhat confusing, as + the total number of samples generated per second is then the sampling + rate times the channel count.) + + If multiple audio channels are used, channels are numbered left-to- + right, starting at one. In RTP audio packets, information from + lower-numbered channels precedes that from higher-numbered channels. + + + + + +Schulzrinne & Casner Standards Track [Page 8] + +RFC 3551 RTP A/V Profile July 2003 + + + For more than two channels, the convention followed by the AIFF-C + audio interchange format SHOULD be followed [3], using the following + notation, unless some other convention is specified for a particular + encoding or payload format: + + l left + r right + c center + S surround + F front + R rear + + channels description channel + 1 2 3 4 5 6 + _________________________________________________ + 2 stereo l r + 3 l r c + 4 l c r S + 5 Fl Fr Fc Sl Sr + 6 l lc c r rc S + + Note: RFC 1890 defined two conventions for the ordering of four + audio channels. Since the ordering is indicated implicitly by + the number of channels, this was ambiguous. In this revision, + the order described as "quadrophonic" has been eliminated to + remove the ambiguity. This choice was based on the observation + that quadrophonic consumer audio format did not become popular + whereas surround-sound subsequently has. + + Samples for all channels belonging to a single sampling instant MUST + be within the same packet. The interleaving of samples from + different channels depends on the encoding. General guidelines are + given in Section 4.3 and 4.4. + + The sampling frequency SHOULD be drawn from the set: 8,000, 11,025, + 16,000, 22,050, 24,000, 32,000, 44,100 and 48,000 Hz. (Older Apple + Macintosh computers had a native sample rate of 22,254.54 Hz, which + can be converted to 22,050 with acceptable quality by dropping 4 + samples in a 20 ms frame.) However, most audio encodings are defined + for a more restricted set of sampling frequencies. Receivers SHOULD + be prepared to accept multi-channel audio, but MAY choose to only + play a single channel. + +4.2 Operating Recommendations + + The following recommendations are default operating parameters. + Applications SHOULD be prepared to handle other values. The ranges + given are meant to give guidance to application writers, allowing a + + + +Schulzrinne & Casner Standards Track [Page 9] + +RFC 3551 RTP A/V Profile July 2003 + + + set of applications conforming to these guidelines to interoperate + without additional negotiation. These guidelines are not intended to + restrict operating parameters for applications that can negotiate a + set of interoperable parameters, e.g., through a conference control + protocol. + + For packetized audio, the default packetization interval SHOULD have + a duration of 20 ms or one frame, whichever is longer, unless + otherwise noted in Table 1 (column "ms/packet"). The packetization + interval determines the minimum end-to-end delay; longer packets + introduce less header overhead but higher delay and make packet loss + more noticeable. For non-interactive applications such as lectures + or for links with severe bandwidth constraints, a higher + packetization delay MAY be used. A receiver SHOULD accept packets + representing between 0 and 200 ms of audio data. (For framed audio + encodings, a receiver SHOULD accept packets with a number of frames + equal to 200 ms divided by the frame duration, rounded up.) This + restriction allows reasonable buffer sizing for the receiver. + +4.3 Guidelines for Sample-Based Audio Encodings + + In sample-based encodings, each audio sample is represented by a + fixed number of bits. Within the compressed audio data, codes for + individual samples may span octet boundaries. An RTP audio packet + may contain any number of audio samples, subject to the constraint + that the number of bits per sample times the number of samples per + packet yields an integral octet count. Fractional encodings produce + less than one octet per sample. + + The duration of an audio packet is determined by the number of + samples in the packet. + + For sample-based encodings producing one or more octets per sample, + samples from different channels sampled at the same sampling instant + SHOULD be packed in consecutive octets. For example, for a two- + channel encoding, the octet sequence is (left channel, first sample), + (right channel, first sample), (left channel, second sample), (right + channel, second sample), .... For multi-octet encodings, octets + SHOULD be transmitted in network byte order (i.e., most significant + octet first). + + The packing of sample-based encodings producing less than one octet + per sample is encoding-specific. + + The RTP timestamp reflects the instant at which the first sample in + the packet was sampled, that is, the oldest information in the + packet. + + + + +Schulzrinne & Casner Standards Track [Page 10] + +RFC 3551 RTP A/V Profile July 2003 + + +4.4 Guidelines for Frame-Based Audio Encodings + + Frame-based encodings encode a fixed-length block of audio into + another block of compressed data, typically also of fixed length. + For frame-based encodings, the sender MAY choose to combine several + such frames into a single RTP packet. The receiver can tell the + number of frames contained in an RTP packet, if all the frames have + the same length, by dividing the RTP payload length by the audio + frame size which is defined as part of the encoding. This does not + work when carrying frames of different sizes unless the frame sizes + are relatively prime. If not, the frames MUST indicate their size. + + For frame-based codecs, the channel order is defined for the whole + block. That is, for two-channel audio, right and left samples SHOULD + be coded independently, with the encoded frame for the left channel + preceding that for the right channel. + + All frame-oriented audio codecs SHOULD be able to encode and decode + several consecutive frames within a single packet. Since the frame + size for the frame-oriented codecs is given, there is no need to use + a separate designation for the same encoding, but with different + number of frames per packet. + + RTP packets SHALL contain a whole number of frames, with frames + inserted according to age within a packet, so that the oldest frame + (to be played first) occurs immediately after the RTP packet header. + The RTP timestamp reflects the instant at which the first sample in + the first frame was sampled, that is, the oldest information in the + packet. + + + + + + + + + + + + + + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 11] + +RFC 3551 RTP A/V Profile July 2003 + + +4.5 Audio Encodings + + name of sampling default + encoding sample/frame bits/sample rate ms/frame ms/packet + __________________________________________________________________ + DVI4 sample 4 var. 20 + G722 sample 8 16,000 20 + G723 frame N/A 8,000 30 30 + G726-40 sample 5 8,000 20 + G726-32 sample 4 8,000 20 + G726-24 sample 3 8,000 20 + G726-16 sample 2 8,000 20 + G728 frame N/A 8,000 2.5 20 + G729 frame N/A 8,000 10 20 + G729D frame N/A 8,000 10 20 + G729E frame N/A 8,000 10 20 + GSM frame N/A 8,000 20 20 + GSM-EFR frame N/A 8,000 20 20 + L8 sample 8 var. 20 + L16 sample 16 var. 20 + LPC frame N/A 8,000 20 20 + MPA frame N/A var. var. + PCMA sample 8 var. 20 + PCMU sample 8 var. 20 + QCELP frame N/A 8,000 20 20 + VDVI sample var. var. 20 + + Table 1: Properties of Audio Encodings (N/A: not applicable; var.: + variable) + + The characteristics of the audio encodings described in this document + are shown in Table 1; they are listed in order of their payload type + in Table 4. While most audio codecs are only specified for a fixed + sampling rate, some sample-based algorithms (indicated by an entry of + "var." in the sampling rate column of Table 1) may be used with + different sampling rates, resulting in different coded bit rates. + When used with a sampling rate other than that for which a static + payload type is defined, non-RTP means beyond the scope of this memo + MUST be used to define a dynamic payload type and MUST indicate the + selected RTP timestamp clock rate, which is usually the same as the + sampling rate for audio. + + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 12] + +RFC 3551 RTP A/V Profile July 2003 + + +4.5.1 DVI4 + + DVI4 uses an adaptive delta pulse code modulation (ADPCM) encoding + scheme that was specified by the Interactive Multimedia Association + (IMA) as the "IMA ADPCM wave type". However, the encoding defined + here as DVI4 differs in three respects from the IMA specification: + + o The RTP DVI4 header contains the predicted value rather than the + first sample value contained the IMA ADPCM block header. + + o IMA ADPCM blocks contain an odd number of samples, since the first + sample of a block is contained just in the header (uncompressed), + followed by an even number of compressed samples. DVI4 has an + even number of compressed samples only, using the `predict' word + from the header to decode the first sample. + + o For DVI4, the 4-bit samples are packed with the first sample in + the four most significant bits and the second sample in the four + least significant bits. In the IMA ADPCM codec, the samples are + packed in the opposite order. + + Each packet contains a single DVI block. This profile only defines + the 4-bit-per-sample version, while IMA also specified a 3-bit-per- + sample encoding. + + The "header" word for each channel has the following structure: + + int16 predict; /* predicted value of first sample + from the previous block (L16 format) */ + u_int8 index; /* current index into stepsize table */ + u_int8 reserved; /* set to zero by sender, ignored by receiver */ + + Each octet following the header contains two 4-bit samples, thus the + number of samples per packet MUST be even because there is no means + to indicate a partially filled last octet. + + Packing of samples for multiple channels is for further study. + + The IMA ADPCM algorithm was described in the document IMA Recommended + Practices for Enhancing Digital Audio Compatibility in Multimedia + Systems (version 3.0). However, the Interactive Multimedia + Association ceased operations in 1997. Resources for an archived + copy of that document and a software implementation of the RTP DVI4 + encoding are listed in Section 13. + + + + + + + +Schulzrinne & Casner Standards Track [Page 13] + +RFC 3551 RTP A/V Profile July 2003 + + +4.5.2 G722 + + G722 is specified in ITU-T Recommendation G.722, "7 kHz audio-coding + within 64 kbit/s". The G.722 encoder produces a stream of octets, + each of which SHALL be octet-aligned in an RTP packet. The first bit + transmitted in the G.722 octet, which is the most significant bit of + the higher sub-band sample, SHALL correspond to the most significant + bit of the octet in the RTP packet. + + Even though the actual sampling rate for G.722 audio is 16,000 Hz, + the RTP clock rate for the G722 payload format is 8,000 Hz because + that value was erroneously assigned in RFC 1890 and must remain + unchanged for backward compatibility. The octet rate or sample-pair + rate is 8,000 Hz. + +4.5.3 G723 + + G723 is specified in ITU Recommendation G.723.1, "Dual-rate speech + coder for multimedia communications transmitting at 5.3 and 6.3 + kbit/s". The G.723.1 5.3/6.3 kbit/s codec was defined by the ITU-T + as a mandatory codec for ITU-T H.324 GSTN videophone terminal + applications. The algorithm has a floating point specification in + Annex B to G.723.1, a silence compression algorithm in Annex A to + G.723.1 and a scalable channel coding scheme for wireless + applications in G.723.1 Annex C. + + This Recommendation specifies a coded representation that can be used + for compressing the speech signal component of multi-media services + at a very low bit rate. Audio is encoded in 30 ms frames, with an + additional delay of 7.5 ms due to look-ahead. A G.723.1 frame can be + one of three sizes: 24 octets (6.3 kb/s frame), 20 octets (5.3 kb/s + frame), or 4 octets. These 4-octet frames are called SID frames + (Silence Insertion Descriptor) and are used to specify comfort noise + parameters. There is no restriction on how 4, 20, and 24 octet + frames are intermixed. The least significant two bits of the first + octet in the frame determine the frame size and codec type: + + bits content octets/frame + 00 high-rate speech (6.3 kb/s) 24 + 01 low-rate speech (5.3 kb/s) 20 + 10 SID frame 4 + 11 reserved + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 14] + +RFC 3551 RTP A/V Profile July 2003 + + + It is possible to switch between the two rates at any 30 ms frame + boundary. Both (5.3 kb/s and 6.3 kb/s) rates are a mandatory part of + the encoder and decoder. Receivers MUST accept both data rates and + MUST accept SID frames unless restriction of these capabilities has + been signaled. The MIME registration for G723 in RFC 3555 [7] + specifies parameters that MAY be used with MIME or SDP to restrict to + a single data rate or to restrict the use of SID frames. This coder + was optimized to represent speech with near-toll quality at the above + rates using a limited amount of complexity. + + The packing of the encoded bit stream into octets and the + transmission order of the octets is specified in Rec. G.723.1 and is + the same as that produced by the G.723 C code reference + implementation. For the 6.3 kb/s data rate, this packing is + illustrated as follows, where the header (HDR) bits are always "0 0" + as shown in Fig. 1 to indicate operation at 6.3 kb/s, and the Z bit + is always set to zero. The diagrams show the bit packing in "network + byte order", also known as big-endian order. The bits of each 32-bit + word are numbered 0 to 31, with the most significant bit on the left + and numbered 0. The octets (bytes) of each word are transmitted most + significant octet first. The bits of each data field are numbered in + the order of the bit stream representation of the encoding (least + significant bit first). The vertical bars indicate the boundaries + between field fragments. + + + + + + + + + + + + + + + + + + + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 15] + +RFC 3551 RTP A/V Profile July 2003 + + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | LPC |HDR| LPC | LPC | ACL0 |LPC| + | | | | | | | + |0 0 0 0 0 0|0 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2| + |5 4 3 2 1 0| |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | ACL2 |ACL|A| GAIN0 |ACL|ACL| GAIN0 | GAIN1 | + | | 1 |C| | 3 | 2 | | | + |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0| + |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | GAIN2 | GAIN1 | GAIN2 | GAIN3 | GRID | GAIN3 | + | | | | | | | + |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0| + |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | MSBPOS |Z|POS| MSBPOS | POS0 |POS| POS0 | + | | | 0 | | | 1 | | + |0 0 0 0 0 0 0|0|0 0|1 1 1 0 0 0|0 0 0 0 0 0 0 0|0 0|1 1 1 1 1 1| + |6 5 4 3 2 1 0| |1 0|2 1 0 9 8 7|9 8 7 6 5 4 3 2|1 0|5 4 3 2 1 0| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | POS1 | POS2 | POS1 | POS2 | POS3 | POS2 | + | | | | | | | + |0 0 0 0 0 0 0 0|0 0 0 0|1 1 1 1|1 1 0 0 0 0 0 0|0 0 0 0|1 1 1 1| + |9 8 7 6 5 4 3 2|3 2 1 0|3 2 1 0|1 0 9 8 7 6 5 4|3 2 1 0|5 4 3 2| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | POS3 | PSIG0 |POS|PSIG2| PSIG1 | PSIG3 |PSIG2| + | | | 3 | | | | | + |1 1 0 0 0 0 0 0|0 0 0 0 0 0|1 1|0 0 0|0 0 0 0 0|0 0 0 0 0|0 0 0| + |1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2|2 1 0|4 3 2 1 0|4 3 2 1 0|5 4 3| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 1: G.723 (6.3 kb/s) bit packing + + For the 5.3 kb/s data rate, the header (HDR) bits are always "0 1", + as shown in Fig. 2, to indicate operation at 5.3 kb/s. + + + + + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 16] + +RFC 3551 RTP A/V Profile July 2003 + + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | LPC |HDR| LPC | LPC | ACL0 |LPC| + | | | | | | | + |0 0 0 0 0 0|0 1|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2| + |5 4 3 2 1 0| |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | ACL2 |ACL|A| GAIN0 |ACL|ACL| GAIN0 | GAIN1 | + | | 1 |C| | 3 | 2 | | | + |0 0 0 0 0|0 0|0|0 0 0 0|0 0|0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0| + |4 3 2 1 0|1 0|6|3 2 1 0|1 0|6 5|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | GAIN2 | GAIN1 | GAIN2 | GAIN3 | GRID | GAIN3 | + | | | | | | | + |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0| + |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0|4 3 2 1|1 0 9 8| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | POS0 | POS1 | POS0 | POS1 | POS2 | + | | | | | | + |0 0 0 0 0 0 0 0|0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0 0 0 0 0| + |7 6 5 4 3 2 1 0|3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|7 6 5 4 3 2 1 0| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | POS3 | POS2 | POS3 | PSIG1 | PSIG0 | PSIG3 | PSIG2 | + | | | | | | | | + |0 0 0 0|1 1 0 0|1 1 0 0 0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0|0 0 0 0| + |3 2 1 0|1 0 9 8|1 0 9 8 7 6 5 4|3 2 1 0|3 2 1 0|3 2 1 0|3 2 1 0| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 2: G.723 (5.3 kb/s) bit packing + + The packing of G.723.1 SID (silence) frames, which are indicated by + the header (HDR) bits having the pattern "1 0", is depicted in Fig. + 3. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | LPC |HDR| LPC | LPC | GAIN |LPC| + | | | | | | | + |0 0 0 0 0 0|1 0|1 1 1 1 0 0 0 0|2 2 1 1 1 1 1 1|0 0 0 0 0 0|2 2| + |5 4 3 2 1 0| |3 2 1 0 9 8 7 6|1 0 9 8 7 6 5 4|5 4 3 2 1 0|3 2| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 3: G.723 SID mode bit packing + + + + + + +Schulzrinne & Casner Standards Track [Page 17] + +RFC 3551 RTP A/V Profile July 2003 + + +4.5.4 G726-40, G726-32, G726-24, and G726-16 + + ITU-T Recommendation G.726 describes, among others, the algorithm + recommended for conversion of a single 64 kbit/s A-law or mu-law PCM + channel encoded at 8,000 samples/sec to and from a 40, 32, 24, or 16 + kbit/s channel. The conversion is applied to the PCM stream using an + Adaptive Differential Pulse Code Modulation (ADPCM) transcoding + technique. The ADPCM representation consists of a series of + codewords with a one-to-one correspondence to the samples in the PCM + stream. The G726 data rates of 40, 32, 24, and 16 kbit/s have + codewords of 5, 4, 3, and 2 bits, respectively. + + The 16 and 24 kbit/s encodings do not provide toll quality speech. + They are designed for used in overloaded Digital Circuit + Multiplication Equipment (DCME). ITU-T G.726 recommends that the 16 + and 24 kbit/s encodings should be alternated with higher data rate + encodings to provide an average sample size of between 3.5 and 3.7 + bits per sample. + + The encodings of G.726 are here denoted as G726-40, G726-32, G726-24, + and G726-16. Prior to 1990, G721 described the 32 kbit/s ADPCM + encoding, and G723 described the 40, 32, and 16 kbit/s encodings. + Thus, G726-32 designates the same algorithm as G721 in RFC 1890. + + A stream of G726 codewords contains no information on the encoding + being used, therefore transitions between G726 encoding types are not + permitted within a sequence of packed codewords. Applications MUST + determine the encoding type of packed codewords from the RTP payload + identifier. + + No payload-specific header information SHALL be included as part of + the audio data. A stream of G726 codewords MUST be packed into + octets as follows: the first codeword is placed into the first octet + such that the least significant bit of the codeword aligns with the + least significant bit in the octet, the second codeword is then + packed so that its least significant bit coincides with the least + significant unoccupied bit in the octet. When a complete codeword + cannot be placed into an octet, the bits overlapping the octet + boundary are placed into the least significant bits of the next + octet. Packing MUST end with a completely packed final octet. The + number of codewords packed will therefore be a multiple of 8, 2, 8, + and 4 for G726-40, G726-32, G726-24, and G726-16, respectively. An + example of the packing scheme for G726-32 codewords is as shown, + where bit 7 is the least significant bit of the first octet, and bit + A3 is the least significant bit of the first codeword: + + + + + + +Schulzrinne & Casner Standards Track [Page 18] + +RFC 3551 RTP A/V Profile July 2003 + + + 0 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- + |B B B B|A A A A|D D D D|C C C C| ... + |0 1 2 3|0 1 2 3|0 1 2 3|0 1 2 3| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- + + An example of the packing scheme for G726-24 codewords follows, where + again bit 7 is the least significant bit of the first octet, and bit + A2 is the least significant bit of the first codeword: + + 0 1 2 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- + |C C|B B B|A A A|F|E E E|D D D|C|H H H|G G G|F F| ... + |1 2|0 1 2|0 1 2|2|0 1 2|0 1 2|0|0 1 2|0 1 2|0 1| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- + + Note that the "little-endian" direction in which samples are packed + into octets in the G726-16, -24, -32 and -40 payload formats + specified here is consistent with ITU-T Recommendation X.420, but is + the opposite of what is specified in ITU-T Recommendation I.366.2 + Annex E for ATM AAL2 transport. A second set of RTP payload formats + matching the packetization of I.366.2 Annex E and identified by MIME + subtypes AAL2-G726-16, -24, -32 and -40 will be specified in a + separate document. + +4.5.5 G728 + + G728 is specified in ITU-T Recommendation G.728, "Coding of speech at + 16 kbit/s using low-delay code excited linear prediction". + + A G.278 encoder translates 5 consecutive audio samples into a 10-bit + codebook index, resulting in a bit rate of 16 kb/s for audio sampled + at 8,000 samples per second. The group of five consecutive samples + is called a vector. Four consecutive vectors, labeled V1 to V4 + (where V1 is to be played first by the receiver), build one G.728 + frame. The four vectors of 40 bits are packed into 5 octets, labeled + B1 through B5. B1 SHALL be placed first in the RTP packet. + + Referring to the figure below, the principle for bit order is + "maintenance of bit significance". Bits from an older vector are + more significant than bits from newer vectors. The MSB of the frame + goes to the MSB of B1 and the LSB of the frame goes to LSB of B5. + + + + + + + +Schulzrinne & Casner Standards Track [Page 19] + +RFC 3551 RTP A/V Profile July 2003 + + + 1 2 3 3 + 0 0 0 0 9 + ++++++++++++++++++++++++++++++++++++++++ + <---V1---><---V2---><---V3---><---V4---> vectors + <--B1--><--B2--><--B3--><--B4--><--B5--> octets + <------------- frame 1 ----------------> + + In particular, B1 contains the eight most significant bits of V1, + with the MSB of V1 being the MSB of B1. B2 contains the two least + significant bits of V1, the more significant of the two in its MSB, + and the six most significant bits of V2. B1 SHALL be placed first in + the RTP packet and B5 last. + +4.5.6 G729 + + G729 is specified in ITU-T Recommendation G.729, "Coding of speech at + 8 kbit/s using conjugate structure-algebraic code excited linear + prediction (CS-ACELP)". A reduced-complexity version of the G.729 + algorithm is specified in Annex A to Rec. G.729. The speech coding + algorithms in the main body of G.729 and in G.729 Annex A are fully + interoperable with each other, so there is no need to further + distinguish between them. An implementation that signals or accepts + use of G729 payload format may implement either G.729 or G.729A + unless restricted by additional signaling specified elsewhere related + specifically to the encoding rather than the payload format. The + G.729 and G.729 Annex A codecs were optimized to represent speech + with high quality, where G.729 Annex A trades some speech quality for + an approximate 50% complexity reduction [10]. See the next Section + (4.5.7) for other data rates added in later G.729 Annexes. For all + data rates, the sampling frequency (and RTP timestamp clock rate) is + 8,000 Hz. + + A voice activity detector (VAD) and comfort noise generator (CNG) + algorithm in Annex B of G.729 is RECOMMENDED for digital simultaneous + voice and data applications and can be used in conjunction with G.729 + or G.729 Annex A. A G.729 or G.729 Annex A frame contains 10 octets, + while the G.729 Annex B comfort noise frame occupies 2 octets. + Receivers MUST accept comfort noise frames if restriction of their + use has not been signaled. The MIME registration for G729 in RFC + 3555 [7] specifies a parameter that MAY be used with MIME or SDP to + restrict the use of comfort noise frames. + + A G729 RTP packet may consist of zero or more G.729 or G.729 Annex A + frames, followed by zero or one G.729 Annex B frames. The presence + of a comfort noise frame can be deduced from the length of the RTP + payload. The default packetization interval is 20 ms (two frames), + but in some situations it may be desirable to send 10 ms packets. An + + + + +Schulzrinne & Casner Standards Track [Page 20] + +RFC 3551 RTP A/V Profile July 2003 + + + example would be a transition from speech to comfort noise in the + first 10 ms of the packet. For some applications, a longer + packetization interval may be required to reduce the packet rate. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |L| L1 | L2 | L3 | P1 |P| C1 | + |0| | | | |0| | + | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2 3 4| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | C1 | S1 | GA1 | GB1 | P2 | C2 | + | 1 1 1| | | | | | + |5 6 7 8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6 7| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | C2 | S2 | GA2 | GB2 | + | 1 1 1| | | | + |8 9 0 1 2|0 1 2 3|0 1 2|0 1 2 3| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 4: G.729 and G.729A bit packing + + The transmitted parameters of a G.729/G.729A 10-ms frame, consisting + of 80 bits, are defined in Recommendation G.729, Table 8/G.729. The + mapping of the these parameters is given below in Fig. 4. The + diagrams show the bit packing in "network byte order", also known as + big-endian order. The bits of each 32-bit word are numbered 0 to 31, + with the most significant bit on the left and numbered 0. The octets + (bytes) of each word are transmitted most significant octet first. + The bits of each data field are numbered in the order as produced by + the G.729 C code reference implementation. + + The packing of the G.729 Annex B comfort noise frame is shown in Fig. + 5. + + 0 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |L| LSF1 | LSF2 | GAIN |R| + |S| | | |E| + |F| | | |S| + |0|0 1 2 3 4|0 1 2 3|0 1 2 3 4|V| RESV = Reserved (zero) + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 5: G.729 Annex B bit packing + + + + + + +Schulzrinne & Casner Standards Track [Page 21] + +RFC 3551 RTP A/V Profile July 2003 + + +4.5.7 G729D and G729E + + Annexes D and E to ITU-T Recommendation G.729 provide additional data + rates. Because the data rate is not signaled in the bitstream, the + different data rates are given distinct RTP encoding names which are + mapped to distinct payload type numbers. G729D indicates a 6.4 + kbit/s coding mode (G.729 Annex D, for momentary reduction in channel + capacity), while G729E indicates an 11.8 kbit/s mode (G.729 Annex E, + for improved performance with a wide range of narrow-band input + signals, e.g., music and background noise). Annex E has two + operating modes, backward adaptive and forward adaptive, which are + signaled by the first two bits in each frame (the most significant + two bits of the first octet). + + The voice activity detector (VAD) and comfort noise generator (CNG) + algorithm specified in Annex B of G.729 may be used with Annex D and + Annex E frames in addition to G.729 and G.729 Annex A frames. The + algorithm details for the operation of Annexes D and E with the Annex + B CNG are specified in G.729 Annexes F and G. Note that Annexes F + and G do not introduce any new encodings. Receivers MUST accept + comfort noise frames if restriction of their use has not been + signaled. The MIME registrations for G729D and G729E in RFC 3555 [7] + specify a parameter that MAY be used with MIME or SDP to restrict the + use of comfort noise frames. + + For G729D, an RTP packet may consist of zero or more G.729 Annex D + frames, followed by zero or one G.729 Annex B frame. Similarly, for + G729E, an RTP packet may consist of zero or more G.729 Annex E + frames, followed by zero or one G.729 Annex B frame. The presence of + a comfort noise frame can be deduced from the length of the RTP + payload. + + A single RTP packet must contain frames of only one data rate, + optionally followed by one comfort noise frame. The data rate may be + changed from packet to packet by changing the payload type number. + G.729 Annexes D, E and H describe what the encoding and decoding + algorithms must do to accommodate a change in data rate. + + For G729D, the bits of a G.729 Annex D frame are formatted as shown + below in Fig. 6 (cf. Table D.1/G.729). The frame length is 64 bits. + + + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 22] + +RFC 3551 RTP A/V Profile July 2003 + + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |L| L1 | L2 | L3 | P1 | C1 | + |0| | | | | | + | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7|0 1 2 3 4 5| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | C1 |S1 | GA1 | GB1 | P2 | C2 |S2 | GA2 | GB2 | + | | | | | | | | | | + |6 7 8|0 1|0 1 2|0 1 2|0 1 2 3|0 1 2 3 4 5 6 7 8|0 1|0 1 2|0 1 2| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 6: G.729 Annex D bit packing + + The net bit rate for the G.729 Annex E algorithm is 11.8 kbit/s and a + total of 118 bits are used. Two bits are appended as "don't care" + bits to complete an integer number of octets for the frame. For + G729E, the bits of a data frame are formatted as shown in the next + two diagrams (cf. Table E.1/G.729). The fields for the G729E forward + adaptive mode are packed as shown in Fig. 7. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |0 0|L| L1 | L2 | L3 | P1 |P| C0_1| + | |0| | | | |0| | + | | |0 1 2 3 4 5 6|0 1 2 3 4|0 1 2 3 4|0 1 2 3 4 5 6 7| |0 1 2| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | C1_1 | C2_1 | C3_1 | C4_1 | + | | | | | | + |3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | GA1 | GB1 | P2 | C0_2 | C1_2 | C2_2 | + | | | | | | | + |0 1 2|0 1 2 3|0 1 2 3 4|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | C3_2 | C4_2 | GA2 | GB2 |DC | + | | | | | | | + |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 7: G.729 Annex E (forward adaptive mode) bit packing + + The fields for the G729E backward adaptive mode are packed as shown + in Fig. 8. + + + + + + +Schulzrinne & Casner Standards Track [Page 23] + +RFC 3551 RTP A/V Profile July 2003 + + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |1 1| P1 |P| C0_1 | C1_1 | + | | |0| 1 1 1| | + | |0 1 2 3 4 5 6 7|0|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | C2_1 | C3_1 | C4_1 |GA1 | GB1 |P2 | + | | | | | | | | + |8 9|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | C0_2 | C1_2 | C2_2 | + | | 1 1 1| | | + |2 3 4|0 1 2 3 4 5 6 7 8 9 0 1 2|0 1 2 3 4 5 6 7 8 9|0 1 2 3 4 5| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | C3_2 | C4_2 | GA2 | GB2 |DC | + | | | | | | | + |6|0 1 2 3 4 5 6|0 1 2 3 4 5 6|0 1 2|0 1 2 3|0 1| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 8: G.729 Annex E (backward adaptive mode) bit packing + +4.5.8 GSM + + GSM (Group Speciale Mobile) denotes the European GSM 06.10 standard + for full-rate speech transcoding, ETS 300 961, which is based on + RPE/LTP (residual pulse excitation/long term prediction) coding at a + rate of 13 kb/s [11,12,13]. The text of the standard can be obtained + from: + + ETSI (European Telecommunications Standards Institute) + ETSI Secretariat: B.P.152 + F-06561 Valbonne Cedex + France + Phone: +33 92 94 42 00 + Fax: +33 93 65 47 16 + + Blocks of 160 audio samples are compressed into 33 octets, for an + effective data rate of 13,200 b/s. + +4.5.8.1 General Packaging Issues + + The GSM standard (ETS 300 961) specifies the bit stream produced by + the codec, but does not specify how these bits should be packed for + transmission. The packetization specified here has subsequently been + adopted in ETSI Technical Specification TS 101 318. Some software + implementations of the GSM codec use a different packing than that + specified here. + + + +Schulzrinne & Casner Standards Track [Page 24] + +RFC 3551 RTP A/V Profile July 2003 + + + field field name bits field field name bits + ________________________________________________ + 1 LARc[0] 6 39 xmc[22] 3 + 2 LARc[1] 6 40 xmc[23] 3 + 3 LARc[2] 5 41 xmc[24] 3 + 4 LARc[3] 5 42 xmc[25] 3 + 5 LARc[4] 4 43 Nc[2] 7 + 6 LARc[5] 4 44 bc[2] 2 + 7 LARc[6] 3 45 Mc[2] 2 + 8 LARc[7] 3 46 xmaxc[2] 6 + 9 Nc[0] 7 47 xmc[26] 3 + 10 bc[0] 2 48 xmc[27] 3 + 11 Mc[0] 2 49 xmc[28] 3 + 12 xmaxc[0] 6 50 xmc[29] 3 + 13 xmc[0] 3 51 xmc[30] 3 + 14 xmc[1] 3 52 xmc[31] 3 + 15 xmc[2] 3 53 xmc[32] 3 + 16 xmc[3] 3 54 xmc[33] 3 + 17 xmc[4] 3 55 xmc[34] 3 + 18 xmc[5] 3 56 xmc[35] 3 + 19 xmc[6] 3 57 xmc[36] 3 + 20 xmc[7] 3 58 xmc[37] 3 + 21 xmc[8] 3 59 xmc[38] 3 + 22 xmc[9] 3 60 Nc[3] 7 + 23 xmc[10] 3 61 bc[3] 2 + 24 xmc[11] 3 62 Mc[3] 2 + 25 xmc[12] 3 63 xmaxc[3] 6 + 26 Nc[1] 7 64 xmc[39] 3 + 27 bc[1] 2 65 xmc[40] 3 + 28 Mc[1] 2 66 xmc[41] 3 + 29 xmaxc[1] 6 67 xmc[42] 3 + 30 xmc[13] 3 68 xmc[43] 3 + 31 xmc[14] 3 69 xmc[44] 3 + 32 xmc[15] 3 70 xmc[45] 3 + 33 xmc[16] 3 71 xmc[46] 3 + 34 xmc[17] 3 72 xmc[47] 3 + 35 xmc[18] 3 73 xmc[48] 3 + 36 xmc[19] 3 74 xmc[49] 3 + 37 xmc[20] 3 75 xmc[50] 3 + 38 xmc[21] 3 76 xmc[51] 3 + + Table 2: Ordering of GSM variables + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 25] + +RFC 3551 RTP A/V Profile July 2003 + + + Octet Bit 0 Bit 1 Bit 2 Bit 3 Bit 4 Bit 5 Bit 6 Bit 7 + _____________________________________________________________________ + 0 1 1 0 1 LARc0.0 LARc0.1 LARc0.2 LARc0.3 + 1 LARc0.4 LARc0.5 LARc1.0 LARc1.1 LARc1.2 LARc1.3 LARc1.4 LARc1.5 + 2 LARc2.0 LARc2.1 LARc2.2 LARc2.3 LARc2.4 LARc3.0 LARc3.1 LARc3.2 + 3 LARc3.3 LARc3.4 LARc4.0 LARc4.1 LARc4.2 LARc4.3 LARc5.0 LARc5.1 + 4 LARc5.2 LARc5.3 LARc6.0 LARc6.1 LARc6.2 LARc7.0 LARc7.1 LARc7.2 + 5 Nc0.0 Nc0.1 Nc0.2 Nc0.3 Nc0.4 Nc0.5 Nc0.6 bc0.0 + 6 bc0.1 Mc0.0 Mc0.1 xmaxc00 xmaxc01 xmaxc02 xmaxc03 xmaxc04 + 7 xmaxc05 xmc0.0 xmc0.1 xmc0.2 xmc1.0 xmc1.1 xmc1.2 xmc2.0 + 8 xmc2.1 xmc2.2 xmc3.0 xmc3.1 xmc3.2 xmc4.0 xmc4.1 xmc4.2 + 9 xmc5.0 xmc5.1 xmc5.2 xmc6.0 xmc6.1 xmc6.2 xmc7.0 xmc7.1 + 10 xmc7.2 xmc8.0 xmc8.1 xmc8.2 xmc9.0 xmc9.1 xmc9.2 xmc10.0 + 11 xmc10.1 xmc10.2 xmc11.0 xmc11.1 xmc11.2 xmc12.0 xmc12.1 xcm12.2 + 12 Nc1.0 Nc1.1 Nc1.2 Nc1.3 Nc1.4 Nc1.5 Nc1.6 bc1.0 + 13 bc1.1 Mc1.0 Mc1.1 xmaxc10 xmaxc11 xmaxc12 xmaxc13 xmaxc14 + 14 xmax15 xmc13.0 xmc13.1 xmc13.2 xmc14.0 xmc14.1 xmc14.2 xmc15.0 + 15 xmc15.1 xmc15.2 xmc16.0 xmc16.1 xmc16.2 xmc17.0 xmc17.1 xmc17.2 + 16 xmc18.0 xmc18.1 xmc18.2 xmc19.0 xmc19.1 xmc19.2 xmc20.0 xmc20.1 + 17 xmc20.2 xmc21.0 xmc21.1 xmc21.2 xmc22.0 xmc22.1 xmc22.2 xmc23.0 + 18 xmc23.1 xmc23.2 xmc24.0 xmc24.1 xmc24.2 xmc25.0 xmc25.1 xmc25.2 + 19 Nc2.0 Nc2.1 Nc2.2 Nc2.3 Nc2.4 Nc2.5 Nc2.6 bc2.0 + 20 bc2.1 Mc2.0 Mc2.1 xmaxc20 xmaxc21 xmaxc22 xmaxc23 xmaxc24 + 21 xmaxc25 xmc26.0 xmc26.1 xmc26.2 xmc27.0 xmc27.1 xmc27.2 xmc28.0 + 22 xmc28.1 xmc28.2 xmc29.0 xmc29.1 xmc29.2 xmc30.0 xmc30.1 xmc30.2 + 23 xmc31.0 xmc31.1 xmc31.2 xmc32.0 xmc32.1 xmc32.2 xmc33.0 xmc33.1 + 24 xmc33.2 xmc34.0 xmc34.1 xmc34.2 xmc35.0 xmc35.1 xmc35.2 xmc36.0 + 25 Xmc36.1 xmc36.2 xmc37.0 xmc37.1 xmc37.2 xmc38.0 xmc38.1 xmc38.2 + 26 Nc3.0 Nc3.1 Nc3.2 Nc3.3 Nc3.4 Nc3.5 Nc3.6 bc3.0 + 27 bc3.1 Mc3.0 Mc3.1 xmaxc30 xmaxc31 xmaxc32 xmaxc33 xmaxc34 + 28 xmaxc35 xmc39.0 xmc39.1 xmc39.2 xmc40.0 xmc40.1 xmc40.2 xmc41.0 + 29 xmc41.1 xmc41.2 xmc42.0 xmc42.1 xmc42.2 xmc43.0 xmc43.1 xmc43.2 + 30 xmc44.0 xmc44.1 xmc44.2 xmc45.0 xmc45.1 xmc45.2 xmc46.0 xmc46.1 + 31 xmc46.2 xmc47.0 xmc47.1 xmc47.2 xmc48.0 xmc48.1 xmc48.2 xmc49.0 + 32 xmc49.1 xmc49.2 xmc50.0 xmc50.1 xmc50.2 xmc51.0 xmc51.1 xmc51.2 + + Table 3: GSM payload format + + In the GSM packing used by RTP, the bits SHALL be packed beginning + from the most significant bit. Every 160 sample GSM frame is coded + into one 33 octet (264 bit) buffer. Every such buffer begins with a + 4 bit signature (0xD), followed by the MSB encoding of the fields of + the frame. The first octet thus contains 1101 in the 4 most + significant bits (0-3) and the 4 most significant bits of F1 (0-3) in + the 4 least significant bits (4-7). The second octet contains the 2 + least significant bits of F1 in bits 0-1, and F2 in bits 2-7, and so + on. The order of the fields in the frame is described in Table 2. + + + + +Schulzrinne & Casner Standards Track [Page 26] + +RFC 3551 RTP A/V Profile July 2003 + + +4.5.8.2 GSM Variable Names and Numbers + + In the RTP encoding we have the bit pattern described in Table 3, + where F.i signifies the ith bit of the field F, bit 0 is the most + significant bit, and the bits of every octet are numbered from 0 to 7 + from most to least significant. + +4.5.9 GSM-EFR + + GSM-EFR denotes GSM 06.60 enhanced full rate speech transcoding, + specified in ETS 300 726 which is available from ETSI at the address + given in Section 4.5.8. This codec has a frame length of 244 bits. + For transmission in RTP, each codec frame is packed into a 31 octet + (248 bit) buffer beginning with a 4-bit signature 0xC in a manner + similar to that specified here for the original GSM 06.10 codec. The + packing is specified in ETSI Technical Specification TS 101 318. + +4.5.10 L8 + + L8 denotes linear audio data samples, using 8-bits of precision with + an offset of 128, that is, the most negative signal is encoded as + zero. + +4.5.11 L16 + + L16 denotes uncompressed audio data samples, using 16-bit signed + representation with 65,535 equally divided steps between minimum and + maximum signal level, ranging from -32,768 to 32,767. The value is + represented in two's complement notation and transmitted in network + byte order (most significant byte first). + + The MIME registration for L16 in RFC 3555 [7] specifies parameters + that MAY be used with MIME or SDP to indicate that analog pre- + emphasis was applied to the signal before quantization or to indicate + that a multiple-channel audio stream follows a different channel + ordering convention than is specified in Section 4.1. + +4.5.12 LPC + + LPC designates an experimental linear predictive encoding contributed + by Ron Frederick, which is based on an implementation written by Ron + Zuckerman posted to the Usenet group comp.dsp on June 26, 1992. The + codec generates 14 octets for every frame. The framesize is set to + 20 ms, resulting in a bit rate of 5,600 b/s. + + + + + + + +Schulzrinne & Casner Standards Track [Page 27] + +RFC 3551 RTP A/V Profile July 2003 + + +4.5.13 MPA + + MPA denotes MPEG-1 or MPEG-2 audio encapsulated as elementary + streams. The encoding is defined in ISO standards ISO/IEC 11172-3 + and 13818-3. The encapsulation is specified in RFC 2250 [14]. + + The encoding may be at any of three levels of complexity, called + Layer I, II and III. The selected layer as well as the sampling rate + and channel count are indicated in the payload. The RTP timestamp + clock rate is always 90,000, independent of the sampling rate. + MPEG-1 audio supports sampling rates of 32, 44.1, and 48 kHz (ISO/IEC + 11172-3, section 1.1; "Scope"). MPEG-2 supports sampling rates of + 16, 22.05 and 24 kHz. The number of samples per frame is fixed, but + the frame size will vary with the sampling rate and bit rate. + + The MIME registration for MPA in RFC 3555 [7] specifies parameters + that MAY be used with MIME or SDP to restrict the selection of layer, + channel count, sampling rate, and bit rate. + +4.5.14 PCMA and PCMU + + PCMA and PCMU are specified in ITU-T Recommendation G.711. Audio + data is encoded as eight bits per sample, after logarithmic scaling. + PCMU denotes mu-law scaling, PCMA A-law scaling. A detailed + description is given by Jayant and Noll [15]. Each G.711 octet SHALL + be octet-aligned in an RTP packet. The sign bit of each G.711 octet + SHALL correspond to the most significant bit of the octet in the RTP + packet (i.e., assuming the G.711 samples are handled as octets on the + host machine, the sign bit SHALL be the most significant bit of the + octet as defined by the host machine format). The 56 kb/s and 48 + kb/s modes of G.711 are not applicable to RTP, since PCMA and PCMU + MUST always be transmitted as 8-bit samples. + + See Section 4.1 regarding silence suppression. + +4.5.15 QCELP + + The Electronic Industries Association (EIA) & Telecommunications + Industry Association (TIA) standard IS-733, "TR45: High Rate Speech + Service Option for Wideband Spread Spectrum Communications Systems", + defines the QCELP audio compression algorithm for use in wireless + CDMA applications. The QCELP CODEC compresses each 20 milliseconds + of 8,000 Hz, 16-bit sampled input speech into one of four different + size output frames: Rate 1 (266 bits), Rate 1/2 (124 bits), Rate 1/4 + (54 bits) or Rate 1/8 (20 bits). For typical speech patterns, this + results in an average output of 6.8 kb/s for normal mode and 4.7 kb/s + for reduced rate mode. The packetization of the QCELP audio codec is + described in [16]. + + + +Schulzrinne & Casner Standards Track [Page 28] + +RFC 3551 RTP A/V Profile July 2003 + + +4.5.16 RED + + The redundant audio payload format "RED" is specified by RFC 2198 + [17]. It defines a means by which multiple redundant copies of an + audio packet may be transmitted in a single RTP stream. Each packet + in such a stream contains, in addition to the audio data for that + packetization interval, a (more heavily compressed) copy of the data + from a previous packetization interval. This allows an approximation + of the data from lost packets to be recovered upon decoding of a + subsequent packet, giving much improved sound quality when compared + with silence substitution for lost packets. + +4.5.17 VDVI + + VDVI is a variable-rate version of DVI4, yielding speech bit rates of + between 10 and 25 kb/s. It is specified for single-channel operation + only. Samples are packed into octets starting at the most- + significant bit. The last octet is padded with 1 bits if the last + sample does not fill the last octet. This padding is distinct from + the valid codewords. The receiver needs to detect the padding + because there is no explicit count of samples in the packet. + + It uses the following encoding: + + DVI4 codeword VDVI bit pattern + _______________________________ + 0 00 + 1 010 + 2 1100 + 3 11100 + 4 111100 + 5 1111100 + 6 11111100 + 7 11111110 + 8 10 + 9 011 + 10 1101 + 11 11101 + 12 111101 + 13 1111101 + 14 11111101 + 15 11111111 + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 29] + +RFC 3551 RTP A/V Profile July 2003 + + +5. Video + + The following sections describe the video encodings that are defined + in this memo and give their abbreviated names used for + identification. These video encodings and their payload types are + listed in Table 5. + + All of these video encodings use an RTP timestamp frequency of 90,000 + Hz, the same as the MPEG presentation time stamp frequency. This + frequency yields exact integer timestamp increments for the typical + 24 (HDTV), 25 (PAL), and 29.97 (NTSC) and 30 Hz (HDTV) frame rates + and 50, 59.94 and 60 Hz field rates. While 90 kHz is the RECOMMENDED + rate for future video encodings used within this profile, other rates + MAY be used. However, it is not sufficient to use the video frame + rate (typically between 15 and 30 Hz) because that does not provide + adequate resolution for typical synchronization requirements when + calculating the RTP timestamp corresponding to the NTP timestamp in + an RTCP SR packet. The timestamp resolution MUST also be sufficient + for the jitter estimate contained in the receiver reports. + + For most of these video encodings, the RTP timestamp encodes the + sampling instant of the video image contained in the RTP data packet. + If a video image occupies more than one packet, the timestamp is the + same on all of those packets. Packets from different video images + are distinguished by their different timestamps. + + Most of these video encodings also specify that the marker bit of the + RTP header SHOULD be set to one in the last packet of a video frame + and otherwise set to zero. Thus, it is not necessary to wait for a + following packet with a different timestamp to detect that a new + frame should be displayed. + +5.1 CelB + + The CELL-B encoding is a proprietary encoding proposed by Sun + Microsystems. The byte stream format is described in RFC 2029 [18]. + +5.2 JPEG + + The encoding is specified in ISO Standards 10918-1 and 10918-2. The + RTP payload format is as specified in RFC 2435 [19]. + +5.3 H261 + + The encoding is specified in ITU-T Recommendation H.261, "Video codec + for audiovisual services at p x 64 kbit/s". The packetization and + RTP-specific properties are described in RFC 2032 [20]. + + + + +Schulzrinne & Casner Standards Track [Page 30] + +RFC 3551 RTP A/V Profile July 2003 + + +5.4 H263 + + The encoding is specified in the 1996 version of ITU-T Recommendation + H.263, "Video coding for low bit rate communication". The + packetization and RTP-specific properties are described in RFC 2190 + [21]. The H263-1998 payload format is RECOMMENDED over this one for + use by new implementations. + +5.5 H263-1998 + + The encoding is specified in the 1998 version of ITU-T Recommendation + H.263, "Video coding for low bit rate communication". The + packetization and RTP-specific properties are described in RFC 2429 + [22]. Because the 1998 version of H.263 is a superset of the 1996 + syntax, this payload format can also be used with the 1996 version of + H.263, and is RECOMMENDED for this use by new implementations. This + payload format does not replace RFC 2190, which continues to be used + by existing implementations, and may be required for backward + compatibility in new implementations. Implementations using the new + features of the 1998 version of H.263 MUST use the payload format + described in RFC 2429. + +5.6 MPV + + MPV designates the use of MPEG-1 and MPEG-2 video encoding elementary + streams as specified in ISO Standards ISO/IEC 11172 and 13818-2, + respectively. The RTP payload format is as specified in RFC 2250 + [14], Section 3. + + The MIME registration for MPV in RFC 3555 [7] specifies a parameter + that MAY be used with MIME or SDP to restrict the selection of the + type of MPEG video. + +5.7 MP2T + + MP2T designates the use of MPEG-2 transport streams, for either audio + or video. The RTP payload format is described in RFC 2250 [14], + Section 2. + + + + + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 31] + +RFC 3551 RTP A/V Profile July 2003 + + +5.8 nv + + The encoding is implemented in the program `nv', version 4, developed + at Xerox PARC by Ron Frederick. Further information is available + from the author: + + Ron Frederick + Blue Coat Systems Inc. + 650 Almanor Avenue + Sunnyvale, CA 94085 + United States + EMail: ronf@bluecoat.com + +6. Payload Type Definitions + + Tables 4 and 5 define this profile's static payload type values for + the PT field of the RTP data header. In addition, payload type + values in the range 96-127 MAY be defined dynamically through a + conference control protocol, which is beyond the scope of this + document. For example, a session directory could specify that for a + given session, payload type 96 indicates PCMU encoding, 8,000 Hz + sampling rate, 2 channels. Entries in Tables 4 and 5 with payload + type "dyn" have no static payload type assigned and are only used + with a dynamic payload type. Payload type 2 was assigned to G721 in + RFC 1890 and to its equivalent successor G726-32 in draft versions of + this specification, but its use is now deprecated and that static + payload type is marked reserved due to conflicting use for the + payload formats G726-32 and AAL2-G726-32 (see Section 4.5.4). + Payload type 13 indicates the Comfort Noise (CN) payload format + specified in RFC 3389 [9]. Payload type 19 is marked "reserved" + because some draft versions of this specification assigned that + number to an earlier version of the comfort noise payload format. + The payload type range 72-76 is marked "reserved" so that RTCP and + RTP packets can be reliably distinguished (see Section "Summary of + Protocol Constants" of the RTP protocol specification). + + The payload types currently defined in this profile are assigned to + exactly one of three categories or media types: audio only, video + only and those combining audio and video. The media types are marked + in Tables 4 and 5 as "A", "V" and "AV", respectively. Payload types + of different media types SHALL NOT be interleaved or multiplexed + within a single RTP session, but multiple RTP sessions MAY be used in + parallel to send multiple media types. An RTP source MAY change + payload types within the same media type during a session. See the + section "Multiplexing RTP Sessions" of RFC 3550 for additional + explanation. + + + + + +Schulzrinne & Casner Standards Track [Page 32] + +RFC 3551 RTP A/V Profile July 2003 + + + PT encoding media type clock rate channels + name (Hz) + ___________________________________________________ + 0 PCMU A 8,000 1 + 1 reserved A + 2 reserved A + 3 GSM A 8,000 1 + 4 G723 A 8,000 1 + 5 DVI4 A 8,000 1 + 6 DVI4 A 16,000 1 + 7 LPC A 8,000 1 + 8 PCMA A 8,000 1 + 9 G722 A 8,000 1 + 10 L16 A 44,100 2 + 11 L16 A 44,100 1 + 12 QCELP A 8,000 1 + 13 CN A 8,000 1 + 14 MPA A 90,000 (see text) + 15 G728 A 8,000 1 + 16 DVI4 A 11,025 1 + 17 DVI4 A 22,050 1 + 18 G729 A 8,000 1 + 19 reserved A + 20 unassigned A + 21 unassigned A + 22 unassigned A + 23 unassigned A + dyn G726-40 A 8,000 1 + dyn G726-32 A 8,000 1 + dyn G726-24 A 8,000 1 + dyn G726-16 A 8,000 1 + dyn G729D A 8,000 1 + dyn G729E A 8,000 1 + dyn GSM-EFR A 8,000 1 + dyn L8 A var. var. + dyn RED A (see text) + dyn VDVI A var. 1 + + Table 4: Payload types (PT) for audio encodings + + + + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 33] + +RFC 3551 RTP A/V Profile July 2003 + + + PT encoding media type clock rate + name (Hz) + _____________________________________________ + 24 unassigned V + 25 CelB V 90,000 + 26 JPEG V 90,000 + 27 unassigned V + 28 nv V 90,000 + 29 unassigned V + 30 unassigned V + 31 H261 V 90,000 + 32 MPV V 90,000 + 33 MP2T AV 90,000 + 34 H263 V 90,000 + 35-71 unassigned ? + 72-76 reserved N/A N/A + 77-95 unassigned ? + 96-127 dynamic ? + dyn H263-1998 V 90,000 + + Table 5: Payload types (PT) for video and combined + encodings + + Session participants agree through mechanisms beyond the scope of + this specification on the set of payload types allowed in a given + session. This set MAY, for example, be defined by the capabilities + of the applications used, negotiated by a conference control protocol + or established by agreement between the human participants. + + Audio applications operating under this profile SHOULD, at a minimum, + be able to send and/or receive payload types 0 (PCMU) and 5 (DVI4). + This allows interoperability without format negotiation and ensures + successful negotiation with a conference control protocol. + +7. RTP over TCP and Similar Byte Stream Protocols + + Under special circumstances, it may be necessary to carry RTP in + protocols offering a byte stream abstraction, such as TCP, possibly + multiplexed with other data. The application MUST define its own + method of delineating RTP and RTCP packets (RTSP [23] provides an + example of such an encapsulation specification). + +8. Port Assignment + + As specified in the RTP protocol definition, RTP data SHOULD be + carried on an even UDP port number and the corresponding RTCP packets + SHOULD be carried on the next higher (odd) port number. + + + + +Schulzrinne & Casner Standards Track [Page 34] + +RFC 3551 RTP A/V Profile July 2003 + + + Applications operating under this profile MAY use any such UDP port + pair. For example, the port pair MAY be allocated randomly by a + session management program. A single fixed port number pair cannot + be required because multiple applications using this profile are + likely to run on the same host, and there are some operating systems + that do not allow multiple processes to use the same UDP port with + different multicast addresses. + + However, port numbers 5004 and 5005 have been registered for use with + this profile for those applications that choose to use them as the + default pair. Applications that operate under multiple profiles MAY + use this port pair as an indication to select this profile if they + are not subject to the constraint of the previous paragraph. + Applications need not have a default and MAY require that the port + pair be explicitly specified. The particular port numbers were + chosen to lie in the range above 5000 to accommodate port number + allocation practice within some versions of the Unix operating + system, where port numbers below 1024 can only be used by privileged + processes and port numbers between 1024 and 5000 are automatically + assigned by the operating system. + +9. Changes from RFC 1890 + + This RFC revises RFC 1890. It is mostly backwards-compatible with + RFC 1890 except for functions removed because two interoperable + implementations were not found. The additions to RFC 1890 codify + existing practice in the use of payload formats under this profile. + Since this profile may be used without using any of the payload + formats listed here, the addition of new payload formats in this + revision does not affect backwards compatibility. The changes are + listed below, categorized into functional and non-functional changes. + + Functional changes: + + o Section 11, "IANA Considerations" was added to specify the + registration of the name for this profile. That appendix also + references a new Section 3 "Registering Additional Encodings" + which establishes a policy that no additional registration of + static payload types for this profile will be made beyond those + added in this revision and included in Tables 4 and 5. Instead, + additional encoding names may be registered as MIME subtypes for + binding to dynamic payload types. Non-normative references were + added to RFC 3555 [7] where MIME subtypes for all the listed + payload formats are registered, some with optional parameters for + use of the payload formats. + + + + + + +Schulzrinne & Casner Standards Track [Page 35] + +RFC 3551 RTP A/V Profile July 2003 + + + o Static payload types 4, 16, 17 and 34 were added to incorporate + IANA registrations made since the publication of RFC 1890, along + with the corresponding payload format descriptions for G723 and + H263. + + o Following working group discussion, static payload types 12 and 18 + were added along with the corresponding payload format + descriptions for QCELP and G729. Static payload type 13 was + assigned to the Comfort Noise (CN) payload format defined in RFC + 3389. Payload type 19 was marked reserved because it had been + temporarily allocated to an earlier version of Comfort Noise + present in some draft revisions of this document. + + o The payload format for G721 was renamed to G726-32 following the + ITU-T renumbering, and the payload format description for G726 was + expanded to include the -16, -24 and -40 data rates. Because of + confusion regarding draft revisions of this document, some + implementations of these G726 payload formats packed samples into + octets starting with the most significant bit rather than the + least significant bit as specified here. To partially resolve + this incompatibility, new payload formats named AAL2-G726-16, -24, + -32 and -40 will be specified in a separate document (see note in + Section 4.5.4), and use of static payload type 2 is deprecated as + explained in Section 6. + + o Payload formats G729D and G729E were added following the ITU-T + addition of Annexes D and E to Recommendation G.729. Listings + were added for payload formats GSM-EFR, RED, and H263-1998 + published in other documents subsequent to RFC 1890. These + additional payload formats are referenced only by dynamic payload + type numbers. + + o The descriptions of the payload formats for G722, G728, GSM, VDVI + were expanded. + + o The payload format for 1016 audio was removed and its static + payload type assignment 1 was marked "reserved" because two + interoperable implementations were not found. + + o Requirements for congestion control were added in Section 2. + + o This profile follows the suggestion in the revised RTP spec that + RTCP bandwidth may be specified separately from the session + bandwidth and separately for active senders and passive receivers. + + o The mapping of a user pass-phrase string into an encryption key + was deleted from Section 2 because two interoperable + implementations were not found. + + + +Schulzrinne & Casner Standards Track [Page 36] + +RFC 3551 RTP A/V Profile July 2003 + + + o The "quadrophonic" sample ordering convention for four-channel + audio was removed to eliminate an ambiguity as noted in Section + 4.1. + + Non-functional changes: + + o In Section 4.1, it is now explicitly stated that silence + suppression is allowed for all audio payload formats. (This has + always been the case and derives from a fundamental aspect of + RTP's design and the motivations for packet audio, but was not + explicit stated before.) The use of comfort noise is also + explained. + + o In Section 4.1, the requirement level for setting of the marker + bit on the first packet after silence for audio was changed from + "is" to "SHOULD be", and clarified that the marker bit is set only + when packets are intentionally not sent. + + o Similarly, text was added to specify that the marker bit SHOULD be + set to one on the last packet of a video frame, and that video + frames are distinguished by their timestamps. + + o RFC references are added for payload formats published after RFC + 1890. + + o The security considerations and full copyright sections were + added. + + o According to Peter Hoddie of Apple, only pre-1994 Macintosh used + the 22254.54 rate and none the 11127.27 rate, so the latter was + dropped from the discussion of suggested sampling frequencies. + + o Table 1 was corrected to move some values from the "ms/packet" + column to the "default ms/packet" column where they belonged. + + o Since the Interactive Multimedia Association ceased operations, an + alternate resource was provided for a referenced IMA document. + + o A note has been added for G722 to clarify a discrepancy between + the actual sampling rate and the RTP timestamp clock rate. + + o Small clarifications of the text have been made in several places, + some in response to questions from readers. In particular: + + - A definition for "media type" is given in Section 1.1 to allow + the explanation of multiplexing RTP sessions in Section 6 to be + more clear regarding the multiplexing of multiple media. + + + + +Schulzrinne & Casner Standards Track [Page 37] + +RFC 3551 RTP A/V Profile July 2003 + + + - The explanation of how to determine the number of audio frames + in a packet from the length was expanded. + + - More description of the allocation of bandwidth to SDES items + is given. + + - A note was added that the convention for the order of channels + specified in Section 4.1 may be overridden by a particular + encoding or payload format specification. + + - The terms MUST, SHOULD, MAY, etc. are used as defined in RFC + 2119. + + o A second author for this document was added. + +10. Security Considerations + + Implementations using the profile defined in this specification are + subject to the security considerations discussed in the RTP + specification [1]. This profile does not specify any different + security services. The primary function of this profile is to list a + set of data compression encodings for audio and video media. + + Confidentiality of the media streams is achieved by encryption. + Because the data compression used with the payload formats described + in this profile is applied end-to-end, encryption may be performed + after compression so there is no conflict between the two operations. + + A potential denial-of-service threat exists for data encodings using + compression techniques that have non-uniform receiver-end + computational load. The attacker can inject pathological datagrams + into the stream which are complex to decode and cause the receiver to + be overloaded. + + As with any IP-based protocol, in some circumstances a receiver may + be overloaded simply by the receipt of too many packets, either + desired or undesired. Network-layer authentication MAY be used to + discard packets from undesired sources, but the processing cost of + the authentication itself may be too high. In a multicast + environment, source pruning is implemented in IGMPv3 (RFC 3376) [24] + and in multicast routing protocols to allow a receiver to select + which sources are allowed to reach it. + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 38] + +RFC 3551 RTP A/V Profile July 2003 + + +11. IANA Considerations + + The RTP specification establishes a registry of profile names for use + by higher-level control protocols, such as the Session Description + Protocol (SDP), RFC 2327 [6], to refer to transport methods. This + profile registers the name "RTP/AVP". + + Section 3 establishes the policy that no additional registration of + static RTP payload types for this profile will be made beyond those + added in this document revision and included in Tables 4 and 5. IANA + may reference that section in declining to accept any additional + registration requests. In Tables 4 and 5, note that types 1 and 2 + have been marked reserved and the set of "dyn" payload types included + has been updated. These changes are explained in Sections 6 and 9. + +12. References + +12.1 Normative References + + [1] Schulzrinne, H., Casner, S., Frederick, R. and V. Jacobson, + "RTP: A Transport Protocol for Real-Time Applications", RFC + 3550, July 2003. + + [2] Bradner, S., "Key Words for Use in RFCs to Indicate Requirement + Levels", BCP 14, RFC 2119, March 1997. + + [3] Apple Computer, "Audio Interchange File Format AIFF-C", August + 1991. (also ftp://ftp.sgi.com/sgi/aiff-c.9.26.91.ps.Z). + +12.2 Informative References + + [4] Braden, R., Clark, D. and S. Shenker, "Integrated Services in + the Internet Architecture: an Overview", RFC 1633, June 1994. + + [5] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z. and W. + Weiss, "An Architecture for Differentiated Service", RFC 2475, + December 1998. + + [6] Handley, M. and V. Jacobson, "SDP: Session Description + Protocol", RFC 2327, April 1998. + + [7] Casner, S. and P. Hoschka, "MIME Type Registration of RTP + Payload Types", RFC 3555, July 2003. + + [8] Freed, N., Klensin, J. and J. Postel, "Multipurpose Internet + Mail Extensions (MIME) Part Four: Registration Procedures", BCP + 13, RFC 2048, November 1996. + + + + +Schulzrinne & Casner Standards Track [Page 39] + +RFC 3551 RTP A/V Profile July 2003 + + + [9] Zopf, R., "Real-time Transport Protocol (RTP) Payload for + Comfort Noise (CN)", RFC 3389, September 2002. + + [10] Deleam, D. and J.-P. Petit, "Real-time implementations of the + recent ITU-T low bit rate speech coders on the TI TMS320C54X + DSP: results, methodology, and applications", in Proc. of + International Conference on Signal Processing, Technology, and + Applications (ICSPAT) , (Boston, Massachusetts), pp. 1656--1660, + October 1996. + + [11] Mouly, M. and M.-B. Pautet, The GSM system for mobile + communications Lassay-les-Chateaux, France: Europe Media + Duplication, 1993. + + [12] Degener, J., "Digital Speech Compression", Dr. Dobb's Journal, + December 1994. + + [13] Redl, S., Weber, M. and M. Oliphant, An Introduction to GSM + Boston: Artech House, 1995. + + [14] Hoffman, D., Fernando, G., Goyal, V. and M. Civanlar, "RTP + Payload Format for MPEG1/MPEG2 Video", RFC 2250, January 1998. + + [15] Jayant, N. and P. Noll, Digital Coding of Waveforms--Principles + and Applications to Speech and Video Englewood Cliffs, New + Jersey: Prentice-Hall, 1984. + + [16] McKay, K., "RTP Payload Format for PureVoice(tm) Audio", RFC + 2658, August 1999. + + [17] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M., + Bolot, J.-C., Vega-Garcia, A. and S. Fosse-Parisis, "RTP Payload + for Redundant Audio Data", RFC 2198, September 1997. + + [18] Speer, M. and D. Hoffman, "RTP Payload Format of Sun's CellB + Video Encoding", RFC 2029, October 1996. + + [19] Berc, L., Fenner, W., Frederick, R., McCanne, S. and P. Stewart, + "RTP Payload Format for JPEG-Compressed Video", RFC 2435, + October 1998. + + [20] Turletti, T. and C. Huitema, "RTP Payload Format for H.261 Video + Streams", RFC 2032, October 1996. + + [21] Zhu, C., "RTP Payload Format for H.263 Video Streams", RFC 2190, + September 1997. + + + + + +Schulzrinne & Casner Standards Track [Page 40] + +RFC 3551 RTP A/V Profile July 2003 + + + [22] Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C., + Newell, D., Ott, J., Sullivan, G., Wenger, S. and C. Zhu, "RTP + Payload Format for the 1998 Version of ITU-T Rec. H.263 Video + (H.263+)", RFC 2429, October 1998. + + [23] Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming + Protocol (RTSP)", RFC 2326, April 1998. + + [24] Cain, B., Deering, S., Kouvelas, I., Fenner, B. and A. + Thyagarajan, "Internet Group Management Protocol, Version 3", + RFC 3376, October 2002. + +13. Current Locations of Related Resources + + Note: Several sections below refer to the ITU-T Software Tool + Library (STL). It is available from the ITU Sales Service, Place des + Nations, CH-1211 Geneve 20, Switzerland (also check + http://www.itu.int). The ITU-T STL is covered by a license defined + in ITU-T Recommendation G.191, "Software tools for speech and audio + coding standardization". + + DVI4 + + An archived copy of the document IMA Recommended Practices for + Enhancing Digital Audio Compatibility in Multimedia Systems (version + 3.0), which describes the IMA ADPCM algorithm, is available at: + + http://www.cs.columbia.edu/~hgs/audio/dvi/ + + An implementation is available from Jack Jansen at + + ftp://ftp.cwi.nl/local/pub/audio/adpcm.shar + + G722 + + An implementation of the G.722 algorithm is available as part of the + ITU-T STL, described above. + + G723 + + The reference C code implementation defining the G.723.1 algorithm + and its Annexes A, B, and C are available as an integral part of + Recommendation G.723.1 from the ITU Sales Service, address listed + above. Both the algorithm and C code are covered by a specific + license. The ITU-T Secretariat should be contacted to obtain such + licensing information. + + + + + +Schulzrinne & Casner Standards Track [Page 41] + +RFC 3551 RTP A/V Profile July 2003 + + + G726 + + G726 is specified in the ITU-T Recommendation G.726, "40, 32, 24, and + 16 kb/s Adaptive Differential Pulse Code Modulation (ADPCM)". An + implementation of the G.726 algorithm is available as part of the + ITU-T STL, described above. + + G729 + + The reference C code implementation defining the G.729 algorithm and + its Annexes A through I are available as an integral part of + Recommendation G.729 from the ITU Sales Service, listed above. Annex + I contains the integrated C source code for all G.729 operating + modes. The G.729 algorithm and associated C code are covered by a + specific license. The contact information for obtaining the license + is available from the ITU-T Secretariat. + + GSM + + A reference implementation was written by Carsten Bormann and Jutta + Degener (then at TU Berlin, Germany). It is available at + + http://www.dmn.tzi.org/software/gsm/ + + Although the RPE-LTP algorithm is not an ITU-T standard, there is a C + code implementation of the RPE-LTP algorithm available as part of the + ITU-T STL. The STL implementation is an adaptation of the TU Berlin + version. + + LPC + + An implementation is available at + + ftp://parcftp.xerox.com/pub/net-research/lpc.tar.Z + + PCMU, PCMA + + An implementation of these algorithms is available as part of the + ITU-T STL, described above. + +14. Acknowledgments + + The comments and careful review of Simao Campos, Richard Cox and AVT + Working Group participants are gratefully acknowledged. The GSM + description was adopted from the IMTC Voice over IP Forum Service + Interoperability Implementation Agreement (January 1997). Fred Burg + and Terry Lyons helped with the G.729 description. + + + + +Schulzrinne & Casner Standards Track [Page 42] + +RFC 3551 RTP A/V Profile July 2003 + + +15. Intellectual Property Rights Statement + + The IETF takes no position regarding the validity or scope of any + intellectual property or other rights that might be claimed to + pertain to the implementation or use of the technology described in + this document or the extent to which any license under such rights + might or might not be available; neither does it represent that it + has made any effort to identify any such rights. Information on the + IETF's procedures with respect to rights in standards-track and + standards-related documentation can be found in BCP-11. Copies of + claims of rights made available for publication and any assurances of + licenses to be made available, or the result of an attempt made to + obtain a general license or permission for the use of such + proprietary rights by implementors or users of this specification can + be obtained from the IETF Secretariat. + + The IETF invites any interested party to bring to its attention any + copyrights, patents or patent applications, or other proprietary + rights which may cover technology that may be required to practice + this standard. Please address the information to the IETF Executive + Director. + +16. Authors' Addresses + + Henning Schulzrinne + Department of Computer Science + Columbia University + 1214 Amsterdam Avenue + New York, NY 10027 + United States + + EMail: schulzrinne@cs.columbia.edu + + + Stephen L. Casner + Packet Design + 3400 Hillview Avenue, Building 3 + Palo Alto, CA 94304 + United States + + EMail: casner@acm.org + + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 43] + +RFC 3551 RTP A/V Profile July 2003 + + +17. Full Copyright Statement + + Copyright (C) The Internet Society (2003). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + + + + + + + + + + + + + +Schulzrinne & Casner Standards Track [Page 44] + |