From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc6366.txt | 955 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 955 insertions(+) create mode 100644 doc/rfc/rfc6366.txt (limited to 'doc/rfc/rfc6366.txt') diff --git a/doc/rfc/rfc6366.txt b/doc/rfc/rfc6366.txt new file mode 100644 index 0000000..91badcd --- /dev/null +++ b/doc/rfc/rfc6366.txt @@ -0,0 +1,955 @@ + + + + + + +Internet Engineering Task Force (IETF) J. Valin +Request for Comments: 6366 Mozilla +Category: Informational K. Vos +ISSN: 2070-1721 Skype Technologies, S.A. + August 2011 + + + Requirements for an Internet Audio Codec + +Abstract + + This document provides specific requirements for an Internet audio + codec. These requirements address quality, sampling rate, bit-rate, + and packet-loss robustness, as well as other desirable properties. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Not all documents + approved by the IESG are a candidate for any level of Internet + Standard; see Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc6366. + +Copyright Notice + + Copyright (c) 2011 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + + + + + + +Valin & Vos Informational [Page 1] + +RFC 6366 Audio Codec Requirements August 2011 + + +Table of Contents + + 1. Introduction ....................................................2 + 2. Definitions .....................................................3 + 3. Applications ....................................................3 + 3.1. Point-to-Point Calls .......................................3 + 3.2. Conferencing ...............................................4 + 3.3. Telepresence ...............................................5 + 3.4. Teleoperation and Remote Software Services .................5 + 3.5. In-Game Voice Chat .........................................5 + 3.6. Live Distributed Music Performances / Internet + Music Lessons ..............................................6 + 3.7. Delay-Tolerant Networking or Push-to-Talk Services .........6 + 3.8. Other Applications .........................................7 + 4. Constraints Imposed by the Internet on the Codec ................7 + 5. Detailed Basic Requirements .....................................8 + 5.1. Operating Space ............................................9 + 5.2. Quality and Bit-Rate .......................................9 + 5.3. Packet-Loss Robustness ....................................10 + 5.4. Computational Resources ...................................10 + 6. Additional Considerations ......................................12 + 6.1. Low-Complexity Audio Mixing ...............................12 + 6.2. Encoder Side Potential for Improvement ....................12 + 6.3. Layered Bit-Stream ........................................13 + 6.4. Partial Redundancy ........................................13 + 6.5. Stereo Support ............................................13 + 6.6. Bit Error Robustness ......................................13 + 6.7. Time Stretching and Shortening ............................14 + 6.8. Input Robustness ..........................................14 + 6.9. Support of Audio Forensics ................................14 + 6.10. Legacy Compatibility .....................................14 + 7. Security Considerations ........................................14 + 8. Acknowledgments ................................................15 + 9. Informative References .........................................15 + +1. Introduction + + This document provides requirements for an audio codec designed + specifically for use over the Internet. The requirements attempt to + address the needs of the most common Internet interactive audio + transmission applications and ensure good quality when operating in + conditions that are typical for the Internet. These requirements + also address the quality, sampling rate, delay, bit-rate, and packet- + loss robustness. Other desirable codec properties are considered as + well. + + + + + + +Valin & Vos Informational [Page 2] + +RFC 6366 Audio Codec Requirements August 2011 + + +2. Definitions + + Throughout this document, the following conventions refer to the + sampling rate of a signal: + + Narrowband: 8 kilohertz (kHz) + + Wideband: 16 kHz + + Super-wideband: 24/32 kHz + + Full-band: 44.1/48 kHz + + Codec bit-rates in bits per second (bit/s) will be considered without + counting any overhead ((IP/UDP/RTP) headers, padding, etc.). The + codec delay is the total algorithmic delay when one adds the codec + frame size to the "look-ahead". Thus, it is the minimum + theoretically achievable end-to-end delay of a transmission system + that uses the codec. + +3. Applications + + The following applications should be considered for Internet audio + codecs, along with their requirements: + + o Point-to-point calls + + o Conferencing + + o Telepresence + + o Teleoperation + + o In-game voice chat + + o Live distributed music performances / Internet music lessons + + o Delay-tolerant networking or push-to-talk services + + o Other applications + +3.1. Point-to-Point Calls + + Point-to-point calls are voice over IP (VoIP) calls from two + "standard" (fixed or mobile) phones, and implemented in hardware or + software. For these applications, a wideband codec is required, + along with narrowband support for compatibility with a public + switched telephone network (PSTN). It is expected for the range of + + + +Valin & Vos Informational [Page 3] + +RFC 6366 Audio Codec Requirements August 2011 + + + useful bit-rates to be 12 - 32 kilobits per second (kbit/s) for + wideband speech and 8 - 16 kbit/s for narrowband speech. The codec + delay must be less than 40 milliseconds (ms), but no more than 25 ms + is desirable. Support for encoding music is not required, but it is + desirable for the codec not to make background (on-hold) music + excessively unpleasant to hear. Also, the codec should be robust to + noise (produce intelligible speech and no annoying artifacts) even at + lower bit-rates. + +3.2. Conferencing + + Conferencing applications (that support multi-party calls) have + additional requirements on top of the requirements for point-to-point + calls. Conferencing systems often have higher-fidelity audio + equipment and have greater network bandwidth available -- especially + when video transmission is involved. Therefore, support for super- + wideband audio becomes important, with useful bit-rates in the 32 - + 64 kbit/s range. The ability to vary the bit-rate, according to the + "difficulty" of the audio signal, is a desirable feature for the + codec. This not only saves bandwidth "on average", but it can also + help conference servers make more efficient use of the available + bandwidth, by using more bandwidth for important audio streams and + less bandwidth for less important ones (e.g., background noise). + + Conferencing end-points often operate in hands-free conditions, which + creates acoustic echo problems. Therefore, lower delay is important, + as it reduces the quality degradation due to any residual echo after + acoustic echo cancellation (AEC). Consequently, the codec delay must + be less than 30 ms for this application. An optional low-delay mode + with less than 10 ms delay is desirable, but not required. + + Most conferencing systems operate with a bridge that mixes some (or + all) of the audio streams and sends them back to all the + participants. In that case, it is important that the codec not + produce annoying artifacts when two voices are present at the same + time. Also, this mixing operation should be as easy as possible to + perform. To make it easier to determine which streams have to be + mixed (and which are noise/silence), it must be possible to measure + (or estimate) the voice activity in a packet without having to fully + decode the packet (saving most of the complexity when the packet need + not be decoded). Also, the ability to save on the computational + complexity when mixing is also desirable, but not required. For + example, a transform codec may make it possible to mix the streams in + the transform domain, without having to go back to time-domain. Low- + complexity up-sampling and down-sampling within the codec is also a + desirable feature when mixing streams with different sampling rates. + + + + + +Valin & Vos Informational [Page 4] + +RFC 6366 Audio Codec Requirements August 2011 + + +3.3. Telepresence + + Most telepresence applications can be considered to be essentially + very high-quality video-conferencing environments, so all of the + conferencing requirements also apply to telepresence. In addition, + telepresence applications require super-wideband and full-band audio + capability with useful bit-rates in the 32 - 80 kbit/s range. While + voice is still the most important signal to be encoded, it must be + possible to obtain good quality (even if not transparent) music. + + Most telepresence applications require more than one audio channel, + so support for stereo and multi-channel is important. While this can + always be accomplished by encoding multiple single-channel streams, + it is preferable to take advantage of the redundancy that exists + between channels. + +3.4. Teleoperation and Remote Software Services + + Teleoperation applications are similar to telepresence, with the + exception that they involve remote physical interactions. For + example, the user may be controlling a robot while receiving real- + time audio feedback from that robot. For these applications, the + delay has to be less than 10 ms. The other requirements of + telepresence (quality, bit-rate, multi-channel) apply to + teleoperation as well. The only exception is that mixing is not an + important issue for teleoperation. + + The requirements for remote software services are similar to those of + teleoperation. These applications include remote desktop + applications, remote virtualization, and interactive media + application being rendered remotely (e.g., video games rendered on + central servers). For all these applications, full-band audio with + an algorithmic delay below 10 ms are important. + +3.5. In-Game Voice Chat + + An increasing number of computer/console games make use of VoIP to + allow players to communicate in real time. The requirements for + gaming are similar to those of conferencing, with the main difference + being that narrowband compatibility is not necessary. While for most + applications a codec delay up to 30 ms is acceptable, a low-delay (< + 10 ms) option is highly desirable, especially for games with rapid + interactions. The ability to use variable bit-rate (VBR) (with a + maximum allowed bit-rate) is also highly desirable because it can + significantly reduce the bandwidth requirement for a game server. + + + + + + +Valin & Vos Informational [Page 5] + +RFC 6366 Audio Codec Requirements August 2011 + + +3.6. Live Distributed Music Performances / Internet Music Lessons + + Live music over the Internet requires extremely low end-to-end delay + and is one of the most demanding applications for interactive audio + transmission. It has been observed that for most scenarios, total + end-to-end delays up to 25 ms could be tolerated by musicians, with + the absolute limit (where none of the scenarios are possible) being + around 50 ms [carot09]. In order to achieve this low delay on the + Internet -- either in the same city or in a nearby city -- the + network propagation time must be taken into account. When also + subtracting the delay of the audio buffer, jitter buffer, and + acoustic path, that leaves around 2 ms to 10 ms for the total delay + of the codec. Considering the speed of light in fiber, every 1 ms + reduction in the codec delay increases the range over which + synchronization is possible by approximately 200 km. + + Acoustic echo is expected to be an even more important issue for + network music than it is in conferencing, especially considering that + the music quality requirements essentially forbid the use of a "non- + linear processor" (NLP) with AEC. This is another reason why very + low delay is essential. + + Considering that the application is music, the full audio bandwidth + (44.1 or 48 kHz sampling rate) must be transmitted with a bit-rate + that is sufficient to provide near-transparent to transparent + quality. With the current audio coding technology, this corresponds + to approximately 64 kbit/s to 128 kbit/s per channel. As for + telepresence, support for two or more channels is often desired, so + it would be useful for a codec to be able to take advantage of the + redundancy that is often present between audio channels. + +3.7. Delay-Tolerant Networking or Push-to-Talk Services + + Internet transmissions are subjected to interruptions of connectivity + that severely disturb a phone call. This may happen in cases of + route changes, handovers, slow fading, or device failures. To + overcome this distortion, the phone call can be halted and resumed + after the connectivity has been reestablished again. + + Also, if transmission capacity is lower than the minimal coding rate, + switching to a push-to-talk mode still allows for effective + communication. In this situation, voice is transmitted at slower- + than-real-time bit-rate and conversations are interrupted until the + speech has been transmitted. + + These modes require interrupting the audio playout and continuing + after a pause of arbitrary duration. + + + + +Valin & Vos Informational [Page 6] + +RFC 6366 Audio Codec Requirements August 2011 + + +3.8. Other Applications + + The above list is by no means a complete list of all applications + involving interactive audio transmission on the Internet. However, + it is believed that meeting the needs of all these different + applications should be sufficient to ensure that the needs of other + applications not listed will also be met. + +4. Constraints Imposed by the Internet on the Codec + + Packet losses are inevitable on the Internet, and dealing with them + is one of the most fundamental requirements for an Internet audio + codec. While any audio codec can be combined with a good packet-loss + concealment (PLC) algorithm, the important aspect is what happens on + the first packets received _after_ the loss. More specifically, this + means that: + + o it should be possible to interpret the contents of any received + packet, irrespective of previous losses as specified in BCP 36 + [PAYLOADS]; and + + o the decoder should re-synchronize as quickly as possible (i.e., + the output should quickly converge to the output that would have + been obtained if no loss had occurred). + + The constraint of being able to decode any packet implies the + following considerations for an audio codec: + + o The size of a compressed frame must be kept smaller than the MTU + to avoid fragmentation; + + o The interpretation of any parameter encoded in the bit-stream must + not depend on information contained in other packets. For + example, it is not acceptable for a codec to allow signaling a + mode change in one packet and assume that subsequent frames will + be decoded according to that mode. + + Although the interpretation of parameters cannot depend on other + packets, it is still reasonable to use some amount of prediction + across frames, provided that the predictors can resynchronize quickly + in case of a lost packet. In this case, it is important to use the + best compromise between the gain in coding efficiency and the loss in + packet loss robustness due to the use of inter-frame prediction. It + is a desirable property for the codec to allow some real-time control + of that trade-off, so that it can take advantage of more prediction + when the loss rate is small, while being more robust to losses when + the loss rate is high. + + + + +Valin & Vos Informational [Page 7] + +RFC 6366 Audio Codec Requirements August 2011 + + + To improve the robustness to packet loss, it would be desirable for + the codec to allow an adaptive (data- and network-dependent) amount + of side information to help improve audio quality when losses occur. + For example, side information may include the retransmission of + certain parameters encoded in the previous frame(s). + + To ensure freedom of implementation, decoder-side-only error + concealment does not need to be specified, although a functional PLC + algorithm is desirable as part of the codec reference implementation. + Obviously, any information signaled in the bit-stream intended to aid + PLC needs to be specified. + + Another important property of the Internet is that it is mostly a + best-effort network, with no guaranteed bandwidth. This means that + the codec has to be able to vary its output bit-rate dynamically (in + real time), without requiring an out-of-band signaling mechanism, and + without causing audible artifacts at the bit-rate change boundaries. + Additional desirable features are: + + o Having the possibility to use smooth bit-rate changes with one + byte/frame resolution; + + o Making it possible for a codec to adapt its bit-rate based on the + source signal being encoded (source-controlled VBR) to maximize + the quality for a certain _average_ bit-rate. + + Because the Internet transmits data in bytes, a codec should produce + compressed data in integer numbers of bytes. In general, the codec + design should take into consideration explicit congestion + notification (ECN) and may include features that would improve the + quality of an ECN implementation. + + The IETF has defined a set of application-layer protocols to be used + for transmitting real-time transport of multimedia data, including + voice. Thus, it is important for the resulting codec to be easy to + use with these protocols. For example, it must be possible to create + an [RTP] payload format that conforms to BCP 36 [PAYLOADS]. If any + codec parameters need to be negotiated between end-points, the + negotiation should be as easy as possible to carry over session + initiation protocol (SIP) [RFC3261]/ session description protocol + (SDP) [RFC4566] or alternatively over extensible messaging and + presence protocol (XMPP) [RFC6120] / Jingle [XEP-0167]. + +5. Detailed Basic Requirements + + This section summarizes all the constraints imposed by the target + applications and by the Internet into a set of actual requirements + for codec development. + + + +Valin & Vos Informational [Page 8] + +RFC 6366 Audio Codec Requirements August 2011 + + +5.1. Operating Space + + The operating space for the target applications can be divided in + terms of delay: most applications require a "medium delay" (20-30 + ms), while a few require a "very low delay" (< 10 ms). It makes + sense to divide the space based on delay because lowering the delay + has a cost in terms of quality versus bit-rate. + + For medium delay, the resulting codec must be able to efficiently + operate within the following range of bit-rates (per channel): + + o Narrowband: 8 kbit/s to 16 kbit/s + + o Wideband: 12 to 32 kbit/s + + o Super-wideband: 24 to 64 kbit/s + + o Full-band: 32 to 80 kbit/s + + Obviously, a lower-delay codec that can operate in the above range is + also acceptable. + + For very low delay, the resulting codec will need to operate within + the following range of bit-rates (per channel): + + o Super-wideband: 32 to 80 kbit/s + + o Full-band: 48 to 128 kbit/s + + o (Narrowband and wideband not required) + +5.2. Quality and Bit-Rate + + The quality of a codec is directly linked to the bit-rate, so these + two must be considered jointly. When comparing the bit-rate of + codecs, the overhead of IP/UDP/RTP headers should not be considered, + but any additional bits required in the RTP payload format, after the + header (e.g., required signaling), should be considered. In terms of + quality versus bit-rate, the codec to be developed must be better + than the following codecs, that are generally considered royalty- + free: + + o For narrowband: Speex (NB) [Speex], and internet low bit-rate + codec (iLBC)(*) [RFC3951] + + o For wideband: Speex (WB) [Speex], G.722.1(*) [ITU.G722.1] + + o For super-wideband/fullband: G.722.1C(*) [ITU.G722.1] + + + +Valin & Vos Informational [Page 9] + +RFC 6366 Audio Codec Requirements August 2011 + + + The codecs marked with (*) have additional licensing restrictions, + but the codec to be developed should still not perform significantly + worse. In addition to the quality targets listed above, a desirable + objective is for the codec quality to be no worse than Adaptive + Multi-Rate (AMR-NB) and Adaptive Multi-Rate Wideband (AMR-WB). + Quality should be measured for multiple languages, including tonal + languages. The case of multiple simultaneous voices (as sometimes + happens in conferencing) should be evaluated as well. + + The comparison with the above codecs assumes that the codecs being + compared have similar delay characteristics. The bit-rate required, + for a certain level of quality, may be higher than the referenced + codecs in cases where a much lower delay is required. In that case, + the increase in bit-rate must be less than the ratio between the + delays. + + It is desirable for the codecs to support source-controlled variable + bit-rate (VBR) to take advantage of different inputs, that require a + different bit-rate, to achieve the same quality. However, it should + still be possible to use the codec at a truly constant bit-rate to + ensure that no information leak is possible when using an encrypted + channel. + +5.3. Packet-Loss Robustness + + Robustness to packet loss is a very important aspect of any codec to + be used on the Internet. Codecs must maintain acceptable quality at + loss rates up to 5% and maintain good intelligibility up to 15% loss + rate. At any sampling rate, bit-rate, and packet-loss rate, the + quality must be no less than the quality obtained with the Speex + codec or the Global System for Mobile Communications - Full Rate + (GSM-FR) codec in the same conditions. The actual packet-loss + "patterns" to be used in testing must be obtained from real packet- + loss traces collected on the Internet, rather than from loss models. + These traces should be representative of the typical environments in + which the applications of Section 3 operate. For example, traces + related to VoIP calls should consider the loss patterns observed for + typical home broadband and corporate connections. + +5.4. Computational Resources + + The resulting codec should be implementable on a wide range of + devices, so there should be a fixed-point implementation or at least + assurance that a reasonable fixed-point is possible. The + computational resources figures listed below are meant to be upper + bounds. Even below these bounds, resources should still be + minimized. Any proposed increase in computational resources + consumption (e.g., to increase quality) should be carefully evaluated + + + +Valin & Vos Informational [Page 10] + +RFC 6366 Audio Codec Requirements August 2011 + + + even if the resulting resource consumption is below the upper bound. + Having variable complexity would be useful (but not required) in + achieving that goal as it would allow trading quality/bit-rate for + lower complexity. + + The computational requirements for real-time encoding and decoding of + a mono signal on one core of a recent x86 CPU (as measured with the + Unix "time" utility or equivalent) are as follows: + + o Narrowband: 40 megahertz (MHz) (2% of a 2 gigahertz (GHz) CPU + core) + + o Wideband: 80 MHz (4% of a 2 GHz CPU core) + + o Super-wideband/fullband: 200 MHz (10% of a 2 GHz CPU core) + + It is desirable that the MHz values listed above also be achievable + on fixed-point digital signal processors that are capable of single- + cycle multiply-accumulate operations (16x16 multiplication + accumulated into 32 bits). + + For applications that require mixing (e.g., conferencing), it should + be possible to estimate the energy and/or the voice activity status + of the decoded signal with less than 10% of the complexity figures + listed above. + + It is the intent to maximize the range of devices on which a codec + can be implemented. Therefore, the reference implementation must not + depend on special hardware features or instructions to be present in + order to meet the complexity requirement. However, it may be + desirable to take advantage of such hardware when available, (e.g., + hardware accelerators for operations like Fast Fourier Transforms + (FFT) and convolutions). A codec should also minimize the use of + saturating arithmetic so as to be implementable on architectures that + do not provide hardware saturation (e.g., ARMv4). + + The combined codec size and data read-only memory (ROM) should be + small enough not to cause significant implementation problems on + typical embedded devices. The codec context/state size required + should be no more than 2*R*C bytes in floating-point, where R is the + sampling rate and C is the number of channels. For fixed-point, that + size should be less than R*C. The scratch space required should also + be less than 2*R*C bytes for floating point or less than R*C bytes + for fixed-point. + + + + + + + +Valin & Vos Informational [Page 11] + +RFC 6366 Audio Codec Requirements August 2011 + + +6. Additional Considerations + + There are additional features or characteristics that may be + desirable under some circumstances, but should not be part of the + strict requirements. The benefit of meeting these considerations + should be weighted against the associated cost. + +6.1. Low-Complexity Audio Mixing + + In many applications that require a mixing server (e.g., + conferencing, games), it is important to minimize the computational + cost of the mixing. As much as possible, it should be possible to + perform the mixing with fewer computations than it would take to + decode all the streams, mix them, and re-encode the result. + Properties that reduce the complexity of the mixing process include: + + o The ability to derive sufficient parameters, such as loudness + and/or spectral envelope, for estimating voice activity of a + compressed frame without fully decoding that frame; + + o The ability to mix the streams in an intermediate representation + (e.g., transform domain), rather than having to fully decode the + signals before the mixing; + + o The use of bit-stream layers (Section 6.3) by aggregating a small + number of active streams at lower quality. + + For conferencing applications, the total complexity of the decoding, + voice activity detection (VAD), and mixing should be considered when + evaluating proposals. + +6.2. Encoder Side Potential for Improvement + + In many codecs, it is possible to improve the quality by improving + the encoder without breaking compatibility (i.e., without changing + the decoder). Potential for improvement varies from one codec to + another. It is generally low for pulse code modulation (PCM) or + adaptive differential pulse code modulation (ADPCM) codecs and higher + for perceptual transform codecs. All things being equal, being able + to improve a codec after the bit-stream is a desirable property. + However, this should not be done at the expense of quality in the + reference encoder. Other potential improvements include signal- + adaptive frame size selection and improved discontinuous transmission + (DTX) algorithms that take advantage of predicting the decoder sides + packet loss concealment (PLC) algorithms. + + + + + + +Valin & Vos Informational [Page 12] + +RFC 6366 Audio Codec Requirements August 2011 + + +6.3. Layered Bit-Stream + + A layered codec makes it possible to transmit only a certain subset + of the bits and still obtain a valid bit-stream with a quality that + is equivalent to the quality that would be obtained from encoding at + the corresponding rate. While this is not a necessary feature for + most applications, it can be desirable for cases where a "mixing + server" needs to handle a large number of streams with limited + computational resources. + +6.4. Partial Redundancy + + One possible way of increasing robustness to packet loss is to + include partial redundancy within packets. This can be achieved + either by including the base layer of the previous frame (for a + layered codec) or by transmitting other parameters from the previous + frame(s) to assist the PLC algorithm in case of loss. The ability to + include partial redundancy for high-loss scenarios is desirable, + provided that the feature can be dynamically turned on or off (so + that no bandwidth is wasted in case of loss-free transmission). + +6.5. Stereo Support + + It is highly desirable for the codec to have stereo support. At a + minimum, the codec should be able to encode two channels + independently without causing significant stereo image artifacts. It + is also desirable for the codec to take advantage of the inter- + channel redundancy in stereo audio to reduce the bit-rate (for an + equivalent quality) of stereo audio compared to coding channels + independently. + +6.6. Bit Error Robustness + + The vast majority of Internet-based applications do not need to be + robust to bit errors because packets either arrive unaltered or do + not arrive at all. Therefore, the emphasis should be on packet-loss + robustness and packet-loss concealment. That being said, often, the + extra robustness to bit errors can be achieved at no cost at all + (i.e., no increase in size, complexity, or bit-rate; no decrease in + quality, or packet-loss robustness, etc.). In those cases, it is + useful to make a change that increases the robustness to bit errors. + This can be useful for applications that use UDP Lite transmission + (e.g., over a wireless LAN). Robustness to packet loss should + *never* be sacrificed to achieve higher bit error robustness. + + + + + + + +Valin & Vos Informational [Page 13] + +RFC 6366 Audio Codec Requirements August 2011 + + +6.7. Time Stretching and Shortening + + When adaptive jitter buffers are used, it is often necessary to + stretch or shorten the audio signal to allow changes in buffering. + While this operation can be performed directly on the decoder's + output, it is often more computationally efficient to stretch or + shorten the signal directly within the decoder. It is desirable for + the reference implementation to provide a time stretching/shortening + implementation, although it should not be normative. + +6.8. Input Robustness + + The systems providing input to the encoder and receiving output from + the decoder may be far from ideal in actual use. Input and output + audio streams may be corrupted by compounding non-linear artifacts + from analog hardware and digital processing. The codecs to be + developed should be tested to ensure that they degrade gracefully + under adverse audio input conditions. Types of digital corruption + that may be tested include tandeming, transcoding, low-quality + resampling, and digital clipping. Types of analog corruption that + may be tested include microphones with substantial background noise, + analog clipping, and loudspeaker distortion. No specific end-to-end + quality requirements are mandated for use with the proposed codec. + It is advisable, however, that several typical in situ environments/ + processing chains be specified for the purpose of benchmarking end- + to-end quality with the proposed codec. + +6.9. Support of Audio Forensics + + Emergency calls can be analyzed using audio forensics if the context + and situation of the caller has to be identified. Thus, it is + important to transmit not only the voice of the callers well, but + also to transmit background noise at high quality. In these + situations, sounds or noises of low volume should also not be + compressed or dropped. Therefore, the encoder must allow DTX to be + disabled when required (e.g., for emergency calls). + +6.10. Legacy Compatibility + + In order to create the best possible codec for the Internet, there is + no requirement for compatibility with legacy Internet codecs. + +7. Security Considerations + + Although this document itself does not have security considerations, + this section describes the security requirements for the codec. + + + + + +Valin & Vos Informational [Page 14] + +RFC 6366 Audio Codec Requirements August 2011 + + + As for any protocol to be used over the Internet, security is a very + important aspect to consider. This goes beyond the obvious + considerations of preventing buffer overflows and similar attacks + that can lead to denial-of-service (DoS) or remote code execution. + One very important security aspect is to make sure that the decoders + have a bounded and reasonable worst-case complexity. This prevents + an attacker from causing a DoS by sending packets that are specially + crafted to take a very long (or infinite) time to decode. + + A more subtle aspect is the information leak that can occur when the + codec is used over an encrypted channel (e.g., [SRTP]). For example, + it was suggested [wright08] [white11] that use of source-controlled + VBR may reveal some information about a conversation through the size + of the compressed packets. Therefore, it should be possible to use + the codec at a truly constant bit-rate, if needed. + +8. Acknowledgments + + We would like to thank all the people who contributed directly or + indirectly to this document, including Slava Borilin, Christopher + Montgomery, Raymond (Juin-Hwey) Chen, Jason Fischl, Gregory Maxwell, + Alan Duric, Jonathan Christensen, Julian Spittka, Michael Knappe, + Christian Hoene, and Henry Sinnreich. We would also like to thank + Cullen Jennings, Jonathan Rosenberg, and Gregory Lebovitz for their + advice. + +9. Informative References + + [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, + A., Peterson, J., Sparks, R., Handley, M., and E. + Schooler, "SIP: Session Initiation Protocol", RFC 3261, + June 2002. + + [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session + Description Protocol", RFC 4566, July 2006. + + [RFC6120] Saint-Andre, P., "Extensible Messaging and Presence + Protocol (XMPP): Core", RFC 6120, March 2011. + + [XEP-0167] Ludwig, S., Saint-Andre, P., Egan, S., McQueen, R., and + D. Cionoiu, "Jingle RTP Sessions", XSF XEP 0167, + December 2009. + + [RFC3951] Andersen, S., Duric, A., Astrom, H., Hagen, R., Kleijn, + W., and J. Linden, "Internet Low Bit Rate Codec (iLBC)", + RFC 3951, December 2004. + + + + + +Valin & Vos Informational [Page 15] + +RFC 6366 Audio Codec Requirements August 2011 + + + [ITU.G722.1] International Telecommunications Union, "Low-complexity + coding at 24 and 32 kbit/s for hands-free operation in + systems with low frame loss", ITU-T Recommendation + G.722.1, May 2005. + + [Speex] Xiph.Org Foundation, "Speex: http://www.speex.org/", + 2003. + + [carot09] Carot, A., Werner, C., and T. Fischinger, "Towards a + Comprehensive Cognitive Analysis of Delay-Influenced + Rhythmical Interaction: + http://www.carot.de/icmc2009.pdf", 2009. + + [PAYLOADS] Handley, M. and C. Perkins, "Guidelines for Writers of + RTP Payload Format Specifications", BCP 36, RFC 2736, + December 1999. + + [RTP] Schulzrinne, H., Casner, S., Frederick, R., and V. + Jacobson, "RTP: A Transport Protocol for Real-Time + Applications", STD 64, RFC 3550, July 2003. + + [SRTP] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and + K. Norrman, "The Secure Real-time Transport Protocol + (SRTP)", RFC 3711, March 2004. + + [wright08] Wright, C., Ballard, L., Coull, S., Monrose, F., and G. + Masson, "Spot me if you can: Uncovering spoken phrases + in encrypted VoIP conversations: + http://www.cs.jhu.edu/~cwright/oakland08.pdf", 2008. + + [white11] White, A., Matthews, A., Snow, K., and F. Monrose, + "Phonotactic Reconstruction of Encrypted VoIP + Conversations: Hookt on fon-iks + http://www.cs.unc.edu/~fabian/papers/foniks-oak11.pdf", + 2011. + + + + + + + + + + + + + + + + +Valin & Vos Informational [Page 16] + +RFC 6366 Audio Codec Requirements August 2011 + + +Authors' Addresses + + Jean-Marc Valin + Mozilla + 650 Castro Street + Mountain View, CA 94041 + USA + + EMail: jmvalin@jmvalin.ca + + + Koen Vos + Skype Technologies, S.A. + Stadsgarden 6 + Stockholm, 11645 + Sweden + + EMail: koen.vos@skype.net + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Valin & Vos Informational [Page 17] + -- cgit v1.2.3