1 files changed, 955 insertions, 0 deletions
diff --git a/doc/rfc/rfc6366.txt b/doc/rfc/rfc6366.txt
new file mode 100644
index 0000000..91badcd
--- /dev/null
+++ b/doc/rfc/rfc6366.txt
@@ -0,0 +1,955 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF)                          J. Valin
+Request for Comments: 6366                                       Mozilla
+Category: Informational                                           K. Vos
+ISSN: 2070-1721                                 Skype Technologies, S.A.
+                                                             August 2011
+
+
+                Requirements for an Internet Audio Codec
+
+Abstract
+
+   This document provides specific requirements for an Internet audio
+   codec.  These requirements address quality, sampling rate, bit-rate,
+   and packet-loss robustness, as well as other desirable properties.
+
+Status of This Memo
+
+   This document is not an Internet Standards Track specification; it is
+   published for informational purposes.
+
+   This document is a product of the Internet Engineering Task Force
+   (IETF).  It represents the consensus of the IETF community.  It has
+   received public review and has been approved for publication by the
+   Internet Engineering Steering Group (IESG).  Not all documents
+   approved by the IESG are a candidate for any level of Internet
+   Standard; see Section 2 of RFC 5741.
+
+   Information about the current status of this document, any errata,
+   and how to provide feedback on it may be obtained at
+   http://www.rfc-editor.org/info/rfc6366.
+
+Copyright Notice
+
+   Copyright (c) 2011 IETF Trust and the persons identified as the
+   document authors.  All rights reserved.
+
+   This document is subject to BCP 78 and the IETF Trust's Legal
+   Provisions Relating to IETF Documents
+   (http://trustee.ietf.org/license-info) in effect on the date of
+   publication of this document.  Please review these documents
+   carefully, as they describe your rights and restrictions with respect
+   to this document.  Code Components extracted from this document must
+   include Simplified BSD License text as described in Section 4.e of
+   the Trust Legal Provisions and are provided without warranty as
+   described in the Simplified BSD License.
+
+
+
+
+
+
+Valin & Vos                   Informational                     [Page 1]
+
+RFC 6366                Audio Codec Requirements             August 2011
+
+
+Table of Contents
+
+   1. Introduction ....................................................2
+   2. Definitions .....................................................3
+   3. Applications ....................................................3
+      3.1. Point-to-Point Calls .......................................3
+      3.2. Conferencing ...............................................4
+      3.3. Telepresence ...............................................5
+      3.4. Teleoperation and Remote Software Services .................5
+      3.5. In-Game Voice Chat .........................................5
+      3.6. Live Distributed Music Performances / Internet
+           Music Lessons ..............................................6
+      3.7. Delay-Tolerant Networking or Push-to-Talk Services .........6
+      3.8. Other Applications .........................................7
+   4. Constraints Imposed by the Internet on the Codec ................7
+   5. Detailed Basic Requirements .....................................8
+      5.1. Operating Space ............................................9
+      5.2. Quality and Bit-Rate .......................................9
+      5.3. Packet-Loss Robustness ....................................10
+      5.4. Computational Resources ...................................10
+   6. Additional Considerations ......................................12
+      6.1. Low-Complexity Audio Mixing ...............................12
+      6.2. Encoder Side Potential for Improvement ....................12
+      6.3. Layered Bit-Stream ........................................13
+      6.4. Partial Redundancy ........................................13
+      6.5. Stereo Support ............................................13
+      6.6. Bit Error Robustness ......................................13
+      6.7. Time Stretching and Shortening ............................14
+      6.8. Input Robustness ..........................................14
+      6.9. Support of Audio Forensics ................................14
+      6.10. Legacy Compatibility .....................................14
+   7. Security Considerations ........................................14
+   8. Acknowledgments ................................................15
+   9. Informative References .........................................15
+
+1.  Introduction
+
+   This document provides requirements for an audio codec designed
+   specifically for use over the Internet.  The requirements attempt to
+   address the needs of the most common Internet interactive audio
+   transmission applications and ensure good quality when operating in
+   conditions that are typical for the Internet.  These requirements
+   also address the quality, sampling rate, delay, bit-rate, and packet-
+   loss robustness.  Other desirable codec properties are considered as
+   well.
+
+
+
+
+
+
+Valin & Vos                   Informational                     [Page 2]
+
+RFC 6366                Audio Codec Requirements             August 2011
+
+
+2.  Definitions
+
+   Throughout this document, the following conventions refer to the
+   sampling rate of a signal:
+
+      Narrowband: 8 kilohertz (kHz)
+
+      Wideband: 16 kHz
+
+      Super-wideband: 24/32 kHz
+
+      Full-band: 44.1/48 kHz
+
+   Codec bit-rates in bits per second (bit/s) will be considered without
+   counting any overhead (IP/UDP/RTP headers, padding, etc.).  The
+   codec delay is the total algorithmic delay when one adds the codec
+   frame size to the "look-ahead".  Thus, it is the minimum
+   theoretically achievable end-to-end delay of a transmission system
+   that uses the codec.
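+
+   As an illustration of these two definitions, the following Python
+   sketch computes a codec delay and a payload-only bit-rate.  The
+   function names and the example values (20 ms frames, 5 ms of
+   look-ahead, 40-byte payloads) are hypothetical and do not describe
+   any particular codec.
+

```python
def codec_delay_ms(frame_size_ms, lookahead_ms):
    """Total algorithmic delay: codec frame size plus look-ahead."""
    return frame_size_ms + lookahead_ms


def payload_bitrate_bps(payload_bytes, frame_size_ms):
    """Codec bit-rate in bit/s, counting only the compressed payload.

    IP/UDP/RTP headers and padding are deliberately left out, as in
    the definition above.
    """
    return payload_bytes * 8 * 1000 / frame_size_ms


# Hypothetical example values, chosen only for illustration.
example_delay = codec_delay_ms(frame_size_ms=20, lookahead_ms=5)
example_rate = payload_bitrate_bps(payload_bytes=40, frame_size_ms=20)
```

+
+   With these assumed values, the codec delay is 25 ms and the codec
+   bit-rate is 16 kbit/s.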
+
+3.  Applications
+
+   The following applications should be considered for Internet audio
+   codecs, along with their requirements:
+
+   o  Point-to-point calls
+
+   o  Conferencing
+
+   o  Telepresence
+
+   o  Teleoperation
+
+   o  In-game voice chat
+
+   o  Live distributed music performances / Internet music lessons
+
+   o  Delay-tolerant networking or push-to-talk services
+
+   o  Other applications
+
+3.1.  Point-to-Point Calls
+
+   Point-to-point calls are voice over IP (VoIP) calls between two
+   "standard" (fixed or mobile) phones, implemented in hardware or
+   software.  For these applications, a wideband codec is required,
+   along with narrowband support for compatibility with a public
+   switched telephone network (PSTN).  It is expected for the range of
+
+
+
+Valin & Vos                   Informational                     [Page 3]
+
+RFC 6366                Audio Codec Requirements             August 2011
+
+
+   useful bit-rates to be 12 - 32 kilobits per second (kbit/s) for
+   wideband speech and 8 - 16 kbit/s for narrowband speech.  The codec
+   delay must be less than 40 milliseconds (ms), but no more than 25 ms
+   is desirable.  Support for encoding music is not required, but it is
+   desirable for the codec not to make background (on-hold) music
+   excessively unpleasant to hear.  Also, the codec should be robust to
+   noise (produce intelligible speech and no annoying artifacts) even at
+   lower bit-rates.
+
+3.2.  Conferencing
+
+   Conferencing applications (that support multi-party calls) have
+   additional requirements on top of the requirements for point-to-point
+   calls.  Conferencing systems often have higher-fidelity audio
+   equipment and have greater network bandwidth available -- especially
+   when video transmission is involved.  Therefore, support for super-
+   wideband audio becomes important, with useful bit-rates in the 32 -
+   64 kbit/s range.  The ability to vary the bit-rate, according to the
+   "difficulty" of the audio signal, is a desirable feature for the
+   codec.  This not only saves bandwidth "on average", but it can also
+   help conference servers make more efficient use of the available
+   bandwidth, by using more bandwidth for important audio streams and
+   less bandwidth for less important ones (e.g., background noise).
+
+   Conferencing end-points often operate in hands-free conditions, which
+   creates acoustic echo problems.  Therefore, lower delay is important,
+   as it reduces the quality degradation due to any residual echo after
+   acoustic echo cancellation (AEC).  Consequently, the codec delay must
+   be less than 30 ms for this application.  An optional low-delay mode
+   with less than 10 ms delay is desirable, but not required.
+
+   Most conferencing systems operate with a bridge that mixes some (or
+   all) of the audio streams and sends them back to all the
+   participants.  In that case, it is important that the codec not
+   produce annoying artifacts when two voices are present at the same
+   time.  Also, this mixing operation should be as easy as possible to
+   perform.  To make it easier to determine which streams have to be
+   mixed (and which are noise/silence), it must be possible to measure
+   (or estimate) the voice activity in a packet without having to fully
+   decode the packet (saving most of the complexity when the packet need
+   not be decoded).  The ability to save on the computational
+   complexity when mixing is also desirable, but not required.  For
+   example, a transform codec may make it possible to mix the streams in
+   the transform domain, without having to go back to time-domain.  Low-
+   complexity up-sampling and down-sampling within the codec is also a
+   desirable feature when mixing streams with different sampling rates.
+
+
+
+
+
+Valin & Vos                   Informational                     [Page 4]
+
+RFC 6366                Audio Codec Requirements             August 2011
+
+
+3.3.  Telepresence
+
+   Most telepresence applications can be considered to be essentially
+   very high-quality video-conferencing environments, so all of the
+   conferencing requirements also apply to telepresence.  In addition,
+   telepresence applications require super-wideband and full-band audio
+   capability with useful bit-rates in the 32 - 80 kbit/s range.  While
+   voice is still the most important signal to be encoded, it must be
+   possible to obtain good quality (even if not transparent) music.
+
+   Most telepresence applications require more than one audio channel,
+   so support for stereo and multi-channel is important.  While this can
+   always be accomplished by encoding multiple single-channel streams,
+   it is preferable to take advantage of the redundancy that exists
+   between channels.
+
+3.4.  Teleoperation and Remote Software Services
+
+   Teleoperation applications are similar to telepresence, with the
+   exception that they involve remote physical interactions.  For
+   example, the user may be controlling a robot while receiving real-
+   time audio feedback from that robot.  For these applications, the
+   delay has to be less than 10 ms.  The other requirements of
+   telepresence (quality, bit-rate, multi-channel) apply to
+   teleoperation as well.  The only exception is that mixing is not an
+   important issue for teleoperation.
+
+   The requirements for remote software services are similar to those of
+   teleoperation.  These applications include remote desktop
+   applications, remote virtualization, and interactive media
+   applications being rendered remotely (e.g., video games rendered on
+   central servers).  For all these applications, full-band audio with
+   an algorithmic delay below 10 ms is important.
+
+3.5.  In-Game Voice Chat
+
+   An increasing number of computer/console games make use of VoIP to
+   allow players to communicate in real time.  The requirements for
+   gaming are similar to those of conferencing, with the main difference
+   being that narrowband compatibility is not necessary.  While for most
+   applications a codec delay up to 30 ms is acceptable, a low-delay (<
+   10 ms) option is highly desirable, especially for games with rapid
+   interactions.  The ability to use variable bit-rate (VBR) (with a
+   maximum allowed bit-rate) is also highly desirable because it can
+   significantly reduce the bandwidth requirement for a game server.
+
+
+
+
+
+
+Valin & Vos                   Informational                     [Page 5]
+
+RFC 6366                Audio Codec Requirements             August 2011
+
+
+3.6.  Live Distributed Music Performances / Internet Music Lessons
+
+   Live music over the Internet requires extremely low end-to-end delay
+   and is one of the most demanding applications for interactive audio
+   transmission.  It has been observed that for most scenarios, total
+   end-to-end delays up to 25 ms could be tolerated by musicians, with
+   the absolute limit (where none of the scenarios are possible) being
+   around 50 ms [carot09].  In order to achieve this low delay on the
+   Internet -- either in the same city or in a nearby city -- the
+   network propagation time must be taken into account.  When also
+   subtracting the delay of the audio buffer, jitter buffer, and
+   acoustic path, that leaves around 2 ms to 10 ms for the total delay
+   of the codec.  Considering the speed of light in fiber, every 1 ms
+   reduction in the codec delay increases the range over which
+   synchronization is possible by approximately 200 km.
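+
+   The delay budget described above can be sketched in Python as
+   follows.  The 200 km per millisecond figure comes from the text;
+   the function name and the 13 ms assumed for the audio buffer,
+   jitter buffer, and acoustic path are hypothetical values used only
+   to make the arithmetic concrete.
+

```python
# Approximate distance covered by light in fiber per millisecond,
# as stated in the text above.
FIBER_KM_PER_MS = 200.0


def reachable_distance_km(end_to_end_ms, codec_ms, other_ms):
    """Distance left for network propagation within a delay budget.

    other_ms lumps together the audio buffer, jitter buffer, and
    acoustic path delays.  Whatever remains after subtracting the
    codec delay is spent on propagation through fiber.
    """
    propagation_ms = end_to_end_ms - codec_ms - other_ms
    return max(propagation_ms, 0.0) * FIBER_KM_PER_MS


# Hypothetical budget: 25 ms end to end, 13 ms of non-codec delay.
distances = {
    codec_ms: reachable_distance_km(25.0, codec_ms, 13.0)
    for codec_ms in (10.0, 5.0, 2.0)
}
```

+
+   Under these assumptions, codec delays of 10 ms, 5 ms, and 2 ms
+   leave room for roughly 400 km, 1400 km, and 2000 km of fiber,
+   respectively.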
+
+   Acoustic echo is expected to be an even more important issue for
+   network music than it is in conferencing, especially considering that
+   the music quality requirements essentially forbid the use of a "non-
+   linear processor" (NLP) with AEC.  This is another reason why very
+   low delay is essential.
+
+   Considering that the application is music, the full audio bandwidth
+   (44.1 or 48 kHz sampling rate) must be transmitted with a bit-rate
+   that is sufficient to provide near-transparent to transparent
+   quality.  With the current audio coding technology, this corresponds
+   to approximately 64 kbit/s to 128 kbit/s per channel.  As for
+   telepresence, support for two or more channels is often desired, so
+   it would be useful for a codec to be able to take advantage of the
+   redundancy that is often present between audio channels.
+
+3.7.  Delay-Tolerant Networking or Push-to-Talk Services
+
+   Internet transmissions are subjected to interruptions of connectivity
+   that severely disturb a phone call.  This may happen in cases of
+   route changes, handovers, slow fading, or device failures.  To
+   overcome this distortion, the phone call can be halted and resumed
+   after the connectivity has been reestablished.
+
+   Also, if transmission capacity is lower than the minimal coding rate,
+   switching to a push-to-talk mode still allows for effective
+   communication.  In this situation, voice is transmitted at slower-
+   than-real-time bit-rate and conversations are interrupted until the
+   speech has been transmitted.
+
+   These modes require interrupting the audio playout and continuing
+   after a pause of arbitrary duration.
+
+
+
+
+Valin & Vos                   Informational                     [Page 6]
+
+RFC 6366                Audio Codec Requirements             August 2011
+
+
+3.8.  Other Applications
+
+   The above list is by no means a complete list of all applications
+   involving interactive audio transmission on the Internet.  However,
+   it is believed that meeting the needs of all these different
+   applications should be sufficient to ensure that the needs of other
+   applications not listed will also be met.
+
+4.  Constraints Imposed by the Internet on the Codec
+
+   Packet losses are inevitable on the Internet, and dealing with them
+   is one of the most fundamental requirements for an Internet audio
+   codec.  While any audio codec can be combined with a good packet-loss
+   concealment (PLC) algorithm, the important aspect is what happens on
+   the first packets received _after_ the loss.  More specifically, this
+   means that:
+
+   o  it should be possible to interpret the contents of any received
+      packet, irrespective of previous losses as specified in BCP 36
+      [PAYLOADS]; and
+
+   o  the decoder should re-synchronize as quickly as possible (i.e.,
+      the output should quickly converge to the output that would have
+      been obtained if no loss had occurred).
+
+   The constraint of being able to decode any packet implies the
+   following considerations for an audio codec:
+
+   o  The size of a compressed frame must be kept smaller than the MTU
+      to avoid fragmentation;
+
+   o  The interpretation of any parameter encoded in the bit-stream must
+      not depend on information contained in other packets.  For
+      example, it is not acceptable for a codec to allow signaling a
+      mode change in one packet and assume that subsequent frames will
+      be decoded according to that mode.
+
+   Although the interpretation of parameters cannot depend on other
+   packets, it is still reasonable to use some amount of prediction
+   across frames, provided that the predictors can resynchronize quickly
+   in case of a lost packet.  In this case, it is important to use the
+   best compromise between the gain in coding efficiency and the loss in
+   packet loss robustness due to the use of inter-frame prediction.  It
+   is a desirable property for the codec to allow some real-time control
+   of that trade-off, so that it can take advantage of more prediction
+   when the loss rate is small, while being more robust to losses when
+   the loss rate is high.
+
+
+
+
+Valin & Vos                   Informational                     [Page 7]
+
+RFC 6366                Audio Codec Requirements             August 2011
+
+
+   To improve the robustness to packet loss, it would be desirable for
+   the codec to allow an adaptive (data- and network-dependent) amount
+   of side information to help improve audio quality when losses occur.
+   For example, side information may include the retransmission of
+   certain parameters encoded in the previous frame(s).
+
+   To ensure freedom of implementation, decoder-side-only error
+   concealment does not need to be specified, although a functional PLC
+   algorithm is desirable as part of the codec reference implementation.
+   Obviously, any information signaled in the bit-stream intended to aid
+   PLC needs to be specified.
+
+   Another important property of the Internet is that it is mostly a
+   best-effort network, with no guaranteed bandwidth.  This means that
+   the codec has to be able to vary its output bit-rate dynamically (in
+   real time), without requiring an out-of-band signaling mechanism, and
+   without causing audible artifacts at the bit-rate change boundaries.
+   Additional desirable features are:
+
+   o  Having the possibility to use smooth bit-rate changes with one
+      byte/frame resolution;
+
+   o  Making it possible for a codec to adapt its bit-rate based on the
+      source signal being encoded (source-controlled VBR) to maximize
+      the quality for a certain _average_ bit-rate.
+
+   Because the Internet transmits data in bytes, a codec should produce
+   compressed data in integer numbers of bytes.  In general, the codec
+   design should take into consideration explicit congestion
+   notification (ECN) and may include features that would improve the
+   quality of an ECN implementation.
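+
+   To make the one byte/frame resolution concrete, the following
+   Python sketch shows the bit-rate step that one byte per frame
+   represents and how a target bit-rate maps to a whole number of
+   bytes.  The function names and the 20 ms frame size are
+   assumptions chosen for illustration.
+

```python
def bitrate_step_bps(frame_ms):
    """Bit-rate change caused by adding one byte to every frame."""
    return 8 * 1000 / frame_ms


def bytes_per_frame(target_bps, frame_ms):
    """Largest whole number of bytes per frame within target_bps."""
    return int(target_bps * frame_ms // 8000)


def actual_bitrate_bps(frame_bytes, frame_ms):
    """Bit-rate obtained when every frame carries frame_bytes bytes."""
    return frame_bytes * 8 * 1000 / frame_ms


# Hypothetical example: 20 ms frames and a 12.5 kbit/s target.
step = bitrate_step_bps(20)
size = bytes_per_frame(12500, 20)
rate = actual_bitrate_bps(size, 20)
```

+
+   With 20 ms frames, one byte per frame corresponds to 400 bit/s, and
+   a 12.5 kbit/s target rounds down to 31 bytes per frame, that is,
+   12.4 kbit/s.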
+
+   The IETF has defined a set of application-layer protocols to be used
+   for the real-time transport of multimedia data, including voice.
+   Thus, it is important for the resulting codec to be easy to
+   use with these protocols.  For example, it must be possible to create
+   an [RTP] payload format that conforms to BCP 36 [PAYLOADS].  If any
+   codec parameters need to be negotiated between end-points, the
+   negotiation should be as easy as possible to carry over session
+   initiation protocol (SIP) [RFC3261]/ session description protocol
+   (SDP) [RFC4566] or alternatively over extensible messaging and
+   presence protocol (XMPP) [RFC6120] / Jingle [XEP-0167].
+
+5.  Detailed Basic Requirements
+
+   This section summarizes all the constraints imposed by the target
+   applications and by the Internet into a set of actual requirements
+   for codec development.
+
+
+
+Valin & Vos                   Informational                     [Page 8]
+
+RFC 6366                Audio Codec Requirements             August 2011
+
+
+5.1.  Operating Space
+
+   The operating space for the target applications can be divided in
+   terms of delay: most applications require a "medium delay" (20-30
+   ms), while a few require a "very low delay" (< 10 ms).  It makes
+   sense to divide the space based on delay because lowering the delay
+   has a cost in terms of quality versus bit-rate.
+
+   For medium delay, the resulting codec must be able to efficiently
+   operate within the following range of bit-rates (per channel):
+
+   o  Narrowband: 8 kbit/s to 16 kbit/s
+
+   o  Wideband: 12 to 32 kbit/s
+
+   o  Super-wideband: 24 to 64 kbit/s
+
+   o  Full-band: 32 to 80 kbit/s
+
+   Obviously, a lower-delay codec that can operate in the above range is
+   also acceptable.
+
+   For very low delay, the resulting codec will need to operate within
+   the following range of bit-rates (per channel):
+
+   o  Super-wideband: 32 to 80 kbit/s
+
+   o  Full-band: 48 to 128 kbit/s
+
+   o  (Narrowband and wideband not required)
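+
+   The two operating spaces above can be written down as lookup
+   tables.  The following Python sketch is one hypothetical way to
+   check whether a per-channel bit-rate falls inside the stated
+   ranges; the table and function names are illustrative only.
+

```python
# Per-channel bit-rate ranges in kbit/s, copied from the lists above.
MEDIUM_DELAY = {
    "narrowband": (8, 16),
    "wideband": (12, 32),
    "super-wideband": (24, 64),
    "full-band": (32, 80),
}
VERY_LOW_DELAY = {
    "super-wideband": (32, 80),
    "full-band": (48, 128),
}


def in_operating_space(bandwidth, kbps, very_low_delay=False):
    """Return True if kbps lies in the required range for bandwidth."""
    table = VERY_LOW_DELAY if very_low_delay else MEDIUM_DELAY
    if bandwidth not in table:
        # Narrowband and wideband are not required at very low delay.
        return False
    low, high = table[bandwidth]
    return low <= kbps <= high
```

+
+   For example, 24 kbit/s wideband falls inside the medium-delay
+   space, while 32 kbit/s full-band falls inside the medium-delay
+   space but below the very-low-delay range.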
+
+5.2.  Quality and Bit-Rate
+
+   The quality of a codec is directly linked to the bit-rate, so these
+   two must be considered jointly.  When comparing the bit-rate of
+   codecs, the overhead of IP/UDP/RTP headers should not be considered,
+   but any additional bits required in the RTP payload format, after the
+   header (e.g., required signaling), should be considered.  In terms of
+   quality versus bit-rate, the codec to be developed must be better
+   than the following codecs, which are generally considered royalty-
+   free:
+
+   o  For narrowband: Speex (NB) [Speex], and internet low bit-rate
+      codec (iLBC)(*) [RFC3951]
+
+   o  For wideband: Speex (WB) [Speex], G.722.1(*) [ITU.G722.1]
+
+   o  For super-wideband/fullband: G.722.1C(*) [ITU.G722.1]
+
+
+
+Valin & Vos                   Informational                     [Page 9]
+
+RFC 6366                Audio Codec Requirements             August 2011
+
+
+   The codecs marked with (*) have additional licensing restrictions,
+   but the codec to be developed should still not perform significantly
+   worse.  In addition to the quality targets listed above, a desirable
+   objective is for the codec quality to be no worse than Adaptive
+   Multi-Rate (AMR-NB) and Adaptive Multi-Rate Wideband (AMR-WB).
+   Quality should be measured for multiple languages, including tonal
+   languages.  The case of multiple simultaneous voices (as sometimes
+   happens in conferencing) should be evaluated as well.
+
+   The comparison with the above codecs assumes that the codecs being
+   compared have similar delay characteristics.  The bit-rate required,
+   for a certain level of quality, may be higher than the referenced
+   codecs in cases where a much lower delay is required.  In that case,
+   the increase in bit-rate must be less than the ratio between the
+   delays.
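+
+   One possible reading of this constraint is that the ratio of the
+   bit-rates must stay below the ratio of the delays.  The Python
+   sketch below encodes that reading; both the interpretation and the
+   function name are assumptions, and the example figures are
+   hypothetical.
+

```python
def bitrate_increase_ok(ref_kbps, ref_delay_ms, new_kbps, new_delay_ms):
    """Check the assumed reading of the delay/bit-rate trade-off.

    The lower-delay codec passes if its bit-rate, relative to the
    reference codec at equal quality, grows by less than the factor
    by which the delay shrinks.
    """
    bitrate_ratio = new_kbps / ref_kbps
    delay_ratio = ref_delay_ms / new_delay_ms
    return bitrate_ratio < delay_ratio


# Hypothetical case: reference at 32 kbit/s and 20 ms; the candidate
# runs at 5 ms, so the delay ratio is 4.
passes = bitrate_increase_ok(32, 20, 64, 5)
fails = bitrate_increase_ok(32, 20, 128, 5)
```

+
+   Under this reading, a candidate with one quarter of the delay would
+   have to match the reference quality at less than four times the
+   reference bit-rate.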
+
+   It is desirable for the codecs to support source-controlled variable
+   bit-rate (VBR) to take advantage of different inputs that require
+   different bit-rates to achieve the same quality.  However, it should
+   still be possible to use the codec at a truly constant bit-rate to
+   ensure that no information leak is possible when using an encrypted
+   channel.
+
+5.3.  Packet-Loss Robustness
+
+   Robustness to packet loss is a very important aspect of any codec to
+   be used on the Internet.  Codecs must maintain acceptable quality at
+   loss rates up to 5% and maintain good intelligibility up to 15% loss
+   rate.  At any sampling rate, bit-rate, and packet-loss rate, the
+   quality must be no less than the quality obtained with the Speex
+   codec or the Global System for Mobile Communications - Full Rate
+   (GSM-FR) codec in the same conditions.  The actual packet-loss
+   "patterns" to be used in testing must be obtained from real packet-
+   loss traces collected on the Internet, rather than from loss models.
+   These traces should be representative of the typical environments in
+   which the applications of Section 3 operate.  For example, traces
+   related to VoIP calls should consider the loss patterns observed for
+   typical home broadband and corporate connections.
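+
+   When working with loss traces, it can help to summarize each trace
+   before using it in a test.  The Python sketch below assumes a trace
+   is a list of booleans (True for a received packet), which is a
+   hypothetical representation; the example trace is synthetic and
+   only demonstrates the arithmetic, whereas real tests are expected
+   to use traces collected on the Internet.
+

```python
def loss_rate(trace):
    """Fraction of packets marked as lost (False) in the trace."""
    lost = sum(1 for received in trace if not received)
    return lost / len(trace)


def longest_burst(trace):
    """Length of the longest run of consecutive lost packets."""
    longest = run = 0
    for received in trace:
        run = 0 if received else run + 1
        longest = max(longest, run)
    return longest


def quality_target(rate):
    """Map a loss rate to the target stated in this section."""
    if rate <= 0.05:
        return "acceptable quality"
    if rate <= 0.15:
        return "good intelligibility"
    return "outside stated targets"


# Synthetic trace for illustration only: 20 packets, 2 lost in a row.
example_trace = [True] * 17 + [False] * 2 + [True]
example_rate = loss_rate(example_trace)
```

+
+   The synthetic trace has a 10% loss rate with a burst of two
+   packets, so it falls under the intelligibility target rather than
+   the 5% quality target.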
+
+5.4.  Computational Resources
+
+   The resulting codec should be implementable on a wide range of
+   devices, so there should be a fixed-point implementation or at least
+   assurance that a reasonable fixed-point implementation is possible.
+   The computational resource figures listed below are meant to be upper
+   bounds.  Even below these bounds, resources should still be
+   minimized.  Any proposed increase in computational resources
+   consumption (e.g., to increase quality) should be carefully evaluated
+
+
+
+Valin & Vos                   Informational                    [Page 10]
+
+RFC 6366                Audio Codec Requirements             August 2011
+
+
+   even if the resulting resource consumption is below the upper bound.
+   Having variable complexity would be useful (but not required) in
+   achieving that goal as it would allow trading quality/bit-rate for
+   lower complexity.
+
+   The computational requirements for real-time encoding and decoding of
+   a mono signal on one core of a recent x86 CPU (as measured with the
+   Unix "time" utility or equivalent) are as follows:
+
+   o  Narrowband: 40 megahertz (MHz) (2% of a 2 gigahertz (GHz) CPU
+      core)
+
+   o  Wideband: 80 MHz (4% of a 2 GHz CPU core)
+
+   o  Super-wideband/fullband: 200 MHz (10% of a 2 GHz CPU core)
+
+   It is desirable that the MHz values listed above also be achievable
+   on fixed-point digital signal processors that are capable of single-
+   cycle multiply-accumulate operations (16x16 multiplication
+   accumulated into 32 bits).
+
+   For applications that require mixing (e.g., conferencing), it should
+   be possible to estimate the energy and/or the voice activity status
+   of the decoded signal with less than 10% of the complexity figures
+   listed above.
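+
+   A measurement made with the "time" utility can be converted to the
+   MHz figures above by scaling the CPU clock by the fraction of real
+   time spent encoding and decoding.  The Python sketch below shows
+   that conversion; the function names and the measured values are
+   hypothetical.
+

```python
# Upper bounds in MHz, copied from the list above.
LIMITS_MHZ = {
    "narrowband": 40,
    "wideband": 80,
    "super-wideband/fullband": 200,
}


def required_mhz(cpu_seconds, audio_seconds, clock_mhz):
    """CPU time per second of audio, scaled by the core clock."""
    return cpu_seconds / audio_seconds * clock_mhz


def within_limit(bandwidth, cpu_seconds, audio_seconds, clock_mhz):
    """True if the measured cost does not exceed the upper bound."""
    used = required_mhz(cpu_seconds, audio_seconds, clock_mhz)
    return used <= LIMITS_MHZ[bandwidth]


def activity_estimate_budget_mhz(bandwidth):
    """Bound (exclusive) for energy or voice-activity estimation."""
    return LIMITS_MHZ[bandwidth] / 10


# Hypothetical measurement: 1 s of CPU time for 64 s of audio on a
# 2 GHz core.
example_mhz = required_mhz(1.0, 64.0, 2000)
```

+
+   In this hypothetical measurement, the codec uses 31.25 MHz, which
+   is within the narrowband bound of 40 MHz; voice activity estimation
+   for narrowband would need to stay below 4 MHz.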
+
+   It is the intent to maximize the range of devices on which a codec
+   can be implemented.  Therefore, the reference implementation must not
+   depend on special hardware features or instructions to be present in
+   order to meet the complexity requirement.  However, it may be
+   desirable to take advantage of such hardware when available (e.g.,
+   hardware accelerators for operations like Fast Fourier Transforms
+   (FFT) and convolutions).  A codec should also minimize the use of
+   saturating arithmetic so as to be implementable on architectures that
+   do not provide hardware saturation (e.g., ARMv4).
+
+   The combined codec size and data read-only memory (ROM) should be
+   small enough not to cause significant implementation problems on
+   typical embedded devices.  The codec context/state size required
+   should be no more than 2*R*C bytes in floating-point, where R is the
+   sampling rate and C is the number of channels.  For fixed-point, that
+   size should be less than R*C bytes.  The scratch space required
+   should also be less than 2*R*C bytes for floating-point or less than
+   R*C bytes for fixed-point.
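+
+   The memory bounds above are simple products of the sampling rate
+   and channel count.  The Python sketch below evaluates them for a
+   given configuration; the function name and the 48 kHz stereo
+   example are hypothetical.
+

```python
def memory_bounds_bytes(sample_rate_hz, channels):
    """Bounds on state size and scratch space, in bytes.

    Floating-point implementations are bounded by 2*R*C and
    fixed-point implementations by R*C, where R is the sampling rate
    in Hz and C is the number of channels.  The same bounds apply to
    the codec state and to the scratch space.
    """
    return {
        "floating-point": 2 * sample_rate_hz * channels,
        "fixed-point": sample_rate_hz * channels,
    }


# Hypothetical configuration: 48 kHz, two channels.
example_bounds = memory_bounds_bytes(48000, 2)
```

+
+   For 48 kHz stereo, this gives 192000 bytes for floating-point and
+   96000 bytes for fixed-point.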
+
+
+
+
+
+
+
+Valin & Vos                   Informational                    [Page 11]
+
+RFC 6366                Audio Codec Requirements             August 2011
+
+
+6.  Additional Considerations
+
+   There are additional features or characteristics that may be
+   desirable under some circumstances, but should not be part of the
+   strict requirements.  The benefit of meeting these considerations
+   should be weighed against the associated cost.
+
+6.1.  Low-Complexity Audio Mixing
+
+   In many applications that require a mixing server (e.g.,
+   conferencing, games), it is important to minimize the computational
+   cost of the mixing.  As much as possible, it should be possible to
+   perform the mixing with fewer computations than it would take to
+   decode all the streams, mix them, and re-encode the result.
+   Properties that reduce the complexity of the mixing process include:
+
+   o  The ability to derive sufficient parameters, such as loudness
+      and/or spectral envelope, for estimating voice activity of a
+      compressed frame without fully decoding that frame;
+
+   o  The ability to mix the streams in an intermediate representation
+      (e.g., transform domain), rather than having to fully decode the
+      signals before the mixing;
+
+   o  The use of bit-stream layers (Section 6.3) by aggregating a small
+      number of active streams at lower quality.
+
+   For conferencing applications, the total complexity of the decoding,
+   voice activity detection (VAD), and mixing should be considered when
+   evaluating proposals.
+
+6.2.  Encoder Side Potential for Improvement
+
+   In many codecs, it is possible to improve the quality by improving
+   the encoder without breaking compatibility (i.e., without changing
+   the decoder).  Potential for improvement varies from one codec to
+   another.  It is generally low for pulse code modulation (PCM) or
+   adaptive differential pulse code modulation (ADPCM) codecs and higher
+   for perceptual transform codecs.  All things being equal, being able
+   to improve a codec after the bit-stream is frozen is a desirable
+   property.  However, this should not be done at the expense of quality
+   in the reference encoder.  Other potential improvements include
+   signal-adaptive frame size selection and improved discontinuous
+   transmission (DTX) algorithms that take advantage of predicting the
+   decoder side's packet-loss concealment (PLC) algorithms.
+
+
+
+
+
+
+Valin & Vos                   Informational                    [Page 12]
+
+RFC 6366                Audio Codec Requirements             August 2011
+
+
+6.3.  Layered Bit-Stream
+
+   A layered codec makes it possible to transmit only a certain subset
+   of the bits and still obtain a valid bit-stream with a quality that
+   is equivalent to the quality that would be obtained from encoding at
+   the corresponding rate.  While this is not a necessary feature for
+   most applications, it can be desirable for cases where a "mixing
+   server" needs to handle a large number of streams with limited
+   computational resources.
+
+6.4.  Partial Redundancy
+
+   One possible way of increasing robustness to packet loss is to
+   include partial redundancy within packets.  This can be achieved
+   either by including the base layer of the previous frame (for a
+   layered codec) or by transmitting other parameters from the previous
+   frame(s) to assist the PLC algorithm in case of loss.  The ability to
+   include partial redundancy for high-loss scenarios is desirable,
+   provided that the feature can be dynamically turned on or off (so
+   that no bandwidth is wasted in case of loss-free transmission).
+
+6.5.  Stereo Support
+
+   It is highly desirable for the codec to have stereo support.  At a
+   minimum, the codec should be able to encode two channels
+   independently without causing significant stereo image artifacts.  It
+   is also desirable for the codec to take advantage of the inter-
+   channel redundancy in stereo audio to reduce the bit-rate (for an
+   equivalent quality) of stereo audio compared to coding channels
+   independently.
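The text does not prescribe how inter-channel redundancy should be exploited. One common technique, shown here only as an assumption-laden sketch, is mid/side coding: the encoder codes the average and the half-difference of the two channels, and the side signal carries little energy when the channels are similar. The function names and sample values are illustrative.

```python
from typing import List, Tuple


def to_mid_side(left: List[float],
                right: List[float]) -> Tuple[List[float], List[float]]:
    """Convert left/right samples to mid (average) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid: List[float],
                  side: List[float]) -> Tuple[List[float], List[float]]:
    """Invert to_mid_side: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(signal: List[float]) -> float:
    """Sum of squared samples, used here as a crude proxy for coding cost."""
    return sum(v * v for v in signal)
```

For strongly correlated channels the side signal is close to zero, so fewer bits are needed for it than for a second independently coded channel. For uncorrelated channels the transform gives no such gain, which is why independent coding remains the minimum requirement.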
+
+6.6.  Bit Error Robustness
+
+   The vast majority of Internet-based applications do not need to be
+   robust to bit errors because packets either arrive unaltered or do
+   not arrive at all.  Therefore, the emphasis should be on packet-loss
+   robustness and packet-loss concealment.  That said, extra robustness
+   to bit errors can often be achieved at no cost at all (i.e., no
+   increase in size, complexity, or bit-rate, and no decrease in
+   quality, packet-loss robustness, etc.).  In those cases, it is
+   useful to make a change that increases the robustness to bit errors.
+   This can benefit applications that use UDP-Lite transmission (e.g.,
+   over a wireless LAN).  Robustness to packet loss should *never* be
+   sacrificed to achieve higher bit error robustness.
+
+6.7.  Time Stretching and Shortening
+
+   When adaptive jitter buffers are used, it is often necessary to
+   stretch or shorten the audio signal to allow changes in buffering.
+   While this operation can be performed directly on the decoder's
+   output, it is often more computationally efficient to stretch or
+   shorten the signal directly within the decoder.  It is desirable for
+   the reference implementation to provide a time stretching/shortening
+   implementation, although it should not be normative.
+
+6.8.  Input Robustness
+
+   The systems providing input to the encoder and receiving output from
+   the decoder may be far from ideal in actual use.  Input and output
+   audio streams may be corrupted by compounding non-linear artifacts
+   from analog hardware and digital processing.  The codecs to be
+   developed should be tested to ensure that they degrade gracefully
+   under adverse audio input conditions.  Types of digital corruption
+   that may be tested include tandeming, transcoding, low-quality
+   resampling, and digital clipping.  Types of analog corruption that
+   may be tested include microphones with substantial background noise,
+   analog clipping, and loudspeaker distortion.  No specific end-to-end
+   quality requirements are mandated for use with the proposed codec.
+   It is advisable, however, that several typical in situ environments/
+   processing chains be specified for the purpose of benchmarking end-
+   to-end quality with the proposed codec.
+
+6.9.  Support of Audio Forensics
+
+   Emergency calls can be analyzed using audio forensics if the context
+   and situation of the caller have to be identified.  Thus, it is
+   important to transmit not only the voice of the caller well, but
+   also the background noise at high quality.  In these situations,
+   sounds or noises of low volume should not be compressed or dropped.
+   Therefore, the encoder must allow DTX to be disabled when required
+   (e.g., for emergency calls).
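The requirement can be pictured with the deliberately simplified sketch below. It assumes DTX is driven by a per-frame energy threshold; real DTX schemes typically use voice activity detection and send occasional comfort-noise updates. The function name, the `dtx_enabled` flag, and the threshold value are hypothetical.

```python
from typing import List


def frames_to_send(energies: List[float], dtx_enabled: bool,
                   silence_threshold: float = 0.01) -> List[bool]:
    """Return, for each frame, whether the encoder transmits it."""
    if not dtx_enabled:
        # DTX disabled (e.g., emergency call): every frame is transmitted,
        # so quiet background sounds remain available for analysis.
        return [True] * len(energies)
    # DTX enabled: frames below the threshold are treated as silence.
    return [e >= silence_threshold for e in energies]
```

With DTX enabled, the two quiet frames in `[0.5, 0.001, 0.002, 0.4]` are suppressed; with DTX disabled, all four frames are sent.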
+
+6.10.  Legacy Compatibility
+
+   In order to create the best possible codec for the Internet, there is
+   no requirement for compatibility with legacy Internet codecs.
+
+7.  Security Considerations
+
+   Although this document itself does not have security considerations,
+   this section describes the security requirements for the codec.
+
+   As for any protocol to be used over the Internet, security is a very
+   important aspect to consider.  This goes beyond the obvious
+   considerations of preventing buffer overflows and similar attacks
+   that can lead to denial-of-service (DoS) or remote code execution.
+   One very important security aspect is to make sure that the decoders
+   have a bounded and reasonable worst-case complexity.  This prevents
+   an attacker from causing a DoS by sending packets that are specially
+   crafted to take a very long (or infinite) time to decode.
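A minimal sketch of the bounded-complexity idea follows, using a toy run-length format rather than any real codec bit-stream. Each run is assumed to be three bytes: a 16-bit big-endian repeat count and one sample value. The decoder checks the declared size against a per-packet budget before doing the work. The format, the `MAX_SAMPLES` value, and the function name are assumptions made for illustration.

```python
from typing import List

MAX_SAMPLES = 2880  # hypothetical per-packet budget (60 ms at 48 kHz)


def decode_runs(packet: bytes, max_samples: int = MAX_SAMPLES) -> List[int]:
    """Decode [2-byte count][1-byte value] runs with a hard output bound."""
    if len(packet) % 3 != 0:
        raise ValueError("truncated run")
    out: List[int] = []
    for i in range(0, len(packet), 3):
        count = int.from_bytes(packet[i:i + 2], "big")
        value = packet[i + 2]
        # Reject before allocating: the attacker controls `count`.
        if len(out) + count > max_samples:
            raise ValueError("packet exceeds per-packet sample budget")
        out.extend([value] * count)
    return out
```

The work done is bounded by the packet length plus `max_samples`, regardless of the counts an attacker declares inside the packet.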
+
+   A more subtle aspect is the information leak that can occur when the
+   codec is used over an encrypted channel (e.g., [SRTP]).  For example,
+   it was suggested [wright08] [white11] that use of source-controlled
+   VBR may reveal some information about a conversation through the size
+   of the compressed packets.  Therefore, it should be possible to use
+   the codec at a truly constant bit-rate, if needed.
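The observable property at stake is that every packet has the same size, so the lengths seen on an encrypted channel carry no information. The sketch below shows that property by padding variable-size frames to a fixed size. It is an illustration only: a codec with a true constant bit-rate mode would spend the available bits on quality rather than on padding. The two-byte length prefix and the function names are hypothetical.

```python
def pad_to_constant_size(frame: bytes, packet_size: int) -> bytes:
    """Prefix a 2-byte length, then zero-pad to exactly `packet_size` bytes."""
    if len(frame) + 2 > packet_size:
        raise ValueError("frame does not fit; the encoder must lower its rate")
    padding = bytes(packet_size - 2 - len(frame))
    return len(frame).to_bytes(2, "big") + frame + padding


def unpad(packet: bytes) -> bytes:
    """Recover the original frame from a padded packet."""
    length = int.from_bytes(packet[:2], "big")
    return packet[2:2 + length]
```

After padding, a short frame and a long frame produce packets of identical length, and `unpad` returns each original frame unchanged.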
+
+8.  Acknowledgments
+
+   We would like to thank all the people who contributed directly or
+   indirectly to this document, including Slava Borilin, Christopher
+   Montgomery, Raymond (Juin-Hwey) Chen, Jason Fischl, Gregory Maxwell,
+   Alan Duric, Jonathan Christensen, Julian Spittka, Michael Knappe,
+   Christian Hoene, and Henry Sinnreich.  We would also like to thank
+   Cullen Jennings, Jonathan Rosenberg, and Gregory Lebovitz for their
+   advice.
+
+9.  Informative References
+
+   [RFC3261]    Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
+                A., Peterson, J., Sparks, R., Handley, M., and E.
+                Schooler, "SIP: Session Initiation Protocol", RFC 3261,
+                June 2002.
+
+   [RFC4566]    Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
+                Description Protocol", RFC 4566, July 2006.
+
+   [RFC6120]    Saint-Andre, P., "Extensible Messaging and Presence
+                Protocol (XMPP): Core", RFC 6120, March 2011.
+
+   [XEP-0167]   Ludwig, S., Saint-Andre, P., Egan, S., McQueen, R., and
+                D. Cionoiu, "Jingle RTP Sessions", XSF XEP 0167,
+                December 2009.
+
+   [RFC3951]    Andersen, S., Duric, A., Astrom, H., Hagen, R., Kleijn,
+                W., and J. Linden, "Internet Low Bit Rate Codec (iLBC)",
+                RFC 3951, December 2004.
+
+   [ITU.G722.1] International Telecommunications Union, "Low-complexity
+                coding at 24 and 32 kbit/s for hands-free operation in
+                systems with low frame loss", ITU-T Recommendation
+                G.722.1, May 2005.
+
+   [Speex]      Xiph.Org Foundation, "Speex: http://www.speex.org/",
+                2003.
+
+   [carot09]    Carot, A., Werner, C., and T. Fischinger, "Towards a
+                Comprehensive Cognitive Analysis of Delay-Influenced
+                Rhythmical Interaction:
+                http://www.carot.de/icmc2009.pdf", 2009.
+
+   [PAYLOADS]   Handley, M. and C. Perkins, "Guidelines for Writers of
+                RTP Payload Format Specifications", BCP 36, RFC 2736,
+                December 1999.
+
+   [RTP]        Schulzrinne, H., Casner, S., Frederick, R., and V.
+                Jacobson, "RTP: A Transport Protocol for Real-Time
+                Applications", STD 64, RFC 3550, July 2003.
+
+   [SRTP]       Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
+                K. Norrman, "The Secure Real-time Transport Protocol
+                (SRTP)", RFC 3711, March 2004.
+
+   [wright08]   Wright, C., Ballard, L., Coull, S., Monrose, F., and G.
+                Masson, "Spot me if you can: Uncovering spoken phrases
+                in encrypted VoIP conversations:
+                http://www.cs.jhu.edu/~cwright/oakland08.pdf", 2008.
+
+   [white11]    White, A., Matthews, A., Snow, K., and F. Monrose,
+                "Phonotactic Reconstruction of Encrypted VoIP
+                Conversations: Hookt on fon-iks
+                http://www.cs.unc.edu/~fabian/papers/foniks-oak11.pdf",
+                2011.
+
+Authors' Addresses
+
+   Jean-Marc Valin
+   Mozilla
+   650 Castro Street
+   Mountain View, CA 94041
+   USA
+
+   EMail: jmvalin@jmvalin.ca
+
+
+   Koen Vos
+   Skype Technologies, S.A.
+   Stadsgarden 6
+   Stockholm, 11645
+   Sweden
+
+   EMail: koen.vos@skype.net
+