author    Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit    4bfd864f10b68b71482b35c818559068ef8d5797
tree      e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc4352.txt
parent    ea76e11061bda059ae9f9ad130a9895cc85607db
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc4352.txt')
-rw-r--r-- doc/rfc/rfc4352.txt 2131
1 files changed, 2131 insertions, 0 deletions
diff --git a/doc/rfc/rfc4352.txt b/doc/rfc/rfc4352.txt
new file mode 100644
index 0000000..0943dd0
--- /dev/null
+++ b/doc/rfc/rfc4352.txt
@@ -0,0 +1,2131 @@
+
+
+
+
+
+
+Network Working Group                                         J. Sjoberg
+Request for Comments: 4352                                 M. Westerlund
+Category: Standards Track                                       Ericsson
+                                                            A. Lakaniemi
+                                                               S. Wenger
+                                                                   Nokia
+                                                            January 2006
+
+
+ RTP Payload Format for the
+ Extended Adaptive Multi-Rate Wideband (AMR-WB+) Audio Codec
+
+Status of This Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2006).
+
+Abstract
+
+ This document specifies a Real-time Transport Protocol (RTP) payload
+ format for Extended Adaptive Multi-Rate Wideband (AMR-WB+) encoded
+ audio signals. The AMR-WB+ codec is an audio extension of the AMR-WB
+ speech codec. It encompasses the AMR-WB frame types and a number of
+ new frame types designed to support high-quality music and speech. A
+ media type registration for AMR-WB+ is included in this
+ specification.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Sjoberg, et al. Standards Track [Page 1]
+
+RFC 4352 RTP Payload Format for AMR-WB+ January 2006
+
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 2. Definitions .....................................................4
+ 2.1. Glossary ...................................................4
+ 2.2. Terminology ................................................4
+ 3. Background of AMR-WB+ and Design Principles .....................4
+ 3.1. The AMR-WB+ Audio Codec ....................................4
+ 3.2. Multi-rate Encoding and Rate Adaptation ....................8
+ 3.3. Voice Activity Detection and Discontinuous Transmission ....8
+ 3.4. Support for Multi-Channel Session ..........................8
+ 3.5. Unequal Bit-Error Detection and Protection .................9
+ 3.6. Robustness against Packet Loss .............................9
+ 3.6.1. Use of Forward Error Correction (FEC) ...............9
+ 3.6.2. Use of Frame Interleaving ..........................10
+ 3.7. AMR-WB+ Audio over IP Scenarios ...........................11
+ 3.8. Out-of-Band Signaling .....................................11
+ 4. RTP Payload Format for AMR-WB+ .................................12
+ 4.1. RTP Header Usage ..........................................13
+ 4.2. Payload Structure .........................................14
+ 4.3. Payload Definitions .......................................14
+ 4.3.1. Payload Header .....................................14
+ 4.3.2. The Payload Table of Contents ......................15
+ 4.3.3. Audio Data .........................................20
+ 4.3.4. Methods for Forming the Payload ....................21
+ 4.3.5. Payload Examples ...................................21
+ 4.4. Interleaving Considerations ...............................24
+ 4.5. Implementation Considerations .............................25
+ 4.5.1. ISF Recovery in Case of Packet Loss ................26
+ 4.5.2. Decoding Validation ................................28
+ 5. Congestion Control .............................................28
+ 6. Security Considerations ........................................28
+ 6.1. Confidentiality ...........................................29
+ 6.2. Authentication and Integrity ..............................29
+ 7. Payload Format Parameters ......................................29
+ 7.1. Media Type Registration ...................................30
+ 7.2. Mapping Media Type Parameters into SDP ....................32
+ 7.2.1. Offer-Answer Model Considerations ..................32
+ 7.2.2. Examples ...........................................34
+ 8. IANA Considerations ............................................34
+ 9. Contributors ...................................................34
+ 10. Acknowledgements ..............................................34
+ 11. References ....................................................35
+ 11.1. Normative References .....................................35
+ 11.2. Informative References ...................................35
+
+1. Introduction
+
+ This document specifies the payload format for packetization of
+ Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] encoded audio
+ signals into the Real-time Transport Protocol (RTP) [3]. The payload
+ format supports the transmission of mono or stereo audio, aggregating
+ multiple frames per payload, and mechanisms enhancing the robustness
+ of the packet stream against packet loss.
+
+ The AMR-WB+ codec is an extension of the Adaptive Multi-Rate Wideband
+ (AMR-WB) speech codec. New features include extended audio bandwidth
+ to enable high quality for non-speech signals (e.g., music), native
+ support for stereophonic audio, and the option to operate on, and
+ switch between, several internal sampling frequencies (ISFs). The
+ primary usage scenario for AMR-WB+ is transport over IP.
+ Therefore, interworking with other transport networks, as discussed
+ for AMR-WB in [7], is not a major concern and hence not addressed in
+ this memo.
+
+ The expected key application for AMR-WB+ is streaming. To make the
+ packetization process on a streaming server as efficient as possible,
+ an octet-aligned payload format is desirable. Therefore, a
+ bandwidth-efficient mode (as defined for AMR-WB in [7]) is not
+ specified herein; the bandwidth savings of the bandwidth-efficient
+ mode would be very small anyway, since all extension frame types are
+ octet aligned.
+
+ The native stereo encoding capability of AMR-WB+ makes multi-channel
+ transport at the RTP payload format level, as specified for AMR-WB
+ [7], unnecessary. Therefore, this feature is not included in this
+ memo.
+
+ This specification does not include a definition of a file format for
+ AMR-WB+. Instead, it refers to the ISO-based 3GP file format [14],
+ which supports AMR-WB+ and provides all functionality required. The
+ 3GP format also supports storage of AMR, AMR-WB, and many other
+ multi-media formats, thereby allowing synchronized playback.
+
+ The rest of the document is organized as follows: Background
+ information on the AMR-WB+ codec, and design principles, can be found
+ in Section 3. The payload format itself is specified in Section 4.
+ Sections 5 and 6 discuss congestion control and security
+ considerations, respectively. In Section 7, a media type
+ registration is provided.
+
+2. Definitions
+
+2.1. Glossary
+
+ 3GPP - Third Generation Partnership Project
+ AMR - Adaptive Multi-Rate (Codec)
+ AMR-WB - Adaptive Multi-Rate Wideband (Codec)
+ AMR-WB+ - Extended Adaptive Multi-Rate Wideband (Codec)
+ CN - Comfort Noise
+ DTX - Discontinuous Transmission
+ FEC - Forward Error Correction
+ FT - Frame Type
+ ISF - Internal Sampling Frequency
+ SCR - Source-Controlled Rate Operation
+ SID - Silence Indicator (the frames containing only CN
+ parameters)
+ TFI - Transport Frame Index
+ TS - Timestamp
+ UED - Unequal Error Detection
+ UEP - Unequal Error Protection
+ VAD - Voice Activity Detection
+
+2.2. Terminology
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [2].
+
+3. Background of AMR-WB+ and Design Principles
+
+ The Extended Adaptive Multi-Rate Wideband (AMR-WB+) [1] audio codec
+ is designed to compress speech and audio signals at low bit-rates
+ with good quality. The codec is specified by the Third Generation
+ Partnership Project (3GPP). The primary target applications are 1)
+ the packet-switched streaming service (PSS) [13], 2) multimedia
+ messaging service (MMS) [18], and 3) multimedia broadcast and
+ multicast service (MBMS) [19]. However, due to its flexibility and
+ robustness, AMR-WB+ is also well suited for streaming services in
+ other highly varying transport environments, for example, the
+ Internet.
+
+3.1. The AMR-WB+ Audio Codec
+
+ 3GPP originally developed the AMR-WB+ audio codec for streaming and
+ messaging services in Global System for Mobile communications (GSM)
+ and third generation (3G) cellular systems. The codec is designed as
+ an audio extension of the AMR-WB speech codec. The extension adds
+ new functionality to the codec in order to provide high audio quality
+ for a wide range of signals including music. Stereophonic operation
+ has also been added. A new, high-efficiency hybrid stereo coding
+ algorithm enables stereo operation at bit-rates as low as 6.2 kbit/s.
+
+ The AMR-WB+ codec includes the nine frame types specified for AMR-WB,
+ extended by new bit-rates ranging from 5.2 to 48 kbit/s. The AMR-WB
+ frame types can employ only a 16000 Hz sampling frequency and operate
+ only on monophonic signals. The newly introduced extension frame
+ types, however, can operate at a number of internal sampling
+ frequencies (ISFs), both in mono and stereo. Please see Table 24 in
+ [1] for details. The output sampling frequency of the decoder is
+ limited to 8, 16, 24, 32, or 48 kHz.
+
+ An overview of the AMR-WB+ encoding operations is provided as
+ follows. The encoder receives the audio sampled at, for example, 48
+ kHz. The encoding process starts with pre-processing and resampling
+ to the user-selected ISF. The encoding is performed on equally sized
+ super-frames. Each super-frame corresponds to 2048 samples per
+ channel, at the ISF. The codec carries out a number of encoding
+ decisions for each super-frame, thereby choosing between different
+ encoding algorithms and block lengths, so as to achieve a fidelity-
+ optimized encoding adapted to the signal characteristics of the
+ source. The stereo encoding (if used) executes separately from the
+ monophonic core encoding, thus enabling the selection of different
+ combinations of core and stereo encoding rates. The resulting
+ encoded audio is produced in four transport frames of equal length.
+ Each transport frame corresponds to 512 samples at the ISF and is
+ individually usable by the decoder, provided that its position in the
+ super-frame structure is known.
+
+ The codec supports 13 different ISFs, ranging from 12.8 to 38.4 kHz,
+ as described by Table 24 of [1]. The high number of ISFs allows a
+ trade-off between the audio bandwidth and the target bit-rate. As
+ encoding is performed on 2048 samples at the ISF, the duration of a
+ super-frame and the effective bit-rate of the frame type in use
+ vary.
+
+ The ISF of 25600 Hz has a super-frame duration of 80 ms. This is the
+ 'nominal' value used to describe the encoding bit-rates henceforth.
+ Assuming this normalization, the ISF selection results in bit-rate
+ variations from 1/2 up to 3/2 of the nominal bit-rate.
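+
+ As a minimal sketch (Python is chosen only for illustration), these
+ relationships can be computed directly from the sample counts given
+ in this section: 2048 samples per super-frame, 512 per transport
+ frame, and a nominal ISF of 25600 Hz. The function names are
+ hypothetical and not part of any codec API. Exact results assume an
+ integer ISF. Values that Table 1 lists as rounded (e.g., 17067 Hz)
+ would need their exact fractional form.

```python
from fractions import Fraction

SUPERFRAME_SAMPLES = 2048      # samples per channel in one super-frame, at the ISF
TRANSPORT_FRAME_SAMPLES = 512  # one of the four transport frames of a super-frame
NOMINAL_ISF_HZ = 25600         # ISF whose super-frame lasts exactly 80 ms


def superframe_ms(isf_hz: int) -> Fraction:
    """Super-frame duration in milliseconds at the given ISF."""
    return Fraction(SUPERFRAME_SAMPLES * 1000, isf_hz)


def transport_frame_ms(isf_hz: int) -> Fraction:
    """Transport frame duration in milliseconds (a quarter of a super-frame)."""
    return Fraction(TRANSPORT_FRAME_SAMPLES * 1000, isf_hz)


def bitrate_factor(isf_hz: int) -> Fraction:
    """Factor applied to the nominal (25600 Hz) bit-rate at the given ISF."""
    return Fraction(isf_hz, NOMINAL_ISF_HZ)


print(superframe_ms(25600))       # 80
print(transport_frame_ms(38400))  # 40/3, i.e., about 13.33 ms
print(bitrate_factor(12800), bitrate_factor(38400))  # 1/2 3/2
```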
+
+ The encoding for the extension modes is performed as one monophonic
+ core encoding and one stereo encoding. The core encoding is executed
+ by splitting the monophonic signal into a lower and a higher
+ frequency band. The lower band is encoded employing either algebraic
+ code excited linear prediction (ACELP) or transform coded excitation
+ (TCX). This selection can be made once per transport frame, but must
+ obey certain limitations of legal combinations within the super-
+ frame. The higher band is encoded using a low-rate parametric
+ bandwidth extension approach.
+
+ The stereo signal is encoded employing a similar frequency band
+ decomposition; however, here the signal is divided into three bands
+ that are individually parameterized.
+
+ The total bit-rate produced by the extension is the result of the
+ combination of the encoder's core rate, stereo rate, and ISF. The
+ extension supports 8 different core encoding rates, producing bit-
+ rates between 10.4 and 24.0 kbit/s; see Table 22 in [1]. There are
+ 16 stereo encoding rates generating bit-rates between 2.0 and 8.0
+ kbit/s; see Table 23 in [1]. The frame type uniquely identifies the
+ AMR-WB modes, 4 fixed extension rates (see below), 24 combinations of
+ core and stereo rates for stereo signals, and the 8 core rates for
+ mono signals, as listed in Table 25 in [1]. This implies that
+ AMR-WB+ supports encoding rates between 10.4 and 32 kbit/s, assuming
+ an ISF of 25600 Hz.
+
+ Different ISFs allow for additional freedom in the produced bit-rates
+ and audio quality. The selection of an ISF changes the available
+ audio bandwidth of the reconstructed signal, and also the total bit-
+ rate. The bit-rate for a given combination of frame type and ISF is
+ determined by multiplying the frame type's bit-rate with the used
+ ISF's bit-rate factor; see Table 24 in [1].
+
+ The extension also has four frame types which have fixed ISFs.
+ Please see frame types 10-13 in Table 21 in [1]. These four pre-
+ defined frame types have a fixed input sampling frequency at the
+ encoder, which can be set at either 16 or 24 kHz. Like the AMR-WB
+ frame types, transport frames encoded utilizing these frame types
+ represent exactly 20 ms of the audio signal. However, they are also
+ part of 80 ms super-frames. Frame types 0-13 (AMR-WB and fixed
+ extension rates), as listed in Table 21 in [1], do not require an
+ explicit ISF indication. The other frame types, 14-47, require the
+ ISF employed to be indicated.
+
+ The 32 different frame types of the extension, in combination with 13
+ ISFs, allow great flexibility in bit-rate and selection of the
+ desired audio quality. A number of combinations exist that produce
+ the same codec bit-rate. For example, a 32 kbit/s audio stream can
+ be produced by utilizing frame type 41 (i.e., 25.6 kbit/s) and the
+ ISF of 32 kHz (5/4 * (19.2 + 6.4) = 32 kbit/s), or frame type 47 and the
+ ISF of 25.6 kHz (1 * (24 + 8) = 32 kbit/s). Which combination is
+ more beneficial for the perceived audio quality depends on the
+ content. In the above example, the first case provides a higher
+ audio bandwidth, while the second one spends the same number of bits
+ on somewhat narrower audio bandwidth but provides higher fidelity.
+ Encoders are free to select the combination they deem most
+ beneficial.
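+
+ The bit-rate rule above (the frame type's bit-rate multiplied by the
+ ISF's bit-rate factor) can be checked against the 32 kbit/s example
+ with a short sketch. It hard-codes only the two nominal rates quoted
+ in this section. The table name is hypothetical, and a real
+ implementation would take the full frame type table from [1].

```python
from fractions import Fraction

NOMINAL_ISF_HZ = 25600

# Nominal (ISF = 25600 Hz) core + stereo bit-rates in kbit/s, copied from the
# example in the text only; this is not the complete frame type table of [1].
NOMINAL_KBPS = {
    41: Fraction("19.2") + Fraction("6.4"),
    47: Fraction(24) + Fraction(8),
}


def effective_kbps(frame_type: int, isf_hz: int) -> Fraction:
    """Effective bit-rate: nominal frame type bit-rate times ISF / 25600."""
    return NOMINAL_KBPS[frame_type] * Fraction(isf_hz, NOMINAL_ISF_HZ)


print(effective_kbps(41, 32000))  # 32
print(effective_kbps(47, 25600))  # 32
```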
+
+ Since a transport frame always corresponds to 512 samples at the ISF
+ in use, its duration is limited to the range 13.33 to 40 ms; see
+ Table 1. An RTP timestamp clock rate of 72000 Hz, as mandated by this
+ specification, results in AMR-WB+ transport frame lengths of 960 to
+ 2880 timestamp ticks, depending solely on the selected ISF.
+
+   Index   ISF (Hz)   Duration (ms)   Duration (TS ticks @ 72 kHz)
+   ---------------------------------------------------------------
+   0       N/A        20              1440
+   1       12800      40              2880
+   2       14400      35.56           2560
+   3       16000      32              2304
+   4       17067      30              2160
+   5       19200      26.67           1920
+   6       21333      24              1728
+   7       24000      21.33           1536
+   8       25600      20              1440
+   9       28800      17.78           1280
+   10      32000      16              1152
+   11      34133      15              1080
+   12      36000      14.22           1024
+   13      38400      13.33           960
+
+     Table 1: Normative number of RTP Timestamp Ticks for each
+              Transport Frame depending on ISF (ISF and Duration in
+              ms are rounded)
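+
+ The tick column of Table 1 follows from ticks = 512 * 72000 / ISF.
+ The sketch below recomputes it from the rounded ISF values in the
+ table. The division is exact for every ISF except 17067, 21333, and
+ 34133 Hz, which are themselves rounded. For those three, rounding the
+ quotient to the nearest integer recovers the normative values. This
+ is illustrative only: the normative values are those in Table 1, the
+ mapping name is hypothetical, and the sketch assumes the Index column
+ is the value carried in the ISF field of the payload header.

```python
RTP_CLOCK_HZ = 72000
TRANSPORT_FRAME_SAMPLES = 512

# ISF field index -> ISF in Hz, as listed (rounded) in Table 1.
# Index 0 covers frame types with a fixed 20 ms transport frame.
ISF_BY_INDEX = {
    1: 12800, 2: 14400, 3: 16000, 4: 17067, 5: 19200, 6: 21333, 7: 24000,
    8: 25600, 9: 28800, 10: 32000, 11: 34133, 12: 36000, 13: 38400,
}


def ticks_per_transport_frame(isf_index: int) -> int:
    """RTP timestamp ticks (72 kHz clock) covered by one transport frame."""
    if isf_index == 0:
        return RTP_CLOCK_HZ * 20 // 1000  # fixed 20 ms frames -> 1440 ticks
    isf_hz = ISF_BY_INDEX[isf_index]
    # Nearest integer absorbs the rounding of 17067, 21333, and 34133 Hz.
    return round(TRANSPORT_FRAME_SAMPLES * RTP_CLOCK_HZ / isf_hz)


for index in range(14):
    print(index, ticks_per_transport_frame(index))
```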
+
+ The encoder is free to change both the ISF and the encoding frame
+ type (both mono and stereo) during a session. For the extension
+ frame types with index 10-13 and 16-47, the ISF and frame type
+ changes are constrained to occur at super-frame boundaries. This
+ implies that, for the frame types mentioned, the ISF is constant
+ throughout a super-frame. This limitation does not apply to frame
+ types with index 0-9, 14, and 15, i.e., the original AMR-WB frame
+ types.
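+
+ The frame type ranges above, together with the rule that extension
+ frame types change ISF or frame type only at super-frame boundaries,
+ can be expressed as a small check. This is a sketch that assumes each
+ transport frame is described by a hypothetical
+ (frame_type, isf_index, position) tuple, where position is 0-3
+ within the super-frame. The function and set names are illustrative.

```python
AMR_WB_TYPES = set(range(0, 10)) | {14, 15}
FIXED_ISF_EXTENSION_TYPES = set(range(10, 14))
VARIABLE_ISF_EXTENSION_TYPES = set(range(16, 48))


def classify(frame_type: int) -> str:
    """Group a frame type according to the ranges given in the text."""
    if frame_type in AMR_WB_TYPES:
        return "AMR-WB"
    if frame_type in FIXED_ISF_EXTENSION_TYPES:
        return "extension, fixed ISF"
    if frame_type in VARIABLE_ISF_EXTENSION_TYPES:
        return "extension, signaled ISF"
    raise ValueError(f"unknown frame type {frame_type}")


def changes_on_boundaries(frames) -> bool:
    """Check (frame_type, isf_index, position) tuples in transmission order.

    An extension frame at position 1-3 of a super-frame must repeat the
    frame type and ISF of the preceding transport frame. AMR-WB frames are
    not checked, since the constraint does not apply to them.
    """
    for prev, cur in zip(frames, frames[1:]):
        frame_type, isf_index, position = cur
        if position != 0 and classify(frame_type) != "AMR-WB":
            if (frame_type, isf_index) != prev[:2]:
                return False
    return True


ok = [(47, 8, 0), (47, 8, 1), (47, 8, 2), (47, 8, 3), (41, 10, 0)]
bad = [(47, 8, 0), (41, 8, 1)]  # frame type changes mid super-frame
print(changes_on_boundaries(ok), changes_on_boundaries(bad))  # True False
```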
+
+ A number of features of the AMR-WB+ codec require special
+ consideration from a transport point of view, and solutions that
+ could perhaps be viewed as unorthodox. First, there are constraints
+ on the RTP timestamping, due to the relationship of the frame
+ duration and the ISFs. Second, each frame of encoded audio must
+ maintain information about its frame type, ISF, and position in the
+ super-frame.
+
+3.2. Multi-rate Encoding and Rate Adaptation
+
+ The multi-rate encoding capability of AMR-WB+ is designed to preserve
+ high audio quality under a wide range of bandwidth requirements and
+ transmission conditions.
+
+ AMR-WB+ enables seamless switching between frame types that use the
+ same number of audio channels and the same ISF. Every AMR-WB+ codec
+ implementation is required to support all frame types defined by the
+ codec and must be able to handle switching between any two frame
+ types. Switching between frame types employing a different number of
+ audio channels or a different ISF must also be supported, but it may
+ not be completely seamless. Therefore, it is recommended to perform
+ such switching infrequently and, if possible, during periods of
+ silence.
+
+3.3. Voice Activity Detection and Discontinuous Transmission
+
+ AMR-WB+ supports the same algorithms as AMR-WB for voice activity
+ detection (VAD) and generation of comfort noise (CN) parameters
+ during silence periods. However, these functionalities can only be
+ used in conjunction with the AMR-WB frame types (FT=0-8). This
+ option allows reducing the number of transmitted bits and packets
+ during silence periods to a minimum. The operation of sending CN
+ parameters at regular intervals during silence periods is usually
+ called discontinuous transmission (DTX) or source-controlled rate
+ (SCR) operation. The AMR-WB+ frames containing CN parameters are
+ called Silence Indicator (SID) frames. More details about the VAD
+ and DTX functionality are provided in [4] and [5].
+
+3.4. Support for Multi-Channel Session
+
+ Some of the AMR-WB+ frame types support the encoding of stereophonic
+ audio. Because of this native support for a two-channel stereophonic
+ signal, it does not seem necessary to support multi-channel transport
+ with separate codec instances, as specified in the AMR-WB RTP payload
+ [7]. The codec has the capability of stereo to mono downmixing as
+ part of the decoding process. Thus, a receiver that is only capable
+ of playout of monophonic audio must still be able to decode and play
+ signals originally encoded and transmitted as stereo. However, to
+ avoid spending bits on a stereo encoding that is not going to be
+ utilized, a mechanism is defined in this specification to signal
+ mono-only audio.
+
+3.5. Unequal Bit-Error Detection and Protection
+
+ The audio bits encoded in each AMR-WB frame are sorted according to
+ their different perceptual sensitivity to bit errors. In cellular
+ systems, for example, this property can be exploited to achieve
+ better voice quality, by using unequal error protection and detection
+ (UEP and UED) mechanisms. However, the bits of the extension frame
+ types of the AMR-WB+ codec do not have a consistent perceptual
+ significance property and are not sorted in this order. Thus, UEP or
+ UED is meaningless with the extension frame types. If there is a
+ need to use UEP or UED for AMR-WB frame types, it is recommended that
+ RFC 3267 [7] be used.
+
+3.6. Robustness against Packet Loss
+
+ The payload format supports two mechanisms to improve robustness
+ against packet loss: simple forward error correction (FEC) and frame
+ interleaving.
+
+3.6.1. Use of Forward Error Correction (FEC)
+
+ Generic forward error correction within RTP is defined, for example,
+ in RFC 2733 [11]. Audio redundancy coding is defined in RFC 2198
+ [12]. Either scheme can be used to add redundant information to the
+ RTP packet stream and make it more resilient to packet losses, at the
+ expense of a higher bit rate. Please see either RFC for a discussion
+ of the implications of the higher bit rate to network congestion.
+
+ In addition to these media-unaware mechanisms, this memo specifies an
+ AMR-WB+ specific form of audio redundancy coding, which may be
+ beneficial in terms of packetization overhead.
+
+ Conceptually, previously transmitted transport frames are aggregated
+ together with new ones. A sliding window is used to group the frames
+ to be sent in each payload. Figure 1 below shows an example.
+
+ --+--------+--------+--------+--------+--------+--------+--------+--
+   | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
+ --+--------+--------+--------+--------+--------+--------+--------+--
+
+   <---- p(n-1) ---->
+            <----- p(n) ----->
+                     <---- p(n+1) ---->
+                              <---- p(n+2) ---->
+                                       <---- p(n+3) ---->
+                                                <---- p(n+4) ---->
+
+ Figure 1: An example of redundant transmission
+
+ Here, each frame is retransmitted once in the following RTP payload
+ packet. The symbols f(n-2)...f(n+4) denote a sequence of audio frames,
+ and p(n-1)...p(n+4) a sequence of payload packets.
+
+ The mechanism described does not require signaling at the session
+ setup. In other words, the audio sender can choose to use this
+ scheme without consulting the receiver. For a certain timestamp, the
+ receiver may receive multiple copies of a frame containing encoded
+ audio data or frames indicated as NO_DATA. The cost of this scheme
+ is bandwidth and the receiver delay necessary to allow the redundant
+ copy to arrive.
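+
+ The sliding-window redundancy of Figure 1 can be sketched as
+ follows. The sketch assumes a window of two frames, 20 ms transport
+ frames (1440 ticks at 72 kHz), and an RTP timestamp equal to that of
+ the first frame in the payload (Section 4.1). The receiver derives
+ each frame's timestamp from its position and keeps one copy per
+ timestamp. Timestamp wrap-around and NO_DATA copies are ignored, and
+ all names are hypothetical.

```python
FRAME_TICKS = 1440  # assumption: 20 ms transport frames at the 72 kHz RTP clock


def packetize(frames, start_ts=0, window=2):
    """Build (rtp_timestamp, frames) payloads that repeat the previous window-1 frames."""
    packets = []
    for i in range(len(frames)):
        first = max(0, i - window + 1)
        # The RTP timestamp is that of the first frame carried in the payload.
        packets.append((start_ts + first * FRAME_TICKS, frames[first:i + 1]))
    return packets


def depacketize(packets):
    """Rebuild the frame sequence, keeping one copy per derived frame timestamp."""
    frames_by_ts = {}
    for ts, payload in packets:
        for j, frame in enumerate(payload):
            frames_by_ts.setdefault(ts + j * FRAME_TICKS, frame)
    return [frames_by_ts[ts] for ts in sorted(frames_by_ts)]


frames = ["f0", "f1", "f2", "f3", "f4"]
packets = packetize(frames)
received = [p for i, p in enumerate(packets) if i != 2]  # lose the third packet
print(depacketize(received))  # ['f0', 'f1', 'f2', 'f3', 'f4']
```

+ In this example, losing the third packet costs no audio, because its
+ frames are repeated in the neighboring packets. The price is roughly
+ twice the amount of audio data on the wire.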
+
+ This redundancy scheme provides a functionality similar to the one
+ described in RFC 2198, but it works only if both original frames and
+ redundant representations are AMR-WB+ frames. When the use of other
+ media coding schemes is desirable, one has to resort to RFC 2198.
+
+ The sender is responsible for selecting an appropriate amount of
+ redundancy based on feedback about the channel conditions, e.g., in
+ the RTP Control Protocol (RTCP) [3] receiver reports. The sender is
+ also responsible for avoiding congestion, which may be exacerbated by
+ redundancy (see Section 5 for more details).
+
+3.6.2. Use of Frame Interleaving
+
+ To decrease protocol overhead, the payload design allows several
+ audio transport frames to be encapsulated into a single RTP packet.
+ One of the drawbacks of such an approach is that in case of packet
+ loss several consecutive frames are lost. Consecutive frame loss
+ normally renders error concealment less efficient and usually causes
+ clearly audible and annoying distortions in the reconstructed audio.
+ Interleaving of transport frames can improve the audio quality in
+ such cases by distributing the consecutive losses into a number of
+ isolated frame losses, which are easier to conceal. However,
+ interleaving and bundling several frames per payload also increases
+ end-to-end delay and sets higher buffering requirements. Therefore,
+ interleaving is not appropriate for all use cases or devices.
+ Streaming applications, however, can usually exploit interleaving to
+ improve audio quality in lossy transmission conditions.
+
+ Note that this payload design supports the use of frame interleaving
+ as an option. The usage of this feature needs to be negotiated in
+ the session setup.
+
+ The interleaving supported by this format is rather flexible. For
+ example, a continuous pattern can be defined, as depicted in Figure
+ 2.
+
+ --+--------+--------+--------+--------+--------+--------+--------+--
+   | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
+ --+--------+--------+--------+--------+--------+--------+--------+--
+
+            [  P(n)  ]
+   [ P(n+1) ]                 [ P(n+1) ]
+                     [ P(n+2) ]                 [ P(n+2) ]
+                                       [ P(n+3) ]                 [P(
+                                                         [ P(n+4) ]
+
+ Figure 2: An example of interleaving pattern that has constant delay
+
+ In Figure 2, the consecutive frames, denoted f(n-2) to f(n+4), are
+ aggregated into packets P(n) to P(n+4), each packet carrying two
+ frames. This approach provides an interleaving pattern that allows
+ for constant delay in both the interleaving and deinterleaving
+ processes. The deinterleaving buffer needs to have room for at least
+ three frames, including the one that is ready to be consumed. The
+ storage space for three frames is needed, for example, when f(n) is
+ the next frame to be decoded: since frame f(n) was received in packet
+ P(n+2), which also carried frame f(n+3), both these frames are stored
+ in the buffer. Furthermore, frame f(n+1) received in the previous
+ packet, P(n+1), is also in the deinterleaving buffer. Note also that
+ in this example the buffer occupancy varies: when frame f(n+1) is the
+ next one to be decoded, there are only two frames, f(n+1) and f(n+3),
+ in the buffer.
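+
+ The buffer occupancy described above can be checked with a small
+ simulation. The numbering is hypothetical: frame index i stands for
+ f(n-4+i) and packet k for P(n+k), so packet k carries frames 2k and
+ 2k+3, matching Figure 2 (for example, P(n+2) carries f(n) and
+ f(n+3)). Packets are received only when the next frame to decode is
+ still missing.

```python
def packet_frames(k: int) -> list[int]:
    """Frame indices carried by packet k in the Figure 2 pattern (hypothetical numbering)."""
    return [f for f in (2 * k, 2 * k + 3) if f >= 0]


def buffer_occupancy(num_frames: int) -> list[int]:
    """Deinterleaving buffer size, counting the frame about to be decoded."""
    buffer: set[int] = set()
    next_packet = -1  # first packet that carries a non-negative frame index
    occupancy = []
    for frame in range(num_frames):
        while frame not in buffer:  # receive packets until the frame arrives
            buffer.update(packet_frames(next_packet))
            next_packet += 1
        occupancy.append(len(buffer))
        buffer.remove(frame)  # decode the frame and release its slot
    return occupancy


print(buffer_occupancy(8))  # [3, 2, 3, 2, 3, 2, 3, 2]
```

+ The alternating three and two frames match the text: room for three
+ frames is needed, even though the buffer holds only two every other
+ frame.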
+
+3.7. AMR-WB+ Audio over IP Scenarios
+
+ Since the primary target application for the AMR-WB+ codec is
+ streaming over packet networks, the most relevant usage scenario for
+ this payload format is IP end-to-end between a server and a terminal,
+ as shown in Figure 3.
+
+   +----------+                          +----------+
+   |          |    IP/UDP/RTP/AMR-WB+    |          |
+   |  SERVER  |<------------------------>| TERMINAL |
+   |          |                          |          |
+   +----------+                          +----------+
+
+ Figure 3: Server to terminal IP scenario
+
+3.8. Out-of-Band Signaling
+
+ Some of the options of this payload format remain constant throughout
+ a session. Therefore, they can be controlled/negotiated at the
+ session setup. Throughout this specification, these options and
+ variables are denoted as "parameters to be established through
+ out-of-band means". In Section 7, all the parameters are formally
+ specified in the form of media type registration for the AMR-WB+
+ encoding. The method used to signal these parameters at session
+ setup or to arrange prior agreement of the participants is beyond the
+ scope of this document; however, Section 7.2 provides a mapping of
+ the parameters into the Session Description Protocol (SDP) [6] for
+ those applications that use SDP.
+
+4. RTP Payload Format for AMR-WB+
+
+ The main emphasis in the payload design for AMR-WB+ has been to
+ minimize the overhead in typical use cases, while providing full
+ flexibility with a slightly higher overhead. In order to keep the
+ specification reasonably simple, we refrained from defining frame-
+ specific parameters for each frame type. Instead, a few common
+ parameters were specified that cover all types of frames.
+
+ The payload format has two modes: basic mode and interleaved mode.
+ The main structural difference between the two modes is the extension
+ of the table of contents entries with frame displacement fields when
+ operating in the interleaved mode. The basic mode supports
+ aggregation of multiple consecutive frames in a payload. The
+ interleaved mode supports aggregation of multiple frames that are
+ non-consecutive in time. In both modes it is possible to have frames
+ encoded with different frame types in the same payload. The ISF must
+ remain constant throughout the payload of a single packet.
+
+ The payload format is designed around the property of AMR-WB+ frames
+ that the frames are consecutive in time and share the same frame
+ duration (in the absence of an ISF change). This enables the
+ receiver to derive the timestamp for an individual frame within a
+ payload. In basic mode, the derivation is based on the order
+ of frames. In interleaved mode, it is based on the compact
+ displacement fields. The frame timestamps are used to regenerate the
+ correct order of frames after reception, identify duplicates, and
+ detect lost frames that require concealment.
+
+ The interleaving scheme of this payload format is significantly more
+ flexible than the one specified in RFC 3267. The AMR and AMR-WB
+ payload format is only capable of using periodic patterns with frames
+ taken from an interleaving group at fixed intervals. The
+ interleaving scheme of this specification, in contrast, allows for
+ any interleaving pattern, as long as the distance in decoding order
+ between any two adjacent frames is not more than 256 frames. Note
+ that even at the highest ISF this allows an interleaving depth of up
+ to 3.41 seconds.
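+
+ The 3.41 second figure follows from 256 transport frames of 512
+ samples each at the highest ISF, 38400 Hz. The arithmetic is
+ sketched below. The constant and function names are illustrative.

```python
from fractions import Fraction

MAX_FRAME_DISTANCE = 256       # largest allowed distance between adjacent frames
TRANSPORT_FRAME_SAMPLES = 512  # samples per transport frame, at the ISF


def max_interleaving_depth_s(isf_hz: int) -> Fraction:
    """Time spanned by 256 transport frames at the given ISF, in seconds."""
    return Fraction(MAX_FRAME_DISTANCE * TRANSPORT_FRAME_SAMPLES, isf_hz)


print(round(float(max_interleaving_depth_s(38400)), 2))  # 3.41 (highest ISF)
print(float(max_interleaving_depth_s(12800)))            # 10.24 (lowest ISF)
```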
+
+ To allow for error resiliency through redundant transmission, the
+ periods covered by multiple packets MAY overlap in time. A receiver
+ MUST be prepared to receive any audio frame multiple times. All
+ redundantly sent frames MUST use the same frame type and ISF, and
+ MUST have the same RTP timestamp, or MUST be a NO_DATA frame (FT=15).
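+
+ A receiver might apply the duplicate rule above roughly as follows.
+ This is a sketch, not normative behavior. Each received copy is
+ modeled as a hypothetical (rtp_timestamp, frame_type, isf_index)
+ tuple, and a copy carrying audio is preferred over a NO_DATA copy.

```python
NO_DATA = 15  # frame type of a NO_DATA frame


def copies_consistent(a, b) -> bool:
    """Check two (rtp_timestamp, frame_type, isf_index) copies of one frame."""
    if a[0] != b[0]:
        raise ValueError("copies belong to different frames")
    if NO_DATA in (a[1], b[1]):
        return True  # a NO_DATA copy is always acceptable
    return a[1:] == b[1:]  # otherwise frame type and ISF must match


def preferred_copy(a, b):
    """Keep a copy that carries audio rather than a NO_DATA placeholder."""
    return b if a[1] == NO_DATA else a
```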
+
+ The payload consists of octet-aligned elements (header, ToC, and
+ audio frames). Only the audio frames for AMR-WB frame types (0-9)
+ require padding for octet alignment. If additional padding is
+ desired, then the P bit in the RTP header MAY be set, and padding MAY
+ be appended as specified in [3].
+
+4.1. RTP Header Usage
+
+ The format of the RTP header is specified in [3]. This payload
+ format uses the fields of the header in a manner consistent with that
+ specification.
+
+ The RTP timestamp corresponds to the sampling instant of the first
+ sample encoded for the first frame in the packet. The timestamp
+ clock frequency SHALL be 72000 Hz. This frequency makes the frame
+ duration an integer number of RTP timestamp ticks for the ISFs
+ specified in Table 1. It also provides reasonable conversion factors to the
+ input/output audio sampling frequencies supported by the codec. See
+ Section 4.3.2.3 for guidance on how to derive the RTP timestamp for
+ any audio frame beyond the first one.
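+
+ For basic mode, where the frames in a payload are consecutive and
+ share one ISF, the derivation can be sketched as follows: add one
+ transport frame duration (from Table 1) per frame to the RTP
+ timestamp, modulo 2^32 because RTP timestamps are 32-bit values [3].
+ This is a simplified illustration, not the procedure of Section
+ 4.3.2.3, and it does not cover interleaved mode. The function name
+ is hypothetical.

```python
RTP_TS_MODULUS = 2**32  # RTP timestamps are 32-bit unsigned integers


def basic_mode_timestamps(rtp_timestamp: int, ticks_per_frame: int,
                          frame_count: int) -> list[int]:
    """Timestamps of consecutive frames in a basic-mode payload.

    rtp_timestamp is the RTP header value (the first frame); ticks_per_frame
    comes from Table 1 for the payload's ISF, which is constant in a payload.
    """
    return [(rtp_timestamp + i * ticks_per_frame) % RTP_TS_MODULUS
            for i in range(frame_count)]


print(basic_mode_timestamps(0, 960, 4))            # [0, 960, 1920, 2880]
print(basic_mode_timestamps(4294966000, 1440, 3))  # [4294966000, 144, 1584]
```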
+
+ The RTP header marker bit (M) SHALL be set to 1 whenever the first
+ frame carried in the packet is the first frame in a talkspurt (see
+ the definition of talkspurt in Section 4.1 of [9]). For all other
+ packets, the marker bit SHALL be set to zero (M=0).
+
+ The assignment of an RTP payload type for the format defined in this
+ memo is outside the scope of this document. The RTP profile in use
+ either assigns a static payload type or mandates binding the payload
+ type dynamically.
+
+ The media type parameter "channels" is used to indicate the maximum
+ number of channels allowed for a given payload type. A payload type
+ where channels=1 (mono) SHALL only carry mono content. A payload
+ type for which channels=2 has been declared MAY carry both mono and
+ stereo content. Note that this definition is different from the one
+ in RFC 3551 [9]. As mentioned before, the AMR-WB+ codec handles the
+ support of stereo content and any necessary downmixing of stereo to
+ mono internally. This makes it unnecessary to negotiate for the
+ number of channels for reasons other than bit-rate efficiency.
+
+4.2. Payload Structure
+
+ The payload consists of a payload header, a table of contents, and
+ the audio data representing one or more audio frames. The following
+ diagram shows the general payload format layout:
+
+ +----------------+-------------------+----------------
+ | payload header | table of contents | audio data ...
+ +----------------+-------------------+----------------
+
+ Payloads containing more than one audio frame are called compound
+ payloads.
+
+ The following sections describe how the payload format varies
+ depending on the mode in use: basic mode or interleaved mode.
+
+4.3. Payload Definitions
+
+4.3.1. Payload Header
+
+ The payload header carries data that is common to all frames in the
+ payload. The structure of the payload header is described below.
+
+    0 1 2 3 4 5 6 7
+   +-+-+-+-+-+-+-+-+
+   |   ISF   |TFI|L|
+   +-+-+-+-+-+-+-+-+
+
+ ISF (5 bits): Indicates the Internal Sampling Frequency employed for
+ all frames in this payload. The index value corresponds to the
+ internal sampling frequency as specified in Table 24 in [1]. This
+ field SHALL be set to 0 for payloads containing frames with Frame
+ Type values 0-13.
+
+ TFI (2 bits): Transport Frame Index, from 0 (first) to 3 (last),
+ indicating the position of the first transport frame of this
+ payload in the AMR-WB+ super-frame structure. For payloads with
+ frames of only Frame Type values 0-9, this field SHALL be set to 0
+ by the sender. The TFI value for a frame of type 0-9 SHALL be
+ ignored by the receiver. Note that the frame type is coded in the
+ table of contents (as discussed later); hence, these frame-type
+ dependencies can be applied by interpreting only the payload header
+ and the table of contents. It is not necessary to interpret the
+ audio bit stream itself.
+
+ L (1 bit): Long displacement field flag for payloads in interleaved
+ mode. If set to 0, four-bit displacement fields are used to
+ indicate interleaving offset; if set to 1, displacement fields of
+ eight bits are used (see Section 4.3.2.2). For payloads in the
+ basic mode, this bit SHALL be set to 0 and SHALL be ignored by the
+ receiver.
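+
+ A receiver needs to split this single octet into its three fields.
+ The following Python sketch does so under the usual RFC convention
+ that bit 0 in the diagram is the most significant bit of the octet.
+ The names PayloadHeader, parse_payload_header, and
+ build_payload_header are hypothetical helpers, not part of any
+ existing library.
+
```python
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    isf: int        # 5-bit ISF index (Table 24 of [1])
    tfi: int        # 2-bit Transport Frame Index, 0..3
    long_dis: bool  # L bit: True selects 8-bit displacement fields


def parse_payload_header(octet: int) -> PayloadHeader:
    """Split the one-octet AMR-WB+ payload header into ISF, TFI, and L."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("the payload header is a single octet")
    return PayloadHeader(isf=octet >> 3,               # bits 0-4
                         tfi=(octet >> 1) & 0x03,      # bits 5-6
                         long_dis=bool(octet & 0x01))  # bit 7


def build_payload_header(isf: int, tfi: int, long_dis: bool = False) -> int:
    """Inverse of parse_payload_header (L defaults to 0, as in basic mode)."""
    if not (0 <= isf <= 31 and 0 <= tfi <= 3):
        raise ValueError("ISF must fit in 5 bits and TFI in 2 bits")
    return (isf << 3) | (tfi << 1) | int(long_dis)
```
+
+ For instance, the header of the basic mode example in Section 4.3.5.1
+ (ISF = 8, TFI = 2, L = 0) is the octet 0x44, and
+ parse_payload_header(0x44) yields PayloadHeader(isf=8, tfi=2,
+ long_dis=False).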
+
+ Note that frames employing different ISF values require encapsulation
+ in separate packets. Thus, special considerations apply when
+ generating interleaved packets across an ISF change. In particular,
+ frames that, according to the previously used interleaving pattern,
+ would be aggregated into a single packet have to be separated into
+ different packets, so that the aforementioned condition (all frames
+ in a packet share the same ISF) remains true. A naive implementation
+ that splits the frames with different ISFs into different packets can
+ result in up to twice the number of RTP packets, compared to an
+ optimal interleaved solution. Adapting the interleaving pattern
+ before and after the ISF change may reduce the need for extra RTP
+ packets.
+
+4.3.2. The Payload Table of Contents
+
+ The table of contents (ToC) consists of a list of entries; each entry
+ corresponds to a group of audio frames carried in the payload, as
+ depicted below.
+
+   +----------------+----------------+- ... -+----------------+
+   |  ToC entry #1  |  ToC entry #2  |       |  ToC entry #N  |
+   +----------------+----------------+- ... -+----------------+
+
+ When multiple groups of frames are present in a payload, the ToC
+ entries SHALL be placed in the packet in order of increasing RTP
+ timestamp value (modulo 2^32) of the first transport frame the ToC
+ entry represents.
+
+4.3.2.1. ToC Entry in the Basic Mode
+
+ A ToC entry of a payload in the basic mode has the following format:
+
+    0                   1
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |F| Frame Type  |    #frames    |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ F (1 bit): If set to 1, indicates that this ToC entry is followed by
+ another ToC entry; if set to 0, indicates that this ToC entry is
+ the last one in the ToC.
+
+ Frame Type (FT) (7 bits): Indicates the audio codec frame type used
+ for the group of frames referenced by this ToC entry. FT
+ designates the combination of AMR-WB+ core and stereo rate, one of
+ the special AMR-WB+ frame types, the AMR-WB rate, or comfort
+ noise, as specified by Table 25 in [1].
+
+ #frames (8 bits): Indicates the number of frames in the group
+ referenced by this ToC entry. ToC entries with this field equal
+ to 0 (which would indicate zero frames) SHALL NOT be used, and
+ received packets with such a ToC entry SHALL be discarded.
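+
+ The loop implied by the F bit can be sketched in Python as follows.
+ This is an illustrative parser, not a reference implementation:
+ parse_basic_toc is a hypothetical name, raising ValueError stands in
+ for "discard the packet", and checking for undefined FT values
+ (Section 4.3.2.5) is left to the caller.
+
```python
from typing import List, Tuple


def parse_basic_toc(payload: bytes, offset: int = 1) -> Tuple[List[Tuple[int, int]], int]:
    """Parse basic-mode ToC entries that start at `offset` (after the header).

    Returns ([(frame_type, n_frames), ...], offset of the first audio octet).
    """
    entries = []
    while True:
        if offset + 2 > len(payload):
            raise ValueError("truncated ToC")
        first, n_frames = payload[offset], payload[offset + 1]
        offset += 2
        if n_frames == 0:
            # #frames == 0 SHALL NOT be used; such packets are discarded.
            raise ValueError("ToC entry with #frames == 0")
        entries.append((first & 0x7F, n_frames))  # low 7 bits: Frame Type
        if not first & 0x80:                      # F == 0: last ToC entry
            return entries, offset
```
+
+ Applied to the first five octets of the payload in Section 4.3.5.2
+ (0x56 0xA1 0x01 0x23 0x02), it returns [(33, 1), (35, 2)] and an
+ audio data offset of 5.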
+
+4.3.2.2. ToC Entry in the Interleaved Mode
+
+ Two different ToC entry formats are defined in interleaved mode.
+ They differ in the length of the displacement fields: 4 bits or 8
+ bits. The L-bit in the payload header differentiates between the two
+ formats.
+
+ If L=0, a ToC entry has the following format:
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |F| Frame Type  |    #frames    | DIS1  |  ...  | DISi  |  ...  |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |  ...  |  ...  | DISn  | Padd  |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ F (1 bit): See definition in 4.3.2.1.
+
+ Frame Type (FT) (7 bits): See definition in 4.3.2.1.
+
+ #frames (8 bits): See definition in 4.3.2.1.
+
+ DIS1...DISn (4 bits): A list of n (n=#frames) displacement fields
+ indicating the displacement of the i:th (i=1..n) audio frame
+ relative to the preceding audio frame in the payload, in units of
+ frames. The four-bit unsigned integer displacement values may be
+ between 0 and 15, indicating the number of audio frames in
+ decoding order between the (i-1):th and the i:th frame in the
+ payload. Note that for the first ToC entry of the payload, the
+ value of DIS1 is meaningless. It SHALL be set to zero by a sender
+ and SHALL be ignored by a receiver. This frame's location in the
+ decoding order is uniquely defined by the RTP timestamp and TFI in
+ the payload header. Note also that for subsequent ToC entries,
+ DIS1 indicates the number of frames between the last frame of the
+ previous group and the first frame of this group.
+
+ Padd (4 bits): To ensure octet alignment, four padding bits SHALL be
+ included at the end of the ToC entry if there is an odd number of
+ frames in the group referenced by this entry. These bits SHALL be
+ set to zero and SHALL be ignored by the receiver. If a group
+ containing an even number of frames is referenced by this ToC entry,
+ these padding bits SHALL NOT be included in the payload.
+
+ If L=1, a ToC entry has the following format:
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |F| Frame Type  |    #frames    |     DIS1      |      ...      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |      ...      |     DISn      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ F (1 bit): See definition in 4.3.2.1.
+
+ Frame Type (FT) (7 bits): See definition in 4.3.2.1.
+
+ #frames (8 bits): See definition in 4.3.2.1.
+
+ DIS1...DISn (8 bits): A list of n (n=#frames) displacement fields
+ indicating the displacement of the i:th (i=1..n) audio frame
+ relative to the preceding audio frame in the payload, in units of
+ frames. The eight-bit unsigned integer displacement values may be
+ between 0 and 255, indicating the number of audio frames in
+ decoding order between the (i-1):th and the i:th frame in the
+ payload. Note that for the first ToC entry of the payload, the
+ value of DIS1 is meaningless. It SHALL be set to zero by a sender
+ and SHALL be ignored by a receiver. This frame's location in the
+ decoding order is uniquely defined by the RTP timestamp and TFI in
+ the payload header. Note also that for subsequent ToC entries,
+ DIS1 indicates the displacement between the last frame of the
+ previous group and the first frame of this group.
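+
+ Parsing an interleaved mode ToC additionally requires the L bit from
+ the payload header, because it fixes the size of the DIS fields. The
+ Python sketch below (parse_interleaved_toc is a hypothetical name)
+ assumes that the first DIS value of each octet is carried in the
+ high-order nibble, following the bit order of the diagrams, and it
+ drops the Padd nibble when #frames is odd.
+
```python
from typing import List, Tuple


def parse_interleaved_toc(payload: bytes, long_dis: bool,
                          offset: int = 1) -> Tuple[List[Tuple[int, List[int]]], int]:
    """Parse interleaved-mode ToC entries starting at `offset`.

    Returns ([(frame_type, [DIS1, ..., DISn]), ...], offset of audio data).
    """
    entries = []
    while True:
        if offset + 2 > len(payload):
            raise ValueError("truncated ToC")
        first, n_frames = payload[offset], payload[offset + 1]
        offset += 2
        if n_frames == 0:
            raise ValueError("ToC entry with #frames == 0")
        # L=1: one octet per DIS; L=0: two DIS per octet, odd count padded.
        dis_octets = n_frames if long_dis else (n_frames + 1) // 2
        raw = payload[offset:offset + dis_octets]
        if len(raw) != dis_octets:
            raise ValueError("truncated displacement fields")
        offset += dis_octets
        if long_dis:
            dis = list(raw)
        else:
            nibbles = [n for octet in raw for n in (octet >> 4, octet & 0x0F)]
            dis = nibbles[:n_frames]          # discard the 4-bit Padd field
        entries.append((first & 0x7F, dis))
        if not first & 0x80:                  # F == 0: last ToC entry
            return entries, offset
```
+
+ For the payload of Section 4.3.5.3 (L = 1), the octets following the
+ header (0x2F 0x04 0x00 0x12 0x0F 0x0A) parse to
+ [(47, [0, 18, 15, 10])].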
+
+4.3.2.3. RTP Timestamp Derivation
+
+ The RTP timestamp value for a frame SHALL be the timestamp value of
+ the first audio sample encoded in the frame. The timestamp value for
+ a frame is derived differently depending on the payload mode, basic
+ or interleaved. In both cases, the first frame in a compound packet
+ has an RTP timestamp equal to the one received in the RTP header. In
+ the basic mode, the RTP timestamp of any subsequent frame is derived
+ in two steps. First, the sum of the frame durations (see Table 1) of
+ all the preceding frames in the payload is calculated. Then, this
+ sum is added to the RTP header timestamp value. For example, assume
+ that the RTP header timestamp value is 12345, the payload carries
+ four frames, and the frame duration is 16 ms (ISF = 32 kHz),
+ corresponding to 1152 timestamp ticks. Then the RTP timestamp of the
+ fourth frame in the payload is 12345 + 3 * 1152 = 15801.
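+
+ The two-step basic mode derivation amounts to a running sum. The
+ following Python sketch, with the hypothetical name
+ basic_mode_timestamps, takes one duration in ticks per frame in
+ payload order. The modulo 2^32 wraparound follows general RTP
+ timestamp arithmetic and is an addition not spelled out in this
+ section.
+
```python
from typing import List


def basic_mode_timestamps(rtp_ts: int, durations: List[int]) -> List[int]:
    """RTP timestamps of the frames of a basic mode payload.

    durations[k] is the duration of frame k in 72 kHz ticks, in payload
    order. Frame k is stamped with the RTP header timestamp plus the
    durations of frames 0..k-1, modulo 2**32.
    """
    stamps, elapsed = [], 0
    for duration in durations:
        stamps.append((rtp_ts + elapsed) % 2**32)
        elapsed += duration
    return stamps


print(basic_mode_timestamps(12345, [1152] * 4))  # [12345, 13497, 14649, 15801]
```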
+
+ In interleaved mode, the RTP timestamp for each frame in the payload
+ is derived from the RTP header timestamp and the sum of the time
+ offsets of all preceding frames in this payload. The frame
+ timestamps are computed based on displacement fields and the frame
+ duration derived from the ISF value. Note that the displacement in
+ time between frame i-1 and frame i is (DISi + 1) * frame duration,
+ because the duration of the (i-1):th frame must also be taken into
+ account. The timestamp of the first frame of the first group of
+ frames (TS(1)), i.e., the first frame of the payload, is the RTP
+ header timestamp. For subsequent frames in the group, the timestamp
+ is computed by
+
+      TS(i) = TS(i-1) + (DISi + 1) * frame duration,  2 <= i <= n
+
+ For subsequent groups of frames, the timestamp of the first frame is
+ computed by
+
+      TS(1) = TSprev + (DIS1 + 1) * frame duration,
+
+ where TSprev denotes the timestamp of the last frame in the previous
+ group. The timestamps of the subsequent frames in the group are
+ computed in the same way as for the first group.
+
+ The following example derives the RTP timestamps for the frames in an
+ interleaved mode payload having the following header and ToC
+ information:
+
+ RTP header timestamp: 12345
+ ISF = 32 kHz
+ Frame 1 displacement field: DIS1 = 0
+ Frame 2 displacement field: DIS2 = 6
+ Frame 3 displacement field: DIS3 = 4
+ Frame 4 displacement field: DIS4 = 7
+
+ Assuming an ISF of 32 kHz, which implies a frame duration of 16 ms,
+ one frame lasts 1152 ticks. The timestamp of the first frame in the
+ payload is the RTP timestamp, i.e., TS(1) = RTP TS. Note that the
+ displacement field value for this frame must be ignored. For the
+ second frame in the payload, the timestamp can be calculated as TS(2)
+ = TS(1) + (DIS2 + 1) * 1152 = 20409. For the third frame, the
+ timestamp is TS(3) = TS(2) + (DIS3 + 1) * 1152 = 26169. Finally, for
+ the fourth frame of the payload, we have TS(4) = TS(3) + (DIS4 + 1) *
+ 1152 = 35385.
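+
+ The same recurrence can be written compactly in Python. The sketch
+ below (interleaved_timestamps is a hypothetical name) assumes that all
+ frames of the payload share one frame duration, as they do when they
+ share one ISF, and that the DIS values of all ToC entries have been
+ concatenated in payload order. Because every step of (DISi + 1)
+ frames also advances the position within the super-frame, the same
+ loop yields each frame's TFI; it reproduces the TFIs 3, 3, and 2
+ stated for the example in Section 4.3.5.3.
+
```python
from typing import List, Tuple


def interleaved_timestamps(rtp_ts: int, tfi: int, dis: List[int],
                           frame_ticks: int) -> List[Tuple[int, int]]:
    """(RTP timestamp, TFI) for each frame of an interleaved mode payload.

    dis holds DIS1..DISn of every ToC entry, concatenated in payload order;
    dis[0] is ignored because frame 1 is placed by the RTP header timestamp
    and the TFI of the payload header.
    """
    ts, pos = rtp_ts, tfi
    frames = [(ts % 2**32, pos % 4)]
    for d in dis[1:]:
        ts += (d + 1) * frame_ticks   # TS(i) = TS(i-1) + (DISi + 1) * duration
        pos += d + 1                  # the same number of transport frames
        frames.append((ts % 2**32, pos % 4))
    return frames


print([ts for ts, _ in interleaved_timestamps(12345, 0, [0, 6, 4, 7], 1152)])
# [12345, 20409, 26169, 35385]
```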
+
+4.3.2.4. Frame Type Considerations
+
+ The value of Frame Type (FT) is defined in Table 25 in [1]. FT=14
+ (AUDIO_LOST) is used to denote frames that are lost. A NO_DATA
+ (FT=15) frame could result from two situations: first, no data has
+ been produced by the audio encoder; and second, no data is
+ transmitted in the current payload. An example of the latter is a
+ frame that has been sent in an earlier packet or will be sent in a
+ later one. The duration of these non-included frames depends on the
+ internal sampling frequency indicated by the ISF field.
+
+ For frame types with index 0-13, the ISF field SHALL be set to 0. The
+ frame duration for these frame types is fixed at 20 ms, i.e., 1440
+ ticks at 72 kHz. For payloads containing only frames of type
+ 0-9, the TFI field SHALL be set to 0 and SHALL be ignored by the
+ receiver. In a payload combining frames of type 0-9 and 10-13, the
+ TFI values need to be set to match the transport frames of type
+ 10-13. Thus, frames of type 0-9 will also have a derived TFI, which
+ is ignored.
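+
+ The frame-type rules above can be summarized in a small Python
+ helper. This is a sketch: frame_kind and frame_duration_ticks are
+ hypothetical names, isf_ticks is assumed to be the ISF-dependent
+ transport frame duration already known to the caller (e.g., 1152
+ ticks at 32 kHz), treating AUDIO_LOST like NO_DATA is an assumption,
+ and whether a given value of 16 or above is actually defined must
+ still be checked against Table 25 in [1].
+
```python
AUDIO_LOST = 14
NO_DATA = 15


def frame_kind(ft: int) -> str:
    """Coarse classification of a 7-bit Frame Type value."""
    if not 0 <= ft <= 127:
        raise ValueError("Frame Type is a 7-bit field")
    if ft <= 9:
        return "amr-wb"        # AMR-WB frame types; TFI is ignored
    if ft <= 13:
        return "fixed-isf"     # AMR-WB+ types with a single ISF (25600 Hz)
    if ft == AUDIO_LOST:
        return "audio-lost"
    if ft == NO_DATA:
        return "no-data"
    return "extension"         # validity per Table 25 in [1]


def frame_duration_ticks(ft: int, isf_ticks: int) -> int:
    """Frame duration in 72 kHz ticks; FT 0-13 are always 20 ms."""
    if frame_kind(ft) in ("amr-wb", "fixed-isf"):
        return 1440            # 20 ms * 72000 Hz
    # NO_DATA (and, as assumed here, AUDIO_LOST) and extension types
    # follow the ISF signaled in the payload header.
    return isf_ticks
```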
+
+4.3.2.5. Other TOC Considerations
+
+ If a ToC entry with an undefined FT value is received, the whole
+ packet SHALL be discarded. This is to avoid the loss of data
+ synchronization in the depacketization process, which can result in a
+ severe degradation in audio quality.
+
+ Packets containing only NO_DATA frames SHOULD NOT be transmitted.
+ Also, NO_DATA frames at the end of a frame sequence to be carried in
+ a payload SHOULD NOT be included in the transmitted packet. The
+ AMR-WB+ SCR/DTX is identical with AMR-WB SCR/DTX described in [5] and
+ can only be used in combination with the AMR-WB frame types (0-8).
+
+ When multiple groups of frames are present, their ToC entries SHALL
+ be placed in the ToC in order of increasing RTP timestamp value
+ (modulo 2^32) of the first transport frame the ToC entry represents,
+ independent of the payload mode. In basic mode, the frames SHALL be
+ consecutive in time, while in interleaved mode the frames MAY be
+ non-consecutive in time and MAY even have varying inter-frame
+ distances.
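+
+ On the sending side, the two SHOULD NOT rules on NO_DATA frames can
+ be applied just before a ToC is built. The Python sketch below is
+ illustrative only; trim_no_data is a hypothetical name, and it
+ returns None to signal that no packet should be sent.
+
```python
from typing import List, Optional

NO_DATA = 15


def trim_no_data(frame_types: List[int]) -> Optional[List[int]]:
    """Drop trailing NO_DATA frames; return None if nothing remains to send."""
    end = len(frame_types)
    while end > 0 and frame_types[end - 1] == NO_DATA:
        end -= 1
    return frame_types[:end] if end else None
```
+
+ Interior NO_DATA frames are kept, since they hold the place of frames
+ that are sent in other packets and keep the timestamps of the
+ following frames correct.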
+
+4.3.2.6. ToC Examples
+
+ The following example illustrates a ToC for three audio frames in
+ basic mode. Note that in this case all audio frames are encoded
+ using the same frame type, i.e., there is only one ToC entry.
+
+    0                   1
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |0| Frame Type1 |  #frames = 3  |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ The next example depicts a ToC of three entries in basic mode. Note
+ that in this case the payload also carries three frames, but three
+ ToC entries are needed because the frames of the payload are encoded
+ using different frame types.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |1| Frame Type1 |  #frames = 1  |1| Frame Type2 |  #frames = 1  |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |0| Frame Type3 |  #frames = 1  |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ The following example illustrates a ToC with two entries in
+ interleaved mode using four-bit displacement fields. The payload
+ includes two groups of frames, the first one including a single
+ frame, and the other one consisting of two frames.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |1| Frame Type1 |  #frames = 1  | DIS1  | padd  |0| Frame Type2 |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |  #frames = 2  | DIS1  | DIS2  |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+4.3.3. Audio Data
+
+ Audio data of a payload consists of zero or more audio frames, as
+ described in the ToC of the payload.
+
+ ToC entries with FT=14 or 15 represent frame types with a length of
+ 0. Hence, no data SHALL be placed in the audio data section to
+ represent frames of this type.
+
+ As already discussed, each audio frame of an extension frame type
+ represents an AMR-WB+ transport frame corresponding to the encoding
+ of 512 samples of audio, sampled with the internal sampling frequency
+ specified by the ISF indicator. As an exception, frame types with
+ index 10-13 are only capable of using a single internal sampling
+ frequency (25600 Hz). The encoding rates (combination of core bit-
+ rate and stereo bit-rate) are indicated in the frame type field of
+ the corresponding ToC entry. The octet length of the audio frame is
+ implicitly defined by the frame type field and is given in Tables 21
+ and 25 of [1]. The order and numbering notation of the bits are as
+ specified in [1]. For the AMR-WB+ extension frame types and comfort
+ noise frames, the bits are in the order produced by the encoder. The
+ last octet of each audio frame MUST be padded with zeroes at the end
+ if not all bits in the octet are used. In other words, each audio
+ frame MUST be octet-aligned.
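+
+ The octet-alignment rule can be illustrated with a short Python sketch
+ that packs a frame's bits, in the order produced by the encoder, into
+ octets and zero-fills the unused bits of the last octet. The function
+ name pack_frame_bits is hypothetical, and mapping the first bit to the
+ most significant bit of the first octet is an assumption consistent
+ with the bit numbering in the figures of Section 4.3.5.
+
```python
from typing import Sequence


def pack_frame_bits(bits: Sequence[int]) -> bytes:
    """Pack encoder output bits into octets, MSB first, zero-padding the
    unused bits at the end of the last octet (octet alignment)."""
    out = bytearray((len(bits) + 7) // 8)   # all bits start as 0 (padding)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


print(len(pack_frame_bits([0] * 368)))  # 46 octets for a 368-bit frame
```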
+
+4.3.4. Methods for Forming the Payload
+
+ The payload begins with the payload header, followed by the table of
+ contents, which consists of a list of ToC entries.
+
+ The audio data follows the table of contents. All the octets
+ comprising an audio frame SHALL be appended to the payload as a unit.
+ The audio frames are packetized in timestamp order within each group
+ of frames (per ToC entry). The groups of frames are packetized in
+ the same order as their corresponding ToC entries. Note that there
+ are no data octets in a group having a ToC entry with FT=14 or FT=15.
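+
+ For the basic mode, these steps can be sketched in Python as below.
+ The function build_basic_payload is hypothetical; it takes already
+ octet-aligned frames in timestamp order, starts a new ToC entry
+ whenever the frame type changes or the 8-bit #frames field would
+ overflow, and always sets L = 0. Interleaving is not covered.
+
```python
from typing import List, Tuple


def build_basic_payload(isf: int, tfi: int,
                        frames: List[Tuple[int, bytes]]) -> bytes:
    """Assemble header, ToC, and audio data for a basic mode payload.

    frames: (frame_type, octet-aligned frame data) in timestamp order;
    frames with FT 14 or 15 are expected to carry empty data.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    if not (0 <= isf <= 31 and 0 <= tfi <= 3):
        raise ValueError("ISF must fit in 5 bits and TFI in 2 bits")
    groups: List[Tuple[int, List[bytes]]] = []
    for ft, data in frames:
        if groups and groups[-1][0] == ft and len(groups[-1][1]) < 255:
            groups[-1][1].append(data)        # extend the current group
        else:
            groups.append((ft, [data]))       # start a new ToC entry
    header = bytes([(isf << 3) | (tfi << 1)])  # L = 0 in basic mode
    toc = bytearray()
    for k, (ft, datas) in enumerate(groups):
        f_bit = 0x80 if k < len(groups) - 1 else 0x00  # F = 0 on the last entry
        toc += bytes([f_bit | ft, len(datas)])
    # Groups in ToC order, frames in timestamp order within each group.
    audio = b"".join(data for _, datas in groups for data in datas)
    return header + bytes(toc) + audio
```
+
+ With the frame types and sizes of Section 4.3.5.2 (one 46-octet FT=33
+ frame followed by two 50-octet FT=35 frames, ISF = 10, TFI = 3), the
+ sketch produces a 151-octet payload starting with 0x56 0xA1 0x01 0x23
+ 0x02.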
+
+4.3.5. Payload Examples
+
+4.3.5.1. Example 1: Basic Mode Payload Carrying Multiple Frames Encoded
+ Using the Same Frame Type
+
+ Figure 4 depicts a payload that carries three AMR-WB+ frames encoded
+ using 14 kbit/s frame type (FT=26) with a frame length of 280 bits
+ (35 bytes). The internal sampling frequency in this example is 25.6
+ kHz (ISF = 8). The TFI for the first frame is 2, indicating that the
+ first transport frame in this payload is the third in a super-frame.
+ Since this payload is in the basic mode, the subsequent frames of the
+ payload are consecutive frames in decoding order, i.e., the fourth
+ transport frame of the current super-frame and the first transport
+ frame of the next super-frame. Note that because the frames are all
+ encoded using the same frame type, only one ToC entry is required.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | ISF = 8 | 2 |0|0|   FT = 26   |  #frames = 3  |   f1(0...7)   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   :                              ...                              :
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |      ...      | f1(272...279) |   f2(0...7)   |      ...      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   :                              ...                              :
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | f2(272...279) |   f3(0...7)   |              ...              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   :                              ...                              :
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                      ...                      | f3(272...279) |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 4: An example of a basic mode payload carrying three frames
+ of the same frame type
+
+4.3.5.2. Example 2: Basic Mode Payload Carrying Multiple Frames Encoded
+ Using Different Frame Types
+
+ Figure 5 depicts a payload that carries three AMR-WB+ frames; the
+ first frame is encoded using 18.4 kbit/s frame type (FT=33) with a
+ frame length of 368 bits (46 bytes), and the two subsequent frames
+ are encoded using 20 kbit/s frame type (FT=35) having frame length of
+ 400 bits (50 bytes). The internal sampling frequency in this example
+ is 32 kHz (ISF = 10), implying the overall bit-rates of 23 kbit/s for
+ the first frame of the payload, and 25 kbit/s for the subsequent
+ frames. The TFI for the first frame is 3, indicating that the first
+ transport frame in this payload is the fourth in a super-frame.
+ Since this is a payload in the basic mode, the subsequent frames of
+ the payload are consecutive frames in decoding order, i.e., the first
+ and second transport frames of the next super-frame. Note that since
+ the payload carries two different frame types, there are two ToC
+ entries.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | ISF=10  | 3 |0|1|   FT = 33   |  #frames = 1  |0|   FT = 35   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |  #frames = 2  |   f1(0...7)   |              ...              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   :                              ...                              :
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |              ...              | f1(360...367) |   f2(0...7)   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   :                              ...                              :
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | f2(392...399) |   f3(0...7)   |              ...              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   :                              ...                              :
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |              ...              | f3(392...399) |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 5: An example of a basic mode payload carrying three frames
+ employing two different frame types
+
+4.3.5.3. Example 3: Payload in Interleaved Mode
+
+ The example in Figure 6 depicts a payload in interleaved mode,
+ carrying four frames encoded using 32 kbit/s frame type (FT=47) with
+ frame length of 640 bits (80 bytes). The internal sampling frequency
+ is 38.4 kHz (ISF = 13), implying a bit-rate of 48 kbit/s for all
+ frames in the payload. The TFI for the first frame is 0; hence, it
+ is the first transport frame of a super-frame. The displacement
+ fields for the subsequent frames are DIS2=18, DIS3=15, and DIS4=10,
+ which indicates that the subsequent frames have the TFIs of 3, 3, and
+ 2, respectively. The long displacement field flag L in the payload
+ header is set to 1, which results in the use of eight bits for the
+ displacement fields in the ToC entry. Note that since all frames of
+ this payload are encoded using the same frame type, only a single ToC
+ entry is needed. Furthermore, the displacement field for the first
+ frame (corresponding to the first ToC entry with DIS1=0) must be
+ ignored, since its timestamp and TFI are defined by the RTP timestamp
+ and the TFI found in the payload header.
+
+ The RTP timestamp values of the frames in this example are:
+
+ Frame1: TS1 = RTP Timestamp
+ Frame2: TS2 = TS1 + 19 * 960
+ Frame3: TS3 = TS2 + 16 * 960
+ Frame4: TS4 = TS3 + 11 * 960
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | ISF=13  | 0 |1|0|   FT = 47   |  #frames = 4  |   DIS1 = 0    |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |   DIS2 = 18   |   DIS3 = 15   |   DIS4 = 10   |   f1(0...7)   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   :                              ...                              :
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |              ...              | f1(632...639) |   f2(0...7)   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   :                              ...                              :
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |              ...              | f2(632...639) |   f3(0...7)   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   :                              ...                              :
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |              ...              | f3(632...639) |   f4(0...7)   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   :                              ...                              :
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |              ...              | f4(632...639) |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Figure 6: An example of an interleaved mode payload carrying four
+ frames of the same frame type
+
+4.4. Interleaving Considerations
+
+ The use of interleaving requires further considerations. As
+ presented in the example in Section 3.6.2, a given interleaving
+ pattern requires a certain amount of deinterleaving buffer space.
+ This buffer space, expressed as a number of transport frame slots, is
+ indicated by the "interleaving" media type parameter. The number of
+ frame slots needed can be converted into actual memory requirements
+ by considering the 80 bytes per frame used by the largest combination
+ of AMR-WB+'s core and stereo rates.
+
+ The information about the frame buffer size is not always sufficient
+ to determine when it is appropriate to start consuming frames from
+ the interleaving buffer. There are two cases in which additional
+ information is needed: first, when switching of the ISF occurs, and
+ second, when the interleaving pattern changes. The "int-delay" media
+ type parameter is defined to convey this information. It allows a
+ sender to indicate the minimal media time that needs to be present in
+ the buffer before the decoder can start consuming frames from the
+ buffer. Because the sender has full control over ISF changes and the
+ interleaving pattern, it can calculate this value.
+
+ In certain cases (for example, if joining a multicast session with
+ interleaving mid-session), a receiver may initially receive only part
+ of the packets in the interleaving pattern. This initial partial
+ reception (in frame sequence order) can yield too few frames for the
+ audio decoder to produce acceptable quality. The same problem arises
+ when encryption is used for access control and the receiver does not
+ have the previous key.
+
+ Although AMR-WB+ is robust and thus tolerant to a high random frame
+ erasure rate, it would have difficulty handling consecutive frame
+ losses at startup. Thus, some special implementation considerations
+ are described here. To handle this type of startup efficiently, note
+ first that decoding can only start at the beginning of a super-frame,
+ and that this holds true even if the first transport frame is
+ indicated as lost. Second, decoding is only RECOMMENDED to start if
+ at least 2 of the 4 transport frames belonging to that super-frame
+ are available.
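+
+ A receiver could apply these two startup rules with a check such as
+ the following Python sketch. It is not part of this specification:
+ first_decodable_superframe is a hypothetical name, and it assumes that
+ received frames have already been numbered in decoding order from a
+ super-frame boundary, so that index // 4 identifies the super-frame
+ and index % 4 equals the TFI.
+
```python
from collections import Counter
from typing import Iterable, Optional


def first_decodable_superframe(received: Iterable[int],
                               min_frames: int = 2) -> Optional[int]:
    """Earliest super-frame with at least `min_frames` of its 4 transport
    frames present, or None if no super-frame qualifies yet.

    received: decoding-order indices of the transport frames available in
    the deinterleaving buffer, counted from a super-frame boundary.
    """
    per_superframe = Counter(index // 4 for index in set(received))
    ready = [sf for sf, count in per_superframe.items() if count >= min_frames]
    return min(ready) if ready else None
```
+
+ Decoding would then begin with transport frame 4 * sf of the returned
+ super-frame sf, even if that frame itself was lost.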
+
+ After receiving a number of packets, in the worst case as many
+ packets as the interleaving pattern covers, the previously described
+ effects disappear and normal decoding is resumed.
+
+ Similar issues arise when a receiver leaves a session or has lost
+ access to the stream. If the receiver leaves the session, this would
+ be a minor issue since playout is normally stopped. It is also a
+ minor issue for the case of lost access, since the AMR-WB+ error
+ concealment will fade out the audio if massive consecutive losses are
+ encountered.
+
+ The sender can avoid this type of problem in many sessions by
+ starting and ending interleaving patterns correctly when risks of
+ losses occur. One such example is a key-change done for access
+ control to encrypted streams. If only some keys are provided to
+ clients and there is a risk of their receiving content for which they
+ do not have the key, it is recommended that interleaving patterns not
+ overlap key changes.
+
+4.5. Implementation Considerations
+
+ An application implementing this payload format MUST understand all
+ the payload parameters. Any mapping of the parameters to a signaling
+ protocol MUST support all parameters. Thus, an implementation of this
+ payload format in an application using SDP is required to understand
+ all the payload parameters in their SDP-mapped form. This
+ requirement ensures that an implementation can always decide whether
+ it is capable of communicating.
+
+ Both basic and interleaved mode SHALL be implemented. The
+ implementation burden of both is rather small, and requiring both
+ ensures interoperability. As the AMR-WB+ codec contains the full
+ functionality of the AMR-WB codec, it is RECOMMENDED to also
+ implement the payload format in RFC 3267 [7] for the AMR-WB frame
+ types when implementing this specification. Doing so makes
+ interoperability with devices that only support AMR-WB more likely.
+
+ The switching of ISF, when combined with packet loss, could result in
+ concealment using the wrong audio frame length. This can occur if
+ packet losses result in lost frames directly after the point of ISF
+ change. The packet loss would prevent the receiver from noticing the
+ changed ISF, so that it conceals the lost transport frames with the
+ previous ISF instead of the new one. Although such an error is always
+ detectable later, it results in frame boundary misalignment, which
+ can cause audio distortions and problems with synchronization, as too
+ many or too few audio samples are created. This problem can be
+ mitigated in most cases by performing ISF recovery prior to
+ concealment, as outlined in Section 4.5.1.
+
+4.5.1. ISF Recovery in Case of Packet Loss
+
+ In case of packet loss, it is important that the AMR-WB+ decoder
+ initiates a proper error concealment to replace the frames carried in
+ the lost packet. A loss concealment algorithm requires a codec
+ framing that matches the timestamps of the correctly received frames.
+ Hence, it is necessary to recover the timestamps of the lost frames.
+ Doing so is non-trivial because the codec frame length that is
+ associated with the ISF may have changed during the frame loss.
+
+ In the following, the recovery of the timestamp information of lost
+ frames is illustrated by means of an example. Two frames with
+ timestamps t0 and t1 have been received properly, the first being the
+ last frame received before the loss, and the second being the first
+ frame received after the loss period. The ISF values for these frames
+ are isf0 and isf1, respectively. The TFIs of these frames are tfi0
+ and tfi1, respectively. The associated frame lengths (in timestamp
+ ticks) are given as L0 and L1, respectively. In this example, three
+ frames with timestamps x1 to x3 have been lost. The example further
+ assumes that the ISF changes once, from isf0 to isf1, during the
+ frame loss period, as shown in the figure below.
+
+ Since not all information required for the full recovery of the
+ timestamps is generally known in the receiver, an algorithm is needed
+ to estimate the ISF associated with the lost frames. Also, the
+ number of lost frames needs to be recovered.
+
+     |<---L0--->|<---L0--->|<-L1->|<-L1->|<-L1->|
+
+     |   Rxd    |   lost   | lost | lost | Rxd  |
+   --+----------+----------+------+------+------+--
+
+     t0         x1         x2     x3     t1
+
+ Example Algorithm:
+
+ Start: # check for frame loss
+ If (t0 + L0) == t1 Then goto End # no frame loss
+
+ Step 1: # check case with no ISF change
+ If (isf0 != isf1) Then goto Step 2 # At least one ISF change
+ If (isFractional(t1 - t0)/L0) Then goto Step 3
+ # More than 1 ISF change
+
+ Return recovered timestamps as
+ x(n) = t0 + n*L1 and associated ISF equal to isf0,
+ for 0 < n < (t1 - t0)/L0
+ goto End
+
+ Step 2:
+ Loop initialization: n := 4 - tfi0 mod 4
+ While n <= (t1-t0)/L0
+ Evaluate m := (t1 - t0 - n*L0)/L1
+ If (isInteger(m) AND ((tfi0+n+m) mod 4 == tfi1)) Then goto found;
+ n := n+4
+ endloop
+ goto step 3 # More than 1 ISF change
+
+ found:
+ Return recovered timestamps and ISFs as
+ x(i) = t0 + i*L0 and associated ISF equal to isf0, for 0 < i <= n
+ x(i) = t0 + n*L0 + (i-n)*L1 and associated ISF equal to isf1,
+ for n < i <= n+m
+ goto End
+
+ Step 3:
+ More than 1 ISF change has occurred. Since ISF changes can be
+ assumed to be infrequent, such a situation is likely only when long
+ sequences of frames are lost. In that case, it is probably not useful
+ to try to recover the timestamps of the lost frames. Rather, the
+ AMR-WB+ decoder should be reset, and decoding should be resumed
+ starting with the frame with timestamp t1.
+
+ End:
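+
+ The following Python function is an illustrative sketch of the example
+ algorithm above, not a reference implementation. It assumes that RTP
+ timestamps have already been unwrapped into plain integers and that
+ the frame lengths L0 and L1 are known to the caller. As implied by the
+ loop initialization in Step 2, it also assumes that the first frame
+ carrying isf1 has TFI 0. The function name and signature are
+ hypothetical. Lost frames before index n are assigned isf0, and lost
+ frames from index n onward are assigned isf1, because frame n is the
+ first frame whose length is L1. A return value of None corresponds to
+ Step 3.
+
```python
from typing import List, Optional, Tuple

# Hypothetical helper; timestamps are assumed already unwrapped to integers.
def recover_lost_frames(t0: int, L0: int, isf0: int, tfi0: int,
                        t1: int, L1: int, isf1: int, tfi1: int
                        ) -> Optional[List[Tuple[int, int]]]:
    """Return (timestamp, ISF) pairs for the frames lost between the
    received frames at t0 and t1, or None if more than one ISF change
    must have occurred (Step 3: reset the decoder, resume at t1)."""
    span = t1 - t0
    if span == L0:                      # Start: no frame loss
        return []
    if isf0 == isf1:                    # Step 1: no ISF change
        if span % L0 != 0:              # fractional span: Step 3
            return None
        return [(t0 + n * L0, isf0) for n in range(1, span // L0)]
    # Step 2: frame n (the first isf1 frame) is assumed to have TFI 0,
    # so start at the smallest n > 0 with (tfi0 + n) % 4 == 0.
    n = 4 - (tfi0 % 4)
    while n * L0 <= span:
        rest = span - n * L0
        if rest % L1 == 0:              # m = rest / L1 is an integer
            m = rest // L1
            if (tfi0 + n + m) % 4 == tfi1:
                # found: frames 1..n-1 keep isf0, frames n..n+m-1 use isf1
                return ([(t0 + i * L0, isf0) for i in range(1, n)] +
                        [(t0 + n * L0 + (i - n) * L1, isf1)
                         for i in range(n, n + m)])
        n += 4
    return None                         # Step 3: more than one ISF change
```
+
+ This example matches the figure, where two L0-length frames precede the
+ change (so n = 2 and tfi0 = 2). It uses illustrative lengths L0 = 1440
+ and L1 = 1152 ticks and arbitrary ISF labels 8 and 10. The call
+ recover_lost_frames(0, 1440, 8, 2, 5184, 1152, 10, 2) returns
+ [(1440, 8), (2880, 10), (4032, 10)]: x1 keeps isf0, while x2 and x3
+ take isf1.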
+
+ The above algorithm does not address the case where the receiver
+ buffer is shallower than the loss burst. In that case, concealment
+ must be performed without any knowledge of future frames and may
+ result in loss of frame boundary alignment. If that occurs, it may be
+ necessary to reset and restart the decoder to resynchronize.
+
+4.5.2. Decoding Validation
+
+ If the receiver finds a mismatch between the size of a received
+ payload and the size indicated by the ToC of the payload, the
+ receiver SHOULD discard the packet. This is recommended because
+ decoding a frame parsed from a payload based on erroneous ToC data
+ could severely degrade the audio quality.
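+
+ A minimal sketch of this check in Python follows. It assumes that the
+ caller has already parsed the payload header and ToC, knows how many
+ bytes they occupy, and has stripped any RTP padding. The frame-size
+ table passed in is hypothetical; in practice it would be derived from
+ the AMR-WB+ frame type definitions.
+
```python
from typing import Mapping, Sequence

def payload_size_matches(payload_len: int, header_and_toc_len: int,
                         toc_frame_types: Sequence[int],
                         frame_size: Mapping[int, int]) -> bool:
    """Return True if the payload length equals the header/ToC length plus
    the sizes of all frames announced in the ToC. frame_size maps a frame
    type to its size in bytes (a hypothetical, caller-supplied table)."""
    expected = header_and_toc_len
    for frame_type in toc_frame_types:
        if frame_type not in frame_size:
            return False                # unknown frame type: treat as mismatch
        expected += frame_size[frame_type]
    return payload_len == expected
```
+
+ A receiver following the recommendation above would discard the
+ packet whenever this function returns False.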
+
+5. Congestion Control
+
+ The general congestion control considerations for transporting RTP
+ data apply; see RTP [3] and any applicable RTP profile like AVP [9].
+ However, the multi-rate capability of AMR-WB+ audio coding provides a
+ mechanism that may help to control congestion, since the bandwidth
+ demand can be adjusted (within the limits of the codec) by selecting
+ a different coding frame type or lower internal sampling rate.
+
+ The number of frames encapsulated in each RTP payload strongly
+ influences the overall bandwidth of the RTP stream because of the
+ per-packet header overhead. Packetizing more frames in each RTP
+ payload reduces the number of packets sent and hence the header
+ overhead, at the expense of increased delay and reduced error
+ robustness.
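+
+ As a worked example of the header overhead, the sketch below computes
+ the bit rate consumed by the IPv4 (20 bytes), UDP (8 bytes), and RTP
+ (12 bytes, no CSRCs) headers alone. It assumes 20 ms frames, which is
+ the frame duration of the AMR-WB frame types. The duration of other
+ AMR-WB+ frame types may differ with the internal sampling frequency.
+ The AMR-WB+ payload header and ToC are ignored.
+
```python
# IPv4 (20) + UDP (8) + RTP fixed header without CSRCs (12), in bytes.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12

def header_overhead_bps(frames_per_packet: int, frame_ms: float = 20.0) -> float:
    """Bit rate spent on IP/UDP/RTP headers alone, assuming every packet
    carries frames_per_packet frames of frame_ms milliseconds each."""
    packets_per_second = 1000.0 / (frames_per_packet * frame_ms)
    return packets_per_second * IP_UDP_RTP_HEADER_BYTES * 8
```
+
+ With one frame per packet (50 packets/s), the headers alone cost
+ 16000 bit/s. With four frames per packet (12.5 packets/s), they cost
+ 4000 bit/s.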
+
+ If forward error correction (FEC) is used, the amount of FEC-induced
+ redundancy needs to be regulated such that the use of FEC itself does
+ not cause a congestion problem.
+
+6. Security Considerations
+
+ RTP packets using the payload format defined in this specification
+ are subject to the general security considerations discussed in RTP
+ [3] and any applicable profile such as AVP [9] or SAVP [10]. As this
+ format transports encoded audio, the main security issues include
+ confidentiality, integrity protection, and data origin authentication
+ of the audio itself. The payload format itself does not have any
+ built-in security mechanisms. Any suitable external mechanisms, such
+ as SRTP [10], MAY be used.
+
+ This payload format and the AMR-WB+ decoder do not exhibit any
+ significant non-uniformity in the receiver-side computational
+ complexity for packet processing, and thus are unlikely to pose a
+ denial-of-service threat due to the receipt of pathological data.
+
+6.1. Confidentiality
+
+ In order to ensure confidentiality of the encoded audio, all audio
+ data bits MUST be encrypted. There is less need to encrypt the
+ payload header or the table of contents since they only carry
+ information about the frame type. This information could also be
+ useful to a third party, for example, for quality monitoring.
+
+ The use of interleaving in conjunction with encryption can have a
+ negative impact on confidentiality, for a short period of time.
+ Consider the following packets (in brackets) containing frame numbers
+ as indicated: {10, 14, 18}, {13, 17, 21}, {16, 20, 24} (a popular
+ continuous diagonal interleaving pattern). The originator wishes to
+ deny some participants the ability to hear material starting at time
+ 16. Simply changing the key on the packet with the timestamp at or
+ after 16, and denying that new key to those participants, does not
+ achieve this; frames 17, 18, and 21 have been supplied in prior
+ packets under the prior key, and error concealment may make the audio
+ intelligible at least as far as frame 18 or 19, and possibly further.
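+
+ The exposure in this example can be computed mechanically. The Python
+ sketch below assumes, as in the text, that the key change takes effect
+ on the first packet whose first frame is at or after the cut-off. It
+ lists the frames at or after the cut-off that were already sent under
+ the old key. The function name is hypothetical.
+
```python
from typing import List, Sequence

def frames_exposed_under_old_key(packets: Sequence[Sequence[int]],
                                 cutoff: int) -> List[int]:
    """Frames numbered at or after `cutoff` that were sent before the key
    change, assuming the new key is first used on the first packet whose
    leading frame is at or after `cutoff`."""
    exposed: List[int] = []
    for packet in packets:              # packets in transmission order
        if packet[0] >= cutoff:         # key change takes effect here
            break
        exposed.extend(f for f in packet if f >= cutoff)
    return sorted(exposed)
```
+
+ For the packets {10, 14, 18}, {13, 17, 21}, {16, 20, 24} and a cut-off
+ of 16, the function returns [17, 18, 21], the frames named above.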
+
+6.2. Authentication and Integrity
+
+ To authenticate the sender of the audio, an external mechanism MUST
+ be used. It is RECOMMENDED that such a mechanism protect both the
+ complete RTP header and the payload (audio and data bits).
+
+ Data tampering by a man-in-the-middle attacker could replace audio
+ content and also result in erroneous depacketization/decoding that
+ could lower the audio quality.
+
+7. Payload Format Parameters
+
+ This section defines the parameters that may be used to select
+ features of the AMR-WB+ payload format. The parameters are defined
+ as part of the media type registration for the AMR-WB+ audio codec.
+ A mapping of the parameters into the Session Description Protocol
+ (SDP) [6] is also provided for those applications that use SDP.
+ Equivalent parameters could be defined elsewhere for use with control
+ protocols that do not use MIME or SDP.
+
+ The data format and parameters are only specified for real-time
+ transport in RTP.
+
+7.1. Media Type Registration
+
+ The media type for the Extended Adaptive Multi-Rate Wideband
+ (AMR-WB+) codec is allocated from the IETF tree, since AMR-WB+ is
+ expected to be a widely used audio codec in general streaming
+ applications.
+
+ Note: Parameters not listed below MUST be ignored by the receiver.
+
+ Media Type name: audio
+
+ Media subtype name: AMR-WB+
+
+ Required parameters:
+
+ None
+
+ Optional parameters:
+
+ channels: The maximum number of audio channels used by the
+ audio frames. Permissible values are 1 (mono) or 2
+ (stereo). If no parameter is present, the maximum
+ number of channels is 2 (stereo). Note: When set to
+ 1, the stereo frame types cannot be used.
+
+ interleaving: Indicates that interleaved mode SHALL
+ be used for the payload. The parameter specifies
+ the number of transport frame slots required in a
+ deinterleaving buffer (including the frame that is
+ ready to be consumed). Its value is equal to one
+ plus the maximum number of frames that precede any
+ frame in transmission order and follow the frame in
+ RTP timestamp order. The value MUST be greater than
+ zero. If this parameter is not present,
+ interleaved mode SHALL NOT be used.
+
+ int-delay: The minimal media time delay in RTP timestamp ticks
+ that is needed in the deinterleaving buffer, i.e.,
+ the difference in RTP timestamp ticks between the
+ earliest and latest audio frame present in the
+ deinterleaving buffer.
+
+ ptime: See Section 6 in RFC 2327 [6].
+
+ maxptime: See Section 8 in RFC 3267 [7].
+
+ Restriction on Usage:
+ This type is only defined for transfer via RTP (STD 64).
+
+ Encoding considerations:
+ An RTP payload according to this format is binary data
+ and thus may need to be appropriately encoded in non-
+ binary environments. However, as long as it is used
+ within RTP, no encoding is necessary.
+
+ Security considerations:
+ See Section 6 of RFC 4352.
+
+ Interoperability considerations:
+ To maintain interoperability with AMR-WB-capable end-
+ points, in cases where negotiation is possible and the
+ AMR-WB+ end-point supporting this format also supports
+ RFC 3267 for AMR-WB transport, an AMR-WB+ end-point
+ SHOULD declare itself also as AMR-WB capable (i.e.,
+ supporting also "audio/AMR-WB" as specified in RFC
+ 3267).
+
+ As the AMR-WB+ decoder is capable of performing stereo
+ to mono conversions, all receivers of AMR-WB+ should be
+ able to receive both stereo and mono, even if the
+ receiver is only capable of mono playout.
+
+ Public specification:
+ RFC 4352
+ 3GPP TS 26.290, see reference [1] of RFC 4352
+
+ Additional information:
+ This MIME type is not applicable for file storage.
+ Instead, file storage of AMR-WB+ encoded audio is
+ specified within the 3GPP-defined ISO-based multimedia
+ file format defined in 3GPP TS 26.244; see reference
+ [14] of RFC 4352. This file format has the MIME types
+ "audio/3GPP" or "video/3GPP" as defined by RFC 3839
+ [15].
+
+ Person & email address to contact for further information:
+ magnus.westerlund@ericsson.com
+ ari.lakaniemi@nokia.com
+
+ Intended usage: COMMON.
+ It is expected that many IP-based streaming
+ applications will use this type.
+
+ Change controller:
+ IETF Audio/Video Transport working group delegated from
+ the IESG.
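+
+ The value of the "interleaving" parameter defined above is one plus
+ the maximum number of frames that precede any frame in transmission
+ order and follow it in RTP timestamp order. It can be computed from a
+ known transmission order. The following Python sketch assumes that
+ frames are identified by numbers that increase with RTP timestamp.
+ The function name is hypothetical.
+
```python
from typing import Sequence

def interleaving_value(transmission_order: Sequence[int]) -> int:
    """Return 1 + the maximum number of frames that precede a frame in
    transmission order but follow it in timestamp order. Frames are
    identified by numbers that increase with RTP timestamp."""
    worst = 0
    for pos, frame in enumerate(transmission_order):
        later_in_time = sum(1 for earlier in transmission_order[:pos]
                            if earlier > frame)
        worst = max(worst, later_in_time)
    return worst + 1
```
+
+ Consider the diagonal pattern from Section 6.1, transmitted as 10, 14,
+ 18, 13, 17, 21, 16, 20, 24. Frame 16 is preceded by three later frames
+ (18, 17, 21), so the function returns 4 for this sequence. A
+ non-interleaved order yields 1.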
+
+7.2. Mapping Media Type Parameters into SDP
+
+ The information carried in the media type specification has a
+ specific mapping to fields in the Session Description Protocol (SDP)
+ [6], which is commonly used to describe RTP sessions. When SDP is
+ used to specify an RTP session using this RTP payload format, the
+ mapping is as follows:
+
+ - The media type ("audio") is used in SDP "m=" as the media name.
+
+ - The media type (payload format name) is used in SDP "a=rtpmap" as
+ the encoding name. The RTP clock rate in "a=rtpmap" SHALL be
+ 72000 for AMR-WB+, and the encoding parameter number of channels
+ MUST either be explicitly set to 1 or 2, or be omitted, implying
+ the default value of 2.
+
+ - The parameters "ptime" and "maxptime" are placed in the SDP
+ attributes "a=ptime" and "a=maxptime", respectively.
+
+ - Any remaining parameters are placed in the SDP "a=fmtp" attribute
+ by copying them directly from the MIME media type string as a
+ semicolon-separated list of parameter=value pairs.
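+
+ The mapping rules above can be sketched as a small Python function that
+ produces the SDP lines for one payload type. The following are
+ illustrative assumptions: the function name, the choice of the RTP/AVP
+ profile in the "m=" line, and the "; " separator (as used in the
+ example in Section 7.2.2).
+
```python
from typing import Dict, List, Optional

def amrwbplus_sdp(port: int, pt: int, channels: Optional[int] = None,
                  params: Optional[Dict[str, str]] = None) -> List[str]:
    """Build SDP lines for one AMR-WB+ payload type from media type
    parameters, following the mapping rules of Section 7.2."""
    remaining = dict(params or {})
    rtpmap = f"a=rtpmap:{pt} AMR-WB+/72000"      # clock rate SHALL be 72000
    if channels is not None:                     # omitted means default of 2
        if channels not in (1, 2):
            raise ValueError("channels must be 1 or 2")
        rtpmap += f"/{channels}"
    lines = [f"m=audio {port} RTP/AVP {pt}", rtpmap]
    ptime = remaining.pop("ptime", None)
    maxptime = remaining.pop("maxptime", None)
    if remaining:                                # everything else goes to a=fmtp
        pairs = "; ".join(f"{k}={v}" for k, v in remaining.items())
        lines.append(f"a=fmtp:{pt} {pairs}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```
+
+ The following call yields the four lines of the example in Section
+ 7.2.2: amrwbplus_sdp(49120, 99, 2, {"interleaving": "30",
+ "int-delay": "86400", "maxptime": "100"}).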
+
+7.2.1. Offer-Answer Model Considerations
+
+ To achieve good interoperability when negotiating with the
+ Offer/Answer model [8], the following considerations should be taken
+ into account:
+
+ For negotiable offer/answer usage the following interpretation rules
+ SHALL be applied:
+
+ - The "interleaving" parameter is symmetric; thus, an answerer that
+ accepts an offered payload type containing the parameter must also
+ include it in the answer. However, the buffer space value is
+ declarative in unicast usage. In multicast usage, the answer must
+ contain the same value for the payload type to be accepted. For
+ streams declared as sendrecv or recvonly, the receiver
+ will accept reception of streams using the interleaved mode of the
+ payload format. The value declares the amount of buffer space the
+ receiver has available for the sender to utilize. For sendonly
+ streams, the parameter indicates the desired configuration and
+ amount of buffer space. An answerer is RECOMMENDED to respond
+ using the offered value, if capable of using it.
+
+ - The "int-delay" parameter is declarative. For streams declared as
+ sendrecv or recvonly, the value indicates the maximum initial
+ delay the receiver will accept in the deinterleaving buffer. For
+ sendonly streams, the value is the amount of media time the sender
+ desires to use. The value SHOULD be copied into any response.
+
+ - The "channels" parameter is declarative. For "sendonly" streams,
+ it indicates the desired channel usage, stereo and mono, or mono
+ only. For "recvonly" and "sendrecv" streams, the parameter
+ indicates what the receiver accepts. As any receiver is capable
+ of receiving stereo frame types and performing local mixing within
+ the AMR-WB+ decoder, there is normally only one reason to restrict
+ usage to mono: to avoid spending bit rate on data that would not be
+ used because the front end is only capable of mono.
+
+ - The "ptime" parameter works as indicated by the offer/answer model
+ [8]; "maxptime" SHALL be used in the same way.
+
+ - To maintain interoperability with AMR-WB in cases where
+ negotiation is possible, an AMR-WB+ capable end-point that also
+ implements the AMR-WB payload format [7] is RECOMMENDED to declare
+ itself capable of AMR-WB, since AMR-WB is a subset of the AMR-WB+
+ codec.
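+
+ The Python sketch below shows one possible reading of the
+ "interleaving" rules above, from the answerer's side. It is not
+ normative and rests on two assumptions. First, a multicast answerer
+ can accept only if its own buffer holds at least the offered number
+ of frames. Second, a unicast answerer echoes the offered value when
+ it can and otherwise declares the smaller buffer it actually has. The
+ function name and arguments are hypothetical.
+
```python
from typing import Optional, Tuple

def answer_interleaving(offered: Optional[int],
                        own_buffer_frames: Optional[int],
                        multicast: bool) -> Tuple[bool, Optional[int]]:
    """Return (accept_payload_type, interleaving value for the answer).

    offered: "interleaving" value in the offer, or None if absent.
    own_buffer_frames: deinterleaving buffer size (in frames) the answerer
        can provide, or None if it does not support interleaved mode.
    """
    if offered is None:
        return True, None               # no interleaving; nothing to echo
    if own_buffer_frames is None:
        return False, None              # symmetric parameter cannot be included
    if multicast:
        # Assumed: the same value is required in the answer to accept.
        if own_buffer_frames >= offered:
            return True, offered
        return False, None
    # Unicast: the value is declarative. Echo the offered value if capable
    # (RECOMMENDED), otherwise declare the buffer space actually available.
    return True, min(offered, own_buffer_frames)
```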
+
+ In declarative usage, like SDP in RTSP [16] or SAP [17], the
+ following interpretation of the parameters SHALL be done:
+
+ - The "interleaving" parameter, if present, configures the payload
+ format in that mode, and the value indicates the number of frames
+ that the deinterleaving buffer is required to support to be able
+ to handle this session correctly.
+
+ - The "int-delay" parameter indicates the initial buffering delay
+ required to receive this stream correctly.
+
+ - The "channels" parameter indicates whether the transmitted content
+ can contain both stereo and mono frame types, or only mono.
+
+ - All other parameters indicate values that are being used by the
+ sending entity.
+
+7.2.2. Examples
+
+ One example of an SDP session description utilizing AMR-WB+ mono and
+ stereo encoding follows.
+
+ m=audio 49120 RTP/AVP 99
+ a=rtpmap:99 AMR-WB+/72000/2
+ a=fmtp:99 interleaving=30; int-delay=86400
+ a=maxptime:100
+
+ Note that the payload format (encoding) names are commonly shown in
+ uppercase. Media subtypes are commonly shown in lowercase. These
+ names are case-insensitive in both places. Similarly, parameter
+ names are case-insensitive both in MIME types and in the default
+ mapping to the SDP a=fmtp attribute.
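+
+ A receiver handling such a description needs to treat parameter names
+ case-insensitively. It also needs to convert "int-delay" from RTP
+ timestamp ticks to time using the 72000 Hz clock rate. The Python
+ sketch below does both for the "a=fmtp" line of the example. The
+ function names are hypothetical. Unknown parameters are kept in the
+ result so that the caller can ignore them as required by Section 7.1.
+
```python
from typing import Dict

AMR_WB_PLUS_CLOCK_RATE = 72000  # RTP clock rate mandated in Section 7.2

def parse_fmtp(fmtp_line: str) -> Dict[str, str]:
    """Parse 'a=fmtp:<pt> name=value; name=value' into a dict keyed by
    lower-cased parameter names (parameter names are case-insensitive)."""
    _, _, param_text = fmtp_line.partition(" ")
    params: Dict[str, str] = {}
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name.strip().lower()] = value.strip()
    return params

def int_delay_seconds(params: Dict[str, str]) -> float:
    """Convert the int-delay parameter from RTP ticks to seconds."""
    return int(params["int-delay"]) / AMR_WB_PLUS_CLOCK_RATE
```
+
+ For the example above, int-delay=86400 corresponds to 86400 / 72000 =
+ 1.2 seconds of deinterleaving delay.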
+
+8. IANA Considerations
+
+ The IANA has registered one new MIME subtype (audio/amr-wb+); see
+ Section 7.
+
+9. Contributors
+
+ Daniel Enstrom contributed to writing the codec introduction
+ section. Stefan Bruhn contributed the ISF recovery algorithm.
+
+10. Acknowledgements
+
+ The authors would like to thank Redwan Salami and Stefan Bruhn for
+ their significant contributions made throughout the writing and
+ reviewing of this document. Dave Singer contributed by reviewing and
+ suggesting improved language. Anisse Taleb and Ingemar Johansson
+ contributed by implementing the payload format and thus helped locate
+ some flaws. We would also like to acknowledge Qiaobing Xie, coauthor
+ of RFC 3267, on which this document is based.
+
+11. References
+
+11.1. Normative References
+
+ [1] 3GPP TS 26.290 "Audio codec processing functions; Extended
+ Adaptive Multi-Rate Wideband (AMR-WB+) codec; Transcoding
+ functions", version 6.3.0 (2005-06), 3rd Generation Partnership
+ Project (3GPP).
+
+ [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [3] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
+ "RTP: A Transport Protocol for Real-Time Applications", STD 64,
+ RFC 3550, July 2003.
+
+ [4] 3GPP TS 26.192 "AMR Wideband speech codec; Comfort Noise
+ aspects", version 6.0.0 (2004-12), 3rd Generation Partnership
+ Project (3GPP).
+
+ [5] 3GPP TS 26.193 "AMR Wideband speech codec; Source Controlled
+ Rate operation", version 6.0.0 (2004-12), 3rd Generation
+ Partnership Project (3GPP).
+
+ [6] Handley, M. and V. Jacobson, "SDP: Session Description
+ Protocol", RFC 2327, April 1998.
+
+ [7] Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "Real-
+ Time Transport Protocol (RTP) Payload Format and File Storage
+ Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate
+ Wideband (AMR-WB) Audio Codecs", RFC 3267, June 2002.
+
+ [8] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
+ Session Description Protocol (SDP)", RFC 3264, June 2002.
+
+ [9] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
+ Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
+
+11.2. Informative References
+
+ [10] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+ Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
+ 3711, March 2004.
+
+ [11] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
+ Generic Forward Error Correction", RFC 2733, December 1999.
+
+ [12] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
+ Bolot, J., Vega-Garcia, A., and S. Fosse-Parisis, "RTP Payload
+ for Redundant Audio Data", RFC 2198, September 1997.
+
+ [13] 3GPP TS 26.233 "Packet Switched Streaming service", version
+ 5.7.0 (2005-03), 3rd Generation Partnership Project (3GPP).
+
+ [14] 3GPP TS 26.244 "Transparent end-to-end packet switched streaming
+ service (PSS); 3GPP file format (3GP)", version 6.4.0 (2005-09),
+ 3rd Generation Partnership Project (3GPP).
+
+ [15] Castagno, R. and D. Singer, "MIME Type Registrations for 3rd
+ Generation Partnership Project (3GPP) Multimedia files", RFC
+ 3839, July 2004.
+
+ [16] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
+ Protocol (RTSP)", RFC 2326, April 1998.
+
+ [17] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
+ Protocol", RFC 2974, October 2000.
+
+ [18] 3GPP TS 26.140 "Multimedia Messaging Service (MMS); Media
+ formats and codecs", version 6.2.0 (2005-03), 3rd Generation
+ Partnership Project (3GPP).
+
+ [19] 3GPP TS 26.346 "Multimedia Broadcast/Multicast Service (MBMS);
+ Protocols and codecs", version 6.3.0 (2005-12), 3rd Generation
+ Partnership Project (3GPP).
+
+ Any 3GPP document can be downloaded from the 3GPP webserver,
+ "http://www.3gpp.org/", see specifications.
+
+Authors' Addresses
+
+ Johan Sjoberg
+ Ericsson Research
+ Ericsson AB
+ SE-164 80 Stockholm
+ SWEDEN
+
+ Phone: +46 8 7190000
+ EMail: Johan.Sjoberg@ericsson.com
+
+
+ Magnus Westerlund
+ Ericsson Research
+ Ericsson AB
+ SE-164 80 Stockholm
+ SWEDEN
+
+ Phone: +46 8 7190000
+ EMail: Magnus.Westerlund@ericsson.com
+
+
+ Ari Lakaniemi
+ Nokia Research Center
+ P.O. Box 407
+ FIN-00045 Nokia Group
+ FINLAND
+
+ Phone: +358-71-8008000
+ EMail: ari.lakaniemi@nokia.com
+
+
+ Stephan Wenger
+ Nokia Corporation
+ P.O. Box 100
+ FIN-33721 Tampere
+ FINLAND
+
+ Phone: +358-50-486-0637
+ EMail: Stephan.Wenger@nokia.com
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2006).
+
+ This document is subject to the rights, licenses and restrictions
+ contained in BCP 78, and except as set forth therein, the authors
+ retain all their rights.
+
+ This document and the information contained herein are provided on an
+ "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+ OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+ ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+ INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+ INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+ WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+ The IETF takes no position regarding the validity or scope of any
+ Intellectual Property Rights or other rights that might be claimed to
+ pertain to the implementation or use of the technology described in
+ this document or the extent to which any license under such rights
+ might or might not be available; nor does it represent that it has
+ made any independent effort to identify any such rights. Information
+ on the procedures with respect to rights in RFC documents can be
+ found in BCP 78 and BCP 79.
+
+ Copies of IPR disclosures made to the IETF Secretariat and any
+ assurances of licenses to be made available, or the result of an
+ attempt made to obtain a general license or permission for the use of
+ such proprietary rights by implementers or users of this
+ specification can be obtained from the IETF on-line IPR repository at
+ http://www.ietf.org/ipr.
+
+ The IETF invites any interested party to bring to its attention any
+ copyrights, patents or patent applications, or other proprietary
+ rights that may cover technology that may be required to implement
+ this standard. Please address the information to the IETF at
+ ietf-ipr@ietf.org.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is provided by the IETF
+ Administrative Support Activity (IASA).
+