author Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit 4bfd864f10b68b71482b35c818559068ef8d5797
tree e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc4348.txt
parent ea76e11061bda059ae9f9ad130a9895cc85607db
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc4348.txt')
-rw-r--r-- doc/rfc/rfc4348.txt 1795
1 files changed, 1795 insertions, 0 deletions
diff --git a/doc/rfc/rfc4348.txt b/doc/rfc/rfc4348.txt
new file mode 100644
index 0000000..622ef5a
--- /dev/null
+++ b/doc/rfc/rfc4348.txt
@@ -0,0 +1,1795 @@
+
+
+
+
+
+
+Network Working Group S. Ahmadi
+Request for Comments: 4348 January 2006
+Category: Standards Track
+
+
+ Real-Time Transport Protocol (RTP) Payload Format for the
+ Variable-Rate Multimode Wideband (VMR-WB) Audio Codec
+
+Status of This Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2006).
+
+Abstract
+
+ This document specifies a real-time transport protocol (RTP) payload
+ format to be used for the Variable-Rate Multimode Wideband (VMR-WB)
+ speech codec. The payload format is designed to be able to
+ interoperate with existing VMR-WB transport formats on non-IP
+ networks. A media type registration is included for the VMR-WB RTP
+ payload format.
+
+ VMR-WB is a variable-rate multimode wideband speech codec that has a
+ number of operating modes, one of which is interoperable with the
+ AMR-WB audio codec (whose RTP payload format is specified in RFC
+ 3267) at certain rates. Therefore, provisions have been made in this
+ document to facilitate and simplify data packet exchange between
+ VMR-WB and AMR-WB in the interoperable mode with no transcoding
+ function involved.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Ahmadi Standards Track [Page 1]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 2. Conventions and Acronyms ........................................3
+ 3. The Variable-Rate Multimode Wideband (VMR-WB) Speech Codec ......4
+ 3.1. Narrowband Speech Processing ...............................5
+ 3.2. Continuous vs. Discontinuous Transmission ..................6
+ 3.3. Support for Multi-Channel Session ..........................6
+ 4. Robustness against Packet Loss ..................................7
+ 4.1. Forward Error Correction (FEC) .............................7
+ 4.2. Frame Interleaving and Multi-Frame Encapsulation ...........8
+ 5. VMR-WB Voice over IP Scenarios ..................................9
+ 5.1. IP Terminal to IP Terminal .................................9
+ 5.2. GW to IP Terminal .........................................10
+ 5.3. GW to GW (between VMR-WB- and AMR-WB-Enabled Terminals) ...10
+ 5.4. GW to GW (between Two VMR-WB-Enabled Terminals) ...........11
+ 6. VMR-WB RTP Payload Formats .....................................12
+ 6.1. RTP Header Usage ..........................................13
+ 6.2. Header-Free Payload Format ................................14
+ 6.3. Octet-Aligned Payload Format ..............................15
+ 6.3.1. Payload Structure ..................................15
+ 6.3.2. The Payload Header .................................15
+ 6.3.3. The Payload Table of Contents ......................18
+ 6.3.4. Speech Data ........................................20
+ 6.3.5. Payload Example: Basic Single Channel
+ Payload Carrying Multiple Frames ...................21
+ 6.4. Implementation Considerations .............................22
+ 6.4.1. Decoding Validation and Provision for Lost
+ or Late Packets ....................................22
+ 7. Congestion Control .............................................23
+ 8. Security Considerations ........................................23
+ 8.1. Confidentiality ...........................................24
+ 8.2. Authentication and Integrity ..............................24
+ 9. Payload Format Parameters ......................................24
+ 9.1. VMR-WB RTP Payload MIME Registration ......................25
+ 9.2. Mapping MIME Parameters into SDP ..........................27
+ 9.3. Offer-Answer Model Considerations .........................28
+ 10. IANA Considerations ...........................................29
+ 11. Acknowledgements ..............................................29
+ 12. References ....................................................30
+ 12.1. Normative References .....................................30
+ 12.2. Informative References ...................................30
+
+
+
+
+
+
+
+
+
+Ahmadi Standards Track [Page 2]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+1. Introduction
+
+ This document specifies the payload format for packetization of VMR-
+ WB-encoded speech signals into the Real-time Transport Protocol (RTP)
+ [3]. The VMR-WB payload formats support transmission of single and
+ multiple channels, frame interleaving, multiple frames per payload,
+ header-free payload, the use of mode switching, and interoperation
+ with existing VMR-WB transport formats on non-IP networks, as
+ described in Section 3.
+
+ The payload format is described in Section 6. The VMR-WB file format
+ (i.e., for transport of VMR-WB speech data in storage mode
+ applications such as email) is specified in [7]. Section 9 provides
+ a media type registration for the VMR-WB RTP payload format.
+
+ Since VMR-WB is interoperable with AMR-WB at certain rates, an
+ attempt has been made throughout this document to maximize the
+ similarities with RFC 3267 while optimizing the payload format for
+ the non-interoperable modes of the VMR-WB codec.
+
+2. Conventions and Acronyms
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [2].
+
+ The following acronyms are used in this document:
+
+ 3GPP - The Third Generation Partnership Project
+ 3GPP2 - The Third Generation Partnership Project 2
+ CDMA - Code Division Multiple Access
+ WCDMA - Wideband Code Division Multiple Access
+ GSM - Global System for Mobile Communications
+ AMR-WB - Adaptive Multi-Rate Wideband Codec
+ VMR-WB - Variable-Rate Multimode Wideband Codec
+ CMR - Codec Mode Request
+ GW - Gateway
+ DTX - Discontinuous Transmission
+ FEC - Forward Error Correction
+ SID - Silence Descriptor
+ TrFO - Transcoder-Free Operation
+ UDP - User Datagram Protocol
+ RTP - Real-Time Transport Protocol
+ RTCP - RTP Control Protocol
+ MIME - Multipurpose Internet Mail Extensions
+ SDP - Session Description Protocol
+ VoIP - Voice-over-IP
+
+
+
+
+Ahmadi Standards Track [Page 3]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+ The term "interoperable mode" in this document refers to VMR-WB mode
+ 3, which is interoperable with AMR-WB codec modes 0, 1, and 2.
+
+ The term "non-interoperable modes" in this document refers to VMR-WB
+ modes 0, 1, and 2.
+
+ The term "frame-block" is used in this document to describe the
+ time-synchronized set of speech frames in a multi-channel VMR-WB
+ session. In particular, in an N-channel session, a frame-block will
+ contain N speech frames, one from each of the channels, and all N
+ speech frames represent exactly the same time period.
+
+3. The Variable-Rate Multimode Wideband (VMR-WB) Speech Codec
+
+ VMR-WB is the wideband speech-coding standard developed by Third
+ Generation Partnership Project 2 (3GPP2) for encoding/decoding
+ wideband/narrowband speech content in multimedia services in 3G CDMA
+ cellular systems [1]. VMR-WB is a source-controlled variable-rate
+ multimode wideband speech codec. It has a number of operating modes,
+ where each mode is a tradeoff between voice quality and average data
+ rate. The operating mode in VMR-WB (as shown in Table 2) is chosen
+ based on the traffic condition of the network and the desired quality
+ of service. The desired average data rate (ADR) in each mode is
+ obtained by encoding speech frames at permissible rates (as shown in
+ Tables 1 and 3) compliant with the CDMA2000 system, depending on the
+ instantaneous characteristics of input speech and the maximum and
+ minimum rate constraints imposed by the network operator.
+
+ While VMR-WB is a native CDMA codec complying with all CDMA system
+ requirements, it is further interoperable with AMR-WB [4,12] at
+ 12.65, 8.85, and 6.60 kbps. This is due to the fact that VMR-WB and
+ AMR-WB share the same core technology. This feature enables
+ Transcoder-Free (TrFO) interconnections between VMR-WB and AMR-WB
+ across different wireless/wireline systems (e.g., GSM/WCDMA and
+ CDMA2000) without the use of unnecessarily complex media format
+ conversion.
+
+ Note that the concept of mode in VMR-WB is different from that of
+ AMR-WB where each fixed-rate AMR-WB codec mode is adapted to
+ prevailing channel conditions by a tradeoff between the total number
+ of source-coding and channel-coding bits.
+
+ VMR-WB is able to transition between various modes with no
+ degradation in voice quality that is attributable to the mode
+ switching itself. The operating mode of the VMR-WB encoder may be
+ switched seamlessly without prior knowledge of the decoder. Any
+ non-interoperable mode (i.e., VMR-WB modes 0, 1, or 2) can be chosen
+ depending on the traffic conditions (e.g., network congestion) and
+ the desired quality of service.
+
+
+
+Ahmadi Standards Track [Page 4]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+ While in the interoperable mode (i.e., VMR-WB mode 3), mode switching
+ between VMR-WB modes is not allowed because there is only one AMR-WB
+ interoperable mode in VMR-WB. Since the AMR-WB codec may request a
+ mode change, depending on channel conditions, in-band data included
+ in VMR-WB frame structure (see Section 8 of [1] for more details) is
+ used during an interoperable interconnection to switch between VMR-WB
+ frame types 0, 1, and 2 in VMR-WB mode 3 (corresponding to AMR-WB
+ codec modes 0, 1, or 2).
+
+ As mentioned earlier, VMR-WB is compliant with the CDMA2000 system,
+ with the permissible encoding rates shown in Table 1.
+
+   +---------------------------+-----------------+---------------+
+   | Frame Type                | Bits per Packet | Encoding Rate |
+   |                           |  (Frame Size)   |    (kbps)     |
+   +---------------------------+-----------------+---------------+
+   | Full-Rate                 |       266       |     13.3      |
+   | Half-Rate                 |       124       |      6.2      |
+   | Quarter-Rate              |       54        |      2.7      |
+   | Eighth-Rate               |       20        |      1.0      |
+   | Blank                     |        0        |       0       |
+   | Erasure                   |        0        |       0       |
+   +---------------------------+-----------------+---------------+
+
+ Table 1: CDMA2000 system permissible frame types and their
+ associated encoding rates
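
The encoding rates in Table 1 follow directly from the frame sizes,
because each frame covers 20 ms (50 frames per second). The sketch
below recomputes the rate column from the bit counts; the names
FRAME_BITS and encoding_rate_kbps are illustrative and do not appear
in the specification.

```python
FRAMES_PER_SECOND = 50  # one frame every 20 ms

# Bits per frame, copied from Table 1.
FRAME_BITS = {
    "Full-Rate": 266,
    "Half-Rate": 124,
    "Quarter-Rate": 54,
    "Eighth-Rate": 20,
    "Blank": 0,
    "Erasure": 0,
}


def encoding_rate_kbps(bits_per_frame):
    """Return the encoding rate in kbps for a frame of the given size."""
    return bits_per_frame * FRAMES_PER_SECOND / 1000


# Rate for every frame type, rounded to one decimal as in Table 1.
RATES_KBPS = {
    name: round(encoding_rate_kbps(bits), 1)
    for name, bits in FRAME_BITS.items()
}
```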
+
+ VMR-WB is robust to a high percentage of frame loss and to frames
+ with corrupted rate information. The reception of an Erasure
+ (SPEECH_LOST) frame type at the decoder invokes the built-in frame
+ error concealment mechanism, which conceals the effect of lost
+ frames by exploiting in-band data and the information available in
+ the previous frames.
+
+3.1. Narrowband Speech Processing
+
+ VMR-WB has the capability to operate with either 16000-Hz or 8000-Hz
+ sampled input/output speech signals in all modes of operation [1].
+ The VMR-WB decoder does not require a priori knowledge about the
+ sampling rate of the original media (i.e., speech/audio signals
+ sampled at 8 or 16 kHz) at the input of the encoder. The VMR-WB
+ decoder, by default, generates 16000-Hz wideband output regardless of
+ the encoder input sampling frequency. Depending on the application,
+ the decoder can be configured to generate 8000-Hz output, as well.
+
+
+
+
+
+
+
+Ahmadi Standards Track [Page 5]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+ Therefore, while this specification defines a 16000-Hz RTP clock rate
+ for VMR-WB codec, the injection and processing of 8000-Hz narrowband
+ media during a session is also allowed; however, a 16000-Hz RTP clock
+ rate MUST always be used.
+
+ The choice of VMR-WB output sampling frequency depends on the
+ implementation and the audio acoustic capabilities of the receiving
+ side.
+
+3.2. Continuous vs. Discontinuous Transmission
+
+ The circuit-switched operation of VMR-WB within a CDMA network
+ requires continuous transmission of the speech data during a
+ conversation. The intrinsic source-controlled variable-rate feature
+ of the CDMA speech codecs is required for optimal operation of the
+ CDMA system and interference control. However, VMR-WB has the
+ capability to operate in a discontinuous transmission mode for some
+ packet-switched applications over IP networks (e.g., VoIP), where the
+ number of bits and packets transmitted during silence periods is
+ reduced to a minimum. The VMR-WB DTX operation is similar to that of
+ AMR-WB [4,12].
+
+3.3. Support for Multi-Channel Session
+
+ The octet-aligned RTP payload format defined in this document
+ supports multi-channel audio content (e.g., a stereophonic speech
+ session). Although the VMR-WB codec itself does not support encoding of
+ multi-channel audio content into a single bit stream, it can be used
+ to encode and decode each of the individual channels separately.
+
+ To transport the separately encoded multi-channel content, the speech
+ frames for all channels that are framed and encoded for the same 20
+ ms periods are logically collected in a frame-block.
+
+ At the session setup, out-of-band signaling must be used to indicate
+ the number of channels in the session and the order of the speech
+ frames from different channels in each frame-block. When using SDP
+ for signaling (see Section 9.2 for more details), the number of
+ channels is specified in the rtpmap attribute, and the order of
+ channels carried in each frame-block is implied by the number of
+ channels as specified in Section 4.1 in [6].
+
+
+
+
+
+
+
+
+
+
+Ahmadi Standards Track [Page 6]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+4. Robustness against Packet Loss
+
+ The octet-aligned payload format described in this document (see
+ Section 6 for more details) supports several features, including
+ forward error correction (FEC) and frame interleaving, in order to
+ increase robustness against lost packets.
+
+4.1. Forward Error Correction (FEC)
+
+ The simple scheme of repetition of previously sent data is one way of
+ achieving FEC. Another possible scheme, which is more bandwidth
+ efficient, is to use payload-external FEC (e.g., RFC 2733 [8]), which
+ generates extra packets containing repair data.
+
+ The repetition method involves the simple retransmission of
+ previously transmitted frame-blocks together with the current frame-
+ block(s). This is done by using a sliding window to group the speech
+ frame-blocks to send in each payload. Figure 1 illustrates an
+ example.
+
+ In this example, each frame-block is retransmitted one time in the
+ following RTP payload packet. Here, f(n-2)..f(n+4) denotes a
+ sequence of speech frame-blocks, and p(n-1)..p(n+4) a sequence of
+ payload packets.
+
+   --+--------+--------+--------+--------+--------+--------+--------+--
+     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
+   --+--------+--------+--------+--------+--------+--------+--------+--
+
+     <---- p(n-1) ---->
+              <----- p(n) ----->
+                       <---- p(n+1) ---->
+                                <---- p(n+2) ---->
+                                         <---- p(n+3) ---->
+                                                  <---- p(n+4) ---->
+
+ Figure 1: An example of redundant transmission
+
+ The use of this approach does not require signaling at the session
+ setup. In other words, the speech sender can choose to use this
+ scheme without consulting the receiver. This is because a packet
+ containing redundant frames will not look different from a packet
+ with only new frames. The receiver may receive multiple copies or
+ versions of a frame for a certain timestamp if no packet is lost. If
+ multiple versions of the same speech frame are received, it is
+ RECOMMENDED that the highest rate be used by the speech decoder.
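
The sliding-window grouping of Figure 1 and the receiver-side choice
among duplicate frames can be sketched as follows. This is a minimal
illustration, not part of the specification: the function names, the
list-based representation of frame-blocks, and the (rate, data) tuples
are assumptions made for the example.

```python
def build_redundant_payloads(frame_blocks, copies=2):
    """Group frame-blocks so each payload carries up to `copies` blocks.

    With copies=2, payload k carries blocks k-1 and k, so every block
    except the last one in the sequence is sent twice (as in Figure 1).
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    payloads = []
    for end in range(len(frame_blocks)):
        # The window ends at the newest block and reaches back copies-1.
        start = max(0, end - copies + 1)
        payloads.append(frame_blocks[start:end + 1])
    return payloads


def pick_best_version(versions):
    """Choose among (rate_kbps, data) copies of one frame.

    Follows the recommendation above: the highest-rate version wins.
    """
    return max(versions, key=lambda version: version[0])
```

For example, build_redundant_payloads(["f0", "f1", "f2"]) groups the
blocks as ["f0"], ["f0", "f1"], ["f1", "f2"], so the loss of the second
payload still leaves f0 and f1 recoverable from its neighbors.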
+
+
+
+
+
+Ahmadi Standards Track [Page 7]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+ This redundancy scheme provides the same functionality as that
+ described in RFC 2198, "RTP Payload for Redundant Audio Data" [10].
+ In most cases, the mechanism in this payload format is more efficient
+ and simpler than requiring both endpoints to support RFC 2198. If
+ the spread in time required between the primary and redundant
+ encodings is larger than 5 frame times, the bandwidth overhead of RFC
+ 2198 will be lower.
+
+ The sender is responsible for selecting an appropriate amount of
+ redundancy based on feedback about the channel (e.g., in RTCP
+ receiver reports) or network traffic. A sender SHOULD NOT base
+ selection of FEC on the CMR, as this parameter most probably was set
+ based on non-IP information. The sender is also responsible for
+ avoiding congestion, which may be aggravated by redundant
+ transmission (see Section 7).
+
+4.2. Frame Interleaving and Multi-Frame Encapsulation
+
+ To decrease protocol overhead, the octet-aligned payload format,
+ described in Section 6, allows several speech frame-blocks to be
+ encapsulated into a single RTP packet. One of the drawbacks of this
+ approach is that in case of packet loss several consecutive speech
+ frame-blocks are lost, which usually causes clearly audible
+ distortion in the reconstructed speech.
+
+ Interleaving of frame-blocks can improve the speech quality in such
+ cases by distributing the consecutive losses into a series of single
+ frame-block losses. However, interleaving and bundling several
+ frame-blocks per payload will also increase end-to-end delay and is
+ therefore not appropriate for all types of applications. Streaming
+ applications will most likely be able to exploit interleaving to
+ improve speech quality in lossy transmission conditions.
+
+ The octet-aligned payload format supports the use of frame
+ interleaving as an option. For the encoder (speech sender) to use
+ frame interleaving in its outbound RTP packets for a given session,
+ the decoder (speech receiver) needs to indicate its support via out-
+ of-band means (see Section 9).
+
+
+
+
+
+
+
+
+
+
+
+
+
+Ahmadi Standards Track [Page 8]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+5. VMR-WB Voice over IP Scenarios
+
+5.1. IP Terminal to IP Terminal
+
+ The primary scenario for this payload format is IP end-to-end between
+ two terminals incorporating VMR-WB codec, as shown in Figure 2.
+ Nevertheless, this scenario can be generalized to an interoperable
+ interconnection between VMR-WB-enabled and AMR-WB-enabled IP
+ terminals using the offer-answer model described in Section 9.3.
+ This payload format is expected to be useful for both conversational
+ and streaming services.
+
+   +----------+                         +----------+
+   |          |                         |          |
+   | TERMINAL |<----------------------->| TERMINAL |
+   |          |    VMR-WB/RTP/UDP/IP    |          |
+   +----------+                         +----------+
+                (or AMR-WB/RTP/UDP/IP)
+
+ Figure 2: IP terminal to IP terminal
+
+ A conversational service places requirements on the payload format.
+ Low delay is very important, which means few speech frame-blocks
+ per payload packet. Low overhead is also required when the payload
+ traverses low-bandwidth links, especially if the packet rate is
+ high.
+
+ Streaming service has less strict real-time requirements and
+ therefore can use a larger number of frame-blocks per packet than
+ conversational service. This reduces the overhead from IP, UDP, and
+ RTP headers. However, including several frame-blocks per packet
+ makes the transmission more vulnerable to packet loss, so
+ interleaving may be used to reduce the effect of packet loss on
+ speech quality. A streaming server handling a large number of
+ clients also needs a payload format that requires as few resources as
+ possible when doing packetization.
+
+ For VMR-WB-enabled IP terminals at both ends, depending on the
+ implementation, all modes of the VMR-WB codec can be used in this
+ scenario. Also, both header-free and octet-aligned payload formats
+ (see Section 6 for details) can be utilized. For the interoperable
+ interconnection between VMR-WB and AMR-WB, only VMR-WB mode 3 is
+ used, and all restrictions described in Section 9.3 apply.
+
+
+
+
+
+
+
+
+Ahmadi Standards Track [Page 9]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+5.2. GW to IP Terminal
+
+ Another scenario occurs when VMR-WB-encoded speech is transmitted
+ from a non-IP system (e.g., a 3GPP2/CDMA2000 network) to an IP
+ terminal, and/or vice versa, as depicted in Figure 3.
+
+ VMR-WB over
+ 3GPP2/CDMA2000 network
+ +------+ +----------+
+ | | | |
+ <-------------->| GW |<---------------------->| TERMINAL |
+ | | VMR-WB/RTP/UDP/IP | |
+ +------+ +----------+
+ |
+ | IP network
+ |
+
+ Figure 3: GW to VoIP terminal scenario
+
+ VMR-WB's capability to switch seamlessly between operational modes is
+ exploited in CDMA (non-IP) networks to optimize speech quality for a
+ given traffic condition. To preserve this functionality in scenarios
+ including a gateway to an IP network using the octet-aligned payload
+ format, a codec mode request (CMR) field is provided. The gateway
+ will be responsible for forwarding the CMR between the non-IP and IP
+ parts in both directions. The IP terminal SHOULD follow the CMR
+ forwarded by the gateway to optimize speech quality going to the
+ non-IP decoder. The mode control algorithm in the gateway SHOULD
+ accommodate the delay imposed by the IP network on the response to
+ CMR by the IP terminal.
+
+ The IP terminal SHOULD NOT set the CMR (see Section 6.3.2), but the
+ gateway can set the CMR value on frames going toward the encoder in
+ the non-IP part to optimize speech quality from that encoder to the
+ gateway and to perform congestion control on the IP network.
+
+5.3. GW to GW (between VMR-WB- and AMR-WB-Enabled Terminals)
+
+ A third likely scenario is that RTP/UDP/IP is used as transport
+ between two non-IP systems, i.e., IP is originated and terminated in
+ gateways on both sides of the IP transport, as illustrated in Figure
+ 4. This is the most likely scenario for an interoperable
+ interconnection between 3GPP/(GSM-WCDMA)/AMR-WB and
+ 3GPP2/CDMA2000/VMR-WB-enabled mobile stations. In this scenario, the
+ VMR-WB-enabled terminal also declares itself capable of AMR-WB with
+ restricted mode set as described in Section 9.3. The CMR value may be
+ set in packets received by the gateways on the IP network side. The
+ gateway should forward to the non-IP side a CMR value that is the
+
+
+
+Ahmadi Standards Track [Page 10]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+ minimum of three values: (1) the CMR value it receives on the IP
+ side; (2) a CMR value it may choose for congestion control of
+ transmission on the IP side; and (3) the CMR value based on its
+ estimate of reception quality on the non-IP side. The details of the
+ traffic control algorithm are left to the implementation.
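
Read literally, the forwarding rule above is a numeric minimum over
three CMR values. The sketch below shows only that literal reading; the
function and parameter names are hypothetical, and a real gateway is
free to apply its own traffic control algorithm. Because CMR 15 ("no
mode request") is the largest value, it is forwarded only when none of
the three inputs carries a request.

```python
CMR_NO_REQUEST = 15  # "no mode request is present"


def forwarded_cmr(cmr_from_ip, cmr_congestion, cmr_reception):
    """Return the CMR a gateway forwards to the non-IP side.

    Inputs: the CMR received on the IP side, the CMR chosen for
    congestion control, and the CMR derived from the reception quality
    estimate on the non-IP side. The result is their numeric minimum.
    """
    values = (cmr_from_ip, cmr_congestion, cmr_reception)
    for value in values:
        if not 0 <= value <= 15:
            raise ValueError("CMR is a 4-bit field (0..15)")
    return min(values)
```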
+
+ VMR-WB over AMR-WB over
+ 3GPP2/CDMA2000 network 3GPP/(GSM-WCDMA) network
+
+ +------+ +------+
+ (AMR-WB Payload) | | AMR-WB/RTP/UDP/IP| |(AMR-WB Payload)
+ <---------------->| GW |<---------------->| GW |<--------------->
+ | | | |
+ +------+ +------+
+ | IP network |
+ | |
+
+ Figure 4: GW to GW scenario (AMR-WB <-> VMR-WB
+ interoperable interconnection)
+
+ Upon initiation of, and throughout, an interoperable interconnection
+ between VMR-WB and AMR-WB, only VMR-WB mode 3 can be used. There are
+ three Frame Types (i.e., FT=0, 1, or 2; see Table 3) within this mode
+ that are compatible with AMR-WB codec modes 0, 1, and 2,
+ respectively. If the AMR-WB codec is engaged in an interoperable
+ interconnection with VMR-WB, the active AMR-WB codec mode set needs
+ to be limited to 0, 1, and 2.
+
+5.4. GW to GW (between Two VMR-WB-Enabled Terminals)
+
+ The fourth example VoIP scenario is composed of an RTP/UDP/IP
+ transport between two non-IP systems; i.e., IP is originated and
+ terminated in gateways on both sides of the IP transport, as
+ illustrated in Figure 5. This is the most likely scenario for
+ Mobile-Station-to-Mobile-Station (MS-to-MS) Transcoder-Free (TrFO)
+ interconnection between two 3GPP2/CDMA2000 terminals that both use
+ VMR-WB codec.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Ahmadi Standards Track [Page 11]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+ VMR-WB over VMR-WB over
+ 3GPP2/CDMA2000 network 3GPP2/CDMA2000 network
+
+ +------+ +------+
+ | | | |
+ <------------>| GW |<----------------->| GW |<------------>
+ | | VMR-WB/RTP/UDP/IP | |
+ +------+ +------+
+ | IP network |
+ | |
+
+ Figure 5: GW to GW scenario (a CDMA2000 MS-to-MS VoIP scenario)
+
+6. VMR-WB RTP Payload Formats
+
+ For a given session, the payload format can be either header free or
+ octet aligned, depending on the mode of operation that is established
+ for the session via out-of-band means and the application.
+
+ The header-free payload format is designed for maximum bandwidth
+ efficiency, simplicity, and low latency. Only one codec data frame
+ can be sent in each header-free payload format packet. None of the
+ payload header fields or table of contents (ToC) entries is present
+ (the same consideration is also made in [11]).
+
+ In the octet-aligned payload format, all the fields in a payload,
+ including payload header, table of contents entries, and speech
+ frames themselves, are individually aligned to octet boundaries to
+ make implementations efficient.
+
+ Note that octet alignment of a field or payload means that the last
+ octet is padded with zeroes in the least significant bits to fill the
+ octet. Also note that this padding is separate from padding
+ indicated by the P bit in the RTP header.
+
+ Of the two payload formats, only the octet-aligned format can use
+ interleaving to make the speech transport robust to packet loss.
+
+ The VMR-WB octet-aligned payload format in the interoperable mode is
+ identical to that of AMR-WB (i.e., RFC 3267).
+
+
+
+
+
+
+
+
+
+
+Ahmadi Standards Track [Page 12]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+6.1. RTP Header Usage
+
+ The format of the RTP header is specified in [3]. This payload
+ format uses the fields of the header in a manner consistent with that
+ specification.
+
+ The RTP timestamp corresponds to the sampling instant of the first
+ sample encoded for the first frame-block in the packet. The
+ timestamp clock frequency is the same as the default sampling
+ frequency (i.e., 16 kHz), so the timestamp unit is in samples.
+
+ The duration of one speech frame-block is 20 ms for VMR-WB. For
+ normal wideband operation of VMR-WB, the input/output media sampling
+ frequency is 16 kHz, corresponding to 320 samples per frame from each
+ channel. Thus, the timestamp is increased by 320 for VMR-WB for each
+ consecutive frame-block.
+
+ The VMR-WB codec is capable of processing speech/audio signals
+ sampled at 8 kHz. By default, the VMR-WB decoder output sampling
+ frequency is 16 kHz. Depending on the application, the decoder can
+ be configured to generate 8-kHz output sampling frequency, as well.
+ Since the VMR-WB RTP payload formats for the 8- and 16-kHz sampled
+ media are identical and the VMR-WB decoder does not need a priori
+ knowledge about the encoder input sampling frequency, a fixed RTP
+ clock rate of 16000 Hz is defined for VMR-WB codec. This would allow
+ injection or processing of 8-kHz sampled speech/audio media without
+ having to change the RTP clock rate during a session. Note that the
+ timestamp is incremented by 320 per frame-block for 8-kHz sampled
+ media, as well.
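
The timestamp arithmetic described above can be sketched as follows
for the non-interleaved case. The helper name and its arguments are
illustrative; the modulo reflects the 32-bit width of the RTP timestamp
field defined in [3].

```python
RTP_CLOCK_HZ = 16000   # fixed RTP clock rate for VMR-WB
FRAME_BLOCK_MS = 20    # duration of one frame-block

# Timestamp units per frame-block: 16000 * 20 / 1000 = 320.
# The same increment applies to 8-kHz sampled media.
TS_PER_FRAME_BLOCK = RTP_CLOCK_HZ * FRAME_BLOCK_MS // 1000


def packet_timestamp(first_timestamp, frame_blocks_before):
    """RTP timestamp of a packet whose first frame-block follows
    `frame_blocks_before` frame-blocks counted from `first_timestamp`."""
    advanced = first_timestamp + frame_blocks_before * TS_PER_FRAME_BLOCK
    return advanced % 2**32  # wrap at the 32-bit field width
```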
+
+ A packet may contain multiple frame-blocks of encoded speech or
+ comfort noise parameters. If interleaving is employed, the frame-
+ blocks encapsulated into a payload are picked according to the
+ interleaving rules defined in Section 6.3.2. Otherwise, each packet
+ covers a period of one or more contiguous 20-ms frame-block
+ intervals. In case the data from all the channels for a particular
+ frame-block in the period is missing (for example, at a gateway from
+ some other transport format), it is possible to indicate that no data
+ is present for that frame-block instead of breaking a multi-frame-
+ block packet into two, as explained in Section 6.3.2.
+
+ No matter which payload format is used, the RTP payload is always
+ made an integral number of octets long by padding with zero bits if
+ necessary. If additional padding is required to bring the payload
+ length to a larger multiple of octets or for some other purpose, then
+ the P bit in the RTP header MAY be set, and padding appended, as
+ specified in [3].
+
+
+
+
+Ahmadi Standards Track [Page 13]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+ The RTP header marker bit (M) SHALL always be set to 0 if the VMR-WB
+ codec operates in continuous transmission. When operating in
+ discontinuous transmission (DTX), the RTP header marker bit SHALL be
+ set to 1 if the first frame-block carried in the packet contains a
+ speech frame, which is the first in a talkspurt. For all other
+ packets, the marker bit SHALL be set to zero (M=0).
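
The marker bit rule reduces to a single condition, sketched below. The
function and parameter names are assumptions for illustration; how an
implementation detects the start of a talkspurt is outside this sketch.

```python
def marker_bit(dtx_enabled, first_frame_starts_talkspurt):
    """Return the RTP marker bit (M) for an outgoing packet.

    M is 1 only in DTX operation, and only when the first frame-block
    in the packet holds the first speech frame of a talkspurt. In
    continuous transmission, and for all other packets, M is 0.
    """
    return 1 if (dtx_enabled and first_frame_starts_talkspurt) else 0
```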
+
+ The assignment of an RTP payload type for this payload format is
+ outside the scope of this document and will not be specified here.
+ It is expected that the RTP profile under which this payload format
+ is being used will assign a payload type for this encoding or specify
+ that the payload type is to be bound dynamically (see Section 9).
+
+6.2. Header-Free Payload Format
+
+ The header-free payload format is designed for maximum bandwidth
+ efficiency, simplicity, and minimum delay. Only one speech data
+ frame is present in each header-free payload format packet. None of
+ the payload header fields or ToC entries is present. The encoding
+ rate for the speech frame can be determined from the length of the
+ speech data frame, since there is only one speech data frame in each
+ header-free payload format.
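
As an illustration of determining the rate from the payload length, the
sketch below maps a header-free payload size in octets to a frame type.
It assumes that each payload carries exactly the number of bits listed
in Table 1, padded with zero bits to a whole number of octets; the
authoritative bit mapping is in Section 8 of [1]. The names used here
are hypothetical.

```python
# Bits per frame for the rates that carry data, copied from Table 1.
TABLE1_BITS = {
    "Full-Rate": 266,
    "Half-Rate": 124,
    "Quarter-Rate": 54,
    "Eighth-Rate": 20,
}

# Payload length in octets after padding to an octet boundary:
# 266 -> 34, 124 -> 16, 54 -> 7, 20 -> 3.
OCTETS_TO_FRAME_TYPE = {
    (bits + 7) // 8: name for name, bits in TABLE1_BITS.items()
}


def frame_type_from_length(payload_octets):
    """Return the frame type for a header-free payload of this length,
    or None if the length matches no permissible rate."""
    return OCTETS_TO_FRAME_TYPE.get(payload_octets)
```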
+
+ The use of the RTP header fields for header-free payload format is
+ the same as the corresponding one for the octet-aligned payload
+ format. The detailed bit mapping of speech data packets permissible
+ for this payload format is described in Section 8 of [1]. Since the
+ header-free payload format is not compatible with AMR-WB RTP payload,
+ only non-interoperable modes of VMR-WB SHALL be used with this
+ payload format. That is, FT=0, 1, 2, and 9 SHALL NOT be used with
+ header-free payload format.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                        RTP Header [3]                         |
+   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+   |                                                               |
+   +          ONLY one speech data frame           +-+-+-+-+-+-+-+-+
+   |                                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Note that the mode of operation, using this payload format, is
+ decided by the transmitting (encoder) site. The default mode of
+ operation for VMR-WB encoder is mode 0 [1]. The mode change request
+ MAY also be sent through non-RTP means, which is out of the scope of
+ this specification.
+
+
+
+
+Ahmadi Standards Track [Page 14]
+
+RFC 4348 VMR-WB RTP Payload Format January 2006
+
+
+6.3. Octet-Aligned Payload Format
+
+6.3.1. Payload Structure
+
+ The complete payload consists of a payload header, a payload table of
+ contents, and speech data representing one or more speech frame-
+ blocks. The following diagram shows the general payload format
+ layout:
+
+ +----------------+-------------------+----------------
+ | Payload header | Table of contents | Speech data ...
+ +----------------+-------------------+----------------
+
+6.3.2. The Payload Header
+
+ In octet-aligned payload format, the payload header consists of a
+ 4-bit CMR, 4 reserved bits, and, optionally, an 8-bit interleaving
+ header, as shown below.
+
+  0                   1
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+ +-+-+-+-+-+-+-+-+- - - - - - - -
+ |  CMR  |R|R|R|R|  ILL  |  ILP  |
+ +-+-+-+-+-+-+-+-+- - - - - - - -
+
+ CMR (4 bits): This indicates a codec mode request sent to the speech
+ encoder at the site of the receiver of this payload. CMR value 15
+ indicates that no mode request is present, and other unused values
+ are reserved for future use.
+
+ The value of the CMR field is set according to the following table:
+
+ +-------+----------------------------------------------------------+
+ |  CMR  |                  VMR-WB Operating Modes                  |
+ +-------+----------------------------------------------------------+
+ |   0   | VMR-WB mode 3 (AMR-WB interoperable mode at 6.60 kbps)   |
+ |   1   | VMR-WB mode 3 (AMR-WB interoperable mode at 8.85 kbps)   |
+ |   2   | VMR-WB mode 3 (AMR-WB interoperable mode at 12.65 kbps)  |
+ |   3   | VMR-WB mode 2                                            |
+ |   4   | VMR-WB mode 1                                            |
+ |   5   | VMR-WB mode 0                                            |
+ |   6   | VMR-WB mode 2 with maximum half-rate encoding            |
+ | 7-14  | (reserved)                                               |
+ |  15   | No Preference (no mode request is present)               |
+ +-------+----------------------------------------------------------+
+
+ Table 2: List of valid CMR values and their associated VMR-WB
+ operating modes
+
+ R: This is a reserved bit that MUST be set to zero. The receiver
+ MUST ignore all R bits.
+
+ ILL (4 bits, unsigned integer): This is an OPTIONAL field that is
+ present only if interleaving is signaled out-of-band for the session.
+ ILL=L indicates to the receiver that the interleaving length is L+1,
+ in number of frame-blocks.
+
+ ILP (4 bits, unsigned integer): This is an OPTIONAL field that is
+ present only if interleaving is signaled. ILP MUST take a value
+ between 0 and ILL, inclusive, indicating the interleaving index for
+ frame-blocks in this payload in the interleave group. If the value
+ of ILP is found greater than ILL, the payload SHOULD be discarded.
+
+ ILL and ILP fields MUST be present in each packet in a session if
+ interleaving is signaled for the session.
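+
+ The field layout above can be illustrated with a short parser. This
+ is a sketch under stated assumptions: the PayloadHeader type and the
+ parse_payload_header function are hypothetical names, and the choice
+ to raise an exception when ILP exceeds ILL is one possible way to
+ act on the SHOULD-discard rule.
+

```python
from typing import NamedTuple, Optional


class PayloadHeader(NamedTuple):
    cmr: int            # 4-bit codec mode request
    ill: Optional[int]  # interleaving length field, None if not signaled
    ilp: Optional[int]  # interleaving index field, None if not signaled
    size: int           # number of octets consumed by the header


def parse_payload_header(payload: bytes, interleaving: bool) -> PayloadHeader:
    """Parse the octet-aligned payload header.

    `interleaving` reflects the out-of-band signaling for the session.
    Raises ValueError for a truncated payload or when ILP > ILL.
    """
    needed = 2 if interleaving else 1
    if len(payload) < needed:
        raise ValueError("payload too short for payload header")
    cmr = payload[0] >> 4  # upper 4 bits; the 4 reserved bits are ignored
    if not interleaving:
        return PayloadHeader(cmr, None, None, 1)
    ill = payload[1] >> 4
    ilp = payload[1] & 0x0F
    if ilp > ill:
        # Such a payload SHOULD be discarded by the receiver.
        raise ValueError("ILP greater than ILL")
    return PayloadHeader(cmr, ill, ilp, 2)
```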
+
+ The mode request received in the CMR field is valid until the next
+ CMR is received, i.e., until a newly received CMR value overrides the
+ previous one. Therefore, if a terminal continuously wishes to
+ receive frames in the same mode, x, it needs to set CMR=x for all its
+ outbound payloads, and if a terminal has no preference in which mode
+ to receive, it SHOULD set CMR=15 in all its outbound payloads.
+
+ If a payload is received with a CMR value that is not valid, the CMR
+ MUST be ignored by the receiver.
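+
+ The persistence rule for mode requests can be sketched as a small
+ state holder. The class name ModeRequestTracker is hypothetical, and
+ treating CMR=15 as clearing any earlier request is an interpretation
+ of "no preference" rather than something this document spells out.
+

```python
from typing import Optional

VALID_MODE_REQUESTS = frozenset(range(7))  # CMR values 0..6 in Table 2
NO_PREFERENCE = 15


class ModeRequestTracker:
    """Remember the most recent valid mode request from the far end."""

    def __init__(self) -> None:
        self.requested: Optional[int] = None  # None means no preference

    def update(self, cmr: int) -> Optional[int]:
        if cmr == NO_PREFERENCE:
            # Assumption: "no preference" overrides an earlier request.
            self.requested = None
        elif cmr in VALID_MODE_REQUESTS:
            self.requested = cmr
        # Reserved values 7..14 are not valid and are ignored, so the
        # previously received request stays in effect.
        return self.requested
```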
+
+ In a multi-channel session, CMR SHOULD be interpreted by the receiver
+ of the payload as the desired encoding mode for all the channels in
+ the session, if the network allows.
+
+ There are two factors that affect the VMR-WB mode selection: (i) the
+ performance of any CDMA link connected via a gateway (e.g., in a GW
+ to IP terminal scenario), and (ii) the congestion state of an IP
+ network. The CDMA link performance is signaled via the CMR field,
+ which is not used by IP-only end-points. The IP network state is
+ monitored using, for example, RTCP. A sender needs to select the
+ operating mode to satisfy both these constraints (see Section 7).
+
+ The encoder SHOULD follow a received mode request, but MAY change to
+ a different mode if the network necessitates it, for example, to
+ control congestion.
+
+ The CMR field MUST be set to 15 for packets sent to a multicast
+ group. The encoder in the speech sender SHOULD ignore mode requests
+ when sending speech to a multicast session but MAY use RTCP feedback
+ information as a hint that a mode change is needed.
+
+ If interleaving option is utilized, interleaving MUST be performed on
+ a frame-block basis, as opposed to a frame basis, in a multi-channel
+ session.
+
+ The following example illustrates the arrangement of speech frame-
+ blocks in an interleave group during an interleave session. Here we
+ assume ILL=L for the interleave group that starts at speech frame-
+ block n. We also assume that the first payload packet of the
+ interleave group is s and the number of speech frame-blocks carried
+ in each payload is N. Then we will have
+
+ Payload s (the first packet of this interleave group):
+ ILL=L, ILP=0,
+
+ Carry frame-blocks: n, n+(L+1), n+2*(L+1),..., n+(N-1)*(L+1)
+
+ Payload s+1 (the second packet of this interleave group):
+ ILL=L, ILP=1,
+ Carry frame-blocks: n+1, n+1+(L+1), n+1+2*(L+1),..., n+1+
+ (N-1)*(L+1)
+
+ ...
+
+ Payload s+L (the last packet of this interleave group):
+ ILL=L, ILP=L,
+ Carry frame-blocks: n+L, n+L+(L+1), n+L+2*(L+1), ..., n+L+
+ (N-1)*(L+1)
+
+ The next interleave group will start at frame-block n+N*(L+1). There
+ will be no interleaving effect unless the number of frame-blocks per
+ packet (N) is at least 2. Moreover, the number of frame-blocks per
+ payload (N) and the value of ILL MUST NOT be changed inside an
+ interleave group. In other words, all payloads in an interleave
+ group MUST have the same ILL and MUST contain the same number of
+ speech frame-blocks.
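+
+ The arrangement above reduces to a simple index formula: the payload
+ with index ILP carries frame-blocks n+ILP+k*(L+1) for k=0..N-1. A
+ minimal sketch follows; the function names are hypothetical.
+

```python
from typing import List


def frame_blocks_in_payload(n: int, ill: int, ilp: int,
                            blocks_per_payload: int) -> List[int]:
    """Frame-block numbers carried by the payload with index `ilp`.

    `n` is the first frame-block of the interleave group, `ill` is the
    ILL field value (interleaving length minus one).
    """
    if not 0 <= ilp <= ill:
        raise ValueError("ILP must be between 0 and ILL, inclusive")
    step = ill + 1
    return [n + ilp + k * step for k in range(blocks_per_payload)]


def next_group_start(n: int, ill: int, blocks_per_payload: int) -> int:
    """First frame-block of the following interleave group."""
    return n + blocks_per_payload * (ill + 1)
```

+
+ For instance, with n=1, ILL=2, and N=3, the three payloads carry
+ frame-blocks (1, 4, 7), (2, 5, 8), and (3, 6, 9), and the next group
+ starts at frame-block 10.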
+
+ The sender of the payload MUST only apply interleaving if the
+ receiver has signaled its use through out-of-band means. Since
+ interleaving will increase buffering requirements at the receiver,
+ the receiver uses MIME parameter "interleaving=I" to set the maximum
+ number of frame-blocks allowed in an interleaving group to I.
+
+ When performing interleaving, the sender MUST use a proper number of
+ frame-blocks per payload (N) and ILL so that the resulting size of an
+ interleave group is less than or equal to I, i.e., N*(L+1)<=I.
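+
+ A sender-side check of this constraint might look like the sketch
+ below. The function name is hypothetical; the upper bound of 15 on
+ ILL follows from the 4-bit field width.
+

```python
def interleave_group_fits(blocks_per_payload: int, ill: int,
                          max_group_size: int) -> bool:
    """Return True if N*(L+1) <= I for the receiver's "interleaving=I"."""
    if not 0 <= ill <= 15:
        raise ValueError("ILL is a 4-bit field")
    if blocks_per_payload < 1:
        raise ValueError("a payload carries at least one frame-block")
    return blocks_per_payload * (ill + 1) <= max_group_size
```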
+
+ The following example shows the ToC of three consecutive packets,
+ each carrying 3 frame-blocks, in an interleaved two-channel session.
+
+ Here, the two channels are left (L) and right (R), with L coming
+ before R, and the interleaving length is 3 (i.e., ILL=2). This makes
+ the interleave group 9 frame-blocks large.
+
+ Packet #1
+ ---------
+
+ ILL=2, ILP=0:
+ +----+----+----+----+----+----+
+ | 1L | 1R | 4L | 4R | 7L | 7R |
+ +----+----+----+----+----+----+
+ |<------->|<------->|<------->|
+    Frame     Frame     Frame
+   Block 1   Block 4   Block 7
+
+ Packet #2
+ ---------
+
+ ILL=2, ILP=1:
+
+ +----+----+----+----+----+----+
+ | 2L | 2R | 5L | 5R | 8L | 8R |
+ +----+----+----+----+----+----+
+ |<------->|<------->|<------->|
+    Frame     Frame     Frame
+   Block 2   Block 5   Block 8
+
+ Packet #3
+ ---------
+
+ ILL=2, ILP=2:
+ +----+----+----+----+----+----+
+ | 3L | 3R | 6L | 6R | 9L | 9R |
+ +----+----+----+----+----+----+
+ |<------->|<------->|<------->|
+    Frame     Frame     Frame
+   Block 3   Block 6   Block 9
+
+6.3.3. The Payload Table of Contents
+
+ The table of contents (ToC) in octet-aligned payload format consists
+ of a list of ToC entries where each entry corresponds to a speech
+ frame carried in the payload. When interleaving is used, the
+ frame-blocks in the ToC will almost never be consecutive in time.
+ Instead, the presence and order of the frame-blocks in a packet
+ will follow the pattern described in Section 6.3.2.
+
+ +---------------------+
+ | list of ToC entries |
+ +---------------------+
+
+ A ToC entry for the octet-aligned payload format is as follows:
+
+  0 1 2 3 4 5 6 7
+ +-+-+-+-+-+-+-+-+
+ |F|  FT   |Q|P|P|
+ +-+-+-+-+-+-+-+-+
+
+ The table of contents (ToC) consists of a list of ToC entries, each
+ representing a speech frame.
+
+ F (1 bit): If set to 1, indicates that this frame is followed by
+ another speech frame in this payload; if set to 0,
+ indicates that this frame is the last frame in this
+ payload.
+
+ FT (4 bits): Frame type index whose value is chosen according to
+ Table 3.
+
+ During the interoperable mode, FT=14 (SPEECH_LOST) and
+ FT=15 (NO_DATA) are used to indicate frames that are
+ either lost or not being transmitted in this payload,
+ respectively. FT=14 or 15 MAY be used in the non-
+ interoperable modes to indicate frame erasure or blank
+ frame, respectively (see Section 2.1 of [1]).
+
+ If a payload with an invalid FT value is received, the
+ payload MUST be discarded. Note that for ToC entries
+ with FT=14 or 15, there will be no corresponding speech
+ frame in the payload.
+
+ Depending on the application and the mode of operation
+ of VMR-WB, any combination of the permissible frame
+ types (FT) shown in Table 3 MAY be used.
+
+ Q (1 bit): Frame quality indicator. If set to 0, indicates that
+ the corresponding frame is corrupted. During the
+ interoperable mode, the receiver side (with AMR-WB
+ codec) should set the RX_TYPE to either SPEECH_BAD or
+ SID_BAD depending on the frame type (FT), if Q=0. The
+ VMR-WB encoder always sets Q bit to 1. The VMR-WB
+ decoder may ignore the Q bit.
+
+ P bits: Padding bits MUST be set to zero and MUST be ignored by
+ a receiver.
+
+ +----+--------------------------------------------+-----------------+
+ | FT |               Encoding Rate                |Frame Size (Bits)|
+ +----+--------------------------------------------+-----------------+
+ |  0 | Interoperable Full-Rate (AMR-WB 6.60 kbps) |       132       |
+ |  1 | Interoperable Full-Rate (AMR-WB 8.85 kbps) |       177       |
+ |  2 | Interoperable Full-Rate (AMR-WB 12.65 kbps)|       253       |
+ |  3 | Full-Rate 13.3 kbps                        |       266       |
+ |  4 | Half-Rate 6.2 kbps                         |       124       |
+ |  5 | Quarter-Rate 2.7 kbps                      |       54        |
+ |  6 | Eighth-Rate 1.0 kbps                       |       20        |
+ |  7 | (reserved)                                 |        -        |
+ |  8 | (reserved)                                 |        -        |
+ |  9 | CNG (AMR-WB SID)                           |       40        |
+ | 10 | (reserved)                                 |        -        |
+ | 11 | (reserved)                                 |        -        |
+ | 12 | (reserved)                                 |        -        |
+ | 13 | (reserved)                                 |        -        |
+ | 14 | Erasure (AMR-WB SPEECH_LOST)               |        0        |
+ | 15 | Blank (AMR-WB NO_DATA)                     |        0        |
+ +----+--------------------------------------------+-----------------+
+
+ Table 3: VMR-WB payload frame types for real-time transport
+
+ For multi-channel sessions, the ToC entries of all frames from a
+ frame-block are placed in the ToC in consecutive order. Therefore,
+ with N channels and K speech frame-blocks in a packet, there MUST be
+ N*K entries in the ToC, and the first N entries will be from the
+ first frame-block, the second N entries will be from the second
+ frame-block, and so on.
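+
+ The ToC entry layout and the N*K rule for multi-channel sessions can
+ be sketched as the following parser. It is illustrative only: the
+ TocEntry type and the parse_toc function are hypothetical names, and
+ `data` is assumed to start at the first ToC octet (that is, after
+ the payload header).
+

```python
from typing import List, NamedTuple

# Frame types that are not marked "(reserved)" in Table 3.
VALID_FT = frozenset({0, 1, 2, 3, 4, 5, 6, 9, 14, 15})


class TocEntry(NamedTuple):
    ft: int  # frame type index (4 bits)
    q: int   # frame quality indicator (1 bit)


def parse_toc(data: bytes, channels: int = 1) -> List[TocEntry]:
    """Parse ToC entries until an entry with F=0 is found."""
    entries = []
    for octet in data:
        f = octet >> 7             # bit 0: another entry follows
        ft = (octet >> 3) & 0x0F   # bits 1..4
        q = (octet >> 2) & 0x01    # bit 5; the two P bits are ignored
        if ft not in VALID_FT:
            # A payload with an invalid FT value MUST be discarded.
            raise ValueError(f"invalid frame type {ft}")
        entries.append(TocEntry(ft, q))
        if f == 0:
            break
    else:
        raise ValueError("ToC is not terminated by an entry with F=0")
    if len(entries) % channels != 0:
        raise ValueError("ToC length is not a multiple of the channel count")
    return entries
```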
+
+6.3.4. Speech Data
+
+ Speech data of a payload contains one or more speech frames as
+ described in the ToC of the payload.
+
+ Each speech frame represents 20 ms of speech encoded in one of the
+ available encoding rates depending on the operation mode. The length
+ of the speech frame is defined by the frame type in the FT field,
+ with the following considerations:
+
+ - The last octet of each speech frame MUST be padded with zeroes at
+ the end if not all bits in the octet are used. In other words,
+ each speech frame MUST be octet-aligned.
+
+ - When multiple speech frames are present in the speech data, the
+ speech frames MUST be arranged one whole frame after another.
+
+ The order and numbering notation of the speech data bits are as
+ specified in the VMR-WB standard specification [1].
+
+ The payload begins with the payload header of one octet, or two if
+ frame interleaving is selected. The payload header is followed by
+ the table of contents consisting of a list of one-octet ToC entries.
+
+ The speech data follows the table of contents. For the purpose of
+ packetization, all the octets comprising a speech frame are appended
+ to the payload as a unit. The speech frames are packed in the same
+ order as their corresponding ToC entries are arranged in the ToC
+ list, with the exception that if a given frame has a ToC entry with
+ FT=14 or 15, there will be no data octets present for that frame.
+
+6.3.5. Payload Example: Basic Single Channel Payload Carrying Multiple
+ Frames
+
+ The following diagram shows an octet-aligned payload format from a
+ single channel session that carries two VMR-WB Full-Rate frames
+ (FT=3). In the payload, a codec mode request is sent (e.g., CMR=4),
+ requesting that the encoder at the receiver's side use VMR-WB mode 1.
+ No interleaving is used. Note that in the example below the last
+ octet in both speech frames is padded with zeros to make them octet
+ aligned.
+
+  0                   1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | CMR=4 |R|R|R|R|1|FT#1=3 |Q|P|P|0|FT#2=3 |Q|P|P|   f1(0..7)    |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |   f1(8..15)   |  f1(16..23)   |  ...                          |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ :                              ...                              :
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | r |P|P|P|P|P|P|   f2(0..7)    |   f2(8..15)   |  f2(16..23)   |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ :                              ...                              :
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ |         ...                   | l |P|P|P|P|P|P|
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ r= f1(264,265)
+ l= f2(264,265)
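+
+ The layout of this example can be reproduced with a small
+ packetizer. The sketch below is not normative: build_payload is a
+ hypothetical name, interleaving is not handled, each frame is
+ assumed to be supplied already zero-padded to whole octets, and Q is
+ always set to 1 as the VMR-WB encoder does.
+

```python
from typing import Sequence, Tuple

# Frame sizes in bits, indexed by frame type (Table 3).
FRAME_BITS = {0: 132, 1: 177, 2: 253, 3: 266, 4: 124, 5: 54, 6: 20,
              9: 40, 14: 0, 15: 0}


def build_payload(cmr: int, frames: Sequence[Tuple[int, bytes]]) -> bytes:
    """Assemble header, ToC, and speech data for one channel.

    `frames` is a list of (ft, octets) pairs in transmission order.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    out = bytearray([(cmr & 0x0F) << 4])  # CMR plus four zero R bits
    for i, (ft, data) in enumerate(frames):
        if len(data) != (FRAME_BITS[ft] + 7) // 8:
            raise ValueError(f"wrong data length for FT={ft}")
        f = 1 if i < len(frames) - 1 else 0  # F=0 marks the last entry
        out.append((f << 7) | (ft << 3) | (1 << 2))  # Q=1, P bits zero
    for _, data in frames:
        out += data  # whole frames, in ToC order
    return bytes(out)
```

+
+ For the example above (CMR=4 and two FT=3 frames of 34 octets each),
+ the result is 1 + 2 + 68 = 71 octets long and begins with the octets
+ 0x40, 0x9C, and 0x1C.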
+
+6.4. Implementation Considerations
+
+ An application implementing this payload format MUST understand all
+ the payload parameters. Any mapping of the parameters to a signaling
+ protocol MUST support all parameters. Therefore, an implementation
+ of this payload format in an application using SDP is required to
+ understand all the payload parameters in their SDP-mapped form. This
+ requirement ensures that an implementation always can decide whether
+ it is capable of communicating.
+
+ To enable efficient interoperable interconnection with AMR-WB and to
+ ensure that a VMR-WB terminal appropriately declares itself as an
+ AMR-WB-capable terminal (see Section 9.3), it is also RECOMMENDED
+ that a VMR-WB RTP payload implementation understand relevant AMR-WB
+ signaling.
+
+ To further ensure interoperability between various implementations of
+ VMR-WB, implementations SHALL support both header-free and octet-
+ aligned payload formats. Support of interleaving is optional.
+
+6.4.1. Decoding Validation and Provision for Lost or Late Packets
+
+ When processing a received payload packet, if the receiver finds that
+ the calculated payload length, based on the information of the
+ session and the values found in the payload header fields, does not
+ match the size of the received packet, the receiver SHOULD discard
+ the packet to avoid potential degradation of speech quality and to
+ invoke the VMR-WB built-in frame error concealment mechanism.
+ Therefore, invalid packets SHALL be treated as lost packets.
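+
+ The length check for an octet-aligned payload can be sketched as
+ follows. The function names are hypothetical, and the frame sizes
+ are taken from Table 3 under the octet-alignment rule of Section
+ 6.3.4.
+

```python
# Frame sizes in bits, indexed by frame type (Table 3).
FRAME_BITS = {0: 132, 1: 177, 2: 253, 3: 266, 4: 124, 5: 54, 6: 20,
              9: 40, 14: 0, 15: 0}


def expected_payload_length(payload: bytes, interleaving: bool = False) -> int:
    """Length in octets implied by the payload header and the ToC."""
    pos = 2 if interleaving else 1  # skip the payload header
    speech_octets = 0
    while True:
        if pos >= len(payload):
            raise ValueError("ToC runs past the end of the payload")
        octet = payload[pos]
        ft = (octet >> 3) & 0x0F
        if ft not in FRAME_BITS:
            raise ValueError(f"invalid frame type {ft}")
        speech_octets += (FRAME_BITS[ft] + 7) // 8
        pos += 1
        if octet >> 7 == 0:  # F=0: last ToC entry
            break
    return pos + speech_octets


def is_valid_payload(payload: bytes, interleaving: bool = False) -> bool:
    """False means the packet is treated as lost."""
    try:
        return expected_payload_length(payload, interleaving) == len(payload)
    except ValueError:
        return False
```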
+
+ Late packets (i.e., the unavailability of a packet when it is needed
+ for decoding at the receiver) should be treated as lost packets.
+ Furthermore, if the late packet is part of an interleave group,
+ depending upon the availability of the other packets in that
+ interleave group, decoding must be resumed from the next available
+ frame (sequential order). In other words, the unavailability of a
+ packet in an interleave group at a certain time should not invalidate
+ the other packets within that interleave group that may arrive later.
+
+7. Congestion Control
+
+ The general congestion control considerations for transporting RTP
+ data apply to VMR-WB speech over RTP as well. However, the multimode
+ capability of VMR-WB speech codec may provide an advantage over other
+ payload formats for controlling congestion since the bandwidth demand
+ can be adjusted by selecting a different operating mode.
+
+ Another parameter that may impact the bandwidth demand for VMR-WB is
+ the number of frame-blocks that are encapsulated in each RTP payload.
+ Packing more frame-blocks in each RTP payload can reduce the number
+ of packets sent and hence the overhead from RTP/UDP/IP headers, at
+ the expense of increased delay.
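+
+ As a worked illustration of this trade-off, the sketch below
+ estimates the gross bit rate of a single-channel, octet-aligned
+ stream without interleaving. It assumes IPv4 (20 octets), UDP
+ (8 octets), and a minimal RTP header (12 octets) with no header
+ compression; the names are hypothetical.
+

```python
FRAME_MS = 20                            # each speech frame covers 20 ms
IP_UDP_RTP_HEADER_OCTETS = 20 + 8 + 12   # assumption: IPv4, no compression


def gross_bitrate_bps(frame_octets: int, frames_per_packet: int) -> float:
    """Bit rate on the wire, including RTP/UDP/IP overhead."""
    # One payload header octet, then one ToC octet plus data per frame.
    payload = 1 + frames_per_packet * (1 + frame_octets)
    packet = IP_UDP_RTP_HEADER_OCTETS + payload
    packets_per_second = 1000 / (FRAME_MS * frames_per_packet)
    return packet * 8 * packets_per_second
```

+
+ With Full-Rate frames (34 octets), one frame per packet yields
+ 30400 bit/s, whereas five frames per packet yield 17280 bit/s, at
+ the cost of 80 ms of additional packetization delay.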
+
+ If forward error correction (FEC) is used to alleviate the packet
+ loss, the amount of redundancy added by FEC will need to be regulated
+ so that the use of FEC itself does not cause a congestion problem.
+
+ Congestion control for RTP SHALL be used in accordance with RFC 3550
+ [3] and any applicable RTP profile, for example, RFC 3551 [6]. This
+ means that congestion control is required for any transmission over
+ unmanaged best-effort networks.
+
+ Congestion on the IP network is managed by the IP sender. Feedback
+ about congestion SHOULD be provided to that IP sender through RTCP or
+ other means, and then the sender can choose to avoid congestion using
+ the most appropriate mechanism. That may include selecting an
+ appropriate operating mode, but also includes adjusting the level of
+ redundancy or number of frames per packet.
+
+8. Security Considerations
+
+ RTP packets using the payload format defined in this specification
+ are subject to the general security considerations discussed in RTP
+ [3] and any applicable profile such as AVP [6] or SAVP [9].
+
+ As this format transports encoded audio, the main security issues
+ include confidentiality, integrity protection, and data origin
+ authentication of the audio itself. The payload format itself does
+ not have any built-in security mechanisms. Any suitable external
+ mechanisms, such as SRTP [9], MAY be used.
+
+ This payload format and the VMR-WB decoder do not exhibit any
+ significant non-uniformity in the receiver-side computational
+ complexity for packet processing; thus, they are unlikely to pose a
+ denial-of-service threat due to the receipt of pathological data.
+
+8.1. Confidentiality
+
+ In order to ensure confidentiality of the encoded audio, all audio
+ data bits MUST be encrypted. There is less need to encrypt the
+ payload header or the table of contents since they only carry
+ information about the frame type. This information could also be
+ useful to a third party, for example, for quality monitoring.
+
+ The use of interleaving in conjunction with encryption can have a
+ negative impact on the confidentiality for a short period of time.
+ Consider the following packets (in brackets) containing frame numbers
+ as indicated: {10, 14, 18}, {13, 17, 21}, {16, 20, 24} (a typical
+ continuous diagonal interleaving pattern). The originator wishes to
+ deny some participants the ability to hear material starting at time
+ 16. Simply changing the key on the packet with the timestamp at or
+ after 16, and denying the new key to those participants, does not
+ achieve this; frames 17, 18, and 21 have been supplied in prior
+ packets under the prior key, and error concealment may make the audio
+ intelligible at least as far as frame 18 or 19, and possibly further.
+
+8.2. Authentication and Integrity
+
+ To authenticate the sender of the speech, an external mechanism MUST
+ be used. It is RECOMMENDED that such a mechanism protects both the
+ complete RTP header and the payload (speech and data bits).
+
+ Data tampering by a man-in-the-middle attacker could replace audio
+ content and also result in erroneous depacketization/decoding that
+ could lower the audio quality. For example, tampering with the CMR
+ field may result in speech of a different quality than desired.
+
+9. Payload Format Parameters
+
+ This section defines the parameters that may be used to select
+ optional features in the VMR-WB RTP payload formats.
+
+ The parameters are defined here as part of the MIME subtype
+ registration for the VMR-WB speech codec. A mapping of the
+ parameters into the Session Description Protocol (SDP) [5] is also
+ provided for those applications that use SDP. In control protocols
+ that do not use MIME or SDP, the media type parameters must be mapped
+ to the appropriate format used with that control protocol.
+
+9.1. VMR-WB RTP Payload MIME Registration
+
+ The MIME subtype for the Variable-Rate Multimode Wideband (VMR-WB)
+ audio codec is allocated from the IETF tree since VMR-WB is expected
+ to be a widely used speech codec in multimedia streaming and
+ messaging as well as in VoIP applications. This MIME registration
+ only covers real-time transfers via RTP.
+
+ Note, the receiver MUST ignore any unspecified parameter and use the
+ default values instead. Also note that if no input parameters are
+ defined, the default values will be used.
+
+ Media Type name: audio
+
+ Media subtype name: VMR-WB
+
+ Required parameters: none
+
+ Furthermore, if the interleaving parameter is present, the parameter
+ "octet-align=1" MUST also be present.
+
+ OPTIONAL parameters:
+
+ mode-set: Requested VMR-WB operating mode set. Restricts
+ the active operating modes to a subset of all
+ modes. Possible values are a comma-separated
+ list of integer values. Currently, this list
+ includes modes 0, 1, 2, and 3 [1], but MAY be
+ extended in the future. If such mode-set is
+ specified during session initiation, the encoder
+ MUST NOT use modes outside of the subset. If not
+ present, all operating modes in the set 0 to 3 are
+ allowed for the session.
+
+ channels: The number of audio channels. The possible
+ values and their respective channel order
+ are specified in Section 4.1 in [6]. If
+ omitted, it has the default value of 1.
+
+ octet-align: RTP payload format; permissible values are 0 and
+ 1. If 1, octet-aligned payload format SHALL be
+ used. If 0 or if not present, header-free payload
+ format is employed (default).
+
+ maxptime: See RFC 3267 [4]
+
+ interleaving: Indicates that frame-block level
+ interleaving SHALL be used for the session.
+ Its value defines the maximum number of
+ frame-blocks allowed in an interleaving
+ group (see Section 6.3.2). If this
+ parameter is not present, interleaving
+ SHALL NOT be used. The presence of this
+ parameter also implies automatically that
+ octet-aligned operation SHALL be used.
+
+ ptime: See RFC 2327 [5]. It SHALL be at least one
+ frame size for VMR-WB.
+
+ dtx: Permissible values are 0 and 1. The default
+ is 0 (i.e., No DTX) where VMR-WB normally
+ operates as a continuous variable-rate
+ codec. If dtx=1, the VMR-WB codec will
+ operate in discontinuous transmission mode
+ where silence descriptor (SID) frames are
+ sent by the VMR-WB encoder during silence
+ intervals with an adjustable update
+ frequency. The selection of the SID update-rate
+ depends on the implementation and
+ other network considerations that are
+ beyond the scope of this specification.
+
+ Encoding considerations:
+
+ This type is only defined for transfer of VMR-WB-encoded data
+ via RTP (RFC 3550) using the payload formats specified in
+ Section 6 of RFC 4348.
+
+ Security considerations:
+
+ See Section 8 of RFC 4348.
+
+ Public specification:
+
+ The VMR-WB speech codec is specified in
+ 3GPP2 specifications C.S0052-0 version 1.0.
+ Transfer methods are specified in RFC 4348.
+
+ Additional information:
+
+ Person & email address to contact for further information:
+
+ Sassan Ahmadi, Ph.D. sassan.ahmadi@ieee.org
+
+ Intended usage: COMMON.
+
+ It is expected that many VoIP, multimedia messaging and
+ streaming applications (as well as mobile applications)
+ will use this type.
+
+ Author/Change controller:
+
+ IETF Audio/Video Transport working group delegated from the IESG
+
+9.2. Mapping MIME Parameters into SDP
+
+ The information carried in the MIME media type specification has a
+ specific mapping to fields in the Session Description Protocol (SDP)
+ [5], which is commonly used to describe RTP sessions. When SDP is
+ used to specify sessions employing the VMR-WB codec, the mapping is
+ as follows:
+
+ - The media type ("audio") goes in SDP "m=" as the media name.
+
+ - The media subtype (payload format name) goes in SDP "a=rtpmap"
+ as the encoding name. The RTP clock rate in "a=rtpmap" MUST be
+ 16000 for VMR-WB.
+
+ - The parameter "channels" (number of channels) MUST be either
+ explicitly set to N or omitted, implying a default value of 1.
+ The values of N that are allowed are specified in Section 4.1 in
+ [6]. The parameter "channels", if present, is specified
+ subsequent to the MIME subtype and RTP clock rate as an encoding
+ parameter in the "a=rtpmap" attribute.
+
+ - The parameters "ptime" and "maxptime" go in the SDP "a=ptime"
+ and "a=maxptime" attributes, respectively.
+
+ - Any remaining parameters go in the SDP "a=fmtp" attribute by
+ copying them directly from the MIME media type string as a
+ semicolon-separated list of parameter=value pairs.
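+
+ The mapping rules above can be sketched as a helper that emits the
+ corresponding SDP lines. This is an illustration only: vmr_wb_sdp is
+ a hypothetical name, a complete session description needs further
+ lines, and only the interleaving/octet-align dependency from Section
+ 9.1 is checked.
+

```python
from typing import Dict, List, Optional


def vmr_wb_sdp(port: int, payload_type: int, channels: int = 1,
               fmtp: Optional[Dict[str, str]] = None,
               ptime: Optional[int] = None,
               maxptime: Optional[int] = None) -> List[str]:
    """Return the media-level SDP lines for one VMR-WB payload type."""
    fmtp = fmtp or {}
    if "interleaving" in fmtp and fmtp.get("octet-align") != "1":
        raise ValueError('"interleaving" requires "octet-align=1"')
    rtpmap = f"a=rtpmap:{payload_type} VMR-WB/16000"  # clock rate is 16000
    if channels != 1:
        rtpmap += f"/{channels}"  # omitted when the default of 1 applies
    lines = [f"m=audio {port} RTP/AVP {payload_type}", rtpmap]
    if fmtp:
        params = "; ".join(f"{name}={value}" for name, value in fmtp.items())
        lines.append(f"a=fmtp:{payload_type} {params}")
    if ptime is not None:
        lines.append(f"a=ptime:{ptime}")
    if maxptime is not None:
        lines.append(f"a=maxptime:{maxptime}")
    return lines
```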
+
+ Some examples of SDP session descriptions utilizing VMR-WB encodings
+ follow.
+
+ Example of usage of VMR-WB in a possible VoIP scenario (wideband
+ audio):
+
+ m=audio 49120 RTP/AVP 98
+ a=rtpmap:98 VMR-WB/16000
+ a=fmtp:98 octet-align=1
+
+ Example of usage of VMR-WB in a possible streaming scenario (two
+ channel stereo):
+
+ m=audio 49120 RTP/AVP 99
+ a=rtpmap:99 VMR-WB/16000/2
+ a=fmtp:99 octet-align=1; interleaving=30
+ a=maxptime:100
+
+9.3. Offer-Answer Model Considerations
+
+ To achieve good interoperability for the VMR-WB RTP payload in an
+ Offer-Answer negotiation usage in SDP [13], the following
+ considerations are made:
+
+ - The rate, channel, and payload configuration parameters (octet-
+ align and interleaving) SHALL be used symmetrically, i.e., offer
+ and answer must use the same values. The maximum size of the
+ interleaving buffer is, however, declarative, and each agent
+ specifies the value it supports to receive for recvonly and
+ sendrecv streams. For sendonly streams, the value indicates what
+ the agent desires to use.
+
+ - To maintain interoperability among all implementations of VMR-WB
+ that may or may not support all the codec's modes of operation, the
+ operational modes that are supported by an implementation MAY be
+ identified at session initiation. The mode-set parameter is
+ declarative, and only operating modes that have been indicated to
+ be supported by both ends SHALL be used. If the answerer is not
+ supporting any of the operating modes provided in the offer, the
+ complete payload type declaration SHOULD be rejected by removing it
+ from the answer.
+
+ - The remaining parameters are all declarative; i.e., for sendonly
+ streams they provide parameters that the agent desires to use,
+ while for recvonly and sendrecv streams they declare the parameters
+ that it accepts to receive. The dtx parameter is used to indicate
+ DTX support and capability, while the media sender is only
+ RECOMMENDED to send using the DTX in these cases. If DTX is not
+ supported by the media sender, it will send media without DTX; this
+ will not affect interoperability, only the resource consumption.
+
+ - Both header-free and octet-aligned payload format configurations
+ MAY be offered by a VMR-WB enabled terminal. However, for an
+ interoperable interconnection with AMR-WB, only octet-aligned
+ payload format SHALL be used.
+
+ - The parameters "maxptime" and "ptime" should in most cases not
+ affect the interoperability; however, the setting of the parameters
+ can affect the performance of the application.
+
+ - To maintain interoperability with AMR-WB in cases where negotiation
+ is possible using the VMR-WB interoperable mode, a VMR-WB-enabled
+ terminal SHOULD also declare itself capable of AMR-WB with limited
+ mode set (i.e., only AMR-WB codec modes 0, 1, and 2 are allowed)
+ and of octet-align mode of operation.
+
+ Example:
+
+ m=audio 49120 RTP/AVP 98 99
+ a=rtpmap:98 VMR-WB/16000
+ a=rtpmap:99 AMR-WB/16000
+ a=fmtp:99 octet-align=1; mode-set=0,1,2
+
+ An example of offer-answer exchange for the VoIP scenario described
+ in Section 5.3 is as follows:
+
+ CDMA2000 terminal -> WCDMA terminal Offer:
+ m=audio 49120 RTP/AVP 98 97
+ a=rtpmap:98 VMR-WB/16000
+ a=fmtp:98 octet-align=1
+ a=rtpmap:97 AMR-WB/16000
+ a=fmtp:97 mode-set=0,1,2; octet-align=1
+
+ WCDMA terminal -> CDMA2000 terminal Answer:
+ m=audio 49120 RTP/AVP 97
+ a=rtpmap:97 AMR-WB/16000
+ a=fmtp:97 mode-set=0,1,2; octet-align=1
+
+ For declarative use of SDP such as in SAP [14] and RTSP [15], all
+ parameters are declarative and provide the parameters that SHALL be
+ used when receiving and/or sending the configured stream.
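+
+ The mode-set handling described in this section can be sketched as
+ an intersection of the offered and locally supported operating
+ modes. The function names are hypothetical, and returning None
+ stands for removing the payload type from the answer.
+

```python
from typing import Optional, Set

ALL_MODES = frozenset({0, 1, 2, 3})  # allowed when mode-set is absent


def parse_mode_set(value: Optional[str]) -> Set[int]:
    """Parse a "mode-set" value such as "0,1,2"."""
    if value is None:
        return set(ALL_MODES)
    return {int(mode) for mode in value.split(",")}


def answer_mode_set(offered: Optional[str],
                    supported: Set[int]) -> Optional[str]:
    """Modes usable by both ends, or None if there is no common mode."""
    common = parse_mode_set(offered) & supported
    if not common:
        return None  # reject the payload type declaration
    return ",".join(str(mode) for mode in sorted(common))
```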
+
+10. IANA Considerations
+
+ The IANA has registered one new MIME subtype (audio/VMR-WB); see
+ Section 9.
+
+11. Acknowledgements
+
+ The author would like to thank Redwan Salami of VoiceAge Corporation,
+ Ari Lakaniemi of Nokia Inc., and IETF/AVT chairs Colin Perkins and
+ Magnus Westerlund for their technical comments to improve this
+ document.
+
+ Also, the author would like to acknowledge that some parts of RFC
+ 3267 [4] and RFC 3558 [11] have been used in this document.
+
+12. References
+
+12.1. Normative References
+
+ [1] 3GPP2 C.S0052-0 v1.0 "Source-Controlled Variable-Rate Multimode
+ Wideband Speech Codec (VMR-WB) Service Option 62 for Spread
+ Spectrum Systems", 3GPP2 Technical Specification, July 2004.
+
+ [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [3] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
+ "RTP: A Transport Protocol for Real-Time Applications", STD 64,
+ RFC 3550, July 2003.
+
+ [4] Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "Real-
+ Time Transport Protocol (RTP) Payload Format and File Storage
+ Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate
+ Wideband (AMR-WB) Audio Codecs", RFC 3267, June 2002.
+
+ [5] Handley, M. and V. Jacobson, "SDP: Session Description
+ Protocol", RFC 2327, April 1998.
+
+ [6] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
+ Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
+
+12.2. Informative References
+
+ [7] 3GPP2 C.S0050-A v1.0 "3GPP2 File Formats for Multimedia
+ Services", 3GPP2 Technical Specification, September 2005.
+
+ [8] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
+ Generic Forward Error Correction", RFC 2733, December 1999.
+
+ [9] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+ Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
+ 3711, March 2004.
+
+ [10] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
+ Bolot, J., Vega-Garcia, A., and S. Fosse-Parisis, "RTP Payload
+ for Redundant Audio Data", RFC 2198, September 1997.
+
+ [11] Li, A., "RTP Payload Format for Enhanced Variable Rate Codecs
+ (EVRC) and Selectable Mode Vocoders (SMV)", RFC 3558, July 2003.
+
+ [12] 3GPP TS 26.193 "AMR Wideband Speech Codec; Source Controlled
+ Rate operation", version 5.0.0 (2001-03), 3rd Generation
+ Partnership Project (3GPP).
+
+ [13] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
+ Session Description Protocol (SDP)", RFC 3264, June 2002.
+
+ [14] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
+ Protocol", RFC 2974, October 2000.
+
+ [15] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
+ Protocol (RTSP)", RFC 2326, April 1998.
+
+ Any 3GPP2 document can be downloaded from the 3GPP2 web server,
+ "http://www.3gpp2.org/"; see the specifications section.
+
+Author's Address
+
+ Dr. Sassan Ahmadi
+ EMail: sassan.ahmadi@ieee.org
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2006).
+
+ This document is subject to the rights, licenses and restrictions
+ contained in BCP 78, and except as set forth therein, the authors
+ retain all their rights.
+
+ This document and the information contained herein are provided on an
+ "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+ OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+ ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+ INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+ INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+ WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+ The IETF takes no position regarding the validity or scope of any
+ Intellectual Property Rights or other rights that might be claimed to
+ pertain to the implementation or use of the technology described in
+ this document or the extent to which any license under such rights
+ might or might not be available; nor does it represent that it has
+ made any independent effort to identify any such rights. Information
+ on the procedures with respect to rights in RFC documents can be
+ found in BCP 78 and BCP 79.
+
+ Copies of IPR disclosures made to the IETF Secretariat and any
+ assurances of licenses to be made available, or the result of an
+ attempt made to obtain a general license or permission for the use of
+ such proprietary rights by implementers or users of this
+ specification can be obtained from the IETF on-line IPR repository at
+ http://www.ietf.org/ipr.
+
+ The IETF invites any interested party to bring to its attention any
+ copyrights, patents or patent applications, or other proprietary
+ rights that may cover technology that may be required to implement
+ this standard. Please address the information to the IETF at
+ ietf-ipr@ietf.org.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is provided by the IETF
+ Administrative Support Activity (IASA).
+