1 files changed, 1795 insertions, 0 deletions
diff --git a/doc/rfc/rfc4348.txt b/doc/rfc/rfc4348.txt
new file mode 100644
index 0000000..622ef5a
--- /dev/null
+++ b/doc/rfc/rfc4348.txt
@@ -0,0 +1,1795 @@
+
+
+
+
+
+
+Network Working Group                                          S. Ahmadi
+Request for Comments: 4348                                  January 2006
+Category: Standards Track
+
+
+       Real-Time Transport Protocol (RTP) Payload Format for the
+         Variable-Rate Multimode Wideband (VMR-WB) Audio Codec
+
+Status of This Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2006).
+
+Abstract
+
+   This document specifies a real-time transport protocol (RTP) payload
+   format to be used for the Variable-Rate Multimode Wideband (VMR-WB)
+   speech codec.  The payload format is designed to be able to
+   interoperate with existing VMR-WB transport formats on non-IP
+   networks.  A media type registration is included for VMR-WB RTP
+   payload format.
+
+   VMR-WB is a variable-rate multimode wideband speech codec that has a
+   number of operating modes, one of which is interoperable with AMR-WB
+   (i.e., RFC 3267) audio codec at certain rates.  Therefore, provisions
+   have been made in this document to facilitate and simplify data
+   packet exchange between VMR-WB and AMR-WB in the interoperable mode
+   with no transcoding function involved.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                     [Page 1]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+Table of Contents
+
+   1. Introduction ....................................................3
+   2. Conventions and Acronyms ........................................3
+   3. The Variable-Rate Multimode Wideband (VMR-WB) Speech Codec ......4
+      3.1. Narrowband Speech Processing ...............................5
+      3.2. Continuous vs. Discontinuous Transmission ..................6
+      3.3. Support for Multi-Channel Session ..........................6
+   4. Robustness against Packet Loss ..................................7
+      4.1. Forward Error Correction (FEC) .............................7
+      4.2. Frame Interleaving and Multi-Frame Encapsulation ...........8
+   5. VMR-WB Voice over IP Scenarios ..................................9
+      5.1. IP Terminal to IP Terminal .................................9
+      5.2. GW to IP Terminal .........................................10
+      5.3. GW to GW (between VMR-WB- and AMR-WB-Enabled Terminals) ...10
+      5.4. GW to GW (between Two VMR-WB-Enabled Terminals) ...........11
+   6. VMR-WB RTP Payload Formats .....................................12
+      6.1. RTP Header Usage ..........................................13
+      6.2. Header-Free Payload Format ................................14
+      6.3. Octet-Aligned Payload Format ..............................15
+           6.3.1. Payload Structure ..................................15
+           6.3.2. The Payload Header .................................15
+           6.3.3. The Payload Table of Contents ......................18
+           6.3.4. Speech Data ........................................20
+           6.3.5. Payload Example: Basic Single Channel
+                  Payload Carrying Multiple Frames ...................21
+      6.4. Implementation Considerations .............................22
+           6.4.1. Decoding Validation and Provision for Lost
+                  or Late Packets ....................................22
+   7. Congestion Control .............................................23
+   8. Security Considerations ........................................23
+      8.1. Confidentiality ...........................................24
+      8.2. Authentication and Integrity ..............................24
+   9. Payload Format Parameters ......................................24
+      9.1. VMR-WB RTP Payload MIME Registration ......................25
+      9.2. Mapping MIME Parameters into SDP ..........................27
+      9.3. Offer-Answer Model Considerations .........................28
+   10. IANA Considerations ...........................................29
+   11. Acknowledgements ..............................................29
+   12. References ....................................................30
+      12.1. Normative References .....................................30
+      12.2. Informative References ...................................30
+
+
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                     [Page 2]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+1.  Introduction
+
+   This document specifies the payload format for packetization of VMR-
+   WB-encoded speech signals into the Real-time Transport Protocol (RTP)
+   [3].  The VMR-WB payload formats support transmission of single and
+   multiple channels, frame interleaving, multiple frames per payload,
+   header-free payload, the use of mode switching, and interoperation
+   with existing VMR-WB transport formats on non-IP networks, as
+   described in Section 3.
+
+   The payload format is described in Section 6.  The VMR-WB file format
+   (i.e., for transport of VMR-WB speech data in storage mode
+   applications such as email) is specified in [7].  In Section 9, a
+   media type registration for VMR-WB RTP payload format is provided.
+
+   Since VMR-WB is interoperable with AMR-WB at certain rates, an
+   attempt has been made throughout this document to maximize the
+   similarities with RFC 3267 while optimizing the payload format for
+   the non-interoperable modes of the VMR-WB codec.
+
+2.  Conventions and Acronyms
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in RFC2119 [2].
+
+   The following acronyms are used in this document:
+
+    3GPP   - The Third Generation Partnership Project
+    3GPP2  - The Third Generation Partnership Project 2
+    CDMA   - Code Division Multiple Access
+    WCDMA  - Wideband Code Division Multiple Access
+    GSM    - Global System for Mobile Communications
+    AMR-WB - Adaptive Multi-Rate Wideband Codec
+    VMR-WB - Variable-Rate Multimode Wideband Codec
+    CMR    - Codec Mode Request
+    GW     - Gateway
+    DTX    - Discontinuous Transmission
+    FEC    - Forward Error Correction
+    SID    - Silence Descriptor
+    TrFO   - Transcoder-Free Operation
+    UDP    - User Datagram Protocol
+    RTP    - Real-Time Transport Protocol
+    RTCP   - RTP Control Protocol
+    MIME   - Multipurpose Internet Mail Extension
+    SDP    - Session Description Protocol
+    VoIP   - Voice-over-IP
+
+
+
+
+Ahmadi                      Standards Track                     [Page 3]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   The term "interoperable mode" in this document refers to VMR-WB mode
+   3, which is interoperable with AMR-WB codec modes 0, 1, and 2.
+
+   The term "non-interoperable modes" in this document refers to VMR-WB
+   modes 0, 1, and 2.
+
+   The term "frame-block" is used in this document to describe the
+   time-synchronized set of speech frames in a multi-channel VMR-WB
+   session.  In particular, in an N-channel session, a frame-block will
+   contain N speech frames, one from each of the channels, and all N
+   speech frames represent exactly the same time period.
+
+3.  The Variable-Rate Multimode Wideband (VMR-WB) Speech Codec
+
+   VMR-WB is the wideband speech-coding standard developed by Third
+   Generation Partnership Project 2 (3GPP2) for encoding/decoding
+   wideband/narrowband speech content in multimedia services in 3G CDMA
+   cellular systems [1].  VMR-WB is a source-controlled variable-rate
+   multimode wideband speech codec.  It has a number of operating modes,
+   where each mode is a tradeoff between voice quality and average data
+   rate.  The operating mode in VMR-WB (as shown in Table 2) is chosen
+   based on the traffic condition of the network and the desired quality
+   of service.  The desired average data rate (ADR) in each mode is
+   obtained by encoding speech frames at permissible rates (as shown in
+   Tables 1 and 3) compliant with CDMA2000 system, depending on the
+   instantaneous characteristics of input speech and the maximum and
+   minimum rate constraints imposed by the network operator.
+
+   While VMR-WB is a native CDMA codec complying with all CDMA system
+   requirements, it is further interoperable with AMR-WB [4,12] at
+   12.65, 8.85, and 6.60 kbps.  This is due to the fact that VMR-WB and
+   AMR-WB share the same core technology.  This feature enables
+   Transcoder-Free (TrFO) interconnections between VMR-WB and AMR-WB
+   across different wireless/wireline systems (e.g., GSM/WCDMA and
+   CDMA2000) without use of unnecessary complex media format conversion.
+
+   Note that the concept of mode in VMR-WB is different from that of
+   AMR-WB where each fixed-rate AMR-WB codec mode is adapted to
+   prevailing channel conditions by a tradeoff between the total number
+   of source-coding and channel-coding bits.
+
+   VMR-WB is able to transition between various modes with no
+   degradation in voice quality that is attributable to the mode
+   switching itself.  The operating mode of the VMR-WB encoder may be
+   switched seamlessly without prior knowledge of the decoder.  Any
+   non-interoperable mode (i.e., VMR-WB modes 0, 1, or 2) can be chosen
+   depending on the traffic conditions (e.g., network congestion) and
+   the desired quality of service.
+
+
+
+Ahmadi                      Standards Track                     [Page 4]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   While in the interoperable mode (i.e., VMR-WB mode 3), mode switching
+   between VMR-WB modes is not allowed because there is only one AMR-WB
+   interoperable mode in VMR-WB.  Since the AMR-WB codec may request a
+   mode change, depending on channel conditions, in-band data included
+   in VMR-WB frame structure (see Section 8 of [1] for more details) is
+   used during an interoperable interconnection to switch between VMR-WB
+   frame types 0, 1, and 2 in VMR-WB mode 3 (corresponding to AMR-WB
+   codec modes 0, 1, or 2).
+
+   As mentioned earlier, VMR-WB is compliant with CDMA2000 system with
+   the permissible encoding rates shown in Table 1.
+
+   +---------------------------+-----------------+---------------+
+   |        Frame Type         | Bits per Packet | Encoding Rate |
+   |                           |   (Frame Size)  |     (kbps)    |
+   +---------------------------+-----------------+---------------+
+   | Full-Rate                 |      266        |     13.3      |
+   | Half-Rate                 |      124        |      6.2      |
+   | Quarter-Rate              |       54        |      2.7      |
+   | Eighth-Rate               |       20        |      1.0      |
+   | Blank                     |        0        |       0       |
+   | Erasure                   |        0        |       0       |
+   +---------------------------+-----------------+---------------+
+
+     Table 1: CDMA2000 system permissible frame types and their
+              associated encoding rates
+
+   VMR-WB is robust to high percentage of frame loss and frames with
+   corrupted rate information.  The reception of an Erasure
+   (SPEECH_LOST) frame type at decoder invokes the built-in frame error
+   concealment mechanism.  The built-in frame error concealment
+   mechanism in VMR-WB conceals the effect of lost frames by exploiting
+   in-band data and the information available in the previous frames.
+
+3.1.  Narrowband Speech Processing
+
+   VMR-WB has the capability to operate with either 16000-Hz or 8000-Hz
+   sampled input/output speech signals in all modes of operation [1].
+   The VMR-WB decoder does not require a priori knowledge about the
+   sampling rate of the original media (i.e., speech/audio signals
+   sampled at 8 or 16 kHz) at the input of the encoder.  The VMR-WB
+   decoder, by default, generates 16000-Hz wideband output regardless of
+   the encoder input sampling frequency.  Depending on the application,
+   the decoder can be configured to generate 8000-Hz output, as well.
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                     [Page 5]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   Therefore, while this specification defines a 16000-Hz RTP clock rate
+   for VMR-WB codec, the injection and processing of 8000-Hz narrowband
+   media during a session is also allowed; however, a 16000-Hz RTP clock
+   rate MUST always be used.
+
+   The choice of VMR-WB output sampling frequency depends on the
+   implementation and the audio acoustic capabilities of the receiving
+   side.
+
+3.2.  Continuous vs. Discontinuous Transmission
+
+   The circuit-switched operation of VMR-WB within a CDMA network
+   requires continuous transmission of the speech data during a
+   conversation.  The intrinsic source-controlled variable-rate feature
+   of the CDMA speech codecs is required for optimal operation of the
+   CDMA system and interference control.  However, VMR-WB has the
+   capability to operate in a discontinuous transmission mode for some
+   packet-switched applications over IP networks (e.g., VoIP), where the
+   number of transmitted bits and packets during silence period are
+   reduced to a minimum.  The VMR-WB DTX operation is similar to that of
+   AMR-WB [4,12].
+
+3.3.  Support for Multi-Channel Session
+
+   The octet-aligned RTP payload format defined in this document
+   supports multi-channel audio content (e.g., a stereophonic speech
+   session).  Although VMR-WB codec itself does not support encoding of
+   multi-channel audio content into a single bit stream, it can be used
+   to encode and decode each of the individual channels separately.
+
+   To transport the separately encoded multi-channel content, the speech
+   frames for all channels that are framed and encoded for the same 20
+   ms periods are logically collected in a frame-block.
+
+   At the session setup, out-of-band signaling must be used to indicate
+   the number of channels in the session and the order of the speech
+   frames from different channels in each frame-block.  When using SDP
+   for signaling (see Section 9.2 for more details), the number of
+   channels is specified in the rtpmap attribute, and the order of
+   channels carried in each frame-block is implied by the number of
+   channels as specified in Section 4.1 in [6].
+
+
+
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                     [Page 6]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+4.  Robustness against Packet Loss
+
+   The octet-aligned payload format described in this document (see
+   Section 6 for more details) supports several features, including
+   forward error correction (FEC) and frame interleaving, in order to
+   increase robustness against lost packets.
+
+4.1.  Forward Error Correction (FEC)
+
+   The simple scheme of repetition of previously sent data is one way of
+   achieving FEC.  Another possible scheme, which is more bandwidth
+   efficient, is to use payload-external FEC; e.g., RFC2733 [8], which
+   generates extra packets containing repair data.
+
+   The repetition method involves the simple retransmission of
+   previously transmitted frame-blocks together with the current frame-
+   block(s).  This is done by using a sliding window to group the speech
+   frame-blocks to send in each payload.  Figure 1 illustrates an
+   example.
+
+   In this example, each frame-block is retransmitted one time in the
+   following RTP payload packet.  Here, f(n-2)..f(n+4) denotes a
+   sequence of speech frame-blocks, and p(n-1)..p(n+4) a sequence of
+   payload packets.
+
+   --+--------+--------+--------+--------+--------+--------+--------+--
+     | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
+   --+--------+--------+--------+--------+--------+--------+--------+--
+
+     <---- p(n-1) ---->
+              <----- p(n) ----->
+                       <---- p(n+1) ---->
+                                <---- p(n+2) ---->
+                                         <---- p(n+3) ---->
+                                                  <---- p(n+4) ---->
+
+              Figure 1: An example of redundant transmission
+
+   The use of this approach does not require signaling at the session
+   setup.  In other words, the speech sender can choose to use this
+   scheme without consulting the receiver.  This is because a packet
+   containing redundant frames will not look different from a packet
+   with only new frames.  The receiver may receive multiple copies or
+   versions of a frame for a certain timestamp if no packet is lost.  If
+   multiple versions of the same speech frame are received, it is
+   RECOMMENDED that the highest rate be used by the speech decoder.
+
+
+
+
+
+Ahmadi                      Standards Track                     [Page 7]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   This redundancy scheme provides the same functionality as that
+   described in RFC 2198, "RTP Payload for Redundant Audio Data" [10].
+   In most cases, the mechanism in this payload format is more efficient
+   and simpler than requiring both endpoints to support RFC 2198.  If
+   the spread in time required between the primary and redundant
+   encodings is larger than 5 frame times, the bandwidth overhead of RFC
+   2198 will be lower.
+
+   The sender is responsible for selecting an appropriate amount of
+   redundancy based on feedback about the channel (e.g., in RTCP
+   receiver reports) or network traffic.  A sender SHOULD NOT base
+   selection of FEC on the CMR, as this parameter most probably was set
+   based on non-IP information.  The sender is also responsible for
+   avoiding congestion, which may be aggravated by redundant
+   transmission (see Section 7).
+
+4.2.  Frame Interleaving and Multi-Frame Encapsulation
+
+   To decrease protocol overhead, the octet-aligned payload format,
+   described in Section 6, allows several speech frame-blocks to be
+   encapsulated into a single RTP packet.  One of the drawbacks of this
+   approach is that in case of packet loss several consecutive speech
+   frame-blocks are lost, which usually causes clearly audible
+   distortion in the reconstructed speech.
+
+   Interleaving of frame-blocks can improve the speech quality in such
+   cases by distributing the consecutive losses into a series of single
+   frame-block losses.  However, interleaving and bundling several
+   frame-blocks per payload will also increase end-to-end delay and is
+   therefore not appropriate for all types of applications.  Streaming
+   applications will most likely be able to exploit interleaving to
+   improve speech quality in lossy transmission conditions.
+
+   The octet-aligned payload format supports the use of frame
+   interleaving as an option.  For the encoder (speech sender) to use
+   frame interleaving in its outbound RTP packets for a given session,
+   the decoder (speech receiver) needs to indicate its support via out-
+   of-band means (see Section 9).
+
+
+
+
+
+
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                     [Page 8]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+5.  VMR-WB Voice over IP Scenarios
+
+5.1.  IP Terminal to IP Terminal
+
+   The primary scenario for this payload format is IP end-to-end between
+   two terminals incorporating VMR-WB codec, as shown in Figure 2.
+   Nevertheless, this scenario can be generalized to an interoperable
+   interconnection between VMR-WB-enabled and AMR-WB-enabled IP
+   terminals using the offer-answer model described in Section 9.3.
+   This payload format is expected to be useful for both conversational
+   and streaming services.
+
+       +----------+                         +----------+
+       |          |                         |          |
+       | TERMINAL |<----------------------->| TERMINAL |
+       |          |    VMR-WB/RTP/UDP/IP    |          |
+       +----------+                         +----------+
+                     (or AMR-WB/RTP/UDP/IP)
+
+          Figure 2: IP terminal to IP terminal
+
+   A conversational service puts requirements on the payload format.
+   Low delay is a very important factor, i.e., fewer speech frame-blocks
+   per payload packet.  Low overhead is also required when the payload
+   format traverses across low bandwidth links, especially if the
+   frequency of packets will be high.
+
+   Streaming service has less strict real-time requirements and
+   therefore can use a larger number of frame-blocks per packet than
+   conversational service.  This reduces the overhead from IP, UDP, and
+   RTP headers.  However, including several frame-blocks per packet
+   makes the transmission more vulnerable to packet loss, so
+   interleaving may be used to reduce the effect of packet loss on
+   speech quality.  A streaming server handling a large number of
+   clients also needs a payload format that requires as few resources as
+   possible when doing packetization.
+
+   For VMR-WB-enabled IP terminals at both ends, depending on the
+   implementation, all modes of the VMR-WB codec can be used in this
+   scenario.  Also, both header-free and octet-aligned payload formats
+   (see Section 6 for details) can be utilized.  For the interoperable
+   interconnection between VMR-WB and AMR-WB, only VMR-WB mode 3 is
+   used, and all restrictions described in Section 9.3 apply.
+
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                     [Page 9]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+5.2.  GW to IP Terminal
+
+   Another scenario occurs when VMR-WB-encoded speech will be
+   transmitted from a non-IP system (e.g., 3GPP2/CDMA2000 network) to an
+   IP terminal, and/or vice versa, as depicted in Figure 3.
+
+       VMR-WB over
+   3GPP2/CDMA2000 network
+                      +------+                        +----------+
+                      |      |                        |          |
+      <-------------->|  GW  |<---------------------->| TERMINAL |
+                      |      |   VMR-WB/RTP/UDP/IP    |          |
+                      +------+                        +----------+
+                          |
+                          |           IP network
+                          |
+
+                   Figure 3: GW to VoIP terminal scenario
+
+   VMR-WB's capability to switch seamlessly between operational modes is
+   exploited in CDMA (non-IP) networks to optimize speech quality for a
+   given traffic condition.  To preserve this functionality in scenarios
+   including a gateway to an IP network using the octet-aligned payload
+   format, a codec mode request (CMR) field is considered.  The gateway
+   will be responsible for forwarding the CMR between the non-IP and IP
+   parts in both directions.  The IP terminal SHOULD follow the CMR
+   forwarded by the gateway to optimize speech quality going to the
+   non-IP decoder.  The mode control algorithm in the gateway SHOULD
+   accommodate the delay imposed by the IP network on the response to
+   CMR by the IP terminal.
+
+   The IP terminal SHOULD NOT set the CMR (see Section 6.3.2), but the
+   gateway can set the CMR value on frames going toward the encoder in
+   the non-IP part to optimize speech quality from that encoder to the
+   gateway and to perform congestion control on the IP network.
+
+5.3.  GW to GW (between VMR-WB- and AMR-WB-Enabled Terminals)
+
+   A third likely scenario is that RTP/UDP/IP is used as transport
+   between two non-IP systems, i.e., IP is originated and terminated in
+   gateways on both sides of the IP transport, as illustrated in Figure
+   4.  This is the most likely scenario for an interoperable
+   interconnection between 3GPP/(GSM-WCDMA)/AMR-WB and
+   3GPP2/CDMA2000/VMR-WB-enabled mobile stations.  In this scenario, the
+   VMR-WB-enabled terminal also declares itself capable of AMR-WB with
+   restricted mode set as described in Section 9.3. The CMR value may be
+   set in packets received by the gateways on the IP network side.  The
+   gateway should forward to the non-IP side a CMR value that is the
+
+
+
+Ahmadi                      Standards Track                    [Page 10]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   minimum of three values: (1) the CMR value it receives on the IP
+   side; (2) a CMR value it may choose for congestion control of
+   transmission on the IP side; and (3) the CMR value based on its
+   estimate of reception quality on the non-IP side.  The details of the
+   traffic control algorithm are left to the implementation.
+
+      VMR-WB over                                       AMR-WB over
+   3GPP2/CDMA2000 network                      3GPP/(GSM-WCDMA) network
+
+                     +------+                  +------+
+    (AMR-WB Payload) |      | AMR-WB/RTP/UDP/IP|      |(AMR-WB Payload)
+   <---------------->|  GW  |<---------------->|  GW  |<--------------->
+                     |      |                  |      |
+                     +------+                  +------+
+                        |        IP network       |
+                        |                         |
+
+               Figure 4: GW to GW scenario (AMR-WB <-> VMR-WB
+                      interoperable interconnection)
+
+   During and upon initiation of an interoperable interconnection
+   between VMR-WB and AMR-WB, only VMR-WB mode 3 can be used.  There are
+   three Frame Types (i.e., FT=0, 1, or 2; see Table 3) within this mode
+   that are compatible with AMR-WB codec modes 0, 1, and 2,
+   respectively.  If the AMR-WB codec is engaged in an interoperable
+   interconnection with VMR-WB, the active AMR-WB codec mode set needs
+   to be limited to 0, 1, and 2.
+
+5.4.  GW to GW (between Two VMR-WB-Enabled Terminals)
+
+   The fourth example VoIP scenario is composed of a RTP/UDP/IP
+   transport between two non-IP systems; i.e., IP is originated and
+   terminated in gateways on both sides of the IP transport, as
+   illustrated in Figure 5.  This is the most likely scenario for
+   Mobile-Station-to-Mobile-Station (MS-to-MS) Transcoder-Free (TrFO)
+   interconnection between two 3GPP2/CDMA2000 terminals that both use
+   VMR-WB codec.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                    [Page 11]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+        VMR-WB over                                     VMR-WB over
+   3GPP2/CDMA2000 network                         3GPP2/CDMA2000 network
+
+                      +------+                   +------+
+                      |      |                   |      |
+        <------------>|  GW  |<----------------->|  GW  |<------------>
+                      |      | VMR-WB/RTP/UDP/IP |      |
+                      +------+                   +------+
+                          |         IP network       |
+                          |                          |
+
+        Figure 5: GW to GW scenario (a CDMA2000 MS-to-MS VoIP scenario)
+
+6.  VMR-WB RTP Payload Formats
+
+   For a given session, the payload format can be either header free or
+   octet aligned, depending on the mode of operation that is established
+   for the session via out-of-band means and the application.
+
+   The header-free payload format is designed for maximum bandwidth
+   efficiency, simplicity, and low latency.  Only one codec data frame
+   can be sent in each header-free payload format packet.  None of the
+   payload header fields or table of contents (ToC) entries is present
+   (the same consideration is also made in [11]).
+
+   In the octet-aligned payload format, all the fields in a payload,
+   including payload header, table of contents entries, and speech
+   frames themselves, are individually aligned to octet boundaries to
+   make implementations efficient.
+
+   Note that octet alignment of a field or payload means that the last
+   octet is padded with zeroes in the least significant bits to fill the
+   octet.  Also note that this padding is separate from padding
+   indicated by the P bit in the RTP header.
+
+   Between the two payload formats, only the octet-aligned format has
+   the capability to use the interleaving to make the speech transport
+   robust to packet loss.
+
+   The VMR-WB octet-aligned payload format in the interoperable mode is
+   identical to that of AMR-WB (i.e., RFC 3267).
+
+
+
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                    [Page 12]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+6.1.  RTP Header Usage
+
+   The format of the RTP header is specified in [3].  This payload
+   format uses the fields of the header in a manner consistent with that
+   specification.
+
+   The RTP timestamp corresponds to the sampling instant of the first
+   sample encoded for the first frame-block in the packet.  The
+   timestamp clock frequency is the same as the default sampling
+   frequency (i.e., 16 kHz), so the timestamp unit is in samples.
+
+   The duration of one speech frame-block is 20 ms for VMR-WB.  For
+   normal wideband operation of VMR-WB, the input/output media sampling
+   frequency is 16 kHz, corresponding to 320 samples per frame from each
+   channel.  Thus, the timestamp is increased by 320 for VMR-WB for each
+   consecutive frame-block.
+
+   The VMR-WB codec is capable of processing speech/audio signals
+   sampled at 8 kHz.  By default, the VMR-WB decoder output sampling
+   frequency is 16 kHz.  Depending on the application, the decoder can
+   be configured to generate 8-kHz output sampling frequency, as well.
+   Since the VMR-WB RTP payload formats for the 8- and 16-kHz sampled
+   media are identical and the VMR-WB decoder does not need a priori
+   knowledge about the encoder input sampling frequency, a fixed RTP
+   clock rate of 16000 Hz is defined for VMR-WB codec.  This would allow
+   injection or processing of 8-kHz sampled speech/audio media without
+   having to change the RTP clock rate during a session.  Note that the
+   timestamp is incremented by 320 per frame-block for 8-kHz sampled
+   media, as well.
+
+   A packet may contain multiple frame-blocks of encoded speech or
+   comfort noise parameters.  If interleaving is employed, the frame-
+   blocks encapsulated into a payload are picked according to the
+   interleaving rules defined in Section 6.3.2. Otherwise, each packet
+   covers a period of one or more contiguous 20-ms frame-block
+   intervals.  In case the data from all the channels for a particular
+   frame-block in the period is missing (for example, at a gateway from
+   some other transport format), it is possible to indicate that no data
+   is present for that frame-block instead of breaking a multi-frame-
+   block packet into two, as explained in Section 6.3.2.
+
+   No matter which payload format is used, the RTP payload is always
+   made an integral number of octets long by padding with zero bits if
+   necessary.  If additional padding is required to bring the payload
+   length to a larger multiple of octets or for some other purpose, then
+   the P bit in the RTP header MAY be set, and padding appended, as
+   specified in [3].
+
+
+
+
+Ahmadi                      Standards Track                    [Page 13]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   The RTP header marker bit (M) SHALL be always set to 0 if the VMR-WB
+   codec operates in continuous transmission.  When operating in
+   discontinuous transmission (DTX), the RTP header marker bit SHALL be
+   set to 1 if the first frame-block carried in the packet contains a
+   speech frame, which is the first in a talkspurt.  For all other
+   packets, the marker bit SHALL be set to zero (M=0).
+
+   The assignment of an RTP payload type for this payload format is
+   outside the scope of this document and will not be specified here.
+   It is expected that the RTP profile under which this payload format
+   is being used will assign a payload type for this encoding or specify
+   that the payload type is to be bound dynamically (see Section 9).
+
+6.2.  Header-Free Payload Format
+
+   The header-free payload format is designed for maximum bandwidth
+   efficiency, simplicity, and minimum delay.  Only one speech data
+   frame presents in each header-free payload format packet.  None of
+   the payload header fields or ToC entries is present.  The encoding
+   rate for the speech frame can be determined from the length of the
+   speech data frame, since there is only one speech data frame in each
+   header-free payload format.
+
+   The use of the RTP header fields for header-free payload format is
+   the same as the corresponding one for the octet-aligned payload
+   format.  The detailed bit mapping of speech data packets permissible
+   for this payload format is described in Section 8 of [1].  Since the
+   header-free payload format is not compatible with AMR-WB RTP payload,
+   only non-interoperable modes of VMR-WB SHALL be used with this
+   payload format.  That is, FT=0, 1, 2, and 9 SHALL NOT be used with
+   header-free payload format.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                      RTP Header [3]                           |
+   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+   |                                                               |
+   +          ONLY one speech data frame           +-+-+-+-+-+-+-+-+
+   |                                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+   Note that the mode of operation, using this payload format, is
+   decided by the transmitting (encoder) site.  The default mode of
+   operation for VMR-WB encoder is mode 0 [1].  The mode change request
+   MAY also be sent through non-RTP means, which is out of the scope of
+   this specification.
+
+
+
+
+Ahmadi                      Standards Track                    [Page 14]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+6.3.  Octet-Aligned Payload Format
+
+6.3.1.  Payload Structure
+
+   The complete payload consists of a payload header, a payload table of
+   contents, and speech data representing one or more speech frame-
+   blocks.  The following diagram shows the general payload format
+   layout:
+
+   +----------------+-------------------+----------------
+   | Payload header | Table of contents | Speech data ...
+   +----------------+-------------------+----------------
+
+6.3.2.  The Payload Header
+
+   In octet-aligned payload format, the payload header consists of a
+   4-bit CMR, 4 reserved bits, and, optionally, an 8-bit interleaving
+   header, as shown below.
+
+    0                   1
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+   +-+-+-+-+-+-+-+-+- - - - - - - -
+   |  CMR  |R|R|R|R|  ILL  |  ILP  |
+   +-+-+-+-+-+-+-+-+- - - - - - - -
+
+   CMR (4 bits): This indicates a codec mode request sent to the speech
+   encoder at the site of the receiver of this payload.  CMR value 15
+   indicates that no mode request is present, and other unused values
+   are reserved for future use.
+
+   The value of the CMR field is set according to the following table:
+
+   +-------+----------------------------------------------------------+
+   | CMR   |                 VMR-WB Operating Modes                   |
+   +-------+----------------------------------------------------------+
+   |   0   | VMR-WB mode 3 (AMR-WB interoperable mode at 6.60 kbps)   |
+   |   1   | VMR-WB mode 3 (AMR-WB interoperable mode at 8.85 kbps)   |
+   |   2   | VMR-WB mode 3 (AMR-WB interoperable mode at 12.65 kbps)  |
+   |   3   | VMR-WB mode 2                                            |
+   |   4   | VMR-WB mode 1                                            |
+   |   5   | VMR-WB mode 0                                            |
+   |   6   | VMR-WB mode 2 with maximum half-rate encoding            |
+   | 7-14  | (reserved)                                               |
+   |  15   | No Preference (no mode request is present)               |
+   +-------+----------------------------------------------------------+
+
+     Table 2: List of valid CMR values and their associated VMR-WB
+              operating modes
+
+
+
+Ahmadi                      Standards Track                    [Page 15]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   R: This is a reserved bit that MUST be set to zero.  The receiver
+   MUST ignore all R bits.
+
+   ILL (4 bits, unsigned integer): This is an OPTIONAL field that is
+   present only if interleaving is signaled out-of-band for the session.
+   ILL=L indicates to the receiver that the interleaving length is L+1,
+   in number of frame-blocks.
+
+   ILP (4 bits, unsigned integer): This is an OPTIONAL field that is
+   present only if interleaving is signaled.  ILP MUST take a value
+   between 0 and ILL, inclusive, indicating the interleaving index for
+   frame-blocks in this payload in the interleave group.  If the value
+   of ILP is found greater than ILL, the payload SHOULD be discarded.
+
+   ILL and ILP fields MUST be present in each packet in a session if
+   interleaving is signaled for the session.
+
+   The mode request received in the CMR field is valid until the next
+   CMR is received, i.e., until a newly received CMR value overrides the
+   previous one.  Therefore, if a terminal continuously wishes to
+   receive frames in the same mode, x, it needs to set CMR=x for all its
+   outbound payloads, and if a terminal has no preference in which mode
+   to receive, it SHOULD set CMR=15 in all its outbound payloads.
+
+   If a payload is received with a CMR value that is not valid, the CMR
+   MUST be ignored by the receiver.
+
+   In a multi-channel session, CMR SHOULD be interpreted by the receiver
+   of the payload as the desired encoding mode for all the channels in
+   the session, if the network allows.
+
+   There are two factors that affect the VMR-WB mode selection: (i) the
+   performance of any CDMA link connected via a gateway (e.g., in a GW
+   to IP terminal scenario), and (ii) the congestion state of an IP
+   network.  The CDMA link performance is signaled via the CMR field,
+   which is not used by IP-only end-points.  The IP network state is
+   monitored using, for example, RTCP.  A sender needs to select the
+   operating mode to satisfy both these constraints (see Section 7).
+
+   The encoder SHOULD follow a received mode request, but MAY change to
+   a different mode if the network necessitates it, for example, to
+   control congestion.
+
+   The CMR field MUST be set to 15 for packets sent to a multicast
+   group.  The encoder in the speech sender SHOULD ignore mode requests
+   when sending speech to a multicast session but MAY use RTCP feedback
+   information as a hint that a mode change is needed.
+
+
+
+
+Ahmadi                      Standards Track                    [Page 16]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   If interleaving option is utilized, interleaving MUST be performed on
+   a frame-block basis, as opposed to a frame basis, in a multi-channel
+   session.
+
+   The following example illustrates the arrangement of speech frame-
+   blocks in an interleave group during an interleave session.  Here we
+   assume ILL=L for the interleave group that starts at speech frame-
+   block n.  We also assume that the first payload packet of the
+   interleave group is s and the number of speech frame-blocks carried
+   in each payload is N.  Then we will have
+
+    Payload s (the first packet of this interleave group):
+      ILL=L, ILP=0,
+
+    Carry frame-blocks: n, n+(L+1), n+2*(L+1),..., n+(N-1)*(L+1)
+
+    Payload s+1 (the second packet of this interleave group):
+      ILL=L, ILP=1,
+      Carry frame-blocks: n+1, n+1+(L+1), n+1+2*(L+1),..., n+1+
+      (N-1)*(L+1)
+
+        ...
+
+    Payload s+L (the last packet of this interleave group):
+      ILL=L, ILP=L,
+      Carry frame-blocks: n+L, n+L+(L+1), n+L+2*(L+1), ..., n+L+
+      (N-1)*(L+1)
+
+   The next interleave group will start at frame-block n+N*(L+1).  There
+   will be no interleaving effect unless the number of frame-blocks per
+   packet (N) is at least 2.  Moreover, the number of frame-blocks per
+   payload (N) and the value of ILL MUST NOT be changed inside an
+   interleave group.  In other words, all payloads in an interleave
+   group MUST have the same ILL and MUST contain the same number of
+   speech frame-blocks.
+
+   The sender of the payload MUST only apply interleaving if the
+   receiver has signaled its use through out-of-band means.  Since
+   interleaving will increase buffering requirements at the receiver,
+   the receiver uses MIME parameter "interleaving=I" to set the maximum
+   number of frame-blocks allowed in an interleaving group to I.
+
+   When performing interleaving, the sender MUST use a proper number of
+   frame-blocks per payload (N) and ILL so that the resulting size of an
+   interleave group is less than or equal to I, i.e., N*(L+1)<=I.
+
+   The following example shows the ToC of three consecutive packets,
+   each carrying 3 frame-blocks, in an interleaved two-channel session.
+
+
+
+Ahmadi                      Standards Track                    [Page 17]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   Here, the two channels are left (L) and right (R), with L coming
+   before R, and the interleaving length is 3 (i.e., ILL=2).  This makes
+   the interleave group 9 frame-blocks large.
+
+   Packet #1
+   ---------
+
+   ILL=2, ILP=0:
+   +----+----+----+----+----+----+
+   | 1L | 1R | 4L | 4R | 7L | 7R |
+   +----+----+----+----+----+----+
+   |<------->|<------->|<------->|
+      Frame     Frame     Frame
+     Block 1   Block 4   Block 7
+
+   Packet #2
+   ---------
+
+   ILL=2, ILP=1:
+
+   +----+----+----+----+----+----+
+   | 2L | 2R | 5L | 5R | 8L | 8R |
+   +----+----+----+----+----+----+
+   |<------->|<------->|<------->|
+      Frame     Frame     Frame
+     Block 2   Block 5   Block 8
+
+   Packet #3
+   ---------
+
+   ILL=2, ILP=2:
+   +----+----+----+----+----+----+
+   | 3L | 3R | 6L | 6R | 9L | 9R |
+   +----+----+----+----+----+----+
+   |<------->|<------->|<------->|
+         Frame     Frame     Frame
+        Block 3   Block 6   Block 9
+
+6.3.3.  The Payload Table of Contents
+
+   The table of contents (ToC) in octet-aligned payload format consists
+   of a list of ToC entries where each entry corresponds to a speech
+   frame carried in the payload, i.e., when interleaving is used, the
+   frame-blocks in the ToC will almost never be placed consecutive in
+   time.  Instead, the presence and order of the frame-blocks in a
+   packet will follow the pattern described in 6.3.2.
+
+
+
+
+
+Ahmadi                      Standards Track                    [Page 18]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   +---------------------+
+   | list of ToC entries |
+   +---------------------+
+
+   A ToC entry for the octet-aligned payload format is as follows:
+
+    0 1 2 3 4 5 6 7
+   +-+-+-+-+-+-+-+-+
+   |F|  FT   |Q|P|P|
+   +-+-+-+-+-+-+-+-+
+
+   The table of contents (ToC) consists of a list of ToC entries, each
+   representing a speech frame.
+
+   F (1 bit):   If set to 1, indicates that this frame is followed by
+                another speech frame in this payload; if set to 0,
+                indicates that this frame is the last frame in this
+                payload.
+
+   FT (4 bits): Frame type index whose value is chosen according to
+                Table 3.
+
+                During the interoperable mode, FT=14 (SPEECH_LOST) and
+                FT=15 (NO_DATA) are used to indicate frames that are
+                either lost or not being transmitted in this payload,
+                respectively.  FT=14 or 15 MAY be used in the non-
+                interoperable modes to indicate frame erasure or blank
+                frame, respectively (see Section 2.1 of [1]).
+
+                If a payload with an invalid FT value is received, the
+                payload MUST be discarded.  Note that for ToC entries
+                with FT=14 or 15, there will be no corresponding speech
+                frame in the payload.
+
+                Depending on the application and the mode of operation
+                of VMR-WB, any combination of the permissible frame
+                types (FT) shown in Table 3 MAY be used.
+
+   Q (1 bit):   Frame quality indicator.  If set to 0, indicates that
+                the corresponding frame is corrupted.  During the
+                interoperable mode, the receiver side (with AMR-WB
+                codec) should set the RX_TYPE to either SPEECH_BAD or
+                SID_BAD depending on the frame type (FT), if Q=0.  The
+                VMR-WB encoder always sets Q bit to 1.  The VMR-WB
+                decoder may ignore the Q bit.
+
+   P bits:      Padding bits MUST be set to zero and MUST be ignored by
+                a receiver.
+
+
+
+Ahmadi                      Standards Track                    [Page 19]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   +----+--------------------------------------------+-----------------+
+   | FT |                Encoding Rate               |Frame Size (Bits)|
+   +----+--------------------------------------------+-----------------+
+   | 0  | Interoperable Full-Rate (AMR-WB 6.60 kbps) |       132       |
+   | 1  | Interoperable Full-Rate (AMR-WB 8.85 kbps) |       177       |
+   | 2  | Interoperable Full-Rate (AMR-WB 12.65 kbps)|       253       |
+   | 3  | Full-Rate 13.3 kbps                        |       266       |
+   | 4  | Half-Rate 6.2 kbps                         |       124       |
+   | 5  | Quarter-Rate 2.7 kbps                      |        54       |
+   | 6  | Eighth-Rate 1.0 kbps                       |        20       |
+   | 7  | (reserved)                                 |         -       |
+   | 8  | (reserved)                                 |         -       |
+   | 9  | CNG (AMR-WB SID)                           |        40       |
+   | 10 | (reserved)                                 |         -       |
+   | 11 | (reserved)                                 |         -       |
+   | 12 | (reserved)                                 |         -       |
+   | 13 | (reserved)                                 |         -       |
+   | 14 | Erasure (AMR-WB SPEECH_LOST)               |         0       |
+   | 15 | Blank (AMR-WB NO_DATA)                     |         0       |
+   +----+--------------------------------------------+-----------------+
+
+      Table 3: VMR-WB payload frame types for real-time transport
+
+   For multi-channel sessions, the ToC entries of all frames from a
+   frame-block are placed in the ToC in consecutive order.  Therefore,
+   with N channels and K speech frame-blocks in a packet, there MUST be
+   N*K entries in the ToC, and the first N entries will be from the
+   first frame-block, the second N entries will be from the second
+   frame-block, and so on.
+
+6.3.4.  Speech Data
+
+   Speech data of a payload contains one or more speech frames as
+   described in the ToC of the payload.
+
+   Each speech frame represents 20 ms of speech encoded in one of the
+   available encoding rates depending on the operation mode.  The length
+   of the speech frame is defined by the frame type in the FT field,
+   with the following considerations:
+
+   - The last octet of each speech frame MUST be padded with zeroes at
+     the end if not all bits in the octet are used.  In other words,
+     each speech frame MUST be octet-aligned.
+
+   - When multiple speech frames are present in the speech data, the
+     speech frames MUST be arranged one whole frame after another.
+
+
+
+
+
+Ahmadi                      Standards Track                    [Page 20]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   The order and numbering notation of the speech data bits are as
+   specified in the VMR-WB standard specification [1].
+
+   The payload begins with the payload header of one octet, or two if
+   frame interleaving is selected.  The payload header is followed by
+   the table of contents consisting of a list of one-octet ToC entries.
+
+   The speech data follows the table of contents.  For the purpose of
+   packetization, all the octets comprising a speech frame are appended
+   to the payload as a unit.  The speech frames are packed in the same
+   order as their corresponding ToC entries are arranged in the ToC
+   list, with the exception that if a given frame has a ToC entry with
+   FT=14 or 15, there will be no data octets present for that frame.
+
+6.3.5.  Payload Example: Basic Single Channel Payload Carrying Multiple
+        Frames
+
+   The following diagram shows an octet-aligned payload format from a
+   single channel session that carries two VMR-WB Full-Rate frames
+   (FT=3).  In the payload, a codec mode request is sent (e.g., CMR=4),
+   requesting that the encoder at the receiver's side use VMR-WB mode 1.
+   No interleaving is used.  Note that in the example below the last
+   octet in both speech frames is padded with zeros to make them octet
+   aligned.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | CMR=4 |R|R|R|R|1|FT#1=3 |Q|P|P|0|FT#2=3 |Q|P|P|   f1(0..7)    |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |   f1(8..15)   |  f1(16..23)   |  ...                          |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   : ...                                                           :
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   | r |P|P|P|P|P|P|  f2(0..7)     |   f2(8..15)   |  f2(16..23)   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   : ...                                                           :
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                        ...    | l |P|P|P|P|P|P|
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+      r= f1(264,265)
+      l= f2(264,265)
+
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                    [Page 21]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+6.4.  Implementation Considerations
+
+   An application implementing this payload format MUST understand all
+   the payload parameters.  Any mapping of the parameters to a signaling
+   protocol MUST support all parameters.  Therefore, an implementation
+   of this payload format in an application using SDP is required to
+   understand all the payload parameters in their SDP-mapped form.  This
+   requirement ensures that an implementation always can decide whether
+   it is capable of communicating.
+
+   To enable efficient interoperable interconnection with AMR-WB and to
+   ensure that a VMR-WB terminal appropriately declares itself as a
+   AMR-WB-capable terminal (see Section 9.3), it is also RECOMMENDED
+   that a VMR-WB RTP payload implementation understand relevant AMR-WB
+   signaling.
+
+   To further ensure interoperability between various implementations of
+   VMR-WB, implementations SHALL support both header-free and octet-
+   aligned payload formats.  Support of interleaving is optional.
+
+6.4.1.  Decoding Validation and Provision for Lost or Late Packets
+
+   When processing a received payload packet, if the receiver finds that
+   the calculated payload length, based on the information of the
+   session and the values found in the payload header fields, does not
+   match the size of the received packet, the receiver SHOULD discard
+   the packet to avoid potential degradation of speech quality and to
+   invoke the VMR-WB built-in frame error concealment mechanism.
+   Therefore, invalid packets SHALL be treated as lost packets.
+
+   Late packets (i.e., the unavailability of a packet when it is needed
+   for decoding at the receiver) should be treated as lost packets.
+   Furthermore, if the late packet is part of an interleave group,
+   depending upon the availability of the other packets in that
+   interleave group, decoding must be resumed from the next available
+   frame (sequential order).  In other words, the unavailability of a
+   packet in an interleave group at a certain time should not invalidate
+   the other packets within that interleave group that may arrive later.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                    [Page 22]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+7.  Congestion Control
+
+   The general congestion control considerations for transporting RTP
+   data apply to VMR-WB speech over RTP as well.  However, the multimode
+   capability of VMR-WB speech codec may provide an advantage over other
+   payload formats for controlling congestion since the bandwidth demand
+   can be adjusted by selecting a different operating mode.
+
+   Another parameter that may impact the bandwidth demand for VMR-WB is
+   the number of frame-blocks that are encapsulated in each RTP payload.
+   Packing more frame-blocks in each RTP payload can reduce the number
+   of packets sent and hence the overhead from RTP/UDP/IP headers, at
+   the expense of increased delay.
+
+   If forward error correction (FEC) is used to alleviate the packet
+   loss, the amount of redundancy added by FEC will need to be regulated
+   so that the use of FEC itself does not cause a congestion problem.
+
+   Congestion control for RTP SHALL be used in accordance with RFC 3550
+   [3] and any applicable RTP profile, for example, RFC 3551 [6].  This
+   means that congestion control is required for any transmission over
+   unmanaged best-effort networks.
+
+   Congestion on the IP network is managed by the IP sender.  Feedback
+   about congestion SHOULD be provided to that IP sender through RTCP or
+   other means, and then the sender can choose to avoid congestion using
+   the most appropriate mechanism.  That may include selecting an
+   appropriate operating mode, but also includes adjusting the level of
+   redundancy or number of frames per packet.
+
+8.  Security Considerations
+
+   RTP packets using the payload format defined in this specification
+   are subject to the general security considerations discussed in RTP
+   [3] and any applicable profile such as AVP [9] or SAVP [10].
+
+   As this format transports encoded audio, the main security issues
+   include confidentiality, integrity protection, and data origin
+   authentication of the audio itself.  The payload format itself does
+   not have any built-in security mechanisms.  Any suitable external
+   mechanisms, such as SRTP [10], MAY be used.
+
+   This payload format and the VMR-WB decoder do not exhibit any
+   significant non-uniformity in the receiver-side computational
+   complexity for packet processing; thus, they are unlikely to pose a
+   denial-of-service threat due to the receipt of pathological data.
+
+
+
+
+
+Ahmadi                      Standards Track                    [Page 23]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+8.1.  Confidentiality
+
+   In order to ensure confidentiality of the encoded audio, all audio
+   data bits MUST be encrypted.  There is less need to encrypt the
+   payload header or the table of contents since they only carry
+   information about the frame type.  This information could also be
+   useful to a third party, for example, for quality monitoring.
+
+   The use of interleaving in conjunction with encryption can have a
+   negative impact on the confidentiality for a short period of time.
+   Consider the following packets (in brackets) containing frame numbers
+   as indicated: {10, 14, 18}, {13, 17, 21}, {16, 20, 24} (a typical
+   continuous diagonal interleaving pattern).  The originator wishes to
+   deny some participants the ability to hear material starting at time
+   16.  Simply changing the key on the packet with the timestamp at or
+   after 16, and denying the new key to those participants, does not
+   achieve this; frames 17, 18, and 21 have been supplied in prior
+   packets under the prior key, and error concealment may make the audio
+   intelligible at least as far as frame 18 or 19, and possibly further.
+
+8.2.  Authentication and Integrity
+
+   To authenticate the sender of the speech, an external mechanism MUST
+   be used.  It is RECOMMENDED that such a mechanism protects both the
+   complete RTP header and the payload (speech and data bits).
+
+   Data tampering by a man-in-the-middle attacker could replace audio
+   content and also result in erroneous depacketization/decoding that
+   could lower the audio quality.  For example, tampering with the CMR
+   field may result in speech of a different quality than desired.
+
+9.  Payload Format Parameters
+
+   This section defines the parameters that may be used to select
+   optional features in the VMR-WB RTP payload formats.
+
+   The parameters are defined here as part of the MIME subtype
+   registration for the VMR-WB speech codec.  A mapping of the
+   parameters into the Session Description Protocol (SDP) [5] is also
+   provided for those applications that use SDP.  In control protocols
+   that do not use MIME or SDP, the media type parameters must be mapped
+   to the appropriate format used with that control protocol.
+
+
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                    [Page 24]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+9.1.  VMR-WB RTP Payload MIME Registration
+
+   The MIME subtype for the Variable-Rate Multimode Wideband (VMR-WB)
+   audio codec is allocated from the IETF tree since VMR-WB is expected
+   to be a widely used speech codec in multimedia streaming and
+   messaging as well as in VoIP applications.  This MIME registration
+   only covers real-time transfers via RTP.
+
+   Note, the receiver MUST ignore any unspecified parameter and use the
+   default values instead.  Also note that if no input parameters are
+   defined, the default values will be used.
+
+     Media Type name:      audio
+
+     Media subtype name:   VMR-WB
+
+     Required parameters:  none
+
+   Furthermore, if the interleaving parameter is present, the parameter
+   "octet-align=1" MUST also be present.
+
+OPTIONAL parameters:
+
+  mode-set:       Requested VMR-WB operating mode set.  Restricts
+                  the active operating modes to a subset of all
+                  modes.  Possible values are a comma-separated
+                  list of integer values.  Currently, this list
+                  includes modes 0, 1, 2, and 3 [1], but MAY be
+                  extended in the future.  If such mode-set is
+                  specified during session initiation, the encoder
+                  MUST NOT use modes outside of the subset.  If not
+                  present, all operating modes in the set 0 to 3 are
+                  allowed for the session.
+
+  channels:       The number of audio channels.  The possible
+                  values and their respective channel order
+                  is specified in Section 4.1 in [6].  If
+                  omitted, it has the default value of 1.
+
+  octet-align:    RTP payload format; permissible values are 0 and
+                  1.  If 1, octet-aligned payload format SHALL be
+                  used.  If 0 or if not present, header-free payload
+                  format is employed (default).
+
+  maxptime:       See RFC 3267 [4]
+
+
+
+
+
+
+Ahmadi                      Standards Track                    [Page 25]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+  interleaving:   Indicates that frame-block level
+                  interleaving SHALL be used for the session.
+                  Its value defines the maximum number of
+                  frame-blocks allowed in an interleaving
+                  group (see Section 6.3.1).  If this
+                  parameter is not present, interleaving
+                  SHALL NOT be used.  The presence of this
+                  parameter also implies automatically that
+                  octet-aligned operation SHALL be used.
+
+  ptime:          See RFC2327 [5].  It SHALL be at least one
+                  frame size for VMR-WB.
+
+  dtx:            Permissible values are 0 and 1.  The default
+                  is 0 (i.e., No DTX) where VMR-WB normally
+                  operates as a continuous variable-rate
+                  codec.  If dtx=1, the VMR-WB codec will
+                  operate in discontinuous transmission mode
+                  where silence descriptor (SID) frames are
+                  sent by the VMR-WB encoder during silence
+                  intervals with an adjustable update
+                  frequency.  The selection of the SID update-rate
+                  depends on the implementation and
+                  other network considerations that are
+                  beyond the scope of this specification.
+
+   Encoding considerations:
+
+          This type is only defined for transfer of VMR-WB-encoded data
+          via RTP (RFC 3550) using the payload formats specified in
+          Section 6 of RFC 4348.
+
+   Security considerations:
+
+          See Section 8 of RFC 4348.
+
+   Public specification:
+
+          The VMR-WB speech codec is specified in
+          3GPP2 specifications C.S0052-0 version 1.0.
+          Transfer methods are specified in RFC 4348.
+
+   Additional information:
+
+   Person & email address to contact for further information:
+
+          Sassan Ahmadi, Ph.D.        sassan.ahmadi@ieee.org
+
+
+
+
+Ahmadi                      Standards Track                    [Page 26]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   Intended usage: COMMON.
+
+     It is expected that many VoIP, multimedia messaging and
+     streaming applications (as well as mobile applications)
+     will use this type.
+
+   Author/Change controller:
+
+     IETF Audio/Video Transport working group delegated from the IESG
+
+9.2.  Mapping MIME Parameters into SDP
+
+   The information carried in the MIME media type specification has a
+   specific mapping to fields in the Session Description Protocol (SDP)
+   [5], which is commonly used to describe RTP sessions.  When SDP is
+   used to specify sessions employing the VMR-WB codec, the mapping is
+   as follows:
+
+      - The media type ("audio") goes in SDP "m=" as the media name.
+
+      - The media subtype (payload format name) goes in SDP "a=rtpmap"
+        as the encoding name.  The RTP clock rate in "a=rtpmap" MUST be
+        16000 for VMR-WB.
+
+      - The parameter "channels" (number of channels) MUST be either
+        explicitly set to N or omitted, implying a default value of 1.
+        The values of N that are allowed is specified in Section 4.1 in
+        [6].  The parameter "channels", if present, is specified
+        subsequent to the MIME subtype and RTP clock rate as an encoding
+        parameter in the "a=rtpmap" attribute.
+
+      - The parameters "ptime" and "maxptime" go in the SDP "a=ptime"
+        and
+           "a=maxptime" attributes, respectively.
+
+      - Any remaining parameters go in the SDP "a=fmtp" attribute by
+        copying them directly from the MIME media type string as a
+        semicolon-separated list of parameter=value pairs.
+
+   Some examples of SDP session descriptions utilizing VMR-WB encodings
+   follow.
+
+   Example of usage of VMR-WB in a possible VoIP scenario (wideband
+   audio):
+
+      m=audio 49120 RTP/AVP 98
+      a=rtpmap:98 VMR-WB/16000
+      a=fmtp:98 octet-align=1
+
+
+
+Ahmadi                      Standards Track                    [Page 27]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   Example of usage of VMR-WB in a possible streaming scenario (two
+   channel stereo):
+
+      m=audio 49120 RTP/AVP 99
+      a=rtpmap:99 VMR-WB/16000/2
+      a=fmtp:99 octet-align=1; interleaving=30
+      a=maxptime:100
+
+9.3.  Offer-Answer Model Considerations
+
+   To achieve good interoperability for the VMR-WB RTP payload in an
+   Offer-Answer negotiation usage in SDP [13], the following
+   considerations are made:
+
+   - The rate, channel, and payload configuration parameters (octet-
+     align and interleaving) SHALL be used symmetrically, i.e., offer
+     and answer must use the same values.  The maximum size of the
+     interleaving buffer is, however, declarative, and each agent
+     specifies the value it supports to receive for recvonly and
+     sendrecv streams.  For sendonly streams, the value indicates what
+     the agent desires to use.
+
+   - To maintain interoperability among all implementations of VMR-WB
+     that may or may not support all the codec's modes of operation, the
+     operational modes that are supported by an implementation MAY be
+     identified at session initiation.  The mode-set parameter is
+     declarative, and only operating modes that have been indicated to
+     be supported by both ends SHALL be used.  If the answerer is not
+     supporting any of the operating modes provided in the offer, the
+     complete payload type declaration SHOULD be rejected by removing it
+     from the answer.
+
+   - The remaining parameters are all declarative; i.e., for sendonly
+     streams they provide parameters that the agent desires to use,
+     while for recvonly and sendrecv streams they declare the parameters
+     that it accepts to receive.  The dtx parameter is used to indicate
+     DTX support and capability, while the media sender is only
+     RECOMMENDED to send using the DTX in these cases.  If DTX is not
+     supported by the media sender, it will send media without DTX; this
+     will not affect interoperability only the resource consumption.
+
+   - Both header-free and octet-aligned payload format configurations
+     MAY be offered by a VMR-WB enabled terminal.  However, for an
+     interoperable interconnection with AMR-WB, only octet-aligned
+
+   - The parameters "maxptime" and "ptime" should in most cases not
+     affect the interoperability; however, the setting of the parameters
+     can affect the performance of the application.
+
+
+
+Ahmadi                      Standards Track                    [Page 28]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   - To maintain interoperability with AMR-WB in cases where negotiation
+     is possible using the VMR-WB interoperable mode, a VMR-WB-enabled
+     terminal SHOULD also declare itself capable of AMR-WB with limited
+     mode set (i.e., only AMR-WB codec modes 0, 1, and 2 are allowed)
+     and of octet-align mode of operation.
+
+   Example:
+
+                m=audio 49120 RTP/AVP 98 99
+                a=rtpmap:98 VMR-WB/16000
+                a=rtpmap:99 AMR-WB/16000
+                a=fmtp:99 octet-align=1; mode-set=0,1,2
+
+   An example of offer-answer exchange for the VoIP scenario described
+   in Section 5.3 is as follows:
+
+       CDMA2000 terminal -> WCDMA terminal Offer:
+                m=audio 49120 RTP/AVP 98 97
+                a=rtpmap:98 VMR-WB/16000
+                a=fmtp:98 octet-align=1
+                a=rtpmap:97 AMR-WB/16000
+                a=fmtp:97 mode-set=0,1,2; octet-align=1
+
+       WCDMA terminal -> CDMA2000 terminal Answer:
+                m=audio 49120 RTP/AVP 97
+                a=rtpmap:97 AMR-WB/16000
+                a=fmtp:97 mode-set=0,1,2; octet-align=1;
+
+   For declarative use of SDP such as in SAP [14] and RTSP [15], all
+   parameters are declarative and provide the parameters that SHALL be
+   used when receiving and/or sending the configured stream.
+
+10.  IANA Considerations
+
+   The IANA has registered one new MIME subtype (audio/VMR-WB); see
+   Section 9.
+
+11.  Acknowledgements
+
+   The author would like to thank Redwan Salami of VoiceAge Corporation,
+   Ari Lakaniemi of Nokia Inc., and IETF/AVT chairs Colin Perkins and
+   Magnus Westerlund for their technical comments to improve this
+   document.
+
+   Also, the author would like to acknowledge that some parts of RFC
+   3267 [4] and RFC 3558 [11] have been used in this document.
+
+
+
+
+
+Ahmadi                      Standards Track                    [Page 29]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+12.  References
+
+12.1.  Normative References
+
+   [1]  3GPP2 C.S0052-0 v1.0 "Source-Controlled Variable-Rate Multimode
+        Wideband Speech Codec (VMR-WB) Service Option 62 for Spread
+        Spectrum Systems", 3GPP2 Technical Specification, July 2004.
+
+   [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
+        Levels", BCP 14, RFC 2119, March 1997.
+
+   [3]  Schulzrinne, H.,  Casner, S., Frederick, R., and V. Jacobson,
+        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
+        RFC 3550, July 2003.
+
+   [4]  Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, "Real-
+        Time Transport Protocol (RTP) Payload Format and File Storage
+        Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate
+        Wideband (AMR-WB) Audio Codecs", RFC 3267, June 2002.
+
+   [5]  Handley, M. and V. Jacobson, "SDP: Session Description
+        Protocol", RFC 2327, April 1998.
+
+   [6]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
+        Conferences with Minimal Control", STD 65, RFC 3551, July 2003.
+
+12.2.  Informative References
+
+   [7]  3GPP2 C.S0050-A v1.0 "3GPP2 File Formats for Multimedia
+        Services", 3GPP2 Technical Specification, September 2005.
+
+   [8]  Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
+        Generic Forward Error Correction", RFC 2733, December 1999.
+
+   [9]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+        Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
+        3711, March 2004.
+
+   [10] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
+        Bolot, J., Vega-Garcia, A., and S. Fosse-Parisis, "RTP Payload
+        for Redundant Audio Data", RFC 2198, September 1997.
+
+   [11] Li, A., "RTP Payload Format for Enhanced Variable Rate Codecs
+        (EVRC) and Selectable Mode Vocoders (SMV)", RFC 3558, July 2003.
+
+   [12] 3GPP TS 26.193 "AMR Wideband Speech Codec; Source Controlled
+        Rate operation", version 5.0.0 (2001-03), 3rd Generation
+        Partnership Project (3GPP).
+
+
+
+Ahmadi                      Standards Track                    [Page 30]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+   [13] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
+        Session Description Protocol (SDP)", RFC 3264, June 2002.
+
+   [14] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
+        Protocol", RFC 2974, October 2000.
+
+   [15] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
+        Protocol (RTSP)", RFC 2326, April 1998.
+
+   Any 3GPP2 document can be downloaded from the 3GPP2 web server,
+   "http://www.3gpp2.org/", see specifications.
+
+Author's Address
+
+   Dr. Sassan Ahmadi
+   EMail: sassan.ahmadi@ieee.org
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                    [Page 31]
+
+RFC 4348               VMR-WB RTP Payload Format            January 2006
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2006).
+
+   This document is subject to the rights, licenses and restrictions
+   contained in BCP 78, and except as set forth therein, the authors
+   retain all their rights.
+
+   This document and the information contained herein are provided on an
+   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+   The IETF takes no position regarding the validity or scope of any
+   Intellectual Property Rights or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; nor does it represent that it has
+   made any independent effort to identify any such rights.  Information
+   on the procedures with respect to rights in RFC documents can be
+   found in BCP 78 and BCP 79.
+
+   Copies of IPR disclosures made to the IETF Secretariat and any
+   assurances of licenses to be made available, or the result of an
+   attempt made to obtain a general license or permission for the use of
+   such proprietary rights by implementers or users of this
+   specification can be obtained from the IETF on-line IPR repository at
+   http://www.ietf.org/ipr.
+
+   The IETF invites any interested party to bring to its attention any
+   copyrights, patents or patent applications, or other proprietary
+   rights that may cover technology that may be required to implement
+   this standard.  Please address the information to the IETF at
+   ietf-ipr@ietf.org.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is provided by the IETF
+   Administrative Support Activity (IASA).
+
+
+
+
+
+
+
+Ahmadi                      Standards Track                    [Page 32]
+