doc: Add RFC documents

author: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit: 4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
tree: e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc5686.txt
parent: ea76e11061bda059ae9f9ad130a9895cc85607db (diff)
1 files changed, 1179 insertions, 0 deletions
diff --git a/doc/rfc/rfc5686.txt b/doc/rfc/rfc5686.txt
new file mode 100644
index 0000000..5382f79
--- /dev/null
+++ b/doc/rfc/rfc5686.txt
@@ -0,0 +1,1179 @@
+
+
+
+
+
+
+Network Working Group                                        Y. Hiwasaki
+Request for Comments: 5686                                     H. Ohmuro
+Category: Standards Track                                NTT Corporation
+                                                            October 2009
+
+
+     RTP Payload Format for mU-law EMbedded Codec for Low-delay IP
+                  Communication (UEMCLIP) Speech Codec
+
+Abstract
+
+   This document describes the RTP payload format of a mU-law EMbedded
+   Coder for Low-delay IP communication (UEMCLIP), an enhanced speech
+   codec of ITU-T G.711.  The bitstream has a scalable structure with an
+   embedded u-law bitstream, also known as PCMU, thus providing a handy
+   transcoding operation between narrowband and wideband speech.
+
+Status of This Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (c) 2009 IETF Trust and the persons identified as the
+   document authors.  All rights reserved.
+
+   This document is subject to BCP 78 and the IETF Trust's Legal
+   Provisions Relating to IETF Documents
+   (http://trustee.ietf.org/license-info) in effect on the date of
+   publication of this document.  Please review these documents
+   carefully, as they describe your rights and restrictions with respect
+   to this document.  Code Components extracted from this document must
+   include Simplified BSD License text as described in Section 4.e of
+   the Trust Legal Provisions and are provided without warranty as
+   described in the BSD License.
+
+   This document may contain material from IETF Documents or IETF
+   Contributions published or made publicly available before November
+   10, 2008.  The person(s) controlling the copyright in some of this
+   material may not have granted the IETF Trust the right to allow
+   modifications of such material outside the IETF Standards Process.
+   Without obtaining an adequate license from the person(s) controlling
+   the copyright in such materials, this document may not be modified
+   outside the IETF Standards Process, and derivative works of it may
+
+
+
+Hiwasaki & Ohmuro           Standards Track                     [Page 1]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   not be created outside the IETF Standards Process, except to format
+   it for publication as an RFC or to translate it into languages other
+   than English.
+
+Table of Contents
+
+   1. Introduction ....................................................2
+      1.1. Terminology ................................................3
+   2. Media Format Background .........................................3
+   3. Payload Format ..................................................5
+      3.1. RTP Header Usage ...........................................6
+      3.2. Multiple Frames in an RTP Packet ...........................6
+      3.3. Payload Data ...............................................7
+           3.3.1. Main Header .........................................7
+           3.3.2. Sub-Layer ..........................................10
+   4. Transcoding between UEMCLIP and G.711 ..........................11
+   5. Congestion Control Considerations ..............................12
+   6. Payload Format Parameters ......................................13
+      6.1. Media Type Registration ...................................13
+      6.2. Mapping to SDP Parameters .................................14
+           6.2.1. Mode Specification .................................15
+      6.3. Offer-Answer Model Considerations .........................16
+           6.3.1. Offer-Answer Guidelines ............................16
+           6.3.2. Examples ...........................................17
+   7. Security Considerations ........................................19
+   8. IANA Considerations ............................................19
+   9. References .....................................................19
+      9.1. Normative References ......................................19
+      9.2. Informative References ....................................20
+
+1.  Introduction
+
+   This document specifies the payload format for sending UEMCLIP-
+   encoded (mU-law EMbedded Coder for Low-delay IP communication) speech
+   using the Real-time Transport Protocol (RTP) [RFC3550].  UEMCLIP is a
+   proprietary codec that enhances u-law ITU-T G.711 [ITU-T-G.711] and
+   that is designed to help the market transition smoothly towards the
+   forthcoming wideband communication environment while keeping the
+   media transcoding load with existing terminals, in which the
+   implementation of G.711 is mandatory, very small.
+
+   It should be noted that, generally speaking, codecs are negotiated
+   and changed using an SDP exchange.  Also, [RFC3550] defines general
+   RTP mixer and translator models, where media transcoding may not take
+   place at the node.  For those cases, the design concept of the
+   embedded structure is not useful.  However, there are other cases
+   when costly transcoding is unavoidable in commonly deployed types of
+   Multi-point Control Units (MCUs), which terminate media and RTCP
+
+
+
+Hiwasaki & Ohmuro           Standards Track                     [Page 2]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   packets [RFC5117], and when narrowband and wideband terminals
+   coexist.  This embedded bitstream structure can reduce the media
+   transcoding to a simple bitstream truncation.
+
+   The background and basic idea of the media format are described in
+   Section 2.  The details of the payload format are given in Section 3.
+   The transcoding issues with G.711 are discussed in Section 4, and the
+   considerations for congestion control are in Section 5.  In
+   Section 6, the payload format parameters for a media type
+   registration for UEMCLIP RTP payload format and Session Description
+   Protocol (SDP) mappings are provided.  The security considerations
+   and IANA considerations are dealt with in Section 7 and Section 8,
+   respectively.
+
+1.1.  Terminology
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in [RFC2119].
+
+2.  Media Format Background
+
+   UEMCLIP is an enhanced version of u-law ITU-T G.711, otherwise known
+   as PCMU [RFC4856].  It is targeted at Voice over Internet Protocol
+   (VoIP) applications, and its main goal is to provide a wideband
+   communication platform that is highly interoperable with existing
+   terminals equipped with G.711 and to stimulate the market to
+   gradually shift to using wideband communication.  In widely deployed
+   multi-point conferencing systems, the packets usually go through
+   RTCP-terminating (RTP Control Protocol) MCUs, "Topo-RTCP-terminating-
+   MCU" as defined in [RFC5117].  Because the G.711 bitstream is
+   embedded in the bitstream, costly media transcoding can be avoided in
+   this case.
+
+   This document does not discuss the implementation details of the
+   encoder and decoder, but only describes the bitstream format.
+
+   Because of its scalable nature, there are a number of sub-bitstreams
+   (sub-layer) in a UEMCLIP bitstream.  By choosing appropriate sub-
+   layers, the codec can adapt to the following requirements:
+
+   o  Sampling frequency,
+
+   o  Number of channels,
+
+   o  Speech quality, and
+
+   o  Bit-rate.
+
+
+
+Hiwasaki & Ohmuro           Standards Track                     [Page 3]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   The UEMCLIP codec operates on 20-ms frames and includes three sub-
+   coders, as shown in Table 1.  The core layer is u-law G.711 at 64
+   kbit/s, and the other two are quality and bandwidth enhancement
+   layers with a bit-rate of 16 kbit/s each.
+
+   +-------+---------------------+----------+--------------------------+
+   | Layer | Description         | Bit-rate | Coding algorithm         |
+   +-------+---------------------+----------+--------------------------+
+   |   a   | G.711 core          |       64 | u-law PCM                |
+   |       |                     |          |                          |
+   |   b   | Lower-band          |       16 | Time domain block        |
+   |       | enhancement         |          | quantization             |
+   |       |                     |          |                          |
+   |   c   | Higher-band         |       16 | MDCT block quantization  |
+   +-------+---------------------+----------+--------------------------+
+
+                      Table 1: Sub-Layer Description
+
+   Based on these sub-layers, the UEMCLIP codec operates in four modes
+   as shown in Table 2.  Here, "Ch" is the number of channels and "Fs"
+   is the sampling frequency in kHz.  It should be noted that the
+   current version only supports single-channel operation and there
+   might be future extensions with multi-channel capabilities.  The
+   absent Modes 2 and 5 are reserved for possible future extension to 32
+   kHz sampling modes.  As the mode definition is expected to grow, any
+   other modes not defined in this table MUST NOT be used for
+   compatibility and interoperability reasons.
+
+   +------+----+----+-------+-------+-------+-------------+------------+
+   | Mode | Ch | Fs | Layer | Layer | Layer |    Bit-rate |      Total |
+   |      |    |    |   a   |   b   |   c   | w/o headers |   bit-rate |
+   |      |    |    |       |       |       |    [kbit/s] |   [kbit/s] |
+   +------+----+----+-------+-------+-------+-------------+------------+
+   |   0  |  1 |  8 |   x   |   -   |   -   |          64 |       67.2 |
+   |      |    |    |       |       |       |             |            |
+   |   1  |  1 | 16 |   x   |   -   |   x   |          80 |       84.0 |
+   |      |    |    |       |       |       |             |            |
+   |   2  |  - |  - |   -   |   -   |   -   |           - |          - |
+   |      |    |    |       |       |       |             |            |
+   |   3  |  1 |  8 |   x   |   x   |   -   |          80 |       84.0 |
+   |      |    |    |       |       |       |             |            |
+   |   4  |  1 | 16 |   x   |   x   |   x   |          96 |      100.8 |
+   |      |    |    |       |       |       |             |            |
+   |   5  |  - |  - |   -   |   -   |   -   |           - |          - |
+   +------+----+----+-------+-------+-------+-------------+------------+
+
+                         Table 2: Mode Description
+
+
+
+
+Hiwasaki & Ohmuro           Standards Track                     [Page 4]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   The UEMCLIP bitstream contains internal headers and other side-
+   information apart from the layer data.  This results in a total bit-
+   rate larger than the sum of the layers shown in the above table.  The
+   details of the internal headers and auxiliary information are
+   described in Section 3.3.1.
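The gap between the two bit-rate columns of Table 2 can be reproduced from figures given elsewhere in this document: a 6-byte main header per frame (Section 3.3.1) and a 2-byte sub-header per sub-layer (Section 3.3.2). The following is a minimal sketch of that arithmetic; the names `frame_bytes` and `total_kbps` are illustrative and do not come from the RFC.

```python
FRAME_MS = 20           # frame length (Section 2)
MAIN_HEADER_BYTES = 6   # main header size (Section 3.3.1)
SUB_HEADER_BYTES = 2    # index octet + SB octet (Section 3.3.2)

# Layer bit-rates in kbit/s, taken from Table 1.
LAYER_KBPS = {"a": 64, "b": 16, "c": 16}

# Layers carried by each defined mode, taken from Table 2.
MODE_LAYERS = {0: "a", 1: "ac", 3: "ab", 4: "abc"}


def frame_bytes(mode):
    """Size in bytes of one UEMCLIP frame of the given mode."""
    layers = MODE_LAYERS[mode]
    # kbit/s * ms = bits; divide by 8 for bytes (160 for layer a, 40 for b/c).
    layer_data = sum(LAYER_KBPS[name] * FRAME_MS // 8 for name in layers)
    return MAIN_HEADER_BYTES + len(layers) * SUB_HEADER_BYTES + layer_data


def total_kbps(mode):
    """Total bit-rate in kbit/s, including internal headers."""
    return frame_bytes(mode) * 8 / FRAME_MS
```

For Mode 0 this gives 168 bytes per frame, which corresponds to the 67.2 kbit/s listed in Table 2; RTP, UDP, and IP headers are not part of this figure.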
+
+   Defining the sampling frequency and the number of channels does not
+   result in a singular mode, i.e., there can be multiple modes for the
+   same sampling frequency or number of channels.  The supported modes
+   would differ between implementations; thus, the sender and the
+   receiver must negotiate what mode to use for transmission.
+
+3.  Payload Format
+
+   As an RTP payload, the UEMCLIP bitstream can contain one or more
+   frames as shown in Figure 1.
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                      RTP Header                               |
+    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+    |                                                               |
+    |                 one or more frames of UEMCLIP                 |
+    |                                                               |
+    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+
+                       Figure 1: RTP Payload Format
+
+   The UEMCLIP bitstream has a scalable structure; thus, it is possible
+   to reconstruct the signal by decoding a part of it.  A UEMCLIP frame
+   is composed of a main header (MH) followed by one or more (up to
+   three) sub-layers (SLs) as shown in Figure 2.
+
+                            +--+-------+//-+
+                            |MH| SL #1 |...|
+                            +--+-------+//-+
+
+               Figure 2: A UEMCLIP Frame (Bitstream Format)
+
+   As a sub-layer, the core layer, i.e., "Layer a", MUST always be
+   included.  It should be noted that the core layer may or may not
+   immediately follow the MH field.  The decoder MUST always refer to
+   the layer indices for proper decoding because the order of the sub-
+   layers is arbitrary.
+
+
+
+
+
+
+Hiwasaki & Ohmuro           Standards Track                     [Page 5]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   The UEMCLIP bitstream does not explicitly include the following
+   information: mode and sampling frequency (Fs).  As described before,
+   this information MUST be exchanged while establishing a connection,
+   for example, by means of SDP.
+
+3.1.  RTP Header Usage
+
+   Each RTP packet starts with a fixed RTP header, as explained in
+   [RFC3550].  The following fields of the fixed RTP header have
+   UEMCLIP-specific usage:
+
+   Payload type:  The assignment of an RTP payload type for this packet
+      format is outside the scope of this document; however, it is
+      expected that a payload type in the dynamic range shall be
+      assigned.
+
+   Timestamp:  This encodes the sampling instant of the first speech
+      signal sample in the RTP data packet.  For UEMCLIP streams, the
+      RTP timestamp MUST advance based on a clock either at 8000 or
+      16000 (Hz).  In cases where the audio sampling rate can change
+      during a session, the RTP timestamp rate MUST be equal to the
+      maximum rate (in Hz) given in the mode range (see Section 6.2.1).
+      This implies that the RTP timestamp rate for UEMCLIP payload type
+      MUST NOT change during a session.  For example, for a UEMCLIP
+      stream with 8-kHz audio sampling, where a transition to a 16-kHz
+      audio sampling mode is allowed, the RTP timestamp must always
+      advance using the 16-kHz clock rate.  For a fixed audio sampling
+      mode, the RTP timestamp rate should be either 8 or 16 kHz,
+      depending on the sampling rate.
+
+   Marker bit:  If the codec is used for applications with discontinuous
+      transmission (DTX, or silence compression), the first packet after
+      a silence period during which packets have not been transmitted
+      contiguously SHOULD have the marker bit in the RTP data header set
+      to one.  The marker bit in all other packets MUST be zero.
+      Applications without DTX MUST set the marker bit to zero.
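The timestamp rule above can be illustrated with a small calculation. This is a sketch only: it assumes the negotiated mode range is available as a list of mode numbers, and the helper names `rtp_clock_rate` and `timestamp_step` are hypothetical.

```python
FRAME_MS = 20  # UEMCLIP frame length

# Audio sampling frequency in Hz for each defined mode (Table 2).
MODE_FS_HZ = {0: 8000, 1: 16000, 3: 8000, 4: 16000}


def rtp_clock_rate(mode_range):
    """RTP timestamp rate: the maximum sampling rate in the mode range."""
    return max(MODE_FS_HZ[mode] for mode in mode_range)


def timestamp_step(mode_range, frames_per_packet=1):
    """Timestamp advance between consecutive packets without DTX gaps."""
    ticks_per_frame = rtp_clock_rate(mode_range) * FRAME_MS // 1000
    return ticks_per_frame * frames_per_packet
```

With a fixed 8-kHz mode the timestamp advances by 160 per frame; as soon as a 16-kHz mode is in the range, it advances by 320 per frame even while an 8-kHz mode is being sent.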
+
+3.2.  Multiple Frames in an RTP Packet
+
+   More than one UEMCLIP frame may be included in a single RTP packet by
+   a sender.  However, senders have the following additional
+   restrictions:
+
+   o  A single RTP packet SHOULD NOT include more UEMCLIP frames than
+      will fit in the path MTU.
+
+   o  All frames contained in a single RTP packet MUST be of the same
+      mode.
+
+
+
+Hiwasaki & Ohmuro           Standards Track                     [Page 6]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   o  Frames MUST NOT be split between RTP packets.
+
+   It is RECOMMENDED that the number of frames contained within an RTP
+   packet be consistent with the application.  Since UEMCLIP is designed
+   for telephony applications, where delay has a great impact on
+   quality, fewer frames per packet are preferable for lower delay.
+
+3.3.  Payload Data
+
+   In a UEMCLIP bitstream, all numbers are encoded in network byte
+   order.
+
+3.3.1.  Main Header
+
+   The main header (MH) is placed at the top of a frame and has a size
+   of 6 bytes.  The content of the main header is shown in Figure 3.
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |      MX       |                      PC                       |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |          PC(cont'd)           |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+                 Figure 3: UEMCLIP Main Header Format (MH)
+
+   Mixing information (MX):  8 bits
+
+      Mixing information field.  This field is only relevant when Topo-
+      RTCP-terminating-MCUs are utilized to interpret these fields.  See
+      Section 3.3.1.1 for details of the fields.
+
+   Packet-loss Concealment information (PC):  40 bits
+
+      Packet-loss concealment (PLC) information field.  See
+      Section 3.3.1.2.
+
+3.3.1.1.  Mixing Information Field
+
+                            0 1 2 3 4 5 6 7
+                           +-+-+-+-+-+-+-+-+
+                           |C|R|V|   PW1   |
+                           |1|1|1|         |
+                           +-+-+-+-+-+-+-+-+
+
+                  Figure 4: Mixing Information Field (MX)
+
+
+
+
+Hiwasaki & Ohmuro           Standards Track                     [Page 7]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   Check bit #1 (C1):  1 bit
+
+      Validity flag of V1 and PW1.  This bit being "1" indicates that
+      both parameters are valid, and "0" indicates that the parameters
+      should be ignored.  If any of these parameters is invalid, this
+      bit should be set to "0".  This flag is mainly intended for a
+      UEMCLIP-conscious Topo-RTCP-terminating-MCU.  This flag should be
+      set to "0" in case of upward transcoding from G.711 (see
+      Section 4).
+
+   Reserved bit #1 (R1):  1 bit
+
+      This bit should be ignored.  The default of this bit is 0.
+
+   VAD flag #1 (V1):  1 bit
+
+      Voice activity detection flag of the current frame, designed to be
+      used for MCU operations.  This flag being "1" indicates that the
+      frame is an active (voice) segment, and "0" indicates that it is
+      an inactive (non-voice) or a silent segment.  This flag is
+      specifically designed for mixing information.  DTX judgment based
+      on this flag is not recommended.
+
+   Power #1 (PW1):  5 bits
+
+      Signal power code of the current frame.  The code is obtained by
+      calculating a root mean square (RMS) of "Layer a" and encoding
+      this RMS using G.711 u-law [ITU-T-G.711].  Denoting the encoded
+      RMS as R, PW1 is obtained by PW1 = ((~R)>>2) & 0x1F, where "~",
+      ">>", and "&" are the one's complement (bitwise NOT), right shift,
+      and bitwise AND operators, respectively.
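A minimal sketch of how the MX octet and the PW1 code might be handled, assuming the usual RFC diagram convention that bit 0 in Figure 4 is the most significant bit of the octet. The function names are illustrative, and the G.711 u-law encoding of the RMS value itself is not shown.

```python
def power_code(encoded_rms):
    """PW1 from the u-law encoded RMS octet R: ((~R) >> 2) & 0x1F."""
    if not 0 <= encoded_rms <= 0xFF:
        raise ValueError("R must be a single octet")
    # Python integers are unbounded, but the final mask keeps only the
    # five bits of interest, so the result matches 8-bit arithmetic.
    return ((~encoded_rms) >> 2) & 0x1F


def parse_mx(octet):
    """Split the 8-bit mixing information field into its four fields."""
    return {
        "C1": (octet >> 7) & 0x1,   # validity flag for V1 and PW1
        "R1": (octet >> 6) & 0x1,   # reserved, to be ignored
        "V1": (octet >> 5) & 0x1,   # VAD flag for MCU use
        "PW1": octet & 0x1F,        # signal power code
    }
```

A receiver would normally check `C1` first and disregard `V1` and `PW1` when it is 0.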
+
+3.3.1.2.  PLC Information Field
+
+    0                   1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |C|R2 |V|   K   |U|     P1      |U|     P2      |      PW2      |
+   |2|   |2|       |1|             |2|             |               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |      R3       |
+   |               |
+   +-+-+-+-+-+-+-+-+
+
+                   Figure 5: PLC Information Field (PC)
+
+
+
+
+
+
+Hiwasaki & Ohmuro           Standards Track                     [Page 8]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   Check bit #2 (C2):  1 bit
+
+      Validity flag of V2, K, U1, P1, U2, P2, and PW2.  If the flag is
+      "1", it means that all these parameters are valid, and "0" means
+      that the parameters should be ignored.  If any of these parameters
+      is invalid, this bit should be set to "0".  Similarly to C1, this
+      flag should be set to "0" in case of upward transcoding from G.711
+      (see Section 4).
+
+   Reserved bit #2 (R2):  2 bits
+
+      These bits should be ignored.  The default of these bits is 0.
+
+   VAD flag #2 (V2):  1 bit
+
+      Voice activity detection flag of the current frame, designed to be
+      used for packet-loss concealment.  This might not be the same as
+      V1 in the mixing information, and might not be synchronous to the
+      marker bit in the RTP header.  DTX judgment based on this flag is
+      not recommended.
+
+   Frame indicator (K):  4 bits
+
+      This value indicates the frame offset of U2, P2, and PW2.  Because
+      it is better to carry the speech feature parameters of a frame as
+      PLC information in a different frame to maintain the speech
+      quality, this offset identifies the frame with which the
+      parameters are associated.  The value ranges between "0" and
+      "15".  If the current frame number is N, for example, the value K
+      indicates that U2, P2, and PW2 are associated with frame N-K.  The
+      frame indicator is equal to the difference in the RTP sequence
+      number when one UEMCLIP frame is contained in a single RTP packet.
+
+   V/UV flag #1 (U1):  1 bit
+
+      Voiced/Unvoiced signal indicator of the current frame.  This flag
+      being "0" indicates that the frame is a voiced signal segment, and
+      "1" indicates that it is an unvoiced signal segment.
+
+   Pitch lag #1 (P1):  7 bits
+
+      Pitch code of the current frame.  The actual pitch lag is
+      calculated as P1+20 samples at an 8-kHz sampling rate.  The pitch
+      lag must satisfy 20 <= pitch lag <= 120, so codes ranging between
+      "0x65" and "0x7F" are not used.  To obtain the pitch lag, any
+      pitch estimation method can be used, such as the one used in G.711
+      Appendix I [ITU-T-G.711Appendix1].
+
+
+
+
+Hiwasaki & Ohmuro           Standards Track                     [Page 9]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   V/UV flag #2 (U2):  1 bit
+
+      Voiced/Unvoiced signal indicator of the offset frame.  This flag
+      being "0" indicates that the frame is a voiced signal segment, and
+      "1" indicates that it is an unvoiced signal segment.  The offset
+      value is defined as K.
+
+   Pitch lag #2 (P2):  7 bits
+
+      Pitch code of the offset frame.  The offset value is defined as K.
+      The calculation method is identical to "P1", except that it is
+      based on the signal of the offset frame.
+
+   Power #2 (PW2):  8 bits
+
+      Signal power code of the offset frame.  The offset value is
+      defined as K.
+
+   Reserved bits #3 (R3):  8 bits
+
+      These bits should be ignored.  The default of all bits is "0".
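The field layout of Figure 5 can be unpacked as sketched below, again assuming that bit 0 in the figure is the most significant bit of the first octet (network byte order). `parse_pc` and `pitch_lag` are hypothetical helper names, not part of the specification.

```python
def parse_pc(pc):
    """Unpack the 40-bit PLC information field (5 octets)."""
    if len(pc) != 5:
        raise ValueError("PC field must be exactly 5 octets")
    value = int.from_bytes(pc, "big")

    def bits(start, width):
        # 'start' counts from the most significant bit, as in Figure 5.
        return (value >> (40 - start - width)) & ((1 << width) - 1)

    return {
        "C2": bits(0, 1), "R2": bits(1, 2), "V2": bits(3, 1),
        "K": bits(4, 4),
        "U1": bits(8, 1), "P1": bits(9, 7),
        "U2": bits(16, 1), "P2": bits(17, 7),
        "PW2": bits(24, 8), "R3": bits(32, 8),
    }


def pitch_lag(code):
    """Pitch lag in samples at 8 kHz for a pitch code P1 or P2."""
    if not 0 <= code <= 0x64:
        raise ValueError("codes 0x65 to 0x7F are not used")
    return code + 20
```

When `C2` is 0, the remaining fields are to be ignored, so a decoder would typically fall back to its own concealment estimates in that case.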
+
+3.3.2.  Sub-Layer
+
+   A sub-layer (SL) consists of a sub-header followed by the layer
+   bitstream, as shown in Figure 6.  The sub-header indicates the layer
+   location and the number of bytes of layer data.
+
+     0                   1                   2
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7   . . .
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+
+    |CI |FI |QI |R4 |      SB       |               LD         ...  |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+//-+-+-+
+
+                      Figure 6: Sub-Layer Format (SL)
+
+   Channel index (CI):  2 bits
+
+      Indicates the channel number.  For all modes given in Table 2,
+      this should be "0".  The detail is given in Table 3.
+
+   Frequency index (FI):  2 bits
+
+      Indicates the frequency number. "0" means that the layer is in the
+      base frequency band, and a higher number means that the layer is
+      in the respective frequency band.  The detail is given in Table 3.
+
+
+
+
+
+Hiwasaki & Ohmuro           Standards Track                    [Page 10]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   Quality index (QI):  2 bits
+
+      Indicates the quality layer number. "0" means that the layer is in
+      the base layer, and a higher number means that the layer is in the
+      respective quality layer.  The detail is given in Table 3.
+
+   Reserved #4 (R4):  2 bits
+
+      Not used (reserved).  The default value is "0".
+
+   Sub-layer Size (SB):  8 bits
+
+      Indicates the byte size of the following sub-layer data.
+
+   Layer Data (LD):  SB*8 bits
+
+      The actual sub-layer data.
+
+   For all the layers shown in Table 1, the layer indices are shown in
+   Table 3.
+
+                         +-------+----+----+----+
+                         | Layer | CI | FI | QI |
+                         +-------+----+----+----+
+                         |   a   |  0 |  0 |  0 |
+                         |       |    |    |    |
+                         |   b   |  0 |  0 |  1 |
+                         |       |    |    |    |
+                         |   c   |  0 |  1 |  0 |
+                         +-------+----+----+----+
+
+                          Table 3: Layer Indices
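As an illustration of the sub-layer format, the sketch below walks the sub-layers of one frame and labels each with its Table 3 layer name. It assumes that the number of sub-layers is known from the negotiated mode (the bitstream does not carry the mode) and that bit 0 in Figure 6 is the most significant bit; `parse_sublayers` is an illustrative name.

```python
MAIN_HEADER_BYTES = 6

# (CI, FI, QI) -> layer name, from Table 3.
LAYER_NAMES = {(0, 0, 0): "a", (0, 0, 1): "b", (0, 1, 0): "c"}


def parse_sublayers(frame, count):
    """Return ([(layer_name, layer_data), ...], frame_length_in_bytes)."""
    pos = MAIN_HEADER_BYTES
    layers = []
    for _ in range(count):
        if pos + 2 > len(frame):
            raise ValueError("truncated sub-header")
        index_octet, size = frame[pos], frame[pos + 1]
        ci = (index_octet >> 6) & 0x3
        fi = (index_octet >> 4) & 0x3
        qi = (index_octet >> 2) & 0x3   # the low two bits are R4 (reserved)
        data = frame[pos + 2:pos + 2 + size]
        if len(data) != size:
            raise ValueError("truncated layer data")
        # Unknown index combinations are reported as None.
        layers.append((LAYER_NAMES.get((ci, fi, qi)), data))
        pos += 2 + size
    return layers, pos
```

The returned position is the offset of the next frame when several frames are packed into one RTP payload.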
+
+4.  Transcoding between UEMCLIP and G.711
+
+   As given in Section 2, the u-law-encoded G.711 bitstream (Layer a) is
+   the core layer of a UEMCLIP bitstream, and is always embedded.  This
+   means that media transcoding from the UEMCLIP bitstream to G.711 does
+   not have to undergo decoding and re-encoding procedures, but simple
+   extraction would suffice.  However, this does not apply for the
+   reverse procedure, i.e., transcoding from G.711 to UEMCLIP, because
+   the auxiliary information in the main header (MH) must be assigned
+   separately.  It should be noted that this media transcoding is useful
+   for a Media Translator (Topo-Media-Translator) or a Point-to-
+   Multipoint Using RTCP Terminating MCU (Topo-RTCP-terminating-MCU) in
+   [RFC5117], and all the requirements apply.  This means that a
+   transcoding device of this sort MUST rewrite RTCP packets, together
+   with the RTP media packets.
+
+
+
+Hiwasaki & Ohmuro           Standards Track                    [Page 11]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   The transcoding from UEMCLIP to u-law G.711 can be done easily by
+   finding the appropriate sub-layer.  Within a frame, the transcoder
+   should look for the sub-layer with a layer index of "0x00"; the LD
+   that follows, which has a size of SB*8 bits (SB=160, because UEMCLIP
+   has a 20-ms frame), is the actual G.711 bitstream data.  It should be
+   noted that the transcoder should not assume that the core layer is
+   located right after the main header.
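The downward transcoding described above amounts to a search for the core sub-layer. The following is a sketch under the same assumptions as before (sub-layer count known from the negotiated mode, bit 0 is the most significant bit); `extract_g711` is a hypothetical name, and the RTCP rewriting required of such a device is outside its scope.

```python
MAIN_HEADER_BYTES = 6
CORE_LAYER_BYTES = 160  # 20 ms of 8-kHz u-law samples


def extract_g711(frame, sublayer_count):
    """Return the 160 u-law octets of the core layer of one frame."""
    pos = MAIN_HEADER_BYTES
    for _ in range(sublayer_count):
        if pos + 2 > len(frame):
            raise ValueError("truncated sub-header")
        index_octet, size = frame[pos], frame[pos + 1]
        data = frame[pos + 2:pos + 2 + size]
        if len(data) != size:
            raise ValueError("truncated layer data")
        # CI = FI = QI = 0 marks the core layer; the reserved R4 bits
        # are masked out rather than checked.
        if index_octet & 0xFC == 0:
            if size != CORE_LAYER_BYTES:
                raise ValueError("unexpected core layer size")
            return data
        pos += 2 + size
    raise ValueError("core layer not found")
```

The loop does not stop at the first sub-layer because the core layer may appear after an enhancement layer.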
+
+   On the other hand, the transcoding from G.711 to UEMCLIP is not
+   entirely straightforward.  Since there are no means to generate
+   enhancement sub-layers, a G.711 bitstream can only be converted to
+   UEMCLIP Mode 0 bitstream.  If the original G.711 bitstream is encoded
+   in A-law, it should first be converted to u-law to become the core
+   layer.  Because a UEMCLIP frame size is 20 ms, a u-law-encoded G.711
+   bitstream MUST be a 160-sample chunk to become a core layer.  For the
+   main header contents, when the UEMCLIP encoder is not available, it
+   should follow these guidelines:
+
+   o  The check bits for mixing and PLC (C1 and C2) are set to 0.
+
+   o  The reserved bits (R1 to R3) in MH are set to respective default
+      values.
+
+   For the core layer (i.e., u-law G.711 bitstream), it should have the
+   following sub-layer header:
+
+   o  All CI, FI, QI, and R4 MUST be 0.
+
+   o  Sub-layer size (SB) MUST be 160 for a 20-ms frame.
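The guidelines above for upward transcoding can be sketched as follows for the case where no UEMCLIP encoder is available. Setting the non-reserved main header fields (V1, PW1, and the PLC parameters) to zero is an assumption made here for illustration; the text only requires C1 = C2 = 0 and default reserved bits. A-law to u-law conversion is not shown.

```python
CORE_LAYER_BYTES = 160  # 20 ms of 8-kHz u-law samples


def g711_to_uemclip_mode0(ulaw_chunk):
    """Wrap a 160-sample u-law chunk as a UEMCLIP Mode 0 frame."""
    if len(ulaw_chunk) != CORE_LAYER_BYTES:
        raise ValueError("core layer must be exactly 160 u-law octets")
    # Main header: C1 = C2 = 0 and reserved bits at their defaults.
    # Assumption: the remaining fields are zeroed, since the cleared
    # check bits tell receivers to ignore them.
    main_header = bytes(6)
    # Sub-header: CI = FI = QI = R4 = 0, followed by SB = 160.
    sub_header = bytes([0x00, CORE_LAYER_BYTES])
    return main_header + sub_header + bytes(ulaw_chunk)
```

The resulting frame is 168 bytes long, which matches the 67.2 kbit/s total bit-rate of Mode 0 in Table 2.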
+
+5.  Congestion Control Considerations
+
+   The general congestion control considerations for transporting RTP
+   data also apply to UEMCLIP over RTP [RFC3550] as well as any
+   applicable RTP profile like Audio-Visual Profile (AVP) [RFC3551].
+
+   The bandwidth of a UEMCLIP bitstream can be reduced by changing to
+   lower-bit-rate modes.  The embedded layer structure of UEMCLIP may
+   help to control congestion, when dynamic mode changing (see
+   Section 6.2.1) is available, and the range of modes is obtained by
+   offer-answer negotiation as given in Section 6.3.  It should be noted
+   that this involves proper RTCP handling when the bit-rate is modified
+   in an RTP translator or a mixer [RFC3550].
+
+
+
+
+
+
+
+
+Hiwasaki & Ohmuro           Standards Track                    [Page 12]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   Packing more frames in each RTP payload can reduce the number of
+   packets sent, and hence the overhead from IP/UDP/RTP headers, at the
+   expense of increased delay and reduced error robustness against
+   packet losses.  It should be treated with care because increased
+   delay means reduced quality.
+
+6.  Payload Format Parameters
+
+6.1.  Media Type Registration
+
+   This registration is done using the template defined in [RFC4288] and
+   following [RFC4855].
+
+   Media type name:  audio
+
+   Media subtype name:  UEMCLIP
+
+   Required parameters:
+
+      Rate:  Defines the sampling rate, and it MUST be either 8000 or
+         16000.  See Section 6.2.1 "Mode specification" of RFC 5686
+         (this RFC) for details.
+
+   Optional parameters:
+
+      ptime:  See RFC 4566 [RFC4566].
+
+      maxptime:  See RFC 4566 [RFC4566].
+
+      mode:  Indicates the range of dynamically changeable modes during
+         a session.  Possible values are a comma-separated list of modes
+         from the supported mode set: 0, 1, 3, and 4.  If only one mode
+         is specified, it means that the mode must not be changed during
+         the session.  When not specified, the mode transmission
+         defaults to a singular mode as specified in Table 4.  See
+         Section 6.2.1 "Mode specification" of RFC 5686 (this RFC) for
+         details.
+
+   Encoding considerations:  This media type is framed and contains
+      binary data.  See Section 4.8 of RFC 4288.
+
+   Security considerations:  See Section 7 "Security Considerations" of
+      RFC 5686 (this RFC).
+
+   Interoperability considerations:  This media may be readily
+      transcoded to u-law-encoded ITU-T G.711.  See Section 4
+      "Transcoding between UEMCLIP and G.711" of RFC 5686 (this RFC).
+
+
+
+
+Hiwasaki & Ohmuro           Standards Track                    [Page 13]
+
+RFC 5686             RTP Payload Format for UEMCLIP         October 2009
+
+
+   Published specification:  RFC 5686 (this RFC)
+
+   Applications that use this media type:  Audio and video streaming and
+      conferencing tools.
+
+   Additional information:  None
+
+   Intended usage:  COMMON
+
+   Restrictions on usage:  This media type depends on RTP framing, and
+      hence is only defined for transfer via RTP.
+
+   Person & email address to contact for further information:
+      Yusuke Hiwasaki <hiwasaki.yusuke@lab.ntt.co.jp>
+
+   Author:  Yusuke Hiwasaki
+
+   Change Controller:  IETF Audio/Video Transport Working Group
+      delegated from the IESG
+
+6.2.  Mapping to SDP Parameters
+
+   The media type audio/UEMCLIP is mapped to fields in the Session
+   Description Protocol (SDP) [RFC4566] as follows:
+
+   Media name:  The media name in the "m=" line of SDP MUST be "audio".
+
+   Encoding name:  The registered media subtype name should be used for
+      the "a=rtpmap" line.
+
+   Sampling Frequency:  Depending on the mode, the clock rate (sampling
+      frequency) specified in "a=rtpmap" MUST be selected from the ones
+      defined in Table 2.  See Section 6.2.1 for details.
+
+   Encoding parameters:  Since this is an audio stream, the encoding
+      parameters indicate the number of audio channels, and this SHOULD
+      default to "1", as selected from the ones defined in Table 2.
+      This is OPTIONAL.
+
+   Packet time:  The UEMCLIP frame length is 20 ms, thus the argument
+      of "a=ptime" SHOULD be a multiple of "20".  When not listed in
+      SDP, it should default to the minimum size: "20".
+
+   UEMCLIP specific:  Any description specific to UEMCLIP is defined in
+      the Format Specification Parameters ("a=fmtp").  Each parameter
+      MUST be separated with ";", and if any attribute (value) exists,
+      it MUST be defined with "=".  For compatibility reasons, any
+      application/terminal MUST ignore any parameters that it does not
+      understand.  This is to ensure forward compatibility with
+      parameters added in future enhancements.  The mode specification
+      should be made here (see Section 6.2.1).
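+
+   As a non-normative illustration of the "a=fmtp" rules above, the
+   following Python sketch splits a parameter string on ";", splits
+   each parameter on "=", and drops parameters it does not know.  The
+   function names and the KNOWN_PARAMS set are assumptions made for
+   this example and are not defined by this document.  The input is
+   the text that follows the payload type on the "a=fmtp" line.
+

```python
from typing import Dict, Optional

# Parameters this hypothetical implementation understands.
KNOWN_PARAMS = {"mode"}


def parse_fmtp(params: str) -> Dict[str, Optional[str]]:
    """Parse the parameter part of an a=fmtp line (after the payload type)."""
    parsed: Dict[str, Optional[str]] = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing or doubled ";"
        name, sep, value = item.partition("=")
        # A parameter without "=" is kept as a flag with no value.
        parsed[name.strip()] = value.strip() if sep else None
    return parsed


def understood(params: str) -> Dict[str, Optional[str]]:
    """Keep only known parameters; unknown ones are silently ignored."""
    return {k: v for k, v in parse_fmtp(params).items() if k in KNOWN_PARAMS}


# "x-future" is a made-up parameter standing in for a later extension.
print(understood("mode=4,1,3,0;x-future=1"))  # {'mode': '4,1,3,0'}
```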
+
+6.2.1.  Mode Specification
+
+   Since the UEMCLIP codec can operate in a number of modes (bit-rates),
+   it is desirable to specify the range of modes at which an encoder or
+   a decoder can operate.  When exchanging SDP messages, an offerer
+   should specify all possible combinations of mode numbers as arguments
+   to "mode=" in the "a=fmtp" line, delimited by commas ",".  When
+   multiple modes are specified, they SHOULD appear in descending
+   order of priority.
+
+   Although UEMCLIP decoders SHOULD accept bitstreams in any mode, an
+   implementation may fail to adapt to dynamic mode changes during a
+   session.  For this reason, an application may choose to operate
+   either with one fixed mode or with multiple modes that can be
+   dynamically changed.  If the mode is to be fixed and changes are not
+   allowed, this can be indicated by specifying a single mode per
+   payload type.
+
+   The mode numbers that can be specified in a payload type as arguments
+   to "mode" are restricted by a combination of a clock rate and a
+   number of audio channels.  This is because SDP binds a payload type
+   to a combination of a sampling frequency and a number of audio
+   channels.  Table 4 gives selectable mode numbers that are attributed
+   with clock rates.  When mode specifications are not given at all, a
+   payload type MUST default to a single mode using the default value
+   specified in this table.
+
+        +------------+----------+------------------+--------------+
+        | Clock rate | Channels | Selectable modes | Default mode |
+        +------------+----------+------------------+--------------+
+        |       8000 |     1    |        0,3       |       0      |
+        |            |          |                  |              |
+        |      16000 |     1    |      0,1,3,4     |       1      |
+        +------------+----------+------------------+--------------+
+
+                          Table 4: Default Modes
+
+   Note that a mode with a larger sampling frequency (Fs) cannot be
+   used with a smaller clock rate specified in "a=rtpmap".  This means
+   that Modes 0 and 3 can be specified in a payload type having a clock
+   rate of either 8000 or 16000 in "a=rtpmap", but Modes 1 and 4 cannot
+   be specified with one having a clock rate of 8000.
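+
+   The lookup described by Table 4 can be sketched in Python as
+   follows.  This is a minimal, non-normative example: the names
+   TABLE_4 and resolve_modes are assumptions of the sketch, and raising
+   ValueError for a disallowed combination is one possible design
+   choice rather than behavior mandated by this document.
+

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (clock rate, channels) -> (selectable modes, default mode), from Table 4.
TABLE_4: Dict[Tuple[int, int], Tuple[Tuple[int, ...], int]] = {
    (8000, 1): ((0, 3), 0),
    (16000, 1): ((0, 1, 3, 4), 1),
}


def resolve_modes(clock_rate: int, channels: int,
                  modes: Optional[Sequence[int]] = None) -> List[int]:
    """Return the usable mode list for one payload type."""
    key = (clock_rate, channels)
    if key not in TABLE_4:
        raise ValueError("unsupported clock rate/channels: %r" % (key,))
    selectable, default = TABLE_4[key]
    if not modes:
        # No "mode=" given: a single fixed mode, taken from the table.
        return [default]
    bad = [m for m in modes if m not in selectable]
    if bad:
        raise ValueError("modes %r not selectable at %d Hz" % (bad, clock_rate))
    return list(modes)


print(resolve_modes(16000, 1))                # [1]
print(resolve_modes(16000, 1, [4, 1, 3, 0]))  # [4, 1, 3, 0]
```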
+
+6.3.  Offer-Answer Model Considerations
+
+6.3.1.  Offer-Answer Guidelines
+
+   The procedures related to exchanging SDP messages MUST follow
+   [RFC3264].  The following is a detailed list of the semantics of
+   using the UEMCLIP payload format in an offer-answer exchange.
+
+   o  An offerer SHOULD offer every possible combination of UEMCLIP
+      payload types it can handle, i.e., sampling frequency, channel
+      number, and fmtp parameters, in a preferred order.  When the
+      transmission bandwidth is restricted, the offer MUST be made in
+      accordance with the restriction.
+
+   o  When multiple UEMCLIP payload types are offered, it is RECOMMENDED
+      that the answerer select a single UEMCLIP payload type and answer
+      it back.
+
+   o  In a UEMCLIP payload type, an answerer MUST answer back suitable
+      mode number(s) as a subset of what has been offered.  This means
+      that there is a symmetry assumption on sent and received streams,
+      and the offerer MUST NOT send in modes that it does not offer.
+
+   o  In an offering/answering SDP, any fmtp parameters that are not
+      known MUST be ignored.  If any unknown/undefined parameters are
+      offered, an answerer MUST delete those entries from the answer
+      message.
+
+   o  A receiver of an SDP message MUST only use specified payload types
+      and modes.  When a mode specification is missing, i.e., a mode is
+      not specified at all, the session MUST default to one single mode
+      without mode changes during a session.  For this case, the default
+      mode values, as shown in Table 4, MUST be used based on the
+      sampling frequency and number of channels.  This table must be
+      looked up only when there are no mode specifications; thus, the
+      offerer/answerer MUST NOT assume that the default modes are always
+      available when it is not in the specified list of modes.
+
+   o  When an offered condition does not fit an answerer's capabilities,
+      the answerer MUST NOT answer any of the conditions, and the
+      session MAY proceed to re-INVITE, if possible.  If a condition
+      (mode) is decided upon, an offerer and an answerer MUST transmit
+      under this condition.
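+
+   The subset rule in the list above can be sketched as a small Python
+   function.  This is a non-normative illustration under stated
+   assumptions: the name answer_modes and the allow_switching flag are
+   invented for the example, and keeping the offerer's order follows
+   the examples in Section 6.3.2 rather than an explicit requirement.
+

```python
from typing import Iterable, List, Sequence


def answer_modes(offered: Sequence[int], supported: Iterable[int],
                 allow_switching: bool = True) -> List[int]:
    """Pick the modes to answer with, as a subset of the offered modes."""
    supported_set = set(supported)
    # Keep only offered modes the answerer supports, in the offered order.
    common = [m for m in offered if m in supported_set]
    if not common:
        return []  # nothing acceptable: do not answer this payload type
    # A single mode in the answer means no mode changes during the session.
    return common if allow_switching else common[:1]


print(answer_modes([4, 1, 3, 0], {0, 1}))         # [1, 0]
print(answer_modes([4, 1, 3, 0], {0, 1}, False))  # [1]
```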
+
+6.3.2.  Examples
+
+   When an offerer wishes to dynamically switch between modes (0, 1, 3,
+   and 4) during a session, an example of an offered SDP could be:
+
+     v=0
+     o=john 51050101 51050101 IN IP4 offhost.example.com
+     s=-
+     c=IN IP4 offhost.example.com
+     t=0 0
+     m=audio 5004 RTP/AVP 96
+     a=rtpmap:96 UEMCLIP/16000/1
+     a=fmtp:96 mode=4,1,3,0
+
+   Note that the modes are listed in the offerer's order of
+   preference.
+
+   When an answerer can only operate in Modes 1 and 0 but can
+   dynamically switch between those modes during a session, the answerer
+   MUST delete the entries of Modes 3 and 4, and answer back as:
+
+     v=0
+     o=lena 549947322 549947322 IN IP4 anshost.example.org
+     s=-
+     c=IN IP4 anshost.example.org
+     t=0 0
+     m=audio 5004 RTP/AVP 96
+     a=rtpmap:96 UEMCLIP/16000/1
+     a=fmtp:96 mode=1,0
+
+   As a result, both would start communicating in either Mode 1 or 0,
+   and can dynamically switch between those modes during the session.
+
+   On the other hand, when the answerer is capable of communicating in
+   either Mode 1 or 0 but cannot switch between modes during a session,
+   an example of such an answer is as follows:
+
+     v=0
+     o=lena 549947322 549947322 IN IP4 anshost.example.org
+     s=-
+     c=IN IP4 anshost.example.org
+     t=0 0
+     m=audio 5004 RTP/AVP 96
+     a=rtpmap:96 UEMCLIP/16000/1
+     a=fmtp:96 mode=1
+
+   As a result, both will start communicating in Mode 1.  Note that
+   mode changes during this session are not allowed because the
+   answerer responded with a single mode, and that the answerer
+   selected Mode 1 over Mode 0 according to the offered order.
+
+   If an offerer does not want a mode change during a session but is
+   capable of receiving either Mode 4 or Mode 1 bitstreams, the SDP
+   would look something like:
+
+     v=0
+     o=john 51050101 51050101 IN IP4 offhost.example.com
+     s=-
+     c=IN IP4 offhost.example.com
+     t=0 0
+     m=audio 5004 RTP/AVP 96 97
+     a=rtpmap:96 UEMCLIP/16000/1
+     a=fmtp:96 mode=4
+     a=rtpmap:97 UEMCLIP/16000/1
+     a=fmtp:97 mode=1
+
+   and if the answerer prefers to communicate in Mode 1, an answer would
+   be:
+
+     v=0
+     o=lena 549947322 549947322 IN IP4 anshost.example.org
+     s=-
+     c=IN IP4 anshost.example.org
+     t=0 0
+     m=audio 5004 RTP/AVP 97
+     a=rtpmap:97 UEMCLIP/16000/1
+     a=fmtp:97 mode=1
+
+   Please note that it is RECOMMENDED to select a single UEMCLIP payload
+   type for answers.
+
+   The "ptime" attribute is used to denote the desired packetization
+   interval.  When not specified, it SHOULD default to 20.  Since
+   UEMCLIP uses 20-ms frames, ptime values that are larger multiples of
+   20 imply multiple frames per packet.  In the example below, the
+   ptime is set to 60, which means that the offerer wants to receive
+   3 frames in each packet.
+
+     v=0
+     o=kosuke 2890844730 2890844730 IN IP4 anotherhost.example.com
+     s=-
+     c=IN IP4 anotherhost.example.com
+     t=0 0
+     m=audio 5004 RTP/AVP 96
+     a=ptime:60
+     a=rtpmap:96 UEMCLIP/16000/1
+
+   When a mode specification is not present, the payload type defaults
+   to a fixed mode: in this case, Mode 1 (see Section 6.2.1).
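+
+   The ptime arithmetic used in the example above can be written as a
+   short Python sketch.  It is non-normative: the names FRAME_MS and
+   frames_per_packet are assumptions of this example, and rejecting a
+   ptime that is not a multiple of 20 is a choice made here for
+   simplicity, since the text only says such values SHOULD be
+   multiples of 20.
+

```python
from typing import Optional

FRAME_MS = 20  # UEMCLIP frame length in milliseconds


def frames_per_packet(ptime: Optional[int] = None) -> int:
    """Number of 20-ms frames carried per packet for a given a=ptime."""
    if ptime is None:
        ptime = FRAME_MS  # "a=ptime" absent: default to one frame
    if ptime <= 0 or ptime % FRAME_MS != 0:
        raise ValueError("ptime %r is not a positive multiple of 20" % ptime)
    return ptime // FRAME_MS


print(frames_per_packet(60))  # 3
print(frames_per_packet())    # 1
```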
+
+7.  Security Considerations
+
+   RTP packets using the payload format defined in this specification
+   are subject to the security considerations discussed in the RTP
+   specification [RFC3550] and any appropriate profiles.  This implies
+   that confidentiality of the media streams is achieved by encryption
+   unless the applicable profile specifies other means.
+
+   A potential denial-of-service threat exists for data encoding using
+   compression techniques that have non-uniform receiver-end
+   computational load.  The attacker can inject pathological datagrams
+   into the stream that are complex to decode and cause the receiver
+   to become overloaded.  However, the UEMCLIP codec covered in this
+   document does not exhibit any significant non-uniformity.
+
+   Another potential threat is memory attacks by illegal layer indices
+   or byte numbers.  The implementor of the decoder should always be
+   aware that the indicated numbers may be corrupted and not point to
+   the right sub-layer, and they may force reading beyond the bitstream
+   boundaries.  It is advised that a decoder implementation reject
+   layers with such indices.
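+
+   A defensive bounds check of the kind advised above might look like
+   the following Python sketch.  It is non-normative and deliberately
+   generic: the (offset, size) description of a sub-layer and the name
+   read_sublayer are hypothetical, and the actual sub-layer header
+   layout is the one defined elsewhere in this document.
+

```python
def read_sublayer(payload: bytes, offset: int, size: int) -> bytes:
    """Return one sub-layer's bytes, refusing to read past the payload."""
    # Indicated numbers may be corrupted, so validate before slicing.
    if offset < 0 or size < 0 or offset + size > len(payload):
        raise ValueError("sub-layer extends beyond the bitstream boundary")
    return payload[offset:offset + size]


print(read_sublayer(b"abcdef", 2, 3))  # b'cde'
```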
+
+8.  IANA Considerations
+
+   One new media subtype (audio/UEMCLIP) has been registered by IANA.
+   For details, see Section 6.1.
+
+9.  References
+
+9.1.  Normative References
+
+   [ITU-T-G.711]
+              International Telecommunications Union, "Pulse code
+              modulation (PCM) of voice frequencies",
+              ITU-T Recommendation G.711, November 1988.
+
+   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
+              Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
+              with Session Description Protocol (SDP)", RFC 3264,
+              June 2002.
+
+   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
+              Jacobson, "RTP: A Transport Protocol for Real-Time
+              Applications", STD 64, RFC 3550, July 2003.
+
+   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
+              Video Conferences with Minimal Control", STD 65, RFC 3551,
+              July 2003.
+
+   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
+              Registration Procedures", BCP 13, RFC 4288, December 2005.
+
+   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
+              Description Protocol", RFC 4566, July 2006.
+
+   [RFC4855]  Casner, S., "Media Type Registration of RTP Payload
+              Formats", RFC 4855, February 2007.
+
+   [RFC4856]  Casner, S., "Media Type Registration of Payload Formats in
+              the RTP Profile for Audio and Video Conferences",
+              RFC 4856, February 2007.
+
+   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
+              January 2008.
+
+9.2.  Informative References
+
+   [ITU-T-G.711Appendix1]
+              International Telecommunications Union, "Pulse code
+              modulation (PCM) of voice frequencies, Appendix I: A high
+              quality low-complexity algorithm for packet loss
+              concealment with G.711", ITU-T Recommendation G.711
+              Appendix I, September 1999.
+
+Authors' Addresses
+
+   Yusuke Hiwasaki
+   NTT Corporation
+   3-9-11 Midori-cho,
+   Musashino-shi
+   Tokyo  180-8585
+   Japan
+
+   Phone: +81(422)59-4815
+   EMail: hiwasaki.yusuke@lab.ntt.co.jp
+
+
+   Hitoshi Ohmuro
+   NTT Corporation
+   3-9-11 Midori-cho,
+   Musashino-shi
+   Tokyo  180-8585
+   Japan
+
+   Phone: +81(422)59-2151
+   EMail: ohmuro.hitoshi@lab.ntt.co.jp