diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc4060.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc4060.txt')
-rw-r--r-- | doc/rfc/rfc4060.txt | 1067 |
1 files changed, 1067 insertions, 0 deletions
diff --git a/doc/rfc/rfc4060.txt b/doc/rfc/rfc4060.txt new file mode 100644 index 0000000..00a3894 --- /dev/null +++ b/doc/rfc/rfc4060.txt @@ -0,0 +1,1067 @@ + + + + + + +Network Working Group Q. Xie +Request for Comments: 4060 D. Pearce +Category: Standards Track Motorola + May 2005 + + + RTP Payload Formats for European Telecommunications + Standards Institute (ETSI) European Standard + ES 202 050, ES 202 211, and ES 202 212 + Distributed Speech Recognition Encoding + +Status of This Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2005). + +Abstract + + This document specifies RTP payload formats for encapsulating + European Telecommunications Standards Institute (ETSI) European + Standard ES 202 050 DSR Advanced Front-end (AFE), ES 202 211 DSR + Extended Front-end (XFE), and ES 202 212 DSR Extended Advanced + Front-end (XAFE) signal processing feature streams for distributed + speech recognition (DSR) systems. + + + + + + + + + + + + + + + + + + + + +Xie & Pearce Standards Track [Page 1] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + +Table of Contents + + 1. Introduction ....................................................2 + 1.1. Conventions and Acronyms ...................................3 + 2. ETSI DSR Front-end Codecs .......................................4 + 2.1. ES 202 050 Advanced DSR Front-end Codec ....................4 + 2.2. ES 202 211 Extended DSR Front-end Codec ....................4 + 2.3. ES 202 212 Extended Advanced DSR Front-end Codec ...........5 + 3. DSR RTP Payload Formats .........................................6 + 3.1. Common Considerations of the Three DSR RTP Payload + Formats ....................................................6 + 3.1.1. Number of FPs in Each RTP Packet ....................6 + 3.1.2. Support for Discontinuous Transmission ..............6 + 3.1.3. RTP Header Usage ....................................6 + 3.2. Payload Format for ES 202 050 DSR ..........................7 + 3.2.1. Frame Pair Formats ..................................7 + 3.3. Payload Format for ES 202 211 DSR ..........................9 + 3.3.1. Frame Pair Formats ..................................9 + 3.4. Payload Format for ES 202 212 DSR .........................11 + 3.4.1. Frame Pair Formats .................................12 + 4. IANA Considerations ............................................14 + 4.1. Mapping MIME Parameters into SDP ..........................15 + 4.2. Usage in Offer/Answer .....................................16 + 4.3. Congestion Control ........................................16 + 5. Security Considerations ........................................16 + 6. Acknowledgments ................................................16 + 7. References .....................................................16 + 7.1. Normative References ......................................16 + 7.2. Informative References ....................................17 + +1. Introduction + + Distributed speech recognition (DSR) technology is intended for a + remote device acting as a thin client (a.k.a. the front-end) to + communicate with a speech recognition server (a.k.a. a speech + engine), over a network connection to obtain speech recognition + services. More details on DSR over Internet can be found in RFC 3557 + [10]. + + To achieve interoperability with different client devices and speech + engines, the first ETSI standard DSR front-end ES 201 108 was + published in early 2000 [11]. An RTP packetization for ES 201 108 + frames is defined in RFC 3557 [10] by IETF. + + In ES 202 050 [1], ETSI issues another standard for an Advanced DSR + front-end that provides substantially improved recognition + performance when background noise is present. The codecs in ES 202 + + + + +Xie & Pearce Standards Track [Page 2] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + + 050 use a slightly different frame format from that of ES 201 108 and + thus the two do not inter-operate with each other. + + The RTP packetization for ES 202 050 front-end defined in this + document uses the same RTP packet format layout as that defined in + RFC 3557 [10]. The differences are in the DSR codec frame bit + definition and the payload type MIME registration. + + The two further standards, ES 202 211 and ES 202 212, provide + extensions to each of the DSR front-end standards. The extensions + allow the speech waveform to be reconstructed for human audition and + can also be used to improve recognition performance for tonal + languages. This is done by sending additional pitch and voicing + information for each frame along with the recognition features. + + The RTP packet format for these extended standards is also defined in + this document. + + It is worthwhile to note that the performance of most speech + recognizers are extremely sensitive to consecutive frame losses and + DSR speech recognizers are no exception. If a DSR over RTP session + is expected to endure high packet loss ratio between the front-end + and the speech engine, one should consider limiting the maximum + number of DSR frames allowed in a packet, or employing other loss + management techniques, such as FEC or interleaving, to minimize the + chance of losing consecutive frames. + +1.1. Conventions and Acronyms + + The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, + SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL, when + they appear in this document, are to be interpreted as described in + RFC 2119 [4]. + + The following acronyms are used in this document: + + DSR - Distributed Speech Recognition + ETSI - the European Telecommunications Standards Institute + FP - Frame Pair + DTX - Discontinuous Transmission + VAD - Voice Activity Detection + + + + + + + + + + +Xie & Pearce Standards Track [Page 3] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + +2. ETSI DSR Front-end Codecs + + Some relevant characteristics of ES 202 050 Advanced, ES 202 211 + Extended, and ES 202 212 Extended Advanced DSR front-end codecs are + summarized below. + +2.1. ES 202 050 Advanced DSR Front-end Codec + + The front-end calculation is a frame-based scheme that produces an + output vector every 10 ms. In the front-end feature extraction, + noise reduction by two stages of Wiener filtering is performed first. + Then, waveform processing is applied to the de-noised signal and + mel-cepstral features are calculated. At the end, blind equalization + is applied to the cepstral features. The front-end algorithm + produces at its output a mel-cepstral representation in the same + format as ES 210 108, i.e., 12 cepstral coefficients [C1 - C12], C0 + and log Energy. Voice activity detection (VAD) for the + classification of each frame as speech or non-speech is also + implemented in Feature Extraction. The VAD information is included + in the payload format for each frame pair to be sent to the remote + recognition engine as part of the payload. This information may + optionally be used by the receiving recognition engine to drop + non-speech frames. The front-end supports three raw sampling rates: + 8 kHz, 11 kHz, and 16 kHz (Note that unlike some other speech codecs, + the feature frame size of DSR presented to RTP packetization is not + dependent on the number of speech samples used in each 10 ms sample + frame. This will become more evident in the following sections). + + After calculation of the mel-cepstral representation, the + representation is first quantized via split-vector quantization to + reduce the data rate of the encoded stream. Then, the quantized + vectors from two consecutive frames are put into a FP, as described + in more detail in Section 3.2. + +2.2. ES 202 211 Extended DSR Front-end Codec + + Some relevant characteristics of ES 202 211 Extended DSR front-end + codec are summarized below. + + ES 202 211 is an extension of the mel-cepstrum DSR Front-end standard + ES 201 108 [11]. The mel-cepstrum front-end provides the features + for speech recognition but these are not available for human + listening. The purpose of the extension is allow the reconstruction + of the speech waveform from these features so that they can be + replayed. The front-end feature extraction part of the processing is + exactly the same as for ES 201 108. To allow speech reconstruction + additional fundamental frequency (perceived as pitch) and voicing + class (e.g., non-speech, voiced, unvoiced and mixed) information is + + + +Xie & Pearce Standards Track [Page 4] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + + needed. This extra information is provided by the extended front-end + processing algorithms at the device side. It is compressed and + transmitted along with the front-end features to the server. This + extra information may also be useful for improved speech recognition + performance with tonal languages such as Mandarin, Cantonese and + Thai. + + Full information about the client side signal processing algorithms + used in the standard are described in the specification ES 202 211 + [2]. + + The additional fundamental frequency and voicing class information is + compressed for each frame pair. The pitch for the first frame of the + FP is quantized to 7 bits and the second frame is differentially + quantized to 7 bits. The voicing class is indicated with one bit for + each frame. The total for the extension information for a frame pair + therefore consists of 14 bits plus an additional 2 bits of CRC error + protection computed over these extension bits only. + + The total information for the frame pair is made up of 92 bits for + the two compressed front-end feature frames (including 4 bits for + their CRC) plus 16 bits for the extension (including 2 bits for their + CRC) and 4 bits of null padding to give a total of 14 octets per + frame pair. As for ES 201 208 the extended frame pair also + corresponds to 20ms of speech. The extended front-end supports three + raw sampling rates: 8 kHz, 11 kHz, and 16 kHz. + + The quantized vectors from two consecutive frames are put into an FP, + as described in more detail in Section 3.3 below. + + The parameters received at the remote server from the RTP extended + DSR payload specified here can be used to synthesize an intelligible + speech waveform for replay. The algorithms to do this are described + in the specification ES 202 211 [2]. + +2.3. ES 202 212 Extended Advanced DSR Front-end Codec + + ES 202 212 is the extension for the DSR Advanced Front-end ES 202 050 + [1]. It provides the same capabilities as the extended mel-cepstrum + front-end described in Section 2.2 but for the DSR Advanced + Front-end. + + + + + + + + + + +Xie & Pearce Standards Track [Page 5] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + +3. DSR RTP Payload Formats + +3.1. Common Considerations of the Three DSR RTP Payload Formats + + The three DSR RTP payload formats defined in this document share the + following consideration or behaviours. + +3.1.1. Number of FPs in Each RTP Packet + + Any number of FPs MAY be aggregate together in an RTP payload and + they MUST be consecutive in time. However, one SHOULD always keep + the RTP payload size smaller than the MTU in order to avoid IP + fragmentation and SHOULD follow the recommendations given in Section + 3.1 in RFC 3557 [10] when determining the proper number of FPs in an + RTP payload. + +3.1.2. Support for Discontinuous Transmission + + Same considerations described in Section 3.2 of RFC 3557 [10] apply + to all the three DSR RTP payloads defined in this document. + +3.1.3. RTP Header Usage + + The format of the RTP header is specified in RFC 3550 [8]. The three + payload formats defined here use the fields of the header in a manner + consistent with that specification. + + The RTP timestamp corresponds to the sampling instant of the first + sample encoded for the first FP in the packet. The timestamp clock + frequency is the same as the sampling frequency, so the timestamp + unit is in samples. + + As defined by all three front-end codecs, the duration of one FP is + 20 ms, corresponding to 160, 220, or 320 encoded samples with a + sampling rate of 8, 11, or 16 kHz being used at the front-end, + respectively. Thus, the timestamp is increased by 160, 220, or 320 + for each consecutive FP, respectively. + + The DSR payload for all three front-end codecs is always an integral + number of octets. If additional padding is required for some other + purpose, then the P bit in the RTP header may be set and padding + appended as specified in RFC 3550 [8]. + + The RTP header marker bit (M) MUST be set following the general rules + for audio codecs, as defined in Section 4.1 in RFC 3551 [9]. + + + + + + +Xie & Pearce Standards Track [Page 6] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + + This document does not specify the assignment of an RTP payload type + for these three new packet formats. It is expected that the RTP + profile under which any of these payload formats is being used will + assign a payload type for this encoding or will specify that the + payload type is to be bound dynamically. + +3.2. Payload Format for ES 202 050 DSR + + An ES 202 050 DSR RTP payload datagram uses exactly the same layout + as defined in Section 3 of RFC 3557 [10], i.e., a standard RTP header + followed by a DSR payload containing a series of DSR FPs. + + The size of each ES 202 050 FP remains 96 bits or 12 octets, as + defined in the following sections. This ensures that a DSR RTP + payload will always end on an octet boundary. + +3.2.1. Frame Pair Formats + +3.2.1.1. Format of Speech and Non-speech FPs + + The following mel-cepstral frame MUST be used, as defined in [1]: + + Pairs of the quantized 10ms mel-cepstral frames MUST be grouped + together and protected with a 4-bit CRC forming a 92-bit long FP. At + the end, each FP MUST be padded with 4 zeros to the MSB 4 bits of the + last octet in order to make the FP aligned to the octet boundary. + + The following diagram shows a complete ES 202 050 FP: + + Frame #1 in FP: + =============== + (MSB) (LSB) + 0 1 2 3 4 5 6 7 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(2,3) | idx(0,1) | Octet 1 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(4,5) | idx(2,3) (cont) : Octet 2 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(6,7) |idx(4,5)(cont) Octet 3 + +-----+-----+-----+-----+-----+-----+-----+-----+ + idx(10,11)| VAD | idx(8,9) | Octet 4 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(12,13) | idx(10,11) (cont) : Octet 5 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(12,13) (cont) : Octet 6/1 + +-----+-----+-----+-----+ + + + + + +Xie & Pearce Standards Track [Page 7] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + + Frame #2 in FP: + =============== + (MSB) (LSB) + 0 1 2 3 4 5 6 7 + +-----+-----+-----+-----+ + : idx(0,1) | Octet 6/2 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(2,3) |idx(0,1)(cont) Octet 7 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(6,7) | idx(4,5) | Octet 8 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(8,9) | idx(6,7) (cont) : Octet 9 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(10,11) | VAD |idx(8,9)(cont) Octet 10 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(12,13) | Octet 11 + +-----+-----+-----+-----+-----+-----+-----+-----+ + + + CRC for Frame #1 and Frame #2 and padding in FP: + ================================================ + (MSB) (LSB) + 0 1 2 3 4 5 6 7 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | 0 | 0 | 0 | 0 | CRC | Octet 12 + +-----+-----+-----+-----+-----+-----+-----+-----+ + + The 4-bit CRC in the FP MUST be calculated using the formula + (including the bit-order rules) defined in 7.2 in [1]. + + Therefore, each FP represents 20ms of original speech. Note that + each FP MUST be padded with 4 zeros to the MSB 4 bits of the last + octet in order to make the FP aligned to the octet boundary, as shown + above. This makes the total size of an FP 96 bits, or 12 octets. + Note that this padding is separate from padding indicated by the P + bit in the RTP header. + + The definition of the indices and 'VAD' flag are described in [1] and + their value is only set and examined by the codecs in the front-end + client and the recognizer. + +3.2.1.2. Format of Null FP + + Null FPs are sent to mark the end of a transmission segment. Details + on transmission segment and the use of Null FPs can be found in RFC + 3557 [10]. + + + + + +Xie & Pearce Standards Track [Page 8] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + + A Null FP for the ES 202 050 front-end codec is defined by setting + the content of the first and second frame in the FP to null (i.e., + filling the first 88 bits of the FP with zeros). The 4-bit CRC MUST + be calculated the same way as described in Section 7.2.4 of [1], and + 4 zeros MUST be padded to the end of the Null FP in order to make it + aligned to the octet boundary. + +3.3. Payload Format for ES 202 211 DSR + + An ES 202 211 DSR RTP payload datagram is very similar to that + defined in Section 3 of RFC 3557 [10], i.e., a standard RTP header + followed by a DSR payload containing a series of DSR FPs. + + The size of each ES 202 211 FP is 112 bits or 14 octets, as defined + in the following sections. This ensures that a DSR RTP payload will + always end on an octet boundary. + +3.3.1. Frame Pair Formats + +3.3.1.1. Format of Speech and Non-speech FPs + + The following mel-cepstral frame MUST be used, as defined in Section + 6.2.4 in [2]: + + Immediately following two frames (Frame #1 and Frame #2) worth of + codebook indices (or 88 bits), there is a 4-bit CRC calculated on + these 88 bits. The pitch indices of the first frame (Pidx1: 7 bits) + and the second frame (Pidx2: 5 bits) of the frame pair then follow. + The class indices of the two frames in the frame pair worth 1 bit + each (Cidx1 and Cidx2) next follow. Finally, a 2-bit CRC calculated + on the pitch and class bits (total: 14 bits) of the frame pair is + included (PC-CRC). The total number of bits in a frame pair packet + is therefore 44 + 44 + 4 + 7 + 5 + 1 + 1 + 2 = 108. At the end, each + FP MUST be padded with 4 zeros to the MSB 4 bits of the last octet in + order to make the FP aligned to the octet boundary. + + + + + + + + + + + + + + + + +Xie & Pearce Standards Track [Page 9] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + + The following diagram shows a complete ES 202 211 FP: + + Frame #1 in FP: + =============== + (MSB) (LSB) + 0 1 2 3 4 5 6 7 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(2,3) | idx(0,1) | Octet 1 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(4,5) | idx(2,3) (cont) : Octet 2 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(6,7) |idx(4,5)(cont) Octet 3 + +-----+-----+-----+-----+-----+-----+-----+-----+ + idx(10,11) | idx(8,9) | Octet 4 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(12,13) | idx(10,11) (cont) : Octet 5 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(12,13) (cont) : Octet 6/1 + +-----+-----+-----+-----+ + + Frame #2 in FP: + =============== + (MSB) (LSB) + 0 1 2 3 4 5 6 7 + +-----+-----+-----+-----+ + : idx(0,1) | Octet 6/2 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(2,3) |idx(0,1)(cont) Octet 7 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(6,7) | idx(4,5) | Octet 8 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(8,9) | idx(6,7) (cont) : Octet 9 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(10,11) |idx(8,9)(cont) Octet 10 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(12,13) | Octet 11 + +-----+-----+-----+-----+-----+-----+-----+-----+ + + CRC for Frame #1 and Frame #2 in FP: + ==================================== + (MSB) (LSB) + 0 1 2 3 4 5 6 7 + +-----+-----+-----+-----+ + | CRC | Octet 12/1 + +-----+-----+-----+-----+ + + + + + + +Xie & Pearce Standards Track [Page 10] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + + Extension information and padding in FP: + ======================================== + (MSB) (LSB) + 0 1 2 3 4 5 6 7 + +-----+-----+-----+-----+ + : Pidx1 | Octet 12/2 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | Pidx2 | Pidx1 (cont) : Octet 13 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | 0 | 0 | 0 | 0 | PC-CRC |Cidx2|Cidx1| Octet 14 + +-----+-----+-----+-----+-----+-----+-----+-----+ + + The 4-bit CRC and the 2-bit PC-CRC in the FP MUST be calculated using + the formula (including the bit-order rules) defined in 6.2.4 in [2]. + + Therefore, each FP represents 20ms of original speech. Note, as + shown above, each FP MUST be padded with 4 zeros to the MSB 4 bits of + the last octet in order to make the FP aligned to the octet boundary. + This makes the total size of an FP 112 bits, or 14 octets. Note, + this padding is separate from padding indicated by the P bit in the + RTP header. + +3.3.1.2. Format of Null FP + + A Null FP for the ES 202 211 front-end codec is defined by setting + all the 112 bits of the FP with zeros. Null FPs are sent to mark the + end of a transmission segment. Details on transmission segment and + the use of Null FPs can be found in RFC 3557 [10]. + +3.4. Payload Format for ES 202 212 DSR + + Similar to other ETSI DSR front-end encoding schemes, the encoded DSR + feature stream of ES 202 212 is transmitted in a sequence of FPs, + where each FP represents two consecutive original voice frames. + + An ES 202 212 DSR RTP payload datagram is very similar to that + defined in Section 3 of RFC 3557 [10], i.e., a standard RTP header + followed by a DSR payload containing a series of DSR FPs. + + The size of each ES 202 212 FP is 112 bits or 14 octets, as defined + in the following sections. This ensures that an ES 202 212 DSR RTP + payload will always end on an octet boundary. + + + + + + + + + +Xie & Pearce Standards Track [Page 11] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + +3.4.1. Frame Pair Formats + +3.4.1.1. Format of Speech and Non-speech FPs + + The following mel-cepstral frame MUST be used, as defined in Section + 7.2.4 of [3]: + + Immediately following two frames (Frame #1 and Frame #2) worth of + codebook indices (or 88 bits), there is a 4-bit CRC calculated on + these 88 bits. The pitch indices of the first frame (Pidx1: 7 bits) + and the second frame (Pidx2: 5 bits) of the frame pair then follow. + The class indices of the two frames in the frame pair worth 1 bit + each next follow (Cidx1 and Cidx2). Finally, a 2-bit CRC (PC-CRC) + calculated on the pitch and class bits (total: 14 bits) of the frame + pair is included. The total number of bits in frame pair packet is + therefore 44 + 44 + 4 + 7 + 5 + 1 + 1 + 2 = 108. At the end, each FP + MUST be padded with 4 zeros to the MSB 4 bits of the last octet in + order to make the FP aligned to the octet boundary. The padding + brings the total size of a FP to 112 bits, or 14 octets. Note that + this padding is separate from padding indicated by the P bit in the + RTP header. + + The following diagram shows a complete ES 202 212 FP: + + Frame #1 in FP: + =============== + (MSB) (LSB) + 0 1 2 3 4 5 6 7 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(2,3) | idx(0,1) | Octet 1 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(4,5) | idx(2,3) (cont) : Octet 2 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(6,7) |idx(4,5)(cont) Octet 3 + +-----+-----+-----+-----+-----+-----+-----+-----+ + idx(10,11)| VAD | idx(8,9) | Octet 4 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(12,13) | idx(10,11) (cont) : Octet 5 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(12,13) (cont) : Octet 6/1 + +-----+-----+-----+-----+ + + + + + + + + + + +Xie & Pearce Standards Track [Page 12] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + + Frame #2 in FP: + =============== + (MSB) (LSB) + 0 1 2 3 4 5 6 7 + +-----+-----+-----+-----+ + : idx(0,1) | Octet 6/2 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(2,3) |idx(0,1)(cont) Octet 7 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(6,7) | idx(4,5) | Octet 8 + +-----+-----+-----+-----+-----+-----+-----+-----+ + : idx(8,9) | idx(6,7) (cont) : Octet 9 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(10,11) | VAD |idx(8,9)(cont) Octet 10 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | idx(12,13) | Octet 11 + +-----+-----+-----+-----+-----+-----+-----+-----+ + + + CRC for Frame #1 and Frame #2 in FP: + ==================================== + (MSB) (LSB) + 0 1 2 3 4 5 6 7 + +-----+-----+-----+-----+ + | CRC | Octet 12/1 + +-----+-----+-----+-----+ + + Extension information and padding in FP: + ======================================== + (MSB) (LSB) + 0 1 2 3 4 5 6 7 + +-----+-----+-----+-----+ + : Pidx1 | Octet 12/2 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | Pidx2 | Pidx1 (cont) : Octet 13 + +-----+-----+-----+-----+-----+-----+-----+-----+ + | 0 | 0 | 0 | 0 | PC-CRC |Cidx2|Cidx1| Octet 14 + +-----+-----+-----+-----+-----+-----+-----+-----+ + + The codebook indices, VAD flag, pitch index, and class index are + specified in Section 6 of [3]. The 4-bit CRC and the 2-bit PC-CRC in + the FP MUST be calculated using the formula (including the bit-order + rules) defined in 7.2.4 in [3]. + + + + + + + + +Xie & Pearce Standards Track [Page 13] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + +3.4.1.2. Format of Null FP + + A Null FP for the ES 202 212 front-end codec is defined by setting + all 112 bits of the FP with zeros. Null FPs are sent to mark the end + of a transmission segment. Details on transmission segments and the + use of Null FPs can be found in RFC 3557 [10]. + +4. IANA Considerations + + For each of the three ETSI DSR front-end codecs covered in this + document, a new MIME subtype registration has been registered by the + IANA for the corresponding payload type, as described below. + + Media Type name: audio + + Media subtype names: + + dsr-es202050 (for ES 202 050 front-end) + + dsr-es202211 (for ES 202 211 front-end) + + dsr-es202212 (for ES 202 212 front-end) + + Required parameters: none + + Optional parameters: + + rate: Indicates the sample rate of the speech. Valid values include: + 8000, 11000, and 16000. If this parameter is not present, 8000 + sample rate is assumed. + + maxptime: see RFC 3267 [7]. If this parameter is not present, + maxptime is assumed to be 80ms. + + Note, since the performance of most speech recognizers are + extremely sensitive to consecutive FP losses, if the user of the + payload format expects a high packet loss ratio for the session, + it MAY consider to explicitly choose a maxptime value for the + session that is shorter than the default value. + + ptime: see RFC 2327 [5]. + + Encoding considerations: These types are defined for transfer via RTP + [8] as described in Section 3 of RFC 4060. + + Security considerations: See Section 5 of RFC 4060. + + + + + +Xie & Pearce Standards Track [Page 14] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + + Person & email address to contact for further information: + Qiaobing.Xie@motorola.com + + Intended usage: COMMON. It is expected that many VoIP applications + (as well as mobile applications) will use this type. + + Author: Qiaobing.Xie@motorola.com + + Change controller: IETF Audio/Video transport working group + +4.1. Mapping MIME Parameters into SDP + + The information carried in the MIME media type specification has a + specific mapping to fields in the Session Description Protocol (SDP) + [5], which is commonly used to describe RTP sessions. When SDP is + used to specify sessions employing ES 202 050, ES 202 211, or ES 202 + 212 DSR codec, the mapping is as follows: + + o The MIME type ("audio") goes in SDP "m=" as the media name. + + o The MIME subtype ("dsr-es202050", "dsr-es202211", or + "dsr-es202212") goes in SDP "a=rtpmap" as the encoding name. + + o The optional parameter "rate" also goes in "a=rtpmap" as clock + rate. If no rate is given, then the default value (i.e., 8000) is + used in SDP. + + o The optional parameters "ptime" and "maxptime" go in the SDP + "a=ptime" and "a=maxptime" attributes, respectively. + + Example of usage of ES 202 050 DSR: + + m=audio 49120 RTP/AVP 101 + a=rtpmap:101 dsr-es202050/8000 + a=maxptime:40 + + Example of usage of ES 202 211 DSR: + + m=audio 49120 RTP/AVP 101 + a=rtpmap:101 dsr-es202211/8000 + a=maxptime:40 + + Example of usage of ES 202 212 DSR: + + m=audio 49120 RTP/AVP 101 + a=rtpmap:101 dsr-es202212/8000 + a=maxptime:40 + + + + +Xie & Pearce Standards Track [Page 15] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + +4.2. Usage in Offer/Answer + + All SDP parameters in this payload format are declarative, and all + reasonable values are expected to be supported. Thus, the standard + usage of Offer/Answer as described in RFC 3264 [6] should be + followed. + +4.3. Congestion Control + + Congestion control for RTP MUST be used in accordance with RFC 3550 + [8], and in any applicable RTP profile, e.g., RFC 3551 [9]. + +5. Security Considerations + + Implementations using the payload defined in this specification are + subject to the security considerations discussed in the RTP + specification RFC 3550 [8] and any RTP profile, e.g., RFC 3551 [9]. + This payload does not specify any different security services. + +6. Acknowledgments + + The design presented here is based on that of RFC 3557 [10]. The + authors wish to thank Magnus Westerlund and others for their reviews + and comments. + +7. References + +7.1. Normative References + + [1] European Telecommunications Standards Institute (ETSI) Standard + ES 202 050, "Speech Processing, Transmission and Quality + Aspects (STQ); Distributed Speech Recognition; Advanced Front- + end Feature Extraction Algorithm; Compression Algorithms", + http://pda.etsi.org/pda/. + + [2] European Telecommunications Standards Institute (ETSI) Standard + ES 202 211, "Speech Processing, Transmission and Quality + Aspects (STQ); Distributed Speech Recognition; Extended front- + end feature extraction algorithm; Compression algorithms; Back- + end speech reconstruction algorithm", http://pda.etsi.org/pda/. + + [3] European Telecommunications Standards Institute (ETSI) Standard + ES 202 212, "Speech Processing, Transmission and Quality + aspects (STQ); Distributed speech recognition; Extended + advanced front-end feature extraction algorithm; Compression + algorithms; Back-end speech reconstruction algorithm", + http://pda.etsi.org/pda/. + + + + +Xie & Pearce Standards Track [Page 16] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + + [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement + Levels", BCP 14, RFC 2119, March 1997. + + [5] Handley, M. and V. Jacobson, "SDP: Session Description + Protocol", RFC 2327, April 1998. + + [6] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with + the Session Description Protocol (SDP)", RFC 3264, June 2002. + + [7] Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, + "Real-Time Transport Protocol (RTP) Payload Format and File + Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive + Multi-Rate Wideband (AMR-WB) Audio Codecs", RFC 3267, + June 2002. + + [8] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, + "RTP: A Transport Protocol for Real-Time Applications", STD 64, + RFC 3550, July 2003. + + [9] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video + Conferences with Minimal Control", STD 65, RFC 3551, July 2003. + + [10] Xie, Q., "RTP Payload Format for European Telecommunications + Standards Institute (ETSI) European Standard ES 201 108 + Distributed Speech Recognition Encoding", RFC 3557, July 2003. + +7.2. Informative References + + [11] European Telecommunications Standards Institute (ETSI) Standard + ES 201 108, "Speech Processing, Transmission and Quality + Aspects (STQ); Distributed Speech Recognition; Front-end + Feature Extraction Algorithm; Compression Algorithms", + http://pda.etsi.org/pda/. + + + + + + + + + + + + + + + + + + +Xie & Pearce Standards Track [Page 17] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + +Authors' Addresses + + Qiaobing Xie + Motorola, Inc. + 1501 W. Shure Drive, 2-F9 + Arlington Heights, IL 60004 + US + + Phone: +1-847-632-3028 + EMail: qxie1@email.mot.com + + + David Pearce + Motorola Labs + UK Research Laboratory + Jays Close + Viables Industrial Estate + Basingstoke, HANTS RG22 4PD + UK + + Phone: +44 (0)1256 484 436 + EMail: bdp003@motorola.com + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Xie & Pearce Standards Track [Page 18] + +RFC 4060 RTP Payloads for ETSI DSR Codecs May 2005 + + +Full Copyright Statement + + Copyright (C) The Internet Society (2005). + + This document is subject to the rights, licenses and restrictions + contained in BCP 78, and except as set forth therein, the authors + retain all their rights. + + This document and the information contained herein are provided on an + "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS + OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET + ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, + INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE + INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED + WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Intellectual Property + + The IETF takes no position regarding the validity or scope of any + Intellectual Property Rights or other rights that might be claimed to + pertain to the implementation or use of the technology described in + this document or the extent to which any license under such rights + might or might not be available; nor does it represent that it has + made any independent effort to identify any such rights. Information + on the procedures with respect to rights in RFC documents can be + found in BCP 78 and BCP 79. + + Copies of IPR disclosures made to the IETF Secretariat and any + assurances of licenses to be made available, or the result of an + attempt made to obtain a general license or permission for the use of + such proprietary rights by implementers or users of this + specification can be obtained from the IETF on-line IPR repository at + http://www.ietf.org/ipr. + + The IETF invites any interested party to bring to its attention any + copyrights, patents or patent applications, or other proprietary + rights that may cover technology that may be required to implement + this standard. Please address the information to the IETF at ietf- + ipr@ietf.org. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + +Xie & Pearce Standards Track [Page 19] + |