diff options
Diffstat (limited to 'doc/rfc/rfc2198.txt')
-rw-r--r-- | doc/rfc/rfc2198.txt | 619 |
1 files changed, 619 insertions, 0 deletions
diff --git a/doc/rfc/rfc2198.txt b/doc/rfc/rfc2198.txt new file mode 100644 index 0000000..5745258 --- /dev/null +++ b/doc/rfc/rfc2198.txt @@ -0,0 +1,619 @@ + + + + + + +Network Working Group C. Perkins +Request for Comments: 2198 I. Kouvelas +Category: Standards Track O. Hodson + V. Hardman + University College London + M. Handley + ISI + J.C. Bolot + A. Vega-Garcia + S. Fosse-Parisis + INRIA Sophia Antipolis + September 1997 + + + RTP Payload for Redundant Audio Data + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Abstract + + This document describes a payload format for use with the real-time + transport protocol (RTP), version 2, for encoding redundant audio + data. The primary motivation for the scheme described herein is the + development of audio conferencing tools for use with lossy packet + networks such as the Internet Mbone, although this scheme is not + limited to such applications. + +1 Introduction + + If multimedia conferencing is to become widely used by the Internet + Mbone community, users must perceive the quality to be sufficiently + good for most applications. We have identified a number of problems + which impair the quality of conferences, the most significant of + which is packet loss. This is a persistent problem, particularly + given the increasing popularity, and therefore increasing load, of + the Internet. The disruption of speech intelligibility even at low + loss rates which is currently experienced may convince a whole + generation of users that multimedia conferencing over the Internet is + not viable. The addition of redundancy to the data stream is offered + as a solution [1]. If a packet is lost then the missing information + may be reconstructed at the receiver from the redundant data that + arrives in the following packet(s), provided that the average number + + + +Perkins, et. al. Standards Track [Page 1] + +RFC 2198 RTP Payload for Redundant Audio Data September 1997 + + + of consecutively lost packets is small. Recent work [4,5] shows that + packet loss patterns in the Internet are such that this scheme + typically functions well. + + This document describes an RTP payload format for the transmission of + audio data encoded in such a redundant fashion. Section 2 presents + the requirements and motivation leading to the definition of this + payload format, and does not form part of the payload format + definition. Sections 3 onwards define the RTP payload format for + redundant audio data. + +2 Requirements/Motivation + + The requirements for a redundant encoding scheme under RTP are as + follows: + + o Packets have to carry a primary encoding and one or more + redundant encodings. + + o As a multitude of encodings may be used for redundant + information, each block of redundant encoding has to have an + encoding type identifier. + + o As the use of variable size encodings is desirable, each encoded + block in the packet has to have a length indicator. + + o The RTP header provides a timestamp field that corresponds to + the time of creation of the encoded data. When redundant + encodings are used this timestamp field can refer to the time of + creation of the primary encoding data. Redundant blocks of data + will correspond to different time intervals than the primary + data, and hence each block of redundant encoding will require its + own timestamp. To reduce the number of bytes needed to carry the + timestamp, it can be encoded as the difference of the timestamp + for the redundant encoding and the timestamp of the primary. + + There are two essential means by which redundant audio may be added + to the standard RTP specification: a header extension may hold the + redundancy, or one, or more, additional payload types may be defined. + + Including all the redundancy information for a packet in a header + extension would make it easy for applications that do not implement + redundancy to discard it and just process the primary encoding data. + There are, however, a number of disadvantages with this scheme: + + + + + + + +Perkins, et. al. Standards Track [Page 2] + +RFC 2198 RTP Payload for Redundant Audio Data September 1997 + + + o There is a large overhead from the number of bytes needed for + the extension header (4) and the possible padding that is needed + at the end of the extension to round up to a four byte boundary + (up to 3 bytes). For many applications this overhead is + unacceptable. + + o Use of the header extension limits applications to a single + redundant encoding, unless further structure is introduced into + the extension. This would result in further overhead. + + For these reasons, the use of RTP header extension to hold redundant + audio encodings is disregarded. + + The RTP profile for audio and video conferences [3] lists a set of + payload types and provides for a dynamic range of 32 encodings that + may be defined through a conference control protocol. This leads to + two possible schemes for assigning additional RTP payload types for + redundant audio applications: + + 1.A dynamic encoding scheme may be defined, for each combination + of primary/redundant payload types, using the RTP dynamic payload + type range. + + 2.A single fixed payload type may be defined to represent a packet + with redundancy. This may then be assigned to either a static + RTP payload type, or the payload type for this may be assigned + dynamically. + + It is possible to define a set of payload types that signify a + particular combination of primary and secondary encodings for each of + the 32 dynamic payload types provided. This would be a slightly + restrictive yet feasible solution for packets with a single block of + redundancy as the number of possible combinations is not too large. + However the need for multiple blocks of redundancy greatly increases + the number of encoding combinations and makes this solution not + viable. + + A modified version of the above solution could be to decide prior to + the beginning of a conference on a set a 32 encoding combinations + that will be used for the duration of the conference. All tools in + the conference can be initialized with this working set of encoding + combinations. Communication of the working set could be made through + the use of an external, out of band, mechanism. Setup is complicated + as great care needs to be taken in starting tools with identical + parameters. This scheme is more efficient as only one byte is used + to identify combinations of encodings. + + + + + +Perkins, et. al. Standards Track [Page 3] + +RFC 2198 RTP Payload for Redundant Audio Data September 1997 + + + It is felt that the complication inherent in distributing the mapping + of payload types onto combinations of redundant data preclude the use + of this mechanism. + + A more flexible solution is to have a single payload type which + signifies a packet with redundancy. That packet then becomes a + container, encapsulating multiple payloads into a single RTP packet. + Such a scheme is flexible, since any amount of redundancy may be + encapsulated within a single packet. There is, however, a small + overhead since each encapsulated payload must be preceded by a header + indicating the type of data enclosed. This is the preferred + solution, since it is both flexible, extensible, and has a relatively + low overhead. The remainder of this document describes this + solution. + +3 Payload Format Specification + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in RFC2119 [7]. + + The assignment of an RTP payload type for this new packet format is + outside the scope of this document, and will not be specified here. + It is expected that the RTP profile for a particular class of + applications will assign a payload type for this encoding, or if that + is not done then a payload type in the dynamic range shall be chosen. + + An RTP packet containing redundant data shall have a standard RTP + header, with payload type indicating redundancy. The other fields of + the RTP header relate to the primary data block of the redundant + data. + + Following the RTP header are a number of additional headers, defined + in the figure below, which specify the contents of each of the + encodings carried by the packet. Following these additional headers + are a number of data blocks, which contain the standard RTP payload + data for these encodings. It is noted that all the headers are + aligned to a 32 bit boundary, but that the payload data will + typically not be aligned. If multiple redundant encodings are + carried in a packet, they should correspond to different time + intervals: there is no reason to include multiple copies of data for + a single time interval within a packet. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |F| block PT | timestamp offset | block length | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + + +Perkins, et. al. Standards Track [Page 4] + +RFC 2198 RTP Payload for Redundant Audio Data September 1997 + + + The bits in the header are specified as follows: + + + F: 1 bit First bit in header indicates whether another header block + follows. If 1 further header blocks follow, if 0 this is the + last header block. + + block PT: 7 bits RTP payload type for this block. + + timestamp offset: 14 bits Unsigned offset of timestamp of this block + relative to timestamp given in RTP header. The use of an unsigned + offset implies that redundant data must be sent after the primary + data, and is hence a time to be subtracted from the current + timestamp to determine the timestamp of the data for which this + block is the redundancy. + + block length: 10 bits Length in bytes of the corresponding data + block excluding header. + + It is noted that the use of an unsigned timestamp offset limits the + use of redundant data slightly: it is not possible to send + redundancy before the primary encoding. This may affect schemes + where a low bandwidth coding suitable for redundancy is produced + early in the encoding process, and hence could feasibly be + transmitted early. However, the addition of a sign bit would + unacceptably reduce the range of the timestamp offset, and increasing + the size of the field above 14 bits limits the block length field. + It seems that limiting redundancy to be transmitted after the primary + will cause fewer problems than limiting the size of the other fields. + + The timestamp offset for a redundant block is measured in the same + units as the timestamp of the primary encoding (ie: audio samples, + with the same clock rate as the primary). The implication of this is + that the redundant encoding MUST be sampled at the same rate as the + primary. + + It is further noted that the block length and timestamp offset are 10 + bits, and 14 bits respectively; rather than the more obvious 8 and 16 + bits. Whilst such an encoding complicates parsing the header + information slightly, and adds some additional processing overhead, + there are a number of problems involved with the more obvious choice: + An 8 bit block length field is sufficient for most, but not all, + possible encodings: for example 80ms PCM and DVI audio packets + comprise more than 256 bytes, and cannot be encoded with a single + byte length field. It is possible to impose additional structure on + the block length field (for example the high bit set could imply the + lower 7 bits code a length in words, rather than bytes), however such + schemes are complex. The use of a 10 bit block length field retains + + + +Perkins, et. al. Standards Track [Page 5] + +RFC 2198 RTP Payload for Redundant Audio Data September 1997 + + + simplicity and provides an enlarged range, at the expense of a + reduced range of timestamp values. + + The primary encoding block header is placed last in the packet. It + is therefore possible to omit the timestamp and block-length fields + from the header of this block, since they may be determined from the + RTP header and overall packet length. The header for the primary + (final) block comprises only a zero F bit, and the block payload type + information, a total of 8 bits. This is illustrated in the figure + below: + + 0 1 2 3 4 5 6 7 + +-+-+-+-+-+-+-+-+ + |0| Block PT | + +-+-+-+-+-+-+-+-+ + + The final header is followed, immediately, by the data blocks, stored + in the same order as the headers. There is no padding or other + delimiter between the data blocks, and they are typically not 32 bit + aligned. Again, this choice was made to reduce bandwidth overheads, + at the expense of additional decoding time. + + The choice of encodings used should reflect the bandwidth + requirements of those encodings. It is expected that the redundant + encoding shall use significantly less bandwidth that the primary + encoding: the exception being the case where the primary is very + low-bandwidth and has high processing requirement, in which case a + copy of the primary MAY be used as the redundancy. The redundant + encoding MUST NOT be higher bandwidth than the primary. + + The use of multiple levels of redundancy is rarely necessary. + However, in those cases which require it, the bandwidth required by + each level of redundancy is expected to be significantly less than + that of the previous level. + +4 Limitations + + The RTP marker bit is not preserved for redundant data blocks. Hence + if the primary (containing this marker) is lost, the marker is lost. + It is believed that this will not cause undue problems: even if the + marker bit was transmitted with the redundant information, there + would still be the possibility of its loss, so applications would + still have to be written with this in mind. + + In addition, CSRC information is not preserved for redundant data. + The CSRC data in the RTP header of a redundant audio packet relates + to the primary only. Since CSRC data in an audio stream is expected + to change relatively infrequently, it is recommended that + + + +Perkins, et. al. Standards Track [Page 6] + +RFC 2198 RTP Payload for Redundant Audio Data September 1997 + + + applications which require this information assume that the CSRC data + in the RTP header may be applied to the reconstructed redundant data. + +5 Relation to SDP + + When a redundant payload is used, it may need to be bound to an RTP + dynamic payload type. This may be achieved through any out-of-band + mechanism, but one common way is to communicate this binding using + the Session Description Protocol (SDP) [6]. SDP has a mechanism for + binding a dynamic payload types to particular codec, sample rate, and + number of channels using the "rtpmap" attribute. An example of its + use (using the RTP audio/video profile [3]) is: + + m=audio 12345 RTP/AVP 121 0 5 + a=rtpmap:121 red/8000/1 + + This specifies that an audio stream using RTP is using payload types + 121 (a dynamic payload type), 0 (PCM u-law) and 5 (DVI). The "rtpmap" + attribute is used to bind payload type 121 to codec "red" indicating + this codec is actually a redundancy frame, 8KHz, and monaural. When + used with SDP, the term "red" is used to indicate the redundancy + format discussed in this document. + + In this case the additional formats of PCM and DVI are specified. + The receiver must therefore be prepared to use these formats. Such a + specification means the sender will send redundancy by default, but + also may send PCM or DVI. However, with a redundant payload we + additionally take this to mean that no codec other than PCM or DVI + will be used in the redundant encodings. Note that the additional + payload formats defined in the "m=" field may themselves be dynamic + payload types, and if so a number of additional "a=" attributes may + be required to describe these dynamic payload types. + + To receive a redundant stream, this is all that is required. However + to send a redundant stream, the sender needs to know which codecs are + recommended for the primary and secondary (and tertiary, etc) + encodings. This information is specific to the redundancy format, + and is specified using an additional attribute "fmtp" which conveys + format-specific information. A session directory does not parse the + values specified in an fmtp attribute but merely hands it to the + media tool unchanged. For redundancy, we define the format + parameters to be a slash "/" separated list of RTP payload types. + + Thus a complete example is: + + m=audio 12345 RTP/AVP 121 0 5 + a=rtpmap:121 red/8000/1 + a=fmtp:121 0/5 + + + +Perkins, et. al. Standards Track [Page 7] + +RFC 2198 RTP Payload for Redundant Audio Data September 1997 + + + This specifies that the default format for senders is redundancy with + PCM as the primary encoding and DVI as the secondary encoding. + Encodings cannot be specified in the fmtp attribute unless they are + also specified as valid encodings on the media ("m=") line. + +6 Security Considerations + + RTP packets containing redundant information are subject to the + security considerations discussed in the RTP specification [2], and + any appropriate RTP profile (for example [3]). This implies that + confidentiality of the media streams is achieved by encryption. + Encryption of a redundant data stream may occur in two ways: + + 1.The entire stream is to be secured, and all participants are + expected to have keys to decode the entire stream. In this case, + nothing special need be done, and encryption is performed in the + usual manner. + + 2.A portion of the stream is to be encrypted with a different + key to the remainder. In this case a redundant copy of the last + packet of that portion cannot be sent, since there is no + following packet which is encrypted with the correct key in which + to send it. Similar limitations may occur when + enabling/disabling encryption. + + The choice between these two is a matter for the encoder only. + Decoders can decrypt either form without modification. + + Whilst the addition of low-bandwidth redundancy to an audio stream is + an effective means by which that stream may be protected against + packet loss, application designers should be aware that the addition + of large amounts of redundancy will increase network congestion, and + hence packet loss, leading to a worsening of the problem which the + use of redundancy was intended to solve. At its worst, this can lead + to excessive network congestion and may constitute a denial of + service attack. + + + + + + + + + + + + + + + +Perkins, et. al. Standards Track [Page 8] + +RFC 2198 RTP Payload for Redundant Audio Data September 1997 + + +7 Example Packet + + An RTP audio data packet containing a DVI4 (8KHz) primary, and a + single block of redundancy encoded using 8KHz LPC (both 20ms + packets), as defined in the RTP audio/video profile [3] is + illustrated: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |V=2|P|X| CC=0 |M| PT | sequence number of primary | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | timestamp of primary encoding | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | synchronization source (SSRC) identifier | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |1| block PT=7 | timestamp offset | block length | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |0| block PT=5 | | + +-+-+-+-+-+-+-+-+ + + | | + + LPC encoded redundant data (PT=7) + + | (14 bytes) | + + +---------------+ + | | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + | | + + + + | | + + + + | | + + + + | DVI4 encoded primary data (PT=5) | + + (84 bytes, not to scale) + + / / + + + + | | + + + + | | + + +---------------+ + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + + + + + + + + +Perkins, et. al. Standards Track [Page 9] + +RFC 2198 RTP Payload for Redundant Audio Data September 1997 + + +8 Authors' Addresses + + Colin Perkins/Isidor Kouvelas/Orion Hodson/Vicky Hardman + Department of Computer Science + University College London + London WC1E 6BT + United Kingdom + + EMail: {c.perkins|i.kouvelas|o.hodson|v.hardman}@cs.ucl.ac.uk + + + Mark Handley + USC Information Sciences Institute + c/o MIT Laboratory for Computer Science + 545 Technology Square + Cambridge, MA 02139, USA + + EMail: mjh@isi.edu + + + Jean-Chrysostome Bolot/Andres Vega-Garcia/Sacha Fosse-Parisis + INRIA Sophia Antipolis + 2004 Route des Lucioles, BP 93 + 06902 Sophia Antipolis + France + + EMail: {bolot|avega|sfosse}@sophia.inria.fr + + + + + + + + + + + + + + + + + + + + + + + + +Perkins, et. al. Standards Track [Page 10] + +RFC 2198 RTP Payload for Redundant Audio Data September 1997 + + +9 References + + [1] V.J. Hardman, M.A. Sasse, M. Handley and A. Watson; Reliable + Audio for Use over the Internet; Proceedings INET'95, Honalulu, Oahu, + Hawaii, September 1995. http://www.isoc.org/in95prc/ + + [2] Schulzrinne, H., Casner, S., Frederick R., and V. Jacobson, "RTP: + A Transport Protocol for Real-Time Applications", RFC 1889, January + 1996. + + [3] Schulzrinne, H., "RTP Profile for Audio and Video Conferences + with Minimal Control", RFC 1890, January 1996. + + [4] M. Yajnik, J. Kurose and D. Towsley; Packet loss correlation in + the MBone multicast network; IEEE Globecom Internet workshop, London, + November 1996 + + [5] J.-C. Bolot and A. Vega-Garcia; The case for FEC-based error + control for packet audio in the Internet; ACM Multimedia Systems, + 1997 + + [6] Handley, M., and V. Jacobson, "SDP: Session Description Protocol + (draft 03.2)", Work in Progress. + + [7] Bradner, S., "Key words for use in RFCs to indicate requirement + levels", RFC 2119, March 1997. + + + + + + + + + + + + + + + + + + + + + + + + + +Perkins, et. al. Standards Track [Page 11] + |