Diffstat (limited to 'doc/rfc/rfc5888.txt')
-rw-r--r-- | doc/rfc/rfc5888.txt | 1179 |
1 files changed, 1179 insertions, 0 deletions
diff --git a/doc/rfc/rfc5888.txt b/doc/rfc/rfc5888.txt new file mode 100644 index 0000000..e1f04c4 --- /dev/null +++ b/doc/rfc/rfc5888.txt @@ -0,0 +1,1179 @@ + + + + + + +Internet Engineering Task Force (IETF) G. Camarillo +Request for Comments: 5888 Ericsson +Obsoletes: 3388 H. Schulzrinne +Category: Standards Track Columbia University +ISSN: 2070-1721 June 2010 + + + The Session Description Protocol (SDP) Grouping Framework + +Abstract + + In this specification, we define a framework to group "m" lines in + the Session Description Protocol (SDP) for different purposes. This + framework uses the "group" and "mid" SDP attributes, both of which + are defined in this specification. Additionally, we specify how to + use the framework for two different purposes: for lip synchronization + and for receiving a media flow consisting of several media streams on + different transport addresses. This document obsoletes RFC 3388. + +Status of This Memo + + This is an Internet Standards Track document. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Further information on + Internet Standards is available in Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc5888. + +Copyright Notice + + Copyright (c) 2010 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. 
Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + + + + +Camarillo & Schulzrinne Standards Track [Page 1] + +RFC 5888 SDP Grouping Framework June 2010 + + +Table of Contents + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 + 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 + 3. Overview of Operation . . . . . . . . . . . . . . . . . . . . 3 + 4. Media Stream Identification Attribute . . . . . . . . . . . . 4 + 5. Group Attribute . . . . . . . . . . . . . . . . . . . . . . . 4 + 6. Use of "group" and "mid" . . . . . . . . . . . . . . . . . . . 4 + 7. Lip Synchronization (LS) . . . . . . . . . . . . . . . . . . . 5 + 7.1. Example of LS . . . . . . . . . . . . . . . . . . . . . . 5 + 8. Flow Identification (FID) . . . . . . . . . . . . . . . . . . 6 + 8.1. SIP and Cellular Access . . . . . . . . . . . . . . . . . 6 + 8.2. DTMF Tones . . . . . . . . . . . . . . . . . . . . . . . . 7 + 8.3. Media Flow Definition . . . . . . . . . . . . . . . . . . 7 + 8.4. FID Semantics . . . . . . . . . . . . . . . . . . . . . . 7 + 8.4.1. Examples of FID . . . . . . . . . . . . . . . . . . . 8 + 8.5. Scenarios That FID Does Not Cover . . . . . . . . . . . . 11 + 8.5.1. Parallel Encoding Using Different Codecs . . . . . . . 11 + 8.5.2. Layered Encoding . . . . . . . . . . . . . . . . . . . 12 + 8.5.3. Same IP Address and Port Number . . . . . . . . . . . 12 + 9. Usage of the "group" Attribute in SIP . . . . . . . . . . . . 13 + 9.1. Mid Value in Answers . . . . . . . . . . . . . . . . . . . 13 + 9.1.1. Example . . . . . . . . . . . . . . . . . . . . . . . 14 + 9.2. Group Value in Answers . . . . . . . . . . . . . . . . . . 15 + 9.2.1. Example . . . . . . . . . . . . . . . . . . . . . . . 15 + 9.3. Capability Negotiation . . . . . . . . . . . . . . . . . . 16 + 9.3.1. 
Example . . . . . . . . . . . . . . . . . . . . . . . 16 + 9.4. Backward Compatibility . . . . . . . . . . . . . . . . . . 17 + 9.4.1. Offerer Does Not Support "group" . . . . . . . . . . . 17 + 9.4.2. Answerer Does Not Support "group" . . . . . . . . . . 17 + 10. Changes from RFC 3388 . . . . . . . . . . . . . . . . . . . . 18 + 11. Security Considerations . . . . . . . . . . . . . . . . . . . 18 + 12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 + 13. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 19 + 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20 + 14.1. Normative References . . . . . . . . . . . . . . . . . . . 20 + 14.2. Informative References . . . . . . . . . . . . . . . . . . 20 + + + + + + + + + + + + + + +Camarillo & Schulzrinne Standards Track [Page 2] + +RFC 5888 SDP Grouping Framework June 2010 + + +1. Introduction + + RFC 3388 [RFC3388] specified a media-line grouping framework for SDP + [RFC4566]. This specification obsoletes RFC 3388 [RFC3388]. + + An SDP [RFC4566] session description typically contains one or more + media lines, which are commonly known as "m" lines. When a session + description contains more than one "m" line, SDP does not provide any + means to express a particular relationship between two or more of + them. When an application receives an SDP session description with + more than one "m" line, it is up to the application to determine what + to do with them. SDP does not carry any information about grouping + media streams. + + While in some environments this information can be carried out of + band, it is necessary to have a mechanism in SDP to express how + different media streams within a session description relate to each + other. The framework defined in this specification is such a + mechanism. + +2. 
Terminology + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in [RFC2119]. + +3. Overview of Operation + + This section provides a non-normative description of how the SDP + Grouping Framework defined in this document works. In a given + session description, each "m" line is identified by a token, which is + carried in a "mid" attribute below the "m" line. The session + description carries session-level "group" attributes that group + different "m" lines (identified by their tokens) using different + group semantics. The semantics of a group describe the purpose for + which the "m" lines are grouped. For example, the "group" line in + the session description below indicates that the "m" lines identified + by tokens 1 and 2 (the audio and the video "m" lines, respectively) + are grouped for the purpose of lip synchronization (LS). + + + + + + + + + + + + +Camarillo & Schulzrinne Standards Track [Page 3] + +RFC 5888 SDP Grouping Framework June 2010 + + + v=0 + o=Laura 289083124 289083124 IN IP4 one.example.com + c=IN IP4 192.0.2.1 + t=0 0 + a=group:LS 1 2 + m=audio 30000 RTP/AVP 0 + a=mid:1 + m=video 30002 RTP/AVP 31 + a=mid:2 + +4. Media Stream Identification Attribute + + This document defines the "media stream identification" media + attribute, which is used for identifying media streams within a + session description. Its formatting in SDP [RFC4566] is described by + the following Augmented Backus-Naur Form (ABNF) [RFC5234]: + + mid-attribute = "a=mid:" identification-tag + identification-tag = token + ; token is defined in RFC 4566 + + The identification-tag MUST be unique within an SDP session + description. + +5. Group Attribute + + This document defines the "group" session-level attribute, which is + used for grouping together different media streams. 
Its formatting + in SDP is described by the following ABNF [RFC5234]: + + group-attribute = "a=group:" semantics + *(SP identification-tag) + semantics = "LS" / "FID" / semantics-extension + semantics-extension = token + ; token is defined in RFC 4566 + + This document defines two standard semantics: Lip Synchronization + (LS) and Flow Identification (FID). Semantics extensions follow the + Standards Action policy [RFC5226]. + +6. Use of "group" and "mid" + + All of the "m" lines of a session description that uses "group" MUST + be identified with a "mid" attribute whether they appear in the group + line(s) or not. If a session description contains at least one "m" + line that has no "mid" identification, the application MUST NOT + perform any grouping of media lines. + + + + +Camarillo & Schulzrinne Standards Track [Page 4] + +RFC 5888 SDP Grouping Framework June 2010 + + + "a=group" lines are used to group together several "m" lines that are + identified by their "mid" attribute. "a=group" lines that contain + identification-tags that do not correspond to any "m" line within the + session description MUST be ignored. The application acts as if the + "a=group" line did not exist. The behavior of an application + receiving an SDP description with grouped "m" lines is defined by the + semantics field in the "a=group" line. + + There MAY be several "a=group" lines in a session description. The + "a=group" lines of a session description can use the same or + different semantics. An "m" line identified by its "mid" attribute + MAY appear in more than one "a=group" line. + +7. Lip Synchronization (LS) + + An application that receives a session description that contains "m" + lines that are grouped together using LS semantics MUST synchronize + the playout of the corresponding media streams. 
Note that LS + semantics apply not only to a video stream that has to be + synchronized with an audio stream; the playout of two streams of the + same type can be synchronized as well. + + For RTP streams, synchronization is typically performed using the RTP + Control Protocol (RTCP), which provides enough information to map + time stamps from the different streams into a local absolute time + value. However, the concept of media stream synchronization MAY also + apply to media streams that do not make use of RTP. If this is the + case, the application MUST recover the original timing relationship + between the streams using whatever mechanism is available. + +7.1. Example of LS + + The following example shows a session description of a conference + that is being multicast. The first media stream (mid:1) contains the + voice of the speaker who speaks in English. The second media stream + (mid:2) contains the video component, and the third (mid:3) media + stream carries the translation to Spanish of what she is saying. The + first and second media streams have to be synchronized. + + + + + + + + + + + + + +Camarillo & Schulzrinne Standards Track [Page 5] + +RFC 5888 SDP Grouping Framework June 2010 + + + v=0 + o=Laura 289083124 289083124 IN IP4 two.example.com + c=IN IP4 233.252.0.1/127 + t=0 0 + a=group:LS 1 2 + m=audio 30000 RTP/AVP 0 + a=mid:1 + m=video 30002 RTP/AVP 31 + a=mid:2 + m=audio 30004 RTP/AVP 0 + i=This media stream contains the Spanish translation + a=mid:3 + + Note that although the third media stream is not present in the group + line, it still has to contain a "mid" attribute (mid:3), as stated + before. + +8. Flow Identification (FID) + + An "m" line in an SDP session description defines a media stream. + However, SDP does not define what a media stream is. This definition + can be found in the Real Time Streaming Protocol (RTSP) + specification. 
The RTSP RFC [RFC2326] defines a media stream as "a + single media instance, e.g., an audio stream or a video stream as + well as a single whiteboard or shared application group. When using + RTP, a stream consists of all RTP and RTCP packets created by a + source within an RTP session". + + This definition assumes that a single audio (or video) stream maps + into an RTP session. The RTP RFC [RFC1889] (at present obsoleted by + [RFC3550]) used to define an RTP session as follows: "For each + participant, the session is defined by a particular pair of + destination transport addresses (one network address plus a port pair + for RTP and RTCP)". + + While the previous definitions cover the most common cases, there are + situations where a single media instance (e.g., an audio stream or a + video stream) is sent using more than one RTP session. Two examples + (among many others) of this kind of situation are cellular systems + using the Session Initiation Protocol (SIP; [RFC3261]) and systems + receiving Dual-Tone Multi-Frequency (DTMF) tones on a different host + than the voice. + +8.1. SIP and Cellular Access + + Systems using a cellular access and SIP as a signalling protocol need + to receive media over the air. During a session, the media can be + encoded using different codecs. The encoded media has to traverse + + + +Camarillo & Schulzrinne Standards Track [Page 6] + +RFC 5888 SDP Grouping Framework June 2010 + + + the radio interface. The radio interface is generally characterized + as being prone to bit errors and associated with relatively high + packet transfer delays. In addition, radio interface resources in a + cellular environment are scarce and thus expensive, which calls for + special measures in providing a highly efficient transport. 
In order + to get an appropriate speech quality in combination with an efficient + transport, precise knowledge of codec properties is required so that + a proper radio bearer for the RTP session can be configured before + transferring the media. These radio bearers are dedicated bearers + per media type (i.e., codec). + + Cellular systems typically configure different radio bearers on + different port numbers. Therefore, incoming media has to have + different destination port numbers for the different possible codecs + in order to be routed properly to the correct radio bearer. Thus, + this is an example in which several RTP sessions are used to carry a + single media instance (the encoded speech from the sender). + +8.2. DTMF Tones + + Some voice sessions include DTMF tones. Sometimes, the voice + handling is performed by a different host than the DTMF handling. It + is common to have an application server in the network gathering DTMF + tones for the user while the user receives the encoded speech on his + user agent. In this situation, it is necessary to establish two RTP + sessions: one for the voice and the other for the DTMF tones. Both + RTP sessions are logically part of the same media instance. + +8.3. Media Flow Definition + + The previous examples show that the definition of a media stream in + [RFC2326] does not cover some scenarios. It cannot be assumed that a + single media instance maps into a single RTP session. Therefore, we + introduce the definition of a media flow: + + A media flow consists of a single media instance, e.g., an audio + stream or a video stream as well as a single whiteboard or shared + application group. When using RTP, a media flow comprises one or + more RTP sessions. + +8.4. FID Semantics + + Several "m" lines grouped together using FID semantics form a media + flow. 
A media agent handling a media flow that comprises several "m" + lines MUST send a copy of the media to every "m" line that is part of + the flow as long as the codecs and the direction attribute present in + a particular "m" line allow it. + + + + +Camarillo & Schulzrinne Standards Track [Page 7] + +RFC 5888 SDP Grouping Framework June 2010 + + + It is assumed that the application uses only one codec at a time to + encode the media produced. This codec MAY change dynamically during + the session, but at any particular moment, only one codec is in use. + + The application encodes the media using the current codec and checks, + one by one, all of the "m" lines that are part of the flow. If a + particular "m" line contains the codec being used and the direction + attribute is "sendonly" or "sendrecv", a copy of the encoded media is + sent to the address/port specified in that particular media stream. + If either the "m" line does not contain the codec being used or the + direction attribute is neither "sendonly" nor "sendrecv", nothing is + sent over this media stream. + + The application typically ends up sending media to different + destinations (IP address/port number) depending on the codec used at + any moment. + +8.4.1. Examples of FID + + The session description below might be sent by a SIP user agent using + a cellular access. The user agent supports GSM (Global System for + Mobile communications) on port 30000 and AMR (Adaptive Multi-Rate) on + port 30002. When the remote party sends GSM, it will send RTP + packets to port number 30000. When AMR is the codec chosen, packets + will be sent to port 30002. Note that the remote party can switch + between both codecs dynamically in the middle of the session. + However, in this example, only one media stream at a time carries + voice. The other remains "muted" while its corresponding codec is + not in use. 
+ + v=0 + o=Laura 289083124 289083124 IN IP4 three.example.com + c=IN IP4 192.0.2.1 + t=0 0 + a=group:FID 1 2 + m=audio 30000 RTP/AVP 3 + a=rtpmap:3 GSM/8000 + a=mid:1 + m=audio 30002 RTP/AVP 97 + a=rtpmap:97 AMR/8000 + a=fmtp:97 mode-set=0,2,5,7; mode-change-period=2; + mode-change-neighbor; maxframes=1 + a=mid:2 + + (The linebreak in the fmtp line accommodates RFC formatting + restrictions; SDP does not have continuation lines.) + + + + + +Camarillo & Schulzrinne Standards Track [Page 8] + +RFC 5888 SDP Grouping Framework June 2010 + + + In the previous example, a system receives media on the same IP + address on different port numbers. The following example shows how a + system can receive different codecs on different IP addresses. + + v=0 + o=Laura 289083124 289083124 IN IP4 four.example.com + c=IN IP4 192.0.2.1 + t=0 0 + a=group:FID 1 2 + m=audio 20000 RTP/AVP 0 + c=IN IP4 192.0.2.2 + a=rtpmap:0 PCMU/8000 + a=mid:1 + m=audio 30002 RTP/AVP 97 + a=rtpmap:97 AMR/8000 + a=fmtp:97 mode-set=0,2,5,7; mode-change-period=2; + mode-change-neighbor; maxframes=1 + a=mid:2 + + (The linebreak in the fmtp line accommodates RFC formatting + restrictions; SDP does not have continuation lines.) + + The cellular terminal in this example only supports the AMR codec. + However, many current IP phones only support PCM (Pulse-Code + Modulation; payload 0). In order to be able to interoperate with + them, the cellular terminal uses a transcoder whose IP address is + 192.0.2.2. The cellular terminal includes the transcoder IP address + in its SDP description to provide support for PCM. Remote systems + will send AMR directly to the terminal, but PCM will be sent to the + transcoder. The transcoder will be configured (using whatever method + is preferred) to convert the incoming PCM audio to AMR and send it to + the terminal. 
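The FID sending rule from Section 8.4 can be illustrated with a short Python sketch. This is only a sketch under stated assumptions, not part of the RFC: the MediaLine class, its field names, and the fid_destinations function are hypothetical, codecs are compared by payload type number for simplicity, and local_direction is the direction from the sending agent's point of view (the mirror image of the direction attribute in the description it received).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MediaLine:
    """One "m" line of a received description (hypothetical helper type)."""
    mid: str
    address: str
    port: int
    payload_types: Tuple[int, ...]
    # Direction as seen by the sending agent (assumption of this sketch).
    local_direction: str = "sendrecv"


def fid_destinations(flow: List[MediaLine], payload_type: int) -> List[Tuple[str, int]]:
    """Check every "m" line of the flow and collect the transport
    addresses that should receive a copy of the encoded media."""
    targets = []
    for m in flow:
        has_codec = payload_type in m.payload_types
        may_send = m.local_direction in ("sendonly", "sendrecv")
        if has_codec and may_send:
            targets.append((m.address, m.port))
    return targets


# The transcoder example: PCMU goes to 192.0.2.2, AMR to 192.0.2.1.
flow = [
    MediaLine("1", "192.0.2.2", 20000, (0,)),
    MediaLine("2", "192.0.2.1", 30002, (97,)),
]
pcmu_targets = fid_destinations(flow, 0)
amr_targets = fid_destinations(flow, 97)
```

With this flow, PCMU media is directed only at the transcoder and AMR media only at the terminal; a codec that appears in no "m" line yields an empty list, so nothing is sent.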
+ + The next example shows how the "group" attribute used with FID + semantics can indicate the use of two different codecs in the two + directions of a bidirectional media stream. + + v=0 + o=Laura 289083124 289083124 IN IP4 five.example.com + c=IN IP4 192.0.2.1 + t=0 0 + a=group:FID 1 2 + m=audio 30000 RTP/AVP 0 + a=mid:1 + m=audio 30002 RTP/AVP 8 + a=recvonly + a=mid:2 + + + + +Camarillo & Schulzrinne Standards Track [Page 9] + +RFC 5888 SDP Grouping Framework June 2010 + + + A user agent that receives the SDP description above knows that, at a + certain moment, it can send either PCM u-law to port number 30000 or + PCM A-law to port number 30002. However, the media agent also knows + that the other end will only send PCM u-law (payload 0). + + The following example shows a session description with different "m" + lines grouped together using FID semantics that contain the same + codec. + + v=0 + o=Laura 289083124 289083124 IN IP4 six.example.com + c=IN IP4 192.0.2.1 + t=0 0 + a=group:FID 1 2 3 + m=audio 30000 RTP/AVP 0 + a=mid:1 + m=audio 30002 RTP/AVP 8 + a=mid:2 + m=audio 20000 RTP/AVP 0 8 + c=IN IP4 192.0.2.2 + a=recvonly + a=mid:3 + + At a particular point in time, if the media agent receiving the SDP + message above is sending PCM u-law (payload 0), it sends RTP packets + to 192.0.2.1 on port 30000 and to 192.0.2.2 on port 20000 (first and + third "m" lines). If it is sending PCM A-law (payload 8), it sends + RTP packets to 192.0.2.1 on port 30002 and to 192.0.2.2 on port 20000 + (second and third "m" lines). + + The system that generated the SDP description above supports PCM + u-law on port 30000 and PCM A-law on port 30002. Besides, it uses an + application server that records the conversation and whose IP address + is 192.0.2.2. The application server does not need to understand the + media content, so it always receives a copy of the media stream, + regardless of the codec and payload type that is being used. 
It + actually performs an RTP dump, so it can effectively receive any + codec. + + Remember that if several "m" lines that are grouped together using + the FID semantics contain the same codec, the media agent MUST send + copies of the same media stream as several RTP sessions at the same + time. + + The last example in this section deals with DTMF tones. DTMF tones + can be transmitted using a regular voice codec or can be transmitted + as telephony events. The RTP payload for DTMF tones treated as + + + +Camarillo & Schulzrinne Standards Track [Page 10] + +RFC 5888 SDP Grouping Framework June 2010 + + + telephone events is described in [RFC4733]. Below, there is an + example of an SDP session description using FID semantics and this + payload type. + + v=0 + o=Laura 289083124 289083124 IN IP4 seven.example.com + c=IN IP4 192.0.2.1 + t=0 0 + a=group:FID 1 2 + m=audio 30000 RTP/AVP 0 + a=mid:1 + m=audio 20000 RTP/AVP 97 + c=IN IP4 192.0.2.2 + a=rtpmap:97 telephone-event/8000 + a=mid:2 + + The remote party would send PCM encoded voice (payload 0) to + 192.0.2.1 and DTMF tones encoded as telephony events to 192.0.2.2. + Note that only voice or DTMF is sent at a particular point in time. + When DTMF tones are sent, the first media stream does not carry any + data and, when voice is sent, there is no data in the second media + stream. FID semantics provide different destinations for alternative + codecs. + +8.5. Scenarios That FID Does Not Cover + + It is worthwhile mentioning some scenarios where the "group" + attribute using existing semantics (particularly FID) might seem to + be applicable but is not. + +8.5.1. Parallel Encoding Using Different Codecs + + FID semantics are useful when the application only uses one codec at + a time. 
An application that encodes the same media using different + codecs simultaneously MUST NOT use FID to group those media lines. + Some systems that handle DTMF tones are a typical example of parallel + encoding using different codecs. Some systems implement the RTP + payload defined in RFC 4733 [RFC4733], but when they send DTMF tones, + they do not mute the voice channel. Therefore, in effect they are + sending two copies of the same DTMF tone: encoded as voice and + encoded as a telephony event. When the receiver gets both copies, it + typically uses the telephony event rather than the tone encoded as + voice. FID semantics MUST NOT be used in this context to group both + media streams, since such a system is not using alternative codecs + but rather different parallel encodings for the same information. + + + + + + +Camarillo & Schulzrinne Standards Track [Page 11] + +RFC 5888 SDP Grouping Framework June 2010 + + +8.5.2. Layered Encoding + + Layered encoding schemes encode media in different layers. The + quality of the media stream at the receiver varies depending on the + number of layers received. SDP provides a means to group together + contiguous multicast addresses that transport different layers. The + "c" line below: + + c=IN IP4 233.252.0.1/127/3 + + is equivalent to the following three "c" lines: + + c=IN IP4 233.252.0.1/127 + c=IN IP4 233.252.0.2/127 + c=IN IP4 233.252.0.3/127 + + FID MUST NOT be used to group "m" lines that do not represent the + same information. Therefore, FID MUST NOT be used to group "m" lines + that contain the different layers of layered encoding schemes. + Besides, we do not define new group semantics to provide a more + flexible way of grouping different layers, because the already + existing SDP mechanism covers the most useful scenarios. + +8.5.3. 
Same IP Address and Port Number + + If media streams using several different codecs have to be sent to + the same IP address and port, the traditional SDP syntax of listing + several codecs in the same "m" line MUST be used. FID MUST NOT be + used to group "m" lines with the same IP address/port. Therefore, an + SDP description like the one below MUST NOT be generated. + + v=0 + o=Laura 289083124 289083124 IN IP4 eight.example.com + c=IN IP4 192.0.2.1 + t=0 0 + a=group:FID 1 2 + m=audio 30000 RTP/AVP 0 + a=mid:1 + m=audio 30000 RTP/AVP 8 + a=mid:2 + + + + + + + + +Camarillo & Schulzrinne Standards Track [Page 12] + +RFC 5888 SDP Grouping Framework June 2010 + + + The correct SDP description for the session above would be the + following one: + + v=0 + o=Laura 289083124 289083124 IN IP4 nine.example.com + c=IN IP4 192.0.2.1 + t=0 0 + m=audio 30000 RTP/AVP 0 8 + + If two "m" lines are grouped using FID, they MUST differ in their + transport addresses (i.e., IP address plus port). + +9. Usage of the "group" Attribute in SIP + + SDP descriptions are used by several different protocols, SIP among + them. We include a section about SIP, because the "group" attribute + will most likely be used mainly by SIP systems. + + SIP [RFC3261] is an application layer protocol for establishing, + terminating, and modifying multimedia sessions. SIP carries session + descriptions in the bodies of the SIP messages but is independent + from the protocol used for describing sessions. SDP [RFC4566] is one + of the protocols that can be used for this purpose. + + At session establishment, SIP provides a three-way handshake + (INVITE-200 OK-ACK) between end systems. However, just two of these + three messages carry SDP, as described in [RFC3264]. + +9.1. Mid Value in Answers + + The "mid" attribute is an identifier for a particular media stream. + Therefore, the "mid" value in the offer MUST be the same as the "mid" + value in the answer. 
Besides, subsequent offers (e.g., in a + re-INVITE) SHOULD use the same "mid" value for the already existing + media streams. + + [RFC3264] describes the usage of SDP in the context of SIP. The + offerer and the answerer align their media description so that the + nth media stream ("m=" line) in the offerer's session description + corresponds to the nth media stream in the answerer's description. + + The presence of the "group" attribute in an SDP session description + does not modify this behavior. + + Since the "mid" attribute provides a means to label "m" lines, it + would be possible to perform media alignment using "mid" labels + rather than matching nth "m" lines. However, this would not bring + any gain and would add complexity to implementations. Therefore, SIP + + + +Camarillo & Schulzrinne Standards Track [Page 13] + +RFC 5888 SDP Grouping Framework June 2010 + + + systems MUST perform media alignment matching nth lines regardless of + the presence of the "group" or "mid" attributes. + + If a media stream that contained a particular "mid" identifier in the + offer contains a different identifier in the answer, the application + ignores all of the "mid" and "group" lines that might appear in the + session description. The following example illustrates this + scenario. + +9.1.1. Example + + Two SIP entities exchange SDPs during session establishment. 
The + INVITE contains the SDP description below: + + v=0 + o=Laura 289083124 289083124 IN IP4 ten.example.com + c=IN IP4 192.0.2.1 + t=0 0 + a=group:FID 1 2 + m=audio 30000 RTP/AVP 0 8 + a=mid:1 + m=audio 30002 RTP/AVP 0 8 + a=mid:2 + + The 200 OK response contains the following SDP description: + + v=0 + o=Bob 289083122 289083122 IN IP4 eleven.example.com + c=IN IP4 192.0.2.3 + t=0 0 + a=group:FID 1 2 + m=audio 25000 RTP/AVP 0 8 + a=mid:2 + m=audio 25002 RTP/AVP 0 8 + a=mid:1 + + Since alignment of "m" lines is performed based on matching of nth + lines, the first stream had "mid:1" in the INVITE and "mid:2" in the + 200 OK. Therefore, the application ignores every "mid" and "group" + line contained in the SDP description. + + + + + + + + + + + +Camarillo & Schulzrinne Standards Track [Page 14] + +RFC 5888 SDP Grouping Framework June 2010 + + + A well-behaved SIP user agent would have returned the SDP description + below in the 200 OK response. + + v=0 + o=Bob 289083122 289083122 IN IP4 twelve.example.com + c=IN IP4 192.0.2.3 + t=0 0 + a=group:FID 1 2 + m=audio 25002 RTP/AVP 0 8 + a=mid:1 + m=audio 25000 RTP/AVP 0 8 + a=mid:2 + +9.2. Group Value in Answers + + A SIP entity that receives an offer that contains an "a=group" line + with semantics that it does not understand MUST return an answer + without the "group" line. Note that, as described in the previous + section, the "mid" lines MUST still be present in the answer. + + A SIP entity that receives an offer that contains an "a=group" line + with semantics that are understood MUST return an answer that + contains an "a=group" line with the same semantics. The + identification-tags contained in this "a=group" line MUST be the same + as those received in the offer, or a subset of them (zero + identification-tags is a valid subset). When the identification-tags + in the answer are a subset, the "group" value to be used in the + session MUST be the one present in the answer. 
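The two answer checks described in Sections 9.1 and 9.2 can be sketched in Python as follows. This is a minimal illustration, not a full SDP parser: the function names parse_mids, parse_groups, mids_aligned, and answer_group_ok are hypothetical, and the sample descriptions are trimmed to the lines that matter here.

```python
def parse_mids(sdp):
    """Return the "mid" value of each "m" line, in order (None if absent)."""
    mids = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            mids.append(None)
        elif line.startswith("a=mid:") and mids:
            mids[-1] = line[len("a=mid:"):]
    return mids


def parse_groups(sdp):
    """Return (semantics, [identification-tags]) for each "a=group" line."""
    groups = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=group:"):
            fields = line[len("a=group:"):].split()
            if fields:
                groups.append((fields[0], fields[1:]))
    return groups


def mids_aligned(offer, answer):
    """True if the nth "m" line carries the same "mid" in offer and answer."""
    return parse_mids(offer) == parse_mids(answer)


def answer_group_ok(offer_group, answer_group):
    """True if the answer keeps the semantics and uses the same tags
    as the offer, or a subset of them (an empty set is allowed)."""
    (o_sem, o_tags), (a_sem, a_tags) = offer_group, answer_group
    return o_sem == a_sem and set(a_tags) <= set(o_tags)


offer = "a=group:FID 1 2\nm=audio 30000 RTP/AVP 0 8\na=mid:1\nm=audio 30002 RTP/AVP 0 8\na=mid:2"
swapped = "a=group:FID 1 2\nm=audio 25000 RTP/AVP 0 8\na=mid:2\nm=audio 25002 RTP/AVP 0 8\na=mid:1"
aligned = mids_aligned(offer, swapped)
group_ok = answer_group_ok(parse_groups(offer)[0], parse_groups(swapped)[0])
```

For the answer of Section 9.1.1 with swapped "mid" values, mids_aligned reports a mismatch even though the group line itself is acceptable, so an application following Section 9.1 would ignore every "mid" and "group" line.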
+ + SIP entities refuse media streams by setting the port to zero in the + corresponding "m" line. "a=group" lines MUST NOT contain + identification-tags that correspond to "m" lines with the port set to + zero. + + Note that grouping of "m" lines MUST always be requested by the + offerer, but never by the answerer. Since SIP provides a two-way SDP + exchange, an answerer that requested grouping would not know whether + the "group" attribute was accepted by the offerer or not. An + answerer that wants to group media lines issues another offer after + having responded to the first one (in a re-INVITE, for instance). + +9.2.1. Example + + The example below shows how the callee refuses a media stream offered + by the caller by setting its port number to zero. The "mid" value + corresponding to that media stream is removed from the "group" value + in the answer. + + + + +Camarillo & Schulzrinne Standards Track [Page 15] + +RFC 5888 SDP Grouping Framework June 2010 + + + SDP description in the INVITE from caller to callee: + + v=0 + o=Laura 289083124 289083124 IN IP4 thirteen.example.com + c=IN IP4 192.0.2.1 + t=0 0 + a=group:FID 1 2 3 + m=audio 30000 RTP/AVP 0 + a=mid:1 + m=audio 30002 RTP/AVP 8 + a=mid:2 + m=audio 30004 RTP/AVP 3 + a=mid:3 + + SDP description in the 200 OK response from callee to caller: + + v=0 + o=Bob 289083125 289083125 IN IP4 fourteen.example.com + c=IN IP4 192.0.2.3 + t=0 0 + a=group:FID 1 3 + m=audio 20000 RTP/AVP 0 + a=mid:1 + m=audio 0 RTP/AVP 8 + a=mid:2 + m=audio 20002 RTP/AVP 3 + a=mid:3 + +9.3. Capability Negotiation + + A client that understands "group" and "mid", but does not want to use + these SDP features in a particular session, may still want to + indicate that it supports these features. To indicate this support, + a client can add an "a=group" line with no identification-tags for + every semantics value it understands. 
+ + If a server receives an offer that contains empty "a=group" lines, it + SHOULD add its capabilities also in the form of empty "a=group" lines + to its answer. + +9.3.1. Example + + A system that supports both LS and FID semantics but does not want to + group any media stream for this particular session generates the + following SDP description: + + + + + + +Camarillo & Schulzrinne Standards Track [Page 16] + +RFC 5888 SDP Grouping Framework June 2010 + + + v=0 + o=Bob 289083125 289083125 IN IP4 fifteen.example.com + c=IN IP4 192.0.2.3 + t=0 0 + a=group:LS + a=group:FID + m=audio 20000 RTP/AVP 0 8 + + The server that receives that offer supports FID but not LS. It + responds with the SDP description below: + + v=0 + o=Laura 289083124 289083124 IN IP4 sixteen.example.com + c=IN IP4 192.0.2.1 + t=0 0 + a=group:FID + m=audio 30000 RTP/AVP 0 + +9.4. Backward Compatibility + + This document does not define any SIP "Require" header field. + Therefore, if one of the SIP user agents does not understand the + "group" attribute, the standard SDP fall-back mechanism MUST be used, + namely, attributes that are not understood are simply ignored. + +9.4.1. Offerer Does Not Support "group" + + This situation does not represent a problem, because grouping + requests are always performed by offerers and not by answerers. If + the offerer does not support "group", this attribute will simply not + be used. + +9.4.2. Answerer Does Not Support "group" + + The answerer will ignore the "group" attribute since it does not + understand it and will also ignore the "mid" attribute. For LS + semantics, the answerer might decide to perform, or not to perform, + synchronization between media streams. + + For FID semantics, the answerer will consider the session to consist + of several media streams. + + Different implementations will behave in different ways. 
+
+   In the case of audio and different "m" lines for different codecs, an
+   implementation might decide to act as a mixer with the different
+   incoming RTP sessions, which is the correct behavior.
+
+   An implementation might also decide to refuse the request (e.g., 488
+   Not Acceptable Here, or 606 Not Acceptable), because it contains
+   several "m" lines. In this case, the server does not support the
+   type of session that the caller wanted to establish. In case the
+   client is willing to establish a simpler session anyway, the client
+   can re-try the request without the "group" attribute and with only
+   one "m" line per flow.
+
+10. Changes from RFC 3388
+
+   Section 3 (Overview of Operation) has been added for clarity. The
+   AMR and GSM acronyms are now expanded on their first use. The
+   examples now use IP addresses in the range suitable for examples.
+
+   The grouping mechanism is now defined as an extensible framework.
+   Earlier, RFC 3388 [RFC3388] used to discourage extensions to this
+   mechanism in favor of using new session description protocols.
+
+   Given a semantics value, RFC 3388 [RFC3388] used to restrict "m" line
+   identifiers to only appear in a single group using that semantics.
+   That restriction has been lifted in this specification. From
+   conversations with implementers, existing (i.e., legacy)
+   implementations enforce this restriction on a per-semantics basis.
+   That is, they only enforce this restriction for supported semantics.
+   Because of the nature of existing semantics, implementations will
+   only use a single "m" line identifier across groups using a given
+   semantics even after the restriction has been lifted by this
+   specification.
+   Consequently, the lifting of this restriction will
+   not cause backward-compatibility problems, because implementations
+   supporting new semantics will be updated to not enforce this
+   restriction at the same time as they are updated to support the new
+   semantics.
+
+11. Security Considerations
+
+   Using the "group" parameter with FID semantics, an entity that
+   managed to modify the session descriptions exchanged between the
+   participants to establish a multimedia session could force the
+   participants to send a copy of the media to any destination of its
+   choosing.
+
+   Integrity mechanisms provided by protocols used to exchange session
+   descriptions and media encryption can be used to prevent this attack.
+   In SIP, Secure/Multipurpose Internet Mail Extensions (S/MIME)
+   [RFC5750] and Transport Layer Security (TLS) [RFC5246] can be used to
+   protect session description exchanges in an end-to-end and a
+   hop-by-hop fashion, respectively.
+
+12. IANA Considerations
+
+   This document defines two SDP attributes: "mid" and "group".
+
+   The "mid" attribute is used to identify media streams within a
+   session description, and its format is defined in Section 4.
+
+   The "group" attribute is used for grouping together different media
+   streams, and its format is defined in Section 5.
+
+   This document defines a framework to group media lines in SDP using
+   different semantics. Semantics values to be used with this framework
+   are registered by the IANA following the Standards Action policy
+   [RFC5226].
+
+   The IANA Considerations section of the RFC MUST include the following
+   information, which appears in the IANA registry along with the RFC
+   number of the publication.
+
+   o  A brief description of the semantics.
+
+   o  Token to be used within the "group" attribute. This token may be
+      of any length, but SHOULD be no more than four characters long.
+
+   o  Reference to a standards track RFC.
+
+   The following are the current entries in the registry:
+
+   Semantics                          Token  Reference
+   ---------------------------------  -----  -----------
+   Lip Synchronization                LS     [RFC5888]
+   Flow Identification                FID    [RFC5888]
+   Single Reservation Flow            SRF    [RFC3524]
+   Alternative Network Address Types  ANAT   [RFC4091]
+   Forward Error Correction           FEC    [RFC4756]
+   Decoding Dependency                DDP    [RFC5583]
+
+13. Acknowledgments
+
+   Goran Eriksson and Jan Holler were coauthors of RFC 3388 [RFC3388].
+
+14. References
+
+14.1. Normative References
+
+   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
+              Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
+              A., Peterson, J., Sparks, R., Handley, M., and E.
+              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
+              June 2002.
+
+   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
+              with Session Description Protocol (SDP)", RFC 3264,
+              June 2002.
+
+   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
+              Description Protocol", RFC 4566, July 2006.
+
+   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
+              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
+              May 2008.
+
+   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
+              Specifications: ABNF", STD 68, RFC 5234, January 2008.
+
+   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
+              (TLS) Protocol Version 1.2", RFC 5246, August 2008.
+
+   [RFC5750]  Ramsdell, B. and S. Turner, "Secure/Multipurpose Internet
+              Mail Extensions (S/MIME) Version 3.2 Certificate
+              Handling", RFC 5750, January 2010.
+
+14.2. Informative References
+
+   [RFC1889]  Schulzrinne, H., Casner, S., Frederick, R., and V.
+              Jacobson, "RTP: A Transport Protocol for Real-Time
+              Applications", RFC 1889, January 1996.
+
+   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
+              Streaming Protocol (RTSP)", RFC 2326, April 1998.
+
+   [RFC3388]  Camarillo, G., Eriksson, G., Holler, J., and H.
+              Schulzrinne, "Grouping of Media Lines in the Session
+              Description Protocol (SDP)", RFC 3388, December 2002.
+
+   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
+              Jacobson, "RTP: A Transport Protocol for Real-Time
+              Applications", STD 64, RFC 3550, July 2003.
+
+   [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
+              Digits, Telephony Tones, and Telephony Signals", RFC 4733,
+              December 2006.
+
+Authors' Addresses
+
+   Gonzalo Camarillo
+   Ericsson
+   Hirsalantie 11
+   Jorvas 02420
+   FINLAND
+
+   EMail: Gonzalo.Camarillo@ericsson.com
+
+
+   Henning Schulzrinne
+   Columbia University
+   1214 Amsterdam Avenue
+   New York, NY 10027
+   USA
+
+   EMail: schulzrinne@cs.columbia.edu