From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001
From: Thomas Voss
Date: Wed, 27 Nov 2024 20:54:24 +0100
Subject: doc: Add RFC documents

---
 doc/rfc/rfc4722.txt | 4539 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 4539 insertions(+)
 create mode 100644 doc/rfc/rfc4722.txt

diff --git a/doc/rfc/rfc4722.txt b/doc/rfc/rfc4722.txt
new file mode 100644
index 0000000..5cabcf9
--- /dev/null
+++ b/doc/rfc/rfc4722.txt
@@ -0,0 +1,4539 @@
+
+Network Working Group                                        J. Van Dyke
+Request for Comments: 4722                                 E. Burger, Ed.
+Category: Informational                         Cantata Technology, Inc.
+                                                              A. Spitzer
+                                                     Pingtel Corporation
+                                                           November 2006
+
+
+       Media Server Control Markup Language (MSCML) and Protocol
+
+Status of This Memo
+
+   This memo provides information for the Internet community. It does
+   not specify an Internet standard of any kind. Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The IETF Trust (2006).
+
+Abstract
+
+   Media Server Control Markup Language (MSCML) is a markup language
+   used in conjunction with SIP to provide advanced conferencing and
+   interactive voice response (IVR) functions. MSCML presents an
+   application-level control model, as opposed to device-level control
+   models. One use of this protocol is for communications between a
+   conference focus and mixer in the IETF SIP Conferencing Framework.
+
+Van Dyke, et al.             Informational                      [Page 1]
+
+RFC 4722                         MSCML                     November 2006
+
+Table of Contents
+
+   1. Introduction ....................................................4
+      1.1. Conventions Used in This Document ..........................5
+   2. MSCML Approach ..................................................5
+   3. Use of SIP Request Methods ......................................6
+   4. MSCML Design ....................................................8
+      4.1. Transaction Model ..........................................8
+      4.2. XML Usage ..................................................9
+           4.2.1. MSCML Time Values ...................................9
+   5. Advanced Conferencing ..........................................10
+      5.1. Conference Model ..........................................10
+      5.2. Configure Conference Request <configure_conference> .......11
+      5.3. Configure Leg Request <configure_leg> .....................13
+      5.4. Terminating a Conference ..................................14
+      5.5. Conference Manipulation ...................................15
+      5.6. Video Conferencing ........................................16
+      5.7. Conference Events .........................................17
+      5.8. Conferencing with Personalized Mixes ......................18
+           5.8.1. MSCML Elements and Attributes for
+                  Personalized Mixes .................................19
+           5.8.2. Example Usage of Personalized Mixes ................20
+   6. Interactive Voice Response (IVR) ...............................23
+      6.1. Specifying Prompt Content .................................24
+           6.1.1. Use of the Prompt Element ..........................24
+      6.2. Multimedia Processing for IVR .............................30
+      6.3. Playing Announcements <play> ..............................31
+      6.4. Prompt and Collect <playcollect> ..........................32
+           6.4.1. Control of Digit Buffering and Barge-In ............33
+           6.4.2. Mapping DTMF Keys to Special Functions .............33
+           6.4.3. Collection Timers ..................................35
+           6.4.4. Logging Caller DTMF Input ..........................36
+           6.4.5. Specifying DTMF Grammars ...........................36
+           6.4.6. Playcollect Response ...............................37
+           6.4.7. Playcollect Example ................................38
+      6.5. Prompt and Record <playrecord> ............................38
+           6.5.1. Prompt Phase .......................................38
+           6.5.2. Record Phase .......................................39
+           6.5.3. Playrecord Example .................................41
+      6.6. Stop Request <stop> .......................................42
+   7. Call Leg Events ................................................43
+      7.1. Keypress Events ...........................................43
+           7.1.1. Keypress Subscription Examples .....................45
+           7.1.2. Keypress Notification Examples .....................45
+      7.2. Signal Events .............................................46
+           7.2.1. Signal Event Examples ..............................47
+   8. Managing Content <managecontent> ...............................48
+      8.1. Managecontent Example .....................................50
+   9. Fax Processing .................................................51
+      9.1. Recording a Fax <faxrecord> ...............................51
+      9.2. Sending a Fax <faxplay> ...................................53
+   10. MSCML Response Attributes and Elements ........................56
+      10.1. Mechanism ................................................56
+      10.2. Base <response> Attributes ...............................56
+      10.3. Response Attributes and Elements for <configure_leg> .....57
+      10.4. Response Attributes and Elements for <play> ..............57
+            10.4.1. Reporting Content Retrieval Errors ...............58
+      10.5. Response Attributes and Elements for <playcollect> .......59
+      10.6. Response Attributes and Elements for <playrecord> ........60
+      10.7. Response Attributes and Elements for <managecontent> .....61
+      10.8. Response Attributes and Elements for <faxplay>
+            and <faxrecord> ..........................................61
+   11. Formal Syntax .................................................62
+      11.1. Schema ...................................................62
+   12. IANA Considerations ...........................................73
+      12.1. IANA Registration of MIME Media Type application/
+            mediaservercontrol+xml ...................................73
+   13. Security Considerations .......................................74
+   14. References ....................................................75
+      14.1. Normative References .....................................75
+      14.2. Informative References ...................................76
+   Appendix A. Regex Grammar Syntax .................................78
+   Appendix B. Contributors .........................................79
+   Appendix C. Acknowledgements .....................................79
+
+1. Introduction
+
+   This document describes the Media Server Control Markup Language
+   (MSCML) and its usage. It describes payloads that one can send to a
+   media server using standard SIP INVITE and INFO methods and the
+   capabilities these payloads implement. RFC 4240 [2] describes media
+   server SIP URI formats.
+
+   Prior to MSCML, there was not a standard way to deliver SIP-based
+   enhanced conferencing. Basic SIP constructs, such as those described
+   in RFC 4240 [2], serve simple n-way conferencing well. The SIP URI
+   provides a natural mechanism for identifying a specific SIP
+   conference, while INVITE and BYE methods elegantly implement
+   conference join and leave semantics. However, enhanced conferencing
+   applications also require features such as sizing and resizing,
+   in-conference IVR operations (e.g., recording and playing
+   participant names to the full conference), and conference event
+   reporting. MSCML payloads within standard SIP methods realize these
+   features.
+
+   The structure and approach of MSCML satisfy the requirements set out
+   in RFC 4353 [10]. In particular, MSCML serves as the interface
+   between the conference server or focus and a centralized conference
+   mixer. In this case, a media server has the role of the conference
+   mixer.
+
+   There are two broad classes of MSCML functionality.
+   The first class includes primitives for advanced conferencing, such
+   as conference configuration, participant leg manipulation, and
+   conference event reporting. The second class comprises primitives
+   for interactive voice response (IVR). These include collecting DTMF
+   digits and playing and recording multimedia content.
+
+   MSCML fills the need for IVR and conference control with requests and
+   responses over a SIP transport. VoiceXML [11] fills the need for IVR
+   with requests and responses over an HTTP transport. This enables
+   developers to use whatever model fits their needs best.
+
+   In general, a media server offers services to SIP UACs, such as
+   Application Servers, Feature Servers, and Media Gateway Controllers.
+   See the IPCC Reference Architecture [12] for definitions of these
+   terms. It is unlikely, but not prohibited, for end-user SIP UACs to
+   have a direct signaling relationship with a media server. The term
+   "client" is used in this document to refer generically to an entity
+   that interacts with the media server using SIP and MSCML.
+
+   The media server fulfills the role of the Media Resource Function
+   (MRF) in the IP Multimedia Subsystem (IMS) [13] as described by 3GPP.
+   MSCML and RFC 4240 [2], upon which MSCML builds, are specifically
+   focused on the Media resource (Mr) interface which supports
+   interactions between application logic and the MRF.
+
+   This document describes a working framework and protocol with which
+   there is considerable implementation experience. Application
+   developers and service providers have created several MSCML-based
+   services since the availability of the initial version in 2001.
+   This experience is highly relevant to the ongoing work of the IETF,
+   particularly the SIP [26], SIPPING [27], MMUSIC [28], and XCON [29]
+   work groups, the IMS [30] work in 3GPP, and the CCXML work in the
+   Voice Browser Work Group of the W3C.
+
+1.1. Conventions Used in This Document
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in RFC 2119 [1].
+
+2. MSCML Approach
+
+   It is critically important to emphasize that the goal of MSCML is to
+   provide an application interface that follows the SIP, HTTP, and XML
+   development paradigm to foster easier and more rapid application
+   deployment. This goal is reflected in MSCML in two ways.
+
+   First, the programming model is that of peer to peer rather than
+   master-slave. Importantly, this allows the media server to be used
+   simultaneously for multiple applications rather than be tied to a
+   single point of control. It also enables standard SIP mechanisms to
+   be used for media server location and load balancing.
+
+   Second, MSCML defines constructs and primitives that are meaningful
+   at the application level to ensure that programmers are not
+   distracted by unnecessary complexity. For example, the mixing
+   resource operates on constructs such as conferences and call
+   participants rather than directly on individual media streams.
+
+   The MSCML paradigm is important to the developer community, in that
+   developers and operators conceptually write applications about calls,
+   conferences, and call legs. For the majority of developers and
+   applications this approach significantly simplifies and speeds
+   development.
+
+3. Use of SIP Request Methods
+
+   As mentioned above, MSCML payloads may be carried in either SIP
+   INVITE or INFO requests.
+   The initial INVITE, which creates an enhanced conference, MAY
+   include an MSCML payload. A subsequent INVITE to the same
+   Request-URI joins a participant leg to the conference. This INVITE
+   MAY include an MSCML payload. The initial INVITE that establishes an
+   IVR session MUST NOT include an MSCML payload. The client sends all
+   mid-call MSCML payloads for conferencing and IVR via SIP INFO
+   requests.
+
+   SIP INVITE requests that contain both MSCML and Session Description
+   Protocol (SDP) body parts are used frequently in conferencing
+   scenarios. Therefore, the media server MUST support message bodies
+   with the MIME type "multipart/mixed" in SIP INVITE requests.
+
+   The media server transports MSCML responses in the final response to
+   the SIP INVITE containing the matching MSCML request or in a SIP INFO
+   message. The only allowable final response to a SIP INFO containing
+   a message body is a 200 OK, per RFC 2976 [3]. Therefore, if the
+   client sends the MSCML request via SIP INFO, the media server
+   responds with the MSCML response in a separate INFO request. In
+   general, these responses are asynchronous in nature and require a
+   separate transaction due to timing considerations.
+
+   There has been considerable debate on the use of the SIP INFO method
+   for any purpose. Our experience is that MSCML would not have been
+   possible without it. At the time the first MSCML specification was
+   published, the first SIP Event Notification draft had just been
+   submitted as an individual submission. At that time, there was no
+   mechanism to link SUBSCRIBE/NOTIFY to an existing dialog. This
+   prevented its use in MSCML, since all events occurred in an
+   INVITE-established dialog. And while SUBSCRIBE/NOTIFY was well
+   suited for reporting conference events, its semantics seemed
+   inappropriate for modifying a participant leg or conference setting
+   where the only "event" was the success or failure of the request.
+   Lastly, since SIP INFO was an established RFC, most SIP stack
+   implementations supported it at that time. We had few, if any,
+   interoperability issues as a result.
+
+   More recent developments have provided additional reasons why
+   SUBSCRIBE/NOTIFY is not appropriate for use in MSCML. Use of
+   SUBSCRIBE presents two problems. The first is semantic. The purpose
+   of SUBSCRIBE is to register interest in User Agent state. However,
+   using SUBSCRIBE for MSCML results in the SUBSCRIBE modifying the User
+   Agent state. The second reason SUBSCRIBE is not appropriate is
+   because MSCML is inherently call based. The association of a SIP
+   dialog with a call leg means MSCML can be incredibly straightforward.
+   For example, if one used SUBSCRIBE or other SIP method to send
+   commands about some context, one must identify that context somehow.
+   Relating commands to the SIP dialog they arrive on defines the
+   context for free. Moreover, it is conceptually easy for the
+   developer. Using NOTIFY to transport MSCML responses is also not
+   appropriate, as the NOTIFY would be in response to an implicit
+   subscription. The SIP and SIPPING lists have discussed the dangers
+   of implicit subscription.
+
+   In order to guarantee interoperability with this specification, as
+   well as with SIP User Agents that are unaware of MSCML, SIP UACs that
+   wish to use MSCML services MUST specify a service indicator that
+   supports MSCML in the initial INVITE. RFC 4240 [2] defines the
+   service indicator "conf", which MUST be used for MSCML conferencing
+   applications. The service indicator "ivr" MUST be used for MSCML
+   interactive voice response applications. In this specification, only
+   "conf" and "ivr" are described.
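
The service indicators above end up in the user part of the Request-URI of the initial INVITE. The following Python sketch shows one way a client might build such URIs. It is an illustration under stated assumptions, not part of MSCML: the helper names `conference_uri` and `ivr_uri` are hypothetical, the `sip:conf=UniqueID@host` shape is the one given for conferences in Section 5.1, and the `sip:ivr@host` shape is assumed by analogy with RFC 4240.

```python
import re

# Hypothetical helpers for illustration; only the URI shapes come from the text.
# The identifier check is a conservative assumption, not a rule from the RFC.
_TOKEN = re.compile(r"^[A-Za-z0-9._-]+$")


def conference_uri(unique_id: str, host: str) -> str:
    """Return a Request-URI that selects the "conf" service."""
    if not _TOKEN.match(unique_id):
        raise ValueError(f"unsupported conference identifier: {unique_id!r}")
    return f"sip:conf={unique_id}@{host}"


def ivr_uri(host: str) -> str:
    """Return a Request-URI that selects the "ivr" service (assumed form)."""
    return f"sip:ivr@{host}"


if __name__ == "__main__":
    print(conference_uri("UniqueID", "ms.example.net"))
    print(ivr_uri("ms.example.net"))
```

Keeping the service selection in the Request-URI means that a media server that does not implement MSCML can still recognize, or reject, the requested service from the URI alone.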
+
+   The media server MUST support moving the call between services
+   through sending the media server a BYE on the existing dialog and
+   establishing a new dialog with an INVITE to the desired service.
+   Media servers SHOULD support moving between services without
+   requiring modification of the previously established SDP parameters.
+   This is achieved by sending a re-INVITE on the existing dialog in
+   which the Request-URI is modified to specify the new service desired
+   by the client. This eliminates the need for the client to send an
+   INVITE to the caller or gateway to establish new SDP parameters.
+
+   The media server, as a SIP UAS, MUST respond appropriately to an
+   INVITE that contains an MSCML body. If MSCML is not supported, the
+   media server MUST generate a 415 final response and include a list of
+   the supported content types in the response per RFC 3261 [4]. The
+   media server MUST also advertise its support of MSCML in responses to
+   OPTIONS requests, by including "application/mediaservercontrol+xml"
+   as a supported content type in an Accept header. This alleviates the
+   major issues with using INFO for the transport of application data;
+   namely, the User Agent's proper interpretation of what is, by design,
+   an opaque message request.
+
+4. MSCML Design
+
+4.1. Transaction Model
+
+   To avoid undue complexity, MSCML establishes two rules regarding its
+   usage. The first is that only one MSCML body may be present in a SIP
+   request. The second is that each MSCML body may contain only one
+   request or response. This greatly simplifies transaction management.
+   MSCML syntax does provide for the unique identification of multiple
+   requests in a single body part. However, this is not supported in
+   this specification.
+
+   Per the guidelines of RFC 3470 [14], MSCML bodies MUST be well formed
+   and valid.
+
+   MSCML is a direct request-response protocol.
+   There are no provisional responses, only final responses. A request
+   may, however, result in multiple notifications. For example, a
+   request for active talker reports will result in a notification for
+   each speaker set. This maps to the three major element trees for
+   MSCML: <request>, <response>, and <notification>.
+
+   Figure 1 shows a request body. Depending on the command, one can
+   send the request in an INVITE or an INFO. Figure 2 shows a response
+   body. The SIP INFO method transports response bodies. Figure 3
+   shows a notification body. The SIP INFO method transports
+   notifications.
+
+   <?xml version="1.0" encoding="utf-8"?>
+   <MediaServerControl version="1.0">
+     <request>
+       ... request body ...
+     </request>
+   </MediaServerControl>
+
+                     Figure 1: MSCML Request Format
+
+   <?xml version="1.0" encoding="utf-8"?>
+   <MediaServerControl version="1.0">
+     <response>
+       ... response body ...
+     </response>
+   </MediaServerControl>
+
+                    Figure 2: MSCML Response Format
+
+   <?xml version="1.0" encoding="utf-8"?>
+   <MediaServerControl version="1.0">
+     <notification>
+       ... notification body ...
+     </notification>
+   </MediaServerControl>
+
+                  Figure 3: MSCML Notification Format
+
+   MSCML requests MAY include a client-defined ID attribute for the
+   purposes of matching requests and responses. The values used for
+   these IDs need only be unique within the scope of the dialog in which
+   the requests are issued.
+
+4.2. XML Usage
+
+   In the philosophy of XML as a text-based description language, and
+   not as a programming language, MSCML makes the choice of many
+   attribute values for readability by a human. Thus, many attributes
+   that would often be "boolean" instead take "yes" or "no" values. For
+   example, what does 'report="false"' or 'report="1"' mean? However,
+   'report="yes"' is clearer: I want a report. Some programmers prefer
+   the precision of a boolean. To satisfy both styles, MSCML defines an
+   XML type, "yesnoType", that takes on the values "yes" and "no" as
+   well as "true", "false", "1", and "0".
+
+   Many attributes in the MSCML schema have default values. In order to
+   limit demands on the XML parser, MSCML applies these values at the
+   protocol, not XML, level. The MSCML schema documents these defaults
+   as XML annotations to the appropriate attribute.
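
The two points above, the six accepted "yesnoType" spellings and defaults applied at the protocol level rather than by the XML parser, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `parse_yesno` is hypothetical, matching is assumed to be case-sensitive, and the `<activetalkers report="..."/>` element is used only as sample input.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The six lexical values that the text lists for "yesnoType".
_TRUE_VALUES = {"yes", "true", "1"}
_FALSE_VALUES = {"no", "false", "0"}


def parse_yesno(value: Optional[str], default: Optional[bool] = None) -> bool:
    """Map a yesnoType attribute value to a bool.

    A missing attribute (None) falls back to the caller-supplied default,
    mirroring defaults applied at the protocol level instead of by the parser.
    """
    if value is None:
        if default is None:
            raise ValueError("attribute is absent and has no protocol default")
        return default
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"not a yesnoType value: {value!r}")


if __name__ == "__main__":
    # Sample element for illustration only.
    elem = ET.fromstring('<activetalkers report="yes"/>')
    print(parse_yesno(elem.get("report"), default=False))
    print(parse_yesno(elem.get("missing"), default=False))
```

Passing the default explicitly at each call site keeps the parser free of schema knowledge, which is the trade-off the paragraph above describes.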
+
+4.2.1. MSCML Time Values
+
+   For clarity, time values in MSCML are based on the time designations
+   described in the Cascading Style Sheets level 2 (CSS2) Specification
+   [15]. Their format consists of a number immediately followed by an
+   optional time unit identifier of the following form:
+
+      ms: milliseconds (default)
+      s: seconds
+
+   If no time unit identifier is present, the value MUST be interpreted
+   as being in milliseconds. As extensions to [15], MSCML allows the
+   string values "immediate" and "infinite", which have special meaning
+   for certain timers.
+
+5. Advanced Conferencing
+
+5.1. Conference Model
+
+   The advanced conferencing model is a star controller model, with both
+   signaling and media directed to a central location. Figure 4 depicts
+   a typical signaling relationship between end users' UACs, a
+   conference application server, and a media server.
+
+   RFC 4353 [10] describes this model. The application server is an
+   instantiation of the conference focus. The media server is an
+   instantiation of the media mixer. Note that user-level constructs,
+   such as event notifications, are in the purview of the application
+   server. This is why, for example, the media server sends active
+   talker reports using MSCML notifications, while the application
+   server would instead use the conference package [16] for individual
+   notifications to SIP user agents. Note that we do not recommend the
+   use of the conference package for media server to application server
+   notifications because none of the filtering and membership
+   information is available at the media server.
+
+   +-------+
+   | UAC 1 |---\  Public URI   +-------------+
+   +-------+    \ _____________| Application |
+                 /    /        |   Server    |   Not shown:
+   +-------+    /    /         +-------------+   RTP flows directly
+   | UAC 2 |---/    /                 | Private  between UACs and
+   +-------+       /                  | URI      media server
+       .          /            +--------------+
+       :         /             |              |
+   +-------+    /              | Media Server |
+   | UAC n |---/               |              |
+   +-------+                   +--------------+
+
+                       Figure 4: Conference Model
+
+      Each UAC sends an INVITE to a Public Conference URI. Presumably,
+      the client publishes this URI, or it is an ad hoc URI. In any
+      event, the client generates a Private URI, following the rules
+      specified by RFC 4240 [2]. That is, the URI is of the following
+      form:
+
+         sip:conf=UniqueID@ms.example.net
+
+      where UniqueID is a unique conference identifier and
+      ms.example.net is the host name or IP address of the media server.
+      There is nothing to prevent the UACs from contacting the media
+      server directly. However, one would expect the owner of the media
+      server to restrict who can use its resources.
+
+      As for basic conferencing, described by RFC 4240 [2], the first
+      INVITE to the media server with a UniqueID creates a conference.
+      However, in advanced conferencing, the first INVITE MAY include an
+      MSCML <configure_conference> payload rather than the SDP of a
+      conference participant. The <configure_conference> payload
+      conveys extended session parameters (e.g., number of participants)
+      that SDP does not readily express, but the media server must know
+      to allocate the appropriate resources.
+
+      When the conference is created by sending an INVITE containing an
+      MSCML <configure_conference> payload, the resulting SIP dialog is
+      termed the "Conference Control Leg." This leg has several useful
+      properties. The lifetime of the conference is the same as that of
+      its control leg. This ensures that the conference remains in
+      existence even if all participant legs leave or have not yet
+      arrived. In addition, when the client terminates the Conference
+      Control Leg, the media server automatically terminates all
+      participant legs. The Conference Control Leg is also used for
+      play or record operations to/from the entire conference and for
+      active talker notifications.
+      Full conference media operations and active talker report
+      subscriptions MUST be executed on the Conference Control Leg.
+
+      Creation of a Conference Control Leg is RECOMMENDED because full
+      advanced conferencing capabilities are not available without it.
+      Clients MUST establish the Conference Control Leg in the initial
+      INVITE that creates the conference; it cannot be created later.
+
+      Once the client has created the conference with or without the
+      Conference Control Leg, participants can be joined to the
+      conference. This is achieved by the client's directing an INVITE
+      to the Private Conference URI for each participant. Using the
+      example conference URI given above, this would be
+      sip:conf=UniqueID@ms.example.net.
+
+5.2. Configure Conference Request <configure_conference>
+
+   The <configure_conference> request has two attributes that control
+   the resources the media server sets aside for the conference.
+   These are described in the list below.
+
+   Attributes of <configure_conference>:
+
+   o  reservedtalkers - optional (see note), no default value: The
+      maximum number of talker legs allocated for the conference. Note:
+      required when establishing the Conference Control Leg but optional
+      in subsequent <configure_conference> requests.
+
+   o  reserveconfmedia - optional, default value "yes": Controls
+      allocation of resources to enable playing or recording to or from
+      the entire conference.
+
+   When the reservedtalkers+1st INVITE arrives at the media server, the
+   media server SHOULD generate a 486 Busy Here response. Failure to
+   send a 486 response to this condition can cause the media server to
+   oversubscribe its resources.
+
+      NOTE: It would be symmetric to have a reservedlisteners parameter.
+      However, the practical limitation on the media server is the
+      number of talkers for a mixer to monitor.
+      In either case, the client regulates who gets into the conference
+      by either proxying the INVITEs from the user agent clients or
+      metering to whom it gives the conference URI.
+
+   For example, to create a conference with up to 120 active talkers and
+   the ability to play audio into the conference or record portions or
+   all of the conference full mix, the client specifies both attributes,
+   as shown in Figure 6.
+
+   <?xml version="1.0"?>
+   <MediaServerControl version="1.0">
+     <request>
+       <configure_conference reservedtalkers="120"
+        reserveconfmedia="yes"/>
+     </request>
+   </MediaServerControl>
+
+                  Figure 6: 120 Speaker MSCML Example
+
+   In addition to these attributes, a <configure_conference> request MAY
+   contain a <subscribe> child element. The <subscribe> element is used
+   to request notifications for conference-wide active talker events.
+   Detailed information regarding active talker events is contained in
+   Section 5.7.
+
+   The client MUST include a <configure_conference> request in the
+   initial INVITE which establishes the conference when creating the
+   Conference Control Leg. The client MUST issue asynchronous
+   commands, such as <play>, separately (i.e., in INFO messages) to
+   avoid ambiguous responses.
+
+   Media operations on the Conference Control Leg are performed
+   internally; no external RTP streams are involved. Accordingly, the
+   media server does not expect RTP on the Conference Control Leg.
+   Therefore, the client MUST send either no SDP or hold SDP in the
+   INVITE request containing a <configure_conference> payload. The
+   media server MUST treat SDP with all media lines set to "inactive" or
+   with connection addresses set to 0.0.0.0 (for backwards
+   compatibility) as hold SDP.
+
+   The media server sends a response when it has finished processing the
+   <configure_conference> request. The format of the
+   <configure_conference> response is detailed in Section 10.2.
+
+5.3. Configure Leg Request <configure_leg>
+
+   Conference legs have a number of properties the client can modify.
+   These are set using the <configure_leg> request. This request has
+   the attributes described in the list below.
+
+   Attributes of <configure_leg>:
+
+   o  type - optional, default value "talker": Consider this leg's audio
+      for inclusion in the output mix. Alternative is "listener".
+
+   o  dtmfclamp - optional, default value "yes": Remove detected DTMF
+      digits from the input audio.
+
+   o  toneclamp - optional, default value "yes": Remove tones from the
+      input audio. Tones include call progress tones and the like.
+
+   o  mixmode - optional, default value "full": Be a candidate for the
+      full mix. Alternatives are "mute", to disallow media in the mix,
+      "parked", to disconnect the leg's media streams from the
+      conference for IVR operations, "preferred", to give this stream
+      preferential selection in the mix (i.e., even if not loudest
+      talker, include media, if present, from this leg in the mix), and
+      "private", which enables personalized mixes.
+
+   In addition to these attributes, there are four child elements
+   defined for <configure_leg>. These are <inputgain>, <outputgain>,
+   <configure_team>, and <subscribe>.
+
+   The first two, <inputgain> and <outputgain>, modify the gain applied
+   to the input and output audio streams, respectively. These may
+   contain <auto>, to use automatic gain control (AGC) or <fixed>. The
+   <auto> element has the attributes "startlevel", "targetlevel", and
+   "silencethreshold". All the parameters are in dB. The <fixed>
+   element has the attribute "level", which is in dB. The default for
+   both <inputgain> and <outputgain> is <fixed>. The media server MAY
+   silently cap <inputgain> or <outputgain> requests that exceed the
+   gain limits imposed by the platform.
+
+   Clients most commonly manipulate only the input gain for a conference
+   leg and rely on the mixer to set an optimum output gain based on the
+   inputs currently in the mix. However, as described above, MSCML does
+   allow for manipulation of the output gain as well. Some of the IVR
+   commands, such as <play>, enable control of the output gain for
+   content playback operations. The interaction of conference output
+   gain and IVR playback gain controls is described in Section 6.1.1.
+   Note that <inputgain> and <outputgain> settings apply only to
+   conference legs and do not apply to IVR sessions.
+
+   The <configure_team> element is used to create and manipulate groups
+   for personalized mixes. Details of personalized mixes are discussed
+   in Section 5.8.
+
+   The <subscribe> element is used to request notifications for call leg
+   related events, such as asynchronous DTMF digit reports. Detailed
+   information regarding call leg events is discussed in Section 7.
+
+   If the default parameters are acceptable for the leg the client
+   wishes to enter into the conference, then a normal SIP INVITE, with
+   no MSCML body, is sufficient. However, if the client wishes to
+   modify one or more of the parameters, the client can include an MSCML
+   body in addition to the SDP body.
+
+   The client can modify the conference leg parameters during the
+   conference by issuing a SIP INFO on the dialog representing the
+   conference leg. Of course, the client cannot modify SDP in an INFO
+   message.
+
+   The media server sends a response when it has finished processing the
+   <configure_leg> request. The format of the <configure_leg> response
+   is detailed in Section 10.3.
+
+5.4. Terminating a Conference
+
+   To remove a leg from the conference, the client issues a SIP BYE
+   request on the selected dialog representing the conference leg.
+
+   The client can terminate all legs in a conference by issuing a SIP
+   BYE request on the Conference Control Leg. If one or more
+   participants are still in the conference when the media server
+   receives a SIP BYE request on the Conference Control Leg, the media
+   server issues SIP BYE requests on all remaining conference legs to
+   ensure cleanup of the legs.
+
+   The media server returns a 200 OK to the SIP BYE request as it sends
+   BYE requests to the other legs.
+   This is because we cannot issue a provisional response to a
+   non-INVITE request, yet the teardown of the other legs may exceed
+   the retransmission timer limits of the original request. While the
+   conference is being cleaned up, the media server MUST reject any new
+   INVITEs to the terminated conference with a 486 Busy Here response.
+   This response indicates that the specified conference cannot accept
+   any new members, pending deletion.
+
+5.5. Conference Manipulation
+
+   Once the conference has begun, the client can manipulate the
+   conference as a whole or a particular participant leg by issuing
+   commands on the associated SIP dialog. For example, by sending MSCML
+   requests on the Conference Control Leg the client can request that
+   the media server record the conference, play a prompt to the
+   conference, or request reports on active talker events. Similarly,
+   the client may mute a participant leg, configure a personalized mix
+   or request reports for call leg events, such as DTMF keypresses.
+
+   Figure 7 shows an example of an MSCML command that plays a prompt to
+   all conference participants.
+
+   <?xml version="1.0"?>
+   <MediaServerControl version="1.0">
+     <request>
+       <play>
+         <prompt>
+           <audio url="..."/>
+         </prompt>
+       </play>
+     </request>
+   </MediaServerControl>
+
+             Figure 7: Full Conference Audio Command - Play
+
+   A client can modify a leg by issuing an INFO on the dialog associated
+   with the participant leg. For example, Figure 8 mutes a conference
+   leg.
+
+   <?xml version="1.0"?>
+   <MediaServerControl version="1.0">
+     <request>
+       <configure_leg mixmode="mute"/>
+     </request>
+   </MediaServerControl>
+
+                  Figure 8: Sample Change Leg Command
+
+   In Figure 7, we saw a request to play a prompt to the entire
+   conference. The client can also request to play a prompt to an
+   individual call leg. In that case, the MSCML request is issued
+   within the SIP dialog of the desired conference participant.
+
+   Section 6 describes the interactive voice response (IVR) services
+   offered by MSCML. If an IVR command arrives on the control channel,
+   it takes effect on the whole conference.
+   This is a mechanism for
+   playing prompts to the entire conference (e.g., announcing new
+   participants). If an IVR command arrives on an individual leg, it
+   affects only that leg. This is a mechanism for interacting with
+   users, such as the creation of "waiting rooms", allowing a user to
+   mute themselves using key presses, allowing a moderator to out-dial,
+   etc.
+
+   A participant leg MUST be configured with mixmode="parked" prior to
+   the issuance of any IVR commands with prompt content ('prompturl'
+   attribute or <prompt> element). Parking the leg isolates the
+   participant's input and output media from the conference and allows
+   use of those streams for playing and recording purposes. However,
+   the mixmode has no effect if just digit collection or recording is
+   desired. <playcollect> and <playrecord> requests without prompt
+   content MAY be sent on participant legs without setting
+   mixmode="parked".
+
+5.6. Video Conferencing
+
+   MSCML-controlled advanced conferences, as well as RFC 4240 [2]
+   controlled basic conferences, implicitly support video conferencing
+   in the form of video switching. In video switching, the video stream
+   of the loudest talker (with some hysteresis) is sent to all
+   participants other than that talker. The loudest talker receives the
+   video stream from the immediately prior loudest talker.
+
+   Media servers MUST ensure that participants receive video media
+   compatible with their session. For example, a participant who has
+   established an H.263 video stream will not receive video from another
+   participant employing H.264 media. Media servers SHOULD implement
+   video transcoding to minimize media incompatibilities between
+   participants.
+
+   The media server MUST switch video streams only when it receives a
+   refresh video frame. A refresh frame contains all the video
+   information required to decode that frame (i.e., there is no
+   dependency on data from previous video frames).
+
+   Refresh frames are large and generally sent infrequently to conserve
+   network bandwidth. The media server MUST implement standard
+   mechanisms to request that the new loudest talker's video encoder
+   transmit a refresh frame to ensure that video can be switched
+   quickly.
+
+5.7. Conference Events
+
+   A client can subscribe to periodic active talker event reports that
+   indicate which participants are included in the conference mix. As
+   these are conference-level events, the subscription and notifications
+   are sent on the Conference Control Leg.
+
+   Media servers MAY impose limits on the minimum interval for active
+   talker reports for performance reasons. If the interval requested by
+   the client is below the imposed minimum, the media server SHOULD set
+   the interval to the minimum value supported. To limit unnecessary
+   notification traffic, the media server SHOULD NOT send a report if
+   the active talker information for the conference has not changed
+   during the reporting interval.
+
+   Figure 9 shows a request for an active talker report. The active
+   talker report enumerates the current call legs in the mix.
+
+   Figure 9: Active Talker Request
+
+   Event notifications are sent in SIP INFO messages. Figure 10 shows
+   an example of a report.
+
+   Figure 10: Active Talker Event Example
+
+   The value of the "callid" attribute in the <talker> element
+   corresponds to the value of the SIP Call-ID header of the associated
+   dialog. This enables the client to associate the active talker with
+   a specific participant leg.
+
+5.8. Conferencing with Personalized Mixes
+
+   MSCML enables clients to create personalized mixes through the
+   <configure_team> element for scenarios where the standard mixmode
+   settings do not provide sufficient control. The <configure_team>
+   element is a child of <configure_leg>.
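As a rough illustration of that parent/child nesting, the Python sketch below assembles a <configure_leg> request that carries a <configure_team> child. It is not part of MSCML: the helper name build_configure_leg is hypothetical, and the MediaServerControl/request envelope and the <teammate id="..."/> form are assumptions about the schema rather than text taken from this section.

```python
import xml.etree.ElementTree as ET


def build_configure_leg(leg_id, mixmode=None, team_action=None, teammates=()):
    """Return an MSCML <configure_leg> request as a string (illustrative only)."""
    # Assumed envelope: <MediaServerControl version="1.0"><request>...</request>
    root = ET.Element("MediaServerControl", {"version": "1.0"})
    request = ET.SubElement(root, "request")

    leg_attrs = {"id": leg_id}
    if mixmode is not None:
        # Leaving mixmode out keeps the server default.
        leg_attrs["mixmode"] = mixmode
    leg = ET.SubElement(request, "configure_leg", leg_attrs)

    if team_action is not None:
        # <configure_team> is nested inside <configure_leg>.
        team = ET.SubElement(leg, "configure_team", {"action": team_action})
        for mate_id in teammates:
            # Assumed form of a team member entry.
            ET.SubElement(team, "teammate", {"id": mate_id})

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_configure_leg("agent", team_action="set",
                              teammates=["supervisor"]))
```

A client would place the returned string in the body of the SIP INVITE or INFO for the leg; sending it is outside the scope of this sketch.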
+
+   To create personalized mixes, the client has to identify the
+   relationships among the participants. This is accomplished by
+   manipulating two MSCML objects. These objects are:
+
+   1. The list of team members (<teammate> elements), set using
+      <configure_team>
+
+   2. The mixmode attribute set through <configure_leg>
+
+   The media server uses the values of these objects to determine which
+   audio inputs to combine for output to the participant. In a normal
+   conference, each participant hears the conference mix minus their own
+   input if they are part of the mixed output. The team list enables
+   the client to specify other participants that the leg can hear in
+   addition to the normal mixed output. Note that personalized mix
+   settings apply only to audio media and do not affect video switching.
+
+   Team relationships are implicitly symmetric. If the client sets
+   participant A as a team member of participant B, then the media
+   server automatically sets participant B as a team member for A.
+
+   The id attribute set through <configure_leg> is used to identify the
+   various participants. A unique ID MUST be assigned to each
+   participant included in a personalized mix. The IDs used MUST be
+   unique within the scope of the conference in which they appear.
+
+   By itself, the team list defines only those participants that the leg
+   can hear. The mixmode attribute of each team member determines
+   whether to include their audio input in the personalized mix. If the
+   client sets the teammate's mixmode to private, then that teammate's
+   audio is part of the mix. If the mixmode is set to any other value,
+   it is not.
+
+5.8.1. MSCML Elements and Attributes for Personalized Mixes
+
+   Control of personalized mixes relies on two major MSCML elements:
+
+   1. <configure_leg>, using the mixmode attribute setting
+      mixmode="private"
+
+   2. <configure_team>
+
+   The <configure_team> element allows the user to make the participants
+   members of a team within a specific conference. It is a child of the
+   <configure_leg> parent element.
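The team-list and mixmode rules from Section 5.8 can be modeled in a few lines of Python. This is only a sketch of the stated rules, not media server behavior: the function name audible_legs is hypothetical, and it assumes that only legs with mixmode "full" feed the normal conference mix, ignoring loudest-talker selection and the other mixmode values.

```python
def audible_legs(mixmodes, team_pairs):
    """Map each leg ID to the set of leg IDs whose audio it hears.

    mixmodes:   dict of leg ID -> mixmode string ("full", "private", ...)
    team_pairs: iterable of (leg_a, leg_b) teammate relationships
    """
    # Team relationships are symmetric: adding A to B's team adds B to A's.
    teams = {leg: set() for leg in mixmodes}
    for a, b in team_pairs:
        teams[a].add(b)
        teams[b].add(a)

    # Simplifying assumption: only "full" legs feed the normal conference mix.
    conference_mix = {leg for leg, mode in mixmodes.items() if mode == "full"}

    heard = {}
    for leg in mixmodes:
        # Teammates add audio to the personalized mix only when "private".
        private_mates = {m for m in teams[leg] if mixmodes[m] == "private"}
        # A leg never hears its own input.
        heard[leg] = (conference_mix | private_mates) - {leg}
    return heard


if __name__ == "__main__":
    modes = {"supervisor": "private", "agent": "full", "customer": "full"}
    for leg, sources in audible_legs(modes, [("agent", "supervisor")]).items():
        print(leg, "hears", sorted(sources))
```

Run against the coaching scenario of Section 5.8.2, the model gives the supervisor the agent and customer, the agent the customer and supervisor, and the customer only the agent, which agrees with the "Hears" column of Table 2.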
+
+   The client sends the <configure_team> element in a <configure_leg>
+   request in either a SIP INVITE or SIP INFO.
+
+   o  In an INVITE, to join a participant whose properties differ from
+      the properties established for the conference as a whole.
+
+   o  In an INFO, to change the properties for an existing leg.
+
+   The two attributes of the <configure_team> element are "id" and
+   "action". The id attribute MUST contain the unique ID of the leg
+   being modified, as set in the original <configure_leg> request. The
+   action attribute can take on the values "add", "delete", "query", and
+   "set". The default value is "query". This attribute allows the user
+   to modify the team list. Table 1 describes the actions that can be
+   performed on the team list.
+
+   +--------+----------------------------------------------------------+
+   | Action | Description                                              |
+   +--------+----------------------------------------------------------+
+   | add    | Adds a teammate to the mix.                              |
+   | delete | Deletes a teammate from the mix.                         |
+   | query  | Returns the teammate list to the requestor. This is the  |
+   |        | default value.                                           |
+   | set    | Creates a team list when followed by <teammate>          |
+   |        | elements. It also removes all the teammates from the     |
+   |        | team list, for example, when the creator (originator)    |
+   |        | of the team list on that specific conference leg wants   |
+   |        | to remove all of the teammates from the team. If the set |
+   |        | operation removes all teammates from a participant, that |
+   |        | participant hears the full conference mix.               |
+   +--------+----------------------------------------------------------+
+
+                     Table 1: Configure Team Actions
+
+5.8.2. Example Usage of Personalized Mixes
+
+   A common use of personalized mixing is to support coaching of one
+   participant by another. The coaching scenario includes three
+   participants:
+   1. The Supervisor, who coaches the agent.
+   2. The Agent, who interacts with the customer.
+   3. The Customer, who interacts with the agent.
+
+   Table 2 illustrates the details of the coached conference topology.
+
+   +-------------+------------+------------+---------+-----------------+
+   | Participant | ID         | Team       | Mixmode | Hears           |
+   |             |            | Members    |         |                 |
+   +-------------+------------+------------+---------+-----------------+
+   | Supervisor  | supervisor | Agent      | Private | customer +      |
+   |             |            |            |         | agent           |
+   | Agent       | agent      | Supervisor | Full    | customer +      |
+   |             |            |            |         | supervisor      |
+   | Customer    | customer   | none       | Full    | agent           |
+   +-------------+------------+------------+---------+-----------------+
+
+                   Table 2: Coached Conference Example
+
+   To create this topology, the client performs the following actions:
+
+   1. The client joins each leg to the conference, being certain to
+      include a unique ID in the <configure_leg> request. The leg ID
+      needs to be unique only within the scope of the conference to
+      which it belongs.
+
+   2. The client configures the teammate list and mixmode of each
+      participant, as required.
+
+   Both actions (steps 1 and 2) may be combined in a single MSCML
+   request. The following sections detail these actions and their
+   corresponding MSCML payloads.
+
+5.8.2.1. Create the Conference
+
+   Before joining any participants, the client must create the
+   conference by sending a SIP INVITE that contains an MSCML
+   <configure_conference> request with a unique conference identifier.
+
+5.8.2.2. Joining and Configuring the Coach
+
+   Join the coach leg to the conference and configure its desired
+   properties by sending a SIP INVITE containing a <configure_leg>
+   request. The <configure_leg> element sets the leg's unique ID to
+   "supervisor" and its mixmode to private.
+
+   The corresponding MSCML request is as follows.
+
+   Figure 11: Join Coach Request
+
+   Note that the client cannot configure the teammate list for the coach
+   yet, as there are no other participants in the conference.
+   A participant must be joined to the conference before it can be
+   added as a teammate for another leg.
+
+5.8.2.3. Joining and Configuring the Agent
+
+   Join the agent leg to the conference and configure its desired
+   properties by sending a SIP INVITE containing a <configure_leg>
+   request. The <configure_leg> element sets the leg's unique ID to
+   "agent" and sets the supervisor as a team member of the agent.
+   Because team member relationships are symmetric, this action also
+   adds the agent as a team member for the coach.
+
+   The corresponding MSCML request is as follows.
+
+   Figure 12: Join Agent Request
+
+   Because the desired mixmode for this leg is full, which is the
+   default value, there is no need to set it explicitly.
+
+5.8.2.4. Joining and Configuring the Client
+
+   Join the customer leg to the conference and configure its desired
+   properties by sending a SIP INVITE containing a <configure_leg>
+   request. The <configure_leg> element simply sets the leg's unique ID
+   to "customer". The media server does not need further configuration
+   because the desired mixmode, full, is the default and the customer
+   has no team members.
+
+   The corresponding MSCML request is as follows.
+
+   Figure 13: Join Client Request
+
+   Strictly speaking, it is not a requirement that the client give the
+   customer leg a unique ID because it will not be a team member.
+   However, when using coached conferencing, it is RECOMMENDED that the
+   client assign a unique ID to each leg in the initial INVITE request.
+   Assigning a unique ID eliminates the need to set it later by sending
+   a SIP INFO if personalized mixing is later desired for the customer
+   leg.
+
+   The conference is now in the desired configuration, shown previously
+   in Table 2.
+
+6. Interactive Voice Response (IVR)
+
+   In the IVR model, the media server acts as a media-processing proxy
+   for the UAC. This is particularly useful when the UAC is a media
+   gateway or other device with limited media processing capability.
+
+   The typical use case for MSCML is when there is an application server
+   that is the MSCML client. The client can use the SIP Service URI
+   concept (RFC 3087) to initiate a service. The client then uses RFC
+   4240 [2] to initiate an MSCML session on a media server. These
+   relationships are shown in Figure 14.
+
+                      SIP       +--------------+
+                  Service URI   | Application  |
+               /----------------|    Server    |
+              /(e.g., RFC 3087) +--------------+
+             /                         | MSCML
+            /                     SIP  | Session
+           /                    +--------------+
+   +-----+/        RTP          |              |
+   | UAC |======================| Media Server |
+   +-----+                      |              |
+                                +--------------+
+
+                          Figure 14: IVR Model
+
+   The IVR service supports basic Interactive Voice Response functions
+   (playing announcements, collecting DTMF digits, and recording) based
+   on Media Server Control Markup Language (MSCML) directives added to
+   the message body of a SIP request. The major MSCML IVR requests are
+   <play>, <playcollect>, and <playrecord>.
+
+   Multifunction media servers MUST use the URI conventions described in
+   RFC 4240 [2]. The service indicator for MSCML IVR MUST be set to
+   "ivr", as shown in the following example:
+
+      sip:ivr@ms.example.net
+
+   The VoiceXML IVR service indicator is "dialog". This service
+   indicator MUST NOT be used for any other interactive voice response
+   control mechanism.
+
+   The media server MUST accept MSCML IVR payloads in INFO requests and
+   MUST NOT accept MSCML IVR payloads in the initial or subsequent
+   INVITEs. The INFO method reduces certain timing issues that occur
+   with INVITEs and requires less processing on both the client and
+   media server.
+
+   The media server notifies the client that the command has completed
+   through a <response> message containing final status information and
+   associated data such as collected DTMF digits.
+
+   The media server does not queue IVR requests. If the media server
+   receives a new IVR request while another is in progress, the media
+   server stops the first operation and carries out the new request.
+   The media server generates a <response> message for the first request
+   and returns any data collected up to that point. If a client wishes
+   to stop a request in progress but does not wish to initiate another
+   operation, it issues a <stop> request. This also causes the media
+   server to generate a <response> message.
+
+   The media server treats a SIP re-INVITE that modifies the established
+   SDP parameters as an implicit <stop> request. Examples of such SDP
+   modifications include receiving hold SDP or removing an audio or
+   video stream. When this occurs, the media server immediately
+   terminates the running <play>, <playcollect>, or <playrecord> request
+   and sends a <response> indicating "reason=stopped".
+
+6.1. Specifying Prompt Content
+
+   The MSCML IVR requests support two methods of specifying content to
+   be delivered to the user. These are the <prompt> element and the
+   prompturl attribute. Clients MUST NOT utilize both methods in a
+   single IVR request. Clients SHOULD use the more flexible <prompt>
+   mechanism. Use of the prompturl attribute is deprecated and may not
+   be supported in future MSCML versions.
+
+6.1.1. Use of the Prompt Element
+
+   The <prompt> element MAY be included in the body of a <play>,
+   <playcollect>, or <playrecord> request to specify a prompt sequence
+   to be delivered to the caller. The prompt sequence consists of one
+   or more references to physical content files, spoken variables, or
+   dynamic URLs that return a sub-sequence of files or variables. In
+   addition, the <prompt> element has several attributes that control
+   playback of the included content. These are described in the list
+   below.
+
+   Attributes of <prompt>:
+
+   o  baseurl - optional, no default value: For notational convenience,
+      and to reduce the MSCML payload size, the "baseurl" attribute is
+      used to specify a base URL that is prepended to any other URLs in
+      the sequence that are not fully qualified.
+
+   o  delay - optional, default value "0": The "delay" attribute to the
+      prompt element specifies the time to pause between repetitions of
+      the sequence. It has no effect on the first iteration of the
+      sequence. Expressed as a time value (Section 4.2.1) from 0
+      onwards.
+
+   o  duration - optional, default value "infinite": The "duration"
+      attribute to the prompt element controls the maximum amount of
+      time that may elapse while the media server repeats the sequence.
+      This allows the client to set an upper bound on the length of
+      play. Expressed as a time value (Section 4.2.1) from 1ms onwards
+      or the strings "immediate" and "infinite". "Immediate" directs
+      the media server to end play immediately, whereas "infinite"
+      indicates that the media server imposes no limit.
+
+   o  gain - optional, default value "0": Sets the absolute gain to be
+      applied to the content contained in <prompt>. The value of this
+      attribute is specified in units of dB. The media server MAY
+      silently cap values that exceed the gain limits imposed by the
+      platform. The level reverts back to its original value when
+      playback of the content contained in <prompt> has been completed.
+
+   o  gaindelta - optional, default value "0": Sets the relative gain to
+      be applied to the content contained in <prompt>. The value of
+      this attribute is specified in units of dB. The media server MAY
+      silently cap values that exceed the gain limits imposed by the
+      platform. The level reverts back to its original value when
+      playback of the content contained in <prompt> has been completed.
+
+   o  rate - optional, default value "0": Specifies the absolute
+      playback rate of the content relative to normal as either a
+      positive percentage (faster) or a negative percentage (slower).
+      Any value that attempts to set the rate above the maximum allowed
+      or below the minimum allowed silently sets the rate to the maximum
+      or minimum. The rate reverts back to its original value when
+      playback of the content contained in <prompt> has been completed.
+
+   o  ratedelta - optional, default value "0": Specifies the playback
+      rate of the content relative to its current rate as either a
+      positive percentage (faster) or a negative percentage (slower).
+      Any value that attempts to set the rate above the maximum allowed
+      or below the minimum allowed silently sets the rate to the maximum
+      or minimum. The rate reverts back to its original value when
+      playback of the content contained in <prompt> has been completed.
+
+   o  locale - optional, no default value: Specifies the language and
+      country variant used for resolving spoken variables. The language
+      is defined as a two-letter code per ISO 639. The country variant
+      is also defined as a two-letter code per ISO 3166. These codes
+      are concatenated with a single underscore (%x5F) character.
+
+   o  offset - optional, default value "0": A time value (Section 4.2.1)
+      that specifies the time from the beginning of the sequence at
+      which play is to begin. Offset only applies to the first
+      repetition; subsequent repetitions begin play at offset 0.
+      Allowable values are positive time values from 0 onwards. When
+      the sequence consists of multiple content files, the offset may
+      select any point in the sequence. If the offset value is greater
+      than the total time of the sequence, it will "wrap" to the
+      beginning and continue from there until the media server reaches
+      the specified offset.
+
+   o  repeat - optional, default value "1": The "repeat" attribute to
+      the prompt element controls the number of times the media server
+      plays the sequence in the <prompt> element. Allowable values are
+      integers from 0 on and the string "infinite", which indicates that
+      repetition should occur indefinitely. For example, "repeat=2"
+      means that the sequence will be played twice, and "repeat=0",
+      which is allowed, means that the sequence is not played.
+
+   o  stoponerror - optional, default value "no": Controls media server
+      handling and reporting of errors encountered when retrieving
+      remote content. If set to "yes", content play will end if a fetch
+      error occurs, and the response will contain details regarding the
+      failure. If set to "no", the media server will silently move on
+      to the next URL in the sequence if a fetch failure occurs.
+
+   Clients MUST NOT include both 'gain' and 'gaindelta' attributes
+   within a single <prompt> element.
+
+   When the client explicitly controls the output gain on a conference
+   leg, as described in Section 5.3, the 'gain' and 'gaindelta'
+   attributes SHOULD interact with the conference leg output gain
+   settings in the following manner.
+
+   o  Conference leg output gain set to <fixed>: The operation of the
+      'gain' and 'gaindelta' attributes is unchanged. However, the
+      baseline gain value before any playback changes are applied is the
+      value specified for the conference leg.
+
+   o  Conference leg output gain set to <auto>: When playback gain
+      controls are used, the automatic gain control settings for the leg
+      are suspended for the duration of the playback operation. The
+      operation of the 'gain' and 'gaindelta' attributes is unchanged.
+      The automatic gain control settings are reinstated when playback
+      has finished.
+
+   Media servers SHOULD support rate controls for content.
+   However, media servers MAY silently ignore rate change requests if
+   content limitations do not allow the request to be honored. Clients
+   MUST NOT include both 'rate' and 'ratedelta' attributes within a
+   single <prompt> element.
+
+   Figure 16 shows a sample prompt block.
+
+   Figure 16: Prompt Block Example
+
+6.1.1.1.