diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc5707.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc5707.txt')
-rw-r--r-- | doc/rfc/rfc5707.txt | 10307 |
1 files changed, 10307 insertions, 0 deletions
diff --git a/doc/rfc/rfc5707.txt b/doc/rfc/rfc5707.txt new file mode 100644 index 0000000..6265c1d --- /dev/null +++ b/doc/rfc/rfc5707.txt @@ -0,0 +1,10307 @@ + + + + + + +Independent Submission A. Saleem +Request for Comments: 5707 Y. Xin +Category: Informational RadiSys +ISSN: 2070-1721 G. Sharratt + Consultant + February 2010 + + + Media Server Markup Language (MSML) + +Abstract + + The Media Server Markup Language (MSML) is used to control and invoke + many different types of services on IP media servers. The MSML + control interface was initially driven by RadiSys with subsequent + significant contributions from Intel, Dialogic, and others in the + industry. Clients can use it to define how multimedia sessions + interact on a media server and to apply services to individuals or + groups of users. MSML can be used, for example, to control media + server conferencing features such as video layout and audio mixing, + create sidebar conferences or personal mixes, and set the properties + of media streams. As well, clients can use MSML to define media + processing dialogs, which may be used as parts of application + interactions with users or conferences. Transformation of media + streams to and from users or conferences as well as interactive voice + response (IVR) dialogs are examples of such interactions, which are + specified using MSML. MSML clients may also invoke dialogs with + individual users or with groups of conference participants using + VoiceXML. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This is a contribution to the RFC Series, independently of any other + RFC stream. The RFC Editor has chosen to publish this document at + its discretion and makes no statement about its value for + implementation or deployment. Documents approved for publication by + the RFC Editor are not a candidate for any level of Internet + Standard; see Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc5707. + + + + + + +Saleem, et al. Informational [Page 1] + +RFC 5707 Media Server Markup Language February 2010 + + +IESG Note + + This RFC is not a candidate for any level of Internet Standard. The + IETF disclaims any knowledge of the fitness of this RFC for any + purpose and in particular notes that the decision to publish is not + based on IETF review for such things as security, congestion control, + or inappropriate interaction with deployed protocols. The RFC Editor + has chosen to publish this document at its discretion. Readers of + this document should exercise caution in evaluating its value for + implementation and deployment. See RFC 3932 for more information. + +Copyright Notice + + Copyright (c) 2010 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. + +Table of Contents + + 1. Introduction ....................................................4 + 2. Glossary ........................................................5 + 3. MSML SIP Usage ..................................................6 + 3.1. SIP INFO ...................................................7 + 3.2. SIP Control Framework ......................................8 + 4. Language Structure .............................................15 + 4.1. Package Scheme ............................................15 + 4.2. Profile Scheme ............................................18 + 5. Execution Flow .................................................19 + 6. Media Server Object Model ......................................21 + 6.1. Objects ...................................................21 + 6.2. Identifiers ...............................................23 + 7. MSML Core Package ..............................................26 + 7.1. <msml> ....................................................26 + 7.2. <send> ....................................................26 + 7.3. <result> ..................................................27 + 7.4. <event> ...................................................27 + 8. MSML Conference Core Package ...................................28 + 8.1. Conferences ...............................................28 + 8.2. Media Streams .............................................29 + 8.3. <createconference> ........................................31 + 8.4. <modifyconference> ........................................33 + 8.5. <destroyconference> .......................................34 + + + +Saleem, et al. Informational [Page 2] + +RFC 5707 Media Server Markup Language February 2010 + + + 8.6. <audiomix> ................................................35 + 8.7. <videolayout> .............................................36 + 8.8. <join> ....................................................43 + 8.9. <modifystream> ............................................45 + 8.10. <unjoin> .................................................46 + 8.11. <monitor> ................................................47 + 8.12. <stream> .................................................47 + 9. MSML Dialog Packages ...........................................51 + 9.1. Overview ..................................................51 + 9.2. Primitives ................................................53 + 9.3. Events ....................................................55 + 9.4. MSML Dialog Usage with SIP ................................56 + 9.5. MSML Dialog Structure and Modularity ......................57 + 9.6. MSML Dialog Core Package ..................................58 + 9.7. MSML Dialog Base Package ..................................63 + 9.8. MSML Dialog Group Package .................................81 + 9.9. MSML Dialog Transform Package .............................85 + 9.10. MSML Dialog Speech Package ...............................88 + 9.11. MSML Dialog Fax Detection Package ........................92 + 9.12. MSML Dialog Fax Send/Receive Package .....................93 + 10. MSML Audit Package ...........................................100 + 10.1. MSML Audit Core Package .................................100 + 10.2. MSML Audit Conference Package ...........................102 + 10.3. MSML Audit Connection Package ...........................106 + 10.4. MSML Audit Dialog Package ...............................108 + 10.5. MSML Audit Stream Package ...............................110 + 11. Response Codes ...............................................111 + 12. MSML Conference Examples .....................................113 + 12.1. Establishing a Dial-In Conference .......................113 + 12.2. Example of a Sidebar Audio Conference ...................117 + 12.3. Example of Removing a Conference ........................118 + 12.4. Example of Modifying Video Layout .......................118 + 13. MSML Dialog Examples .........................................120 + 13.1. Announcement ............................................120 + 13.2. Voice Mail Retrieval ....................................120 + 13.3. Play and Record .........................................122 + 13.4. Speech Recognition ......................................125 + 13.5. Play and Collect ........................................125 + 13.6. User Controlled Gain ....................................128 + 14. MSML Audit Examples ..........................................128 + 14.1. Audit All Conferences ...................................128 + 14.2. Audit Conference Dialogs ................................129 + 14.3. Audit Conference Streams ................................130 + 14.4. Audit All Connections ...................................131 + 14.5. Audit Connection Dialogs ................................131 + 14.6. Audit Connection Streams ................................132 + 14.7. Audit Connection with Selective States ..................133 + 15. Future Work ..................................................134 + + + +Saleem, et al. Informational [Page 3] + +RFC 5707 Media Server Markup Language February 2010 + + + 16. XML Schema ...................................................134 + 16.1. MSML Core ...............................................136 + 16.2. MSML Conference Core Package ............................140 + 16.3. MSML Dialog Packages ....................................148 + 16.4. MSML Audit Packages .....................................170 + 17. Security Considerations ......................................176 + 18. IANA Considerations ..........................................176 + 18.1. IANA Registrations for 'application' MIME Media Type ....176 + 18.2. IANA Registrations for 'text' MIME Media Type ...........178 + 18.3. URN Sub-Namespace Registration ..........................179 + 18.4. XML Schema Registration .................................180 + 19. References ...................................................181 + 19.1. Normative References ....................................181 + 19.2. Informative References ..................................182 + Acknowledgments ..................................................183 + +1. Introduction + + Media servers contain dynamic pools of media resources. Control + agents and other users of media servers (called media server clients) + can define and create many different services based on how they + configure and use those resources. Often, that configuration and the + ways in which those resources interact will be changed dynamically + over the course of a call, to reflect changes in the way that an + application interacts with a user. + + For example, a call may undergo an initial IVR dialog before being + placed into a conference. Calls may be moved from a main conference + to a sidebar conference and then back again. Individual calls may be + directly bridged to create small n-way calls or simple sidebars. + None of these change the SIP [n1] dialog or RTP [i3] session. Yet + these do affect the media flow and processing internal to the media + server. + + The Media Server Markup Language (MSML) is an XML [n2] language used + to control the flow of media streams and services applied to media + streams within a media server. It is used to invoke many different + types of services on individual sessions, groups of sessions, and + conferences. MSML allows the creation of conferences, bridging + different sessions together, and bridging sessions into conferences. + + MSML may also be used to create user interaction dialogs and allows + the application of media transforms to media streams. Media + interaction dialogs created using MSML allow construction of IVR + dialog sessions to individual users as well as to groups of users + participating in a conference. Dialogs may also be specified using + other languages, VoiceXML [n5], which support complete single-party + application logic to be executed on the media server. + + + +Saleem, et al. Informational [Page 4] + +RFC 5707 Media Server Markup Language February 2010 + + + MSML is a transport independent language, such that it does not rely + on underlying transport mechanisms and language semantics are + independent of transport. However, SIP is a typical and commonly + used transport mechanism for MSML, invoked using the SIP URI scheme. + This specification defines using MSML dialogs using SIP as the + transport mechanism. + + A network connection may be established with the media server using + SIP. Media received and transmitted on that connection will flow + through different media resources on the media server depending on + the requested service. Basic Network Media Services with SIP [n7] + defines conventions for associating a basic service with a SIP + Request-URI. MSML allows services to be dynamically applied and + changed by a control agent during the lifetime of the SIP dialog. + + MSML has been designed to address the control and manipulation of + media processing operations (e.g., announcement, IVR, play and + record, automatic speech recognition (ASR), text to speech (TTS), + fax, video), as well as control and relationships of media streams + (e.g., simple and advanced conferencing). It provides a general- + purpose media server control architecture. MSML can additionally be + used to invoke other more complex IVR languages such as VoiceXML. + + The MSML control interface has been widely deployed in the industry, + with numerous client-side and server-side implementations, since + 2003. The in-service commercial deployments cover a wide variety of + applications including, but not limited to, IP multimedia + conferencing, network voice services, IVR, IVVR (interactive voice + and video response), and voice/video mail. + +2. Glossary + + Media Server: a general-purpose platform for executing real-time + media processing tasks. This is a logical function that maps either + to a single physical device or to a portion of a physical device. + + Media Server Client: an application that originates MSML requests to + a media server and also referred to as a control agent in this + specification. + + Network Connection: a participant that represents the termination on + a media server of one or more RTP [i3] sessions (for example, audio + and video) associated with a call. Network connections are + established and removed using a session establishment protocol such + as SIP. An instance of a network connection is independent of MSML + processing instructions applied to it. + + + + + +Saleem, et al. Informational [Page 5] + +RFC 5707 Media Server Markup Language February 2010 + + + Dialog: an automated IVR participant. Examples of dialogs may be + announcement players, IVR interfaces, or voice recorders. Dialogs + may be defined in MSML or using VoiceXML [n5]. + + Conference: an intermediary function that provides multimedia mixing + and other advanced conferencing services. This specification + currently considers conferences with audio and/or video media types, + but is extensible to other media types. + + Identifier: a name that is used to refer to a specific instance of an + object on the media server, such as a conference or a dialog. + Identifiers are composed of one or more terms where each term + identifies an object class and instance. + + Object: the generic term for a media server entity that terminates, + originates, or processes media. This specification defines four + classes of objects and specifies mechanisms to create them, join them + together, and destroy them. + + Participant Object: an object in a media server that sources original + media in a call and/or receives and terminates media in a call. + + Intermediary Object: an object in a media server that acts on media + within a call for the benefit of the participants. + + Independent Object: an object that can exist on a media server + independent of other objects. + + Operator: an intermediary transformer that modifies or transforms a + media stream. Examples of operators may be audio gain controls, + video scaling, or voice masking. MSML defines operators as media + transform objects, which transform media using operations such as + gain control, when applied to media streams. + + Media Stream: a single media flow between two objects. A media + stream has a media type and may be unidirectional or bidirectional. + +3. MSML SIP Usage + + SIP is used to create and modify media sessions with a media server + according to the procedures defined in RFC 3261 [n1]. Often, SIP + third party call control [i4] will be used to create sessions to a + media server on behalf of end users. MSML is used to define and + change the service that a user connected to a media server will + receive. MSML clients are application servers, soft-switches, or + other forms of control agents, and SHOULD have an authorized security + relationship with the media server. MSML itself does not define + authorization mechanisms. + + + +Saleem, et al. Informational [Page 6] + +RFC 5707 Media Server Markup Language February 2010 + + + MSML transactions are originated based upon events that occur in the + application domain. These events may be independent from any media + or user interaction. For example, an application may wish to play an + announcement to a conference warning that its scheduled completion + time is approaching. Applications themselves are structured in many + different ways. Their structure and requirements contribute to their + selection of protocols and languages. To accommodate differing + application needs, MSML has been designed to be neutral to other + languages and independent of the transport used to carry it. + + MSML is purposely designed to be transport independent. In this + release of the specification, SIP INFO [i5] and SIP Control Framework + [i11] have been chosen for transport mechanisms for MSML, as + described in the following sections. + +3.1. SIP INFO + + SIP INVITE and INFO [i5] requests and responses MAY be used to carry + MSML. INFO requests allow asynchronous mid-call messages within SIP + with few additional semantics. In addition, there are existing + widely deployed implementations of that method, it aids in initial + developments that are closely coupled with SIP session establishment, + and it allows MSML to be directly associated with user dialogs when + third party call control is used. + + Although INFO is sometimes considered not to be a suitable general- + purpose transport mechanism for messages within SIP, there have been + proposals to make it more acceptable. MSML may evolve to include + other SIP usage and/or to work with other protocols or as a stand- + alone protocol established through SIP, in future releases of this + document. + + MSML supports several models for client interaction. When clients + use 3PCC to establish media sessions on behalf of end users, clients + will have a SIP dialog for each media session. MSML MAY be sent on + these dialogs. However the targets of MSML actions are not inferred + from the session associated with the SIP dialog. The targets of MSML + actions are always explicitly specified using identifiers as + previously defined. + + An application, after interacting with a user, may want to affect + multiple objects within a media server. For example, tones or + messages are often played to a conference when connections are added + or removed. A separate message may also be played to a participant + as they are joined, or to moderators. Explicit identifiers, that is, + not inferred from a transport mechanism, allow these multiple actions + to be easily grouped into a single transaction sent on any SIP + dialog. + + + +Saleem, et al. Informational [Page 7] + +RFC 5707 Media Server Markup Language February 2010 + + + MSML also supports a model of dedicated control associations. This + supports decoupled application architectures where a client can + control media server services without also establishing all of the + media sessions itself. Control associations are created using SIP, + but they do not have any associated media session. Although + initially INFO messages will be sent on this SIP dialog, just as with + dialogs associated with media sessions, it is possible that in the + future, the SIP dialog will be used to establish a separate control + session (defined in SDP [n9]) that does not use SIP as the transport + for MSML messages. + + A media server using MSML also sends asynchronous events to a client + using MSML scripts in SIP INFO. Events are sent based on previous + MSML requests and are sent within the SIP dialog on which the MSML + request that caused the event to be generated was received. If this + dialog no longer exists when the event is generated, the event is + discarded. + + Events may be generated during the execution of a dialog created by a + <dialogstart> element. For example, dialogs can send events based on + user input. VoiceXML dialogs, on the other hand, generally interact + with other servers outside of MSML using HTTP. + + An event is also generated when the execution of a dialog terminates, + because of either completion or failure. The exact information + returned is dependent on the dialog language, the capabilities of the + dialog execution environment, and what was requested by the dialog. + Both MSML and VoiceXML [n5] allow information to be returned when + they exit. These events may be sent in a SIP INFO or a SIP BYE. SIP + BYE is used when the dialog itself specifies that the connection + should be disconnected, for example, through the use of the + <disconnect> element. + + Conferences may also generate events based upon their configuration. + An example of this is the notification of the set of active speakers. + +3.2. SIP Control Framework + + The SIP Control Framework [i11] MAY be used as a transport mechanism + for MSML. + + The Control Framework provides a generic approach for establishment + and reporting capabilities of remotely initiated commands. The + framework utilizes many functions provided by the Session Initiation + Protocol (SIP) [n1] for the rendezvous and establishment of a + reliable channel for control interactions. Compared to SIP INFO, the + + + + + +Saleem, et al. Informational [Page 8] + +RFC 5707 Media Server Markup Language February 2010 + + + SIP Control Framework is a more general-purpose transport mechanism + and one that is not constrained by limitations of the SIP INFO + mechanism. + + The Control Framework also introduces the concept of a Control + Package, which is an explicit usage of the Control Framework for a + particular interaction set. This specification has already specified + a list of packages for MSML to control the media server in many + aspects, including basic dialog, advanced conferencing, advanced + dialog, and audit service. Each of these packages has a unique + Control Package name assigned in order for MSML to be used with the + Control Framework. + + This section fulfills the mandatory requirement for information that + MUST be specified during the definition of a Control Framework + Package, as detailed in SIP Control Framework [i11]. + +3.2.1. Control Framework Package Names + + The Control Framework [i11] requires a Control Package definition to + specify and register a unique name. + + MSML specification defines Control Package names using a hierarchical + scheme to indicate the inherited relationship across packages. For + example, package "msml-x" is derived from package "msml", and package + "msml-x-y" is derived from package "msml-x". + + The following is a list of Control Package names reserved by the MSML + specification. + + "msml": this Control Package supports MSML Core Package as specified + in section 7. + + "msml-conf": this Control Package supports MSML Conference Core + Package as specified in section 8. + + "msml-dialog": this Control Package supports MSML Dialog Core Package + as specified in section 9.6. + + "msml-dialog-base": this Control Package supports MSML Dialog Base + Package as specified in section 9.7. + + "msml-dialog-group": this Control Package supports MSML Dialog Group + Package as specified in section 9.8. + + "msml-dialog-transform": this Control Package supports MSML Dialog + Transform Package as specified in section 9.9. + + + + +Saleem, et al. Informational [Page 9] + +RFC 5707 Media Server Markup Language February 2010 + + + "msml-dialog-speech": this Control Package supports MSML Dialog + Speech Package as specified in section 9.10. + + "msml-dialog-fax-detect": this Control Package supports MSML Dialog + Fax Detection Package as specified in section 9.11. + + "msml-dialog-fax-sendrecv": this Control Package supports MSML Dialog + Fax Send/Receive Package as specified in section 9.12. + + "msml-audit": this Control Package supports MSML Audit Core Package + as specified in section 10.1. + + "msml-audit-conf": this Control Package supports MSML Audit + Conference Package as specified in section 10.2. + + "msml-audit-conn": this Control Package supports MSML Audit + Connection Package as specified in section 10.3. + + "msml-audit-dialog": this Control Package supports MSML Audit Dialog + Package as specified in section 10.4. + + "msml-audit-stream": this Control Package supports MSML Audit Stream + Package as specified in section 10.5. + + An application server using the Control Framework as transport for + MSML MUST use one or multiple package names, depending on the service + required from the media server. The package name(s) are identified + in the "Control-Packages" SIP header that is present in the SIP + INVITE dialog request that creates the control channel, as specified + in [i11]. The "Control-Packages" value MAY be re-negotiated via the + SIP re-INVITE mechanism. + +3.2.2. Control Framework Messages + + The usage of CONTROL, response, and REPORT messages, as defined in + [i11], by each Control Package defined in MSML is different and + described separately in the following sections. + + MSML Core Package "msml" + + The application server may send a CONTROL message with a body + of MSML request using the following elements to the MS: + + <msml>: the root element that may contain a list of child + elements that request a specific operation. The child elements + are defined in extended packages (e.g., "msml-conf" and "msml- + dialog"). This element is also the root element that contains + an MSML result and event. + + + +Saleem, et al. Informational [Page 10] + +RFC 5707 Media Server Markup Language February 2010 + + + <send>: sends an event to the specified recipient within the + media server. Specific event types are defined within the + extended packages. + + The media server replies with a response message containing a + MSML result using the following elements: + + <result>: reports the results of an MSML transaction. + + The media server MAY send the MSML event to the application + server, in a REPORT or CONTROL message, using the element + <event>. The actual content of the <event> and which Control + Framework message to use are defined within the extended + packages. + + MSML Conference Core Package "msml-conf" + + This package extends the MSML Core Package to define a + framework for creation, manipulation, and deletion of a + conference. + + The AS can send a CONTROL message with a body of the MSML + request that contains one or multiple conference-related + commands to the MS. The MS then replies with a response + message with a body of the MSML result to indicate whether or + not the request has been fulfilled. + + During the lifetime of a conference, whenever an event occurs, + the media server MAY send CONTROL messages containing MSML + events to notify the application server. The application + server SHOULD reply with a response message with no MSML body + to acknowledge the event has been received. + + This package does NOT use the REPORT message. + + Dialog Core Package "msml-dialog" + + This package extends the MSML Core Package to define the + structural framework and abstractions for MSML dialogs. + + The application server MAY send CONTROL messages containing a + MSML request using the following elements: + + <dialogstart>: instantiate an MSML media dialog on a connection + or a conference. + + <dialogend>: terminates an MSML dialog. + + + + +Saleem, et al. Informational [Page 11] + +RFC 5707 Media Server Markup Language February 2010 + + + <send>: sends an event and an optional namelist to the dialog, + dialog group, or dialog primitive. + + <exit>: used by the dialog description language to cause the + execution of the MSML dialog to terminate. + + For the <dialogstart> command, the response message MUST + contain an MSML result that indicates that the dialog has been + started successfully. The MSML result MAY contain <dialogid> + to return the dialog identifier, if the identifier was assigned + by the media server. Subsequently, zero or more MSML events + MAY be initiated by the media server in (update) REPORT + messages to report information gathered during the dialog. + Finally, an MSML event "msml.dialog.exit" SHOULD be generated + in a (terminate) REPORT message when the dialog terminates + (e.g., MSML execution of <exit>). + + For the <dialogend> and <send> commands, the response message + contains the final MSML result that indicates that the request + has either been fulfilled or rejected. + + Dialog Base Package "msml-dialog-base" + + This package extends the MSML Dialog Core Package to define a + set of base functionality for MSML dialogs. The extension + defines individual media primitives, including <play>, + <dtmfgen>, <tonegen>, <record>, <dtmf> and <collect>, to be + used as child element of <dialogstart>. This package does not + change the framework message usage as defined by the MSML + Dialog Core Package. + + Dialog Transform Package "msml-dialog-transform" + + This package extends the MSML Dialog Core Package to define a + set of transform primitives that works as filter on half-duplex + media streams. The extension defines transform primitives, + including <vad>, <gain>, <agc>, <gate>, <clamp> and <relay>, + that MAY be used as child elements of <dialogstart>. This + package does not change the framework message usage as defined + by the MSML Dialog Core Package. + + Dialog Group Package "msml-dialog-group" + + This package extends the MSML Dialog Core, Base, and Transform + Packages to define a single control flow construct that + specifies concurrent execution of multiple media primitives. + The extension defines the <group> element that MAY be used as a + child element of <dialogstart> to enclose multiple media + + + +Saleem, et al. Informational [Page 12] + +RFC 5707 Media Server Markup Language February 2010 + + + primitives, such that they can be executed concurrently. This + package does not change the framework message usage as defined + by the MSML Dialog Core Package. + + Dialog Speech Package "msml-dialog-speech" + + This package extends the MSML Dialog Core and MSML Base Package + to define functionality that MAY be used for automatic speech + recognition and text to speech. The extension extends the + <dialogstart> and the <play> elements. + + For <dialogstart>, it defines a new child element <speech> to + activate grammars or user input rules associated with speech + recognition. For <play>, it defines a new child element <tts> + to initiate the text-to-speech service. + + This package does not change the framework message usage as + defined by the MSML Dialog Core Package. + + Dialog Fax Detection Package "msml-dialog-fax-detect" + + This package extends the MSML Dialog Core Package to define + primitives provide fax detection service. The extension + defines a primitive <faxdetect> to be used as a child element + of <dialogstart>. This package does not change the framework + message usage as defined by the MSML Dialog Core Package. + + Dialog Fax Send/Receive Package "msml-dialog-fax-sendrecv" + + This package extends the MSML Dialog Core Package to define + primitives that allow a media server to provide fax send or + receive service. The extension defines new primitives + <faxsend> and <faxrcv>, to be used as a child element of + <dialogstart>. This package does not change the framework + message usage as defined by the MSML Dialog Core Package. + + Dialog Audit Core Package "msml-audit" + + This package extends the MSML Core Package to define a + framework for auditing media resource(s) allocated on the media + server. + + This package follows a simple request/response transaction, + allowing the application server to send CONTROL messages + containing MSML <audit> requests. The media server MUST reply + with a response message containing the result. The result is + contained within the <auditresult> element, returning the + queried state information. + + + +Saleem, et al. Informational [Page 13] + +RFC 5707 Media Server Markup Language February 2010 + + + This package does NOT use the REPORT message. + + Dialog Audit Conference Package "msml-audit-conf" + + This package extends the MSML Audit Core Package to define + conference specific states that MAY be queried via the <audit> + command and the corresponding response MUST be returned by the + <auditresult> element. This package does not change the + framework message usage as defined by the MSML Audit Core + Package. + + Dialog Audit Connection Package "msml-audit-conn" + + This package extends the MSML Audit Core Package to define + connection specific states that MAY be queried via the <audit> + command and the corresponding response MUST be returned by the + <auditresult> element. This package does not change the + framework message usage as defined by the MSML Audit Core + Package. + + Dialog Audit Dialog Package "msml-audit-dialog" + + This package extends the MSML Audit Core Package to define + dialog specific states that MAY be queried via the <audit> + command and the corresponding response MUST be returned by the + <auditresult> element. This package does not change the + framework message usage as defined by the MSML Audit Core + Package. + + Dialog Audit Stream Package "msml-audit-stream" + + This package extends the MSML Audit Core Package to define + stream specific states that MAY be queried via the <audit> + command and the corresponding response MUST returned by the + <auditresult> element. This package does not change the + framework message usage as defined by the MSML Audit Core + Package. + +3.2.3. Common XML Support + + The XML schema described in [i11] MUST be supported by all Control + Packages defined by MSML. However, the "connection-id" value MUST be + constructed as defined by MSML (i.e., the identifier MUST contain a + local dialog tag only, while the SIP Control Framework [i11] requires + that the "connection-id" contain both local and remote dialog tags). + + + + + + +Saleem, et al. Informational [Page 14] + +RFC 5707 Media Server Markup Language February 2010 + + +3.2.4. Control Message Body + + A valid CONTROL body message MUST conform to the MSML schema, as + included in this specification, for the MSML package(s) used. + +3.2.5. REPORT Message Body + + A valid REPORT body message MUST conform to the MSML schema, as + included in this specification, for the MSML package(s) used. + +4. Language Structure + +4.1. Package Scheme + + The primary mechanism for extending MSML is the "package". A package + is an integrated set of one or more XML schemas that define + additional features and functions via new or extended use of elements + and attributes. Each package, except for those defined in the + current document, is defined in a separate standards document, e.g., + an Internet Draft or an RFC. All packages that extend the base MSML + functionality MUST include references to the MSML base set of schemas + provided in the Internet Drafts. A schema in a package MUST only + extend MSML; that is, it must not alter the existing specification. + + A particular MSML script will include references to all the schemas + defining the packages whose elements and attributes it makes use of. + A particular script MUST reference MSML base and optionally extension + package(s). See the IANA Considerations section. + + Each package MUST define its own namespace so that elements or + attributes with the same name in different packages do not conflict. + A script using a particular element or attribute MUST prefix the + namespace name on that element or attribute's name if it is defined + in a package (as opposed to being defined in the base). + + MSML consists of a core package that provides structure without + support for any specific feature set. Additional packages, relying + on the core package, provide functional features. Any combination of + additional packages may be used along with the core package. The + following describes the set of MSML packages defined in this + document. + + + + + + + + + + +Saleem, et al. Informational [Page 15] + +RFC 5707 Media Server Markup Language February 2010 + + + +--------------------------------------------------------+ + | MSML Core | + +--------------------------------------------------------+ + / \ \ + +--------+ +--------+ +-------+ + | Dialog | | Conf | | Audit | + | Core | | Core | | Core | + +--------+ +--------+ +-------+ + ________ \_______________________________________ | + ------------------------------------------------ | + / \ \ \ \ \ | + +------+ +---------+ +------+ +------+ +------+ +-------+ | + |Dialog| |Dialog | |Dialog| |Dialog| |Dialog| |Dialog | | + |Base | |Transform| |Group | |Speech| |Fax | |Fax | | + +------+ +---------+ +------+ +------+ |Detect| |Send/ | | + +------+ |Receive| | + +-------+ | + ________________________| + ------------------------- + / \ \ \ + +-----+ +-----+ +------+ +------+ + |Audit| |Audit| |Audit | |Audit | + |Conf | |Conn | |Dialog| |Stream| + +-----+ +-----+ +------+ +------+ + + + o MSML Core Package (Mandatory) + + Describes the minimum base framework that MUST be implemented to + support additional core packages. + + o MSML Conference Core Package (Conditionally Mandatory, for + Conferencing) + + Describes the audio and multimedia basic and advanced conferencing + package that MAY be implemented. + + o MSML Dialog Core Package (Conditionally Mandatory, for Dialogs) + + Describes the dialog core package that MUST be implemented for any + dialog services. However, systems supporting conferencing only, + MAY omit support for MSML dialogs. The MSML Dialog Core Package + specifies the framework within which additional dialog packages + are supported. The MSML Dialog Base Package MUST be supported, + while all other dialog packages MAY be supported. + + o MSML Dialog Base Package (Conditionally Mandatory, for Dialogs) + + + + +Saleem, et al. Informational [Page 16] + +RFC 5707 Media Server Markup Language February 2010 + + + o MSML Dialog Group Package (Optional) + + o MSML Dialog Transform Package (Optional) + + o MSML Dialog Fax Detection Package (Optional) + + o MSML Dialog Fax Send/Receive Package (Optional) + + o MSML Dialog Speech Package (Optional) + + o MSML Audit Core Package (Conditionally Mandatory, for Auditing) + + Describes the audit core package that MUST be implemented to + support auditing services. The MSML audit core package specifies + the framework within which additional audit packages are + supported. + + o MSML Audit Conference Package (Conditionally Mandatory, for + Auditing Conference, Conference Dialog, and Conference Stream) + + o MSML Audit Connection Package (Conditionally Mandatory, for + Auditing Connection, Connection Dialog, and Connection Stream) + + o MSML Audit Dialog Package (Conditionally Mandatory, for Auditing + Dialog, and MUST be used with either MSML Audit Conference + Package or MSML Audit Connection Package) + + o MSML Audit Stream Package (Conditionally Mandatory, for Auditing + Stream, and MUST be used with either MSML Audit Conference + Package or MSML Audit Connection Package) + + The formal process for defining extensions to MSML dialogs is to + define a new package. The new package MUST provide a text + description of what extensions are included and how they work. It + MUST also define an XML schema file (if applicable) that defines the + new package (which may be through extension, restriction of an + existing package, or a specific profile of an existing package). + Dependencies upon other packages MUST be stated. For example, a + package that extends or restricts has a dependency on the original + package specification. Finally, the new package MUST be assigned a + unique name and version. + + The types of things that can be defined in new packages are: + + o new primitives + + o extensions to existing primitives (events, shadow variables, + attributes, content) + + + +Saleem, et al. Informational [Page 17] + +RFC 5707 Media Server Markup Language February 2010 + + + o new recognition grammars for existing primitives + + o new markup languages for speech generation + + o languages for specifying a topology schema + + o new predefined topology schemas + + o new variables / segment types (sets & languages) + + o new control flow elements + + MSML packages are assembled together to form a specific MSML profile + that is shared between different implementations. The base MSML + dialog profiles that are defined in this document consist of the MSML + Core Package, MSML Dialog Core Package, MSML Dialog Base Package, + MSML Dialog Group Package, MSML Transform Package, MSML Fax Packages, + and the MSML Speech Package. + + MSML extension packages, which define primitives, MUST define the + following for each primitive within the package: + + o the function that the primitive performs + + o the attributes that may be used to tailor its behavior + + o the events that it is capable of understanding + + o the shadow variables that provide access to information + determined as a result of the primitive's operation + + The mechanism used to ensure that a media server and its client share + a compatible set of packages is not defined. Currently, it is + expected that provisioning will be used, possibly coupled with a + future auditing capability. Additionally, when used in SIP networks, + packages could be defined using feature tags and the procedures + defined for Indicating User Agent Capabilities in SIP [i1] used to + allow a media server to describe its capabilities to other user + agents. + +4.2. Profile Scheme + + Not all devices and applications using MSML will need to support the + entire MSML schema. For example, a media processing device might + support only audio announcements, only audio simple conferencing, or + only multimedia IVR. It is highly desirable to have a system for + describing what portion of MSML a particular media processing device + or control agent supports. + + + +Saleem, et al. Informational [Page 18] + +RFC 5707 Media Server Markup Language February 2010 + + + The package scheme described earlier allows MSML functionality to be + functionally grouped, relying on the MSML core package. This scheme + allows a portion of the complete MSML specification to be + implemented, on a per-package basis, and also creates a framework for + future extension packages. However, within a given package, in some + cases, only a subset of the package functionality may be required. + In order to support subsets of packages, with greater degree of + granularity than at the package level, a profile scheme is required. + + MSML package profiles would identify a subset of a given MSML package + with specific definitions of elements and attributes. Each MSML + package profile MUST be accompanied by one or more corresponding + schemas. To use the examples above, there could be an audio + announcements profile of the MSML Dialog Base Package, an audio + simple conferencing profile of the MSML Conference Core Package, and + a multimedia IVR profile of the MSML Dialog Base Package. + + MSML package profiles MUST be published separately from the MSML + specification, in one or more standards documents (e.g., Internet + Drafts or RFCs) dedicated to MSML package profiles. Profiles would + not be registered with IANA and any organization would additionally + be free to create its own profile(s) if required. + +5. Execution Flow + + MSML assumes a model where there is a single control context within a + media server for MSML processing. That context may have one or many + SIP [n1] dialogs associated with it. It is assumed that any SIP + dialogs associated with the MSML control context have been + authorized, as appropriate, by mechanisms outside the scope of MSML. + + A media server control context maintains information about the state + of all media objects and media streams within a media server. It + receives and processes all MSML requests from authorized SIP dialogs + and receives all events generated internally by media objects and + sends them on the appropriate SIP dialog. An MSML request is able to + create new media objects and streams, and to modify or destroy any + existing media objects and streams. + + An MSML request may simply specify a single action for a media server + to undertake. In this case, the document is very similar to a simple + command request. Often, though, it may be more natural for a client + to request multiple actions at one time, or the client would like + several actions to be closely coordinated by the media server. + Multiple MSML elements received in a single request MUST be processed + sequentially in document order. + + + + + +Saleem, et al. Informational [Page 19] + +RFC 5707 Media Server Markup Language February 2010 + + + An example of the first scenario would be to create a conference and + join it with an initial participant. An example of the second case + would be to unjoin one or more participants from a main conference + and join them to a sidebar conference. In the first scenario, + network latencies may not be an issue, but it is simpler for the + client to combine the requests. In the second case, the added + network latency between separate requests could mean perceptible + audio loss to the participant. + + Each MSML request is processed as a single transaction. A media + server MUST ensure that it has the necessary resources available to + carry out the complete transaction before executing any elements of + the request. If it does not have sufficient resources, it MUST + return a 520 response and MUST NOT execute the transaction. + + The MSML request MUST be checked for well-formedness and validated + against the schema prior to executing any elements. This allows XML + [n2] errors to reported immediately and minimizes failures within a + transaction and the corresponding execution of only part of the + transaction. + + Each element is expected to execute immediately. Elements such as + <dialogstart>, which take an unpredictable amount of time, are + "forked" and executed in a separate thread (see MSML Dialog + Packages). Once successfully forked, execution continues with the + element following the </dialogstart>. As such, MSML does not provide + mechanisms to sequence or coordinate other operations with dialog + elements. + + Processing within a transaction MUST stop if any errors occur. + Elements that were executed prior to the error are not rolled back. + It is the responsibility of the client to determine appropriate + actions based upon the results indicated in the response. Most + elements MAY contain an optional "mark" attribute. The value of that + attribute from the last successfully executed element MUST be + returned in an error response. Note that errors that occur during + the execution of a dialog occur outside the context of an MSML + transaction. These errors will be indicated in an asynchronous + event. + + Transaction results are returned as part of the SIP request response. + The transaction results indicate the success or failure of the + transaction. The result MUST also include identifiers for any + objects created by a media server for which the client did not + provide an instance name. Additionally, if the transaction fails, + the reason for the failure MUST be returned, as well as an indication + of how much of the transaction was executed before the failure + occurred SHOULD be returned. + + + +Saleem, et al. Informational [Page 20] + +RFC 5707 Media Server Markup Language February 2010 + + +6. Media Server Object Model + + Media servers are general-purpose platforms for executing real-time + media processing tasks. These tasks range in complexity from simple + ones such as serving announcements, to complex ones, such as speech + interfaces, centralized multimedia conferencing, and sophisticated + gaming applications. + + Calls are established to a media server using SIP. Clients will + often use SIP third party call control (3PCC) [i4] to establish calls + to a media server on behalf of end users. However MSML does not + require that 3PCC be used, only that the client and the media server + share a common identifier for the call and its associated RTP [i3] + sessions. + + Objects represent entities that source, sink, or modify media + streams. A media streams is a bidirectional or unidirectional media + flow between objects on a media server. The following subsections + define the classes of objects that exist on a media server and the + way these are identified in MSML. + +6.1. Objects + + A media object is an endpoint of one or more media streams. It may + be a connection that terminates RTP sessions from the network or a + resource that transforms or manipulates media. MSML defines four + classes of media objects. Each class defines the basic properties of + how object instances are used within a media server. However, most + classes require that the function of specific instances be defined by + the client, using MSML or other languages such as VoiceXML. + + The following classes of media processing objects are defined. The + class names are given in parentheses: + + o network connection (conn) + + o conference (conf) + + o dialog (dialog) + + Network connection is an abstraction for the media processing + resources involved in terminating the RTP session(s) of a call. For + audio services, a connection instance presents a full-duplex audio + stream interface within a media server. Multimedia connections have + multiple media streams of different media types, each corresponding + to an RTP session. Network connections get instantiated through SIP + [n1]. + + + + +Saleem, et al. Informational [Page 21] + +RFC 5707 Media Server Markup Language February 2010 + + + A conference represents the media resources and state information + required for a single logical mix of each media type in the + conference (e.g., audio and video). MSML models multiple mixes/views + of the same media type as separate conferences. Each conference has + multiple inputs. Inputs may be divided into classes that allow an + application to request different media treatment for different + participants. For example, the video streams for some participants + may be assigned to fixed regions of the screen while those for other + participants may only be shown when they are speaking. + + A conference has a single logical output per media type. For each + participant, it consists of the audio conference mix, less any + contributed audio of the participant, and the video mix shared by all + conference participants. Video conferences using voice activated + switching have an optional ability to show the previous speaker to + the current speaker. + + Conferences are instantiated using the <createconference> element. + The content of the <createconference> element specifies the + parameters of the audio and/or video mixes. + + Dialogs are a class of objects that represent automated participants. + They are similar to network connections from a media flow perspective + and may have one or more media streams as the abstraction for their + interface within a media server. Unlike connections, however, + dialogs are created and destroyed through MSML, and the media server + itself implements the dialog participant. Dialogs are instantiated + through the <dialogstart> element. Contents of the <dialogstart> + element define the desired or expected dialog behavior. Dialogs may + also be invoked by referencing VoiceXML as the dialog description + language. + + Operators are functions that are used to filter or transform a media + stream. The function that an instance of an operator fulfills is + defined as a property of the media stream. Operators may be + unidirectional or bidirectional and have a media type. + Unidirectional operators reflect simple atomic functions such as + automatic gain control, filtering tones from conferences, or applying + specific gain values to a stream. Unidirectional operators have a + single media input, which is connected to the media stream from one + object, and a single media output, which is connected to the media + stream of a different object. + + Bidirectional operators have two media inputs and two media outputs. + One media input and output is associated with the stream to one + object, and the other input and output is associated with a stream to + a different object. Bidirectional objects may treat the media + differently in each direction. For example, an operator could be + + + +Saleem, et al. Informational [Page 22] + +RFC 5707 Media Server Markup Language February 2010 + + + defined that changed the media sent to a connection based upon + recognized speech or dual-tone multi-frequency (DTMF) received from + the connection. Operators are implicitly instantiated when streams + are created or modified using the elements <join> and <modifystream>, + respectively. + + The relationships between the different object classes (conf, conn, + and dialog) are shown in the figure below. + + +--------------------------------------+ + | Media Server | + | | + |------+ ,---. | + | | +------+ / \ | + <== RTP ==>| conn |<---->| oper |<---->( conf ) | + | | +------+ \ / | + |------+ `---' | + | ^ ^ | + | | | | + | | +------+ +------+ | | + | | | | | | | | + | +-->|dialog| |dialog|<---+ | + | | | | | | + | +------+ +------+ | + +--------------------------------------+ + + A single, full-duplex instance of each object class is shown together + with common relationships between them. An operator (such as gain) + is shown between a connection and a conference and dialogs are shown + participating both with an individual connection and with a + conference. The figure is not meant to imply only one-to-one + relationships. Conferences will often have hundreds of participants, + and either connections or conferences may be interacting with more + than one dialog. For example, one dialog may be recording a + conference while other dialogs announce participants joining or + leaving the conference. + +6.2. Identifiers + + Objects are referenced using identifiers that are composed of one or + more terms. Each term specifies an object class and names a specific + instance within that class. The object class and instance are + separated by a colon ":" in an identifier term. + + Identifiers are assigned to objects when they are first created. In + general, either the MSML client or a media server may specify the + instance name for an object. Objects for which a client does not + assign an instance name will be assigned one by a media server. + + + +Saleem, et al. Informational [Page 23] + +RFC 5707 Media Server Markup Language February 2010 + + + Media server assigned instance names are returned to the client as a + complete object identifier in the response to the request that + created the object. + + It is meaningful for some classes of objects to exist independently + on a media server. Network connections may be created through SIP at + any time. MSML can then be used to associate their media with other + objects as required to create services. Conferences may be created + and have specific resources reserved waiting for participant + connections. + + Objects from these two classes, connections and conferences, are + considered independent objects since they can exist on a standalone + basis. Identifiers for independent objects consist of a single term + as defined above. For example, identifiers for a conference and + connection could be "conf:abc" or "conn:1234" respectively. Clients + that choose to assign instance names to independent objects must use + globally unique instance names. One way to create globally unique + names is to include the domain name of the client as part of the + name. + + Dialogs are created to provide a service to independent objects. + Dialogs may act as a participant in a conference or interact with a + connection similar to a two-participant call. Dialogs depend upon + the existence of independent objects, and this is reflected in the + composition of their identifiers. Operators modify the media flow + between other objects, such as application of gain between a + connection and a conference. As operators are merely media transform + primitives defined as properties of the media stream, they are not + represented by identifiers and created implicitly. + + Identifiers for dialogs are composed of a structured list of slash + ('/') separated terms. The left-most term of the identifier must + specify a conference or connection. This serves as the root for the + identifier. An example of an identifier for a dialog acting as a + conference participant could be: + + conf:abc/dialog:recorder + + All objects except connections are created using MSML. Connections + are created when media sessions get established through SIP. There + are several options clients and media servers can use to establish a + shared instance name for a connection and its media streams. + + When media servers support multiple media types, the instance name + SHOULD be a call identifier that can be used to identify the + collection of RTP sessions associated with a call. When MSML is used + in conjunction with SIP and third party call control, the call + + + +Saleem, et al. Informational [Page 24] + +RFC 5707 Media Server Markup Language February 2010 + + + identifier MUST be the same as the local tag assigned by the media + server to identify the SIP dialog. This will be the tag the media + server adds to the "To" header in its response to an initial invite + transaction. RFC 3261 requires the tag values to be globally unique. + + An example of a connection identifier is: conn:74jgd63956ts. + + With third party call control, the MSML client acts as a back-to-back + user agent (B2BUA) to establish the media sessions. SIP dialogs are + established between the client and the media server allowing the use + of the media server local tag as a connection identifier. If third + party call control is not used, a SIP event package MAY be used to + allow a media server to notify new sessions to a client that has + subscribed to this information. + + Identifiers as described above allow every object in a media server + to be uniquely addressed. They can also be used to refer to multiple + objects. There are two ways in which this can currently be done: + + wildcards + + common instance names + + An identifier can reference multiple objects when a wildcard is used + as an instance name. MSML reserves the instance name composed of a + single asterisk ('*') to mean all objects that have the same + identifier root and class. Instance names containing an asterisk + cannot be created. Wildcards MUST only be used as the right-most + term of an identifier and MUST NOT be used as part of the root for + dialog identifiers. Wildcards are only allowed where explicitly + indicated below. + + The following are examples of valid wildcards: + + conf:abc/dialog:* + + conn:* + + An example of illegal wildcard usage is: + + conf:*/dialog:73849 + + Although identifiers share a common syntax, MSML elements restrict + the class of objects that are valid in a given context. As an + example, although it is valid to join two connections together, it is + not valid to join two IVR dialogs. + + + + + +Saleem, et al. Informational [Page 25] + +RFC 5707 Media Server Markup Language February 2010 + + +7. MSML Core Package + + This section describes the core MSML package that MUST be supported + in order to use any other MSML packages. The core MSML package + defines a framework, without explicit functionality, over which + functional packages are used. + +7.1. <msml> + + <msml> is the root element. When received by a media server, it + defines the set of operations that form a single MSML request. + Operations are requested by the contents of the element. Each + operation MAY appear zero or more times as children of <msml>. + Specific operations are defined within the conference package and in + the set of dialog packages. + + The results of a request or the contents of events sent by a media + server are also enclosed within the <msml> element. The results of + the transaction are included as a body in the response to the SIP + request that contained the transaction. This response will contain + any identifiers that the media server assigned to newly created + objects. All messages that a media server generates are correlated + to an object identifier. Objects and identifiers are discussed in + section 6 (Media Server Object Model). + + Attributes: + + version: "1.1" Mandatory + +7.2. <send> + + Events are used to affect the behavior of different objects within a + media server. The <send> element is used to send an event to the + specified recipient within the media server. + + Attributes: + + event: the name of an event. Mandatory. + + target: an object identifier. When the identifier is for a + dialog, it may optionally be appended with a slash "/" followed by + the target to be included in an MSML dialog <send>. Mandatory. + + valuelist: a list of zero or more parameters that are included + with the event. + + + + + + +Saleem, et al. Informational [Page 26] + +RFC 5707 Media Server Markup Language February 2010 + + + mark: a token that can be used to identify execution progress in + the case of errors. The value of the mark attribute from the last + successfully executed MSML element is returned in an error + response. Therefore, the value of all mark attributes within an + MSML document should be unique. + +7.3. <result> + + The <result> element is used to report the results of an MSML + transaction. It is included as a body in the final response to the + SIP request that initiated the transaction. An optional child + element <description> may include text that expands on the meaning of + error responses. Response codes are defined in section 11 (Response + Codes). + + Attributes: + + response: a numeric code indicating the overall success or failure + of the transaction, and in the case of failure, an indication of + the reason. Mandatory. + + mark: in the case of an error, the value of the mark attribute + from the last successfully executed element that included the mark + attribute. + + In the case of failure, a description of the reason SHOULD be + provided using the child element <description>. + + Three other child elements allow the response to include identifiers + for objects created by the request but that did not have instance + names specified by the client. Those elements are <confid> and + <dialogid>, for objects created through a <createconference> and + <dialogstart> respectively. + +7.4. <event> + + The <event> element is used to notify an event to a media server + client. Three types of events are defined by the MSML Core Package: + "msml.dialog.exit", "msml.conf.nomedia", and "msml.conf.asn". These + correspond to the termination of an executing dialog, a conference + being automatically deleted when the last participant has left, and + the notification of the current set of active speakers for a + conference, respectively. Events may also be generated by an + executing dialog. In this case, the event type is specified by the + dialog (see MSML Dialog Core Package <send>). + + + + + + +Saleem, et al. Informational [Page 27] + +RFC 5707 Media Server Markup Language February 2010 + + + Attributes: + + name: the type of event. If the event is generated because of the + execution MSML dialog <send>, the value MUST be the value of the + "event" attribute from the <send> element within the MSML Dialog + Core Package. If the event is generated because of the execution + of an <exit>, the value MUST be "moml.exit". If the event is + generated because of the execution of a <disconnect>, the value + MUST be "moml.disconnect". If the event is generated because of + an error, the value must be "moml.error". Mandatory. + + id: the identifier of the conference or dialog that generated the + event or caused the event to be generated. Mandatory. + + <event> has two children, <name> and <value>, which contain the + name and value respectively of each namelist item associated with + the event. + +8. MSML Conference Core Package + +8.1. Conferences + + A conference has a mixer for each type of media that the conference + supports. Each mix has a corresponding description that defines how + the media from participants contributes to that mix. A mixer has + multiple inputs that are combined in a media specific way to create a + single logical output. + + The elements that describe the mix for each media type are called + mixer description elements. They are: + + <audiomix> defines the parameters for mixing audio media. + + <videolayout> defines the composition of a video window. + + These elements, defined in sections 8.6 (Audio Mix) and 8.7 (Video + Layout) respectively, are used as content of the <createconference> + element to establish the initial properties of a conference. The + elements are used within the <modifyconference> element to change the + properties of a conference once it has been created, or within the + <destroyconference> element to remove individual mixes from the + conference. + + Conferences may be terminated by an MSML client using the + <destroyconference> element to remove the entire conference or by + removing the last mixer(s) associated with the conference. + Conferences can also be terminated automatically by a media server + based on criteria specified when the conference is created. When the + + + +Saleem, et al. Informational [Page 28] + +RFC 5707 Media Server Markup Language February 2010 + + + conference is deleted, any remaining participants will have their + associated SIP dialogs left unchanged or deleted based on the value + of the "term" attribute specified when the conference was created. + +8.2. Media Streams + + Objects have at least one media input and output for each type of + media that they support. Each object class defines the number of + input and output objects of that class support. Media streams are + created when objects are joined, either explicitly using <join> or + implicitly when dialogs are created using <dialogstart>. Dialog + creation has two stages, allocating and configuring the resources + required for the dialog instance, and implicitly joining those + resources to the dialog target during the dialog execution. Refer to + the MSML Dialog Base Package. + + A join operation by default creates a bidirectional audio stream + between two objects. Video and unidirectional streams may also be + created. A media stream is created by connecting the output from one + object to the input of another object and vice versa (assuming a + bidirectional or full-duplex join). + + Many objects may only support a single input for each type of media. + Within this specification, only the conference object class supports + an arbitrary number of inputs. When a stream is requested to be + created to an object that already has a stream of the same type + connected to its single input, the result of the request depends upon + the type of the media stream. + + Audio mixing is done by summing audio signals. Automatically mixing + audio streams has common and straightforward applications. For + example, the ability to bridge two streams allows for the easy + creation of simple three-way calls or to bridge private announcements + with a (whispered) conference mix for an individual participant. In + the case of general conferences, however, an MSML client SHOULD + create an audio conference and then join participants to the + conference. Conference mixers SHOULD subtract the audio of each + participant from the mix so that they do not hear themselves. + + A media server receiving a request that requires joining an audio + stream to the single audio input of an object that already has an + audio stream connected SHOULD automatically bridge the new stream + with the existing stream, creating a mix of the two audio streams. + The maximum number of streams that may be bridged in this manner is + implementation specific. It is RECOMMENDED that a media server + support bridging at least two streams. A media server that cannot + bridge a new stream with any existing streams MUST fail the operation + requesting the join. + + + +Saleem, et al. Informational [Page 29] + +RFC 5707 Media Server Markup Language February 2010 + + + Unlike audio mixing, there are many different ways that two video + streams may be combined and presented. For example, they may be + presented side by side in separate panes, picture in picture, or in a + single pane that displays only a single stream at a time based on a + heuristic such as active speaker. Each of these options creates a + very different presentation and requires significantly different + media resources. + + A join operation does not describe how a new stream can be combined + with an existing stream. Therefore, automatic bridging of video is + not supported. A media server MUST fail requests to join a new video + stream to an object that only supports a single video input and + already has a video stream connected to that input. For an object to + have multiple video streams joined to it, the object itself must be + capable in supporting multiple video streams. Conference objects can + support multiple video streams and provide a way to specify the + mixing presentation for the video streams. + + A media server MUST NOT establish any streams unless the media server + is able to create all the streams requested by an operation. Streams + are only able to be created if both objects support a media type and + at least one of the following conditions is true: + + 1. Each object that is to receive media is not already receiving a + stream of that type. + + 2. Any object that is to receive media and is already receiving a + stream of that type supports receiving an additional stream of + that type. The only class of objects defined in this + specification that directly support receiving multiple streams + of the same type are conferences. + + 3. The media server is able to automatically bridge media streams + for an object that is to receive media and that is already + receiving a stream of the requested type. The only type of + media defined in this specification that MAY be automatically + bridged is audio. + + The directionality of media streams associated with a connection is + modeled independently from what SDP [n9] allows for the corresponding + RTP [i3] sessions. Media servers MUST respect the SDP in what they + actually transmit but MUST NOT allow the SDP to affect the + directionality when joining streams internal to the media server. + + + + + + + + +Saleem, et al. Informational [Page 30] + +RFC 5707 Media Server Markup Language February 2010 + + +8.3. <createconference> + + <createconference> is used to allocate and configure the media mixing + resources for conferences. A description of the properties for each + type of media mix required for the conference is defined within the + content of the <createconference> element. Mixer descriptions are + described in Audio Mix and Video Layout sections. When no mixer + descriptions are specified, the default behavior MUST be equivalent + to inclusion of a single <audiomix>. + + Clients can request that a media server automatically delete a + conference when a specified condition occurs by using the + "deletewhen" attribute. A value of "nomedia" indicates that the + conference MUST be deleted when no participants remain in the + conference. When this occurs, an "msml.conf.nomedia" event MUST be + notified to the MSML client. A value of "nocontrol" indicates that + the conference MUST be deleted when the SIP [n1] dialog that carries + the <createconference> element is terminated. When this occurs, a + media server MUST terminate all participant dialogs by sending a BYE + for their associated SIP dialog. A value of "never" MUST leave the + ability to delete a conference under the control of the MSML client. + + Attributes: + + name: the instance name of the conference. If the attribute is + not present, the media server MUST assign a globally unique name + for the conference. If the attribute is present but the name is + already in use, an error (432) will result and MSML document + execution MUST stop. Events that the conference generates use + this name as the value of their "id" attribute (see section 7.4 + (<event>)). + + deletewhen: defines whether a media server should automatically + delete the conference. Possible values are "nomedia", + "nocontrol", and "never". Default is "nomedia". + + term: when true, the media server MUST send a BYE request on all + SIP dialogs still associated with the conference when the + conference is deleted. Setting term equal to false allows clients + to start dialogs on connections once the conference has completed. + Default is "true". + + mark: a token that MAY be used to identify execution progress in + the case of errors. The value of the mark attribute from the last + successfully executed MSML element is returned in an error + response. Therefore, the value of all mark attributes within an + MSML document should be unique. + + + + +Saleem, et al. Informational [Page 31] + +RFC 5707 Media Server Markup Language February 2010 + + + An example of creating an audio conference is shown below. This + conference allows at most two participants to contend to be heard and + reports the set of active speakers no more frequently than every 10 + seconds. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <createconference name="example"> + <audiomix> + <n-loudest n="3"/> + <asn ri="10s"/> + </audiomix> + </createconference> + </msml> + +8.3.1. <reserve> + + Conference resources may be reserved by including the <reserve> + element as a child of <createconference>. <reserve> allows the + specification of a set of resources that a media server will reserve + for the conference. Any requests for resources beyond those that + have been reserved should be honored on a best-effort basis by a + media server. + + Attributes: + + required: boolean that specifies whether <createconference> should + fail if the requested resources are not available. When set to + false, the conference will be created, with no reserved resources, + if the complete reservation cannot be honored. Default is "true". + +8.3.1.1. <resource> + + The resources to be reserved are defined using <resource>. The + contents of these elements describe a resource that is to be + reserved. Descriptions are implementation dependent. Media servers + that support MSML dialogs may use the elements from that package as + the basis for resource descriptions. Each resource element may use + the attribute "n" to define the quantity of the resource to reserve. + + For example, the following creates a conference and reserves two + types of resources. One resource element may represent resources + that are shared by all participants of the conference, while the + other may represent resources that are reserved for each of the + expected participants. + + + + + + +Saleem, et al. Informational [Page 32] + +RFC 5707 Media Server Markup Language February 2010 + + + Attributes: + + n: number of resources to be reserved. Default is 1. + + type: specifies whether the resource is to be reserved by each + individual participant or reserved as a shared conference + resource. Valid values for this attribute are "individual" or + "shared". Default is "individual". + + <createconference> + <reserve> + <resource n="20"> + <!--description of resources used by each participant--> + </resource> + <resource n="2" type="shared"> + <!--description of the shared conference resources--> + </resource> + </reserve> + </createconference> + +8.4. <modifyconference> + + All of the properties of an audio mix or the presentation of a video + mix may be changed during the life of a conference using the + <modifyconference> element. Changes to an audio mix are requested by + including an <audiomix> element as a child of <modifyconference>. + This may also be used to add an audio mixer to the conference if none + was previously allocated. Changes to a video presentation are + requested by including a <videolayout> element as a child of + <modifyconference>. Similar to an audio mixer, this may be used to + add a video mixer if none was previously allocated. + + Mixers are removed by including a mixer description element within + <destroyconference/>. + + Features and presentation aspects are enabled/added or modified by + including the element(s) that define the feature or presentation + aspect within a mixer description. The complete specification of the + element must be included just as it would be included when the + conference is created. The new definition completely replaces any + previous definition that existed. Only things that are defined by + elements included in the mixer descriptions are affected. Any + existing configuration aspects of a conference, which are not + specified within the <modifyconference/> element, MUST maintain their + current state in the media server. + + + + + + +Saleem, et al. Informational [Page 33] + +RFC 5707 Media Server Markup Language February 2010 + + + For example, if an MSML client wanted to change the minimum reporting + interval for active speaker notification from that shown in the + Conference Examples section (<createconference>) it would send the + following to the media server: + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <modifyconference id="conf:example"> + <audiomix> + <asn ri="4"/> + </audiomix> + </modifyconference> + </msml> + + This would also enable active speaker notification if it had not + previously been enabled. The N-loudest mixing is unaffected. + + Multiple elements MAY be included in the mixer descriptions similar + to when conferences are created. For example, in a video conference, + the video mix description (<videolayout>) could specify that the + layout of the video being displayed should change such that the + regions currently displaying participants get smaller and new + region(s) are created to support additional participants. A media + server MUST make all of the requested changes or none of the + requested changes. + + Additional examples of modifying conferences are presented in the + Conference Examples section. + + Attributes: + + id: the identifier for a conference. Wildcards MUST NOT be used. + Mandatory. + + mark: a token that can be used to identify execution progress in + the case of errors. The value of the mark attribute from the last + successfully executed MSML element is returned in an error + response. Therefore, the value of all "mark" attributes within an + MSML document SHOULD be unique. + +8.5. <destroyconference> + + Destroy conference is used to delete mixers or to delete the entire + conference and all state and shared resources. When a mixer is + removed, all of the streams joined to that mixer are unjoined. When + a conference is destroyed, SIP dialogs for any remaining participants + MUST be maintained or removed based on the value of the "term" + attribute when the conference was created. + + + +Saleem, et al. Informational [Page 34] + +RFC 5707 Media Server Markup Language February 2010 + + + When there is no element content, <destroyconference/> deletes the + entire conference. Individual mixers are removed by including a + mixer description element identifying the mix (or mixes) to be + removed as content to <destroyconference/>. <audiomix/> is used + remove audio mixers and <videolayout/> is used remove video mixers. + When one or more mixer descriptions are specified, then media server + MUST only delete the specified mixer and MUST NOT affect any other + existing mixers. When <audiomix/> or <videolayout/> is identified + for individual removal, other feature aspects of the mix MUST NOT be + included. If specified, the media server MUST ignore any such + elements. When the last mixer is removed from a conference, a media + server MUST remove all conference state, leaving or removing any + remaining SIP dialogs as described above. + + Attributes: + + id: the identifier for a conference. Mandatory. + + mark: a token that can be used to identify execution progress in + the case of errors. The value of the mark attribute from the last + successfully executed MSML element is returned in an error + response. Therefore, the value of all "mark" attributes within an + MSML document SHOULD be unique. + +8.6. <audiomix> + + The properties of the overall audio mix are specified using the + <audiomix> element. + + Attributes: + + id: an optional identifier for the audio mix. + + samplerate: Integer value specifies the sample rate (in Hz) for + the audio mixer. Optional, default value of 8000. + + An example of the description for an audio mix is: + + <audiomix id="mix1"> + <asn ri="10s"/> + <n-loudest n="3"/> + </audiomix> + +8.6.1. <n-loudest> + + The <n-loudest> element defines that participants contend to be + included in the conference mix based upon their audio energy. When + the element is not present, all participants are mixed. + + + +Saleem, et al. Informational [Page 35] + +RFC 5707 Media Server Markup Language February 2010 + + + Attributes: + + n: the number of participants that will be included in the audio + mix based upon having the greatest audio energy. Mandatory. + +8.6.2. <asn> + + The <asn> element enables notification of active speakers. Active + speakers MUST be notified using the <event> element with an event + name of "msml.conf.asn". The namelist of the event consists of the + set of active speakers. The name of each item is the string + "speaker" with a value of the connection identifier for the + connection. + + Attributes: + + ri: the minimum reporting interval defines the minimum duration of + time that must pass before changes to active speakers will be + reported. A value of zero disables active speaker notification. + + asth: specifies the active speaker threshold (in unit of dBm0). + Valid value range is 0 to -96. Optional, default is -96. + + An example of an active speaker notification is: + + <event name="msml.conf.asn" id="conf:example"> + <name>speaker</name> + <value>conn:hd93tg5hdf</value> + <name>speaker</name> + <value>conn:w8cn59vei7</value> + <name>speaker</name> + <value>conn:p78fnh6sek47fg</value> </event> + +8.7. <videolayout> + + A video layout is specified using the <videolayout> element. It is + used as a container to hold elements that describe all of the + properties of a video mix. The parameters of the window that + displays the video mix are defined by the <root> element. When the + video mix in composed of multiple panes, the location and + characteristics of the panes are defined by one or more <region> + elements. A <region> element is not required when only a single + video stream is displayed at one time and none of the visual + attributes of regions are required. + + Some regions may be used to display a video stream based on a + selection criteria rather than having a video stream of a single + participant continuously presented in the region. One such an + + + +Saleem, et al. Informational [Page 36] + +RFC 5707 Media Server Markup Language February 2010 + + + example is a distance learning lecture where the instructor sees each + of the students periodically displayed in a region. When a region is + used to display one of a number of streams, it is placed as a child + of a <selector> element. + + Attributes: + + type: specifies the language used to define the layout. Layouts + defined using MSML MUST use the value "text/msml-basic-layout". + This is the same convention as defined for the layout package from + the W3C SMIL 2.0 specification [i6]. The default when omitted is + "text/msml-basic-layout". + + id: an optional identifier for the video layout. + +8.7.1. <root> + + The <root> element describes the root window or virtual screen in + which the conference video mix will be displayed. Simple conferences + can display participant video directly within the root window but + more complex conferences will use regions for this purpose. Areas of + the window which are not used to display video will show the root + window background. + + All video presentations require a root window. It MUST be present + when a video mix is created and it cannot be deleted; however, its + attributes MAY be changed using the <modifyconference> element. + + Attributes: + + size: the size of the root window specified as one of the five + standard common intermediate formats (e.g., CIF, QCIF). + + backgroundcolor: the color for the root window background defined + using the values for the "background-color" property of the CSS2 + specification [n10]. + + backgroundimage: the URI for an image to be displayed as the root + window background. Transparent portions of the image allow the + background color to show through. + +8.7.2. <region> + + <region> elements define video panes that are used to display + participant video streams. Regions are rendered on top of the root + window. + + + + + +Saleem, et al. Informational [Page 37] + +RFC 5707 Media Server Markup Language February 2010 + + + The size of a region is specified relative to the size of the root + window using the "relativesize" attribute. Relative sizes are + expressed as fractions (e.g., 1/4, 1/3) that preserve the aspect + ratio of the original video stream while allowing for efficient + scaling implementations. + + Regions are located on the root window based on the value of the + position attributes "top" and "left". These attributes define the + position of the top left corner of the region as an offset from the + top left corner of the root window. Their values may be expressed + either as a number of pixels or as a percent of the vertical or + horizontal dimension of the root window. Percent values are appended + with a percent ('%') character. Percent values of "33%" and "67%" + should be interpreted as "1/3" and "2/3" to allow easy alignment of + regions whose size is expressed relative to the size of the root + window. + + An example of a video layout with six regions is: + + +-------+---+ + | | 2 | + | 1 +---+ + | | 3 | + +---+---+---+ + | 6 | 5 | 4 | + +---+---+---+ + + <videolayout type="text/msml-basic-layout"> + <root size="CIF"/> + <region id="1" left="0" top="0" relativesize="2/3"/> + <region id="2" left="67%" top="0" relativesize="1/3"/> + <region id="3" left="67%" top="33%" relativesize="1/3"> + <region id="4" left="67%" top="67%" relativesize="1/3"/> + <region id="5" left="33%" top="67%" relativesize="1/3"/> + <region id="6" left="0" top="67%" relativesize="1/3"/> + </videolayout> + + The area of the root window covered by a region is a function of the + region's position and its size. When areas of different regions + overlap, they are layered in order of their "priority" attribute. + The region with the highest value for the "priority" attribute is + below all other regions and will be hidden by overlapping regions. + The region with the lowest non-zero value for the "priority" + attribute is on top of all other regions and will not be hidden by + overlapping regions. The priority attribute may be assigned values + between 0 and 1. A value of zero disables the region, freeing any + resources associated with the region, and unjoining any video stream + displayed in the region. + + + +Saleem, et al. Informational [Page 38] + +RFC 5707 Media Server Markup Language February 2010 + + + Regions that do not specify a priority will be assigned a priority by + a media server when a conference is created. The first region within + the <videolayout> element that does not specify a priority will be + assigned a priority of one, the second a priority of two, etc. In + this way, all regions that do not explicitly specify a priority will + be underneath all regions that do specify a priority. As well, + within those regions that do not specify a priority, they will be + layered from top to bottom, in the order they appear within the + <videolayout> element. + + For example, if a layout was specified as follows: + + <videolayout> + <root size="CIF"/> + <region id="a" ... priority=".3" .../> + <region id="b" ... /> + <region id="c" ... priority=".2" ...> + <region id="d" ... /> + </videolayout> + + Then the regions would be layered, from top to bottom, c,a,b,d. + + Portions of regions that extend beyond the root window will be + cropped. For example, a layout specified as: + + <videolayout> + <root size="CIF"/> + <region id="foo" left="50%" top="50%" relativesize="2/3"/> + </videolayout> + + would appear similar to: + + +-----------+ + | root | + |background | + | +-----+-- + | | |// + | | foo |// + +-----+-----+// + |//////// + + Visual attributes are used to define aspects of the visual appearance + of individual regions. A border may be defined together with a title + and/or logo. Text and logos are displayed as images on top of the + region's video, below all regions with a lower priority. The visual + attributes are "title", "titletextcolor", "titlebackgroundcolor", + "bordercolor", "borderwidth", and "logo". + + + + +Saleem, et al. Informational [Page 39] + +RFC 5707 Media Server Markup Language February 2010 + + + Visual attributes can also be defined for individual streams (Video + Stream Properties). When visual attributes are specified as part of + both a region and a stream, those associated with the stream MUST + take precedence. This allows streams that are chosen for display + automatically (Stream Selection) to have proper text and logos + displayed. The region visual attributes are displayed when no stream + is associated with the region. + + Two other attributes associated with a region, "blank" and "freeze", + define the state of the video displayed in the region. When the + blank or freeze attribute is assigned the value "true", then the + media server MUST display the region either as a blank region, or the + video image frozen at the last received frame. + + These attributes are specified for a region and not allowed for + streams because that appears to be the common use case. Applying + them to streams would allow only that stream to be affected within a + selector while other streams continue to display normally. Except + for personal mixing scenarios, the same effect can be achieved by + having the participant mute their own transmission to the media + server. + + Attributes: associated with each region: + + id: a name that can be used to refer to the region. + + left: the position of the region from the left side of the root + window. + + top: the position of the region from the top of the root window. + + relativesize: the size of the region expressed as a fraction of + the root window size. + + priority: a number between 0 and 1 that is used to define the + precedence when rendering overlapping regions. A value of zero + disables the region. + + title: text to be displayed as the title for the region + + titletextcolor: the color of the text + + titlebackgroundcolor: the color of the text background + + bordercolor: the color of the region border + + borderwidth: the width of the region border + + + + +Saleem, et al. Informational [Page 40] + +RFC 5707 Media Server Markup Language February 2010 + + + logo: the URI of an image file to be displayed + + freeze: a boolean value, with a default of "false", that defines + whether the video image should be frozen at the currently + displayed frame + + blank: a boolean value, with a default of "false", that defines + whether the region should display black instead of the associated + video stream + +8.7.3. <selector> + + It is often desired that one of several video streams be + automatically selected to be displayed. The <selector> element is + used to define the selection criteria and its associated parameters. + The selection algorithm is specified by the "method" attribute. + Currently defined selection methods allow for voice activated + switching and to iterate sequentially through the set of associated + video streams. + + The regions that will display the selected video stream are placed as + child elements of the <selector> element. Including regions within a + <selector> element does not affect their layout with respect to + regions not subject to the selection. For simple video conferences + that display the video directly in the root window, the <root> + element can be placed as a child of <selector>. Region elements MUST + NOT be used in this case. + + For example, below is a common video layout that allows the video + stream from the currently active speaker to be displayed in the large + region ("1") at the top left of the layout while the streams from + five other participants are displayed in regions located at the + layout periphery. + + +-------+---+ + | | 2 | + | 1 +---+ + | | 3 | + +---+---+---+ + | 6 | 5 | 4 | + +---+---+---+ + + + + + + + + + + +Saleem, et al. Informational [Page 41] + +RFC 5707 Media Server Markup Language February 2010 + + + <videolayout type="text/msml-basic-layout"> + <root size="CIF"/> + <selector id="switch" method="vas"> + <region id="1" left="0" top="0" relativesize="2/3"/> + </selector> + <region id="2" left="67%" top="0" relativesize="1/3"/> + <region id="3" left="67%" top="33%" relativesize="1/3"> + <region id="4" left="67%" top="67%" relativesize="1/3"/> + <region id="5" left="33%" top="67%" relativesize="1/3"/> + <region id="6" left="0" top="67%" relativesize="1/3"/> + </videolayout> + + All selector methods must be defined so that they work if only a + single region is a child of the selector. Selector methods that + support more than one child region MUST specify how the method works + across multiple regions. Media server implementations MAY support + only a single region for methods that are defined to allow multiple + regions. + + The selector or region for a participant's video is defined using the + "display" attribute of <stream> during a join operation. Specifying + a selector allows the stream to be displayed according to the + criteria defined by the selector method. Specifying a region + supports continuous presence display of participants. Some streams + may be joined with both a selector and a region. In this case, the + value of <blankothers> attribute defines whether the streams + associated with a continuous presence region should be blanked when + the stream is selected for display in one of the selector regions. + + Attributes: common to all selector methods are: + + id: a name that can be used to refer to the selector. + + method: the name of the method used to select the video stream. A + value of "vas" (see the following section, Voice Activated + Switching) MAY be specified. + + status: specifies whether the selector is "active" or "disabled". + + blankothers: when "true", video streams that are also displayed in + continuous presence regions will have the continuous presence + regions blanked when the stream is displayed in a selection + region. + + + + + + + + +Saleem, et al. Informational [Page 42] + +RFC 5707 Media Server Markup Language February 2010 + + +8.7.3.1. Voice Activated Switching ("vas") + + Voice activated switching (VAS) is used to display the video stream + that correlates with the participant who is currently speaking. It + is specified using a selector method value of "vas". + + If the video stream associated with the active speaker is not + currently displayed in a selection region, then it replaces the video + in the region that is displaying the video of the speaker that was + least recently active. If the video of the active speaker is + currently displayed in a selection region, then there is no change to + any region. When VAS is applied to a single region, this has the + effect that the current speaker is displayed in that region. + + Attributes: + + si: switching interval is the minimum period of time that must + elapse before allowing the video to switch to the active speaker. + + speakersees: defines whether the active speaker sees the "current" + speaker (themselves) or the "previous" speaker. + +8.8. <join> + + <join> is used to create one or more streams between two independent + objects. Streams may be audio or video and may be bidirectional or + unidirectional. A bidirectional stream is implicitly composed of two + unidirectional streams that can be manipulated independently. The + streams to be established are specified by <stream> elements (section + <stream>) as the content of <join>. + + Without any content, <join> by default establishes a bidirectional + audio stream. When only a stream of a single type has previously + been created between two objects, or when only a unidirectional + stream exists, <join> can be used to add a stream of another media + type or make the stream bidirectional by including the necessary + <stream> elements. Bidirectional streams are made unidirectional by + using <unjoin> (section <unjoin>) to remove the unidirectional stream + for the direction that is no longer required. + + In addition to defining the media type and direction of streams, + <stream> elements are also used to establish the properties of + streams, such as gain, voice masking, or tone clamping of audio + streams, or labels and other visual characteristics of video streams. + Properties are often defined asymmetrically for a single direction of + a stream. Creating a bidirectional stream requires two <stream> + elements within the <join>, one for each direction, if one direction + is to have different properties from the other direction. + + + +Saleem, et al. Informational [Page 43] + +RFC 5707 Media Server Markup Language February 2010 + + + If a media server can provide services using both compressed or + uncompressed media, the MSML client may need to distinguish within + requests which format is to be used. When compressed streams are + created, both objects must use the same media format or an error + response (450) is generated. + + Attributes: + + id1: an identifier of either a connection or conference. + Wildcards MUST NOT be used. Mandatory. Any other object class + results in a 440 error. + + id2: an identifier of either a connection or conference. + Wildcards MUST NOT be used. Mandatory. Any other object class + results in a 440 error. + + mark: a token that can be used to identify execution progress in + the case of errors. The value of the mark attribute from the last + successfully executed MSML element is returned in an error + response. Therefore, the value of all mark attributes within an + MSML document SHOULD be unique. + + For example, consider a call center coaching scenario where a + supervisor can listen to the conversation between an agent and a + customer and provide hints to the agent, which are not heard by the + customer. One join establishes a stream between the agent and the + customer and another join establishes a stream between the agent and + the supervisor. A third join is used to establish a half-duplex + stream from the customer to the supervisor. The media server + automatically bridges the media streams from the customer and the + supervisor for the agent, and from the customer and the agent for the + supervisor. + + Assuming the following connections, each with a single audio stream: + + conn:supervisor + + conn:agent + + conn:customer + + + + + + + + + + + +Saleem, et al. Informational [Page 44] + +RFC 5707 Media Server Markup Language February 2010 + + + The following would create the media flows previously described: + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <join id1="conn:supervisor" id2="conn:agent"/> + <join id1="conn:agent" id2="conn:customer"/> + <join id1="conn:supervisor" id2="conn:customer"> + <stream media="audio" dir="to-id1"/> + </join> + </msml> + + The following example shows joining a participant to a multimedia + conference. It assumes that the conference has a video + presentation region named "topright". The "display" attribute is + explained in the section Video Stream Properties. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <join id1="conn:hd83t5hf7g3" id2="conf:example"> + <stream media="audio"/> + <stream media="video" dir="from-id1" display="topright"/> + <stream media="video" dir="to-id1"/> + </join> + </msml> + +8.9. <modifystream> + + Media streams can have different properties such as the gain for an + audio stream or a visual label for a video stream. These properties + are specified as the content of <stream> elements (section <stream>). + <modifystream> is used to change the properties of a stream by + including one or more <stream> elements that are to have their + properties changed. + + Stream properties MUST be set as specified by the element <stream> as + a child element of <modifystream> element. Any properties not + included in the <stream> element when modifying a stream MUST remain + unchanged. Setting a property for only one direction of a + bidirectional stream MUST NOT affect the other direction. The + directionality of streams can be changed by issuing an <unjoin> + followed by a <join>. Any streams that exist between the two objects + that are not included within <modifystream> MUST NOT be affected. + + Attributes: + + id1: an identifier of either a conference or a connection. The + instance name MUST NOT contain a wildcard if "id2" contains a + wildcard. Mandatory. + + + +Saleem, et al. Informational [Page 45] + +RFC 5707 Media Server Markup Language February 2010 + + + id2: an identifier of either a conference or a connection. The + instance name MUST NOT contain a wildcard if "id1" contains a + wildcard. Mandatory. + + mark: a token that can be used to identify execution progress in + the case of errors. The value of the mark attribute from the last + successfully executed MSML element is returned in an error + response. Therefore, the value of all mark attributes within an + MSML document is RECOMMENDED to be unique. + +8.10. <unjoin> + + Unjoin removes one or more media streams between two objects. In the + absence of any content in the <stream> element, all media streams + between the objects MUST be removed. Individual streams may be + removed by specifying them using <stream> elements, while the + unspecified streams MUST NOT be removed. A bidirectional stream is + changed to a unidirectional stream by unjoining the direction that is + no longer required, using the <unjoin> element. Operator elements + MUST NOT be specified within <stream> elements when streams are being + unjoined using the <unjoin> element. Any specified stream operators + MUST be ignored. + + <unjoin> and <join> may be used together to move a media stream, such + as from a main conference to a sidebar conference. + + Attributes: + + id1: an identifier of either a conference or a connection. The + instance name MUST NOT contain a wildcard if "id2" contains a + wildcard. Mandatory. + + id2: an identifier of either a conference or a connection. The + instance name MUST NOT contain a wildcard if "id1" contains a + wildcard. Mandatory. + + mark: a token that can be used to identify execution progress in + the case of errors. The value of the mark attribute from the last + successfully executed MSML element is returned in an error + response. Therefore, the value of all mark attributes within an + MSML document SHOULD be unique. + + The following removes a participant from a conference and plays a + leave tone for the remaining participants in the conference. + + + + + + + +Saleem, et al. Informational [Page 46] + +RFC 5707 Media Server Markup Language February 2010 + + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <unjoin id1="conn:jd73ht89sf489f" id2="conf:1"/> + <dialogstart target="conf:1" type="application/moml+xml"> + <play> + <audio uri="file://leave_tone.wav"/> + </play> + </dialogstart> + </msml> + +8.11. <monitor> + + Monitor is a specialized unidirectional join that copies the media + that is destined for a connection object. One example of the use for + <monitor> may be quality monitoring within a conference. The media + stream may be removed using the <unjoin> element (see the section + <unjoin>). + + Attributes: + + id1: an identifier of the connection to be monitored. Mandatory. + Any other object class results in a 440 error. Wildcards MUST NOT + be used. + + id2: an identifier of the object that is to receive the copy of + the media destined to id1. id2 may be a connection or a + conference. Mandatory. Any other object class results in a 440 + error. Wildcards MUST NOT be used. + + compressed: "true" or "false". Specifies whether the join should + occur before or after compression. When "true", id2 must be a + connection using the same media format as id1 or an error response + (450) is generated. Default is "false". + + mark: a token that can be used to identify execution progress in + the case of errors. The value of the mark attribute from the last + successfully executed MSML element is returned in an error + response. Therefore, the value of all mark attributes within an + MSML document SHOULD be unique. + +8.12. <stream> + + Individual streams are specified using the <stream> element. They + MAY be included as a child element in any of the stream manipulation + elements <join>, <modifystream>, or <unjoin>. + + + + + + +Saleem, et al. Informational [Page 47] + +RFC 5707 Media Server Markup Language February 2010 + + + The type of the stream is specified using a "media" attribute that + uses values corresponding to the top-level MIME media types as + defined in RFC 2046 [i7]. This specification only addresses audio + and video media. Other specifications may define procedures for + additional types. + + A bidirectional stream is identified when no direction attribute + "dir" is present. A unidirectional stream is identified when a + direction attribute is present. The "dir" attribute MUST have a + value of "from-id1" or "to-id1" depending on the required direction. + These values are relative to the identifier attributes of the parent + element. + + The compressed attribute is used to distinguish the compressed nature + of the stream when necessary. It is implementation specific what is + used when the attribute is not present. Joining compressed streams + acts much like an RTP [i3] relay. + + The properties of the media streams are specified as the content of + <stream> elements when the element is used as a child of <join> or + <modifystream>. Stream elements MUST NOT have any content when they + are used as a child of <unjoin> to identify specific streams to + remove. + + Some properties are defined within MSML as additional attributes or + child elements of <stream> that are media type specific. Ones for + audio streams and video streams are defined in the following two sub- + sections. Operators, viewed as properties of the media stream, MAY + be specified as child elements of the <stream> element. + + Attributes: + + media: "audio" or video". Mandatory + + dir: "from-id1" or "to-id1". + + compressed: "true" or "false". Specifies whether the stream uses + compressed media. Default is implementation specific. + +8.12.1. Audio Stream Properties + + Audio mixes can be specified to only mix the N-loudest participants. + However, there may be some "preferred" participants that are always + able to contribute. When audio streams are joined to a conference + that uses N-loudest audio mixing, preferred streams need to be + identified. + + + + + +Saleem, et al. Informational [Page 48] + +RFC 5707 Media Server Markup Language February 2010 + + + A preferred audio stream is identified using the "preferred" + attribute. The "preferred" attribute MAY be used for an audio stream + that is input to a conference and MUST NOT be used for other streams. + + Additional attributes of the <stream> element for audio streams are: + + Attributes: + + preferred: a boolean value that defines whether the stream does + not contend for N-loudest mixing. A value of "true" means that + the stream MUST always be mixed while a value of "false" means + that the stream MAY contend for mixing into a conference when + N-loudest mixing is enabled. Default is "false". + + There are two elements that can be used to change the characteristics + of an audio stream as defined below. + +8.12.1.1. <gain> + + The <gain> element may be used to adjust the volume of an audio media + stream. It may be set to a specific gain amount, to automatically + adjust the gain to a desired target level, or to mute the stream. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the gain primitive. + + amt: a specific gain to apply specified in dB or the string "mute" + indicating that the stream should be muted. This attribute MUST + NOT be used if "agc" is present. + + agc: boolean indicating whether automatic gain control is to be + used. This attribute MUST NOT be used if "amt" is present. + + tgtlvl: the desired target level for AGC specified in dBm0. This + attribute MUST be specified if "agc" is set to "true". This + attribute MUST NOT be specified if "agc" is not present. + + maxgain: the maximum gain that AGC may apply. Maxgain is + specified in dB. This attribute MUST be used if "agc" is present + and MUST NOT be used when "agc" is not present. + +8.12.1.2. <clamp> + + The <clamp> element is used to filter tones and/or audio-band dtmf + from a media stream. + + + + +Saleem, et al. Informational [Page 49] + +RFC 5707 Media Server Markup Language February 2010 + + + Attributes: + + dtmf: boolean indicating whether DTMF tones should be removed. + + tone: boolean indicating whether other tones should be removed. + +8.12.2. Video Stream Properties + + Video mixes define a presentation that may have multiple regions, + such as a quad-split. Each region displays the video from one or + more participants. When video streams are joined to such a + conference, the region that will display the video needs to be + specified as part of the join operation. + + The region that will display the video is specified using the + "display" attribute. The "display" attribute MUST be used for a + video stream that is input to a conference and MUST NOT be used for + other streams. The value of the attribute MUST identify a <region> + (see the section <region>) or a <selector> (see the section + <selector>) that is defined for the conference. A stream MUST NOT be + directly joined to a region that is defined within a selector. + Changing the value of the "display" attribute can be used to change + where in a video presentation layout a video stream is displayed. + + Additional attributes of the <stream> element for video streams are: + + Attributes: + + display: the identifier of a video layout region or selector that + is to be used to display the video stream. + + override: specifies whether or not the given video stream is the + override source in the region defined by "display" attribute. + Valid values are "true" or "false". Optional, default value is + "false". Only a video stream that is input to a conference can be + the override source. A particular region can have at most one + override source at a time. The most recently joined video stream + with this attribute set to "true" becomes the override source. + When there's an override source in place, its video is always + displayed in the region, regardless of what video selection + algorithm (either a selector or continuous presence mode) is + configured for that region. Once the override source is cleared, + the conference MUST revert back to original video selection + algorithm. + + + + + + + +Saleem, et al. Informational [Page 50] + +RFC 5707 Media Server Markup Language February 2010 + + +8.12.2.1. <visual> + + Some regions of video conferences may display different streams + automatically, such as when voice activated switching is used. + Connections MAY also be joined directly without the use of video + mixing. In these cases, the <visual> element may be used to define + visual display properties for a stream. + + The <visual> element MAY use any of the visual attributes defined for + regions (see the section <region>). This allows the visual aspects + of regions within a <selector> to be tailored to the selected video + stream, or for streams that are directly joined to display a name or + logo. + +9. MSML Dialog Packages + +9.1. Overview + + MSML Dialog Packages define an XML [n2] language for composing + complex media objects from a vocabulary of simple media resource + objects called primitives. It is primarily a descriptive or + declarative language to describe media processing objects. MSML + dialogs operate on a single or multiple streams that are identified + by the MSML document outside the scope of the MSML Dialog Package. + + MSML dialogs are intended to be used in different environments. As + such, the language itself does not define how an MSML dialog is used. + Each environment in which an MSML dialog is used must define how it + is used, the set of services provided, and the mechanism for passing + information between the environment and MSML dialog. The specific + mechanisms used to realize the interface between MSML dialog and its + environment are platform specific. + + MSML Dialog Packages provide two models for access to media resources + and service creation building blocks. Both models MAY be used in + conjunction with each other in a complementary manner. The first + model (referred to as "Media Primitives and Composites", part of the + mandatory MSML Dialog Base Package) contains media primitives (such + as digit collection and announcements) and composite functions (such + as play and collect combined as a single operation). The second + model (referred to as "Media Groups", part of the optional MSML + Dialog Group Package) allows the ability to define complex customized + interactions, via event passing mechanisms, between media primitives, + if required. + + + + + + + +Saleem, et al. Informational [Page 51] + +RFC 5707 Media Server Markup Language February 2010 + + + MSML Dialog Core Package + + Defines core framework over which all MSML Dialog Packages + operate. + + MSML Dialog Base Package + + Media Primitives + <dtmf> or <collect> + DTMF digit collection + <play> + Playing of Announcements + <dtmfgen> + Generation of DTMF digits + <tonegen> + Tone genration + <record> + Media recording + + Media Composites + <collect> + Supports play and collect operation. + Composite function with inclusion of play. + <record> + Supports play and record operation. + Composite function with inclusion of play. + + MSML Dialog Group Package + <group> + Allows grouping of media primitives for parallel + execution, with an event exchange mechanism + between the media primitives to achieve + customized media operations. All the above media + primitive elements are accepted within the + group. + + The following operations MUST be supported using elements described + above using either the MSML Dialog Base Package or MSML Dialog Group + Package. + + Announcement only + <play> + Collection only + <dtmf> or <collect> + + Recording only + <record> + + + + +Saleem, et al. Informational [Page 52] + +RFC 5707 Media Server Markup Language February 2010 + + + Play and Collect + <collect> + <play/> + </collect> + + Play and Record + <record> + <play/> + </record> + + Additional MSML Dialog Packages are: + + o MSML Dialog Transform Package + + o MSML Dialog Speech Package + + o MSML Fax Detection Package + + o MSML Fax Send/Receive Package + + MSML dialogs MAY be used to simply expose primitive media resource + objects but will be used more often to describe dialog operations and + media transformation objects that can be controlled via user + interaction. + + MSML dialogs do not contain any computation or flow control + constructs. There are no results automatically generated when media + operations complete. Results MUST be explicitly requested using a + <send> or <exit> element within the definition of the MSML dialog. + +9.2. Primitives + + Primitives perform a single function on a media stream or multiple + streams such as generating audio/video, recognizing speech or DTMF, + or adjusting the gain. They may be composed so that primitives + execute concurrently. Primitives not composed for concurrent + execution MUST simply execute sequentially in the order they occur in + an MSML document. All concurrently executing primitives in the same + MSML object (defined in one MSML document) MAY interact with each + other through events (see MSML Dialog Group Package). + + Primitives are categorized into one of the following descriptive + categories. + + o Recognizers have a media input but no output. They allow + different things within a media stream to be recognized or + detected and for events to be generated based upon received + media. + + + +Saleem, et al. Informational [Page 53] + +RFC 5707 Media Server Markup Language February 2010 + + + o Transformers have one media input and output and may send and + receive events. + + o Sources and sinks generate or consume media. They have either + a media input or a media output but not both. They may receive + and generate events. + + o Composites combine underlying primitives to provide higher- + level user interaction, without the need for specific event- + based exchange between the primitives. The composite elements + provide a simpler mechanism for more commonly used services, + such as play and collect or play and record. + + Primitives may define different media processing behavior (states) + based upon the events that they receive. Primitives that support + different processing states must define their default starting state + and should support the "initial" attribute to allow that state to be + specified when the primitive is instantiated. All primitives must + support the "terminate" event class. + + The following types of primitives are defined within this + specification: + + Recognizers Transformers Source/Sink Composites + ------------------------------------------------------ + dtmf/collect agc play dtmf/collect + faxdetect clamp record record + speech gain dtmfgen + vad gate tonegen + relay faxsend + faxrcv + + Primitives have shadow variables, similar to those within VoiceXML + [n5], which are automatically assigned values when the primitives are + used. Upon initialization of an MSML dialog context, all shadow + variables have the string value "undefined". Each primitive has its + own instance of shadow variables that are global in scope to the + entire MSML dialog context. + + Names SHOULD be assigned to individual primitives when more than one + primitive of the same type is used within one MSML document. Shadow + variables are overwritten if the primitive has not been named and is + instantiated a second time. + + Shadow variables cannot be modified under user control. They may be + returned from the MSML dialog context using the <send> element. + + + + + +Saleem, et al. Informational [Page 54] + +RFC 5707 Media Server Markup Language February 2010 + + +9.3. Events + + Events provide the mechanism for primitives to interact with each + other and for an MSML context to interact with its external + environment. The external environment is defined by the way in which + an MSML context has been invoked. This will often be through MSML, + but other languages and protocols such as SIP may also be used. + + Every primitive and group conceptually implements their own event + queue. Events sent to them get placed into their associated queue. + Events are removed from their queues and processed in order. + Primitives within a group conceptually have their own thread of + execution. Due to the asynchronous nature of servicing events from + multiple queues, it cannot be assumed that several events sent in + sequence to different queues will be processed in the order in which + they were sent. For example, if recognition of something led to + sending events to both a <play> and a <record> in that order, it is + possible that the <record> may process its event before the <play>. + + Primitives each define the set of events that they support and the + behavior associated with their handling of each event. This allows + many types of behaviors to be defined. For example, VCR type + controls can be constructed by defining primitives that support + events corresponding to each control. Media recognition/detection + can be used to cause those events to be generated. + + Alternatively, events can be originated elsewhere, such as from a + control agent, and simply received by the primitive implementing the + control. Examples of the use of events include adjusting volume + (gain) and pause and resume of both announcement playout and record + creation. + + Primitives act on events based upon the longest match of an event + name. Event names are a period '.' delimited sequence of tokens. + The first token, or the root of the name, can be considered an event + class. Matching allows a standard meaning to be defined and then + extended based upon what triggers an event's generation. For + example, a record primitive has different behavior depending upon + whether it completed because a user stopped speaking or because it + was cancelled. The recording is retained in the first case but not + the second. + + Longest match allows new recognizers to be created and used without + changing how existing primitives are defined. For example, a face + recognition capability could be created that generates a + terminate.frowning event when a user looks puzzled. Although no + primitive directly defines this event, it will still effect a generic + terminate action. Primitives that require specialized behavior based + + + +Saleem, et al. Informational [Page 55] + +RFC 5707 Media Server Markup Language February 2010 + + + upon frowning may be extended to support this. As well, the event + can still be exported from the MSML context without requiring that + primitives receiving the event understand facial expressions. + +9.4. MSML Dialog Usage with SIP + + MSML dialogs MAY be used directly with SIP for dialog interactions + (e.g., IVR or fax). It can be initially invoked as part of the + "Prompt and Collect" service described in "Basic Network Media + Services with SIP" [n7]. That defines service indicators for a small + number of well-defined services using the user part of the SIP + Request-URI (R-URI). + + The prompt and collect service uses "dialog" as the service + indicator. URI parameters further refine the specific IVR request. + This document defines an additional parameter "msml-param" for the + dialog service indicator as follows: + + dialog-parameters = ";" ( dialog-param [ vxml-parameters ] ) + | moml-param + dialog-param = "voicexml=" dialog-url + moml-param = "moml=" moml-url + + There are no additional URI parameters when MSML is used as the + dialog language. + + MSML dialogs define discrete IVR dialog commands. These commands MAY + be included directly in the body of the INVITE to the "dialog" + service indicator by using the "cid" [n8] URL scheme. This scheme + identifies a message body part that in this case would contain the + MSML dialog request. Note that a multipart message body, containing + a single part, MUST be present even if the INVITE does not contain an + SDP offer. Subsequent MSML dialog requests are sent in the body of + SIP INFO messages as are all messages from a media server. + + An example of SIP URI as described above is: + + sip:dialog@mediaserver.example.net;\ + moml=cid:14864099865376@appserver.example.net + + The body part that contained the MSML dialog referenced by the URL + would have a Content-Id header of: + + Content-Id: <14864099865376@appserver.example.net> + + + + + + + +Saleem, et al. Informational [Page 56] + +RFC 5707 Media Server Markup Language February 2010 + + + The results of executing an <exit> or <disconnect>, or of executing a + <send> that has a "target" attribute value equal to "source", are + notified in SIP INFO messages using the <event> element from MSML + Core package. No messages are sent if execution completes normally + without executing one of these elements. + + If there is an error during validation or execution, then a media + server MUST notify the error as described above and must include the + namelist items "moml.error.status" and "moml.error.description". The + values for these items are defined in section 11. + + A restricted subset of MSML dialogs can also be used with the + "Announcement" service defined in [n7]. This service uses "annc" as + the service indicator and defines parameters that describe an + announcement. The "play=" parameter identifies the URL of a prompt + or a provisioned announcement sequence. The value of the "play=" + parameter can refer to an MSML dialog body part using a "cid" URL as + described above. That body part must only contain the <play> + primitive. + + Using MSML dialogs enhances the announcement service by allowing the + client to specify a sequence of audio segments rather than requiring + each sequence to be provisioned as well as support for video. + Moreover, MSML dialogs define a standard set of variables in contrast + to [n7] which defines a parameterization mechanism but does not + formally specify any semantics. + + If a media server does not understand the "cid" scheme or does not + understand MSML dialogs, it must respond with the SIP response code + "488 - not acceptable here". If the MSML dialog body contains + elements other than the <play> primitive, or there are errors during + validation, a media server must respond with a SIP response code "400 + - bad request". Finally, if there is a discrepancy between + parameters specified in the Request-URI and corresponding attributes + defined in the MSML dialog body, the Request-URI parameters must be + silently ignored. + + MSML dialogs MUST NOT change the operation of the announcement + service from that defined in [n7]. When the announcement completes, + a media server issues a SIP BYE request. The INFO method MUST NOT + used with the announcement service. + +9.5. MSML Dialog Structure and Modularity + + MSML is structured as a set of packages. Only the core and base + packages are required. The Dialog Core Package defines the framework + for MSML requests to a media server, without specific functionality. + It consists of the "primitive" abstraction, an abstract element for + + + +Saleem, et al. Informational [Page 57] + +RFC 5707 Media Server Markup Language February 2010 + + + control flow, the sequential execution model, and the <send> element. + That is, the MSML Dialog Core Package allows for the execution of a + sequence of one or more media processing primitives with the ability + to notify events to the invocation environment. + + Primitives are contained within the MSML Dialog Base Package, which + defines the basic <play>, <record>, <dtmf>, <dtmfgen>, <tonegen>, and + <collect> elements. Another package, the MSML Dialog Transform + Package, defines the simple half-duplex filters. More advanced + primitives are defined in the speech and fax packages. The MSML + speech package depends on the MSML Dialog Base Package as it extends + the capability of <play> by adding synthesized speech. Finally, the + group execution model, which is currently the only element that + changes the flow of control, is defined in a separate MSML Dialog + Group Package. All of these packages are optional with the exception + that MSML Dialog Core and MSML Dialog Base Packages MUST be + implemented to provide the minimal functionality. + +9.6. MSML Dialog Core Package + + The MSML Dialog Core Package defines the structural framework and + abstractions for MSML dialogs (via its schema). It also defines the + basic elements that are not part of the core primitive or control + abstractions. This package is dependent on the MSML Core Package. + Events generated by MSML dialogs, such as prompt completion, digits + collected, or dialog termination, are communicated by the media + server via the MSML Core Package (see MSML Core Package <event>). + + MSML dialogs are executed independently from the MSML core context. + When an MSML dialog is started, MSML allocates the dialog control + resources, and if successful, starts those resources executing. MSML + core execution then continues without waiting for the MSML dialog to + complete. This forking of MSML dialog invocation from the MSML core + context is done via the <dialogstart> element. Media streams are + created between the MSML dialog target and other internal media + server resources as part of dialog execution. Stream creation is + subject to the requirements defined in the MSML Core Package and + media streams as defined by the MSML Conference Core Package. + +9.6.1. <dialogstart> + + The <dialogstart> element is used to instantiate an MSML media dialog + on connections or conferences. The dialog is specified either inline + or by a URI [n6]. Inline dialogs MUST be composed of any of the MSML + Dialog Packages. MSML dialogs MAY be defined externally as VoiceXML + [n5]. The MSML dialog description MUST NOT be inline if the src + attribute, containing a URI, is present. + + + + +Saleem, et al. Informational [Page 58] + +RFC 5707 Media Server Markup Language February 2010 + + + The originator of the MSML dialog is notified using a + "msml.dialog.exit" event when the dialog completes. Any results + returned by the dialog when it exits are sent as a namelist to the + event. + + The "msml.dialog.exit" event is also used when dialogs fail due to + errors encountered fetching external documents or errors that occur + within the dialog execution thread. In this case, a namelist + containing the items "dialog.exit.status" and + "dialog.exit.description" is returned with the event to inform the + client of the failure and the failure reason. The values of these + items are defined within this package and the MSML Core Package. + Information from the failed dialog may be returned as additional + namelist items. + + Attributes: + + target: an identifier of a connection or a conference that will + interact with the dialog. The identifier must not contain + wildcards. Mandatory. + + src: the URL of the dialog description. MUST NOT be used if the + MSML dialog description is inline. Otherwise, an error (422) will + result and MSML document execution will stop. + + type: a MIME type that identifies the type of language used to + describe the dialog. application/moml+xml and + application/vxml+xml are used to identify MSML dialogs and + VoiceXML [n5] respectively. Mandatory. + + name: an instance name for the dialog. If the attribute is not + present, the media server will assign an identifier to the dialog. + If the attribute is present but the name is already associated + with the target, an error (431) will result and MSML document + execution will stop. Any results that a dialog generates will be + correlated to its identifier. + + mark: a token that can be used to identify execution progress in + the case of errors. The value of the mark attribute from the last + successfully executed MSML element is returned in an error + response. Therefore, the value of all "mark" attributes within an + MSML document should be unique. + + The following sections show examples of initiating an external MSML + dialog, an inline embedded MSML dialog, and an MSML-initiated + VoiceXML dialog. + + The following example starts an MSML dialog on a connection. + + + +Saleem, et al. Informational [Page 59] + +RFC 5707 Media Server Markup Language February 2010 + + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <dialogstart target="conn:abcd1234" + type="application/moml+xml" + name="sample" + src="http://server.example.com/scripts/foo.moml"/> + </msml> + + The following example starts an inline embedded MSML dialog on a + connection. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <dialogstart target="conn:abcd1234" name="sample"> + <play> + <audio uri="file://clip1.wav"/> + <audio uri="http://host1/clip2.wav"/> + <tts uri="http://host2/text.ssml"/> + <var type="date" subtype="mdy" value="20030601"/> + </play> + <send target="source" + event="done" + namelist="play.amt play.end"/> + </dialogstart> + </msml> + + The following example starts a VoiceXML dialog on a connection. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <dialogstart target="conn:abcd1234" + type="application/vxml+xml" + name="sample" + src="http://server.example.com/scripts/foo.vxml"/> + </msml> + + If this dialog fails once its execution thread had begun, for + example, the fetch of the VoiceXML document failed, an example of the + event that would be returned would be: + + <?xml version="1.0" encoding="UTF-8"?> + <event name="msml.dialog.exit" + id="conn:abcd1234/dialog:sample"> + <name>dialog.exit.status</name> + <value>423</value> + <name>dialog.exit.description</name> + <value>External document fetch error</value> + </event> + + + +Saleem, et al. Informational [Page 60] + +RFC 5707 Media Server Markup Language February 2010 + + +9.6.2. <dialogend> + + Dialog end is used to terminate an MSML dialog created through + <dialogstart> before it completes of its own accord. The operation + of <dialogend> depends on the dialog language being used by the + executing context. When that context is VoiceXML, a + "connection.disconnected" event will be thrown to the VoiceXML + application. When that context is MSML dialog, a "terminate" event + will be sent to the MSML core context. + + <dialogend> allows the executing dialog the opportunity to gracefully + complete before generating a "msml.dialog.exit" event. Dialog + results may be returned and will be contained as a namelist to that + event. + + Attributes: + + id: the identifier of a dialog. Mandatory. + + mark: a token that can be used to identify execution progress in + the case of errors. The value of the mark attribute from the last + successfully executed MSML dialog element is returned in an error + response. Therefore, the value of all "mark" attributes within an + MSML document should be unique. + + For example, if the dialog from the previous example was still + executing, the following would terminate the dialog and generate an + "msml.dialog.exit" event. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <dialogend id="conn:abcd1234/dialog:sample"/> + </msml> + +9.6.3. <send> + + The <send> element sends an event and optional namelist to the + recipient identified by the target attribute. Event names are + defined by the recipient. In the case where the recipient is an MSML + dialog group or primitive, the events are defined within this + document. Other recipients MAY use names that are suitable for their + environment. + + The "target" attribute specifies the recipient of the event. + Recipients MAY be other MSML dialog primitives or groups executing + within the object, the object itself, or the environment that invoked + the MSML dialog. Sending events to media primitives or groups is + supported by the MSML Dialog Group Package. Any target that is + + + +Saleem, et al. Informational [Page 61] + +RFC 5707 Media Server Markup Language February 2010 + + + unknown within the object is assumed to be destined to the external + environment. By convention, the string "source" SHOULD used to + address that environment, but any target name distinct from the MSML + dialog namespace MAY be used. + + Attributes: + + event: the name of an event. Mandatory. + + target: the recipient of the event. The recipient MUST be a MSML + dialog primitive, the currently executing group, or the MSML + dialog environment. A primitive is specified by a primitive type, + optionally appended by a period '.' followed by the identifier of + a primitive. Identifiers are only needed when more than one + primitive of the same type exists in the object. The executing + group is specified using the token "group". The environment is + specified using the token "source", optionally appended by a + period '.' followed by any environment specific target. + Mandatory. + + namelist: a list of zero or more shadow variables that are + included with the event. + +9.6.4. <exit> + + The <exit> element causes execution of the MSML dialog to terminate. + + Attributes: + + namelist: a list of one or more shadow variables that MAY + optionally be sent to the context that invoked the MSML Dialog + object. + +9.6.5. <disconnect> + + The <disconnect> element is similar to <exit> but has the additional + semantics of indicating to the context that invoked the MSML dialog + that it should disconnect from a media server, the media stream + associated with the object. The method of disconnection depends upon + how the media stream was initially established. If SIP was used, a + <disconnect> would cause a media server to issue a BYE request. The + request would be sent for the SIP dialog associated with media + session on which the MSML dialog was operating. + + + + + + + + +Saleem, et al. Informational [Page 62] + +RFC 5707 Media Server Markup Language February 2010 + + + Attributes: + + namelist: a list of one or more shadow variables that MAY + optionally be sent to the context that invoked the MSML dialog + object. + +9.7. MSML Dialog Base Package + + The MSML Dialog Base Package defines a required set of base + functionality for the media server. It supports individual media + primitives, such as playing an announcement or collection digits, as + well as composite operations such as play and collect. When this + package is used in conjunction with the MSML Dialog Group Package, + the event-based mechanism is used to control primitives. This + package may also be used in conjunction with the MSML Speech Package + to extend the functionality of prompts to include TTS and user input + collection to include ASR. + + In the following sections, subsections of a primitive define child + elements of that primitive and are not themselves considered + primitives. They do not receive events or populate shadow variables. + +9.7.1. <play> + + Play is used to generate an audio or video stream. It MUST play in + sequence the media created by the child media elements <audio>, + <video>, <media>, <tts>, and <var>. When the play stops, either + because the terminate event is received or all media generation has + completed, the <playexit> element, if present, is executed. At least + one media generation element must be present. + + Play supports two states: generate and suspend. Media generation + occurs in the generate state and is suspended in the suspend state. + Once in the suspend state, media generation continues upon receiving + the generate event. The default initial state is generate. + + Audio MAY be generated in different languages by specifying the + xml:lang attribute for <play> and/or the child elements of <play>. + The language is inherited by the child elements, but each child MAY + specify its own language. Except for physical audio clips, it is an + error if a language is specified but the media server cannot render + the audio in the requested language. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the play primitive. + + + + +Saleem, et al. Informational [Page 63] + +RFC 5707 Media Server Markup Language February 2010 + + + interval: specifies the delay between stopping one iteration and + beginning another. The attribute has no effect if iterate is not + also specified. Default is no interval. + + iterate: specifies the number of times the media specified by the + child media elements should be played. Each iteration is a + complete play of each of the child media elements in document + order. Defaults to once '1'. + + initial: defines the initial state for the play element. Default + is "generate". + + maxtime: defines the maximum allowed time for the <play> to + complete. + + barge: defines whether or not audio announcements may be + interrupted by DTMF detection during play-out. The DTMF digit + barging the announcement is stored in the digit buffer. Valid + values for barge are "true" or "false", and the attribute is + mandatory. When barge is applied to a conference target, DTMF + digit detected from any conference participant MUST terminate the + announcement. + + cleardb: defines whether or not the digit buffer is cleared, prior + to starting the announcement. Valid values for cleardb are "true" + or "false", and the attribute is mandatory. + + offset: defines an offset, measured in units of time, where the + <play> is to begin media generation. Offset is only valid when + all child media elements are <audio>. + + skip: an amount, expressed in time, that will be used to skip + through the media when "forward" and "backward" events are + received. Default is 3 s (three seconds). + + xml:lang: specifies the language to use for content that can be + rendered in different languages. + + Events: + + The following describes input events to the media primitive + object. The MSML Dialog Group Package allows an event exchange + mechanism between primitives. + + pause: causes the play to enter the suspend state. + + resume: causes play to enter the generate state. + + + + +Saleem, et al. Informational [Page 64] + +RFC 5707 Media Server Markup Language February 2010 + + + forward: skips forward through the media. Only has effect when + all child media elements are <audio>. + + backward: skips backward through the media. Only has effect when + all child media elements are <audio>. + + restart: skips to the beginning of the media. Only has effect + when all child media elements are <audio>. + + toggle-state: causes the suspend / generate state to toggle. + + terminate: terminates the play and assigns values to the shadow + variables. + + Shadow Variables: + + play.amt: identifies the length of time for which media was + generated before the play was stopped. This does not include time + that may have elapsed while the play was in the suspend state. + + play.end: contains the event that caused the play to stop. When + the play stops because all media generation has completed, end is + assigned the value "play.complete". + + Note: Attributes barge and cleardb provide a simplified mechanism for + controlling play operations with implicit DTMF without the use of + <group> and event exchange mechanism. When using the <play> element + within the group framework and barge is specified, detection of barge + condition generates an implicit terminate event to the play + primitive. + + The following sections describe the child elements of <play>. + +9.7.1.1. <audio> + + The <audio> element identifies prerecorded audio to play. Local URI + references may resolve to a single physical audio clip, a logical + clip, or a provisioned sequence of clips (physical or logical). A + logical clip is one that can be rendered differently based on the + language attribute. Logical clips are provisioned for each of the + languages that a media server supports. Remote URI references are + resolved according to the capabilities of the remote server. + + Attributes: + + uri: identifies the location of the audio to be played. The file + and http schemes are supported. Mandatory. + + + + +Saleem, et al. Informational [Page 65] + +RFC 5707 Media Server Markup Language February 2010 + + + format: defines the encoding and file type of the audio resource. + The format attribute is defined as a string type of form + "audio/<filetype>;codecs=<codec>". The keyword 'audio' identifies + an audio content. The codecs field identifies the audio file's + codec to be used for decoding the audio content. If format + attribute is not specified, the filetype MUST be determined from + the URI and the codec information MUST be determined from the + media resource. + + audiosamplerate: identifies audio sample rate in kHz. If not + specified, the sample rate SHOULD be determined from the media + resource. + + audiosamplesize: identifies audio sample size in bits. If not + specified, the sample size SHOULD be determined from the media + resource. + + iterate: specifies the number of times the audio is to be played. + Defaults to once '1'. + + xml:lang: specifies the language to use when the URI identifies a + logical clip, either directly, or as part of a sequence. + +9.7.1.2. <video> + + The <video> element identifies prerecorded multimedia to play. + Contents identified by the URI attribute may contain audio only, + video only, or both audio and video. The media server SHOULD attempt + to play both audio and video from the identified URI, if both are + available in the content. + + Attributes: + + uri: identifies the location of the video or multimedia to be + played. The file and http schemes are supported. Mandatory. + + format: defines the encoding and file type of the video or + multimedia resource. The format attribute is defined as a string + type of form "video/<filetype>;codecs=<codecx>,<codecy>". The + keyword 'video' identifies video-only media or media containing + audio and video. The "codecs" field identifies the audio and/or + video codecs to be used for decoding the file content, where the + order of the codec values is not significant. In the event of + audio and video content, using 'video' keyword, the + codecs=<codecx>,<codecy> field MAY be used to identify the audio + codec and the video codec. If not specified, the codec + information SHOULD be determined from the media file. + + + + +Saleem, et al. Informational [Page 66] + +RFC 5707 Media Server Markup Language February 2010 + + + audiosamplerate: identifies audio sample rate in kHz. If not + specified, the sample rate SHOULD be determined from the media + file. + + audiosamplesize: identifies audio sample size in bits. If not + specified, the sample size SHOULD be determined from the media + file. + + codecconfig: identifies an optional special instruction string for + codec configuration. Default is to send no special configuration + string to the codec. + + profile: identifies a video profile name specific to the codec. + If not specified, default video profile of the codec SHOULD be + selected. + + level: identifies a video profile level to the codec. Default is + to send no profile information to the codec and allow the codec to + select an internal default. + + imagewidth: identifies the width of video image in pixels. + Default is to use image width information from media file. + + imageheight: identifies the height of video image in pixels. + Default is to use image height information from media file. + + + + + + + + + + + + + + + + + + + + + + + + + + +Saleem, et al. Informational [Page 67] + +RFC 5707 Media Server Markup Language February 2010 + + + maxbitrate: identifies the bitrate of the video signal in kbps. + Default is to use maximum bitrate information from the media file. + + framerate: identifies the video frame rate in frames per second. + Default is to use frame rate information from the media file. + + iterate: specifies the number of times the media content is to be + played. Defaults to once '1'. + +9.7.1.3. <media> + + The <media> element identifies multimedia content for play. All + content of the <media> element MUST start to play concurrently. This + element may be used to generate a multimedia stream from two + independent media resources, one identifying audio and the other + identifying video. + + The <media> element MUST contain at least one child element. Valid + child elements of <media> are <audio> and <video>, as described + earlier. <media> element MUST contain at most one <audio> element or + at most one <video> element. + +9.7.1.4. <var> + + The <var> element specifies the generation of audio from a variable + using prerecorded audio segments. A variable represents a semantic + concept (such as date or number) and dynamically produces the + appropriate speech. + + Prerecorded audio allows an application vendor or service provider to + choose the exact voice for their audio and therefore completely + control the "sound and feel" of the service provided to end users. + It provides very high audio quality and allows the variables to blend + seamlessly into the surrounding audio segments. + + Text to speech (TTS) using Speech Synthesis Markup Language (SSML) + [n11] may also be used to render variables, but may not provide as + good quality, or allow as complete control of the "sound and feel" or + user experience. TTS is normally used for reading text such as + emails and for very large vocabularies such as stock names. TTS + results in a very clear difference between the variables and the + surrounding audio segments. (See MSML Dialog Speech Package.) + + Attributes: + + type: specifies the type of variable. Mandatory. Variable type + must be one of "date", "digits", "duration", "month", "money", + "number", "silence", "time", or "weekday". + + + +Saleem, et al. Informational [Page 68] + +RFC 5707 Media Server Markup Language February 2010 + + + subtype: specifies an optional clarification of type. Specific + values depend upon the type. + + value: text that should be rendered appropriate to the type and + subtype attributes. Mandatory. + + xml:lang: specifies the language to use when rendering the + variable. + +9.7.1.5. <playexit> + + The <playexit> element MUST be invoked when generation of all content + of the <play> has come to completion. The contents of this element + MAY be used to send events. + + Attributes: + + none + +9.7.2. <dtmfgen> + + DTMF generator originates one or more DTMF digits in sequence. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the dtmfgen primitive. + + digits: a string of characters from the alphabet "0-9a-d#*" that + correspond to a sequence of DTMF tones. Mandatory. + + level: used to define the power level for which the tones will be + generated. Expressed in dBm0 in a range of 0 to -96 dBm0. Larger + negative values express lower power levels. Note that values + lower than -55 dBm0 will be rejected by most receivers (TR- + TSY-000181, ITU-T Q.24A). Default is -6 dBm0. + + dur: the duration in milliseconds for which each tone should be + generated. Implementations may round the value if they only + support discrete durations. Default is 100 ms. + + interval: the duration in milliseconds of a silence interval + following each generated tone. Implementations may round the + value if they only support discrete durations. Default is 100 ms. + + Events: + + terminate: terminates DTMF generation and assigns values to the + + + +Saleem, et al. Informational [Page 69] + +RFC 5707 Media Server Markup Language February 2010 + + + shadow variables. + + Shadow Variables: + + dtmfgen.end: contains the event that caused DTMF generation to + stop. + + The following sections describe the child elements of <dtmfgen>. + +9.7.2.1. <dtmfgenexit> + + The <dtmfgenexit> element MUST be invoked when the DTMF generation + operation completes or is terminated as a result of receiving the + terminate event. The <dtmfgenexit> element MAY be used to send + events when the DTMF generation has completed. + + Attributes: + + none + +9.7.3. <tonegen> + + Tone generator allows customized tone generation. A sequence of + varying tones with optional silence intervals can be composed using + the <tonegen> element. Child elements of <tonegen>, namely <tone> + and <silence>, specify a single tone or sequence of tones. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the tonegen primitive. + + iterate: A numeric value specifying the total number of + iterations. A value of 'forever' represents infinite repetitions. + Optional. Default is 1. + + Events: + + terminate: terminates tone generation and assigns values to the + shadow variables. + + Shadow Variables: + + tonegen.end: contains the event that caused tone generation to + stop. + + The following sections describe the child elements of <tonegen>. + + + + +Saleem, et al. Informational [Page 70] + +RFC 5707 Media Server Markup Language February 2010 + + +9.7.3.1. <tone> + + The <tone> element specifies a single tone with an optional silence + interval. The tone specification consists of two tone frequencies, + their attenuation values, a duration of the tone, and the number of + times to repeat the tone. + + Attributes: + + duration: time duration or length of the individual tone, + specified in "ms" or "s" in increments of 10 ms. A value of 0 + represents an infinite duration. Mandatory. + + iterate: specifies the number of times to execute the contents of + <tone> element. A value of 'forever' represents infinite + repetitions. Optional. Default is 1. + + Events: + + none + + Child Elements: + + The child elements of <tone> element specify a single tone and an + optional silence interval to be inserted at the end of tone + generation. A tone is defined by <tone1> and <tone2> elements. + Each <tone> element MUST contain at least one of <tone1> or + <tone2>, or MAY contain <tone1> and <tone2> exactly once. + + <tone1> + + Attributes: + + freq: specifies the frequency of the first tone in "Hz", + ranging from 0 to 3999 Hz. Mandatory. + + atten: specifies the attenuation level expressed in dBm0, + ranging from 0 to -96 dBm0. Mandatory. + + <tone2> + + Attributes: + + freq: specifies the frequency of the second tone in "Hz", + ranging from 0 to 3999 Hz. Mandatory. + + atten: specifies the attenuation level expressed in dBm0, + ranging from 0 to -96 dBm0. Mandatory. + + + +Saleem, et al. Informational [Page 71] + +RFC 5707 Media Server Markup Language February 2010 + + + <silence> - Refer to the silence element definition below. + +9.7.3.2. <silence> + + The <silence> element inserts a silence interval as optional content + of <tonegen> or <tone> elements. + + Attributes: + + duration: specifies the amount of silence interval in "ms" or "s", + in increments of 10ms. Mandatory. + + Events: + + none + +9.7.3.3. <tonegenexit> + + The <tonegenexit> element MUST be invoked when the tone generation + operation completes or is terminated as a result of receiving the + terminate event. The <tonegenexit> element MAY be used to send + events when the tone generation has completed. + + Attributes: + + none + +9.7.4. <record> + + Record creates a recording. Similar to play, <record> supports two + states: create and suspend. Received media becomes part of the + recording when <record> is in the create state and is discarded when + it is in the suspend state. + + Recording MUST be terminated when a terminate event is received or + when a nospeech event is received and no audio has yet been recorded. + <record> differentiates different types of terminate events. + + An optional <play> element MAY be specified as a child element of + <record>. This mechanism provides a complete play-record operation, + where the prompts specified within the <play> element are played in + advance of start of recording. + + Note: Attributes prespeech, postspeech, and termkey provide a + simplified mechanism for controlling record operations using implicit + DTMF and VAD, without the use of <group> and event exchange + mechanism. + + + + +Saleem, et al. Informational [Page 72] + +RFC 5707 Media Server Markup Language February 2010 + + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the record primitive. + + append: a boolean that defines whether the recording is allowed to + be appended to an existing file if dest already exists. Default + is "false". The attribute is ignored if the scheme is http. + + dest: the destination for the recording, which will contain either + audio only, video only, or both audio and video depending on the + stream(s) being recorded. Recording MAY be either local or + external based upon the attribute value. File and http schemes + are supported. + + audiodest: the destination for the audio-only recording. + Recording MAY be either local or external based upon the attribute + value. All combinations of dest, audiodest, and videodest are + valid. File and http schemes are supported. + + videodest: the destination for the video-only recording. + Recording MAY be either local or external based upon the attribute + value. All combinations of dest, audiodest, and videodest are + valid. File and http schemes are supported. + + format: defines the encoding and file type of the recording. The + format attribute is defined as a string type of form + "audio|video/filetype;codecs=x,y". The keyword 'audio' identifies + an audio only recording, while the keyword 'video' identifies + video-only recording or an audio plus video recording. The codecs + field identifies the audio and/or video codecs to be used for the + recording, where the order of the codec values is not significant. + In the event of audio and video recording, using 'video' keyword, + the codecs=x,y field MAY be used to identify the audio codec and + the video codec. Mandatory. + + codecconfig: identifies an optional special instruction string for + codec configuration. Default is to send no special configuration + string to the codec. + + audiosamplerate: identifies audio sample rate in kHz. If not + specified, the sample rate SHOULD be determined from the media + source. + + audiosamplesize: identifies audio sample size in bits. If not + specified, the sample size SHOULD be determined from the media + source. + + + + +Saleem, et al. Informational [Page 73] + +RFC 5707 Media Server Markup Language February 2010 + + + profile: identifies a video profile name specific to the codec. + If not specified, default video profile of the codec SHOULD be + selected for the recording. + + level: identifies a video profile level to the codec. Default is + to send no profile information to the codec and allow the codec to + select an internal default. + + imagewidth: identifies the width of video image in pixels. + Default is to use image width information from the media source. + + imageheight: identifies the height of video image in pixels. + Default is to use image height information from the media source. + + maxbitrate: identifies the bitrate of the video signal in kbps. + Default is to use maximum bitrate information from the media + source. + + framerate: identifies the video frame rate in frames per second. + Default is to use frame rate information from the media source. + + initial: defines the initial state for the record element. + Default is "create", which starts the recording as soon as the + <record> element is executed. The "initial" attribute is + applicable only when <record> is used within the <group> + structure. + + maxtime: defines the maximum length of the recording in units of + time. Mandatory. + + prespeech: defines a timer value, in seconds, for detection of + absence of audio energy at the start of the record operation. If + no audio energy is detection for the amount of time specified by + prespeech, the recording is terminated. Default is 0 s, which + does not activate the prespeech timer. + + postspeech: defines a timer value, in seconds, for detection of + absence of audio energy while the recoding is in progress. During + an in progress recording, if absence of audio energy is detected + as specified by the postspeech timer, the recording is terminated. + Default is 0 s, which disables the ability to terminate a + recording due to postspeech silence. + + termkey: defines a single DTMF key that, when detected, terminates + the recording. Absence of this attribute prevents the recording + from being terminated due to detection of DTMF digits. When + termkey is specified, the detected DTMF digit terminates the + recording and the DTMF digit is not entered in the digit buffer. + + + +Saleem, et al. Informational [Page 74] + +RFC 5707 Media Server Markup Language February 2010 + + + Events: + + The following describes input events to the media primitive + object. The MSML Dialog Group Package allows an event exchange + mechanism between primitives. + + pause: causes the record to enter the suspend state. Received + media is discarded. + + resume: causes the record to resume if it was suspended. It has + no effect otherwise. + + toggle-state: causes the suspend / create state to toggle. + + terminate: terminates the recording and assigns values to the + shadow variables. + + terminate.cancelled: terminates the recording and assigns values + to the shadow variables. If the dest attribute used the file + scheme, the local recording is deleted. Applications are + responsible for removing external files created using the http + scheme. + + terminate.finalsilence: terminates the recording and assigns + values to the shadow variables. If the dest attribute used the + file scheme, the final silence is removed from the recording. + + nospeech: terminates the recording and assigns values to the + shadow variables if it is received and no recording has yet been + created. The "nospeech" event is ignored if audio has already + been recorded. + + Shadow Variables: + + record.len: the actual length of the recording measured in units + of time. This does not include time that may have elapsed while + the record was in the suspend state. + + record.end: contains the event that caused the record to + terminate. When the record terminates because maxtime is + exceeded, end is assigned the value "record.complete.maxlength". + + record.recordid: contains the value of the "dest" attribute, if + supplied, otherwise contains a media server assigned record + identifier. + + Record termination due to prespeech silence results in assigned + value of "record.failed.prespeech" + + + +Saleem, et al. Informational [Page 75] + +RFC 5707 Media Server Markup Language February 2010 + + + Record termination due to postspeech silence results in assigned + value of "record.complete.postspeech" + + Record termination due to DTMF detection results in assigned value + of "record.complete.termkey" + + The following sections describe the child elements of <record>. + +9.7.4.1. <play> + + The optional <play> element as a child element of <record> allows a + prompt to be played prior to start of recording. The record + operation starts at the end of the play sequence or if the play is + barged by DTMF, assuming that barge=true is specified for <play>. + For a complete description, refer to <play> element. + +9.7.4.2. <tonegen> + + The optional <tonegen> element as a child element of <record> allows + a tone or sequence of tones to be played prior to start of recording. + The record operation starts at the end of the tone generation. For a + complete description, refer to <tonegen> element. + +9.7.4.3. <recordexit> + + The <recordexit> element MUST be invoked when the record operation + completes or when the recording is terminated as a result of + receiving the terminate event. The <recordexit> element MAY be used + to send events when the recording has completed. + + Attributes: + + none + +9.7.5. <dtmf> or <collect> + + DTMF input fulfills several roles within MSML dialogs. It is used to + trigger events that will affect the media processing operation of + other primitives. It is also used to collect DTMF digits from a + media stream that are to be reported back to the user of MSML dialog. + Often DTMF detection is used for both purposes. Barge is the most + common example, where a prompt is stopped based upon DTMF input but + more digits may remain to be collected. + + DTMF detection supports multiple simultaneous recognition patterns. + Different patterns can be used to trigger sending different events in + order to implement DTMF controls. Alternatively, one pattern may be + + + + +Saleem, et al. Informational [Page 76] + +RFC 5707 Media Server Markup Language February 2010 + + + used to represent a collection and another pattern, a substring of + the first, used as a barge indication. + + An optional <play> element MAY be specified as a child element of + <dtmf> or <collect>. This mechanism provides a complete play-collect + operation, where the prompt(s) specified within the <play> element + are played in advance of DTMF digit collection. + + Note that all patterns share the same digit collection buffer, inter- + digit timing, a single <nomatch> element, and a single <noinput> + element. As such, multiple patterns may not be suitable to support + simultaneous collections for different purposes. When this is + required, separate <dtmf> elements should be used instead. + + <dtmf> terminates if any of the <pattern>, <noinput>, or <nomatch> + elements are matched the maximum number of times that they are + allowed. The number of times they may match may be specified as an + attribute of <dtmf> or of the individual child elements. + + Element identifier <dtmf> is equivalent to <collect>. However, + <collect> is the preferred name. MSML clients SHOULD use <collect>, + while MSML servers SHOULD support both. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to this primitive. + + cleardb: a boolean indication of whether the buffer for digit + collection should be cleared of any collected digits when the + element is instantiated. If set to false, any digits currently in + the buffer MUST be immediately compared against the pattern + elements. + + fdt: defines the first-digit timer value. The first-digit timer + is started when DTMF detection is initially invoked. If no DTMF + digits are detected during this initial interval, the <noinput> + element MUST be invoked. Optional, default is 0 s (wait forever + for the first digit). + + idt: defines the inter-digit timer to be used when digits are + being collected. When specified, the timer is started when the + first digit is detected and restarted on each subsequent digit. + Timer expiration is applied to all patterns. After that, if any + patterns remain active and a nomatch element is specified, the + nomatch is executed and DTMF input MUST terminate. The idt + attribute should only be used when digit collection is being + performed. Optional, default is 4 s. + + + +Saleem, et al. Informational [Page 77] + +RFC 5707 Media Server Markup Language February 2010 + + + edt: defines the extra-digit timer value. Specifies the length of + time the media server MUST wait after a match to detect a + termination key, if one is specified by the <pattern> element. + Optional, default is 4 s. + + starttimer: boolean value that defines whether the first digit + timer (fdt) is started initially. When set to false, the + starttimer event must be received for it to start. Default is + "false". + + iterate: specifies the number of times the <pattern>, <noinput>, + and <nomatch> elements may be executed unless those elements + specify differently. The value "forever" MAY be used to indicate + that these may be executed any number of times. Default is once + '1'. + + ldd: defines the minimum duration for a digit to be held in order + for it to be detected as a long DTMF digit. A long DTMF digit + event MUST be treated as a single DTMF event, and MUST contain an + extra character 'L' at the end to be distinguished from the other + regular digit events. For example, "#L" and "#" are different + DTMF events. Optional, default of 0 s. A value of 0 s disables + long DTMF digit detection and reporting. Attribute value is an + integer with a valid range from 100 ms to 100 s (units MUST be + supplied). + + Events: + + The following describes input events to the media primitive + object. The MSML Dialog Group Package allows an event exchange + mechanism between primitives. + + starttimer: starts the first digit timer (fdt) if it has not + already been started. Has no effect otherwise. + + terminate: terminates the DTMF input and assigns values to the + shadow variables. + + Shadow Variables: + + dtmf.digits: the string of DTMF digits that have been received + (the contents of the digit buffer). + + dtmf.len: the number of digits in the digit buffer. + + dtmf.last: the last digit in the digit buffer. + + + + + +Saleem, et al. Informational [Page 78] + +RFC 5707 Media Server Markup Language February 2010 + + + dtmf.end: contains the event that caused the <dtmf> to terminate + or is assigned one of "dtmf.match", "dtmf.noinput", or + "dtmf.nomatch" depending upon which of the corresponding elements + reached its maximum. + + The following sections describe the child elements of <dtmf> or + <collect>. + +9.7.5.1. <play> + + The optional <play> element as a child element of <dtmf> or <collect> + allows a prompt to be played prior to DTMF digit collection. DTMF + digit collection starts at the end of the play sequence or if the + play is barged by DTMF, assuming that barge=true is specified for + <play>. For a complete description, refer to <play> element. + +9.7.5.2. <pattern> + + The <pattern> element describes one or more DTMF digits that are to + be recognized. When the pattern is matched, the child elements MUST + be executed. + + Attributes: + + digits: the digit pattern that should be matched. Mandatory. + + format: an enumerated value that defines the format used to + express the digit pattern. The format may be "mgcp" or "megaco" + for patterns expressed as a digit map from those specifications, + or as one of the simple built-in formats defined within this + specification. Currently, a single built-in format "moml+digits" + is defined that allows a match based on either one or more + specific digits, or based upon a specific length specification + with an optional return key. "moml+digits" is the default. + + iterate: specifies the number of times the <pattern> may be + matched. The value "forever" may be used to indicate that + <pattern> may be matched any number of times. This value + overrides any specified in <dtmf>. Default is once '1'. + +9.7.5.3. <detect> + + The contents of the <detect> element MUST be executed whenever any + DTMF is first detected. It MUST be matched at most once. + + Attributes: + + none + + + +Saleem, et al. Informational [Page 79] + +RFC 5707 Media Server Markup Language February 2010 + + +9.7.5.4. <noinput> + + The <noinput> element is used when DTMF is being collected. Children + of the <noinput> element MUST be executed when DTMF has not been + detected and the first digit timeout occurs. + + Attributes: + + iterate: specifies the number of times the <noinput> may be + triggered. The value "forever" may be used to indicate that + <noinput> may be triggered any number of times. This value + overrides any specified in <dtmf>. Default is once '1'. + +9.7.5.5. <nomatch> + + The <nomatch> element is used when DTMF is being collected. Children + of the <nomatch> element MUST be executed when it is determined that + none of the individual patterns can be matched. + + Attributes: + + iterate: specifies the number of times the <nomatch> may be + triggered. The value "forever" may be used to indicate that + <nomatch> may be triggered any number of times. This value + overrides any specified in <dtmf>. Default is once '1'. + +9.7.5.6. <dtmfexit> + + The <dtmfexit> element MUST be invoked when the dtmf input completes + because one of <pattern>, <noinput>, or <nomatch> occurred its + maximum number of times. + + Attributes: + + None + +9.7.6. <moml> + + The root element <moml> MUST be used when the document is a stand- + alone MSML dialog, where the invoking application media type + indicates 'application/moml+xml'. Additionally, for backwards + compatibility, the <moml> element MUST be used within <dialogstart>, + which contains an inline embedded MSML dialog. + + Valid contents of <moml> are all elements described within this MSML + Dialog Base Package. + + + + + +Saleem, et al. Informational [Page 80] + +RFC 5707 Media Server Markup Language February 2010 + + + Attributes: + + version: "1.0" Mandatory. + + id: an identifier unique to this object. Events returned from + MSML dialog (the "target" attribute of a <send> is equal to + "source") will be correlated with this identifier. Mandatory. + + Events: + + terminate: terminates the MOML context. A terminate event gets + sent to the currently executing <group> or primitive. + +9.8. MSML Dialog Group Package + + The group package defines a single control flow construct that + specifies concurrent execution. Primitives are composed for + concurrent execution by placing them within a <group> element. + Groups define how media flows between multiple concurrently executing + primitives. They have one or more inputs and one or more outputs. A + <group> represents the declaration of a complex media processing + operation. The event interaction between primitives (see the + following subsection) is defined within the context of one or more + groups. However groups themselves do not scope events, they simply + define that primitives are concurrently executing and a primitive + must be executing in order to receive an event. + + Placing primitives within a group structure is an optional feature of + this specification. It allows for complex services to created using + the event exchange mechanism between the primitives. For simpler + services, such as play/collect or play/record, the use of group + mechanism is not necessary. MSML Dialog Group Package is dependent + on the MSML Dialog Base Package. + + Groups may also be used to describe media objects that transform a + media stream while optionally allowing application or user control of + the transformation. For example, a gain control could be defined + that responds to user speech or DTMF input. In this case, a + recognition primitive would send events to a gain control primitive. + + Groups have one attribute that defines the media flow within them. + They also have a dimension that defines how many media inputs and + outputs they have. Currently, dimensions of 1 and 2 are supported + based upon the group topology. These correspond to a group with one + input and one output and a group with two inputs and two outputs. + + + + + + +Saleem, et al. Informational [Page 81] + +RFC 5707 Media Server Markup Language February 2010 + + + Media flow to and from the primitives within the group is based upon + a topology attribute of the <group> element. The topology attribute + defines a topology schema and implies the group dimension. + + There are several common ways in which primitives are often connected + together. A schema provides a convenient template that can be + applied to multiple primitives without having to define all of the + individual media relationships. The following two schemas are + initially defined for one-dimensional groups: + + o parallel: specifies that media sent to the group is sent to every + primitive that has an input. The group bridges the output from + every primitive that has an output into a single common group + output. + + o serial: specifies that the first primitive listed in the group + receives the media sent to the group. Its output is to be + connected to the input of the next primitive defined within the + group and so on until the last primitive within the group becomes + the group output. + + Groups with these topologies are shown in the two diagrams below. + The group on the left has a parallel topology and that on the right + has a serial topology. + + /-> P1 --\ + / \ + G(in) +---> P2 ----> G(out) G(in) --> P1 --> P2 --> P3 --> G(out) + \ / + \-> P3 --/ + + More complex media flows MAY be created by nesting groups of serial + and parallel topologies within each other. For example, the diagram + below has a group with a serial topology nested within a star + topology. + + /-----> P1 ------------------------\ + / \ + Gs(in) +-> Gp(in) --> P2 --> P3 --> Gp(out) -+> Gs(out) + + This combination could be used to create record operation where DTMF + was to be clamped from the recording itself, but a DTMF key press is + still used to stop the recording. In this case, P1 would be a DTMF + recognizer, P2 would be a clamp primitive, and P3 a recorder as shown + by the following example. This example omits child elements and + attributes not concerned with the core concept. The following + section discusses sending events, and the details of each of the + primitives are found in section 4. + + + +Saleem, et al. Informational [Page 82] + +RFC 5707 Media Server Markup Language February 2010 + + + <group topology="parallel"> + <dtmf/> + <group topology="serial"> + <clamp/> + <record/> + </group> + </group> + + A single schema, "fullduplex", is defined for a two-dimensional + group. A full-duplex two-dimensional group has exactly two immediate + children. Those children may be primitives or other one-dimensional + groups. A "fullduplex" group must only be used as the top-most group + and must not be nested. Each primitive (P1) and group (G2) becomes + half of the full-duplex group as shown in the diagram below. + + G-A(in1) +-> G2 --> G-B(out1) + + G-A(out2) <-- P1 <-+ G-B(in2) + + Full-duplex groups are symmetrical when both halves are the same. + They are asymmetrical when they differ. Asymmetric groups need to + have a name associated with each side. The left side is defined as + the input of the first child of the full-duplex group combined with + the output of the second child. The right side is reverse. These + sides were labeled A and B respectively in the preceding diagram. + + An example of a full-duplex group is the user operated gain control + mentioned at the beginning of this subsection. The gain should + operate on the audio that a user hears, but the gain is controlled by + recognizing things such as DTMF or spoken commands in media that the + user originates. The following shows the XML tag grouping that would + accomplish this and corresponds to the media flow shown in the + diagram above. If the user's audio is not required for anything + other than control of the gain, then the <relay> is not required and + the internal group could be omitted. A complete XML description for + this is included in the examples section. + + <group topology="fullduplex"> + <group topology="parallel"> + <dtmf/> + <relay/> + </group> + <gain/> + </group> + + Primitives within a group MUST begin concurrently but MAY finish + asynchronously based upon events that they receive or their task + completes. A group MUST terminate when all of the primitives within + + + +Saleem, et al. Informational [Page 83] + +RFC 5707 Media Server Markup Language February 2010 + + + it have completed. If the group contains a <groupexit> element, then + the contents of that element MUST be executed as part of group + termination. + + A group itself MAY receive a terminate event requesting termination. + A terminate event sent to the group causes a terminate event to be + sent to each of its currently active primitives. The <groupexit> + element is not executed until all primitives have processed their + respective terminate events. + +9.8.1. <group> + + The <group> element allows the contained primitives to be executed + concurrently. + + Attributes: + + topology: specifies a schema that defines the flow of media within + the group. Three schemas are initially defined. "fullduplex" is + specified for use with two-dimensional groups. "parallel" and + "serial" are for use with one-dimensional groups. The definitions + of these topologies are in section 9.8. Mandatory. + + id: identifies the name of the group. Mandatory when groups are + nested. + + Events: + + terminate: causes a terminate event to be sent to each element + contained within the group. + +9.8.2. <groupexit> + + The <groupexit> element allows events to be sent when group + processing completes. Group processing completes when all contained + primitives terminate. + + Attributes: + + none + + Events: + + none + + + + + + + +Saleem, et al. Informational [Page 84] + +RFC 5707 Media Server Markup Language February 2010 + + +9.9. MSML Dialog Transform Package + + The MSML Dialog Transform Package gathers together the simple + primitives that work as filters on half-duplex media streams. + +9.9.1. <vad> + + Voice activity detection (VAD) is used to detect voice and silence + when speech recognition is not required. Similar to both speech and + DTMF, a VAD has different media conditions that it can match. Those + conditions can be qualified by a minimum length of time that is + required for them to be considered recognized. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the vad primitive. + + starttimer: boolean value that defines whether the timer is + started to allow recognition of the initial condition (voice, + silence). When set to false, the starttimer event must be + received in order for the initial condition to be recognized. The + timer does not affect recognition of the transition conditions. + Default is "false". + + Events: + + starttimer: starts the timer to allow recognition of the initial + condition if it has not already been started. Has no effect + otherwise. + + terminate: terminates voice activity detection. + + Shadow Variables: + + none + + The following sections describe the child elements of <vad>. + +9.9.1.1. <voice>, <silence>, <tvoice>, <tsilence> + + Each child element corresponds to a condition that a VAD can detect. + The first two detect when voice or silence has been initially present + for a minimum length of time since the VAD was started. The second + two require that a transition to the voice or silence condition first + occur. + + + + + +Saleem, et al. Informational [Page 85] + +RFC 5707 Media Server Markup Language February 2010 + + + Attributes: + + len: the length of time the condition must persist in order to be + recognized. Mandatory. In the case of <tvoice> and <tsilence>, + the length of time applies only to the final recognized condition. + + sen: the maximum length of time the condition not being detected + may occur without causing the detector to begin measuring that + condition. + +9.9.2. <gain> + + Gain MAY be used to adjust of the gain of a media stream by a + specific amount. Application of <gain> removes any previous + connection AGC setting used by the <agc> element. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the gain primitive. + + incr: an increment, expressed in dB, that will be used to adjust + the gain when "louder" and "softer" events are received. Default + is 3 dB. + + amt: a specific gain to apply specified in dB. Mandatory. + + Events: + + mute: self-explanatory. + + unmute: self-explanatory. + + reset: sets the gain to zero dB. + + louder: makes the audio on a stream louder. + + softer: makes the audio on a stream quieter. + + amt: sets the gain to the specified value between -96 dB and 96 + dB. + +9.9.3. <agc> + + Automatic gain control MAY be used to have a media server + automatically adjust the gain of a media stream. Application of + <agc> removes any previous connection gain setting used by the <gain> + element. + + + +Saleem, et al. Informational [Page 86] + +RFC 5707 Media Server Markup Language February 2010 + + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the gain primitive. + + tgtlvl: the desired target level for AGC, specified in dBm0 with a + valid range of -40 to 0. Mandatory. + + maxgain: an optional attribute used to specify the maximum gain + that AGC will apply, specified in dBm0 with a valid range of 0 to + 40, with a default of 10. + + Events: + + mute: self-explanatory. + + unmute: self-explanatory. + +9.9.4. <gate> + + The <gate> element is a simple filter that will pass or halt media, + regardless of the format of the media stream, based on the events it + receives. <gate> shares the same mute and unmute events for + compatibility with the gain primitives <gain> and <agc>. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the gate primitive. + + initial: the values "pass" and "halt" define whether media is + initially allowed to pass. Default is to pass. + + Events: + + mute: halts media flow through the primitive. + + unmute: allows media to pass through the primitive. + +9.9.5. <clamp> + + This element MAY be used to filter DTMF tones from a media stream. + Media other than DTMF tones is passed unchanged. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the clamp primitive. + + + +Saleem, et al. Informational [Page 87] + +RFC 5707 Media Server Markup Language February 2010 + + + Events: + + none. + +9.9.6. <relay> + + This element is a simple primitive that copies its input to its + output. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the relay primitive. + + Events: + + none. + +9.10. MSML Dialog Speech Package + + The MSML speech package defines functionality that MAY be used for + automatic speech recognition <speech> and extends the <play> + primitive defined in the MSML Dialog Base Package to include speech + synthesis. As such, this package depends on the MSML Dialog Base + Package. + +9.10.1. <speech> + + The <speech> element activates grammars or user input rules + associated with speech recognition. If multiple grammars are + specified, all are activated. All active grammars share the same + timers, recognition attributes, and <noinput> and <nomatch> elements. + Each grammar may have its own <match> element. + + <speech> terminates if any of the <grammar>, <noinput>, or <nomatch> + elements are matched the maximum number of times that they are + allowed. The number of times they may match may be specified as an + attribute of <speech> or of the individual child elements. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the speech primitive. + + noint: specifies a time period during which speech input must be + started; otherwise, the associated <noinput> element is invoked. + + + + + +Saleem, et al. Informational [Page 88] + +RFC 5707 Media Server Markup Language February 2010 + + + norect: specifies a maximum time period during which speech must + begin to be matched; otherwise, the associated <nomatch> element + is invoked. + + spcmplt: specifies the length of silence necessary after speech + before a result will be finalized in the case where there is a + complete match of an active grammar. Following the silence, the + appropriate <match> element will be triggered if the result is + above the confidence level. Otherwise, a <nomatch> element will + be triggered. + + spincmplt: specifies the length of silence necessary after speech + before a result will be finalized in the case where there is a + incomplete match of all active grammars. Following the silence, + the <nomatch> element will be triggered. + + confidence: the minimum confidence level that the recognizer must + have to consider a recognition result as matching a grammar. + Expressed as an integer between 1-100. + + sens: specifies the sensitivity of the recognizer to determine + whether speech is present. Lower sensitivity may be required for + the recognizer to work well in the presence of high background + noise or line echo. + + starttimer: boolean value that defines whether the no input + (noint) and no recognition (norect) are started initially. When + set to false, the starttimer event must be received in order to + start them. Default is "false". + + iterate: specifies the number of times the <grammar>, <noinput>, + and <nomatch> elements may be executed unless those elements + specify differently. The value "forever" may be used to indicate + that these may be executed any number of times. Default is once + '1'. + + Events: + + sens: sets the sensitivity of the recognizer as described above. + + starttimer: starts the no input (noint) and no recognition + (norect) timers if they have not already been started. Has no + effect otherwise. + + terminate: terminates the speech input and assigns values to the + shadow variables. + + + + + +Saleem, et al. Informational [Page 89] + +RFC 5707 Media Server Markup Language February 2010 + + + Shadow Variables: + + speech.end: contains the event that caused the <speech> to + terminate or is assigned one of "speech.match", "speech.noinput", + or "speech.nomatch" depending upon which of the corresponding + elements reached its maximum. + + speech.results: contains the results of a matched grammar. The + results are formatted using the Natural Language Semantics Markup + Language (NLSML) [n4]. When this variable is referenced to return + results, the results are returned as a separate MIME entity. + + The following sections describe the child elements of <speech>. + +9.10.1.1. <grammar> + + The <grammar> element specifies and activates a speech grammar based + on Speech Recognition Grammar Specification (SRGS) [n3] XML notation. + Grammars may be referenced by a URI or defined inline. Child + elements of <match> MUST be executed when the specified speech + grammar is matched. + + Attributes: + + uri: specifies the location of an SRGS grammar when the grammar is + not defined inline. + + iterate: specifies the number of times the <grammar> may be + matched. The value "forever" MAY be used to indicate that + <grammar> may be matched any number of times. This value + overrides any specified in <speech>. Default is once '1'. + +9.10.1.2. <match> + + <match> is a child of <grammar> and specifies the actions to take + when the corresponding grammar is matched. + +9.10.1.3. <noinput> + + The <noinput> element is used when speech is being recognized. + Children of the <noinput> element MUST be executed when speech has + not been detected and the no input timeout (noint) occurs. + + + + + + + + + +Saleem, et al. Informational [Page 90] + +RFC 5707 Media Server Markup Language February 2010 + + + Attributes: + + iterate: specifies the number of times the <noinput> may be + triggered. The value "forever" may be used to indicate that + <noinput> may be triggered any number of times. This value + overrides any specified in <speech>. Default is once '1'. + +9.10.1.4. <nomatch> + + The <nomatch> element is used when speech is being recognized. + Children of the <nomatch> element MUST be executed when it is + determined that none of the active grammars will match. + + Attributes: + + iterate: specifies the maximum number of times the <nomatch> may + be triggered. The value "forever" MAY be used to indicate that + <nomatch> may be triggered any number of times. This value + overrides any specified in <speech>. Default is once '1'. + +9.10.1.5. <speechexit> + + The <speechexit> element MUST be invoked when the speech input + completes because one of <grammar>, <noinput>, or <nomatch> occurred + its maximum number of times. + + Attributes: + + none + +9.10.2. <play> + + The <play> element, as defined in the MSML Dialog Base Package, is + extended with a new child element for synthesizing speech. From an + XML perspective, <tts> is a member of a media substitution group. + See the schema at the end of this document for details. + + The following sections describe the child elements of <play>. + +9.10.2.1. <tts> + + Contents of the <tts> element are rendered using text-to-speech + services and must be compliant to the SSML specification [n11]. + Element content MAY be plain text, contain the SSML <speak> element, + or the uri attribute should identify the location of text to be + rendered. + + + + + +Saleem, et al. Informational [Page 91] + +RFC 5707 Media Server Markup Language February 2010 + + + Attributes: + + uri: identifies the location of the text to be rendered. The file + and http schemes are supported. + + iterate: specifies the number of times the text-to-speech block is + to be rendered. Defaults to once '1'. + + xml:lang: specifies the language to use when it is not explicitly + specified as an attribute for <speak>. + +9.11. MSML Dialog Fax Detection Package + + The Fax Detection Package defines primitives that allow a media + server to provide facsimile detection services. + +9.11.1. <faxdetect> + + Fax tone detection is used to detect the presence of the T.30 Calling + Tone (CNG) or Called Station Identification (CED) tone in a media + stream. Child elements of <faxdetectexit> MUST be executed when a + CNG tone is detected. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the faxdetect primitive. + + Events: + + terminate: terminates fax tone detection and assigns values to the + associated shadow variables. + + Shadow Variables: + + faxdetect.tone: A string that specifies the fax tone type detected + by the media server. Values supported SHOULD include "CED", + "CNG", or empty string. The empty string MUST be used if fax tone + detection terminated before detection of a fax tone, resulting in + execution of the <faxdetectexit> element. + + faxdetect.end: A string value that specifies the reason for + termination of <faxdetect>. Values supported SHOULD include + "faxdetect.complete" (due to detection of CED or CNG tone), + "faxdetect.failed.noresource" (failed due to lack of resources on + the media server), "faxdetect.failed" (failed due to any other + reason) "faxdetect.terminated" (terminated by <dialogend>), or + undefined. + + + +Saleem, et al. Informational [Page 92] + +RFC 5707 Media Server Markup Language February 2010 + + +9.11.2. <faxdetectexit> + + The <faxdetectexit> element MUST be invoked when fax detection, + invoked via <faxdetect>, terminates. Child elements of + <faxdetectexit>, <send> and <exit>, allow events to be reported by + the media server. + + Attributes: + + none + +9.12. MSML Dialog Fax Send/Receive Package + +9.12.1. <faxsend> + + The <faxsend> primitive provides the functionality of a calling fax + terminal. This typically means sending a set of pages. However, it + can also mean requesting the called terminal to send pages instead + of, or in addition to, receiving pages. The fax images to send are + defined by the <sendobj> elements, described below. + + Requesting the called terminal to send pages happens when the + <rxpoll> element is included as part of <faxsend>. This element may + be included in addition to, or instead of, the <sendobj> element. + One <sendobj> (at a minimum) or <rxpoll> element must be present. + When both are present, a media server will first send pages and will + then poll the other terminal, requesting pages. + + Because fax is a distinct media type, the <faxsend> primitive is not + expected to interact with other primitives. Rather, it will interact + using fax protocols with a remote fax terminal (or gateway) and will + send requested status events to its invoking environment. During fax + operation, shadow variables are used to record the progress and + parameters of the varying stages of fax operation. + + Status events are requested by including one or more status request + elements. These elements correspond to different stages or events in + fax operation and cause predefined events to be sent to the invoking + environment when they occur. Since the only recipient of these + events is expected to be a fax control agent, requests are simplified + by associating a predefined namelist of shadow variables with each + event. This decision may be revisited to allowed tailored namelists + based on further implementation experience. Status requests apply + both to sending and polling operation. + + Attributes: + + lclid: the identifier that a media server uses to identify itself. + + + +Saleem, et al. Informational [Page 93] + +RFC 5707 Media Server Markup Language February 2010 + + + minspeed: the minimum acceptable speed to negotiate for the + operation. + + maxspeed: the maximum speed to negotiate for the operation. This + attribute is primarily for testing purposes. + + ecm: specifies whether Error Correction Mode (ECM) is allowed to + be used if supported by the remote terminal. Defaults to "true". + + Events: + + terminate: terminates the fax send operation. + + Shadow Variables: + + fax.rmtid: the identifier of the remote fax terminal. + + fax.rate: the negotiated speed for the operation. + + fax.resolution: identifies the resolution of the image. Both + metric- and inch-based resolutions are defined. Metric-based + resolutions are 75x75, 150x150, 204x98, 204x196, 204x391, and + 408x391. Inch-based resolutions are 200x200, 300x300, 400x400, + and 600x600. + + fax.pagesize: identifies the negotiated page size. Metric sizes + are "A3", "A4", "A5", "A6", and "B4". Inch-based page sizes are + "Letter" and "Legal". + + fax.encoding: identifies the image encoding utilized. Valid + values are "MH", "R", "MMR", and "JPEG". + + fax.ecm: identifies whether ECM operation was used. + + fax.pagebadlines: the number of bad lines in a page. + + fax.objbadlines: the number of bad lines in an object. + + fax.opbadlines: the number of bad lines in an operation. + + fax.objuri: the objuri of the current object. + + fax.resendcount: the number of pages resent due to errors. + + fax.totalpages: the number of pages processed or stored. + + fax.totalobjects: the count of the objects used in the operation. + + + + +Saleem, et al. Informational [Page 94] + +RFC 5707 Media Server Markup Language February 2010 + + + fax.duration: the duration of the operation expressed as a + duration in seconds and milliseconds (e.g., "23s250ms"). + + fax.result: contains the reason that caused the fax operation to + complete. When the operation completes successfully, the value + will be assigned "fax.success". Other values include + "fax.partial", "fax.nofax", "fax.remotedisconnect", + "fax.uri.access.error", and "fax.invalid.startpage". + + The following sections describe the child elements of <faxsend>. + +9.12.1.1. <sendobj> + + <sendobj> is used to define a fax transmission. There MAY be + multiple instances of the element, which will be transmitted in + order. + + Attributes: + + objuri: a URI that points to the fax image that will be + transmitted. Mandatory. + + startpage: the first page of a multi-page objuri to send. + + pagecount: page count. + +9.12.1.2. <hdrfooter> + + <hdrfooter> describes the header/footer that a media server MAY put + on pages. The header or footer may be defined as the content of the + <format> child element. The <format> element is only allowed if the + type attribute has a value of "header" or "footer". + + Attributes: + + type: specifies whether a header or a footer should be put on + pages and identifies the source of the header or footer. The + following enumerated values may be used: + + "header" indicates that the media server should put a header on + pages using the contents of the <format> element. + + "nohdr" indicates that there should be no header or footer. + + "footer" indicates that the media server should put a footer on + pages using the contents of the <format> element. + + + + + +Saleem, et al. Informational [Page 95] + +RFC 5707 Media Server Markup Language February 2010 + + + style: defines the style of insertion onto a fax page that a media + server should use for the header or footer. Valid styles are + "append", "overlay", or "replace". + + <format> is a child of the <hdrfooter> element that defines the style + format to be used for the header or footer. It uses a "C" language + style format statement (as shown below) to define the contents and + layout of the header or footer. + + code length name format + %a 3 day of week 3-character abbreviation + %d 2 date 01-31 + %m 2 month 01-12 + %y 2 year 00-99 + %Y 4 year 0000-9999 + %I 2 12 hour 01-12 + %H 2 24 hour 00-23 + %M 2 minute 00-59 + %S 2 seconds 00-59 + %p 2 AM/PM AM or PM + %P 2 page number 01-99 + %T 2 total pages 01-99 + %l 20 local ID (sender) 0-9, + or spaces + %r 20 remote ID (rcvr) 0-9, + or spaces + %% 1 percent display % in header/ftr + +9.12.1.3. <rxpoll> + + <rxpoll> provides the information necessary for a receive polling + operation to occur. The object(s) to be received are defined by one + or more <rcvobj> elements. The <rcvobj> is defined further under the + child elements of <faxrcv>. The <rxpoll> element MAY also include a + description of the header/footer that a media server SHOULD put on + received pages. The <hdrfooter> element and its usage is described + above. + + Attributes: + + rmtid: specifies the identifier of the remote fax terminal that is + to be associated with a polling operation. A media server MUST + NOT execute a polling operation unless the value of rmtid matches + that of the connected remote machine. Mandatory. + + + + + + + + + +Saleem, et al. Informational [Page 96] + +RFC 5707 Media Server Markup Language February 2010 + + +9.12.1.4. <faxstart> + + The <faxstart> element requests that an event be sent when fax + operation has begun. When triggered, the following will be executed: + + <send target="source" event="fax.start"/> + +9.12.1.5. <faxnegotiate> + + The <faxnegotiate> element requests that an event be sent when a + negotiation has been completed. Multiple events MAY be sent each + time a Digital Command Signal (DCS) frame is sent or received. When + triggered, the following will be executed: + + <send target="source" event="fax.negotiate" + namelist="fax.rmtid + fax.rate + fax.resolution + fax.pagesize + fax.encoding + fax.ecm"/> + +9.12.1.6. <faxpagedone> + + The <faxpagedone> element requests that an event be sent when a page + has been sent or received. When triggered, the following will be + executed: + + <send target="source" event="fax.pagedone" + namelist="fax.resolution + fax.pagesize + fax.encoding + fax.pagebadlines + fax.resendcount"/> + +9.12.1.7. <faxobjectdone> + + The <faxobjectdone> element requests that an event be sent when an + objuri has been completed. When triggered, the following will be + executed: + + <send target="source" event="fax.objectdone" + namelist="fax.objuri + fax.objbadlines + fax.resendcount + fax.totalpages + fax.result"/> + + + + +Saleem, et al. Informational [Page 97] + +RFC 5707 Media Server Markup Language February 2010 + + +9.12.1.8. <faxopcomplete> + + The <faxopcomplete> element requests that an event be sent when an + operation has been completed. When triggered, the following will be + executed: + + <send target="source" event="fax.opcomplete" + namelist="fax.totalpages + fax.opbadlines + fax.resendcount + fax.totalobjects + fax.duration + fax.result"/> + +9.12.1.9. <faxpollstarted> + + The <faxpollstarted> element requests that an event be sent when a + polling operation has started. When triggered, the following will be + executed: + + <send target="source" event="fax.opcomplete" + namelist="fax.rmtid + fax.rate + fax.resolution + fax.pagesize + fax.encoding + fax.ecm"/> + +9.12.2. <faxrcv> + + The <faxrcv> primitive provides the functionality of a called fax + terminal. Typically this type of operation is to receive pages. + However, it can include sending pages instead of, or in addition to, + receiving them. The fax objects to receive are defined by the + <rcvobj> elements, described below. + + A media server SHOULD send pages as a polled terminal when the + <txpoll> element is included as part of <faxrcv>. This element may + be included in addition to, or instead of, the <rcvobj> element. One + <rcvobj> or <txpoll> element must be present. When both are present, + a media server SHOULD first receive pages and will then allow the + other terminal to poll the media server, requesting pages. + + Because fax is a distinct media type, the <faxrcv> primitive is not + expected to interact with other primitives. Rather, it will interact + using fax protocols with a remote fax terminal and will send + + + + + +Saleem, et al. Informational [Page 98] + +RFC 5707 Media Server Markup Language February 2010 + + + requested status events to its invoking environment. During fax + operation, shadow variables are used to record the progress and + parameters of the varying stages of fax operation. + + Status events are requested by including one or more status request + elements. These elements correspond to different stages or events in + fax operation and cause predefined events to be sent to the invoking + environment when they occur. Since the only recipient of these + events is expected to be a fax control agent, requests are simplified + by associating a predefined namelist of shadow variables with each + event. This decision may be revisited to allowed tailored namelists + based on further implementation experience. Status requests apply + both to receiving and polling operation. + + Attributes: + + id: an optional identifier that may be referenced elsewhere for + sending events to the faxrecv primitive. + + lclid: the identifier that a media server uses to identify itself. + + ecm: specifies whether ECM mode is allowed to be used if supported + by the remote terminal. Defaults to "true". + + Events: + + terminate: terminates the fax reception operation. + + Shadow Variables: + + <faxrcv> supports the same set of shadow variables as <faxsend> + + The following sections describe the child elements of <faxrcv>. + + In addition to the elements defined below, <faxrcv> MAY also have + the following child elements, which were defined under <faxsend>: + + o <hdrfooter> + o <faxstart> + o <faxnegotiate> + o <faxpagedone> + o <faxobjectdone> + o <faxopcomplete> + o <faxpollstarted> + + Their meaning and usage are the same as previously defined. + + + + + +Saleem, et al. Informational [Page 99] + +RFC 5707 Media Server Markup Language February 2010 + + +9.12.2.1. <rcvobj> + + <rcvobj> is used to define fax objects that a media server will + receive. There may be multiple instances of the element, which will + be used in order. + + Attributes: + + objuri: a URI that points to the location that a received image is + to be stored. Mandatory. + + maxpages: the maximum number of pages that will be stored in + objuri. + +9.12.2.2. <txpoll> + + <txpoll> provides the information for a polling operation to occur as + part of a fax receive operation. An object or multiple objects to be + sent may be supplied by one or more <sendobj> elements. In the event + of multiple occurrences, a media server MUST select the <sendobj> + element whose rmtid attribute matches that of the remote terminal. + + The <sendobj> element was defined previously as a child element of + <faxsend>. The <txpoll> element is extended with an rmtid attribute + that specifies the identifier of the remote fax terminal and is used + to select the specific <sendobj> to send. + + A media server SHOULD put a header/footer on transmitted pages based + on any <hdrfooter> element included as part of <txpoll>. + + Attributes: + + rmtid: specifies the identifier of the remote fax terminal that is + to be associated with a polling operation. A media server MUST + NOT execute a polling operation unless the value of rmtid matches + that of the connected remote machine. Mandatory. + +10. MSML Audit Package + +10.1. MSML Audit Core Package + + This section describes the MSML Audit Core Package that MAY be + implemented to support auditing services. + + Audit requests and results may vary based on the information being + audited. The MSML Audit Core Package specifies the framework to send + audit request, defines a state list, and builds audit results. The + + + + +Saleem, et al. Informational [Page 100] + +RFC 5707 Media Server Markup Language February 2010 + + + additional audit packages define package specific state lists and + associated audit result content. The additional audit packages MUST + be defined within the framework specified by the Audit Core Package. + +10.1.1. <audit> + + The <audit> element is an optional child element of <msml>, which MAY + be used by MSML clients to perform state auditing of current media + resources allocated and in use by the media server. The requested + state information is returned in an MSML response. + + Attributes: + + queryid: the identifier of the MSML object being queried by the + MSML client. Mandatory. Supported object types: conference or + connection. Wildcards are allowed. + + statelist: a list of one or more state parameters that are being + queried. Optional. If not present, the media server SHOULD + return the id of audited object only. Each object type may + contain a set of states. If the "statelist" contains any state + that does not match the audited object type, the request MUST be + rejected. + + mark: in the case of an error, the value of the mark attribute + from the last successfully executed element that included the mark + attribute. + + State Parameters: + + The state parameter MUST be named using a dot-notation format + "audit.X.a.b.c...", where X is the mandatory field that indicates + the class name of the object (e.g., "conf" or "conn") and the + "a.b.c..." is the optional field used to describe the actual name + of the state parameter in a hierarchical manner. The wildcard "*" + MAY be used as part of a state name; however, it MUST only be used + in the last field of the dot-notation (e.g., "audit.conf.*" is + valid, but "audit.conf.*.a" is invalid). When a wildcard is used, + it is equivalent to querying all the states below the specified + level. Each field (e.g., within "a.b.c...") will result in + individual element names <a>, <b>, and <c> in the audit result to + contain corresponding state value. The parent/child relationship + between these elements follows the hierarchy of the state name + (i.e., <c> is child element of <b>, and <b> is child element of + <a>). + + + + + + +Saleem, et al. Informational [Page 101] + +RFC 5707 Media Server Markup Language February 2010 + + +10.1.2. <auditresult> + + The <auditresult> element is an optional child element of <result>, + which MUST be used by the media server to return the audit result. A + specific instance of the <auditresult> element contains the state + information of a single active object. Therefore, if multiple + objects are within the scope of the audit request, then one + <auditresult> element per object MUST be present. A zero occurrence + of <auditresult> element indicates that there are no active resources + within the scope of the audit request. + + Attributes: + + targetid: the identifier of a conference or connection. + Mandatory. Wildcard is not allowed. + + The <auditresult> may contain child element(s) that return additional + state information, corresponding to the "statelist" attribute in the + <audit> request. The child element names correspond to the fields of + the state parameter name (e.g., "a.b.c..."), following the same + hierarchical structure. + +10.2. MSML Audit Conference Package + + This section describes the MSML Audit Conference Package that MUST be + implemented to support auditing of conference services. The MSML + Audit Conference Package follows the framework specified by the MSML + Audit Core Package. This package defines the state parameter list + and audit result for conference auditing. + +10.2.1. State Parameters + + All conference state parameter names MUST be prefixed by + "audit.conf". + + confconfig: query the conferences general configuration. + + confconfig.audiomix: query the audio mixer's general configuration + in the conference. + + confconfig.audiomix.asn: query the current ASN setting in the + audio mixer. + + confconfig.audiomix.n-loudest: query the current n-loudest setting + in the audio mixer. + + confconfig.videolayout: query the video layout's general + configuration in the conference. + + + +Saleem, et al. Informational [Page 102] + +RFC 5707 Media Server Markup Language February 2010 + + + confconfig.videolayout.root: query the root window setting of the + video layout. + + confconfig.videolayout.selector: query the video stream selector + setting of the video layout. + + confconfig.controller: query who is the conference controller. + + dialog: query the active dialog information on the conference. + See MSML Audit Dialog Package for details. + + stream: query the active stream information on the conference. + See MSML Audit Stream Package for details. + +10.2.2. <auditresult> + + The <auditresult> attribute of "targetid" is required to indicate + results for auditing a conference. + + The <auditresult> element may optionally contain the following child + elements, returning additional conference state information, if + corresponding states are queried and available. + +10.2.2.1. confconfig + + The <confconfig> element is used to return the general configuration + state(s) of a conference, using the following attributes. + + Attributes: + + deletewhen: as defined by <createconference> element in MSML + Conference Core Package. + + term: as defined by <createconference> element in MSML Conference + Core Package. + +10.2.2.2. confconfig.audiomix + + The <audiomix> element contains the general audio mixer configuration + using the following attributes. + + Attributes: + + id: as defined by <audiomix> element in MSML Conference Core + Package. + + samplerate: as defined by <audiomix> element in MSML Conference + Core Package. + + + +Saleem, et al. Informational [Page 103] + +RFC 5707 Media Server Markup Language February 2010 + + +10.2.2.3. confconfig.audiomix.asn + + The <asn> element contains the current ASN setting of an audio mixer, + if ASN is enabled. The state values are included in the following + attributes. + + Attributes: + + ri: as defined by <asn> element in MSML Conference Core Package. + + asth: as defined by <asn> element in MSML Conference Core Package. + +10.2.2.4. confconfig.audiomix.n-loudest + + The <n-loudest> element contains the current n-loudest setting of the + audio mixer. The state values are included in the following + attributes. + + Attributes: + + n: as defined by <n-loudest> element in MSML Conference Core + Package. + +10.2.2.5. confconfig.videolayout + + The <videolayout> element contains the general video layout + configuration using the following attributes. + + Attributes: + + id: as defined by <videolayout> in MSML Conference Core Package. + + type: as defined by <videolayout> in MSML Conference Core Package. + +10.2.2.6. confconfig.videolayout.root + + The <root> element is used to contain root window settings. + + Attributes: + + size: as defined by <root> element in MSML Conference Core + Package. + + backgroundcolor: as defined by <root> element in MSML Conference + Core Package. + + Backgroundimage: as defined by <root> element in MSML Conference + Core Package. + + + +Saleem, et al. Informational [Page 104] + +RFC 5707 Media Server Markup Language February 2010 + + +10.2.2.7. confconfig.videolayout.selector + + The <selector> element is used to contain selector settings. + + Attributes: + + id: as defined by <selector> element in MSML Conference Core + Package. + + method: as defined by <selector> element in MSML Conference Core + Package. + + status: as defined by <selector> element in MSML Conference Core + Package. + + blankothers: as defined by <selector> element in MSML Conference + Core Package. + + si: as defined by <selector> element in MSML Conference Core + Package when selector method is "vas". + + speakersees: as defined by <selector> element in MSML Conference + Core Package when selector method is "vas". + +10.2.2.8. confconfig.controller + + The <controller> element is used to return the conference controller + id in its content. The conference controller is the SIP dialog that + carries the <createconference> request. The return value is the MSML + connection id. + +10.2.2.9. dialog + + If conference dialog state is queried, the audit result is returned + using the <dialog> element as specified in the MSML Audit Dialog + Package. + +10.2.2.10. stream + + If conference stream state is queried, the audit result is returned + using the <stream> element as specified in the MSML Audit Stream + Package. + + + + + + + + + +Saleem, et al. Informational [Page 105] + +RFC 5707 Media Server Markup Language February 2010 + + +10.3. MSML Audit Connection Package + + This section describes the MSML Audit Connection Package that MAY be + implemented to support auditing connection services. The MSML Audit + Connection Package follows the framework specified by the MSML Audit + Core Package. This package defines the state parameter list and + audit result for connection auditing. + +10.3.1. State Parameters + + Connection state parameter names are prefixed by "audit.conn". + + sipdialog: queries the identifier of the SIP dialog with which the + connection is associated. + + sipdialog.localseq: queries one of the SIP dialog states - local + sequence number. + + sipdialog.remoteseq: queries one of the SIP dialog states - remote + sequence number. + + sipdialog.localURI: queries one of the SIP dialog states - local + URI. + + sipdialog.remoteURI: queries one of the SIP dialog states - remote + URI. + + sipdialog.remotetarget: queries one of the SIP dialog states - + remote target. + + sipdialog.routeset: queries one of the SIP dialog states - route + set. + + localsdp: queries the local SDP body of the connection. + + remotesdp: queries the remote SDP body of the connection. + + dialog: queries the active dialog information on the connection. + See MSML Audit Dialog Package for details. + + stream: queries the active stream information on the connection. + See MSML Audit Stream Package for details. + +10.3.2. <auditresult> + + The <auditresult> attribute "targetid" MUST specify a connection + identifier for a connection result. + + + + +Saleem, et al. Informational [Page 106] + +RFC 5707 Media Server Markup Language February 2010 + + + The <auditresult> element MAY contain the following child elements + optionally to return additional connection state information if the + corresponding states are queried and are available. + +10.3.2.1. sipdialog + + The <sipdialog> element contains the associated SIP dialog + information. The SIP dialog ID information is returned using the + following attributes. + + Attributes: + + callid: call-ID value as defined in [n1]. Mandatory. + + localtag: local-tag value as defined in [n1]. Mandatory. + + remotetag: remote-tag value as defined in [n1]. Mandatory. + + This element can contain the following child elements optionally to + return additional SIP dialog state information to the client if the + corresponding states are queried and available. + +10.3.2.2. sipdialog.localseq + + The <localseq> element contains the local sequence number. The local + sequence number is one of the SIP dialog states as defined in [n1]. + +10.3.2.3. sipdialog.remoteseq + + The <remoteseq> element contains the remote sequence number. The + remote sequence number is one of the SIP dialog states as defined in + [n1]. + +10.3.2.4. sipdialog.localuri + + The <localuri> element contains the local URI value. The local URI + is one of the SIP dialog states as defined in [n1]. + +10.3.2.5. sipdialog.remoteuri + + The <remoteuri> element contains the remote URI value. The remote + URI is one of the SIP dialog states as defined in [n1]. + +10.3.2.6. sipdialog.remotetarget + + The <remotetarget> element contains the remote target value. The + remote target is one of the SIP dialog states as defined in [n1]. + + + + +Saleem, et al. Informational [Page 107] + +RFC 5707 Media Server Markup Language February 2010 + + +10.3.2.7. sipdialog.routeset + + The <routeset> element contains the route-set value (an ordered list + of URIs separated by comma). The route set is one of the SIP dialog + states as defined in [n1]. + +10.3.2.8. localsdp + + The <localsdp> element contains the local SDP body. + +10.3.2.9. remotesdp + + The <remotesdp> element contains the remote SDP body. + +10.3.2.10. dialog + + If the connection dialog state is queried, the audit result returns + the queried information using the <dialog> element, as specified in + the MSML Audit Dialog Package. + +10.3.2.11. stream + + If the connection stream state is queried, the audit result returns + the queried information using the <stream> element, as specified in + the MSML Audit Stream Package. + +10.4. MSML Audit Dialog Package + + This section describes the MSML Audit Dialog Package that MAY be + implemented to support auditing dialogs. The MSML Audit Dialog + Package follows the framework specified by the MSML Audit Core + Package. + + The MSML Audit Dialog Package must be used together with either the + MSML Audit Conference Package or MSML Audit Connection Package, since + the dialogs are applicable to conferences or connections. + +10.4.1. State Parameters + + Dialog state parameter names are prefixed by "dialog". Since this + package must be used together with the MSML Audit Conference Package + or MSML Audit Connection Package, the complete dialog state name must + be prefixed by "audit.conf.dialog" or "audit.conn.dialog", depending + on the context within which the dialog state is queried. + + dialog: queries the number of active dialog(s) running on the target + (a conference or connection); basic dialog information will be + returned. + + + +Saleem, et al. Informational [Page 108] + +RFC 5707 Media Server Markup Language February 2010 + + + dialog.duration: queries the amount of time a dialog has been + running. + + dialog.primitive: queries the media primitive currently being + executed by the dialog. + + dialog.controller: queries the dialog controller. + +10.4.2. <dialog> + + The <dialog> element is a child element of <auditresult>, which + contains the active dialog information on the target identified by + the attribute "targetid" of the <audioresult> element. + + Basic dialog information is returned using the following attributes. + + Attributes: + + src: as defined by the <dialogstart> element in the MSML Dialog + Core Package. + + type: as defined by the <dialogstart> element in the MSML Dialog + Core Package. Mandatory. + + name: as defined by the <dialogstart> element in the MSML Dialog + Core Package. Mandatory. + + This element may contain the following child elements optionally to + return additional dialog information if the corresponding state + parameter has been queried and the state value is available. + +10.4.2.1. <duration> + + The <duration> element returns the duration that a dialog has been + running on the specified target. The duration value is included in + the element content. It is a positive integer value (in unit of + seconds). + +10.4.2.2. <primitive> + + The <primitive> element returns the currently active media primitive + in its content. The active media primitive is the primitive that is + currently being executed. Possible return values are play, dtmf, + collect, dtmfgen, tonegen, record, or none. + + + + + + + +Saleem, et al. Informational [Page 109] + +RFC 5707 Media Server Markup Language February 2010 + + +10.4.2.3. <controller> + + The <controller> element returns the dialog controller id in its + content. The dialog controller is the SIP dialog that carries the + <dialogstart> request. The returned value is the MSML connection id. + +10.5. MSML Audit Stream Package + + This section describes the MSML Audit Stream Package that MAY be + implemented to support auditing stream. The MSML Audit Stream + Package follows the framework specified by the MSML Audit Core + Package. + + The MSML Audit Stream Package MUST be used together with either the + MSML Audit Conference Package or the MSML Audit Connection Package, + since the stream is applicable between conferences, between + connections, or between conferences and connections. + +10.5.1. State Parameters + + Stream state parameter names are prefixed by "stream". Since this + package must be used together with the MSML Audit Conference Package + or MSML Audit Connection Package, the complete stream state name must + be prefixed by "audit.conf.stream" or "audit.conn.stream", depending + on the context within which the stream state is queried. + + stream: queries the number of active streams created on the audited + object; basic stream information will be returned. + + stream.clamp: queries the clamping status. + + stream.gain: queries the gain control information. + + stream.visual: queries the visual setting. + +10.5.2. <stream> + + The <stream> element is a child element of <auditresult> and contains + the active stream information on the target identified by the + attribute "targetid" of the <audioresult> element. + + Basic stream information is returned using the following attributes. + + Attributes: + + joinwith: an identifier of either a connection or a conference + with which the audited object is joined. Mandatory. Wildcard is + not allowed. + + + +Saleem, et al. Informational [Page 110] + +RFC 5707 Media Server Markup Language February 2010 + + + media: as defined by the <stream> element in the MSML Conference + Core Package. Mandatory. + + dir: direction of stream, from audited target perspective, "from" + or "to". Mandatory. + + compressed: as defined by the <stream> element in the MSML + Conference Core Package. + + display: as defined by the <stream> element in the MSML Conference + Core Package. + + override: as defined by the <stream> element in the MSML + Conference Core Package. + + preferred: as defined by the <stream> element in the MSML + Conference Core Package. + + This element MAY contain the following child elements that optionally + return additional stream information, if the corresponding state + parameter is queried and the state value is available. + +10.5.2.1. <clamp> + + The <clamp> element is included if stream clamping is active. The + currently active clamping state values are returned using the + attributes as defined by the <clamp> element in the MSML Conference + Core Package. + +10.5.2.2. <gain> + + The <gain> element is included if stream gain is active. The current + gain control state values are returned using the attributes as + defined by the <gain> element in the MSML Conference Core Package. + +10.5.2.3. <visual> + + The <visual> element is included if stream visual display is active. + The current visual display settings are returned using the attributes + as defined by the <visual> element in the MSML Conference Core + Package. + +11. Response Codes + + Response codes are used to indicate reasons for failures as well as + completion status. The appropriate code and description must be + passed to the invoking environment on failure. + + + + +Saleem, et al. Informational [Page 111] + +RFC 5707 Media Server Markup Language February 2010 + + + The response codes defined in this section are returned as the value + of the response attribute to the <result> element. Some values may + also be returned as part of a namelist to an "msml.dialog.exit" event + generated when an executing MSML dialog fails. + + Informational (1xx) + + Reserved for future use + + Success (200) + + 200 OK + + Request Error (4xx) + + 400 Bad Request + 401 Unknown Element + 402 Unsupported Element + 403 Missing mandatory element content + 404 Forbidden element content + 405 Invalid element content + 406 Unknown attribute + 407 Attribute not supported + 408 Missing mandatory attribute + 409 Forbidden attribute is present + + 410 Invalid attribute value + + 420 Unsupported media description language + 421 Unknown media description language + 422 Ambiguous request (both URI and inline description) + 423 External document fetch error + 424 Syntax error in foreign language + 425 Semantic error in foreign language + 426 Unknown error executing foreign language + + 430 Object does not exist + 431 Object instance name already used + 432 Conference name already in use + 433 reserved + 434 External document fetch error + + 440 Cannot join objects of the specified class + 441 Objects have incompatible media types + 442 reserved + 443 reserved + 444 Number of media inputs exceeded + + + + +Saleem, et al. Informational [Page 112] + +RFC 5707 Media Server Markup Language February 2010 + + + 450 Objects have incompatible media formats + 451 Incompatible media stream format + + Server Error (5xx) + + 500 Internal media server error + 503 Service Unavailable + 510 Not in service + 511 Service Unavailable + 520 No resource to fulfill request + 521 Internal limit exceeded + +12. MSML Conference Examples + + These examples focus on the MSML Conference Core Package used by a + control agent (CA) to control services on a media server (MS). They + show the relationship between SIP signaling to establish media + sessions and MSML service control commands. For brevity, only the + content of MSML messages is shown. The examples assumes that the CA + and MS use the IPv4 address and UDP port number of the audio stream + (on the MS) to identify the MSML connection. + +12.1. Establishing a Dial-In Conference + + UA Control Agent Media Server + | | | + | | INVITE F1 | + | |-------------------------->| + | | 200 F2 | + | |<--------------------------| + | | ACK F3 | + | |-------------------------->| + | | | + | | createconference> F4 | + | |-------------------------->| + | | 200 F5 | + | |<--------------------------| + | INVITE (SDP UA) F6 | | + |------------------------>| | + | | INVITE (SDP UA) F7 | + | |-------------------------->| + | | 200 (SDP MS) F8 | + | |<--------------------------| + | | ACK F9 | + | |-------------------------->| + | 200 (SDP MS) F10 | | + |<------------------------| | + | ACK F11 | | + + + +Saleem, et al. Informational [Page 113] + +RFC 5707 Media Server Markup Language February 2010 + + + |------------------------>| | + | | <dialogstart> F12 | + | |-------------------------->| + | | 200 F13 | + | |<--------------------------| + | | HTTP interactions F14 | + | |<------------------------->| + | | <event>(dialog.exit) F15 | + | |<--------------------------| + | | <join> F16 | + | |-------------------------->| + | | 200 F17 | + | |<--------------------------| + | ... | ... | + | | | + | | <dialogstart> F18 | + | |-------------------------->| + | | 200 F19 | + | |-------------------------->| + | | HTTP interactions F20 | + | |<--------------------------| + | | <event>(dialog.exit) F21 | + | |-------------------------->| + | ... | ... | + | | | + + Steps 1-3: establish an MSML control channel for the conference. + Alternatively, a control channel could already have been established + that was used for all CA/MS interactions. A control channel per + conference is only one possible model. Currently, MSML uses SIP INFO + requests and responses on this SIP dialog. There is a proposal to + use this message exchange to establish a TCP channel for MSML similar + to the approach used for the Media Resource Control Protocol v2 + (MRCPv2). This approach would require that a request identifier be + added to the <msml> element to correlate requests and responses. + This currently relies on the SIP INFO request and response for this + property. MSML messages are shown without specifying the transport + in this example, but it assumes a request/response correlation based + on transport messages. + + Step 4: create a conference that will mix the loudest two speakers + and report those speakers to the control agent every 10 seconds. The + media server will automatically terminate remaining media sessions + and delete the conference and associated resources and when the + control channel is terminated. + + + + + + +Saleem, et al. Informational [Page 114] + +RFC 5707 Media Server Markup Language February 2010 + + + <msml version="1.1"> + <createconference name="exampleConf" deletewhen="nocontrol"> + <audiomix> + <n-loudest n="3"/> + <asn ri="10s"/> + </audiomix> + </createconference> + </msml> + + Step 5: conference created successfully + + <msml version="1.1"> + <result response="200"/> + </msml> + + Steps 6-11: standard 3PCC establishment of a user-initiated media + session to a media server. This is the equivalent of a dial-in + conference participant. The "To:" header returned by the MS in the + 200 response of Step F8 was: + + To: <sip:msml@ms.example.com>;tag=jd87dfg4h + + Step 12: request an initial dialog with the participant to prompt for + their name, desired conference, etc. The dialog completes by + informing the participant that they are joining the conference. If + this was not the first participant, the dialog could also announce + the other participants. + + <msml version="1.1"> + <dialogstart target="conn:jd87dfg4h" name="12345" + type="application/vxml+xml" + src="http://server.example.com/scripts/initial.vxml"/> + </msml> + + Step 13: dialog started successfully. The dialog identifier is + returned. + + <msml version="1.1"> + <result response="200"/> + <dialogid>conn:jd87dfg4h/dialog:12345</dialogid> + </msml> + + Step 14: sequence of HTTP VoiceXML dialog interactions. + + Step 15: the VoiceXML browser exits (but does not disconnect). If a + namelist had been specified within the VoiceXML <exit> element, it + would have been included in the <event> sent to the CA. + + + + +Saleem, et al. Informational [Page 115] + +RFC 5707 Media Server Markup Language February 2010 + + + <msml version="1.1"> + <event name="msml.dialog.exit" + id="conn:jd87dfg4h/dialog:12345"/> + </msml> + + Step 16: join the participant to the conference and have the volume + of their contributing audio automatically adjusted to a target level + of -20 dBm0. + + <msml version="1.1"> + <join id1="conn:jd87dfg4h" id2="conf:exampleConf"> + <stream media="audio" dir="from-id1"> + <gain agc="true" tgtlvl="-20"/> + </stream> + <stream media="audio" dir="to-id1"/> + </msml> + + Step 17: successfully joined to conference + + <msml version="1.1"> + <result response="200"/> + </msml> + + Steps 6 through 17 are repeated for the second participant. + + Step 18: play a join tone or message announcing the new participant + to the conference. + + <msml version="1.1"> + <dialogstart target="conf:exampleConf" + type="application/vxml+xml" + src="http://server.example.com/scripts/joinmsg.vxml"/> + </msml> + + Step 19: dialog started successfully. The dialog identifier is + returned. The media server assigned a unique identifier since name + attribute was not specified in <dialogstart>. + + <msml version="1.1"> + <result response="200"/> + <dialogid>conf:ExampleConf/dialog:j6fs8745</dialogid> + </msml> + + Step 20: HTTP VoiceXML dialog interaction(s). + + + + + + + +Saleem, et al. Informational [Page 116] + +RFC 5707 Media Server Markup Language February 2010 + + + Step 21: the VoiceXML browser exits. + + <msml version="1.1"> + <event name="msml.dialog.exit" + id="conf:ExampleConf/dialog:j6fs8745"/> + </msml> + + Steps 6 through 21 are repeated for the third and subsequent + participants. + +12.2. Example of a Sidebar Audio Conference + + This example assumes that a conference has already been established + as in the previous example. It creates a sidebar conference that + hears the main conference as a whisper. Three participants are moved + to the sidebar. After some period of time, the sidebar participants + are returned to the main conference and the sidebar is deleted. + + Step 1: the sidebar conference is created. It is joined half-duplex + to the main conference and a manual gain object is inserted in the + media stream. Three participants are then moved from the main + conference to the sidebar. Although not shown, a CA could include + the "mark" attribute in each element to allow recovery in the event + of a mid- transaction error. + + <msml version="1.1"> + <createconference name="sidebarConf" + deletewhen="nomedia"> + <audiomix/> + </createconference> + <join id1="conf:sidebarConf" id2="conf:exampleConf"> + <stream media="audio" dir="to-id1"> + <gain amt="-20"/> + </stream> + </join> + <unjoin id1="conn:gs5s4-1" id2="conf:exampleConf"/> + <join id1="conn:gs5s4-1" id2="conf:sidebarConf"/> + <unjoin id1="conn:hd764gr9-2" id2="conf:exampleConf"/> + <join id1="conn:hd764gr9-2" id2="conf:sidebarConf"/> + <unjoin id1="conn:h37frdvgs65-3" id2="conf:exampleConf"/> + <join id1="conn:h37frdvgs65-3" id2="conf:sidebarConf"/> + </msml> + + + + + + + + + +Saleem, et al. Informational [Page 117] + +RFC 5707 Media Server Markup Language February 2010 + + + Step 2: sidebar conference created successfully and participants + joined. + + <msml version="1.1"> + <result response="200"/> + </msml> + + Step 3: once the sidebar conference has completed, the participants + are rejoined to the main conference. The sidebar is destroyed + automatically by the MS when the last media stream is removed as + specified when the sidebar conference was created. + + <msml version="1.1"> + <unjoin id1="conn:gs5s4-1" id2="conf:sidebarConf"/> + <join id1="conn:gs5s4-1" id2="conf:exampleConf"/> + <unjoin id1="conn:hd764gr9-2" id2="conf:sidebarConf"/> + <join id1="conn:hd764gr9-2" id2="conf:exampleConf"/> + <unjoin id1="conn:h37frdvgs65-3" id2="conf:sidebarConf"/> + <join id1="conn:h37frdvgs65-3" id2="conf:exampleConf"/> + </msml> + + Step 4: participants successfully moved to main conference and + sidebar destroyed. + + <msml version="1.1"> + <result response="200"/> + </msml> + +12.3. Example of Removing a Conference + + This example assumes a conference created similar to the first + example where there is an MSML control channel specific to the + conference and the conference has been configured to be deleted when + that channel is removed (using SIP). + + Steps 1-2: the CA signals BYE for the SIP dialog used to establish + the conference control channel. + + Steps 3-6: the MS initiates terminating the media sessions for each + participant remaining in the conference. + + The MS deletes the conference and removes all resources when the last + participant has been removed. + +12.4. Example of Modifying Video Layout + + Assume that a conference named "example" is created using the + following mixer descriptions. + + + +Saleem, et al. Informational [Page 118] + +RFC 5707 Media Server Markup Language February 2010 + + + +---+---+ + | 1 | 2 | + +---+---+ + | 3 | 4 | + +---+---+ + + <createconference name="quad-split"> + <audiomix> + <n-loudest n="3"/> + <asn ri="10s"/> + </audiomix> + <videolayout> + <root size="CIF" background="white" /> + <selector id="default" method="vas" si="500ms"> + <region id="1" left="0" top="0" relativesize="1/4"/> + </selector> + <region id="2" left="50%" top="0" relativesize="1/4"/> + <region id="3" left="0%" top="50%" relativesize="1/4"> + <region id="4" left="50%" top="50%" relativesize="1/4"/> + </videolayout> + </createconference> + + The following would change the size of the video window to QCIF + and the background color to the default "black". + + <modifyconference id="conf:example"> + <videolayout> + <root size="4CIF"/> + </videolayout> + </modifyconference> + + The relative location of the regions does not change. However, the + sizes of the regions do change because they are relative to the size + of the root window. The result is a layout that looks identical but + half the size. + + The following would freeze the video displayed in region "2" without + affecting any other attributes of that region. + + <modifyconference id="conf:example"> + <videolayout> + <region id="2" left="50%" top="0" relativesize="1/4" + freeze="true"/> + </videolayout> + </modifyconference> + + + + + + +Saleem, et al. Informational [Page 119] + +RFC 5707 Media Server Markup Language February 2010 + + +13. MSML Dialog Examples + + These examples focus on the MSML Dialog Base Package and the MSML + Dialog Group Package. + +13.1. Announcement + + The following is a simple announcement scenario. Two recorded audio + files are played in sequence followed by generated speech followed by + a variable. The results are reported once media generation + completes. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <dialogstart target="conn:12345" name="12345"> + <play> + <audio uri="file://clip1.wav"/> + <audio uri="http://host1/clip2.wav"/> + <tts uri="http://host2/text.ssml"/> + <var type="date" subtype="mdy" value="20030601"/> + </play> + <send target="source" event="done" namelist="play.amt + play.end"/> + </dialogstart> + </msml> + +13.2. Voice Mail Retrieval + + Below is an example that shows a simple voice mail retrieval + operation consisting of playing a message and allowing the user to + pause and resume play using '5' to toggle the state. The operation + would terminate when the play completed or the user entered '#'. + + During the play, the user can advance forward and backward through + the message as well as rewinding to the beginning. + + + + + + + + + + + + + + + + +Saleem, et al. Informational [Page 120] + +RFC 5707 Media Server Markup Language February 2010 + + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <dialogstart target="conn:12345" name="12345"> + <group topology="parallel"> + <play> + <audio uri="file://message.wav"/> + <playexit> + <send target="group" event="terminate"/> + </playexit> + </play> + <dtmf iterate="forever"> + <pattern digits="5"> + <send target="play" event="toggle-state"/> + </pattern> + <pattern digits="6"> + <send target="play" event="forward"/> + </pattern> + <pattern digits="7"> + <send target="play" event="backward"/> + </pattern> + <pattern digits="8"> + <send target="play" event="restart"/> + </pattern> + <pattern digits="#"> + <send target="play" event="terminate"/> + </pattern> + </dtmf> + </group> + </dialogstart> + </msml> + + + + + + + + + + + + + + + + + + + + + +Saleem, et al. Informational [Page 121] + +RFC 5707 Media Server Markup Language February 2010 + + +13.3. Play and Record + + A more complex example is a play and record operation. This sources + and sinks media and uses voice activity DTMF detection and + recognition to influence behavior. Any DTMF input or voice activity + will barge the play and cause the record to begin. However, if the + prompt was barged with a DTMF digit of '#', the record terminates + without starting. When the play terminates, it send a starttimer + event to the VAD to allow it to recognize an initial silence + condition. The recording will be terminated (without starting) when + the VAD detects an initial 3 seconds of silence. + + Once resumed (based upon voice detection), the recording may be + terminated under several conditions. It will terminate after 5 + seconds of silence or after 60 seconds elapses. It will also + terminate if a '#' key is recognized. Every aspect of this behavior + can be modified by changing what is recognized and the events that + are sent. The following example uses the MSML Dialog Group Package. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Saleem, et al. Informational [Page 122] + +RFC 5707 Media Server Markup Language February 2010 + + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <dialogstart target="conn:12345" name="12345"> + <group topology="parallel"> + <play> + <audio uri="file://prompt.wav"/> + <playexit> + <send target="vad" event="starttimer"/> + </playexit> + </play> + <dtmf> + <pattern digits="#"> + <send target="record" event="terminate.termkey"/> + </pattern> + <detect> + <send target="play" event="terminate"/> + </detect> + </dtmf> + <vad> + <voice len="10ms"> + <send target="play" event="terminate"/> + <send target="record" event="resume"/> + </voice> + <silence len="3s"> + <send target="record" event="nospeech"/> + </silence> + <tsilence len="5s"> + <send target="record" event="terminate.finalsilence"/> + </tsilence> + </vad> + <record initial="suspend" maxtime="60s" + dest="file://record.wav" format="g729"> + <recordexit> + <send target="group" event="terminate"/> + </recordexit> + </record> + <groupexit> + <send target="source" event="done" + namelist="record.len record.end"/> + </groupexit> + </group> + </dialogstart> + </msml> + + + + + + + + +Saleem, et al. Informational [Page 123] + +RFC 5707 Media Server Markup Language February 2010 + + + The following implements the same functionality, as described above, + in using the MSML Dialog Base Package, using the <record> composite + mechanism for the play and record operation. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <dialogstart target="conn:12345" name="12345"> + <record prespeech="3s" postspeech="5s" maxtime="60s" termkey="#" + dest="file://record.wav" format="g729"> + <play barge="true"> + <audio uri="file://prompt.wav"/> + </play> + <recordexit> + <send target="source" event="done" + namelist="record.len record.end"/> + </recordexit> + </record> + </dialogstart> + </msml> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Saleem, et al. Informational [Page 124] + +RFC 5707 Media Server Markup Language February 2010 + + +13.4. Speech Recognition + + The following simple example requests that a user speak the name of a + city and returns the result. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <dialogstart target="conn:12345" name="12345"> + <group topology="parallel"> + <play> + <audio uri="file://prompt.wav"/> + </play> + <speech> + <grammar version="1.0"> + <rule id="city" scope="public"> + <item> + <one-of> + <item>vancouver</item> + <item>new york</item> + <item>london</item> + </one-of> + </item> + </rule> + <match> + <send target="group" event="terminate"/> + </match> + </grammar> + <noinput> + <send target="group" event="terminate"/> + </noinput> + <nomatch> + <send target="group" event="terminate"/> + </nomatch> + </speech> + <groupexit> + <send target="source" event="done" + namelist="speech.end speech.results"/> + </groupexit> + </group> + </dialogstart> + </msml> + +13.5. Play and Collect + + This example prompts a user to enter 4 DTMF digits terminated by the + '#' key (represented by "xxxx#" below). The prompt will be barged + and the user has 10 seconds to begin entering input or no input will + be indicated. + + + +Saleem, et al. Informational [Page 125] + +RFC 5707 Media Server Markup Language February 2010 + + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <dialogstart target="conn:12345" name="12345"> + <group topology="parallel"> + <play> + <audio uri="file://prompt.wav"/> + <playexit> + <send target="dtmf" event="starttimer"/> + </playexit> + </play> + <dtmf fdt="10s" idt="16s"> + <pattern digits="xxxx#"> + <send target="group" event="terminate"/> + </pattern> + <detect> + <send target="play" event="terminate"/> + </detect> + <noinput> + <send target="group" event="terminate"/> + </noinput> + <nomatch> + <send target="group" event="terminate"/> + </nomatch> + </dtmf> + <groupexit> + <send target="source" event="done" + namelist="dtmf.digits dtmf.end"/> + </groupexit> + </group> + </dialogstart> + </msml> + + + + + + + + + + + + + + + + + + + + +Saleem, et al. Informational [Page 126] + +RFC 5707 Media Server Markup Language February 2010 + + + The following implements the same functionality, as described above, + using the MSML Dialog Base Package, using the <collect> composite + mechanism for the play and collect operation. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <dialogstart target="conn:12345" name="12345"> + + <collect fdt="10s" idt="16s"> + <play barge="true"> + <audio uri="file://prompt.wav"/> + </play> + <pattern digits="xxxx#"> + <send target="source" event="done" + namelist="dtmf.digits dtmf.end"/> + </pattern> + <noinput> + <send target="source" event="done" + namelist="dtmf.end"/> + </noinput> + <nomatch> + <send target="source" event="done" + namelist="dtmf.end"/> + </nomatch> + </collect> + </dialogstart> + </msml> + + + + + + + + + + + + + + + + + + + + + + + + +Saleem, et al. Informational [Page 127] + +RFC 5707 Media Server Markup Language February 2010 + + +13.6. User Controlled Gain + + This shows an example of nesting groups to create an arbitrary full- + duplex media control. DTMF is detected on media flowing in one + direction and used to adjust the gain applied to media flowing in the + opposite direction. Additionally, the stream that is used to detect + DTMF has DTMF removed and its gain automatically adjusted before + leaving the group. This widget could be used between a conference + participant and a conference mixer. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.0"> + <dialogstart target="conn:12345" name="12345"> + <group topology="fullduplex"> + <group topology="parallel"> + <dtmf> + <pattern digits="1" iterate="forever"> + <send target="gain" event="louder"/> + </pattern> + <pattern digits="2" iterate="forever"> + <send target="gain" event="softer"/> + </pattern> + </dtmf> + <group topology="serial"> + <clamp/> + <agc tgtlvl="0"/> + </group> + </group> + <gain amt="0" incr="5"/> + </group> + </dialogstart> + </msml> + +14. MSML Audit Examples + + The following examples describe the MSML Audit Conference Package and + the MSML Audit Connection Package, and their use together with the + MSML Audit Dialog Package or/and the MSML Audit Stream Package. + +14.1. Audit All Conferences + + This example describes an audit of all active conferences on the + media server, querying the conference configurations. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <audit queryid="conf:*" statelist="audit.conf.confconfig.*"/> + </msml> + + + +Saleem, et al. Informational [Page 128] + +RFC 5707 Media Server Markup Language February 2010 + + + The following result assumes two conferences currently allocated by + the media server. Conference "conf:1" contains both an audio mixer + (with ASN enabled) and a video layout (vas) created, while conference + "conf:2" contains only an audio mixer created with ASN disabled. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <result response="200"> + <auditresult targetid="conf:1"> + <confconfig deletewhen="nocontrol" term="true"> + <audiomix id="audiomix1"> + <asn ri="5s"/> + <n-loudest n="16"/> + </audiomix> + <videolayout id="videolayout1" + type="text/msml-basic-layout"> + <selector id="selector1" method="vas" si="5s" + speakersees="current"> + <root size="CIF"/> + </selector> + </videolayout> + <controller>conn:1234</controller> + </confconfig> + </auditresult> + <auditresult targetid="conf:2"> + <confconfig deletewhen="nomedia" term="true"> + <audiomix id="audiomix2"> + <n-loudest n="1"/> + </audiomix> + <controller>conn:1234</controller> + </confconfig> + </auditresult> + </result> + </msml> + +14.2. Audit Conference Dialogs + + This example describes an audit of active dialogs on a specific + conference. The request queries all available dialog states. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <audit queryid="conf:1" statelist="audit.conf.dialog.*"/> + </msml> + + The example result assumes a single dialog running on conference + "conf:1", which has been running for 60 seconds, and the dialog is + currently executing a record operation. + + + +Saleem, et al. Informational [Page 129] + +RFC 5707 Media Server Markup Language February 2010 + + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <result response="200"> + <auditresult targetid="conf:1"> + <dialog name="sample"> + <duration>60</duration> + <primitive>record</primitive> + <controller>conn:1234</controller> + </dialog> + </auditresult> + </result> + </msml> + +14.3. Audit Conference Streams + + This example request describes an audit of active streams on a + specific conference. The request queries all available stream + states. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <audit queryid="conf:1" statelist="audit.conf.stream.*"/> + </msml> + + The example result assumes three audio participants in the + conference. Connection "conn:1234" is a talk-listen participant with + both clamp and gain control enabled. Connection "conn:1235" is a + talk-only participant. Connection "conn:1236" is a listen-only + participant with automatic gain control enabled. + + + + + + + + + + + + + + + + + + + + + + +Saleem, et al. Informational [Page 130] + +RFC 5707 Media Server Markup Language February 2010 + + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <result response="200"> + <auditresult targetid="conf:1"> + <stream joinwith="conn:1234" media="audio" dir="to"> + <clamp dtmf="true" tone="false"/> + <gain amt="-10"/> + </stream> + <stream joinwith="conn:1234" media="audio" dir="from"> + <gain amt="10"/> + </stream> + <stream joinwith="conn:1235" media="audio" dir="to"> + </stream> + <stream joinwith="conn:1236" media="audio" dir="from"> + <gain agc="true" tgtlvl="0" maxgain="10"/> + </stream> + </auditresult> + </result> + </msml> + +14.4. Audit All Connections + + This example request describes an audit of all active connections on + the media server. No additional state is queried. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <audit queryid="conn:*"/> + </msml> + + The example result assumes five connections currently allocated by + the media server. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <result response="200"> + <auditresult targetid="conn:1230"/> + <auditresult targetid="conn:1231"/> + <auditresult targetid="conn:1232"/> + <auditresult targetid="conn:1233"/> + <auditresult targetid="conn:1234"/> + </result> + </msml> + +14.5. Audit Connection Dialogs + + This example request describes an audit of active dialogs on a + specific connection. No additional dialog state is queried. + + + +Saleem, et al. Informational [Page 131] + +RFC 5707 Media Server Markup Language February 2010 + + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <audit queryid="conn:1234" statelist="audit.conn.dialog"/> + </msml> + + The example result assumes three dialogs running on the connection. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <result response="200"> + <auditresult targetid="conn:1234"> + <dialog name="sample1"/> + <dialog name="sample2"/> + <dialog name="sample3"/> + </auditresult> + </result> + </msml> + +14.6. Audit Connection Streams + + This example request describes an audit of active streams on a + specific connection. No additional stream state is queried. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <audit queryid="conn:1234" statelist="audit.conn.stream"/> + </msml> + + The example result assumes three audio streams created between target + connection and other MSML objects, one of which is a bidirectional + stream between target connection and a conference, and two are + unidirectional streams between two other connections. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <result response="200"> + <auditresult targetid="conn:1234"> + <stream joinwith="conf:1" media="audio" dir="to"/> + <stream joinwith="conf:1" media="audio" dir="from"/> + <stream joinwith="conn:1235" media="audio" dir="to"/> + <stream joinwith="conn:1236" media="audio" dir="from"/> + </auditresult> + </result> + </msml> + + + + + + + +Saleem, et al. Informational [Page 132] + +RFC 5707 Media Server Markup Language February 2010 + + +14.7. Audit Connection with Selective States + + This example describes an audit of a specific connection, querying + associated SIP dialog ID and SDP info. + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <audit queryid="conn:1234" statelist="audit.conn.sipdialog + audit.conn.localsdp audit.conn.remotesdp"/> + </msml> + + <?xml version="1.0" encoding="UTF-8"?> + <msml version="1.1"> + <result response="200"> + <auditresult targetid="conn:1234"> + <sipdialog callid="ABCD@10.0.0.10:5060" + localtag="sdfjsiodf" + remotetag="zvnmviuhd8"/> + <localsdp> + v=0 + o=- 31691 31691 IN IP4 ms5mpc11.lab.radisys.com + s=media server session + t=0 0 + m=audio 33794 RTP/AVP 0 + c=IN IP4 10.3.5.111 + a=rtpmap:0 PCMU/8000 + a=sendrecv + m=video 32770 RTP/AVP 34 + c=IN IP4 10.3.5.11 + b=AS:48 + a=rtpmap:34 H263/90000 + a=fmtp:34 CIF=1 + a=sendrecv + </localsdp> + <remotesdp> + v=0 + o=- 12345 12345 IN IP4 10.0.0.88 + s=RadiSys SIP Media Server session + t=0 0 + c=IN IP4 10.0.0.126 + b=AS:128 + m=audio 10000 RTP/AVP 0 + a=rtpmap:0 PCMU/8000 + a=ptime:20 + a=sendrecv + m=video 10002 RTP/AVP 34 + a=rtpmap:34 H263/90000 + a=fmtp:34 CIF=1 + + + +Saleem, et al. Informational [Page 133] + +RFC 5707 Media Server Markup Language February 2010 + + + a=sendrecv + </remotesdp> + </auditresult> + </result> + </msml> + +15. Future Work + + The following capabilities may be added in future versions of this + document: + + o Ability for MSML clients to audit or query the media server for + supported set of MSML packages and profiles. + + o Ability to version MSML packages and profiles and naming scheme for + MSML extension packages. + +16. XML Schema + + MSML specification consists of a set of XML schemas, all of which may + be used together or any sub-set of the schemas may be used for each + MSML package. The following sections define a complete set of + schemas covering all MSML packages. + + Each package contains a single schema file, <package-name>- + datatypes.xsd. This schema file can be included by its extended + package(s). Every package optionally contains another schema file, + <package_name>.xsd, which can be used directly to build or validate + MSML scripts for a given package. + + The complete MSML schema (msml.xsd) includes all the individual MSML + packages. + + <?xml version="1.0" encoding="UTF-8"?> + <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-conf-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-base-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-transform-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-group-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-speech-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-fax-detect-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-fax-sendrecv- + datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-core-datatypes.xsd"/> + + + +Saleem, et al. Informational [Page 134] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:include schemaLocation="msml-audit-conf-datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-conn-datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-dialog-datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-stream-datatypes.xsd"/> + <xs:element name="msml"> + <xs:complexType> + <xs:choice> + <xs:group ref="msmlRequestType" maxOccurs="unbounded"/> + <xs:element name="event"> + <xs:complexType> + <xs:choice maxOccurs="unbounded"> + <xs:sequence> + <xs:element name="name" type="msmlEventNameValue.datatype"/> + <xs:element name="value"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:pattern value="[a-zA-Z0-9.]+"/> + </xs:restriction> + </xs:simpleType> + </xs:element> + </xs:sequence> + </xs:choice> + <xs:attribute name="name" type="msmlEventName.datatype" + use="required"/> + <xs:attribute name="id" type="msmlEventSource.datatype" + use="required"/> + </xs:complexType> + </xs:element> + <xs:element name="result"> + <xs:complexType> + <xs:choice> + <xs:element ref="description" minOccurs="0"/> + <xs:sequence> + <xs:element ref="msmlResultSimple" minOccurs="0" + maxOccurs="unbounded"/> + <xs:element ref="msmlResultComplex" minOccurs="0" + maxOccurs="unbounded"/> + </xs:sequence> + </xs:choice> + <xs:attribute name="response"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:pattern value="\d{3}"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="mark" type="mark.datatype"/> + </xs:complexType> + + + +Saleem, et al. Informational [Page 135] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:element> + </xs:choice> + <xs:attribute name="version" type="xs:string" use="required" + fixed="1.1"/> + </xs:complexType> + </xs:element> + </xs:schema> + +16.1. MSML Core + +16.1.1. msml-core.xsd + + <?xml version="1.0" encoding="UTF-8"?> + <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core-datatypes.xsd"/> + <xs:element name="msml"> + <xs:complexType> + <xs:choice> + <xs:group ref="msmlRequestType" maxOccurs="unbounded"/> + <xs:element name="event"> + <xs:complexType> + <xs:choice maxOccurs="unbounded"> + <xs:sequence> + <xs:element name="name" type="msmlEventNameValue.datatype"/> + <xs:element name="value"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:pattern value="[a-zA-Z0-9.]+"/> + </xs:restriction> + </xs:simpleType> + </xs:element> + </xs:sequence> + </xs:choice> + <xs:attribute name="name" type="msmlEventName.datatype" + use="required"/> + <xs:attribute name="id" type="msmlEventSource.datatype" + use="required"/> + </xs:complexType> + </xs:element> + <xs:element name="result"> + <xs:complexType> + <xs:choice> + <xs:element ref="description" minOccurs="0"/> + <xs:sequence> + <xs:element ref="msmlResultSimple" minOccurs="0" + maxOccurs="unbounded"/> + <xs:element ref="msmlResultComplex" minOccurs="0" + + + +Saleem, et al. Informational [Page 136] + +RFC 5707 Media Server Markup Language February 2010 + + + maxOccurs="unbounded"/> + </xs:sequence> + </xs:choice> + <xs:attribute name="response"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:pattern value="\d{3}"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="mark" type="mark.datatype"/> + </xs:complexType> + </xs:element> + </xs:choice> + <xs:attribute name="version" type="xs:string" use="required" + fixed="1.1"/> + </xs:complexType> + </xs:element> + </xs:schema> + +16.1.2. msml-core-datatypes.xsd + + <?xml version="1.0" encoding="UTF-8"?> + <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:group name="msmlRequestType"> + <xs:choice> + <xs:element ref="msmlRequest"/> + <xs:element name="send"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="msmlRequestType"> + <xs:attribute name="event" type="msmlEvent.datatype" + use="required"/> + <xs:attribute name="target" type="msmlTarget.datatype" + use="required"/> + <xs:attribute name="valuelist" type="xs:string"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + </xs:choice> + </xs:group> + <xs:element name="msmlRequest" type="msmlRequestType" + abstract="true"/> + <xs:complexType name="msmlRequestType"> + <xs:attribute ref="mark"/> + + + +Saleem, et al. Informational [Page 137] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:complexType> + <xs:element name="msmlResultSimple" type="msmlResultSimpleType" + abstract="true"/> + <xs:element name="msmlResultComplex" type="msmlResultComplexType" + abstract="true"/> + <xs:simpleType name="msmlResultSimpleType"> + <xs:restriction base="xs:string"/> + </xs:simpleType> + <xs:complexType name="msmlResultComplexType"/> + <xs:element name="description" type="xs:string"/> + <xs:attribute name="mark" type="mark.datatype"/> + <xs:simpleType name="msmlInstanceID.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="[a-zA-Z0-9.:\-_]+"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="connID.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="conn:[a-zA-Z0-9.:\-_]+"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="confID.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="conf:[a-zA-Z0-9.:\-_]+"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="dialogID.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="conf:[a-zA-Z0-9.:\-_]+/dialog:[a-zA-Z0-9.:\-_]+"/> + <xs:pattern value="conn:[a-zA-Z0-9.:\-_]+/dialog:[a-zA-Z0-9.:\-_]+"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="independentID.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="conf:[a-zA-Z0-9.:\-_]+"/> + <xs:pattern value="conn:[a-zA-Z0-9.:\-_]+"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="dialogLanguage.datatype"> + <xs:restriction base="xs:string"> + <xs:enumeration value="application/moml+xml"/> + <xs:enumeration value="application/voicexml+xml"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="msmlEvent.datatype"> + <xs:restriction base="xs:string"/> + </xs:simpleType> + <xs:simpleType name="msmlSend.datatype"> + + + +Saleem, et al. Informational [Page 138] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:restriction base="xs:string"/> + </xs:simpleType> + <xs:simpleType name="msmlEventName.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="msml.dialog.exit"/> + <xs:pattern value="msml.conf.asn"/> + <xs:pattern value="msml.conf.nomedia"/> + <xs:pattern value="msml.dialog.exit"/> + <xs:pattern value="[a-zA-Z0-9.:_\-]+"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="msmlTarget.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern + value="conf:[a-zA-Z0-9.:_\-]+(/oper:[a-zA-Z0-9.:_\-]+|\*)*"/> + <xs:pattern + value="conn:[a-zA-Z0-9.:_\-]+(/oper:[a-zA-Z0-9.:_\-]+|\*)+"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="msmlEventSource.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="conf:[a-zA-Z0-9.:_\-]+"/> + <xs:pattern value="(conf:[a-zA-Z0-9.:_\-]+|conn:[a-zA-Z0-9.:_\- + ]+)/dialog:[a-zA-Z0-9.:_\-]+"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="msmlEventNameValue.datatype"> + <xs:restriction base="xs:string"/> + </xs:simpleType> + <xs:simpleType name="mark.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="[a-zA-Z0-9.:\-_]+"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="boolean.datatype"> + <xs:restriction base="xs:string"> + <xs:enumeration value="true"/> + <xs:enumeration value="false"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="posDuration.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="(\+)?([0-9]*\.)?[0-9]+(ms|s)"/> + </xs:restriction> + </xs:simpleType> + </xs:schema> + + + + + +Saleem, et al. Informational [Page 139] + +RFC 5707 Media Server Markup Language February 2010 + + +16.2. MSML Conference Core Package + +16.2.1. msml-conf-core.xsd + + <?xml version="1.0" encoding="UTF-8"?> + <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core.xsd"/> + <xs:include schemaLocation="msml-conf-core-datatypes.xsd"/> + </xs:schema> + +16.2.2. msml-conf-core-datatypes.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core-datatypes.xsd"/> + <xs:element name="createconference" substitutionGroup="msmlRequest"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="msmlRequestType"> + <xs:all> + <xs:element name="audiomix" type="audioMixType" minOccurs="0"/> + <xs:element name="videolayout" type="videoLayoutType" + minOccurs="0"/> + <xs:element name="reserve" minOccurs="0"> + <xs:complexType> + <xs:sequence> + <xs:element name="resource" maxOccurs="unbounded"> + <xs:complexType> + <xs:sequence> + <xs:any namespace="##other" processContents="lax" + minOccurs="0" maxOccurs="unbounded"/> + </xs:sequence> + <xs:attribute name="n" type="xs:positiveInteger" + default="1"/> + <xs:anyAttribute namespace="##any"/> + </xs:complexType> + </xs:element> + </xs:sequence> + <xs:attribute name="required" type="boolean.datatype" + default="true"/> + </xs:complexType> + </xs:element> + </xs:all> + <xs:attribute name="name" type="msmlInstanceID.datatype"/> + + + +Saleem, et al. Informational [Page 140] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:attribute name="deletewhen" default="never"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="nomedia"/> + <xs:enumeration value="nocontrol"/> + <xs:enumeration value="never"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="term" type="boolean.datatype" default="true"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="modifyconference" substitutionGroup="msmlRequest"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="msmlRequestType"> + <xs:all> + <xs:element name="audiomix" type="audioMixType" minOccurs="0"/> + <xs:element name="videolayout" type="videoLayoutType" + minOccurs="0"/> + </xs:all> + <xs:attribute name="id" type="confID.datatype" use="required"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="destroyconference" substitutionGroup="msmlRequest"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="msmlRequestType"> + <xs:all> + <xs:element name="audiomix" type="basicAudioMixType" + minOccurs="0"/> + <xs:element name="videolayout" type="basicVideoLayoutType" + minOccurs="0"/> + </xs:all> + <xs:attribute name="id" type="confID.datatype" use="required"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="join" substitutionGroup="msmlRequest"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="msmlRequestType"> + <xs:sequence> + + + +Saleem, et al. Informational [Page 141] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:element name="stream" type="streamType" minOccurs="0" + maxOccurs="4"/> + </xs:sequence> + <xs:attribute name="id1" type="independentID.datatype" + use="required"/> + <xs:attribute name="id2" type="independentID.datatype" + use="required"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="modifystream" substitutionGroup="msmlRequest"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="msmlRequestType"> + <xs:sequence> + <xs:element name="stream" type="streamType" maxOccurs="4"/> + </xs:sequence> + <xs:attribute name="id1" type="independentID.datatype" + use="required"/> + <xs:attribute name="id2" type="independentID.datatype" + use="required"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="unjoin" substitutionGroup="msmlRequest"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="msmlRequestType"> + <xs:sequence> + <xs:element name="stream" type="basicStreamType" minOccurs="0" + maxOccurs="4"/> + </xs:sequence> + <xs:attribute name="id1" type="independentID.datatype" + use="required"/> + <xs:attribute name="id2" type="independentID.datatype" + use="required"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="monitor" substitutionGroup="msmlRequest"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="msmlRequestType"> + <xs:attribute name="id1" type="connID.datatype" use="required"/> + <xs:attribute name="id2" type="independentID.datatype" + + + +Saleem, et al. Informational [Page 142] + +RFC 5707 Media Server Markup Language February 2010 + + + use="required"/> + <xs:attribute name="compressed" type="boolean.datatype" + default="false"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="confid" type="msmlResultSimpleType" + substitutionGroup="msmlResultSimple"/> + <xs:complexType name="basicStreamType"> + <xs:attribute name="dir"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="to-id1"/> + <xs:enumeration value="from-id1"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="media"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="audio"/> + <xs:enumeration value="video"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="compressed" type="boolean.datatype"/> + </xs:complexType> + <xs:complexType name="streamType"> + <xs:complexContent> + <xs:extension base="basicStreamType"> + <xs:choice minOccurs="0" maxOccurs="unbounded"> + <xs:element name="gain"> + <xs:complexType> + <xs:attribute name="amt" use="optional"> + <xs:simpleType> + <xs:restriction base="xs:integer"> + <xs:minInclusive value="-96"/> + <xs:maxInclusive value="96"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="agc" type="boolean.datatype"/> + <xs:attribute name="tgtlvl" use="optional"> + <xs:simpleType> + <xs:restriction base="xs:nonPositiveInteger"> + <xs:minInclusive value="-40"/> + <xs:maxInclusive value="0"/> + + + +Saleem, et al. Informational [Page 143] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="maxgain" default="10"> + <xs:simpleType> + <xs:restriction base="xs:nonNegativeInteger"> + <xs:minInclusive value="0"/> + <xs:maxInclusive value="40"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:complexType> + </xs:element> + <xs:element name="clamp"> + <xs:complexType> + <xs:attribute name="dtmf" type="boolean.datatype"/> + <xs:attribute name="tones" type="boolean.datatype"/> + </xs:complexType> + </xs:element> + <xs:element name="visual"/> + </xs:choice> + <xs:attribute name="preferred" type="boolean.datatype" + default="false"/> + <xs:attribute name="display" type="xs:string"/> + <xs:attribute name="override" type="boolean.datatype" + default="false"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + <xs:complexType name="basicAudioMixType"> + <xs:attribute name="id" type="xs:string" use="optional"/> + <xs:attribute name="samplerate" type="xs:positiveInteger" + use="optional" default="8000"/> + </xs:complexType> + <xs:complexType name="audioMixType"> + <xs:complexContent> + <xs:extension base="basicAudioMixType"> + <xs:all> + <xs:element name="asn" minOccurs="0"> + <xs:complexType> + <xs:attribute name="ri" type="posDuration.datatype"/> + <xs:attribute name="asth" default="-96"> + <xs:simpleType> + <xs:restriction base="xs:nonPositiveInteger"> + <xs:minInclusive value="-96"/> + <xs:maxInclusive value="0"/> + </xs:restriction> + </xs:simpleType> + + + +Saleem, et al. Informational [Page 144] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:attribute> + </xs:complexType> + </xs:element> + <xs:element name="n-loudest" minOccurs="0"> + <xs:complexType> + <xs:attribute name="n" type="xs:positiveInteger" use="required"/> + </xs:complexType> + </xs:element> + </xs:all> + </xs:extension> + </xs:complexContent> + </xs:complexType> + <xs:complexType name="basicVideoLayoutType"> + <xs:attribute name="id" type="xs:string" use="required"/> + <xs:attribute name="type" type="xs:string" use="required" + fixed="text/msml-basic-layout"/> + </xs:complexType> + <xs:complexType name="videoLayoutType"> + <xs:complexContent> + <xs:extension base="basicVideoLayoutType"> + <xs:choice> + <xs:element name="selector"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="selectorType"> + <xs:choice> + <xs:element name="root" type="rootType" minOccurs="0"/> + <xs:element name="region" minOccurs="0"> + <xs:complexType> + <xs:attribute name="id" type="xs:string" use="required"/> + <xs:attribute name="left" type="xs:positiveInteger"/> + <xs:attribute name="top" type="xs:positiveInteger"/> + <xs:attribute name="relativeSize"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="1/4"/> + <xs:enumeration value="1/3"/> + <xs:enumeration value="2/3"/> + <xs:enumeration value="3/4"/> + <xs:enumeration value="1"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="priority"> + <xs:simpleType> + <xs:restriction base="xs:float"> + <xs:minInclusive value="0"/> + <xs:maxExclusive value="1"/> + + + +Saleem, et al. Informational [Page 145] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="title" type="xs:string"/> + <xs:attribute name="titleTextColor" type="xs:string"/> + <xs:attribute name="titleBackgroundColor" type="xs:string"/> + <xs:attribute name="borderColor" type="xs:string"/> + <xs:attribute name="borderWidth" type="xs:positiveInteger"/> + <xs:attribute name="logo" type="xs:anyURI"/> + </xs:complexType> + </xs:element> + </xs:choice> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="root" type="rootType"/> + <xs:element name="region" minOccurs="0" maxOccurs="unbounded"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="regionType"/> + </xs:complexContent> + </xs:complexType> + </xs:element> + </xs:choice> + </xs:extension> + </xs:complexContent> + </xs:complexType> + <xs:complexType name="regionType"> + <xs:attribute name="id" type="xs:string" use="required"/> + <xs:attribute name="left" type="xs:positiveInteger"/> + <xs:attribute name="top" type="xs:positiveInteger"/> + <xs:attribute name="relativeSize"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="1/4"/> + <xs:enumeration value="1/3"/> + <xs:enumeration value="2/3"/> + <xs:enumeration value="3/4"/> + <xs:enumeration value="1"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="priority"> + <xs:simpleType> + <xs:restriction base="xs:float"> + <xs:minInclusive value="0"/> + <xs:maxExclusive value="1"/> + + + +Saleem, et al. Informational [Page 146] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="title" type="xs:string"/> + <xs:attribute name="titleTextColor" type="xs:string"/> + <xs:attribute name="titleBackgroundColor" type="xs:string"/> + <xs:attribute name="borderColor" type="xs:string"/> + <xs:attribute name="borderWidth" type="xs:positiveInteger"/> + <xs:attribute name="logo" type="xs:anyURI"/> + </xs:complexType> + <xs:complexType name="selectorType"> + <xs:attribute name="id" type="xs:string" use="required"/> + <xs:attribute name="method" use="required"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="vas"/> + <xs:enumeration value="sequence"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="status" default="active"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="active"/> + <xs:enumeration value="disabled"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="si" type="posDuration.datatype" default="1s"/> + <xs:attribute name="blankothers" type="xs:boolean" default="false"/> + <xs:attribute name="speakersees" default="current"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="current"/> + <xs:enumeration value="previous"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:complexType> + <xs:complexType name="rootType"> + <xs:attribute name="size" default="CIF"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="16CIF"/> + <xs:enumeration value="4CIF"/> + <xs:enumeration value="CIF"/> + <xs:enumeration value="QCIF"/> + </xs:restriction> + + + +Saleem, et al. Informational [Page 147] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:simpleType> + </xs:attribute> + <xs:attribute name="backgroundcolor" type="xs:string" + default="black"/> + <xs:attribute name="backgroundimage" type="xs:anyURI"/> + </xs:complexType> + <xs:simpleType name="confclass.datatype"> + <xs:restriction base="xs:string"> + <xs:enumeration value="standard"/> + <xs:enumeration value="preferred"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="conferenceType.datatype"> + <xs:restriction base="xs:string"> + <xs:enumeration value="audio.basic"/> + <xs:enumeration value="audio.advanced"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="duplex.datatype"> + <xs:restriction base="xs:string"> + <xs:enumeration value="half"/> + <xs:enumeration value="full"/> + </xs:restriction> + </xs:simpleType> +</xs:schema> + +16.3. MSML Dialog Packages + +16.3.1. msml-dialog-core.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core.xsd"/> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> +</xs:schema> + +16.3.2. msml-dialog-core-datatypes.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core-datatypes.xsd"/> + <xs:group name="momlRequest"> + <xs:choice> + <xs:group ref="executeType"/> + + + +Saleem, et al. Informational [Page 148] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:group ref="sendType"/> + </xs:choice> + </xs:group> + <xs:element name="dialogstart" substitutionGroup="msmlRequest"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="msmlRequestType"> + <xs:choice> + <xs:group ref="momlRequest" minOccurs="0"/> + </xs:choice> + <xs:attribute name="target" type="independentID.datatype" + use="required"/> + <xs:attribute name="type" type="dialogLanguage.datatype" + use="required"/> + <xs:attribute name="name" type="msmlInstanceID.datatype"/> + <xs:attribute name="src" type="xs:anyURI" use="optional"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="dialogend" substitutionGroup="msmlRequest"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="msmlRequestType"> + <xs:attribute name="id" type="dialogID.datatype" use="required"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="dialogid" type="msmlResultSimpleType" + substitutionGroup="msmlResultSimple"/> + <xs:group name="executeType"> + <xs:choice> + <xs:element ref="primitive" maxOccurs="unbounded"/> + <xs:element ref="control" maxOccurs="unbounded"/> + </xs:choice> + </xs:group> + <xs:element name="primitive" type="primitiveType" abstract="true"/> + <xs:complexType name="primitiveType"> + <xs:attribute name="id" type="momlID.datatype"/> + </xs:complexType> + <xs:element name="control" abstract="true"/> + <xs:group name="sendType"> + <xs:choice> + <xs:choice> + <xs:element name="exit" type="exitType"/> + <xs:element name="disconnect" type="exitType"/> + </xs:choice> + + + +Saleem, et al. Informational [Page 149] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:sequence> + <xs:element ref="send" maxOccurs="unbounded"/> + <xs:choice minOccurs="0"> + <xs:element name="exit" type="exitType"/> + <xs:element name="disconnect" type="exitType"/> + </xs:choice> + </xs:sequence> + </xs:choice> + </xs:group> + <xs:element name="send"> + <xs:complexType> + <xs:attribute name="event" type="momlEvent.datatype" use="required"/> + <xs:attribute name="target" type="momlTarget.datatype" + use="required"/> + <xs:attribute name="namelist" type="momlNamelist.datatype"/> + </xs:complexType> + </xs:element> + <xs:complexType name="exitType"> + <xs:attribute name="namelist" type="momlNamelist.datatype"/> + </xs:complexType> + <xs:simpleType name="momlID.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="[a-zA-Z0-9][a-zA-Z0-9._\-]*"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="momlEvent.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="[a-zA-Z0-9][a-zA-Z0-9._\-]*"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="momlNamelist.datatype"> + <xs:restriction base="xs:string"/> + </xs:simpleType> + <xs:simpleType name="dtmfDigits.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="[0-9#*]+"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="iterate.datatype"> + <xs:union memberTypes="xs:positiveInteger"> + <xs:simpleType> + <xs:restriction base="xs:negativeInteger"> + <xs:minInclusive value="-1"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="forever"/> + + + +Saleem, et al. Informational [Page 150] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:restriction> + </xs:simpleType> + </xs:union> + </xs:simpleType> + <xs:simpleType name="momlTarget.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="[a-zA-Z0-9][a-zA-Z0-9._\-]*"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="duration.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="(\+|\-)?([0-9]*\.)?[0-9]+(ms|s)"/> + </xs:restriction> + </xs:simpleType> +</xs:schema> + +16.3.3. msml-dialog-base.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="unqualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core.xsd"/> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-base-datatypes.xsd"/> +</xs:schema> + +16.3.4. msml-dialog-base-datatypes.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="unqualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:import namespace="http://www.w3.org/XML/1998/namespace" + schemaLocation="http://www.w3.org/2001/xml.xsd"/> + <xs:element name="play" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:sequence> + <xs:choice maxOccurs="unbounded"> + <xs:element name="audio" minOccurs="0" maxOccurs="unbounded"> + <xs:complexType> + <xs:attribute name="uri" type="xs:anyURI" use="required"/> + <xs:attribute name="iterate" type="iterate.datatype" + default="1"/> + <xs:attribute name="format" type="xs:string" use="optional"/> + + + +Saleem, et al. Informational [Page 151] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:attribute name="audiosamplerate" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="audiosamplesize" type="xs:positiveInteger" + use="optional"/> + <xs:attribute ref="xml:lang"/> + </xs:complexType> + </xs:element> + <xs:element name="video" minOccurs="0" maxOccurs="unbounded"> + <xs:complexType> + <xs:attribute name="uri" type="xs:anyURI" use="required"/> + <xs:attribute name="iterate" type="iterate.datatype" + use="optional" default="1"/> + <xs:attribute name="format" type="xs:string" use="optional"/> + <xs:attribute name="audiosamplerate" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="audiosamplesize" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="codecconfig" type="xs:string" + use="optional"/> + <xs:attribute name="profile" type="xs:string" use="optional"/> + <xs:attribute name="level" type="xs:string" use="optional"/> + <xs:attribute name="imagewidth" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="imageheight" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="maxbitrate" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="framerate" type="xs:positiveInteger" + use="optional"/> + </xs:complexType> + </xs:element> + <xs:element name="media" minOccurs="0" maxOccurs="unbounded"> + <xs:complexType> + <xs:choice minOccurs="0" maxOccurs="unbounded"> + <xs:element name="audio" minOccurs="0"> + <xs:complexType> + <xs:attribute name="uri" type="xs:anyURI" use="required"/> + <xs:attribute name="iterate" type="iterate.datatype" + default="1"/> + <xs:attribute name="format" type="xs:string" + use="optional"/> + <xs:attribute name="audiosamplerate" + type="xs:positiveInteger" use="optional"/> + <xs:attribute name="audiosamplesize" + type="xs:positiveInteger" use="optional"/> + <xs:attribute ref="xml:lang"/> + </xs:complexType> + </xs:element> + + + +Saleem, et al. Informational [Page 152] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:element name="video" minOccurs="0"> + <xs:complexType> + <xs:attribute name="uri" type="xs:anyURI" use="required"/> + <xs:attribute name="iterate" type="iterate.datatype" + use="optional" default="1"/> + <xs:attribute name="format" type="xs:string" + use="optional"/> + <xs:attribute name="audiosamplerate" + type="xs:positiveInteger" use="optional"/> + <xs:attribute name="audiosamplesize" + type="xs:positiveInteger" use="optional"/> + <xs:attribute name="codecconfig" type="xs:string" + use="optional"/> + <xs:attribute name="profile" type="xs:string" + use="optional"/> + <xs:attribute name="level" type="xs:string" use="optional"/> + <xs:attribute name="imagewidth" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="imageheight" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="maxbitrate" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="framerate" type="xs:positiveInteger" + use="optional"/> + </xs:complexType> + </xs:element> + </xs:choice> + </xs:complexType> + </xs:element> + <xs:element ref="smedia" minOccurs="0" maxOccurs="unbounded"/> + </xs:choice> + <xs:choice minOccurs="0"> + <xs:element name="playexit"> + <xs:complexType> + <xs:group ref="sendType"/> + </xs:complexType> + </xs:element> + </xs:choice> + </xs:sequence> + <xs:attribute name="interval" type="posDuration.datatype" + use="optional"/> + <xs:attribute name="iterate" type="iterate.datatype" use="optional" + default="1"/> + <xs:attribute name="offset" type="duration.datatype" + use="optional"/> + <xs:attribute name="initial" use="optional" default="generate"> + <xs:simpleType> + <xs:restriction base="xs:string"> + + + +Saleem, et al. Informational [Page 153] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:enumeration value="generate"/> + <xs:enumeration value="suspend"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="maxtime" type="posDuration.datatype" + use="optional"/> + <xs:attribute name="skip" type="duration.datatype" use="optional" + default="3s"/> + <xs:attribute name="barge" type="boolean.datatype" use="optional" + default="false"/> + <xs:attribute name="cleardb" type="boolean.datatype" use="optional" + default="false"/> + <xs:attribute ref="xml:lang"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="record" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:choice minOccurs="0"> + <xs:element ref="play" minOccurs="0" maxOccurs="unbounded"/> + <xs:element ref="tonegen" minOccurs="0" maxOccurs="unbounded"/> + <xs:element name="recordexit"> + <xs:complexType> + <xs:group ref="sendType"/> + </xs:complexType> + </xs:element> + </xs:choice> + <xs:attribute name="append" type="boolean.datatype" use="optional" + default="false"/> + <xs:attribute name="dest" type="xs:anyURI" use="optional"/> + <xs:attribute name="audiodest" type="xs:anyURI" use="optional"/> + <xs:attribute name="videodest" type="xs:anyURI" use="optional"/> + <xs:attribute name="format" use="required"> + <xs:simpleType> + <xs:restriction base="xs:string"/> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="codecconfig" use="optional"> + <xs:simpleType> + <xs:restriction base="xs:string"/> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="audiosamplerate" type="xs:positiveInteger" + use="optional"/> + + + +Saleem, et al. Informational [Page 154] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:attribute name="audiosamplesize" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="profile" use="optional"> + <xs:simpleType> + <xs:restriction base="xs:string"/> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="level" use="optional"> + <xs:simpleType> + <xs:restriction base="xs:string"/> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="imagewidth" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="imageheight" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="maxbitrate" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="framerate" type="xs:positiveInteger" + use="optional"/> + <xs:attribute name="maxtime" type="posDuration.datatype" + use="required"/> + <xs:attribute name="initial" use="optional" default="create"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="create"/> + <xs:enumeration value="suspend"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="prespeech" type="posDuration.datatype" + use="optional" default="0s"/> + <xs:attribute name="postspeech" type="posDuration.datatype" + use="optional" default="0s"/> + <xs:attribute name="termkey" use="optional"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:pattern value="[0-9#*ABCD]"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="dtmf" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + + + +Saleem, et al. Informational [Page 155] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:extension base="primitiveType"> + <xs:sequence> + <xs:element name="pattern" maxOccurs="unbounded"> + <xs:complexType> + <xs:group ref="sendType"/> + <xs:attribute name="digits" type="xs:string" use="required"/> + <xs:attribute name="format"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="mgcp"/> + <xs:enumeration value="megaco"/> + <xs:enumeration value="moml+digits"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="iterate" type="iterate.datatype" + default="1"/> + </xs:complexType> + </xs:element> + <xs:element name="detect" minOccurs="0"> + <xs:complexType> + <xs:group ref="sendType"/> + </xs:complexType> + </xs:element> + <xs:element name="noinput" type="iterateSendType" minOccurs="0"/> + <xs:element name="nomatch" type="iterateSendType" minOccurs="0"/> + <xs:element name="dtmfexit" minOccurs="0"> + <xs:complexType> + <xs:group ref="sendType"/> + </xs:complexType> + </xs:element> + <xs:element ref="play" minOccurs="0"/> + </xs:sequence> + <xs:attribute name="cleardb" type="boolean.datatype" + default="true"/> + <xs:attribute name="fdt" type="posDuration.datatype" default="0s"/> + <xs:attribute name="idt" type="posDuration.datatype" default="4s"/> + <xs:attribute name="edt" type="posDuration.datatype" default="4s"/> + <xs:attribute name="starttimer" type="boolean.datatype" + default="false"/> + <xs:attribute name="iterate" type="iterate.datatype" default="1"/> + <xs:attribute name="ldd" type="posDuration.datatype" default="0s"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="collect" substitutionGroup="primitive"> + <xs:complexType> + + + +Saleem, et al. Informational [Page 156] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:sequence> + <xs:element name="pattern" maxOccurs="unbounded"> + <xs:complexType> + <xs:group ref="sendType"/> + <xs:attribute name="digits" type="xs:string" use="required"/> + <xs:attribute name="format"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="mgcp"/> + <xs:enumeration value="megaco"/> + <xs:enumeration value="moml+digits"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="iterate" type="iterate.datatype" + default="1"/> + </xs:complexType> + </xs:element> + <xs:element name="detect" minOccurs="0"> + <xs:complexType> + <xs:group ref="sendType"/> + </xs:complexType> + </xs:element> + <xs:element name="noinput" type="iterateSendType" minOccurs="0"/> + <xs:element name="nomatch" type="iterateSendType" minOccurs="0"/> + <xs:element name="dtmfexit" minOccurs="0"> + <xs:complexType> + <xs:group ref="sendType"/> + </xs:complexType> + </xs:element> + <xs:element ref="play" minOccurs="0"/> + </xs:sequence> + <xs:attribute name="cleardb" type="boolean.datatype" + default="true"/> + <xs:attribute name="fdt" type="posDuration.datatype" default="0s"/> + <xs:attribute name="idt" type="posDuration.datatype" default="4s"/> + <xs:attribute name="edt" type="posDuration.datatype" default="4s"/> + <xs:attribute name="starttimer" type="boolean.datatype" + default="false"/> + <xs:attribute name="iterate" type="iterate.datatype" default="1"/> + <xs:attribute name="ldd" type="posDuration.datatype"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="dtmfgen" substitutionGroup="primitive"> + + + +Saleem, et al. Informational [Page 157] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:choice minOccurs="0"> + <xs:element name="dtmfgenexit"> + <xs:complexType> + <xs:group ref="sendType"/> + </xs:complexType> + </xs:element> + </xs:choice> + <xs:attribute name="level" use="optional" default="-6"> + <xs:simpleType> + <xs:restriction base="xs:nonPositiveInteger"> + <xs:maxInclusive value="0"/> + <xs:minInclusive value="-96"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="digits" type="dtmfDigits.datatype" + use="required"/> + <xs:attribute name="dur" type="posDuration.datatype" use="optional" + default="100ms"/> + <xs:attribute name="interval" type="posDuration.datatype" + use="optional" default="100ms"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="tonegen" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:choice minOccurs="0"> + <xs:element name="tonegenexit" minOccurs="0"> + <xs:complexType> + <xs:group ref="sendType"/> + </xs:complexType> + </xs:element> + <xs:element name="tone" maxOccurs="unbounded"> + <xs:complexType> + <xs:sequence> + <xs:element name="tone1"> + <xs:complexType> + <xs:attribute name="freq" use="required"> + <xs:simpleType> + <xs:restriction base="xs:unsignedInt"> + <xs:minInclusive value="0"/> + <xs:maxInclusive value="3999"/> + + + +Saleem, et al. Informational [Page 158] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="atten" use="required"> + <xs:simpleType> + <xs:restriction base="xs:nonPositiveInteger"> + <xs:minInclusive value="-96"/> + <xs:maxInclusive value="0"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:complexType> + </xs:element> + <xs:element name="tone2"> + <xs:complexType> + <xs:attribute name="freq" use="required"> + <xs:simpleType> + <xs:restriction base="xs:unsignedInt"> + <xs:minInclusive value="0"/> + <xs:maxInclusive value="3999"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="atten" use="required"> + <xs:simpleType> + <xs:restriction base="xs:nonPositiveInteger"> + <xs:minInclusive value="-96"/> + <xs:maxInclusive value="0"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:complexType> + </xs:element> + <xs:element name="silence" minOccurs="0" maxOccurs="unbounded"> + <xs:complexType> + <xs:attribute name="duration" type="duration.datatype" + use="required"/> + </xs:complexType> + </xs:element> + </xs:sequence> + <xs:attribute name="duration" use="required"> + <xs:simpleType> + <xs:restriction base="duration.datatype"/> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="iterate" type="iterate.datatype" + use="optional" default="1"/> + </xs:complexType> + + + +Saleem, et al. Informational [Page 159] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:element> + <xs:element name="silence" minOccurs="0" maxOccurs="unbounded"> + <xs:complexType> + <xs:attribute name="duration" type="duration.datatype" + use="required"/> + </xs:complexType> + </xs:element> + </xs:choice> + <xs:attribute name="iterate" type="iterate.datatype" use="optional" + default="1"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:complexType name="iterateSendType"> + <xs:group ref="sendType"/> + <xs:attribute name="iterate" type="iterate.datatype" default="1"/> + </xs:complexType> + <xs:element name="smedia" type="smediaType" abstract="true"/> + <xs:complexType name="smediaType"> + <xs:attribute ref="xml:lang"/> + <xs:attribute name="iterate" type="iterate.datatype"/> + </xs:complexType> + <xs:element name="var" substitutionGroup="smedia"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="smediaType"> + <xs:attribute name="type" use="required"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="date"/> + <xs:enumeration value="digits"/> + <xs:enumeration value="duration"/> + <xs:enumeration value="month"/> + <xs:enumeration value="money"/> + <xs:enumeration value="number"/> + <xs:enumeration value="silence"/> + <xs:enumeration value="time"/> + <xs:enumeration value="weekday"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="subtype" type="xs:string" use="optional"/> + <xs:attribute name="value" type="xs:string" use="required"/> + </xs:extension> + </xs:complexContent> + + + + + +Saleem, et al. Informational [Page 160] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:complexType> + </xs:element> +</xs:schema> + +16.3.5. msml-dialog-transform.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="unqualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core.xsd"/> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-transform-datatypes.xsd"/> +</xs:schema> + +16.3.6. msml-dialog-transform-datatypes.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="unqualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:import namespace="http://www.w3.org/XML/1998/namespace" + schemaLocation="http://www.w3.org/2001/xml.xsd"/> + <xs:element name="vad" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:all> + <xs:element name="voice" type="vadPatternType" minOccurs="0"/> + <xs:element name="silence" type="vadPatternType" minOccurs="0"/> + <xs:element name="tvoice" type="vadPatternType" minOccurs="0"/> + <xs:element name="tsilence" type="vadPatternType" minOccurs="0"/> + </xs:all> + <xs:attribute name="starttimer" type="boolean.datatype" + default="false"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="gain" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:attribute name="incr" default="3"> + <xs:simpleType> + <xs:restriction base="xs:positiveInteger"> + <xs:maxInclusive value="96"/> + + + +Saleem, et al. Informational [Page 161] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="amt" use="required"> + <xs:simpleType> + <xs:restriction base="xs:integer"> + <xs:minInclusive value="-96"/> + <xs:maxInclusive value="96"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="agc" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:attribute name="tgtlvl" use="required"> + <xs:simpleType> + <xs:restriction base="xs:nonPositiveInteger"> + <xs:minInclusive value="-40"/> + <xs:maxInclusive value="0"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="maxgain" default="10"> + <xs:simpleType> + <xs:restriction base="xs:nonNegativeInteger"> + <xs:minInclusive value="0"/> + <xs:maxInclusive value="40"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="gate" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:attribute name="initial" default="pass"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="pass"/> + <xs:enumeration value="halt"/> + + + +Saleem, et al. Informational [Page 162] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="clamp" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"/> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="relay" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"/> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:complexType name="vadPatternType"> + <xs:group ref="sendType"/> + <xs:attribute name="iterate" type="iterate.datatype" default="1"/> + <xs:attribute name="len" type="posDuration.datatype" use="required"/> + <xs:attribute name="sen" type="posDuration.datatype" use="optional"/> + </xs:complexType> +</xs:schema> + +16.3.7. msml-dialog-group.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="unqualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core.xsd"/> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-base-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-group-datatypes.xsd"/> +</xs:schema> + +16.3.8. msml-dialog-group-datatypes.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="unqualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core-datatypes.xsd"/> + + + +Saleem, et al. Informational [Page 163] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-base-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-transform-datatypes.xsd"/> + <xs:element name="group" substitutionGroup="control"> + <xs:complexType> + <xs:sequence> + <xs:group ref="executeType"/> + <xs:element name="groupexit" minOccurs="0"> + <xs:complexType> + <xs:group ref="sendType"/> + </xs:complexType> + </xs:element> + </xs:sequence> + <xs:attribute name="id" type="momlID.datatype"/> + <xs:attribute name="topology" use="required"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="serial"/> + <xs:enumeration value="parallel"/> + <xs:enumeration value="fullduplex"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:complexType> + </xs:element> +</xs:schema> + +16.3.9. msml-dialog-speech.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core.xsd"/> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-base-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-speech-datatypes.xsd"/> +</xs:schema> + +16.3.10. msml-dialog-speech-datatypes.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-base-datatypes.xsd"/> + <xs:include schemaLocation="http://www.w3.org/TR/2002/WD-speech- + + + +Saleem, et al. Informational [Page 164] + +RFC 5707 Media Server Markup Language February 2010 + + + synthesis-20020405/synthesis-core.xsd"/> + <xs:include schemaLocation="http://www.w3.org/TR/speech- + grammar/grammar-core.xsd"/> + <xs:element name="speech" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:sequence> + <xs:element name="grammar" maxOccurs="unbounded"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="grammar"> + <xs:choice> + <xs:element name="match" type="iterateSendType" + minOccurs="0"/> + </xs:choice> + <xs:attribute name="uri" type="xs:anyURI"/> + <xs:attribute name="iterate" type="iterate.datatype" + default="1"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="noinput" type="iterateSendType" minOccurs="0"/> + <xs:element name="nomatch" type="iterateSendType" minOccurs="0"/> + <xs:element name="speechexit" minOccurs="0"> + <xs:complexType> + <xs:group ref="sendType"/> + </xs:complexType> + </xs:element> + </xs:sequence> + <xs:attribute name="noint" type="posDuration.datatype"/> + <xs:attribute name="norect" type="posDuration.datatype"/> + <xs:attribute name="spcmplt" type="posDuration.datatype"/> + <xs:attribute name="confidence"> + <xs:simpleType> + <xs:restriction base="xs:positiveInteger"> + <xs:maxInclusive value="100"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="sens" type="xs:positiveInteger"/> + <xs:attribute name="starttimer" type="boolean.datatype" + default="false"/> + <xs:attribute name="iterate" type="iterate.datatype" default="1"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + + + +Saleem, et al. Informational [Page 165] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:element> + <xs:element name="tts" type="smediaType" substitutionGroup="smedia"/> +</xs:schema> + +16.3.11. msml-dialog-fax-detect.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core.xsd"/> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-fax-detect-datatypes.xsd"/> +</xs:schema> + +16.3.12. msml-dialog-fax-detect-datatypes.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:element name="faxdetect" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:choice minOccurs="0"> + <xs:element name="faxdetectexit"> + <xs:complexType> + <xs:group ref="sendType"/> + </xs:complexType> + </xs:element> + </xs:choice> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> +</xs:schema> + +16.3.13. msml-dialog-fax-sendrecv.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core.xsd"/> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-dialog-fax-sendrecv-datatypes.xsd"/> + + + +Saleem, et al. Informational [Page 166] + +RFC 5707 Media Server Markup Language February 2010 + + +</xs:schema> + +16.3.14. msml-dialog-fax-sendrecv-datatypes.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-dialog-core-datatypes.xsd"/> + <xs:element name="faxsend" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:sequence> + <xs:element name="sendobj" type="sendobjType" minOccurs="0" + maxOccurs="unbounded"/> + <xs:element name="hdrfooter" type="hdrfooterType" minOccurs="0"/> + <xs:element name="rxpoll" minOccurs="0"> + <xs:complexType> + <xs:sequence> + <xs:element name="rcvobj" type="rcvobjType" + maxOccurs="unbounded"/> + <xs:element name="hdrfooter" type="hdrfooterType" + minOccurs="0"/> + </xs:sequence> + <xs:attribute name="rmtid" type="faxid.datatype" + use="required"/> + </xs:complexType> + </xs:element> + <xs:group ref="faxstatusrequest"/> + </xs:sequence> + <xs:attribute name="lclid" type="faxid.datatype" use="optional"/> + <xs:attribute name="minspeed" type="faxspeed.datatype" + use="optional"/> + <xs:attribute name="maxspeed" type="faxspeed.datatype" + use="optional"/> + <xs:attribute name="ecm" type="boolean.datatype" use="optional"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="faxrecv" substitutionGroup="primitive"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="primitiveType"> + <xs:sequence> + <xs:element name="rcvobj" type="rcvobjType" minOccurs="0" + maxOccurs="unbounded"/> + + + +Saleem, et al. Informational [Page 167] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:element name="hdrfooter" type="hdrfooterType" minOccurs="0"/> + <xs:element name="txpoll" minOccurs="0"> + <xs:complexType> + <xs:sequence> + <xs:element name="sendobj" type="sendobjType" + maxOccurs="unbounded"/> + <xs:element name="hdrfooter" type="hdrfooterType" + minOccurs="0"/> + </xs:sequence> + <xs:attribute name="rmtid" type="faxid.datatype"/> + </xs:complexType> + </xs:element> + <xs:group ref="faxstatusrequest"/> + </xs:sequence> + <xs:attribute name="lclid" type="faxid.datatype" use="optional"/> + <xs:attribute name="ecm" type="boolean.datatype" default="true"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:group name="faxstatusrequest"> + <xs:sequence> + <xs:element name="faxstart" minOccurs="0"/> + <xs:element name="faxnegotiate" minOccurs="0"/> + <xs:element name="faxpagedone" minOccurs="0"/> + <xs:element name="faxobjectdone" minOccurs="0"/> + <xs:element name="faxopcomplete" minOccurs="0"/> + <xs:element name="faxpollstart" minOccurs="0"/> + </xs:sequence> + </xs:group> + <xs:complexType name="hdrfooterType"> + <xs:choice> + <xs:element name="format" type="xs:string" minOccurs="0" + maxOccurs="unbounded"/> + </xs:choice> + <xs:attribute name="type" type="hdrfooter.datatype"/> + <xs:attribute name="style" type="hdrfooterstyle.datatype"/> + </xs:complexType> + <xs:complexType name="formatType"> + <xs:simpleContent> + <xs:extension base="xs:string"> + <xs:attribute name="style"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="append"/> + <xs:enumeration value="overlay"/> + <xs:enumeration value="replace"/> + </xs:restriction> + + + +Saleem, et al. Informational [Page 168] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:simpleType> + </xs:attribute> + </xs:extension> + </xs:simpleContent> + </xs:complexType> + <xs:complexType name="rcvobjType"> + <xs:attribute name="objuri" type="xs:anyURI" use="required"/> + <xs:attribute name="maxpages" type="xs:positiveInteger"/> + </xs:complexType> + <xs:complexType name="sendobjType"> + <xs:attribute name="objuri" type="xs:anyURI" use="required"/> + <xs:attribute name="startpage" type="xs:positiveInteger"/> + <xs:attribute name="pagecount" type="xs:positiveInteger"/> + </xs:complexType> + <xs:simpleType name="faxid.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="[0-9+*- ]{20}"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="faxspeed.datatype"> + <xs:restriction base="xs:string"> + <xs:enumeration value="2400"/> + <xs:enumeration value="4800"/> + <xs:enumeration value="7200"/> + <xs:enumeration value="9600"/> + <xs:enumeration value="12000"/> + <xs:enumeration value="14400"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="hdrfooter.datatype"> + <xs:restriction base="xs:string"> + <xs:enumeration value="header"/> + <xs:enumeration value="footer"/> + <xs:enumeration value="autohdr"/> + <xs:enumeration value="nohdr"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="hdrfooterstyle.datatype"> + <xs:restriction base="xs:string"> + <xs:enumeration value="append"/> + <xs:enumeration value="overlay"/> + <xs:enumeration value="replace"/> + </xs:restriction> + </xs:simpleType> +</xs:schema> + + + + + + +Saleem, et al. Informational [Page 169] + +RFC 5707 Media Server Markup Language February 2010 + + +16.4. MSML Audit Packages + +16.4.1. msml-audit-core.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core.xsd"/> + <xs:include schemaLocation="msml-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-core-datatypes.xsd"/> +</xs:schema> + +16.4.2. msml-audit-core-datatypes.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core-datatypes.xsd"/> + <xs:element name="audit" substitutionGroup="msmlRequest"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="msmlRequestType"> + <xs:attribute name="queryid" type="auditQueryId.datatype" + use="required"/> + <xs:attribute name="statelist" type="auditStateList.datatype" + use="optional"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="auditresult" substitutionGroup="msmlResultComplex"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="msmlResultComplexType"> + <xs:choice maxOccurs="unbounded"> + <xs:element ref="stateParameter"/> + <xs:element ref="stateParameterSimple"/> + </xs:choice> + <xs:attribute name="targetid" type="independentID.datatype" + use="required"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="stateParameter" type="stateParameterType" + abstract="true"/> + + + +Saleem, et al. Informational [Page 170] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:element name="stateParameterSimple" type="stateParameterSimpleType" + abstract="true"/> + <xs:complexType name="stateParameterType"/> + <xs:simpleType name="stateParameterSimpleType"> + <xs:restriction base="xs:string"/> + </xs:simpleType> + <xs:simpleType name="auditQueryId.datatype"> + <xs:restriction base="xs:string"> + <xs:pattern value="conf:[a-zA-Z0-9.:\-_]+"/> + <xs:pattern value="conn:[a-zA-Z0-9.:\-_]+"/> + <xs:pattern value="conf:\*"/> + <xs:pattern value="conn:\*"/> + </xs:restriction> + </xs:simpleType> + <xs:simpleType name="auditStateList.datatype"> + <xs:restriction base="xs:string"/> + </xs:simpleType> +</xs:schema> + +16.4.3. msml-audit-conf.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core.xsd"/> + <xs:include schemaLocation="msml-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-dialog-datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-stream-datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-conf-datatypes.xsd"/> +</xs:schema> + +16.4.4. msml-audit-conf-datatypes.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-conf-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-core-datatypes.xsd"/> + <xs:element name="confconfig" substitutionGroup="stateParameter"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="stateParameterType"> + <xs:sequence> + <xs:element name="audiomix" type="audioMixType" minOccurs="0" + maxOccurs="unbounded"/> + + + +Saleem, et al. Informational [Page 171] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:element name="videolayout" type="videoLayoutType" + minOccurs="0" maxOccurs="unbounded"/> + <xs:element name="controller" type="connID.datatype" + minOccurs="0"/> + </xs:sequence> + <xs:attribute name="deletewhen" use="optional" default="never"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:enumeration value="nomedia"/> + <xs:enumeration value="nocontrol"/> + <xs:enumeration value="never"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="term" type="boolean.datatype" use="optional" + default="true"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> +</xs:schema> + +16.4.5. msml-audit-conn.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-core.xsd"/> + <xs:include schemaLocation="msml-audit-core-datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-dialog-datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-stream-datatypes.xsd"/> + <xs:include schemaLocation="msml-audit-conn-datatypes.xsd"/> +</xs:schema> + +16.4.6. msml-audit-conn-datatypes.xsd + +<?xml version="1.0" encoding="UTF-8"?> +<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-audit-core-datatypes.xsd"/> + <xs:element name="sipdialog" substitutionGroup="stateParameter"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="stateParameterType"> + <xs:sequence> + <xs:element name="localseq" type="xs:integer" minOccurs="0"/> + + + +Saleem, et al. Informational [Page 172] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:element name="remoteseq" type="xs:int" minOccurs="0"/> + <xs:element name="localuri" type="xs:string" minOccurs="0"/> + <xs:element name="remoteuri" type="xs:string" minOccurs="0"/> + <xs:element name="remotetarget" type="xs:string" minOccurs="0"/> + <xs:element name="routeset" type="xs:string" minOccurs="0"/> + </xs:sequence> + <xs:attribute name="callid" type="xs:string" use="required"/> + <xs:attribute name="localtag" type="xs:string" use="required"/> + <xs:attribute name="remotetag" type="xs:string" use="required"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + <xs:element name="localsdp" type="stateParameterSimpleType" + substitutionGroup="stateParameterSimple"/> + <xs:element name="remotesdp" type="stateParameterSimpleType" + substitutionGroup="stateParameterSimple"/> +</xs:schema> + +16.4.7. msml-audit-dialog-datatypes.xsd + + Audit Dialog functionality requires use of either the Audit Conf + Package or the Audit Conn Package. + + <?xml version="1.0" encoding="UTF-8"?> + <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-audit-core-datatypes.xsd"/> + <xs:element name="dialog" substitutionGroup="stateParameter"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="stateParameterType"> + <xs:sequence> + <xs:element name="duration" type="xs:positiveInteger" + minOccurs="0"/> + <xs:element name="primitive" minOccurs="0"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:pattern value="play"/> + <xs:pattern value="dtmf"/> + <xs:pattern value="collect"/> + <xs:pattern value="dtmfgen"/> + <xs:pattern value="tonegen"/> + <xs:pattern value="record"/> + <xs:pattern value="none"/> + </xs:restriction> + </xs:simpleType> + + + +Saleem, et al. Informational [Page 173] + +RFC 5707 Media Server Markup Language February 2010 + + + </xs:element> + <xs:element name="controller" type="connID.datatype" + minOccurs="0"/> + </xs:sequence> + <xs:attribute name="name" type="msmlInstanceID.datatype" + use="required"/> + <xs:attribute name="src" type="xs:anyURI" use="optional"/> + <xs:attribute name="type" type="dialogLanguage.datatype" + use="required"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + </xs:schema> + +16.4.8. msml-audit-stream-datatypes.xsd + + Audit Stream functionality requires use of either the Audit Conf + Package or the Audit Conn Package. + + <?xml version="1.0" encoding="UTF-8"?> + <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" + elementFormDefault="qualified" + attributeFormDefault="unqualified"> + <xs:include schemaLocation="msml-audit-core-datatypes.xsd"/> + <xs:element name="stream" substitutionGroup="stateParameter"> + <xs:complexType> + <xs:complexContent> + <xs:extension base="stateParameterType"> + <xs:all> + <xs:element name="clamp" minOccurs="0"> + <xs:complexType> + <xs:attribute name="dtmf" type="boolean.datatype"/> + <xs:attribute name="tones" type="boolean.datatype"/> + </xs:complexType> + </xs:element> + <xs:element name="gain" minOccurs="0"> + <xs:complexType> + <xs:attribute name="amt" use="optional"> + <xs:simpleType> + <xs:restriction base="xs:integer"> + <xs:minInclusive value="-96"/> + <xs:maxInclusive value="96"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="agc" type="boolean.datatype"/> + <xs:attribute name="tgtlvl" use="optional"> + + + +Saleem, et al. Informational [Page 174] + +RFC 5707 Media Server Markup Language February 2010 + + + <xs:simpleType> + <xs:restriction base="xs:nonPositiveInteger"> + <xs:minInclusive value="-40"/> + <xs:maxInclusive value="0"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="maxgain" default="10"> + <xs:simpleType> + <xs:restriction base="xs:nonNegativeInteger"> + <xs:minInclusive value="0"/> + <xs:maxInclusive value="40"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + </xs:complexType> + </xs:element> + <xs:element name="visual" minOccurs="0"/> + </xs:all> + <xs:attribute name="joinwith" type="independentID.datatype" + use="required"/> + <xs:attribute name="media" use="required"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:pattern value="audio"/> + <xs:pattern value="video"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="dir" use="required"> + <xs:simpleType> + <xs:restriction base="xs:string"> + <xs:pattern value="from"/> + <xs:pattern value="to"/> + </xs:restriction> + </xs:simpleType> + </xs:attribute> + <xs:attribute name="compressed" type="boolean.datatype"/> + <xs:attribute name="preferred" type="boolean.datatype" + default="false"/> + <xs:attribute name="display" type="xs:string"/> + <xs:attribute name="override" type="boolean.datatype" + default="false"/> + </xs:extension> + </xs:complexContent> + </xs:complexType> + </xs:element> + </xs:schema> + + + +Saleem, et al. Informational [Page 175] + +RFC 5707 Media Server Markup Language February 2010 + + +17. Security Considerations + + MSML being an XML-based language, security considerations as defined + by RFC 3023 [i2] are applicable. + + Media server interfaces driven using MSML are under the explicit + control of a SIP application server. SIP call legs are used to + deliver XML-based MSML transactions to the media server. The + security and integrity of MSML transactions, whenever required, + SHOULD use sips: and TLS for encryption and authentication of the SIP + control channel used to carry MSML payloads. Further information + related to security, privacy, and integrity of MSML media types is + described in the IANA Considerations section. + + Media streams, such as audio/video, MAY optionally be protected, + encrypted/decrypted, and authenticated, utilizing Secure Real Time + Protocol (SRTP), wherever media stream security is required. Media + negotiation establishes the required level of security and is + initiated by the clients, which is outside the scope of the control + interface specified by MSML. + +18. IANA Considerations + +18.1. IANA Registrations for 'application' MIME Media Type + + The following registrations have been made: + + Type Name: "application" + + Subtype names: + + 'application/vnd.radisys.msml+xml', + + 'application/vnd.radisys.moml+xml', + + 'application/vnd.radisys.msml-conf+xml', + + 'application/vnd.radisys.msml-dialog+xml', + + 'application/vnd.radisys.msml-dialog-base+xml', + + 'application/vnd.radisys.msml-dialog-group+xml', + + 'application/vnd.radisys.msml-dialog-speech+xml', + + 'application/vnd.radisys.msml-dialog-transform+xml', + + 'application/vnd.radisys.msml-dialog-fax-detect+xml', + + + +Saleem, et al. Informational [Page 176] + +RFC 5707 Media Server Markup Language February 2010 + + + 'application/vnd.radisys.msml-dialog-fax-sendrecv+xml', + + 'application/vnd.radisys.msml-audit+xml', + + 'application/vnd.radisys.msml-audit-conf+xml', + + 'application/vnd.radisys.msml-audit-conn+xml', + + 'application/vnd.radisys.msml-audit-dialog+xml', + + 'application/vnd.radisys.msml-audit-stream+xml' + + Required parameters: none + + Optional parameters: charset + + charset semantics as specified in RFC 3023 [i2] for + "application/xml" media type. + + Encoding considerations: + + As specified in RFC 3023 [i2]. + + Security Considerations: + + Media types included in this section are XML based, and therefore + security considerations as defined by RFC 3023 [i10] are + applicable. + + These media types do not contain active or executable content as + the content itself merely provides control of the underlying media + streams. + + Secure exchange of content associated with these media types for + purposes of authentication and privacy, whenever applicable, shall + require the establishment of a secure control channel using sips: + and TLS. + + Privacy and integrity of media content associated with these media + types shall be considered when applications using these media + types are exchanging personal information such as personal + identification codes or conference access codes. Whenever such + content is deemed to require secure transport and authentication, + a secure channel using sips: and TLS MUST be used, as these media + types themselves provide no such inherent mechanisms for security. + + + + + + +Saleem, et al. Informational [Page 177] + +RFC 5707 Media Server Markup Language February 2010 + + + Interoperability considerations: + + As specified in RFC 3023 [i2] and as specified within this + document. + + Published specification: RFC 5707 + + Intended applications for these media types: + + Multimedia Conferencing, Interactive Voice Response systems + + Additional information: + + Magic number(s): None + + File extension(s): None + + Macintosh file type code(s): None + + Person & email address to contact for further information: + + Adnan Saleem <adnan.saleem@radisys.com> + + Intended usage: COMMON + +18.2. IANA Registrations for 'text' MIME Media Type + + The following registrations are planned: + + 'text/vnd.radisys.msml-basic-layout' + + Required parameters: none + + Optional parameters: charset + + charset semantics as specified in RFC 3023 [i2] for "text/xml" + media type. + + Encoding considerations: As specified in RFC 3023 [i2]. + + Security Considerations: + + Media types included in this section are XML based, and therefore + security considerations as defined by RFC 3023 [i10] are + applicable. + + + + + + +Saleem, et al. Informational [Page 178] + +RFC 5707 Media Server Markup Language February 2010 + + + The media type defined in this section does not contain active or + executable content. The media type defines only a visual layout + scheme of a video conference. Establishment of active connections + associated with the video conference are outside the scope of this + media type. + + Since this media type only defines a visual layout scheme, with no + reference or information about client connections or participants + within the conference, privacy and integrity concerns are not + applicable to this media type. + + Interoperability considerations: + + As specified in RFC 3023 [i2] and as specified within this + document. + + Published specification: RFC 5707 + + Intended applications for these media types: + + Multimedia Conferencing, Interactive Voice Response systems + + Additional information: + + Magic number(s): None + + File extension(s): None + + Macintosh file type code(s): None + + Person & email address to contact for further information: + + Adnan Saleem <adnan.saleem@radisys.com> + + Intended usage: COMMON + +18.3. URN Sub-Namespace Registration + + The namespace URI for elements defined within this specification is a + URN [i8]. It uses the namespace identifier 'ietf' defined by [i9] + and extended by RFC 3688 [i10]. + + The following registrations of URN Sub-Namespaces are planned: + + XML namespace: urn:ietf:params:xml:ns:msml + + + + + + +Saleem, et al. Informational [Page 179] + +RFC 5707 Media Server Markup Language February 2010 + + + XML: + + BEGIN + + <?xml version="1.0"?> + + <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.0//EN" + + "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd"> + + <html xmlns="http://www.w3.org/1999/xhtml"> + + <head> + + <meta http-equiv="content-type" + + content="text/html;charset=iso-8859-1"/> + + <title>Media Server Markup Language Namespace</title> + + </head> + + <body> + + <h1>Namespace for Media Server Markup Language</h1> + + <h2>urn:ietf:params:xml:ns:msml</h2> + + <p>See MSML <a + href="http://www.rfc-editor.org/rfc/rfc5707.txt">RFC 5707</a></p> + + </body> + + </html> + + END + +18.4. XML Schema Registration + + This section registers an XML schema per the procedures in [i10]. + + URI: urn:ietf:params:xml:schema:msml + + Registrant Contact: + + Adnan Saleem (adnan.saleem@radisys.com) and authors listed + within this document. + + + + +Saleem, et al. Informational [Page 180] + +RFC 5707 Media Server Markup Language February 2010 + + + The XML for this schema can be found as the sole content of Section + 16. + +19. References + +19.1. Normative References + + [n1] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., + Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: + Session Initiation Protocol", RFC 3261, June 2002. + + [n2] Bray, T., Paoli, J., Sperberg-McQueen, C., and E. Maler, + "Extensible Markup Language (XML) 1.0 (Second Edition)," W3C + First Edition REC-xml-20001006, October 2000. + + [n3] World Wide Web Consortium, "Speech Recognition Grammar + Specification Version 1.0" (SRGS), W3C Candidate + Recommendation, March 16, 2004 + + [n4] World Wide Web Consortium, "Natural Language Semantics Markup + Language (NLSML) for the Speech Interface Framework", W3C + Working Draft 20, November 2000. + + [n5] World Wide Web Consortium, "Voice Extensible Markup Language + (VoiceXML) Version 2.0, W3C Candidate Recommendation, March 16, + 2004. + + [n6] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform + Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, + January 2005. + + [n7] Burger, E., Ed., Van Dyke, J., and A. Spitzer, "Basic Network + Media Services with SIP", RFC 4240, December 2005. + + [n8] Levinson, E., "Content-ID and Message-ID Uniform Resource + Locators", RFC 2392, August 1998. + + [n9] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session + Description Protocol", RFC 4566, July 2006. + + [n10] Bos, B., Lie, H., Tantek, C., and Hickson, I., "Cascading Style + Sheets, level 2 (CSS2) Specification," W3C REC CR-CSS21-, July + 2007. + + [n11] Burnett, D., Walker, M., and Hunt, A., "Speech Synthesis Markup + Language (SSML) Version 1.0", W3C Recommendation, 7 September + 2004. + + + + +Saleem, et al. Informational [Page 181] + +RFC 5707 Media Server Markup Language February 2010 + + +19.2. Informative References + + [i1] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Indicating + User Agent Capabilities in the Session Initiation Protocol + (SIP)", RFC 3840, August 2004. + + [i2] Murata, M., St. Laurent, S., and D. Kohn, "XML Media Types", + RFC 3023, January 2001. + + [i3] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, + "RTP: A Transport Protocol for Real-Time Applications", STD 64, + RFC 3550, July 2003. + + [i4] Rosenberg, J., Peterson, J., Schulzrinne, H., and G. Camarillo, + "Best Current Practices for Third Party Call Control (3pcc) in + the Session Initiation Protocol (SIP)", BCP 85, RFC 3725, April + 2004. + + [i5] Donovan, S., "The SIP INFO Method", RFC 2976, October 2000. + + [i6] Ossenbruggen, J., Rutledge, L., Saccocio, B., Schmitz, P., + Kate, W., Ayars, J., Bulterman, D., Cohen, A., Day, K., Hodge, + E., Hoschka, P., Hyche, E., Jourdan, M., Kubota, K., Lanphier, + R., Laya'da, N., Michel, T., and D. Newman, "Synchronized + Multimedia Integration Language (SMIL 2.0) Specification," W3C + REC REC-smil2-20050107, January 2005. + + [i7] Freed, N. and N. Borenstein, "Multipurpose Internet Mail + Extensions (MIME) Part Two: Media Types", RFC 2046, November + 1996. + + [i8] Moats, R., "URN Syntax", RFC 2141, May 1997. + + [i9] Moats, R., "A URN Namespace for IETF Documents", RFC 2648, + August 1999. + + [i10] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688, + January 2004. + + [i11] Boulton, C., Melanchuk, T., McGlashan, S., and A. Shiratzky, "A + Control Framework for the Session Initiation Protocol (SIP)", + Work in Progress, February 2007. + + + + + + + + + +Saleem, et al. Informational [Page 182] + +RFC 5707 Media Server Markup Language February 2010 + + +Acknowledgments + + Sergiu Stambolian of RadiSys provided key insights, both theoretic + and through development experience, on several versions of the + document. + + Stephen Buko and George Raskulinec of Intel made numerous valuable + contributions towards enhancements of multimedia playback and record + operations. Gene Shtirmer of Intel provided review feedback on + several revisions and feature enhancement suggestions. + + David Asher of NMS Communications provided valuable insights towards + creation of standard profiles and a modularization scheme based on + packages for better interoperability. + + Gilles Compienne of Ubiquity Software has provided feedback on + several earlier versions of this document. + + Chris Boulton and Ben Smith, both of Ubiquity, and Michael Rice of + VocalData helped clarify several issues, while Bruce Walsh and Kevin + Fitzgerald, both of Spectel/Avaya, provided important feedback. + Cliff Schornak of Commetrex significantly contributed to the + facsimile work. Peter Danielsen of Lucent has contributed thoughtful + and detailed reviews for several earlier versions of the document. + + + + + + + + + + + + + + + + + + + + + + + + + + + +Saleem, et al. Informational [Page 183] + +RFC 5707 Media Server Markup Language February 2010 + + +Authors' Addresses + + Adnan Saleem + RadiSys + 4190 Still Creek Drive, Suite 300 + Burnaby, BC, V5C 6C6 + Canada + + Phone: +1 604 918 6376 + EMail : adnan.saleem@radisys.com + + + Yong Xin + RadiSys + 4190 Still Creek Drive, Suite 300 + Burnaby, BC, V5C 6C6 + Canada + + Phone: +1 604 918 6383 + EMail: yong.xin@radiSys.com + + + Garland Sharratt + Consultant + Vancouver, BC + Canada + + EMail: garland.sharratt@gmail.com + + + + + + + + + + + + + + + + + + + + + + + +Saleem, et al. Informational [Page 184] + |