diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc5552.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc5552.txt')
-rw-r--r-- | doc/rfc/rfc5552.txt | 2019 |
1 files changed, 2019 insertions, 0 deletions
diff --git a/doc/rfc/rfc5552.txt b/doc/rfc/rfc5552.txt new file mode 100644 index 0000000..47e01d1 --- /dev/null +++ b/doc/rfc/rfc5552.txt @@ -0,0 +1,2019 @@ + + + + + + +Network Working Group D. Burke +Request for Comments: 5552 Google +Category: Standards Track M. Scott + Genesys + May 2009 + + + SIP Interface to VoiceXML Media Services + +Status of This Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (c) 2009 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents in effect on the date of + publication of this document (http://trustee.ietf.org/license-info). + Please review these documents carefully, as they describe your rights + and restrictions with respect to this document. + +Abstract + + This document describes a SIP interface to VoiceXML media services. + Commonly, Application Servers controlling Media Servers use this + protocol for pure VoiceXML processing capabilities. This protocol is + an adjunct to the full MEDIACTRL protocol and packages mechanism. + + + + + + + + + + + + + + + + + +Burke & Scott Standards Track [Page 1] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + +Table of Contents + + 1. Introduction ....................................................3 + 1.1. Use Cases ..................................................3 + 1.1.1. IVR Services with Application Servers ...............3 + 1.1.2. PSTN IVR Service Node ...............................4 + 1.1.3. 3GPP IMS Media Resource Function (MRF) ..............5 + 1.1.4. CCXML <-> VoiceXML Interaction ......................6 + 1.1.5. Other Use Cases .....................................6 + 1.2. Terminology ................................................7 + 2. VoiceXML Session Establishment and Termination ..................7 + 2.1. Service Identification .....................................7 + 2.2. Initiating a VoiceXML Session .............................10 + 2.3. Preparing a VoiceXML Session ..............................11 + 2.4. Session Variable Mappings .................................12 + 2.5. Terminating a VoiceXML Session ............................15 + 2.6. Examples ..................................................16 + 2.6.1. Basic Session Establishment ........................16 + 2.6.2. VoiceXML Session Preparation .......................17 + 2.6.3. MRCP Establishment .................................18 + 3. Media Support ..................................................19 + 3.1. Offer/Answer ..............................................19 + 3.2. Early Media ...............................................19 + 3.3. Modifying the Media Session ...............................21 + 3.4. Audio and Video Codecs ....................................21 + 3.5. DTMF ......................................................22 + 4. Returning Data to the Application Server .......................22 + 4.1. HTTP Mechanism ............................................22 + 4.2. SIP Mechanism .............................................23 + 5. Outbound Calling ...............................................25 + 6. Call Transfer ..................................................25 + 6.1. Blind .....................................................26 + 6.2. Bridge ....................................................27 + 6.3. Consultation ..............................................29 + 7. Contributors ...................................................31 + 8. Acknowledgements ...............................................31 + 9. Security Considerations ........................................31 + 10. IANA Considerations ...........................................32 + 11. References ....................................................32 + 11.1. Normative References .....................................32 + 11.2. Informative References ...................................35 + Appendix A. Notes on Normative References ........................36 + + + + + + + + + +Burke & Scott Standards Track [Page 2] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + +1. Introduction + + VoiceXML [VXML20], [VXML21] is a World Wide Web Consortium (W3C) + standard for creating audio and video dialogs that feature + synthesized speech, digitized audio, recognition of spoken and dual + tone multi-frequency (DTMF) key input, recording of audio and video, + telephony, and mixed-initiative conversations. VoiceXML allows Web- + based development and content delivery paradigms to be used with + interactive video and voice response applications. + + This document describes a SIP [RFC3261] interface to VoiceXML media + services. Commonly, Application Servers controlling media servers + use this protocol for pure VoiceXML processing capabilities. SIP is + responsible for initiating a media session to the VoiceXML media + server and simultaneously triggering the execution of a specified + VoiceXML application. This protocol is an adjunct to the full + MEDIACTRL protocol and packages mechanism. + + The interface described here leverages a mechanism for identifying + dialog media services first described in [RFC4240]. The interface + has been updated and extended to support the W3C Recommendation for + VoiceXML 2.0 [VXML20] and VoiceXML 2.1 [VXML21]. A set of commonly + implemented functions and extensions have been specified including + VoiceXML dialog preparation, outbound calling, video media support, + and transfers. VoiceXML session variable mappings have been defined + for SIP with an extensible mechanism for passing application-specific + values into the VoiceXML application. Mechanisms for returning data + to the Application Server have also been added. + +1.1. Use Cases + + The VoiceXML media service user in this document is generically + referred to as an Application Server. In practice, it is intended + that the interface defined by this document be applicable across a + wide range of use cases. Several intended use cases are described + below. + +1.1.1. IVR Services with Application Servers + + SIP Application Servers provide services to users of the network. + Typically, there may be several Application Servers in the same + network, each specialized in providing a particular service. + Throughout this specification and without loss of generality, we + posit the presence of an Application Server specialized in providing + Interactive Voice Response (IVR) services. A typical configuration + for this use case is illustrated below. + + + + + +Burke & Scott Standards Track [Page 3] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + +--------------+ + | | + | Application |\ + | Server | \ + | | \ HTTP + SIP +--------------+ \ + / \ \ + +-------------+ / SIP \ +--------------+ + | |/ \| | + | SIP | | VoiceXML | + | User Agent | RTP/SRTP | Media Server | + | |=====================| | + +-------------+ +--------------+ + + Assuming the Application Server also supports HTTP, the VoiceXML + application may be hosted on it and served up via HTTP [RFC2616]. + Note, however, that the Web model allows the VoiceXML application to + be hosted on a separate (HTTP) Application Server from the (SIP) + Application Server that interacts with the VoiceXML Media Server via + this specification. It is also possible for a static VoiceXML + application to be stored locally on the VoiceXML Media Server, + leveraging the VoiceXML 2.1 [VXML21] <data> mechanism to interact + with a Web/Application Server when dynamic behavior is required. The + viability of static VoiceXML applications is further enhanced by the + mechanisms defined in Section 2.4, through which the Application + Server can make session-specific information available within the + VoiceXML session context. + + The approach described in this document is sometimes termed the + "delegation model" -- the Application Server is essentially + delegating programmatic control of the human-machine interactions to + one or more VoiceXML documents running on the VoiceXML Media Server. + During the human-machine interactions, the Application Server remains + in the signaling path and can respond to results returned from the + VoiceXML Media Server or other external network events. + +1.1.2. PSTN IVR Service Node + + While this document is intended to enable enhanced use of VoiceXML as + a component of larger systems and services, it is intended that + devices that are completely unaware of this specification remain + capable of invoking VoiceXML services offered by a VoiceXML Media + Server compliant with this document. A typical configuration for + this use case is as follows: + + + + + + + +Burke & Scott Standards Track [Page 4] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + +-------------+ SIP +--------------+ + | |---------------------| | + | IP/PSTN | | VoiceXML | + | Gateway | RTP/SRTP | Media Server | + | |=====================| | + +-------------+ +--------------+ + + Note also that beyond the invocation and termination of a VoiceXML + dialog, the semantics defined for call transfers using REFER are + intended to be compatible with standard, existing IP/PSTN (Public + Switched Telephone Network) gateways. + +1.1.3. 3GPP IMS Media Resource Function (MRF) + + The 3rd Generation Partnership Project (3GPP) IP Multimedia Subsystem + (IMS) [TS23002] defines a Media Resource Function (MRF) used to offer + media processing services such as conferencing, transcoding, and + prompt/collect. The capabilities offered by VoiceXML are ideal for + offering richer media processing services in the context of the MRF. + In this architecture, the interface defined here corresponds to the + "Mr" interface to the MRFC (MRF Controller); the implementation of + this interface might use separated MRFC and MRFP (MRF Processor) + elements (as per the IMS architecture), or might be an integrated MRF + (as is common practice). + + +----------+ + | App | + | Server | + +----------+ + | + | SIP (ISC) + | + +----------+ SIP (Mr) +--------------+ + | S-CSCF |---------------| VoiceXML | + | | | MRF | + +----------+ +--------------+ + || + || RTP/SRTP (Mb) + || + + The above diagram is highly simplified and shows a subset of nodes + typically involved in MRF interactions. It should be noted that + while the MRF will primarily be used by the Application Server via + the Serving Call Session Control Function (S-CSCF), it is also + possible for calls to be routed directly to the MRF without the + involvement of an Application Server. + + + + + +Burke & Scott Standards Track [Page 5] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + Although the above is described in terms of the 3GPP IMS + architecture, it is intended that it is also applicable to 3GPP2, + Next Generation Network (NGN), and PacketCable architectures that are + converging with 3GPP IMS standards. + +1.1.4. CCXML <-> VoiceXML Interaction + + Call Control eXtensible Markup Language (CCXML) 1.0 [CCXML10] + applications provide services mainly through controlling the + interaction between Connections, Conferences, and Dialogs. Although + CCXML is capable of supporting arbitrary dialog environments, + VoiceXML is commonly used as a dialog environment in conjunction with + CCXML applications; CCXML is specifically designed to effectively + support the use of VoiceXML. CCXML 1.0 defines language elements + that allow for Dialogs to be prepared, started, and terminated; it + further allows for data to be returned by the dialog environment, for + call transfers to be requested (by the dialog) and responded to by + the CCXML application, and for arbitrary eventing between the CCXML + application and running dialog application. + + The interface described in this document can be used by CCXML 1.0 + implementations to control VoiceXML Media Servers. Note, however, + that some CCXML language features require eventing facilities between + CCXML and VoiceXML sessions that go beyond what is defined in this + specification. For example, VoiceXML-controlled call transfers and + mid-dialog, application-defined events cannot be fully realized using + this specification alone. A SIP event package [RFC3265] MAY be used + in addition to this specification to provide extended eventing. + +1.1.5. Other Use Cases + + In addition to the use cases described in some detail above, there + are a number of other intended use cases that are not described in + detail, such as: + + 1. Use of a VoiceXML Media Server as an adjunct to an IP-based + Private Branch Exchange / Automatic Call Distributor (PBX/ACD), + possibly to provide voicemail/messaging, automated attendant, or + other capabilities. + + 2. Invocation and control of a VoiceXML session that provides the + voice modality component in a multimodal system. + + + + + + + + + +Burke & Scott Standards Track [Page 6] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + +1.2. Terminology + + Application Server: A SIP Application Server hosts and executes + services, in particular by terminating SIP sessions on a media + server. The Application Server MAY also act as an HTTP server + [RFC2616] in interactions with media servers. + + VoiceXML Media Server: A VoiceXML interpreter including a SIP-based + interpreter context and the requisite media processing + capabilities to support VoiceXML functionality. + + VoiceXML Session: A VoiceXML Session is a multimedia session + comprising of at least a SIP User Agent, a VoiceXML Media Server, + the data streams between them, and an executing VoiceXML + application. + + VoiceXML Dialog: Equivalent to VoiceXML Session. + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in [RFC2119]. + +2. VoiceXML Session Establishment and Termination + + This section describes how to establish a VoiceXML Session, with or + without preparation, and how to terminate a session. This section + also addresses how session information is made available to VoiceXML + applications. + +2.1. Service Identification + + The SIP Request-URI is used to identify the VoiceXML media service. + The user part of the SIP Request-URI is fixed to "dialog". This is + done to ensure compatibility with [RFC4240], since this document + extends the dialog interface defined in that specification and + because this convention from [RFC4240] is widely adopted by existing + media servers. + + Standardizing the SIP Request-URI including the user part also + improves interoperability between Application Servers and media + servers, and reduces the provisioning overhead that would be required + if use of a media server by an Application Server required an + individually provisioned URI. In this respect, this document (and + [RFC4240]) do not add semantics to the user part, but rather + standardize the way that targets on media servers are provisioned. + Further, since Application Servers -- and not human beings -- are + generally the clients of media servers, issues such as interpretation + and internationalization do not apply. + + + +Burke & Scott Standards Track [Page 7] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + Exposing a VoiceXML media service with a well-known address may + enhance the possibility of exploitation: the VoiceXML Media Server is + RECOMMENDED to use standard SIP mechanisms to authenticate endpoints + as discussed in Section 9. + + The initial VoiceXML document is specified with the "voicexml" + parameter. In addition, parameters are defined that control how the + VoiceXML Media Server fetches the specified VoiceXML document. The + list of parameters defined by this specification is as follows (note + the parameter names are case-insensitive): + + voicexml: URI of the initial VoiceXML document to fetch. This will + typically contain an HTTP URI, but may use other URI schemes, for + example, to refer to local, static VoiceXML documents. If the + "voicexml" parameter is omitted, the VoiceXML Media Server may + select the initial VoiceXML document by other means, such as by + applying a default, or may reject the request. + + maxage: Used to set the max-age value of the Cache-Control header in + conjunction with VoiceXML documents fetched using HTTP, as per + [RFC2616]. If omitted, the VoiceXML Media Server will use a + default value. + + maxstale: Used to set the max-stale value of the Cache-Control + header in conjunction with VoiceXML documents fetched using HTTP, + as per [RFC2616]. If omitted, the VoiceXML Media Server will use + a default value. + + method: Used to set the HTTP method applied in the fetch of the + initial VoiceXML document. Allowed values are "get" or "post" + (case-insensitive). Default is "get". + + postbody: Used to set the application/x-www-form-urlencoded encoded + [HTML4] HTTP body for "post" requests (or is otherwise ignored). + + ccxml: Used to specify a "JSON value" [RFC4627] that is mapped to + the session.connection.ccxml VoiceXML session variable -- see + Section 2.4. + + aai: Used to specify a "JSON value" [RFC4627] that is mapped to the + session.connection.aai VoiceXML session variable -- see + Section 2.4. + + Other application-specific parameters may be added to the Request-URI + and are exposed in VoiceXML session variables (see Section 2.4). + + + + + + +Burke & Scott Standards Track [Page 8] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + Formally, the Request-URI for the VoiceXML media service has a fixed + user part "dialog". Seven URI parameters are defined (see the + definition of uri-parameter in Section 25.1 of [RFC3261]). + + dialog-param = "voicexml=" vxml-url ; vxml-url follows the URI + ; syntax defined in [RFC3986] + maxage-param = "maxage=" 1*DIGIT + maxstale-param = "maxstale=" 1*DIGIT + method-param = "method=" ("get" / "post") + postbody-param = "postbody=" token + ccxml-param = "ccxml=" json-value + aai-param = "aai=" json-value + json-value = false / + null / + true / + object / + array / + number / + string ; defined in [RFC4627] + + Parameters of the Request-URI in subsequent re-INVITEs are ignored. + One consequence of this is that the VoiceXML Media Server cannot be + instructed by the Application Server to change the executing VoiceXML + Application after a VoiceXML Session has been started. + + Special characters contained in the dialog-param, postbody-param, + ccxml-param, and aai-param values must be URL-encoded ("escaped") as + required by the SIP URI syntax, for example, '?' (%3f), '=' (%3d), + and ';' (%3b). The VoiceXML Media Server MUST therefore unescape + these parameter values before making use of them or exposing them to + running VoiceXML applications. It is important that the VoiceXML + Media Server only unescape the parameter values once since the + desired VoiceXML URI value could itself be URL encoded, for example. + + Since some applications may choose to transfer confidential + information, the VoiceXML Media Server MUST support the sips: scheme + as discussed in Section 9. + + Informative note: With respect to the postbody-param value, since the + application/x-www-form-urlencoded content itself escapes non- + alphanumeric characters by inserting %HH replacements, the escaping + rules above will result in the '%' characters being further escaped + in addition to the '&' and '=' name/value separators. + + + + + + + + +Burke & Scott Standards Track [Page 9] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + As an example, the following SIP Request-URI identifies the use of + VoiceXML media services, with + 'http://appserver.example.com/promptcollect.vxml' as the initial + VoiceXML document, to be fetched with max-age/max-stale values of + 3600s/0s, respectively: + + sip:dialog@mediaserver.example.com; \ + voicexml=http://appserver.example.com/promptcollect.vxml; \ + maxage=3600;maxstale=0 + +2.2. Initiating a VoiceXML Session + + A VoiceXML Session is initiated via the Application Server using a + SIP INVITE. Typically, the Application Server will be specialized in + providing VoiceXML services. At a minimum, the Application Server + may behave as a simple proxy by rewriting the Request-URI received + from the User Agent to a Request-URI suitable for consumption by the + VoiceXML Media Server (as specified in Section 2.1). For example, a + User Agent might present a dialed number: + + tel:+1-201-555-0123 + + that the Application Server maps to a directory assistance + application on the VoiceXML Media Server with a Request-URI of: + + sip:dialog@ms1.example.com; \ + voicexml=http://as1.example.com/da.vxml + + Certain header values in the INVITE message to the VoiceXML Media + Server are mapped into VoiceXML session variables and are specified + in Section 2.4. + + On receipt of the INVITE, the VoiceXML Media Server issues a + provisional response, 100 Trying, and commences the fetch of the + initial VoiceXML document. The 200 OK response indicates that the + VoiceXML document has been fetched and parsed correctly and is ready + for execution. Application execution commences on receipt of the ACK + (except if the dialog is being prepared as specified in Section 2.3). + Note that the 100 Trying response will usually be sent on receipt of + the INVITE in accordance with [RFC3261], since the VoiceXML Media + Server cannot in general guarantee that the initial fetch will + complete in less than 200 ms. However, certain implementations may + be able to guarantee response times to the initial INVITE, and thus + may not need to send a 100 Trying response. + + As an optimization, prior to sending the 200 OK response, the + VoiceXML Media Server MAY execute the application up to the point of + the first VoiceXML waiting state or prompt flush. + + + +Burke & Scott Standards Track [Page 10] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + A VoiceXML Media Server, like any SIP User Agent, may be unable to + accept the INVITE request for a variety of reasons. For instance, a + Session Description Protocol (SDP) offer contained in the INVITE + might require the use of codecs that are not supported by the Media + Server. In such cases, the Media Server should respond as defined by + [RFC3261]. However, there are error conditions specific to VoiceXML, + as follows: + + 1. If the Request-URI does not conform to this specification, a 400 + Bad Request MUST be returned (unless it is used to select other + services not defined by this specification). + + 2. If a URI parameter in the Request-URI is repeated, then the + request MUST be rejected with a 400 Bad Request response. + + 3. If the Request-URI does not include a "voicexml" parameter, and + the VoiceXML Media Server does not elect to use a default page, + the VoiceXML Media Server MUST return a final response of 400 Bad + Request, and it SHOULD include a Warning header with a 3-digit + code of 399 and a human-readable error message. + + 4. If the VoiceXML document cannot be fetched or parsed, the + VoiceXML Media Server MUST return a final response of 500 Server + Internal Error and SHOULD include a Warning header with a 3-digit + code of 399 and a human-readable error message. + + Informative note: Certain applications may pass a significant amount + of data to the VoiceXML dialog in the form of Request-URI parameters. + This may cause the total size of the INVITE request to exceed the MTU + of the underlying network. In such cases, applications/ + implementations must take care either to use a transport appropriate + to these larger messages (such as TCP) or to use alternative means of + passing the required information to the VoiceXML dialog (such as + supplying a unique session identifier in the initial VoiceXML URI and + later using that identifier as a key to retrieve data from the HTTP + server). + +2.3. Preparing a VoiceXML Session + + In certain scenarios, it is beneficial to prepare a VoiceXML Session + for execution prior to running it. A previously prepared VoiceXML + Session is expected to execute with minimal delay when instructed to + do so. + + + + + + + + +Burke & Scott Standards Track [Page 11] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + If a media-less SIP dialog is established with the initial INVITE to + the VoiceXML Media Server, the VoiceXML application will not execute + after receipt of the ACK. To run the VoiceXML application, the + Application Server (AS) must issue a re-INVITE to establish a media + session. + + A media-less SIP dialog can be established by sending an SDP + containing no media lines in the initial INVITE. Alternatively, if + no SDP is sent in the initial INVITE, the VoiceXML Media Server will + include an offer in the 200 OK message, which can be responded to + with an answer in the ACK with the media port(s) set to 0. + + Once a VoiceXML application is running, a re-INVITE that disables the + media streams (i.e., sets the ports to 0) will not otherwise affect + the executing application (except that recognition actions initiated + while the media streams are disabled will result in noinput + timeouts). + +2.4. Session Variable Mappings + + The standard VoiceXML session variables are assigned values according + to: + + session.connection.local.uri: Evaluates to the SIP URI specified in + the To: header of the initial INVITE. + + session.connection.remote.uri: Evaluates to the SIP URI specified in + the From: header of the initial INVITE. + + session.connection.redirect: This array is populated by information + contained in the History-Info [RFC4244] header in the initial + INVITE or is otherwise undefined. Each entry (hi-entry) in the + History-Info header is mapped, in reverse order, into an element + of the session.connection.redirect array. Properties of each + element of the array are determined as follows: + + * uri - Set to the hi-targeted-to-uri value of the History-Info + entry + + * pi - Set to 'true' if hi-targeted-to-uri contains a + "Privacy=history" parameter, or if the INVITE Privacy header + includes 'history'; 'false' otherwise + + * si - Set to the value of the "si" parameter if it exists, + undefined otherwise + + * reason - Set verbatim to the value of the "Reason" parameter of + hi-targeted-to-uri + + + +Burke & Scott Standards Track [Page 12] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + session.connection.protocol.name: Evaluates to "sip". Note that + this is intended to reflect the use of SIP in general, and does + not distinguish between whether the media server was accessed via + SIP or SIPS procedures. + + session.connection.protocol.version: Evaluates to "2.0". + + session.connection.protocol.sip.headers: This is an associative + array where each key in the array is the non-compact name of a SIP + header in the initial INVITE converted to lowercase (note the case + conversion does not apply to the header value). If multiple + header fields of the same field name are present, the values are + combined into a single comma-separated value. Implementations + MUST at a minimum include the Call-ID header and MAY include other + headers. For example, + session.connection.protocol.sip.headers["call-id"] evaluates to + the Call-ID of the SIP dialog. + + session.connection.protocol.sip.requesturi: This is an associative + array where the array keys and values are formed from the URI + parameters on the SIP Request-URI of the initial INVITE. The + array key is the URI parameter name converted to lowercase (note + the case conversion does not apply to the parameter value). The + corresponding array value is obtained by evaluating the URI + parameter value as a "JSON value" [RFC4627] in the case of the + ccxml-param and aai-param values and otherwise as a string. In + addition, the array's toString() function returns the full SIP + Request-URI. For example, assuming a Request-URI of sip:dialog@ + example.com;voicexml=http://example.com;aai=%7b"x":1%2c"y":true%7d + then session.connection.protocol.sip.requesturi["voicexml"] + evaluates to "http://example.com", + session.connection.protocol.sip.requesturi["aai"].x evaluates to 1 + (type Number), session.connection.protocol.sip.requesturi["aai"].y + evaluates to true (type Boolean), and + session.connection.protocol.sip.requesturi evaluates to the + complete Request-URI (type String) 'sip:dialog@ + example.com;voicexml=http://example.com;aai={"x":1,"y":true}'. + + session.connection.aai: Evaluates to + session.connection.protocol.sip.requesturi["aai"]. + + session.connection.ccxml: Evaluates to + session.connection.protocol.sip.requesturi["ccxml"]. + + + + + + + + +Burke & Scott Standards Track [Page 13] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + session.connection.protocol.sip.media: This is an array where each + array element is an object with the following properties: + + * type: - This required property indicates the type of the media + associated with the stream. The value is a string. It is + strongly recommended that the following values are used for + common types of media: "audio" for audio media, and "video" for + video media. + + * direction: - This required property indicates the + directionality of the media relative to + session.connection.originator. Defined values are sendrecv, + sendonly, recvonly, and inactive. + + * format: - This property is optional. If defined, the value of + the property is an array. Each array element is an object that + specifies information about one format of the media (there is + an array element for each payload type on the m-line). The + object contains at least one property called "name" whose value + is the MIME subtype of the media format (MIME subtypes are + registered in [RFC4855]). Other properties may be defined with + string values; these correspond to required and, if defined, + optional parameters of the format. + + As a consequence of this definition, there is an array entry in + session.connection.protocol.sip.media for each non-disabled m-line + for the negotiated media session. Note that this session variable + is updated if the media session characteristics for the VoiceXML + Session change (i.e., due to a re-INVITE). For an example, + consider a connection with bidirectional G.711 mu-law "audio" + sampled at 8 kHz. In this case, + session.connection.protocol.sip.media[0].type evaluates to + "audio", session.connection.protocol.sip.media[0].direction to + "sendrecv", + session.connection.protocol.sip.media[0].format[0].name evaluates + to "audio/PCMU", and + session.connection.protocol.sip.media[0].format[0].rate evaluates + to "8000". + + Note that when accessing SIP headers and Request-URI parameters via + the session.connection.protocol.sip.headers and + session.connection.protocol.sip.requesturi associative arrays defined + above, applications can choose between two semantically equivalent + ways of referring to the array. For example, either of the following + can be used to access a Request-URI parameter named "foo": + + session.connection.protocol.sip.requesturi["foo"] + session.connection.protocol.sip.requesturi.foo + + + +Burke & Scott Standards Track [Page 14] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + However, it is important to note that not all SIP header names or + Request-URI parameter names are valid ECMAScript identifiers, and as + such, can only be accessed using the first form (array notation). + For example, the Call-ID header can only be accessed as + session.connection.protocol.sip.headers["call-id"]; attempting to + access the same value as + session.connection.protocol.sip.headers.call-id would result in an + error. + +2.5. Terminating a VoiceXML Session + + The Application Server can terminate a VoiceXML Session by issuing a + BYE to the VoiceXML Media Server. Upon receipt of a BYE in the + context of an existing VoiceXML Session, the VoiceXML Media Server + MUST send a 200 OK response and MUST throw a + 'connection.disconnect.hangup' event to the VoiceXML application. If + the Reason header [RFC3326] is present on the BYE Request, then the + value of the Reason header is provided verbatim via the '_message' + variable within the catch element's anonymous variable scope. + + The VoiceXML Media Server may also initiate termination of the + session by issuing a BYE request. This will typically occur as a + result of encountering a <disconnect> or <exit> in the VoiceXML + application, due to the VoiceXML application running to completion, + or due to unhandled errors within the VoiceXML application. + + See Section 4 for mechanisms to return data to the Application + Server. + + + + + + + + + + + + + + + + + + + + + + + +Burke & Scott Standards Track [Page 15] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + +2.6. Examples + +2.6.1. Basic Session Establishment + + This example illustrates an Application Server setting up a VoiceXML + Session on behalf of a User Agent. + + SIP VoiceXML HTTP + User Application Media Application + Agent Server Server Server + | | | | + |(1) INVITE [offer] | | | + |------------------->|(2) INVITE [offer] | | + |(3) 100 Trying |------------------->| | + |<-------------------|(4) 100 Trying | | + | |<-------------------| | + | | | | + | | |(5) GET | + | | |------------------->| + | | |(6) 200 OK [VXML] | + | | |<-------------------| + | | | | + | |(7) 200 OK [answer] | | + |(8) 200 OK [answer] |<-------------------| | + |<-------------------| | | + |(9) ACK | | | + |------------------->|(10) ACK | | + | |------------------->| (execute | + |(11) RTP/SRTP | | VoiceXML | + |.........................................| application) | + | | | | + + + + + + + + + + + + + + + + + + + + +Burke & Scott Standards Track [Page 16] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + +2.6.2. VoiceXML Session Preparation + + This example demonstrates the preparation of a VoiceXML Session. In + this example, the VoiceXML session is prepared prior to placing an + outbound call to a User Agent, and is started as soon as the User + Agent answers. + + The [answer1:0] notation is used to indicate an SDP answer with the + media ports set to 0. + + SIP VoiceXML HTTP + User Application Media Application + Agent Server Server Server + | | | | + | |(1) INVITE | | + | |-------------------->| | + | |(2) 100 Trying | | + | |<--------------------| | + | | | | + | | |(3) GET | + | | |------------------->| + | | |(4) 200 OK [VXML] | + | | |<-------------------| + | | | | + | |(5) 200 OK [offer1] | | + | |<--------------------| | + | |(6) ACK [answer1:0] | | + |(7) INVITE |-------------------->| | + |<-------------------| | | + |(8) 200 OK [offer2] | | | + |------------------->|(9) INVITE [offer2'] | | + | |-------------------->| | + | |(10) 100 Trying | | + | |<--------------------| | + | |(11) 200 OK [answer2]| | + |(12) ACK [answer2] |<--------------------| | + |<-------------------|(13) ACK | | + | |-------------------->| (execute | + |(14) RTP/SRTP | VoiceXML | + |..........................................| application) | + | | | | + + Implementation detail: offer2' is derived from offer2 -- it + duplicates the m-lines and a-lines from offer2. However, offer2' + differs from offer2 since it must contain the same o-line as used in + answer1:0 but with the version number incremented. Also, if offer1 + has more m-lines than offer2, then offer2' must be padded with extra + (rejected) m-lines. + + + +Burke & Scott Standards Track [Page 17] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + +2.6.3. MRCP Establishment + + Media Resource Control Protocol (MRCP) [MRCPv2] is a protocol that + enables clients such as a VoiceXML Media Server to control media + service resources such as speech synthesizers, recognizers, + verifiers, and identifiers residing in servers on the network. + + The example below illustrates how a VoiceXML Media Server may + establish an MRCP session in response to an initial INVITE. + + VoiceXML HTTP + User Media MRCPv2 Application + Agent Server Server Server + | | | | + |(1) INVITE [offer1] | | | + |------------------->| | | + |(2) 100 Trying | | | + |<-------------------|(3) GET | | + | |---------------------------------------->| + | | | | + | |(4) 200 OK [VXML] | | + | |<----------------------------------------| + | | | | + | |(5) INVITE [offer2] | | + | |--------------------->| | + | | | | + | |(6) 200 OK [answer2] | | + | |<---------------------| | + | | | | + | |(7) ACK | | + | |--------------------->| | + | | | | + | |(8) MRCP connection | | + | |<-------------------->| | + |(9) 200 OK [answer1]| | | + |<-------------------| | | + | | | | + |(10) ACK | | | + |------------------->| | | + | | | | + |(11) RTP/SRTP | | | + |...........................................| | + | | | | + + + + + + + + +Burke & Scott Standards Track [Page 18] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + In this example, the VoiceXML Media Server is responsible for + establishing a session with the MRCPv2 Media Resource Server prior to + sending the 200 OK response to the initial INVITE. The VoiceXML + Media Server will perform the appropriate offer/answer with the + MRCPv2 Media Resource Server based on the SDP capabilities of the + Application Server and the MRCPv2 Media Resource Server. The + VoiceXML Media Server will change the offer received from step 1 to + establish an MRCPv2 session in step (5) and will re-write the SDP to + include an m-line for each MRCPv2 resource to be used and other + required SDP modifications as specified by MRCPv2. Once the VoiceXML + Media Server performs the offer/answer with the MRCPv2 Media Resource + Server, it will establish an MRCPv2 control channel in step (8). The + MRCPv2 resource is deallocated when the VoiceXML Media Server + receives or sends a BYE (not shown). + +3. Media Support + + This section describes the mandatory and optional media support + required by this interface. + +3.1. Offer/Answer + + The VoiceXML Media Server MUST support the standard offer/answer + mechanism of [RFC3264]. In particular, if an SDP offer is not + present in the INVITE, the VoiceXML Media Server will make an offer + in the 200 OK response listing its supported codecs. + +3.2. Early Media + + The VoiceXML Media Server MAY support early establishment of media + streams as described in [RFC3960]. This allows the Application + Server to establish media streams between a User Agent and the + VoiceXML Media Server in parallel with the initial VoiceXML document + being processed (which may involve dynamic VoiceXML page generation + and interaction with databases or other systems). This is useful + primarily for minimizing the delay in starting a VoiceXML Session, + particularly in cases where a session with the User Agent already + exists but the media stream associated with that session needs to be + redirected to a VoiceXML Media Server. + + The following flow demonstrates the use of early media (using the + Gateway model defined in [RFC3960]): + + + + + + + + + +Burke & Scott Standards Track [Page 19] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + SIP VoiceXML HTTP + User Application Media Application + Agent Server Server Server + | | | | + |..(existing session)..| | | + | |(1) INVITE | | + | |------------------>| | + | | |(2) HTTP GET | + | | |------------------>| + | |(3) 183 [offer] | | + |(4) re-INVITE [offer] |<------------------| | + |<---------------------| | | + |(5) 200 OK [answer] | | | + |--------------------->| | | + |(6) ACK | | | + |<---------------------| | | + | | (7) PRACK [answer]| | + | |------------------>| | + | | (8) PRACK 200 OK | | + | |<------------------| | + |(9) RTP/SRTP | | | + |..........................................| | + | | |(10) 200 OK [VXML] | + | | |<------------------| + | | | | + | |(11) 200 OK | | + | |<------------------| | + | |(12) ACK | | + | |------------------>| (execute | + | | | VoiceXML | + | | | application) | + | | | | + + Although [RFC3960] prefers the use of the Application Server model + for early media over the Gateway model, the primary issue with the + Gateway model -- forking -- is significantly less common when issuing + requests to VoiceXML Media Servers. This is because VoiceXML Media + Servers respond to all requests with 200 OK responses in the absence + of unusual errors, and they typically do so within several hundred + milliseconds. This makes them unlikely targets in forking scenarios, + since alternative targets of the forking process would virtually + never be able to respond more quickly than an automated system, + unless they are themselves automated systems -- in which case, there + is little point in setting up a response time race between two + automated systems. Issues with ringing tone generation in the + Gateway model are also mitigated, both by the typically quick 200 OK + + + + + +Burke & Scott Standards Track [Page 20] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + response time, and because this specification mandates that no media + packets are generated until the receipt of an ACK (thus eliminating + the need for the User Agent to perform media packet analysis). + + Note that the offer of early media by a VoiceXML Media Server does + not imply that the referenced VoiceXML application can always be + fetched and executed successfully. For instance, if the HTTP + Application Server were to return a 4xx response in step 10 above, or + if the provided VoiceXML content was not valid, the VoiceXML Media + Server would still return a 500 response (as per Section 2.2). At + this point, it would be the responsibility of the Application Server + to tear down any media streams established with the media server. + +3.3. Modifying the Media Session + + The VoiceXML Media Server MUST allow the media session to be modified + via a re-INVITE and SHOULD support the UPDATE method [RFC3311] for + the same purpose. In particular, it MUST be possible to change + streams between sendrecv, sendonly, and recvonly as specified in + [RFC3264]. + + Unidirectional streams are useful for announcement- or listening-only + (hotword). The preferred mechanism for putting the media session on + hold is specified in [RFC3264], i.e., the UA modifies the stream to + be sendonly and mutes its own stream. Modification of the media + session does not affect VoiceXML application execution (except that + recognition actions initiated while on hold will result in noinput + timeouts). + +3.4. Audio and Video Codecs + + For the purposes of achieving a basic level of interoperability, this + section specifies a minimal subset of codecs and RTP [RFC3550] + payload formats that MUST be supported by the VoiceXML Media Server. + + For audio-only applications, G.711 mu-law and A-law MUST be supported + using the RTP payload type 0 and 8 [RFC3551]. Other codecs and + payload formats MAY be supported. + + Video telephony applications, which employ a video stream in addition + to the audio stream, are possible in VoiceXML 2.0/2.1 through the use + of multimedia file container formats such as the .3gp [TS26244] and + .mp4 formats [IEC14496-14]. Video support is optional for this + specification. If video is supported then: + + 1. H.263 Baseline [RFC4629] MUST be supported. For legacy reasons, + the 1996 version of H.263 MAY be supported using the RTP payload + format defined in [RFC2190] (payload type 34 [RFC3551]). + + + +Burke & Scott Standards Track [Page 21] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + 2. Adaptive Multi-Rate (AMR) narrow band audio [RFC4867] SHOULD be + supported. + + 3. MPEG-4 video [RFC3016] SHOULD be supported. + + 4. MPEG-4 Advanced Audio Coding (AAC) audio [RFC3016] SHOULD be + supported. + + 5. Other codecs and payload formats MAY be supported. + + Video record operations carried out by the VoiceXML Media Server + typically require receipt of an intra-frame before the recording can + commence. The VoiceXML Media Server SHOULD use the mechanism + described in [RFC4585] to request that a new intra-frame be sent. + + Since some applications may choose to transfer confidential + information, the VoiceXML Media Server MUST support Secure RTP (SRTP) + [RFC3711] as discussed in Section 9. + +3.5. DTMF + + DTMF events [RFC4733] MUST be supported. When the User Agent does + not indicate support for [RFC4733], the VoiceXML Media Server MAY + perform DTMF detection using other means such as detecting DTMF tones + in the audio stream. Implementation note: the reason only [RFC4733] + telephone-events must be used when the User Agent indicates support + of it is to avoid the risk of double detection of DTMF if detection + on the audio stream was simultaneously applied. + +4. Returning Data to the Application Server + + This section discusses the mechanisms for returning data (e.g., + collected utterance or digit information) from the VoiceXML Media + Server to the Application Server. + +4.1. HTTP Mechanism + + At any time during the execution of the VoiceXML application, data + can be returned to the Application Server via HTTP using standard + VoiceXML elements such as <submit> or <subdialog>. Notably, the + <data> element in VoiceXML 2.1 [VXML21] allows data to be sent to the + Application Server efficiently without requiring a VoiceXML page + transition and is ideal for short VoiceXML applications such as + "prompt and collect". + + For most applications, it is necessary to correlate the information + being passed over HTTP with a particular VoiceXML Session. One way + this can be achieved is to include the SIP Call-ID (accessible in + + + +Burke & Scott Standards Track [Page 22] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + VoiceXML via the session.connection.protocol.sip.headers array) + within the HTTP POST fields. Alternatively, a unique "POST-back URI" + can be specified as an application-specific URI parameter in the + Request-URI of the initial INVITE (accessible in VoiceXML via the + session.connection.protocol.sip.requesturi array). + + Since some applications may choose to transfer confidential + information, the VoiceXML Media Server MUST support the https: scheme + as discussed in Section 9. + +4.2. SIP Mechanism + + Data can be returned to the Application Server via the expr or + namelist attribute on <exit> or the namelist attribute on + <disconnect>. A VoiceXML Media Server MUST support encoding of the + expr/namelist data in the message body of a BYE request sent from the + VoiceXML Media Server as a result of encountering the <exit> or + <disconnect> element. A VoiceXML Media Server MAY support inclusion + of the expr/namelist data in the message body of the 200 OK message + in response to a received BYE request (i.e., when the VoiceXML + application responds to the connection.disconnect.hangup event and + subsequently executes an <exit> element with the expr or namelist + attribute specified). + + Note that sending expr/namelist data in the 200 OK response requires + that the VoiceXML Media Server delay the final response to the + received BYE request until the VoiceXML application's post-disconnect + final processing state terminates. This mechanism is subject to the + constraint that the VoiceXML Media Server must respond before the + User Agent Client's (UAC's) timer F expires (defaults to 32 seconds). + Moreover, for unreliable transports, the UAC will retransmit the BYE + request according to the rules of [RFC3261]. The VoiceXML Media + Server SHOULD implement the recommendations of [RFC4320] regarding + when to send the 100 Trying provisional response to the BYE request. + + If a VoiceXML application executes a <disconnect> [VXML21] and then + subsequently executes an <exit> with namelist information, the + namelist information from the <exit> element is discarded. + + Namelist variables are first converted to their "JSON value" + equivalent [RFC4627] and encoded in the message body using the + application/x-www-form-urlencoded format content type [HTML4]. The + behavior resulting from specifying a recording variable in the + namelist or an ECMAScript object with circular references is not + defined. If the expr attribute is specified on the <exit> element + instead of the namelist attribute, the reserved name __exit is used. + + + + + +Burke & Scott Standards Track [Page 23] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + To allow the Application Server to differentiate between a BYE + resulting from a <disconnect> from one resulting from an <exit>, the + reserved name __reason is used, with a value of "disconnect" (without + brackets) to reflect the use of VoiceXML's <disconnect> element, and + a value of "exit" (without brackets) to an explicit <exit> in the + VoiceXML document. If the session terminates for other reasons (such + as the media server encountering an error), this parameter may be + omitted, or may take on platform-specific values prefixed with an + underscore. + + This specification extends the application/x-www-form-urlencoded by + replacing non-ASCII characters with one or more octets of the UTF-8 + representation of the character, with each octet in turn replaced by + %HH, where HH represents the uppercase hexadecimal notation for the + octet value and % is a literal character. As a consequence, the + Content-Type header field in a BYE message containing expr/namelist + data MUST be set to application/x-www-form-urlencoded;charset=utf-8. + + The following table provides some examples of <exit> usage and the + corresponding result content. + + +----------------------------------------------------------------+ + |<exit> Usage | Result Content | + |------------------------------|---------------------------------| + |<exit/> | __reason=exit | + |<exit expr="5"/> | __exit=5&__reason=exit | + |<exit expr="'done'"/> | __exit="done"&__reason=exit | + |<exit expr="userAuthorized"/> | __exit=true&__reason=exit | + |<exit namelist="pin errors"/> | pin=1234&errors=0&__reason=exit | + +----------------------------------------------------------------+ + assuming the following VoiceXML variables and values: + userAuthorized = true + pin = 1234 + errors = 0 + + For example, consider the VoiceXML snippet: + + ... + <exit namelist="id pin"/> + ... + + If id equals 1234 and pin equals 9999, say, the BYE message would + look similar to: + + + + + + + + +Burke & Scott Standards Track [Page 24] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + BYE sip:user@pc33.example.com SIP/2.0 + Via: SIP/2.0/UDP 192.0.2.4;branch=z9hG4bKnashds10 + Max-Forwards: 70 + From: sip:dialog@example.com;tag=a6c85cf + To: sip:user@example.com;tag=1928301774 + Call-ID: a84b4c76e66710 + CSeq: 231 BYE + Content-Type: application/x-www-form-urlencoded;charset=utf-8 + Content-Length: 30 + + id=1234&pin=9999&__reason=exit + + Since some applications may choose to transfer confidential + information, the VoiceXML Media Server MUST support the S/MIME + encoding of SIP message bodies as discussed in Section 9. + +5. Outbound Calling + + Outbound calls can be triggered via the Application Server using + third-party call control [RFC3725]. + + Flow IV from [RFC3725] is recommended in conjunction with the + VoiceXML Session preparation mechanism. This flow has several + advantages over others, namely: + + 1. Selection of a VoiceXML Media Server and preparation of the + VoiceXML application can occur before the call is placed to avoid + the callee experiencing delays. + + 2. Avoidance of timing difficulties that could occur with other + flows due to the time taken to fetch and parse the initial + VoiceXML document. + + 3. The flow is IPv6 compatible. + + An example flow for an Application-Server-initiated outbound call is + provided in Section 2.6.2. + +6. Call Transfer + + While VoiceXML is at its core a dialog language, it also provides + optional call transfer capability. VoiceXML's transfer capability is + particularly suited to the PSTN IVR Service Node use case described + in Section 1.1.2. It is NOT RECOMMENDED to use VoiceXML's call + transfer capability in networks involving Application Servers. + Rather, the Application Server itself can provide call routing + + + + + +Burke & Scott Standards Track [Page 25] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + functionality by taking signaling actions based on the data returned + to it from the VoiceXML Media Server via HTTP or in the SIP BYE + message. + + If VoiceXML transfer is supported, the mechanism described in this + section MUST be employed. The transfer flows specified here are + selected on the basis that they provide the best interworking across + a wide range of SIP devices. CCXML<->VoiceXML implementations, which + require tight-coupling in the form of bidirectional eventing to + support all transfer types defined in VoiceXML, may benefit from + other approaches, such as the use of SIP event packages [RFC3265]. + + In what follows, the provisional responses have been omitted for + clarity. + +6.1. Blind + + The blind-transfer sequence is initiated by the VoiceXML Media Server + via a REFER message [RFC3515] on the original SIP dialog. The + Refer-To header contains the URI for the called party, as specified + via the dest or destexpr attributes on the VoiceXML <transfer> tag. + + If the REFER request is accepted, in which case the VoiceXML Media + Server will receive a 2xx response, the VoiceXML Media Server throws + the connection.disconnect.transfer event and will terminate the + VoiceXML Session with a BYE message. For blind transfers, + implementations MAY use [RFC4488] to suppress the implicit + subscription associated with the REFER message. + + If the REFER request results in a non-2xx response, the <transfer>'s + form item variable (or event raised) depends on the SIP response and + is specified in the following table. Note that this indicates that + the transfer request was rejected. + + +-------------------------+-----------------------------------+ + | SIP Response | <transfer> variable / event | + +-------------------------+-----------------------------------+ + | 404 Not Found | error.connection.baddestination | + | 405 Method Not Allowed | error.unsupported.transfer.blind | + | 503 Service Unavailable | error.connection.noresource | + | (No response) | network_busy | + | (Other 3xx/4xx/5xx/6xx) | unknown | + +-------------------------+-----------------------------------+ + + + + + + + + +Burke & Scott Standards Track [Page 26] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + An example is illustrated below (provisional responses and NOTIFY + messages corresponding to provisional responses have been omitted for + clarity). + + User Agent 1 VoiceXML User Agent 2 + (Caller) Media Server (Callee) + | | | + |(0) RTP/SRTP | | + |.................| | + | | | + |(1) REFER | <transfer> | + |<----------------| | + |(2) 202 Accepted | | + |---------------->| | + |(3) BYE | | + |<----------------| | + |(4) 200 OK | | + |---------------->| | + | | Stop RTP (0) | + |(5) INVITE | + |---------------------------------->| + |(6) 200 OK | + |<----------------------------------| + |(7) NOTIFY | | + |---------------->| | + |(8) 200 OK | | + |<--------------- | | + |(9) ACK | + |---------------------------------->| + |(10) RTP/SRTP | + |...................................| + | | | + + If the aai or aaiexpr attribute is present on <transfer>, it is + appended to the Refer-To URI as a parameter named "aai" in the REFER + method. Reserved characters are URL-encoded as required for SIP/SIPS + URIs [RFC3261]. The mapping of values outside of the ASCII range is + platform specific. + +6.2. Bridge + + The bridge transfer function results in the creation of a small + multi-party session involving the Caller, the VoiceXML Media Server, + and the Callee. The VoiceXML Media Server invites the Callee to the + session and will eject the Callee if the transfer is terminated. + + + + + + +Burke & Scott Standards Track [Page 27] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + If the aai or aaiexpr attribute is present on <transfer>, it is + appended to the Request-URI in the INVITE as a URI parameter named + "aai". Reserved characters are URL-encoded as required for SIP/SIPS + URIs [RFC3261]. The mapping of values outside of the ASCII range is + platform specific. + + During the transfer attempt, audio specified in the transferaudio + attribute of <transfer> is streamed to User Agent 1. A VoiceXML + Media Server MAY play early media received from the Callee to the + Caller if the transferaudio attribute is omitted. + + The bridge transfer sequence is illustrated below. The VoiceXML + Media Server (acting as a UAC) makes a call to User Agent 2 with the + same codecs used by User Agent 1. When the call setup is complete, + RTP flows between User Agent 2 and the VoiceXML Media Server. This + stream is mixed with User Agent 1's. + + User Agent 1 VoiceXML User Agent 2 + (Caller) Media Server (Callee) + | | | + |(0)RTP/SRTP | | + |...................| | + | | | + | <transfer>|(1)INVITE [offer] | + | |------------------>| + | |(2) 200 OK [answer]| + | |<------------------| + | |(3) ACK | + | |------------------>| + | |(4) RTP/SRTP | + | mix |...................| + | (0)+(4)| | + + If a final response is not received from User Agent 2 from the INVITE + and the connecttimeout expires (specified as an attribute of + <transfer>), the VoiceXML Media Server will issue a CANCEL to + terminate the transaction and the <transfer>'s form item variable is + set to noanswer. + + If INVITE results in a non-2xx response, the <transfer>'s form item + variable (or event raised) depends on the SIP response and is + specified in the following table. + + + + + + + + + +Burke & Scott Standards Track [Page 28] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + +-------------------------+-----------------------------------+ + | SIP Response | <transfer> variable / event | + +-------------------------+-----------------------------------+ + | 404 Not Found | error.connection.baddestination | + | 405 Method Not Allowed | error.unsupported.transfer.bridge | + | 408 Request Timeout | noanswer | + | 486 Busy Here | busy | + | 503 Service Unavailable | error.connection.noresource | + | (No response) | network_busy | + | (Other 3xx/4xx/5xx/6xx) | unknown | + +-------------------------+-----------------------------------+ + + Once the transfer is established, the VoiceXML Media Server can + "listen" to the media stream from User Agent 1 to perform speech or + DTMF hotword, which when matched results in a near-end disconnect, + i.e., the VoiceXML Media Server issues a BYE to User Agent 2 and the + VoiceXML application continues with User Agent 1. A BYE will also be + issued to User Agent 2 if the call duration exceeds the maximum + duration specified in the maxtime attribute on <transfer>. + + If User Agent 2 issues a BYE during the transfer, the transfer + terminates and the VoiceXML <transfer>'s form item variable receives + the value far_end_disconnect. If User Agent 1 issues a BYE during + the transfer, the transfer terminates and the VoiceXML event + connection.disconnect.transfer is thrown. + +6.3. Consultation + + The consultation transfer (also called attended transfer [RFC5359]) + is similar to a blind transfer except that the outcome of the + transfer call setup is known and the Caller is not dropped as a + result of an unsuccessful transfer attempt. + + Consultation transfer commences with the same flow as for bridge + transfer except that the RTP streams are not mixed at step (4) and + error.unsupported.transfer.consultation supplants + error.unsupported.transfer.bridge. Assuming a new SIP dialog with + User Agent 2 is created, the remainder of the sequence follows as + illustrated below (provisional responses and NOTIFY messages + corresponding to provisional responses have been omitted for + clarity). Consultation transfer makes use of the Replaces: header + [RFC3891] such that User Agent 1 calls User Agent 2 and replaces the + latter's SIP dialog with the VoiceXML Media Server with a new SIP + dialog between the Caller and Callee. + + + + + + + +Burke & Scott Standards Track [Page 29] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + User Agent 1 VoiceXML User Agent 2 + (Caller) Media Server (Callee) + | | | + |(0) RTP/SRTP | | + |.................|(4) RTP/SRTP | + | |.................| + |(5) REFER | | + |<----------------| | + |(6) 202 Accepted | | + |---------------->| | + |(7) INVITE Replaces:ms1.example.com| + |---------------------------------->| + |(8) 200 OK | + |<----------------------------------| + |(9) ACK | + |---------------------------------->| + |(10) RTP/SRTP | + |...................................| + | |(11) BYE | + | |<----------------| + | |(12) 200 OK | + | |---------------->| Stop + |(13) NOTIFY | | RTP (4) + |---------------->| | + |(14) 200 OK | | + |<----------------| | + |(15) BYE | | + |<----------------| | + |(16) 200 OK | | + |---------------->| Stop | + | | RTP (0) | + + If a response other than 202 Accepted is received in response to the + REFER request sent to User Agent 1, the transfer terminates and an + error.unsupported.transfer.consultation event is raised. In + addition, a BYE is sent to User Agent 2 to terminate the established + outbound leg. + + The VoiceXML Media Server uses receipt of a NOTIFY message with a + sipfrag message of 200 OK to determine that the consultation transfer + has succeeded. When this occurs, the connection.disconnect.transfer + event will be thrown to the VoiceXML application, and a BYE is sent + to User Agent 1 to terminate the session. A NOTIFY message with a + non-2xx final response sipfrag message body will result in the + transfer terminating and the associated VoiceXML input item variable + being set to 'unknown'. Note that as a consequence of this + + + + + +Burke & Scott Standards Track [Page 30] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + mechanism, implementations MUST NOT use [RFC4488] to suppress the + implicit subscription associated with the REFER message for + consultation transfers. + +7. Contributors + + The bulk of the early work for this effort was carried out on weekly + teleconferences and over email. The authors would particularly like + to recognize the contributions of R. J. Auburn (Voxeo), Jeff Haynie + (Hakano), and Scott McGlashan (Hewlett-Packard). + +8. Acknowledgements + + This document owes its genesis to, "A SIP Interface to VoiceXML + Dialog Servers", authored by J. Rosenberg, P. Mataga, and D. Ladd. + The following people had input to the current document: + + R. J. Auburn (Voxeo) + + Hans Bjurstrom (Hewlett-Packard) + + Emily Candell (Comverse) + + Peter Danielsen (Lucent) + + Brian Frasca (Tellme) + + Jeff Haynie (Hakano) + + Scott McGlashan (Hewlett-Packard) + + Matt Oshry (Tellme) + + Rao Surapaneni (Tellme) + + The authors would like to acknowledge the support of Cullen Jennings + and the Mediactrl chairs, Eric Burger and Spencer Dawkins. + +9. Security Considerations + + Exposing a VoiceXML media service with a well-known address may + enhance the possibility of exploitation (for example, an invoked + network service may trigger a billing event). The VoiceXML Media + Server is RECOMMENDED to use standard SIP mechanisms [RFC3261] to + authenticate requesting endpoints and authorize per local policy. + + + + + + +Burke & Scott Standards Track [Page 31] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + Some applications may choose to transfer confidential information to + or from the VoiceXML Media Server. To provide data confidentiality, + the VoiceXML Media Server MUST implement the sips: and https: schemes + in addition to S/MIME message body encoding as described in + [RFC3261]. + + The VoiceXML Media Server MUST support Secure RTP (SRTP) [RFC3711] to + provide confidentiality, authentication, and replay protection for + RTP media streams (including RTCP control traffic). + + To mitigate the possibility of denial-of-service attacks, the + VoiceXML Media Server is RECOMMENDED (in addition to authenticating + and authorizing endpoints described above) to provide mechanisms for + implementing local policies such as the time-limiting of VoiceXML + application execution. + +10. IANA Considerations + + IANA has registered the following parameters in the SIP/SIPS URI + Parameters registry, following the Specification Required policy of + [RFC3969]: + + Parameter Name Predefined Values Reference + -------------- ----------------- --------- + maxage No RFC 5552 + maxstale No RFC 5552 + method "get" / "post" RFC 5552 + postbody No RFC 5552 + ccxml No RFC 5552 + aai No RFC 5552 + +11. References + +11.1. Normative References + + [HTML4] Raggett, D., Le Hors, A., and I. Jacobs, "HTML 4.01 + Specification", W3C Recommendation, Dec 1999. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., + Masinter, L., Leach, P., and T. Berners-Lee, + "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, + June 1999. + + + + + + +Burke & Scott Standards Track [Page 32] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + [RFC3016] Kikuchi, Y., Nomura, T., Fukunaga, S., Matsui, Y., and + H. Kimata, "RTP Payload Format for MPEG-4 Audio/Visual + Streams", RFC 3016, November 2000. + + [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., + Johnston, A., Peterson, J., Sparks, R., Handley, M., + and E. Schooler, "SIP: Session Initiation Protocol", + RFC 3261, June 2002. + + [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer + Model with Session Description Protocol (SDP)", + RFC 3264, June 2002. + + [RFC3265] Roach, A., "Session Initiation Protocol (SIP)-Specific + Event Notification", RFC 3265, June 2002. + + [RFC3311] Rosenberg, J., "The Session Initiation Protocol (SIP) + UPDATE Method", RFC 3311, October 2002. + + [RFC3326] Schulzrinne, H., Oran, D., and G. Camarillo, "The + Reason Header Field for the Session Initiation + Protocol (SIP)", RFC 3326, December 2002. + + [RFC3515] Sparks, R., "The Session Initiation Protocol (SIP) + Refer Method", RFC 3515, April 2003. + + [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. + Jacobson, "RTP: A Transport Protocol for Real-Time + Applications", STD 64, RFC 3550, July 2003. + + [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio + and Video Conferences with Minimal Control", STD 65, + RFC 3551, July 2003. + + [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and + K. Norrman, "The Secure Real-time Transport Protocol + (SRTP)", RFC 3711, March 2004. + + [RFC3725] Rosenberg, J., Peterson, J., Schulzrinne, H., and G. + Camarillo, "Best Current Practices for Third Party + Call Control (3pcc) in the Session Initiation Protocol + (SIP)", BCP 85, RFC 3725, April 2004. + + [RFC3891] Mahy, R., Biggs, B., and R. Dean, "The Session + Initiation Protocol (SIP) "Replaces" Header", + RFC 3891, September 2004. + + + + + +Burke & Scott Standards Track [Page 33] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, + "Uniform Resource Identifier (URI): Generic Syntax", + STD 66, RFC 3986, January 2005. + + [RFC4244] Barnes, M., "An Extension to the Session Initiation + Protocol (SIP) for Request History Information", + RFC 4244, November 2005. + + [RFC4320] Sparks, R., "Actions Addressing Identified Issues with + the Session Initiation Protocol's (SIP) Non-INVITE + Transaction", RFC 4320, January 2006. + + [RFC4488] Levin, O., "Suppression of Session Initiation Protocol + (SIP) REFER Method Implicit Subscription", RFC 4488, + May 2006. + + [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. + Rey, "Extended RTP Profile for Real-time Transport + Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", + RFC 4585, July 2006. + + [RFC4627] Crockford, D., "The application/json Media Type for + JavaScript Object Notation (JSON)", RFC 4627, + July 2006. + + [RFC4629] Ott, H., Bormann, C., Sullivan, G., Wenger, S., and R. + Even, "RTP Payload Format for ITU-T Rec", RFC 4629, + January 2007. + + [RFC4733] Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF + Digits, Telephony Tones, and Telephony Signals", + RFC 4733, December 2006. + + [RFC4855] Casner, S., "Media Type Registration of RTP Payload + Formats", RFC 4855, February 2007. + + [RFC4867] Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. + Xie, "RTP Payload Format and File Storage Format for + the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate + Wideband (AMR-WB) Audio Codecs", RFC 4867, April 2007. + + [VXML20] McGlashan, S., Burnett, D., Carter, J., Danielsen, P., + Ferrans, J., Hunt, A., Lucas, B., Porter, B., Rehor, + K., and S. Tryphonas, "Voice Extensible Markup + Language (VoiceXML) Version 2.0", W3C Recommendation, + March 2004. + + + + + +Burke & Scott Standards Track [Page 34] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + + [VXML21] Oshry, M., Auburn, R J., Baggia, P., Bodell, M., + Burke, D., Burnett, D., Candell, E., Kilic, H., + McGlashan, S., Lee, A., Porter, B., and K. Rehor, + "Voice Extensible Markup Language (VoiceXML) Version + 2.1", W3C Candidate Recommendation, June 2005. + +11.2. Informative References + + [CCXML10] Auburn, R J., "Voice Browser Call Control: CCXML + Version 1.0", W3C Working Draft, June 2005. + + [IEC14496-14] "Information technology. Coding of audio-visual + objects. MP4 file format", ISO/IEC ISO/IEC 14496- + 14:2003, October 2003. + + [MRCPv2] Shanmugham, S. and D. Burnett, "Media Resource Control + Protocol Version 2 (MRCPv2)", Work in Progress, + November 2008. + + [RFC2190] Zhu, C., "RTP Payload Format for H.263 Video Streams", + RFC 2190, September 1997. + + [RFC3960] Camarillo, G. and H. Schulzrinne, "Early Media and + Ringing Tone Generation in the Session Initiation + Protocol (SIP)", RFC 3960, December 2004. + + [RFC3969] Camarillo, G., "The Internet Assigned Number Authority + (IANA) Uniform Resource Identifier (URI) Parameter + Registry for the Session Initiation Protocol (SIP)", + BCP 99, RFC 3969, December 2004. + + [RFC4240] Burger, E., Van Dyke, J., and A. Spitzer, "Basic + Network Media Services with SIP", RFC 4240, + December 2005. + + [RFC5359] Johnston, A., Sparks, R., Cunningham, C., Donovan, S., + and K. Summers, "Session Initiation Protocol Service + Examples", BCP 144, RFC 5359, October 2008. + + [TS23002] "3rd Generation Partnership Project: Network + architecture (Release 6)", 3GPP TS 23.002 v6.6.0, + December 2004. + + [TS26244] "Transparent end-to-end packet switched streaming + service (PSS); 3GPP file format (3GP)", 3GPP TS 26.244 + v6.4.0, December 2004. + + + + + +Burke & Scott Standards Track [Page 35] + +RFC 5552 SIP Interface to VoiceXML Media Services May 2009 + + +Appendix A. Notes on Normative References + + We make a "downref" normative reference to [RFC4627] -- an + Informational document describing a proprietary (but extremely + popular) format. + +Authors' Addresses + + Dave Burke + Google + Belgrave House, 76 Buckingham Palace Road + London SW1W 9TQ + United Kingdom + + EMail: daveburke@google.com + + + Mark Scott + Genesys + 1120 Finch Avenue West, 8th floor + Toronto, Ontario M3J 3H7 + Canada + + EMail: Mark.Scott@genesyslab.com + + + + + + + + + + + + + + + + + + + + + + + + + + + +Burke & Scott Standards Track [Page 36] + |