author Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit 4bfd864f10b68b71482b35c818559068ef8d5797
tree e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc5629.txt
parent ea76e11061bda059ae9f9ad130a9895cc85607db
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc5629.txt')
-rw-r--r-- doc/rfc/rfc5629.txt 2131
1 files changed, 2131 insertions, 0 deletions
diff --git a/doc/rfc/rfc5629.txt b/doc/rfc/rfc5629.txt
new file mode 100644
index 0000000..eb0cfd0
--- /dev/null
+++ b/doc/rfc/rfc5629.txt
@@ -0,0 +1,2131 @@
+
+
+
+
+
+
+Network Working Group J. Rosenberg
+Request for Comments: 5629 Cisco Systems
+Category: Standards Track October 2009
+
+
+ A Framework for Application Interaction
+ in the Session Initiation Protocol (SIP)
+
+Abstract
+
+ This document describes a framework for the interaction between users
+ and Session Initiation Protocol (SIP) based applications. By
+ interacting with applications, users can guide the way in which they
+ operate. The focus of this framework is stimulus signaling, which
+ allows a user agent (UA) to interact with an application without
+ knowledge of the semantics of that application. Stimulus signaling
+ can occur to a user interface running locally with the client, or to
+ a remote user interface, through media streams. Stimulus signaling
+ encompasses a wide range of mechanisms, ranging from clicking on
+ hyperlinks, to pressing buttons, to traditional Dual-Tone Multi-
+ Frequency (DTMF) input. In all cases, stimulus signaling is
+ supported through the use of markup languages, which play a key role
+ in this framework.
+
+Status of This Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (c) 2009 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the BSD License.
+
+
+
+
+
+Rosenberg Standards Track [Page 1]
+
+RFC 5629 App Interaction Framework October 2009
+
+
+ This document may contain material from IETF Documents or IETF
+ Contributions published or made publicly available before November
+ 10, 2008. The person(s) controlling the copyright in some of this
+ material may not have granted the IETF Trust the right to allow
+ modifications of such material outside the IETF Standards Process.
+ Without obtaining an adequate license from the person(s) controlling
+ the copyright in such materials, this document may not be modified
+ outside the IETF Standards Process, and derivative works of it may
+ not be created outside the IETF Standards Process, except to format
+ it for publication as an RFC or to translate it into languages other
+ than English.
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
+ 2. Conventions Used in This Document . . . . . . . . . . . . . . 4
+ 3. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4
+ 4. A Model for Application Interaction . . . . . . . . . . . . . 7
+ 4.1. Functional vs. Stimulus . . . . . . . . . . . . . . . . . 9
+ 4.2. Real-Time vs. Non-Real-Time . . . . . . . . . . . . . . . 10
+ 4.3. Client-Local vs. Client-Remote . . . . . . . . . . . . . . 10
+ 4.4. Presentation-Capable vs. Presentation-Free . . . . . . . . 11
+ 5. Interaction Scenarios on Telephones . . . . . . . . . . . . . 11
+ 5.1. Client Remote . . . . . . . . . . . . . . . . . . . . . . 12
+ 5.2. Client Local . . . . . . . . . . . . . . . . . . . . . . . 12
+ 5.3. Flip-Flop . . . . . . . . . . . . . . . . . . . . . . . . 13
+ 6. Framework Overview . . . . . . . . . . . . . . . . . . . . . . 13
+ 7. Deployment Topologies . . . . . . . . . . . . . . . . . . . . 16
+ 7.1. Third-Party Application . . . . . . . . . . . . . . . . . 16
+ 7.2. Co-Resident Application . . . . . . . . . . . . . . . . . 17
+ 7.3. Third-Party Application and User Device Proxy . . . . . . 18
+ 7.4. Proxy Application . . . . . . . . . . . . . . . . . . . . 19
+ 8. Application Behavior . . . . . . . . . . . . . . . . . . . . . 19
+ 8.1. Client-Local Interfaces . . . . . . . . . . . . . . . . . 20
+ 8.1.1. Discovering Capabilities . . . . . . . . . . . . . . . 20
+ 8.1.2. Pushing an Initial Interface Component . . . . . . . . 20
+ 8.1.3. Updating an Interface Component . . . . . . . . . . . 22
+ 8.1.4. Terminating an Interface Component . . . . . . . . . . 22
+ 8.2. Client-Remote Interfaces . . . . . . . . . . . . . . . . . 23
+ 8.2.1. Originating and Terminating Applications . . . . . . . 23
+ 8.2.2. Intermediary Applications . . . . . . . . . . . . . . 24
+ 9. User Agent Behavior . . . . . . . . . . . . . . . . . . . . . 24
+ 9.1. Advertising Capabilities . . . . . . . . . . . . . . . . . 24
+ 9.2. Receiving User Interface Components . . . . . . . . . . . 25
+ 9.3. Mapping User Input to User Interface Components . . . . . 26
+ 9.4. Receiving Updates to User Interface Components . . . . . . 27
+ 9.5. Terminating a User Interface Component . . . . . . . . . . 27
+ 10. Inter-Application Feature Interaction . . . . . . . . . . . . 27
+ 10.1. Client-Local UI . . . . . . . . . . . . . . . . . . . . . 28
+ 10.2. Client-Remote UI . . . . . . . . . . . . . . . . . . . . . 29
+ 11. Intra Application Feature Interaction . . . . . . . . . . . . 29
+ 12. Example Call Flow . . . . . . . . . . . . . . . . . . . . . . 30
+ 13. Security Considerations . . . . . . . . . . . . . . . . . . . 36
+ 14. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 36
+ 15. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 36
+ 16. References . . . . . . . . . . . . . . . . . . . . . . . . . . 36
+ 16.1. Normative References . . . . . . . . . . . . . . . . . . . 36
+ 16.2. Informative References . . . . . . . . . . . . . . . . . . 37
+
+1. Introduction
+
+ The Session Initiation Protocol (SIP) [2] provides the ability for
+ users to initiate, manage, and terminate communications sessions.
+ Frequently, these sessions will involve a SIP application. A SIP
+ application is defined as a program running on a SIP-based element
+ (such as a proxy or user agent) that provides some value-added
+ function to a user or system administrator. Examples of SIP
+ applications include prepaid calling card calls, conferencing, and
+ presence-based [12] call routing.
+
+ In order for most applications to properly function, they need input
+ from the user to guide their operation. As an example, a prepaid
+ calling card application requires the user to input their calling
+ card number, their PIN code, and the destination number they wish to
+ reach. The process by which a user provides input to an application
+ is called "application interaction".
+
+ Application interaction can be either functional or stimulus.
+ Functional interaction requires the user device to understand the
+ semantics of the application, whereas stimulus interaction does not.
+ Stimulus signaling allows for applications to be built without
+ requiring modifications to the user device. Stimulus interaction is
+ the subject of this framework. The framework provides a model for
+ how users interact with applications through user interfaces, and how
+ user interfaces and applications can be distributed throughout a
+ network. This model is then used to describe how applications can
+ instantiate and manage user interfaces.
+
+2. Conventions Used in This Document
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [1].
+
+3. Definitions
+
+ SIP Application: A SIP application is defined as a program running
+ on a SIP-based element (such as a proxy or user agent) that
+ provides some value-added function to a user or system
+ administrator. Examples of SIP applications include prepaid
+ calling card calls, conferencing, and presence-based [12] call
+ routing.
+
+ Application Interaction: The process by which a user provides input
+ to an application.
+
+ Real-Time Application Interaction: Application interaction that
+ takes place while an application instance is executing. For
+ example, when a user enters their PIN into a prepaid
+ calling card application, this is real-time application
+ interaction.
+
+ Non-Real-Time Application Interaction: Application interaction that
+ takes place asynchronously with the execution of the application.
+ Generally, non-real-time application interaction is accomplished
+ through provisioning.
+
+ Functional Application Interaction: Application interaction is
+ functional when the user device has an understanding of the
+ semantics of the interaction with the application.
+
+ Stimulus Application Interaction: Application interaction is
+ stimulus when the user device has no understanding of the
+ semantics of the interaction with the application.
+
+ User Interface (UI): The user interface provides the user with
+ context to make decisions about what they want. The user
+ interacts with the device, which conveys the user input to the
+ user interface. The user interface interprets the information and
+ passes it to the application.
+
+ User Interface Component: A piece of user interface that operates
+ independently of other pieces of the user interface. For example,
+ a user might have two separate web interfaces to a prepaid calling
+ card application: one for hanging up and making another call, and
+ another for entering the username and PIN.
+
+ User Device: The software or hardware system that the user directly
+ interacts with to communicate with the application. An example of
+ a user device is a telephone. Another example is a PC with a web
+ browser.
+
+ User Device Proxy: A software or hardware system through which a
+ user indirectly interacts with the application. This indirection
+ can be through a network. An example is a gateway from IP to the
+ Public Switched Telephone Network (PSTN). It acts as a user device
+ proxy on behalf of the user on the circuit network.
+
+ User Input: The "raw" information passed from a user to a user
+ interface. Examples of user input include a spoken word or a
+ click on a hyperlink.
+
+ Client-Local User Interface: A user interface that is co-resident
+ with the user device.
+
+ Client-Remote User Interface: A user interface that executes
+ remotely from the user device. In this case, a standardized
+ interface is needed between the user device and the user
+ interface. Typically, this is done through media sessions: audio,
+ video, or application sharing.
+
+ Markup Language: A markup language describes a logical flow of
+ presentation of information to the user, collection of information
+ from the user, and transmission of that information to an
+ application.
+
+ Media Interaction: A means of separating a user and a user interface
+ by connecting them with media streams.
+
+ Interactive Voice Response (IVR): An IVR is a type of user interface
+ that allows users to speak commands to the application, and hear
+ responses to those commands prompting for more information.
+
+ Prompt-and-Collect: The basic primitive of an IVR user interface.
+ The user is presented with a voice option, and the user speaks
+ their choice.
+
+ Barge-In: The act of entering information into an IVR user interface
+ prior to the completion of a prompt requesting that information.
+
+ Focus: A user interface component has focus when user input is
+ provided to it, as opposed to any other user interface components.
+ This is not to be confused with the term "focus" within the SIP
+ conferencing framework, which refers to the center user agent in a
+ conference [14].
+
+ Focus Determination: The process by which the user device determines
+ which user interface component will receive the user input.
+
+ Focusless Device: A user device that has no ability to perform focus
+ determination. An example of a focusless device is a telephone
+ with a keypad.
+
+ Presentation-Capable UI: A user interface that can prompt the user
+ for input, collect results, and then prompt the user with new
+ information based on those results.
+
+ Presentation-Free UI: A user interface that cannot prompt the user
+ with information.
+
+ Feature Interaction: A class of problems that result when multiple
+ applications or application components are trying to provide
+ services to a user at the same time.
+
+ Inter-Application Feature Interaction: Feature interactions that
+ occur between applications.
+
+ DTMF: Dual-Tone Multi-Frequency. DTMF refers to a class of tones
+ generated by circuit-switched telephony devices when the user
+ presses a key on the keypad. As a result, DTMF and keypad input
+ are often used synonymously, when in fact one of them (DTMF) is
+ merely a means of conveying the other (the keypad input) to a
+ client-remote user interface (the switch, for example).
+
+ Application Instance: A single execution path of a SIP application.
+
+ Originating Application: A SIP application that acts as a User Agent
+ Client (UAC), making a call on behalf of the user.
+
+ Terminating Application: A SIP application that acts as a User Agent
+ Server (UAS), answering a call generated by a user. IVR
+ applications are terminating applications.
+
+ Intermediary Application: A SIP application that is neither the
+ caller nor the callee, but rather a third party involved in a call.
+
+4. A Model for Application Interaction
+
+ +---+ +---+ +---+ +---+
+ | | | | | | | |
+ | | | U | | U | | A |
+ | | Input | s | Input | s | Results | p |
+ | | ---------> | e | ---------> | e | ----------> | p |
+ | U | | r | | r | | l |
+ | s | | | | | | i |
+ | e | | D | | I | | c |
+ | r | Output | e | Output | f | Update | a |
+ | | <--------- | v | <--------- | a | <.......... | t |
+ | | | i | | c | | i |
+ | | | c | | e | | o |
+ | | | e | | | | n |
+ | | | | | | | |
+ +---+ +---+ +---+ +---+
+
+ Figure 1: Model for Real-Time Interactions
+
+ Figure 1 presents a general model for how users interact with
+ applications. Generally, users interact with a user interface
+ through a user device. A user device can be a telephone, or it can
+ be a PC with a web browser. Its role is to pass the user input from
+ the user to the user interface. The user interface provides the user
+ with context in order to make decisions about what they want. The
+ user interacts with the device, causing information to be passed from
+ the device to the user interface. The user interface interprets the
+ information, and passes it as a user interface event to the
+ application. The application may be able to modify the user
+ interface based on this event. Whether or not this is possible
+ depends on the type of user interface.
+
+ User interfaces are fundamentally about rendering and interpretation.
+ Rendering refers to the way in which the user is provided context.
+ This can be through hyperlinks, images, sounds, videos, text, and so
+ on. Interpretation refers to the way in which the user interface
+ takes the "raw" data provided by the user, and returns the result to
+ the application as a meaningful event, abstracted from the
+ particulars of the user interface. As an example, consider a prepaid
+ calling card application. The user interface worries about details
+ such as what prompt the user is provided, whether the voice is male
+ or female, and so on. It is concerned with recognizing the speech
+ that the user provides, in order to obtain the desired information.
+ In this case, the desired information is the calling card number, the
+ PIN code, and the destination number. The application needs that
+ data, and it doesn't matter to the application whether it was
+ collected using a male prompt or a female one.
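
As a minimal sketch of this separation (in Python; the names `CallingCardEvent` and `interpret_digits`, and the `#`-terminated input layout, are assumptions for illustration and are not defined by this framework), the user interface reduces raw keypad input to an abstract event. The application receives only the card number, PIN, and destination, and never learns how the prompts were rendered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallingCardEvent:
    """Abstract user interface event delivered to the application (hypothetical)."""
    card_number: str
    pin: str
    destination: str


def interpret_digits(raw: str) -> CallingCardEvent:
    """Interpret raw keypad input of the assumed form 'CARD#PIN#DEST#'.

    Rendering details (prompt wording, voice) never reach the application;
    only the three interpreted values do.
    """
    fields = raw.split("#")
    # A trailing '#' leaves an empty last element; require exactly three digit fields.
    if len(fields) != 4 or fields[3] != "" or not all(f.isdigit() for f in fields[:3]):
        raise ValueError(f"cannot interpret input: {raw!r}")
    return CallingCardEvent(*fields[:3])


event = interpret_digits("4111222233334444#1234#15551230000#")
print(event.pin)  # 1234
```

A speech-driven user interface could produce the same `CallingCardEvent` from recognized words; because the application depends only on the event, it would not need to change.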
+
+ User interfaces generally have real-time requirements towards the
+ user. That is, when a user interacts with the user interface, the
+ user interface needs to react quickly, and that change needs to be
+ propagated to the user right away. However, the interface between
+ the user interface and the application need not be that fast. Faster
+ is better, but the user interface itself can frequently compensate
+ for long latencies between the user interface and the application.
+ In the case of a prepaid calling card application, when the user is
+ prompted to enter their PIN, the prompt should generally stop
+ immediately once the first digit of the PIN is entered. This is
+ referred to as "barge-in". After the user interface collects the
+ rest of the PIN, it can tell the user to "please wait while
+ processing". The PIN can then be gradually transmitted to the
+ application. In this example, the user interface has compensated for
+ a slow UI to application interface by asking the user to wait.
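
The barge-in behavior can be simulated as below. This is a sketch under stated assumptions: the prompt is modeled as a list of audio frames, key presses as a mapping from frame index to digit, and `play_with_barge_in` is an illustrative name rather than an API of any voice platform.

```python
from typing import Dict, List, Optional, Tuple


def play_with_barge_in(prompt_frames: List[str],
                       digit_arrivals: Dict[int, str],
                       pin_length: int) -> Tuple[List[str], str, Optional[str]]:
    """Simulate one prompt-and-collect step with barge-in.

    prompt_frames: audio frames that make up the prompt, in playback order.
    digit_arrivals: frame index at which each key was pressed -> the digit.
    Returns (frames actually played, collected PIN, follow-up prompt or None).
    """
    played: List[str] = []
    for i, frame in enumerate(prompt_frames):
        if i in digit_arrivals:
            break  # barge-in: the first digit stops the prompt immediately
        played.append(frame)

    # Collect digits in arrival order, including the one that barged in.
    digits = [digit_arrivals[i] for i in sorted(digit_arrivals)][:pin_length]
    pin = "".join(digits)

    # The UI masks a slow UI-to-application link by asking the user to wait.
    follow_up = "please wait while processing" if len(pin) == pin_length else None
    return played, pin, follow_up


frames = [f"frame{i}" for i in range(10)]
played, pin, follow_up = play_with_barge_in(frames, {3: "1", 12: "2", 15: "3", 20: "4"}, 4)
print(len(played), pin, follow_up)  # 3 1234 please wait while processing
```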
+
+ The separation between user interface and application is absolutely
+ fundamental to the entire framework provided in this document. Its
+ importance cannot be overstated.
+
+ With this basic model, we can begin to taxonomize the types of
+ systems that can be built.
+
+4.1. Functional vs. Stimulus
+
+ The first way to taxonomize the system is to consider the interface
+ between the UI and the application. There are two fundamentally
+ different models for this interface. In a functional interface, the
+ user interface has detailed knowledge about the application and is,
+ in fact, specific to the application. The interface between the two
+ components is through a functional protocol, capable of representing
+ the semantics that can be exposed through the user interface.
+ Because the user interface has knowledge of the application, it can
+ be optimally designed for that application. As a result, functional
+ user interfaces are almost always the most user friendly, the
+ fastest, and the most responsive. However, in order to allow
+ interoperability between user devices and applications, the details
+ of the functional protocols need to be specified in standards. This
+ slows down innovation and limits the scope of applications that can
+ be built.
+
+ An alternative is a stimulus interface. In a stimulus interface, the
+ user interface is generic -- that is, totally ignorant of the details
+ of the application. Indeed, the application may pass instructions to
+ the user interface describing how it should operate. The user
+ interface translates user input into "stimulus", which are data
+ understood only by the application, and not by the user interface.
+ Because they are generic, and because they require communications
+ with the application in order to change the way in which they render
+ information to the user, stimulus user interfaces are usually slower,
+ less user friendly, and less responsive than a functional
+ counterpart. However, they allow for substantial innovation in
+ applications, since no standardization activity is needed to build a
+ new application, as long as it can interact with the user within the
+ confines of the user interface mechanism. The web is an example of a
+ stimulus user interface to applications.
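
The contrast can be sketched in Python as follows. The generic user interface holds only instructions pushed to it by the application (here, a key-to-token table) and returns opaque stimulus tokens; the meaning of those tokens lives entirely in the application. `GenericStimulusUI`, `VoicemailApp`, and the token values are hypothetical.

```python
from typing import Dict, Optional


class GenericStimulusUI:
    """Generic UI: it holds no knowledge of the application it serves."""

    def __init__(self, instructions: Dict[str, str]):
        # Instructions pushed by the application: key -> opaque stimulus token.
        self.instructions = dict(instructions)

    def on_key(self, key: str) -> Optional[str]:
        # Translate raw input into a stimulus; keys without an instruction are ignored.
        return self.instructions.get(key)


class VoicemailApp:
    """Hypothetical application; only it knows what each token means."""

    def instructions(self) -> Dict[str, str]:
        return {"1": "s-a", "2": "s-b"}

    def handle(self, stimulus: str) -> str:
        return {"s-a": "play next message", "s-b": "delete message"}[stimulus]


app = VoicemailApp()
ui = GenericStimulusUI(app.instructions())
stimulus = ui.on_key("2")
print(app.handle(stimulus))  # delete message
```

Adding a feature changes only the application and the instructions it pushes; the generic user interface code stays the same, which is the source of the flexibility described above.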
+
+ In SIP systems, functional interfaces are provided by extending the
+ SIP protocol to provide the needed functionality. For example, the
+ SIP caller preferences specification [15] provides a functional
+ interface that allows a user to request applications to route the
+ call to specific types of user agents. Functional interfaces are
+ important, but are not the subject of this framework. The primary
+ goal of this framework is to address the role of stimulus interfaces
+ to SIP applications.
+
+4.2. Real-Time vs. Non-Real-Time
+
+ Application interaction systems can also be real-time or non-real-
+ time. Non-real-time interaction allows the user to enter information
+ about application operation asynchronously with its invocation.
+ Frequently, this is done through provisioning systems. As an
+ example, a user can set up the forwarding number for a call-forward
+ on no-answer application using a web page. Real-time interaction
+ requires the user to interact with the application at the time of its
+ invocation.
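
A minimal sketch of the call-forward example, assuming a hypothetical in-memory store standing in for the provisioning system: the forwarding number is entered ahead of time, and the application consults it when invoked, without interacting with the user at that moment.

```python
from typing import Dict

# Stand-in for a provisioning database, written by a (hypothetical) web form.
provisioned_forwarding: Dict[str, str] = {}


def provision_forwarding(user: str, number: str) -> None:
    """Non-real-time interaction: runs whenever the user submits the web page."""
    provisioned_forwarding[user] = number


def on_no_answer(user: str) -> str:
    """Application invocation: runs when a call to `user` goes unanswered.

    No user input is collected here; the application uses provisioned data.
    """
    target = provisioned_forwarding.get(user)
    # "reject" is a placeholder outcome when nothing has been provisioned.
    return f"forward to {target}" if target else "reject"


provision_forwarding("alice", "+15551234567")
print(on_no_answer("alice"))  # forward to +15551234567
```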
+
+4.3. Client-Local vs. Client-Remote
+
+ Another axis in the taxonomization is whether the user interface is
+ co-resident with the user device (which we refer to as a client-local
+ user interface), or the user interface runs in a host separated from
+ the client (which we refer to as a client-remote user interface). In
+ a client-remote user interface, there exists some kind of protocol
+ between the client device and the UI that allows the client to
+ interact with the user interface over a network.
+
+ The most important way to separate the UI and the client device is
+ through media interaction. In media interaction, the interface
+ between the user and the user interface is through media: audio,
+ video, messaging, and so on. This is the classic mode of operation
+ for VoiceXML [5], where the user interface (also referred to as the
+ voice browser) runs on a platform in the network. Users communicate
+ with the voice browser through the telephone network (or using a SIP
+ session). The voice browser interacts with the application using
+ HTTP to convey the information collected from the user.
+
+ In the case of a client-local user interface, the user interface runs
+ co-located with the user device. The interface between them is
+ through the software that interprets the user's input and passes it
+ to the user interface. The classic example of this is the Web. In
+ the Web, the user interface is a web browser, and the interface is
+ defined by the HTML document that it's rendering. The user interacts
+ directly with the user interface running in the browser. The results
+ of that user interface are sent to the application (running on the
+ web server) using HTTP.
+
+ It is important to note that whether the user interface is local or
+ remote (in the case of media interaction) is not a property of the
+ modality of the interface, but rather a property of the system. As
+ an example, a Web-based user interface can be provided as a
+ client-remote user interface. In such a scenario, video- and
+ application-sharing media sessions can be used between the user and
+ the user interface. The user interface, still
+ guided by HTML, now runs "in the network", remote from the client.
+ Similarly, a VoiceXML document can be interpreted locally by a client
+ device, with no media streams at all. Indeed, the VoiceXML document
+ can be rendered using text, rather than media, with no impact on the
+ interface between the user interface and the application.
+
+ It is also important to note that systems can be hybrid. In a hybrid
+ user interface, some aspects of it (usually those associated with a
+ particular modality) run locally, and others run remotely.
+
+4.4. Presentation-Capable vs. Presentation-Free
+
+ A user interface can be capable of presenting information to the user
+ (a presentation-capable UI), or it can be capable only of collecting
+ user input (a presentation-free UI). These are very different types
+ of user interfaces. A presentation-capable UI can provide the user
+ with feedback after every input, providing the context for collecting
+ the next input. As a result, presentation-capable user interfaces
+ require an update to the information provided to the user after each
+ input. The Web is a classic example of this. After every input
+ (i.e., a click), the browser provides the input to the application
+ and fetches the next page to render. In a presentation-free user
+ interface, this is not the case. Since the user is not provided with
+ feedback, these user interfaces tend to merely collect information as
+ it's entered, and pass it to the application.
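
The difference can be sketched as two small Python functions (both hypothetical): a presentation-capable interface obtains and renders an update from the application after each input, while a presentation-free interface simply forwards input as it is entered.

```python
from typing import Callable, List


def presentation_capable(inputs: List[str], app: Callable[[str], str],
                         first_page: str) -> List[str]:
    """Web-style UI: every input yields a new page to render."""
    rendered = [first_page]
    for user_input in inputs:
        # Send the input to the application and render what comes back.
        rendered.append(app(user_input))
    return rendered


def presentation_free(inputs: List[str], forward: Callable[[str], None]) -> None:
    """Collect-only UI: nothing is rendered; input is passed on as entered."""
    for user_input in inputs:
        forward(user_input)


menu = {"1": "Balance: 12.50", "2": "Enter destination number"}
pages = presentation_capable(["1", "2"], lambda c: menu.get(c, "Invalid choice"),
                             "Press 1 for balance, 2 to place a call")
print(pages[-1])  # Enter destination number

received: List[str] = []
presentation_free(["1", "2"], received.append)
print(received)  # ['1', '2']
```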
+
+ Another difference is that a presentation-free user interface cannot
+ easily support the concept of a focus. Selection of a focus usually
+ requires a means for informing the user of the available
+ applications, allowing the user to choose, and then informing them
+ about which one they have chosen. Without the first and third steps
+ (which a presentation-free UI cannot provide), focus selection is
+ very difficult. Without a selected focus, the input provided to
+ applications through presentation-free user interfaces is more of a
+ broadcast or notification operation.
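
Focus determination, and its absence, can be sketched as below. The `UserDevice` class and the component names are hypothetical: a device able to select a focus routes each input to the focused component, while a focusless device can only broadcast it to every component.

```python
from typing import Dict, List, Optional


class UserDevice:
    """Routes user input to user interface components (illustrative only)."""

    def __init__(self, focus_capable: bool):
        self.focus_capable = focus_capable
        self.components: Dict[str, List[str]] = {}  # component id -> inputs received
        self.focus: Optional[str] = None

    def add_component(self, cid: str) -> None:
        self.components[cid] = []
        if self.focus_capable and self.focus is None:
            self.focus = cid  # first component gets focus by default

    def set_focus(self, cid: str) -> None:
        if not self.focus_capable:
            raise RuntimeError("a focusless device cannot select a focus")
        if cid not in self.components:
            raise KeyError(cid)
        self.focus = cid

    def deliver(self, user_input: str) -> None:
        if self.focus is not None:
            self.components[self.focus].append(user_input)  # focused component only
        else:
            for received in self.components.values():  # broadcast to all components
                received.append(user_input)


phone = UserDevice(focus_capable=False)
for cid in ("prepaid", "recording"):
    phone.add_component(cid)
phone.deliver("#")
print(phone.components)  # {'prepaid': ['#'], 'recording': ['#']}

browser = UserDevice(focus_capable=True)
for cid in ("prepaid", "recording"):
    browser.add_component(cid)
browser.set_focus("recording")
browser.deliver("#")
print(browser.components)  # {'prepaid': [], 'recording': ['#']}
```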
+
+5. Interaction Scenarios on Telephones
+
+ In this section, we apply the model of Section 4 to telephones.
+
+ In a traditional telephone, the user interface consists of a 12-key
+ keypad, a speaker, and a microphone. Indeed, from here forward, the
+ term "telephone" is used to represent any device that meets, at a
+ minimum, the characteristics described in the previous sentence.
+ Circuit-switched telephony applications are almost universally
+ client-remote user interfaces. In the Public Switched Telephone
+ Network (PSTN), there is usually a circuit interface between the user
+ and the user interface. The user input from the keypad is conveyed
+
+ using Dual-Tone Multi-Frequency (DTMF), and the microphone input as
+ Pulse Code Modulated (PCM) encoded voice.
+
+ In an IP-based system, there is more variability in how the system
+ can be instantiated. Both client-remote and client-local user
+ interfaces to a telephone can be provided.
+
+ In this framework, a PSTN gateway can be considered a User Device
+ Proxy. It is a proxy for the user because it can provide, to a user
+ interface on an IP network, input taken from a user on a circuit-
+ switched telephone. The gateway may be able to run a client-local
+ user interface, just as an IP telephone might.
+
+5.1. Client Remote
+
+ The most obvious instantiation is the "classic" circuit-switched
+ telephony model. In that model, the user interface runs remotely
+ from the client. The interface between the user and the user
+ interface is through media, which is set up by SIP and carried over
+ the Real Time Transport Protocol (RTP) [18]. The microphone input
+ can be carried using any suitable voice-encoding algorithm. The
+ keypad input can be conveyed in one of two ways. The first is to
+ convert the keypad input to DTMF, and then convey that DTMF using a
+ suitable encoding algorithm (such as PCMU). An alternative, and
+ generally the preferred approach, is to transmit the keypad input
+ using RFC 4733 [19], which provides an encoding mechanism for
+ carrying keypad input within RTP.
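
As a sketch of the RFC 4733 approach, the function below packs the four-octet telephone-event payload for one key press: an 8-bit event code, the end (E) bit, a reserved (R) bit, a 6-bit volume, and a 16-bit duration in RTP timestamp units. It covers only the payload; the RTP header, the dynamic payload type negotiated in SDP, and the rules for sending and repeating event packets are left out, and the function name is hypothetical.

```python
import struct

# DTMF event codes defined by RFC 4733: digits 0-9, then *, #, A-D.
DTMF_EVENTS = {**{str(d): d for d in range(10)},
               "*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15}


def telephone_event_payload(key: str, end: bool, volume_dbm0: int, duration: int) -> bytes:
    """Pack one RFC 4733 telephone-event payload (4 octets, network byte order).

    volume_dbm0: power level as a positive number 0..63, meaning 0 to -63 dBm0.
    duration: in RTP timestamp units (e.g. 160 = 20 ms at an 8000 Hz clock).
    """
    if not 0 <= volume_dbm0 <= 63:
        raise ValueError("volume must be 0..63")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    event = DTMF_EVENTS[key]
    # E is the most significant bit of the second octet; R (reserved) stays 0.
    flags_volume = (0x80 if end else 0) | volume_dbm0
    return struct.pack("!BBH", event, flags_volume, duration)


print(telephone_event_payload("#", end=True, volume_dbm0=10, duration=800).hex())  # 0b8a0320
```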
+
+ In this classic model, the user interface would run on a server in
+ the IP network. It would perform speech recognition and DTMF
+ recognition to derive the user intent, feed the recognized input
+ through the user interface, and provide the result to an application.
+
+5.2. Client Local
+
+ An alternative model is for the entire user interface to reside on
+ the telephone. The user interface can be a VoiceXML browser, running
+ speech recognition on the microphone input, and feeding the keypad
+ input directly into the script. As discussed above, the VoiceXML
+ script could be rendered using text instead of voice, if the
+ telephone has a textual display.
+
+ For simpler phones without a display, the user interface can be
+ described by a Keypad Markup Language request document [8]. As the
+ user enters digits in the keypad, they are passed to the user
+ interface, which generates user interface events that can be
+ transported to the application.
+
+5.3. Flip-Flop
+
+ A middle-ground approach is to flip back and forth between a client-
+ local and client-remote user interface. Many voice applications are
+ of the type that listen to the media stream and wait for some
+ specific trigger that kicks off a more complex user interaction. The
+ long pound in a prepaid calling card application is one example.
+ Another example is a conference recording application, where the user
+ can press a key at some point in the call to begin recording. When
+ the key is pressed, the user hears a whisper to inform them that
+ recording has started.
+
+ The ideal way to support such an application is to install a client-
+ local user interface component that waits for the trigger to kick off
+ the real interaction. Once the trigger is received, the application
+ connects the user to a client-remote user interface that can play
+ announcements, collect more information, and so on.
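
The flip-flop pattern can be sketched as a two-state application. This is an illustration under assumptions: `FlipFlopApp` is a hypothetical name, key presses are reported together with how long the key was held, and the 2000 ms threshold for a "long pound" is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import List

LONG_POUND_MS = 2000  # hypothetical threshold for a "long pound"


@dataclass
class FlipFlopApp:
    # "local": a client-local component waits for the trigger;
    # "remote": the user is connected to a client-remote (media) UI.
    mode: str = "local"
    log: List[str] = field(default_factory=list)

    def on_key(self, key: str, held_ms: int) -> None:
        """Report from the client-local component for each completed key press."""
        if self.mode == "local" and key == "#" and held_ms >= LONG_POUND_MS:
            self.mode = "remote"
            self.log.append("connect to client-remote UI")
        # Other presses are ignored here; in remote mode, input reaches the
        # user interface over the media session instead.

    def on_remote_done(self) -> None:
        """The client-remote interaction has finished; wait for the trigger again."""
        if self.mode == "remote":
            self.mode = "local"
            self.log.append("reinstall client-local trigger")


flip = FlipFlopApp()
flip.on_key("5", 100)    # ordinary digit: ignored
flip.on_key("#", 300)    # short pound: not the trigger
flip.on_key("#", 2500)   # long pound: trigger
flip.on_remote_done()
print(flip.mode, flip.log)  # local ['connect to client-remote UI', 'reinstall client-local trigger']
```

While the application is in the local state, nothing requires a media stream toward the network, which is where the cost saving of this approach comes from.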
+
+ The benefit of flip-flopping between a client-local and client-remote
+ user interface is cost. The client-local user interface will
+ eliminate the need to send media streams into the network just to
+ wait for the user to press the pound key on the keypad.
+
+ The Keypad Markup Language (KPML) was designed to support exactly
+ this kind of need [8]. It models the keypad on a phone and allows an
+ application to be informed when any sequence of keys has been
+ pressed. However, KPML has no presentation component. Since user
+ interfaces generally require a response to user input, the
+ presentation will need to be done using a client-remote user
+ interface that gets instantiated as a result of the trigger.
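
The kind of notification KPML provides can be sketched as a client-local watcher that reports when the digits entered so far end with a pattern of interest. This sketch does not use KPML's XML format or its exact matching rules (see [8] for those); `DigitWatcher`, the tags, the buffer cap, and the patterns are hypothetical, and the patterns are written as Python regular expressions.

```python
import re
from typing import Dict, Optional

MAX_BUFFER = 32  # hypothetical cap on remembered digits


class DigitWatcher:
    """Client-local, presentation-free digit watcher (KPML-like sketch).

    Patterns use Python regular-expression syntax, not KPML's own syntax.
    """

    def __init__(self, patterns: Dict[str, str]):
        # Anchor each pattern at the end of the buffer so that a match is
        # reported as soon as the last key of the sequence is pressed.
        self.patterns = {tag: re.compile(f"(?:{p})$") for tag, p in patterns.items()}
        self.buffer = ""

    def press(self, key: str) -> Optional[str]:
        """Feed one key; return the tag of the first matching pattern, if any."""
        self.buffer = (self.buffer + key)[-MAX_BUFFER:]
        for tag, pattern in self.patterns.items():
            if pattern.search(self.buffer):
                self.buffer = ""  # report once, then start collecting afresh
                return tag
        return None


watcher = DigitWatcher({"record": r"\*9", "hangup": r"##"})
print([watcher.press(k) for k in "1*9#5##"])
# [None, None, 'record', None, None, None, 'hangup']
```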
+
+ It is tempting to use a hybrid model, where a prompt-and-collect
+ application is implemented by using a client-remote user interface
+ that plays the prompts, and a client-local user interface, described
+ by KPML, that collects digits. However, this only complicates the
+ application. Firstly, the keypad input will be sent to both the
+ media stream and the KPML user interface. This requires the
+ application to sort out which user inputs are duplicates, a process
+ that is very complicated. Secondly, the primary benefit of KPML is
+ to avoid having a media stream towards a user interface. However,
+ there is already a media stream for the prompting, so there is no
+ real savings.
+
+6. Framework Overview
+
+ In this framework, we use the term "SIP application" to refer to a
+ broad set of functionality. A SIP application is a program running
+ on a SIP-based element (such as a proxy or user agent) that provides
+
+
+
+Rosenberg Standards Track [Page 13]
+
+RFC 5629 App Interaction Framework October 2009
+
+
+ some value-added function to a user or system administrator. SIP
+ applications can execute on behalf of a caller, a called party, or a
+ multitude of users at once.
+
+ Each application has a number of instances that are executing at any
+ given time. An instance represents a single execution path for an
+ application. It is established as a result of some event. That
+ event can be a SIP event, such as the reception of a SIP INVITE
+ request, or it can be a non-SIP event, such as a web form post or
+ even a timer. Application instances also have an end time. Some
+ instances have a lifetime that is coupled with a SIP transaction or
+ dialog. For example, a proxy application might begin when an INVITE
+ arrives, and terminate when the call is answered. Other applications
+ have a lifetime that spans multiple dialogs or transactions. For
+ example, a conferencing application instance may exist so long as
+ there are dialogs connected to it. When the last dialog terminates,
+ the application instance terminates. Other applications have a
+ lifetime that is completely decoupled from SIP events.
+
+ It is fundamental to the framework described here that multiple
+ application instances may interact with a user during a single SIP
+ transaction or dialog. Each instance may be for the same
+ application, or different applications. Each of the applications may
+ be completely independent, in that each may be owned by a different
+ provider and may be unaware of the others' existence. Similarly,
+ there may be application instances interacting with the caller, and
+ instances interacting with the callee, both within the same
+ transaction or dialog.
+
+ The first step in the interaction with the user is to instantiate one
+ or more user interface components for the application instance. A
+ user interface component is a single piece of the user interface that
+ is defined by a logical flow that is not synchronously coupled with
+ any other component. In other words, each component runs
+ independently.
+
+ A user interface component can be instantiated in one of the user
+ agents in a dialog (for a client-local user interface), or within a
+ network element (for a client-remote user interface). If a client-
+ local user interface is to be used, the application needs to
+ determine whether or not the user agent is capable of supporting a
+ client-local user interface, and in what format. In this framework,
+ all client-local user interface components are described by a markup
+ language. A markup language describes a logical flow of presentation
+ of information to the user, a collection of information from the
+ user, and a transmission of that information to an application.
+ Examples of markup languages include HTML, Wireless Markup Language
+ (WML), VoiceXML, and the Keypad Markup Language (KPML) [8].
+
+ Unlike an application instance, which has a very flexible lifetime, a
+ user interface component has a very fixed lifetime. A user interface
+ component is always associated with a dialog. The user interface
+ component can be created at any point after the dialog (or early
+ dialog) is created. However, the user interface component terminates
+ when the dialog terminates. The user interface component can be
+ terminated earlier by the user agent, and possibly by the
+ application, but its lifetime never exceeds that of its associated
+ dialog.
+
+ There are two ways to create a client-local interface component. For
+ interface components that are presentation capable, the application
+ sends a REFER [7] request to the user agent. The Refer-To header
+ field contains an HTTP URI that points to the markup for the user
+ interface, and the REFER contains a Target-Dialog header field [10]
+ which identifies the dialog associated with the user interface
+ component. For user interface components that are presentation free
+ (such as those defined by KPML), the application sends a SUBSCRIBE
+ request to the user agent. The body of the SUBSCRIBE request
+ contains a filter, which, in this case, is the markup that defines
+ when information is to be sent to the application in a NOTIFY. The
+ SUBSCRIBE does not contain the Target-Dialog header field, since
+ equivalent information is conveyed in the Event header field.
+
+ If a user interface component is to be instantiated in the network,
+ there is no need to determine the capabilities of the device on which
+ the user interface is instantiated. Presumably, it is on a device on
+ which the application knows a UI can be created. However, the
+ application does need to connect the user device to the user
+ interface. This will require manipulation of media streams in order
+ to establish that connection.
+
+ The interface between the user interface component and the
+ application depends on the type of user interface. For presentation-
+ capable user interfaces, such as those described by HTML and
+ VoiceXML, HTTP form POST operations are used. For presentation-free
+ user interfaces, a SIP NOTIFY is used. The differing needs and
+ capabilities of these two user interfaces, as described in
+ Section 4.4, are what drives the different choices for the
+ interactions. Since presentation-capable user interfaces require an
+ update to the presentation every time user data is entered, they are
+ a good match for HTTP. Since presentation-free user interfaces
+ merely transmit user input to the application, a NOTIFY is more
+ appropriate.
+
+ Indeed, for presentation-free user interfaces, there are two
+ different modalities of operation. The first is called "one shot".
+ In the one-shot modality, the markup waits for a user to enter some
+ information and, when they do, reports this event to the application.
+ The application then does something, and the markup is no longer
+ used. In the other modality, called "monitor", the markup stays
+ permanently resident, and reports information back to an application
+ until termination of the associated dialog.
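+
+ The difference between the two modalities can be illustrated with a
+ minimal Python sketch. It is not KPML: the class and method names
+ are hypothetical, a Python regular expression stands in for the
+ filter markup, and a callback stands in for the NOTIFY sent to the
+ application.
+
```python
import re
from typing import Callable, List


class PresentationFreeComponent:
    """Hypothetical model of a presentation-free UI component.

    In "one-shot" mode it reports the first input matching its pattern and
    then goes idle; in "monitor" mode it keeps reporting until the
    associated dialog terminates.
    """

    def __init__(self, pattern: str, monitor: bool,
                 report: Callable[[str], None]) -> None:
        self.pattern = re.compile(pattern)
        self.monitor = monitor
        self.report = report      # stands in for sending a NOTIFY
        self.active = True
        self.buffer = ""

    def on_digit(self, digit: str) -> None:
        if not self.active:
            return                # a spent or terminated component ignores input
        self.buffer += digit
        match = self.pattern.search(self.buffer)
        if match:
            self.report(match.group(0))
            self.buffer = ""
            if not self.monitor:
                self.active = False   # one-shot: the markup is no longer used

    def on_dialog_terminated(self) -> None:
        self.active = False       # neither modality outlives its dialog


one_shot_reports: List[str] = []
monitor_reports: List[str] = []
one_shot = PresentationFreeComponent(r"\d+#", False, one_shot_reports.append)
monitor = PresentationFreeComponent(r"\d+#", True, monitor_reports.append)
for digit in "12#34#":
    one_shot.on_digit(digit)
    monitor.on_digit(digit)
print(one_shot_reports, monitor_reports)  # ['12#'] ['12#', '34#']
```
+
+ With the input "12#34#", the one-shot component reports "12#" once
+ and goes idle, while the monitor component reports both "12#" and
+ "34#".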
+
+7. Deployment Topologies
+
+ This section presents some of the network topologies in which this
+ framework can be instantiated.
+
+7.1. Third-Party Application
+
+ +-------------+
+ /---| Application |
+ / +-------------+
+ /
+ SUB/ / REFER/
+ NOT / HTTP
+ /
+ +--------+ SIP (INVITE) +-----+
+ | UI A--------------------X |
+ |........| | SIP |
+ | User | RTP | UA |
+ | Device B--------------------Y |
+ +--------+ +-----+
+
+ Figure 2: Third-Party Topology
+
+ In this topology, the application that is interested in interacting
+ with the users exists outside of the SIP dialog between the user
+ agents. In that case, the application learns about the initiation
+ and termination of the dialog, along with the dialog identifiers,
+ through some out-of-band means. One such possibility is the dialog
+ event package [16]. Dialog information is only revealed to trusted
+ parties, so the application would need to be trusted by one of the
+ users in order to obtain this information.
+
+ At any point during the dialog, the application can instantiate user
+ interface components on the user device of the caller or callee. It
+ can do this using either SUBSCRIBE or REFER, depending on the type of
+ user interface (presentation capable or presentation free).
+
+7.2. Co-Resident Application
+
+ +--------+ SIP (INVITE) +-----+
+ | User A--------------------X SIP |
+ | Device | RTP | UA |
+ |........B--------------------Y |
+ | | SUB/NOT | App)|
+ | UI A'-------------------X' |
+ +--------+ REFER/HTTP +-----+
+
+ Figure 3: Co-Resident Topology
+
+ In this deployment topology, the application is co-resident with one
+ of the user agents (the one on the right in the picture above). This
+ application can install client-local user interface components on the
+ other user agent, which is acting as the user device. These
+ components can be installed using either SUBSCRIBE, for presentation-
+ free user interfaces, or REFER, for presentation-capable ones. This
+ situation typically arises when the application wishes to install UI
+ components on a presentation-capable user interface. If the only
+ user input is via keypad input, the framework is not needed per se,
+ because the UA/application will receive the input via RFC 4733 in the
+ RTP stream.
+
+ If the application resides in the called party, it is called a
+ "terminating application". If it resides in the calling party, it is
+ called an "originating application".
+
+ This kind of topology is common in protocol converter and gateway
+ applications.
+
+7.3. Third-Party Application and User Device Proxy
+
+ +-------------+
+ /---| Application |
+ / +-------------+
+ /
+ SUB/ / REFER/
+ NOT / HTTP
+ /
+ +-----+ SIP +---M----+ SIP +-----+
+ | V--------------------C A--------------------X |
+ | SIP | | UI | | SIP |
+ | UAa | RTP | | RTP | UAb |
+ | W--------------------D B--------------------Y |
+ +-----+ +--------+ +-----+
+ User User
+ Device Device
+ Proxy
+
+ Figure 4: User Device Proxy Topology
+
+ In this deployment topology, there is a third-party application as in
+ Section 7.1. However, instead of installing a user interface
+ component on the end user device, the component is installed in an
+ intermediate device, known as a User Device Proxy. From the
+ perspective of the actual user device (on the left), the User Device
+ Proxy is a client-remote user interface. As such, media, typically
+ transported using RTP (including RFC 4733 for carrying user input),
+ is sent from the user device to the client-remote user interface on
+ the User Device Proxy. As far as the application is concerned, it is
+ installing what it thinks is a client-local user interface on the
+ user device, but it happens to be on a user device proxy that looks
+ like the user device to the application.
+
+ The user device proxy will need to terminate and re-originate both
+ signaling (SIP) and media traffic towards the actual peer in the
+ conversation. The User Device Proxy is a media relay in the
+ terminology of RFC 3550 [18]. The User Device Proxy will need to
+ monitor the media streams associated with each dialog, in order to
+ convert user input received in the media stream to events reported to
+ the user interface. This can pose a challenge in multimedia
+ systems, where it may be unclear on which media stream the user input
+ is being sent. As discussed in RFC 3264 [20], if a user agent has a
+ single media source and is supporting multiple streams, it is
+ supposed to send that source to all streams. In cases where there
+ are multiple sources, the mapping is a matter of local policy. In
+ the absence of a way to explicitly identify or request which sources
+ map to which streams, the user device proxy will need to do the best
+ job it can. This specification RECOMMENDS that the User Device Proxy
+ monitor the first stream (defined in terms of the ordering of media
+ descriptions within a session description). As such, user agents SHOULD
+ send their user input on the first stream, absent a policy to direct
+ it otherwise.
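+
+ The recommendation above amounts to selecting the first "m=" line of
+ the session description. The Python sketch below assumes the SDP is
+ available as text and also looks for an RFC 4733 "telephone-event"
+ payload type on that stream; the function name and return value are
+ illustrative, and a deployed User Device Proxy would use a complete
+ SDP parser.
+
```python
from typing import List, Optional, Tuple


def first_stream(sdp: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (media type, telephone-event payload type) for the first
    media description ("m=" line) of an SDP body, or None if it has none.

    The payload type is None when the first stream does not map any
    payload type to RFC 4733 "telephone-event" in an a=rtpmap attribute.
    """
    sections: List[List[str]] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            sections.append([line])      # start of a new media description
        elif sections:
            sections[-1].append(line)    # line belongs to the current one
    if not sections:
        return None
    first = sections[0]                  # only the first stream is monitored
    media_type = first[0][2:].split()[0]  # e.g., "audio"
    event_pt: Optional[int] = None
    for attr in first[1:]:
        if attr.startswith("a=rtpmap:"):
            pt, _, encoding = attr[len("a=rtpmap:"):].partition(" ")
            if encoding.lower().startswith("telephone-event/"):
                event_pt = int(pt)
    return media_type, event_pt


OFFER = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.1",
    "s=-",
    "c=IN IP4 192.0.2.1",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:101 telephone-event/8000",
    "m=video 51372 RTP/AVP 31",
    "a=rtpmap:31 H261/90000",
])
print(first_stream(OFFER))  # ('audio', 101)
```
+
+ A telephone-event payload type offered only on a later stream is
+ ignored, in line with the first-stream rule above.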
+
+7.4. Proxy Application
+
+ +----------+
+ SUB/NOT | App | SUB/NOT
+ +--------------->| |<-----------------+
+ | REFER/HTTP |..........| REFER/HTTP |
+ | | SIP | |
+ | | Proxy | |
+ | +----------+ |
+ V ^ | V
+ +----------+ | | +----------+
+ | UI | INVITE | | INVITE | UI |
+ | |------------+ +------------>| |
+ |......... | |..........|
+ | SIP |...................................| SIP |
+ | UA | | UA |
+ +----------+ RTP +----------+
+ User Device User Device
+
+ Figure 5: Proxy Application Topology
+
+ In this topology, the application is co-resident with a transaction-
+ stateful, record-routing proxy server on the call path between two
+ user devices. The application uses SUBSCRIBE or REFER to install
+ user interface components on one or both user devices.
+
+ This topology is common in routing applications, such as a web-
+ assisted call-routing application.
+
+8. Application Behavior
+
+ The behavior of an application within this framework depends on
+ whether it seeks to use a client-local or client-remote user
+ interface.
+
+8.1. Client-Local Interfaces
+
+ One key component of this framework is support for client-local user
+ interfaces.
+
+8.1.1. Discovering Capabilities
+
+ A client-local user interface can only be instantiated on a user
+ agent if the user agent supports that type of user interface
+ component. Support for client-local user interface components is
+ declared by both the UAC and UAS in their Allow, Accept, Supported,
+ and Allow-Events header fields of dialog-initiating requests and
+ responses. If the Allow header field indicates support for the SIP
+ SUBSCRIBE method, and the Allow-Events header field indicates support
+ for the KPML package [8], and the Supported header field indicates
+ support for the Globally Routable UA URI (GRUU) [9] specification
+ (which, in turn, means that the Contact header field contains a
+ GRUU), it means that the UA can instantiate presentation-free user
+ interface components. In this case, the application can push
+ presentation-free user interface components according to the rules of
+ Section 8.1.2. The specific markup languages that can be supported
+ are indicated in the Accept header field.
+
+ If the Allow header field indicates support for the SIP REFER method,
+ and the Supported header field indicates support for the Target-
+ Dialog header field [10], and the Contact header field contains UA
+ capabilities [6] that indicate support for the HTTP URI scheme, it
+ means that the UA supports presentation-capable user interface
+ components. In this case, the application can push presentation-
+ capable user interface components to the client according to the
+ rules of Section 8.1.2. The specific markups that are supported are
+ indicated in the Accept header field.
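+
+ The two sets of conditions above can be written as a simple
+ predicate. This Python sketch assumes the header field values of a
+ dialog-initiating request or response have already been collected
+ into a dictionary keyed by header field name. Its parsing is
+ deliberately simplified (no compact forms, a single Contact value, a
+ single quoting form for the "schemes" parameter), and the returned
+ labels are hypothetical. The Accept header field, which lists the
+ specific markups, is not examined here.
+
```python
import re
from typing import Dict, Set


def _tokens(value: str) -> Set[str]:
    """Split a comma-separated header field value into lowercase tokens."""
    return {t.strip().lower() for t in value.split(",") if t.strip()}


def ui_capabilities(headers: Dict[str, str]) -> Set[str]:
    """Return the kinds of client-local UI components a UA can host."""
    allow = _tokens(headers.get("Allow", ""))
    events = _tokens(headers.get("Allow-Events", ""))
    supported = _tokens(headers.get("Supported", ""))
    contact = headers.get("Contact", "")
    caps: Set[str] = set()

    # Presentation-free: SUBSCRIBE allowed, KPML package, GRUU support.
    if "subscribe" in allow and "kpml" in events and "gruu" in supported:
        caps.add("presentation-free")

    # Presentation-capable: REFER allowed, Target-Dialog support, and a
    # "schemes" UA capability in the Contact that lists HTTP.
    match = re.search(r';\s*schemes="([^"]*)"', contact, re.IGNORECASE)
    http_ok = match is not None and "http" in _tokens(match.group(1))
    if "refer" in allow and "tdialog" in supported and http_ok:
        caps.add("presentation-capable")
    return caps


HEADERS = {
    "Allow": "INVITE, ACK, CANCEL, BYE, REFER, SUBSCRIBE",
    "Allow-Events": "kpml",
    "Supported": "gruu, tdialog",
    "Contact": "<sip:alice@example.com;gr=urn:uuid:"
               "f81d4fae-7dec-11d0-a765-00a0c91e6bf6>;schemes=\"sip,http\"",
    "Accept": "application/sdp, text/html, application/kpml-request+xml",
}
print(sorted(ui_capabilities(HEADERS)))
# ['presentation-capable', 'presentation-free']
```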
+
+ A third-party application that is not present on the call path will
+ not be privy to these header fields in the dialog-initiating requests
+ that pass by. As such, it will need to obtain this capability
+ information in other ways. One way is through the registration event
+ package [21], which can contain user agent capability information
+ provided in REGISTER requests [6].
+
+8.1.2. Pushing an Initial Interface Component
+
+ Generally, we anticipate that interface components will need to be
+ created at various points in a SIP session. Clearly, they
+ will need to be pushed during session setup, or after the session is
+ established. A user interface component is always associated with a
+ specific dialog, however.
+
+ An application MUST NOT attempt to push a user interface component to
+ a user agent until it has determined that the user agent has the
+ necessary capabilities and a dialog has been created. In the case of
+ a UAC, this means that an application MUST NOT push a user interface
+ component for an INVITE-initiated dialog until the application has
+ seen a request confirming the receipt of a dialog-creating response.
+ This could be an ACK for a 200 OK, or a PRACK for a provisional
+ response [3]. For SUBSCRIBE-initiated dialogs, the application MUST
+ NOT push a user interface component until the application has seen a
+ 200 OK to the NOTIFY request. For a user interface component on a
+ UAS, the application MUST NOT push a user interface component for an
+ INVITE-initiated dialog until it has seen a dialog-creating response
+ from the UAS. For a SUBSCRIBE-initiated dialog, it MUST NOT push a
+ user interface component until it has seen a NOTIFY request from the
+ notifier.
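+
+ The timing rules above can be captured as a lookup from the hosting
+ UA's role and the dialog-initiating method to the messages that
+ allow a push. In this Python sketch the message labels are
+ hypothetical strings, and capability discovery is assumed to have
+ succeeded already.
+
```python
from typing import Set

# Hypothetical labels for messages the application has observed.
ACK_FOR_2XX = "ACK for 200 OK"
PRACK_FOR_1XX = "PRACK for provisional response"
OK_FOR_NOTIFY = "200 OK to NOTIFY"
DIALOG_CREATING_RESPONSE = "dialog-creating response from UAS"
NOTIFY_FROM_NOTIFIER = "NOTIFY from notifier"


def may_push_ui(host: str, method: str, seen: Set[str]) -> bool:
    """Timing rules for pushing a UI component (Section 8.1.2).

    host is "uac" or "uas": the UA that would run the component.
    method is the dialog-initiating method, "INVITE" or "SUBSCRIBE".
    """
    rules = {
        ("uac", "INVITE"): {ACK_FOR_2XX, PRACK_FOR_1XX},
        ("uac", "SUBSCRIBE"): {OK_FOR_NOTIFY},
        ("uas", "INVITE"): {DIALOG_CREATING_RESPONSE},
        ("uas", "SUBSCRIBE"): {NOTIFY_FROM_NOTIFIER},
    }
    try:
        enabling = rules[(host, method)]
    except KeyError:
        raise ValueError(f"unsupported combination: {host}, {method}") from None
    # Any one of the enabling messages is sufficient.
    return bool(enabling & seen)


print(may_push_ui("uac", "INVITE", set()))            # False
print(may_push_ui("uac", "INVITE", {PRACK_FOR_1XX}))  # True
```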
+
+ To create a presentation-capable UI component on the UA, the
+ application sends a REFER request to the UA. This REFER MUST be sent
+ to the GRUU [9] advertised by that UA in the Contact header field of
+ the dialog-initiating request or response sent by that UA. Note that
+ this REFER request creates a separate dialog between the application
+ and the UA. The Refer-To header field of the REFER request MUST
+ contain an HTTP URI that references the markup document to be
+ fetched.
+
+ Furthermore, it is essential for the REFER request to be correlated
+ with the dialog to which the user interface component will be
+ associated. This is necessary for authorization and for terminating
+ the user interface components when the dialog terminates. To provide
+ this context, the REFER request MUST contain a Target-Dialog header
+ field identifying the dialog with which the user interface component
+ is associated. As discussed in [10], this request will also contain
+ a Require header field with the tdialog option tag.
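+
+ Putting the requirements of the two preceding paragraphs together, a
+ REFER pushing a presentation-capable component might be assembled as
+ in the following Python sketch. The helper names and all example
+ values are hypothetical; Via and Contact are omitted; accepting
+ "https" URIs is an assumption; and the placement of the local-tag
+ and remote-tag parameters should be checked against the Target-Dialog
+ specification [10].
+
```python
from dataclasses import dataclass


@dataclass
class TargetDialog:
    """Identifiers of the dialog the UI component will be tied to."""
    call_id: str
    local_tag: str    # tag placement follows the Target-Dialog spec; verify
    remote_tag: str


def build_ui_refer(gruu: str, app_name: str, app_uri: str, app_tag: str,
                   refer_call_id: str, markup_url: str,
                   target: TargetDialog) -> str:
    """Build a REFER that pushes a presentation-capable UI component.

    Via and Contact are omitted; a SIP stack would supply them.
    """
    # The framework requires an HTTP URI; also accepting https is an assumption.
    if not markup_url.lower().startswith(("http://", "https://")):
        raise ValueError("Refer-To must contain an HTTP URI")
    lines = [
        f"REFER {gruu} SIP/2.0",                    # sent to the UA's GRUU
        f'From: "{app_name}" <{app_uri}>;tag={app_tag}',  # names the application
        f"To: <{gruu}>",
        f"Call-ID: {refer_call_id}",                # a separate, new dialog
        "CSeq: 1 REFER",
        "Max-Forwards: 70",
        f"Refer-To: <{markup_url}>",                # markup to be fetched
        f"Target-Dialog: {target.call_id};local-tag={target.local_tag}"
        f";remote-tag={target.remote_tag}",         # correlates with the dialog
        "Require: tdialog",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


GRUU = "sip:alice@example.com;gr=urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
refer = build_ui_refer(
    gruu=GRUU,
    app_name="Prepaid Calling Card",
    app_uri="sip:prepaid@example.com",
    app_tag="a73kszlfl",
    refer_call_id="9d2c1f3e@app.example.com",
    markup_url="http://app.example.com/menu.html",
    target=TargetDialog("fa77as7dad8@host.example.com", "kkaz-", "6544"),
)
print(refer)
```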
+
+ To create a presentation-free user interface component, the
+ application sends a SUBSCRIBE request to the UA. The SUBSCRIBE MUST
+ be sent to the GRUU advertised by the UA. This SUBSCRIBE request
+ creates a separate dialog. The SUBSCRIBE request MUST use the KPML
+ [8] event package. The body of the SUBSCRIBE request contains the
+ markup document that defines the conditions under which the
+ application wishes to be notified of user input.
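+
+ A corresponding sketch for a presentation-free component follows.
+ The KPML body is modeled on the examples in the KPML specification
+ [8], but it and the Event header field parameters that identify the
+ target dialog (passed in by the caller, with the names and quoting in
+ the usage example taken as assumptions) should be checked against
+ that specification. Function names and values are hypothetical.
+
```python
from typing import Dict
from xml.sax.saxutils import escape

# Minimal KPML request modeled on the KPML specification's examples;
# check it against the kpml-request schema before relying on it.
KPML_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<kpml-request xmlns="urn:ietf:params:xml:ns:kpml-request" version="1.0">\n'
    '  <pattern>\n'
    '    <regex>{regex}</regex>\n'
    '  </pattern>\n'
    '</kpml-request>\n'
)


def build_kpml_subscribe(gruu: str, app_name: str, app_uri: str,
                         app_tag: str, sub_call_id: str,
                         event_params: Dict[str, str], regex: str,
                         expires: int = 3600) -> str:
    """Build a SUBSCRIBE that installs a presentation-free UI component.

    event_params carries the dialog-identifying Event header field
    parameters defined by the KPML package; values are sent quoted here.
    """
    body = KPML_TEMPLATE.format(regex=escape(regex))  # the filter markup
    params = "".join(f';{name}="{value}"'
                     for name, value in event_params.items())
    lines = [
        f"SUBSCRIBE {gruu} SIP/2.0",                # sent to the UA's GRUU
        f'From: "{app_name}" <{app_uri}>;tag={app_tag}',
        f"To: <{gruu}>",
        f"Call-ID: {sub_call_id}",                  # a separate, new dialog
        "CSeq: 1 SUBSCRIBE",
        "Max-Forwards: 70",
        f"Event: kpml{params}",                     # KPML event package
        f"Expires: {expires}",
        "Content-Type: application/kpml-request+xml",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    return "\r\n".join(lines) + "\r\n\r\n" + body


sub = build_kpml_subscribe(
    gruu="sip:alice@example.com;gr=urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    app_name="Prepaid Calling Card",
    app_uri="sip:prepaid@example.com",
    app_tag="x91kq",
    sub_call_id="77ab2e01@app.example.com",
    # Parameter names below are assumptions; take them from the KPML spec.
    event_params={"call-id": "fa77as7dad8@host.example.com",
                  "remote-tag": "6544", "local-tag": "kkaz-"},
    regex="#",
)
print(sub)
```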
+
+ In both cases, the REFER or SUBSCRIBE request SHOULD include a
+ display name in the From header field that identifies the name of the
+ application. For example, a prepaid calling card might include a
+ From header field that looks like:
+
+ From: "Prepaid Calling Card" <sip:prepaid@example.com>
+
+ Any of the SIP identity assertion mechanisms that have been defined,
+ such as [11] and [13], are applicable to these requests as well.
+
+8.1.3. Updating an Interface Component
+
+ Once a user interface component has been created on a client, it can
+ be updated. The means for updating it depends on the type of UI
+ component.
+
+ Presentation-capable UI components are updated using techniques
+ already in place for those markups. In particular, user input will
+ cause an HTTP POST operation to push the user input to the
+ application. The result of the POST operation is a new markup that
+ the UI is supposed to use. This allows the UI to be updated in
+ response to user action. Some markups, such as HTML, provide the
+ ability to force a refresh after a certain period of time, so that
+ the UI can be updated without user input. Those mechanisms can be
+ used here as well. However, there is no support for an asynchronous
+ push of an updated UI component from the application to the user
+ agent. A new REFER request to the same GRUU would create a new UI
+ component rather than update any components already in place.
+
+ For presentation-free UI, the story is different. The application
+ MAY update the filter at any time by generating a SUBSCRIBE refresh
+ with the new filter. The UA will immediately begin using this new
+ filter.
+
+8.1.4. Terminating an Interface Component
+
+ User interface components have a well-defined lifetime. They are
+ created when the component is first pushed to the client. User
+ interface components are always associated with the SIP dialog on
+ which they were pushed. As such, their lifetime is bound by the
+ lifetime of the dialog. When the dialog ends, so does the interface
+ component.
+
+ However, there are some cases where the application would like to
+ terminate the user interface component before its natural termination
+ point. For presentation-capable user interfaces, this is not
+ possible. For presentation-free user interfaces, the application MAY
+ terminate the component by sending a SUBSCRIBE with Expires equal to
+ zero. This terminates the subscription, which removes the UI
+ component.
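+
+ Both the filter update of Section 8.1.3 and the termination
+ described above are SUBSCRIBE requests sent within the existing
+ subscription dialog. The Python sketch below tracks only the state
+ needed to show this (the class is hypothetical, and From, To, Via,
+ and the request target, which come from the established dialog, are
+ omitted): each request increments the CSeq, repeats the original
+ Event header field, and either carries a new filter or sets Expires
+ to zero.
+
```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class KpmlSubscription:
    """Application-side state of an existing KPML subscription dialog."""
    call_id: str
    event: str        # Event header field value used in the initial request
    cseq: int = 1

    def next_subscribe(self, expires: int,
                       filter_xml: str = "") -> Tuple[Dict[str, str], str]:
        """Return headers and body for the next in-dialog SUBSCRIBE.

        A non-empty filter_xml replaces the filter (an update); expires=0
        ends the subscription, which removes the UI component.
        """
        self.cseq += 1    # every new request in the dialog gets a higher CSeq
        headers = {
            "Call-ID": self.call_id,
            "CSeq": f"{self.cseq} SUBSCRIBE",
            "Event": self.event,
            "Expires": str(expires),
        }
        if filter_xml:
            headers["Content-Type"] = "application/kpml-request+xml"
        headers["Content-Length"] = str(len(filter_xml.encode("utf-8")))
        return headers, filter_xml


sub = KpmlSubscription(call_id="77ab2e01@app.example.com", event="kpml")
update, _ = sub.next_subscribe(3600, "<kpml-request/>")  # placeholder filter
end, _ = sub.next_subscribe(0)
print(update["CSeq"], end["CSeq"], end["Expires"])  # 2 SUBSCRIBE 3 SUBSCRIBE 0
```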
+
+ A client can remove a UI component at any time. For presentation-
+ capable UI, this is analogous to the user dismissing the web form
+ window. There is no mechanism provided for reporting this kind of
+ event to the application. The application MUST be prepared to time
+ out and never receive input from a user. The duration of this
+ timeout is application dependent. For presentation-free user
+ interfaces, the UA can explicitly terminate the subscription. This
+ will result in the generation of a NOTIFY with a Subscription-State
+ header field equal to "terminated".
+
+8.2. Client-Remote Interfaces
+
+ As an alternative to, or in conjunction with, client-local user
+ interfaces, an application can make use of client-remote user
+ interfaces. These user interfaces can execute co-resident with the
+ application itself (in which case no standardized interfaces between
+ the UI and the application need to be used), or they can run
+ separately. This framework assumes that the user interface runs on a
+ host that has a sufficient trust relationship with the application.
+ As such, the means for instantiating the user interface is not
+ considered here.
+
+ The primary issue is to connect the user device to the remote user
+ interface. Doing so requires the manipulation of media streams
+ between the client and the user interface. Such manipulation can
+ only be done by user agents. There are two types of user agent
+ applications within this framework: originating/terminating
+ applications, and intermediary applications.
+
+8.2.1. Originating and Terminating Applications
+
+ Originating and terminating applications are applications that are
+ themselves the originator or the final recipient of a SIP invitation.
+ They are "pure" user agent applications, not back-to-back user
+ agents. The classic example of such an application is an interactive
+ voice response (IVR) application, which is typically a terminating
+ application. It is a terminating application because the user
+ explicitly calls it; i.e., it is the actual called party. An example
+ of an originating application is a wakeup call application, which
+ calls a user at a specified time in order to wake them up.
+
+ Because originating and terminating applications are a natural
+ termination point of the dialog, manipulation of the media session by
+ the application is trivial. Traditional SIP techniques for adding
+ and removing media streams, modifying codecs, and changing the
+ address of the recipient of the media streams can be applied.
+
+8.2.2. Intermediary Applications
+
+ Intermediary applications are, at the same time, more common than
+ originating/terminating applications and more complex. Intermediary
+ applications are applications that are neither the actual caller nor
+ the called party. Rather, they represent a "third party" that wishes
+ to interact with the user. The classic example is the ubiquitous
+ prepaid calling card application.
+
+ In order for the intermediary application to add a client-remote user
+ interface, it needs to manipulate the media streams of the user agent
+ to terminate on that user interface. This also introduces a
+ fundamental feature interaction issue. Since the intermediary
+ application is not an actual participant in the call, the user will
+ need to interact with both the intermediary application and its peer
+ in the dialog. Doing both at the same time is complicated and is
+ discussed in more detail in Section 10.
+
+9. User Agent Behavior
+
+9.1. Advertising Capabilities
+
+ In order to participate in applications that make use of stimulus
+ interfaces, a user agent needs to advertise its interaction
+ capabilities.
+
+ If a user agent supports presentation-capable user interfaces, it
+ MUST support the REFER method. It MUST include, in all dialog-
+ initiating requests and responses, an Allow header field that
+ includes the REFER method. The user agent MUST support the target
+ dialog specification [10], and MUST include the "tdialog" option tag
+ in the Supported header field of dialog-forming requests and
+ responses. Furthermore, the UA MUST support the SIP user agent
+ capabilities specification [6]. The UA MUST be capable of being
+ REFERed to an HTTP URI. It MUST include, in the Contact header field
+ of its dialog-initiating requests and responses, a "schemes" Contact
+ header field parameter that includes the HTTP URI scheme. The UA
+ MUST include, in all dialog-initiating requests and responses, an
+ Accept header field listing all of those markups supported by the UA.
+ It is RECOMMENDED that all user agents that support presentation-
+ capable user interfaces support HTML.
+
+ If a user agent supports presentation-free user interfaces, it MUST
+ support the SUBSCRIBE [4] method. It MUST support the KPML [8] event
+ package. It MUST include, in all dialog-initiating requests and
+ responses, an Allow header field that includes the SUBSCRIBE method.
+ It MUST include, in all dialog-initiating requests and responses, an
+ Allow-Events header field that lists the KPML event package. The UA
+ MUST include, in all dialog-initiating requests and responses, an
+ Accept header field listing those event filters it supports. At a
+ minimum, a UA MUST support the "application/kpml-request+xml" MIME
+ type.
+
+ For either presentation-free or presentation-capable user interfaces,
+ the user agent MUST support the GRUU [9] specification. The Contact
+ header field in all dialog-initiating requests and responses MUST
+ contain a GRUU. The UA MUST include a Supported header field that
+ contains the "gruu" option tag and the "tdialog" option tag.
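+
+ The header fields a UA advertises under these rules can be sketched
+ as follows. The base method list, the "sip" entry in the "schemes"
+ parameter, and the function itself are illustrative assumptions; a
+ real UA would derive them from its actual capabilities.
+
```python
from typing import Dict, List


def advertise_ui_support(gruu: str, presentation_capable: bool,
                         presentation_free: bool,
                         markups: List[str]) -> Dict[str, str]:
    """Header fields for a dialog-initiating request or response.

    gruu must be the UA's GRUU; markups lists the presentation markups the
    UA renders (e.g., "text/html"). The base method list is illustrative.
    """
    allow = ["INVITE", "ACK", "CANCEL", "BYE", "OPTIONS"]
    accept = ["application/sdp"]
    contact = f"<{gruu}>"                     # Contact carries the GRUU
    headers: Dict[str, str] = {}
    if presentation_capable:
        allow.append("REFER")
        accept.extend(markups)                # markups the UA can render
        contact += ';schemes="sip,http"'      # UA capability: HTTP URIs
    if presentation_free:
        allow.append("SUBSCRIBE")
        headers["Allow-Events"] = "kpml"
        accept.append("application/kpml-request+xml")  # mandatory filter type
    headers["Allow"] = ", ".join(allow)
    headers["Supported"] = "gruu, tdialog"    # required in either case
    headers["Accept"] = ", ".join(accept)
    headers["Contact"] = contact
    return headers


GRUU = "sip:alice@example.com;gr=urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
for name, value in advertise_ui_support(GRUU, True, True, ["text/html"]).items():
    print(f"{name}: {value}")
```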
+
+ Because these header fields are examined by proxies that may be
+ executing applications, a UA that wishes to support client-local user
+ interfaces should not encrypt them.
+
+9.2. Receiving User Interface Components
+
+ Once the UA has created a dialog (in either the early or confirmed
+ states), it MUST be prepared to receive a SUBSCRIBE or REFER request
+ against its GRUU. If the UA receives such a request prior to the
+ establishment of a dialog, the UA MUST reject the request.
+
+ A user agent SHOULD attempt to authenticate the sender of the
+ request. The sender will generally be an application; therefore, the
+ user agent is unlikely to ever have a shared secret with it, making
+ digest authentication useless. However, authenticated identities can
+ be obtained through other means, such as the Identity mechanism [11].
+
+ A user agent MAY have pre-defined authorization policies that permit
+ applications which have authenticated themselves with a particular
+ identity to push user interface components. If such a set of
+ policies is present, it is checked first. If the application is
+ authorized, processing proceeds.
+
+ If the application has authenticated itself but is not explicitly
+ authorized or blocked, this specification RECOMMENDS that the
+ application be automatically authorized if it can prove that it was
+ either on the call path or trusted by one of the elements on the
+ call path. An application proves this to the user agent by
+ demonstrating that it knows the dialog identifiers. That occurs by
+ including them in a Target-Dialog header field for REFER requests, or
+ in the Event header field parameters of the KPML SUBSCRIBE request.
+
+ Because the dialog identifiers serve as a tool for authorization, a
+ user agent compliant to this framework SHOULD use dialog identifiers
+ that are cryptographically random, with at least 128 bits of
+ randomness. It is recommended that this randomness be split between
+ the Call-ID and From header field tags in the case of a UAC.
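+
+ For example, a UAC following this recommendation could draw 64
+ random bits for each of the two values, as in this Python sketch
+ using the standard "secrets" module. The function name, the hex
+ encoding, and the "@host" suffix on the Call-ID are choices made for
+ illustration only.
+
```python
import secrets
from typing import Tuple


def new_uac_dialog_ids(host: str) -> Tuple[str, str]:
    """Return (Call-ID, From tag) carrying 128 random bits in total.

    Each value gets 64 bits from a cryptographically secure source, so the
    randomness is split between the Call-ID and the From tag.
    """
    call_id = f"{secrets.token_hex(8)}@{host}"  # 8 random bytes = 64 bits
    from_tag = secrets.token_hex(8)             # 8 random bytes = 64 bits
    return call_id, from_tag


call_id, from_tag = new_uac_dialog_ids("client.example.com")
print(call_id, from_tag)  # output differs on every run
```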
+
+ Furthermore, to ensure that only applications resident in or trusted
+ by on-path elements can instantiate a user interface component, a
+ user agent compliant to this specification SHOULD use the Session
+ Initiation Protocol Secure (SIPS) URI scheme for all dialogs it
+ initiates. This will guarantee secure links between all the elements
+ on the signaling path.
+
+ If the dialog was not established with a SIPS URI, or the user agent
+ did not choose cryptographically random dialog identifiers, then the
+ application MUST NOT automatically be authorized, even if it
+ presented valid dialog identifiers. A user agent MAY apply any other
+ policies in addition to (but not instead of) the ones specified here
+ in order to authorize the creation of the user interface component.
+ One such mechanism would be to prompt the user, informing them of the
+ identity of the application and the dialog it is associated with. If
+ an authorization policy requires user interaction, the user agent
+ SHOULD respond to the SUBSCRIBE or REFER request with a 202. In the
+ case of SUBSCRIBE, if authorization is not granted, the user agent
+ SHOULD generate a NOTIFY to terminate the subscription. In the case
+ of REFER, the user agent MUST NOT act upon the URI in the Refer-To
+ header field until user authorization is obtained.
+
+ If an application does not present a valid dialog identifier in its
+ REFER or SUBSCRIBE request, the user agent MUST reject the request
+ with a 403 response.
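+
+ One way to read the authorization rules of this section as a single
+ decision is sketched below in Python. The field names, the
+ allow/block sets standing in for pre-defined policies, and the return
+ labels are hypothetical. Checking the dialog identifiers before the
+ pre-defined policies, and answering a blocked application with 403,
+ are interpretations rather than requirements stated above. The
+ "ask-user" result corresponds to responding with a 202 and deferring
+ the decision to the user.
+
```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class PushRequest:
    """What a UA has established about an incoming REFER or SUBSCRIBE."""
    identity: Optional[str]   # authenticated identity, e.g., via Identity
    valid_dialog_id: bool     # Target-Dialog or KPML Event parameters match
    dialog_used_sips: bool    # the target dialog was set up with SIPS
    dialog_ids_random: bool   # the UA chose >= 128 random bits


def authorize_push(req: PushRequest, allowed: Set[str],
                   blocked: Set[str]) -> str:
    """Return "accept", "reject-403", or "ask-user" (respond 202 first)."""
    if not req.valid_dialog_id:
        return "reject-403"              # MUST reject with 403
    if req.identity is not None:
        if req.identity in blocked:
            return "reject-403"          # response code is an assumption
        if req.identity in allowed:
            return "accept"              # pre-defined policy
        if req.dialog_used_sips and req.dialog_ids_random:
            return "accept"              # RECOMMENDED automatic authorization
    # No automatic authorization: fall back to another policy, such as
    # asking the user; a refused SUBSCRIBE is then ended with a NOTIFY.
    return "ask-user"


APP = "sip:prepaid@example.com"
print(authorize_push(PushRequest(APP, True, True, True), set(), set()))
```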
+
+ If a REFER request to an HTTP URI is authorized, the UA dereferences
+ the URI, fetching the content to be rendered to the user. This
+ instantiates a presentation-capable user interface component. If a
+ SUBSCRIBE was authorized, a presentation-free user interface
+ component is instantiated.
+
+9.3. Mapping User Input to User Interface Components
+
+ Once the user interface components are instantiated, the user agent
+ must direct user input to the appropriate component. In the case of
+ presentation-capable user interfaces, this process is known as focus
+ selection. It is done by means that are specific to the user
+ interface on the device. In the case of a PC, for example, the
+ window manager would allow the user to select the appropriate user
+ interface component to which their input is directed.
+
+ For presentation-free user interfaces, the situation is more
+ complicated. In some cases, the device may support a mechanism that
+ allows the user to select a "line", and thus the associated dialog.
+ Any user input on the keypad while this line is selected is fed to
+ the user interface components associated with that dialog.
+
+ Otherwise, for client-local user interfaces, the user input is
+ assumed to be associated with all user interface components. For
+ client-remote user interfaces, the user device converts the user
+ input to media, typically conveyed using RFC 4733, and sends this to
+ the client-remote user interface. This user interface then needs to
+ map user input from potentially many media streams into user
+ interface events. The process for doing this is described in
+ Section 7.3.
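+
+ For client-local presentation-free components, the selection rule
+ reduces to the following Python sketch. The dictionary mapping
+ dialogs to component names is a hypothetical representation of UA
+ state.
+
```python
from typing import Dict, List, Optional


def route_keypad_input(selected_dialog: Optional[str],
                       components: Dict[str, List[str]]) -> List[str]:
    """Pick the client-local presentation-free components for a keypress.

    components maps a dialog identifier to the components associated with
    that dialog. If the device lets the user select a line, input goes
    only to that dialog's components; otherwise it goes to all of them.
    """
    if selected_dialog is not None:
        return list(components.get(selected_dialog, []))
    # No line selection: every client-local component sees the input.
    return [c for comps in components.values() for c in comps]


COMPONENTS = {"dialog-1": ["prepaid-kpml"], "dialog-2": ["voicemail-kpml"]}
print(route_keypad_input("dialog-2", COMPONENTS))  # ['voicemail-kpml']
print(route_keypad_input(None, COMPONENTS))  # ['prepaid-kpml', 'voicemail-kpml']
```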
+
+9.4. Receiving Updates to User Interface Components
+
+ For presentation-capable user interfaces, updates to the user
+ interface occur in ways specific to that user interface component.
+ In the case of HTML, for example, the document can tell the client to
+ fetch a new document periodically. However, this framework does not
+ provide any additional machinery to asynchronously push a new user
+ interface component to the client.
+
+ For presentation-free user interfaces, an application can push an
+ update to a component by sending a SUBSCRIBE refresh with a new
+ filter. The user agent will process these according to the rules of
+ the event package.
+
+9.5. Terminating a User Interface Component
+
+ Termination of a presentation-capable user interface component is a
+ trivial procedure. The user agent merely dismisses the window (or
+ its equivalent). The fact that the component is dismissed is not
+ communicated to the application. As such, it is purely a local
+ matter.
+
+ In the case of a presentation-free user interface, the user might
+ wish to cease interacting with the application. However, most
+ presentation-free user interfaces will not have a way for the user to
+ signal this through the device. If such a mechanism did exist, the
+ UA SHOULD generate a NOTIFY request whose Subscription-State header
+ field value is "terminated" with a reason of "rejected". This tells
+ the application that the component has been removed and that it
+ should not attempt to re-subscribe.
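+
+ A minimal Python sketch of the header field value involved follows.
+ The helper names are hypothetical. The user agent builds the
+ Subscription-State value, and the application checks it before
+ deciding whether to re-subscribe. Only that value is modeled. A
+ complete NOTIFY also needs the Event header field and the usual
+ RFC 3261 [2] header fields.
+
```python
from typing import Dict, Tuple


def terminated_rejected() -> str:
    # Subscription-State value for a NOTIFY sent when the user dismisses a
    # presentation-free component (Section 9.5).
    return "terminated;reason=rejected"


def parse_subscription_state(value: str) -> Tuple[str, Dict[str, str]]:
    # Split "state;name=value;..." into the state and its parameters.
    state, *params = [p.strip() for p in value.split(";")]
    return state.lower(), {n.lower(): v for n, _, v in (p.partition("=") for p in params)}


def component_removed(value: str) -> bool:
    # True when the application should treat the component as gone and
    # not attempt to re-subscribe.
    state, params = parse_subscription_state(value)
    return state == "terminated" and params.get("reason") == "rejected"


header = terminated_rejected()
```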
+
+10. Inter-Application Feature Interaction
+
+ The inter-application feature interaction problem is inherent to
+ stimulus signaling. Whenever there are multiple applications, there
+ are multiple user interfaces. The system has to determine to which
+ user interface any particular input is destined. That question is
+ the essence of the inter-application feature interaction problem.
+
+ Inter-application feature interaction is not an easy problem to
+ resolve. The issues for client-local and client-remote user
+ interface components are considered separately below.
+
+10.1. Client-Local UI
+
+ When the user interface itself resides locally on the client device,
+ the feature interaction problem is actually much simpler. The end
+ device knows explicitly about each application, and therefore can
+ present the user with each one separately. When the user provides
+ input, the client device can determine to which user interface the
+ input is destined. The user interface to which input is destined is
+ referred to as the "application in focus", and the means by which the
+ focused application is selected is called "focus determination".
+
+ Generally speaking, focus determination is purely a local operation.
+ In the PC universe, focus determination is provided by window
+ managers. Each application does not know about focus; it merely
+ receives the user input that has been targeted to it when it's in
+ focus. This basic concept applies to SIP-based applications as well.
+
+ Focus determination will frequently be trivial, depending on the user
+ interface type. Consider a user who makes a call from a PC. The
+ call passes through a prepaid calling card application and a call-
+ recording application. Both of these wish to interact with the user.
+ Both push an HTML-based user interface to the user. On the PC, each
+ user interface would appear as a separate window. The user interacts
+ with the call-recording application by selecting its window, and with
+ the prepaid calling card application by selecting its window. Focus
+ determination is literally provided by the PC window manager. It is
+ clear to which application the user input is targeted.
+
+ As another example, consider the same two applications, but on a
+ "smart phone" that has a set of buttons, and next to each button,
+ there is an LCD display that can provide the user with an option.
+ This user interface can be represented using the Wireless Markup
+ Language (WML), for example.
+
+ The phone would allocate some number of buttons to each application.
+ The prepaid calling card would get one button for its "hangup"
+ command, and the recording application would get one for its "start/
+ stop" command. The user can easily determine which application to
+ interact with by pressing the appropriate button. Pressing a button
+ determines focus and provides user input, both at the same time.
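+
+ The button allocation in this example can be sketched in Python as
+ follows. This is an illustration only. The allocation order and the
+ function names are assumptions. A real phone would also render each
+ command's label on the LCD next to its button.
+
```python
from typing import Dict, List, Tuple


def allocate_buttons(commands: Dict[str, List[str]],
                     num_buttons: int) -> Dict[int, Tuple[str, str]]:
    """Give each (application, command) pair its own labeled button, in order."""
    pairs = [(app, cmd) for app, cmds in commands.items() for cmd in cmds]
    if len(pairs) > num_buttons:
        # What a real phone does here (paging, truncation) is not specified.
        raise ValueError("not enough buttons for every command")
    return dict(enumerate(pairs))


def press(layout: Dict[int, Tuple[str, str]], button: int) -> Tuple[str, str]:
    # One press selects the application (focus) and delivers its command (input).
    return layout[button]


layout = allocate_buttons({"prepaid": ["hangup"], "recorder": ["start/stop"]}, 4)
```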
+
+ Unfortunately, not all devices will have these advanced displays. A
+ PSTN gateway, or a basic IP telephone, may only have a 12-key keypad.
+ The user interfaces for these devices are provided through the Keypad
+ Markup Language (KPML). Considering once again the feature
+ interaction case above, the prepaid calling card application and the
+ call-recording application would both pass a KPML document to the
+ device. When the user presses a button on the keypad, to which
+ document does the input apply? The device does not allow the user to
+ select one. A device where the user cannot provide focus is called a
+ "focusless device". This is quite a hard problem to solve. This
+ framework does not make any explicit normative recommendation, but it
+ concludes that the best option is to send the input to both user
+ interfaces unless the markup in one interface has indicated that it
+ should be suppressed from others. This is a sensible choice by
+ analogy: it is exactly what the existing circuit-switched telephone
+ network does. It is an explicit non-goal to provide a better
+ mechanism for feature interaction resolution than the PSTN on devices
+ that have the same user interface as PSTN telephones. Devices
+ with better displays, such as PCs or screen phones, can benefit from
+ the capabilities of this framework, allowing the user to determine
+ which application they are interacting with.
+
+ Indeed, when a user provides input on a focusless device, the input
+ must be passed to all client-local user interfaces AND all client-
+ remote user interfaces, unless the markup tells the UI to suppress
+ the media. In the case of KPML, key events are passed to remote user
+ interfaces by encoding them as described in RFC 4733 [19]. Of
+ course, since a client cannot determine whether or not a media stream
+ terminates in a remote user interface, these key events are passed in
+ all audio media streams unless the KPML request document is used to
+ suppress them.
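+
+ The delivery rule for focusless devices can be sketched in Python as
+ follows. The suppress set on each LocalUI is a hypothetical stand-in
+ for whatever suppression the markup requests. The actual KPML syntax
+ is not modeled. The sketch also assumes that the interface requesting
+ suppression still receives the key itself.
+
```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class LocalUI:
    name: str
    # Hypothetical: keys this UI's markup asked to suppress from other UIs.
    suppress: Set[str] = field(default_factory=set)
    received: List[str] = field(default_factory=list)


def dispatch_focusless(key: str, local_uis: List[LocalUI],
                       audio_streams: List[str]) -> List[Tuple[str, str]]:
    """Deliver one key press on a device where the user cannot choose focus.

    Returns the (stream, key) events to send as media toward remote UIs.
    """
    claimants = [ui for ui in local_uis if key in ui.suppress]
    if claimants:
        # Suppression requested: only the requesting UIs see the key, and it
        # is not passed on in any audio stream.
        for ui in claimants:
            ui.received.append(key)
        return []
    # Default, PSTN-like behavior: every local UI and every audio stream gets it.
    for ui in local_uis:
        ui.received.append(key)
    return [(stream, key) for stream in audio_streams]


prepaid = LocalUI("prepaid", suppress={"#"})
recorder = LocalUI("recorder")
sent_5 = dispatch_focusless("5", [prepaid, recorder], ["audio-1"])
sent_hash = dispatch_focusless("#", [prepaid, recorder], ["audio-1"])
```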
+
+10.2. Client-Remote UI
+
+ When the user interfaces run remotely, the determination of focus can
+ be much, much harder. There are many architectures that can be
+ deployed to handle the interaction. None are ideal. However, all
+ are beyond the scope of this specification.
+
+11. Intra-Application Feature Interaction
+
+ An application can instantiate a multiplicity of user interface
+ components. For example, a single application can instantiate two
+ separate HTML components and one WML component. Furthermore, an
+ application can instantiate both client-local and client-remote user
+ interfaces.
+
+ The feature interaction issues between these components within the
+ same application are less severe. If an application has multiple
+ client user interface components, their interaction is resolved
+ identically to the inter-application case -- through focus
+
+ determination. However, the problems in focusless user devices (such
+ as a keypad on a telephone) generally won't exist, since the
+ application can generate user interfaces that do not overlap in their
+ usage of an input.
+
+ The real issue is that the optimal user experience frequently
+ requires some kind of coupling between the differing user interface
+ components. This is a classic problem in multi-modal user
+ interfaces, such as those described by Speech Application Language
+ Tags (SALT). As an example, consider a user interface where a user
+ can either press a labeled button to make a selection, or listen to a
+ prompt, and speak the desired selection. Ideally, when the user
+ presses the button, the prompt should cease immediately, since both
+ of them were targeted at collecting the same information in parallel.
+ Such interactions are best handled by markups that natively support
+ such interactions, such as SALT, and thus require no explicit support
+ from this framework.
+
+12. Example Call Flow
+
+ This section shows the operation of a call-recording application.
+ This application allows a user to record the media in their call by
+ clicking on a button in a web form. The application uses a
+ presentation-capable user interface component that is pushed to the
+ caller. The conventions of [17] are used to represent long message
+ lines.
+
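+ The allOneLine convention can be illustrated with a small Python
+ sketch that rebuilds the long lines of the messages in this section.
+ The sketch makes two assumptions. First, the document's own
+ indentation has already been removed. Second, the folded pieces are
+ concatenated exactly as written, which reproduces the Contact header
+ field of message 1. The function name is hypothetical.
+
```python
from typing import List


def unfold_all_one_line(lines: List[str]) -> List[str]:
    """Join every <allOneLine> ... </allOneLine> group into a single line.

    Assumes the document's indentation has already been stripped, so the
    folded pieces are concatenated exactly as written.
    """
    out: List[str] = []
    pieces: List[str] = []
    folding = False
    for line in lines:
        marker = line.strip()
        if marker == "<allOneLine>":
            folding, pieces = True, []
        elif marker == "</allOneLine>":
            out.append("".join(pieces))
            folding = False
        elif folding:
            pieces.append(line.rstrip("\r\n"))
        else:
            out.append(line)
    return out


folded = [
    "CSeq: 1 INVITE",
    "<allOneLine>",
    "Contact: <sip:A@example.com;gr=urn:uuid:f81d4fae",
    '-7dec-11d0-a765-00a0c91e6bf6>;schemes="http,sip"',
    "</allOneLine>",
]
unfolded = unfold_all_one_line(folded)
```
+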
+ A Recording App B
+ |(1) INVITE | |
+ |----------------------->| |
+ | |(2) INVITE |
+ | |----------------------->|
+ | |(3) 200 OK |
+ | |<-----------------------|
+ |(4) 200 OK | |
+ |<-----------------------| |
+ |(5) ACK | |
+ |----------------------->| |
+ | |(6) ACK |
+ | |----------------------->|
+ |(7) REFER | |
+ |<-----------------------| |
+ |(8) 200 OK | |
+ |----------------------->| |
+ |(9) NOTIFY | |
+ |----------------------->| |
+ |(10) 200 OK | |
+ |<-----------------------| |
+ |(11) HTTP GET | |
+ |----------------------->| |
+ |(12) 200 OK | |
+ |<-----------------------| |
+ |(13) NOTIFY | |
+ |----------------------->| |
+ |(14) 200 OK | |
+ |<-----------------------| |
+ |(15) HTTP POST | |
+ |----------------------->| |
+ |(16) 200 OK | |
+ |<-----------------------| |
+
+ Figure 6
+
+ First, the caller, A, sends an INVITE to set up a call (message 1).
+ Since the caller supports the framework and can handle presentation-
+ capable user interface components, it includes the Supported header
+ field indicating that the GRUU extension and the Target-Dialog header
+ field are understood, the Allow header field indicating that REFER is
+ understood, and the Contact header field that includes the "schemes"
+ header field parameter.
+
+ INVITE sip:B@example.com SIP/2.0
+ Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
+ From: Caller <sip:A@example.com>;tag=kkaz-
+ To: Callee <sip:B@example.com>
+ Call-ID: fa77as7dad8-sd98ajzz@host.example.com
+ CSeq: 1 INVITE
+ Max-Forwards: 70
+ Supported: gruu, tdialog
+ Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
+ Accept: application/sdp, text/html
+ <allOneLine>
+ Contact: <sip:A@example.com;gr=urn:uuid:f81d4fae
+ -7dec-11d0-a765-00a0c91e6bf6>;schemes="http,sip"
+ </allOneLine>
+ Content-Length: ...
+ Content-Type: application/sdp
+
+ --SDP not shown--
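+
+ As a non-normative sketch, an application could check whether an
+ INVITE advertises the capabilities described above before deciding
+ to push a presentation-capable component with REFER. The helper
+ names are hypothetical. The parser ignores folded lines and compact
+ header forms. Requiring "http" in the "schemes" parameter is an
+ assumption suited to the HTTP-based component of this example.
+
```python
import re
from typing import Dict, List, Set


def parse_headers(message: str) -> Dict[str, List[str]]:
    """Map lower-cased header names to values (no folding, no compact forms)."""
    headers: Dict[str, List[str]] = {}
    for line in message.splitlines()[1:]:   # skip the start line
        if not line.strip():
            break                           # blank line ends the header section
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers


def tokens(headers: Dict[str, List[str]], name: str) -> Set[str]:
    # Comma-separated option tags or methods across all instances of a field.
    return {t.strip().lower() for v in headers.get(name, []) for t in v.split(",")}


def advertises_framework(message: str) -> bool:
    h = parse_headers(message)
    contact = " ".join(h.get("contact", []))
    return ({"gruu", "tdialog"} <= tokens(h, "supported")
            and "refer" in tokens(h, "allow")
            and re.search(r';\s*schemes="[^"]*\bhttp\b', contact) is not None)


invite = "\r\n".join([
    "INVITE sip:B@example.com SIP/2.0",
    "Supported: gruu, tdialog",
    "Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER",
    "Contact: <sip:A@example.com;gr=urn:uuid:f81d4fae"
    '-7dec-11d0-a765-00a0c91e6bf6>;schemes="http,sip"',
    "",
])
answer = "SIP/2.0 200 OK\r\nContact: <sip:B@pc.example.com>\r\n\r\n"
```
+
+ B's 200 OK (message 3) carries none of these header fields, so the
+ same check applied to it returns False.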
+
+ The proxy acts as a recording server and forwards the INVITE to the
+ called party (message 2). It omits the Record-Route it would
+ normally insert due to the presence of the GRUU in the INVITE:
+
+ INVITE sip:B@pc.example.com SIP/2.0
+ Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK97sh
+ Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
+ From: Caller <sip:A@example.com>;tag=kkaz-
+ To: Callee <sip:B@example.com>
+ Call-ID: fa77as7dad8-sd98ajzz@host.example.com
+ CSeq: 1 INVITE
+ Max-Forwards: 69
+ Supported: gruu, tdialog
+ Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
+ Accept: application/sdp, text/html
+ <allOneLine>
+ Contact: <sip:A@example.com;gr=urn:uuid:f81d4fae
+ -7dec-11d0-a765-00a0c91e6bf6>;schemes="http,sip"
+ </allOneLine>
+ Content-Length: ...
+ Content-Type: application/sdp
+
+ --SDP not shown--
+
+ B accepts the call with a 200 OK (message 3). It does not support
+ the framework, so the various header fields are not present.
+
+ SIP/2.0 200 OK
+ Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK97sh
+ Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
+ From: Caller <sip:A@example.com>;tag=kkaz-
+ To: Callee <sip:B@example.com>;tag=7777
+ Call-ID: fa77as7dad8-sd98ajzz@host.example.com
+ CSeq: 1 INVITE
+ Contact: <sip:B@pc.example.com>
+ Content-Length: ...
+ Content-Type: application/sdp
+
+ --SDP not shown--
+
+ This 200 OK is passed back to the caller (message 4):
+
+ SIP/2.0 200 OK
+ Record-Route: <sip:app.example.com;lr>
+ Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz8
+ From: Caller <sip:A@example.com>;tag=kkaz-
+ To: Callee <sip:B@example.com>;tag=7777
+ Call-ID: fa77as7dad8-sd98ajzz@host.example.com
+ CSeq: 1 INVITE
+ Contact: <sip:B@pc.example.com>
+ Content-Length: ...
+ Content-Type: application/sdp
+
+ --SDP not shown--
+
+ The caller generates an ACK (message 5).
+
+ ACK sip:B@pc.example.com SIP/2.0
+ Route: <sip:app.example.com;lr>
+ Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz9
+ From: Caller <sip:A@example.com>;tag=kkaz-
+ To: Callee <sip:B@example.com>;tag=7777
+ Call-ID: fa77as7dad8-sd98ajzz@host.example.com
+ CSeq: 1 ACK
+
+ The ACK is forwarded to the called party (message 6).
+
+ ACK sip:B@pc.example.com SIP/2.0
+ Via: SIP/2.0/TLS app.example.com;branch=z9hG4bKh7s
+ Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9zz9
+ From: Caller <sip:A@example.com>;tag=kkaz-
+ To: Callee <sip:B@example.com>;tag=7777
+ Call-ID: fa77as7dad8-sd98ajzz@host.example.com
+ CSeq: 1 ACK
+
+ Now the application decides to push a user interface component to
+ user A, so it sends a REFER request (message 7):
+
+ <allOneLine>
+ REFER sip:A@example.com;gr=urn:uuid:f81d4fae
+ -7dec-11d0-a765-00a0c91e6bf6 SIP/2.0
+ </allOneLine>
+ Refer-To: https://app.example.com/script.pl
+ Target-Dialog: fa77as7dad8-sd98ajzz@host.example.com
+ ;remote-tag=7777;local-tag=kkaz-
+ Require: tdialog
+ Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK9zh6
+ Max-Forwards: 70
+ From: Recorder Application <sip:app.example.com>;tag=jhgf
+ <allOneLine>
+ To: Caller <sip:A@example.com;gr=urn:uuid:f81d4fae
+ -7dec-11d0-a765-00a0c91e6bf6>
+ </allOneLine>
+ Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
+ Call-ID: 66676776767@app.example.com
+ CSeq: 1 REFER
+ Event: refer
+ Contact: <sip:app.example.com>
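+
+ The Target-Dialog value in message 7 can be built and parsed as
+ sketched below in Python. The helper names are hypothetical. As the
+ example shows, local-tag carries A's own tag (kkaz-) and remote-tag
+ carries B's (7777). The tags are therefore named from the perspective
+ of the REFER recipient.
+
```python
from typing import Dict, Tuple


def build_target_dialog(call_id: str, remote_tag: str, local_tag: str) -> str:
    # Value layout as in message 7: local-tag is A's own tag (kkaz-) and
    # remote-tag is B's (7777), i.e., named from the REFER recipient's side.
    return f"{call_id};remote-tag={remote_tag};local-tag={local_tag}"


def parse_target_dialog(value: str) -> Tuple[str, Dict[str, str]]:
    # Tolerates the folded form shown in message 7 (whitespace around ";").
    call_id, *params = [p.strip() for p in value.split(";")]
    return call_id, {n: v for n, _, v in (p.partition("=") for p in params)}


value = build_target_dialog("fa77as7dad8-sd98ajzz@host.example.com",
                            remote_tag="7777", local_tag="kkaz-")
call_id, tags = parse_target_dialog(value)
```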
+
+ Since the recording application is the same as the authoritative
+ proxy for the domain, it resolves the Request-URI to the registered
+ contact of A and sends the request there. The REFER is answered by a
+ 200 OK (message 8).
+
+ SIP/2.0 200 OK
+ Via: SIP/2.0/TLS app.example.com;branch=z9hG4bK9zh6
+ From: Recorder Application <sip:app.example.com>;tag=jhgf
+ To: Caller <sip:A@example.com>;tag=pqoew
+ Call-ID: 66676776767@app.example.com
+ Supported: gruu, tdialog
+ Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, REFER
+ <allOneLine>
+ Contact: <sip:A@example.com;gr=urn:uuid:f81d4fae
+ -7dec-11d0-a765-00a0c91e6bf6>;schemes="http,sip"
+ </allOneLine>
+ CSeq: 1 REFER
+
+ User A sends a NOTIFY (message 9):
+
+ NOTIFY sip:app.example.com SIP/2.0
+ Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9320394238995
+ To: Recorder Application <sip:app.example.com>;tag=jhgf
+ From: Caller <sip:A@example.com>;tag=pqoew
+ Call-ID: 66676776767@app.example.com
+ CSeq: 1 NOTIFY
+ Max-Forwards: 70
+ <allOneLine>
+ Contact: <sip:A@example.com;gr=urn:uuid:f81d4fae
+ -7dec-11d0-a765-00a0c91e6bf6>;schemes="http,sip"
+ </allOneLine>
+ Event: refer;id=93809824
+ Subscription-State: active;expires=3600
+ Content-Type: message/sipfrag;version=2.0
+ Content-Length: 20
+
+ SIP/2.0 100 Trying
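+
+ The Content-Length of 20 follows from the body. Assuming the sipfrag
+ body is exactly the status line followed by CRLF, it is 18 characters
+ plus 2 bytes of line ending:
+
```python
# message/sipfrag body of message 9: one status line terminated by CRLF.
body = "SIP/2.0 100 Trying\r\n"
content_length = len(body.encode("ascii"))   # 18 characters + CRLF
print(content_length)  # 20
```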
+
+ The recording server responds with a 200 OK (message 10).
+
+ SIP/2.0 200 OK
+ Via: SIP/2.0/TLS host.example.com;branch=z9hG4bK9320394238995
+ To: Recorder Application <sip:app.example.com>;tag=jhgf
+ From: Caller <sip:A@example.com>;tag=pqoew
+ Call-ID: 66676776767@app.example.com
+ CSeq: 1 NOTIFY
+
+ The REFER request contained a Target-Dialog header field with a valid
+ dialog identifier. Furthermore, all of the signaling was over TLS,
+ and the dialog identifiers contain sufficient randomness. As such,
+ the caller, A, automatically authorizes the application. It then acts
+ on the Refer-To URI, fetching the script from app.example.com
+ (message 11). The response, message 12, contains a web application
+ that the user can click on to enable recording. Because the client
+ executed the URL in the Refer-To, it generates another NOTIFY to the
+ application, informing it of the successful response (message 13).
+ This is answered with a 200 OK (message 14). When the user clicks the
+ button to enable recording, the results are posted to the server
+ (message 15), and an updated display is provided (message 16).
+
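+ The automatic authorization decision in this example can be sketched
+ in Python as follows. This is an illustration under stated
+ assumptions, not the normative procedure. The function names and the
+ dialogs_over_tls table are hypothetical. The randomness of the dialog
+ identifiers is assumed rather than checked, since the recipient
+ cannot verify it.
+
```python
from typing import Dict, Optional, Tuple

# (Call-ID, local tag, remote tag) as seen by the UA that receives the REFER.
DialogId = Tuple[str, str, str]


def dialog_from_target_dialog(value: str) -> DialogId:
    call_id, *params = [p.strip() for p in value.split(";")]
    tags = {n: v for n, _, v in (p.partition("=") for p in params)}
    return call_id, tags.get("local-tag", ""), tags.get("remote-tag", "")


def auto_authorize(target_dialog: Optional[str], refer_over_tls: bool,
                   dialogs_over_tls: Dict[DialogId, bool]) -> bool:
    """Authorize a REFER automatically if it names a known TLS-protected dialog."""
    if target_dialog is None or not refer_over_tls:
        return False              # no proof of being on the call path
    dialog = dialog_from_target_dialog(target_dialog)
    # Unknown dialogs map to False; known ones pass only if set up over TLS.
    return dialogs_over_tls.get(dialog, False)


dialogs = {("fa77as7dad8-sd98ajzz@host.example.com", "kkaz-", "7777"): True}
td = "fa77as7dad8-sd98ajzz@host.example.com;remote-tag=7777;local-tag=kkaz-"
```
+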
+13. Security Considerations
+
+ There are many security considerations associated with this
+ framework. It allows applications in the network to instantiate user
+ interface components on a client device. Such instantiations need to
+ be from authenticated applications, and also need to be authorized to
+ place a UI into the client. Indeed, the stronger requirement is
+ authorization. It is not as important to know the name of the
+ provider of the application, as it is to know that the provider is
+ authorized to instantiate components.
+
+ This specification defines specific authorization techniques and
+ requirements. Automatic authorization is granted if the application
+ can prove that it is on the call path, or is trusted by an element on
+ the call path. As documented above, this can be accomplished by the
+ use of cryptographically random dialog identifiers and the usage of
+ SIPS for message confidentiality. It is RECOMMENDED that SIPS be
+ implemented by user agents compliant with this specification. This
+ does not represent a change from the requirements in RFC 3261.
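+
+ As an illustration, a user agent written in Python could draw its
+ tags and Call-IDs from the operating system's cryptographically
+ secure generator through the standard secrets module. The helper
+ names are hypothetical. The 128-bit size is an assumption of this
+ sketch, not a requirement stated here.
+
```python
import secrets


def new_tag(nbytes: int = 16) -> str:
    # nbytes random bytes from the OS CSPRNG, hex encoded (2 * nbytes characters).
    return secrets.token_hex(nbytes)


def new_call_id(host: str, nbytes: int = 16) -> str:
    # Random local part plus host, in the style of
    # "fa77as7dad8-sd98ajzz@host.example.com"; token_urlsafe only emits
    # letters, digits, "-" and "_", all of which are legal in a Call-ID.
    return f"{secrets.token_urlsafe(nbytes)}@{host}"


tag = new_tag()
call_id = new_call_id("host.example.com")
```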
+
+14. Contributors
+
+ This document was produced as a result of discussions amongst the
+ application interaction design team. All members of this team
+ contributed significantly to the ideas embodied in this document.
+ The members of this team were:
+
+ Eric Burger
+ Cullen Jennings
+ Robert Fairlie-Cuninghame
+
+15. Acknowledgements
+
+ The author would like to thank Martin Dolly and Rohan Mahy for their
+ input and comments. Thanks to Allison Mankin for her support of this
+ work.
+
+16. References
+
+16.1. Normative References
+
+ [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [2] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
+ Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
+ Session Initiation Protocol", RFC 3261, June 2002.
+
+ [3] Rosenberg, J. and H. Schulzrinne, "Reliability of Provisional
+ Responses in Session Initiation Protocol (SIP)", RFC 3262,
+ June 2002.
+
+ [4] Roach, A., "Session Initiation Protocol (SIP)-Specific Event
+ Notification", RFC 3265, June 2002.
+
+ [5] McGlashan, S., Lucas, B., Porter, B., Rehor, K., Burnett, D.,
+ Carter, J., Ferrans, J., and A. Hunt, "Voice Extensible Markup
+ Language (VoiceXML) Version 2.0", W3C CR CR-voicexml20-
+ 20030220, February 2003.
+
+ [6] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Indicating
+ User Agent Capabilities in the Session Initiation Protocol
+ (SIP)", RFC 3840, August 2004.
+
+ [7] Sparks, R., "The Session Initiation Protocol (SIP) Refer
+ Method", RFC 3515, April 2003.
+
+ [8] Burger, E. and M. Dolly, "A Session Initiation Protocol (SIP)
+ Event Package for Key Press Stimulus (KPML)", RFC 4730,
+ November 2006.
+
+ [9] Rosenberg, J., "Obtaining and Using Globally Routable User
+ Agent URIs (GRUUs) in the Session Initiation Protocol (SIP)",
+ RFC 5627, October 2009.
+
+ [10] Rosenberg, J., "Request Authorization through Dialog
+ Identification in the Session Initiation Protocol (SIP)",
+ RFC 4538, June 2006.
+
+16.2. Informative References
+
+ [11] Peterson, J. and C. Jennings, "Enhancements for Authenticated
+ Identity Management in the Session Initiation Protocol (SIP)",
+ RFC 4474, August 2006.
+
+ [12] Day, M., Rosenberg, J., and H. Sugano, "A Model for Presence
+ and Instant Messaging", RFC 2778, February 2000.
+
+ [13] Jennings, C., Peterson, J., and M. Watson, "Private Extensions
+ to the Session Initiation Protocol (SIP) for Asserted Identity
+ within Trusted Networks", RFC 3325, November 2002.
+
+ [14] Rosenberg, J., "A Framework for Conferencing with the Session
+ Initiation Protocol (SIP)", RFC 4353, February 2006.
+
+
+ [15] Rosenberg, J., Schulzrinne, H., and P. Kyzivat, "Caller
+ Preferences for the Session Initiation Protocol (SIP)",
+ RFC 3841, August 2004.
+
+ [16] Rosenberg, J., Schulzrinne, H., and R. Mahy, "An INVITE-
+ Initiated Dialog Event Package for the Session Initiation
+ Protocol (SIP)", RFC 4235, November 2005.
+
+ [17] Sparks, R., Hawrylyshen, A., Johnston, A., Rosenberg, J., and
+ H. Schulzrinne, "Session Initiation Protocol (SIP) Torture Test
+ Messages", RFC 4475, May 2006.
+
+ [18] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
+ "RTP: A Transport Protocol for Real-Time Applications", STD 64,
+ RFC 3550, July 2003.
+
+ [19] Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF Digits,
+ Telephony Tones, and Telephony Signals", RFC 4733, December
+ 2006.
+
+ [20] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
+ Session Description Protocol (SDP)", RFC 3264, June 2002.
+
+ [21] Rosenberg, J., "A Session Initiation Protocol (SIP) Event
+ Package for Registrations", RFC 3680, March 2004.
+
+Author's Address
+
+ Jonathan Rosenberg
+ Cisco Systems
+ 600 Lanidex Plaza
+ Parsippany, NJ 07054
+ US
+
+ Phone: +1 973 952-5000
+ EMail: jdrosen@cisco.com
+ URI: http://www.jdrosen.net
+