diff --git a/doc/rfc/rfc4463.txt b/doc/rfc/rfc4463.txt
new file mode 100644
index 0000000..37bb4eb
--- /dev/null
+++ b/doc/rfc/rfc4463.txt
@@ -0,0 +1,4819 @@
+
+
+
+
+
+
+Network Working Group S. Shanmugham
+Request for Comments: 4463 Cisco Systems, Inc.
+Category: Informational P. Monaco
+ Nuance Communications
+ B. Eberman
+ Speechworks Inc.
+ April 2006
+
+
+ A Media Resource Control Protocol (MRCP)
+ Developed by Cisco, Nuance, and Speechworks
+
+Status of This Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2006).
+
+IESG Note
+
+ This RFC is not a candidate for any level of Internet Standard. The
+ IETF disclaims any knowledge of the fitness of this RFC for any
+ purpose and in particular notes that the decision to publish is not
+ based on IETF review for such things as security, congestion control,
+ or inappropriate interaction with deployed protocols. The RFC Editor
+ has chosen to publish this document at its discretion. Readers of
+ this document should exercise caution in evaluating its value for
+ implementation and deployment. See RFC 3932 for more information.
+
+ Note that this document uses a MIME type 'application/mrcp' which has
+ not been registered with the IANA, and is therefore not recognized as
+ a standard IETF MIME type. The historical value of this document as
+ an ancestor to ongoing standardization in this space, however, makes
+ the publication of this document meaningful.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 1]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+Abstract
+
+ This document describes a Media Resource Control Protocol (MRCP) that
+ was developed jointly by Cisco Systems, Inc., Nuance Communications,
+ and Speechworks, Inc. It is published as an RFC as input for further
+ IETF development in this area.
+
+ MRCP controls media service resources like speech synthesizers,
+ recognizers, signal generators, signal detectors, fax servers, etc.,
+ over a network. This protocol is designed to work with streaming
+ protocols like RTSP (Real Time Streaming Protocol) or SIP (Session
+ Initiation Protocol), which help establish control connections to
+ external media streaming devices, and media delivery mechanisms like
+ RTP (Real-time Transport Protocol).
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 2. Architecture ....................................................4
+ 2.1. Resources and Services .....................................4
+ 2.2. Server and Resource Addressing .............................5
+ 3. MRCP Protocol Basics ............................................5
+ 3.1. Establishing Control Session and Media Streams .............5
+ 3.2. MRCP over RTSP .............................................6
+ 3.3. Media Streams and RTP Ports ................................8
+ 4. Notational Conventions ..........................................8
+ 5. MRCP Specification ..............................................9
+ 5.1. Request ...................................................10
+ 5.2. Response ..................................................10
+ 5.3. Event .....................................................12
+ 5.4. Message Headers ...........................................12
+ 6. Media Server ...................................................19
+ 6.1. Media Server Session ......................................19
+ 7. Speech Synthesizer Resource ....................................21
+ 7.1. Synthesizer State Machine .................................22
+ 7.2. Synthesizer Methods .......................................22
+ 7.3. Synthesizer Events ........................................23
+ 7.4. Synthesizer Header Fields .................................23
+ 7.5. Synthesizer Message Body ..................................29
+ 7.6. SET-PARAMS ................................................32
+ 7.7. GET-PARAMS ................................................32
+ 7.8. SPEAK .....................................................33
+ 7.9. STOP ......................................................34
+ 7.10. BARGE-IN-OCCURRED ........................................35
+ 7.11. PAUSE ....................................................37
+ 7.12. RESUME ...................................................37
+ 7.13. CONTROL ..................................................38
+ 7.14. SPEAK-COMPLETE ...........................................40
+
+
+
+Shanmugham, et al. Informational [Page 2]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ 7.15. SPEECH-MARKER ............................................41
+ 8. Speech Recognizer Resource .....................................42
+ 8.1. Recognizer State Machine ..................................42
+ 8.2. Recognizer Methods ........................................42
+ 8.3. Recognizer Events .........................................43
+ 8.4. Recognizer Header Fields ..................................43
+ 8.5. Recognizer Message Body ...................................51
+ 8.6. SET-PARAMS ................................................56
+ 8.7. GET-PARAMS ................................................56
+ 8.8. DEFINE-GRAMMAR ............................................57
+ 8.9. RECOGNIZE .................................................60
+ 8.10. STOP .....................................................63
+ 8.11. GET-RESULT ...............................................64
+ 8.12. START-OF-SPEECH ..........................................64
+ 8.13. RECOGNITION-START-TIMERS .................................65
+ 8.14. RECOGNITION-COMPLETE .....................................65
+ 8.15. DTMF Detection ...........................................67
+ 9. Future Study ...................................................67
+ 10. Security Considerations .......................................67
+ 11. RTSP-Based Examples ...........................................67
+ 12. Informative References ........................................74
+ Appendix A. ABNF Message Definitions ..............................76
+ Appendix B. Acknowledgements ......................................84
+
+1. Introduction
+
+ The Media Resource Control Protocol (MRCP) is designed to provide a
+ mechanism for a client device requiring audio/video stream processing
+ to control processing resources on the network. These media
+ processing resources may be speech recognizers (a.k.a. Automatic-
+ Speech-Recognition (ASR) engines), speech synthesizers (a.k.a. Text-
+ To-Speech (TTS) engines), fax, signal detectors, etc. MRCP allows
+ implementation of distributed Interactive Voice Response platforms,
+ for example VoiceXML [6] interpreters. The MRCP protocol defines the
+ requests, responses, and events needed to control the media
+ processing resources. The MRCP protocol defines the state machine
+ for each resource and the required state transitions for each request
+ and server-generated event.
+
+ The MRCP protocol does not address how the control session is
+ established with the server and relies on the Real Time Streaming
+ Protocol (RTSP) [2] to establish and maintain the session. The
+ session control protocol is also responsible for establishing the
+ media connection from the client to the network server. The MRCP
+ protocol and its messaging is designed to be carried over RTSP or
+ another protocol as a MIME-type similar to the Session Description
+ Protocol (SDP) [5].
+
+
+
+
+Shanmugham, et al. Informational [Page 3]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [8].
+
+2. Architecture
+
+ The system consists of a client that needs media streams generated
+ or processed and a server that has the resources or devices to
+ generate or process those streams. The client
+ establishes a control session with the server for media processing
+ using a protocol such as RTSP. This will also set up and establish
+ the RTP stream between the client and the server or another RTP
+ endpoint. Each resource needed in processing or generating the
+ stream is addressed or referred to by a URL. The client can now use
+ MRCP messages to control the media resources and affect how they
+ process or generate the media stream.
+
+   |--------------------|
+   ||------------------||                   |----------------------|
+   || Application Layer||                   ||--------------------||
+   ||------------------||                   ||  TTS | ASR  | Fax  ||
+   ||   ASR/TTS API    ||                   ||Plugin|Plugin|Plugin||
+   ||------------------||                   ||  on  |  on  |  on  ||
+   ||    MRCP Core     ||                   || MRCP | MRCP | MRCP ||
+   ||  Protocol Stack  ||                   ||--------------------||
+   ||------------------||                   ||     RTSP Stack     ||
+   ||    RTSP Stack    ||                   ||                    ||
+   ||------------------||                   ||--------------------||
+   ||   TCP/IP Stack   ||========IP=========||    TCP/IP Stack    ||
+   ||------------------||                   ||--------------------||
+   |--------------------|                   |----------------------|
+
+        MRCP client                      Real-time Streaming MRCP
+                                                media server
+
+2.1. Resources and Services
+
+ The server is set up to offer a certain set of resources and services
+ to the client. These resources are of 3 types.
+
+ Transmission Resources
+
+ These are resources that are capable of generating real-time streams,
+ like signal generators that generate tones and sounds of certain
+ frequencies and patterns, and speech synthesizers that generate
+ spoken audio streams, etc.
+
+
+
+
+
+Shanmugham, et al. Informational [Page 4]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ Reception Resources
+
+ These are resources that receive and process streaming data like
+ signal detectors and speech recognizers.
+
+ Dual Mode Resources
+
+ These are resources that both send and receive data like a fax
+ resource, capable of sending or receiving fax through a two-way RTP
+ stream.
+
+2.2. Server and Resource Addressing
+
+ The server as a whole is addressed using a container URL, and the
+ individual resources the server has to offer are reached by
+ individual resource URLs within the container URL.
+
+ RTSP Example:
+
+ A media server or container URL like,
+
+ rtsp://mediaserver.com/media/
+
+ may contain one or more resource URLs of the form,
+
+ rtsp://mediaserver.com/media/speechrecognizer/
+ rtsp://mediaserver.com/media/speechsynthesizer/
+ rtsp://mediaserver.com/media/fax/
+
+3. MRCP Protocol Basics
+
+ The message format for MRCP is text based, with mechanisms to carry
+ embedded binary data. This allows data like recognition grammars,
+ recognition results, synthesizer speech markup, etc., to be carried
+ in the MRCP message between the client and the server resource. The
+ protocol does not address session control management, media
+ management, reliable sequencing, and delivery or server or resource
+ addressing. These are left to a protocol like SIP or RTSP. MRCP
+ addresses the issue of controlling and communicating with the
+ resource processing the stream, and defines the requests, responses,
+ and events needed to do that.
+
+3.1. Establishing Control Session and Media Streams
+
+ The control session between the client and the server is established
+ using a protocol like RTSP. This protocol will also set up the
+ appropriate RTP streams between the server and the client, allocating
+ ports and setting up transport parameters as needed. Each control
+
+
+
+Shanmugham, et al. Informational [Page 5]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ session is identified by a unique session-id. The format, usage, and
+ life cycle of the session-id are in accordance with the RTSP protocol.
+ The resources within the session are addressed by the individual
+ resource URLs.
+
+ The MRCP protocol is designed to work with and tunnel through another
+ protocol like RTSP, and augment its capabilities. MRCP relies on
+ RTSP headers for sequencing, reliability, and addressing to make sure
+ that messages get delivered reliably and in the correct order and to
+ the right resource. The MRCP messages are carried in the RTSP
+ message body. The media server delivers the MRCP message to the
+ appropriate resource or device by looking at the session-level
+ message headers and URL information. Another protocol, such as SIP
+ [4], could be used for tunneling MRCP messages.
+
+3.2. MRCP over RTSP
+
+ RTSP supports both TCP and UDP mechanisms for the client to talk to
+ the server; the transport is differentiated by the RTSP URL. All
+ MRCP-based media servers MUST support TCP for transport and MAY
+ support UDP.
+
+ In RTSP, the ANNOUNCE method/response MUST be used to carry MRCP
+ request/responses between the client and the server. MRCP messages
+ MUST NOT be communicated in the RTSP SETUP or TEARDOWN messages.
+
+ Currently all RTSP messages are request/responses and there is no
+ support for asynchronous events in RTSP. This is because RTSP was
+ designed to work over TCP or UDP and, hence, could not assume
+ reliability in the underlying protocol. Hence, when using MRCP over
+ RTSP, an asynchronous event from the MRCP server is packaged in a
+ server-initiated ANNOUNCE method/response communication. A future
+ RTSP extension to send asynchronous events from the server to the
+ client would provide an alternate vehicle to carry such asynchronous
+ MRCP events from the server.
+
+ An RTSP session is created when an RTSP SETUP message is sent from
+ the client to a server and is addressed to a server URL or any one of
+ its resource URLs without specifying a session-id. The server will
+ establish a session context and will respond with a session-id to the
+ client. This sequence will also set up the RTP transport parameters
+ between the client and the server, and then the server will be ready
+ to receive or send media streams. If the client wants to attach an
+ additional resource to an existing session, the client should send
+ that session's ID in the subsequent SETUP message.
+
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 6]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ When a media server implementing MRCP over RTSP receives a PLAY,
+ RECORD, or PAUSE RTSP method from an MRCP resource URL, it should
+ respond with an RTSP 405 "Method not Allowed" response. For these
+ resources, the only allowed RTSP methods are SETUP, TEARDOWN,
+ DESCRIBE, and ANNOUNCE.
+
+ Example 1:
+
+ C->S: ANNOUNCE rtsp://media.server.com/media/synthesizer RTSP/1.0
+ CSeq:4
+ Session:12345678
+ Content-Type:application/mrcp
+ Content-Length:223
+
+ SPEAK 543257 MRCP/1.0
+ Voice-gender:neutral
+ Voice-category:teenager
+ Prosody-volume:medium
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+ <speak>
+ <paragraph>
+ <sentence>You have 4 new messages.</sentence>
+ <sentence>The first is from <say-as
+ type="name">Stephanie Williams</say-as>
+ and arrived at <break/>
+ <say-as type="time">3:45pm</say-as>.</sentence>
+
+ <sentence>The subject is <prosody
+ rate="-20%">ski trip</prosody></sentence>
+ </paragraph>
+ </speak>
+
+ S->C: RTSP/1.0 200 OK
+ CSeq: 4
+ Session:12345678
+ RTP-Info:url=rtsp://media.server.com/media/synthesizer;
+ seq=9810092;rtptime=3450012
+ Content-Type:application/mrcp
+ Content-Length:52
+
+ MRCP/1.0 543257 200 IN-PROGRESS
+
+ S->C: ANNOUNCE rtsp://media.server.com/media/synthesizer RTSP/1.0
+ CSeq:6
+ Session:12345678
+
+
+
+Shanmugham, et al. Informational [Page 7]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ Content-Type:application/mrcp
+ Content-Length:123
+
+ SPEAK-COMPLETE 543257 COMPLETE MRCP/1.0
+
+ C->S: RTSP/1.0 200 OK
+ CSeq:6
+
+ For the sake of brevity, most examples from here on show only the
+ MRCP messages and do not show the RTSP messages and headers in which
+ they are tunneled. RTSP messages, such as responses, that do not
+ carry an MRCP message are also left out.
+
+3.3. Media Streams and RTP Ports
+
+ A single set of RTP/RTCP ports is negotiated and shared between the
+ MRCP client and server when multiple media processing resources, such
+ as automatic speech recognition (ASR) engines and text to speech
+ (TTS) engines, are used for a single session. The individual
+ resource instances allocated on the server under a common session
+ identifier will feed from/to that single RTP stream.
+
+ The client can send multiple media streams towards the server,
+ differentiated by using different synchronization source (SSRC)
+ identifier values. Similarly, the server can use multiple
+ synchronization source (SSRC) identifier values to differentiate media
+ streams originating from the individual transmission resource URLs if
+ more than one exists. The individual resources may, on the other
+ hand, work together to send just one stream to the client. This is
+ up to the implementation of the media server.
+
+4. Notational Conventions
+
+ Since many of the definitions and syntax are identical to HTTP/1.1,
+ this specification only points to the section where they are defined
+ rather than copying it. For brevity, [HX.Y] refers to Section X.Y of
+ the current HTTP/1.1 specification (RFC 2616 [1]).
+
+ All the mechanisms specified in this document are described in both
+ prose and an augmented Backus-Naur form (ABNF) similar to that used
+ in [H2.1]. It is described in detail in RFC 4234 [3].
+
+ The ABNF provided along with the descriptive text is informative in
+ nature and may not be complete. The complete message format in ABNF
+ form is provided in Appendix A and is the normative format
+ definition.
+
+
+
+
+
+Shanmugham, et al. Informational [Page 8]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+5. MRCP Specification
+
+ The MRCP PDU is textual using an ISO 10646 character set in the UTF-8
+ encoding (RFC 3629 [12]) to allow many different languages to be
+ represented. However, to assist in compact representations, MRCP
+ also allows other character sets such as ISO 8859-1 to be used when
+ desired. The MRCP protocol headers and field names use only the
+ US-ASCII subset of UTF-8. Internationalization only applies to
+ certain fields like grammar, results, speech markup, etc., and not to
+ MRCP as a whole.
+
+ Lines are terminated by CRLF, but receivers SHOULD be prepared to
+ also interpret CR and LF by themselves as line terminators. Also,
+ some parameters in the PDU may contain binary data or a record
+ spanning multiple lines. Such fields have a length value associated
+ with the parameter, which indicates the number of octets immediately
+ following the parameter.
+
+ The whole MRCP PDU is encoded in the body of the session level
+ message as a MIME entity of type application/mrcp. The individual
+ MRCP messages do not have addressing information regarding which
+ resource the request/response are to/from. Instead, the MRCP message
+ relies on the header of the session level message carrying it to
+ deliver the request to the appropriate resource, or to figure out who
+ the response or event is from.
+
+ The MRCP message set consists of requests from the client to the
+ server, responses from the server to the client and asynchronous
+ events from the server to the client. All these messages consist of
+ a start-line, one or more header fields (also known as "headers"), an
+ empty line (i.e., a line with nothing preceding the CRLF) indicating
+ the end of the header fields, and an optional message body.
+
+ generic-message = start-line
+ message-header
+ CRLF
+ [ message-body ]
+
+ message-body = *OCTET
+
+ start-line = request-line / status-line / event-line
+
+ The message-body contains resource-specific and message-specific data
+ that needs to be carried between the client and server as a MIME
+ entity. The information contained here and the actual MIME-types
+ used to carry the data are specified later when addressing the
+ specific messages.
+
+
+
+
+Shanmugham, et al. Informational [Page 9]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ If a message contains data in the message body, the header fields
+ will contain content-headers indicating the MIME-type and encoding of
+ the data in the message body.
+
+5.1. Request
+
+ An MRCP request consists of a Request line followed by zero or more
+ parameters as part of the message headers and an optional message
+ body containing data specific to the request message.
+
+ The Request message from a client to the server includes, within the
+ first line, the method to be applied, a method tag for that request,
+ and the version of protocol in use.
+
+ request-line = method-name SP request-id SP
+ mrcp-version CRLF
+
+ The request-id field is a unique identifier created by the client and
+ sent to the server. The server resource should use this identifier
+ in its response to this request. If the request does not complete
+ with the response, future asynchronous events associated with this
+ request MUST carry the request-id.
+
+ request-id = 1*DIGIT
+
+ The method-name field identifies the specific request that the client
+ is making to the server. Each resource supports a certain list of
+ requests or methods that can be issued to it, and will be addressed
+ in later sections.
+
+ method-name = synthesizer-method
+ / recognizer-method
+
+ The mrcp-version field is the MRCP protocol version that is being
+ used by the client.
+
+ mrcp-version = "MRCP" "/" 1*DIGIT "." 1*DIGIT
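+
+   The following is an informative example of a request, using the
+   synthesizer SET-PARAMS method (Section 7.6) with a hypothetical
+   request-id and illustrative header values:
+
+      SET-PARAMS 12345 MRCP/1.0
+      Voice-gender:female
+      Prosody-volume:soft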
+
+5.2. Response
+
+ After receiving and interpreting the request message, the server
+ resource responds with an MRCP response message. It consists of a
+ status line optionally followed by a message body.
+
+ response-line = mrcp-version SP request-id SP status-code SP
+ request-state CRLF
+
+
+
+
+
+Shanmugham, et al. Informational [Page 10]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ The mrcp-version field used here is similar to the one used in the
+ Request Line and indicates the version of MRCP protocol running on
+ the server.
+
+ The request-id used in the response MUST match the one sent in the
+ corresponding request message.
+
+ The status-code field is a 3-digit code representing the success or
+ failure or other status of the request.
+
+ The request-state field indicates if the job initiated by the Request
+ is PENDING, IN-PROGRESS, or COMPLETE. The COMPLETE status means that
+ the Request was processed to completion and that there will be no
+ more events from that resource to the client with that request-id.
+ The PENDING status means that the job has been placed on a queue and
+ will be processed in first-in-first-out order. The IN-PROGRESS
+ status means that the request is being processed and is not yet
+ complete. A PENDING or IN-PROGRESS status indicates that further
+ Event messages will be delivered with that request-id.
+
+ request-state = "COMPLETE"
+ / "IN-PROGRESS"
+ / "PENDING"
+
+5.2.1. Status Codes
+
+ The status codes are classified under the Success(2XX) codes and the
+ Failure(4XX) codes.
+
+5.2.1.1. Success 2xx
+
+ 200 Success
+ 201 Success with some optional parameters ignored.
+
+5.2.1.2. Failure 4xx
+
+ 401 Method not allowed
+ 402 Method not valid in this state
+ 403 Unsupported Parameter
+ 404 Illegal Value for Parameter
+ 405 Not found (e.g., Resource URI not initialized
+ or doesn't exist)
+ 406 Mandatory Parameter Missing
+ 407 Method or Operation Failed (e.g., Grammar compilation
+ failed in the recognizer. Detailed cause codes MAY BE
+ available through a resource specific header field.)
+ 408 Unrecognized or unsupported message entity
+
+
+
+
+Shanmugham, et al. Informational [Page 11]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ 409 Unsupported Parameter Value
+ 421-499 Resource specific Failure codes
+
+5.3. Event
+
+ The server resource may need to communicate a change in state or the
+ occurrence of a certain event to the client. These messages are used
+ when a request does not complete immediately and the response returns
+ a status of PENDING or IN-PROGRESS. The intermediate results and
+ events of the request are indicated to the client through the event
+ message from the server. Events have the request-id of the request
+ that is in progress and is generating these events and status value.
+ The status value is COMPLETE if the request is done and this was the
+ last event, else it is IN-PROGRESS.
+
+ event-line = event-name SP request-id SP request-state SP
+ mrcp-version CRLF
+
+ The mrcp-version used here is identical to the one used in the
+ Request/Response Line and indicates the version of MRCP protocol
+ running on the server.
+
+ The request-id used in the event should match the one sent in the
+ request that caused this event.
+
+ The request-state indicates if the Request/Command causing this event
+ is complete or still in progress, and is the same as the one
+ mentioned in Section 5.2. The final event will contain a COMPLETE
+ status indicating the completion of the request.
+
+ The event-name identifies the nature of the event generated by the
+ media resource. The set of valid event names depends on the
+ resource generating them and will be addressed in later sections.
+
+ event-name = synthesizer-event
+ / recognizer-event
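+
+   An informative example of an event line, using the recognizer
+   RECOGNITION-COMPLETE event (Section 8.14) and a hypothetical
+   request-id, is:
+
+      RECOGNITION-COMPLETE 543259 COMPLETE MRCP/1.0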
+
+5.4. Message Headers
+
+ MRCP header fields, which include general-header (Section 5.4) and
+ resource-specific-header (Sections 7.4 and 8.4), follow the same
+ generic format as that given in Section 2.1 of RFC 2822 [7]. Each
+ header field consists of a name followed by a colon (":") and the
+ field value. Field names are case-insensitive. The field value MAY
+ be preceded by any amount of linear whitespace (LWS), though a single
+ SP is preferred. Header fields can be extended over multiple lines
+ by preceding each extra line with at least one SP or HT.
+
+
+
+
+Shanmugham, et al. Informational [Page 12]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ message-header = 1*(generic-header / resource-header)
+
+ The order in which header fields with differing field names are
+ received is not significant. However, it is "good practice" to send
+ general-header fields first, followed by request-header or response-
+ header fields, and ending with the entity-header fields.
+
+ Multiple message-header fields with the same field-name MAY be
+ present in a message if and only if the entire field value for that
+ header field is defined as a comma-separated list (i.e., #(values)).
+
+ It MUST be possible to combine the multiple header fields into one
+ "field-name:field-value" pair, without changing the semantics of the
+ message, by appending each subsequent field-value to the first, each
+ separated by a comma. Therefore, the order in which header fields
+ with the same field-name are received is significant to the
+ interpretation of the combined field value, and thus a proxy MUST NOT
+ change the order of these field values when a message is forwarded.
+
+ Generic Headers
+
+ generic-header = active-request-id-list
+ / proxy-sync-id
+ / content-id
+ / content-type
+ / content-length
+ / content-base
+ / content-location
+ / content-encoding
+ / cache-control
+ / logging-tag
+
+ All headers in MRCP will be case insensitive, consistent with HTTP
+ and RTSP protocol header definitions.
+
+5.4.1. Active-Request-Id-List
+
+ In a request, this field indicates the list of request-ids to which
+ it should apply. This is useful when there are multiple Requests
+ that are PENDING or IN-PROGRESS and you want this request to apply to
+ one or more of these specifically.
+
+ In a response, this field returns the list of request-ids that the
+ operation modified or were in progress or just completed. There
+ could be one or more requests that returned a request-state of
+ PENDING or IN-PROGRESS. When a method affecting one or more PENDING
+
+
+
+
+
+Shanmugham, et al. Informational [Page 13]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ or IN-PROGRESS requests is sent from the client to the server, the
+ response MUST contain the list of request-ids that were affected in
+ this header field.
+
+ The active-request-id-list is only used in requests and responses,
+ not in events.
+
+ For example, if a STOP request with no active-request-id-list is sent
+ to a synthesizer resource (a wildcard STOP) that has one or more
+ SPEAK requests in the PENDING or IN-PROGRESS state, all SPEAK
+ requests MUST be cancelled, including the one IN-PROGRESS. In
+ addition, the response to the STOP request would contain the
+ request-id of all the SPEAK requests that were terminated in the
+ active-request-id-list. In this case, no SPEAK-COMPLETE or
+ RECOGNITION-COMPLETE events will be sent for these terminated
+ requests.
+
+ active-request-id-list = "Active-Request-Id-List" ":" request-id
+ *("," request-id) CRLF
+
+5.4.2. Proxy-Sync-Id
+
+ When any server resource generates a barge-in-able event, it will
+ generate a unique Tag and send it as a header field in an event to
+ the client. The client then acts as a proxy to the server resource
+ and sends a BARGE-IN-OCCURRED method (Section 7.10) to the
+ synthesizer server resource with the Proxy-Sync-Id it received from
+ the server resource. When the recognizer and synthesizer resources
+ are part of the same session, they may choose to work together to
+ achieve quicker interaction and response. Here, the proxy-sync-id
+ helps the resource receiving the event, proxied by the client, to
+ decide if this event has been processed through a direct interaction
+ of the resources.
+
+ proxy-sync-id = "Proxy-Sync-Id" ":" 1*ALPHA CRLF
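+
+   The following informative exchange, with hypothetical request-ids
+   and an illustrative tag value, shows a recognizer START-OF-SPEECH
+   event carrying a Proxy-Sync-Id that the client then proxies back in
+   a BARGE-IN-OCCURRED request to the synthesizer:
+
+   S->C: START-OF-SPEECH 543262 IN-PROGRESS MRCP/1.0
+         Proxy-Sync-Id:Abcdefgh
+
+   C->S: BARGE-IN-OCCURRED 543263 MRCP/1.0
+         Proxy-Sync-Id:Abcdefgh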
+
+5.4.3. Accept-Charset
+
+ See [H14.2]. This specifies the acceptable character set for
+ entities returned in the response or events associated with this
+ request. This is useful in specifying the character set to use in
+ the Natural Language Semantics Markup Language (NLSML) results of a
+ RECOGNITION-COMPLETE event.
+
+
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 14]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+5.4.4. Content-Type
+
+ See [H14.17]. Note that the content types suitable for MRCP are
+ restricted to speech markup, grammar, recognition results, etc., and
+ are specified later in this document. The multipart content type
+ "multipart/mixed" is supported to communicate multiple of the above-
+ mentioned content types, in which case the body parts cannot contain
+ any MRCP-specific headers.
+
+5.4.5. Content-Id
+
+ This field contains an ID or name for the content, by which it can be
+ referred to. The definition of this field conforms to RFC 2392 [14],
+ RFC 2822 [7], RFC 2046 [13] and is needed in multi-part messages. In
+ MRCP whenever the content needs to be stored, by either the client or
+ the server, it is stored associated with this ID. Such content can
+ be referenced during the session in URI form using the session:URI
+ scheme described in a later section.
+
+5.4.6. Content-Base
+
+ The content-base entity-header field may be used to specify the base
+ URI for resolving relative URLs within the entity.
+
+ content-base = "Content-Base" ":" absoluteURI CRLF
+
+ Note, however, that the base URI of the contents within the entity-
+ body may be redefined within that entity-body. An example of this
+ would be a multi-part MIME entity, which in turn can have multiple
+ entities within it.
+
+5.4.7. Content-Encoding
+
+ The content-encoding entity-header field is used as a modifier to the
+ media-type. When present, its value indicates what additional
+ content coding has been applied to the entity-body, and thus what
+ decoding mechanisms must be applied in order to obtain the media-type
+ referenced by the content-type header field. Content-encoding is
+ primarily used to allow a document to be compressed without losing
+ the identity of its underlying media type.
+
+ content-encoding = "Content-Encoding" ":"
+ *WSP content-coding
+ *(*WSP "," *WSP content-coding *WSP )
+ CRLF
+
+ content-coding = token
+
+
+
+
+Shanmugham, et al. Informational [Page 15]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ token = 1*(alphanum / "-" / "." / "!" / "%" / "*"
+ / "_" / "+" / "`" / "'" / "~" )
+
+ Content coding is defined in [H3.5]. An example of its use is
+
+ Content-Encoding:gzip
+
+ If multiple encodings have been applied to an entity, the content
+ codings MUST be listed in the order in which they were applied.
+
+5.4.8. Content-Location
+
+ The content-location entity-header field MAY BE used to supply the
+ resource location for the entity enclosed in the message when that
+ entity is accessible from a location separate from the requested
+ resource's URI.
+
+ content-location = "Content-Location" ":" ( absoluteURI /
+ relativeURI ) CRLF
+
+ The content-location value is a statement of the location of the
+ resource corresponding to this particular entity at the time of the
+ request. The media server MAY use this header field to optimize
+ certain operations. When providing this header field, the entity
+ being sent should not have been modified from what was retrieved from
+ the content-location URI.
+
+ For example, if the client provided a grammar markup inline, and it
+ had previously retrieved it from a certain URI, that URI can be
+ provided as part of the entity, using the content-location header
+ field. This allows a resource like the recognizer to look into its
+ cache to see if this grammar was previously retrieved, compiled, and
+ cached, in which case it might optimize by using the previously
+ compiled grammar object.
+
+ If the content-location is a relative URI, the relative URI is
+ interpreted relative to the content-base URI.
+
+5.4.9. Content-Length
+
+ This field contains the length of the content of the message body
+ (i.e., after the double CRLF following the last header field).
+ Unlike HTTP, it MUST be included in all messages that carry content
+ beyond the header portion of the message. If it is missing, a
+ default value of zero is assumed. It is interpreted according to
+ [H14.13].
+
+
+
+
+
+Shanmugham, et al. Informational [Page 16]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+5.4.10. Cache-Control
+
+ If the media server plans on implementing caching, it MUST adhere to
+ the cache correctness rules of HTTP/1.1 (RFC 2616), when accessing
+ and caching HTTP URIs. In particular, the Expires and Cache-Control
+ headers of the cached URI or document must be honored and will always
+ take precedence over the Cache-Control defaults set by this header
+ field. The cache-control directives are used to define the default
+ caching algorithms on the media server for the session or request.
+ The scope of the directive is based on the method it is sent on. If
+ the directives are sent on a SET-PARAMS method, it SHOULD apply for
+ all requests for documents the media server may make in that session.
+ If the directives are sent on any other messages, they MUST only
+ apply to document requests the media server needs to make for that
+ method. An empty cache-control header on the GET-PARAMS method is a
+ request for the media server to return the current cache-control
+ directives setting on the server.
+
+ cache-control = "Cache-Control" ":" *WSP cache-directive
+ *( *WSP "," *WSP cache-directive *WSP )
+ CRLF
+
+ cache-directive = "max-age" "=" delta-seconds
+ / "max-stale" "=" delta-seconds
+ / "min-fresh" "=" delta-seconds
+
+ delta-seconds = 1*DIGIT
+
+ Here, delta-seconds is a time value to be specified as an integer
+ number of seconds, represented in decimal, after the time that the
+ message response or data was received by the media server.
+
+ These directives allow the media server to override the basic
+ expiration mechanism.
+
+ max-age
+
+ Indicates that the client is OK with the media server using a
+ response whose age is no greater than the specified time in
+ seconds. Unless a max-stale directive is also included, the
+ client is not willing to accept the media server using a stale
+ response.
+
+ min-fresh
+
+ Indicates that the client is willing to accept the media server
+ using a response whose freshness lifetime is no less than its
+ current age plus the specified time in seconds. That is, the
+
+
+
+Shanmugham, et al. Informational [Page 17]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ client wants the media server to use a response that will still be
+ fresh for at least the specified number of seconds.
+
+ max-stale
+
+ Indicates that the client is willing to accept the media server
+ using a response that has exceeded its expiration time. If max-
+ stale is assigned a value, then the client is willing to accept
+ the media server using a response that has exceeded its expiration
+ time by no more than the specified number of seconds. If no value
+ is assigned to max-stale, then the client is willing to accept the
+ media server using a stale response of any age.
+
+ The media server cache MAY BE requested to use stale response/data
+ without validation, but only if this does not conflict with any
+ "MUST"-level requirements concerning cache validation (e.g., a
+ "must-revalidate" cache-control directive) in the HTTP 1.1
+ specification pertaining to the URI.
+
+ If both the MRCP cache-control directive and the cached entry on the
+ media server include "max-age" directives, then the lesser of the two
+ values is used for determining the freshness of the cached entry for
+ that request.
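+
+   For example (informative, with an illustrative value of one day),
+   a client could set a session-wide caching default with:
+
+      SET-PARAMS 543264 MRCP/1.0
+      Cache-Control:max-age=86400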
+
+5.4.11. Logging-Tag
+
+ This header field MAY BE sent as part of a SET-PARAMS/GET-PARAMS
+ method to set the logging tag for logs generated by the media server.
+ Once set, the value persists until a new value is set or the session
+ is ended. The MRCP server should provide a mechanism to subset its
+ output logs so that system administrators can examine or extract only
+ the log file portion during which the logging tag was set to a
+ certain value.
+
+ MRCP clients using this feature should take care to ensure that no
+ two clients specify the same logging tag. In the event that two
+ clients specify the same logging tag, the effect on the MRCP server's
+ output logs is undefined.
+
+ logging-tag = "Logging-Tag" ":" 1*ALPHA CRLF
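+
+   An informative example, using an illustrative tag value (note that
+   the ABNF above restricts the tag to alphabetic characters):
+
+      SET-PARAMS 543265 MRCP/1.0
+      Logging-Tag:OutboundCampaignLogs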
+
+
+
+
+
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 18]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+6. Media Server
+
+ The capability of media server resources can be found using the RTSP
+ DESCRIBE mechanism. When a client issues an RTSP DESCRIBE method for
+ a media resource URI, the media server response MUST contain an SDP
+ description in its body describing the capabilities of the media
+ server resource. The SDP description MUST contain at a minimum the
+ media header (m-line) describing the codec and other media related
+ features it supports. It MAY contain another SDP header as well, but
+ support for it is optional.
+
+ The usage of SDP messages in the RTSP message body and its
+ application follows the SIP RFC 2543 [4], but is limited to media-
+ related negotiation and description.
+
+6.1. Media Server Session
+
+ As discussed in Section 3.2, a client/server should share one RTSP
+ session-id for the different resources it may use under the same
+ session. The client MUST allocate a set of client RTP/RTCP ports for
+ a new session and MUST NOT send a Session-ID in the SETUP message for
+ the first resource. The server then creates a Session-ID and
+ allocates a set of server RTP/RTCP ports and responds to the SETUP
+ message.
+
+ If the client wants to open more resources with the same server under
+ the same session, it will send the session-id (that it got in the
+ earlier SETUP response) in the SETUP for the new resource. A SETUP
+ message with an existing session-id tells the server that this new
+ resource will feed from/into the same RTP/RTCP stream of that
+ existing session.
+
+ If the client wants to open a resource from a media server that is
+ not where the first resource came from, it will send separate SETUP
+ requests with no session-id header field in them. Each server will
+ allocate its own session-id and return it in the response. Each of
+ them will also come back with their own set of RTP/RTCP ports. This
+ would be the case when the synthesizer engine and the recognition
+ engine are on different servers.
+
+ The RTSP SETUP method SHOULD contain an SDP description of the media
+ stream being set up. The RTSP SETUP response MUST contain an SDP
+ description of the media stream that it expects to receive and send
+ on that session.
+
+ The SDP description in the SETUP method from the client SHOULD
+ describe the required media parameters like codec, Named Signaling
+ Event (NSE) payload types, etc. This could have multiple media
+
+
+
+Shanmugham, et al. Informational [Page 19]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ headers (i.e., m-lines) to allow the client to provide the media
+ server with more than one option to choose from.
+
+ The SDP description in the SETUP response should reflect the media
+ parameters that the media server will be using for the stream. It
+ should be within the choices that were specified in the SDP of the
+ SETUP method, if one was provided.
+
+ Example:
+
+ C->S:
+
+ SETUP rtsp://media.server.com/recognizer/ RTSP/1.0
+ CSeq:1
+ Transport:RTP/AVP;unicast;client_port=46456-46457
+ Content-Type:application/sdp
+ Content-Length:190
+
+ v=0
+ o=- 123 456 IN IP4 10.0.0.1
+ s=Media Server
+ p=+1-888-555-1212
+ c=IN IP4 0.0.0.0
+ t=0 0
+ m=audio 46456 RTP/AVP 0 96
+ a=rtpmap:0 pcmu/8000
+ a=rtpmap:96 telephone-event/8000
+ a=fmtp:96 0-15
+
+ S->C:
+
+ RTSP/1.0 200 OK
+ CSeq:1
+ Session:0a030258_00003815_3bc4873a_0001_0000
+ Transport:RTP/AVP;unicast;client_port=46456-46457;
+ server_port=46460-46461
+ Content-Length:190
+ Content-Type:application/sdp
+
+ v=0
+ o=- 3211724219 3211724219 IN IP4 10.3.2.88
+ s=Media Server
+ c=IN IP4 0.0.0.0
+ t=0 0
+ m=audio 46460 RTP/AVP 0 96
+ a=rtpmap:0 pcmu/8000
+ a=rtpmap:96 telephone-event/8000
+ a=fmtp:96 0-15
+
+
+
+Shanmugham, et al. Informational [Page 20]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ If an SDP description was not provided in the RTSP SETUP method, then
+ the media server may decide on parameters of the stream but MUST
+ specify what it chooses in the SETUP response. An SDP announcement
+ is only returned in a response to a SETUP message that does not
+ specify a Session. That is, the server will not return an SDP
+ announcement for the synthesizer SETUP of a session already
+ established with a recognizer.
+
+ C->S:
+
+ SETUP rtsp://media.server.com/recognizer/ RTSP/1.0
+ CSeq:1
+ Transport:RTP/AVP;unicast;client_port=46498
+
+ S->C:
+
+ RTSP/1.0 200 OK
+ CSeq:1
+ Session:0a030258_000039dc_3bc48a13_0001_0000
+ Transport:RTP/AVP;unicast; client_port=46498;
+ server_port=46502-46503
+ Content-Length:193
+ Content-Type:application/sdp
+
+ v=0
+ o=- 3211724947 3211724947 IN IP4 10.3.2.88
+ s=Media Server
+ c=IN IP4 0.0.0.0
+ t=0 0
+ m=audio 46502 RTP/AVP 0 101
+ a=rtpmap:0 pcmu/8000
+ a=rtpmap:101 telephone-event/8000
+ a=fmtp:101 0-15
+
+7. Speech Synthesizer Resource
+
+ This resource is capable of converting text provided by the client
+ into a speech stream generated in real time. Depending on the
+ implementation and capability of this resource, the client can
+ control parameters like voice characteristics, speaker speed, etc.
+
+ The synthesizer resource is controlled by MRCP requests from the
+ client. Similarly, the resource can respond to these requests or
+ generate asynchronous events to the client to indicate certain
+ conditions during the processing of the stream.
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 21]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+7.1. Synthesizer State Machine
+
+ The synthesizer maintains states because it needs to correlate MRCP
+ requests from the client. The state transitions shown below describe
+ the states of the synthesizer and reflect the request at the head of
+ the queue. A SPEAK request in the PENDING state can be deleted or
+ stopped by a STOP request and does not affect the state of the
+ resource.
+
+ Idle Speaking Paused
+ State State State
+ | | |
+ |----------SPEAK------->| |--------|
+ |<------STOP------------| CONTROL |
+ |<----SPEAK-COMPLETE----| |------->|
+ |<----BARGE-IN-OCCURRED-| |
+ | |--------| |
+ | CONTROL |-----------PAUSE--------->|
+ | |------->|<----------RESUME---------|
+ | | |----------|
+ | | PAUSE |
+ | | |--------->|
+ | |--------|----------| |
+ | BARGE-IN-OCCURRED | SPEECH-MARKER |
+ | |------->|<---------| |
+ |----------| | |------------|
+ | STOP | SPEAK |
+ | | | |----------->|
+ |<---------| |
+ |<-------------------STOP--------------------------|
+
+7.2. Synthesizer Methods
+
+ The synthesizer supports the following methods.
+
+ synthesizer-method = "SET-PARAMS"
+ / "GET-PARAMS"
+ / "SPEAK"
+ / "STOP"
+ / "PAUSE"
+ / "RESUME"
+ / "BARGE-IN-OCCURRED"
+ / "CONTROL"
+
+
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 22]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+7.3. Synthesizer Events
+
+ The synthesizer may generate the following events.
+
+ synthesizer-event = "SPEECH-MARKER"
+ / "SPEAK-COMPLETE"
+
+7.4. Synthesizer Header Fields
+
+ A synthesizer message may contain header fields containing request
+ options and information to augment the Request, Response, or Event of
+ the message with which it is associated.
+
+ synthesizer-header = jump-target ; Section 7.4.1
+ / kill-on-barge-in ; Section 7.4.2
+ / speaker-profile ; Section 7.4.3
+ / completion-cause ; Section 7.4.4
+ / voice-parameter ; Section 7.4.5
+ / prosody-parameter ; Section 7.4.6
+ / vendor-specific ; Section 7.4.7
+ / speech-marker ; Section 7.4.8
+ / speech-language ; Section 7.4.9
+ / fetch-hint ; Section 7.4.10
+ / audio-fetch-hint ; Section 7.4.11
+ / fetch-timeout ; Section 7.4.12
+ / failed-uri ; Section 7.4.13
+ / failed-uri-cause ; Section 7.4.14
+ / speak-restart ; Section 7.4.15
+ / speak-length ; Section 7.4.16
+
+   Parameter            Support     Methods/Events/Response
+
+   jump-target          MANDATORY   SPEAK, CONTROL
+   logging-tag          MANDATORY   SET-PARAMS, GET-PARAMS
+   kill-on-barge-in     MANDATORY   SPEAK
+   speaker-profile      OPTIONAL    SET-PARAMS, GET-PARAMS,
+                                    SPEAK, CONTROL
+   completion-cause     MANDATORY   SPEAK-COMPLETE
+   voice-parameter      MANDATORY   SET-PARAMS, GET-PARAMS,
+                                    SPEAK, CONTROL
+   prosody-parameter    MANDATORY   SET-PARAMS, GET-PARAMS,
+                                    SPEAK, CONTROL
+   vendor-specific      MANDATORY   SET-PARAMS, GET-PARAMS
+   speech-marker        MANDATORY   SPEECH-MARKER
+   speech-language      MANDATORY   SET-PARAMS, GET-PARAMS, SPEAK
+   fetch-hint           MANDATORY   SET-PARAMS, GET-PARAMS, SPEAK
+   audio-fetch-hint     MANDATORY   SET-PARAMS, GET-PARAMS, SPEAK
+   fetch-timeout        MANDATORY   SET-PARAMS, GET-PARAMS, SPEAK
+
+
+
+Shanmugham, et al. Informational [Page 23]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+   failed-uri           MANDATORY   Any
+   failed-uri-cause     MANDATORY   Any
+   speak-restart        MANDATORY   CONTROL
+   speak-length         MANDATORY   SPEAK, CONTROL
+
+7.4.1. Jump-Target
+
+ This parameter MAY BE specified in a CONTROL method and controls the
+ jump size to move forward or rewind backward on an active SPEAK
+ request. A + or - indicates a relative value to what is being
+ currently played. This MAY BE specified in a SPEAK request to
+ indicate an offset into the speech markup that the SPEAK request
+ should start speaking from. The different speech length units
+ supported are dependent on the synthesizer implementation. If it
+ does not support a unit or the operation, the resource SHOULD respond
+ with a status code of 404 "Illegal or Unsupported value for
+ parameter".
+
+ jump-target = "Jump-Size" ":" speech-length-value CRLF
+ speech-length-value = numeric-speech-length
+ / text-speech-length
+ text-speech-length = 1*ALPHA SP "Tag"
+ numeric-speech-length= ("+" / "-") 1*DIGIT SP
+ numeric-speech-unit
+ numeric-speech-unit = "Second"
+ / "Word"
+ / "Sentence"
+ / "Paragraph"
+
+7.4.2. Kill-On-Barge-In
+
+ This parameter MAY BE sent as part of the SPEAK method to enable
+ kill-on-barge-in support. If enabled, the SPEAK method is
+ interrupted by DTMF input detected by a signal detector resource or
+ by the start of speech sensed or recognized by the speech recognizer
+ resource.
+
+ kill-on-barge-in = "Kill-On-Barge-In" ":" boolean-value CRLF
+ boolean-value = "true" / "false"
+
+ If the recognizer or signal detector resource is on the same server
+ as the synthesizer, the server should be intelligent enough to
+ recognize their interactions by their common RTSP session-id and work
+ with each other to provide kill-on-barge-in support. The client
+ needs to send a BARGE-IN-OCCURRED method to the synthesizer resource
+ when it receives a barge-in-able event from the recognizer resource
+
+
+
+
+
+Shanmugham, et al. Informational [Page 24]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ or signal detector resource. These resources MAY BE local or
+ distributed. If this field is not specified, the value defaults to
+ "true".
+
+7.4.3. Speaker Profile
+
+ This parameter MAY BE part of the SET-PARAMS/GET-PARAMS or SPEAK
+ request from the client to the server and specifies the profile of
+ the speaker by a URI, which may be a set of voice parameters like
+ gender, accent, etc.
+
+ speaker-profile = "Speaker-Profile" ":" uri CRLF
+
+7.4.4. Completion Cause
+
+ This header field MUST be specified in a SPEAK-COMPLETE event coming
+ from the synthesizer resource to the client. This indicates the
+ reason behind the SPEAK request completion.
+
+ completion-cause = "Completion-Cause" ":" 1*DIGIT SP 1*ALPHA
+ CRLF
+
+   Cause-Code  Cause-Name       Description
+   000         normal           SPEAK completed normally.
+   001         barge-in         SPEAK request was terminated because
+                                of barge-in.
+   002         parse-failure    SPEAK request terminated because of a
+                                failure to parse the speech markup
+                                text.
+   003         uri-failure      SPEAK request terminated because
+                                access to one of the URIs failed.
+   004         error            SPEAK request terminated prematurely
+                                due to synthesizer error.
+   005         language-unsupported
+                                Language not supported.
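+
+   For example (informative, with a hypothetical request-id), a normal
+   completion would be reported as:
+
+      SPEAK-COMPLETE 543267 COMPLETE MRCP/1.0
+      Completion-Cause:000 normal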
+
+7.4.5. Voice-Parameters
+
+ This set of parameters defines the voice of the speaker.
+
+ voice-parameter = "Voice-" voice-param-name ":"
+ voice-param-value CRLF
+
+ voice-param-name is any one of the attribute names under the voice
+ element specified in W3C's Speech Synthesis Markup Language
+ Specification [9]. The voice-param-value is any one of the value
+ choices of the corresponding voice element attribute specified in the
+ above section.
+
+
+
+
+Shanmugham, et al. Informational [Page 25]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ These header fields MAY BE sent in SET-PARAMS/GET-PARAMS request to
+ define/get default values for the entire session or MAY BE sent in
+ the SPEAK request to define default values for that speak request.
+ Furthermore, these attributes can be part of the speech text marked
+ up in Speech Synthesis Markup Language (SSML).
+
+ These voice parameter header fields can also be sent in a CONTROL
+ method to affect a SPEAK request in progress and change its behavior
+ on the fly. If the synthesizer resource does not support this
+ operation, it should respond back to the client with a status of
+ unsupported.
+
+7.4.6. Prosody-Parameters
+
+ This set of parameters defines the prosody of the speech.
+
+ prosody-parameter = "Prosody-" prosody-param-name ":"
+ prosody-param-value CRLF
+
+ prosody-param-name is any one of the attribute names under the
+ prosody element specified in W3C's Speech Synthesis Markup Language
+ Specification [9]. The prosody-param-value is any one of the value
+ choices of the corresponding prosody element attribute specified in
+ the above section.
+
+ These header fields MAY BE sent in SET-PARAMS/GET-PARAMS request to
+ define/get default values for the entire session or MAY BE sent in
+ the SPEAK request to define default values for that speak request.
+ Furthermore, these attributes can be part of the speech text marked
+ up in SSML.
+
+ The prosody parameter header fields in the SET-PARAMS or SPEAK
+ request only apply if the speech data is of type text/plain and does
+ not use a speech markup format.
+
+ These prosody parameter header fields MAY also be sent in a CONTROL
+ method to affect a SPEAK request in progress and to change its
+ behavior on the fly. If the synthesizer resource does not support
+ this operation, it should respond back to the client with a status of
+ unsupported.
+
+
+
+
+
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 26]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+7.4.7. Vendor-Specific Parameters
+
+ This set of headers allows the client to set vendor-specific
+ parameters.
+
+ vendor-specific = "Vendor-Specific-Parameters" ":"
+ vendor-specific-av-pair
+ *[";" vendor-specific-av-pair] CRLF
+
+ vendor-specific-av-pair = vendor-av-pair-name "="
+ vendor-av-pair-value
+
+ This header MAY BE sent in the SET-PARAMS/GET-PARAMS method and is
+ used to set vendor-specific parameters on the server side. The
+ vendor-av-pair-name can be any vendor-specific field name and
+ conforms to the XML vendor-specific attribute naming convention. The
+ vendor-av-pair-value is the value to set the attribute to and needs
+ to be quoted.
+
+ When asking the server to get the current value of these parameters,
+ this header can be sent in the GET-PARAMS method with the list of
+ vendor-specific attribute names to get, separated by semicolons.
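+
+   An informative example, using a hypothetical vendor attribute name
+   and value:
+
+      SET-PARAMS 543270 MRCP/1.0
+      Vendor-Specific-Parameters:com.example.rate="fast"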
+
+7.4.8. Speech Marker
+
+ This header field contains a marker tag that may be embedded in the
+ speech data. Most speech markup formats provide mechanisms to embed
+ marker fields between speech texts. The synthesizer will generate
+ SPEECH-MARKER events when it reaches these marker fields. This field
+ SHOULD be part of the SPEECH-MARKER event and will contain the marker
+ tag values.
+
+ speech-marker = "Speech-Marker" ":" 1*ALPHA CRLF
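+
+   An informative example, assuming a marker named "menustart" was
+   embedded in the speech markup of the active SPEAK request:
+
+      SPEECH-MARKER 543268 IN-PROGRESS MRCP/1.0
+      Speech-Marker:menustart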
+
+7.4.9. Speech Language
+
+ This header field specifies the default language of the speech data
+ if it is not specified in the speech data. The value of this header
+ field should follow RFC 3066 [16] for its values. This MAY occur in
+ SPEAK, SET-PARAMS, or GET-PARAMS request.
+
+ speech-language = "Speech-Language" ":" 1*ALPHA CRLF
+
+
+
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 27]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+7.4.10. Fetch Hint
+
+ When the synthesizer needs to fetch documents or other resources like
+ speech markup or audio files, etc., this header field controls URI
+ access properties. This defines when the synthesizer should retrieve
+ content from the server. A value of "prefetch" indicates a file may
+ be downloaded when the request is received, whereas "safe" indicates
+ a file that should only be downloaded when actually needed. The
+ default value is "prefetch". This header field MAY occur in SPEAK,
+ SET-PARAMS, or GET-PARAMS requests.
+
+ fetch-hint = "Fetch-Hint" ":" 1*ALPHA CRLF
+
+7.4.11. Audio Fetch Hint
+
+ When the synthesizer needs to fetch documents or other resources like
+ speech audio files, etc., this header field controls URI access
+ properties. This defines whether or not the synthesizer can attempt
+ to optimize speech by pre-fetching audio. The value is either "safe"
+ to say that audio is only fetched when it is needed, never before;
+ "prefetch" to permit, but not require the platform to pre-fetch the
+ audio; or "stream" to allow it to stream the audio fetches. The
+ default value is "prefetch". This header field MAY occur in SPEAK,
+ SET-PARAMS, or GET-PARAMS requests.
+
+ audio-fetch-hint = "Audio-Fetch-Hint" ":" 1*ALPHA CRLF
+
+7.4.12. Fetch Timeout
+
+ When the synthesizer needs to fetch documents or other resources like
+ speech audio files, etc., this header field controls URI access
+ properties. This defines the synthesizer timeout for resources the
+ media server may need to fetch from the network. This is specified
+ in milliseconds. The default value is platform-dependent. This
+ header field MAY occur in SPEAK, SET-PARAMS, or GET-PARAMS.
+
+ fetch-timeout = "Fetch-Timeout" ":" 1*DIGIT CRLF
+
+7.4.13. Failed URI
+
+ When a synthesizer method needs a synthesizer to fetch or access a
+ URI, and the access fails, the media server SHOULD provide the failed
+ URI in this header field in the method response.
+
+ failed-uri = "Failed-URI" ":" Url CRLF
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 28]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+7.4.14. Failed URI Cause
+
+ When a synthesizer method needs a synthesizer to fetch or access a
+ URI, and the access fails, the media server SHOULD provide the URI
+ specific or protocol-specific response code through this header field
+ in the method response. This field has been defined as alphanumeric
+ to accommodate all protocols, some of which might have a response
+ string instead of a numeric response code.
+
+ failed-uri-cause = "Failed-URI-Cause" ":" 1*ALPHA CRLF
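+
+ For illustration only (the status code, URI, and cause value shown
+ are examples, not normative requirements), a response to a SPEAK
+ request whose referenced document could not be fetched might carry
+ both of these header fields:
+
+ S->C:MRCP/1.0 543269 407 COMPLETE
+ Failed-URI:http://www.cisco.com/Missing-Prompt.sml
+ Failed-URI-Cause:Timeout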
+
+7.4.15. Speak Restart
+
+ When a CONTROL request to jump backward is issued to a currently
+ speaking synthesizer resource, and the jump goes beyond the start of
+ the speech data, the current SPEAK request re-starts from the
+ beginning of its speech data, and the response to the CONTROL
+ request contains this header indicating a restart. This header MAY
+ occur in the CONTROL response.
+
+ speak-restart = "Speak-Restart" ":" boolean-value CRLF
+
+7.4.16. Speak Length
+
+ This parameter MAY BE specified in a CONTROL method to control the
+ length of speech to speak, relative to the current speaking point in
+ the currently active SPEAK request. A negative ("-") value is
+ illegal in this field. If a value with a Tag unit is specified, then
+ the media server must speak until the tag is reached or the SPEAK
+ request completes, whichever comes first. This parameter MAY BE
+ specified in a SPEAK request to indicate the length to speak in the
+ speech data; in that case, it is relative to the point in the speech
+ data where the SPEAK request starts. The speech length units
+ supported are dependent on the synthesizer implementation. If it
+ does not support a unit or the operation, the resource SHOULD
+ respond with a status code of 404 "Illegal or Unsupported value for
+ parameter".
+
+ speak-length = "Speak-Length" ":" speech-length-value
+ CRLF
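+
+ An illustrative (non-normative) example; the request-ids and length
+ value are hypothetical:
+
+ C->S:CONTROL 543270 MRCP/1.0
+ Speak-Length:+30 Words
+
+ S->C:MRCP/1.0 543270 200 COMPLETE
+ Active-Request-Id-List:543265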
+
+7.5. Synthesizer Message Body
+
+ A synthesizer message may contain additional information associated
+ with the Method, Response, or Event in its message body.
+
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 29]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+7.5.1. Synthesizer Speech Data
+
+ Marked-up text for the synthesizer to speak is specified as a MIME
+ entity in the message body. The message to be spoken by the
+ synthesizer can be specified inline (by embedding the data in the
+ message body) or by reference (by providing the URI to the data). In
+ either case, the data and the format used to mark up the speech need
+ to be supported by the media server.
+
+ All media servers MUST support plain text speech data and W3C's
+ Speech Synthesis Markup Language [9] at a minimum and, hence, MUST
+ support the MIME types text/plain and application/synthesis+ssml at a
+ minimum.
+
+ If the speech data needs to be specified by URI reference, the MIME
+ type text/uri-list is used to specify the one or more URIs that will
+ list what needs to be spoken. If a list of speech URIs is specified,
+ speech data provided by each URI must be spoken in the order in which
+ the URIs are specified.
+
+ If the data to be spoken consists of a mix of URI and inline speech
+ data, the multipart/mixed MIME-type is used and embedded with the
+ MIME-blocks for text/uri-list, application/synthesis+ssml or
+ text/plain. The character set and encoding used in the speech data
+ may be specified according to standard MIME-type definitions. The
+ multi-part MIME-block can contain actual audio data in .wav or Sun
+ audio format. This is used when the client has audio clips that it
+ may have recorded, then stored in memory or a local device, and that
+ it currently needs to play as part of the SPEAK request. The audio
+ MIME-parts can be sent by the client as part of the multi-part MIME-
+ block. This audio will be referenced in the speech markup data that
+ will be another part in the multi-part MIME-block according to the
+ multipart/mixed MIME-type specification.
+
+ Example 1:
+ Content-Type:text/uri-list
+ Content-Length:176
+
+ http://www.cisco.com/ASR-Introduction.sml
+ http://www.cisco.com/ASR-Document-Part1.sml
+ http://www.cisco.com/ASR-Document-Part2.sml
+ http://www.cisco.com/ASR-Conclusion.sml
+
+ Example 2:
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+
+
+
+Shanmugham, et al. Informational [Page 30]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ <speak>
+ <paragraph>
+ <sentence>You have 4 new messages.</sentence>
+ <sentence>The first is from <say-as
+ type="name">Stephanie Williams</say-as>
+ and arrived at <break/>
+ <say-as type="time">3:45pm</say-as>.</sentence>
+
+ <sentence>The subject is <prosody
+ rate="-20%">ski trip</prosody></sentence>
+ </paragraph>
+ </speak>
+
+ Example 3:
+ Content-Type:multipart/mixed; boundary="--break"
+
+ --break
+ Content-Type:text/uri-list
+ Content-Length:176
+
+ http://www.cisco.com/ASR-Introduction.sml
+ http://www.cisco.com/ASR-Document-Part1.sml
+ http://www.cisco.com/ASR-Document-Part2.sml
+ http://www.cisco.com/ASR-Conclusion.sml
+
+ --break
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+ <speak>
+ <paragraph>
+ <sentence>You have 4 new messages.</sentence>
+ <sentence>The first is from <say-as
+ type="name">Stephanie Williams</say-as>
+ and arrived at <break/>
+ <say-as type="time">3:45pm</say-as>.</sentence>
+
+ <sentence>The subject is <prosody
+ rate="-20%">ski trip</prosody></sentence>
+ </paragraph>
+ </speak>
+ --break
+
+
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 31]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+7.6. SET-PARAMS
+
+ The SET-PARAMS method, from the client to server, tells the
+ synthesizer resource to define default synthesizer context
+ parameters, like voice characteristics and prosody, etc. If the
+ server accepted and set all parameters, it MUST return a Response-
+ Status of 200. If it chose to ignore some optional parameters, it
+ MUST return 201.
+
+ If some of the parameters being set are unsupported or have illegal
+ values, the server accepts and sets the remaining parameters and MUST
+ respond with a Response-Status of 403 or 404, and MUST include in the
+ response the header fields that could not be set.
+
+ Example:
+ C->S:SET-PARAMS 543256 MRCP/1.0
+ Voice-gender:female
+ Voice-category:adult
+ Voice-variant:3
+
+ S->C:MRCP/1.0 543256 200 COMPLETE
+
+7.7. GET-PARAMS
+
+ The GET-PARAMS method, from the client to server, asks the
+ synthesizer resource for its current synthesizer context parameters,
+ like voice characteristics and prosody, etc. The client SHOULD send
+ the list of parameters it wants to read from the server by listing a
+ set of empty parameter header fields. If a specific list is not
+ specified then the server SHOULD return all the settable parameters
+ including vendor-specific parameters and their current values. The
+ wildcard use can be very resource-intensive, as the number of settable
+ parameters can be large depending on the vendor. Hence, it is
+ RECOMMENDED that the client does not use the wildcard GET-PARAMS
+ operation very often.
+
+ Example:
+ C->S:GET-PARAMS 543256 MRCP/1.0
+ Voice-gender:
+ Voice-category:
+ Voice-variant:
+ Vendor-Specific-Parameters:com.mycorp.param1;
+ com.mycorp.param2
+
+ S->C:MRCP/1.0 543256 200 COMPLETE
+ Voice-gender:female
+ Voice-category:adult
+ Voice-variant:3
+
+
+
+Shanmugham, et al. Informational [Page 32]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ Vendor-Specific-Parameters:com.mycorp.param1="Company Name";
+ com.mycorp.param2="124324234@mycorp.com"
+
+7.8. SPEAK
+
+ The SPEAK method from the client to the server provides the
+ synthesizer resource with the speech text and initiates speech
+ synthesis and streaming. The SPEAK method can carry voice and
+ prosody header fields that define the behavior of the voice being
+ synthesized, as well as the actual marked-up text to be spoken. If
+ specific voice and prosody parameters are specified as part of the
+ speech markup text, they take precedence over the values specified
+ in the header fields, and over those set using a previous SET-PARAMS
+ request.
+
+ When applying voice parameters, there are three levels of scope.
+ Highest precedence goes to those specified within the speech markup
+ text, followed by those specified in the header fields of the SPEAK
+ request (which, hence, apply to that SPEAK request only), followed
+ by the session default values that can be set using the SET-PARAMS
+ request and apply to the whole session moving forward.
+
+ If the resource was idle when the SPEAK request arrived and the
+ request is being actively processed, the resource will respond with
+ a success status code and a request-state of IN-PROGRESS.
+
+ If the resource is in the speaking or paused states (i.e., it is in
+ the middle of processing a previous SPEAK request), the status
+ returns success and a request-state of PENDING. This means that this
+ SPEAK request is in queue and will be processed after the currently
+ active SPEAK request is completed.
+
+ For the synthesizer resource, this is the only request that can
+ return a request-state of IN-PROGRESS or PENDING. When the text to
+ be synthesized is complete, the resource will issue a SPEAK-COMPLETE
+ event with the request-id of the SPEAK message and a request-state of
+ COMPLETE.
+
+ Example:
+ C->S:SPEAK 543257 MRCP/1.0
+ Voice-gender:neutral
+ Voice-category:teenager
+ Prosody-volume:medium
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 33]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ <?xml version="1.0"?>
+ <speak>
+ <paragraph>
+ <sentence>You have 4 new messages.</sentence>
+ <sentence>The first is from <say-as
+ type="name">Stephanie Williams</say-as>
+ and arrived at <break/>
+ <say-as type="time">3:45pm</say-as>.</sentence>
+
+ <sentence>The subject is <prosody
+ rate="-20%">ski trip</prosody></sentence>
+ </paragraph>
+ </speak>
+
+ S->C:MRCP/1.0 543257 200 IN-PROGRESS
+
+ S->C:SPEAK-COMPLETE 543257 COMPLETE MRCP/1.0
+ Completion-Cause:000 normal
+
+7.9. STOP
+
+ The STOP method from the client to the server tells the resource to
+ stop speaking if it is speaking something.
+
+ The STOP request can be sent with an active-request-id-list header
+ field to stop the zero or more specific SPEAK requests that may be in
+ queue and return a response code of 200 (Success). If no active-
+ request-id-list header field is sent in the STOP request, it will
+ terminate all outstanding SPEAK requests.
+
+ If a STOP request successfully terminated one or more PENDING or
+ IN-PROGRESS SPEAK requests, then the response message contains an
+ active-request-id-list header field listing the SPEAK request-ids
+ that were terminated. Otherwise, there will be no active-request-
+ id-list header field in the response. No SPEAK-COMPLETE events will
+ be sent for these terminated requests.
+
+ If a SPEAK request that was IN-PROGRESS and speaking was stopped, the
+ next pending SPEAK request, if any, would become IN-PROGRESS and move
+ to the speaking state.
+
+ If a SPEAK request that was IN-PROGRESS and in the paused state was
+ stopped, the next pending SPEAK request, if any, would become
+ IN-PROGRESS and move to the paused state.
+
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 34]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ Example:
+ C->S:SPEAK 543258 MRCP/1.0
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+ <speak>
+ <paragraph>
+ <sentence>You have 4 new messages.</sentence>
+ <sentence>The first is from <say-as
+ type="name">Stephanie Williams</say-as>
+ and arrived at <break/>
+ <say-as type="time">3:45pm</say-as>.</sentence>
+
+ <sentence>The subject is <prosody
+ rate="-20%">ski trip</prosody></sentence>
+ </paragraph>
+ </speak>
+
+ S->C:MRCP/1.0 543258 200 IN-PROGRESS
+
+ C->S:STOP 543259 MRCP/1.0
+
+ S->C:MRCP/1.0 543259 200 COMPLETE
+ Active-Request-Id-List:543258
+
+7.10. BARGE-IN-OCCURRED
+
+ The BARGE-IN-OCCURRED method is a mechanism for the client to
+ communicate a barge-in-able event it detects to the speech resource.
+
+ This event is useful in two scenarios:
+
+ 1. The client has detected some events like DTMF digits or other
+ barge-in-able events and wants to communicate that to the
+ synthesizer.
+
+ 2. The recognizer resource and the synthesizer resource are on
+ different servers, in which case the client MUST act as a proxy,
+ receiving the event from the recognition resource and then sending
+ a BARGE-IN-OCCURRED method to the synthesizer. In such cases, the
+ BARGE-IN-OCCURRED method would also have a proxy-sync-id header
+ field received from the resource generating the original event.
+
+ If a SPEAK request is active with kill-on-barge-in enabled, and the
+ BARGE-IN-OCCURRED method is received, the synthesizer should stop
+ streaming out audio. It should also terminate any speech requests
+ queued behind the current active one, irrespective of whether they
+
+
+
+Shanmugham, et al. Informational [Page 35]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ have barge-in enabled or not. If a barge-in-able prompt was playing
+ and it was terminated, the response MUST contain the request-ids of
+ all SPEAK requests that were terminated in its active-request-id-
+ list. There will be no SPEAK-COMPLETE events generated for these
+ requests.
+
+ If the synthesizer and the recognizer are on the same server, they
+ could be optimized for a quicker kill-on-barge-in response by having
+ them interact directly based on a common RTSP session-id. In these
+ cases, the client MUST still proxy the recognition event through a
+ BARGE-IN-OCCURRED method, but the synthesizer resource may have
+ already stopped and sent a SPEAK-COMPLETE event with a barge-in
+ completion cause code. If there were no SPEAK requests terminated as
+ a result of the BARGE-IN-OCCURRED method, the response would still be
+ a 200 success, but MUST NOT contain an active-request-id-list header
+ field.
+
+ C->S:SPEAK 543258 MRCP/1.0
+ Voice-gender:neutral
+ Voice-category:teenager
+ Prosody-volume:medium
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+ <speak>
+ <paragraph>
+ <sentence>You have 4 new messages.</sentence>
+ <sentence>The first is from <say-as
+ type="name">Stephanie Williams</say-as>
+ and arrived at <break/>
+ <say-as type="time">3:45pm</say-as>.</sentence>
+ <sentence>The subject is <prosody
+ rate="-20%">ski trip</prosody></sentence>
+ </paragraph>
+ </speak>
+
+ S->C:MRCP/1.0 543258 200 IN-PROGRESS
+
+ C->S:BARGE-IN-OCCURRED 543259 MRCP/1.0
+ Proxy-Sync-Id:987654321
+
+ S->C:MRCP/1.0 543259 200 COMPLETE
+ Active-Request-Id-List:543258
+
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 36]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+7.11. PAUSE
+
+ The PAUSE method from the client to the server tells the resource to
+ pause speech, if it is speaking something. If a PAUSE method is
+ issued on a session when a SPEAK is not active, the server SHOULD
+ respond with a status of 402 "Method not valid in this state". If
+ a PAUSE method is issued on a session when a SPEAK is active and
+ paused, the server SHOULD respond with a status of 200 "Success".
+ If a SPEAK request was active, the server MUST return an active-
+ request-id-list header with the request-id of the SPEAK request that
+ was paused.
+
+ C->S:SPEAK 543258 MRCP/1.0
+ Voice-gender:neutral
+ Voice-category:teenager
+ Prosody-volume:medium
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+ <speak>
+ <paragraph>
+ <sentence>You have 4 new messages.</sentence>
+ <sentence>The first is from <say-as
+ type="name">Stephanie Williams</say-as>
+ and arrived at <break/>
+ <say-as type="time">3:45pm</say-as>.</sentence>
+
+ <sentence>The subject is <prosody
+ rate="-20%">ski trip</prosody></sentence>
+ </paragraph>
+ </speak>
+
+ S->C:MRCP/1.0 543258 200 IN-PROGRESS
+
+ C->S:PAUSE 543259 MRCP/1.0
+
+ S->C:MRCP/1.0 543259 200 COMPLETE
+ Active-Request-Id-List:543258
+
+7.12. RESUME
+
+ The RESUME method from the client to the server tells a paused
+ synthesizer resource to continue speaking. If a RESUME method is
+ issued on a session when a SPEAK is not active, the server SHOULD
+ respond with a status of 402 "Method not valid in this state". If
+ a RESUME method is issued on a session when a SPEAK is active and
+ speaking (i.e., not paused), the server SHOULD respond with a status
+
+
+
+Shanmugham, et al. Informational [Page 37]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ of 200 "Success". If a SPEAK request was active, the server MUST
+ return an active-request-id-list header with the request-id of the
+ SPEAK request that was resumed.
+
+ Example:
+ C->S:SPEAK 543258 MRCP/1.0
+ Voice-gender:neutral
+ Voice-category:teenager
+ Prosody-volume:medium
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+ <speak>
+ <paragraph>
+ <sentence>You have 4 new messages.</sentence>
+ <sentence>The first is from <say-as
+ type="name">Stephanie Williams</say-as>
+ and arrived at <break/>
+ <say-as type="time">3:45pm</say-as>.</sentence>
+
+ <sentence>The subject is <prosody
+ rate="-20%">ski trip</prosody></sentence>
+ </paragraph>
+ </speak>
+
+ S->C:MRCP/1.0 543258 200 IN-PROGRESS
+
+ C->S:PAUSE 543259 MRCP/1.0
+
+ S->C:MRCP/1.0 543259 200 COMPLETE
+ Active-Request-Id-List:543258
+
+ C->S:RESUME 543260 MRCP/1.0
+
+ S->C:MRCP/1.0 543260 200 COMPLETE
+ Active-Request-Id-List:543258
+
+7.13. CONTROL
+
+ The CONTROL method from the client to the server tells a synthesizer
+ that is speaking to modify what it is speaking on the fly. This
+ method is used to make the synthesizer jump forward or backward in
+ what is being spoken, change the speaking rate and other speaker
+ parameters, etc. It affects the active or IN-PROGRESS SPEAK request.
+ Depending on the implementation and capability of the synthesizer
+ resource, it may or may not support this operation or one or more of
+ its parameters.
+
+
+
+
+Shanmugham, et al. Informational [Page 38]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ When a CONTROL to jump forward is issued and the operation goes
+ beyond the end of the active SPEAK method's text, the request
+ succeeds. A SPEAK-COMPLETE event follows the response to the CONTROL
+ method. If there are more SPEAK requests in the queue, the
+ synthesizer resource will continue to process the next SPEAK method.
+ When a CONTROL to jump backwards is issued and the operation jumps to
+ the beginning of the speech data of the active SPEAK request, the
+ response to the CONTROL request contains the speak-restart header.
+
+ These two behaviors can be used to rewind or fast-forward across
+ multiple speech requests, if the client wants to break up a speech
+ markup text into multiple SPEAK requests.
+
+ If a SPEAK request was active when the CONTROL method was received,
+ the server MUST return an active-request-id-list header with the
+ Request-id of the SPEAK request that was active.
+
+ Example:
+ C->S:SPEAK 543258 MRCP/1.0
+ Voice-gender:neutral
+ Voice-category:teenager
+ Prosody-volume:medium
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+ <speak>
+ <paragraph>
+ <sentence>You have 4 new messages.</sentence>
+ <sentence>The first is from <say-as
+ type="name">Stephanie Williams</say-as>
+ and arrived at <break/>
+ <say-as type="time">3:45pm</say-as>.</sentence>
+
+ <sentence>The subject is <prosody
+ rate="-20%">ski trip</prosody></sentence>
+ </paragraph>
+ </speak>
+
+ S->C:MRCP/1.0 543258 200 IN-PROGRESS
+
+ C->S:CONTROL 543259 MRCP/1.0
+ Prosody-rate:fast
+
+ S->C:MRCP/1.0 543259 200 COMPLETE
+ Active-Request-Id-List:543258
+
+ C->S:CONTROL 543260 MRCP/1.0
+
+
+
+Shanmugham, et al. Informational [Page 39]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ Jump-Size:-15 Words
+
+ S->C:MRCP/1.0 543260 200 COMPLETE
+ Active-Request-Id-List:543258
+
+7.14. SPEAK-COMPLETE
+
+ This is an Event message from the synthesizer resource to the client
+ indicating that the SPEAK request was completed. The request-id
+ header field will match the request-id of the SPEAK request that
+ initiated the speech that just completed. The request-state field
+ should be COMPLETE indicating that this is the last Event with that
+ request-id, and that the request with that request-id is now
+ complete. The completion-cause header field specifies the cause code
+ pertaining to the status and reason of request completion such as the
+ SPEAK completed normally or because of an error or kill-on-barge-in,
+ etc.
+
+ Example:
+ C->S:SPEAK 543260 MRCP/1.0
+ Voice-gender:neutral
+ Voice-category:teenager
+ Prosody-volume:medium
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+ <speak>
+ <paragraph>
+ <sentence>You have 4 new messages.</sentence>
+ <sentence>The first is from <say-as
+ type="name">Stephanie Williams</say-as>
+ and arrived at <break/>
+ <say-as type="time">3:45pm</say-as>.</sentence>
+
+ <sentence>The subject is <prosody
+ rate="-20%">ski trip</prosody></sentence>
+ </paragraph>
+ </speak>
+
+ S->C:MRCP/1.0 543260 200 IN-PROGRESS
+
+ S->C:SPEAK-COMPLETE 543260 COMPLETE MRCP/1.0
+
+ Completion-Cause:000 normal
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 40]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+7.15. SPEECH-MARKER
+
+ This is an event generated by the synthesizer resource to the client
+ when it hits a marker tag in the speech markup it is currently
+ processing. The request-id field in the header matches the SPEAK
+ request request-id that initiated the speech. The request-state
+ field should be IN-PROGRESS as the speech is still not complete and
+ there is more to be spoken. The actual speech marker tag hit,
+ describing where the synthesizer is in the speech markup, is returned
+ in the speech-marker header field.
+
+ Example:
+ C->S:SPEAK 543261 MRCP/1.0
+ Voice-gender:neutral
+ Voice-category:teenager
+ Prosody-volume:medium
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+ <speak>
+ <paragraph>
+ <sentence>You have 4 new messages.</sentence>
+ <sentence>The first is from <say-as
+ type="name">Stephanie Williams</say-as>
+ and arrived at <break/>
+ <say-as type="time">3:45pm</say-as>.</sentence>
+ <mark name="here"/>
+ <sentence>The subject is
+ <prosody rate="-20%">ski trip</prosody>
+ </sentence>
+ <mark name="ANSWER"/>
+ </paragraph>
+ </speak>
+
+ S->C:MRCP/1.0 543261 200 IN-PROGRESS
+
+ S->C:SPEECH-MARKER 543261 IN-PROGRESS MRCP/1.0
+ Speech-Marker:here
+
+ S->C:SPEECH-MARKER 543261 IN-PROGRESS MRCP/1.0
+ Speech-Marker:ANSWER
+
+ S->C:SPEAK-COMPLETE 543261 COMPLETE MRCP/1.0
+ Completion-Cause:000 normal
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 41]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+8. Speech Recognizer Resource
+
+ The speech recognizer resource is capable of receiving an incoming
+ voice stream and providing the client with an interpretation of what
+ was spoken in textual form.
+
+8.1. Recognizer State Machine
+
+ The recognizer resource is controlled by MRCP requests from the
+ client. Similarly, the resource can respond to these requests or
+ generate asynchronous events to the client to indicate certain
+ conditions during the processing of the stream. Hence, the
+ recognizer maintains states to correlate MRCP requests from the
+ client. The state transitions are described below.
+
+ Idle Recognizing Recognized
+ State State State
+ | | |
+ |---------RECOGNIZE---->|---RECOGNITION-COMPLETE-->|
+ |<------STOP------------|<-----RECOGNIZE-----------|
+ | | |
+ | | |-----------|
+ | |--------| GET-RESULT |
+ | START-OF-SPEECH | |---------->|
+ |------------| |------->| |
+ | | |----------| |
+ | DEFINE-GRAMMAR | RECOGNITION-START-TIMERS |
+ |<-----------| |<---------| |
+ | | |
+ | | |
+ |-------| | |
+ | STOP | |
+ |<------| | |
+ | |
+ |<-------------------STOP--------------------------|
+ |<-------------------DEFINE-GRAMMAR----------------|
+
+8.2. Recognizer Methods
+
+ The recognizer supports the following methods.
+ recognizer-method = SET-PARAMS
+ / GET-PARAMS
+ / DEFINE-GRAMMAR
+ / RECOGNIZE
+ / GET-RESULT
+ / RECOGNITION-START-TIMERS
+ / STOP
+
+
+
+
+Shanmugham, et al. Informational [Page 42]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+8.3. Recognizer Events
+
+ The recognizer may generate the following events.
+
+ recognizer-event = START-OF-SPEECH
+ / RECOGNITION-COMPLETE
+
+8.4. Recognizer Header Fields
+
+ A recognizer message may contain header fields containing request
+ options and information to augment the Method, Response, or Event
+ message it is associated with.
+
+ recognizer-header = confidence-threshold ; Section 8.4.1
+ / sensitivity-level ; Section 8.4.2
+ / speed-vs-accuracy ; Section 8.4.3
+ / n-best-list-length ; Section 8.4.4
+ / no-input-timeout ; Section 8.4.5
+ / recognition-timeout ; Section 8.4.6
+ / waveform-url ; Section 8.4.7
+ / completion-cause ; Section 8.4.8
+ / recognizer-context-block ; Section 8.4.9
+ / recognizer-start-timers ; Section 8.4.10
+ / vendor-specific ; Section 8.4.11
+ / speech-complete-timeout ; Section 8.4.12
+ / speech-incomplete-timeout; Section 8.4.13
+ / dtmf-interdigit-timeout ; Section 8.4.14
+ / dtmf-term-timeout ; Section 8.4.15
+ / dtmf-term-char ; Section 8.4.16
+ / fetch-timeout ; Section 8.4.17
+ / failed-uri ; Section 8.4.18
+ / failed-uri-cause ; Section 8.4.19
+ / save-waveform ; Section 8.4.20
+ / new-audio-channel ; Section 8.4.21
+ / speech-language ; Section 8.4.22
+
+ Parameter Support Methods/Events
+
+ confidence-threshold MANDATORY SET-PARAMS, RECOGNIZE
+ GET-RESULT
+ sensitivity-level Optional SET-PARAMS, GET-PARAMS,
+ RECOGNIZE
+ speed-vs-accuracy Optional SET-PARAMS, GET-PARAMS,
+ RECOGNIZE
+ n-best-list-length Optional SET-PARAMS, GET-PARAMS,
+ RECOGNIZE, GET-RESULT
+ no-input-timeout MANDATORY SET-PARAMS, GET-PARAMS,
+ RECOGNIZE
+
+
+
+Shanmugham, et al. Informational [Page 43]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ recognition-timeout MANDATORY SET-PARAMS, GET-PARAMS,
+ RECOGNIZE
+ waveform-url MANDATORY RECOGNITION-COMPLETE
+ completion-cause MANDATORY DEFINE-GRAMMAR, RECOGNIZE,
+ RECOGNITION-COMPLETE
+ recognizer-context-block Optional SET-PARAMS, GET-PARAMS
+ recognizer-start-timers MANDATORY RECOGNIZE
+ vendor-specific MANDATORY SET-PARAMS, GET-PARAMS
+ speech-complete-timeout MANDATORY SET-PARAMS, GET-PARAMS
+ RECOGNIZE
+ speech-incomplete-timeout MANDATORY SET-PARAMS, GET-PARAMS
+ RECOGNIZE
+ dtmf-interdigit-timeout MANDATORY SET-PARAMS, GET-PARAMS
+ RECOGNIZE
+ dtmf-term-timeout MANDATORY SET-PARAMS, GET-PARAMS
+ RECOGNIZE
+ dtmf-term-char MANDATORY SET-PARAMS, GET-PARAMS
+ RECOGNIZE
+ fetch-timeout MANDATORY SET-PARAMS, GET-PARAMS
+ RECOGNIZE, DEFINE-GRAMMAR
+ failed-uri MANDATORY DEFINE-GRAMMAR response,
+ RECOGNITION-COMPLETE
+ failed-uri-cause MANDATORY DEFINE-GRAMMAR response,
+ RECOGNITION-COMPLETE
+ save-waveform MANDATORY SET-PARAMS, GET-PARAMS,
+ RECOGNIZE
+ new-audio-channel MANDATORY RECOGNIZE
+ speech-language MANDATORY SET-PARAMS, GET-PARAMS,
+ RECOGNIZE, DEFINE-GRAMMAR
+
+8.4.1. Confidence Threshold
+
+ When a recognition resource recognizes or matches a spoken phrase
+ with some portion of the grammar, it associates a confidence level
+ with that conclusion. The confidence-threshold parameter tells the
+ recognizer resource what confidence level should be considered a
+ successful match. This is an integer from 0-100 indicating the
+ recognizer's confidence in the recognition. If the recognizer
+ determines that its confidence in all its recognition results is less
+ than the confidence threshold, then it MUST return no-match as the
+ recognition result. This header field MAY occur in RECOGNIZE, SET-
+ PARAMS, or GET-PARAMS. The default value for this field is platform
+ specific.
+
+ confidence-threshold = "Confidence-Threshold" ":" 1*DIGIT CRLF
+
+
+
+
+
+
+Shanmugham, et al. Informational [Page 44]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+8.4.2. Sensitivity Level
+
+ To filter out background noise and not mistake it for speech, the
+ recognizer may support a variable level of sound sensitivity. The
+ sensitivity-level parameter allows the client to set this value on
+ the recognizer. This header field MAY occur in RECOGNIZE, SET-
+ PARAMS, or GET-PARAMS. A higher value for this field means higher
+ sensitivity. The default value for this field is platform specific.
+
+ sensitivity-level = "Sensitivity-Level" ":" 1*DIGIT CRLF
+
+8.4.3. Speed Vs Accuracy
+
+ Depending on the implementation and capability of the recognizer
+ resource, it may be tunable towards performance or accuracy. Higher
+ accuracy may mean more processing and higher CPU utilization, meaning
+ fewer calls per media server, and vice versa. This aspect of the
+ resource can be tuned by the speed-vs-accuracy header. This header
+ field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS. A higher
+ value for this field means higher speed. The default value for this
+ field is platform specific.
+
+ speed-vs-accuracy = "Speed-Vs-Accuracy" ":" 1*DIGIT CRLF
+
+8.4.4. N Best List Length
+
+ When the recognizer matches an incoming stream with the grammar, it
+ may come up with more than one alternative match because of
+ confidence levels in certain words or conversation paths. If this
+ header field is not specified, by default, the recognition resource
+ will only return the best match above the confidence threshold. The
+ client, by setting this parameter, could ask the recognition resource
+ to send it more than one alternative. All alternatives must still be
+ above the confidence-threshold. A value greater than one does not
+ guarantee that the recognizer will send the requested number of
+ alternatives. This header field MAY occur in RECOGNIZE, SET-PARAMS,
+ or GET-PARAMS. The minimum value for this field is 1. The default
+ value for this field is 1.
+
+ n-best-list-length = "N-Best-List-Length" ":" 1*DIGIT CRLF
+
+8.4.5. No Input Timeout
+
+ When recognition is started and there is no speech detected for a
+ certain period of time, the recognizer can send a RECOGNITION-
+ COMPLETE event to the client and terminate the recognition operation.
+ The no-input-timeout header field can set this timeout value. The
+ value is in milliseconds. This header field MAY occur in RECOGNIZE,
+
+
+
+Shanmugham, et al. Informational [Page 45]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ SET-PARAMS, or GET-PARAMS. The value for this field ranges from 0 to
+ MAXTIMEOUT, where MAXTIMEOUT is platform specific. The default value
+ for this field is platform specific.
+
+ no-input-timeout = "No-Input-Timeout" ":" 1*DIGIT CRLF
+
+8.4.6. Recognition Timeout
+
+ When recognition is started and there is no match for a certain
+ period of time, the recognizer can send a RECOGNITION-COMPLETE event
+ to the client and terminate the recognition operation. The
+ recognition-timeout parameter field sets this timeout value. The
+ value is in milliseconds. The value for this field ranges from 0 to
+ MAXTIMEOUT, where MAXTIMEOUT is platform specific. The default value
+ is 10 seconds. This header field MAY occur in RECOGNIZE, SET-PARAMS
+ or GET-PARAMS.
+
+ recognition-timeout = "Recognition-Timeout" ":" 1*DIGIT CRLF
+
+8.4.7. Waveform URL
+
+ If the save-waveform header field is set to true, the recognizer MUST
+ record the incoming audio stream of the recognition into a file and
+ provide a URI for the client to access it. This header MUST be
+ present in the RECOGNITION-COMPLETE event if the save-waveform header
+ field was set to true. The URL value of the header MUST be NULL if
+ there was some error condition preventing the server from recording.
+ Otherwise, the URL generated by the server SHOULD be globally unique
+ across the server and all its recognition sessions. The URL SHOULD
+ BE available until the session is torn down.
+
+ waveform-url = "Waveform-URL" ":" Url CRLF
+
+8.4.8. Completion Cause
+
+ This header field MUST be part of a RECOGNITION-COMPLETE event coming
+ from the recognizer resource to the client. This indicates the
+ reason behind the RECOGNIZE method completion. This header field
+ MUST BE sent in the DEFINE-GRAMMAR and RECOGNIZE responses, if they
+ return with a failure status and a COMPLETE state.
+
+ Cause-Code Cause-Name Description
+
+ 000 success RECOGNIZE completed with a match or
+ DEFINE-GRAMMAR succeeded in
+ downloading and compiling the
+ grammar
+
+
+
+
+Shanmugham, et al. Informational [Page 46]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ 001 no-match RECOGNIZE completed, but no match
+ was found
+ 002 no-input-timeout
+ RECOGNIZE completed without a match
+ due to a no-input-timeout
+ 003 recognition-timeout
+ RECOGNIZE completed without a match
+ due to a recognition-timeout
+ 004 gram-load-failure
+ RECOGNIZE failed due to grammar
+ load failure.
+ 005 gram-comp-failure
+ RECOGNIZE failed due to grammar
+ compilation failure.
+ 006 error RECOGNIZE request terminated
+ prematurely due to a recognizer
+ error.
+ 007 speech-too-early
+ RECOGNIZE request terminated because
+ speech was too early.
+ 008 too-much-speech-timeout
+ RECOGNIZE request terminated because
+ speech was too long.
+ 009 uri-failure Failure accessing a URI.
+ 010 language-unsupported
+ Language not supported.
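+
+ For illustration (the request-id is an example only), a recognition
+ that ends without a match might be reported as follows:
+
+ S->C:RECOGNITION-COMPLETE 543271 COMPLETE MRCP/1.0
+ Completion-Cause:001 no-match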
+
+8.4.9. Recognizer Context Block
+
+ This parameter MAY BE sent as part of the SET-PARAMS or GET-PARAMS
+ request. If the GET-PARAMS method contains this header field with no
+ value, then it is a request to the recognizer to return the
+ recognizer context block. The response to such a message MAY contain
+ a recognizer context block as a message entity. If the server
+ returns a recognizer context block, the response MUST contain this
+ header field and its value MUST match the content-id of that entity.
+
+ If the SET-PARAMS method contains this header field, it MUST contain
+ a message entity containing the recognizer context data, and a
+ content-id matching this header field.
+
+ This content-id should match the content-id that came with the
+ context data during the GET-PARAMS operation.
+
+ recognizer-context-block = "Recognizer-Context-Block" ":"
+ 1*ALPHA CRLF
+
+
+
+
+
+Shanmugham, et al. Informational [Page 47]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+8.4.10. Recognition Start Timers
+
+ This parameter MAY BE sent as part of the RECOGNIZE request. A value
+ of false tells the recognizer to start recognition, but not to start
+ the no-input timer yet. The recognizer should not start the timers
+ until the client sends a RECOGNITION-START-TIMERS request to the
+ recognizer. This is useful in the scenario when the recognizer and
+ synthesizer engines are not part of the same session. Here, when a
+ kill-on-barge-in prompt is being played, you want the RECOGNIZE
+ request to be simultaneously active so that it can detect and
+ implement kill-on-barge-in. But at the same time, you don't want the
+ recognizer to start the no-input timers until the prompt is finished.
+ The default value is "true".
+
+ recognizer-start-timers = "Recognizer-Start-Timers" ":"
+ boolean-value CRLF
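+
+ An illustrative (non-normative) exchange; the request-ids and
+ grammar URI are examples only:
+
+ C->S:RECOGNIZE 543272 MRCP/1.0
+ Recognizer-Start-Timers:false
+ Content-Type:text/uri-list
+ Content-Length:31
+
+ session:help@root-level.store
+
+ S->C:MRCP/1.0 543272 200 IN-PROGRESS
+
+ C->S:RECOGNITION-START-TIMERS 543273 MRCP/1.0
+
+ S->C:MRCP/1.0 543273 200 COMPLETE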
+
+8.4.11. Vendor Specific Parameters
+
+ This set of headers allows the client to set vendor-specific
+ parameters.
+
+ This header can be sent in the SET-PARAMS method and is used to set
+ vendor-specific parameters on the server. The vendor-av-pair-name
+ can be any vendor-specific field name and conforms to the XML
+ vendor-specific attribute naming convention. The vendor-av-pair-
+ value is the value to set the attribute to, and needs to be quoted.
+
+ When asking the server to get the current value of these parameters,
+ this header can be sent in the GET-PARAMS method with the list of
+ vendor-specific attribute names to get separated by a semicolon.
+ This header field MAY occur in SET-PARAMS or GET-PARAMS.
+
+8.4.12. Speech Complete Timeout
+
+ This header field specifies the length of silence required following
+ user speech before the speech recognizer finalizes a result (either
+ accepting it or throwing a nomatch event). The speech-complete-
+ timeout value is used when the recognizer currently has a complete
+ match of an active grammar, and specifies how long it should wait for
+ more input before declaring a match. By contrast, the incomplete
+ timeout is used when the speech is an incomplete match to an active
+ grammar. The value is in milliseconds.
+
+ speech-complete-timeout = "Speech-Complete-Timeout" ":"
+ 1*DIGIT CRLF
+
+
+
+
+
+Shanmugham, et al. Informational [Page 48]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ A long speech-complete-timeout value delays the result completion
+ and, therefore, makes the computer's response slow. A short speech-
+ complete-timeout may lead to an utterance being broken up
+ inappropriately. Reasonable complete timeout values are typically in
+ the range of 0.3 seconds to 1.0 seconds. The value for this field
+ ranges from 0 to MAXTIMEOUT, where MAXTIMEOUT is platform specific.
+ The default value for this field is platform specific. This header
+ field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
+
+8.4.13. Speech Incomplete Timeout
+
+ This header field specifies the required length of silence following
+ user speech, after which a recognizer finalizes a result. The
+ incomplete timeout applies when the speech prior to the silence is an
+ incomplete match of all active grammars. In this case, once the
+ timeout is triggered, the partial result is rejected (with a nomatch
+ event). The value is in milliseconds. The value for this field
+ ranges from 0 to MAXTIMEOUT, where MAXTIMEOUT is platform specific.
+ The default value for this field is platform specific.
+
+ speech-incomplete-timeout = "Speech-Incomplete-Timeout" ":"
+ 1*DIGIT CRLF
+
+ The speech-incomplete-timeout also applies when the speech prior to
+ the silence is a complete match of an active grammar, but where it is
+ possible to speak further and still match the grammar. By contrast,
+ the complete timeout is used when the speech is a complete match to
+ an active grammar and no further words can be spoken.
+
+ A long speech-incomplete-timeout value delays the result completion
+ and, therefore, makes the computer's response slow. A short speech-
+ incomplete-timeout may lead to an utterance being broken up
+ inappropriately.
+
+ The speech-incomplete-timeout is usually longer than the speech-
+ complete-timeout to allow users to pause mid-utterance (for example,
+ to breathe). This header field MAY occur in RECOGNIZE, SET-PARAMS,
+ or GET-PARAMS.
+
+8.4.14. DTMF Interdigit Timeout
+
+ This header field specifies the inter-digit timeout value to use when
+ recognizing DTMF input. The value is in milliseconds. The value for
+ this field ranges from 0 to MAXTIMEOUT, where MAXTIMEOUT is platform
+ specific. The default value is 5 seconds. This header field MAY
+ occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
+
+
+
+
+
+Shanmugham, et al. Informational [Page 49]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ dtmf-interdigit-timeout = "DTMF-Interdigit-Timeout" ":"
+ 1*DIGIT CRLF
+
+8.4.15. DTMF Term Timeout
+
+ This header field specifies the terminating timeout to use when
+ recognizing DTMF input. The value is in milliseconds. The value for
+ this field ranges from 0 to MAXTIMEOUT, where MAXTIMEOUT is platform
+ specific. The default value is 10 seconds. This header field MAY
+ occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
+
+ dtmf-term-timeout = "DTMF-Term-Timeout" ":" 1*DIGIT CRLF
+
+8.4.16. DTMF-Term-Char
+
+ This header field specifies the terminating DTMF character for DTMF
+ input recognition. The default value is NULL which is specified as
+ an empty header field. This header field MAY occur in RECOGNIZE,
+ SET-PARAMS, or GET-PARAMS.
+
+ dtmf-term-char = "DTMF-Term-Char" ":" CHAR CRLF
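+
+ For illustration (values and request-id are examples only), the
+ DTMF-related parameters defined in Sections 8.4.14 through 8.4.16
+ might be set for a session as follows:
+
+ C->S:SET-PARAMS 543278 MRCP/1.0
+ DTMF-Interdigit-Timeout:3000
+ DTMF-Term-Timeout:8000
+ DTMF-Term-Char:#
+
+ S->C:MRCP/1.0 543278 200 COMPLETE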
+
+8.4.17. Fetch Timeout
+
+ When the recognizer needs to fetch grammar documents, this header
+ field controls URI access properties. This defines the recognizer
+ timeout for completing the fetch of the resources the media server
+ needs from the network. The value is in milliseconds. The value for
+ this field ranges from 0 to MAXTIMEOUT, where MAXTIMEOUT is platform
+ specific. The default value for this field is platform specific.
+ This header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
+
+8.4.18. Failed URI
+
+ When a recognizer method needs a recognizer to fetch or access a URI,
+ and the access fails, the media server SHOULD provide the failed URI
+ in this header field in the method response.
+
+8.4.19. Failed URI Cause
+
+ When a recognizer method needs a recognizer to fetch or access a URI,
+ and the access fails, the media server SHOULD provide the URI-
+ specific or protocol-specific response code through this header field
+ in the method response. This field has been defined as alphanumeric
+ to accommodate all protocols, some of which might have a response
+ string instead of a numeric response code.
+
+
+
+
+
+Shanmugham, et al. Informational [Page 50]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+8.4.20. Save Waveform
+
+ This header field allows the client to indicate to the recognizer
+ that it MUST save the audio stream that was recognized. The
+ recognizer MUST then record the recognized audio and make it
+ available to the client in the form of a URI returned in the
+ waveform-url header field in the RECOGNITION-COMPLETE event. If
+ there was an error in recording the stream or the audio clip is
+ otherwise not available, the recognizer MUST return an empty
+ waveform-url header field. The default value for this field is
+ "false".
+
+ save-waveform = "Save-Waveform" ":" boolean-value CRLF
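+
+ An illustrative (non-normative) example; the request-id, grammar
+ URI, and waveform URL are examples only:
+
+ C->S:RECOGNIZE 543274 MRCP/1.0
+ Save-Waveform:true
+ Content-Type:text/uri-list
+ Content-Length:31
+
+ session:help@root-level.store
+
+ S->C:MRCP/1.0 543274 200 IN-PROGRESS
+
+ S->C:RECOGNITION-COMPLETE 543274 COMPLETE MRCP/1.0
+ Completion-Cause:000 success
+ Waveform-URL:http://web.media.com/session123/audio2.wav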
+
+8.4.21. New Audio Channel
+
+ This header field MAY BE specified in a RECOGNIZE message and allows
+ the client to tell the media server that, from that point on, it will
+ be sending audio data from a new audio source, channel, or speaker.
+ If the recognition resource had collected any line statistics or
+ information, it MUST discard it and start fresh for this RECOGNIZE.
+ This helps in the case where the client MAY want to reuse an open
+ recognition session with the media server for multiple telephone
+ calls.
+
+ new-audio-channel = "New-Audio-Channel" ":" boolean-value CRLF
+
+8.4.22. Speech Language
+
+ This header field specifies the language of recognition grammar data
+ within a session or request, if it is not specified within the data.
+ The value of this header field should follow RFC 3066 [16] for its
+ values. This MAY occur in DEFINE-GRAMMAR, RECOGNIZE, SET-PARAMS, or
+ GET-PARAMS request.
+
+8.5. Recognizer Message Body
+
+ A recognizer message may carry additional data associated with the
+ method, response, or event. The client may send the grammar to be
+ recognized in DEFINE-GRAMMAR or RECOGNIZE requests. When the grammar
+ is sent in the DEFINE-GRAMMAR method, the server should be able to
+ download, compile, and optimize the grammar. The RECOGNIZE request
+ MUST contain a list of grammars that need to be active during the
+ recognition. The server resource may send the recognition results in
+ the RECOGNITION-COMPLETE event or the GET-RESULT response. This data
+ will be carried in the message body of the corresponding MRCP
+ message.
+
+
+
+
+Shanmugham, et al. Informational [Page 51]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+8.5.1. Recognizer Grammar Data
+
+ Recognizer grammar data from the client to the server can be provided
+ inline or by reference. Either way, they are carried as MIME
+ entities in the message body of the MRCP request message. The
+ grammar specified inline or by reference specifies the grammar used
+ to match in the recognition process and this data is specified in one
+ of the standard grammar specification formats like W3C's XML or ABNF
+ or Sun's Java Speech Grammar Format, etc. All media servers MUST
+ support W3C's XML based grammar markup format [11] (MIME-type
+ application/grammar+xml) and SHOULD support the ABNF form (MIME-type
+ application/grammar).
+
+ When a grammar is specified in-line in the message, the client MUST
+ provide a content-id for that grammar as part of the content headers.
+ The server MUST store the grammar associated with that content-id for
+ the duration of the session. A stored grammar can be overwritten by
+ defining a new grammar with the same content-id. Grammars that have
+ been associated with a content-id can be referenced through a special
+ "session:" URI scheme.
+
+ Example:
+ session:help@root-level.store
+
+ If grammar data needs to be specified by external URI reference, the
+ MIME-type text/uri-list is used to list the one or more URIs that will
+ specify the grammar data. All media servers MUST support the HTTP
+ URI access mechanism.
+
+ If the data to be defined consists of a mix of URI and inline grammar
+ data, the multipart/mixed MIME-type is used and embedded with the
+ MIME-blocks for text/uri-list, application/grammar or
+ application/grammar+xml. The character set and encoding used in the
+ grammar data may be specified according to standard MIME-type
+ definitions.
+
+ When more than one grammar URI or inline grammar block is specified
+ in a message body of the RECOGNIZE request, it is an active list of
+ grammar alternatives to listen for. The ordering of the list implies the
+ precedence of the grammars, with the first grammar in the list having
+ the highest precedence.
+
+ Example 1:
+ Content-Type:application/grammar+xml
+ Content-Id:request1@form-level.store
+ Content-Length:104
+
+ <?xml version="1.0"?>
+
+
+
+Shanmugham, et al. Informational [Page 52]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ <!-- the default grammar language is US English -->
+ <grammar xml:lang="en-US" version="1.0">
+
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+
+ <!-- multiple language attachment to a token -->
+ <rule id="people1">
+ <token lexicon="en-US,fr-CA"> Robert </token>
+ </rule>
+
+ <!-- the equivalent single-language attachment expansion -->
+ <rule id="people2">
+ <one-of>
+ <item xml:lang="en-US">Robert</item>
+ <item xml:lang="fr-CA">Robert</item>
+ </one-of>
+ </rule>
+
+ </grammar>
+
+ Example 2:
+ Content-Type:text/uri-list
+ Content-Length:176
+
+ session:help@root-level.store
+ http://www.cisco.com/Directory-Name-List.grxml
+ http://www.cisco.com/Department-List.grxml
+ http://www.cisco.com/TAC-Contact-List.grxml
+ session:menu1@menu-level.store
+
+ Example 3:
+ Content-Type:multipart/mixed; boundary="--break"
+
+
+
+
+Shanmugham, et al. Informational [Page 53]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ --break
+ Content-Type:text/uri-list
+ Content-Length:176
+ http://www.cisco.com/Directory-Name-List.grxml
+ http://www.cisco.com/Department-List.grxml
+ http://www.cisco.com/TAC-Contact-List.grxml
+
+ --break
+ Content-Type:application/grammar+xml
+ Content-Id:request1@form-level.store
+ Content-Length:104
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xml:lang="en-US" version="1.0">
+
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+
+ <!-- multiple language attachment to a token -->
+ <rule id="people1">
+ <token lexicon="en-US,fr-CA"> Robert </token>
+ </rule>
+
+ <!-- the equivalent single-language attachment expansion -->
+
+ <rule id="people2">
+ <one-of>
+ <item xml:lang="en-US">Robert</item>
+ <item xml:lang="fr-CA">Robert</item>
+ </one-of>
+ </rule>
+
+
+
+
+Shanmugham, et al. Informational [Page 54]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ </grammar>
+ --break
+
+8.5.2. Recognizer Result Data
+
+ Recognition result data from the server is carried in the MRCP
+ message body of the RECOGNITION-COMPLETE event or the GET-RESULT
+ response message as MIME entities. All media servers MUST support
+ W3C's Natural Language Semantics Markup Language (NLSML) [10] as the
+ default standard for returning recognition results back to the
+ client, and hence MUST support the MIME-type application/x-nlsml.
+
+ Example 1:
+ Content-Type:application/x-nlsml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+ <result grammar="http://theYesNoGrammar">
+ <interpretation>
+ <instance>
+ <myApp:yes_no>
+ <response>yes</response>
+ </myApp:yes_no>
+ </instance>
+ <input>ok</input>
+ </interpretation>
+ </result>
+
+8.5.3. Recognizer Context Block
+
+ When the client has to change recognition servers within a call, this
+ is a block of data that the client MAY collect from the first media
+ server and provide to the second media server. This may be because
+ the client needs different language support or because the media
+ server issued an RTSP RE-DIRECT. Here, the first recognizer may have
+ collected acoustic and other data during its recognition. When we
+ switch recognition servers, communicating this data may allow the
+ second recognition server to provide better recognition based on the
+ acoustic data collected by the previous recognizer. This block of
+ data is vendor-specific and MUST be carried as MIME-type
+ application/octets in the body of the message.
+
+ This block of data is communicated in the SET-PARAMS and GET-PARAMS
+ method/response messages. In the GET-PARAMS method, if an empty
+ recognizer-context-block header field is present, then the recognizer
+ should return its vendor-specific context block in the message body
+ as a MIME-entity with a specific content-id. The content-id value
+ should also be specified in the recognizer-context-block header field
+
+
+
+Shanmugham, et al. Informational [Page 55]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ in the GET-PARAMS response. The SET-PARAMS request wishing to
+ provide this vendor-specific data should send it in the message body
+ as a MIME-entity with the same content-id that it received from the
+ GET-PARAMS. The content-id should also be sent in the recognizer-
+ context-block header field of the SET-PARAMS message.
+
+ Each automatic speech recognition (ASR) vendor choosing to use this
+ mechanism to hand off recognizer context data among its servers
+ should distinguish its vendor-specific block of data from those of
+ other vendors by choosing a unique content-id that its servers will
+ recognize.
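+
+ An illustrative (non-normative) handoff; the request-ids,
+ content-id, and data length are examples only, and the SET-PARAMS
+ would be sent on the session with the new recognition server:
+
+ C->S:GET-PARAMS 543275 MRCP/1.0
+ Recognizer-Context-Block:
+
+ S->C:MRCP/1.0 543275 200 COMPLETE
+ Recognizer-Context-Block:context-block.vendorA@server1
+ Content-Type:application/octets
+ Content-Id:context-block.vendorA@server1
+ Content-Length:1024
+
+ (vendor-specific context data not shown)
+
+ C->S:SET-PARAMS 543276 MRCP/1.0
+ Recognizer-Context-Block:context-block.vendorA@server1
+ Content-Type:application/octets
+ Content-Id:context-block.vendorA@server1
+ Content-Length:1024
+
+ (vendor-specific context data not shown)
+
+ S->C:MRCP/1.0 543276 200 COMPLETE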
+
+8.6. SET-PARAMS
+
+ The SET-PARAMS method, from the client to the server, tells the
+ recognizer resource to set and modify recognizer context parameters
+ like recognizer characteristics, result detail level, etc. In the
+ following sections some standard parameters are discussed. If the
+ server resource does not recognize an OPTIONAL parameter, it MUST
+ ignore that field. Many of the parameters in the SET-PARAMS method
+ can also be used in other methods, like the RECOGNIZE method. The
+ difference is that a parameter such as sensitivity-level set using
+ SET-PARAMS applies to all future requests, whenever applicable,
+ whereas the same parameter passed in a RECOGNIZE request applies
+ only to that request.
+
+ Example:
+ C->S:SET-PARAMS 543256 MRCP/1.0
+ Sensitivity-Level:20
+ Recognition-Timeout:30
+ Confidence-Threshold:85
+
+ S->C:MRCP/1.0 543256 200 COMPLETE
+
+8.7. GET-PARAMS
+
+ The GET-PARAMS method, from the client to the server, asks the
+ recognizer resource for its current default parameters, like
+ sensitivity-level, n-best-list-length, etc. The client can request
+ specific parameters from the server by sending it one or more empty
+ parameter headers with no values. The server should then return the
+ settings for those specific parameters only. When the client does
+ not send a specific list of empty parameter headers, the recognizer
+ should return the settings for all parameters. Wildcard use can be
+ very resource-intensive, as the number of settable parameters can be
+ large, depending on the vendor. Hence, it is RECOMMENDED that the
+ client does not use the wildcard GET-PARAMS operation very often.
+
+
+
+
+
+Shanmugham, et al. Informational [Page 56]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ Example:
+ C->S:GET-PARAMS 543256 MRCP/1.0
+ Sensitivity-Level:
+ Recognition-Timeout:
+ Confidence-threshold:
+
+ S->C:MRCP/1.0 543256 200 COMPLETE
+ Sensitivity-Level:20
+ Recognition-Timeout:30
+ Confidence-Threshold:85
+
+8.8. DEFINE-GRAMMAR
+
+ The DEFINE-GRAMMAR method, from the client to the server, provides a
+ grammar and tells the server to define, download if needed, and
+ compile the grammar.
+
+ If the server resource is in the recognition state, it MUST respond
+ to the DEFINE-GRAMMAR request with a failure status.
+
+ If the resource is in the idle state and is able to successfully load
+ and compile the grammar, the status MUST return a success code and
+ the request-state MUST be COMPLETE.
+
+ If the recognizer could not define the grammar for some reason, say
+ the download failed or the grammar failed to compile, or the grammar
+ was in an unsupported form, the MRCP response for the DEFINE-GRAMMAR
+ method MUST contain a failure status code of 407, and a completion-
+ cause header field describing the failure reason.
+
+ Example:
+ C->S:DEFINE-GRAMMAR 543257 MRCP/1.0
+ Content-Type:application/grammar+xml
+ Content-Id:request1@form-level.store
+ Content-Length:104
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xml:lang="en-US" version="1.0">
+
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+
+
+Shanmugham, et al. Informational [Page 57]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+
+ </grammar>
+
+ S->C:MRCP/1.0 543257 200 COMPLETE
+ Completion-Cause:000 success
+
+
+ C->S:DEFINE-GRAMMAR 543258 MRCP/1.0
+ Content-Type:application/grammar+xml
+ Content-Id:helpgrammar@root-level.store
+ Content-Length:104
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xml:lang="en-US" version="1.0">
+
+ <rule id="request">
+ I need help
+ </rule>
+
+ </grammar>
+
+ S->C:MRCP/1.0 543258 200 COMPLETE
+ Completion-Cause:000 success
+
+ C->S:DEFINE-GRAMMAR 543259 MRCP/1.0
+ Content-Type:application/grammar+xml
+ Content-Id:request2@field-level.store
+ Content-Length:104
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xml:lang="en-US" version="1.0">
+
+ <rule id="request">
+ I need help
+ </rule>
+
+ </grammar>
+
+ S->C:MRCP/1.0 543259 200 COMPLETE
+
+
+
+Shanmugham, et al. Informational [Page 58]
+
+RFC 4463 MRCP by Cisco, Nuance, and Speechworks April 2006
+
+
+ Completion-Cause:000 success
+
+ C->S:DEFINE-GRAMMAR 543259 MRCP/1.0
+ Content-Type:application/grammar+xml
+ Content-Id:request2@field-level.store
+ Content-Length:104
+
+ <?xml version="1.0"?>
+
+ <grammar xml:lang="en">
+
+ <import uri="session:politeness@form-level.store"
+ name="polite"/>
+
+ <rule id="basicCmd" scope="public">
+ <example> please move the window </example>
+ <example> open a file </example>
+
+ <ruleref import="polite#startPolite"/>
+ <ruleref uri="#command"/>
+ <ruleref import="polite#endPolite"/>
+ </rule>
+
+ <rule id="command">
+ <ruleref uri="#action"/> <ruleref uri="#object"/>
+ </rule>
+
+ <rule id="action">
+ <choice>
+ <item weight="10" tag="OPEN"> open </item>
+ <item weight="2" tag="CLOSE"> close </item>
+ <item weight="1" tag="DELETE"> delete </item>
+ <item weight="1" tag="MOVE"> move </item>
+ </choice>
+ </rule>
+
+ <rule id="object">
+ <count number="optional">
+ <choice>
+ <item> the </item>
+ <item> a </item>
+ </choice>
+ </count>
+ <choice>
+ <item> window </item>
+ <item> file </item>
+ <item> menu </item>
+ </choice>
+ </rule>
+
+ </grammar>
+
+ S->C:MRCP/1.0 543259 200 COMPLETE
+ Completion-Cause:000 success
+
+ C->S:RECOGNIZE 543260 MRCP/1.0
+ N-Best-List-Length:2
+ Content-Type:text/uri-list
+ Content-Length:176
+
+          session:request1@form-level.store
+          session:request2@field-level.store
+          session:helpgrammar@root-level.store
+
+ S->C:MRCP/1.0 543260 200 IN-PROGRESS
+
+ S->C:START-OF-SPEECH 543260 IN-PROGRESS MRCP/1.0
+
+   S->C:RECOGNITION-COMPLETE 543260 COMPLETE MRCP/1.0
+          Completion-Cause:000 success
+          Waveform-URL:http://web.media.com/session123/audio.wav
+          Content-Type:application/x-nlsml
+          Content-Length:276
+
+ <?xml version="1.0"?>
+ <result x-model="http://IdentityModel"
+ xmlns:xf="http://www.w3.org/2000/xforms"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <xf:instance name="Person">
+ <Person>
+ <Name> Andre Roy </Name>
+ </Person>
+ </xf:instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
+
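+   As a non-normative illustration, the sketch below shows how a client
+   might wrap an inline grammar in a DEFINE-GRAMMAR request, computing
+   the Content-Length from the actual body, and later reference the
+   stored grammar from a RECOGNIZE request through the "session:" URI
+   scheme. The helper names are hypothetical.
+
+      # Non-normative sketch: define a grammar, then reference it later.
+
+      GRAMMAR = """<?xml version="1.0"?>
+      <grammar xml:lang="en-US" version="1.0">
+        <rule id="request">I need help</rule>
+      </grammar>
+      """
+
+      def define_grammar(request_id, content_id, body):
+          headers = [
+              "DEFINE-GRAMMAR %d MRCP/1.0" % request_id,
+              "Content-Type:application/grammar+xml",
+              "Content-Id:%s" % content_id,
+              "Content-Length:%d" % len(body.encode("utf-8")),
+          ]
+          return "\r\n".join(headers) + "\r\n\r\n" + body
+
+      def recognize_by_reference(request_id, content_ids):
+          # Previously defined grammars are referenced as session: URIs.
+          uris = ["session:%s" % cid for cid in content_ids]
+          body = "\r\n".join(uris) + "\r\n"
+          headers = [
+              "RECOGNIZE %d MRCP/1.0" % request_id,
+              "Content-Type:text/uri-list",
+              "Content-Length:%d" % len(body.encode("utf-8")),
+          ]
+          return "\r\n".join(headers) + "\r\n\r\n" + body
+
+      if __name__ == "__main__":
+          print(define_grammar(543258,
+                               "helpgrammar@root-level.store", GRAMMAR))
+          refs = ["helpgrammar@root-level.store"]
+          print(recognize_by_reference(543260, refs))
+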
+8.9. RECOGNIZE
+
+   The RECOGNIZE method from the client to the server tells the
+   recognizer to start recognition and provides it with a grammar to
+   match against. The RECOGNIZE method can carry parameters to control
+   the sensitivity, confidence level, and the level of detail in
+   results provided by the recognizer. These parameters override the
+   current defaults set by a previous SET-PARAMS method.
+
+
+
+
+
+   If the resource is in the recognition state, the server MUST respond
+   to the RECOGNIZE request with a failure status.
+
+   If the resource is in the idle state and was able to successfully
+ start the recognition, the server MUST return a success code and a
+ request-state of IN-PROGRESS. This means that the recognizer is
+ active and that the client should expect further events with this
+ request-id.
+
+ If the resource could not start a recognition, it MUST return a
+ failure status code of 407 and contain a completion-cause header
+ field describing the cause of failure.
+
+ For the recognizer resource, this is the only request that can return
+ request-state of IN-PROGRESS, meaning that recognition is in
+ progress. When the recognition completes by matching one of the
+ grammar alternatives or by a time-out without a match or for some
+ other reason, the recognizer resource MUST send the client a
+   RECOGNITION-COMPLETE event with the result of the recognition and a
+ request-state of COMPLETE.
+
+ For large grammars that can take a long time to compile and for
+ grammars that are used repeatedly, the client could issue a DEFINE-
+ GRAMMAR request with the grammar ahead of time. In such a case, the
+ client can issue the RECOGNIZE request and reference the grammar
+ through the "session:" special URI. This also applies in general if
+ the client wants to restart recognition with a previous inline
+ grammar.
+
+ Note that since the audio and the messages are carried over separate
+ communication paths there may be a race condition between the start
+ of the flow of audio and the receipt of the RECOGNIZE method. For
+ example, if audio flow is started by the client at the same time as
+ the RECOGNIZE method is sent, either the audio or the RECOGNIZE will
+ arrive at the recognizer first. As another example, the client may
+   choose to send audio to the media server continuously and signal the
+   media server to recognize using the RECOGNIZE method. A number of
+   mechanisms exist to resolve this condition, and the mechanism chosen
+   is left to the implementers of recognizer media servers.
+
+ Example:
+ C->S:RECOGNIZE 543257 MRCP/1.0
+ Confidence-Threshold:90
+ Content-Type:application/grammar+xml
+ Content-Id:request1@form-level.store
+ Content-Length:104
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xml:lang="en-US" version="1.0">
+
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+
+ </grammar>
+
+ S->C:MRCP/1.0 543257 200 IN-PROGRESS
+
+ S->C:START-OF-SPEECH 543257 IN-PROGRESS MRCP/1.0
+
+ S->C:RECOGNITION-COMPLETE 543257 COMPLETE MRCP/1.0
+
+ Completion-Cause:000 success
+ Waveform-URL:http://web.media.com/session123/audio.wav
+ Content-Type:application/x-nlsml
+ Content-Length:276
+
+ <?xml version="1.0"?>
+ <result x-model="http://IdentityModel"
+ xmlns:xf="http://www.w3.org/2000/xforms"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <xf:instance name="Person">
+ <Person>
+ <Name> Andre Roy </Name>
+ </Person>
+ </xf:instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
+
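+   Because RECOGNIZE is the only recognizer request that returns a
+   request-state of IN-PROGRESS, a client typically keeps the request-id
+   active until a terminal event arrives. The following non-normative
+   Python sketch shows one way such a client-side loop might look; the
+   "messages" iterator is assumed to yield MRCP messages parsed
+   elsewhere and is hypothetical.
+
+      # Non-normative sketch: follow one RECOGNIZE until it completes.
+
+      def run_recognition(request_id, messages):
+          # "messages" is assumed to yield tuples of
+          # (name, req_id, request_state, headers, body).
+          for name, req_id, state, headers, body in messages:
+              if req_id != request_id:
+                  continue                  # belongs to another request
+              if name == "START-OF-SPEECH":
+                  pass                      # caller started speaking
+              elif name == "RECOGNITION-COMPLETE":
+                  cause = headers.get("Completion-Cause", "")
+                  if cause.startswith("000"):
+                      return body           # NLSML result document
+                  return None               # no-match, no-input, error
+          return None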
+
+
+
+
+
+8.10. STOP
+
+ The STOP method from the client to the server tells the resource to
+ stop recognition if one is active. If a RECOGNIZE request is active
+ and the STOP request successfully terminated it, then the response
+ header contains an active-request-id-list header field containing the
+ request-id of the RECOGNIZE request that was terminated. In this
+ case, no RECOGNITION-COMPLETE event will be sent for the terminated
+ request. If there was no recognition active, then the response MUST
+   NOT contain an active-request-id-list header field. Either way, the
+   response MUST contain a status of 200 (Success).
+
+ Example:
+ C->S:RECOGNIZE 543257 MRCP/1.0
+ Confidence-Threshold:90
+ Content-Type:application/grammar+xml
+ Content-Id:request1@form-level.store
+ Content-Length:104
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xml:lang="en-US" version="1.0">
+
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+
+ </grammar>
+
+ S->C:MRCP/1.0 543257 200 IN-PROGRESS
+
+   C->S:STOP 543258 MRCP/1.0
+
+
+
+ S->C:MRCP/1.0 543258 200 COMPLETE
+ Active-Request-Id-List:543257
+
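+   A client can tell from the response whether the STOP actually
+   cancelled anything. The sketch below is a non-normative illustration
+   of that check; the send_request helper is hypothetical.
+
+      # Non-normative sketch: STOP a recognizer, report what was killed.
+
+      def stop_recognizer(send_request, stop_request_id):
+          # send_request is assumed to transmit an MRCP request and
+          # return the parsed response as (status, state, headers).
+          status, state, headers = send_request(
+              "STOP %d MRCP/1.0\r\n\r\n" % stop_request_id)
+          cancelled = headers.get("Active-Request-Id-List", "")
+          if cancelled:
+              # These RECOGNIZE requests were terminated, so no
+              # RECOGNITION-COMPLETE events will follow for them.
+              return [rid.strip() for rid in cancelled.split(",")]
+          return []    # nothing was active; the STOP still returns 200
+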
+8.11. GET-RESULT
+
+ The GET-RESULT method from the client to the server can be issued
+ when the recognizer is in the recognized state. This request allows
+ the client to retrieve results for a completed recognition. This is
+ useful if the client decides it wants more alternatives or more
+ information. When the media server receives this request, it should
+ re-compute and return the results according to the recognition
+ constraints provided in the GET-RESULT request.
+
+ The GET-RESULT request could specify constraints like a different
+ confidence-threshold, or n-best-list-length. This feature is
+ optional and the automatic speech recognition (ASR) engine may return
+ a status of unsupported feature.
+
+ Example:
+ C->S:GET-RESULT 543257 MRCP/1.0
+ Confidence-Threshold:90
+
+ S->C:MRCP/1.0 543257 200 COMPLETE
+ Content-Type:application/x-nlsml
+ Content-Length:276
+
+ <?xml version="1.0"?>
+ <result x-model="http://IdentityModel"
+ xmlns:xf="http://www.w3.org/2000/xforms"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <xf:instance name="Person">
+ <Person>
+ <Name> Andre Roy </Name>
+
+ </Person>
+ </xf:instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
+
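+   Since GET-RESULT simply re-computes the last result under new
+   constraints, a client may retry with relaxed settings. A
+   non-normative sketch, assuming a hypothetical send_request helper:
+
+      # Non-normative sketch: ask for more alternatives after the fact.
+
+      def get_more_alternatives(send_request, request_id,
+                                confidence=50, n_best=5):
+          request = "\r\n".join([
+              "GET-RESULT %d MRCP/1.0" % request_id,
+              "Confidence-Threshold:%d" % confidence,
+              "N-Best-List-Length:%d" % n_best,
+          ]) + "\r\n\r\n"
+          status, state, headers, body = send_request(request)
+          if status != 200:
+              return None      # the server may not support this feature
+          return body          # NLSML document with re-computed results
+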
+8.12. START-OF-SPEECH
+
+ This is an event from the recognizer to the client indicating that it
+ has detected speech. This event is useful in implementing kill-on-
+ barge-in scenarios when the synthesizer resource is in a different
+ session than the recognizer resource and, hence, is not aware of an
+ incoming audio source. In these cases, it is up to the client to act
+ as a proxy and turn around and issue the BARGE-IN-OCCURRED method to
+ the synthesizer resource. The recognizer resource also sends a
+   unique proxy-sync-id in the header for this event; the client passes
+   this value to the synthesizer in the BARGE-IN-OCCURRED method.
+
+ This event should be generated irrespective of whether the
+ synthesizer and recognizer are in the same media server or not.
+
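+   One possible shape of that client-side proxy logic is sketched below,
+   non-normatively; send_to_synthesizer and next_request_id are
+   hypothetical.
+
+      # Non-normative sketch: relay START-OF-SPEECH as BARGE-IN-OCCURRED.
+
+      def on_start_of_speech(event_headers, next_request_id,
+                             send_to_synthesizer):
+          # The recognizer's Proxy-Sync-Id is echoed back so that the
+          # synthesizer can correlate the barge-in with this event.
+          proxy_sync_id = event_headers.get("Proxy-Sync-Id", "")
+          message = "\r\n".join([
+              "BARGE-IN-OCCURRED %d MRCP/1.0" % next_request_id,
+              "Proxy-Sync-Id:%s" % proxy_sync_id,
+          ]) + "\r\n\r\n"
+          send_to_synthesizer(message)
+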
+8.13. RECOGNITION-START-TIMERS
+
+ This request is sent from the client to the recognition resource when
+ it knows that a kill-on-barge-in prompt has finished playing. This
+ is useful in the scenario when the recognition and synthesizer
+ engines are not in the same session. Here, when a kill-on-barge-in
+ prompt is being played, you want the RECOGNIZE request to be
+ simultaneously active so that it can detect and implement kill-on-
+ barge-in. But at the same time, you don't want the recognizer to
+ start the no-input timers until the prompt is finished. The
+   recognizer-start-timers header field in the RECOGNIZE request allows
+   the client to indicate whether or not the timers should be started.
+   If the client asks the recognizer not to start them, the recognizer
+   should not start the timers until the client sends a
+   RECOGNITION-START-TIMERS method to the recognizer.
+
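+   A non-normative sketch of that sequencing, assuming hypothetical
+   helpers for delivering requests to the recognizer:
+
+      # Non-normative sketch: hold no-input timers until the prompt ends.
+
+      def recognize_then_start_timers(send_to_recognizer,
+                                      recognize_id, timers_id,
+                                      grammar_uri):
+          body = "session:%s\r\n" % grammar_uri
+          send_to_recognizer("\r\n".join([
+              "RECOGNIZE %d MRCP/1.0" % recognize_id,
+              "Recognizer-Start-Timers:false",  # no timeout during prompt
+              "Content-Type:text/uri-list",
+              "Content-Length:%d" % len(body),
+          ]) + "\r\n\r\n" + body)
+          # ... the client issues SPEAK to the synthesizer here and waits
+          # for its SPEAK-COMPLETE event ...
+          send_to_recognizer(
+              "RECOGNITION-START-TIMERS %d MRCP/1.0\r\n\r\n" % timers_id)
+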
+8.14. RECOGNITION-COMPLETE
+
+   This is an event from the recognizer resource to the client
+   indicating that the recognition completed. The recognition result is
+   sent in the MRCP body of the message. The request-state field MUST
+   be COMPLETE, indicating that this is the last event with that
+   request-id and that the request with that request-id is now
+   complete. The recognizer context still holds the results and the
+   audio waveform input of that recognition until the next RECOGNIZE
+   request is issued. A URL to the audio waveform MAY be returned to
+   the client in a waveform-url header field in the RECOGNITION-COMPLETE
+   event. The client can use this URL to retrieve or play back the
+   audio.
+
+ Example:
+ C->S:RECOGNIZE 543257 MRCP/1.0
+ Confidence-Threshold:90
+ Content-Type:application/grammar+xml
+ Content-Id:request1@form-level.store
+ Content-Length:104
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xml:lang="en-US" version="1.0">
+
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+
+ </grammar>
+
+ S->C:MRCP/1.0 543257 200 IN-PROGRESS
+
+ S->C:START-OF-SPEECH 543257 IN-PROGRESS MRCP/1.0
+
+ S->C:RECOGNITION-COMPLETE 543257 COMPLETE MRCP/1.0
+ Completion-Cause:000 success
+ Waveform-URL:http://web.media.com/session123/audio.wav
+ Content-Type:application/x-nlsml
+ Content-Length:276
+
+ <?xml version="1.0"?>
+ <result x-model="http://IdentityModel"
+ xmlns:xf="http://www.w3.org/2000/xforms"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <xf:instance name="Person">
+ <Person>
+ <Name> Andre Roy </Name>
+ </Person>
+ </xf:instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
+
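+   The NLSML body carried by RECOGNITION-COMPLETE can be inspected with
+   any XML parser. A non-normative Python sketch using the standard
+   library, following the element names in the example above:
+
+      # Non-normative sketch: pull the grammar and input text from NLSML.
+
+      import xml.etree.ElementTree as ET
+
+      def parse_nlsml(body):
+          root = ET.fromstring(body)
+          grammar = root.get("grammar")         # e.g. a session: URI
+          interpretation = root.find("interpretation")
+          utterance = interpretation.findtext("input", default="")
+          return grammar, utterance.strip()
+
+      if __name__ == "__main__":
+          example = ('<?xml version="1.0"?>'
+                     '<result grammar="session:request1@form-level.store">'
+                     '<interpretation>'
+                     '<input> may I speak to Andre Roy </input>'
+                     '</interpretation></result>')
+          print(parse_nlsml(example))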
+
+
+
+
+
+
+
+
+
+8.15. DTMF Detection
+
+ Digits received as DTMF tones will be delivered to the automatic
+ speech recognition (ASR) engine in the RTP stream according to RFC
+ 2833 [15]. The automatic speech recognizer (ASR) needs to support
+ RFC 2833 [15] to recognize digits. If it does not support RFC 2833
+ [15], it will have to process the audio stream and extract the audio
+ tones from it.
+
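+   For reference, an RFC 2833 [15] named-event payload carries an event
+   code, an end bit, a volume, and a duration in four octets. The
+   following non-normative Python sketch decodes one such payload (the
+   RTP header itself is not shown):
+
+      # Non-normative sketch: decode an RFC 2833 telephone-event payload.
+
+      import struct
+
+      DTMF_EVENTS = "0123456789*#ABCD"    # events 0-15 per RFC 2833
+
+      def decode_telephone_event(payload):
+          event, flags_volume, duration = struct.unpack("!BBH",
+                                                        payload[:4])
+          end_of_event = bool(flags_volume & 0x80)    # E bit
+          volume = flags_volume & 0x3F                # power in -dBm0
+          digit = DTMF_EVENTS[event] if event < 16 else None
+          return digit, end_of_event, volume, duration
+
+      if __name__ == "__main__":
+          # Event 5 ("5"), E bit set, volume 10, duration 800 units.
+          packet = struct.pack("!BBH", 5, 0x80 | 10, 800)
+          print(decode_telephone_event(packet))
+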
+9. Future Study
+
+ Various sections of the recognizer could be distributed into Digital
+ Signal Processors (DSPs) on the Voice Browser/Gateway or IP Phones.
+ For instance, the gateway might perform voice activity detection to
+   reduce network bandwidth and the CPU requirements of the automatic
+   speech recognition (ASR) server. Such extensions are deferred for
+   further study and will not be addressed in this document.
+
+10. Security Considerations
+
+ The MRCP protocol may carry sensitive information such as account
+   numbers, passwords, etc. For this reason, it is important that the
+   client have the option of secure communication with the server for
+   both the control messages and the media, though the client is not
+   required to use it. If all MRCP communication happens in a trusted
+   domain behind a firewall, this may not be necessary. If the
+ client or server is deployed in an insecure network, communication
+ happening across this insecure network needs to be protected. In
+ such cases, the following additional security functionality MUST be
+ supported on the MRCP server. MRCP servers MUST implement Transport
+ Layer Security (TLS) to secure the RTSP communication, i.e., the RTSP
+ stack SHOULD support the rtsps: URI form. MRCP servers MUST support
+ Secure Real-Time Transport Protocol (SRTP) as an option to send and
+ receive media.
+
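+   As a non-normative illustration only (this document does not define
+   the details of TLS usage), a client might protect the RTSP control
+   connection by wrapping its TCP socket in TLS before any MRCP-carrying
+   requests are sent; the host and port below are placeholders.
+
+      # Non-normative sketch: open a TLS-protected control connection.
+
+      import socket
+      import ssl
+
+      def open_secure_control_channel(host, port):
+          # The server certificate is checked against the platform
+          # trust store; media protection (SRTP) is handled separately.
+          context = ssl.create_default_context()
+          raw = socket.create_connection((host, port))
+          return context.wrap_socket(raw, server_hostname=host)
+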
+11. RTSP-Based Examples
+
+ The following is an example of a typical session of speech synthesis
+ and recognition between a client and the server.
+
+ Opening the synthesizer. This is the first resource for this
+ session. The server and client agree on a single Session ID 12345678
+   and a set of RTP/RTCP ports on both sides.
+
+ C->S:SETUP rtsp://media.server.com/media/synthesizer RTSP/1.0
+ CSeq:2
+ Transport:RTP/AVP;unicast;client_port=46456-46457
+ Content-Type:application/sdp
+ Content-Length:190
+
+ v=0
+ o=- 123 456 IN IP4 10.0.0.1
+ s=Media Server
+ p=+1-888-555-1212
+ c=IN IP4 0.0.0.0
+ t=0 0
+ m=audio 0 RTP/AVP 0 96
+ a=rtpmap:0 pcmu/8000
+ a=rtpmap:96 telephone-event/8000
+ a=fmtp:96 0-15
+
+ S->C:RTSP/1.0 200 OK
+ CSeq:2
+ Transport:RTP/AVP;unicast;client_port=46456-46457;
+ server_port=46460-46461
+ Session:12345678
+ Content-Length:190
+ Content-Type:application/sdp
+
+ v=0
+ o=- 3211724219 3211724219 IN IP4 10.3.2.88
+ s=Media Server
+ c=IN IP4 0.0.0.0
+ t=0 0
+ m=audio 46460 RTP/AVP 0 96
+ a=rtpmap:0 pcmu/8000
+ a=rtpmap:96 telephone-event/8000
+ a=fmtp:96 0-15
+
+ Opening a recognizer resource. Uses the existing session ID and
+ ports.
+
+ C->S:SETUP rtsp://media.server.com/media/recognizer RTSP/1.0
+ CSeq:3
+ Transport:RTP/AVP;unicast;client_port=46456-46457;
+ mode=record;ttl=127
+ Session:12345678
+
+ S->C:RTSP/1.0 200 OK
+ CSeq:3
+ Transport:RTP/AVP;unicast;client_port=46456-46457;
+ server_port=46460-46461;mode=record;ttl=127
+ Session:12345678
+
+
+
+
+
+
+
+
+ An ANNOUNCE message with the MRCP SPEAK request initiates speech.
+
+ C->S:ANNOUNCE rtsp://media.server.com/media/synthesizer RTSP/1.0
+ CSeq:4
+ Session:12345678
+ Content-Type:application/mrcp
+ Content-Length:456
+
+ SPEAK 543257 MRCP/1.0
+ Kill-On-Barge-In:false
+ Voice-gender:neutral
+ Voice-category:teenager
+ Prosody-volume:medium
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+ <speak>
+ <paragraph>
+ <sentence>You have 4 new messages.</sentence>
+ <sentence>The first is from <say-as
+ type="name">Stephanie Williams</say-as> <mark
+ name="Stephanie"/>
+ and arrived at <break/>
+ <say-as type="time">3:45pm</say-as>.</sentence>
+
+ <sentence>The subject is <prosody
+ rate="-20%">ski trip</prosody></sentence>
+ </paragraph>
+ </speak>
+
+ S->C:RTSP/1.0 200 OK
+ CSeq:4
+ Session:12345678
+ RTP-Info:url=rtsp://media.server.com/media/synthesizer;
+ seq=9810092;rtptime=3450012
+ Content-Type:application/mrcp
+ Content-Length:456
+
+ MRCP/1.0 543257 200 IN-PROGRESS
+
+
+ The synthesizer hits the special marker in the message to be spoken
+ and faithfully informs the client of the event.
+
+ S->C:ANNOUNCE rtsp://media.server.com/media/synthesizer RTSP/1.0
+ CSeq:5
+ Session:12345678
+ Content-Type:application/mrcp
+ Content-Length:123
+
+ SPEECH-MARKER 543257 IN-PROGRESS MRCP/1.0
+ Speech-Marker:Stephanie
+
+   C->S:RTSP/1.0 200 OK
+ CSeq:5
+
+ The synthesizer finishes with the SPEAK request.
+
+ S->C:ANNOUNCE rtsp://media.server.com/media/synthesizer RTSP/1.0
+ CSeq:6
+ Session:12345678
+ Content-Type:application/mrcp
+ Content-Length:123
+
+ SPEAK-COMPLETE 543257 COMPLETE MRCP/1.0
+
+
+ C->S:RTSP/1.0 200 OK
+ CSeq:6
+
+   The recognizer is issued a request to listen for the customer's
+   choices.
+
+ C->S:ANNOUNCE rtsp://media.server.com/media/recognizer RTSP/1.0
+ CSeq:7
+ Session:12345678
+
+ RECOGNIZE 543258 MRCP/1.0
+ Content-Type:application/grammar+xml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xml:lang="en-US" version="1.0">
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ Can I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+
+ </grammar>
+
+
+
+
+
+ S->C:RTSP/1.0 200 OK
+ CSeq:7
+ Content-Type:application/mrcp
+ Content-Length:123
+
+ MRCP/1.0 543258 200 IN-PROGRESS
+
+ The client issues the next MRCP SPEAK method in an ANNOUNCE message,
+   asking the user the question. When playing a kill-on-barge-in
+   prompt to the user and asking for input, it is generally RECOMMENDED
+   that the client issue the RECOGNIZE request ahead of the SPEAK
+   request for optimum performance and user experience. This way, it is
+   guaranteed that the recognizer is online before the prompt starts
+   playing, and the user's speech will not be truncated at the
+   beginning (especially for power users).
+
+   C->S:ANNOUNCE rtsp://media.server.com/media/synthesizer RTSP/1.0
+          CSeq:8
+          Session:12345678
+          Content-Type:application/mrcp
+ Content-Length:733
+
+ SPEAK 543259 MRCP/1.0
+ Kill-On-Barge-In:true
+ Content-Type:application/synthesis+ssml
+ Content-Length:104
+
+          <?xml version="1.0"?>
+          <speak>
+          <paragraph>
+            <sentence>Welcome to ABC Corporation.</sentence>
+            <sentence>Who would you like to talk to?</sentence>
+          </paragraph>
+          </speak>
+
+ S->C:RTSP/1.0 200 OK
+ CSeq:8
+ Content-Type:application/mrcp
+ Content-Length:123
+
+ MRCP/1.0 543259 200 IN-PROGRESS
+
+ Since the last SPEAK request had Kill-On-Barge-In set to "true", the
+ message synthesizer is interrupted when the user starts speaking, and
+ the client is notified.
+
+ Now, since the recognition and synthesizer resources are in the same
+ session, they worked with each other to deliver kill-on-barge-in. If
+ the resources were in different sessions, it would have taken a few
+ more messages before the client got the SPEAK-COMPLETE event from the
+ synthesizer resource. Whether the synthesizer and recognizer are in
+ the same session or not, the recognizer MUST generate the START-OF-
+ SPEECH event to the client.
+
+   Had the resources been in different sessions, the client would have
+   then blindly turned around and issued a BARGE-IN-OCCURRED method to
+   the synthesizer resource. The synthesizer, if kill-on-barge-in was
+   enabled on the current SPEAK request, would have then interrupted it
+   and issued a SPEAK-COMPLETE event to the client. In this example,
+   since the synthesizer and recognizer are in the same session, the
+   client did not issue the BARGE-IN-OCCURRED method to the synthesizer
+   and assumed that kill-on-barge-in was implemented between the two
+   resources in the same session and worked.
+
+   The completion-cause code indicates whether this was a normal
+   completion or a kill-on-barge-in interruption.
+
+ S->C:ANNOUNCE rtsp://media.server.com/media/recognizer RTSP/1.0
+ CSeq:9
+ Session:12345678
+ Content-Type:application/mrcp
+ Content-Length:273
+
+ START-OF-SPEECH 543258 IN-PROGRESS MRCP/1.0
+
+ C->S:RTSP/1.0 200 OK
+ CSeq:9
+
+ S->C:ANNOUNCE rtsp://media.server.com/media/synthesizer RTSP/1.0
+ CSeq:10
+ Session:12345678
+ Content-Type:application/mrcp
+ Content-Length:273
+
+ SPEAK-COMPLETE 543259 COMPLETE MRCP/1.0
+ Completion-Cause:000 normal
+
+ C->S:RTSP/1.0 200 OK
+ CSeq:10
+
+ The recognition resource matched the spoken stream to a grammar and
+ generated results. The result of the recognition is returned by the
+ server as part of the RECOGNITION-COMPLETE event.
+
+ S->C:ANNOUNCE rtsp://media.server.com/media/recognizer RTSP/1.0
+ CSeq:11
+ Session:12345678
+ Content-Type:application/mrcp
+ Content-Length:733
+
+ RECOGNITION-COMPLETE 543258 COMPLETE MRCP/1.0
+ Completion-Cause:000 success
+ Waveform-URL:http://web.media.com/session123/audio.wav
+ Content-Type:application/x-nlsml
+ Content-Length:104
+
+ <?xml version="1.0"?>
+ <result x-model="http://IdentityModel"
+ xmlns:xf="http://www.w3.org/2000/xforms"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <xf:instance name="Person">
+ <Person>
+ <Name> Andre Roy </Name>
+ </Person>
+ </xf:instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
+
+ C->S:RTSP/1.0 200 OK
+ CSeq:11
+
+ C->S:TEARDOWN rtsp://media.server.com/media/synthesizer RTSP/1.0
+ CSeq:12
+ Session:12345678
+
+ S->C:RTSP/1.0 200 OK
+ CSeq:12
+
+ We are done with the resources and are tearing them down. When the
+   last of the resources for this session is released, the Session-ID
+ and the RTP/RTCP ports are also released.
+
+ C->S:TEARDOWN rtsp://media.server.com/media/recognizer RTSP/1.0
+ CSeq:13
+ Session:12345678
+
+ S->C:RTSP/1.0 200 OK
+ CSeq:13
+
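+   Every MRCP message in the session above travels as the
+   application/mrcp body of an RTSP request or response. A
+   non-normative Python sketch of that encapsulation step (the actual
+   transport is omitted):
+
+      # Non-normative sketch: tunnel an MRCP message in an RTSP ANNOUNCE.
+
+      def announce_with_mrcp(uri, cseq, session_id, mrcp_message):
+          body = mrcp_message.encode("utf-8")
+          headers = [
+              "ANNOUNCE %s RTSP/1.0" % uri,
+              "CSeq:%d" % cseq,
+              "Session:%s" % session_id,
+              "Content-Type:application/mrcp",
+              "Content-Length:%d" % len(body),
+          ]
+          return ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + body
+
+      if __name__ == "__main__":
+          mrcp = "SPEAK 543257 MRCP/1.0\r\nKill-On-Barge-In:false\r\n\r\n"
+          msg = announce_with_mrcp(
+              "rtsp://media.server.com/media/synthesizer",
+              4, "12345678", mrcp)
+          print(msg.decode("utf-8"))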
+
+
+
+
+
+
+
+
+
+
+12. Informative References
+
+   [1] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
+       Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol --
+       HTTP/1.1", RFC 2616, June 1999.
+
+   [2] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
+       Protocol (RTSP)", RFC 2326, April 1998.
+
+ [3] Crocker, D. and P. Overell, "Augmented BNF for Syntax
+ Specifications: ABNF", RFC 4234, October 2005.
+
+ [4] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
+ Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
+ Session Initiation Protocol", RFC 3261, June 2002.
+
+ [5] Handley, M. and V. Jacobson, "SDP: Session Description
+ Protocol", RFC 2327, April 1998.
+
+ [6] World Wide Web Consortium, "Voice Extensible Markup Language
+ (VoiceXML) Version 2.0", W3C Candidate Recommendation, March
+ 2004.
+
+ [7] Resnick, P., "Internet Message Format", RFC 2822, April 2001.
+
+ [8] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [9] World Wide Web Consortium, "Speech Synthesis Markup Language
+ (SSML) Version 1.0", W3C Candidate Recommendation, September
+ 2004.
+
+ [10] World Wide Web Consortium, "Natural Language Semantics Markup
+ Language (NLSML) for the Speech Interface Framework", W3C
+ Working Draft, 30 May 2001.
+
+ [11] World Wide Web Consortium, "Speech Recognition Grammar
+ Specification Version 1.0", W3C Candidate Recommendation, March
+ 2004.
+
+ [12] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD
+ 63, RFC 3629, November 2003.
+
+ [13] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
+ Extensions (MIME) Part Two: Media Types", RFC 2046, November
+ 1996.
+
+
+
+
+
+
+
+ [14] Levinson, E., "Content-ID and Message-ID Uniform Resource
+ Locators", RFC 2392, August 1998.
+
+ [15] Schulzrinne, H. and S. Petrack, "RTP Payload for DTMF Digits,
+ Telephony Tones and Telephony Signals", RFC 2833, May 2000.
+
+ [16] Alvestrand, H., "Tags for the Identification of Languages", BCP
+ 47, RFC 3066, January 2001.
+
+
+
+Appendix A. ABNF Message Definitions
+
+ ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
+
+ CHAR = %x01-7F ; any 7-bit US-ASCII character,
+ ; excluding NUL
+
+ CR = %x0D ; carriage return
+
+ CRLF = CR LF ; Internet standard newline
+
+ DIGIT = %x30-39 ; 0-9
+
+ DQUOTE = %x22 ; " (Double Quote)
+
+ HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
+
+ HTAB = %x09 ; horizontal tab
+
+ LF = %x0A ; linefeed
+
+ OCTET = %x00-FF ; 8 bits of data
+
+ SP = %x20 ; space
+
+ WSP = SP / HTAB ; white space
+
+ LWS = [*WSP CRLF] 1*WSP ; linear whitespace
+
+ SWS = [LWS] ; sep whitespace
+
+   UTF8-NONASCII = %xC0-DF 1UTF8-CONT
+                 / %xE0-EF 2UTF8-CONT
+                 / %xF0-F7 3UTF8-CONT
+                 / %xF8-FB 4UTF8-CONT
+                 / %xFC-FD 5UTF8-CONT
+
+ UTF8-CONT = %x80-BF
+
+ param = *pchar
+
+ quoted-string = SWS DQUOTE *(qdtext / quoted-pair )
+ DQUOTE
+
+ qdtext = LWS / %x21 / %x23-5B / %x5D-7E
+ / UTF8-NONASCII
+
+
+
+
+
+
+
+ quoted-pair = "\" (%x00-09 / %x0B-0C
+ / %x0E-7F)
+
+ token = 1*(alphanum / "-" / "." / "!" / "%" / "*"
+ / "_" / "+" / "`" / "'" / "~" )
+
+ reserved = ";" / "/" / "?" / ":" / "@" / "&" / "="
+ / "+" / "$" / ","
+
+ mark = "-" / "_" / "." / "!" / "~" / "*" / "'"
+ / "(" / ")"
+
+ unreserved = alphanum / mark
+
+ char = unreserved / escaped /
+ ":" / "@" / "&" / "=" / "+" / "$" / ","
+
+ alphanum = ALPHA / DIGIT
+
+ escaped = "%" HEXDIG HEXDIG
+
+ absoluteURI = scheme ":" ( hier-part / opaque-part )
+
+ relativeURI = ( net-path / abs-path / rel-path )
+ [ "?" query ]
+
+ hier-part = ( net-path / abs-path ) [ "?" query ]
+
+ net-path = "//" authority [ abs-path ]
+
+ abs-path = "/" path-segments
+
+ rel-path = rel-segment [ abs-path ]
+
+ rel-segment = 1*( unreserved / escaped / ";" / "@"
+ / "&" / "=" / "+" / "$" / "," )
+
+ opaque-part = uric-no-slash *uric
+
+ uric = reserved / unreserved / escaped
+
+ uric-no-slash = unreserved / escaped / ";" / "?" / ":"
+ / "@" / "&" / "=" / "+" / "$" / ","
+
+ path-segments = segment *( "/" segment )
+
+ segment = *pchar *( ";" param )
+
+
+
+
+
+
+ scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
+
+ authority = srvr / reg-name
+
+ srvr = [ [ userinfo "@" ] hostport ]
+
+ reg-name = 1*( unreserved / escaped / "$" / ","
+ / ";" / ":" / "@" / "&" / "=" / "+" )
+
+ query = *uric
+
+ userinfo = ( user ) [ ":" password ] "@"
+
+ user = 1*( unreserved / escaped
+ / user-unreserved )
+
+ user-unreserved = "&" / "=" / "+" / "$" / "," / ";"
+ / "?" / "/"
+
+ password = *( unreserved / escaped /
+ "&" / "=" / "+" / "$" / "," )
+
+ hostport = host [ ":" port ]
+
+ host = hostname / IPv4address / IPv6reference
+
+ hostname = *( domainlabel "." ) toplabel [ "." ]
+
+ domainlabel = alphanum
+ / alphanum *( alphanum / "-" ) alphanum
+
+ toplabel = ALPHA / ALPHA *( alphanum / "-" )
+ alphanum
+
+ IPv4address = 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT "."
+ 1*3DIGIT
+
+ IPv6reference = "[" IPv6address "]"
+
+ IPv6address = hexpart [ ":" IPv4address ]
+
+ hexpart = hexseq / hexseq "::" [ hexseq ] / "::"
+ [ hexseq ]
+
+ hexseq = hex4 *( ":" hex4)
+
+ hex4 = 1*4HEXDIG
+
+
+
+
+
+
+ port = 1*DIGIT
+
+ generic-message = start-line
+ message-header
+ CRLF
+ [ message-body ]
+
+ message-body = *OCTET
+
+ start-line = request-line / status-line / event-line
+
+ request-line = method-name SP request-id SP
+ mrcp-version CRLF
+
+ status-line = mrcp-version SP request-id SP
+ status-code SP request-state CRLF
+
+ event-line = event-name SP request-id SP
+ request-state SP mrcp-version CRLF
+
+ message-header = 1*(generic-header / resource-header)
+
+ generic-header = active-request-id-list
+ / proxy-sync-id
+ / content-id
+ / content-type
+ / content-length
+ / content-base
+ / content-location
+ / content-encoding
+ / cache-control
+ / logging-tag
+ ; -- content-id is as defined in RFC 2392 and RFC 2046
+
+ mrcp-version = "MRCP" "/" 1*DIGIT "." 1*DIGIT
+
+ request-id = 1*DIGIT
+
+ status-code = 1*DIGIT
+
+ active-request-id-list = "Active-Request-Id-List" ":"
+ request-id *("," request-id) CRLF
+
+ proxy-sync-id = "Proxy-Sync-Id" ":" 1*ALPHA CRLF
+
+ content-length = "Content-Length" ":" 1*DIGIT CRLF
+
+ content-base = "Content-Base" ":" absoluteURI CRLF
+
+
+
+
+
+   content-type = "Content-Type" ":" media-type CRLF
+
+ media-type = type "/" subtype *( ";" parameter )
+
+ type = token
+
+ subtype = token
+
+ parameter = attribute "=" value
+
+ attribute = token
+
+ value = token / quoted-string
+
+ content-encoding = "Content-Encoding" ":"
+ *WSP content-coding
+ *(*WSP "," *WSP content-coding *WSP )
+ CRLF
+
+ content-coding = token
+
+
+ content-location = "Content-Location" ":"
+ ( absoluteURI / relativeURI ) CRLF
+
+ cache-control = "Cache-Control" ":"
+ *WSP cache-directive
+ *( *WSP "," *WSP cache-directive *WSP )
+ CRLF
+
+ cache-directive = "max-age" "=" delta-seconds
+ / "max-stale" "=" delta-seconds
+ / "min-fresh" "=" delta-seconds
+
+ logging-tag = "Logging-Tag" ":" 1*ALPHA CRLF
+
+
+ resource-header = recognizer-header
+ / synthesizer-header
+
+ method-name = synthesizer-method
+ / recognizer-method
+
+ event-name = synthesizer-event
+ / recognizer-event
+
+
+
+
+
+
+
+
+ request-state = "COMPLETE"
+ / "IN-PROGRESS"
+ / "PENDING"
+
+ synthesizer-method = "SET-PARAMS"
+ / "GET-PARAMS"
+ / "SPEAK"
+ / "STOP"
+ / "PAUSE"
+ / "RESUME"
+ / "BARGE-IN-OCCURRED"
+ / "CONTROL"
+
+ synthesizer-event = "SPEECH-MARKER"
+ / "SPEAK-COMPLETE"
+
+ synthesizer-header = jump-target
+ / kill-on-barge-in
+ / speaker-profile
+ / completion-cause
+ / voice-parameter
+ / prosody-parameter
+ / vendor-specific
+ / speech-marker
+ / speech-language
+ / fetch-hint
+ / audio-fetch-hint
+ / fetch-timeout
+ / failed-uri
+ / failed-uri-cause
+ / speak-restart
+ / speak-length
+
+ recognizer-method = "SET-PARAMS"
+ / "GET-PARAMS"
+ / "DEFINE-GRAMMAR"
+ / "RECOGNIZE"
+ / "GET-RESULT"
+ / "RECOGNITION-START-TIMERS"
+ / "STOP"
+
+ recognizer-event = "START-OF-SPEECH"
+ / "RECOGNITION-COMPLETE"
+
+ recognizer-header = confidence-threshold
+ / sensitivity-level
+ / speed-vs-accuracy
+ / n-best-list-length
+ / no-input-timeout
+ / recognition-timeout
+ / waveform-url
+ / completion-cause
+ / recognizer-context-block
+ / recognizer-start-timers
+ / vendor-specific
+ / speech-complete-timeout
+ / speech-incomplete-timeout
+ / dtmf-interdigit-timeout
+ / dtmf-term-timeout
+ / dtmf-term-char
+ / fetch-timeout
+ / failed-uri
+ / failed-uri-cause
+ / save-waveform
+ / new-audio-channel
+ / speech-language
+
+ jump-target = "Jump-Size" ":" speech-length-value CRLF
+
+ speech-length-value = numeric-speech-length
+ / text-speech-length
+
+ text-speech-length = 1*ALPHA SP "Tag"
+
+   numeric-speech-length = ("+" / "-") 1*DIGIT SP
+ numeric-speech-unit
+
+ numeric-speech-unit = "Second"
+ / "Word"
+ / "Sentence"
+ / "Paragraph"
+
+ delta-seconds = 1*DIGIT
+
+ kill-on-barge-in = "Kill-On-Barge-In" ":" boolean-value CRLF
+
+ boolean-value = "true" / "false"
+
+ speaker-profile = "Speaker-Profile" ":" absoluteURI CRLF
+
+ completion-cause = "Completion-Cause" ":" 1*DIGIT SP
+ 1*ALPHA CRLF
+
+ voice-parameter = "Voice-" voice-param-name ":"
+ voice-param-value CRLF
+
+
+
+
+
+
+ voice-param-name = 1*ALPHA
+
+ voice-param-value = 1*alphanum
+
+ prosody-parameter = "Prosody-" prosody-param-name ":"
+ prosody-param-value CRLF
+
+ prosody-param-name = 1*ALPHA
+
+ prosody-param-value = 1*alphanum
+
+ vendor-specific = "Vendor-Specific-Parameters" ":"
+ vendor-specific-av-pair
+ *[";" vendor-specific-av-pair] CRLF
+
+ vendor-specific-av-pair = vendor-av-pair-name "="
+ vendor-av-pair-value
+
+ vendor-av-pair-name = 1*ALPHA
+
+ vendor-av-pair-value = 1*alphanum
+
+ speech-marker = "Speech-Marker" ":" 1*ALPHA CRLF
+
+ speech-language = "Speech-Language" ":" 1*ALPHA CRLF
+
+ fetch-hint = "Fetch-Hint" ":" 1*ALPHA CRLF
+
+ audio-fetch-hint = "Audio-Fetch-Hint" ":" 1*ALPHA CRLF
+
+ fetch-timeout = "Fetch-Timeout" ":" 1*DIGIT CRLF
+
+ failed-uri = "Failed-URI" ":" absoluteURI CRLF
+
+ failed-uri-cause = "Failed-URI-Cause" ":" 1*ALPHA CRLF
+
+ speak-restart = "Speak-Restart" ":" boolean-value CRLF
+
+ speak-length = "Speak-Length" ":" speech-length-value
+ CRLF
+ confidence-threshold = "Confidence-Threshold" ":"
+ 1*DIGIT CRLF
+
+ sensitivity-level = "Sensitivity-Level" ":" 1*DIGIT CRLF
+
+ speed-vs-accuracy = "Speed-Vs-Accuracy" ":" 1*DIGIT CRLF
+
+ n-best-list-length = "N-Best-List-Length" ":" 1*DIGIT CRLF
+
+
+
+
+
+ no-input-timeout = "No-Input-Timeout" ":" 1*DIGIT CRLF
+
+ recognition-timeout = "Recognition-Timeout" ":" 1*DIGIT CRLF
+
+ waveform-url = "Waveform-URL" ":" absoluteURI CRLF
+
+ recognizer-context-block = "Recognizer-Context-Block" ":"
+ 1*ALPHA CRLF
+
+ recognizer-start-timers = "Recognizer-Start-Timers" ":"
+ boolean-value CRLF
+
+ speech-complete-timeout = "Speech-Complete-Timeout" ":"
+ 1*DIGIT CRLF
+
+ speech-incomplete-timeout = "Speech-Incomplete-Timeout" ":"
+ 1*DIGIT CRLF
+
+ dtmf-interdigit-timeout = "DTMF-Interdigit-Timeout" ":"
+ 1*DIGIT CRLF
+
+ dtmf-term-timeout = "DTMF-Term-Timeout" ":" 1*DIGIT CRLF
+
+ dtmf-term-char = "DTMF-Term-Char" ":" CHAR CRLF
+
+ save-waveform = "Save-Waveform" ":" boolean-value CRLF
+
+ new-audio-channel = "New-Audio-Channel" ":"
+ boolean-value CRLF
+
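+   The start-line productions above are sufficient to classify an
+   incoming message. The following non-normative Python sketch applies
+   them to a start line that has already been isolated from the rest of
+   the message:
+
+      # Non-normative sketch: classify an MRCP start line per the ABNF.
+
+      def classify_start_line(line):
+          parts = line.strip().split(" ")
+          if parts[0].startswith("MRCP/"):
+              # status-line: version, request-id, status, request-state
+              return ("response", {"request-id": parts[1],
+                                   "status-code": parts[2],
+                                   "request-state": parts[3]})
+          if parts[-1].startswith("MRCP/"):
+              if len(parts) == 3:
+                  # request-line: method-name, request-id, version
+                  return ("request", {"method": parts[0],
+                                      "request-id": parts[1]})
+              # event-line: event-name, request-id, state, version
+              return ("event", {"event": parts[0],
+                                "request-id": parts[1],
+                                "request-state": parts[2]})
+          raise ValueError("not an MRCP start line: %r" % line)
+
+      if __name__ == "__main__":
+          print(classify_start_line("RECOGNIZE 543257 MRCP/1.0"))
+          print(classify_start_line("MRCP/1.0 543257 200 IN-PROGRESS"))
+          print(classify_start_line(
+              "START-OF-SPEECH 543257 IN-PROGRESS MRCP/1.0"))
+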
+Appendix B. Acknowledgements
+
+ Andre Gillet (Nuance Communications)
+ Andrew Hunt (SpeechWorks)
+ Aaron Kneiss (SpeechWorks)
+ Kristian Finlator (SpeechWorks)
+ Martin Dragomirecky (Cisco Systems, Inc.)
+ Pierre Forgues (Nuance Communications)
+ Suresh Kaliannan (Cisco Systems, Inc.)
+ Corey Stohs (Cisco Systems, Inc.)
+ Dan Burnett (Nuance Communications)
+
+
+
+
+Authors' Addresses
+
+ Saravanan Shanmugham
+ Cisco Systems, Inc.
+ 170 W. Tasman Drive
+ San Jose, CA 95134
+
+ EMail: sarvi@cisco.com
+
+
+ Peter Monaco
+ Nuasis Corporation
+ 303 Bryant St.
+ Mountain View, CA 94041
+
+ EMail: peter.monaco@nuasis.com
+
+
+ Brian Eberman
+ Speechworks, Inc.
+ 695 Atlantic Avenue
+ Boston, MA 02111
+
+ EMail: brian.eberman@speechworks.com
+
+
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2006).
+
+ This document is subject to the rights, licenses and restrictions
+ contained in BCP 78 and at www.rfc-editor.org/copyright.html, and
+ except as set forth therein, the authors retain all their rights.
+
+ This document and the information contained herein are provided on an
+ "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+ OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+ ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+ INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+ INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+ WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+ The IETF takes no position regarding the validity or scope of any
+ Intellectual Property Rights or other rights that might be claimed to
+ pertain to the implementation or use of the technology described in
+ this document or the extent to which any license under such rights
+ might or might not be available; nor does it represent that it has
+ made any independent effort to identify any such rights. Information
+ on the procedures with respect to rights in RFC documents can be
+ found in BCP 78 and BCP 79.
+
+ Copies of IPR disclosures made to the IETF Secretariat and any
+ assurances of licenses to be made available, or the result of an
+ attempt made to obtain a general license or permission for the use of
+ such proprietary rights by implementers or users of this
+ specification can be obtained from the IETF on-line IPR repository at
+ http://www.ietf.org/ipr.
+
+ The IETF invites any interested party to bring to its attention any
+ copyrights, patents or patent applications, or other proprietary
+ rights that may cover technology that may be required to implement
+ this standard. Please address the information to the IETF at
+ ietf-ipr@ietf.org.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is provided by the IETF
+ Administrative Support Activity (IASA).
+