diff --git a/doc/rfc/rfc6787.txt b/doc/rfc/rfc6787.txt
new file mode 100644
index 0000000..ca651b7
--- /dev/null
+++ b/doc/rfc/rfc6787.txt
@@ -0,0 +1,12547 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) D. Burnett
+Request for Comments: 6787 Voxeo
+Category: Standards Track S. Shanmugham
+ISSN: 2070-1721 Cisco Systems, Inc.
+ November 2012
+
+
+ Media Resource Control Protocol Version 2 (MRCPv2)
+
+Abstract
+
+ The Media Resource Control Protocol Version 2 (MRCPv2) allows client
+ hosts to control media service resources such as speech synthesizers,
+ recognizers, verifiers, and identifiers residing in servers on the
+ network. MRCPv2 is not a "stand-alone" protocol -- it relies on
+ other protocols, such as the Session Initiation Protocol (SIP), to
+ coordinate MRCPv2 clients and servers and manage sessions between
+ them, and the Session Description Protocol (SDP) to describe,
+ discover, and exchange capabilities. It also depends on SIP and SDP
+ to establish the media sessions and associated parameters between the
+ media source or sink and the media server. Once this is done, the
+ MRCPv2 exchange operates over the control session established above,
+ allowing the client to control the media processing resources on the
+ speech resource server.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc6787.
+
+Copyright Notice
+
+ Copyright (c) 2012 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+
+
+
+Burnett & Shanmugham Standards Track [Page 1]
+
+RFC 6787 MRCPv2 November 2012
+
+
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+ This document may contain material from IETF Documents or IETF
+ Contributions published or made publicly available before November
+ 10, 2008. The person(s) controlling the copyright in some of this
+ material may not have granted the IETF Trust the right to allow
+ modifications of such material outside the IETF Standards Process.
+ Without obtaining an adequate license from the person(s) controlling
+ the copyright in such materials, this document may not be modified
+ outside the IETF Standards Process, and derivative works of it may
+ not be created outside the IETF Standards Process, except to format
+ it for publication as an RFC or to translate it into languages other
+ than English.
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 8
+ 2. Document Conventions . . . . . . . . . . . . . . . . . . . . 9
+ 2.1. Definitions . . . . . . . . . . . . . . . . . . . . . . 10
+ 2.2. State-Machine Diagrams . . . . . . . . . . . . . . . . . 10
+ 2.3. URI Schemes . . . . . . . . . . . . . . . . . . . . . . 11
+ 3. Architecture . . . . . . . . . . . . . . . . . . . . . . . . 11
+ 3.1. MRCPv2 Media Resource Types . . . . . . . . . . . . . . 12
+ 3.2. Server and Resource Addressing . . . . . . . . . . . . . 14
+ 4. MRCPv2 Basics . . . . . . . . . . . . . . . . . . . . . . . . 14
+ 4.1. Connecting to the Server . . . . . . . . . . . . . . . . 14
+ 4.2. Managing Resource Control Channels . . . . . . . . . . . 14
+ 4.3. SIP Session Example . . . . . . . . . . . . . . . . . . 17
+ 4.4. Media Streams and RTP Ports . . . . . . . . . . . . . . 22
+ 4.5. MRCPv2 Message Transport . . . . . . . . . . . . . . . . 24
+ 4.6. MRCPv2 Session Termination . . . . . . . . . . . . . . . 24
+ 5. MRCPv2 Specification . . . . . . . . . . . . . . . . . . . . 24
+ 5.1. Common Protocol Elements . . . . . . . . . . . . . . . . 25
+ 5.2. Request . . . . . . . . . . . . . . . . . . . . . . . . 28
+ 5.3. Response . . . . . . . . . . . . . . . . . . . . . . . . 29
+ 5.4. Status Codes . . . . . . . . . . . . . . . . . . . . . . 30
+ 5.5. Events . . . . . . . . . . . . . . . . . . . . . . . . . 31
+ 6. MRCPv2 Generic Methods, Headers, and Result Structure . . . . 32
+ 6.1. Generic Methods . . . . . . . . . . . . . . . . . . . . 32
+ 6.1.1. SET-PARAMS . . . . . . . . . . . . . . . . . . . . . 32
+ 6.1.2. GET-PARAMS . . . . . . . . . . . . . . . . . . . . . 33
+ 6.2. Generic Message Headers . . . . . . . . . . . . . . . . 34
+ 6.2.1. Channel-Identifier . . . . . . . . . . . . . . . . . 35
+ 6.2.2. Accept . . . . . . . . . . . . . . . . . . . . . . . 36
+
+
+ 6.2.3. Active-Request-Id-List . . . . . . . . . . . . . . . 36
+ 6.2.4. Proxy-Sync-Id . . . . . . . . . . . . . . . . . . . 36
+ 6.2.5. Accept-Charset . . . . . . . . . . . . . . . . . . . 37
+ 6.2.6. Content-Type . . . . . . . . . . . . . . . . . . . . 37
+ 6.2.7. Content-ID . . . . . . . . . . . . . . . . . . . . . 38
+ 6.2.8. Content-Base . . . . . . . . . . . . . . . . . . . . 38
+ 6.2.9. Content-Encoding . . . . . . . . . . . . . . . . . . 38
+ 6.2.10. Content-Location . . . . . . . . . . . . . . . . . . 39
+ 6.2.11. Content-Length . . . . . . . . . . . . . . . . . . . 39
+ 6.2.12. Fetch Timeout . . . . . . . . . . . . . . . . . . . 39
+ 6.2.13. Cache-Control . . . . . . . . . . . . . . . . . . . 40
+ 6.2.14. Logging-Tag . . . . . . . . . . . . . . . . . . . . 41
+ 6.2.15. Set-Cookie . . . . . . . . . . . . . . . . . . . . . 42
+ 6.2.16. Vendor-Specific Parameters . . . . . . . . . . . . . 44
+ 6.3. Generic Result Structure . . . . . . . . . . . . . . . . 44
+ 6.3.1. Natural Language Semantics Markup Language . . . . . 45
+ 7. Resource Discovery . . . . . . . . . . . . . . . . . . . . . 46
+ 8. Speech Synthesizer Resource . . . . . . . . . . . . . . . . . 47
+ 8.1. Synthesizer State Machine . . . . . . . . . . . . . . . 48
+ 8.2. Synthesizer Methods . . . . . . . . . . . . . . . . . . 48
+ 8.3. Synthesizer Events . . . . . . . . . . . . . . . . . . . 49
+ 8.4. Synthesizer Header Fields . . . . . . . . . . . . . . . 49
+ 8.4.1. Jump-Size . . . . . . . . . . . . . . . . . . . . . 49
+ 8.4.2. Kill-On-Barge-In . . . . . . . . . . . . . . . . . . 50
+ 8.4.3. Speaker-Profile . . . . . . . . . . . . . . . . . . 51
+ 8.4.4. Completion-Cause . . . . . . . . . . . . . . . . . . 51
+ 8.4.5. Completion-Reason . . . . . . . . . . . . . . . . . 52
+ 8.4.6. Voice-Parameter . . . . . . . . . . . . . . . . . . 52
+ 8.4.7. Prosody-Parameters . . . . . . . . . . . . . . . . . 53
+ 8.4.8. Speech-Marker . . . . . . . . . . . . . . . . . . . 53
+ 8.4.9. Speech-Language . . . . . . . . . . . . . . . . . . 54
+ 8.4.10. Fetch-Hint . . . . . . . . . . . . . . . . . . . . . 54
+ 8.4.11. Audio-Fetch-Hint . . . . . . . . . . . . . . . . . . 55
+ 8.4.12. Failed-URI . . . . . . . . . . . . . . . . . . . . . 55
+ 8.4.13. Failed-URI-Cause . . . . . . . . . . . . . . . . . . 55
+ 8.4.14. Speak-Restart . . . . . . . . . . . . . . . . . . . 56
+ 8.4.15. Speak-Length . . . . . . . . . . . . . . . . . . . . 56
+ 8.4.16. Load-Lexicon . . . . . . . . . . . . . . . . . . . . 57
+ 8.4.17. Lexicon-Search-Order . . . . . . . . . . . . . . . . 57
+ 8.5. Synthesizer Message Body . . . . . . . . . . . . . . . . 57
+ 8.5.1. Synthesizer Speech Data . . . . . . . . . . . . . . 57
+ 8.5.2. Lexicon Data . . . . . . . . . . . . . . . . . . . . 59
+ 8.6. SPEAK Method . . . . . . . . . . . . . . . . . . . . . . 60
+ 8.7. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 62
+ 8.8. BARGE-IN-OCCURRED . . . . . . . . . . . . . . . . . . . 63
+ 8.9. PAUSE . . . . . . . . . . . . . . . . . . . . . . . . . 65
+ 8.10. RESUME . . . . . . . . . . . . . . . . . . . . . . . . . 66
+ 8.11. CONTROL . . . . . . . . . . . . . . . . . . . . . . . . 67
+
+
+ 8.12. SPEAK-COMPLETE . . . . . . . . . . . . . . . . . . . . . 69
+ 8.13. SPEECH-MARKER . . . . . . . . . . . . . . . . . . . . . 70
+ 8.14. DEFINE-LEXICON . . . . . . . . . . . . . . . . . . . . . 71
+ 9. Speech Recognizer Resource . . . . . . . . . . . . . . . . . 72
+ 9.1. Recognizer State Machine . . . . . . . . . . . . . . . . 74
+ 9.2. Recognizer Methods . . . . . . . . . . . . . . . . . . . 74
+ 9.3. Recognizer Events . . . . . . . . . . . . . . . . . . . 75
+ 9.4. Recognizer Header Fields . . . . . . . . . . . . . . . . 75
+ 9.4.1. Confidence-Threshold . . . . . . . . . . . . . . . . 77
+ 9.4.2. Sensitivity-Level . . . . . . . . . . . . . . . . . 77
+ 9.4.3. Speed-Vs-Accuracy . . . . . . . . . . . . . . . . . 77
+ 9.4.4. N-Best-List-Length . . . . . . . . . . . . . . . . . 78
+ 9.4.5. Input-Type . . . . . . . . . . . . . . . . . . . . . 78
+ 9.4.6. No-Input-Timeout . . . . . . . . . . . . . . . . . . 78
+ 9.4.7. Recognition-Timeout . . . . . . . . . . . . . . . . 79
+ 9.4.8. Waveform-URI . . . . . . . . . . . . . . . . . . . . 79
+ 9.4.9. Media-Type . . . . . . . . . . . . . . . . . . . . . 80
+ 9.4.10. Input-Waveform-URI . . . . . . . . . . . . . . . . . 80
+ 9.4.11. Completion-Cause . . . . . . . . . . . . . . . . . . 80
+ 9.4.12. Completion-Reason . . . . . . . . . . . . . . . . . 83
+ 9.4.13. Recognizer-Context-Block . . . . . . . . . . . . . . 83
+ 9.4.14. Start-Input-Timers . . . . . . . . . . . . . . . . . 83
+ 9.4.15. Speech-Complete-Timeout . . . . . . . . . . . . . . 84
+ 9.4.16. Speech-Incomplete-Timeout . . . . . . . . . . . . . 84
+ 9.4.17. DTMF-Interdigit-Timeout . . . . . . . . . . . . . . 85
+ 9.4.18. DTMF-Term-Timeout . . . . . . . . . . . . . . . . . 85
+ 9.4.19. DTMF-Term-Char . . . . . . . . . . . . . . . . . . . 85
+ 9.4.20. Failed-URI . . . . . . . . . . . . . . . . . . . . . 86
+ 9.4.21. Failed-URI-Cause . . . . . . . . . . . . . . . . . . 86
+ 9.4.22. Save-Waveform . . . . . . . . . . . . . . . . . . . 86
+ 9.4.23. New-Audio-Channel . . . . . . . . . . . . . . . . . 86
+ 9.4.24. Speech-Language . . . . . . . . . . . . . . . . . . 87
+ 9.4.25. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 87
+ 9.4.26. Recognition-Mode . . . . . . . . . . . . . . . . . . 87
+ 9.4.27. Cancel-If-Queue . . . . . . . . . . . . . . . . . . 88
+ 9.4.28. Hotword-Max-Duration . . . . . . . . . . . . . . . . 88
+ 9.4.29. Hotword-Min-Duration . . . . . . . . . . . . . . . . 88
+ 9.4.30. Interpret-Text . . . . . . . . . . . . . . . . . . . 89
+ 9.4.31. DTMF-Buffer-Time . . . . . . . . . . . . . . . . . . 89
+ 9.4.32. Clear-DTMF-Buffer . . . . . . . . . . . . . . . . . 89
+ 9.4.33. Early-No-Match . . . . . . . . . . . . . . . . . . . 90
+ 9.4.34. Num-Min-Consistent-Pronunciations . . . . . . . . . 90
+ 9.4.35. Consistency-Threshold . . . . . . . . . . . . . . . 90
+ 9.4.36. Clash-Threshold . . . . . . . . . . . . . . . . . . 90
+ 9.4.37. Personal-Grammar-URI . . . . . . . . . . . . . . . . 91
+ 9.4.38. Enroll-Utterance . . . . . . . . . . . . . . . . . . 91
+ 9.4.39. Phrase-Id . . . . . . . . . . . . . . . . . . . . . 91
+ 9.4.40. Phrase-NL . . . . . . . . . . . . . . . . . . . . . 92
+
+
+ 9.4.41. Weight . . . . . . . . . . . . . . . . . . . . . . . 92
+ 9.4.42. Save-Best-Waveform . . . . . . . . . . . . . . . . . 92
+ 9.4.43. New-Phrase-Id . . . . . . . . . . . . . . . . . . . 93
+ 9.4.44. Confusable-Phrases-URI . . . . . . . . . . . . . . . 93
+ 9.4.45. Abort-Phrase-Enrollment . . . . . . . . . . . . . . 93
+ 9.5. Recognizer Message Body . . . . . . . . . . . . . . . . 93
+ 9.5.1. Recognizer Grammar Data . . . . . . . . . . . . . . 93
+ 9.5.2. Recognizer Result Data . . . . . . . . . . . . . . . 97
+ 9.5.3. Enrollment Result Data . . . . . . . . . . . . . . . 98
+ 9.5.4. Recognizer Context Block . . . . . . . . . . . . . . 98
+ 9.6. Recognizer Results . . . . . . . . . . . . . . . . . . . 99
+ 9.6.1. Markup Functions . . . . . . . . . . . . . . . . . . 99
+ 9.6.2. Overview of Recognizer Result Elements and Their
+ Relationships . . . . . . . . . . . . . . . . . . . 100
+ 9.6.3. Elements and Attributes . . . . . . . . . . . . . . 101
+ 9.7. Enrollment Results . . . . . . . . . . . . . . . . . . . 106
+ 9.7.1. <num-clashes> Element . . . . . . . . . . . . . . . 106
+ 9.7.2. <num-good-repetitions> Element . . . . . . . . . . . 106
+ 9.7.3. <num-repetitions-still-needed> Element . . . . . . . 107
+ 9.7.4. <consistency-status> Element . . . . . . . . . . . . 107
+ 9.7.5. <clash-phrase-ids> Element . . . . . . . . . . . . . 107
+ 9.7.6. <transcriptions> Element . . . . . . . . . . . . . . 107
+ 9.7.7. <confusable-phrases> Element . . . . . . . . . . . . 107
+ 9.8. DEFINE-GRAMMAR . . . . . . . . . . . . . . . . . . . . . 107
+ 9.9. RECOGNIZE . . . . . . . . . . . . . . . . . . . . . . . 111
+ 9.10. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 118
+ 9.11. GET-RESULT . . . . . . . . . . . . . . . . . . . . . . . 119
+ 9.12. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 120
+ 9.13. START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 120
+ 9.14. RECOGNITION-COMPLETE . . . . . . . . . . . . . . . . . . 120
+ 9.15. START-PHRASE-ENROLLMENT . . . . . . . . . . . . . . . . 123
+ 9.16. ENROLLMENT-ROLLBACK . . . . . . . . . . . . . . . . . . 124
+ 9.17. END-PHRASE-ENROLLMENT . . . . . . . . . . . . . . . . . 124
+ 9.18. MODIFY-PHRASE . . . . . . . . . . . . . . . . . . . . . 125
+ 9.19. DELETE-PHRASE . . . . . . . . . . . . . . . . . . . . . 125
+ 9.20. INTERPRET . . . . . . . . . . . . . . . . . . . . . . . 125
+ 9.21. INTERPRETATION-COMPLETE . . . . . . . . . . . . . . . . 127
+ 9.22. DTMF Detection . . . . . . . . . . . . . . . . . . . . . 128
+ 10. Recorder Resource . . . . . . . . . . . . . . . . . . . . . . 129
+ 10.1. Recorder State Machine . . . . . . . . . . . . . . . . . 129
+ 10.2. Recorder Methods . . . . . . . . . . . . . . . . . . . . 130
+ 10.3. Recorder Events . . . . . . . . . . . . . . . . . . . . 130
+ 10.4. Recorder Header Fields . . . . . . . . . . . . . . . . . 130
+ 10.4.1. Sensitivity-Level . . . . . . . . . . . . . . . . . 130
+ 10.4.2. No-Input-Timeout . . . . . . . . . . . . . . . . . . 131
+ 10.4.3. Completion-Cause . . . . . . . . . . . . . . . . . . 131
+ 10.4.4. Completion-Reason . . . . . . . . . . . . . . . . . 132
+ 10.4.5. Failed-URI . . . . . . . . . . . . . . . . . . . . . 132
+
+
+ 10.4.6. Failed-URI-Cause . . . . . . . . . . . . . . . . . . 132
+ 10.4.7. Record-URI . . . . . . . . . . . . . . . . . . . . . 132
+ 10.4.8. Media-Type . . . . . . . . . . . . . . . . . . . . . 133
+ 10.4.9. Max-Time . . . . . . . . . . . . . . . . . . . . . . 133
+ 10.4.10. Trim-Length . . . . . . . . . . . . . . . . . . . . 134
+ 10.4.11. Final-Silence . . . . . . . . . . . . . . . . . . . 134
+ 10.4.12. Capture-On-Speech . . . . . . . . . . . . . . . . . 134
+ 10.4.13. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 134
+ 10.4.14. Start-Input-Timers . . . . . . . . . . . . . . . . . 135
+ 10.4.15. New-Audio-Channel . . . . . . . . . . . . . . . . . 135
+ 10.5. Recorder Message Body . . . . . . . . . . . . . . . . . 135
+ 10.6. RECORD . . . . . . . . . . . . . . . . . . . . . . . . . 135
+ 10.7. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 136
+ 10.8. RECORD-COMPLETE . . . . . . . . . . . . . . . . . . . . 137
+ 10.9. START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 138
+ 10.10. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 138
+ 11. Speaker Verification and Identification . . . . . . . . . . . 139
+ 11.1. Speaker Verification State Machine . . . . . . . . . . . 140
+ 11.2. Speaker Verification Methods . . . . . . . . . . . . . . 142
+ 11.3. Verification Events . . . . . . . . . . . . . . . . . . 144
+ 11.4. Verification Header Fields . . . . . . . . . . . . . . . 144
+ 11.4.1. Repository-URI . . . . . . . . . . . . . . . . . . . 144
+ 11.4.2. Voiceprint-Identifier . . . . . . . . . . . . . . . 145
+ 11.4.3. Verification-Mode . . . . . . . . . . . . . . . . . 145
+ 11.4.4. Adapt-Model . . . . . . . . . . . . . . . . . . . . 146
+ 11.4.5. Abort-Model . . . . . . . . . . . . . . . . . . . . 146
+ 11.4.6. Min-Verification-Score . . . . . . . . . . . . . . . 147
+ 11.4.7. Num-Min-Verification-Phrases . . . . . . . . . . . . 147
+ 11.4.8. Num-Max-Verification-Phrases . . . . . . . . . . . . 147
+ 11.4.9. No-Input-Timeout . . . . . . . . . . . . . . . . . . 148
+ 11.4.10. Save-Waveform . . . . . . . . . . . . . . . . . . . 148
+ 11.4.11. Media-Type . . . . . . . . . . . . . . . . . . . . . 148
+ 11.4.12. Waveform-URI . . . . . . . . . . . . . . . . . . . . 148
+ 11.4.13. Voiceprint-Exists . . . . . . . . . . . . . . . . . 149
+ 11.4.14. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 149
+ 11.4.15. Input-Waveform-URI . . . . . . . . . . . . . . . . . 149
+ 11.4.16. Completion-Cause . . . . . . . . . . . . . . . . . . 150
+ 11.4.17. Completion-Reason . . . . . . . . . . . . . . . . . 151
+ 11.4.18. Speech-Complete-Timeout . . . . . . . . . . . . . . 151
+ 11.4.19. New-Audio-Channel . . . . . . . . . . . . . . . . . 152
+ 11.4.20. Abort-Verification . . . . . . . . . . . . . . . . . 152
+ 11.4.21. Start-Input-Timers . . . . . . . . . . . . . . . . . 152
+ 11.5. Verification Message Body . . . . . . . . . . . . . . . 152
+ 11.5.1. Verification Result Data . . . . . . . . . . . . . . 152
+ 11.5.2. Verification Result Elements . . . . . . . . . . . . 153
+ 11.6. START-SESSION . . . . . . . . . . . . . . . . . . . . . 157
+ 11.7. END-SESSION . . . . . . . . . . . . . . . . . . . . . . 158
+ 11.8. QUERY-VOICEPRINT . . . . . . . . . . . . . . . . . . . . 159
+
+
+ 11.9. DELETE-VOICEPRINT . . . . . . . . . . . . . . . . . . . 160
+ 11.10. VERIFY . . . . . . . . . . . . . . . . . . . . . . . . . 160
+ 11.11. VERIFY-FROM-BUFFER . . . . . . . . . . . . . . . . . . . 160
+ 11.12. VERIFY-ROLLBACK . . . . . . . . . . . . . . . . . . . . 164
+ 11.13. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 164
+ 11.14. START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 165
+ 11.15. VERIFICATION-COMPLETE . . . . . . . . . . . . . . . . . 165
+ 11.16. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 166
+ 11.17. CLEAR-BUFFER . . . . . . . . . . . . . . . . . . . . . . 166
+ 11.18. GET-INTERMEDIATE-RESULT . . . . . . . . . . . . . . . . 167
+ 12. Security Considerations . . . . . . . . . . . . . . . . . . . 168
+ 12.1. Rendezvous and Session Establishment . . . . . . . . . . 168
+ 12.2. Control Channel Protection . . . . . . . . . . . . . . . 168
+ 12.3. Media Session Protection . . . . . . . . . . . . . . . . 169
+ 12.4. Indirect Content Access . . . . . . . . . . . . . . . . 169
+ 12.5. Protection of Stored Media . . . . . . . . . . . . . . . 170
+ 12.6. DTMF and Recognition Buffers . . . . . . . . . . . . . . 171
+ 12.7. Client-Set Server Parameters . . . . . . . . . . . . . . 171
+ 12.8. DELETE-VOICEPRINT and Authorization . . . . . . . . . . 171
+ 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 171
+ 13.1. New Registries . . . . . . . . . . . . . . . . . . . . . 171
+ 13.1.1. MRCPv2 Resource Types . . . . . . . . . . . . . . . 171
+ 13.1.2. MRCPv2 Methods and Events . . . . . . . . . . . . . 172
+ 13.1.3. MRCPv2 Header Fields . . . . . . . . . . . . . . . . 173
+ 13.1.4. MRCPv2 Status Codes . . . . . . . . . . . . . . . . 176
+ 13.1.5. Grammar Reference List Parameters . . . . . . . . . 176
+ 13.1.6. MRCPv2 Vendor-Specific Parameters . . . . . . . . . 176
+ 13.2. NLSML-Related Registrations . . . . . . . . . . . . . . 177
+ 13.2.1. 'application/nlsml+xml' Media Type Registration . . 177
+ 13.3. NLSML XML Schema Registration . . . . . . . . . . . . . 178
+ 13.4. MRCPv2 XML Namespace Registration . . . . . . . . . . . 178
+ 13.5. Text Media Type Registrations . . . . . . . . . . . . . 178
+ 13.5.1. text/grammar-ref-list . . . . . . . . . . . . . . . 178
+ 13.6. 'session' URI Scheme Registration . . . . . . . . . . . 180
+ 13.7. SDP Parameter Registrations . . . . . . . . . . . . . . 181
+ 13.7.1. Sub-Registry "proto" . . . . . . . . . . . . . . . . 181
+ 13.7.2. Sub-Registry "att-field (media-level)" . . . . . . . 182
+ 14. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 183
+ 14.1. Message Flow . . . . . . . . . . . . . . . . . . . . . . 183
+ 14.2. Recognition Result Examples . . . . . . . . . . . . . . 192
+ 14.2.1. Simple ASR Ambiguity . . . . . . . . . . . . . . . . 192
+ 14.2.2. Mixed Initiative . . . . . . . . . . . . . . . . . . 192
+ 14.2.3. DTMF Input . . . . . . . . . . . . . . . . . . . . . 193
+ 14.2.4. Interpreting Meta-Dialog and Meta-Task Utterances . 194
+ 14.2.5. Anaphora and Deixis . . . . . . . . . . . . . . . . 195
+ 14.2.6. Distinguishing Individual Items from Sets with
+ One Member . . . . . . . . . . . . . . . . . . . . . 195
+ 14.2.7. Extensibility . . . . . . . . . . . . . . . . . . . 196
+
+
+ 15. ABNF Normative Definition . . . . . . . . . . . . . . . . . . 196
+ 16. XML Schemas . . . . . . . . . . . . . . . . . . . . . . . . . 211
+ 16.1. NLSML Schema Definition . . . . . . . . . . . . . . . . 211
+ 16.2. Enrollment Results Schema Definition . . . . . . . . . . 213
+ 16.3. Verification Results Schema Definition . . . . . . . . . 214
+ 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 218
+ 17.1. Normative References . . . . . . . . . . . . . . . . . . 218
+ 17.2. Informative References . . . . . . . . . . . . . . . . . 220
+ Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 223
+ Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 223
+
+1. Introduction
+
+ MRCPv2 is designed to allow a client device to control media
+ processing resources on the network. Some of these media processing
+ resources include speech recognition engines, speech synthesis
+ engines, speaker verification, and speaker identification engines.
+ MRCPv2 enables the implementation of distributed Interactive Voice
+ Response platforms using VoiceXML [W3C.REC-voicexml20-20040316]
+ browsers or other client applications while maintaining separate
+ back-end speech processing capabilities on specialized speech
+ processing servers. MRCPv2 is based on the earlier Media Resource
+ Control Protocol (MRCP) [RFC4463] developed jointly by Cisco Systems,
+ Inc., Nuance Communications, and Speechworks, Inc. Although some of
+ the method names are similar, the way in which these methods are
+ communicated is different. There are also more resources and more
+ methods for each resource. The first version of MRCP was essentially
+ taken only as input to the development of this protocol. There is no
+ expectation that an MRCPv2 client will work with an MRCPv1 server or
+ vice versa. There is no migration plan or gateway definition between
+ the two protocols.
+
+ The protocol requirements of Speech Services Control (SPEECHSC)
+ [RFC4313] include that the solution be capable of reaching a media
+ processing server, setting up communication channels to the media
+ resources, and sending and receiving control messages and media
+ streams to/from the server. The Session Initiation Protocol (SIP)
+ [RFC3261] meets these requirements.
+
+ The proprietary version of MRCP ran over the Real Time Streaming
+ Protocol (RTSP) [RFC2326]. At the time work on MRCPv2 was begun, the
+ consensus was that this use of RTSP would break the RTSP protocol or
+ cause backward-compatibility problems, something forbidden by Section
+ 3.2 of [RFC4313]. This is the reason why MRCPv2 does not run over
+ RTSP.
+
+
+ MRCPv2 leverages these capabilities by building upon SIP and the
+ Session Description Protocol (SDP) [RFC4566]. MRCPv2 uses SIP to set
+ up and tear down media and control sessions with the server. In
+ addition, the client can use a SIP re-INVITE (an INVITE request sent
+ within an existing SIP dialog) to change the characteristics of these
+ media and control sessions while maintaining the SIP dialog
+ between the client and server. SDP is used to describe the
+ parameters of the media sessions associated with that dialog. It is
+ mandatory to support SIP as the session establishment protocol to
+ ensure interoperability. Other protocols can be used for session
+ establishment by prior agreement. This document only describes the
+ use of SIP and SDP.
+
+ MRCPv2 uses SIP and SDP to create the speech client/server dialog and
+ set up the media channels to the server. It also uses SIP and SDP to
+ establish MRCPv2 control sessions between the client and the server
+ for each media processing resource required for that dialog. The
+ MRCPv2 protocol exchange between the client and the media resource is
+ carried on that control session. MRCPv2 exchanges do not change the
+ state of the SIP dialog, the media sessions, or other parameters of
+ the dialog initiated via SIP.  They control and affect the state of
+ the media processing resource associated with the MRCPv2 session(s).
+
+ MRCPv2 defines the messages to control the different media processing
+ resources and the state machines required to guide their operation.
+ It also describes how these messages are carried over a transport-
+ layer protocol such as the Transmission Control Protocol (TCP)
+ [RFC0793] or the Transport Layer Security (TLS) Protocol [RFC5246].
+ (Note: the Stream Control Transmission Protocol (SCTP) [RFC4960] is a
+ viable transport for MRCPv2 as well, but the mapping onto SCTP is not
+ described in this specification.)
+
+2. Document Conventions
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [RFC2119].
+
+ Since many of the definitions and syntax are identical to those for
+ the Hypertext Transfer Protocol -- HTTP/1.1 [RFC2616], this
+ specification refers to the section where they are defined rather
+ than copying it. For brevity, [HX.Y] is to be taken to refer to
+ Section X.Y of RFC 2616.
+
+ All the mechanisms specified in this document are described in both
+ prose and an augmented Backus-Naur form (ABNF [RFC5234]).
+
+
+ The complete message format in ABNF form is provided in Section 15
+ and is the normative format definition. Note that productions may be
+ duplicated within the main body of the document for reading
+ convenience. If a production in the body of the text conflicts with
+ one in the normative definition, the latter takes precedence.
+
+2.1. Definitions
+
+ Media Resource
+ An entity on the speech processing server that can be
+ controlled through MRCPv2.
+
+ MRCP Server
+ Aggregate of one or more "Media Resource" entities on
+ a server, exposed through MRCPv2. Often, 'server' in
+ this document refers to an MRCP server.
+
+ MRCP Client
+ An entity controlling one or more Media Resources
+ through MRCPv2 ("Client" for short).
+
+ DTMF
+ Dual-Tone Multi-Frequency; a method of transmitting
+ key presses in-band, either as actual tones (Q.23
+ [Q.23]) or as named tone events (RFC 4733 [RFC4733]).
+
+ Endpointing
+ The process of automatically detecting the beginning
+ and end of speech in an audio stream. This is
+ critical both for speech recognition and for automated
+ recording as one would find in voice mail systems.
+
+ Hotword Mode
+ A mode of speech recognition where a stream of
+ utterances is evaluated for match against a small set
+ of command words. This is generally employed either
+ to trigger some action or to control the subsequent
+ grammar to be used for further recognition.
+
+2.2. State-Machine Diagrams
+
+ The state-machine diagrams in this document do not show every
+ possible method call. Rather, they reflect the state of the resource
+ based on the methods that have moved to IN-PROGRESS or COMPLETE
+ states (see Section 5.3). Note that since PENDING requests
+ essentially have not affected the resource yet and are in the queue
+ to be processed, they are not reflected in the state-machine
+ diagrams.
+
+
+2.3. URI Schemes
+
+ This document defines many protocol headers that contain URIs
+ (Uniform Resource Identifiers [RFC3986]) or lists of URIs for
+ referencing media. The entire document, including the Security
+ Considerations section (Section 12), assumes that HTTP or HTTP over
+ TLS (HTTPS) [RFC2818] will be used as the URI addressing scheme
+ unless otherwise stated. However, implementations MAY support other
+ schemes (such as 'file'), provided they have addressed any security
+ considerations described in this document and any others particular
+ to the specific scheme. For example, implementations where the
+ client and server both reside on the same physical hardware and the
+ file system is secured by traditional user-level file access controls
+ could be reasonable candidates for supporting the 'file' scheme.
+
+3. Architecture
+
+ A system using MRCPv2 consists of a client that requires the
+ generation and/or consumption of media streams and a media resource
+ server that has the resources or "engines" to process these streams
+ as input or generate these streams as output. The client uses SIP
+ and SDP to establish an MRCPv2 control channel with the server to use
+ its media processing resources. MRCPv2 servers are addressed using
+ SIP URIs.
+
+ SIP uses SDP with the offer/answer model described in RFC 3264
+ [RFC3264] to set up the MRCPv2 control channels and describe their
+ characteristics. A separate MRCPv2 session is needed to control each
+ of the media processing resources associated with the SIP dialog
+ between the client and server. Within a SIP dialog, the individual
+ resource control channels for the different resources are added or
+ removed through SDP offer/answer carried in a SIP re-INVITE
+ transaction.
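+
+   As a non-normative sketch (the addresses, ports, and channel
+   identifier here are invented for illustration), an SDP offer
+   requesting a synthesizer control channel, and the matching answer,
+   might carry:
+
+      C->S (offer):  m=application 9 TCP/MRCPv2 1
+                     a=setup:active
+                     a=connection:new
+                     a=resource:speechsynth
+
+      S->C (answer): m=application 32416 TCP/MRCPv2 1
+                     a=setup:passive
+                     a=connection:new
+                     a=channel:32AECB23433801@speechsynth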
+
+ The server, through the SDP exchange, provides the client with a
+ difficult-to-guess, unambiguous channel identifier and a TCP port
+ number (see Section 4.2). The client MAY then open a new TCP
+ connection with the server on this port number. Multiple MRCPv2
+ channels can share a TCP connection between the client and the
+ server. All MRCPv2 messages exchanged between the client and the
+ server carry the specified channel identifier that the server MUST
+ ensure is unambiguous among all MRCPv2 control channels that are
+ active on that server. The client uses this channel identifier to
+ indicate the media processing resource associated with that channel.
+ For information on message framing, see Section 5.
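+
+   As an illustrative (non-normative) sketch, every message on such a
+   channel carries the channel identifier; here the request-id (543257)
+   and channel identifier are invented, and "..." stands for the
+   computed message length:
+
+      C->S:  MRCP/2.0 ... SPEAK 543257
+             Channel-Identifier:32AECB23433802@speechsynth
+             Content-Type:application/ssml+xml
+             Content-Length:...
+
+      S->C:  MRCP/2.0 ... 543257 200 IN-PROGRESS
+             Channel-Identifier:32AECB23433802@speechsynth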
+
+ SIP also establishes the media sessions between the client (or other
+ source/sink of media) and the MRCPv2 server using SDP "m=" lines.
+
+
+ One or more media processing resources may share a media session
+ under a SIP session, or each media processing resource may have its
+ own media session.
+
+ The following diagram shows the general architecture of a system that
+ uses MRCPv2. To simplify the diagram, only a few resources are
+ shown.
+
+ MRCPv2 client MRCPv2 Media Resource Server
+|--------------------| |------------------------------------|
+||------------------|| ||----------------------------------||
+|| Application Layer|| ||Synthesis|Recognition|Verification||
+||------------------|| || Engine | Engine | Engine ||
+||Media Resource API|| || || | || | || ||
+||------------------|| ||Synthesis|Recognizer | Verifier ||
+|| SIP | MRCPv2 || ||Resource | Resource | Resource ||
+||Stack | || || Media Resource Management ||
+|| | || ||----------------------------------||
+||------------------|| || SIP | MRCPv2 ||
+|| TCP/IP Stack ||---MRCPv2---|| Stack | ||
+|| || ||----------------------------------||
+||------------------||----SIP-----|| TCP/IP Stack ||
+|--------------------| || ||
+ | ||----------------------------------||
+ SIP |------------------------------------|
+ | /
+|-------------------| RTP
+| | /
+| Media Source/Sink |------------/
+| |
+|-------------------|
+
+ Figure 1: Architectural Diagram
+
+3.1. MRCPv2 Media Resource Types
+
+ An MRCPv2 server may offer one or more of the following media
+ processing resources to its clients.
+
+ Basic Synthesizer
+ A speech synthesizer resource that has very limited
+ capabilities and can generate its media stream
+ exclusively from concatenated audio clips. The speech
+ data is described using a limited subset of the Speech
+ Synthesis Markup Language (SSML)
+ [W3C.REC-speech-synthesis-20040907] elements. A basic
+ synthesizer MUST support the SSML tags <speak>,
+ <audio>, <say-as>, and <mark>.
+
+ Speech Synthesizer
+ A full-capability speech synthesis resource that can
+ render speech from text. Such a synthesizer MUST have
+ full SSML [W3C.REC-speech-synthesis-20040907] support.
+
+ Recorder
+ A resource capable of recording audio and providing a
+ URI pointer to the recording. A recorder MUST provide
+ endpointing capabilities for suppressing silence at
+ the beginning and end of a recording, and MAY also
+ suppress silence in the middle of a recording. If
+ such suppression is done, the recorder MUST maintain
+ timing metadata to indicate the actual timestamps of
+ the recorded media.
+
+ DTMF Recognizer
+ A recognizer resource capable of extracting and
+ interpreting Dual-Tone Multi-Frequency (DTMF) [Q.23]
+ digits in a media stream and matching them against a
+ supplied digit grammar. It could also do a semantic
+ interpretation based on semantic tags in the grammar.
+
+ Speech Recognizer
+ A full speech recognition resource that is capable of
+ receiving a media stream containing audio and
+ interpreting it to recognition results. It also has a
+ natural language semantic interpreter to post-process
+ the recognized data according to the semantic data in
+ the grammar and provide semantic results along with
+ the recognized input. The recognizer MAY also support
+ enrolled grammars, where the client can enroll and
+ create new personal grammars for use in future
+ recognition operations.
+
+ Speaker Verifier
+ A resource capable of verifying the authenticity of a
+ claimed identity by matching a media stream containing
+ spoken input to a pre-existing voiceprint. This may
+ also involve matching the caller's voice against more
+ than one voiceprint, also called multi-verification or
+ speaker identification.
+
+3.2. Server and Resource Addressing
+
+ The MRCPv2 server is a generic SIP server, and is thus addressed by a
+ SIP URI (RFC 3261 [RFC3261]).
+
+ For example:
+
+ sip:mrcpv2@example.net or
+ sips:mrcpv2@example.net
+
+4. MRCPv2 Basics
+
+ MRCPv2 requires a connection-oriented transport-layer protocol such
+ as TCP to guarantee reliable sequencing and delivery of MRCPv2
+ control messages between the client and the server. In order to meet
+ the requirements for security enumerated in SPEECHSC requirements
+ [RFC4313], clients and servers MUST implement TLS as well. One or
+ more connections between the client and the server can be shared
+ among different MRCPv2 channels to the server. The individual
+ messages carry the channel identifier to differentiate messages on
+ different channels. MRCPv2 encoding is text based with mechanisms to
+ carry embedded binary data. This allows arbitrary data like
+ recognition grammars, recognition results, synthesizer speech markup,
+ etc., to be carried in MRCPv2 messages. For information on message
+ framing, see Section 5.
+
+4.1. Connecting to the Server
+
+ MRCPv2 employs SIP, in conjunction with SDP, as the session
+ establishment and management protocol. The client reaches an MRCPv2
+ server using conventional INVITE and other SIP requests for
+ establishing, maintaining, and terminating SIP dialogs. The SDP
+ offer/answer exchange model over SIP is used to establish a resource
+ control channel for each resource. The SDP offer/answer exchange is
+ also used to establish media sessions between the server and the
+ source or sink of audio.
+
+4.2. Managing Resource Control Channels
+
+ The client needs a separate MRCPv2 resource control channel to
+ control each media processing resource under the SIP dialog. A
+ unique channel identifier string identifies these resource control
+ channels. The channel identifier is a difficult-to-guess,
+ unambiguous string followed by an "@", then by a string token
+ specifying the type of resource. The server generates the channel
+ identifier and MUST make sure it does not clash with the identifier
+ of any other MRCP channel currently allocated by that server. MRCPv2
+ defines the following IANA-registered types of media processing
+ resources. Additional resource types and their associated methods/
+ events and state machines may be added as described below in
+ Section 13.
+
+ +---------------+----------------------+--------------+
+ | Resource Type | Resource Description | Described in |
+ +---------------+----------------------+--------------+
+ | speechrecog | Speech Recognizer | Section 9 |
+ | dtmfrecog | DTMF Recognizer | Section 9 |
+ | speechsynth | Speech Synthesizer | Section 8 |
+ | basicsynth | Basic Synthesizer | Section 8 |
+ | speakverify | Speaker Verification | Section 11 |
+ | recorder | Speech Recorder | Section 10 |
+ +---------------+----------------------+--------------+
+
+ Table 1: Resource Types
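A minimal, non-normative sketch of validating the channel-identifier shape described above (an unambiguous string, "@", and one of the resource types from Table 1); the helper name is illustrative only.

```python
# Sketch: checking the form of an MRCPv2 channel identifier, i.e.
# "<difficult-to-guess-string>@<resource-type>", where the resource
# type is one of the IANA-registered values from Table 1.

RESOURCE_TYPES = {"speechrecog", "dtmfrecog", "speechsynth",
                  "basicsynth", "speakverify", "recorder"}

def parse_channel_identifier(channel):
    ident, sep, resource = channel.partition("@")
    if not sep or not ident or resource not in RESOURCE_TYPES:
        raise ValueError("malformed channel identifier: %r" % channel)
    return ident, resource
```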
+
+ The SIP INVITE or re-INVITE transaction and the SDP offer/answer
+ exchange it carries contain "m=" lines describing the resource
+ control channel to be allocated. There MUST be one SDP "m=" line for
+ each MRCPv2 resource to be used in the session. This "m=" line MUST
+ have a media type field of "application" and a transport type field
+ of either "TCP/MRCPv2" or "TCP/TLS/MRCPv2". The port number field of
+ the "m=" line MUST contain the "discard" port of the transport
+ protocol (port 9 for TCP) in the SDP offer from the client and MUST
+ contain the TCP listen port on the server in the SDP answer. The
+ client may then either set up a TCP or TLS connection to that server
+ port or share an already established connection to that port. Since
+ MRCPv2 allows multiple sessions to share the same TCP connection,
+ multiple "m=" lines in a single SDP document MAY share the same port
+ field value; MRCPv2 servers MUST NOT assume any relationship between
+ resources using the same port other than the sharing of the
+ communication channel.
+
+ MRCPv2 resources do not use the port or format field of the "m=" line
+ to distinguish themselves from other resources using the same
+ channel. The client MUST specify the resource type identifier in the
+ resource attribute associated with the control "m=" line of the SDP
+ offer. The server MUST respond with the full Channel-Identifier
+ (which includes the resource type identifier and a difficult-to-
+ guess, unambiguous string) in the "channel" attribute associated with
+ the control "m=" line of the SDP answer. To remain backwards
+ compatible with conventional SDP usage, the format field of the "m="
+ line MUST have the arbitrarily selected value of "1".
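The rules above for a client's control "m=" line can be sketched as follows; this is not a full SDP implementation, and the function name and parameters are invented for illustration.

```python
# Sketch: composing the control "m=" line and its attributes for a
# client's SDP offer: media type "application", the TCP discard port
# (9), transport "TCP/MRCPv2" or "TCP/TLS/MRCPv2", format field "1",
# and an a=resource attribute naming the resource type.

def control_offer_lines(resource_type, use_tls=False, connection="new"):
    proto = "TCP/TLS/MRCPv2" if use_tls else "TCP/MRCPv2"
    return [
        "m=application 9 %s 1" % proto,  # port 9 = TCP discard port
        "a=setup:active",                # the client is the active party
        "a=connection:%s" % connection,
        "a=resource:%s" % resource_type,
    ]
```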
+
+ When the client wants to add a media processing resource to the
+ session, it issues a new SDP offer, according to the procedures of
+ RFC 3264 [RFC3264], in a SIP re-INVITE request. The SDP offer/answer
+ exchange carried by this SIP transaction contains one or more
+ additional control "m=" lines for the new resources to be allocated
+ to the session. The server, on seeing the new "m=" line, allocates
+ the resources (if they are available) and responds with a
+ corresponding control "m=" line in the SDP answer carried in the SIP
+ response. If the new resources are not available, the re-INVITE
+ receives an error message, and existing media processing going on
+ before the re-INVITE will continue as it was before. It is not
+ possible to allocate more than one resource of each type. If a
+ client requests more than one resource of any type, the server MUST
+ behave as if the resources of that type (beyond the first one) are
+ not available.
+
+ MRCPv2 clients and servers using TCP as a transport protocol MUST use
+ the procedures specified in RFC 4145 [RFC4145] for setting up the TCP
+ connection, with the considerations described below.  Similarly,
+ MRCPv2 clients and servers using TCP/TLS as a transport protocol MUST
+ use the procedures specified in RFC 4572 [RFC4572] for setting up the
+ TLS connection, with the considerations described below.  The
+ a=setup attribute, as described in RFC 4145 [RFC4145], MUST be
+ "active" for the offer from the client and MUST be "passive" for the
+ answer from the MRCPv2 server. The a=connection attribute MUST have
+ a value of "new" on the very first control "m=" line offer from the
+ client to an MRCPv2 server. Subsequent control "m=" line offers from
+ the client to the MRCP server MAY contain "new" or "existing",
+ depending on whether the client wants to set up a new connection or
+ share an existing connection, respectively. If the client specifies
+ a value of "new", the server MUST respond with a value of "new". If
+ the client specifies a value of "existing", the server MUST respond
+ with either "existing", if it prefers to share an existing
+ connection, or "new", if it does not; in the latter case, the client
+ MUST initiate a new transport connection.
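The server's choice of a=connection value in its answer, per the rules above, amounts to the following small decision function (a non-normative sketch; the names are illustrative):

```python
# Sketch of the server's a=connection answer logic: an offer of "new"
# MUST be answered with "new"; an offer of "existing" may be answered
# with "existing" (share the connection) or "new" (the client must
# then open a fresh transport connection).

def answer_connection(offer_value, server_wants_to_share):
    if offer_value == "new":
        return "new"
    if offer_value == "existing":
        return "existing" if server_wants_to_share else "new"
    raise ValueError("unknown a=connection value: %r" % offer_value)
```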
+
+ When the client wants to deallocate the resource from this session,
+ it issues a new SDP offer, according to RFC 3264 [RFC3264], where the
+ control "m=" line port MUST be set to 0. This SDP offer is sent in a
+ SIP re-INVITE request. This deallocates the associated MRCPv2
+ identifier and resource. The server MUST NOT close the TCP or TLS
+ connection if it is currently being shared among multiple MRCP
+ channels. When all MRCP channels that may be sharing the connection
+ are released and/or the associated SIP dialog is terminated, the
+ client or server terminates the connection.
+
+ When the client wants to tear down the whole session and all its
+ resources, it MUST issue a SIP BYE request to close the SIP session.
+ This will deallocate all the control channels and resources allocated
+ under the session.
+
+ All servers MUST support TLS. Servers MAY use TCP without TLS in
+ controlled environments (e.g., not in the public Internet) where both
+ nodes are inside a protected perimeter, for example, preventing
+ access to the MRCP server from remote nodes outside the controlled
+ perimeter. It is up to the client, through the SDP offer, to choose
+ which transport it wants to use for an MRCPv2 session. Aside from
+ the exceptions given above, when using TCP, the "m=" lines MUST
+ conform to RFC 4145 [RFC4145], which describes the usage of SDP for
+ connection-oriented transport. When using TLS, the SDP "m=" line for
+ the control stream MUST conform to Connection-Oriented Media
+ (COMEDIA) over TLS [RFC4572], which specifies the usage of SDP for
+ establishing a secure connection-oriented transport over TLS.
+
+4.3. SIP Session Example
+
+ This first example shows the power of using SIP to route to the
+ appropriate resource. In the example, note the use of a request to a
+ domain's speech server service in the INVITE to
+ mresources@example.com. The SIP routing machinery in the domain
+ locates the actual server, mresources@server.example.com, which gets
+ returned in the 200 OK. Note that "cmid" is defined in Section 4.4.
+
+ This example exchange adds a resource control channel for a
+ synthesizer. Since a synthesizer also generates an audio stream,
+ this interaction also creates a receive-only Real-time Transport
+ Protocol (RTP) [RFC3550] media session over which the server sends
+ audio.  The SIP
+ dialog with the media source/sink is independent of MRCP and is not
+ shown.
+
+ C->S: INVITE sip:mresources@example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bf1
+ Max-Forwards:6
+ To:MediaServer <sip:mresources@example.com>
+ From:sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:314161 INVITE
+ Contact:<sip:sarvi@client.example.com>
+ Content-Type:application/sdp
+ Content-Length:...
+
+ v=0
+ o=sarvi 2890844526 2890844526 IN IP4 192.0.2.12
+ s=-
+ c=IN IP4 192.0.2.12
+ t=0 0
+ m=application 9 TCP/MRCPv2 1
+ a=setup:active
+ a=connection:new
+ a=resource:speechsynth
+ a=cmid:1
+ m=audio 49170 RTP/AVP 0
+ a=rtpmap:0 pcmu/8000
+ a=recvonly
+ a=mid:1
+
+
+ S->C: SIP/2.0 200 OK
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bf1;received=192.0.32.10
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:314161 INVITE
+ Contact:<sip:mresources@server.example.com>
+ Content-Type:application/sdp
+ Content-Length:...
+
+ v=0
+ o=- 2890842808 2890842808 IN IP4 192.0.2.11
+ s=-
+ c=IN IP4 192.0.2.11
+ t=0 0
+ m=application 32416 TCP/MRCPv2 1
+ a=setup:passive
+ a=connection:new
+ a=channel:32AECB234338@speechsynth
+ a=cmid:1
+ m=audio 48260 RTP/AVP 0
+ a=rtpmap:0 pcmu/8000
+ a=sendonly
+ a=mid:1
+
+
+ C->S: ACK sip:mresources@server.example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bf2
+ Max-Forwards:6
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:Sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:314161 ACK
+ Content-Length:0
+
+ Example: Add Synthesizer Control Channel
+
+ This example exchange continues from the previous figure and
+ allocates an additional resource control channel for a recognizer.
+ Since a recognizer would need to receive an audio stream for
+ recognition, this interaction also updates the audio stream to
+ sendrecv, making it a two-way RTP media session.
+
+ C->S: INVITE sip:mresources@server.example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bf3
+ Max-Forwards:6
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:314162 INVITE
+ Contact:<sip:sarvi@client.example.com>
+ Content-Type:application/sdp
+ Content-Length:...
+
+ v=0
+ o=sarvi 2890844526 2890844527 IN IP4 192.0.2.12
+ s=-
+ c=IN IP4 192.0.2.12
+ t=0 0
+ m=application 9 TCP/MRCPv2 1
+ a=setup:active
+ a=connection:existing
+ a=resource:speechsynth
+ a=cmid:1
+ m=audio 49170 RTP/AVP 0 96
+ a=rtpmap:0 pcmu/8000
+ a=rtpmap:96 telephone-event/8000
+ a=fmtp:96 0-15
+ a=sendrecv
+ a=mid:1
+ m=application 9 TCP/MRCPv2 1
+ a=setup:active
+ a=connection:existing
+ a=resource:speechrecog
+ a=cmid:1
+
+
+ S->C: SIP/2.0 200 OK
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bf3;received=192.0.32.10
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:314162 INVITE
+ Contact:<sip:mresources@server.example.com>
+ Content-Type:application/sdp
+ Content-Length:...
+
+ v=0
+ o=- 2890842808 2890842809 IN IP4 192.0.2.11
+ s=-
+ c=IN IP4 192.0.2.11
+ t=0 0
+ m=application 32416 TCP/MRCPv2 1
+ a=setup:passive
+ a=connection:existing
+ a=channel:32AECB234338@speechsynth
+ a=cmid:1
+ m=audio 48260 RTP/AVP 0 96
+ a=rtpmap:0 pcmu/8000
+ a=rtpmap:96 telephone-event/8000
+ a=fmtp:96 0-15
+ a=sendrecv
+ a=mid:1
+ m=application 32416 TCP/MRCPv2 1
+ a=setup:passive
+ a=connection:existing
+ a=channel:32AECB234338@speechrecog
+ a=cmid:1
+
+
+ C->S: ACK sip:mresources@server.example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bf4
+ Max-Forwards:6
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:Sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:314162 ACK
+ Content-Length:0
+
+ Example: Add Recognizer
+
+ This example exchange continues from the previous figure and
+ deallocates the recognizer channel. Since a recognizer no longer
+ needs to receive an audio stream, this interaction also updates the
+ RTP media session to recvonly.
+
+ C->S: INVITE sip:mresources@server.example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bf5
+ Max-Forwards:6
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:314163 INVITE
+ Contact:<sip:sarvi@client.example.com>
+ Content-Type:application/sdp
+ Content-Length:...
+
+ v=0
+ o=sarvi 2890844526 2890844528 IN IP4 192.0.2.12
+ s=-
+ c=IN IP4 192.0.2.12
+ t=0 0
+ m=application 9 TCP/MRCPv2 1
+ a=resource:speechsynth
+ a=cmid:1
+ m=audio 49170 RTP/AVP 0
+ a=rtpmap:0 pcmu/8000
+ a=recvonly
+ a=mid:1
+ m=application 0 TCP/MRCPv2 1
+ a=resource:speechrecog
+ a=cmid:1
+
+
+ S->C: SIP/2.0 200 OK
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bf5;received=192.0.32.10
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:314163 INVITE
+ Contact:<sip:mresources@server.example.com>
+ Content-Type:application/sdp
+ Content-Length:...
+
+ v=0
+ o=- 2890842808 2890842810 IN IP4 192.0.2.11
+ s=-
+ c=IN IP4 192.0.2.11
+ t=0 0
+ m=application 32416 TCP/MRCPv2 1
+ a=channel:32AECB234338@speechsynth
+ a=cmid:1
+ m=audio 48260 RTP/AVP 0
+ a=rtpmap:0 pcmu/8000
+ a=sendonly
+ a=mid:1
+ m=application 0 TCP/MRCPv2 1
+ a=channel:32AECB234338@speechrecog
+ a=cmid:1
+
+ C->S: ACK sip:mresources@server.example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bf6
+ Max-Forwards:6
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:Sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:314163 ACK
+ Content-Length:0
+
+ Example: Deallocate Recognizer
+
+4.4. Media Streams and RTP Ports
+
+ Since MRCPv2 resources either generate or consume media streams, the
+ client or the server needs to associate media sessions with their
+ corresponding resource or resources. More than one resource could be
+ associated with a single media session or each resource could be
+ assigned a separate media session. Also, note that more than one
+ media session can be associated with a single resource if need be,
+ but this scenario is not useful for the current set of resources.
+ For example, a synthesizer and a recognizer could be associated with
+ the same media session (m=audio line), if it is opened in "sendrecv"
+ mode. Alternatively, the recognizer could have its own "sendonly"
+ audio session, and the synthesizer could have its own "recvonly"
+ audio session.
+
+ The association between control channels and their corresponding
+ media sessions is established using a new "resource channel media
+ identifier" media-level attribute ("cmid"). Valid values of this
+ attribute are the values of the "mid" attribute defined in RFC 5888
+ [RFC5888]. If there is more than one audio "m=" line, then each
+ audio "m=" line MUST have a "mid" attribute. Each control "m=" line
+ MAY have one or more "cmid" attributes that match the resource
+ control channel to the "mid" attributes of the audio "m=" lines it is
+ associated with.  Note that a control "m=" line without a "cmid"
+ attribute is not associated with any media, so operations on such a
+ resource are limited.  For example, for a recognizer resource, the
+ RECOGNIZE method requires associated media to process, while the
+ INTERPRET method does not.  The
+ formatting of the "cmid" attribute is described by the following
+ ABNF:
+
+ cmid-attribute = "a=cmid:" identification-tag
+ identification-tag = token
+
+ To allow this flexible mapping of media sessions to MRCPv2 control
+ channels, a single audio "m=" line can be associated with multiple
+ resources, or each resource can have its own audio "m=" line. For
+ example, if the client wants to allocate a recognizer and a
+ synthesizer and associate them with a single two-way audio stream,
+ the SDP offer would contain two control "m=" lines and a single audio
+ "m=" line with an attribute of "sendrecv". Each of the control "m="
+ lines would have a "cmid" attribute whose value matches the "mid" of
+ the audio "m=" line. If, on the other hand, the client wants to
+ allocate a recognizer and a synthesizer each with its own separate
+ audio stream, the SDP offer would carry two control "m=" lines (one
+ for the recognizer and another for the synthesizer) and two audio
+ "m=" lines (one with the attribute "sendonly" and another with
+ attribute "recvonly"). The "cmid" attribute of the recognizer
+ control "m=" line would match the "mid" value of the "sendonly" audio
+ "m=" line, and the "cmid" attribute of the synthesizer control "m="
+ line would match the "mid" attribute of the "recvonly" "m=" line.
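The cmid/mid pairing described above can be sketched with a deliberately minimal SDP walk (non-normative; not a general SDP parser, and the function name is invented for this sketch):

```python
# Sketch: pairing control "m=" lines with audio "m=" lines by matching
# each control line's a=cmid values against the audio lines' a=mid
# values.  Returns {channel identifier: [associated audio m= lines]}.

def associate_channels(sdp_text):
    sections, current = [], None
    for line in sdp_text.splitlines():
        if line.startswith("m="):
            current = {"m": line, "mid": None, "cmid": [], "channel": None}
            sections.append(current)
        elif current is not None and line.startswith("a=mid:"):
            current["mid"] = line[len("a=mid:"):]
        elif current is not None and line.startswith("a=cmid:"):
            current["cmid"].append(line[len("a=cmid:"):])
        elif current is not None and line.startswith("a=channel:"):
            current["channel"] = line[len("a=channel:"):]
    mids = {s["mid"]: s for s in sections if s["mid"] is not None}
    return {s["channel"]: [mids[c]["m"] for c in s["cmid"] if c in mids]
            for s in sections if s["channel"]}
```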
+
+ When a server receives media (e.g., audio) on a media session that is
+ associated with more than one media processing resource, it is the
+ responsibility of the server to receive and fork the media to the
+ resources that need to consume it. If multiple resources in an
+ MRCPv2 session are generating audio (or other media) to be sent on a
+ single associated media session, it is the responsibility of the
+ server either to multiplex the multiple streams onto the single RTP
+ session or to contain an embedded RTP mixer (see RFC 3550 [RFC3550])
+ to combine the multiple streams into one. In the former case, the
+ media stream will contain RTP packets generated by different sources,
+ and hence the packets will have different Synchronization Source
+ Identifiers (SSRCs). In the latter case, the RTP packets will
+ contain multiple Contributing Source Identifiers (CSRCs)
+ corresponding to the original streams before being combined by the
+ mixer. If an MRCPv2 server implementation neither multiplexes nor
+ mixes, it MUST disallow the client from associating multiple such
+ resources to a single audio stream by rejecting the SDP offer with a
+ SIP 488 "Not Acceptable" error. Note that there is a large installed
+ base that will return a SIP 501 "Not Implemented" error in this case.
+ To facilitate interoperability with this installed base, new
+ implementations SHOULD treat a 501 in this context as a 488 when it
+ is received from an element known to be a legacy implementation.
+
+4.5. MRCPv2 Message Transport
+
+ The MRCPv2 messages defined in this document are transported over a
+ TCP or TLS connection between the client and the server. The method
+ for setting up this transport connection and the resource control
+ channel is discussed in Sections 4.1 and 4.2. Multiple resource
+ control channels between a client and a server that belong to
+ different SIP dialogs can share one or more TLS or TCP connections
+ between them; the server and client MUST support this mode of
+ operation. Clients and servers MUST use the MRCPv2 channel
+ identifier, carried in the Channel-Identifier header field in
+ individual MRCPv2 messages, to differentiate MRCPv2 messages from
+ different resource channels (see Section 6.2.1 for details). All
+ MRCPv2 servers MUST support TLS. Servers MAY use TCP without TLS in
+ controlled environments (e.g., not in the public Internet) where both
+ nodes are inside a protected perimeter, for example, preventing
+ access to the MRCP server from remote nodes outside the controlled
+ perimeter. It is up to the client to choose which mode of transport
+ it wants to use for an MRCPv2 session.
+
+ Most examples from here on show only the MRCPv2 messages and do not
+ show the SIP messages that may have been used to establish the MRCPv2
+ control channel.
+
+4.6. MRCPv2 Session Termination
+
+ If an MRCP client notices that the underlying connection has been
+ closed for one of its MRCP channels, and it has not previously
+ initiated a re-INVITE to close that channel, it MUST send a BYE to
+ close down the SIP dialog and all other MRCP channels. If an MRCP
+ server notices that the underlying connection has been closed for one
+ of its MRCP channels, and it has not previously received and accepted
+ a re-INVITE closing that channel, then it MUST send a BYE to close
+ down the SIP dialog and all other MRCP channels.
+
+5. MRCPv2 Specification
+
+ Except as otherwise indicated, MRCPv2 messages are Unicode encoded in
+ UTF-8 (RFC 3629 [RFC3629]) to allow many different languages to be
+ represented. DEFINE-GRAMMAR (Section 9.8), for example, is one such
+ exception, since its body can contain arbitrary XML in arbitrary (but
+ specified via XML) encodings. MRCPv2 also allows message bodies to
+ be represented in other character sets (for example, ISO 8859-1
+ [ISO.8859-1.1987]) because, in some locales, other character sets are
+ already in widespread use. The MRCPv2 headers (the first line of an
+ MRCP message) and header field names use only the US-ASCII subset of
+ UTF-8.
+
+ Lines are terminated by CRLF (carriage return, then line feed).
+ Also, some parameters in the message may contain binary data or a
+ record spanning multiple lines. Such fields have a length value
+ associated with the parameter, which indicates the number of octets
+ immediately following the parameter.
+
+5.1. Common Protocol Elements
+
+ The MRCPv2 message set consists of requests from the client to the
+ server, responses from the server to the client, and asynchronous
+ events from the server to the client. All these messages consist of
+ a start-line, one or more header fields, an empty line (i.e., a line
+ with nothing preceding the CRLF) indicating the end of the header
+ fields, and an optional message body.
+
+generic-message = start-line
+ message-header
+ CRLF
+ [ message-body ]
+
+message-body = *OCTET
+
+start-line = request-line / response-line / event-line
+
+message-header = 1*(generic-header / resource-header / generic-field)
+
+resource-header = synthesizer-header
+ / recognizer-header
+ / recorder-header
+ / verifier-header
+
+ The message-body contains resource-specific and message-specific
+ data. The actual media types used to carry the data are specified in
+ the sections defining the individual messages. Generic header fields
+ are described in Section 6.2.
+
+ If a message contains a message body, the message MUST contain
+ content-headers indicating the media type and encoding of the data in
+ the message body.
+
+ Request, response, and event messages (described in the following
+ sections) include the version of MRCP to which the message conforms.
+ Version compatibility rules follow [H3.1] regarding version ordering,
+ compliance requirements, and upgrading of version numbers. The
+ version information is indicated by "MRCP" (as opposed to "HTTP" in
+ [H3.1]) or "MRCP/2.0" (as opposed to "HTTP/1.1" in [H3.1]). To be
+ compliant with this specification, clients and servers sending MRCPv2
+ messages MUST indicate an mrcp-version of "MRCP/2.0". ABNF
+ productions using mrcp-version can be found in Sections 5.2, 5.3, and
+ 5.5.
+
+ mrcp-version = "MRCP" "/" 1*2DIGIT "." 1*2DIGIT
+
+ The message-length field specifies the length of the message in
+ octets, including the start-line, and MUST be the second token from
+ the beginning of the message. This is to make the framing and
+ parsing of the message simpler to do. This field specifies the
+ length of the message including data that may be encoded into the
+ body of the message. Note that this value MAY be given as a fixed-
+ length integer that is zero-padded (with leading zeros) in order to
+ eliminate or reduce inefficiency in cases where the message-length
+ value would change as a result of the length of the message-length
+ token itself. This value, as with all lengths in MRCP, is to be
+ interpreted as a base-10 number. In particular, leading zeros do not
+ indicate that the value is to be interpreted as a base-8 number.
+
+ message-length = 1*19DIGIT
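The fixed-width, zero-padded message-length trick described above can be sketched as follows (a non-normative illustration; the function and its parameters are invented for this sketch):

```python
# Sketch: framing an MRCPv2 request so that message-length (the second
# token of the start-line) covers the whole message, including itself.
# A fixed-width, zero-padded length avoids the chicken-and-egg problem
# of the length value changing its own number of digits.

def frame_request(method, request_id, headers, body=b"", width=10):
    header_block = "".join("%s:%s\r\n" % kv for kv in headers)
    def build(length_token):
        start = "MRCP/2.0 %s %s %s\r\n" % (length_token, method, request_id)
        return start.encode() + header_block.encode() + b"\r\n" + body
    # Measure with a placeholder of the same width, then fill it in.
    total = len(build("0" * width))
    return build(str(total).zfill(width))
```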
+
+ The following sample MRCP exchange demonstrates proper message-length
+ values. The values for message-length have been removed from all
+ other examples in the specification and replaced by '...' to reduce
+ confusion in the case of minor message-length computation errors in
+ those examples.
+
+ C->S: MRCP/2.0 877 INTERPRET 543266
+ Channel-Identifier:32AECB23433801@speechrecog
+ Interpret-Text:may I speak to Andre Roy
+ Content-Type:application/srgs+xml
+ Content-ID:<request1@form-level.store>
+ Content-Length:661
+
+ <?xml version="1.0"?>
+ <!-- the default grammar language is US English -->
+ <grammar xmlns="http://www.w3.org/2001/06/grammar"
+ xml:lang="en-US" version="1.0" root="request">
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+ </grammar>
+
+ S->C: MRCP/2.0 82 543266 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechrecog
+
+ S->C: MRCP/2.0 634 INTERPRETATION-COMPLETE 543266 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Completion-Cause:000 success
+ Content-Type:application/nlsml+xml
+ Content-Length:441
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:ex="http://www.example.com/example"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <instance name="Person">
+ <ex:Person>
+ <ex:Name> Andre Roy </ex:Name>
+ </ex:Person>
+ </instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
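As an informal illustration (not part of the protocol definition): because message-length counts every octet of the message, including the start-line that carries the token itself, an encoder faces a small fixed-point computation. The sketch below iterates until the value is self-consistent; the function name and framing are local choices, not from this specification.

```python
def encode_request(version: str, method: str, request_id: str, rest: str) -> str:
    """Build a request whose message-length token is self-consistent.

    `rest` is the already-encoded header section and body.  The loop
    re-derives the length until it stops changing; an encoder could
    instead reserve a fixed-width field padded with leading zeros,
    since the value is always read as base-10.
    """
    length = 0
    while True:
        start_line = f"{version} {length} {method} {request_id}\r\n"
        total = len(start_line) + len(rest)
        if total == length:
            return start_line + rest
        length = total
```

Running the helper on a small message yields a start-line whose second token equals the byte length of the whole message.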
+
+   All MRCPv2 requests, responses, and events MUST carry the Channel-
+   Identifier header field so that the server or client can
+   differentiate messages from different control channels that may
+   share the same transport connection.
+
+ In the resource-specific header field descriptions in Sections 8-11,
+ a header field is disallowed on a method (request, response, or
+ event) for that resource unless specifically listed as being allowed.
+ Also, the phrasing "This header field MAY occur on method X"
+ indicates that the header field is allowed on that method but is not
+ required to be used in every instance of that method.
+
+
+5.2. Request
+
+ An MRCPv2 request consists of a Request line followed by the message
+ header section and an optional message body containing data specific
+ to the request message.
+
+   The Request message from a client to the server includes within the
+   first line the method to be applied, a method tag for that request,
+   and the version of the protocol in use.
+
+ request-line = mrcp-version SP message-length SP method-name
+ SP request-id CRLF
+
+ The mrcp-version field is the MRCP protocol version that is being
+ used by the client.
+
+ The message-length field specifies the length of the message,
+ including the start-line.
+
+ Details about the mrcp-version and message-length fields are given in
+ Section 5.1.
+
+ The method-name field identifies the specific request that the client
+ is making to the server. Each resource supports a subset of the
+ MRCPv2 methods. The subset for each resource is defined in the
+ section of the specification for the corresponding resource.
+
+ method-name = generic-method
+ / synthesizer-method
+ / recognizer-method
+ / recorder-method
+ / verifier-method
+
+ The request-id field is a unique identifier representable as an
+ unsigned 32-bit integer created by the client and sent to the server.
+ Clients MUST utilize monotonically increasing request-ids for
+ consecutive requests within an MRCP session. The request-id space is
+ linear (i.e., not mod(32)), so the space does not wrap, and validity
+ can be checked with a simple unsigned comparison operation. The
+ client may choose any initial value for its first request, but a
+ small integer is RECOMMENDED to avoid exhausting the space in long
+ sessions. If the server receives duplicate or out-of-order requests,
+ the server MUST reject the request with a response code of 410.
+ Since request-ids are scoped to the MRCP session, they are unique
+ across all TCP connections and all resource channels in the session.
+
+ The server resource MUST use the client-assigned identifier in its
+ response to the request. If the request does not complete
+ synchronously, future asynchronous events associated with this
+ request MUST carry the client-assigned request-id.
+
+ request-id = 1*10DIGIT
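As an informal illustration, the monotonicity rule above reduces to one unsigned comparison per request on the server side. The class below is a local sketch (the 404 branch for an unrepresentable value is an assumption, not mandated here; only the 410 behavior is taken from the text above).

```python
MAX_REQUEST_ID = 2**32 - 1  # request-ids are unsigned 32-bit values

class RequestIdChecker:
    """Per-session tracker for the linear, non-wrapping request-id space."""

    def __init__(self) -> None:
        self.highest_seen = None

    def check(self, request_id: int) -> int:
        # duplicate or out-of-order request-ids MUST draw a 410 response
        if self.highest_seen is not None and request_id <= self.highest_seen:
            return 410
        if request_id > MAX_REQUEST_ID:
            return 404  # illustrative: value not representable in 32 bits
        self.highest_seen = request_id
        return 200
```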
+
+5.3. Response
+
+ After receiving and interpreting the request message for a method,
+ the server resource responds with an MRCPv2 response message. The
+ response consists of a response line followed by the message header
+ section and an optional message body containing data specific to the
+ method.
+
+ response-line = mrcp-version SP message-length SP request-id
+ SP status-code SP request-state CRLF
+
+ The mrcp-version field MUST contain the version of the request if
+ supported; otherwise, it MUST contain the highest version of MRCP
+ supported by the server.
+
+ The message-length field specifies the length of the message,
+ including the start-line.
+
+ Details about the mrcp-version and message-length fields are given in
+ Section 5.1.
+
+ The request-id used in the response MUST match the one sent in the
+ corresponding request message.
+
+ The status-code field is a 3-digit code representing the success or
+ failure or other status of the request.
+
+ status-code = 3DIGIT
+
+ The request-state field indicates if the action initiated by the
+ Request is PENDING, IN-PROGRESS, or COMPLETE. The COMPLETE status
+ means that the request was processed to completion and that there
+ will be no more events or other messages from that resource to the
+ client with that request-id. The PENDING status means that the
+ request has been placed in a queue and will be processed in first-in-
+ first-out order. The IN-PROGRESS status means that the request is
+ being processed and is not yet complete. A PENDING or IN-PROGRESS
+ status indicates that further Event messages may be delivered with
+ that request-id.
+
+ request-state = "COMPLETE"
+ / "IN-PROGRESS"
+ / "PENDING"
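As an informal illustration, the response-line grammar above translates directly into a pattern; the group names below are local choices, not protocol elements.

```python
import re

# mrcp-version SP message-length SP request-id SP status-code
# SP request-state CRLF
RESPONSE_LINE = re.compile(
    r"(?P<version>MRCP/\d+\.\d+) "
    r"(?P<length>\d{1,19}) "
    r"(?P<request_id>\d{1,10}) "
    r"(?P<status>\d{3}) "
    r"(?P<state>COMPLETE|IN-PROGRESS|PENDING)\r\n"
)

m = RESPONSE_LINE.fullmatch("MRCP/2.0 82 543266 200 IN-PROGRESS\r\n")
assert m is not None
assert m.group("status") == "200" and m.group("state") == "IN-PROGRESS"
```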
+
+5.4. Status Codes
+
+ The status codes are classified under the Success (2xx), Client
+ Failure (4xx), and Server Failure (5xx) codes.
+
+ +------------+--------------------------------------------------+
+ | Code | Meaning |
+ +------------+--------------------------------------------------+
+ | 200 | Success |
+ | 201 | Success with some optional header fields ignored |
+ +------------+--------------------------------------------------+
+
+ Success (2xx)
+
+ +--------+----------------------------------------------------------+
+ | Code | Meaning |
+ +--------+----------------------------------------------------------+
+ | 401 | Method not allowed |
+ | 402 | Method not valid in this state |
+ | 403 | Unsupported header field |
+ | 404 | Illegal value for header field. This is the error for a |
+ | | syntax violation. |
+ | 405 | Resource not allocated for this session or does not |
+ | | exist |
+ | 406 | Mandatory Header Field Missing |
+ | 407 | Method or Operation Failed (e.g., Grammar compilation |
+ | | failed in the recognizer. Detailed cause codes might be |
+ | | available through a resource-specific header.) |
+ | 408 | Unrecognized or unsupported message entity |
+ | 409 | Unsupported Header Field Value. This is a value that is |
+ | | syntactically legal but exceeds the implementation's |
+ | | capabilities or expectations. |
+ | 410 | Non-Monotonic or Out-of-order sequence number in request.|
+ | 411-420| Reserved for future assignment |
+ +--------+----------------------------------------------------------+
+
+ Client Failure (4xx)
+
+ +------------+--------------------------------+
+ | Code | Meaning |
+ +------------+--------------------------------+
+ | 501 | Server Internal Error |
+ | 502 | Protocol Version not supported |
+ | 503 | Reserved for future assignment |
+ | 504 | Message too large |
+ +------------+--------------------------------+
+
+ Server Failure (5xx)
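As an informal illustration, the three classes defined by the tables above can be dispatched on with simple range checks (the function name and labels are local choices):

```python
def status_class(code: int) -> str:
    # 2xx success, 4xx client failure, 5xx server failure per the
    # tables above; other ranges are not defined by MRCPv2
    if 200 <= code <= 299:
        return "success"
    if 400 <= code <= 499:
        return "client-failure"
    if 500 <= code <= 599:
        return "server-failure"
    return "undefined"
```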
+
+5.5. Events
+
+ The server resource may need to communicate a change in state or the
+ occurrence of a certain event to the client. These messages are used
+ when a request does not complete immediately and the response returns
+ a status of PENDING or IN-PROGRESS. The intermediate results and
+ events of the request are indicated to the client through the event
+ message from the server. The event message consists of an event
+ header line followed by the message header section and an optional
+ message body containing data specific to the event message. The
+ header line has the request-id of the corresponding request and
+ status value. The request-state value is COMPLETE if the request is
+ done and this was the last event, else it is IN-PROGRESS.
+
+ event-line = mrcp-version SP message-length SP event-name
+ SP request-id SP request-state CRLF
+
+ The mrcp-version used here is identical to the one used in the
+ Request/Response line and indicates the highest version of MRCP
+ running on the server.
+
+ The message-length field specifies the length of the message,
+ including the start-line.
+
+ Details about the mrcp-version and message-length fields are given in
+ Section 5.1.
+
+ The event-name identifies the nature of the event generated by the
+ media resource. The set of valid event names depends on the resource
+ generating it. See the corresponding resource-specific section of
+ the document.
+
+ event-name = synthesizer-event
+ / recognizer-event
+ / recorder-event
+ / verifier-event
+
+ The request-id used in the event MUST match the one sent in the
+ request that caused this event.
+
+   The request-state field indicates whether the Request/Command causing
+   this event is complete or still in progress; its semantics are the
+   same as those given in Section 5.3.  The final event for a request
+   has a COMPLETE status indicating the completion of the request.
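As an informal illustration, requests, responses, and events share the `mrcp-version SP message-length` prefix, so a parser can distinguish the three start-lines by the third token: all digits means a response (it is the request-id), otherwise the token is a method or event name. The event-name set below is an illustrative subset only; the authoritative names are resource-specific.

```python
EVENT_NAMES = {
    # illustrative subset; the full sets are defined per resource
    "SPEAK-COMPLETE", "SPEECH-MARKER", "START-OF-INPUT",
    "RECOGNITION-COMPLETE", "INTERPRETATION-COMPLETE",
    "RECORD-COMPLETE", "VERIFICATION-COMPLETE",
}

def classify_start_line(line: str) -> str:
    version, length, third = line.split(" ")[:3]
    if third.isdigit():
        return "response"   # third token is a request-id
    return "event" if third in EVENT_NAMES else "request"
```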
+
+
+6. MRCPv2 Generic Methods, Headers, and Result Structure
+
+ MRCPv2 supports a set of methods and header fields that are common to
+ all resources. These are discussed here; resource-specific methods
+ and header fields are discussed in the corresponding resource-
+ specific section of the document.
+
+6.1. Generic Methods
+
+ MRCPv2 supports two generic methods for reading and writing the state
+ associated with a resource.
+
+ generic-method = "SET-PARAMS"
+ / "GET-PARAMS"
+
+ These are described in the following subsections.
+
+6.1.1. SET-PARAMS
+
+ The SET-PARAMS method, from the client to the server, tells the
+ MRCPv2 resource to define parameters for the session, such as voice
+ characteristics and prosody on synthesizers, recognition timers on
+ recognizers, etc. If the server accepts and sets all parameters, it
+ MUST return a response status-code of 200. If it chooses to ignore
+ some optional header fields that can be safely ignored without
+ affecting operation of the server, it MUST return 201.
+
+ If one or more of the header fields being sent is incorrect, error
+ 403, 404, or 409 MUST be returned as follows:
+
+ o If one or more of the header fields being set has an illegal
+ value, the server MUST reject the request with a 404 Illegal Value
+ for Header Field.
+
+ o If one or more of the header fields being set is unsupported for
+ the resource, the server MUST reject the request with a 403
+ Unsupported Header Field, except as described in the next
+ paragraph.
+
+ o If one or more of the header fields being set has an unsupported
+ value, the server MUST reject the request with a 409 Unsupported
+ Header Field Value, except as described in the next paragraph.
+
+ If both error 404 and another error have occurred, only error 404
+ MUST be returned. If both errors 403 and 409 have occurred, but not
+ error 404, only error 403 MUST be returned.
+
+ If error 403, 404, or 409 is returned, the response MUST include the
+ bad or unsupported header fields and their values exactly as they
+ were sent from the client. Session parameters modified using
+ SET-PARAMS do not override parameters explicitly specified on
+ individual requests or requests that are IN-PROGRESS.
+
+ C->S: MRCP/2.0 ... SET-PARAMS 543256
+ Channel-Identifier:32AECB23433802@speechsynth
+ Voice-gender:female
+ Voice-variant:3
+
+ S->C: MRCP/2.0 ... 543256 200 COMPLETE
+ Channel-Identifier:32AECB23433802@speechsynth
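As an informal illustration, the precedence rules among 404, 403, and 409 reduce to a fixed ordering over the per-header outcomes. The outcome labels below are local names; the 200/201 distinction for safely ignored optional fields is omitted from the sketch.

```python
def set_params_status(outcomes: list) -> int:
    # outcomes: one entry per header field, each "ok", "illegal-value",
    # "unsupported-field", or "unsupported-value"
    if "illegal-value" in outcomes:
        return 404          # 404 takes precedence over all other errors
    if "unsupported-field" in outcomes:
        return 403          # 403 takes precedence over 409
    if "unsupported-value" in outcomes:
        return 409
    return 200
```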
+
+6.1.2. GET-PARAMS
+
+ The GET-PARAMS method, from the client to the server, asks the MRCPv2
+ resource for its current session parameters, such as voice
+ characteristics and prosody on synthesizers, recognition timers on
+ recognizers, etc. For every header field the client sends in the
+ request without a value, the server MUST include the header field and
+ its corresponding value in the response. If no parameter header
+ fields are specified by the client, then the server MUST return all
+ the settable parameters and their values in the corresponding header
+ section of the response, including vendor-specific parameters. Such
+ wildcard parameter requests can be very processing-intensive, since
+ the number of settable parameters can be large depending on the
+ implementation. Hence, it is RECOMMENDED that the client not use the
+ wildcard GET-PARAMS operation very often. Note that GET-PARAMS
+ returns header field values that apply to the whole session and not
+ values that have a request-level scope. For example, Input-Waveform-
+ URI is a request-level header field and thus would not be returned by
+ GET-PARAMS.
+
+ If all of the header fields requested are supported, the server MUST
+ return a response status-code of 200. If some of the header fields
+ being retrieved are unsupported for the resource, the server MUST
+ reject the request with a 403 Unsupported Header Field. Such a
+ response MUST include the unsupported header fields exactly as they
+ were sent from the client, without values.
+
+ C->S: MRCP/2.0 ... GET-PARAMS 543256
+ Channel-Identifier:32AECB23433802@speechsynth
+ Voice-gender:
+ Voice-variant:
+ Vendor-Specific-Parameters:com.example.param1;
+ com.example.param2
+
+ S->C: MRCP/2.0 ... 543256 200 COMPLETE
+ Channel-Identifier:32AECB23433802@speechsynth
+ Voice-gender:female
+ Voice-variant:3
+ Vendor-Specific-Parameters:com.example.param1="Company Name";
+ com.example.param2="124324234@example.com"
+
+6.2. Generic Message Headers
+
+ All MRCPv2 header fields, which include both the generic-headers
+ defined in the following subsections and the resource-specific header
+ fields defined later, follow the same generic format as that given in
+ Section 3.1 of RFC 5322 [RFC5322]. Each header field consists of a
+ name followed by a colon (":") and the value. Header field names are
+ case-insensitive. The value MAY be preceded by any amount of LWS
+ (linear white space), though a single SP (space) is preferred.
+ Header fields may extend over multiple lines by preceding each extra
+ line with at least one SP or HT (horizontal tab).
+
+ generic-field = field-name ":" [ field-value ]
+ field-name = token
+ field-value = *LWS field-content *( CRLF 1*LWS field-content)
+ field-content = <the OCTETs making up the field-value
+ and consisting of either *TEXT or combinations
+ of token, separators, and quoted-string>
+
+ The field-content does not include any leading or trailing LWS (i.e.,
+ linear white space occurring before the first non-whitespace
+ character of the field-value or after the last non-whitespace
+ character of the field-value). Such leading or trailing LWS MAY be
+ removed without changing the semantics of the field value. Any LWS
+ that occurs between field-content MAY be replaced with a single SP
+ before interpreting the field value or forwarding the message
+ downstream.
+
+ MRCPv2 servers and clients MUST NOT depend on header field order. It
+ is RECOMMENDED to send general-header fields first, followed by
+ request-header or response-header fields, and ending with the entity-
+ header fields. However, MRCPv2 servers and clients MUST be prepared
+ to process the header fields in any order. The only exception to
+ this rule is when there are multiple header fields with the same name
+ in a message.
+
+ Multiple header fields with the same name MAY be present in a message
+ if and only if the entire value for that header field is defined as a
+ comma-separated list [i.e., #(values)].
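As an informal illustration, the folding and LWS rules above can be normalized in one pass over the header section (the helper name and tuple representation are local choices):

```python
def unfold_headers(block: str) -> list:
    """Parse a header section, joining folded lines with a single SP."""
    fields = []
    for line in block.split("\r\n"):
        if not line:
            continue
        if line[0] in " \t" and fields:  # continuation of previous field
            name, value = fields[-1]
            fields[-1] = (name, f"{value} {line.strip()}")
        else:
            name, _, value = line.partition(":")
            # field names are case-insensitive; leading/trailing LWS
            # around the value is not significant
            fields.append((name.strip().lower(), value.strip()))
    return fields
```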
+
+
+ Since vendor-specific parameters may be order-dependent, it MUST be
+ possible to combine multiple header fields of the same name into one
+ "name:value" pair without changing the semantics of the message, by
+ appending each subsequent value to the first, each separated by a
+ comma. The order in which header fields with the same name are
+ received is therefore significant to the interpretation of the
+ combined header field value, and thus an intermediary MUST NOT change
+ the order of these values when a message is forwarded.
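As an informal illustration, merging repeated fields into one comma-separated value while preserving arrival order, as required above, is a straightforward fold:

```python
def combine_fields(fields: list) -> list:
    # fields: (name, value) pairs in arrival order
    merged = {}
    order = []
    for name, value in fields:
        if name in merged:
            # append in arrival order; reordering would change semantics
            merged[name] = f"{merged[name]},{value}"
        else:
            merged[name] = value
            order.append(name)
    return [(name, merged[name]) for name in order]
```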
+
+ generic-header = channel-identifier
+ / accept
+ / active-request-id-list
+ / proxy-sync-id
+ / accept-charset
+ / content-type
+ / content-id
+ / content-base
+ / content-encoding
+ / content-location
+ / content-length
+ / fetch-timeout
+ / cache-control
+ / logging-tag
+ / set-cookie
+ / vendor-specific
+
+6.2.1. Channel-Identifier
+
+ All MRCPv2 requests, responses, and events MUST contain the Channel-
+ Identifier header field. The value is allocated by the server when a
+ control channel is added to the session and communicated to the
+ client by the "a=channel" attribute in the SDP answer from the
+ server. The header field value consists of 2 parts separated by the
+ '@' symbol. The first part is an unambiguous string identifying the
+ MRCPv2 session. The second part is a string token that specifies one
+ of the media processing resource types listed in Section 3.1. The
+ unambiguous string (first part) MUST be difficult to guess, unique
+ among the resource instances managed by the server, and common to all
+ resource channels with that server established through a single SIP
+ dialog.
+
+ channel-identifier = "Channel-Identifier" ":" channel-id CRLF
+ channel-id = 1*alphanum "@" 1*alphanum
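As an informal illustration, a channel-id splits at the '@' into the session identifier and the resource type. The resource-type set below is an illustrative recollection; the authoritative list is in Section 3.1.

```python
RESOURCE_TYPES = {"speechrecog", "dtmfrecog", "speechsynth",
                  "basicsynth", "speakverify", "recorder"}  # illustrative

def parse_channel_id(value: str):
    session_id, sep, resource = value.partition("@")
    if not sep or not session_id or resource not in RESOURCE_TYPES:
        raise ValueError(f"malformed channel-id: {value!r}")
    return session_id, resource
```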
+
+
+6.2.2. Accept
+
+ The Accept header field follows the syntax defined in [H14.1]. The
+ semantics are also identical, with the exception that if no Accept
+ header field is present, the server MUST assume a default value that
+ is specific to the resource type that is being controlled. This
+ default value can be changed for a resource on a session by sending
+ this header field in a SET-PARAMS method. The current default value
+ of this header field for a resource in a session can be found through
+ a GET-PARAMS method. This header field MAY occur on any request.
+
+6.2.3. Active-Request-Id-List
+
+ In a request, this header field indicates the list of request-ids to
+ which the request applies. This is useful when there are multiple
+ requests that are PENDING or IN-PROGRESS and the client wants this
+ request to apply to one or more of these specifically.
+
+ In a response, this header field returns the list of request-ids that
+ the method modified or affected. There could be one or more requests
+ in a request-state of PENDING or IN-PROGRESS. When a method
+ affecting one or more PENDING or IN-PROGRESS requests is sent from
+ the client to the server, the response MUST contain the list of
+ request-ids that were affected or modified by this command in its
+ header section.
+
+ The Active-Request-Id-List is only used in requests and responses,
+ not in events.
+
+ For example, if a STOP request with no Active-Request-Id-List is sent
+ to a synthesizer resource that has one or more SPEAK requests in the
+ PENDING or IN-PROGRESS state, all SPEAK requests MUST be cancelled,
+ including the one IN-PROGRESS. The response to the STOP request
+ contains in the Active-Request-Id-List value the request-ids of all
+ the SPEAK requests that were terminated. After sending the STOP
+ response, the server MUST NOT send any SPEAK-COMPLETE or RECOGNITION-
+ COMPLETE events for the terminated requests.
+
+ active-request-id-list = "Active-Request-Id-List" ":"
+ request-id *("," request-id) CRLF
+
+6.2.4. Proxy-Sync-Id
+
+ When any server resource generates a "barge-in-able" event, it also
+ generates a unique tag. The tag is sent as this header field's value
+ in an event to the client. The client then acts as an intermediary
+ among the server resources and sends a BARGE-IN-OCCURRED method to
+ the synthesizer server resource with the Proxy-Sync-Id it received
+ from the server resource. When the recognizer and synthesizer
+ resources are part of the same session, they may choose to work
+ together to achieve quicker interaction and response. Here, the
+ Proxy-Sync-Id helps the resource receiving the event, intermediated
+ by the client, to decide if this event has been processed through a
+ direct interaction of the resources. This header field MAY occur
+ only on events and the BARGE-IN-OCCURRED method. The name of this
+ header field contains the word 'proxy' only for historical reasons
+ and does not imply that a proxy server is involved.
+
+ proxy-sync-id = "Proxy-Sync-Id" ":" 1*VCHAR CRLF
+
+6.2.5. Accept-Charset
+
+ See [H14.2]. This specifies the acceptable character sets for
+ entities returned in the response or events associated with this
+ request. This is useful in specifying the character set to use in
+ the Natural Language Semantic Markup Language (NLSML) results of a
+ RECOGNITION-COMPLETE event. This header field is only used on
+ requests.
+
+6.2.6. Content-Type
+
+ See [H14.17]. MRCPv2 supports a restricted set of registered media
+ types for content, including speech markup, grammar, and recognition
+ results. The content types applicable to each MRCPv2 resource-type
+ are specified in the corresponding section of the document and are
+ registered in the MIME Media Types registry maintained by IANA. The
+   multipart content type "multipart/mixed" is supported to communicate
+   several of the above-mentioned content types in a single message
+   body, in which case the body parts MUST NOT contain any
+   MRCPv2-specific header fields.  This header field MAY occur on all
+   messages.
+
+ content-type = "Content-Type" ":" media-type-value CRLF
+
+ media-type-value = type "/" subtype *( ";" parameter )
+
+ type = token
+
+ subtype = token
+
+ parameter = attribute "=" value
+
+ attribute = token
+
+ value = token / quoted-string
+
+
+6.2.7. Content-ID
+
+ This header field contains an ID or name for the content by which it
+ can be referenced. This header field operates according to the
+ specification in RFC 2392 [RFC2392] and is required for content
+ disambiguation in multipart messages. In MRCPv2, whenever the
+ associated content is stored by either the client or the server, it
+ MUST be retrievable using this ID. Such content can be referenced
+ later in a session by addressing it with the 'session' URI scheme
+ described in Section 13.6. This header field MAY occur on all
+ messages.
+
+6.2.8. Content-Base
+
+ The Content-Base entity-header MAY be used to specify the base URI
+ for resolving relative URIs within the entity.
+
+ content-base = "Content-Base" ":" absoluteURI CRLF
+
+ Note, however, that the base URI of the contents within the entity-
+ body may be redefined within that entity-body. An example of this
+ would be multipart media, which in turn can have multiple entities
+ within it. This header field MAY occur on all messages.
+
+6.2.9. Content-Encoding
+
+ The Content-Encoding entity-header is used as a modifier to the
+ Content-Type. When present, its value indicates what additional
+ content encoding has been applied to the entity-body, and thus what
+ decoding mechanisms must be applied in order to obtain the Media Type
+ referenced by the Content-Type header field. Content-Encoding is
+ primarily used to allow a document to be compressed without losing
+ the identity of its underlying media type. Note that the SIP session
+ can be used to determine accepted encodings (see Section 7). This
+ header field MAY occur on all messages.
+
+ content-encoding = "Content-Encoding" ":"
+ *WSP content-coding
+ *(*WSP "," *WSP content-coding *WSP )
+ CRLF
+
+ Content codings are defined in [H3.5]. An example of its use is
+ Content-Encoding:gzip
+
+ If multiple encodings have been applied to an entity, the content
+ encodings MUST be listed in the order in which they were applied.
+
+
+6.2.10. Content-Location
+
+ The Content-Location entity-header MAY be used to supply the resource
+ location for the entity enclosed in the message when that entity is
+ accessible from a location separate from the requested resource's
+ URI. Refer to [H14.14].
+
+ content-location = "Content-Location" ":"
+ ( absoluteURI / relativeURI ) CRLF
+
+ The Content-Location value is a statement of the location of the
+ resource corresponding to this particular entity at the time of the
+ request. This header field is provided for optimization purposes
+ only. The receiver of this header field MAY assume that the entity
+ being sent is identical to what would have been retrieved or might
+ already have been retrieved from the Content-Location URI.
+
+ For example, if the client provided a grammar markup inline, and it
+ had previously retrieved it from a certain URI, that URI can be
+ provided as part of the entity, using the Content-Location header
+ field. This allows a resource like the recognizer to look into its
+ cache to see if this grammar was previously retrieved, compiled, and
+ cached. In this case, it might optimize by using the previously
+ compiled grammar object.
+
+ If the Content-Location is a relative URI, the relative URI is
+ interpreted relative to the Content-Base URI. This header field MAY
+ occur on all messages.
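As an informal illustration, resolving a relative Content-Location against Content-Base follows ordinary URI-reference rules, available directly from the standard library (the URIs below are illustrative):

```python
from urllib.parse import urljoin

content_base = "http://www.example.com/grammars/"   # Content-Base value
content_location = "directory.grxml"                # relative Content-Location

resolved = urljoin(content_base, content_location)
assert resolved == "http://www.example.com/grammars/directory.grxml"
```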
+
+6.2.11. Content-Length
+
+ This header field contains the length of the content of the message
+ body (i.e., after the double CRLF following the last header field).
+ Unlike in HTTP, it MUST be included in all messages that carry
+ content beyond the header section. If it is missing, a default value
+ of zero is assumed. Otherwise, it is interpreted according to
+ [H14.13]. When a message having no use for a message body contains
+ one, i.e., the Content-Length is non-zero, the receiver MUST ignore
+ the content of the message body. This header field MAY occur on all
+ messages.
+
+ content-length = "Content-Length" ":" 1*19DIGIT CRLF
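As an informal illustration, the body begins after the double CRLF that ends the header section, and Content-Length (defaulting to zero when absent) bounds it:

```python
def split_message(raw: bytes):
    """Return (header section, body) per the Content-Length rules above."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    length = 0  # default when Content-Length is missing
    for line in head.split(b"\r\n")[1:]:  # skip the start-line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value)
    return head, rest[:length]
```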
+
+6.2.12. Fetch Timeout
+
+ When the recognizer or synthesizer needs to fetch documents or other
+ resources, this header field controls the corresponding URI access
+ properties. This defines the timeout for content that the server may
+ need to fetch over the network. The value is interpreted to be in
+ milliseconds and ranges from 0 to an implementation-specific maximum
+ value. It is RECOMMENDED that servers be cautious about accepting
+ long timeout values. The default value for this header field is
+ implementation specific. This header field MAY occur in DEFINE-
+ GRAMMAR, RECOGNIZE, SPEAK, SET-PARAMS, or GET-PARAMS.
+
+ fetch-timeout = "Fetch-Timeout" ":" 1*19DIGIT CRLF
+
+6.2.13. Cache-Control
+
+ If the server implements content caching, it MUST adhere to the cache
+ correctness rules of HTTP 1.1 [RFC2616] when accessing and caching
+ stored content. In particular, the "expires" and "cache-control"
+ header fields of the cached URI or document MUST be honored and take
+ precedence over the Cache-Control defaults set by this header field.
+ The Cache-Control directives are used to define the default caching
+ algorithms on the server for the session or request. The scope of
+ the directive is based on the method it is sent on. If the directive
+ is sent on a SET-PARAMS method, it applies for all requests for
+ external documents the server makes during that session, unless it is
+ overridden by a Cache-Control header field on an individual request.
+ If the directives are sent on any other requests, they apply only to
+ external document requests the server makes for that request. An
+ empty Cache-Control header field on the GET-PARAMS method is a
+ request for the server to return the current Cache-Control directives
+ setting on the server. This header field MAY occur only on requests.
+
+ cache-control = "Cache-Control" ":"
+ [*WSP cache-directive
+ *( *WSP "," *WSP cache-directive *WSP )]
+ CRLF
+
+ cache-directive = "max-age" "=" delta-seconds
+ / "max-stale" [ "=" delta-seconds ]
+ / "min-fresh" "=" delta-seconds
+
+ delta-seconds = 1*19DIGIT
+
+ Here, delta-seconds is a decimal time value specifying the number of
+ seconds since the instant the message response or data was received
+ by the server.
+
+ The different cache-directive options allow the client to ask the
+ server to override the default cache expiration mechanisms:
+
+
+ max-age Indicates that the client can tolerate the server
+ using content whose age is no greater than the
+ specified time in seconds. Unless a "max-stale"
+ directive is also included, the client is not willing
+ to accept a response based on stale data.
+
+   min-fresh             Indicates that the client is willing to accept a
+                         server response with cached data whose expiration
+                         is no less than its current age plus the specified
+                         time in seconds, i.e., data that will remain fresh
+                         for at least that long.  If the remaining time-to-
+                         live of the cached entry is less than the client-
+                         supplied min-fresh value, the server MUST NOT
+                         utilize that cached content.
+
+ max-stale Indicates that the client is willing to allow a server
+ to utilize cached data that has exceeded its
+ expiration time. If "max-stale" is assigned a value,
+ then the client is willing to allow the server to use
+ cached data that has exceeded its expiration time by
+ no more than the specified number of seconds. If no
+ value is assigned to "max-stale", then the client is
+ willing to allow the server to use stale data of any
+ age.
+
+ If the server cache is requested to use stale response/data without
+ validation, it MAY do so only if this does not conflict with any
+ "MUST"-level requirements concerning cache validation (e.g., a "must-
+ revalidate" Cache-Control directive in the HTTP 1.1 specification
+ pertaining to the corresponding URI).
+
+ If both the MRCPv2 Cache-Control directive and the cached entry on
+ the server include "max-age" directives, then the lesser of the two
+ values is used for determining the freshness of the cached entry for
+ that request.
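As an informal illustration, the interaction of max-age, min-fresh, and max-stale can be captured in a single freshness predicate. The parameter shapes are local choices; `max_stale=True` models the valueless form of the directive, and the caller is assumed to pass the lesser of the directive's and the cached entry's max-age, as required above.

```python
def may_use_cached(age, lifetime, max_age=None, min_fresh=None,
                   max_stale=None):
    """Decide whether a cached entry may be served without revalidation.

    age: seconds since the entry was received by the server;
    lifetime: the entry's freshness lifetime in seconds.
    """
    if max_age is not None and age > max_age:
        return False
    remaining = lifetime - age
    if min_fresh is not None and remaining < min_fresh:
        return False
    if remaining < 0:                # entry is stale
        if max_stale is True:        # "max-stale" with no value: any age
            return True
        return max_stale is not None and -remaining <= max_stale
    return True
```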
+
+6.2.14. Logging-Tag
+
+ This header field MAY be sent as part of a SET-PARAMS/GET-PARAMS
+ method to set or retrieve the logging tag for logs generated by the
+ server. Once set, the value persists until a new value is set or the
+ session ends. The MRCPv2 server MAY provide a mechanism to create
+ subsets of its output logs so that system administrators can examine
+ or extract only the log file portion during which the logging tag was
+ set to a certain value.
+
+ It is RECOMMENDED that clients include in the logging tag information
+ to identify the MRCPv2 client User Agent, so that one can determine
+ which MRCPv2 client request generated a given log message at the
+ server. It is also RECOMMENDED that MRCPv2 clients not log
+ personally identifiable information such as credit card numbers and
+ national identification numbers.
+
+ logging-tag = "Logging-Tag" ":" 1*UTFCHAR CRLF
+
+6.2.15. Set-Cookie
+
+ Since the associated HTTP client on an MRCPv2 server fetches
+ documents for processing on behalf of the MRCPv2 client, the cookie
+ store in the HTTP client of the MRCPv2 server is treated as an
+ extension of the cookie store in the HTTP client of the MRCPv2
+ client. This requires that the MRCPv2 client and server be able to
+ synchronize their common cookie store as needed. To enable the
+ MRCPv2 client to push its stored cookies to the MRCPv2 server and get
+ new cookies from the MRCPv2 server stored back to the MRCPv2 client,
+ the Set-Cookie entity-header field MAY be included in MRCPv2 requests
+ to update the cookie store on a server and be returned in final
+ MRCPv2 responses or events to subsequently update the client's own
+ cookie store. The stored cookies on the server persist for the
+ duration of the MRCPv2 session and MUST be destroyed at the end of
+ the session. To ensure support for cookies, MRCPv2 clients and
+ servers MUST support the Set-Cookie entity-header field.
+
+ Note that it is the MRCPv2 client that determines which, if any,
+ cookies are sent to the server. There is no requirement that all
+ cookies be shared. Rather, it is RECOMMENDED that MRCPv2 clients
+ communicate only cookies needed by the MRCPv2 server to process its
+ requests.
+
+   set-cookie        = "Set-Cookie:" SP set-cookie-string CRLF
+ set-cookie-string = cookie-pair *( ";" SP cookie-av )
+ cookie-pair = cookie-name "=" cookie-value
+ cookie-name = token
+ cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
+ cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
+ token = <token, defined in [RFC2616], Section 2.2>
+ cookie-av = expires-av / max-age-av / domain-av /
+ path-av / secure-av / httponly-av /
+ extension-av / age-av
+ expires-av = "Expires=" sane-cookie-date
+ sane-cookie-date = <rfc1123-date, defined in [RFC2616], Section 3.3.1>
+ max-age-av = "Max-Age=" non-zero-digit *DIGIT
+ non-zero-digit = %x31-39
+ domain-av = "Domain=" domain-value
+ domain-value = <subdomain>
+ path-av = "Path=" path-value
+ path-value = <any CHAR except CTLs or ";">
+ secure-av = "Secure"
+ httponly-av = "HttpOnly"
+ extension-av = <any CHAR except CTLs or ";">
+ age-av = "Age=" delta-seconds
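As a rough, non-normative illustration of the grammar above, a client might split a received set-cookie-string like this (quoting rules and the Expires date syntax are simplified for brevity; the function name is ours):

```python
# Split a set-cookie-string (cookie-pair *( ";" SP cookie-av )) into
# the cookie name, its value, and a map of attribute-value pairs.
# Value-less attributes such as Secure and HttpOnly become True.

def parse_set_cookie_string(s):
    parts = s.split("; ")
    name, _, value = parts[0].partition("=")
    attrs = {}
    for av in parts[1:]:
        k, sep, v = av.partition("=")
        attrs[k] = v if sep else True   # flag attributes like "Secure"
    return name, value.strip('"'), attrs
```

For instance, `parse_set_cookie_string('SID=31d4d96e; Path=/; Secure; Age=100')` yields the pair ("SID", "31d4d96e") plus the attribute map.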
+
+ The Set-Cookie header field is specified in RFC 6265 [RFC6265]. The
+ "Age" attribute is introduced in this specification to indicate the
+ age of the cookie and is OPTIONAL. An MRCPv2 client or server MUST
+ calculate the age of the cookie according to the age calculation
+ rules in the HTTP/1.1 specification [RFC2616] and append the "Age"
+ attribute accordingly. This attribute is provided because time may
+ have passed since the client received the cookie from an HTTP server.
+ Rather than having the client reduce Max-Age by the actual age, it
+ passes Max-Age verbatim and appends the "Age" attribute, thus
+ maintaining the cookie as received while still accounting for the
+ fact that time has passed.
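A sketch of that bookkeeping (illustrative only, not from the RFC; clock values are assumed to be seconds since an arbitrary epoch, and the "; " separator follows the set-cookie-string grammar above):

```python
# Rather than shrinking Max-Age by the elapsed time, the sender passes
# the cookie verbatim and appends an "Age" attribute recording how long
# it has held the cookie.  The receiver can then compute what remains.

def add_age_attribute(set_cookie_value, received_at, now):
    """Append '; Age=<delta-seconds>' reflecting time since receipt."""
    delta = max(0, int(now - received_at))
    return "%s; Age=%d" % (set_cookie_value, delta)

def remaining_lifetime(max_age, age):
    """Seconds of validity left, given the original Max-Age and Age."""
    return max(0, max_age - age)
```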
+
+ The MRCPv2 client or server MUST supply defaults for the "Domain" and
+ "Path" attributes, as specified in RFC 6265, if they are omitted by
+ the HTTP origin server. Note that there is no leading dot present in
+ the "Domain" attribute value in this case. Although an explicitly
+ specified "Domain" value received via the HTTP protocol may be
+ modified to include a leading dot, an MRCPv2 client or server MUST
+ NOT modify the "Domain" value when received via the MRCPv2 protocol.
+
+ An MRCPv2 client or server MAY combine multiple cookie header fields
+ of the same type into a single "field-name:field-value" pair as
+ described in Section 6.2.
+
+ The Set-Cookie header field MAY be specified in any request that
+ subsequently results in the server performing an HTTP access. When a
+ server receives new cookie information from an HTTP origin server,
+ and assuming the cookie store is modified according to RFC 6265, the
+ server MUST return the new cookie information in the MRCPv2 COMPLETE
+ response or event, as appropriate, to allow the client to update its
+ own cookie store.
+
+ The SET-PARAMS request MAY specify the Set-Cookie header field to
+ update the cookie store on a server. The GET-PARAMS request MAY be
+ used to return the entire cookie store of "Set-Cookie" type cookies
+ to the client.
+
+6.2.16. Vendor-Specific Parameters
+
+ This set of header fields allows for the client to set or retrieve
+ vendor-specific parameters.
+
+ vendor-specific = "Vendor-Specific-Parameters" ":"
+ [vendor-specific-av-pair
+ *(";" vendor-specific-av-pair)] CRLF
+
+ vendor-specific-av-pair = vendor-av-pair-name "="
+ value
+
+ vendor-av-pair-name = 1*UTFCHAR
+
+ Header fields of this form MAY be sent in any method (request) and
+ are used to manage implementation-specific parameters on the server
+ side. The vendor-av-pair-name follows the reverse Internet Domain
+ Name convention (see Section 13.1.6 for syntax and registration
+ information). The value of the vendor attribute is specified after
+ the "=" symbol and MAY be quoted. For example:
+
+ com.example.companyA.paramxyz=256
+ com.example.companyA.paramabc=High
+ com.example.companyB.paramxyz=Low
+
+ When used in GET-PARAMS to get the current value of these parameters
+ from the server, this header field value MAY contain a semicolon-
+ separated list of implementation-specific attribute names.
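A non-normative helper pair for this header field (the function names are ours, and value quoting is ignored for brevity; the reverse-domain parameter names are the invented examples from above):

```python
# Build and split the Vendor-Specific-Parameters value: a semicolon-
# separated list of "name=value" pairs with reverse-domain names.

def format_vendor_params(params):
    return "Vendor-Specific-Parameters:" + ";".join(
        "%s=%s" % (k, v) for k, v in params.items())

def parse_vendor_params(value):
    out = {}
    for pair in value.split(";"):
        name, _, val = pair.partition("=")
        out[name.strip()] = val.strip()
    return out
```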
+
+6.3. Generic Result Structure
+
+ Result data from the server for the Recognizer and Verifier resources
+ is carried as a typed media entity in the MRCPv2 message body of
+ various events. The Natural Language Semantics Markup Language
+ (NLSML), an XML markup based on an early draft from the W3C, is the
+ default standard for returning results back to the client. Hence,
+ all servers implementing these resource types MUST support the media
+ type 'application/nlsml+xml'. The Extensible MultiModal Annotation
+ (EMMA) [W3C.REC-emma-20090210] format can be used to return results
+ as well. This can be done by negotiating the format at session
+ establishment time with SDP (a=resultformat:application/emma+xml) or
+ with SIP (Allow/Accept). With SIP, for example, if a client wants
+ results in EMMA, an MRCPv2 server can route the request to another
+ server that supports EMMA by inspecting the SIP header fields, rather
+ than having to inspect the SDP.
+
+ MRCPv2 uses this representation to convey content among the clients
+ and servers that generate and make use of the markup. MRCPv2 uses
+   NLSML specifically to convey recognition, enrollment, and
+ verification results between the corresponding resource on the MRCPv2
+ server and the MRCPv2 client. Details of this result format are
+ fully described in Section 6.3.1.
+
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:ex="http://www.example.com/example"
+ grammar="http://theYesNoGrammar">
+ <interpretation>
+ <instance>
+ <ex:response>yes</ex:response>
+ </instance>
+ <input>OK</input>
+ </interpretation>
+ </result>
+
+ Result Example
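For illustration, the example result above can be consumed with a stock XML parser; the namespace URIs are exactly the ones declared in the example (the variable names are ours):

```python
# Extract the grammar URI, the semantic interpretation, and the raw
# input text from the NLSML result shown above.
import xml.etree.ElementTree as ET

NLSML = "urn:ietf:params:xml:ns:mrcpv2"
EX = "http://www.example.com/example"

doc = """<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="http://theYesNoGrammar">
  <interpretation>
    <instance><ex:response>yes</ex:response></instance>
    <input>OK</input>
  </interpretation>
</result>"""

root = ET.fromstring(doc)
grammar = root.get("grammar")                          # attribute on <result>
response = root.find(".//{%s}response" % EX).text      # element in the ex: namespace
heard = root.find(".//{%s}input" % NLSML).text         # element in the default namespace
```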
+
+6.3.1. Natural Language Semantics Markup Language
+
+ The Natural Language Semantics Markup Language (NLSML) is an XML data
+ structure with elements and attributes designed to carry result
+ information from recognizer (including enrollment) and verifier
+ resources. The normative definition of NLSML is the RelaxNG schema
+ in Section 16.1. Note that the elements and attributes of this
+ format are defined in the MRCPv2 namespace. In the result structure,
+ they must either be prefixed by a namespace prefix declared within
+ the result or must be children of an element identified as belonging
+ to the respective namespace. For details on how to use XML
+ Namespaces, see [W3C.REC-xml-names11-20040204]. Section 2 of
+ [W3C.REC-xml-names11-20040204] provides details on how to declare
+ namespaces and namespace prefixes.
+
+ The root element of NLSML is <result>. Optional child elements are
+ <interpretation>, <enrollment-result>, and <verification-result>, at
+ least one of which must be present. A single <result> MAY contain
+ any or all of the optional child elements. Details of the <result>
+ and <interpretation> elements and their subelements and attributes
+ can be found in Section 9.6. Details of the <enrollment-result>
+ element and its subelements can be found in Section 9.7. Details of
+ the <verification-result> element and its subelements can be found in
+ Section 11.5.2.
+
+7. Resource Discovery
+
+ Server resources may be discovered and their capabilities learned by
+ clients through standard SIP machinery. The client MAY issue a SIP
+ OPTIONS transaction to a server, which has the effect of requesting
+ the capabilities of the server. The server MUST respond to such a
+ request with an SDP-encoded description of its capabilities according
+ to RFC 3264 [RFC3264]. The MRCPv2 capabilities are described by a
+ single "m=" line containing the media type "application" and
+ transport type "TCP/TLS/MRCPv2" or "TCP/MRCPv2". There MUST be one
+ "resource" attribute for each media resource that the server
+ supports, and it has the resource type identifier as its value.
+
+ The SDP description MUST also contain "m=" lines describing the audio
+ capabilities and the coders the server supports.
+
+ In this example, the client uses the SIP OPTIONS method to query the
+ capabilities of the MRCPv2 server.
+
+ C->S:
+ OPTIONS sip:mrcp@server.example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bf7
+ Max-Forwards:6
+ To:<sip:mrcp@example.com>
+ From:Sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:63104 OPTIONS
+ Contact:<sip:sarvi@client.example.com>
+ Accept:application/sdp
+ Content-Length:0
+
+
+ S->C:
+ SIP/2.0 200 OK
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bf7;received=192.0.32.10
+ To:<sip:mrcp@example.com>;tag=62784
+ From:Sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:63104 OPTIONS
+ Contact:<sip:mrcp@server.example.com>
+ Allow:INVITE, ACK, CANCEL, OPTIONS, BYE
+ Accept:application/sdp
+ Accept-Encoding:gzip
+ Accept-Language:en
+ Supported:foo
+ Content-Type:application/sdp
+ Content-Length:...
+
+ v=0
+ o=sarvi 2890844536 2890842811 IN IP4 192.0.2.12
+ s=-
+ i=MRCPv2 server capabilities
+ c=IN IP4 192.0.2.12/127
+ t=0 0
+ m=application 0 TCP/TLS/MRCPv2 1
+ a=resource:speechsynth
+ a=resource:speechrecog
+ a=resource:speakverify
+ m=audio 0 RTP/AVP 0 3
+ a=rtpmap:0 PCMU/8000
+ a=rtpmap:3 GSM/8000
+
+ Using SIP OPTIONS for MRCPv2 Server Capability Discovery
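A non-normative sketch of how a client might read the capability answer above, collecting the advertised resource types from "a=resource:" lines and the codecs from "a=rtpmap:" lines (the function name is ours):

```python
# Scan an SDP capability description for MRCPv2 resources and codecs.
sdp = """v=0
o=sarvi 2890844536 2890842811 IN IP4 192.0.2.12
s=-
i=MRCPv2 server capabilities
c=IN IP4 192.0.2.12/127
t=0 0
m=application 0 TCP/TLS/MRCPv2 1
a=resource:speechsynth
a=resource:speechrecog
a=resource:speakverify
m=audio 0 RTP/AVP 0 3
a=rtpmap:0 PCMU/8000
a=rtpmap:3 GSM/8000"""

def parse_capabilities(sdp_text):
    resources, codecs = [], []
    for line in sdp_text.splitlines():
        if line.startswith("a=resource:"):
            resources.append(line[len("a=resource:"):])
        elif line.startswith("a=rtpmap:"):
            codecs.append(line.split(" ", 1)[1])   # keep "NAME/clock-rate"
    return resources, codecs

resources, codecs = parse_capabilities(sdp)
```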
+
+8. Speech Synthesizer Resource
+
+ This resource processes text markup provided by the client and
+ generates a stream of synthesized speech in real time. Depending
+ upon the server implementation and capability of this resource, the
+ client can also dictate parameters of the synthesized speech such as
+ voice characteristics, speaker speed, etc.
+
+ The synthesizer resource is controlled by MRCPv2 requests from the
+ client. Similarly, the resource can respond to these requests or
+ generate asynchronous events to the client to indicate conditions of
+ interest to the client during the generation of the synthesized
+ speech stream.
+
+ This section applies for the following resource types:
+
+ o speechsynth
+
+ o basicsynth
+
+ The capabilities of these resources are defined in Section 3.1.
+
+8.1. Synthesizer State Machine
+
+ The synthesizer maintains a state machine to process MRCPv2 requests
+ from the client. The state transitions shown below describe the
+ states of the synthesizer and reflect the state of the request at the
+ head of the synthesizer resource queue. A SPEAK request in the
+ PENDING state can be deleted or stopped by a STOP request without
+ affecting the state of the resource.
+
+ Idle Speaking Paused
+ State State State
+ | | |
+ |----------SPEAK-------->| |--------|
+ |<------STOP-------------| CONTROL |
+ |<----SPEAK-COMPLETE-----| |------->|
+ |<----BARGE-IN-OCCURRED--| |
+ | |---------| |
+ | CONTROL |-----------PAUSE--------->|
+ | |-------->|<----------RESUME---------|
+ | | |----------|
+ |----------| | PAUSE |
+ | BARGE-IN-OCCURRED | |--------->|
+ |<---------| |----------| |
+ | | SPEECH-MARKER |
+ | |<---------| |
+ |----------| |----------| |
+ | STOP | RESUME |
+ | | |<---------| |
+ |<---------| | |
+ |<---------------------STOP-------------------------|
+ |----------| | |
+ | DEFINE-LEXICON | |
+ | | | |
+ |<---------| | |
+ |<---------------BARGE-IN-OCCURRED------------------|
+
+ Synthesizer State Machine
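The diagram can be restated as a transition table (this is our non-normative reading of the figure; (state, event) pairs not listed leave the state unchanged, e.g. a STOP on an Idle resource):

```python
# Synthesizer resource states keyed by (state, event), for the request
# at the head of the queue, as drawn in the state diagram above.
TRANSITIONS = {
    ("idle", "SPEAK"): "speaking",
    ("idle", "DEFINE-LEXICON"): "idle",
    ("idle", "BARGE-IN-OCCURRED"): "idle",
    ("speaking", "STOP"): "idle",
    ("speaking", "SPEAK-COMPLETE"): "idle",
    ("speaking", "BARGE-IN-OCCURRED"): "idle",
    ("speaking", "CONTROL"): "speaking",
    ("speaking", "SPEECH-MARKER"): "speaking",
    ("speaking", "PAUSE"): "paused",
    ("paused", "RESUME"): "speaking",
    ("paused", "PAUSE"): "paused",
    ("paused", "CONTROL"): "paused",
    ("paused", "STOP"): "idle",
    ("paused", "BARGE-IN-OCCURRED"): "idle",
}

def next_state(state, event):
    """Follow the table; unknown pairs keep the current state."""
    return TRANSITIONS.get((state, event), state)
```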
+
+8.2. Synthesizer Methods
+
+ The synthesizer supports the following methods.
+
+ synthesizer-method = "SPEAK"
+ / "STOP"
+ / "PAUSE"
+ / "RESUME"
+ / "BARGE-IN-OCCURRED"
+ / "CONTROL"
+ / "DEFINE-LEXICON"
+
+8.3. Synthesizer Events
+
+ The synthesizer can generate the following events.
+
+ synthesizer-event = "SPEECH-MARKER"
+ / "SPEAK-COMPLETE"
+
+8.4. Synthesizer Header Fields
+
+ A synthesizer method can contain header fields containing request
+ options and information to augment the Request, Response, or Event it
+ is associated with.
+
+ synthesizer-header = jump-size
+ / kill-on-barge-in
+ / speaker-profile
+ / completion-cause
+ / completion-reason
+ / voice-parameter
+ / prosody-parameter
+ / speech-marker
+ / speech-language
+ / fetch-hint
+ / audio-fetch-hint
+ / failed-uri
+ / failed-uri-cause
+ / speak-restart
+ / speak-length
+ / load-lexicon
+ / lexicon-search-order
+
+8.4.1. Jump-Size
+
+ This header field MAY be specified in a CONTROL method and controls
+ the amount to jump forward or backward in an active SPEAK request. A
+   '+' or '-' indicates a value relative to the current playing
+   position. This header field MAY also be specified in a SPEAK request
+ as a desired offset into the synthesized speech. In this case, the
+ synthesizer MUST begin speaking from this amount of time into the
+ speech markup. Note that an offset that extends beyond the end of
+ the produced speech will result in audio of length zero. The
+ different speech length units supported are dependent on the
+ synthesizer implementation. If the synthesizer resource does not
+ support a unit for the operation, the resource MUST respond with a
+ status-code of 409 "Unsupported Header Field Value".
+
+ jump-size = "Jump-Size" ":" speech-length-value CRLF
+
+ speech-length-value = numeric-speech-length
+ / text-speech-length
+
+ text-speech-length = 1*UTFCHAR SP "Tag"
+
+ numeric-speech-length = ("+" / "-") positive-speech-length
+
+ positive-speech-length = 1*19DIGIT SP numeric-speech-unit
+
+ numeric-speech-unit = "Second"
+ / "Word"
+ / "Sentence"
+ / "Paragraph"
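An illustrative parser for speech-length-value (the function name is ours; it accepts both the signed numeric form required by Jump-Size and the unsigned form used by Speak-Length, plus the "Tag" form):

```python
# Turn a speech-length-value such as "+10 Second", "-2 Word", or
# "chapter1 Tag" into a (unit, amount-or-tag) pair.

def parse_speech_length(value):
    if value.endswith(" Tag"):
        return ("tag", value[:-len(" Tag")])
    sign, body = 1, value
    if body[0] in "+-":                      # numeric-speech-length sign
        sign = -1 if body[0] == "-" else 1
        body = body[1:]
    amount, unit = body.split(" ")
    assert unit in ("Second", "Word", "Sentence", "Paragraph")
    return (unit.lower(), sign * int(amount))
```

A resource that does not recognize the unit would instead answer 409 "Unsupported Header Field Value", as required above.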
+
+8.4.2. Kill-On-Barge-In
+
+ This header field MAY be sent as part of the SPEAK method to enable
+ "kill-on-barge-in" support. If enabled, the SPEAK method is
+ interrupted by DTMF input detected by a signal detector resource or
+ by the start of speech sensed or recognized by the speech recognizer
+ resource.
+
+ kill-on-barge-in = "Kill-On-Barge-In" ":" BOOLEAN CRLF
+
+ The client MUST send a BARGE-IN-OCCURRED method to the synthesizer
+ resource when it receives a barge-in-able event from any source.
+ This source could be a synthesizer resource or signal detector
+ resource and MAY be either local or distributed. If this header
+ field is not specified in a SPEAK request or explicitly set by a
+ SET-PARAMS, the default value for this header field is "true".
+
+ If the recognizer or signal detector resource is on the same server
+ as the synthesizer and both are part of the same session, the server
+ MAY work with both to provide internal notification to the
+ synthesizer so that audio may be stopped without having to wait for
+ the client's BARGE-IN-OCCURRED event.
+
+ It is generally RECOMMENDED when playing a prompt to the user with
+ Kill-On-Barge-In and asking for input, that the client issue the
+ RECOGNIZE request ahead of the SPEAK request for optimum performance
+ and user experience. This way, it is guaranteed that the recognizer
+ is online before the prompt starts playing and the user's speech will
+ not be truncated at the beginning (especially for power users).
+
+8.4.3. Speaker-Profile
+
+ This header field MAY be part of the SET-PARAMS/GET-PARAMS or SPEAK
+ request from the client to the server and specifies a URI that
+ references the profile of the speaker. Speaker profiles are
+ collections of voice parameters like gender, accent, etc.
+
+ speaker-profile = "Speaker-Profile" ":" uri CRLF
+
+8.4.4. Completion-Cause
+
+ This header field MUST be specified in a SPEAK-COMPLETE event coming
+ from the synthesizer resource to the client. This indicates the
+ reason the SPEAK request completed.
+
+ completion-cause = "Completion-Cause" ":" 3DIGIT SP
+ 1*VCHAR CRLF
+
+ +------------+-----------------------+------------------------------+
+ | Cause-Code | Cause-Name | Description |
+ +------------+-----------------------+------------------------------+
+ | 000 | normal | SPEAK completed normally. |
+ | 001 | barge-in | SPEAK request was terminated |
+ | | | because of barge-in. |
+ | 002 | parse-failure | SPEAK request terminated |
+ | | | because of a failure to |
+ | | | parse the speech markup |
+ | | | text. |
+ | 003 | uri-failure | SPEAK request terminated |
+ | | | because access to one of the |
+ | | | URIs failed. |
+ | 004 | error | SPEAK request terminated |
+ | | | prematurely due to |
+ | | | synthesizer error. |
+ | 005 | language-unsupported | Language not supported. |
+ | 006 | lexicon-load-failure | Lexicon loading failed. |
+ | 007 | cancelled | A prior SPEAK request failed |
+ | | | while this one was still in |
+ | | | the queue. |
+ +------------+-----------------------+------------------------------+
+
+ Synthesizer Resource Completion Cause Codes
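The table above, restated as a lookup for building or logging the header field (a non-normative convenience; the helper name is ours):

```python
# Synthesizer Completion-Cause codes from the table above.
COMPLETION_CAUSES = {
    0: "normal",
    1: "barge-in",
    2: "parse-failure",
    3: "uri-failure",
    4: "error",
    5: "language-unsupported",
    6: "lexicon-load-failure",
    7: "cancelled",
}

def format_completion_cause(code):
    """Render the header value, e.g. code 1 -> '001 barge-in'."""
    return "Completion-Cause:%03d %s" % (code, COMPLETION_CAUSES[code])
```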
+
+8.4.5. Completion-Reason
+
+ This header field MAY be specified in a SPEAK-COMPLETE event coming
+ from the synthesizer resource to the client. This contains the
+ reason text behind the SPEAK request completion. This header field
+ communicates text describing the reason for the failure, such as an
+ error in parsing the speech markup text.
+
+ completion-reason = "Completion-Reason" ":"
+ quoted-string CRLF
+
+ The completion reason text is provided for client use in logs and for
+ debugging and instrumentation purposes. Clients MUST NOT interpret
+ the completion reason text.
+
+8.4.6. Voice-Parameter
+
+ This set of header fields defines the voice of the speaker.
+
+ voice-parameter = voice-gender
+ / voice-age
+ / voice-variant
+ / voice-name
+
+ voice-gender = "Voice-Gender:" voice-gender-value CRLF
+ voice-gender-value = "male"
+ / "female"
+ / "neutral"
+ voice-age = "Voice-Age:" 1*3DIGIT CRLF
+ voice-variant = "Voice-Variant:" 1*19DIGIT CRLF
+ voice-name = "Voice-Name:"
+ 1*UTFCHAR *(1*WSP 1*UTFCHAR) CRLF
+
+ The "Voice-" parameters are derived from the similarly named
+ attributes of the voice element specified in W3C's Speech Synthesis
+ Markup Language Specification (SSML)
+ [W3C.REC-speech-synthesis-20040907]. Legal values for these
+ parameters are as defined in that specification.
+
+ These header fields MAY be sent in SET-PARAMS or GET-PARAMS requests
+ to define or get default values for the entire session or MAY be sent
+ in the SPEAK request to define default values for that SPEAK request.
+ Note that SSML content can itself set these values internal to the
+ SSML document, of course.
+
+ Voice parameter header fields MAY also be sent in a CONTROL method to
+ affect a SPEAK request in progress and change its behavior on the
+ fly. If the synthesizer resource does not support this operation, it
+ MUST reject the request with a status-code of 403 "Unsupported Header
+ Field".
+
+8.4.7. Prosody-Parameters
+
+ This set of header fields defines the prosody of the speech.
+
+ prosody-parameter = "Prosody-" prosody-param-name ":"
+ prosody-param-value CRLF
+
+ prosody-param-name = 1*VCHAR
+
+ prosody-param-value = 1*VCHAR
+
+ prosody-param-name is any one of the attribute names under the
+ prosody element specified in W3C's Speech Synthesis Markup Language
+ Specification [W3C.REC-speech-synthesis-20040907]. The prosody-
+ param-value is any one of the value choices of the corresponding
+ prosody element attribute from that specification.
+
+ These header fields MAY be sent in SET-PARAMS or GET-PARAMS requests
+ to define or get default values for the entire session or MAY be sent
+ in the SPEAK request to define default values for that SPEAK request.
+ Furthermore, these attributes can be part of the speech text marked
+ up in SSML.
+
+ The prosody parameter header fields in the SET-PARAMS or SPEAK
+ request only apply if the speech data is of type 'text/plain' and
+ does not use a speech markup format.
+
+ These prosody parameter header fields MAY also be sent in a CONTROL
+ method to affect a SPEAK request in progress and change its behavior
+ on the fly. If the synthesizer resource does not support this
+ operation, it MUST respond back to the client with a status-code of
+ 403 "Unsupported Header Field".
+
+8.4.8. Speech-Marker
+
+ This header field contains timestamp information in a "timestamp"
+ field. This is a Network Time Protocol (NTP) [RFC5905] timestamp, a
+   64-bit number in decimal form. It MUST be synced with the Real-time
+   Transport Protocol (RTP) [RFC3550] timestamp of the media stream
+   through the Real-Time Control Protocol (RTCP) [RFC3550].
+
+ Markers are bookmarks that are defined within the markup. Most
+ speech markup formats provide mechanisms to embed marker fields
+ within speech texts. The synthesizer generates SPEECH-MARKER events
+ when it reaches these marker fields. This header field MUST be part
+ of the SPEECH-MARKER event and contain the marker tag value after the
+ timestamp, separated by a semicolon. In these events, the timestamp
+ marks the time the text corresponding to the marker was emitted as
+ speech by the synthesizer.
+
+ This header field MUST also be returned in responses to STOP,
+ CONTROL, and BARGE-IN-OCCURRED methods, in the SPEAK-COMPLETE event,
+ and in an IN-PROGRESS SPEAK response. In these messages, if any
+ markers have been encountered for the current SPEAK, the marker tag
+ value MUST be the last embedded marker encountered. If no markers
+ have yet been encountered for the current SPEAK, only the timestamp
+ is REQUIRED. Note that in these events, the purpose of this header
+ field is to provide timestamp information associated with important
+ events within the lifecycle of a request (start of SPEAK processing,
+ end of SPEAK processing, receipt of CONTROL/STOP/BARGE-IN-OCCURRED).
+
+ timestamp = "timestamp" "=" time-stamp-value
+
+ time-stamp-value = 1*20DIGIT
+
+ speech-marker = "Speech-Marker" ":"
+ timestamp
+ [";" 1*(UTFCHAR / %x20)] CRLF
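A sketch of splitting a Speech-Marker value into its NTP timestamp and optional marker tag, per the grammar above (the function name is ours):

```python
# "timestamp=<1*20DIGIT>" optionally followed by ";" and the tag text.

def parse_speech_marker(value):
    ts, sep, tag = value.partition(";")
    assert ts.startswith("timestamp=")
    return int(ts[len("timestamp="):]), (tag if sep else None)
```

When no marker has yet been encountered for the current SPEAK, only the timestamp is present and the tag comes back as None.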
+
+8.4.9. Speech-Language
+
+ This header field specifies the default language of the speech data
+ if the language is not specified in the markup. The value of this
+ header field MUST follow RFC 5646 [RFC5646] for its values. The
+ header field MAY occur in SPEAK, SET-PARAMS, or GET-PARAMS requests.
+
+ speech-language = "Speech-Language" ":" 1*VCHAR CRLF
+
+8.4.10. Fetch-Hint
+
+ When the synthesizer needs to fetch documents or other resources like
+ speech markup or audio files, this header field controls the
+ corresponding URI access properties. This provides client policy on
+ when the synthesizer should retrieve content from the server. A
+ value of "prefetch" indicates the content MAY be downloaded when the
+ request is received, whereas "safe" indicates that content MUST NOT
+
+ be downloaded until actually referenced. The default value is
+ "prefetch". This header field MAY occur in SPEAK, SET-PARAMS, or
+ GET-PARAMS requests.
+
+ fetch-hint = "Fetch-Hint" ":" ("prefetch" / "safe") CRLF
+
+8.4.11. Audio-Fetch-Hint
+
+ When the synthesizer needs to fetch documents or other resources like
+ speech audio files, this header field controls the corresponding URI
+   access properties. This provides client policy on whether the
+ synthesizer is permitted to attempt to optimize speech by pre-
+ fetching audio. The value is either "safe" to say that audio is only
+ fetched when it is referenced, never before; "prefetch" to permit,
+ but not require the implementation to pre-fetch the audio; or
+ "stream" to allow it to stream the audio fetches. The default value
+ is "prefetch". This header field MAY occur in SPEAK, SET-PARAMS, or
+ GET-PARAMS requests.
+
+ audio-fetch-hint = "Audio-Fetch-Hint" ":"
+ ("prefetch" / "safe" / "stream") CRLF
+
+8.4.12. Failed-URI
+
+ When a synthesizer method needs a synthesizer to fetch or access a
+ URI and the access fails, the server SHOULD provide the failed URI in
+ this header field in the method response, unless there are multiple
+ URI failures, in which case the server MUST provide one of the failed
+ URIs in this header field in the method response.
+
+ failed-uri = "Failed-URI" ":" absoluteURI CRLF
+
+8.4.13. Failed-URI-Cause
+
+ When a synthesizer method needs a synthesizer to fetch or access a
+ URI and the access fails, the server MUST provide the URI-specific or
+ protocol-specific response code for the URI in the Failed-URI header
+ field in the method response through this header field. The value
+ encoding is UTF-8 (RFC 3629 [RFC3629]) to accommodate any access
+ protocol -- some access protocols might have a response string
+ instead of a numeric response code.
+
+ failed-uri-cause = "Failed-URI-Cause" ":" 1*UTFCHAR CRLF
+
+8.4.14. Speak-Restart
+
+ When a client issues a CONTROL request to a currently speaking
+ synthesizer resource to jump backward, and the target jump point is
+ before the start of the current SPEAK request, the current SPEAK
+ request MUST restart from the beginning of its speech data and the
+ server's response to the CONTROL request MUST contain this header
+ field with a value of "true" indicating a restart.
+
+ speak-restart = "Speak-Restart" ":" BOOLEAN CRLF
+
+8.4.15. Speak-Length
+
+ This header field MAY be specified in a CONTROL method to control the
+ maximum length of speech to speak, relative to the current speaking
+ point in the currently active SPEAK request. If numeric, the value
+ MUST be a positive integer. If a header field with a Tag unit is
+ specified, then the speech output continues until the tag is reached
+ or the SPEAK request is completed, whichever comes first. This
+ header field MAY be specified in a SPEAK request to indicate the
+ length to speak from the speech data and is relative to the point in
+ speech that the SPEAK request starts. The different speech length
+ units supported are synthesizer implementation dependent. If a
+ server does not support the specified unit, the server MUST respond
+ with a status-code of 409 "Unsupported Header Field Value".
+
+ speak-length = "Speak-Length" ":" positive-length-value
+ CRLF
+
+ positive-length-value = positive-speech-length
+ / text-speech-length
+
+ text-speech-length = 1*UTFCHAR SP "Tag"
+
+ positive-speech-length = 1*19DIGIT SP numeric-speech-unit
+
+ numeric-speech-unit = "Second"
+ / "Word"
+ / "Sentence"
+ / "Paragraph"
+
+8.4.16. Load-Lexicon
+
+ This header field is used to indicate whether a lexicon has to be
+ loaded or unloaded. The value "true" means to load the lexicon if
+ not already loaded, and the value "false" means to unload the lexicon
+ if it is loaded. The default value for this header field is "true".
+ This header field MAY be specified in a DEFINE-LEXICON method.
+
+ load-lexicon = "Load-Lexicon" ":" BOOLEAN CRLF
+
+8.4.17. Lexicon-Search-Order
+
+ This header field is used to specify a list of active pronunciation
+ lexicon URIs and the search order among the active lexicons.
+ Lexicons specified within the SSML document take precedence over the
+ lexicons specified in this header field. This header field MAY be
+ specified in the SPEAK, SET-PARAMS, and GET-PARAMS methods.
+
+ lexicon-search-order = "Lexicon-Search-Order" ":"
+ "<" absoluteURI ">" *(" " "<" absoluteURI ">") CRLF
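Extracting the angle-bracketed lexicon URIs from such a value, in search order, can be sketched as follows (an illustrative helper, not from the RFC):

```python
# Pull each "<absoluteURI>" out of a Lexicon-Search-Order value.
import re

def parse_lexicon_search_order(value):
    return re.findall(r"<([^>]*)>", value)
```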
+
+8.5. Synthesizer Message Body
+
+ A synthesizer message can contain additional information associated
+ with the Request, Response, or Event in its message body.
+
+8.5.1. Synthesizer Speech Data
+
+ Marked-up text for the synthesizer to speak is specified as a typed
+ media entity in the message body. The speech data to be spoken by
+ the synthesizer can be specified inline by embedding the data in the
+ message body or by reference by providing a URI for accessing the
+   data. In either case, the data and the format used to mark up the
+   speech need to be of a content type supported by the server.
+
+ All MRCPv2 servers containing synthesizer resources MUST support both
+ plain text speech data and W3C's Speech Synthesis Markup Language
+ [W3C.REC-speech-synthesis-20040907] and hence MUST support the media
+ types 'text/plain' and 'application/ssml+xml'. Other formats MAY be
+ supported.
+
+ If the speech data is to be fetched by URI reference, the media type
+ 'text/uri-list' (see RFC 2483 [RFC2483]) is used to indicate one or
+ more URIs that, when dereferenced, will contain the content to be
+ spoken. If a list of speech URIs is specified, the resource MUST
+ speak the speech data provided by each URI in the order in which the
+ URIs are specified in the content.
+
+ MRCPv2 clients and servers MUST support the 'multipart/mixed' media
+ type. This is the appropriate media type to use when providing a mix
+ of URI and inline speech data. Embedded within the multipart content
+ block, there MAY be content for the 'text/uri-list', 'application/
+ ssml+xml', and/or 'text/plain' media types. The character set and
+ encoding used in the speech data is specified according to standard
+ media type definitions. The multipart content MAY also contain
+   actual audio data. Clients may have recorded audio clips stored in
+   memory or on a local device and wish to play them as part of the SPEAK
+ request. The audio portions MAY be sent by the client as part of the
+ multipart content block. This audio is referenced in the speech
+ markup data that is another part in the multipart content block
+ according to the 'multipart/mixed' media type specification.
+
+ Content-Type:text/uri-list
+ Content-Length:...
+
+ http://www.example.com/ASR-Introduction.ssml
+ http://www.example.com/ASR-Document-Part1.ssml
+ http://www.example.com/ASR-Document-Part2.ssml
+ http://www.example.com/ASR-Conclusion.ssml
+
+ URI List Example
+
+
+ Content-Type:application/ssml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <speak version="1.0"
+ xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
+ http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
+ xml:lang="en-US">
+ <p>
+ <s>You have 4 new messages.</s>
+ <s>The first is from Aldine Turnbet
+ and arrived at <break/>
+ <say-as interpret-as="vxml:time">0345p</say-as>.</s>
+
+ <s>The subject is <prosody
+ rate="-20%">ski trip</prosody></s>
+ </p>
+ </speak>
+
+ SSML Example
+
+
+
+
+
+
+ Content-Type:multipart/mixed; boundary="break"
+
+ --break
+ Content-Type:text/uri-list
+ Content-Length:...
+
+ http://www.example.com/ASR-Introduction.ssml
+ http://www.example.com/ASR-Document-Part1.ssml
+ http://www.example.com/ASR-Document-Part2.ssml
+ http://www.example.com/ASR-Conclusion.ssml
+
+ --break
+ Content-Type:application/ssml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <speak version="1.0"
+ xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
+ http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
+ xml:lang="en-US">
+ <p>
+ <s>You have 4 new messages.</s>
+ <s>The first is from Stephanie Williams
+ and arrived at <break/>
+ <say-as interpret-as="vxml:time">0342p</say-as>.</s>
+
+ <s>The subject is <prosody
+ rate="-20%">ski trip</prosody></s>
+ </p>
+ </speak>
+ --break--
+
+ Multipart Example
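For illustration only, a client-side helper that assembles such a 'multipart/mixed' body might look like the following sketch; the boundary token "break" simply mirrors the example above, and any unique token would do:

```python
# Non-normative sketch: assemble a 'multipart/mixed' message body mixing
# URI-list and inline SSML parts, as in the Multipart Example above.
def build_multipart(parts: list[tuple[str, str]], boundary: str = "break") -> str:
    lines = []
    for ctype, body in parts:
        lines.append("--" + boundary)             # part delimiter
        lines.append("Content-Type:" + ctype)
        lines.append("Content-Length:" + str(len(body)))
        lines.append("")                          # blank line before body
        lines.append(body)
    lines.append("--" + boundary + "--")          # closing delimiter
    return "\r\n".join(lines)
```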
+
+8.5.2. Lexicon Data
+
+ Synthesizer lexicon data from the client to the server can be
+ provided inline or by reference. Either way, they are carried as
+ typed media in the message body of the MRCPv2 request message (see
+ Section 8.14).
+
+ When a lexicon is specified inline in the message, the client MUST
+ provide a Content-ID for that lexicon as part of the content header
+ fields. The server MUST store the lexicon associated with that
+ Content-ID for the duration of the session. A stored lexicon can be
+ overwritten by defining a new lexicon with the same Content-ID.
+
+
+
+
+
+ Lexicons that have been associated with a Content-ID can be
+ referenced through the 'session' URI scheme (see Section 13.6).
+
+ If lexicon data is specified by external URI reference, the media
+ type 'text/uri-list' (see RFC 2483 [RFC2483]) is used to list one or
+ more URIs that may be dereferenced to obtain the lexicon data.
+ All MRCPv2 servers MUST support the "http" and "https" URI access
+ mechanisms, and MAY support other mechanisms.
+
+ If the data in the message body consists of a mix of URI and inline
+ lexicon data, the 'multipart/mixed' media type is used. The
+ character set and encoding used in the lexicon data may be specified
+ according to standard media type definitions.
+
+8.6. SPEAK Method
+
+ The SPEAK request provides the synthesizer resource with the speech
+ text and initiates speech synthesis and streaming. The SPEAK method
+ MAY carry voice and prosody header fields that alter the behavior of
+ the voice being synthesized, as well as a typed media message body
+ containing the actual marked-up text to be spoken.
+
+ The SPEAK method implementation MUST do a fetch of all external URIs
+ that are part of that operation. If caching is implemented, this URI
+ fetching MUST conform to the cache-control hints and parameter header
+ fields associated with the method in deciding whether it is to be
+ fetched from cache or from the external server. If these hints/
+ parameters are not specified in the method, the values set for the
+ session using SET-PARAMS apply. If they were not set for the
+ session, their default values apply.
+
+ When applying voice parameters, there are three levels of precedence.
+ Highest precedence goes to parameters specified within the speech
+ markup text, followed by those specified in the header fields of the
+ SPEAK request (which therefore apply to that SPEAK request only),
+ followed by the session default values, which can be set using the
+ SET-PARAMS request and apply to subsequent methods invoked during the
+ session.
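This precedence can be expressed as a small illustrative lookup, highest-precedence scope first; the dictionaries here are stand-ins for the three scopes, not protocol elements:

```python
# Non-normative sketch: resolve a voice parameter across the three
# precedence levels: markup > SPEAK request headers > session defaults.
def effective_param(name, markup, request_headers, session_defaults):
    for scope in (markup, request_headers, session_defaults):
        if name in scope:
            return scope[name]
    return None  # none set anywhere: the resource's built-in default applies
```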
+
+ If the resource was idle at the time the SPEAK request arrived at the
+ server and the SPEAK method is being actively processed, the resource
+ responds immediately with a success status code and a request-state
+ of IN-PROGRESS.
+
+ If the resource is in the speaking or paused state when the SPEAK
+ method arrives at the server, i.e., it is in the middle of processing
+ a previous SPEAK request, the status returns success with a request-
+ state of PENDING. The server places the SPEAK request in the
+ synthesizer resource request queue. The request queue operates
+
+
+
+
+
+ strictly FIFO: requests are processed serially in order of receipt.
+ If the current SPEAK fails, all SPEAK methods in the pending queue
+ are cancelled and each generates a SPEAK-COMPLETE event with a
+ Completion-Cause of "cancelled".
+
+ For the synthesizer resource, SPEAK is the only method that can
+ return a request-state of IN-PROGRESS or PENDING. When the text has
+ been synthesized and played into the media stream, the resource
+ issues a SPEAK-COMPLETE event with the request-id of the SPEAK
+ request and a request-state of COMPLETE.
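The queueing rule above can be sketched as follows; the class and the event tuples are illustrative only, not protocol elements:

```python
from collections import deque

# Non-normative sketch: the synthesizer's strictly FIFO SPEAK queue.
# If the active request fails, every queued request is cancelled and
# would generate a SPEAK-COMPLETE event with cause "cancelled".
class SpeakQueue:
    def __init__(self):
        self.queue = deque()

    def submit(self, request_id):
        # First request becomes active; later ones queue behind it.
        state = "IN-PROGRESS" if not self.queue else "PENDING"
        self.queue.append(request_id)
        return state

    def fail_active(self):
        self.queue.popleft()               # drop the failed active request
        cancelled = list(self.queue)       # all pending requests cancel
        self.queue.clear()
        return [(rid, "SPEAK-COMPLETE", "cancelled") for rid in cancelled]
```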
+
+ C->S: MRCP/2.0 ... SPEAK 543257
+ Channel-Identifier:32AECB23433802@speechsynth
+ Voice-gender:neutral
+ Voice-Age:25
+ Prosody-volume:medium
+ Content-Type:application/ssml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <speak version="1.0"
+ xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
+ http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
+ xml:lang="en-US">
+ <p>
+ <s>You have 4 new messages.</s>
+ <s>The first is from Stephanie Williams and arrived at
+ <break/>
+ <say-as interpret-as="vxml:time">0342p</say-as>.
+ </s>
+ <s>The subject is
+ <prosody rate="-20%">ski trip</prosody>
+ </s>
+ </p>
+ </speak>
+
+ S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@speechsynth
+ Speech-Marker:timestamp=857206027059
+
+ S->C: MRCP/2.0 ... SPEAK-COMPLETE 543257 COMPLETE
+ Channel-Identifier:32AECB23433802@speechsynth
+ Completion-Cause:000 normal
+ Speech-Marker:timestamp=857206027059
+
+ SPEAK Example
+
+
+
+
+
+8.7. STOP
+
+ The STOP method from the client to the server tells the synthesizer
+ resource to stop speaking if it is speaking something.
+
+ The STOP request can be sent with an Active-Request-Id-List header
+ field to stop zero or more specific SPEAK requests that may be in the
+ queue; the response returns a status-code of 200 "Success". If no
+ Active-Request-Id-List header field is sent in the STOP request, the
+ server terminates all outstanding SPEAK requests.
+
+ If a STOP request successfully terminated one or more PENDING or
+ IN-PROGRESS SPEAK requests, then the response MUST contain an Active-
+ Request-Id-List header field enumerating the SPEAK request-ids that
+ were terminated. Otherwise, there is no Active-Request-Id-List
+ header field in the response. No SPEAK-COMPLETE events are sent for
+ such terminated requests.
+
+ If a SPEAK request that was IN-PROGRESS and speaking was stopped, the
+ next pending SPEAK request, if any, becomes IN-PROGRESS at the
+ resource and enters the speaking state.
+
+ If a SPEAK request that was IN-PROGRESS and paused was stopped, the
+ next pending SPEAK request, if any, becomes IN-PROGRESS and enters
+ the paused state.
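A non-normative sketch of this termination logic, with a Python list standing in for the resource's request queue:

```python
# Non-normative sketch: STOP terminates the listed SPEAK requests (or
# all outstanding requests when no Active-Request-Id-List is given) and
# echoes the terminated request-ids back in the response.
def stop(queue, active_ids=None):
    if active_ids is None:
        terminated = list(queue)    # no header: terminate everything
        queue.clear()
    else:
        terminated = [r for r in queue if r in active_ids]
        queue[:] = [r for r in queue if r not in active_ids]
    # None means the response carries no Active-Request-Id-List header.
    return terminated or None
```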
+
+ C->S: MRCP/2.0 ... SPEAK 543258
+ Channel-Identifier:32AECB23433802@speechsynth
+ Content-Type:application/ssml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <speak version="1.0"
+ xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
+ http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
+ xml:lang="en-US">
+ <p>
+ <s>You have 4 new messages.</s>
+ <s>The first is from Stephanie Williams and arrived at
+ <break/>
+ <say-as interpret-as="vxml:time">0342p</say-as>.</s>
+ <s>The subject is
+ <prosody rate="-20%">ski trip</prosody></s>
+ </p>
+ </speak>
+
+
+
+
+
+
+ S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@speechsynth
+ Speech-Marker:timestamp=857206027059
+
+ C->S: MRCP/2.0 ... STOP 543259
+ Channel-Identifier:32AECB23433802@speechsynth
+
+ S->C: MRCP/2.0 ... 543259 200 COMPLETE
+ Channel-Identifier:32AECB23433802@speechsynth
+ Active-Request-Id-List:543258
+ Speech-Marker:timestamp=857206039059
+
+ STOP Example
+
+8.8. BARGE-IN-OCCURRED
+
+ The BARGE-IN-OCCURRED method, when used with the synthesizer
+ resource, provides a client that has detected a barge-in-able event a
+ means to communicate the occurrence of the event to the synthesizer
+ resource.
+
+ This method is useful in two scenarios:
+
+ 1. The client has detected DTMF digits in the input media or some
+ other barge-in-able event and wants to communicate that to the
+ synthesizer resource.
+
+ 2. The recognizer resource and the synthesizer resource are in
+ different servers. In this case, the client acts as an
+ intermediary for the two servers. It receives an event from the
+ recognition resource and sends a BARGE-IN-OCCURRED request to the
+ synthesizer. In such cases, the BARGE-IN-OCCURRED method would
+ also have a Proxy-Sync-Id header field received from the resource
+ generating the original event.
+
+ If a SPEAK request is active with kill-on-barge-in enabled (see
+ Section 8.4.2), and a BARGE-IN-OCCURRED request is received, the
+ synthesizer MUST immediately stop streaming out audio. It MUST also
+ terminate any speech requests queued behind the current active one,
+ irrespective of whether or not they have barge-in enabled. If a
+ barge-in-able SPEAK request was playing and it was terminated, the
+ response MUST contain an Active-Request-Id-List header field listing
+ the request-ids of all SPEAK requests that were terminated. The
+ server generates no SPEAK-COMPLETE events for these requests.
+
+
+
+
+
+
+
+
+
+ If there were no SPEAK requests terminated by the synthesizer
+ resource as a result of the BARGE-IN-OCCURRED method, the server MUST
+ respond to the BARGE-IN-OCCURRED with a status-code of 200 "Success",
+ and the response MUST NOT contain an Active-Request-Id-List header
+ field.
+
+ If the synthesizer and recognizer resources are part of the same
+ MRCPv2 session, they can be optimized for a quicker kill-on-barge-in
+ response if the recognizer and synthesizer interact directly. In
+ these cases, the client MUST still react to a START-OF-INPUT event
+ from the recognizer by invoking the BARGE-IN-OCCURRED method to the
+ synthesizer. The client MUST invoke the BARGE-IN-OCCURRED method if
+ it has any outstanding requests to the synthesizer resource in either
+ the PENDING or IN-PROGRESS state.
+
+ C->S: MRCP/2.0 ... SPEAK 543258
+ Channel-Identifier:32AECB23433802@speechsynth
+ Voice-gender:neutral
+ Voice-Age:25
+ Prosody-volume:medium
+ Content-Type:application/ssml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <speak version="1.0"
+ xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
+ http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
+ xml:lang="en-US">
+ <p>
+ <s>You have 4 new messages.</s>
+ <s>The first is from Stephanie Williams and arrived at
+ <break/>
+ <say-as interpret-as="vxml:time">0342p</say-as>.</s>
+ <s>The subject is
+ <prosody rate="-20%">ski trip</prosody></s>
+ </p>
+ </speak>
+
+ S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@speechsynth
+ Speech-Marker:timestamp=857206027059
+
+ C->S: MRCP/2.0 ... BARGE-IN-OCCURRED 543259
+ Channel-Identifier:32AECB23433802@speechsynth
+ Proxy-Sync-Id:987654321
+
+
+
+
+
+
+ S->C: MRCP/2.0 ... 543259 200 COMPLETE
+ Channel-Identifier:32AECB23433802@speechsynth
+ Active-Request-Id-List:543258
+ Speech-Marker:timestamp=857206039059
+
+ BARGE-IN-OCCURRED Example
+
+8.9. PAUSE
+
+ The PAUSE method from the client to the server tells the synthesizer
+ resource to pause speech output if it is speaking something. If a
+ PAUSE method is issued on a session when a SPEAK is not active, the
+ server MUST respond with a status-code of 402 "Method not valid in
+ this state". If a PAUSE method is issued on a session when a SPEAK
+ is active and paused, the server MUST respond with a status-code of
+ 200 "Success". If a SPEAK request was active, the server MUST return
+ an Active-Request-Id-List header field whose value contains the
+ request-id of the SPEAK request that was paused.
+
+ C->S: MRCP/2.0 ... SPEAK 543258
+ Channel-Identifier:32AECB23433802@speechsynth
+ Voice-gender:neutral
+ Voice-Age:25
+ Prosody-volume:medium
+ Content-Type:application/ssml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <speak version="1.0"
+ xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
+ http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
+ xml:lang="en-US">
+ <p>
+ <s>You have 4 new messages.</s>
+ <s>The first is from Stephanie Williams and arrived at
+ <break/>
+ <say-as interpret-as="vxml:time">0342p</say-as>.</s>
+
+ <s>The subject is
+ <prosody rate="-20%">ski trip</prosody></s>
+ </p>
+ </speak>
+
+ S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@speechsynth
+ Speech-Marker:timestamp=857206027059
+
+
+
+
+
+ C->S: MRCP/2.0 ... PAUSE 543259
+ Channel-Identifier:32AECB23433802@speechsynth
+
+ S->C: MRCP/2.0 ... 543259 200 COMPLETE
+ Channel-Identifier:32AECB23433802@speechsynth
+ Active-Request-Id-List:543258
+
+ PAUSE Example
+
+8.10. RESUME
+
+ The RESUME method from the client to the server tells a paused
+ synthesizer resource to resume speaking. If a RESUME request is
+ issued on a session with no active SPEAK request, the server MUST
+ respond with a status-code of 402 "Method not valid in this state".
+ If a RESUME request is issued on a session with an active SPEAK
+ request that is speaking (i.e., not paused), the server MUST respond
+ with a status-code of 200 "Success". If a SPEAK request was paused,
+ the server MUST return an Active-Request-Id-List header field whose
+ value contains the request-id of the SPEAK request that was resumed.
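A non-normative sketch of the RESUME response logic above, with status codes as plain integers for brevity:

```python
# Non-normative sketch: RESUME response logic. No active SPEAK -> 402
# "Method not valid in this state"; otherwise 200, and when the active
# request actually was paused its request-id is echoed back in
# Active-Request-Id-List.
def resume(active_id, is_paused):
    if active_id is None:
        return (402, None)
    return (200, [active_id] if is_paused else None)
```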
+
+ C->S: MRCP/2.0 ... SPEAK 543258
+ Channel-Identifier:32AECB23433802@speechsynth
+ Voice-gender:neutral
+ Voice-age:25
+ Prosody-volume:medium
+ Content-Type:application/ssml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <speak version="1.0"
+ xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
+ http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
+ xml:lang="en-US">
+ <p>
+ <s>You have 4 new messages.</s>
+ <s>The first is from Stephanie Williams and arrived at
+ <break/>
+ <say-as interpret-as="vxml:time">0342p</say-as>.</s>
+ <s>The subject is
+ <prosody rate="-20%">ski trip</prosody></s>
+ </p>
+ </speak>
+
+
+
+
+
+
+
+
+ S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@speechsynth
+ Speech-Marker:timestamp=857206027059
+
+ C->S: MRCP/2.0 ... PAUSE 543259
+ Channel-Identifier:32AECB23433802@speechsynth
+
+ S->C: MRCP/2.0 ... 543259 200 COMPLETE
+ Channel-Identifier:32AECB23433802@speechsynth
+ Active-Request-Id-List:543258
+
+ C->S: MRCP/2.0 ... RESUME 543260
+ Channel-Identifier:32AECB23433802@speechsynth
+
+ S->C: MRCP/2.0 ... 543260 200 COMPLETE
+ Channel-Identifier:32AECB23433802@speechsynth
+ Active-Request-Id-List:543258
+
+ RESUME Example
+
+8.11. CONTROL
+
+ The CONTROL method from the client to the server tells a synthesizer
+ that is speaking to modify what it is speaking on the fly. This
+ method is used to request the synthesizer to jump forward or backward
+ in what it is speaking, change speaker rate, speaker parameters, etc.
+ It affects only the currently IN-PROGRESS SPEAK request. Depending
+ on the implementation and capability of the synthesizer resource, it
+ may or may not support the various modifications indicated by header
+ fields in the CONTROL request.
+
+ When a client invokes a CONTROL method to jump forward and the
+ operation goes beyond the end of the active SPEAK method's text, the
+ CONTROL request still succeeds. The active SPEAK request completes
+ and returns a SPEAK-COMPLETE event following the response to the
+ CONTROL method. If there are more SPEAK requests in the queue, the
+ synthesizer resource starts at the beginning of the next SPEAK
+ request in the queue.
+
+ When a client invokes a CONTROL method to jump backward and the
+ operation jumps to the beginning or beyond the beginning of the
+ speech data of the active SPEAK method, the CONTROL request still
+ succeeds. The response to the CONTROL request contains the speak-
+ restart header field, and the active SPEAK request restarts from the
+ beginning of its speech data.
+
+
+
+
+
+
+
+
+ These two behaviors can be used to rewind or fast-forward across
+ multiple speech requests, if the client wants to break up a speech
+ markup text into multiple SPEAK requests.
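A non-normative sketch of the jump-handling rules, with positions measured in arbitrary units (the spec's Jump-Size can use words, sentences, etc.); the "restart" outcome corresponds to a response carrying the Speak-Restart header field:

```python
# Non-normative sketch: applying a CONTROL jump to the active SPEAK.
# Jumping past the end completes the request (SPEAK-COMPLETE follows);
# jumping to or before the beginning restarts it from the start.
def apply_jump(position, jump, length):
    target = position + jump
    if target >= length:
        return ("complete", None)    # play out; next queued SPEAK starts
    if target <= 0:
        return ("restart", 0)        # response carries Speak-Restart
    return ("seek", target)
```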
+
+ If a SPEAK request was active when the CONTROL method was received,
+ the server MUST return an Active-Request-Id-List header field
+ containing the request-id of the SPEAK request that was active.
+
+ C->S: MRCP/2.0 ... SPEAK 543258
+ Channel-Identifier:32AECB23433802@speechsynth
+ Voice-gender:neutral
+ Voice-age:25
+ Prosody-volume:medium
+ Content-Type:application/ssml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <speak version="1.0"
+ xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
+ http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
+ xml:lang="en-US">
+ <p>
+ <s>You have 4 new messages.</s>
+ <s>The first is from Stephanie Williams
+ and arrived at <break/>
+ <say-as interpret-as="vxml:time">0342p</say-as>.</s>
+
+ <s>The subject is <prosody
+ rate="-20%">ski trip</prosody></s>
+ </p>
+ </speak>
+
+ S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@speechsynth
+ Speech-Marker:timestamp=857205016059
+
+ C->S: MRCP/2.0 ... CONTROL 543259
+ Channel-Identifier:32AECB23433802@speechsynth
+ Prosody-rate:fast
+
+ S->C: MRCP/2.0 ... 543259 200 COMPLETE
+ Channel-Identifier:32AECB23433802@speechsynth
+ Active-Request-Id-List:543258
+ Speech-Marker:timestamp=857206027059
+
+
+
+
+
+
+
+ C->S: MRCP/2.0 ... CONTROL 543260
+ Channel-Identifier:32AECB23433802@speechsynth
+ Jump-Size:-15 Words
+
+ S->C: MRCP/2.0 ... 543260 200 COMPLETE
+ Channel-Identifier:32AECB23433802@speechsynth
+ Active-Request-Id-List:543258
+ Speech-Marker:timestamp=857206039059
+
+ CONTROL Example
+
+8.12. SPEAK-COMPLETE
+
+ This is an Event message from the synthesizer resource to the client
+ that indicates the corresponding SPEAK request was completed. The
+ request-id field matches the request-id of the SPEAK request that
+ initiated the speech that just completed. The request-state field is
+ set to COMPLETE by the server, indicating that this is the last event
+ with the corresponding request-id. The Completion-Cause header field
+ specifies the cause code pertaining to the status and reason of
+ request completion, such as the SPEAK completed normally or because
+ of an error, kill-on-barge-in, etc.
+
+ C->S: MRCP/2.0 ... SPEAK 543260
+ Channel-Identifier:32AECB23433802@speechsynth
+ Voice-gender:neutral
+ Voice-age:25
+ Prosody-volume:medium
+ Content-Type:application/ssml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <speak version="1.0"
+ xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
+ http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
+ xml:lang="en-US">
+ <p>
+ <s>You have 4 new messages.</s>
+ <s>The first is from Stephanie Williams
+ and arrived at <break/>
+ <say-as interpret-as="vxml:time">0342p</say-as>.</s>
+ <s>The subject is
+ <prosody rate="-20%">ski trip</prosody></s>
+ </p>
+ </speak>
+
+
+
+
+
+
+ S->C: MRCP/2.0 ... 543260 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@speechsynth
+ Speech-Marker:timestamp=857206027059
+
+ S->C: MRCP/2.0 ... SPEAK-COMPLETE 543260 COMPLETE
+ Channel-Identifier:32AECB23433802@speechsynth
+ Completion-Cause:000 normal
+ Speech-Marker:timestamp=857206039059
+
+ SPEAK-COMPLETE Example
+
+8.13. SPEECH-MARKER
+
+ This is an event generated by the synthesizer resource to the client
+ when the synthesizer encounters a marker tag in the speech markup it
+ is currently processing. The value of the request-id field MUST
+ match that of the corresponding SPEAK request. The request-state
+ field MUST have the value "IN-PROGRESS" as the speech is still not
+ complete. The value of the marker tag encountered, describing where
+ the synthesizer is in the speech markup, MUST be returned in the
+ Speech-Marker header field, along with an NTP timestamp indicating
+ the instant in the output speech stream that the marker was
+ encountered. The SPEECH-MARKER event MUST also be generated with a
+ null marker value and output NTP timestamp when a SPEAK request in
+ Pending-State (i.e., in the queue) changes state to IN-PROGRESS and
+ starts speaking. The NTP timestamp MUST be synchronized with the RTP
+ timestamp used to generate the speech stream through standard RTCP
+ machinery.
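For illustration, given the (NTP, RTP) timestamp pair reported in an RTCP Sender Report, a marker's NTP timestamp maps onto the RTP timeline as in this sketch; NTP values are treated as seconds for clarity, and 8000 Hz is assumed as a typical narrowband RTP clock rate (32-bit RTP timestamp wrap-around is ignored here):

```python
# Non-normative sketch: map an NTP timestamp to the RTP timeline using
# the NTP/RTP correspondence from an RTCP Sender Report.
def ntp_to_rtp(ntp, sr_ntp, sr_rtp, clock_rate=8000):
    # Offset from the SR's NTP instant, scaled to RTP clock units.
    return sr_rtp + round((ntp - sr_ntp) * clock_rate)
```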
+
+ C->S: MRCP/2.0 ... SPEAK 543261
+ Channel-Identifier:32AECB23433802@speechsynth
+ Voice-gender:neutral
+ Voice-age:25
+ Prosody-volume:medium
+ Content-Type:application/ssml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <speak version="1.0"
+ xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
+ http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
+ xml:lang="en-US">
+ <p>
+ <s>You have 4 new messages.</s>
+ <s>The first is from Stephanie Williams
+ and arrived at <break/>
+
+
+
+
+
+ <say-as interpret-as="vxml:time">0342p</say-as>.</s>
+ <mark name="here"/>
+ <s>The subject is
+ <prosody rate="-20%">ski trip</prosody>
+ </s>
+ <mark name="ANSWER"/>
+ </p>
+ </speak>
+
+ S->C: MRCP/2.0 ... 543261 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@speechsynth
+ Speech-Marker:timestamp=857205015059
+
+ S->C: MRCP/2.0 ... SPEECH-MARKER 543261 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@speechsynth
+ Speech-Marker:timestamp=857206027059;here
+
+ S->C: MRCP/2.0 ... SPEECH-MARKER 543261 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@speechsynth
+ Speech-Marker:timestamp=857206039059;ANSWER
+
+ S->C: MRCP/2.0 ... SPEAK-COMPLETE 543261 COMPLETE
+ Channel-Identifier:32AECB23433802@speechsynth
+ Completion-Cause:000 normal
+ Speech-Marker:timestamp=857207689259;ANSWER
+
+ SPEECH-MARKER Example
+
+8.14. DEFINE-LEXICON
+
+ The DEFINE-LEXICON method, from the client to the server, provides a
+ lexicon and tells the server to load or unload the lexicon (see
+ Section 8.4.16). The media type of the lexicon is provided in the
+ Content-Type header (see Section 8.5.2). One such media type is
+ "application/pls+xml" for the Pronunciation Lexicon Specification
+ (PLS) [W3C.REC-pronunciation-lexicon-20081014] [RFC4267].
+
+ If the server resource is in the speaking or paused state, the server
+ MUST respond with a failure status-code of 402 "Method not valid in
+ this state".
+
+ If the resource is in the idle state and is able to successfully
+ load/unload the lexicon, the status MUST return a 200 "Success"
+ status-code and the request-state MUST be COMPLETE.
+
+
+
+
+
+
+
+
+
+ If the synthesizer could not define the lexicon for some reason, for
+ example, because the download failed or the lexicon was in an
+ unsupported form, the server MUST respond with a failure status-code
+ of 407 and a Completion-Cause header field describing the failure
+ reason.
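A non-normative sketch of this state handling; the failure cause string is a placeholder, as the actual token comes from the synthesizer's Completion-Cause table:

```python
# Non-normative sketch: DEFINE-LEXICON outcome by synthesizer state.
def define_lexicon(state, load_ok):
    if state in ("speaking", "paused"):
        return (402, None)           # "Method not valid in this state"
    if load_ok:
        return (200, "COMPLETE")     # idle, lexicon loaded/unloaded
    # 407 plus a Completion-Cause describing the failure; the cause
    # token below is a placeholder, not a value defined by the spec.
    return (407, "lexicon failure (placeholder)")
```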
+
+9. Speech Recognizer Resource
+
+ The speech recognizer resource receives an incoming voice stream and
+ provides the client with an interpretation of what was spoken in
+ textual form.
+
+ The recognizer resource is controlled by MRCPv2 requests from the
+ client. The recognizer resource can both respond to these requests
+ and generate asynchronous events to the client to indicate conditions
+ of interest during the processing of the method.
+
+ This section applies to the following resource types.
+
+ 1. speechrecog
+
+ 2. dtmfrecog
+
+ The difference between the above two resources is in their level of
+ support for recognition grammars. The "dtmfrecog" resource type is
+ capable of recognizing only DTMF digits and hence accepts only DTMF
+ grammars. It only generates barge-in for DTMF inputs and ignores
+ speech. The "speechrecog" resource type can recognize regular speech
+ as well as DTMF digits and hence MUST support grammars describing
+ either speech or DTMF. This resource generates barge-in events for
+ speech and/or DTMF. By analyzing the grammars that are activated by
+ the RECOGNIZE method, it determines if a barge-in should occur for
+ speech and/or DTMF. When the recognizer decides it needs to generate
+ a barge-in, it also generates a START-OF-INPUT event to the client.
+ The recognizer resource MAY support recognition in the normal or
+ hotword modes or both (although note that a single "speechrecog"
+ resource does not perform normal and hotword mode recognition
+ simultaneously). For implementations where a single recognizer
+ resource does not support both modes, or simultaneous normal and
+ hotword recognition is desired, the two modes can be invoked through
+ separate resources allocated to the same SIP dialog (with different
+ MRCP session identifiers) and share the RTP audio feed.
+
+ The capabilities of the recognizer resource are enumerated below:
+
+ Normal Mode Recognition Normal mode recognition tries to match all
+ of the speech or DTMF against the grammar and returns a no-match
+ status if the input fails to match or the method times out.
+
+
+
+
+
+ Hotword Mode Recognition Hotword mode is where the recognizer looks
+ for a match against specific speech grammar or DTMF sequence and
+ ignores speech or DTMF that does not match. The recognition
+ completes only if there is a successful match of grammar, if the
+ client cancels the request, or if there is a non-input or
+ recognition timeout.
+
+ Voice Enrolled Grammars A recognizer resource MAY optionally support
+ Voice Enrolled Grammars. With this functionality, enrollment is
+ performed using a person's voice. For example, a list of contacts
+ can be created and maintained by recording the person's names
+ using the caller's voice. This technique is sometimes also called
+ speaker-dependent recognition.
+
+ Interpretation A recognizer resource MAY be employed strictly for
+ its natural language interpretation capabilities by supplying it
+ with a text string as input instead of speech. In this mode, the
+ resource takes text as input and produces an "interpretation" of
+ the input according to the supplied grammar.
+
+ Voice enrollment has the concept of an enrollment session. A session
+ to add a new phrase to a personal grammar involves the initial
+ enrollment followed by enough repeated utterances before
+ committing the new phrase to the personal grammar. Each time an
+ utterance is recorded, it is compared for similarity with the other
+ samples and a clash test is performed against other entries in the
+ personal grammar to ensure there are no similar and confusable
+ entries.
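A non-normative sketch of these checks, assuming a hypothetical similarity() function returning a score in [0, 1] and illustrative threshold values (the real thresholds are controlled by the Consistency-Threshold and Clash-Threshold header fields):

```python
# Non-normative sketch: per-utterance enrollment checks. A new sample
# must be consistent with earlier samples of the same phrase and must
# not clash with (be confusable with) existing personal-grammar entries.
def accept_utterance(utt, samples, grammar, similarity,
                     consistency=0.5, clash=0.9):
    if samples and min(similarity(utt, s) for s in samples) < consistency:
        return "inconsistent"   # too unlike earlier samples
    if any(similarity(utt, e) >= clash for e in grammar):
        return "clash"          # confusable with an existing entry
    return "ok"
```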
+
+ Enrollment is done using a recognizer resource. Controlling which
+ utterances are to be considered for enrollment of a new phrase is
+ done by setting a header field (see Section 9.4.39) in the Recognize
+ request.
+
+ Interpretation is accomplished through the INTERPRET method
+ (Section 9.20) and the Interpret-Text header field (Section 9.4.30).
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+9.1. Recognizer State Machine
+
+ The recognizer resource maintains a state machine to process MRCPv2
+ requests from the client.
+
+ Idle Recognizing Recognized
+ State State State
+ | | |
+ |---------RECOGNIZE---->|---RECOGNITION-COMPLETE-->|
+ |<------STOP------------|<-----RECOGNIZE-----------|
+ | | |
+ | |--------| |-----------|
+ | START-OF-INPUT | GET-RESULT |
+ | |------->| |---------->|
+ |------------| | |
+ | DEFINE-GRAMMAR |----------| |
+ |<-----------| | START-INPUT-TIMERS |
+ | |<---------| |
+ |------| | |
+ | INTERPRET | |
+ |<-----| |------| |
+ | | RECOGNIZE |
+ |-------| |<-----| |
+ | STOP |
+ |<------| |
+ |<-------------------STOP--------------------------|
+ |<-------------------DEFINE-GRAMMAR----------------|
+
+ Recognizer State Machine
+
+ If a recognizer resource supports voice enrolled grammars, starting
+ an enrollment session does not change the state of the recognizer
+ resource. Once an enrollment session is started, then utterances are
+ enrolled by calling the RECOGNIZE method repeatedly. The state of
+ the speech recognizer resource goes from the IDLE state to the
+ RECOGNIZING state each time RECOGNIZE is called.
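A non-normative sketch encoding a subset of the transitions in the diagram above; pairs not listed (such as START-OF-INPUT, GET-RESULT, or the self-loops on the Idle state) leave the state unchanged:

```python
# Non-normative sketch: a subset of the recognizer state machine,
# keyed by (current state, method or event name).
TRANSITIONS = {
    ("idle", "RECOGNIZE"): "recognizing",
    ("recognizing", "RECOGNITION-COMPLETE"): "recognized",
    ("recognizing", "STOP"): "idle",
    ("recognized", "RECOGNIZE"): "recognizing",
    ("recognized", "STOP"): "idle",
    ("recognized", "DEFINE-GRAMMAR"): "idle",
}

def step(state, event):
    # Unlisted (state, event) pairs are self-transitions.
    return TRANSITIONS.get((state, event), state)
```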
+
+9.2. Recognizer Methods
+
+ The recognizer supports the following methods.
+
+ recognizer-method = recog-only-method
+ / enrollment-method
+
+
+
+
+
+
+
+
+
+
+ recog-only-method = "DEFINE-GRAMMAR"
+ / "RECOGNIZE"
+ / "INTERPRET"
+ / "GET-RESULT"
+ / "START-INPUT-TIMERS"
+ / "STOP"
+
+ It is OPTIONAL for a recognizer resource to support voice enrolled
+ grammars. If the recognizer resource does support voice enrolled
+ grammars, it MUST support the following methods.
+
+ enrollment-method = "START-PHRASE-ENROLLMENT"
+ / "ENROLLMENT-ROLLBACK"
+ / "END-PHRASE-ENROLLMENT"
+ / "MODIFY-PHRASE"
+ / "DELETE-PHRASE"
+
+9.3. Recognizer Events
+
+ The recognizer can generate the following events.
+
+ recognizer-event = "START-OF-INPUT"
+ / "RECOGNITION-COMPLETE"
+ / "INTERPRETATION-COMPLETE"
+
+9.4. Recognizer Header Fields
+
+ A recognizer message can contain header fields containing request
+ options and information to augment the Method, Response, or Event
+ message it is associated with.
+
+ recognizer-header = recog-only-header
+ / enrollment-header
+
+ recog-only-header = confidence-threshold
+ / sensitivity-level
+ / speed-vs-accuracy
+ / n-best-list-length
+ / no-input-timeout
+ / input-type
+ / recognition-timeout
+ / waveform-uri
+ / input-waveform-uri
+ / completion-cause
+ / completion-reason
+ / recognizer-context-block
+ / start-input-timers
+ / speech-complete-timeout
+
+
+
+
+
+ / speech-incomplete-timeout
+ / dtmf-interdigit-timeout
+ / dtmf-term-timeout
+ / dtmf-term-char
+ / failed-uri
+ / failed-uri-cause
+ / save-waveform
+ / media-type
+ / new-audio-channel
+ / speech-language
+ / ver-buffer-utterance
+ / recognition-mode
+ / cancel-if-queue
+ / hotword-max-duration
+ / hotword-min-duration
+ / interpret-text
+ / dtmf-buffer-time
+ / clear-dtmf-buffer
+ / early-no-match
+
+ If a recognizer resource supports voice enrolled grammars, the
+ following header fields are also used.
+
+ enrollment-header = num-min-consistent-pronunciations
+ / consistency-threshold
+ / clash-threshold
+ / personal-grammar-uri
+ / enroll-utterance
+ / phrase-id
+ / phrase-nl
+ / weight
+ / save-best-waveform
+ / new-phrase-id
+ / confusable-phrases-uri
+ / abort-phrase-enrollment
+
+ For enrollment-specific header fields that can appear as part of
+ SET-PARAMS or GET-PARAMS methods, the following general rule applies:
+ the START-PHRASE-ENROLLMENT method MUST be invoked before these
+ header fields may be set through the SET-PARAMS method or retrieved
+ through the GET-PARAMS method.
+
+ Note that the Waveform-URI header field of the Recognizer resource
+ can also appear in the response to the END-PHRASE-ENROLLMENT method.
+
+
+
+
+
+
+
+Burnett & Shanmugham Standards Track [Page 76]
+
+RFC 6787 MRCPv2 November 2012
+
+
+9.4.1. Confidence-Threshold
+
+ When a recognizer resource recognizes or matches a spoken phrase with
+ some portion of the grammar, it associates a confidence level with
+ that match. The Confidence-Threshold header field tells the
+ recognizer resource what confidence level the client considers a
+ successful match. This is a float value between 0.0-1.0 indicating
+ the recognizer's confidence in the recognition. If the recognizer
+ determines that there is no candidate match with a confidence that is
+ greater than the confidence threshold, then it MUST return no-match
+ as the recognition result. This header field MAY occur in RECOGNIZE,
+ SET-PARAMS, or GET-PARAMS. The default value for this header field
+ is implementation specific, as is the interpretation of any specific
+ value for this header field. Although values for servers from
+ different vendors are not comparable, it is expected that clients
+ will tune this value over time for a given server.
+
+ confidence-threshold = "Confidence-Threshold" ":" FLOAT CRLF
+
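As a non-normative illustration of the threshold rule above, the sketch below filters a candidate list against a Confidence-Threshold; candidates at or below the threshold are discarded, and an empty result models a no-match. Function and variable names are illustrative and not part of this specification.

```python
# Illustrative only: apply a Confidence-Threshold to recognition candidates.
# If no candidate's confidence exceeds the threshold, the recognizer must
# return no-match (modeled here as None).

def filter_candidates(candidates, threshold):
    """candidates: list of (phrase, confidence) pairs; threshold in 0.0-1.0."""
    matches = [c for c in candidates if c[1] > threshold]
    return matches if matches else None   # None models a no-match result

print(filter_candidates([("open sesame", 0.42), ("close sesame", 0.87)], 0.5))
# -> [('close sesame', 0.87)]
```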
+9.4.2. Sensitivity-Level
+
+ To filter out background noise and not mistake it for speech, the
+ recognizer resource supports a variable level of sound sensitivity.
+ The Sensitivity-Level header field is a float value between 0.0 and
+ 1.0 and allows the client to set the sensitivity level for the
+ recognizer. This header field MAY occur in RECOGNIZE, SET-PARAMS, or
+ GET-PARAMS. A higher value for this header field means higher
+ sensitivity. The default value for this header field is
+ implementation specific, as is the interpretation of any specific
+ value for this header field. Although values for servers from
+ different vendors are not comparable, it is expected that clients
+ will tune this value over time for a given server.
+
+ sensitivity-level = "Sensitivity-Level" ":" FLOAT CRLF
+
+9.4.3. Speed-Vs-Accuracy
+
+ Depending on the implementation and capability of the recognizer
+ resource, it may be tunable towards performance or accuracy. Higher
+ accuracy may mean more processing and higher CPU utilization, meaning
+ fewer active sessions per server and vice versa. The value is a
+ float between 0.0 and 1.0. A value of 0.0 means fastest recognition.
+ A value of 1.0 means best accuracy. This header field MAY occur in
+ RECOGNIZE, SET-PARAMS, or GET-PARAMS. The default value for this
+ header field is implementation specific. Although values for servers
+ from different vendors are not comparable, it is expected that
+ clients will tune this value over time for a given server.
+
+ speed-vs-accuracy = "Speed-Vs-Accuracy" ":" FLOAT CRLF
+
+9.4.4. N-Best-List-Length
+
+ When the recognizer matches an incoming stream with the grammar, it
+ may come up with more than one alternative match because of
+ confidence levels in certain words or conversation paths. If this
+ header field is not specified, by default, the recognizer resource
+ returns only the best match above the confidence threshold. The
+ client, by setting this header field, can ask the recognition
+ resource to send it more than one alternative. All alternatives must
+ still be above the Confidence-Threshold. A value greater than one
+ does not guarantee that the recognizer will provide the requested
+ number of alternatives. This header field MAY occur in RECOGNIZE,
+ SET-PARAMS, or GET-PARAMS. The minimum value for this header field
+ is 1. The default value for this header field is 1.
+
+ n-best-list-length = "N-Best-List-Length" ":" 1*19DIGIT CRLF
+
+9.4.5. Input-Type
+
+ When the recognizer detects barge-in-able input and generates a
+ START-OF-INPUT event, that event MUST carry this header field to
+ specify whether the input that caused the barge-in was DTMF or
+ speech.
+
+ input-type = "Input-Type" ":" inputs CRLF
+ inputs = "speech" / "dtmf"
+
+9.4.6. No-Input-Timeout
+
+ When recognition is started and there is no speech detected for a
+ certain period of time, the recognizer can send a RECOGNITION-
+ COMPLETE event to the client with a Completion-Cause of "no-input-
+ timeout" and terminate the recognition operation. The client can use
+ the No-Input-Timeout header field to set this timeout. The value is
+ in milliseconds and can range from 0 to an implementation-specific
+ maximum value. This header field MAY occur in RECOGNIZE, SET-PARAMS,
+ or GET-PARAMS. The default value is implementation specific.
+
+ no-input-timeout = "No-Input-Timeout" ":" 1*19DIGIT CRLF
+
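A minimal, non-normative sketch of emitting a timeout header field such as the one above: the value is a 1*19DIGIT millisecond count terminated by CRLF. The range check and helper name are illustrative assumptions.

```python
# Illustrative only: serialize a millisecond timeout header per the ABNF
# "name" ":" 1*19DIGIT CRLF.  The 10**19 bound enforces the 19-digit limit.

def timeout_header(name, ms):
    if not 0 <= ms < 10**19:
        raise ValueError("value out of range for 1*19DIGIT")
    return "%s: %d\r\n" % (name, ms)

print(repr(timeout_header("No-Input-Timeout", 4000)))
# -> 'No-Input-Timeout: 4000\r\n'
```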
+
+9.4.7. Recognition-Timeout
+
+ When recognition is started and there is no match for a certain
+ period of time, the recognizer can send a RECOGNITION-COMPLETE event
+ to the client and terminate the recognition operation. The
+ Recognition-Timeout header field allows the client to set this
+ timeout value. The value is in milliseconds. The value for this
+ header field ranges from 0 to an implementation-specific maximum
+ value. The default value is 10 seconds. This header field MAY occur
+ in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
+
+ recognition-timeout = "Recognition-Timeout" ":" 1*19DIGIT CRLF
+
+9.4.8. Waveform-URI
+
+ If the Save-Waveform header field is set to "true", the recognizer
+ MUST record the incoming audio stream of the recognition into a
+ stored form and provide a URI for the client to access it. This
+ header field MUST be present in the RECOGNITION-COMPLETE event if the
+ Save-Waveform header field was set to "true". The value of the
+ header field MUST be empty if there was some error condition
+ preventing the server from recording. Otherwise, the URI generated
+ by the server MUST be unambiguous across the server and all its
+ recognition sessions. The content associated with the URI MUST be
+ available to the client until the MRCPv2 session terminates.
+
+ Similarly, if the Save-Best-Waveform header field is set to "true",
+ the recognizer MUST save the audio stream for the best repetition of
+ the phrase that was used during the enrollment session. The
+ recognizer MUST then record the recognized audio and make it
+ available to the client by returning a URI in the Waveform-URI header
+ field in the response to the END-PHRASE-ENROLLMENT method. The value
+ of the header field MUST be empty if there was some error condition
+ preventing the server from recording. Otherwise, the URI generated
+ by the server MUST be unambiguous across the server and all its
+ recognition sessions. The content associated with the URI MUST be
+ available to the client until the MRCPv2 session terminates. See the
+ discussion on the sensitivity of saved waveforms in Section 12.
+
+ The server MUST also return the size in octets and the duration in
+ milliseconds of the recorded audio waveform as parameters associated
+ with the header field.
+
+ waveform-uri = "Waveform-URI" ":" ["<" uri ">"
+ ";" "size" "=" 1*19DIGIT
+ ";" "duration" "=" 1*19DIGIT] CRLF
+
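The Waveform-URI value above carries the URI plus size (octets) and duration (milliseconds) parameters, with an empty value signalling a recording error. A non-normative parsing sketch follows; the regular expression is an illustrative approximation of the ABNF (it does not model optional whitespace), and the URI shown is an example, not a real resource.

```python
# Illustrative only: parse a Waveform-URI header field value of the form
# <uri>;size=NNN;duration=NNN, returning None for the empty (error) value.
import re

WAVEFORM_RE = re.compile(
    r'<(?P<uri>[^>]+)>;size=(?P<size>\d{1,19});duration=(?P<duration>\d{1,19})')

def parse_waveform_uri(value):
    if value == "":                        # empty value: recording failed
        return None
    m = WAVEFORM_RE.fullmatch(value)
    if not m:
        raise ValueError("malformed Waveform-URI value")
    return m.group("uri"), int(m.group("size")), int(m.group("duration"))

print(parse_waveform_uri(
    "<http://web.media.example.com/session123/audio.wav>;size=342456;duration=25435"))
```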
+
+9.4.9. Media-Type
+
+ This header field MAY be specified in the SET-PARAMS, GET-PARAMS, or
+ the RECOGNIZE methods and tells the server resource the media type in
+ which to store captured audio or video, such as the one captured and
+ returned by the Waveform-URI header field.
+
+ media-type = "Media-Type" ":" media-type-value
+ CRLF
+
+9.4.10. Input-Waveform-URI
+
+ This optional header field specifies a URI pointing to audio content
+ to be processed by the RECOGNIZE operation. This enables the client
+ to request recognition from a specified buffer or audio file.
+
+ input-waveform-uri = "Input-Waveform-URI" ":" uri CRLF
+
+9.4.11. Completion-Cause
+
+ This header field MUST be part of a RECOGNITION-COMPLETE event coming
+ from the recognizer resource to the client. It indicates the reason
+ behind the RECOGNIZE method completion. This header field MUST be
+ sent in the DEFINE-GRAMMAR and RECOGNIZE responses, if they return
+ with a failure status and a COMPLETE state. In the ABNF below, the
+ cause-code contains a numerical value selected from the Cause-Code
+ column of the following table. The cause-name contains the
+ corresponding token selected from the Cause-Name column.
+
+ completion-cause = "Completion-Cause" ":" cause-code SP
+ cause-name CRLF
+ cause-code = 3DIGIT
+ cause-name = *VCHAR
+
+
+ +------------+-----------------------+------------------------------+
+ | Cause-Code | Cause-Name | Description |
+ +------------+-----------------------+------------------------------+
+ | 000 | success | RECOGNIZE completed with a |
+ | | | match or DEFINE-GRAMMAR |
+ | | | succeeded in downloading and |
+ | | | compiling the grammar. |
+ | | | |
+ | 001 | no-match | RECOGNIZE completed, but no |
+ | | | match was found. |
+ | | | |
+ | 002 | no-input-timeout | RECOGNIZE completed without |
+ | | | a match due to a |
+ | | | no-input-timeout. |
+ | | | |
+ | 003 | hotword-maxtime | RECOGNIZE in hotword mode |
+ | | | completed without a match |
+ | | | due to a |
+ | | | recognition-timeout. |
+ | | | |
+ | 004 | grammar-load-failure | RECOGNIZE failed due to |
+ | | | grammar load failure. |
+ | | | |
+ | 005 | grammar-compilation- | RECOGNIZE failed due to |
+ | | failure | grammar compilation failure. |
+ | | | |
+ | 006 | recognizer-error | RECOGNIZE request terminated |
+ | | | prematurely due to a |
+ | | | recognizer error. |
+ | | | |
+ | 007 | speech-too-early | RECOGNIZE request terminated |
+ | | | because speech was too |
+ | | | early. This happens when the |
+ | | | audio stream is already |
+ | | | "in-speech" when the |
+ | | | RECOGNIZE request was |
+ | | | received. |
+ | | | |
+ | 008 | success-maxtime | RECOGNIZE request terminated |
+ | | | because speech was too long |
+ | | | but whatever was spoken till |
+ | | | that point was a full match. |
+ | | | |
+ | 009 | uri-failure | Failure accessing a URI. |
+ | | | |
+ | 010 | language-unsupported | Language not supported. |
+ | | | |
+ | 011 | cancelled | A new RECOGNIZE cancelled |
+ | | | this one, or a prior |
+ | | | RECOGNIZE failed while this |
+ | | | one was still in the queue. |
+ | | | |
+ | 012 | semantics-failure | Recognition succeeded, but |
+ | | | semantic interpretation of |
+ | | | the recognized input failed. |
+ | | | The RECOGNITION-COMPLETE |
+ | | | event MUST contain the |
+ | | | Recognition result with only |
+ | | | input text and no |
+ | | | interpretation. |
+ | | | |
+ | 013 | partial-match | Speech Incomplete Timeout |
+ | | | expired before there was a |
+ | | | full match. But whatever was |
+ | | | spoken till that point was a |
+ | | | partial match to one or more |
+ | | | grammars. |
+ | | | |
+ | 014 | partial-match-maxtime | The Recognition-Timeout |
+ | | | expired before full match |
+ | | | was achieved. But whatever |
+ | | | was spoken till that point |
+ | | | was a partial match to one |
+ | | | or more grammars. |
+ | | | |
+ | 015 | no-match-maxtime | The Recognition-Timeout |
+ | | | expired. Whatever was spoken |
+ | | | till that point did not |
+ | | | match any of the grammars. |
+ | | | This cause could also be |
+ | | | returned if the recognizer |
+ | | | does not support detecting |
+ | | | partial grammar matches. |
+ | | | |
+ | 016 | grammar-definition- | Any DEFINE-GRAMMAR error |
+ | | failure | other than |
+ | | | grammar-load-failure and |
+ | | | grammar-compilation-failure. |
+ +------------+-----------------------+------------------------------+
+
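The cause table above can be transcribed into a small non-normative lookup table for building Completion-Cause header field lines; the Python names below are illustrative.

```python
# Illustrative only: Completion-Cause codes and tokens transcribed from the
# table above, plus a helper to render the header field line.
RECOGNIZER_COMPLETION_CAUSES = {
    0: "success", 1: "no-match", 2: "no-input-timeout",
    3: "hotword-maxtime", 4: "grammar-load-failure",
    5: "grammar-compilation-failure", 6: "recognizer-error",
    7: "speech-too-early", 8: "success-maxtime", 9: "uri-failure",
    10: "language-unsupported", 11: "cancelled", 12: "semantics-failure",
    13: "partial-match", 14: "partial-match-maxtime",
    15: "no-match-maxtime", 16: "grammar-definition-failure",
}

def completion_cause_header(code):
    """Render a Completion-Cause line (trailing CRLF omitted for display)."""
    return "Completion-Cause: %03d %s" % (code, RECOGNIZER_COMPLETION_CAUSES[code])

print(completion_cause_header(1))
# -> Completion-Cause: 001 no-match
```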
+
+9.4.12. Completion-Reason
+
+ This header field MAY be specified in a RECOGNITION-COMPLETE event
+ coming from the recognizer resource to the client. This contains the
+ reason text behind the RECOGNIZE request completion. The server uses
+ this header field to communicate text describing the reason for the
+ failure, such as the specific error encountered in parsing a grammar
+ markup.
+
+ The completion reason text is provided for client use in logs and for
+ debugging and instrumentation purposes. Clients MUST NOT interpret
+ the completion reason text.
+
+ completion-reason = "Completion-Reason" ":"
+ quoted-string CRLF
+
+9.4.13. Recognizer-Context-Block
+
+ This header field MAY be sent as part of the SET-PARAMS or GET-PARAMS
+ request. If the GET-PARAMS method contains this header field with no
+ value, then it is a request to the recognizer to return the
+ recognizer context block. The response to such a message MAY contain
+ a recognizer context block as a typed media message body. If the
+ server returns a recognizer context block, the response MUST contain
+ this header field and its value MUST match the Content-ID of the
+ corresponding media block.
+
+ If the SET-PARAMS method contains this header field, it MUST also
+ contain a message body containing the recognizer context data and a
+ Content-ID matching this header field value. This Content-ID MUST
+ match the Content-ID that came with the context data during the
+ GET-PARAMS operation.
+
+ An implementation choosing to use this mechanism to hand off
+ recognizer context data between servers MUST distinguish its
+ implementation-specific block of data by using an IANA-registered
+ content type in the IANA Media Type vendor tree.
+
+ recognizer-context-block = "Recognizer-Context-Block" ":"
+ [1*VCHAR] CRLF
+
+9.4.14. Start-Input-Timers
+
+ This header field MAY be sent as part of the RECOGNIZE request. A
+ value of false tells the recognizer to start recognition but not to
+ start the no-input timer yet. The recognizer MUST NOT start the
+ timers until the client sends a START-INPUT-TIMERS request to the
+ recognizer. This is useful in the scenario when the recognizer and
+ synthesizer engines are not part of the same session. In such
+ configurations, when a kill-on-barge-in prompt is being played (see
+ Section 8.4.2), the client wants the RECOGNIZE request to be
+ simultaneously active so that it can detect and implement kill-on-
+ barge-in. However, the recognizer SHOULD NOT start the no-input
+ timers until the prompt is finished. The default value is "true".
+
+ start-input-timers = "Start-Input-Timers" ":" BOOLEAN CRLF
+
+9.4.15. Speech-Complete-Timeout
+
+ This header field specifies the length of silence required following
+ user speech before the speech recognizer finalizes a result (either
+ accepting it or generating a no-match result). The Speech-Complete-
+ Timeout value applies when the recognizer currently has a complete
+ match against an active grammar, and specifies how long the
+ recognizer MUST wait for more input before declaring a match. By
+ contrast, the Speech-Incomplete-Timeout is used when the speech is an
+ incomplete match to an active grammar. The value is in milliseconds.
+
+ speech-complete-timeout = "Speech-Complete-Timeout" ":" 1*19DIGIT CRLF
+
+ A long Speech-Complete-Timeout value delays the result to the client
+ and therefore makes the application's response to a user slow. A
+ short Speech-Complete-Timeout may lead to an utterance being broken
+ up inappropriately. Reasonable speech complete timeout values are
+ typically in the range of 0.3 seconds to 1.0 seconds. The value for
+ this header field ranges from 0 to an implementation-specific maximum
+ value. The default value for this header field is implementation
+ specific. This header field MAY occur in RECOGNIZE, SET-PARAMS, or
+ GET-PARAMS.
+
+9.4.16. Speech-Incomplete-Timeout
+
+ This header field specifies the required length of silence following
+ user speech after which a recognizer finalizes a result. The
+ incomplete timeout applies when the speech prior to the silence is an
+ incomplete match of all active grammars. In this case, once the
+ timeout is triggered, the partial result is rejected (with a
+ Completion-Cause of "partial-match"). The value is in milliseconds.
+ The value for this header field ranges from 0 to an implementation-
+ specific maximum value. The default value for this header field is
+ implementation specific.
+
+ speech-incomplete-timeout = "Speech-Incomplete-Timeout" ":" 1*19DIGIT
+ CRLF
+
+
+ The Speech-Incomplete-Timeout also applies when the speech prior to
+ the silence is a complete match of an active grammar, but where it is
+ possible to speak further and still match the grammar. By contrast,
+ the Speech-Complete-Timeout is used when the speech is a complete
+ match to an active grammar and no further spoken words can continue
+ to represent a match.
+
+ A long Speech-Incomplete-Timeout value delays the result to the
+ client and therefore makes the application's response to a user slow.
+ A short Speech-Incomplete-Timeout may lead to an utterance being
+ broken up inappropriately.
+
+ The Speech-Incomplete-Timeout is usually longer than the Speech-
+ Complete-Timeout to allow users to pause mid-utterance (for example,
+ to breathe). This header field MAY occur in RECOGNIZE, SET-PARAMS,
+ or GET-PARAMS.
+
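The timer-selection rule in Sections 9.4.15 and 9.4.16 can be sketched non-normatively as follows: which silence timeout applies depends on whether the speech so far is a complete match that cannot be extended, a complete match that can still grow, or an incomplete match. The state names and function are illustrative assumptions.

```python
# Illustrative only: choose between Speech-Complete-Timeout and
# Speech-Incomplete-Timeout based on the current match state.

def silence_timeout(match_state, complete_ms, incomplete_ms):
    """match_state: 'complete-final', 'complete-extendable', or 'incomplete'."""
    if match_state == "complete-final":
        return complete_ms        # Speech-Complete-Timeout applies
    return incomplete_ms          # Speech-Incomplete-Timeout applies

print(silence_timeout("complete-extendable", 500, 1200))
# -> 1200
```

The Speech-Incomplete-Timeout is usually the longer of the two, which matches the guidance above about letting users pause mid-utterance.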
+9.4.17. DTMF-Interdigit-Timeout
+
+ This header field specifies the inter-digit timeout value to use when
+ recognizing DTMF input. The value is in milliseconds. The value for
+ this header field ranges from 0 to an implementation-specific maximum
+ value. The default value is 5 seconds. This header field MAY occur
+ in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
+
+ dtmf-interdigit-timeout = "DTMF-Interdigit-Timeout" ":" 1*19DIGIT CRLF
+
+9.4.18. DTMF-Term-Timeout
+
+ This header field specifies the terminating timeout to use when
+ recognizing DTMF input. The DTMF-Term-Timeout applies only when no
+ additional input is allowed by the grammar; otherwise, the
+ DTMF-Interdigit-Timeout applies. The value is in milliseconds. The
+ value for this header field ranges from 0 to an implementation-
+ specific maximum value. The default value is 10 seconds. This
+ header field MAY occur in RECOGNIZE, SET-PARAMS, or GET-PARAMS.
+
+ dtmf-term-timeout = "DTMF-Term-Timeout" ":" 1*19DIGIT CRLF
+
+9.4.19. DTMF-Term-Char
+
+ This header field specifies the terminating DTMF character for DTMF
+ input recognition. The default value is NULL, which is indicated by
+ an empty header field value. This header field MAY occur in
+ RECOGNIZE, SET-PARAMS, or GET-PARAMS.
+
+ dtmf-term-char = "DTMF-Term-Char" ":" VCHAR CRLF
+
+
+9.4.20. Failed-URI
+
+ When a recognizer needs to fetch or access a URI and the access
+ fails, the server SHOULD provide the failed URI in this header field
+ in the method response, unless there are multiple URI failures, in
+ which case one of the failed URIs MUST be provided in this header
+ field in the method response.
+
+ failed-uri = "Failed-URI" ":" absoluteURI CRLF
+
+9.4.21. Failed-URI-Cause
+
+ When a recognizer method needs a recognizer to fetch or access a URI
+ and the access fails, the server MUST provide the URI-specific or
+ protocol-specific response code for the URI in the Failed-URI header
+ field through this header field in the method response. The value
+ encoding is UTF-8 (RFC 3629 [RFC3629]) to accommodate any access
+ protocol, some of which might have a response string instead of a
+ numeric response code.
+
+ failed-uri-cause = "Failed-URI-Cause" ":" 1*UTFCHAR CRLF
+
+9.4.22. Save-Waveform
+
+ This header field allows the client to request the recognizer
+ resource to save the audio input to the recognizer. The recognizer
+ resource MUST then attempt to record the recognized audio, without
+ endpointing, and make it available to the client in the form of a URI
+ returned in the Waveform-URI header field in the RECOGNITION-COMPLETE
+ event. If there was an error in recording the stream or the audio
+ content is otherwise not available, the recognizer MUST return an
+ empty Waveform-URI header field. The default value for this field is
+ "false". This header field MAY occur in RECOGNIZE, SET-PARAMS, or
+ GET-PARAMS. See the discussion on the sensitivity of saved waveforms
+ in Section 12.
+
+ save-waveform = "Save-Waveform" ":" BOOLEAN CRLF
+
+9.4.23. New-Audio-Channel
+
+ This header field MAY be specified in a RECOGNIZE request and allows
+ the client to tell the server that, from this point on, further input
+ audio comes from a different audio source, channel, or speaker. If
+ the recognizer resource had collected any input statistics or
+ adaptation state, the recognizer resource MUST do what is appropriate
+ for the specific recognition technology, which includes but is not
+ limited to discarding any collected input statistics or adaptation
+ state before starting the RECOGNIZE request. Note that if there are
+ multiple resources that are sharing a media stream and are collecting
+ or using this data, and the client issues this header field to one of
+ the resources, the reset operation applies to all resources that use
+ the shared media stream. This helps in a number of use cases,
+ including where the client wishes to reuse an open recognition
+ session with an existing media session for multiple telephone calls.
+
+ new-audio-channel = "New-Audio-Channel" ":" BOOLEAN
+ CRLF
+
+9.4.24. Speech-Language
+
+ This header field specifies the language of recognition grammar data
+ within a session or request, if it is not specified within the data.
+ The value of this header field MUST follow RFC 5646 [RFC5646] for its
+ values. This MAY occur in DEFINE-GRAMMAR, RECOGNIZE, SET-PARAMS, or
+ GET-PARAMS requests.
+
+ speech-language = "Speech-Language" ":" 1*VCHAR CRLF
+
+9.4.25. Ver-Buffer-Utterance
+
+ This header field lets the client request the server to buffer the
+ utterance associated with this recognition request into a buffer
+ available to a co-resident verifier resource. The buffer is shared
+ across resources within a session and is allocated when a verifier
+ resource is added to this session. The client MUST NOT send this
+ header field unless a verifier resource is instantiated for the
+ session. The buffer is released when the verifier resource is
+ released from the session.
+
+9.4.26. Recognition-Mode
+
+ This header field specifies what mode the RECOGNIZE method will
+ operate in. The value choices are "normal" or "hotword". If the
+ value is "normal", the RECOGNIZE starts matching speech and DTMF to
+ the grammars specified in the RECOGNIZE request. If any portion of
+ the speech does not match the grammar, the RECOGNIZE command
+ completes with a no-match status. Timers may be active to detect
+ speech in the audio (see Section 9.4.14), so the RECOGNIZE method may
+ complete because of a timeout waiting for speech. If the value of
+ this header field is "hotword", the RECOGNIZE method operates in
+ hotword mode, where it only looks for the particular keywords or DTMF
+ sequences specified in the grammar and ignores silence or other
+ speech in the audio stream. The default value for this header field
+ is "normal". This header field MAY occur on the RECOGNIZE method.
+
+ recognition-mode = "Recognition-Mode" ":"
+ "normal" / "hotword" CRLF
+
+9.4.27. Cancel-If-Queue
+
+ This header field specifies what will happen if the client attempts
+ to invoke another RECOGNIZE method when this RECOGNIZE request is
+ already in progress for the resource. The value for this header
+ field is a Boolean. A value of "true" means the server MUST
+ terminate this RECOGNIZE request, with a Completion-Cause of
+ "cancelled", if the client issues another RECOGNIZE request for the
+ same resource. A value of "false" for this header field indicates to
+ the server that this RECOGNIZE request will continue to completion,
+ and if the client issues more RECOGNIZE requests to the same
+ resource, they are queued. When the currently active RECOGNIZE
+ request is stopped or completes with a successful match, the first
+ RECOGNIZE method in the queue becomes active. If the current
+ RECOGNIZE fails, all RECOGNIZE methods in the pending queue are
+ cancelled, and each generates a RECOGNITION-COMPLETE event with a
+ Completion-Cause of "cancelled". This header field MUST be present
+ in every RECOGNIZE request. There is no default value.
+
+ cancel-if-queue = "Cancel-If-Queue" ":" BOOLEAN CRLF
+
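The Cancel-If-Queue behavior above can be modeled non-normatively: the flag travels with the request that is (or becomes) active, and decides whether a later RECOGNIZE cancels it or queues behind it. This is an illustrative model of the rule, not an implementation of a real server.

```python
# Illustrative only: model of the Cancel-If-Queue rule.  A new RECOGNIZE
# cancels the active request if that request carried Cancel-If-Queue: true;
# otherwise the new request is queued behind it.

class RecognizerQueue:
    def __init__(self):
        self.active = None        # (request_id, cancel_if_queue) or None
        self.pending = []

    def recognize(self, request_id, cancel_if_queue):
        """Returns the list of request IDs cancelled by this call."""
        if self.active is None:
            self.active = (request_id, cancel_if_queue)
            return []
        active_id, active_flag = self.active
        if active_flag:                             # active asked to be cancelled
            self.active = (request_id, cancel_if_queue)
            return [active_id]
        self.pending.append((request_id, cancel_if_queue))
        return []                                   # queued, nothing cancelled

q = RecognizerQueue()
q.recognize("r1", True)
print(q.recognize("r2", False))
# -> ['r1']
```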
+9.4.28. Hotword-Max-Duration
+
+ This header field MAY be sent in a hotword mode RECOGNIZE request.
+ It specifies the maximum length of an utterance (in milliseconds)
+ that will be considered for hotword recognition. This header field,
+ along with Hotword-Min-Duration, can be used to tune performance by
+ preventing the recognizer from evaluating utterances that are too
+ short or too long to be one of the hotwords in the grammar(s). The
+ value is in milliseconds. The default is implementation dependent.
+ If present in a RECOGNIZE request specifying a mode other than
+ "hotword", the header field is ignored.
+
+ hotword-max-duration = "Hotword-Max-Duration" ":" 1*19DIGIT
+ CRLF
+
+9.4.29. Hotword-Min-Duration
+
+ This header field MAY be sent in a hotword mode RECOGNIZE request.
+ It specifies the minimum length of an utterance (in milliseconds)
+ that will be considered for hotword recognition. This header field, along
+ with Hotword-Max-Duration, can be used to tune performance by
+ preventing the recognizer from evaluating utterances that are too
+ short or too long to be one of the hotwords in the grammar(s). The
+ value is in milliseconds. The default value is implementation
+ dependent. If present in a RECOGNIZE request specifying a mode other
+ than "hotword", the header field is ignored.
+
+ hotword-min-duration = "Hotword-Min-Duration" ":" 1*19DIGIT CRLF
+
+9.4.30. Interpret-Text
+
+ The value of this header field is used to provide a pointer to the
+ text for which a natural language interpretation is desired. The
+ value is either a URI or text. If the value is a URI, it MUST be a
+ Content-ID that refers to an entity of type 'text/plain' in the body
+ of the message. Otherwise, the server MUST treat the value as the
+ text to be interpreted. This header field MUST be used when invoking
+ the INTERPRET method.
+
+ interpret-text = "Interpret-Text" ":" 1*VCHAR CRLF
+
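A non-normative sketch of resolving an Interpret-Text value: a value matching the Content-ID of a 'text/plain' body part refers to that part; any other value is treated as the literal text to interpret. The Content-ID syntax and sample text below are illustrative assumptions, not taken from this specification.

```python
# Illustrative only: resolve an Interpret-Text value against the message's
# text/plain body parts, falling back to the literal value.

def resolve_interpret_text(value, body_parts):
    """body_parts: dict mapping Content-ID -> text/plain content."""
    if value in body_parts:
        return body_parts[value]       # value was a Content-ID reference
    return value                       # value is the text itself

parts = {"<text1@server.example.com>": "call my office"}
print(resolve_interpret_text("<text1@server.example.com>", parts))
# -> call my office
```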
+9.4.31. DTMF-Buffer-Time
+
+ This header field MAY be specified in a GET-PARAMS or SET-PARAMS
+ method and is used to specify the amount of time, in milliseconds, of
+ the type-ahead buffer for the recognizer. This is the buffer that
+ collects DTMF digits as they are pressed even when there is no
+ RECOGNIZE command active. When a subsequent RECOGNIZE method is
+ received, it MUST look to this buffer to match the RECOGNIZE request.
+ If the digits in the buffer are not sufficient, then it can continue
+ to listen to more digits to match the grammar. The default size of
+ this DTMF buffer is platform specific.
+
+ dtmf-buffer-time = "DTMF-Buffer-Time" ":" 1*19DIGIT CRLF
+
+9.4.32. Clear-DTMF-Buffer
+
+ This header field MAY be specified in a RECOGNIZE method and is used
+ to tell the recognizer to clear the DTMF type-ahead buffer before
+ starting the RECOGNIZE. The default value of this header field is
+ "false", which does not clear the type-ahead buffer before starting
+ the RECOGNIZE method. If this header field is specified to be
+ "true", then the RECOGNIZE will clear the DTMF buffer before starting
+ recognition. This means digits pressed by the caller before the
+ RECOGNIZE command was issued are discarded.
+
+ clear-dtmf-buffer = "Clear-DTMF-Buffer" ":" BOOLEAN CRLF
+
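The DTMF type-ahead behavior of Sections 9.4.31 and 9.4.32 can be sketched non-normatively: digits collected while no RECOGNIZE is active are consulted first when one arrives, unless Clear-DTMF-Buffer is "true". Class and method names are illustrative.

```python
# Illustrative only: a DTMF type-ahead buffer.  Digits accumulate even with
# no RECOGNIZE active; a new RECOGNIZE consults the buffer first, optionally
# clearing it when Clear-DTMF-Buffer: true is specified.
from collections import deque

class DtmfBuffer:
    def __init__(self):
        self.digits = deque()

    def press(self, digit):
        self.digits.append(digit)      # buffered even with no RECOGNIZE active

    def start_recognize(self, clear_dtmf_buffer=False):
        if clear_dtmf_buffer:          # discard digits pressed before RECOGNIZE
            self.digits.clear()
        return "".join(self.digits)    # buffered digits are matched first

buf = DtmfBuffer()
for d in "12":
    buf.press(d)
print(buf.start_recognize())
# -> 12
```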
+
+9.4.33. Early-No-Match
+
+ This header field MAY be specified in a RECOGNIZE method and is used
+ to tell the recognizer that it MUST NOT wait for the end of speech
+ before processing the collected speech to match active grammars. A
+ value of "true" indicates the recognizer MUST do early matching. The
+ default value for this header field if not specified is "false". If
+ the recognizer does not support the processing of the collected audio
+ before the end of speech, this header field can be safely ignored.
+
+ early-no-match = "Early-No-Match" ":" BOOLEAN CRLF
+
+9.4.34. Num-Min-Consistent-Pronunciations
+
+ This header field MAY be specified in a START-PHRASE-ENROLLMENT,
+ SET-PARAMS, or GET-PARAMS method and is used to specify the minimum
+ number of consistent pronunciations that must be obtained to voice
+ enroll a new phrase. The minimum value is 1. The default value is
+ implementation specific and MAY be greater than 1.
+
+ num-min-consistent-pronunciations =
+ "Num-Min-Consistent-Pronunciations" ":" 1*19DIGIT CRLF
+
+9.4.35. Consistency-Threshold
+
+ This header field MAY be sent as part of the START-PHRASE-ENROLLMENT,
+ SET-PARAMS, or GET-PARAMS method. Used during voice enrollment, this
+ header field specifies how similar to a previously enrolled
+ pronunciation of the same phrase an utterance needs to be in order to
+ be considered "consistent". The higher the threshold, the closer the
+ match between an utterance and previous pronunciations must be for
+ the pronunciation to be considered consistent. The range for this
+ threshold is a float value between 0.0 and 1.0. The default value
+ for this header field is implementation specific.
+
+ consistency-threshold = "Consistency-Threshold" ":" FLOAT CRLF
+
+9.4.36. Clash-Threshold
+
+ This header field MAY be sent as part of the START-PHRASE-ENROLLMENT,
+ SET-PARAMS, or GET-PARAMS method. Used during voice enrollment, this
+ header field specifies how similar the pronunciations of two
+ different phrases can be before they are considered to be clashing.
+ For example, pronunciations of phrases such as "John Smith" and "Jon
+ Smits" may be so similar that they are difficult to distinguish
+ correctly. A smaller threshold reduces the number of clashes
+ detected. The range for this threshold is a float value between 0.0
+ and 1.0. The default value for this header field is implementation
+ specific. Clash testing can be turned off completely by setting the
+ Clash-Threshold header field value to 0.
+
+ clash-threshold = "Clash-Threshold" ":" FLOAT CRLF
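Both Consistency-Threshold and Clash-Threshold are FLOAT header fields constrained to the range 0.0 to 1.0. The following non-normative Python sketch shows a client-side helper that range-checks such a threshold before placing it on the wire; the helper name is hypothetical and not part of this specification.

```python
# Hypothetical client-side helper (not part of RFC 6787): range-check a
# FLOAT-valued enrollment threshold and format it as a header line.

def threshold_header(name, value):
    if name not in ("Consistency-Threshold", "Clash-Threshold"):
        raise ValueError("not a threshold header field: " + name)
    if not 0.0 <= value <= 1.0:
        raise ValueError("%s must be within 0.0-1.0, got %r" % (name, value))
    # A Clash-Threshold value of 0 turns clash testing off entirely.
    return "%s:%s" % (name, value)

print(threshold_header("Clash-Threshold", 0.0))
print(threshold_header("Consistency-Threshold", 0.75))
```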
+
+9.4.37. Personal-Grammar-URI
+
+ This header field specifies the speaker-trained grammar to be used or
+ referenced during enrollment operations. Phrases are added to this
+ grammar during enrollment. For example, a contact list for user
+ "Jeff" could be stored at the Personal-Grammar-URI
+ "http://myserver.example.com/myenrollmentdb/jeff-list". The
+ generated grammar syntax MAY be implementation specific. There is no
+ default value for this header field. This header field MAY be sent
+ as part of the START-PHRASE-ENROLLMENT, SET-PARAMS, or GET-PARAMS
+ method.
+
+ personal-grammar-uri = "Personal-Grammar-URI" ":" uri CRLF
+
+9.4.38. Enroll-Utterance
+
+ This header field MAY be specified in the RECOGNIZE method. If this
+ header field is set to "true" and an enrollment session is active,
+ the RECOGNIZE command MUST add the collected utterance to the personal
+ grammar that is being enrolled. The way in which this occurs is
+ engine specific and may be an area of future standardization. The
+ default value for this header field is "false".
+
+ enroll-utterance = "Enroll-Utterance" ":" BOOLEAN CRLF
+
+9.4.39. Phrase-Id
+
+ This header field in a request identifies a phrase in an existing
+ personal grammar for which enrollment is desired. It is also
+ returned to the client in the RECOGNIZE complete event. This header
+ field MAY occur in START-PHRASE-ENROLLMENT, MODIFY-PHRASE, or DELETE-
+ PHRASE requests. There is no default value for this header field.
+
+ phrase-id = "Phrase-ID" ":" 1*VCHAR CRLF
+
+
+
+
+
+
+
+
+
+
+
+
+
+9.4.40. Phrase-NL
+
+ This string specifies the interpreted text to be returned when the
+ phrase is recognized. This header field MAY occur in START-PHRASE-
+ ENROLLMENT and MODIFY-PHRASE requests. There is no default value for
+ this header field.
+
+ phrase-nl = "Phrase-NL" ":" 1*UTFCHAR CRLF
+
+9.4.41. Weight
+
+ The value of this header field represents the occurrence likelihood
+ of a phrase in an enrolled grammar. When using grammar enrollment,
+ the system is essentially constructing a grammar segment consisting
+ of a list of possible match phrases. This can be thought of as
+ similar to the dynamic construction of a <one-of> tag in the W3C
+ grammar specification. Each enrolled phrase becomes an item in the
+ list that can be matched against spoken input, similar to an <item>
+ within a <one-of> list. This header field allows the client to
+ assign a weight to the phrase (i.e., the <item> entry) in the
+ <one-of> list that is enrolled. Grammar weights are normalized to a
+ sum of one at
+ grammar compilation time, so a weight value of 1 for each phrase in
+ an enrolled grammar list indicates all items in that list have the
+ same weight. This header field MAY occur in START-PHRASE-ENROLLMENT
+ and MODIFY-PHRASE requests. The default value for this header field
+ is implementation specific.
+
+ weight = "Weight" ":" FLOAT CRLF
+
+9.4.42. Save-Best-Waveform
+
+ This header field allows the client to request the recognizer
+ resource to save the audio stream for the best repetition of the
+ phrase that was used during the enrollment session. The recognizer
+ MUST attempt to record the recognized audio and make it available to
+ the client in the form of a URI returned in the Waveform-URI header
+ field in the response to the END-PHRASE-ENROLLMENT method. If there
+ was an error in recording the stream or the audio data is otherwise
+ not available, the recognizer MUST return an empty Waveform-URI
+ header field. This header field MAY occur in the START-PHRASE-
+ ENROLLMENT, SET-PARAMS, and GET-PARAMS methods.
+
+ save-best-waveform = "Save-Best-Waveform" ":" BOOLEAN CRLF
+
+
+
+
+
+
+
+
+
+
+9.4.43. New-Phrase-Id
+
+ This header field replaces the ID used to identify the phrase in a
+ personal grammar. The recognizer returns the new ID when using an
+ enrollment grammar. This header field MAY occur in MODIFY-PHRASE
+ requests.
+
+ new-phrase-id = "New-Phrase-ID" ":" 1*VCHAR CRLF
+
+9.4.44. Confusable-Phrases-URI
+
+ This header field specifies a grammar that defines invalid phrases
+ for enrollment. For example, typical applications do not allow an
+ enrolled phrase that is also a command word. This header field MAY
+ occur in RECOGNIZE requests that are part of an enrollment session.
+
+ confusable-phrases-uri = "Confusable-Phrases-URI" ":" uri CRLF
+
+9.4.45. Abort-Phrase-Enrollment
+
+ This header field MAY be specified in the END-PHRASE-ENROLLMENT
+ method to abort the phrase enrollment, rather than committing the
+ phrase to the personal grammar.
+
+ abort-phrase-enrollment = "Abort-Phrase-Enrollment" ":"
+ BOOLEAN CRLF
+
+9.5. Recognizer Message Body
+
+ A recognizer message can carry additional data associated with the
+ request, response, or event. The client MAY provide the grammar to
+ be recognized in DEFINE-GRAMMAR or RECOGNIZE requests. When one or
+ more grammars are specified using the DEFINE-GRAMMAR method, the
+ server MUST attempt to fetch, compile, and optimize the grammar
+ before returning a response to the DEFINE-GRAMMAR method. A
+ RECOGNIZE request MUST completely specify the grammars to be active
+ during the recognition operation, except when the RECOGNIZE method is
+ being used to enroll a grammar. During grammar enrollment, such
+ grammars are OPTIONAL. The server resource sends the recognition
+ results in the RECOGNITION-COMPLETE event and the GET-RESULT
+ response. Grammars and recognition results are carried in the
+ message body of the corresponding MRCPv2 messages.
+
+9.5.1. Recognizer Grammar Data
+
+ Recognizer grammar data from the client to the server can be provided
+ inline or by reference. Either way, grammar data is carried as typed
+ media entities in the message body of the RECOGNIZE or DEFINE-GRAMMAR
+
+
+
+
+
+ request. All MRCPv2 servers MUST accept grammars in the XML form
+ (media type 'application/srgs+xml') of the W3C's XML-based Speech
+ Grammar Markup Format (SRGS) [W3C.REC-speech-grammar-20040316] and
+ MAY accept grammars in other formats. Examples include but are not
+ limited to:
+
+ o the ABNF form (media type 'application/srgs') of SRGS
+
+ o Sun's Java Speech Grammar Format (JSGF)
+ [refs.javaSpeechGrammarFormat]
+
+ Additionally, MRCPv2 servers MAY support the Semantic Interpretation
+ for Speech Recognition (SISR)
+ [W3C.REC-semantic-interpretation-20070405] specification.
+
+ When a grammar is specified inline in the request, the client MUST
+ provide a Content-ID for that grammar as part of the content header
+ fields. If there is no space on the server to store the inline
+ grammar, the server MUST return a Completion-Cause code of 016
+ "grammar-definition-failure". Otherwise, the server MUST associate
+ the inline grammar block with that Content-ID and MUST store it on
+ the server for the duration of the session. However, if the
+ Content-ID is redefined later in the session through a subsequent
+ DEFINE-GRAMMAR, the inline grammar previously associated with the
+ Content-ID MUST be freed. If the Content-ID is redefined through a
+ subsequent DEFINE-GRAMMAR with an empty message body (i.e., no
+ grammar definition), then in addition to freeing any grammar
+ previously associated with the Content-ID, the server MUST clear all
+ bindings and associations to the Content-ID. Unless and until
+ subsequently redefined, this URI MUST be interpreted by the server as
+ one that has never been set.
+
+ Grammars that have been associated with a Content-ID can be
+ referenced through the 'session' URI scheme (see Section 13.6). For
+ example:
+ session:help@root-level.store
+
+ Grammar data MAY be specified using external URI references. To do
+ so, the client uses a body of media type 'text/uri-list' (see RFC
+ 2483 [RFC2483]) to list the one or more URIs that point to the
+ grammar data. The client can use a body of media type 'text/
+ grammar-ref-list' (see Section 13.5.1) if it wants to assign weights
+ to the list of grammar URIs. All MRCPv2 servers MUST support grammar
+ access using the 'http' and 'https' URI schemes.
+
+ If the grammar data the client wishes to be used on a request
+ consists of a mix of URI and inline grammar data, the client uses the
+ 'multipart/mixed' media type to enclose the 'text/uri-list',
+
+
+
+
+
+ 'application/srgs', or 'application/srgs+xml' content entities. The
+ character set and encoding used in the grammar data are specified
+ according to the standard media type definitions.
+
+ When more than one grammar URI or inline grammar block is specified
+ in a message body of the RECOGNIZE request, the server interprets
+ this as a list of grammar alternatives to match against.
+
+ Content-Type:application/srgs+xml
+ Content-ID:<request1@form-level.store>
+ Content-Length:...
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xmlns="http://www.w3.org/2001/06/grammar"
+ xml:lang="en-US" version="1.0" root="request">
+
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+
+ <!-- multiple language attachment to a token -->
+ <rule id="people1">
+ <token lexicon="en-US,fr-CA"> Robert </token>
+ </rule>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ <!-- the equivalent single-language attachment expansion -->
+ <rule id="people2">
+ <one-of>
+ <item xml:lang="en-US">Robert</item>
+ <item xml:lang="fr-CA">Robert</item>
+ </one-of>
+ </rule>
+
+ </grammar>
+
+ SRGS Grammar Example
+
+
+ Content-Type:text/uri-list
+ Content-Length:...
+
+ session:help@root-level.store
+ http://www.example.com/Directory-Name-List.grxml
+ http://www.example.com/Department-List.grxml
+ http://www.example.com/TAC-Contact-List.grxml
+ session:menu1@menu-level.store
+
+ Grammar Reference Example
+
+
+ Content-Type:multipart/mixed; boundary="break"
+
+ --break
+ Content-Type:text/uri-list
+ Content-Length:...
+
+ http://www.example.com/Directory-Name-List.grxml
+ http://www.example.com/Department-List.grxml
+ http://www.example.com/TAC-Contact-List.grxml
+
+ --break
+ Content-Type:application/srgs+xml
+ Content-ID:<request1@form-level.store>
+ Content-Length:...
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xmlns="http://www.w3.org/2001/06/grammar"
+ xml:lang="en-US" version="1.0">
+
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+
+ <!-- multiple language attachment to a token -->
+ <rule id="people1">
+ <token lexicon="en-US,fr-CA"> Robert </token>
+ </rule>
+
+ <!-- the equivalent single-language attachment expansion -->
+ <rule id="people2">
+ <one-of>
+ <item xml:lang="en-US">Robert</item>
+ <item xml:lang="fr-CA">Robert</item>
+ </one-of>
+ </rule>
+
+ </grammar>
+ --break--
+
+ Mixed Grammar Reference Example
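As a non-normative illustration, a mixed body like the example above can be assembled programmatically. The Python sketch below shows one way a client library might combine a 'text/uri-list' part and an inline 'application/srgs+xml' part; the boundary string, Content-ID, and grammar text are placeholders, and a real client would compute exact byte lengths for Content-Length as done here.

```python
# Non-normative sketch: assemble a 'multipart/mixed' recognizer grammar
# body from external grammar URIs plus one inline SRGS grammar. The
# boundary, Content-ID, and grammar text below are placeholders.

def build_mixed_body(uris, inline_grammar, content_id, boundary="break"):
    uri_list = "\r\n".join(uris)
    parts = [
        "--%s\r\nContent-Type:text/uri-list\r\n"
        "Content-Length:%d\r\n\r\n%s" % (boundary, len(uri_list), uri_list),
        "--%s\r\nContent-Type:application/srgs+xml\r\n"
        "Content-ID:<%s>\r\nContent-Length:%d\r\n\r\n%s"
        % (boundary, content_id, len(inline_grammar), inline_grammar),
    ]
    return "\r\n".join(parts) + "\r\n--%s--" % boundary

body = build_mixed_body(
    ["http://www.example.com/Directory-Name-List.grxml",
     "http://www.example.com/Department-List.grxml"],
    '<grammar xmlns="http://www.w3.org/2001/06/grammar"/>',
    "request1@form-level.store")
```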
+
+9.5.2. Recognizer Result Data
+
+ Recognition results are returned to the client in the message body of
+ the RECOGNITION-COMPLETE event or the GET-RESULT response message as
+ described in Section 6.3. Element and attribute descriptions for the
+ recognition portion of the NLSML format are provided in Section 9.6
+ with a normative definition of the schema in Section 16.1.
+
+
+
+
+
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:ex="http://www.example.com/example"
+ grammar="http://www.example.com/theYesNoGrammar">
+ <interpretation>
+ <instance>
+ <ex:response>yes</ex:response>
+ </instance>
+ <input>OK</input>
+ </interpretation>
+ </result>
+
+ Result Example
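The NLSML result above can be consumed with any namespace-aware XML parser. The following non-normative Python sketch extracts the interpreted value and the raw input from that exact document; note that the <instance> child elements live in the application's own namespace (here, the example.com namespace), not the MRCPv2 one.

```python
# Non-normative sketch: parse the NLSML "Result Example" with Python's
# ElementTree, honoring both the MRCPv2 and application namespaces.

import xml.etree.ElementTree as ET

NLSML = "urn:ietf:params:xml:ns:mrcpv2"
EX = "http://www.example.com/example"

doc = """<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="http://www.example.com/theYesNoGrammar">
  <interpretation>
    <instance><ex:response>yes</ex:response></instance>
    <input>OK</input>
  </interpretation>
</result>"""

root = ET.fromstring(doc)
interp = root.find("{%s}interpretation" % NLSML)
response = interp.find("{%s}instance/{%s}response" % (NLSML, EX)).text
heard = interp.find("{%s}input" % NLSML).text
print(response, heard)  # yes OK
```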
+
+9.5.3. Enrollment Result Data
+
+ Enrollment results are returned to the client in the message body of
+ the RECOGNITION-COMPLETE event as described in Section 6.3. Element
+ and attribute descriptions for the enrollment portion of the NLSML
+ format are provided in Section 9.7 with a normative definition of the
+ schema in Section 16.2.
+
+9.5.4. Recognizer Context Block
+
+ When a client changes servers while operating on the behalf of the
+ same incoming communication session, this header field allows the
+ client to collect a block of opaque data from one server and provide
+ it to another server. This capability is desirable if the client
+ needs different language support or if the server issued a
+ redirect. Here, the first recognizer resource may have collected
+ acoustic and other data during its execution of recognition methods.
+ After a server switch, communicating this data may allow the
+ recognizer resource on the new server to provide better recognition.
+ This block of data is implementation specific and MUST be carried as
+ media type 'application/octet-stream' in the body of the message.
+
+ This block of data is communicated in the SET-PARAMS and GET-PARAMS
+ method/response messages. In the GET-PARAMS method, if an empty
+ Recognizer-Context-Block header field is present, then the recognizer
+ SHOULD return its vendor-specific context block, if any, in the
+ message body as an entity of media type 'application/octet-stream'
+ with a specific Content-ID. The Content-ID value MUST also be
+ specified in
+ the Recognizer-Context-Block header field in the GET-PARAMS response.
+ The SET-PARAMS request wishing to provide this vendor-specific data
+ MUST send it in the message body as a typed entity with the same
+
+
+
+
+
+ Content-ID that it received from the GET-PARAMS. The Content-ID MUST
+ also be sent in the Recognizer-Context-Block header field of the
+ SET-PARAMS message.
+
+ Each speech recognition implementation choosing to use this mechanism
+ to hand off recognizer context data among servers MUST distinguish
+ its implementation-specific block of data from other implementations
+ by choosing a Content-ID that is recognizable among the participating
+ servers and unlikely to collide with values chosen by another
+ implementation.
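As a non-normative example of the collision-avoidance guidance above, an implementation might derive its context-block Content-ID from a vendor-controlled domain plus a random token. The domain and function name below are placeholders, not values mandated by this specification.

```python
# Hypothetical scheme (not mandated by RFC 6787): build a Content-ID for
# the recognizer context block that is recognizable to this vendor's
# servers and unlikely to collide with other implementations.

import uuid

def context_block_id(vendor_domain="asr.example.com"):
    # uuid4().hex gives 32 random lowercase hex characters.
    return "%s@%s" % (uuid.uuid4().hex, vendor_domain)

cid = context_block_id()
print(cid)
```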
+
+9.6. Recognizer Results
+
+ The recognizer portion of NLSML (see Section 6.3.1) represents
+ information automatically extracted from a user's utterances by a
+ semantic interpretation component, where "utterance" is to be taken
+ in the general sense of a meaningful user input in any modality
+ supported by the MRCPv2 implementation.
+
+9.6.1. Markup Functions
+
+ MRCPv2 recognizer resources employ the Natural Language Semantics
+ Markup Language (NLSML) to interpret natural language speech input
+ and to format the interpretation for consumption by an MRCPv2 client.
+
+ The elements of the markup fall into the following general functional
+ categories: interpretation, side information, and multi-modal
+ integration.
+
+9.6.1.1. Interpretation
+
+ Elements and attributes represent the semantics of a user's
+ utterance, including the <result>, <interpretation>, and <instance>
+ elements. The <result> element contains the full result of
+ processing one utterance. It MAY contain multiple <interpretation>
+ elements if the interpretation of the utterance results in multiple
+ alternative meanings due to uncertainty in speech recognition or
+ natural language understanding. There are at least two reasons for
+ providing multiple interpretations:
+
+ 1. The client application might have additional information, for
+ example, information from a database, that would allow it to
+ select a preferred interpretation from among the possible
+ interpretations returned from the semantic interpreter.
+
+
+
+
+
+
+
+
+
+ 2. A client-based dialog manager (e.g., VoiceXML
+ [W3C.REC-voicexml20-20040316]) that was unable to select between
+ several competing interpretations could use this information to
+ go back to the user and find out what was intended. For example,
+ it could issue a SPEAK request to a synthesizer resource to emit
+ "Did you say 'Boston' or 'Austin'?"
+
+9.6.1.2. Side Information
+
+ These are elements and attributes representing additional information
+ about the interpretation, over and above the interpretation itself.
+ Side information includes:
+
+ 1. Whether an interpretation was achieved (the <nomatch> element)
+ and the system's confidence in an interpretation (the
+ "confidence" attribute of <interpretation>).
+
+ 2. Alternative interpretations (<interpretation>)
+
+ 3. Input formats and Automatic Speech Recognition (ASR) information:
+ the <input> element, representing the input to the semantic
+ interpreter.
+
+9.6.1.3. Multi-Modal Integration
+
+ When more than one modality is available for input, the
+ interpretation of the inputs needs to be coordinated. The "mode"
+ attribute of <input> supports this by indicating whether the
+ utterance was input by speech, DTMF, pointing, etc. The "timestamp-
+ start" and "timestamp-end" attributes of <input> also provide for
+ temporal coordination by indicating when inputs occurred.
+
+9.6.2. Overview of Recognizer Result Elements and Their Relationships
+
+ The recognizer elements in NLSML fall into two categories:
+
+ 1. description of the input that was processed, and
+
+ 2. description of the meaning that was extracted from the input.
+
+ The attributes of each element are described below. In addition,
+ some elements can contain multiple instances of other elements. For
+ example, a
+ <result> can contain multiple <interpretation> elements, each of
+ which is taken to be an alternative. Similarly, <input> can contain
+ multiple child <input> elements, which are taken to be cumulative.
+ To illustrate the basic usage of these elements, as a simple example,
+
+
+
+
+
+
+
+ consider the utterance "OK" (interpreted as "yes"). The example
+ illustrates how that utterance and its interpretation would be
+ represented in the NLSML markup.
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:ex="http://www.example.com/example"
+ grammar="http://www.example.com/theYesNoGrammar">
+ <interpretation>
+ <instance>
+ <ex:response>yes</ex:response>
+ </instance>
+ <input>OK</input>
+ </interpretation>
+ </result>
+
+ This example includes only the minimum required information. There
+ is an overall <result> element, which includes one interpretation and
+ an input element. The interpretation contains the application-
+ specific element "<response>", which is the semantically interpreted
+ result.
+
+9.6.3. Elements and Attributes
+
+9.6.3.1. <result> Root Element
+
+ The root element of the markup is <result>. The <result> element
+ includes one or more <interpretation> elements. Multiple
+ interpretations can result from ambiguities in the input or in the
+ semantic interpretation. If the "grammar" attribute does not apply
+ to all of the interpretations in the result, it can be overridden for
+ individual interpretations at the <interpretation> level.
+
+ Attributes:
+
+ 1. grammar: The grammar or recognition rule matched by this result.
+ The format of the grammar attribute will match the rule reference
+ semantics defined in the grammar specification. Specifically,
+ the rule reference is in the external XML form for grammar rule
+ references. The markup interpreter needs to know the grammar
+ rule that is matched by the utterance because multiple rules may
+ be simultaneously active. The value is the grammar URI used by
+ the markup interpreter to specify the grammar. The grammar can
+ be overridden by a grammar attribute in the <interpretation>
+ element if the input was ambiguous as to which grammar it
+ matched. If all interpretation elements within the result
+ element contain their own grammar attributes, the attribute can
+ be dropped from the result element.
+
+
+
+
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ grammar="http://www.example.com/grammar">
+ <interpretation>
+ ....
+ </interpretation>
+ </result>
+
+9.6.3.2. <interpretation> Element
+
+ An <interpretation> element contains a single semantic
+ interpretation.
+
+ Attributes:
+
+ 1. confidence: A float value from 0.0-1.0 indicating the semantic
+ analyzer's confidence in this interpretation. A value of 1.0
+ indicates maximum confidence. The values are implementation
+ dependent but are intended to align with the value interpretation
+ for the confidence MRCPv2 header field defined in Section 9.4.1.
+ This attribute is OPTIONAL.
+
+ 2. grammar: The grammar or recognition rule matched by this
+ interpretation (if needed to override the grammar specification
+ at the <result> level). This attribute is only needed
+ under <interpretation> if it is necessary to override a grammar
+ that was defined at the <result> level. Note that the grammar
+ attribute for the interpretation element is optional if and only
+ if the grammar attribute is specified in the <result> element.
+
+ Interpretations MUST be sorted best-first by some measure of
+ "goodness". The goodness measure is "confidence" if present;
+ otherwise, it is some implementation-specific indication of quality.
+
+ The grammar is expected to be specified most frequently at the
+ <result> level. However, it can be overridden at the
+ <interpretation> level because it is possible that different
+ interpretations may match different grammar rules.
+
+ The <interpretation> element includes an optional <input> element
+ containing the input being analyzed, and at least one <instance>
+ element containing the interpretation of the utterance.
+
+ <interpretation confidence="0.75"
+ grammar="http://www.example.com/grammar">
+ ...
+ </interpretation>
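The best-first ordering requirement can be illustrated with a small non-normative sketch. Here each interpretation is modeled as a dictionary; ranking an absent confidence below any explicit value is one implementation-specific choice, not a rule of this specification.

```python
# Non-normative sketch: order interpretations best-first, using the
# "confidence" attribute when present and treating a missing confidence
# as the lowest rank (an implementation-specific choice).

def best_first(interps):
    return sorted(interps,
                  key=lambda i: i.get("confidence", -1.0),
                  reverse=True)

ordered = best_first([{"text": "Austin", "confidence": 0.6},
                      {"text": "Boston", "confidence": 0.9},
                      {"text": "Houston"}])
print([i["text"] for i in ordered])  # ['Boston', 'Austin', 'Houston']
```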
+
+
+
+
+
+
+9.6.3.3. <instance> Element
+
+ The <instance> element contains the interpretation of the utterance.
+ When the Semantic Interpretation for Speech Recognition format is
+ used, the <instance> element contains the XML serialization of the
+ result using the approach defined in that specification. When there
+ is semantic markup in the grammar that does not create semantic
+ objects, but instead only does a semantic translation of a portion of
+ the input, such as translating "coke" to "coca-cola", the instance
+ contains the whole input but with the translation applied. The NLSML
+ looks like the markup in Figure 2 below. If there are no semantic
+ objects created, nor any semantic translation, the instance value is
+ the same as the input value.
+
+ Attributes:
+
+ 1. confidence: Each element of the instance MAY have a confidence
+ attribute, defined in the NLSML namespace. The confidence
+ attribute contains a float value in the range from 0.0-1.0
+ reflecting the system's confidence in the analysis of that slot.
+ A value of 1.0 indicates maximum confidence. The values are
+ implementation dependent, but are intended to align with the
+ value interpretation for the MRCPv2 header field Confidence-
+ Threshold defined in Section 9.4.1. This attribute is OPTIONAL.
+
+ <instance>
+ <nameAddress>
+ <street confidence="0.75">123 Maple Street</street>
+ <city>Mill Valley</city>
+ <state>CA</state>
+ <zip>90952</zip>
+ </nameAddress>
+ </instance>
+ <input>
+ My address is 123 Maple Street,
+ Mill Valley, California, 90952
+ </input>
+
+
+ <instance>
+ I would like to buy a coca-cola
+ </instance>
+ <input>
+ I would like to buy a coke
+ </input>
+
+ Figure 2: NLSML Example
+
+
+
+
+
+
+9.6.3.4. <input> Element
+
+ The <input> element is the text representation of a user's input. It
+ includes an optional "confidence" attribute, which indicates the
+ recognizer's confidence in the recognition result (as opposed to the
+ confidence in the interpretation, which is indicated by the
+ "confidence" attribute of <interpretation>). Optional "timestamp-
+ start" and "timestamp-end" attributes indicate the start and end
+ times of a spoken utterance, in ISO 8601 format [ISO.8601.1988].
+
+ Attributes:
+
+ 1. timestamp-start: The time at which the input began. (optional)
+
+ 2. timestamp-end: The time at which the input ended. (optional)
+
+ 3. mode: The modality of the input, for example, speech, DTMF, etc.
+ (optional)
+
+ 4. confidence: The confidence of the recognizer in the correctness
+ of the input in the range 0.0 to 1.0. (optional)
+
+ Note that it may not make sense for temporally overlapping inputs to
+ have the same mode; however, this constraint is not expected to be
+ enforced by implementations.
+
+ When there is no time zone designator, ISO 8601 time representations
+ default to local time.
+
+ There are three possible formats for the <input> element.
+
+ 1. The <input> element can contain simple text:
+
+ <input>onions</input>
+
+ A future possibility is for <input> to contain not only text but
+ additional markup that represents prosodic information that was
+ contained in the original utterance and extracted by the speech
+ recognizer. This depends on the availability of ASRs that are
+ capable of producing prosodic information. MRCPv2 clients MUST
+ be prepared to receive such markup and MAY make use of it.
+
+ 2. An <input> tag can also contain additional <input> tags. Having
+ additional input elements allows the representation to support
+ future multi-modal inputs as well as finer-grained speech
+ information, such as timestamps for individual words and word-
+ level confidences.
+
+
+
+
+
+
+ <input>
+ <input mode="speech" confidence="0.5"
+ timestamp-start="2000-04-03T0:00:00"
+ timestamp-end="2000-04-03T0:00:00.2">fried</input>
+ <input mode="speech" confidence="1.0"
+ timestamp-start="2000-04-03T0:00:00.25"
+ timestamp-end="2000-04-03T0:00:00.6">onions</input>
+ </input>
+
+ 3. Finally, the <input> element can contain <nomatch> and <noinput>
+ elements, which describe situations in which the speech
+ recognizer received input that it was unable to process or did
+ not receive any input at all, respectively.
+
+9.6.3.5. <nomatch> Element
+
+ The <nomatch> element under <input> is used to indicate that the
+ semantic interpreter was unable to successfully match any input with
+ confidence above the threshold. It can optionally contain the text
+ of the best of the (rejected) matches.
+
+ <interpretation>
+ <instance/>
+ <input confidence="0.1">
+ <nomatch/>
+ </input>
+ </interpretation>
+ <interpretation>
+ <instance/>
+ <input mode="speech" confidence="0.1">
+ <nomatch>I want to go to New York</nomatch>
+ </input>
+ </interpretation>
+
+9.6.3.6. <noinput> Element
+
+ <noinput> indicates that there was no input -- a timeout occurred in
+ the speech recognizer due to silence.
+ <interpretation>
+ <instance/>
+ <input>
+ <noinput/>
+ </input>
+ </interpretation>
+
+ If there are multiple levels of inputs, the most natural place for
+ <nomatch> and <noinput> elements to appear is under the highest level
+ of <input> for <noinput>, and under the appropriate level of
+
+
+
+
+
+ <interpretation> for <nomatch>. So, <noinput> means "no input at
+ all" and <nomatch> means "no match in speech modality" or "no match
+ in DTMF modality". For example, to represent garbled speech combined
+ with DTMF "1 2 3 4", the markup would be:
+ <input>
+ <input mode="speech"><nomatch/></input>
+ <input mode="dtmf">1 2 3 4</input>
+ </input>
+
+ Note: while <noinput> could be represented as an attribute of input,
+ <nomatch> cannot, since it could potentially include PCDATA content
+ with the best match. For parallelism, <noinput> is also an element.
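A client can distinguish the three <input> outcomes with a simple namespace-aware check. The non-normative Python sketch below classifies each child of a multi-modal <input>, mirroring the garbled-speech-plus-DTMF case above.

```python
# Non-normative sketch: classify an NLSML <input> element as "noinput",
# "nomatch", or its matched text, per child modality.

import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:mrcpv2"

def classify(input_elem):
    if input_elem.find("{%s}noinput" % NS) is not None:
        return "noinput"
    if input_elem.find("{%s}nomatch" % NS) is not None:
        return "nomatch"
    return (input_elem.text or "").strip()

doc = ET.fromstring(
    '<input xmlns="urn:ietf:params:xml:ns:mrcpv2">'
    '<input mode="speech"><nomatch/></input>'
    '<input mode="dtmf">1 2 3 4</input>'
    '</input>')

modes = {child.get("mode"): classify(child)
         for child in doc.findall("{%s}input" % NS)}
print(modes)  # {'speech': 'nomatch', 'dtmf': '1 2 3 4'}
```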
+
+9.7. Enrollment Results
+
+ All enrollment elements are contained within a single
+ <enrollment-result> element under <result>. The elements are
+ described below and have the schema defined in Section 16.2. The
+ following elements are defined:
+
+ 1. num-clashes
+
+ 2. num-good-repetitions
+
+ 3. num-repetitions-still-needed
+
+ 4. consistency-status
+
+ 5. clash-phrase-ids
+
+ 6. transcriptions
+
+ 7. confusable-phrases
+
+9.7.1. <num-clashes> Element
+
+ The <num-clashes> element contains the number of clashes that this
+ pronunciation has with other pronunciations in an active enrollment
+ session. The associated Clash-Threshold header field determines the
+ sensitivity of the clash measurement. Note that clash testing can be
+ turned off completely by setting the Clash-Threshold header field
+ value to 0.
+
+9.7.2. <num-good-repetitions> Element
+
+ The <num-good-repetitions> element contains the number of consistent
+ pronunciations obtained so far in an active enrollment session.
+
+
+
+
+
+
+9.7.3. <num-repetitions-still-needed> Element
+
+ The <num-repetitions-still-needed> element contains the number of
+ consistent pronunciations that must still be obtained before the new
+ phrase can be added to the enrollment grammar. The number of
+ consistent pronunciations required is specified by the client in the
+ request header field Num-Min-Consistent-Pronunciations. The returned
+ value must be 0 before the client can successfully commit a phrase to
+ the grammar by ending the enrollment session.
+
+9.7.4. <consistency-status> Element
+
+ The <consistency-status> element is used to indicate how consistent
+ the repetitions are when learning a new phrase. It can have the
+ values of consistent, inconsistent, and undecided.
+
+9.7.5. <clash-phrase-ids> Element
+
+ The <clash-phrase-ids> element contains the phrase IDs of clashing
+ pronunciation(s), if any. This element is absent if there are no
+ clashes.
+
+9.7.6. <transcriptions> Element
+
+ The <transcriptions> element contains the transcriptions returned in
+ the last repetition of the phrase being enrolled.
+
+9.7.7. <confusable-phrases> Element
+
+ The <confusable-phrases> element contains a list of phrases from a
+ command grammar that are confusable with the phrase being added to
+ the personal grammar. This element MAY be absent if there are no
+ confusable phrases.
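Taken together, these elements drive the client's enrollment loop: repeat the phrase until <num-repetitions-still-needed> reaches 0 and no clashes remain, then commit via END-PHRASE-ENROLLMENT. The non-normative sketch below models each RECOGNIZE result as a dictionary keyed by element name; send_recognize() is a stand-in for actual MRCPv2 signaling, not a real API.

```python
# Non-normative sketch of client-side enrollment control flow. Keys
# mirror the <enrollment-result> child elements; send_recognize() is a
# hypothetical callable that performs one RECOGNIZE and parses the
# RECOGNITION-COMPLETE enrollment result.

def enroll_phrase(send_recognize, max_attempts=10):
    for _ in range(max_attempts):
        result = send_recognize()
        if (result["num-repetitions-still-needed"] == 0
                and result["num-clashes"] == 0):
            return True   # safe to commit via END-PHRASE-ENROLLMENT
    return False          # give up; send Abort-Phrase-Enrollment:true

attempts = iter([
    {"num-repetitions-still-needed": 2, "num-clashes": 0},
    {"num-repetitions-still-needed": 1, "num-clashes": 0},
    {"num-repetitions-still-needed": 0, "num-clashes": 0},
])
ok = enroll_phrase(lambda: next(attempts))
print(ok)  # True
```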
+
+9.8. DEFINE-GRAMMAR
+
+ The DEFINE-GRAMMAR method, from the client to the server, provides
+ one or more grammars and requests the server to access, fetch, and
+ compile the grammars as needed. The DEFINE-GRAMMAR method
+ implementation MUST do a fetch of all external URIs that are part of
+ that operation. If caching is implemented, this URI fetching MUST
+ conform to the cache control hints and parameter header fields
+ associated with the method in deciding whether the URIs should be
+ fetched from cache or from the external server. If these hints/
+ parameters are not specified in the method, the values set for the
+   session using SET-PARAMS/GET-PARAMS apply.  If they were not set for
+   the session, their default values apply.
+
+
+
+
+
+
+   If the server resource is in the recognition state, the server MUST
+   respond to the DEFINE-GRAMMAR request with a failure status.
+
+   If the resource is in the idle state and is able to successfully
+   process the supplied grammars, the server MUST return a success
+   status-code, and the request-state MUST be COMPLETE.
+
+ If the recognizer resource could not define the grammar for some
+ reason (for example, if the download failed, the grammar failed to
+ compile, or the grammar was in an unsupported form), the MRCPv2
+ response for the DEFINE-GRAMMAR method MUST contain a failure status-
+ code of 407 and contain a Completion-Cause header field describing
+ the failure reason.
+
+ C->S:MRCP/2.0 ... DEFINE-GRAMMAR 543257
+ Channel-Identifier:32AECB23433801@speechrecog
+ Content-Type:application/srgs+xml
+ Content-ID:<request1@form-level.store>
+ Content-Length:...
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xmlns="http://www.w3.org/2001/06/grammar"
+ xml:lang="en-US" version="1.0">
+
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+
+ </grammar>
+
+ S->C:MRCP/2.0 ... 543257 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Completion-Cause:000 success
+
+
+
+
+
+ C->S:MRCP/2.0 ... DEFINE-GRAMMAR 543258
+ Channel-Identifier:32AECB23433801@speechrecog
+ Content-Type:application/srgs+xml
+ Content-ID:<helpgrammar@root-level.store>
+ Content-Length:...
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xmlns="http://www.w3.org/2001/06/grammar"
+ xml:lang="en-US" version="1.0">
+
+   <rule id="request">
+     I need help
+   </rule>
+
+   </grammar>
+
+ S->C:MRCP/2.0 ... 543258 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Completion-Cause:000 success
+
+ C->S:MRCP/2.0 ... DEFINE-GRAMMAR 543259
+ Channel-Identifier:32AECB23433801@speechrecog
+ Content-Type:application/srgs+xml
+ Content-ID:<request2@field-level.store>
+ Content-Length:...
+
+ <?xml version="1.0" encoding="UTF-8"?>
+
+ <!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
+ "http://www.w3.org/TR/speech-grammar/grammar.dtd">
+
+ <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/06/grammar
+ http://www.w3.org/TR/speech-grammar/grammar.xsd"
+ version="1.0" mode="voice" root="basicCmd">
+
+ <meta name="author" content="Stephanie Williams"/>
+
+ <rule id="basicCmd" scope="public">
+ <example> please move the window </example>
+ <example> open a file </example>
+
+ <ruleref
+ uri="http://grammar.example.com/politeness.grxml#startPolite"/>
+
+
+
+
+
+
+
+
+ <ruleref uri="#command"/>
+ <ruleref
+ uri="http://grammar.example.com/politeness.grxml#endPolite"/>
+ </rule>
+
+ <rule id="command">
+ <ruleref uri="#action"/> <ruleref uri="#object"/>
+ </rule>
+
+ <rule id="action">
+ <one-of>
+ <item weight="10"> open <tag>open</tag> </item>
+ <item weight="2"> close <tag>close</tag> </item>
+ <item weight="1"> delete <tag>delete</tag> </item>
+ <item weight="1"> move <tag>move</tag> </item>
+ </one-of>
+ </rule>
+
+ <rule id="object">
+ <item repeat="0-1">
+ <one-of>
+ <item> the </item>
+ <item> a </item>
+ </one-of>
+ </item>
+
+ <one-of>
+ <item> window </item>
+ <item> file </item>
+ <item> menu </item>
+ </one-of>
+ </rule>
+
+ </grammar>
+
+
+ S->C:MRCP/2.0 ... 543259 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Completion-Cause:000 success
+
+ C->S:MRCP/2.0 ... RECOGNIZE 543260
+ Channel-Identifier:32AECB23433801@speechrecog
+ N-Best-List-Length:2
+ Content-Type:text/uri-list
+ Content-Length:...
+
+
+
+
+
+
+
+
+ session:request1@form-level.store
+ session:request2@field-level.store
+   session:helpgrammar@root-level.store
+
+ S->C:MRCP/2.0 ... 543260 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechrecog
+
+ S->C:MRCP/2.0 ... START-OF-INPUT 543260 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechrecog
+
+ S->C:MRCP/2.0 ... RECOGNITION-COMPLETE 543260 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Completion-Cause:000 success
+ Waveform-URI:<http://web.media.com/session123/audio.wav>;
+ size=124535;duration=2340
+        Content-Type:application/nlsml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:ex="http://www.example.com/example"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <instance name="Person">
+ <ex:Person>
+ <ex:Name> Andre Roy </ex:Name>
+ </ex:Person>
+ </instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
+
+ Define Grammar Example
+
+9.9. RECOGNIZE
+
+ The RECOGNIZE method from the client to the server requests the
+ recognizer to start recognition and provides it with one or more
+ grammar references for grammars to match against the input media.
+ The RECOGNIZE method can carry header fields to control the
+ sensitivity, confidence level, and the level of detail in results
+ provided by the recognizer. These header field values override the
+ current values set by a previous SET-PARAMS method.
+
+ The RECOGNIZE method can request the recognizer resource to operate
+ in normal or hotword mode as specified by the Recognition-Mode header
+ field. The default value is "normal". If the resource could not
+ start a recognition, the server MUST respond with a failure status-
+
+
+
+
+
+ code of 407 and a Completion-Cause header field in the response
+ describing the cause of failure.
+
+ The RECOGNIZE request uses the message body to specify the grammars
+ applicable to the request. The active grammar(s) for the request can
+ be specified in one of three ways. If the client needs to explicitly
+ control grammar weights for the recognition operation, it MUST employ
+ method 3 below. The order of these grammars specifies the precedence
+ of the grammars that is used when more than one grammar in the list
+ matches the speech; in this case, the grammar with the higher
+ precedence is returned as a match. This precedence capability is
+ useful in applications like VoiceXML browsers to order grammars
+ specified at the dialog, document, and root level of a VoiceXML
+ application.
+
+ 1. The grammar MAY be placed directly in the message body as typed
+ content. If more than one grammar is included in the body, the
+ order of inclusion controls the corresponding precedence for the
+ grammars during recognition, with earlier grammars in the body
+ having a higher precedence than later ones.
+
+ 2. The body MAY contain a list of grammar URIs specified in content
+ of media type 'text/uri-list' [RFC2483]. The order of the URIs
+ determines the corresponding precedence for the grammars during
+ recognition, with highest precedence first and decreasing for
+ each URI thereafter.
+
+ 3. The body MAY contain a list of grammar URIs specified in content
+ of media type 'text/grammar-ref-list'. This type defines a list
+ of grammar URIs and allows each grammar URI to be assigned a
+ weight in the list. This weight has the same meaning as the
+       weights described in Section 2.4.1 of the Speech Recognition
+       Grammar Specification (SRGS) [W3C.REC-speech-grammar-20040316].
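+   As a non-normative sketch, a body of the third form above can be
+   composed programmatically.  In the helper below (function name and
+   URIs are illustrative, not defined by this specification), URI order
+   encodes precedence and the optional weight parameter carries the
+   SRGS-style weight:

```python
# Build a 'text/grammar-ref-list' body: one <uri> reference per line,
# highest precedence first, each with an optional ;weight="..." part.
def grammar_ref_list(entries):
    """entries: iterable of (uri, weight_or_None), best match first."""
    refs = []
    for uri, weight in entries:
        ref = "<%s>" % uri
        if weight is not None:
            ref += ';weight="%s"' % weight
        refs.append(ref)
    # MRCPv2 message bodies use CRLF line endings.
    return "\r\n".join(refs)

body = grammar_ref_list([
    ("session:request1@form-level.store", "2.0"),
    ("http://grammar.example.com/world-cities.grxml#canada", None),
])
```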
+
+ In addition to performing recognition on the input, the recognizer
+ MUST also enroll the collected utterance in a personal grammar if the
+ Enroll-Utterance header field is set to true and an Enrollment is
+ active (via an earlier execution of the START-PHRASE-ENROLLMENT
+ method). If so, and if the RECOGNIZE request contains a Content-ID
+ header field, then the resulting grammar (which includes the personal
+ grammar as a sub-grammar) can be referenced through the 'session' URI
+ scheme (see Section 13.6).
+
+ If the resource was able to successfully start the recognition, the
+ server MUST return a success status-code and a request-state of
+ IN-PROGRESS. This means that the recognizer is active and that the
+ client MUST be prepared to receive further events with this
+ request-id.
+
+
+
+
+
+ If the resource was able to queue the request, the server MUST return
+ a success code and request-state of PENDING. This means that the
+ recognizer is currently active with another request and that this
+ request has been queued for processing.
+
+ If the resource could not start a recognition, the server MUST
+ respond with a failure status-code of 407 and a Completion-Cause
+ header field in the response describing the cause of failure.
+
+ For the recognizer resource, RECOGNIZE and INTERPRET are the only
+ requests that return a request-state of IN-PROGRESS, meaning that
+ recognition is in progress. When the recognition completes by
+ matching one of the grammar alternatives or by a timeout without a
+ match or for some other reason, the recognizer resource MUST send the
+ client a RECOGNITION-COMPLETE event (or INTERPRETATION-COMPLETE, if
+ INTERPRET was the request) with the result of the recognition and a
+ request-state of COMPLETE.
+
+ Large grammars can take a long time for the server to compile. For
+ grammars that are used repeatedly, the client can improve server
+ performance by issuing a DEFINE-GRAMMAR request with the grammar
+ ahead of time. In such a case, the client can issue the RECOGNIZE
+ request and reference the grammar through the 'session' URI scheme
+ (see Section 13.6). This also applies in general if the client wants
+ to repeat recognition with a previous inline grammar.
+
+ The RECOGNIZE method implementation MUST do a fetch of all external
+ URIs that are part of that operation. If caching is implemented,
+ this URI fetching MUST conform to the cache control hints and
+ parameter header fields associated with the method in deciding
+ whether it should be fetched from cache or from the external server.
+ If these hints/parameters are not specified in the method, the values
+   set for the session using SET-PARAMS/GET-PARAMS apply.  If they were
+   not set for the session, their default values apply.
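+   The fallback chain just described (value on the method itself, then
+   the session value set via SET-PARAMS, then the resource default) can
+   be sketched as follows; the function and dictionary shapes are
+   illustrative, not part of the protocol:

```python
# Resolve the effective value of a header field such as a cache-control
# hint: method header wins, then the session-level value, then default.
def effective_param(name, method_headers, session_params, defaults):
    if name in method_headers:
        return method_headers[name]
    if name in session_params:
        return session_params[name]
    return defaults[name]

# e.g. Fetch-Timeout absent from the request but set for the session:
value = effective_param("Fetch-Timeout", {}, {"Fetch-Timeout": "20"},
                        {"Fetch-Timeout": "10"})
```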
+
+ Note that since the audio and the messages are carried over separate
+ communication paths there may be a race condition between the start
+ of the flow of audio and the receipt of the RECOGNIZE method. For
+ example, if an audio flow is started by the client at the same time
+ as the RECOGNIZE method is sent, either the audio or the RECOGNIZE
+ can arrive at the recognizer first. As another example, the client
+ may choose to continuously send audio to the server and signal the
+ server to recognize using the RECOGNIZE method. Mechanisms to
+ resolve this condition are outside the scope of this specification.
+ The recognizer can expect the media to start flowing when it receives
+ the RECOGNIZE request, but it MUST NOT buffer anything it receives
+ beforehand in order to preserve the semantics that application
+ authors expect with respect to the input timers.
+
+
+
+
+
+ When a RECOGNIZE method has been received, the recognition is
+ initiated on the stream. The No-Input-Timer MUST be started at this
+ time if the Start-Input-Timers header field is specified as "true".
+ If this header field is set to "false", the No-Input-Timer MUST be
+   started when the resource receives the START-INPUT-TIMERS method from
+   the client.  The Recognition-Timeout MUST be started when the
+   recognition resource detects speech or a DTMF digit in the media
+   stream.
+
+ For recognition when not in hotword mode:
+
+ When the recognizer resource detects speech or a DTMF digit in the
+ media stream, it MUST send the START-OF-INPUT event. When enough
+ speech has been collected for the server to process, the recognizer
+ can try to match the collected speech with the active grammars. If
+ the speech collected at this point fully matches with any of the
+ active grammars, the Speech-Complete-Timer is started. If it matches
+ partially with one or more of the active grammars, with more speech
+ needed before a full match is achieved, then the Speech-Incomplete-
+ Timer is started.
+
+ 1. When the No-Input-Timer expires, the recognizer MUST complete
+ with a Completion-Cause code of "no-input-timeout".
+
+ 2. The recognizer MUST support detecting a no-match condition upon
+ detecting end of speech. The recognizer MAY support detecting a
+ no-match condition before waiting for end-of-speech. If this is
+ supported, this capability is enabled by setting the Early-No-
+ Match header field to "true". Upon detecting a no-match
+ condition, the RECOGNIZE MUST return with "no-match".
+
+ 3. When the Speech-Incomplete-Timer expires, the recognizer SHOULD
+ complete with a Completion-Cause code of "partial-match", unless
+ the recognizer cannot differentiate a partial-match, in which
+ case it MUST return a Completion-Cause code of "no-match". The
+ recognizer MAY return results for the partially matched grammar.
+
+ 4. When the Speech-Complete-Timer expires, the recognizer MUST
+ complete with a Completion-Cause code of "success".
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ 5. When the Recognition-Timeout expires, one of the following MUST
+ happen:
+
+ 5.1. If there was a partial-match, the recognizer SHOULD
+ complete with a Completion-Cause code of "partial-match-
+ maxtime", unless the recognizer cannot differentiate a
+ partial-match, in which case it MUST complete with a
+ Completion-Cause code of "no-match-maxtime". The
+ recognizer MAY return results for the partially matched
+ grammar.
+
+ 5.2. If there was a full-match, the recognizer MUST complete
+ with a Completion-Cause code of "success-maxtime".
+
+ 5.3. If there was a no match, the recognizer MUST complete with
+ a Completion-Cause code of "no-match-maxtime".
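+   The timer-driven outcomes in items 1-5 above can be summarized in a
+   short, non-normative sketch (the function and argument names are
+   illustrative; item 2's early no-match detection is event-driven, not
+   timer-driven, and so is omitted):

```python
# Map an expired timer plus the current match state to the
# Completion-Cause a recognizer reports in normal mode.  Servers that
# cannot distinguish a partial match fall back to the no-match
# variants, per the SHOULD clauses in items 3 and 5.1.
def completion_cause(timer, match, can_detect_partial=True):
    if timer == "no-input":
        return "no-input-timeout"
    if timer == "speech-complete":
        return "success"
    if timer == "speech-incomplete":
        return "partial-match" if can_detect_partial else "no-match"
    if timer == "recognition-timeout":
        if match == "full":
            return "success-maxtime"
        if match == "partial" and can_detect_partial:
            return "partial-match-maxtime"
        return "no-match-maxtime"
    raise ValueError("unknown timer: %s" % timer)
```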
+
+ For recognition in hotword mode:
+
+ Note that for recognition in hotword mode the START-OF-INPUT event is
+ not generated when speech or a DTMF digit is detected.
+
+ 1. When the No-Input-Timer expires, the recognizer MUST complete
+ with a Completion-Cause code of "no-input-timeout".
+
+ 2. If at any point a match occurs, the RECOGNIZE MUST complete with
+ a Completion-Cause code of "success".
+
+ 3. When the Recognition-Timeout expires and there is not a match,
+ the RECOGNIZE MUST complete with a Completion-Cause code of
+ "hotword-maxtime".
+
+ 4. When the Recognition-Timeout expires and there is a match, the
+ RECOGNIZE MUST complete with a Completion-Cause code of "success-
+ maxtime".
+
+ 5. When the Recognition-Timeout is running but the detected speech/
+ DTMF has not resulted in a match, the Recognition-Timeout MUST be
+ stopped and reset. It MUST then be restarted when speech/DTMF is
+ again detected.
+
+ Below is a complete example of using RECOGNIZE. It shows the call to
+ RECOGNIZE, the IN-PROGRESS and START-OF-INPUT status messages, and
+ the final RECOGNITION-COMPLETE message containing the result.
+
+
+
+
+
+
+
+
+
+ C->S:MRCP/2.0 ... RECOGNIZE 543257
+ Channel-Identifier:32AECB23433801@speechrecog
+ Confidence-Threshold:0.9
+ Content-Type:application/srgs+xml
+ Content-ID:<request1@form-level.store>
+ Content-Length:...
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xmlns="http://www.w3.org/2001/06/grammar"
+ xml:lang="en-US" version="1.0" root="request">
+
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+
+ </grammar>
+
+ S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechrecog
+
+ S->C:MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechrecog
+
+ S->C:MRCP/2.0 ... RECOGNITION-COMPLETE 543257 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Completion-Cause:000 success
+ Waveform-URI:<http://web.media.com/session123/audio.wav>;
+ size=424252;duration=2543
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+
+
+
+
+
+
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:ex="http://www.example.com/example"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <instance name="Person">
+ <ex:Person>
+ <ex:Name> Andre Roy </ex:Name>
+ </ex:Person>
+ </instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
+
+ Below is an example of calling RECOGNIZE with a different grammar.
+ No status or completion messages are shown in this example, although
+ they would of course occur in normal usage.
+
+ C->S: MRCP/2.0 ... RECOGNIZE 543257
+ Channel-Identifier:32AECB23433801@speechrecog
+ Confidence-Threshold:0.9
+ Fetch-Timeout:20
+ Content-Type:application/srgs+xml
+ Content-Length:...
+
+     <?xml version="1.0"?>
+
+     <grammar xmlns="http://www.w3.org/2001/06/grammar"
+              xml:lang="en-US" version="1.0" mode="voice"
+              root="rule_list">
+
+     <rule id="rule_list" scope="public">
+       <one-of>
+         <item weight="10">
+           <ruleref uri=
+            "http://grammar.example.com/world-cities.grxml#canada"/>
+         </item>
+         <item weight="1.5">
+           <ruleref uri=
+            "http://grammar.example.com/world-cities.grxml#america"/>
+         </item>
+         <item weight="0.5">
+           <ruleref uri=
+            "http://grammar.example.com/world-cities.grxml#india"/>
+         </item>
+       </one-of>
+     </rule>
+
+     </grammar>
+
+
+
+
+
+
+
+
+
+
+9.10. STOP
+
+ The STOP method from the client to the server tells the resource to
+ stop recognition if a request is active. If a RECOGNIZE request is
+ active and the STOP request successfully terminated it, then the
+ response header section contains an Active-Request-Id-List header
+ field containing the request-id of the RECOGNIZE request that was
+ terminated. In this case, no RECOGNITION-COMPLETE event is sent for
+ the terminated request. If there was no recognition active, then the
+ response MUST NOT contain an Active-Request-Id-List header field.
+ Either way, the response MUST contain a status-code of 200 "Success".
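+   A client might act on the STOP response as in the following
+   non-normative sketch, which assumes the response header fields have
+   already been parsed into a dictionary:

```python
# The presence of Active-Request-Id-List tells the client which
# RECOGNIZE request(s) were terminated, and therefore that no
# RECOGNITION-COMPLETE event will arrive for them; its absence means
# nothing was active when STOP was processed.
def terminated_requests(stop_response_headers):
    ids = stop_response_headers.get("Active-Request-Id-List")
    if ids is None:
        return []  # no recognition was active
    return [i.strip() for i in ids.split(",")]
```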
+
+ C->S: MRCP/2.0 ... RECOGNIZE 543257
+ Channel-Identifier:32AECB23433801@speechrecog
+ Confidence-Threshold:0.9
+ Content-Type:application/srgs+xml
+ Content-ID:<request1@form-level.store>
+ Content-Length:...
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xmlns="http://www.w3.org/2001/06/grammar"
+ xml:lang="en-US" version="1.0" root="request">
+
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+ </grammar>
+
+ S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechrecog
+
+   C->S: MRCP/2.0 ... STOP 543258
+ Channel-Identifier:32AECB23433801@speechrecog
+
+
+
+
+
+ S->C: MRCP/2.0 ... 543258 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Active-Request-Id-List:543257
+
+9.11. GET-RESULT
+
+ The GET-RESULT method from the client to the server MAY be issued
+ when the recognizer resource is in the recognized state. This
+ request allows the client to retrieve results for a completed
+ recognition. This is useful if the client decides it wants more
+ alternatives or more information. When the server receives this
+ request, it re-computes and returns the results according to the
+ recognition constraints provided in the GET-RESULT request.
+
+ The GET-RESULT request can specify constraints such as a different
+ confidence-threshold or n-best-list-length. This capability is
+ OPTIONAL for MRCPv2 servers and the automatic speech recognition
+ engine in the server MUST return a status of unsupported feature if
+ not supported.
+
+ C->S: MRCP/2.0 ... GET-RESULT 543257
+ Channel-Identifier:32AECB23433801@speechrecog
+ Confidence-Threshold:0.9
+
+
+ S->C: MRCP/2.0 ... 543257 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:ex="http://www.example.com/example"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <instance name="Person">
+ <ex:Person>
+ <ex:Name> Andre Roy </ex:Name>
+ </ex:Person>
+ </instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
+
+
+
+
+
+
+
+
+
+
+9.12. START-OF-INPUT
+
+ This is an event from the server to the client indicating that the
+ recognizer resource has detected speech or a DTMF digit in the media
+ stream. This event is useful in implementing kill-on-barge-in
+ scenarios when a synthesizer resource is in a different session from
+ the recognizer resource and hence is not aware of an incoming audio
+ source (see Section 8.4.2). In these cases, it is up to the client
+ to act as an intermediary and respond to this event by issuing a
+ BARGE-IN-OCCURRED event to the synthesizer resource. The recognizer
+ resource also MUST send a Proxy-Sync-Id header field with a unique
+ value for this event.
+
+ This event MUST be generated by the server, irrespective of whether
+ or not the synthesizer and recognizer are on the same server.
+
+9.13. START-INPUT-TIMERS
+
+ This request is sent from the client to the recognizer resource when
+ it knows that a kill-on-barge-in prompt has finished playing (see
+ Section 8.4.2). This is useful in the scenario when the recognition
+ and synthesizer engines are not in the same session. When a kill-on-
+ barge-in prompt is being played, the client may want a RECOGNIZE
+ request to be simultaneously active so that it can detect and
+ implement kill-on-barge-in. But at the same time the client doesn't
+ want the recognizer to start the no-input timers until the prompt is
+ finished. The Start-Input-Timers header field in the RECOGNIZE
+ request allows the client to say whether or not the timers should be
+ started immediately. If not, the recognizer resource MUST NOT start
+ the timers until the client sends a START-INPUT-TIMERS method to the
+ recognizer.
+
+9.14. RECOGNITION-COMPLETE
+
+ This is an event from the recognizer resource to the client
+ indicating that the recognition completed. The recognition result is
+ sent in the body of the MRCPv2 message. The request-state field MUST
+ be COMPLETE indicating that this is the last event with that
+ request-id and that the request with that request-id is now complete.
+ The server MUST maintain the recognizer context containing the
+ results and the audio waveform input of that recognition until the
+ next RECOGNIZE request is issued for that resource or the session
+ terminates. If the server returns a URI to the audio waveform, it
+ MUST do so in a Waveform-URI header field in the RECOGNITION-COMPLETE
+ event. The client can use this URI to retrieve or playback the
+ audio.
+
+
+
+
+
+
+
+   Note that if an enrollment session was active, the RECOGNITION-
+   COMPLETE event can contain either recognition or enrollment results,
+   depending on what was spoken.  The following example shows a complete
+   exchange with a recognition result.
+
+ C->S: MRCP/2.0 ... RECOGNIZE 543257
+ Channel-Identifier:32AECB23433801@speechrecog
+ Confidence-Threshold:0.9
+ Content-Type:application/srgs+xml
+ Content-ID:<request1@form-level.store>
+ Content-Length:...
+
+ <?xml version="1.0"?>
+
+ <!-- the default grammar language is US English -->
+ <grammar xmlns="http://www.w3.org/2001/06/grammar"
+ xml:lang="en-US" version="1.0" root="request">
+
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+ </grammar>
+
+ S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechrecog
+
+ S->C: MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechrecog
+
+
+
+
+
+
+
+
+
+
+
+
+ S->C: MRCP/2.0 ... RECOGNITION-COMPLETE 543257 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Completion-Cause:000 success
+ Waveform-URI:<http://web.media.com/session123/audio.wav>;
+ size=342456;duration=25435
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:ex="http://www.example.com/example"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <instance name="Person">
+ <ex:Person>
+ <ex:Name> Andre Roy </ex:Name>
+ </ex:Person>
+ </instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
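+   As a non-normative aid, the fields of an NLSML body like the one
+   above can be extracted with Python's standard XML library.  The main
+   trap is namespace handling: <interpretation>, <instance>, and <input>
+   live in the urn:ietf:params:xml:ns:mrcpv2 namespace, while the
+   example's application data uses its own namespace:

```python
import xml.etree.ElementTree as ET

NLSML = "urn:ietf:params:xml:ns:mrcpv2"
EX = "http://www.example.com/example"

body = """<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="session:request1@form-level.store">
  <interpretation>
    <instance name="Person">
      <ex:Person>
        <ex:Name> Andre Roy </ex:Name>
      </ex:Person>
    </instance>
    <input> may I speak to Andre Roy </input>
  </interpretation>
</result>"""

root = ET.fromstring(body)
grammar = root.get("grammar")  # attribute names are not namespaced here
utterance = root.find(
    "{%s}interpretation/{%s}input" % (NLSML, NLSML)).text.strip()
name = root.find(".//{%s}Name" % EX).text.strip()
```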
+
+ If the result were instead an enrollment result, the final message
+ from the server above could have been:
+
+ S->C: MRCP/2.0 ... RECOGNITION-COMPLETE 543257 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Completion-Cause:000 success
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+     <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ grammar="Personal-Grammar-URI">
+ <enrollment-result>
+ <num-clashes> 2 </num-clashes>
+ <num-good-repetitions> 1 </num-good-repetitions>
+ <num-repetitions-still-needed>
+ 1
+ </num-repetitions-still-needed>
+ <consistency-status> consistent </consistency-status>
+ <clash-phrase-ids>
+ <item> Jeff </item> <item> Andre </item>
+ </clash-phrase-ids>
+ <transcriptions>
+ <item> m ay b r ow k er </item>
+ <item> m ax r aa k ah </item>
+ </transcriptions>
+
+
+
+
+
+ <confusable-phrases>
+ <item>
+ <phrase> call </phrase>
+ <confusion-level> 10 </confusion-level>
+ </item>
+ </confusable-phrases>
+ </enrollment-result>
+ </result>
+
+9.15. START-PHRASE-ENROLLMENT
+
+ The START-PHRASE-ENROLLMENT method from the client to the server
+ starts a new phrase enrollment session during which the client can
+ call RECOGNIZE multiple times to enroll a new utterance in a grammar.
+ An enrollment session consists of a set of calls to RECOGNIZE in
+ which the caller speaks a phrase several times so the system can
+ "learn" it. The phrase is then added to a personal grammar (speaker-
+ trained grammar), so that the system can recognize it later.
+
+ Only one phrase enrollment session can be active at a time for a
+ resource. The Personal-Grammar-URI identifies the grammar that is
+ used during enrollment to store the personal list of phrases. Once
+ RECOGNIZE is called, the result is returned in a RECOGNITION-COMPLETE
+ event and will contain either an enrollment result OR a recognition
+ result for a regular recognition.
+
+ Calling END-PHRASE-ENROLLMENT ends the ongoing phrase enrollment
+ session, which is typically done after a sequence of successful calls
+ to RECOGNIZE. This method can be called to commit the new phrase to
+ the personal grammar or to abort the phrase enrollment session.
+
+ The grammar to contain the new enrolled phrase, specified by
+ Personal-Grammar-URI, is created if it does not exist. Also, the
+ personal grammar MUST ONLY contain phrases added via a phrase
+ enrollment session.
+
+ The Phrase-ID passed to this method is used to identify this phrase
+ in the grammar and will be returned as the speech input when doing a
+ RECOGNIZE on the grammar. The Phrase-NL similarly is returned in a
+ RECOGNITION-COMPLETE event in the same manner as other Natural
+ Language (NL) in a grammar. The tag-format of this NL is
+ implementation specific.
+
+ If the client has specified Save-Best-Waveform as true, then the
+ response after ending the phrase enrollment session MUST contain the
+ location/URI of a recording of the best repetition of the learned
+ phrase.
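+   The enrollment flow this section describes can be sketched as a
+   client-side loop.  The client object and its send_* methods below are
+   illustrative assumptions, not RFC-defined APIs; a real client would
+   issue the corresponding MRCPv2 requests and wait for the
+   RECOGNITION-COMPLETE events:

```python
class EnrollResult:
    def __init__(self, still_needed):
        self.num_repetitions_still_needed = still_needed

class FakeClient:
    """Stand-in for server interaction: pretends the server needs two
    consistent repetitions before the phrase can be committed."""
    def __init__(self):
        self.still_needed = 2
        self.log = []
    def send_start_phrase_enrollment(self, **headers):
        self.log.append("START-PHRASE-ENROLLMENT")
    def send_recognize(self, enroll_utterance):
        self.log.append("RECOGNIZE")
        self.still_needed -= 1
        return EnrollResult(self.still_needed)
    def send_end_phrase_enrollment(self):
        self.log.append("END-PHRASE-ENROLLMENT")

def enroll_phrase(client, grammar_uri, phrase_id):
    client.send_start_phrase_enrollment(
        personal_grammar_uri=grammar_uri, phrase_id=phrase_id)
    # Repeat RECOGNIZE until <num-repetitions-still-needed> reaches 0...
    while client.send_recognize(
            enroll_utterance=True).num_repetitions_still_needed > 0:
        pass
    # ...then commit the phrase by ending the session (the client could
    # instead abort via the Abort-Phrase-Enrollment header field).
    client.send_end_phrase_enrollment()
```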
+
+
+
+
+
+
+ C->S: MRCP/2.0 ... START-PHRASE-ENROLLMENT 543258
+ Channel-Identifier:32AECB23433801@speechrecog
+ Num-Min-Consistent-Pronunciations:2
+ Consistency-Threshold:30
+ Clash-Threshold:12
+ Personal-Grammar-URI:<personal grammar uri>
+ Phrase-Id:<phrase id>
+ Phrase-NL:<NL phrase>
+ Weight:1
+ Save-Best-Waveform:true
+
+ S->C: MRCP/2.0 ... 543258 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+
+9.16. ENROLLMENT-ROLLBACK
+
+ The ENROLLMENT-ROLLBACK method discards the last live utterance from
+ the RECOGNIZE operation. The client can invoke this method when the
+ caller provides undesirable input such as non-speech noises, side-
+ speech, commands, utterance from the RECOGNIZE grammar, etc. Note
+ that this method does not provide a stack of rollback states.
+ Executing ENROLLMENT-ROLLBACK twice in succession without an
+ intervening recognition operation has no effect the second time.
+
+ C->S: MRCP/2.0 ... ENROLLMENT-ROLLBACK 543261
+ Channel-Identifier:32AECB23433801@speechrecog
+
+ S->C: MRCP/2.0 ... 543261 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+
+9.17. END-PHRASE-ENROLLMENT
+
+ The client MAY call the END-PHRASE-ENROLLMENT method ONLY during an
+ active phrase enrollment session. It MUST NOT be called during an
+ ongoing RECOGNIZE operation. To commit the new phrase in the
+ grammar, the client MAY call this method once successive calls to
+ RECOGNIZE have succeeded and Num-Repetitions-Still-Needed has been
+ returned as 0 in the RECOGNITION-COMPLETE event. Alternatively, the
+ client MAY abort the phrase enrollment session by calling this method
+ with the Abort-Phrase-Enrollment header field.
+
+ If the client has specified Save-Best-Waveform as "true" in the
+ START-PHRASE-ENROLLMENT request, then the response MUST contain a
+ Waveform-URI header whose value is the location/URI of a recording of
+ the best repetition of the learned phrase.
+
+ C->S: MRCP/2.0 ... END-PHRASE-ENROLLMENT 543262
+ Channel-Identifier:32AECB23433801@speechrecog
+
+
+
+
+
+ S->C: MRCP/2.0 ... 543262 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Waveform-URI:<http://mediaserver.com/recordings/file1324.wav>;
+ size=242453;duration=25432
+
+9.18. MODIFY-PHRASE
+
+ The MODIFY-PHRASE method sent from the client to the server is used
+ to change the phrase ID, NL phrase, and/or weight for a given phrase
+ in a personal grammar.
+
+ If no fields are supplied, then calling this method has no effect.
+
+ C->S: MRCP/2.0 ... MODIFY-PHRASE 543265
+ Channel-Identifier:32AECB23433801@speechrecog
+ Personal-Grammar-URI:<personal grammar uri>
+ Phrase-Id:<phrase id>
+ New-Phrase-Id:<new phrase id>
+ Phrase-NL:<NL phrase>
+ Weight:1
+
+ S->C: MRCP/2.0 ... 543265 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+
+9.19. DELETE-PHRASE
+
+ The DELETE-PHRASE method sent from the client to the server is used
+ to delete a phrase that is in a personal grammar and was added through
+ voice enrollment or text enrollment. If the specified phrase does
+ not exist, this method has no effect.
+
+ C->S: MRCP/2.0 ... DELETE-PHRASE 543266
+ Channel-Identifier:32AECB23433801@speechrecog
+ Personal-Grammar-URI:<personal grammar uri>
+ Phrase-Id:<phrase id>
+
+ S->C: MRCP/2.0 ... 543266 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+
+9.20. INTERPRET
+
+ The INTERPRET method from the client to the server takes as input an
+ Interpret-Text header field containing the text for which the
+ semantic interpretation is desired, and returns, via the
+ INTERPRETATION-COMPLETE event, an interpretation result that is very
+ similar to the one returned from a RECOGNIZE method invocation. Only
+ portions of the result relevant to acoustic matching are excluded
+ from the result. The Interpret-Text header field MUST be included in
+ the INTERPRET request.
+
+ Recognizer grammar data is treated in the same way as it is when
+ issuing a RECOGNIZE method call.
+
+ If a RECOGNIZE, RECORD, or another INTERPRET operation is already in
+ progress for the resource, the server MUST reject the request with a
+ response having a status-code of 402 "Method not valid in this
+ state", and a COMPLETE request state.
+
+ C->S: MRCP/2.0 ... INTERPRET 543266
+ Channel-Identifier:32AECB23433801@speechrecog
+ Interpret-Text:may I speak to Andre Roy
+ Content-Type:application/srgs+xml
+ Content-ID:<request1@form-level.store>
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <!-- the default grammar language is US English -->
+ <grammar xmlns="http://www.w3.org/2001/06/grammar"
+ xml:lang="en-US" version="1.0" root="request">
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+ </grammar>
+
+ S->C: MRCP/2.0 ... 543266 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechrecog
+
+ S->C: MRCP/2.0 ... INTERPRETATION-COMPLETE 543266 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Completion-Cause:000 success
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:ex="http://www.example.com/example"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <instance name="Person">
+ <ex:Person>
+ <ex:Name> Andre Roy </ex:Name>
+ </ex:Person>
+ </instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
+
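+ For illustration only, the NLSML result above can be consumed with
+ any XML parser; the following sketch (not part of this
+ specification) extracts the <input> text and the instance data,
+ assuming the namespace URIs used in the example.

```python
import xml.etree.ElementTree as ET

# Illustrative sketch only: the namespace URIs below are those used in
# the INTERPRET example; the <instance> payload is application specific
# and here follows the example's ex:Person structure.
NS = "urn:ietf:params:xml:ns:mrcpv2"
EX = "http://www.example.com/example"

nlsml = """<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="session:request1@form-level.store">
  <interpretation>
    <instance name="Person">
      <ex:Person>
        <ex:Name> Andre Roy </ex:Name>
      </ex:Person>
    </instance>
    <input> may I speak to Andre Roy </input>
  </interpretation>
</result>"""

root = ET.fromstring(nlsml)
interp = root.find(f"{{{NS}}}interpretation")
# Pull out the matched name and the literal input text.
name = interp.findtext(f"{{{NS}}}instance/{{{EX}}}Person/{{{EX}}}Name")
spoken = interp.findtext(f"{{{NS}}}input")
print(name.strip())    # Andre Roy
print(spoken.strip())  # may I speak to Andre Roy
```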
+9.21. INTERPRETATION-COMPLETE
+
+ This event from the recognizer resource to the client indicates that
+ the INTERPRET operation is complete. The interpretation result is
+ sent in the body of the MRCP message. The request state MUST be set
+ to COMPLETE.
+
+ The Completion-Cause header field MUST be included in this event and
+ MUST be set to an appropriate value from the list of cause codes.
+
+ C->S: MRCP/2.0 ... INTERPRET 543266
+ Channel-Identifier:32AECB23433801@speechrecog
+ Interpret-Text:may I speak to Andre Roy
+ Content-Type:application/srgs+xml
+ Content-ID:<request1@form-level.store>
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <!-- the default grammar language is US English -->
+ <grammar xmlns="http://www.w3.org/2001/06/grammar"
+ xml:lang="en-US" version="1.0" root="request">
+ <!-- single language attachment to tokens -->
+ <rule id="yes">
+ <one-of>
+ <item xml:lang="fr-CA">oui</item>
+ <item xml:lang="en-US">yes</item>
+ </one-of>
+ </rule>
+
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ may I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+ </grammar>
+
+ S->C: MRCP/2.0 ... 543266 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechrecog
+
+ S->C: MRCP/2.0 ... INTERPRETATION-COMPLETE 543266 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Completion-Cause:000 success
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:ex="http://www.example.com/example"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <instance name="Person">
+ <ex:Person>
+ <ex:Name> Andre Roy </ex:Name>
+ </ex:Person>
+ </instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
+
+9.22. DTMF Detection
+
+ Digits received as DTMF tones are delivered to the recognition
+ resource in the MRCPv2 server in the RTP stream according to RFC 4733
+ [RFC4733]. The Automatic Speech Recognizer (ASR) MUST support RFC
+ 4733 to recognize digits, and it MAY support recognizing DTMF tones
+ [Q.23] in the audio.
+
+10. Recorder Resource
+
+ This resource captures received audio and video and stores it as
+ content pointed to by a URI. The main uses of a recorder are
+
+ 1. to capture speech audio that may be submitted for recognition at
+ a later time, and
+
+ 2. to record voice or video mails.
+
+ Both these applications require functionality above and beyond that
+ specified by protocols such as RTSP [RFC2326]. This includes audio
+ endpointing (i.e., detecting speech or silence). Support for video
+ is OPTIONAL and is mainly for capturing video mails, which may
+ require the speech or audio processing mentioned above.
+
+ A recorder MUST provide endpointing capabilities for suppressing
+ silence at the beginning and end of a recording, and it MAY also
+ suppress silence in the middle of a recording. If such suppression
+ is done, the recorder MUST maintain timing metadata to indicate the
+ actual time stamps of the recorded media.
+
+ See the discussion on the sensitivity of saved waveforms in
+ Section 12.
+
+10.1. Recorder State Machine
+
+ Idle Recording
+ State State
+ | |
+ |---------RECORD------->|
+ | |
+ |<------STOP------------|
+ | |
+ |<--RECORD-COMPLETE-----|
+ | |
+ | |--------|
+ | START-OF-INPUT |
+ | |------->|
+ | |
+ | |--------|
+ | START-INPUT-TIMERS |
+ | |------->|
+ | |
+
+ Recorder State Machine
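+ For illustration only, the state machine above can be expressed as
+ a transition table; the state, method, and event names are taken
+ from the diagram, and the helper itself is hypothetical.

```python
# Illustrative transition table for the recorder state machine above.
# State names and method/event names follow the diagram; this helper
# is not defined by the specification.
RECORDER_TRANSITIONS = {
    ("idle", "RECORD"): "recording",
    ("recording", "STOP"): "idle",
    ("recording", "RECORD-COMPLETE"): "idle",
    ("recording", "START-OF-INPUT"): "recording",
    ("recording", "START-INPUT-TIMERS"): "recording",
}

def next_state(state: str, event: str) -> str:
    try:
        return RECORDER_TRANSITIONS[(state, event)]
    except KeyError:
        # e.g., a second RECORD while already recording draws a
        # 402 response per Section 10.6
        raise ValueError(f"{event} not valid in {state} state")
```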
+
+10.2. Recorder Methods
+
+ The recorder resource supports the following methods.
+
+ recorder-method = "RECORD"
+ / "STOP"
+ / "START-INPUT-TIMERS"
+
+10.3. Recorder Events
+
+ The recorder resource can generate the following events.
+
+ recorder-event = "START-OF-INPUT"
+ / "RECORD-COMPLETE"
+
+10.4. Recorder Header Fields
+
+ Method invocations for the recorder resource can contain resource-
+ specific header fields containing request options and information to
+ augment the Method, Response, or Event message it is associated with.
+
+ recorder-header = sensitivity-level
+ / no-input-timeout
+ / completion-cause
+ / completion-reason
+ / failed-uri
+ / failed-uri-cause
+ / record-uri
+ / media-type
+ / max-time
+ / trim-length
+ / final-silence
+ / capture-on-speech
+ / ver-buffer-utterance
+ / start-input-timers
+ / new-audio-channel
+
+10.4.1. Sensitivity-Level
+
+ To filter out background noise and not mistake it for speech, the
+ recorder can support a variable level of sound sensitivity. The
+ Sensitivity-Level header field is a float value between 0.0 and 1.0
+ and allows the client to set the sensitivity level for the recorder.
+ This header field MAY occur in RECORD, SET-PARAMS, or GET-PARAMS. A
+ higher value for this header field means higher sensitivity. The
+ default value for this header field is implementation specific.
+
+ sensitivity-level = "Sensitivity-Level" ":" FLOAT CRLF
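+ For illustration only, a client might validate the value before
+ sending it; the helper below is hypothetical and simply enforces
+ the 0.0 to 1.0 range given above.

```python
# Hypothetical client-side check (not mandated by the specification)
# that a Sensitivity-Level value fits the FLOAT range above before it
# is sent in RECORD, SET-PARAMS, or GET-PARAMS.
def sensitivity_level(value: float) -> str:
    if not 0.0 <= value <= 1.0:
        raise ValueError("Sensitivity-Level must be between 0.0 and 1.0")
    return f"Sensitivity-Level:{value}"

print(sensitivity_level(0.7))  # Sensitivity-Level:0.7
```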
+
+10.4.2. No-Input-Timeout
+
+ When recording is started and there is no speech detected for a
+ certain period of time, the recorder can send a RECORD-COMPLETE event
+ to the client and terminate the record operation. The No-Input-
+ Timeout header field can set this timeout value. The value is in
+ milliseconds. This header field MAY occur in RECORD, SET-PARAMS, or
+ GET-PARAMS. The value for this header field ranges from 0 to an
+ implementation-specific maximum value. The default value for this
+ header field is implementation specific.
+
+ no-input-timeout = "No-Input-Timeout" ":" 1*19DIGIT CRLF
+
+10.4.3. Completion-Cause
+
+ This header field MUST be part of a RECORD-COMPLETE event from the
+ recorder resource to the client. This indicates the reason behind
+ the RECORD method completion. This header field MUST be sent in the
+ RECORD responses if they return with a failure status and a COMPLETE
+ state. In the ABNF below, the 'cause-code' contains a numerical
+ value selected from the Cause-Code column of the following table.
+ The 'cause-name' contains the corresponding token selected from the
+ Cause-Name column.
+
+ completion-cause = "Completion-Cause" ":" cause-code SP
+ cause-name CRLF
+ cause-code = 3DIGIT
+ cause-name = *VCHAR
+
+ +------------+-----------------------+------------------------------+
+ | Cause-Code | Cause-Name | Description |
+ +------------+-----------------------+------------------------------+
+ | 000 | success-silence | RECORD completed with a |
+ | | | silence at the end. |
+ | 001 | success-maxtime | RECORD completed after |
+ | | | reaching maximum recording |
+ | | | time specified in record |
+ | | | method. |
+ | 002 | no-input-timeout | RECORD failed due to no |
+ | | | input. |
+ | 003 | uri-failure | Failure accessing the record |
+ | | | URI. |
+ | 004 | error | RECORD request terminated |
+ | | | prematurely due to a |
+ | | | recorder error. |
+ +------------+-----------------------+------------------------------+
+
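+ For illustration only, the table above maps directly to a lookup
+ table; the helper below is hypothetical and formats a
+ Completion-Cause header field from a cause-code.

```python
# Hypothetical helper (not defined by the protocol) that formats a
# recorder Completion-Cause header field from the table above.
RECORDER_CAUSE_NAMES = {
    0: "success-silence",
    1: "success-maxtime",
    2: "no-input-timeout",
    3: "uri-failure",
    4: "error",
}

def completion_cause(code: int) -> str:
    # cause-code is 3DIGIT, followed by a space and the cause-name
    return f"Completion-Cause:{code:03d} {RECORDER_CAUSE_NAMES[code]}"

print(completion_cause(0))  # Completion-Cause:000 success-silence
```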
+10.4.4. Completion-Reason
+
+ This header field MAY be present in a RECORD-COMPLETE event coming
+ from the recorder resource to the client. It contains the reason
+ text behind the RECORD request completion. This header field
+ communicates text describing the reason for the failure.
+
+ The completion reason text is provided for client use in logs and for
+ debugging and instrumentation purposes. Clients MUST NOT interpret
+ the completion reason text.
+
+ completion-reason = "Completion-Reason" ":"
+ quoted-string CRLF
+
+10.4.5. Failed-URI
+
+ When a recorder method needs to post the audio to a URI and access to
+ the URI fails, the server MUST provide the failed URI in this header
+ field in the method response.
+
+ failed-uri = "Failed-URI" ":" absoluteURI CRLF
+
+10.4.6. Failed-URI-Cause
+
+ When a recorder method needs to post the audio to a URI and access to
+ the URI fails, the server MAY provide the URI-specific or protocol-
+ specific response code through this header field in the method
+ response. The value encoding is UTF-8 (RFC 3629 [RFC3629]) to
+ accommodate any access protocol -- some access protocols might have a
+ response string instead of a numeric response code.
+
+ failed-uri-cause = "Failed-URI-Cause" ":" 1*UTFCHAR
+ CRLF
+
+10.4.7. Record-URI
+
+ When a recorder method contains this header field, the server MUST
+ capture the audio and store it. If the header field is present but
+ specified with no value, the server MUST store the content locally
+ and generate a URI that points to it. This URI is then returned in
+ either the STOP response or the RECORD-COMPLETE event. If the header
+ field in the RECORD method specifies a URI, the server MUST attempt
+ to capture and store the audio at that location. If this header
+ field is not specified in the RECORD request, the server MUST capture
+ the audio, MUST encode it, and MUST send it in the STOP response or
+ the RECORD-COMPLETE event as a message body. In this case, the
+ response carrying the audio content MUST include a Content-ID (cid)
+ [RFC2392] value in this header field pointing to the Content-ID in
+ the message body.
+
+ The server MUST also return the size in octets and the duration in
+ milliseconds of the recorded audio waveform as parameters associated
+ with the header field.
+
+ Implementations MUST support 'http' [RFC2616], 'https' [RFC2818],
+ 'file' [RFC3986], and 'cid' [RFC2392] schemes in the URI. Note that
+ implementations already exist that support other schemes.
+
+ record-uri = "Record-URI" ":" ["<" uri ">"
+ ";" "size" "=" 1*19DIGIT
+ ";" "duration" "=" 1*19DIGIT] CRLF
+
+10.4.8. Media-Type
+
+ A RECORD method MUST contain this header field, which specifies to
+ the server the media type of the captured audio or video.
+
+ media-type = "Media-Type" ":" media-type-value
+ CRLF
+
+10.4.9. Max-Time
+
+ When recording is started, this header field specifies the maximum
+ length of the recording in milliseconds, calculated from the time
+ the actual capture and store begins, which is not necessarily the
+ time the RECORD method is received. The duration is measured before
+ any silence suppression is applied by the recorder resource.
+ After this time, the recording stops and the server MUST return a
+ RECORD-COMPLETE event to the client having a request-state of
+ COMPLETE. This header field MAY occur in RECORD, SET-PARAMS, or GET-
+ PARAMS. The value for this header field ranges from 0 to an
+ implementation-specific maximum value. A value of 0 means infinity,
+ and hence the recording continues until one or more of the other stop
+ conditions are met. The default value for this header field is 0.
+
+ max-time = "Max-Time" ":" 1*19DIGIT CRLF
+
+10.4.10. Trim-Length
+
+ This header field MAY be sent on a STOP method and specifies the
+ length of audio to be trimmed from the end of the recording after the
+ stop. The length is interpreted to be in milliseconds. The default
+ value for this header field is 0.
+
+ trim-length = "Trim-Length" ":" 1*19DIGIT CRLF
+
+10.4.11. Final-Silence
+
+ When the recorder is started and the actual capture begins, this
+ header field specifies the length of silence in the audio that is to
+ be interpreted as the end of the recording. This header field MAY
+ occur in RECORD, SET-PARAMS, or GET-PARAMS. The value for this
+ header field ranges from 0 to an implementation-specific maximum
+ value and is interpreted to be in milliseconds. A value of 0 means
+ infinity, and hence the recording will continue until one of the
+ other stop conditions is met. The default value for this header
+ field is implementation specific.
+
+ final-silence = "Final-Silence" ":" 1*19DIGIT CRLF
+
+10.4.12. Capture-On-Speech
+
+ If "false", the recorder MUST start capturing immediately when
+ started. If "true", the recorder MUST wait for the endpointing
+ functionality to detect speech before it starts capturing. This
+ header field MAY occur in RECORD, SET-PARAMS, or GET-PARAMS. The
+ value for this header field is a Boolean. The default value for this
+ header field is "false".
+
+ capture-on-speech = "Capture-On-Speech" ":" BOOLEAN CRLF
+
+10.4.13. Ver-Buffer-Utterance
+
+ This header field is the same as the one described for the verifier
+ resource (see Section 11.4.14). This tells the server to buffer the
+ utterance associated with this recording request into the
+ verification buffer. Sending this header field is permitted only if
+ the verification buffer is for the session. This buffer is shared
+ across resources within a session. It gets instantiated when a
+ verifier resource is added to this session and is released when the
+ verifier resource is released from the session.
+
+10.4.14. Start-Input-Timers
+
+ This header field MAY be sent as part of the RECORD request. A value
+ of "false" tells the recorder resource to start the operation, but
+ not to start the no-input timer until the client sends a START-INPUT-
+ TIMERS request to the recorder resource. This is useful in the
+ scenario when the recorder and synthesizer resources are not part of
+ the same session. When a kill-on-barge-in prompt is being played,
+ the client may want the RECORD request to be simultaneously active so
+ that it can detect and implement kill-on-barge-in (see
+ Section 8.4.2). But at the same time, the client doesn't want the
+ recorder resource to start the no-input timers until the prompt is
+ finished. The default value is "true".
+
+ start-input-timers = "Start-Input-Timers" ":"
+ BOOLEAN CRLF
+
+10.4.15. New-Audio-Channel
+
+ This header field is the same as the one described for the recognizer
+ resource (see Section 9.4.23).
+
+10.5. Recorder Message Body
+
+ If the RECORD request did not have a Record-URI header field, the
+ STOP response or the RECORD-COMPLETE event MUST contain a message
+ body carrying the captured audio. In this case, the message carrying
+ the audio content has a Record-URI header field with a Content ID
+ value pointing to the message body entity that contains the recorded
+ audio. See Section 10.4.7 for details.
+
+10.6. RECORD
+
+ The RECORD request places the recorder resource in the recording
+ state. Depending on the header fields specified in the RECORD
+ method, the resource may start recording the audio immediately or
+ wait for the endpointing functionality to detect speech in the audio.
+ The audio is then made available to the client either in the message
+ body or as specified by Record-URI.
+
+ The server MUST support the 'https' URI scheme and MAY support other
+ schemes. Note that, due to the sensitive nature of voice recordings,
+ any protocols used for dereferencing SHOULD employ integrity and
+ confidentiality, unless other means, such as use of a controlled
+ environment (see Section 4.2), are employed.
+
+ If a RECORD operation is already in progress, invoking this method
+ causes the server to issue a response having a status-code of 402
+ "Method not valid in this state" and a request-state of COMPLETE.
+
+ If the Record-URI is not valid, a status-code of 404 "Illegal Value
+ for Header Field" is returned in the response. If it is impossible
+ for the server to create the requested stored content, a status-code
+ of 407 "Method or Operation Failed" is returned.
+
+ If the type specified in the Media-Type header field is not
+ supported, the server MUST respond with a status-code of 409
+ "Unsupported Header Field Value" with the Media-Type header field in
+ its response.
+
+ When the recording operation is initiated, the response indicates an
+ IN-PROGRESS request state. The server MAY generate a subsequent
+ START-OF-INPUT event when speech is detected. Upon completion of the
+ recording operation, the server generates a RECORD-COMPLETE event.
+
+ C->S: MRCP/2.0 ... RECORD 543257
+ Channel-Identifier:32AECB23433802@recorder
+ Record-URI:<file://mediaserver/recordings/myfile.wav>
+ Media-Type:audio/wav
+ Capture-On-Speech:true
+ Final-Silence:300
+ Max-Time:6000
+
+ S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@recorder
+
+ S->C: MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@recorder
+
+ S->C: MRCP/2.0 ... RECORD-COMPLETE 543257 COMPLETE
+ Channel-Identifier:32AECB23433802@recorder
+ Completion-Cause:000 success-silence
+ Record-URI:<file://mediaserver/recordings/myfile.wav>;
+ size=242552;duration=25645
+
+ RECORD Example
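+ For illustration only, note that the message-length field in the
+ start-line (elided as "..." in the examples) counts every octet of
+ the message, including its own digits, so serializing a request
+ involves a small fixed-point computation. The sketch below is
+ hypothetical.

```python
# Illustrative serialization of the RECORD request above. The
# message-length covers the entire message, including the digits of
# the length field itself, so it is computed as a fixed point. This
# helper is not part of the protocol.
def mrcp_record_request(request_id, headers):
    header_block = "".join(f"{name}:{value}\r\n" for name, value in headers)
    tail = f" RECORD {request_id}\r\n{header_block}\r\n"
    length = 0
    while True:
        total = len("MRCP/2.0 ") + len(str(length)) + len(tail)
        if total == length:
            break
        length = total
    return f"MRCP/2.0 {length}{tail}"

msg = mrcp_record_request("543257", [
    ("Channel-Identifier", "32AECB23433802@recorder"),
    ("Record-URI", "<file://mediaserver/recordings/myfile.wav>"),
    ("Media-Type", "audio/wav"),
    ("Capture-On-Speech", "true"),
    ("Final-Silence", "300"),
    ("Max-Time", "6000"),
])
# The declared message-length equals the actual octet count.
print(int(msg.split()[1]) == len(msg))
```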
+
+10.7. STOP
+
+ The STOP method moves the recorder from the recording state back to
+ the idle state. If a RECORD request is active and the STOP request
+ successfully terminates it, then the STOP response MUST contain an
+ Active-Request-Id-List header field containing the RECORD request-id
+ that was terminated. In this case, no RECORD-COMPLETE event is sent
+ for the terminated request. If there was no recording active, then
+ the response MUST NOT contain an Active-Request-Id-List header field.
+ If the recording was a success, the STOP response MUST contain a
+ Record-URI header field pointing to the recorded audio content or to
+ a typed entity in the body of the STOP response containing the
+ recorded audio. The STOP method MAY have a Trim-Length header field,
+ in which case the specified length of audio is trimmed from the end
+ of the recording after the stop. In any case, the response MUST
+ contain a status-code of 200 "Success".
+
+ C->S: MRCP/2.0 ... RECORD 543257
+ Channel-Identifier:32AECB23433802@recorder
+ Record-URI:<file://mediaserver/recordings/myfile.wav>
+ Capture-On-Speech:true
+ Final-Silence:300
+ Max-Time:6000
+
+ S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@recorder
+
+ S->C: MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@recorder
+
+ C->S: MRCP/2.0 ... STOP 543257
+ Channel-Identifier:32AECB23433802@recorder
+ Trim-Length:200
+
+ S->C: MRCP/2.0 ... 543257 200 COMPLETE
+ Channel-Identifier:32AECB23433802@recorder
+ Record-URI:<file://mediaserver/recordings/myfile.wav>;
+ size=324253;duration=24561
+ Active-Request-Id-List:543257
+
+ STOP Example
+
+10.8. RECORD-COMPLETE
+
+ If the recording completes due to no input, silence after speech, or
+ reaching the max-time, the server MUST generate the RECORD-COMPLETE
+ event to the client with a request-state of COMPLETE. If the
+ recording was a success, the RECORD-COMPLETE event contains a Record-
+ URI header field pointing to the recorded audio file on the server or
+ to a typed entity in the message body containing the recorded audio.
+
+ C->S: MRCP/2.0 ... RECORD 543257
+ Channel-Identifier:32AECB23433802@recorder
+ Record-URI:<file://mediaserver/recordings/myfile.wav>
+ Capture-On-Speech:true
+ Final-Silence:300
+ Max-Time:6000
+
+ S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@recorder
+
+ S->C: MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
+ Channel-Identifier:32AECB23433802@recorder
+
+ S->C: MRCP/2.0 ... RECORD-COMPLETE 543257 COMPLETE
+ Channel-Identifier:32AECB23433802@recorder
+ Completion-Cause:000 success
+ Record-URI:<file://mediaserver/recordings/myfile.wav>;
+ size=325325;duration=24652
+
+ RECORD-COMPLETE Example
+
+10.9. START-INPUT-TIMERS
+
+ This request is sent from the client to the recorder resource when it
+ discovers that a kill-on-barge-in prompt has finished playing (see
+ Section 8.4.2). This is useful in the scenario when the recorder and
+ synthesizer resources are not in the same MRCPv2 session. When a
+ kill-on-barge-in prompt is being played, the client wants the RECORD
+ request to be simultaneously active so that it can detect and
+ implement kill-on-barge-in. But at the same time, the client doesn't
+ want the recorder resource to start the no-input timers until the
+ prompt is finished. The Start-Input-Timers header field in the
+ RECORD request allows the client to indicate whether the timers
+ should be started. In the above case, the recorder resource does not
+ start the timers until the client sends a START-INPUT-TIMERS method
+ to the recorder.
+
+10.10. START-OF-INPUT
+
+ The START-OF-INPUT event is returned from the server to the client
+ once the server has detected speech. This event is always returned
+ by the recorder resource when speech has been detected. The recorder
+ resource also MUST send a Proxy-Sync-Id header field with a unique
+ value for this event.
+
+ S->C: MRCP/2.0 ... START-OF-INPUT 543259 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@recorder
+ Proxy-Sync-Id:987654321
+
+11. Speaker Verification and Identification
+
+ This section describes the methods, responses, and events employed by
+ MRCPv2 for speaker verification and identification.
+
+ Speaker verification is a voice authentication methodology that can
+ be used to identify the speaker in order to grant the user access to
+ sensitive information and transactions. Because speech is a
+ biometric, a number of essential security considerations related to
+ biometric authentication technologies apply to its implementation and
+ usage. Implementers should carefully read Section 12 in this
+ document and the corresponding section of the SPEECHSC requirements
+ [RFC4313]. Implementers and deployers of this technology are
+ strongly encouraged to check the state of the art for any new risks
+ and solutions that might have been developed.
+
+ In speaker verification, a recorded utterance is compared to a
+ previously stored voiceprint, which is in turn associated with a
+ claimed identity for that user. Verification typically consists of
+ two phases: a designation phase to establish the claimed identity of
+ the caller and an execution phase in which a voiceprint is either
+ created (training) or used to authenticate the claimed identity
+ (verification).
+
+ Speaker identification is the process of associating an unknown
+ speaker with a member in a population. It does not employ a claim of
+ identity. When an individual claims to belong to a group (e.g., one
+ of the owners of a joint bank account), a group authentication is
+ performed. This is generally implemented as a kind of verification
+ involving comparison with more than one voice model. It is sometimes
+ called 'multi-verification'. If the individual speaker can be
+ identified from the group, this may be useful for applications where
+ multiple users share the same access privileges to some data or
+ application. Speaker identification and group authentication are
+ also done in two phases, a designation phase and an execution phase.
+ Note that, from a functionality standpoint, identification can be
+ thought of as a special case of group authentication (if the
+ individual is identified) where the group is the entire population,
+ although the implementation of speaker identification may be
+ different from the way group authentication is performed. To
+ accommodate single-voiceprint verification, verification against
+ multiple voiceprints, group authentication, and identification, this
+ specification provides a single set of methods that can take a list
+ of identifiers, called "voiceprint identifiers", and return a list of
+ identifiers, with a score for each that represents how well the input
+ speech matched each identifier. The input and output lists of
+ identifiers do not have to match, allowing a vendor-specific group
+ identifier to be used as input to indicate that identification is to
+ be performed. In this specification, the terms "identification" and
+ "multi-verification" are used to indicate that the input represents a
+ group (potentially the entire population) and that results for
+ multiple voiceprints may be returned.
+
+ It is possible for a verifier resource to share the same session with
+ a recognizer resource or to operate independently. In order to share
+ the same session, the verifier and recognizer resources MUST be
+ allocated from within the same SIP dialog. Otherwise, an independent
+ verifier resource, running on the same physical server or a separate
+ one, will be set up. Note that, in addition to allowing both
+ resources to be allocated in the same INVITE, it is possible to
+ allocate one initially and the other later via a re-INVITE.
+
+ Some of the speaker verification methods, described below, apply only
+ to a specific mode of operation.
+
+ The verifier resource has a verification buffer associated with it
+ (see Section 11.4.14). This allows the storage of speech utterances
+ for the purposes of verification, identification, or training from
+ the buffered speech. This buffer is owned by the verifier resource,
+ but other input resources (such as the recognizer resource or
+ recorder resource) may write to it. This allows the speech received
+ as part of a recognition or recording operation to be later used for
+ verification, identification, or training. Access to the buffer is
+ limited to one operation at a time. Hence, when the resource is doing
+ read, write, or delete operations, such as a RECOGNIZE with
+ ver-buffer-utterance turned on, another operation involving the
+ buffer fails with a status-code of 402. The verification buffer can
+ be cleared by a CLEAR-BUFFER request from the client and is freed
+ when the verifier resource is deallocated or the session with the
+ server terminates.
+
+ The verification buffer is different from collecting waveforms and
+ processing them using either the real-time audio stream or stored
+ audio, because this buffering mechanism does not simply accumulate
+ speech to a buffer. The verification buffer MAY contain additional
+ information gathered by the recognizer resource that serves to
+ improve verification performance.
+
+11.1. Speaker Verification State Machine
+
+ Speaker verification may operate in a training or a verification
+ session. Starting one of these sessions does not change the state of
+ the verifier resource, i.e., it remains idle. Once a verification or
+ training session is started, then utterances are trained or verified
+ by calling the VERIFY or VERIFY-FROM-BUFFER method. The state of the
+ verifier resource goes from the IDLE to the VERIFYING state each
+ time VERIFY or VERIFY-FROM-BUFFER is called.
+
+ Idle Session Opened Verifying/Training
+ State State State
+ | | |
+ |--START-SESSION--->| |
+ | | |
+ | |----------| |
+ | | START-SESSION |
+ | |<---------| |
+ | | |
+ |<--END-SESSION-----| |
+ | | |
+ | |---------VERIFY--------->|
+ | | |
+ | |---VERIFY-FROM-BUFFER--->|
+ | | |
+ | |----------| |
+ | | VERIFY-ROLLBACK |
+ | |<---------| |
+ | | |
+ | | |--------|
+ | | GET-INTERMEDIATE-RESULT |
+ | | |------->|
+ | | |
+ | | |--------|
+ | | START-INPUT-TIMERS |
+ | | |------->|
+ | | |
+ | | |--------|
+ | | START-OF-INPUT |
+ | | |------->|
+ | | |
+ | |<-VERIFICATION-COMPLETE--|
+ | | |
+ | |<--------STOP------------|
+ | | |
+ | |----------| |
+ | | STOP |
+ | |<---------| |
+ | | |
+ |----------| | |
+ | STOP | |
+ |<---------| | |
+ | |----------| |
+ | | CLEAR-BUFFER |
+ | |<---------| |
+ | | |
+ |----------| | |
+ | CLEAR-BUFFER | |
+ |<---------| | |
+ | | |
+ | |----------| |
+ | | QUERY-VOICEPRINT |
+ | |<---------| |
+ | | |
+ |----------| | |
+ | QUERY-VOICEPRINT | |
+ |<---------| | |
+ | | |
+ | |----------| |
+ | | DELETE-VOICEPRINT |
+ | |<---------| |
+ | | |
+ |----------| | |
+ | DELETE-VOICEPRINT | |
+ |<---------| | |
+
+ Verifier Resource State Machine
+
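The transitions in the diagram above can be captured as a small table-driven sketch. This is purely illustrative: the state names follow the diagram columns and the 402 behavior follows Section 11.6, but the function and constant names are assumptions, and only a representative subset of transitions is listed.

```python
# Table-driven sketch of the verifier resource state machine shown
# above.  State names follow the diagram columns; the "402" text
# follows Section 11.6.  Everything else is an assumption.
IDLE, SESSION_OPENED, VERIFYING = "idle", "session-opened", "verifying"

TRANSITIONS = {
    (IDLE, "START-SESSION"): SESSION_OPENED,
    (SESSION_OPENED, "START-SESSION"): SESSION_OPENED,  # implicit abort
    (SESSION_OPENED, "END-SESSION"): IDLE,
    (SESSION_OPENED, "VERIFY"): VERIFYING,
    (SESSION_OPENED, "VERIFY-FROM-BUFFER"): VERIFYING,
    (SESSION_OPENED, "VERIFY-ROLLBACK"): SESSION_OPENED,
    (VERIFYING, "GET-INTERMEDIATE-RESULT"): VERIFYING,
    (VERIFYING, "START-INPUT-TIMERS"): VERIFYING,
    (VERIFYING, "VERIFICATION-COMPLETE"): SESSION_OPENED,
    (VERIFYING, "STOP"): SESSION_OPENED,
}

def next_state(state, method):
    """Return the next state, or raise for an invalid transition."""
    try:
        return TRANSITIONS[(state, method)]
    except KeyError:
        raise ValueError("402 Method not valid in this state")
```
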
+11.2. Speaker Verification Methods
+
+ The verifier resource supports the following methods.
+
+ verifier-method = "START-SESSION"
+ / "END-SESSION"
+ / "QUERY-VOICEPRINT"
+ / "DELETE-VOICEPRINT"
+ / "VERIFY"
+ / "VERIFY-FROM-BUFFER"
+ / "VERIFY-ROLLBACK"
+ / "STOP"
+ / "CLEAR-BUFFER"
+ / "START-INPUT-TIMERS"
+ / "GET-INTERMEDIATE-RESULT"
+
+ These methods allow the client to control the mode and target of
+ verification or identification operations within the context of a
+ session. All the verification input operations that occur within a
+ session can be used to create, update, or validate against the
+
+
+
+
+
+Burnett & Shanmugham Standards Track [Page 142]
+
+RFC 6787 MRCPv2 November 2012
+
+
+ voiceprint specified during the session. At the beginning of each
+ session, the verifier resource is reset to the state it had prior to
+ any previous verification session.
+
+ Verification/identification operations can be executed against live
+ or buffered audio. The verifier resource provides methods for
+ collecting and evaluating live audio data, and methods for
+ controlling the verifier resource and adjusting its configured
+ behavior.
+
+ There are no dedicated methods for collecting buffered audio data.
+ This is accomplished by calling VERIFY, RECOGNIZE, or RECORD as
+ appropriate for the resource, with the header field
+ Ver-Buffer-Utterance. Then, when the following method is called,
+ verification is performed using the set of buffered audio.
+
+ 1. VERIFY-FROM-BUFFER
+
+ The following methods are used for verification of live audio
+ utterances:
+
+ 1. VERIFY
+
+ 2. START-INPUT-TIMERS
+
+ The following methods are used for configuring the verifier resource
+ and for establishing resource states:
+
+ 1. START-SESSION
+
+ 2. END-SESSION
+
+ 3. QUERY-VOICEPRINT
+
+ 4. DELETE-VOICEPRINT
+
+ 5. VERIFY-ROLLBACK
+
+ 6. STOP
+
+ 7. CLEAR-BUFFER
+
+ The following method allows the polling of a verification in progress
+ for intermediate results.
+
+ 1. GET-INTERMEDIATE-RESULT
+
+11.3. Verification Events
+
+ The verifier resource generates the following events.
+
+ verifier-event = "VERIFICATION-COMPLETE"
+ / "START-OF-INPUT"
+
+11.4. Verification Header Fields
+
+ A verifier resource message can contain header fields containing
+ request options and information to augment the Request, Response, or
+ Event message it is associated with.
+
+ verification-header = repository-uri
+ / voiceprint-identifier
+ / verification-mode
+ / adapt-model
+ / abort-model
+ / min-verification-score
+ / num-min-verification-phrases
+ / num-max-verification-phrases
+ / no-input-timeout
+ / save-waveform
+ / media-type
+ / waveform-uri
+ / voiceprint-exists
+ / ver-buffer-utterance
+ / input-waveform-uri
+ / completion-cause
+ / completion-reason
+ / speech-complete-timeout
+ / new-audio-channel
+ / abort-verification
+ / start-input-timers
+
+11.4.1. Repository-URI
+
+ This header field specifies the voiceprint repository to be used or
+ referenced during speaker verification or identification operations.
+ This header field is required in the START-SESSION, QUERY-VOICEPRINT,
+ and DELETE-VOICEPRINT methods.
+
+ repository-uri = "Repository-URI" ":" uri CRLF
+
+11.4.2. Voiceprint-Identifier
+
+ This header field specifies the claimed identity for verification
+ applications. The claimed identity MAY be used to specify an
+ existing voiceprint or to establish a new voiceprint. This header
+ field MUST be present in the QUERY-VOICEPRINT and DELETE-VOICEPRINT
+ methods. The Voiceprint-Identifier MUST be present in the START-
+ SESSION method for verification operations. For identification or
+ multi-verification operations, this header field MAY contain a list
+ of voiceprint identifiers separated by semicolons. For
+ identification operations, the client MAY also specify a voiceprint
+ group identifier instead of a list of voiceprint identifiers.
+
+ voiceprint-identifier = "Voiceprint-Identifier" ":"
+                           vid *(";" vid) CRLF
+ vid = 1*VCHAR ["." 1*VCHAR]
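A client-side parse of this header field can be sketched as follows. The function name and error handling are assumptions; the semicolon-separated list form is the one used for identification and multi-verification operations.

```python
# Illustrative parser for the Voiceprint-Identifier header field
# defined by the ABNF above; names and error handling are assumed.
def parse_voiceprint_identifier(header_line):
    name, sep, value = header_line.partition(":")
    if not sep or name.strip() != "Voiceprint-Identifier":
        raise ValueError("not a Voiceprint-Identifier header field")
    # Multiple voiceprint identifiers are separated by semicolons.
    return [vid.strip() for vid in value.split(";") if vid.strip()]

ids = parse_voiceprint_identifier(
    "Voiceprint-Identifier:johnsmith.voiceprint;marysmith.voiceprint")
# ids == ["johnsmith.voiceprint", "marysmith.voiceprint"]
```
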
+
+11.4.3. Verification-Mode
+
+ This header field specifies the mode of the verifier resource and is
+ set by the START-SESSION method. Acceptable values indicate whether
+ the verification session will train a voiceprint ("train") or verify/
+ identify using an existing voiceprint ("verify").
+
+ Training and verification sessions both require the voiceprint
+ Repository-URI to be specified in the START-SESSION. In many usage
+ scenarios, however, the system does not know the speaker's claimed
+ identity until a recognition operation has, for example, recognized
+ an account number to which the user desires access. In order to
+ allow the first few utterances of a dialog to be both recognized and
+ verified, the verifier resource on the MRCPv2 server retains a
+ buffer. In this buffer, the MRCPv2 server accumulates recognized
+ utterances. The client can later execute a verification method and
+ apply the buffered utterances to the current verification session.
+
+ Some voice user interfaces may require additional user input that
+ should not be subject to verification. For example, the user's input
+ may have been recognized with low confidence and thus require a
+ confirmation cycle. In such cases, the client SHOULD NOT execute the
+ VERIFY or VERIFY-FROM-BUFFER methods to collect and analyze the
+ caller's input. A separate recognizer resource can analyze the
+ caller's response without any participation by the verifier resource.
+
+ Once the following conditions have been met:
+
+ 1. the voiceprint identity has been successfully established through
+ the Voiceprint-Identifier header fields of the START-SESSION
+ method, and
+
+ 2. the verification mode has been set to one of "train" or "verify",
+
+ the verifier resource can begin providing verification information
+ during verification operations. If the verifier resource does not
+   reach one of the two major states ("train" or "verify"), it MUST
+ report an error condition in the MRCPv2 status code to indicate why
+ the verifier resource is not ready for the corresponding usage.
+
+ The value of verification-mode is persistent within a verification
+ session. If the client attempts to change the mode during a
+ verification session, the verifier resource reports an error and the
+ mode retains its current value.
+
+ verification-mode = "Verification-Mode" ":"
+                              verification-mode-string CRLF
+
+ verification-mode-string = "train"
+ / "verify"
+
+11.4.4. Adapt-Model
+
+ This header field indicates the desired behavior of the verifier
+ resource after a successful verification operation. If the value of
+ this header field is "true", the server SHOULD use audio collected
+ during the verification session to update the voiceprint to account
+ for ongoing changes in a speaker's incoming speech characteristics,
+ unless local policy prohibits updating the voiceprint. If the value
+ is "false" (the default), the server MUST NOT update the voiceprint.
+ This header field MAY occur in the START-SESSION method.
+
+ adapt-model = "Adapt-Model" ":" BOOLEAN CRLF
+
+11.4.5. Abort-Model
+
+ The Abort-Model header field indicates the desired behavior of the
+ verifier resource upon session termination. If the value of this
+ header field is "true", the server MUST discard any pending changes
+ to a voiceprint due to verification training or verification
+ adaptation. If the value is "false" (the default), the server MUST
+ commit any pending changes for a training session or a successful
+ verification session to the voiceprint repository. A value of "true"
+ for Abort-Model overrides a value of "true" for the Adapt-Model
+ header field. This header field MAY occur in the END-SESSION method.
+
+ abort-model = "Abort-Model" ":" BOOLEAN CRLF
+
+11.4.6. Min-Verification-Score
+
+ The Min-Verification-Score header field, when used with a verifier
+ resource through a SET-PARAMS, GET-PARAMS, or START-SESSION method,
+ determines the minimum verification score for which a verification
+ decision of "accepted" may be declared by the server. This is a
+ float value between -1.0 and 1.0. The default value for this header
+ field is implementation specific.
+
+ min-verification-score = "Min-Verification-Score" ":"
+ [ %x2D ] FLOAT CRLF
+
+11.4.7. Num-Min-Verification-Phrases
+
+ The Num-Min-Verification-Phrases header field is used to specify the
+ minimum number of valid utterances before a positive decision is
+ given for verification. The value for this header field is an
+ integer and the default value is 1. The verifier resource MUST NOT
+ declare a verification 'accepted' unless Num-Min-Verification-Phrases
+ valid utterances have been received. The minimum value is 1. This
+ header field MAY occur in START-SESSION, SET-PARAMS, or GET-PARAMS.
+
+ num-min-verification-phrases = "Num-Min-Verification-Phrases" ":"
+ 1*19DIGIT CRLF
+
+11.4.8. Num-Max-Verification-Phrases
+
+ The Num-Max-Verification-Phrases header field is used to specify the
+ number of valid utterances required before a decision is forced for
+ verification. The verifier resource MUST NOT return a decision of
+ 'undecided' once Num-Max-Verification-Phrases have been collected and
+ used to determine a verification score. The value for this header
+ field is an integer and the minimum value is 1. The default value is
+ implementation specific. This header field MAY occur in START-
+ SESSION, SET-PARAMS, or GET-PARAMS.
+
+ num-max-verification-phrases = "Num-Max-Verification-Phrases" ":"
+ 1*19DIGIT CRLF
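Together with Min-Verification-Score, these two header fields bound when the server may report each decision. The actual scoring algorithm is implementation specific; the sketch below only illustrates the stated constraints, with assumed parameter names and an assumed default threshold.

```python
# Sketch of the decision constraints imposed by
# Min-Verification-Score, Num-Min-Verification-Phrases, and
# Num-Max-Verification-Phrases.  Real scoring is implementation
# specific; the default threshold here is an assumption.
def decide(cumulative_score, num_valid_utterances,
           min_score=0.5, num_min_phrases=1, num_max_phrases=3):
    accept = cumulative_score >= min_score
    if num_valid_utterances >= num_max_phrases:
        # A decision is forced: "undecided" MUST NOT be returned once
        # Num-Max-Verification-Phrases utterances have been scored.
        return "accepted" if accept else "rejected"
    if accept and num_valid_utterances >= num_min_phrases:
        # "accepted" only after Num-Min-Verification-Phrases valid
        # utterances have been received.
        return "accepted"
    return "undecided"
```
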
+
+11.4.9. No-Input-Timeout
+
+ The No-Input-Timeout header field sets the length of time from the
+ start of the verification timers (see START-INPUT-TIMERS) until the
+ VERIFICATION-COMPLETE server event message declares that no input has
+ been received (i.e., has a Completion-Cause of no-input-timeout).
+ The value is in milliseconds. This header field MAY occur in VERIFY,
+ SET-PARAMS, or GET-PARAMS. The value for this header field ranges
+ from 0 to an implementation-specific maximum value. The default
+ value for this header field is implementation specific.
+
+ no-input-timeout = "No-Input-Timeout" ":" 1*19DIGIT CRLF
+
+11.4.10. Save-Waveform
+
+ This header field allows the client to request that the verifier
+ resource save the audio stream that was used for verification/
+ identification. The verifier resource MUST attempt to record the
+ audio and make it available to the client in the form of a URI
+ returned in the Waveform-URI header field in the VERIFICATION-
+ COMPLETE event. If there was an error in recording the stream, or
+ the audio content is otherwise not available, the verifier resource
+ MUST return an empty Waveform-URI header field. The default value
+ for this header field is "false". This header field MAY appear in
+ the VERIFY method. Note that this header field does not appear in
+ the VERIFY-FROM-BUFFER method since it only controls whether or not
+ to save the waveform for live verification/identification operations.
+
+ save-waveform = "Save-Waveform" ":" BOOLEAN CRLF
+
+11.4.11. Media-Type
+
+   This header field MAY be specified in the SET-PARAMS, GET-PARAMS, or
+   VERIFY methods and tells the server resource the media type of the
+   captured audio or video, such as the audio captured and returned via
+   the Waveform-URI header field.
+
+ media-type = "Media-Type" ":" media-type-value
+ CRLF
+
+11.4.12. Waveform-URI
+
+ If the Save-Waveform header field is set to "true", the verifier
+ resource MUST attempt to record the incoming audio stream of the
+ verification into a file and provide a URI for the client to access
+ it. This header field MUST be present in the VERIFICATION-COMPLETE
+ event if the Save-Waveform header field was set to true by the
+ client. The value of the header field MUST be empty if there was
+ some error condition preventing the server from recording.
+ Otherwise, the URI generated by the server MUST be globally unique
+ across the server and all its verification sessions. The content
+ MUST be available via the URI until the verification session ends.
+ Since the Save-Waveform header field applies only to live
+ verification/identification operations, the server can return the
+ Waveform-URI only in the VERIFICATION-COMPLETE event for live
+ verification/identification operations.
+
+ The server MUST also return the size in octets and the duration in
+ milliseconds of the recorded audio waveform as parameters associated
+ with the header field.
+
+ waveform-uri = "Waveform-URI" ":" ["<" uri ">"
+ ";" "size" "=" 1*19DIGIT
+ ";" "duration" "=" 1*19DIGIT] CRLF
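A receiving client might unpack this header field as follows. The regex and function name are assumptions; an empty header field value signals that recording failed, which the sketch maps to None.

```python
import re

# Illustrative parser for the Waveform-URI header field defined by
# the ABNF above; the regex and function name are assumptions.
_WAVEFORM_URI = re.compile(
    r"Waveform-URI:\s*<(?P<uri>[^>]*)>"
    r";size=(?P<size>\d+);duration=(?P<duration>\d+)\s*$")

def parse_waveform_uri(line):
    match = _WAVEFORM_URI.match(line)
    if match is None:
        return None  # empty header field: recording failed
    return (match.group("uri"),
            int(match.group("size")),      # octets
            int(match.group("duration")))  # milliseconds

uri, size, duration = parse_waveform_uri(
    "Waveform-URI:<http://example.com/verify/audio1.wav>"
    ";size=40000;duration=5000")
```
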
+
+11.4.13. Voiceprint-Exists
+
+ This header field MUST be returned in QUERY-VOICEPRINT and DELETE-
+ VOICEPRINT responses. This is the status of the voiceprint specified
+ in the QUERY-VOICEPRINT method. For the DELETE-VOICEPRINT method,
+ this header field indicates the status of the voiceprint at the
+ moment the method execution started.
+
+ voiceprint-exists = "Voiceprint-Exists" ":" BOOLEAN CRLF
+
+11.4.14. Ver-Buffer-Utterance
+
+ This header field is used to indicate that this utterance could be
+ later considered for speaker verification. This way, a client can
+ request the server to buffer utterances while doing regular
+ recognition or verification activities, and speaker verification can
+ later be requested on the buffered utterances. This header field is
+ optional in the RECOGNIZE, VERIFY, and RECORD methods. The default
+ value for this header field is "false".
+
+ ver-buffer-utterance = "Ver-Buffer-Utterance" ":" BOOLEAN
+ CRLF
+
+11.4.15. Input-Waveform-URI
+
+ This header field specifies stored audio content that the client
+ requests the server to fetch and process according to the current
+ verification mode, either to train the voiceprint or verify a claimed
+ identity. This header field enables the client to implement the
+ buffering use case where the recognizer and verifier resources are in
+ different sessions and the verification buffer technique cannot be
+ used. It MAY be specified on the VERIFY request.
+
+ input-waveform-uri = "Input-Waveform-URI" ":" uri CRLF
+
+11.4.16. Completion-Cause
+
+ This header field MUST be part of a VERIFICATION-COMPLETE event from
+ the verifier resource to the client. This indicates the cause of
+ VERIFY or VERIFY-FROM-BUFFER method completion. This header field
+ MUST be sent in the VERIFY, VERIFY-FROM-BUFFER, and QUERY-VOICEPRINT
+ responses, if they return with a failure status and a COMPLETE state.
+ In the ABNF below, the 'cause-code' contains a numerical value
+ selected from the Cause-Code column of the following table. The
+ 'cause-name' contains the corresponding token selected from the
+ Cause-Name column.
+
+ completion-cause = "Completion-Cause" ":" cause-code SP
+ cause-name CRLF
+ cause-code = 3DIGIT
+ cause-name = *VCHAR
+
+ +------------+--------------------------+---------------------------+
+ | Cause-Code | Cause-Name | Description |
+ +------------+--------------------------+---------------------------+
+ | 000 | success | VERIFY or |
+ | | | VERIFY-FROM-BUFFER |
+ | | | request completed |
+ | | | successfully. The verify |
+ | | | decision can be |
+ | | | "accepted", "rejected", |
+ | | | or "undecided". |
+ | 001 | error | VERIFY or |
+ | | | VERIFY-FROM-BUFFER |
+ | | | request terminated |
+ | | | prematurely due to a |
+ | | | verifier resource or |
+ | | | system error. |
+ | 002 | no-input-timeout | VERIFY request completed |
+ | | | with no result due to a |
+ | | | no-input-timeout. |
+ | 003 | too-much-speech-timeout | VERIFY request completed |
+ | | | with no result due to too |
+ | | | much speech. |
+ | 004 | speech-too-early | VERIFY request completed |
+ | | | with no result due to |
+ | | | speech too soon. |
+
+ | 005 | buffer-empty | VERIFY-FROM-BUFFER |
+ | | | request completed with no |
+ | | | result due to empty |
+ | | | buffer. |
+ | 006 | out-of-sequence | Verification operation |
+ | | | failed due to |
+ | | | out-of-sequence method |
+ | | | invocations, for example, |
+ | | | calling VERIFY before |
+ | | | QUERY-VOICEPRINT. |
+ | 007 | repository-uri-failure | Failure accessing |
+ | | | Repository URI. |
+ | 008 | repository-uri-missing | Repository-URI is not |
+ | | | specified. |
+ | 009 | voiceprint-id-missing | Voiceprint-Identifier is |
+ | | | not specified. |
+ | 010 | voiceprint-id-not-exist | Voiceprint-Identifier |
+ | | | does not exist in the |
+ | | | voiceprint repository. |
+ | 011 | speech-not-usable | VERIFY request completed |
+ | | | with no result because |
+ | | | the speech was not usable |
+ | | | (too noisy, too short, |
+ | | | etc.) |
+ +------------+--------------------------+---------------------------+
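For reference, the table above can be held as a simple lookup; the dictionary and helper below are illustrative only, and the three-digit rendering follows the 'cause-code = 3DIGIT' rule.

```python
# Verifier resource Completion-Cause values from the table above;
# the dictionary and helper are illustrative, not part of the
# protocol itself.
VERIFIER_COMPLETION_CAUSES = {
    0: "success",
    1: "error",
    2: "no-input-timeout",
    3: "too-much-speech-timeout",
    4: "speech-too-early",
    5: "buffer-empty",
    6: "out-of-sequence",
    7: "repository-uri-failure",
    8: "repository-uri-missing",
    9: "voiceprint-id-missing",
    10: "voiceprint-id-not-exist",
    11: "speech-not-usable",
}

def completion_cause_header(code):
    # cause-code is always three digits (3DIGIT), e.g. "007".
    return "Completion-Cause:%03d %s" % (
        code, VERIFIER_COMPLETION_CAUSES[code])
```
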
+
+11.4.17. Completion-Reason
+
+   This header field MAY be specified in a VERIFICATION-COMPLETE event
+   coming from the verifier resource to the client.  It contains text
+   describing the reason behind the VERIFY request completion, for
+   example, the cause of a failure.
+
+ The completion reason text is provided for client use in logs and for
+ debugging and instrumentation purposes. Clients MUST NOT interpret
+ the completion reason text.
+
+ completion-reason = "Completion-Reason" ":"
+ quoted-string CRLF
+
+11.4.18. Speech-Complete-Timeout
+
+ This header field is the same as the one described for the Recognizer
+ resource. See Section 9.4.15. This header field MAY occur in
+ VERIFY, SET-PARAMS, or GET-PARAMS.
+
+11.4.19. New-Audio-Channel
+
+ This header field is the same as the one described for the Recognizer
+ resource. See Section 9.4.23. This header field MAY be specified in
+ a VERIFY request.
+
+11.4.20. Abort-Verification
+
+ This header field MUST be sent in a STOP request to indicate whether
+ or not to abort a VERIFY method in progress. A value of "true"
+ requests the server to discard the results. A value of "false"
+ requests the server to return in the STOP response the verification
+ results obtained up to the point it received the STOP request.
+
+   abort-verification = "Abort-Verification" ":" BOOLEAN CRLF
+
+11.4.21. Start-Input-Timers
+
+ This header field MAY be sent as part of a VERIFY request. A value
+ of "false" tells the verifier resource to start the VERIFY operation
+ but not to start the no-input timer yet. The verifier resource MUST
+ NOT start the timers until the client sends a START-INPUT-TIMERS
+ request to the resource. This is useful in the scenario when the
+ verifier and synthesizer resources are not part of the same session.
+ In this scenario, when a kill-on-barge-in prompt is being played, the
+ client may want the VERIFY request to be simultaneously active so
+ that it can detect and implement kill-on-barge-in (see
+ Section 8.4.2). But at the same time, the client doesn't want the
+ verifier resource to start the no-input timers until the prompt is
+ finished. The default value is "true".
+
+ start-input-timers = "Start-Input-Timers" ":"
+ BOOLEAN CRLF
+
+11.5. Verification Message Body
+
+ A verification response or event message can carry additional data as
+ described in the following subsection.
+
+11.5.1. Verification Result Data
+
+ Verification results are returned to the client in the message body
+ of the VERIFICATION-COMPLETE event or the GET-INTERMEDIATE-RESULT
+ response message as described in Section 6.3. Element and attribute
+ descriptions for the verification portion of the NLSML format are
+ provided in Section 11.5.2 with a normative definition of the schema
+ in Section 16.3.
+
+11.5.2. Verification Result Elements
+
+ All verification elements are contained within a single
+ <verification-result> element under <result>. The elements are
+ described below and have the schema defined in Section 16.2. The
+ following elements are defined:
+
+ 1. <voiceprint>
+
+ 2. <incremental>
+
+ 3. <cumulative>
+
+ 4. <decision>
+
+ 5. <utterance-length>
+
+ 6. <device>
+
+ 7. <gender>
+
+ 8. <adapted>
+
+ 9. <verification-score>
+
+ 10. <vendor-specific-results>
+
+11.5.2.1. <voiceprint> Element
+
+ This element in the verification results provides information on how
+ the speech data matched a single voiceprint. The result data
+ returned MAY have more than one such entity in the case of
+ identification or multi-verification. Each <voiceprint> element and
+ the XML data within the element describe verification result
+ information for how well the speech data matched that particular
+   voiceprint.  The list of <voiceprint> elements is ordered according
+   to their cumulative verification match scores, with the highest
+   score first.
+
+11.5.2.2. <cumulative> Element
+
+ Within each <voiceprint> element there MUST be a <cumulative> element
+ with the cumulative scores of how well multiple utterances matched
+ the voiceprint.
+
+11.5.2.3. <incremental> Element
+
+ The first <voiceprint> element MAY contain an <incremental> element
+ with the incremental scores of how well the last utterance matched
+ the voiceprint.
+
+11.5.2.4. <decision> Element
+
+ This element is found within the <incremental> or <cumulative>
+ element within the verification results. Its value indicates the
+ verification decision. It can have the values of "accepted",
+ "rejected", or "undecided".
+
+11.5.2.5. <utterance-length> Element
+
+   This element MAY occur within either the <incremental> or
+   <cumulative> elements within the first <voiceprint> element.  Its
+   value indicates the length in milliseconds of, respectively, the
+   last utterance or the accumulated set of utterances.
+
+11.5.2.6. <device> Element
+
+ This element is found within the <incremental> or <cumulative>
+ element within the verification results. Its value indicates the
+ apparent type of device used by the caller as determined by the
+ verifier resource. It can have the values of "cellular-phone",
+ "electret-phone", "carbon-button-phone", or "unknown".
+
+11.5.2.7. <gender> Element
+
+ This element is found within the <incremental> or <cumulative>
+ element within the verification results. Its value indicates the
+ apparent gender of the speaker as determined by the verifier
+ resource. It can have the values of "male", "female", or "unknown".
+
+11.5.2.8. <adapted> Element
+
+ This element is found within the first <voiceprint> element within
+ the verification results. When verification is trying to confirm the
+ voiceprint, this indicates if the voiceprint has been adapted as a
+ consequence of analyzing the source utterances. It is not returned
+ during verification training. The value can be "true" or "false".
+
+11.5.2.9. <verification-score> Element
+
+ This element is found within the <incremental> or <cumulative>
+ element within the verification results. Its value indicates the
+ score of the last utterance as determined by verification.
+
+   During verification, the higher the score, the more likely it is
+   that the speaker is the one who spoke the voiceprint training
+   utterances.  During training, the higher the score, the more likely
+   it is that the speaker spoke all of the analyzed utterances.  The
+   value is a floating point number between -1.0 and 1.0.  If there
+   are no such utterances, the score is -1.  Note that the
+   verification score is not a probability value.
+
+11.5.2.10. <vendor-specific-results> Element
+
+ MRCPv2 servers MAY send verification results that contain
+ implementation-specific data that augment the information provided by
+ the MRCPv2-defined elements. Such data might be useful to clients
+ who have private knowledge of how to interpret these schema
+ extensions. Implementation-specific additions to the verification
+ results schema MUST belong to the vendor's own namespace. In the
+ result structure, either they MUST be indicated by a namespace prefix
+ declared within the result, or they MUST be children of an element
+ identified as belonging to the respective namespace.
+
+ The following example shows the results of three voiceprints. Note
+ that the first one has crossed the verification score threshold, and
+ the speaker has been accepted. The voiceprint was also adapted with
+ the most recent utterance.
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ grammar="What-Grammar-URI">
+ <verification-result>
+ <voiceprint id="johnsmith">
+ <adapted> true </adapted>
+ <incremental>
+ <utterance-length> 500 </utterance-length>
+ <device> cellular-phone </device>
+ <gender> male </gender>
+ <decision> accepted </decision>
+ <verification-score> 0.98514 </verification-score>
+ </incremental>
+ <cumulative>
+ <utterance-length> 10000 </utterance-length>
+ <device> cellular-phone </device>
+ <gender> male </gender>
+ <decision> accepted </decision>
+ <verification-score> 0.96725</verification-score>
+ </cumulative>
+ </voiceprint>
+
+ <voiceprint id="marysmith">
+ <cumulative>
+ <verification-score> 0.93410 </verification-score>
+ </cumulative>
+ </voiceprint>
+     <voiceprint id="juniorsmith">
+ <cumulative>
+ <verification-score> 0.74209 </verification-score>
+ </cumulative>
+ </voiceprint>
+ </verification-result>
+ </result>
+
+ Verification Results Example 1
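A client can pull the per-voiceprint cumulative scores out of such a result with standard XML tooling. The sketch below uses Python's ElementTree; the function and constant names are assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative extraction of cumulative verification scores from an
# NLSML verification result such as Example 1 above.
MRCPV2_NS = "{urn:ietf:params:xml:ns:mrcpv2}"

def cumulative_scores(nlsml_text):
    root = ET.fromstring(nlsml_text)
    scores = []
    for vp in root.iter(MRCPV2_NS + "voiceprint"):
        node = vp.find("./%scumulative/%sverification-score"
                       % (MRCPV2_NS, MRCPV2_NS))
        if node is not None:
            scores.append((vp.get("id"), float(node.text)))
    # Per Section 11.5.2.1, voiceprints arrive ordered by cumulative
    # score, highest first.
    return scores
```
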
+
+ In this next example, the verifier has enough information to decide
+ to reject the speaker.
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:xmpl="http://www.example.org/2003/12/mrcpv2"
+ grammar="What-Grammar-URI">
+ <verification-result>
+ <voiceprint id="johnsmith">
+ <incremental>
+ <utterance-length> 500 </utterance-length>
+ <device> cellular-phone </device>
+ <gender> male </gender>
+ <verification-score> 0.88514 </verification-score>
+ <xmpl:raspiness> high </xmpl:raspiness>
+ <xmpl:emotion> sadness </xmpl:emotion>
+ </incremental>
+ <cumulative>
+ <utterance-length> 10000 </utterance-length>
+ <device> cellular-phone </device>
+ <gender> male </gender>
+ <decision> rejected </decision>
+ <verification-score> 0.9345 </verification-score>
+ </cumulative>
+ </voiceprint>
+ </verification-result>
+ </result>
+
+ Verification Results Example 2
+
+11.6. START-SESSION
+
+ The START-SESSION method starts a speaker verification or speaker
+ identification session. Execution of this method places the verifier
+ resource into its initial state. If this method is called during an
+ ongoing verification session, the previous session is implicitly
+ aborted. If this method is invoked when VERIFY or VERIFY-FROM-BUFFER
+ is active, the method fails and the server returns a status-code of
+ 402.
+
+ Upon completion of the START-SESSION method, the verifier resource
+ MUST have terminated any ongoing verification session and cleared any
+ voiceprint designation.
+
+ A verification session is associated with the voiceprint repository
+ to be used during the session. This is specified through the
+ Repository-URI header field (see Section 11.4.1).
+
+ The START-SESSION method also establishes, through the Voiceprint-
+ Identifier header field, which voiceprints are to be matched or
+ trained during the verification session. If this is an
+ Identification session or if the client wants to do Multi-
+ Verification, the Voiceprint-Identifier header field contains a list
+ of semicolon-separated voiceprint identifiers.
+
+ The Adapt-Model header field MAY also be present in the START-SESSION
+ request to indicate whether or not to adapt a voiceprint based on
+ data collected during the session (if the voiceprint verification
+ phase succeeds). By default, the voiceprint model MUST NOT be
+ adapted with data from a verification session.
+
+   The START-SESSION also determines whether the session will train or
+   verify a voiceprint.  Hence, the Verification-Mode header field
+   MUST be sent in every START-SESSION request.  The value of the
+   Verification-Mode header field MUST be either "train" or "verify".
+
+ Before a verification/identification session is started, the client
+ may only request that VERIFY-ROLLBACK and generic SET-PARAMS and
+ GET-PARAMS operations be performed on the verifier resource. The
+ server MUST return status-code 402 "Method not valid in this state"
+ for all other verification operations.
+
+ A verifier resource MUST NOT have more than a single session active
+ at one time.
+
+ C->S: MRCP/2.0 ... START-SESSION 314161
+ Channel-Identifier:32AECB23433801@speakverify
+ Repository-URI:http://www.example.com/voiceprintdbase/
+         Verification-Mode:verify
+ Voiceprint-Identifier:johnsmith.voiceprint
+ Adapt-Model:true
+
+ S->C: MRCP/2.0 ... 314161 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
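Assembling such a request on the client side can be sketched as below. The message-length field of the MRCPv2 start-line (shown as "..." in the examples) is left as a literal placeholder, since computing it requires the length of the finished message; the function name and defaults are assumptions. Note that the mode is carried in the Verification-Mode header field defined in Section 11.4.3.

```python
# Sketch of building a START-SESSION request like the example above.
# The start-line's message-length (the "..." in the examples) is left
# as a literal placeholder; names and defaults are assumptions.
CRLF = "\r\n"

def start_session_request(request_id, channel_id, repository_uri,
                          voiceprint_id, mode="verify",
                          adapt_model=False):
    # Adapt-Model defaults to false: by default the voiceprint MUST
    # NOT be adapted with data from a verification session.
    lines = [
        "MRCP/2.0 ... START-SESSION %d" % request_id,
        "Channel-Identifier:%s" % channel_id,
        "Repository-URI:%s" % repository_uri,
        "Verification-Mode:%s" % mode,
        "Voiceprint-Identifier:%s" % voiceprint_id,
        "Adapt-Model:%s" % ("true" if adapt_model else "false"),
    ]
    return CRLF.join(lines) + CRLF
```
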
+
+11.7. END-SESSION
+
+ The END-SESSION method terminates an ongoing verification session and
+ releases the verification voiceprint resources. The session may
+ terminate in one of three ways:
+
+ 1. abort - the voiceprint adaptation or creation may be aborted so
+ that the voiceprint remains unchanged (or is not created).
+
+ 2. commit - when terminating a voiceprint training session, the new
+ voiceprint is committed to the repository.
+
+ 3. adapt - an existing voiceprint is modified using a successful
+ verification.
+
+ The Abort-Model header field MAY be included in the END-SESSION to
+ control whether or not to abort any pending changes to the
+ voiceprint. The default behavior is to commit (not abort) any
+ pending changes to the designated voiceprint.
+
+ The END-SESSION method may be safely executed multiple times without
+ first executing the START-SESSION method. Any additional executions
+ of this method without an intervening use of the START-SESSION method
+ have no effect on the verifier resource.
+
+ The following example assumes there is either a training session or a
+ verification session in progress.
+
+ C->S: MRCP/2.0 ... END-SESSION 314174
+ Channel-Identifier:32AECB23433801@speakverify
+ Abort-Model:true
+
+ S->C: MRCP/2.0 ... 314174 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+
+11.8. QUERY-VOICEPRINT
+
+ The QUERY-VOICEPRINT method is used to get status information on a
+ particular voiceprint and can be used by the client to ascertain if a
+ voiceprint or repository exists and if it contains trained
+ voiceprints.
+
+ The response to the QUERY-VOICEPRINT request contains an indication
+ of the status of the designated voiceprint in the Voiceprint-Exists
+ header field, allowing the client to determine whether to use the
+ current voiceprint for verification, train a new voiceprint, or
+ choose a different voiceprint.
+
+ A voiceprint is completely specified by providing a repository
+ location and a voiceprint identifier. The particular voiceprint or
+ identity within the repository is specified by a string identifier
+ that is unique within the repository. The Voiceprint-Identifier
+ header field carries this unique voiceprint identifier within a given
+ repository.
+
+ The following example assumes a verification session is in progress
+ and the voiceprint exists in the voiceprint repository.
+
+ C->S: MRCP/2.0 ... QUERY-VOICEPRINT 314168
+ Channel-Identifier:32AECB23433801@speakverify
+ Repository-URI:http://www.example.com/voiceprints/
+ Voiceprint-Identifier:johnsmith.voiceprint
+
+ S->C: MRCP/2.0 ... 314168 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+ Repository-URI:http://www.example.com/voiceprints/
+ Voiceprint-Identifier:johnsmith.voiceprint
+ Voiceprint-Exists:true
+
+ The following example assumes that the URI provided in the
+ Repository-URI header field is a bad URI.
+
+ C->S: MRCP/2.0 ... QUERY-VOICEPRINT 314168
+ Channel-Identifier:32AECB23433801@speakverify
+ Repository-URI:http://www.example.com/bad-uri/
+ Voiceprint-Identifier:johnsmith.voiceprint
+
+ S->C: MRCP/2.0 ... 314168 405 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+ Repository-URI:http://www.example.com/bad-uri/
+ Voiceprint-Identifier:johnsmith.voiceprint
+ Completion-Cause:007 repository-uri-failure
+
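A client deciding between using, training, or replacing a voiceprint needs the status code and the Voiceprint-Exists header from the response. A sketch of a response parser (hypothetical helper; it assumes the "..." message-length placeholder used in the examples above):

```python
def parse_response(text):
    """Split an MRCPv2 response into (request-id, status-code,
    request-state, headers)."""
    start_line, *rest = text.strip().splitlines()
    _version, _length, req_id, status, state = start_line.split()
    headers = dict(line.split(":", 1) for line in rest if ":" in line)
    return int(req_id), int(status), state, headers

raw = ("MRCP/2.0 ... 314168 200 COMPLETE\n"
       "Channel-Identifier:32AECB23433801@speakverify\n"
       "Voiceprint-Exists:true\n")
req_id, status, state, headers = parse_response(raw)
```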
+
+
+
+
+
+11.9. DELETE-VOICEPRINT
+
+ The DELETE-VOICEPRINT method removes a voiceprint from a repository.
+ This method MUST carry the Repository-URI and Voiceprint-Identifier
+ header fields.
+
+ An MRCPv2 server MUST reject a DELETE-VOICEPRINT request with a 401
+ status code unless the MRCPv2 client has been authenticated and
+ authorized. Note that MRCPv2 does not have a standard mechanism for
+ this. See Section 12.8.
+
+ If the corresponding voiceprint does not exist, the DELETE-VOICEPRINT
+ method MUST return a 200 status code.
+
+ The following example demonstrates a DELETE-VOICEPRINT operation to
+ remove a specific voiceprint.
+
+ C->S: MRCP/2.0 ... DELETE-VOICEPRINT 314168
+ Channel-Identifier:32AECB23433801@speakverify
+ Repository-URI:http://www.example.com/bad-uri/
+ Voiceprint-Identifier:johnsmith.voiceprint
+
+ S->C: MRCP/2.0 ... 314168 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+
+11.10. VERIFY
+
+ The VERIFY method is used to request that the verifier resource
+ either train/adapt the voiceprint or verify/identify a claimed
+ identity. If the voiceprint is new or was deleted by a previous
+ DELETE-VOICEPRINT method, the VERIFY method trains the voiceprint.
+ If the voiceprint already exists, it is adapted and not retrained by
+ the VERIFY command.
+
+ C->S: MRCP/2.0 ... VERIFY 543260
+ Channel-Identifier:32AECB23433801@speakverify
+
+ S->C: MRCP/2.0 ... 543260 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speakverify
+
+ When the VERIFY request completes, the MRCPv2 server MUST send a
+ VERIFICATION-COMPLETE event to the client.
+
+11.11. VERIFY-FROM-BUFFER
+
+ The VERIFY-FROM-BUFFER method directs the verifier resource to verify
+ buffered audio against a voiceprint. Only one VERIFY or VERIFY-FROM-
+ BUFFER method may be active for a verifier resource at a time.
+
+
+
+
+
+ The buffered audio is not consumed by this method and thus VERIFY-
+ FROM-BUFFER may be invoked multiple times by the client to attempt
+ verification against different voiceprints.
+
+   For the VERIFY-FROM-BUFFER method, the server MAY return an
+   IN-PROGRESS response before the VERIFICATION-COMPLETE event.
+
+ When the VERIFY-FROM-BUFFER method is invoked and the verification
+ buffer is in use by another resource sharing it, the server MUST
+ return an IN-PROGRESS response and wait until the buffer is available
+ to it. The verification buffer is owned by the verifier resource but
+ is shared with write access from other input resources on the same
+ session. Hence, it is considered to be in use if there is a read or
+ write operation such as a RECORD or RECOGNIZE with the
+ Ver-Buffer-Utterance header field set to "true" on a resource that
+ shares this buffer. Note that if a RECORD or RECOGNIZE method
+ returns with a failure cause code, the VERIFY-FROM-BUFFER request
+ waiting to process that buffer MUST also fail with a Completion-Cause
+ of 005 (buffer-empty).
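The buffer-sharing rules above can be sketched as a small server-side decision function. The function name and string outcomes are illustrative only; a real server tracks which resource currently holds the shared verification buffer.

```python
def verify_from_buffer_outcome(buffer_in_use, writer_failed):
    """Outcome of a VERIFY-FROM-BUFFER request under the sharing rules
    described above (a sketch, not a protocol implementation)."""
    if buffer_in_use:
        # The server answers IN-PROGRESS and waits for the buffer to free up.
        if writer_failed:
            # The RECORD/RECOGNIZE writing into the buffer failed, so the
            # waiting request fails with Completion-Cause 005 (buffer-empty).
            return "VERIFICATION-COMPLETE, Completion-Cause 005 buffer-empty"
        return "IN-PROGRESS, then VERIFICATION-COMPLETE"
    return "VERIFICATION-COMPLETE with verification results"
```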
+
+   The following example illustrates the use of the buffering methods.
+   In this scenario, the client first performs a live verification, but
+   the utterance is rejected.  The utterance is also saved to the audio
+   buffer.  Another voiceprint is then verified against the buffered
+   audio, and this time the utterance is accepted.  For the example, we
+   assume both Num-Min-Verification-Phrases and
+   Num-Max-Verification-Phrases are 1.
+
+ C->S: MRCP/2.0 ... START-SESSION 314161
+ Channel-Identifier:32AECB23433801@speakverify
+ Verification-Mode:verify
+ Adapt-Model:true
+ Repository-URI:http://www.example.com/voiceprints
+ Voiceprint-Identifier:johnsmith.voiceprint
+
+ S->C: MRCP/2.0 ... 314161 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+
+ C->S: MRCP/2.0 ... VERIFY 314162
+ Channel-Identifier:32AECB23433801@speakverify
+         Ver-Buffer-Utterance:true
+
+ S->C: MRCP/2.0 ... 314162 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speakverify
+
+
+
+
+
+
+
+
+
+ S->C: MRCP/2.0 ... VERIFICATION-COMPLETE 314162 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+ Completion-Cause:000 success
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ grammar="What-Grammar-URI">
+ <verification-result>
+ <voiceprint id="johnsmith">
+ <incremental>
+ <utterance-length> 500 </utterance-length>
+ <device> cellular-phone </device>
+ <gender> female </gender>
+ <decision> rejected </decision>
+ <verification-score> 0.05465 </verification-score>
+ </incremental>
+ <cumulative>
+ <utterance-length> 500 </utterance-length>
+ <device> cellular-phone </device>
+ <gender> female </gender>
+ <decision> rejected </decision>
+ <verification-score> 0.05465 </verification-score>
+ </cumulative>
+ </voiceprint>
+ </verification-result>
+ </result>
+
+ C->S: MRCP/2.0 ... QUERY-VOICEPRINT 314163
+ Channel-Identifier:32AECB23433801@speakverify
+ Repository-URI:http://www.example.com/voiceprints/
+         Voiceprint-Identifier:johnsmith.voiceprint
+
+ S->C: MRCP/2.0 ... 314163 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+ Repository-URI:http://www.example.com/voiceprints/
+ Voiceprint-Identifier:johnsmith.voiceprint
+ Voiceprint-Exists:true
+
+ C->S: MRCP/2.0 ... START-SESSION 314164
+ Channel-Identifier:32AECB23433801@speakverify
+ Verification-Mode:verify
+ Adapt-Model:true
+ Repository-URI:http://www.example.com/voiceprints
+ Voiceprint-Identifier:marysmith.voiceprint
+
+
+
+
+
+
+
+ S->C: MRCP/2.0 ... 314164 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+
+ C->S: MRCP/2.0 ... VERIFY-FROM-BUFFER 314165
+ Channel-Identifier:32AECB23433801@speakverify
+
+ S->C: MRCP/2.0 ... 314165 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speakverify
+
+ S->C: MRCP/2.0 ... VERIFICATION-COMPLETE 314165 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+ Completion-Cause:000 success
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ grammar="What-Grammar-URI">
+ <verification-result>
+ <voiceprint id="marysmith">
+ <incremental>
+ <utterance-length> 1000 </utterance-length>
+ <device> cellular-phone </device>
+ <gender> female </gender>
+ <decision> accepted </decision>
+ <verification-score> 0.98 </verification-score>
+ </incremental>
+ <cumulative>
+ <utterance-length> 1000 </utterance-length>
+ <device> cellular-phone </device>
+ <gender> female </gender>
+ <decision> accepted </decision>
+ <verification-score> 0.98 </verification-score>
+ </cumulative>
+ </voiceprint>
+ </verification-result>
+ </result>
+
+
+ C->S: MRCP/2.0 ... END-SESSION 314166
+ Channel-Identifier:32AECB23433801@speakverify
+
+ S->C: MRCP/2.0 ... 314166 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+
+ VERIFY-FROM-BUFFER Example
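The NLSML bodies in the example above carry the data a client actually acts on: the cumulative decision and score. A sketch of extracting them with a stock XML parser, run here against a trimmed copy of the example body (`cumulative_verdict` is a hypothetical helper name):

```python
import xml.etree.ElementTree as ET

NS = "{urn:ietf:params:xml:ns:mrcpv2}"

def cumulative_verdict(nlsml):
    """Extract the cumulative <decision> and <verification-score> from an
    NLSML verification result like those shown above."""
    root = ET.fromstring(nlsml)
    cum = root.find(".//%svoiceprint/%scumulative" % (NS, NS))
    decision = cum.find(NS + "decision").text.strip()
    score = float(cum.find(NS + "verification-score").text)
    return decision, score

body = """<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2" grammar="What-Grammar-URI">
  <verification-result>
    <voiceprint id="marysmith">
      <cumulative>
        <decision> accepted </decision>
        <verification-score> 0.98 </verification-score>
      </cumulative>
    </voiceprint>
  </verification-result>
</result>"""
```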
+
+
+
+
+
+
+
+11.12. VERIFY-ROLLBACK
+
+   The VERIFY-ROLLBACK method discards the last buffered utterance or
+   the last live utterance (when the mode is "train" or "verify").  The
+   client will likely want to invoke this method when
+ the user provides undesirable input such as non-speech noises, side-
+ speech, out-of-grammar utterances, commands, etc. Note that this
+ method does not provide a stack of rollback states. Executing
+ VERIFY-ROLLBACK twice in succession without an intervening
+ recognition operation has no effect on the second attempt.
+
+ C->S: MRCP/2.0 ... VERIFY-ROLLBACK 314165
+ Channel-Identifier:32AECB23433801@speakverify
+
+ S->C: MRCP/2.0 ... 314165 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+
+ VERIFY-ROLLBACK Example
+
+11.13. STOP
+
+ The STOP method from the client to the server tells the verifier
+ resource to stop the VERIFY or VERIFY-FROM-BUFFER request if one is
+ active. If such a request is active and the STOP request
+ successfully terminated it, then the response header section contains
+ an Active-Request-Id-List header field containing the request-id of
+ the VERIFY or VERIFY-FROM-BUFFER request that was terminated. In
+ this case, no VERIFICATION-COMPLETE event is sent for the terminated
+ request. If there was no verify request active, then the response
+ MUST NOT contain an Active-Request-Id-List header field. Either way,
+ the response MUST contain a status-code of 200 "Success".
+
+ The STOP method can carry an Abort-Verification header field, which
+ specifies if the verification result until that point should be
+ discarded or returned. If this header field is not present or if the
+ value is "true", the verification result is discarded and the STOP
+ response does not contain any result data. If the header field is
+ present and its value is "false", the STOP response MUST contain a
+ Completion-Cause header field and carry the Verification result data
+ in its body.
+
+ An aborted VERIFY request does an automatic rollback and hence does
+ not affect the cumulative score. A VERIFY request that was stopped
+ with no Abort-Verification header field or with the Abort-
+ Verification header field set to "false" does affect cumulative
+ scores and would need to be explicitly rolled back if the client does
+ not want the verification result considered in the cumulative scores.
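The Abort-Verification rules above reduce to a simple mapping: an absent header or a value of "true" discards the result (with an automatic rollback), while "false" returns the result and lets it count toward the cumulative scores. A sketch with an illustrative helper name:

```python
def stop_disposition(abort_verification=None):
    """Effect of a STOP request given the Abort-Verification header value,
    or None when the header is absent.  Absent or "true" means discard."""
    discard = abort_verification is None or abort_verification == "true"
    return {
        # Does the STOP response body carry result data?
        "results_in_stop_response": not discard,
        # Discarded results are auto-rolled-back and never affect
        # the cumulative scores.
        "affects_cumulative_scores": not discard,
    }
```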
+
+
+
+
+
+
+ The following example assumes a voiceprint identity has already been
+ established.
+
+ C->S: MRCP/2.0 ... VERIFY 314177
+ Channel-Identifier:32AECB23433801@speakverify
+
+ S->C: MRCP/2.0 ... 314177 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speakverify
+
+ C->S: MRCP/2.0 ... STOP 314178
+ Channel-Identifier:32AECB23433801@speakverify
+
+ S->C: MRCP/2.0 ... 314178 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+ Active-Request-Id-List:314177
+
+ STOP Verification Example
+
+11.14. START-INPUT-TIMERS
+
+ This request is sent from the client to the verifier resource to
+ start the no-input timer, usually once the client has ascertained
+ that any audio prompts to the user have played to completion.
+
+ C->S: MRCP/2.0 ... START-INPUT-TIMERS 543260
+ Channel-Identifier:32AECB23433801@speakverify
+
+ S->C: MRCP/2.0 ... 543260 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+
+11.15. VERIFICATION-COMPLETE
+
+ The VERIFICATION-COMPLETE event follows a call to VERIFY or VERIFY-
+ FROM-BUFFER and is used to communicate the verification results to
+ the client. The event message body contains only verification
+ results.
+
+ S->C: MRCP/2.0 ... VERIFICATION-COMPLETE 543259 COMPLETE
+ Completion-Cause:000 success
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ grammar="What-Grammar-URI">
+ <verification-result>
+ <voiceprint id="johnsmith">
+
+
+
+
+
+
+ <incremental>
+ <utterance-length> 500 </utterance-length>
+ <device> cellular-phone </device>
+ <gender> male </gender>
+ <decision> accepted </decision>
+ <verification-score> 0.85 </verification-score>
+ </incremental>
+ <cumulative>
+ <utterance-length> 1500 </utterance-length>
+ <device> cellular-phone </device>
+ <gender> male </gender>
+ <decision> accepted </decision>
+ <verification-score> 0.75 </verification-score>
+ </cumulative>
+ </voiceprint>
+ </verification-result>
+ </result>
+
+11.16. START-OF-INPUT
+
+ The START-OF-INPUT event is returned from the server to the client
+ once the server has detected speech. This event is always returned
+ by the verifier resource when speech has been detected, irrespective
+ of whether or not the recognizer and verifier resources share the
+ same session.
+
+ S->C: MRCP/2.0 ... START-OF-INPUT 543259 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speakverify
+
+11.17. CLEAR-BUFFER
+
+ The CLEAR-BUFFER method can be used to clear the verification buffer.
+ This buffer is used to buffer speech during recognition, record, or
+ verification operations that may later be used by VERIFY-FROM-BUFFER.
+ As noted before, the buffer associated with the verifier resource is
+ shared by other input resources like recognizers and recorders.
+ Hence, a CLEAR-BUFFER request fails if the verification buffer is in
+ use. This can happen when any one of the input resources that share
+ this buffer has an active read or write operation such as RECORD,
+ RECOGNIZE, or VERIFY with the Ver-Buffer-Utterance header field set
+ to "true".
+
+ C->S: MRCP/2.0 ... CLEAR-BUFFER 543260
+ Channel-Identifier:32AECB23433801@speakverify
+
+ S->C: MRCP/2.0 ... 543260 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+
+
+
+
+
+
+11.18. GET-INTERMEDIATE-RESULT
+
+ A client can use the GET-INTERMEDIATE-RESULT method to poll for
+ intermediate results of a verification request that is in progress.
+ Invoking this method does not change the state of the resource. The
+ verifier resource collects the accumulated verification results and
+ returns the information in the method response. The message body in
+   the response to a GET-INTERMEDIATE-RESULT request contains only
+ verification results. The method response MUST NOT contain a
+ Completion-Cause header field as the request is not yet complete. If
+ the resource does not have a verification in progress, the response
+ has a 402 failure status-code and no result in the body.
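The server-side response rules just described can be sketched as follows. This is illustrative only: the request-id is passed in, the "..." stands in for message-length as in the examples, and note that neither branch emits a Completion-Cause header.

```python
def intermediate_result_response(request_id, verification_active, results_xml):
    """Sketch of GET-INTERMEDIATE-RESULT handling: 200 with accumulated
    results while a verification is in progress, 402 with no body
    otherwise, and never a Completion-Cause header."""
    if not verification_active:
        return "MRCP/2.0 ... %d 402 COMPLETE\r\n\r\n" % request_id
    return ("MRCP/2.0 ... %d 200 COMPLETE\r\n"
            "Content-Type:application/nlsml+xml\r\n"
            "Content-Length:%d\r\n\r\n%s"
            % (request_id, len(results_xml), results_xml))
```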
+
+ C->S: MRCP/2.0 ... GET-INTERMEDIATE-RESULT 543260
+ Channel-Identifier:32AECB23433801@speakverify
+
+ S->C: MRCP/2.0 ... 543260 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speakverify
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ grammar="What-Grammar-URI">
+ <verification-result>
+ <voiceprint id="marysmith">
+ <incremental>
+ <utterance-length> 50 </utterance-length>
+ <device> cellular-phone </device>
+ <gender> female </gender>
+ <decision> undecided </decision>
+ <verification-score> 0.85 </verification-score>
+ </incremental>
+ <cumulative>
+ <utterance-length> 150 </utterance-length>
+ <device> cellular-phone </device>
+ <gender> female </gender>
+ <decision> undecided </decision>
+ <verification-score> 0.65 </verification-score>
+ </cumulative>
+ </voiceprint>
+ </verification-result>
+ </result>
+
+
+
+
+
+
+
+
+
+
+12. Security Considerations
+
+ MRCPv2 is designed to comply with the security-related requirements
+ documented in the SPEECHSC requirements [RFC4313]. Implementers and
+ users of MRCPv2 are strongly encouraged to read the Security
+   Considerations section of [RFC4313], because that document discusses
+   a number of important security issues associated with the use of
+   speech as a biometric authentication technology, as well as the
+   threats against systems that store recorded speech, contain large
+   corpora of voiceprints, and send and receive sensitive information
+   based on voice input to a recognizer or speech output from a
+   synthesizer.  Specific security measures employed by MRCPv2
+ are summarized in the following subsections. See the corresponding
+ sections of this specification for how the security-related machinery
+ is invoked by individual protocol operations.
+
+12.1. Rendezvous and Session Establishment
+
+ MRCPv2 control sessions are established as media sessions described
+ by SDP within the context of a SIP dialog. In order to ensure secure
+ rendezvous between MRCPv2 clients and servers, the following are
+ required:
+
+ 1. The SIP implementation in MRCPv2 clients and servers MUST support
+ SIP digest authentication [RFC3261] and SHOULD employ it.
+
+ 2. The SIP implementation in MRCPv2 clients and servers MUST support
+ 'sips' URIs and SHOULD employ 'sips' URIs; this includes that
+ clients and servers SHOULD set up TLS [RFC5246] connections.
+
+   3. If media stream cryptographic keying is done through SDP (e.g.,
+ using [RFC4568]), the MRCPv2 clients and servers MUST employ the
+ 'sips' URI.
+
+ 4. When TLS is used for SIP, the client MUST verify the identity of
+ the server to which it connects, following the rules and
+ guidelines defined in [RFC5922].
+
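The client-side TLS requirements above (validate the server's certificate chain and verify its identity against the host name) map directly onto a standard-library TLS context. A sketch; the TLSv1.2 floor is an assumption of this example, not an MRCPv2 requirement:

```python
import ssl

def mrcp_client_tls_context():
    """TLS client context matching the rendezvous rules above: the server
    certificate is validated and its identity checked against the host
    name.  (These are already the defaults of create_default_context;
    they are made explicit here for emphasis.)"""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.check_hostname = True               # verify server identity
    ctx.verify_mode = ssl.CERT_REQUIRED     # reject unauthenticated servers
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed modern floor
    return ctx
```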
+12.2. Control Channel Protection
+
+ Sensitive data is carried over the MRCPv2 control channel. This
+ includes things like the output of speech recognition operations,
+ speaker verification results, input to text-to-speech conversion,
+ personally identifying grammars, etc. For this reason, MRCPv2
+ servers must be properly authenticated, and the control channel must
+ permit the use of both confidentiality and integrity for the data.
+ To ensure control channel protection, MRCPv2 clients and servers MUST
+ support TLS and SHOULD utilize it by default unless alternative
+
+
+
+
+
+ control channel protection is used. When TLS is used, the client
+ MUST verify the identity of the server to which it connects,
+ following the rules and guidelines defined in [RFC4572]. If there
+ are multiple TLS-protected channels between the client and the
+ server, the server MUST NOT send a response to the client over a
+ channel for which the TLS identities of the server or client differ
+ from the channel over which the server received the corresponding
+ request. Alternative control-channel protection MAY be used if
+ desired (e.g., Security Architecture for the Internet Protocol
+ (IPsec) [RFC4301]).
+
+12.3. Media Session Protection
+
+ Sensitive data is also carried on media sessions terminating on
+ MRCPv2 servers (the other end of a media channel may or may not be on
+ the MRCPv2 client). This data includes the user's spoken utterances
+ and the output of text-to-speech operations. MRCPv2 servers MUST
+ support a security mechanism for protection of audio media sessions.
+ MRCPv2 clients that originate or consume audio similarly MUST support
+ a security mechanism for protection of the audio. One such mechanism
+ is the Secure Real-time Transport Protocol (SRTP) [RFC3711].
+
+12.4. Indirect Content Access
+
+   MRCPv2 employs content indirection extensively.  Content may be
+ fetched and/or stored based on URI addressing on systems other than
+ the MRCPv2 client or server. Not all of the stored content is
+ necessarily sensitive (e.g., XML schemas), but the majority generally
+ needs protection, and some indirect content, such as voice recordings
+ and voiceprints, is extremely sensitive and must always be protected.
+ MRCPv2 clients and servers MUST implement HTTPS for indirect content
+ access and SHOULD employ secure access for all sensitive indirect
+ content. Other secure URI schemes such as Secure FTP (FTPS)
+ [RFC4217] MAY also be used. See Section 6.2.15 for the header fields
+ used to transfer cookie information between the MRCPv2 client and
+ server if needed for authentication.
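A client or server following the guidance above would screen indirect-content URIs for a confidentiality-providing scheme before fetching. A minimal sketch (the set of accepted schemes reflects the HTTPS/FTPS examples above and is an assumption, not an exhaustive list):

```python
from urllib.parse import urlparse

SECURE_SCHEMES = {"https", "ftps"}   # per the guidance above

def is_secure_indirect_uri(uri):
    """True if an indirect-content URI uses a scheme providing
    confidentiality and integrity protection."""
    return urlparse(uri).scheme.lower() in SECURE_SCHEMES
```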
+
+ Access to URIs provided by servers introduces risks that need to be
+   considered.  Although RFC 6454 [RFC6454] focuses on the same-origin
+   policy, to which MRCPv2 does not restrict URIs, Section 3 of that
+   RFC still provides an excellent description of the pitfalls of
+   blindly following server-provided URIs.  Servers also
+ need to be aware that clients could provide URIs to sites designed to
+ tie up the server in long or otherwise problematic document fetches.
+ MRCPv2 servers, and the services they access, MUST always be prepared
+ for the possibility of such a denial-of-service attack.
+
+
+
+
+
+
+
+ MRCPv2 makes no inherent assumptions about the lifetime and access
+ controls associated with a URI. For example, if neither
+ authentication nor scheme-specific access controls are used, a leak
+ of the URI is equivalent to a leak of the content. Moreover, MRCPv2
+ makes no specific demands on the lifetime of a URI. If a server
+   offers a URI and the client takes a long time to access that URI,
+   the server may have removed the resource in the interim.  MRCPv2
+   deals with this case by using the URI access scheme's
+ 'resource not found' error, such as 404 for HTTPS. How long a server
+ should keep a dynamic resource available is highly application and
+ context dependent. However, the server SHOULD keep the resource
+   available for a reasonable amount of time so that the client is
+   likely to find it available when needed.  Conversely, to mitigate
+   state-exhaustion attacks, MRCPv2
+ servers are not obligated to keep resources and resource state in
+ perpetuity. The server SHOULD delete dynamically generated resources
+ associated with an MRCPv2 session when the session ends.
+
+ One method to avoid resource leakage is for the server to use
+ difficult-to-guess, one-time resource URIs. In this instance, there
+ can be only a single access to the underlying resource using the
+   given URI.  A downside to this approach is that if an attacker uses
+   the URI before the client does, the client is denied the resource.
+   Another method would be to adopt a mechanism similar to the
+ URLAUTH IMAP extension [RFC4467], where the server sets cryptographic
+ checks on URI usage, as well as capabilities for expiration,
+ revocation, and so on. Specifying such a mechanism is beyond the
+ scope of this document.
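The difficult-to-guess, one-time URI idea above can be sketched with a cryptographically random token and a single-use registry. The class, host name, and path prefix are all illustrative:

```python
import secrets

class OneTimeResources:
    """Single-use, hard-to-guess resource URIs, as suggested above."""

    def __init__(self):
        self._live = {}

    def publish(self, content):
        token = secrets.token_urlsafe(32)        # ~256 bits of entropy
        self._live[token] = content
        return "https://media.example.com/once/" + token

    def fetch(self, uri):
        token = uri.rsplit("/", 1)[-1]
        # pop() makes the URI one-time: a second access returns None,
        # which the server would surface as a 404.
        return self._live.pop(token, None)
```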
+
+12.5. Protection of Stored Media
+
+ MRCPv2 applications often require the use of stored media. Voice
+ recordings are both stored (e.g., for diagnosis and system tuning),
+ and fetched (for replaying utterances into multiple MRCPv2
+ resources). Voiceprints are fundamental to the speaker
+ identification and verification functions. This data can be
+ extremely sensitive and can present substantial privacy and
+ impersonation risks if stolen. Systems employing MRCPv2 SHOULD be
+ deployed in ways that minimize these risks. The SPEECHSC
+ requirements RFC [RFC4313] contains a more extensive discussion of
+ these risks and ways they may be mitigated.
+
+
+
+
+
+
+
+
+
+
+
+
+12.6. DTMF and Recognition Buffers
+
+ DTMF buffers and recognition buffers may grow large enough to exceed
+ the capabilities of a server, and the server MUST be prepared to
+   gracefully handle resource consumption.  A server MAY respond with
+   an appropriate recognition-incomplete status if it is in danger of
+   running out of resources.
+
+12.7. Client-Set Server Parameters
+
+ In MRCPv2, there are some tasks, such as URI resource fetches, that
+ the server does on behalf of the client. To control this behavior,
+ MRCPv2 has a number of server parameters that a client can configure.
+ With one such parameter, Fetch-Timeout (Section 6.2.12), a malicious
+ client could set a very large value and then request the server to
+ fetch a non-existent document. It is RECOMMENDED that servers be
+ cautious about accepting long timeout values or abnormally large
+ values for other client-set parameters.
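One common way for a server to exercise the caution recommended above is to clamp client-supplied values to its own policy ceiling. A sketch; the 30-second maximum is an illustrative server policy, not a value from this specification:

```python
MAX_FETCH_TIMEOUT_MS = 30_000   # illustrative server ceiling, not from the spec

def accept_fetch_timeout(requested_ms):
    """Clamp a client-supplied Fetch-Timeout to the server's own policy."""
    return max(0, min(requested_ms, MAX_FETCH_TIMEOUT_MS))
```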
+
+12.8. DELETE-VOICEPRINT and Authorization
+
+ Since this specification does not mandate a specific mechanism for
+ authentication and authorization when requesting DELETE-VOICEPRINT
+ (Section 11.9), there is a risk that an MRCPv2 server may not do such
+ a check for authentication and authorization. In practice, each
+ provider of voice biometric solutions does insist on its own
+ authentication and authorization mechanism, outside of this
+ specification, so this is not likely to be a major problem. If in
+ the future voice biometric providers standardize on such a mechanism,
+ then a future version of MRCP can mandate it.
+
+13. IANA Considerations
+
+13.1. New Registries
+
+ This section describes the name spaces (registries) for MRCPv2 that
+ IANA has created and now maintains. Assignment/registration policies
+ are described in RFC 5226 [RFC5226].
+
+13.1.1. MRCPv2 Resource Types
+
+ IANA has created a new name space of "MRCPv2 Resource Types". All
+ maintenance within and additions to the contents of this name space
+ MUST be according to the "Standards Action" registration policy. The
+ initial contents of the registry, defined in Section 4.2, are given
+ below:
+
+
+
+
+
+
+
+ Resource type Resource description Reference
+ ------------- -------------------- ---------
+ speechrecog Speech Recognizer [RFC6787]
+ dtmfrecog DTMF Recognizer [RFC6787]
+ speechsynth Speech Synthesizer [RFC6787]
+ basicsynth Basic Synthesizer [RFC6787]
+ speakverify Speaker Verifier [RFC6787]
+ recorder Speech Recorder [RFC6787]
+
+13.1.2. MRCPv2 Methods and Events
+
+ IANA has created a new name space of "MRCPv2 Methods and Events".
+ All maintenance within and additions to the contents of this name
+ space MUST be according to the "Standards Action" registration
+ policy. The initial contents of the registry, defined by the
+ "method-name" and "event-name" BNF in Section 15 and explained in
+ Sections 5.2 and 5.5, are given below.
+
+ Name Resource type Method/Event Reference
+ ---- ------------- ------------ ---------
+ SET-PARAMS Generic Method [RFC6787]
+ GET-PARAMS Generic Method [RFC6787]
+ SPEAK Synthesizer Method [RFC6787]
+ STOP Synthesizer Method [RFC6787]
+ PAUSE Synthesizer Method [RFC6787]
+ RESUME Synthesizer Method [RFC6787]
+ BARGE-IN-OCCURRED Synthesizer Method [RFC6787]
+ CONTROL Synthesizer Method [RFC6787]
+ DEFINE-LEXICON Synthesizer Method [RFC6787]
+ DEFINE-GRAMMAR Recognizer Method [RFC6787]
+ RECOGNIZE Recognizer Method [RFC6787]
+ INTERPRET Recognizer Method [RFC6787]
+ GET-RESULT Recognizer Method [RFC6787]
+ START-INPUT-TIMERS Recognizer Method [RFC6787]
+ STOP Recognizer Method [RFC6787]
+ START-PHRASE-ENROLLMENT Recognizer Method [RFC6787]
+ ENROLLMENT-ROLLBACK Recognizer Method [RFC6787]
+ END-PHRASE-ENROLLMENT Recognizer Method [RFC6787]
+ MODIFY-PHRASE Recognizer Method [RFC6787]
+ DELETE-PHRASE Recognizer Method [RFC6787]
+ RECORD Recorder Method [RFC6787]
+ STOP Recorder Method [RFC6787]
+ START-INPUT-TIMERS Recorder Method [RFC6787]
+ START-SESSION Verifier Method [RFC6787]
+ END-SESSION Verifier Method [RFC6787]
+ QUERY-VOICEPRINT Verifier Method [RFC6787]
+ DELETE-VOICEPRINT Verifier Method [RFC6787]
+ VERIFY Verifier Method [RFC6787]
+
+
+
+
+
+ VERIFY-FROM-BUFFER Verifier Method [RFC6787]
+ VERIFY-ROLLBACK Verifier Method [RFC6787]
+ STOP Verifier Method [RFC6787]
+ START-INPUT-TIMERS Verifier Method [RFC6787]
+ GET-INTERMEDIATE-RESULT Verifier Method [RFC6787]
+ SPEECH-MARKER Synthesizer Event [RFC6787]
+ SPEAK-COMPLETE Synthesizer Event [RFC6787]
+ START-OF-INPUT Recognizer Event [RFC6787]
+ RECOGNITION-COMPLETE Recognizer Event [RFC6787]
+ INTERPRETATION-COMPLETE Recognizer Event [RFC6787]
+ START-OF-INPUT Recorder Event [RFC6787]
+ RECORD-COMPLETE Recorder Event [RFC6787]
+ VERIFICATION-COMPLETE Verifier Event [RFC6787]
+ START-OF-INPUT Verifier Event [RFC6787]
+
+13.1.3. MRCPv2 Header Fields
+
+ IANA has created a new name space of "MRCPv2 Header Fields". All
+ maintenance within and additions to the contents of this name space
+ MUST be according to the "Standards Action" registration policy. The
+ initial contents of the registry, defined by the "message-header" BNF
+ in Section 15 and explained in Section 5.1, are given below. Note
+ that the values permitted for the "Vendor-Specific-Parameters"
+ parameter are managed according to a different policy. See
+ Section 13.1.6.
+
+ Name Resource type Reference
+ ---- ------------- ---------
+ Channel-Identifier Generic [RFC6787]
+ Accept Generic [RFC2616]
+ Active-Request-Id-List Generic [RFC6787]
+ Proxy-Sync-Id Generic [RFC6787]
+ Accept-Charset Generic [RFC2616]
+ Content-Type Generic [RFC6787]
+ Content-ID Generic
+ [RFC2392], [RFC2046], and [RFC5322]
+ Content-Base Generic [RFC6787]
+ Content-Encoding Generic [RFC6787]
+ Content-Location Generic [RFC6787]
+ Content-Length Generic [RFC6787]
+ Fetch-Timeout Generic [RFC6787]
+ Cache-Control Generic [RFC6787]
+ Logging-Tag Generic [RFC6787]
+ Set-Cookie Generic [RFC6787]
+ Vendor-Specific Generic [RFC6787]
+ Jump-Size Synthesizer [RFC6787]
+ Kill-On-Barge-In Synthesizer [RFC6787]
+ Speaker-Profile Synthesizer [RFC6787]
+
+
+
+
+
+ Completion-Cause Synthesizer [RFC6787]
+ Completion-Reason Synthesizer [RFC6787]
+ Voice-Parameter Synthesizer [RFC6787]
+ Prosody-Parameter Synthesizer [RFC6787]
+ Speech-Marker Synthesizer [RFC6787]
+ Speech-Language Synthesizer [RFC6787]
+ Fetch-Hint Synthesizer [RFC6787]
+ Audio-Fetch-Hint Synthesizer [RFC6787]
+ Failed-URI Synthesizer [RFC6787]
+ Failed-URI-Cause Synthesizer [RFC6787]
+ Speak-Restart Synthesizer [RFC6787]
+ Speak-Length Synthesizer [RFC6787]
+ Load-Lexicon Synthesizer [RFC6787]
+ Lexicon-Search-Order Synthesizer [RFC6787]
+ Confidence-Threshold Recognizer [RFC6787]
+ Sensitivity-Level Recognizer [RFC6787]
+ Speed-Vs-Accuracy Recognizer [RFC6787]
+ N-Best-List-Length Recognizer [RFC6787]
+ Input-Type Recognizer [RFC6787]
+ No-Input-Timeout Recognizer [RFC6787]
+ Recognition-Timeout Recognizer [RFC6787]
+ Waveform-URI Recognizer [RFC6787]
+ Input-Waveform-URI Recognizer [RFC6787]
+ Completion-Cause Recognizer [RFC6787]
+ Completion-Reason Recognizer [RFC6787]
+ Recognizer-Context-Block Recognizer [RFC6787]
+ Start-Input-Timers Recognizer [RFC6787]
+ Speech-Complete-Timeout Recognizer [RFC6787]
+ Speech-Incomplete-Timeout Recognizer [RFC6787]
+ Dtmf-Interdigit-Timeout Recognizer [RFC6787]
+ Dtmf-Term-Timeout Recognizer [RFC6787]
+ Dtmf-Term-Char Recognizer [RFC6787]
+ Failed-URI Recognizer [RFC6787]
+ Failed-URI-Cause Recognizer [RFC6787]
+ Save-Waveform Recognizer [RFC6787]
+ Media-Type Recognizer [RFC6787]
+ New-Audio-Channel Recognizer [RFC6787]
+ Speech-Language Recognizer [RFC6787]
+ Ver-Buffer-Utterance Recognizer [RFC6787]
+ Recognition-Mode Recognizer [RFC6787]
+ Cancel-If-Queue Recognizer [RFC6787]
+ Hotword-Max-Duration Recognizer [RFC6787]
+ Hotword-Min-Duration Recognizer [RFC6787]
+ Interpret-Text Recognizer [RFC6787]
+ Dtmf-Buffer-Time Recognizer [RFC6787]
+ Clear-Dtmf-Buffer Recognizer [RFC6787]
+ Early-No-Match Recognizer [RFC6787]
+ Num-Min-Consistent-Pronunciations Recognizer [RFC6787]
+
+
+
+
+
+ Consistency-Threshold Recognizer [RFC6787]
+ Clash-Threshold Recognizer [RFC6787]
+ Personal-Grammar-URI Recognizer [RFC6787]
+ Enroll-Utterance Recognizer [RFC6787]
+ Phrase-ID Recognizer [RFC6787]
+ Phrase-NL Recognizer [RFC6787]
+ Weight Recognizer [RFC6787]
+ Save-Best-Waveform Recognizer [RFC6787]
+ New-Phrase-ID Recognizer [RFC6787]
+ Confusable-Phrases-URI Recognizer [RFC6787]
+ Abort-Phrase-Enrollment Recognizer [RFC6787]
+ Sensitivity-Level Recorder [RFC6787]
+ No-Input-Timeout Recorder [RFC6787]
+ Completion-Cause Recorder [RFC6787]
+ Completion-Reason Recorder [RFC6787]
+ Failed-URI Recorder [RFC6787]
+ Failed-URI-Cause Recorder [RFC6787]
+ Record-URI Recorder [RFC6787]
+ Media-Type Recorder [RFC6787]
+ Max-Time Recorder [RFC6787]
+ Trim-Length Recorder [RFC6787]
+ Final-Silence Recorder [RFC6787]
+ Capture-On-Speech Recorder [RFC6787]
+ Ver-Buffer-Utterance Recorder [RFC6787]
+ Start-Input-Timers Recorder [RFC6787]
+ New-Audio-Channel Recorder [RFC6787]
+ Repository-URI Verifier [RFC6787]
+ Voiceprint-Identifier Verifier [RFC6787]
+ Verification-Mode Verifier [RFC6787]
+ Adapt-Model Verifier [RFC6787]
+ Abort-Model Verifier [RFC6787]
+ Min-Verification-Score Verifier [RFC6787]
+ Num-Min-Verification-Phrases Verifier [RFC6787]
+ Num-Max-Verification-Phrases Verifier [RFC6787]
+ No-Input-Timeout Verifier [RFC6787]
+ Save-Waveform Verifier [RFC6787]
+ Media-Type Verifier [RFC6787]
+ Waveform-URI Verifier [RFC6787]
+ Voiceprint-Exists Verifier [RFC6787]
+ Ver-Buffer-Utterance Verifier [RFC6787]
+ Input-Waveform-URI Verifier [RFC6787]
+ Completion-Cause Verifier [RFC6787]
+ Completion-Reason Verifier [RFC6787]
+ Speech-Complete-Timeout Verifier [RFC6787]
+ New-Audio-Channel Verifier [RFC6787]
+ Abort-Verification Verifier [RFC6787]
+ Start-Input-Timers Verifier [RFC6787]
+ Input-Type Verifier [RFC6787]
+
+
+
+
+
+13.1.4. MRCPv2 Status Codes
+
+ IANA has created a new name space of "MRCPv2 Status Codes" with the
+ initial values that are defined in Section 5.4. All maintenance
+ within and additions to the contents of this name space MUST be
+ according to the "Specification Required with Expert Review"
+ registration policy.
+
+13.1.5. Grammar Reference List Parameters
+
+ IANA has created a new name space of "Grammar Reference List
+ Parameters". All maintenance within and additions to the contents of
+ this name space MUST be according to the "Specification Required with
+ Expert Review" registration policy. There is only one initial
+ parameter as shown below.
+
+ Name Reference
+ ---- -------------
+ weight [RFC6787]
+
+13.1.6. MRCPv2 Vendor-Specific Parameters
+
+ IANA has created a new name space of "MRCPv2 Vendor-Specific
+ Parameters". All maintenance within and additions to the contents of
+ this name space MUST be according to the "Hierarchical Allocation"
+ registration policy as follows. Each name (corresponding to the
+ "vendor-av-pair-name" ABNF production) MUST satisfy the syntax
+ requirements of Internet Domain Names as described in Section 2.3.1
+ of RFC 1035 [RFC1035] (and as updated or obsoleted by successive
+   RFCs), with one exception:  the order of the domain labels is
+   reversed.  For example, a vendor-specific parameter "foo" defined by
+   example.com would have the form "com.example.foo".  The first, or
+   top-level, domain is
+ restricted to exactly the set of Top-Level Internet Domains defined
+ by IANA and will be updated by IANA when and only when that set
+ changes. The second-level and all subdomains within the parameter
+ name MUST be allocated according to the "First Come First Served"
+ policy. It is RECOMMENDED that assignment requests adhere to the
+ existing allocations of Internet domain names to organizations,
+ institutions, corporations, etc.
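As a rough illustration of the reversed-domain convention described above, the hypothetical helper below (not part of the RFC; the function name is illustrative) builds a vendor-specific parameter name from a registrant's domain and a parameter name:

```python
def vendor_param_name(domain: str, param: str) -> str:
    """Build a vendor-specific parameter name by reversing the
    registrant's domain labels, per the naming convention above
    (e.g. "example.com" + "foo" -> "com.example.foo")."""
    labels = domain.lower().strip(".").split(".")
    return ".".join(reversed(labels)) + "." + param

# vendor_param_name("example.com", "foo") -> "com.example.foo"
```

This matches the "com.example.foo" example in the text.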
+
+ The registry contains a list of vendor-registered parameters, where
+ each defined parameter is associated with a contact person and
+ includes an optional reference to the definition of the parameter,
+ preferably an RFC. The registry is initially empty.
+
+
+
+
+
+
+
+
+
+13.2. NLSML-Related Registrations
+
+13.2.1. 'application/nlsml+xml' Media Type Registration
+
+ IANA has registered the following media type according to the process
+ defined in RFC 4288 [RFC4288].
+
+ To: ietf-types@iana.org
+
+ Subject: Registration of media type application/nlsml+xml
+
+ MIME media type name: application
+
+ MIME subtype name: nlsml+xml
+
+ Required parameters: none
+
+ Optional parameters:
+
+ charset: All of the considerations described in RFC 3023
+ [RFC3023] also apply to the application/nlsml+xml media type.
+
+ Encoding considerations: All of the considerations described in RFC
+ 3023 also apply to the 'application/nlsml+xml' media type.
+
+ Security considerations: As with HTML, NLSML documents contain links
+ to other data stores (grammars, verifier resources, etc.). Unlike
+ HTML, however, the data stores are not treated as media to be
+ rendered. Nevertheless, linked files may themselves have security
+ considerations, which would be those of the individual registered
+ types. Additionally, this media type has all of the security
+ considerations described in RFC 3023.
+
+ Interoperability considerations: Although an NLSML document is
+ itself a complete XML document, for a fuller interpretation of the
+ content a receiver of an NLSML document may wish to access
+ resources linked to by the document. The inability of an NLSML
+ processor to access or process such linked resources could result
+ in different behavior by the ultimate consumer of the data.
+
+ Published specification: RFC 6787
+
+ Applications that use this media type: MRCPv2 clients and servers
+
+ Additional information: none
+
+ Magic number(s): There is no single initial octet sequence that is
+ always present for NLSML files.
+
+
+
+
+
+ Person & email address to contact for further information:
+ Sarvi Shanmugham, sarvi@cisco.com
+
+ Intended usage: This media type is expected to be used only in
+ conjunction with MRCPv2.
+
+13.3. NLSML XML Schema Registration
+
+ IANA has registered and now maintains the following XML Schema.
+ Information provided follows the template in RFC 3688 [RFC3688].
+
+ XML element type: schema
+
+ URI: urn:ietf:params:xml:schema:nlsml
+
+ Registrant Contact: IESG
+
+ XML: See Section 16.1.
+
+13.4. MRCPv2 XML Namespace Registration
+
+   IANA has registered and now maintains the following XML namespace.
+ Information provided follows the template in RFC 3688 [RFC3688].
+
+ XML element type: ns
+
+ URI: urn:ietf:params:xml:ns:mrcpv2
+
+ Registrant Contact: IESG
+
+ XML: RFC 6787
+
+13.5. Text Media Type Registrations
+
+ IANA has registered the following text media type according to the
+ process defined in RFC 4288 [RFC4288].
+
+13.5.1. text/grammar-ref-list
+
+ To: ietf-types@iana.org
+
+ Subject: Registration of media type text/grammar-ref-list
+
+ MIME media type name: text
+
+   MIME subtype name: grammar-ref-list
+
+ Required parameters: none
+
+
+
+
+
+ Optional parameters: none
+
+ Encoding considerations: Depending on the transfer protocol, a
+ transfer encoding may be necessary to deal with very long lines.
+
+ Security considerations: This media type contains URIs that may
+ represent references to external resources. As these resources
+ are assumed to be speech recognition grammars, similar
+ considerations as for the media types 'application/srgs' and
+ 'application/srgs+xml' apply.
+
+   Interoperability considerations: '>' must be percent-encoded in URIs
+ according to RFC 3986 [RFC3986].
+
+ Published specification: The RECOGNIZE method of the MRCP protocol
+ performs a recognition operation that matches input against a set
+ of grammars. When matching against more than one grammar, it is
+ sometimes necessary to use different weights for the individual
+ grammars. These weights are not a property of the grammar
+ resource itself but qualify the reference to that grammar for the
+ particular recognition operation initiated by the RECOGNIZE
+ method. The format of the proposed 'text/grammar-ref-list' media
+ type is as follows:
+
+ body = *reference
+ reference = "<" uri ">" [parameters] CRLF
+ parameters = ";" parameter *(";" parameter)
+ parameter = attribute "=" value
+
+ This specification currently only defines a 'weight' parameter,
+ but new parameters MAY be added through the "Grammar Reference
+ List Parameters" IANA registry established through this
+ specification. Example:
+
+ <http://example.com/grammars/field1.gram>
+ <http://example.com/grammars/field2.gram>;weight="0.85"
+ <session:field3@form-level.store>;weight="0.9"
+ <http://example.com/grammars/universals.gram>;weight="0.75"
+
+ Applications that use this media type: MRCPv2 clients and servers
+
+ Additional information: none
+
+ Magic number(s): none
+
+ Person & email address to contact for further information:
+ Sarvi Shanmugham, sarvi@cisco.com
+
+
+
+
+
+
+ Intended usage: This media type is expected to be used only in
+ conjunction with MRCPv2.
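The 'text/grammar-ref-list' body defined above is line oriented and straightforward to parse. The sketch below is illustrative only (it is not derived mechanically from the ABNF); it splits a body into URIs and their per-reference parameters such as 'weight':

```python
import re

def parse_grammar_ref_list(body: str):
    """Parse a text/grammar-ref-list body into (uri, params) pairs.
    Each line is "<" uri ">" optionally followed by ;attr="value"
    parameters, per the format shown in the registration above."""
    refs = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        m = re.match(r'<([^>]+)>(.*)$', line)
        if not m:
            raise ValueError("malformed reference: %r" % line)
        uri, rest = m.group(1), m.group(2)
        # Collect ;attr="value" pairs into a dict.
        params = dict(re.findall(r';\s*([\w-]+)="([^"]*)"', rest))
        refs.append((uri, params))
    return refs
```

With the example body above, the second reference parses to `("http://example.com/grammars/field2.gram", {"weight": "0.85"})`.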
+
+13.6. 'session' URI Scheme Registration
+
+ IANA has registered the following new URI scheme. The information
+ below follows the template given in RFC 4395 [RFC4395].
+
+ URI scheme name: session
+
+ Status: Permanent
+
+ URI scheme syntax: The syntax of this scheme is identical to that
+ defined for the "cid" scheme in Section 2 of RFC 2392 [RFC2392].
+
+ URI scheme semantics: The URI is intended to identify a data
+ resource previously given to the network computing resource. The
+ purpose of this scheme is to permit access to the specific
+ resource for the lifetime of the session with the entity storing
+      the resource.  The media type of the resource can vary.  There is
+ no explicit mechanism for communication of the media type. This
+ scheme is currently widely used internally by existing
+ implementations, and the registration is intended to provide
+ information in the rare (and unfortunate) case that the scheme is
+ used elsewhere. The scheme SHOULD NOT be used for open Internet
+ protocols.
+
+   Encoding considerations:  There are no encoding considerations for
+      'session' URIs beyond those described in RFC 3986 [RFC3986].
+
+ Applications/protocols that use this URI scheme name: This scheme
+ name is used by MRCPv2 clients and servers.
+
+ Interoperability considerations: Note that none of the resources are
+      accessible after the MRCPv2 session ends, hence the name of the
+ scheme. For clients who establish one MRCPv2 session only for the
+ entire speech application being implemented, this is sufficient,
+ but clients who create, terminate, and recreate MRCP sessions for
+ performance or scalability reasons will lose access to resources
+ established in the earlier session(s).
+
+ Security considerations: Generic security considerations for URIs
+ described in RFC 3986 [RFC3986] apply to this scheme as well. The
+ URIs defined here provide an identification mechanism only. Given
+ that the communication channel between client and server is
+ secure, that the server correctly accesses the resource associated
+
+
+
+
+
+
+
+ with the URI, and that the server ensures session-only lifetime
+ and access for each URI, the only additional security issues are
+ those of the types of media referred to by the URI.
+
+ Contact: Sarvi Shanmugham, sarvi@cisco.com
+
+ Author/Change controller: IESG, iesg@ietf.org
+
+ References: This specification, particularly Sections 6.2.7, 8.5.2,
+ 9.5.1, and 9.9.
+
+13.7. SDP Parameter Registrations
+
+ IANA has registered the following SDP parameter values. The
+ information for each follows the template given in RFC 4566
+ [RFC4566], Appendix B.
+
+13.7.1. Sub-Registry "proto"
+
+ "TCP/MRCPv2" value of the "proto" parameter
+
+ Contact name, email address, and telephone number: Sarvi Shanmugham,
+ sarvi@cisco.com, +1.408.902.3875
+
+ Name being registered (as it will appear in SDP): TCP/MRCPv2
+
+   Long-form name in English: MRCPv2 over TCP
+
+ Type of name: proto
+
+   Explanation of name: This name represents the MRCPv2 protocol
+ carried over TCP.
+
+ Reference to specification of name: RFC 6787
+
+ "TCP/TLS/MRCPv2" value of the "proto" parameter
+
+ Contact name, email address, and telephone number: Sarvi Shanmugham,
+ sarvi@cisco.com, +1.408.902.3875
+
+ Name being registered (as it will appear in SDP): TCP/TLS/MRCPv2
+
+   Long-form name in English: MRCPv2 over TLS over TCP
+
+ Type of name: proto
+
+   Explanation of name: This name represents the MRCPv2 protocol
+ carried over TLS over TCP.
+
+
+
+
+
+ Reference to specification of name: RFC 6787
+
+13.7.2. Sub-Registry "att-field (media-level)"
+
+ "resource" value of the "att-field" parameter
+
+ Contact name, email address, and telephone number: Sarvi Shanmugham,
+ sarvi@cisco.com, +1.408.902.3875
+
+ Attribute name (as it will appear in SDP): resource
+
+ Long-form attribute name in English: MRCPv2 resource type
+
+ Type of attribute: media-level
+
+ Subject to charset attribute? no
+
+ Explanation of attribute: See Section 4.2 of RFC 6787 for
+ description and examples.
+
+   Specification of appropriate attribute values: See Section 13.1.1
+      of RFC 6787.
+
+ "channel" value of the "att-field" parameter
+
+ Contact name, email address, and telephone number: Sarvi Shanmugham,
+ sarvi@cisco.com, +1.408.902.3875
+
+ Attribute name (as it will appear in SDP): channel
+
+ Long-form attribute name in English: MRCPv2 resource channel
+ identifier
+
+ Type of attribute: media-level
+
+ Subject to charset attribute? no
+
+ Explanation of attribute: See Section 4.2 of RFC 6787 for
+ description and examples.
+
+ Specification of appropriate attribute values: See Section 4.2 and
+ the "channel-id" ABNF production rules of RFC 6787.
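A channel-id value such as "32AECB23433801@speechsynth" combines a session-wide identifier with a resource type. A minimal, hypothetical parser (not part of the RFC; shown only to make the structure concrete) might look like:

```python
def parse_channel_id(value: str):
    """Split an MRCPv2 channel-id of the form
    "32AECB23433801@speechsynth" into its identifier and resource
    type (a sketch of the "channel-id" structure, not the ABNF)."""
    ident, sep, resource = value.partition("@")
    if not sep or not ident or not resource:
        raise ValueError("malformed channel-id: %r" % value)
    return ident, resource

# parse_channel_id("32AECB23433801@speechsynth")
#   -> ("32AECB23433801", "speechsynth")
```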
+
+ "cmid" value of the "att-field" parameter
+
+ Contact name, email address, and telephone number: Sarvi Shanmugham,
+ sarvi@cisco.com, +1.408.902.3875
+
+
+
+
+
+
+ Attribute name (as it will appear in SDP): cmid
+
+ Long-form attribute name in English: MRCPv2 resource channel media
+ identifier
+
+ Type of attribute: media-level
+
+ Subject to charset attribute? no
+
+ Explanation of attribute: See Section 4.4 of RFC 6787 for
+ description and examples.
+
+ Specification of appropriate attribute values: See Section 4.4 and
+ the "cmid-attribute" ABNF production rules of RFC 6787.
+
+14. Examples
+
+14.1. Message Flow
+
+ The following is an example of a typical MRCPv2 session of speech
+ synthesis and recognition between a client and a server. Although
+ the SDP "s=" attribute in these examples has a text description value
+ to assist in understanding the examples, please keep in mind that RFC
+ 3264 [RFC3264] recommends that messages actually put on the wire use
+ a space or a dash.
+
+ The figure below illustrates opening a session to the MRCPv2 server.
+   This exchange does not allocate a resource or set up media.  It simply
+ establishes a SIP session with the MRCPv2 server.
+
+ C->S:
+ INVITE sip:mresources@example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bg1
+ Max-Forwards:6
+ To:MediaServer <sip:mresources@example.com>
+ From:sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:323123 INVITE
+ Contact:<sip:sarvi@client.example.com>
+ Content-Type:application/sdp
+ Content-Length:...
+
+ v=0
+ o=sarvi 2614933546 2614933546 IN IP4 192.0.2.12
+ s=Set up MRCPv2 control and audio
+ i=Initial contact
+ c=IN IP4 192.0.2.12
+
+
+
+
+
+ S->C:
+ SIP/2.0 200 OK
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bg1;received=192.0.32.10
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:323123 INVITE
+ Contact:<sip:mresources@server.example.com>
+ Content-Type:application/sdp
+ Content-Length:...
+
+ v=0
+ o=- 3000000001 3000000001 IN IP4 192.0.2.11
+ s=Set up MRCPv2 control and audio
+ i=Initial contact
+ c=IN IP4 192.0.2.11
+
+ C->S:
+ ACK sip:mresources@server.example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bg2
+ Max-Forwards:6
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:Sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:323123 ACK
+ Content-Length:0
+
+ The client requests the server to create a synthesizer resource
+ control channel to do speech synthesis. This also adds a media
+ stream to send the generated speech. Note that, in this example, the
+ client requests a new MRCPv2 TCP stream between the client and the
+ server. In the following requests, the client will ask to use the
+ existing connection.
+
+ C->S:
+ INVITE sip:mresources@server.example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bg3
+ Max-Forwards:6
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:323124 INVITE
+ Contact:<sip:sarvi@client.example.com>
+ Content-Type:application/sdp
+ Content-Length:...
+
+
+
+
+
+ v=0
+ o=sarvi 2614933546 2614933547 IN IP4 192.0.2.12
+ s=Set up MRCPv2 control and audio
+ i=Add TCP channel, synthesizer and one-way audio
+ c=IN IP4 192.0.2.12
+ t=0 0
+ m=application 9 TCP/MRCPv2 1
+ a=setup:active
+ a=connection:new
+ a=resource:speechsynth
+ a=cmid:1
+ m=audio 49170 RTP/AVP 0 96
+ a=rtpmap:0 pcmu/8000
+ a=rtpmap:96 telephone-event/8000
+ a=fmtp:96 0-15
+ a=recvonly
+ a=mid:1
+
+
+ S->C:
+ SIP/2.0 200 OK
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bg3;received=192.0.32.10
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:323124 INVITE
+ Contact:<sip:mresources@server.example.com>
+ Content-Type:application/sdp
+ Content-Length:...
+
+ v=0
+ o=- 3000000001 3000000002 IN IP4 192.0.2.11
+ s=Set up MRCPv2 control and audio
+ i=Add TCP channel, synthesizer and one-way audio
+ c=IN IP4 192.0.2.11
+ t=0 0
+ m=application 32416 TCP/MRCPv2 1
+ a=setup:passive
+ a=connection:new
+ a=channel:32AECB23433801@speechsynth
+ a=cmid:1
+ m=audio 48260 RTP/AVP 0
+ a=rtpmap:0 pcmu/8000
+ a=sendonly
+ a=mid:1
+
+
+
+
+
+
+
+ C->S:
+ ACK sip:mresources@server.example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bg4
+ Max-Forwards:6
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:Sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:323124 ACK
+ Content-Length:0
+
+ This exchange allocates an additional resource control channel for a
+ recognizer. Since a recognizer would need to receive an audio stream
+ for recognition, this interaction also updates the audio stream to
+ sendrecv, making it a two-way audio stream.
+
+ C->S:
+ INVITE sip:mresources@server.example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bg5
+ Max-Forwards:6
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:323125 INVITE
+ Contact:<sip:sarvi@client.example.com>
+ Content-Type:application/sdp
+ Content-Length:...
+
+ v=0
+ o=sarvi 2614933546 2614933548 IN IP4 192.0.2.12
+ s=Set up MRCPv2 control and audio
+ i=Add recognizer and duplex the audio
+ c=IN IP4 192.0.2.12
+ t=0 0
+ m=application 9 TCP/MRCPv2 1
+ a=setup:active
+ a=connection:existing
+ a=resource:speechsynth
+ a=cmid:1
+ m=audio 49170 RTP/AVP 0 96
+ a=rtpmap:0 pcmu/8000
+ a=rtpmap:96 telephone-event/8000
+ a=fmtp:96 0-15
+ a=recvonly
+ a=mid:1
+ m=application 9 TCP/MRCPv2 1
+ a=setup:active
+
+
+
+
+
+ a=connection:existing
+ a=resource:speechrecog
+ a=cmid:2
+ m=audio 49180 RTP/AVP 0 96
+ a=rtpmap:0 pcmu/8000
+ a=rtpmap:96 telephone-event/8000
+ a=fmtp:96 0-15
+ a=sendonly
+ a=mid:2
+
+
+ S->C:
+ SIP/2.0 200 OK
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bg5;received=192.0.32.10
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:323125 INVITE
+ Contact:<sip:mresources@server.example.com>
+ Content-Type:application/sdp
+ Content-Length:...
+
+ v=0
+ o=- 3000000001 3000000003 IN IP4 192.0.2.11
+ s=Set up MRCPv2 control and audio
+ i=Add recognizer and duplex the audio
+ c=IN IP4 192.0.2.11
+ t=0 0
+ m=application 32416 TCP/MRCPv2 1
+ a=channel:32AECB23433801@speechsynth
+ a=cmid:1
+ m=audio 48260 RTP/AVP 0
+ a=rtpmap:0 pcmu/8000
+ a=sendonly
+ a=mid:1
+ m=application 32416 TCP/MRCPv2 1
+ a=channel:32AECB23433801@speechrecog
+ a=cmid:2
+ m=audio 48260 RTP/AVP 0
+ a=rtpmap:0 pcmu/8000
+ a=rtpmap:96 telephone-event/8000
+ a=fmtp:96 0-15
+ a=recvonly
+ a=mid:2
+
+
+
+
+
+
+
+
+ C->S:
+ ACK sip:mresources@server.example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bg6
+ Max-Forwards:6
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ From:Sarvi <sip:sarvi@example.com>;tag=1928301774
+ Call-ID:a84b4c76e66710
+ CSeq:323125 ACK
+ Content-Length:0
+
+   An MRCPv2 SPEAK request initiates speech.
+
+ C->S:
+ MRCP/2.0 ... SPEAK 543257
+ Channel-Identifier:32AECB23433801@speechsynth
+ Kill-On-Barge-In:false
+ Voice-gender:neutral
+ Voice-age:25
+ Prosody-volume:medium
+ Content-Type:application/ssml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <speak version="1.0"
+ xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
+ http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
+ xml:lang="en-US">
+ <p>
+ <s>You have 4 new messages.</s>
+ <s>The first is from Stephanie Williams
+ <mark name="Stephanie"/>
+ and arrived at <break/>
+ <say-as interpret-as="vxml:time">0345p</say-as>.</s>
+ <s>The subject is <prosody
+ rate="-20%">ski trip</prosody></s>
+ </p>
+ </speak>
+
+ S->C:
+ MRCP/2.0 ... 543257 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechsynth
+ Speech-Marker:timestamp=857205015059
+
+
+
+
+
+
+
+
+ The synthesizer hits the special marker in the message to be spoken
+ and faithfully informs the client of the event.
+
+ S->C: MRCP/2.0 ... SPEECH-MARKER 543257 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechsynth
+ Speech-Marker:timestamp=857206027059;Stephanie
+
+ The synthesizer finishes with the SPEAK request.
+
+ S->C: MRCP/2.0 ... SPEAK-COMPLETE 543257 COMPLETE
+ Channel-Identifier:32AECB23433801@speechsynth
+ Speech-Marker:timestamp=857207685213;Stephanie
+
+
+   The recognizer is issued a request to listen for the customer's
+   choices.
+
+ C->S: MRCP/2.0 ... RECOGNIZE 543258
+ Channel-Identifier:32AECB23433801@speechrecog
+ Content-Type:application/srgs+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <!-- the default grammar language is US English -->
+ <grammar xmlns="http://www.w3.org/2001/06/grammar"
+ xml:lang="en-US" version="1.0" root="request">
+ <!-- single language attachment to a rule expansion -->
+ <rule id="request">
+ Can I speak to
+ <one-of xml:lang="fr-CA">
+ <item>Michel Tremblay</item>
+ <item>Andre Roy</item>
+ </one-of>
+ </rule>
+ </grammar>
+
+
+ S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechrecog
+
+ The client issues the next MRCPv2 SPEAK method.
+
+ C->S: MRCP/2.0 ... SPEAK 543259
+ Channel-Identifier:32AECB23433801@speechsynth
+ Kill-On-Barge-In:true
+ Content-Type:application/ssml+xml
+ Content-Length:...
+
+
+
+
+
+
+ <?xml version="1.0"?>
+ <speak version="1.0"
+ xmlns="http://www.w3.org/2001/10/synthesis"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
+ http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
+ xml:lang="en-US">
+ <p>
+ <s>Welcome to ABC corporation.</s>
+ <s>Who would you like to talk to?</s>
+ </p>
+ </speak>
+
+ S->C: MRCP/2.0 ... 543259 200 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechsynth
+ Speech-Marker:timestamp=857207696314
+
+ This next section of this ongoing example demonstrates how kill-on-
+ barge-in support works. Since this last SPEAK request had Kill-On-
+ Barge-In set to "true", when the recognizer (the server) generated
+ the START-OF-INPUT event while a SPEAK was active, the client
+ immediately issued a BARGE-IN-OCCURRED method to the synthesizer
+ resource. The speech synthesizer then terminated playback and
+ notified the client. The completion-cause code provided the
+ indication that this was a kill-on-barge-in interruption rather than
+ a normal completion.
+
+ Note that, since the recognition and synthesizer resources are in the
+ same session on the same server, to obtain a faster response the
+ server might have internally relayed the start-of-input condition to
+ the synthesizer directly, before receiving the expected BARGE-IN-
+ OCCURRED event. However, any such communication is outside the scope
+ of MRCPv2.
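The client-side half of the barge-in sequence described above can be sketched as follows. The event and message structures here are invented for the example (MRCPv2 defines the wire messages, not this API):

```python
def on_recognizer_event(event, active_speaks, send_to_synth):
    """Client-side sketch of the kill-on-barge-in sequence: when
    START-OF-INPUT arrives while a Kill-On-Barge-In SPEAK is
    active, relay a BARGE-IN-OCCURRED carrying the same
    Proxy-Sync-Id so the synthesizer resource can stop playback."""
    if event["name"] != "START-OF-INPUT":
        return None
    if not any(s.get("kill_on_barge_in") for s in active_speaks):
        return None
    msg = {
        "method": "BARGE-IN-OCCURRED",
        "Proxy-Sync-Id": event["Proxy-Sync-Id"],
    }
    send_to_synth(msg)
    return msg
```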
+
+ S->C: MRCP/2.0 ... START-OF-INPUT 543258 IN-PROGRESS
+ Channel-Identifier:32AECB23433801@speechrecog
+ Proxy-Sync-Id:987654321
+
+
+ C->S: MRCP/2.0 ... BARGE-IN-OCCURRED 543259
+ Channel-Identifier:32AECB23433801@speechsynth
+ Proxy-Sync-Id:987654321
+
+
+ S->C: MRCP/2.0 ... 543259 200 COMPLETE
+ Channel-Identifier:32AECB23433801@speechsynth
+ Active-Request-Id-List:543258
+ Speech-Marker:timestamp=857206096314
+
+
+
+
+
+ S->C: MRCP/2.0 ... SPEAK-COMPLETE 543259 COMPLETE
+ Channel-Identifier:32AECB23433801@speechsynth
+ Completion-Cause:001 barge-in
+ Speech-Marker:timestamp=857207685213
+
+
+ The recognizer resource matched the spoken stream to a grammar and
+ generated results. The result of the recognition is returned by the
+ server as part of the RECOGNITION-COMPLETE event.
+
+ S->C: MRCP/2.0 ... RECOGNITION-COMPLETE 543258 COMPLETE
+ Channel-Identifier:32AECB23433801@speechrecog
+ Completion-Cause:000 success
+ Waveform-URI:<http://web.media.com/session123/audio.wav>;
+ size=423523;duration=25432
+ Content-Type:application/nlsml+xml
+ Content-Length:...
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:ex="http://www.example.com/example"
+ grammar="session:request1@form-level.store">
+ <interpretation>
+ <instance name="Person">
+ <ex:Person>
+ <ex:Name> Andre Roy </ex:Name>
+ </ex:Person>
+ </instance>
+ <input> may I speak to Andre Roy </input>
+ </interpretation>
+ </result>
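A client consuming the RECOGNITION-COMPLETE payload above must parse NLSML, which is namespace-qualified XML. A minimal sketch using Python's standard library (illustrative only; a real client would also inspect the <instance> content and any confidence attributes):

```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:mrcpv2"

def extract_input(nlsml: str) -> str:
    """Pull the recognized <input> text out of an NLSML <result>
    document like the one above."""
    root = ET.fromstring(nlsml)
    # The NLSML elements live in the mrcpv2 namespace.
    node = root.find(".//{%s}input" % NS)
    return (node.text or "").strip()
```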
+
+ Since the client was now finished with the session, including all
+ resources, it issued a SIP BYE request to close the SIP session.
+ This caused all control channels and resources allocated under the
+ session to be deallocated.
+
+ C->S: BYE sip:mresources@server.example.com SIP/2.0
+ Via:SIP/2.0/TCP client.atlanta.example.com:5060;
+ branch=z9hG4bK74bg7
+ Max-Forwards:6
+ From:Sarvi <sip:sarvi@example.com>;tag=1928301774
+ To:MediaServer <sip:mresources@example.com>;tag=62784
+ Call-ID:a84b4c76e66710
+ CSeq:323126 BYE
+ Content-Length:0
+
+
+
+
+
+
+
+14.2. Recognition Result Examples
+
+14.2.1. Simple ASR Ambiguity
+
+ System: To which city will you be traveling?
+ User: I want to go to Pittsburgh.
+
+ <?xml version="1.0"?>
+ <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns:ex="http://www.example.com/example"
+ grammar="http://www.example.com/flight">
+      <interpretation confidence="0.6">
+        <instance>
+          <ex:airline>
+            <ex:to_city>Pittsburgh</ex:to_city>
+          </ex:airline>
+        </instance>
+        <input mode="speech">
+          I want to go to Pittsburgh
+        </input>
+      </interpretation>
+      <interpretation confidence="0.4">
+ <instance>
+ <ex:airline>
+ <ex:to_city>Stockholm</ex:to_city>
+ </ex:airline>
+ </instance>
+ <input>I want to go to Stockholm</input>
+ </interpretation>
+ </result>
+
+14.2.2. Mixed Initiative
+
+ System: What would you like?
+ User: I would like 2 pizzas, one with pepperoni and cheese,
+ one with sausage and a bottle of coke, to go.
+
+   This example includes an order object, which in turn contains objects
+ named "food_item", "drink_item", and "delivery_method". The
+ representation assumes there are no ambiguities in the speech or
+ natural language processing. Note that this representation also
+ assumes some level of intra-sentential anaphora resolution, i.e., to
+ resolve the two "one"s as "pizza".
+
+ <?xml version="1.0"?>
+ <nl:result xmlns:nl="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns="http://www.example.com/example"
+ grammar="http://www.example.com/foodorder">
+
+
+
+
+
+   <nl:interpretation confidence="1.0">
+ <nl:instance>
+ <order>
+ <food_item confidence="1.0">
+ <pizza>
+ <ingredients confidence="1.0">
+ pepperoni
+ </ingredients>
+ <ingredients confidence="1.0">
+ cheese
+ </ingredients>
+ </pizza>
+ <pizza>
+ <ingredients>sausage</ingredients>
+ </pizza>
+ </food_item>
+ <drink_item confidence="1.0">
+ <size>2-liter</size>
+ </drink_item>
+ <delivery_method>to go</delivery_method>
+ </order>
+ </nl:instance>
+ <nl:input mode="speech">I would like 2 pizzas,
+ one with pepperoni and cheese, one with sausage
+ and a bottle of coke, to go.
+ </nl:input>
+ </nl:interpretation>
+ </nl:result>
+
+14.2.3. DTMF Input
+
+ A combination of DTMF input and speech is represented using nested
+ input elements. For example:
+ User: My pin is (dtmf 1 2 3 4)
+
+ <input>
+     <input mode="speech" confidence="1.0"
+ timestamp-start="2000-04-03T0:00:00"
+ timestamp-end="2000-04-03T0:00:01.5">My pin is
+ </input>
+     <input mode="dtmf" confidence="1.0"
+ timestamp-start="2000-04-03T0:00:01.5"
+ timestamp-end="2000-04-03T0:00:02.0">1 2 3 4
+ </input>
+ </input>
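The nested <input> structure above can be read back mechanically. A small illustrative helper (not part of the RFC) that lists each constituent input's mode and text:

```python
import xml.etree.ElementTree as ET

def input_modes(nlsml_input: str):
    """List the (mode, text) pairs of a nested NLSML <input>
    element such as the mixed speech/DTMF example above."""
    root = ET.fromstring(nlsml_input)
    return [(child.get("mode"), (child.text or "").strip())
            for child in root]
```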
+
+
+
+
+
+
+
+
+ Note that grammars that recognize mixtures of speech and DTMF are not
+ currently possible in SRGS; however, this representation might be
+ needed for other applications of NLSML, and this mixture capability
+ might be introduced in future versions of SRGS.
+
+14.2.4. Interpreting Meta-Dialog and Meta-Task Utterances
+
+ Natural language communication makes use of meta-dialog and meta-task
+ utterances. This specification is flexible enough so that meta-
+ utterances can be represented on an application-specific basis
+ without requiring other standard markup.
+
+ Here are two examples of how meta-task and meta-dialog utterances
+ might be represented.
+
+System: What toppings do you want on your pizza?
+User: What toppings do you have?
+
+<interpretation grammar="http://www.example.com/toppings">
+ <instance>
+ <question>
+      <questioned_item>toppings</questioned_item>
+ <questioned_property>
+ availability
+ </questioned_property>
+ </question>
+ </instance>
+ <input mode="speech">
+ what toppings do you have?
+ </input>
+</interpretation>
+
+User: slow down.
+
+<interpretation grammar="http://www.example.com/generalCommandsGrammar">
+ <instance>
+ <command>
+ <action>reduce speech rate</action>
+ <doer>system</doer>
+ </command>
+ </instance>
+ <input mode="speech">slow down</input>
+</interpretation>
+
+14.2.5. Anaphora and Deixis
+
+ This specification can be used on an application-specific basis to
+ represent utterances that contain unresolved anaphoric and deictic
+ references. Anaphoric references, which include pronouns and
+ definite noun phrases that refer to something that was mentioned in
+ the preceding linguistic context, and deictic references, which refer
+ to something that is present in the non-linguistic context, present
+ similar problems in that there may not be sufficient unambiguous
+ linguistic context to determine what their exact role in the
+ interpretation should be. In order to represent unresolved anaphora
+ and deixis using this specification, one strategy would be for the
+ developer to define a more surface-oriented representation that
+ leaves the specific details of the interpretation of the reference
+ open. (This assumes that a later component is responsible for
+ actually resolving the reference).
+
+   Example (ignoring the issue of representing the input from the
+   pointing gesture):
+
+ System: What do you want to drink?
+ User: I want this. (clicks on picture of large root beer.)
+
+ <?xml version="1.0"?>
+ <nl:result xmlns:nl="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns="http://www.example.com/example"
+ grammar="http://www.example.com/beverages.grxml">
+ <nl:interpretation>
+ <nl:instance>
+ <doer>I</doer>
+ <action>want</action>
+ <object>this</object>
+ </nl:instance>
+ <nl:input mode="speech">I want this</nl:input>
+ </nl:interpretation>
+ </nl:result>
+
+14.2.6. Distinguishing Individual Items from Sets with One Member
+
+ For programming convenience, it is useful to be able to distinguish
+ between individual items and sets containing one item in the XML
+ representation of semantic results. For example, a pizza order might
+ consist of exactly one pizza, but a pizza might contain zero or more
+ toppings. Since there is no standard way of marking this distinction
+ directly in XML, in the current framework, the developer is free to
+ adopt any conventions that would convey this information in the XML
+ markup. One strategy would be for the developer to wrap the set of
+ items in a grouping element, as in the following example.
+
+ <order>
+ <pizza>
+ <topping-group>
+ <topping>mushrooms</topping>
+ </topping-group>
+ </pizza>
+ <drink>coke</drink>
+ </order>
+
+ In this example, the programmer can assume that there is supposed to
+ be exactly one pizza and one drink in the order, but the fact that
+ there is only one topping is an accident of this particular pizza
+ order.
+
+ Note that the client controls both the grammar and the semantics to
+ be returned upon grammar matches, so the user of MRCPv2 is fully
+ empowered to cause results to be returned in NLSML in such a way that
+ the interpretation is clear to that user.
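The grouping-element convention above is straightforward to apply in code. This non-normative Python sketch reads the example order; the element names come from the example, while the traversal is illustrative:

```python
import xml.etree.ElementTree as ET

doc = """<order>
  <pizza>
    <topping-group>
      <topping>mushrooms</topping>
    </topping-group>
  </pizza>
  <drink>coke</drink>
</order>"""

order = ET.fromstring(doc)
# <pizza> and <drink> are singletons by convention, so they are read
# with find(); toppings are always read as a list, because the
# <topping-group> wrapper marks them as a set of zero or more items.
toppings = [t.text for t in order.find("pizza/topping-group")]
print(toppings)  # ['mushrooms']
```

The single-element list makes it explicit that "one topping" is an accident of this order, not a structural constraint.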
+
+14.2.7. Extensibility
+
+ Extensibility in NLSML is provided via result content flexibility, as
+ described in the discussions of meta-utterances and anaphora. NLSML
+ can easily be used in sophisticated systems to convey application-
+ specific information that more basic systems would not make use of,
+ for example, defining speech acts.
+
+15. ABNF Normative Definition
+
+ The following productions make use of the core rules defined in
+ Section B.1 of RFC 5234 [RFC5234].
+
+LWS = [*WSP CRLF] 1*WSP ; linear whitespace
+
+SWS = [LWS] ; sep whitespace
+
+UTF8-NONASCII = %xC0-DF 1UTF8-CONT
+ / %xE0-EF 2UTF8-CONT
+ / %xF0-F7 3UTF8-CONT
+ / %xF8-FB 4UTF8-CONT
+ / %xFC-FD 5UTF8-CONT
+
+UTF8-CONT = %x80-BF
+UTFCHAR = %x21-7E
+ / UTF8-NONASCII
+param = *pchar
+
+quoted-string = SWS DQUOTE *(qdtext / quoted-pair )
+ DQUOTE
+
+qdtext = LWS / %x21 / %x23-5B / %x5D-7E
+ / UTF8-NONASCII
+
+quoted-pair = "\" (%x00-09 / %x0B-0C / %x0E-7F)
+
+token = 1*(alphanum / "-" / "." / "!" / "%" / "*"
+ / "_" / "+" / "`" / "'" / "~" )
+
+reserved = ";" / "/" / "?" / ":" / "@" / "&" / "="
+ / "+" / "$" / ","
+
+mark = "-" / "_" / "." / "!" / "~" / "*" / "'"
+ / "(" / ")"
+
+unreserved = alphanum / mark
+
+pchar = unreserved / escaped
+ / ":" / "@" / "&" / "=" / "+" / "$" / ","
+
+alphanum = ALPHA / DIGIT
+
+BOOLEAN = "true" / "false"
+
+FLOAT = *DIGIT ["." *DIGIT]
+
+escaped = "%" HEXDIG HEXDIG
+
+fragment = *uric
+
+uri = [ absoluteURI / relativeURI ]
+ [ "#" fragment ]
+
+absoluteURI = scheme ":" ( hier-part / opaque-part )
+
+relativeURI = ( net-path / abs-path / rel-path )
+ [ "?" query ]
+
+hier-part = ( net-path / abs-path ) [ "?" query ]
+
+net-path = "//" authority [ abs-path ]
+
+abs-path = "/" path-segments
+
+rel-path = rel-segment [ abs-path ]
+
+rel-segment = 1*( unreserved / escaped / ";" / "@"
+ / "&" / "=" / "+" / "$" / "," )
+
+opaque-part = uric-no-slash *uric
+
+uric = reserved / unreserved / escaped
+
+uric-no-slash = unreserved / escaped / ";" / "?" / ":"
+ / "@" / "&" / "=" / "+" / "$" / ","
+
+path-segments = segment *( "/" segment )
+
+segment = *pchar *( ";" param )
+
+scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
+
+authority = srvr / reg-name
+
+srvr = [ [ userinfo "@" ] hostport ]
+
+reg-name = 1*( unreserved / escaped / "$" / ","
+ / ";" / ":" / "@" / "&" / "=" / "+" )
+
+query = *uric
+
+userinfo = ( user ) [ ":" password ] "@"
+
+user = 1*( unreserved / escaped
+ / user-unreserved )
+
+user-unreserved = "&" / "=" / "+" / "$" / "," / ";"
+ / "?" / "/"
+
+password = *( unreserved / escaped
+ / "&" / "=" / "+" / "$" / "," )
+
+hostport = host [ ":" port ]
+
+host = hostname / IPv4address / IPv6reference
+
+hostname = *( domainlabel "." ) toplabel [ "." ]
+
+domainlabel = alphanum / alphanum *( alphanum / "-" )
+ alphanum
+
+toplabel = ALPHA / ALPHA *( alphanum / "-" )
+ alphanum
+
+IPv4address = 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT "."
+ 1*3DIGIT
+
+IPv6reference = "[" IPv6address "]"
+
+IPv6address = hexpart [ ":" IPv4address ]
+
+hexpart = hexseq / hexseq "::" [ hexseq ] / "::"
+ [ hexseq ]
+
+hexseq = hex4 *( ":" hex4)
+
+hex4 = 1*4HEXDIG
+
+port = 1*19DIGIT
+
+; generic-message is the top-level rule
+
+generic-message = start-line message-header CRLF
+ [ message-body ]
+
+message-body = *OCTET
+
+start-line = request-line / response-line / event-line
+
+request-line = mrcp-version SP message-length SP method-name
+ SP request-id CRLF
+
+response-line = mrcp-version SP message-length SP request-id
+ SP status-code SP request-state CRLF
+
+event-line = mrcp-version SP message-length SP event-name
+ SP request-id SP request-state CRLF
+
+method-name = generic-method
+ / synthesizer-method
+ / recognizer-method
+ / recorder-method
+ / verifier-method
+
+generic-method = "SET-PARAMS"
+ / "GET-PARAMS"
+
+request-state = "COMPLETE"
+ / "IN-PROGRESS"
+ / "PENDING"
+
+event-name = synthesizer-event
+ / recognizer-event
+ / recorder-event
+ / verifier-event
+
+message-header = 1*(generic-header / resource-header / generic-field)
+
+generic-field = field-name ":" [ field-value ]
+field-name = token
+field-value = *LWS field-content *( CRLF 1*LWS field-content)
+field-content = <the OCTETs making up the field-value
+ and consisting of either *TEXT or combinations
+ of token, separators, and quoted-string>
+
+resource-header = synthesizer-header
+ / recognizer-header
+ / recorder-header
+ / verifier-header
+
+generic-header = channel-identifier
+ / accept
+ / active-request-id-list
+ / proxy-sync-id
+ / accept-charset
+ / content-type
+ / content-id
+ / content-base
+ / content-encoding
+ / content-location
+ / content-length
+ / fetch-timeout
+ / cache-control
+ / logging-tag
+ / set-cookie
+ / vendor-specific
+
+; -- content-id is as defined in RFC 2392, RFC 2046 and RFC 5322
+; -- accept and accept-charset are as defined in RFC 2616
+
+mrcp-version = "MRCP" "/" 1*2DIGIT "." 1*2DIGIT
+
+message-length = 1*19DIGIT
+
+request-id = 1*10DIGIT
+
+status-code = 3DIGIT
+
+channel-identifier = "Channel-Identifier" ":"
+ channel-id CRLF
+
+channel-id = 1*alphanum "@" 1*alphanum
+
+active-request-id-list = "Active-Request-Id-List" ":"
+ request-id *("," request-id) CRLF
+
+proxy-sync-id = "Proxy-Sync-Id" ":" 1*VCHAR CRLF
+
+content-base = "Content-Base" ":" absoluteURI CRLF
+
+content-length = "Content-Length" ":" 1*19DIGIT CRLF
+
+content-type = "Content-Type" ":" media-type-value CRLF
+
+media-type-value = type "/" subtype *( ";" parameter )
+
+type = token
+
+subtype = token
+
+parameter = attribute "=" value
+
+attribute = token
+
+value = token / quoted-string
+
+content-encoding = "Content-Encoding" ":"
+ *WSP content-coding
+ *(*WSP "," *WSP content-coding *WSP )
+ CRLF
+
+content-coding = token
+
+content-location = "Content-Location" ":"
+ ( absoluteURI / relativeURI ) CRLF
+
+cache-control = "Cache-Control" ":"
+ [*WSP cache-directive
+ *( *WSP "," *WSP cache-directive *WSP )]
+ CRLF
+
+fetch-timeout = "Fetch-Timeout" ":" 1*19DIGIT CRLF
+
+cache-directive = "max-age" "=" delta-seconds
+ / "max-stale" ["=" delta-seconds ]
+ / "min-fresh" "=" delta-seconds
+
+delta-seconds = 1*19DIGIT
+
+logging-tag = "Logging-Tag" ":" 1*UTFCHAR CRLF
+
+vendor-specific = "Vendor-Specific-Parameters" ":"
+ [vendor-specific-av-pair
+ *(";" vendor-specific-av-pair)] CRLF
+
+vendor-specific-av-pair = vendor-av-pair-name "="
+ value
+
+vendor-av-pair-name = 1*UTFCHAR
+
+set-cookie = "Set-Cookie:" SP set-cookie-string
+set-cookie-string = cookie-pair *( ";" SP cookie-av )
+cookie-pair = cookie-name "=" cookie-value
+cookie-name = token
+cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
+cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
+token = <token, defined in [RFC2616], Section 2.2>
+
+cookie-av = expires-av / max-age-av / domain-av /
+ path-av / secure-av / httponly-av /
+ extension-av / age-av
+expires-av = "Expires=" sane-cookie-date
+sane-cookie-date = <rfc1123-date, defined in [RFC2616], Section 3.3.1>
+max-age-av = "Max-Age=" non-zero-digit *DIGIT
+non-zero-digit = %x31-39
+domain-av = "Domain=" domain-value
+domain-value = <subdomain>
+path-av = "Path=" path-value
+path-value = <any CHAR except CTLs or ";">
+secure-av = "Secure"
+httponly-av = "HttpOnly"
+extension-av = <any CHAR except CTLs or ";">
+age-av = "Age=" delta-seconds
+
+; Synthesizer ABNF
+
+synthesizer-method = "SPEAK"
+ / "STOP"
+ / "PAUSE"
+ / "RESUME"
+ / "BARGE-IN-OCCURRED"
+ / "CONTROL"
+ / "DEFINE-LEXICON"
+
+synthesizer-event = "SPEECH-MARKER"
+ / "SPEAK-COMPLETE"
+
+synthesizer-header = jump-size
+ / kill-on-barge-in
+ / speaker-profile
+ / completion-cause
+ / completion-reason
+ / voice-parameter
+ / prosody-parameter
+ / speech-marker
+ / speech-language
+ / fetch-hint
+ / audio-fetch-hint
+ / failed-uri
+ / failed-uri-cause
+ / speak-restart
+ / speak-length
+ / load-lexicon
+ / lexicon-search-order
+
+jump-size = "Jump-Size" ":" speech-length-value CRLF
+
+speech-length-value = numeric-speech-length
+ / text-speech-length
+
+text-speech-length = 1*UTFCHAR SP "Tag"
+
+numeric-speech-length = ("+" / "-") positive-speech-length
+
+positive-speech-length = 1*19DIGIT SP numeric-speech-unit
+
+numeric-speech-unit = "Second"
+ / "Word"
+ / "Sentence"
+ / "Paragraph"
+
+kill-on-barge-in = "Kill-On-Barge-In" ":" BOOLEAN
+ CRLF
+
+speaker-profile = "Speaker-Profile" ":" uri CRLF
+
+completion-cause = "Completion-Cause" ":" cause-code SP
+ cause-name CRLF
+cause-code = 3DIGIT
+cause-name = *VCHAR
+
+completion-reason = "Completion-Reason" ":"
+ quoted-string CRLF
+
+voice-parameter = voice-gender
+ / voice-age
+ / voice-variant
+ / voice-name
+
+voice-gender = "Voice-Gender:" voice-gender-value CRLF
+
+voice-gender-value = "male"
+ / "female"
+ / "neutral"
+
+voice-age = "Voice-Age:" 1*3DIGIT CRLF
+
+voice-variant = "Voice-Variant:" 1*19DIGIT CRLF
+
+voice-name = "Voice-Name:"
+ 1*UTFCHAR *(1*WSP 1*UTFCHAR) CRLF
+
+prosody-parameter = "Prosody-" prosody-param-name ":"
+ prosody-param-value CRLF
+
+prosody-param-name = 1*VCHAR
+
+prosody-param-value = 1*VCHAR
+
+timestamp = "timestamp" "=" time-stamp-value
+
+time-stamp-value = 1*20DIGIT
+
+speech-marker = "Speech-Marker" ":"
+ timestamp
+ [";" 1*(UTFCHAR / %x20)] CRLF
+
+speech-language = "Speech-Language" ":" 1*VCHAR CRLF
+
+fetch-hint = "Fetch-Hint" ":" ("prefetch" / "safe") CRLF
+
+audio-fetch-hint = "Audio-Fetch-Hint" ":"
+ ("prefetch" / "safe" / "stream") CRLF
+
+failed-uri = "Failed-URI" ":" absoluteURI CRLF
+
+failed-uri-cause = "Failed-URI-Cause" ":" 1*UTFCHAR CRLF
+
+speak-restart = "Speak-Restart" ":" BOOLEAN CRLF
+
+speak-length = "Speak-Length" ":" positive-length-value
+ CRLF
+
+positive-length-value = positive-speech-length
+ / text-speech-length
+
+load-lexicon = "Load-Lexicon" ":" BOOLEAN CRLF
+
+lexicon-search-order = "Lexicon-Search-Order" ":"
+ "<" absoluteURI ">" *(" " "<" absoluteURI ">") CRLF
+
+; Recognizer ABNF
+
+recognizer-method = recog-only-method
+ / enrollment-method
+
+recog-only-method = "DEFINE-GRAMMAR"
+ / "RECOGNIZE"
+ / "INTERPRET"
+ / "GET-RESULT"
+ / "START-INPUT-TIMERS"
+ / "STOP"
+
+enrollment-method = "START-PHRASE-ENROLLMENT"
+ / "ENROLLMENT-ROLLBACK"
+ / "END-PHRASE-ENROLLMENT"
+ / "MODIFY-PHRASE"
+ / "DELETE-PHRASE"
+
+recognizer-event = "START-OF-INPUT"
+ / "RECOGNITION-COMPLETE"
+ / "INTERPRETATION-COMPLETE"
+
+recognizer-header = recog-only-header
+ / enrollment-header
+
+recog-only-header = confidence-threshold
+ / sensitivity-level
+ / speed-vs-accuracy
+ / n-best-list-length
+ / input-type
+ / no-input-timeout
+ / recognition-timeout
+ / waveform-uri
+ / input-waveform-uri
+ / completion-cause
+ / completion-reason
+ / recognizer-context-block
+
+ / start-input-timers
+ / speech-complete-timeout
+ / speech-incomplete-timeout
+ / dtmf-interdigit-timeout
+ / dtmf-term-timeout
+ / dtmf-term-char
+ / failed-uri
+ / failed-uri-cause
+ / save-waveform
+ / media-type
+ / new-audio-channel
+ / speech-language
+ / ver-buffer-utterance
+ / recognition-mode
+ / cancel-if-queue
+ / hotword-max-duration
+ / hotword-min-duration
+ / interpret-text
+ / dtmf-buffer-time
+ / clear-dtmf-buffer
+ / early-no-match
+
+enrollment-header = num-min-consistent-pronunciations
+ / consistency-threshold
+ / clash-threshold
+ / personal-grammar-uri
+ / enroll-utterance
+ / phrase-id
+ / phrase-nl
+ / weight
+ / save-best-waveform
+ / new-phrase-id
+ / confusable-phrases-uri
+ / abort-phrase-enrollment
+
+confidence-threshold = "Confidence-Threshold" ":"
+ FLOAT CRLF
+
+sensitivity-level = "Sensitivity-Level" ":" FLOAT
+ CRLF
+
+speed-vs-accuracy = "Speed-Vs-Accuracy" ":" FLOAT
+ CRLF
+
+n-best-list-length = "N-Best-List-Length" ":" 1*19DIGIT
+ CRLF
+
+input-type = "Input-Type" ":" inputs CRLF
+
+inputs = "speech" / "dtmf"
+
+no-input-timeout = "No-Input-Timeout" ":" 1*19DIGIT
+ CRLF
+
+recognition-timeout = "Recognition-Timeout" ":" 1*19DIGIT
+ CRLF
+
+waveform-uri = "Waveform-URI" ":" ["<" uri ">"
+ ";" "size" "=" 1*19DIGIT
+ ";" "duration" "=" 1*19DIGIT] CRLF
+
+recognizer-context-block = "Recognizer-Context-Block" ":"
+ [1*VCHAR] CRLF
+
+start-input-timers = "Start-Input-Timers" ":"
+ BOOLEAN CRLF
+
+speech-complete-timeout = "Speech-Complete-Timeout" ":"
+ 1*19DIGIT CRLF
+
+speech-incomplete-timeout = "Speech-Incomplete-Timeout" ":"
+ 1*19DIGIT CRLF
+
+dtmf-interdigit-timeout = "DTMF-Interdigit-Timeout" ":"
+ 1*19DIGIT CRLF
+
+dtmf-term-timeout = "DTMF-Term-Timeout" ":" 1*19DIGIT
+ CRLF
+
+dtmf-term-char = "DTMF-Term-Char" ":" VCHAR CRLF
+
+save-waveform = "Save-Waveform" ":" BOOLEAN CRLF
+
+new-audio-channel = "New-Audio-Channel" ":"
+ BOOLEAN CRLF
+
+recognition-mode         =    "Recognition-Mode" ":"
+                              ("normal" / "hotword") CRLF
+
+cancel-if-queue = "Cancel-If-Queue" ":" BOOLEAN CRLF
+
+hotword-max-duration = "Hotword-Max-Duration" ":"
+ 1*19DIGIT CRLF
+
+hotword-min-duration = "Hotword-Min-Duration" ":"
+ 1*19DIGIT CRLF
+
+interpret-text = "Interpret-Text" ":" 1*VCHAR CRLF
+
+dtmf-buffer-time = "DTMF-Buffer-Time" ":" 1*19DIGIT CRLF
+
+clear-dtmf-buffer = "Clear-DTMF-Buffer" ":" BOOLEAN CRLF
+
+early-no-match = "Early-No-Match" ":" BOOLEAN CRLF
+
+num-min-consistent-pronunciations =
+ "Num-Min-Consistent-Pronunciations" ":" 1*19DIGIT CRLF
+
+consistency-threshold = "Consistency-Threshold" ":" FLOAT
+ CRLF
+
+clash-threshold = "Clash-Threshold" ":" FLOAT CRLF
+
+personal-grammar-uri = "Personal-Grammar-URI" ":" uri CRLF
+
+enroll-utterance = "Enroll-Utterance" ":" BOOLEAN CRLF
+
+phrase-id = "Phrase-ID" ":" 1*VCHAR CRLF
+
+phrase-nl = "Phrase-NL" ":" 1*UTFCHAR CRLF
+
+weight = "Weight" ":" FLOAT CRLF
+
+save-best-waveform = "Save-Best-Waveform" ":"
+ BOOLEAN CRLF
+
+new-phrase-id = "New-Phrase-ID" ":" 1*VCHAR CRLF
+
+confusable-phrases-uri = "Confusable-Phrases-URI" ":"
+ uri CRLF
+
+abort-phrase-enrollment = "Abort-Phrase-Enrollment" ":"
+ BOOLEAN CRLF
+
+; Recorder ABNF
+
+recorder-method = "RECORD"
+ / "STOP"
+ / "START-INPUT-TIMERS"
+
+recorder-event = "START-OF-INPUT"
+ / "RECORD-COMPLETE"
+
+recorder-header = sensitivity-level
+ / no-input-timeout
+ / completion-cause
+ / completion-reason
+ / failed-uri
+ / failed-uri-cause
+ / record-uri
+ / media-type
+ / max-time
+ / trim-length
+ / final-silence
+ / capture-on-speech
+ / ver-buffer-utterance
+ / start-input-timers
+ / new-audio-channel
+
+record-uri = "Record-URI" ":" [ "<" uri ">"
+ ";" "size" "=" 1*19DIGIT
+ ";" "duration" "=" 1*19DIGIT] CRLF
+
+media-type = "Media-Type" ":" media-type-value CRLF
+
+max-time = "Max-Time" ":" 1*19DIGIT CRLF
+
+trim-length = "Trim-Length" ":" 1*19DIGIT CRLF
+
+final-silence = "Final-Silence" ":" 1*19DIGIT CRLF
+
+capture-on-speech        =    "Capture-On-Speech" ":"
+                              BOOLEAN CRLF
+
+; Verifier ABNF
+
+verifier-method = "START-SESSION"
+ / "END-SESSION"
+ / "QUERY-VOICEPRINT"
+ / "DELETE-VOICEPRINT"
+ / "VERIFY"
+ / "VERIFY-FROM-BUFFER"
+ / "VERIFY-ROLLBACK"
+ / "STOP"
+ / "CLEAR-BUFFER"
+ / "START-INPUT-TIMERS"
+ / "GET-INTERMEDIATE-RESULT"
+
+verifier-event = "VERIFICATION-COMPLETE"
+ / "START-OF-INPUT"
+
+verifier-header = repository-uri
+ / voiceprint-identifier
+ / verification-mode
+ / adapt-model
+ / abort-model
+ / min-verification-score
+ / num-min-verification-phrases
+ / num-max-verification-phrases
+ / no-input-timeout
+ / save-waveform
+ / media-type
+ / waveform-uri
+ / voiceprint-exists
+ / ver-buffer-utterance
+ / input-waveform-uri
+ / completion-cause
+ / completion-reason
+ / speech-complete-timeout
+ / new-audio-channel
+ / abort-verification
+ / start-input-timers
+ / input-type
+
+repository-uri = "Repository-URI" ":" uri CRLF
+
+voiceprint-identifier = "Voiceprint-Identifier" ":"
+ vid *[";" vid] CRLF
+vid = 1*VCHAR ["." 1*VCHAR]
+
+verification-mode        =    "Verification-Mode" ":"
+                              verification-mode-string CRLF
+
+verification-mode-string = "train" / "verify"
+
+adapt-model = "Adapt-Model" ":" BOOLEAN CRLF
+
+abort-model = "Abort-Model" ":" BOOLEAN CRLF
+
+min-verification-score = "Min-Verification-Score" ":"
+ [ %x2D ] FLOAT CRLF
+
+num-min-verification-phrases = "Num-Min-Verification-Phrases"
+ ":" 1*19DIGIT CRLF
+
+num-max-verification-phrases = "Num-Max-Verification-Phrases"
+ ":" 1*19DIGIT CRLF
+
+voiceprint-exists = "Voiceprint-Exists" ":"
+ BOOLEAN CRLF
+
+ver-buffer-utterance = "Ver-Buffer-Utterance" ":"
+ BOOLEAN CRLF
+
+input-waveform-uri = "Input-Waveform-URI" ":" uri CRLF
+
+abort-verification       =    "Abort-Verification" ":"
+                              BOOLEAN CRLF
+
+ The following productions add a new SDP session-level attribute. See
+ Paragraph 5.
+
+ cmid-attribute = "a=cmid:" identification-tag
+
+ identification-tag = token
+
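The start-line productions above can be exercised with a short, non-normative sketch. It assembles a minimal SPEAK request and checks it against the request-line production. Per the protocol, message-length covers the entire message, including the digits of the length field itself, hence the fixed-point loop; the header value and function names here are purely illustrative.

```python
import re

# request-line = mrcp-version SP message-length SP method-name
#                SP request-id CRLF
REQUEST_LINE = re.compile(
    r"MRCP/(\d{1,2})\.(\d{1,2}) (\d{1,19}) ([A-Z-]+) (\d{1,10})\r\n")

def build_request(method, request_id, headers):
    """Render a minimal MRCPv2 request.  message-length counts the
    whole message, including the digits of the length field itself,
    so the rendering is iterated until the length is stable."""
    def render(length):
        lines = [f"MRCP/2.0 {length} {method} {request_id}"]
        lines += [f"{name}: {value}" for name, value in headers]
        # An empty line terminates the header section (no body here).
        return "\r\n".join(lines) + "\r\n" + "\r\n"
    length = 0
    while True:
        message = render(length)
        if len(message) == length:
            return message
        length = len(message)

msg = build_request("SPEAK", 543257,
                    [("Channel-Identifier", "32AECB23433801@speechsynth")])
m = REQUEST_LINE.match(msg)
# The rendered length field equals the actual message size.
print(m.group(4), int(m.group(3)) == len(msg))
```

The same regular-expression style can be extended to response-line and event-line, which differ only in the fields that follow message-length.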
+16. XML Schemas
+
+16.1. NLSML Schema Definition
+
+ <?xml version="1.0" encoding="UTF-8"?>
+ <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
+ targetNamespace="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns="urn:ietf:params:xml:ns:mrcpv2"
+ elementFormDefault="qualified"
+ attributeFormDefault="unqualified" >
+ <xs:annotation>
+ <xs:documentation> Natural Language Semantic Markup Schema
+ </xs:documentation>
+ </xs:annotation>
+ <xs:include schemaLocation="enrollment-schema.rng"/>
+ <xs:include schemaLocation="verification-schema.rng"/>
+ <xs:element name="result">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element name="interpretation" maxOccurs="unbounded">
+ <xs:complexType>
+ <xs:sequence>
+ <xs:element name="instance">
+ <xs:complexType mixed="true">
+ <xs:sequence minOccurs="0">
+ <xs:any namespace="##other" processContents="lax"/>
+ </xs:sequence>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="input" minOccurs="0">
+
+ <xs:complexType mixed="true">
+ <xs:choice>
+ <xs:element name="noinput" minOccurs="0"/>
+ <xs:element name="nomatch" minOccurs="0"/>
+ <xs:element name="input" minOccurs="0"/>
+ </xs:choice>
+ <xs:attribute name="mode"
+ type="xs:string"
+ default="speech"/>
+ <xs:attribute name="confidence"
+ type="confidenceinfo"
+ default="1.0"/>
+ <xs:attribute name="timestamp-start"
+ type="xs:string"/>
+ <xs:attribute name="timestamp-end"
+ type="xs:string"/>
+ </xs:complexType>
+ </xs:element>
+ </xs:sequence>
+ <xs:attribute name="confidence" type="confidenceinfo"
+ default="1.0"/>
+ <xs:attribute name="grammar" type="xs:anyURI"
+ use="optional"/>
+ </xs:complexType>
+ </xs:element>
+ <xs:element name="enrollment-result"
+ type="enrollment-contents"/>
+ <xs:element name="verification-result"
+ type="verification-contents"/>
+ </xs:sequence>
+ <xs:attribute name="grammar" type="xs:anyURI"
+ use="optional"/>
+ </xs:complexType>
+ </xs:element>
+
+ <xs:simpleType name="confidenceinfo">
+ <xs:restriction base="xs:float">
+ <xs:minInclusive value="0.0"/>
+ <xs:maxInclusive value="1.0"/>
+ </xs:restriction>
+ </xs:simpleType>
+ </xs:schema>
+
+16.2. Enrollment Results Schema Definition
+
+ <?xml version="1.0" encoding="UTF-8"?>
+
+ <!-- MRCP Enrollment Schema
+ (See http://www.oasis-open.org/committees/relax-ng/spec.html)
+ -->
+
+ <grammar datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
+ ns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns="http://relaxng.org/ns/structure/1.0">
+
+ <start>
+ <element name="enrollment-result">
+ <ref name="enrollment-content"/>
+ </element>
+ </start>
+
+ <define name="enrollment-content">
+ <interleave>
+ <element name="num-clashes">
+ <data type="nonNegativeInteger"/>
+ </element>
+ <element name="num-good-repetitions">
+ <data type="nonNegativeInteger"/>
+ </element>
+ <element name="num-repetitions-still-needed">
+ <data type="nonNegativeInteger"/>
+ </element>
+ <element name="consistency-status">
+ <choice>
+ <value>consistent</value>
+ <value>inconsistent</value>
+ <value>undecided</value>
+ </choice>
+ </element>
+ <optional>
+ <element name="clash-phrase-ids">
+ <oneOrMore>
+ <element name="item">
+ <data type="token"/>
+ </element>
+ </oneOrMore>
+ </element>
+ </optional>
+ <optional>
+ <element name="transcriptions">
+ <oneOrMore>
+
+ <element name="item">
+ <text/>
+ </element>
+ </oneOrMore>
+ </element>
+ </optional>
+ <optional>
+ <element name="confusable-phrases">
+ <oneOrMore>
+ <element name="item">
+ <text/>
+ </element>
+ </oneOrMore>
+ </element>
+ </optional>
+ </interleave>
+ </define>
+
+ </grammar>
+
+16.3. Verification Results Schema Definition
+
+   <?xml version="1.0" encoding="UTF-8"?>
+
+ <!-- MRCP Verification Results Schema
+ (See http://www.oasis-open.org/committees/relax-ng/spec.html)
+ -->
+
+ <grammar datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
+ ns="urn:ietf:params:xml:ns:mrcpv2"
+ xmlns="http://relaxng.org/ns/structure/1.0">
+
+ <start>
+ <element name="verification-result">
+ <ref name="verification-contents"/>
+ </element>
+ </start>
+
+ <define name="verification-contents">
+ <element name="voiceprint">
+ <ref name="firstVoiceprintContent"/>
+ </element>
+ <zeroOrMore>
+ <element name="voiceprint">
+ <ref name="restVoiceprintContent"/>
+ </element>
+ </zeroOrMore>
+ </define>
+
+ <define name="firstVoiceprintContent">
+ <attribute name="id">
+ <data type="string"/>
+ </attribute>
+ <interleave>
+ <optional>
+ <element name="adapted">
+ <data type="boolean"/>
+ </element>
+ </optional>
+ <optional>
+ <element name="needmoredata">
+ <ref name="needmoredataContent"/>
+ </element>
+ </optional>
+ <optional>
+ <element name="incremental">
+ <ref name="firstCommonContent"/>
+ </element>
+ </optional>
+ <element name="cumulative">
+ <ref name="firstCommonContent"/>
+ </element>
+ </interleave>
+ </define>
+
+ <define name="restVoiceprintContent">
+ <attribute name="id">
+ <data type="string"/>
+ </attribute>
+ <element name="cumulative">
+ <ref name="restCommonContent"/>
+ </element>
+ </define>
+
+ <define name="firstCommonContent">
+ <interleave>
+ <element name="decision">
+ <ref name="decisionContent"/>
+ </element>
+ <optional>
+ <element name="utterance-length">
+ <ref name="utterance-lengthContent"/>
+ </element>
+ </optional>
+ <optional>
+ <element name="device">
+ <ref name="deviceContent"/>
+
+ </element>
+ </optional>
+ <optional>
+ <element name="gender">
+ <ref name="genderContent"/>
+ </element>
+ </optional>
+ <zeroOrMore>
+ <element name="verification-score">
+ <ref name="verification-scoreContent"/>
+ </element>
+ </zeroOrMore>
+ </interleave>
+ </define>
+
+ <define name="restCommonContent">
+ <interleave>
+ <optional>
+ <element name="decision">
+ <ref name="decisionContent"/>
+ </element>
+ </optional>
+ <optional>
+ <element name="device">
+ <ref name="deviceContent"/>
+ </element>
+ </optional>
+ <optional>
+ <element name="gender">
+ <ref name="genderContent"/>
+ </element>
+ </optional>
+ <zeroOrMore>
+ <element name="verification-score">
+ <ref name="verification-scoreContent"/>
+ </element>
+ </zeroOrMore>
+ </interleave>
+ </define>
+
+ <define name="decisionContent">
+ <choice>
+ <value>accepted</value>
+ <value>rejected</value>
+ <value>undecided</value>
+ </choice>
+ </define>
+
+ <define name="needmoredataContent">
+ <data type="boolean"/>
+ </define>
+
+ <define name="utterance-lengthContent">
+ <data type="nonNegativeInteger"/>
+ </define>
+
+ <define name="deviceContent">
+ <choice>
+ <value>cellular-phone</value>
+ <value>electret-phone</value>
+ <value>carbon-button-phone</value>
+ <value>unknown</value>
+ </choice>
+ </define>
+
+ <define name="genderContent">
+ <choice>
+ <value>male</value>
+ <value>female</value>
+ <value>unknown</value>
+ </choice>
+ </define>
+
+ <define name="verification-scoreContent">
+ <data type="float">
+ <param name="minInclusive">-1</param>
+ <param name="maxInclusive">1</param>
+ </data>
+ </define>
+
+ </grammar>
+
+17. References
+
+17.1. Normative References
+
+ [ISO.8859-1.1987]
+ International Organization for Standardization,
+ "Information technology - 8-bit single byte coded graphic
+ - character sets - Part 1: Latin alphabet No. 1, JTC1/
+ SC2", ISO Standard 8859-1, 1987.
+
+ [RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
+ RFC 793, September 1981.
+
+ [RFC1035] Mockapetris, P., "Domain names - implementation and
+ specification", STD 13, RFC 1035, November 1987.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+ [RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
+ Streaming Protocol (RTSP)", RFC 2326, April 1998.
+
+ [RFC2392] Levinson, E., "Content-ID and Message-ID Uniform Resource
+ Locators", RFC 2392, August 1998.
+
+ [RFC2483] Mealling, M. and R. Daniel, "URI Resolution Services
+ Necessary for URN Resolution", RFC 2483, January 1999.
+
+ [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
+ Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
+ Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
+
+ [RFC3023] Murata, M., St. Laurent, S., and D. Kohn, "XML Media
+ Types", RFC 3023, January 2001.
+
+ [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
+ A., Peterson, J., Sparks, R., Handley, M., and E.
+ Schooler, "SIP: Session Initiation Protocol", RFC 3261,
+ June 2002.
+
+ [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
+ with Session Description Protocol (SDP)", RFC 3264,
+ June 2002.
+
+ [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
+ Jacobson, "RTP: A Transport Protocol for Real-Time
+ Applications", STD 64, RFC 3550, July 2003.
+
+ [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
+ 10646", STD 63, RFC 3629, November 2003.
+
+ [RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
+ January 2004.
+
+ [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
+ Norrman, "The Secure Real-time Transport Protocol (SRTP)",
+ RFC 3711, March 2004.
+
+ [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
+ Resource Identifier (URI): Generic Syntax", STD 66,
+ RFC 3986, January 2005.
+
+ [RFC4145] Yon, D. and G. Camarillo, "TCP-Based Media Transport in
+ the Session Description Protocol (SDP)", RFC 4145,
+ September 2005.
+
+ [RFC4288] Freed, N. and J. Klensin, "Media Type Specifications and
+ Registration Procedures", BCP 13, RFC 4288, December 2005.
+
+ [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
+ Description Protocol", RFC 4566, July 2006.
+
+ [RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session
+ Description Protocol (SDP) Security Descriptions for Media
+ Streams", RFC 4568, July 2006.
+
+ [RFC4572] Lennox, J., "Connection-Oriented Media Transport over the
+ Transport Layer Security (TLS) Protocol in the Session
+ Description Protocol (SDP)", RFC 4572, July 2006.
+
+ [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
+ IANA Considerations Section in RFCs", BCP 26, RFC 5226,
+ May 2008.
+
+ [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
+ Specifications: ABNF", STD 68, RFC 5234, January 2008.
+
+ [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security
+ (TLS) Protocol Version 1.2", RFC 5246, August 2008.
+
+ [RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322,
+ October 2008.
+
+ [RFC5646] Phillips, A. and M. Davis, "Tags for Identifying
+ Languages", BCP 47, RFC 5646, September 2009.
+
+ [RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description
+ Protocol (SDP) Grouping Framework", RFC 5888, June 2010.
+
+ [RFC5905] Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network
+ Time Protocol Version 4: Protocol and Algorithms
+ Specification", RFC 5905, June 2010.
+
+ [RFC5922] Gurbani, V., Lawrence, S., and A. Jeffrey, "Domain
+ Certificates in the Session Initiation Protocol (SIP)",
+ RFC 5922, June 2010.
+
+ [RFC6265] Barth, A., "HTTP State Management Mechanism", RFC 6265,
+ April 2011.
+
+ [W3C.REC-semantic-interpretation-20070405]
+ Tichelen, L. and D. Burke, "Semantic Interpretation for
+ Speech Recognition (SISR) Version 1.0", World Wide Web
+ Consortium Recommendation REC-semantic-
+ interpretation-20070405, April 2007,
+ <http://www.w3.org/TR/2007/
+ REC-semantic-interpretation-20070405>.
+
+ [W3C.REC-speech-grammar-20040316]
+ McGlashan, S. and A. Hunt, "Speech Recognition Grammar
+ Specification Version 1.0", World Wide Web Consortium
+ Recommendation REC-speech-grammar-20040316, March 2004,
+ <http://www.w3.org/TR/2004/REC-speech-grammar-20040316>.
+
+ [W3C.REC-speech-synthesis-20040907]
+ Walker, M., Burnett, D., and A. Hunt, "Speech Synthesis
+ Markup Language (SSML) Version 1.0", World Wide Web
+ Consortium Recommendation REC-speech-synthesis-20040907,
+ September 2004,
+ <http://www.w3.org/TR/2004/REC-speech-synthesis-20040907>.
+
+ [W3C.REC-xml-names11-20040204]
+ Layman, A., Bray, T., Hollander, D., and R. Tobin,
+ "Namespaces in XML 1.1", World Wide Web Consortium First
+ Edition REC-xml-names11-20040204, February 2004,
+ <http://www.w3.org/TR/2004/REC-xml-names11-20040204>.
+
+17.2. Informative References
+
+ [ISO.8601.1988]
+ International Organization for Standardization, "Data
+ elements and interchange formats - Information interchange
+ - Representation of dates and times", ISO Standard 8601,
+ June 1988.
+
+   [Q.23]    International Telecommunication Union, "Technical
+             Features of Push-Button Telephone Sets", ITU-T Q.23, 1993.
+
+ [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
+ Extensions (MIME) Part Two: Media Types", RFC 2046,
+ November 1996.
+
+ [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000.
+
+ [RFC4217] Ford-Hutchinson, P., "Securing FTP with TLS", RFC 4217,
+ October 2005.
+
+ [RFC4267] Froumentin, M., "The W3C Speech Interface Framework Media
+ Types: application/voicexml+xml, application/ssml+xml,
+ application/srgs, application/srgs+xml, application/
+ ccxml+xml, and application/pls+xml", RFC 4267,
+ November 2005.
+
+ [RFC4301] Kent, S. and K. Seo, "Security Architecture for the
+ Internet Protocol", RFC 4301, December 2005.
+
+ [RFC4313] Oran, D., "Requirements for Distributed Control of
+ Automatic Speech Recognition (ASR), Speaker
+ Identification/Speaker Verification (SI/SV), and Text-to-
+ Speech (TTS) Resources", RFC 4313, December 2005.
+
+ [RFC4395] Hansen, T., Hardie, T., and L. Masinter, "Guidelines and
+ Registration Procedures for New URI Schemes", BCP 35,
+ RFC 4395, February 2006.
+
+ [RFC4463] Shanmugham, S., Monaco, P., and B. Eberman, "A Media
+ Resource Control Protocol (MRCP) Developed by Cisco,
+ Nuance, and Speechworks", RFC 4463, April 2006.
+
+ [RFC4467] Crispin, M., "Internet Message Access Protocol (IMAP) -
+ URLAUTH Extension", RFC 4467, May 2006.
+
+ [RFC4733] Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
+ Digits, Telephony Tones, and Telephony Signals", RFC 4733,
+ December 2006.
+
+ [RFC4960] Stewart, R., "Stream Control Transmission Protocol",
+ RFC 4960, September 2007.
+
+ [RFC6454] Barth, A., "The Web Origin Concept", RFC 6454,
+ December 2011.
+
+ [W3C.REC-emma-20090210]
+ Johnston, M., Baggia, P., Burnett, D., Carter, J., Dahl,
+ D., McCobb, G., and D. Raggett, "EMMA: Extensible
+ MultiModal Annotation markup language", World Wide Web
+ Consortium Recommendation REC-emma-20090210,
+ February 2009,
+ <http://www.w3.org/TR/2009/REC-emma-20090210>.
+
+ [W3C.REC-pronunciation-lexicon-20081014]
+ Baggia, P., Bagshaw, P., Burnett, D., Carter, J., and F.
+ Scahill, "Pronunciation Lexicon Specification (PLS)",
+ World Wide Web Consortium Recommendation
+ REC-pronunciation-lexicon-20081014, October 2008,
+ <http://www.w3.org/TR/2008/
+ REC-pronunciation-lexicon-20081014>.
+
+ [W3C.REC-voicexml20-20040316]
+ Danielsen, P., Porter, B., Hunt, A., Rehor, K., Lucas, B.,
+ Burnett, D., Ferrans, J., Tryphonas, S., McGlashan, S.,
+ and J. Carter, "Voice Extensible Markup Language
+ (VoiceXML) Version 2.0", World Wide Web Consortium
+ Recommendation REC-voicexml20-20040316, March 2004,
+ <http://www.w3.org/TR/2004/REC-voicexml20-20040316>.
+
+ [refs.javaSpeechGrammarFormat]
+ Sun Microsystems, "Java Speech Grammar Format Version
+ 1.0", October 1998.
+
+Appendix A. Contributors
+
+ Pierre Forgues
+ Nuance Communications Ltd.
+ 1500 University Street
+ Suite 935
+ Montreal, Quebec
+ Canada H3A 3S7
+
+ EMail: forgues@nuance.com
+
+
+ Charles Galles
+ Intervoice, Inc.
+ 17811 Waterview Parkway
+ Dallas, Texas 75252
+ USA
+
+ EMail: charles.galles@intervoice.com
+
+
+ Klaus Reifenrath
+   ScanSoft, Inc.
+ Guldensporenpark 32
+ Building D
+ 9820 Merelbeke
+ Belgium
+
+ EMail: klaus.reifenrath@scansoft.com
+
+Appendix B. Acknowledgements
+
+ Andre Gillet (Nuance Communications)
+ Andrew Hunt (ScanSoft)
+ Andrew Wahbe (Genesys)
+ Aaron Kneiss (ScanSoft)
+ Brian Eberman (ScanSoft)
+ Corey Stohs (Cisco Systems, Inc.)
+ Dave Burke (VoxPilot)
+ Jeff Kusnitz (IBM Corp)
+ Ganesh N. Ramaswamy (IBM Corp)
+ Klaus Reifenrath (ScanSoft)
+ Kristian Finlator (ScanSoft)
+ Magnus Westerlund (Ericsson)
+ Martin Dragomirecky (Cisco Systems, Inc.)
+ Paolo Baggia (Loquendo)
+ Peter Monaco (Nuance Communications)
+ Pierre Forgues (Nuance Communications)
+
+ Ran Zilca (IBM Corp)
+ Suresh Kaliannan (Cisco Systems, Inc.)
+ Skip Cave (Intervoice, Inc.)
+ Thomas Gal (LumenVox)
+
+   The chairs of the SPEECHSC working group are Eric Burger (Georgetown
+   University) and Dave Oran (Cisco Systems, Inc.).
+
+ Many thanks go in particular to Robert Sparks, Alex Agranovsky, and
+ Henry Phan, who were there at the end to dot all the i's and cross
+ all the t's.
+
+Authors' Addresses
+
+ Daniel C. Burnett
+ Voxeo
+ 189 South Orange Avenue #1000
+ Orlando, FL 32801
+ USA
+
+ EMail: dburnett@voxeo.com
+
+
+ Saravanan Shanmugham
+ Cisco Systems, Inc.
+ 170 W. Tasman Dr.
+ San Jose, CA 95134
+ USA
+
+ EMail: sarvi@cisco.com