From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc6787.txt | 12547 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 12547 insertions(+) create mode 100644 doc/rfc/rfc6787.txt (limited to 'doc/rfc/rfc6787.txt') diff --git a/doc/rfc/rfc6787.txt b/doc/rfc/rfc6787.txt new file mode 100644 index 0000000..ca651b7 --- /dev/null +++ b/doc/rfc/rfc6787.txt @@ -0,0 +1,12547 @@ + + + + + + +Internet Engineering Task Force (IETF) D. Burnett +Request for Comments: 6787 Voxeo +Category: Standards Track S. Shanmugham +ISSN: 2070-1721 Cisco Systems, Inc. + November 2012 + + + Media Resource Control Protocol Version 2 (MRCPv2) + +Abstract + + The Media Resource Control Protocol Version 2 (MRCPv2) allows client + hosts to control media service resources such as speech synthesizers, + recognizers, verifiers, and identifiers residing in servers on the + network. MRCPv2 is not a "stand-alone" protocol -- it relies on + other protocols, such as the Session Initiation Protocol (SIP), to + coordinate MRCPv2 clients and servers and manage sessions between + them, and the Session Description Protocol (SDP) to describe, + discover, and exchange capabilities. It also depends on SIP and SDP + to establish the media sessions and associated parameters between the + media source or sink and the media server. Once this is done, the + MRCPv2 exchange operates over the control session established above, + allowing the client to control the media processing resources on the + speech resource server. + +Status of This Memo + + This is an Internet Standards Track document. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). 
Further information on + Internet Standards is available in Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc6787. + +Copyright Notice + + Copyright (c) 2012 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + + + +Burnett & Shanmugham Standards Track [Page 1] + +RFC 6787 MRCPv2 November 2012 + + + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + + This document may contain material from IETF Documents or IETF + Contributions published or made publicly available before November + 10, 2008. The person(s) controlling the copyright in some of this + material may not have granted the IETF Trust the right to allow + modifications of such material outside the IETF Standards Process. + Without obtaining an adequate license from the person(s) controlling + the copyright in such materials, this document may not be modified + outside the IETF Standards Process, and derivative works of it may + not be created outside the IETF Standards Process, except to format + it for publication as an RFC or to translate it into languages other + than English. + +Table of Contents + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 8 + 2. Document Conventions . . . . . . . . . . . . . . . . . . . . 9 + 2.1. Definitions . . . . . . . . . . . . . . . . . . . . . . 10 + 2.2. State-Machine Diagrams . . . . . . . . 
. . . . . . . . . 10 + 2.3. URI Schemes . . . . . . . . . . . . . . . . . . . . . . 11 + 3. Architecture . . . . . . . . . . . . . . . . . . . . . . . . 11 + 3.1. MRCPv2 Media Resource Types . . . . . . . . . . . . . . 12 + 3.2. Server and Resource Addressing . . . . . . . . . . . . . 14 + 4. MRCPv2 Basics . . . . . . . . . . . . . . . . . . . . . . . . 14 + 4.1. Connecting to the Server . . . . . . . . . . . . . . . . 14 + 4.2. Managing Resource Control Channels . . . . . . . . . . . 14 + 4.3. SIP Session Example . . . . . . . . . . . . . . . . . . 17 + 4.4. Media Streams and RTP Ports . . . . . . . . . . . . . . 22 + 4.5. MRCPv2 Message Transport . . . . . . . . . . . . . . . . 24 + 4.6. MRCPv2 Session Termination . . . . . . . . . . . . . . . 24 + 5. MRCPv2 Specification . . . . . . . . . . . . . . . . . . . . 24 + 5.1. Common Protocol Elements . . . . . . . . . . . . . . . . 25 + 5.2. Request . . . . . . . . . . . . . . . . . . . . . . . . 28 + 5.3. Response . . . . . . . . . . . . . . . . . . . . . . . . 29 + 5.4. Status Codes . . . . . . . . . . . . . . . . . . . . . . 30 + 5.5. Events . . . . . . . . . . . . . . . . . . . . . . . . . 31 + 6. MRCPv2 Generic Methods, Headers, and Result Structure . . . . 32 + 6.1. Generic Methods . . . . . . . . . . . . . . . . . . . . 32 + 6.1.1. SET-PARAMS . . . . . . . . . . . . . . . . . . . . . 32 + 6.1.2. GET-PARAMS . . . . . . . . . . . . . . . . . . . . . 33 + 6.2. Generic Message Headers . . . . . . . . . . . . . . . . 34 + 6.2.1. Channel-Identifier . . . . . . . . . . . . . . . . . 35 + 6.2.2. Accept . . . . . . . . . . . . . . . . . . . . . . . 36 + + + +Burnett & Shanmugham Standards Track [Page 2] + +RFC 6787 MRCPv2 November 2012 + + + 6.2.3. Active-Request-Id-List . . . . . . . . . . . . . . . 36 + 6.2.4. Proxy-Sync-Id . . . . . . . . . . . . . . . . . . . 36 + 6.2.5. Accept-Charset . . . . . . . . . . . . . . . . . . . 37 + 6.2.6. Content-Type . . . . . . . . . . . . . . . . . . . . 37 + 6.2.7. Content-ID . . . 
. . . . . . . . . . . . . . . . . . 38 + 6.2.8. Content-Base . . . . . . . . . . . . . . . . . . . . 38 + 6.2.9. Content-Encoding . . . . . . . . . . . . . . . . . . 38 + 6.2.10. Content-Location . . . . . . . . . . . . . . . . . . 39 + 6.2.11. Content-Length . . . . . . . . . . . . . . . . . . . 39 + 6.2.12. Fetch Timeout . . . . . . . . . . . . . . . . . . . 39 + 6.2.13. Cache-Control . . . . . . . . . . . . . . . . . . . 40 + 6.2.14. Logging-Tag . . . . . . . . . . . . . . . . . . . . 41 + 6.2.15. Set-Cookie . . . . . . . . . . . . . . . . . . . . . 42 + 6.2.16. Vendor-Specific Parameters . . . . . . . . . . . . . 44 + 6.3. Generic Result Structure . . . . . . . . . . . . . . . . 44 + 6.3.1. Natural Language Semantics Markup Language . . . . . 45 + 7. Resource Discovery . . . . . . . . . . . . . . . . . . . . . 46 + 8. Speech Synthesizer Resource . . . . . . . . . . . . . . . . . 47 + 8.1. Synthesizer State Machine . . . . . . . . . . . . . . . 48 + 8.2. Synthesizer Methods . . . . . . . . . . . . . . . . . . 48 + 8.3. Synthesizer Events . . . . . . . . . . . . . . . . . . . 49 + 8.4. Synthesizer Header Fields . . . . . . . . . . . . . . . 49 + 8.4.1. Jump-Size . . . . . . . . . . . . . . . . . . . . . 49 + 8.4.2. Kill-On-Barge-In . . . . . . . . . . . . . . . . . . 50 + 8.4.3. Speaker-Profile . . . . . . . . . . . . . . . . . . 51 + 8.4.4. Completion-Cause . . . . . . . . . . . . . . . . . . 51 + 8.4.5. Completion-Reason . . . . . . . . . . . . . . . . . 52 + 8.4.6. Voice-Parameter . . . . . . . . . . . . . . . . . . 52 + 8.4.7. Prosody-Parameters . . . . . . . . . . . . . . . . . 53 + 8.4.8. Speech-Marker . . . . . . . . . . . . . . . . . . . 53 + 8.4.9. Speech-Language . . . . . . . . . . . . . . . . . . 54 + 8.4.10. Fetch-Hint . . . . . . . . . . . . . . . . . . . . . 54 + 8.4.11. Audio-Fetch-Hint . . . . . . . . . . . . . . . . . . 55 + 8.4.12. Failed-URI . . . . . . . . . . . . . . . . . . . . . 55 + 8.4.13. Failed-URI-Cause . . . . . . . . . . . . . . . . 
. . 55 + 8.4.14. Speak-Restart . . . . . . . . . . . . . . . . . . . 56 + 8.4.15. Speak-Length . . . . . . . . . . . . . . . . . . . . 56 + 8.4.16. Load-Lexicon . . . . . . . . . . . . . . . . . . . . 57 + 8.4.17. Lexicon-Search-Order . . . . . . . . . . . . . . . . 57 + 8.5. Synthesizer Message Body . . . . . . . . . . . . . . . . 57 + 8.5.1. Synthesizer Speech Data . . . . . . . . . . . . . . 57 + 8.5.2. Lexicon Data . . . . . . . . . . . . . . . . . . . . 59 + 8.6. SPEAK Method . . . . . . . . . . . . . . . . . . . . . . 60 + 8.7. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 62 + 8.8. BARGE-IN-OCCURRED . . . . . . . . . . . . . . . . . . . 63 + 8.9. PAUSE . . . . . . . . . . . . . . . . . . . . . . . . . 65 + 8.10. RESUME . . . . . . . . . . . . . . . . . . . . . . . . . 66 + 8.11. CONTROL . . . . . . . . . . . . . . . . . . . . . . . . 67 + + + +Burnett & Shanmugham Standards Track [Page 3] + +RFC 6787 MRCPv2 November 2012 + + + 8.12. SPEAK-COMPLETE . . . . . . . . . . . . . . . . . . . . . 69 + 8.13. SPEECH-MARKER . . . . . . . . . . . . . . . . . . . . . 70 + 8.14. DEFINE-LEXICON . . . . . . . . . . . . . . . . . . . . . 71 + 9. Speech Recognizer Resource . . . . . . . . . . . . . . . . . 72 + 9.1. Recognizer State Machine . . . . . . . . . . . . . . . . 74 + 9.2. Recognizer Methods . . . . . . . . . . . . . . . . . . . 74 + 9.3. Recognizer Events . . . . . . . . . . . . . . . . . . . 75 + 9.4. Recognizer Header Fields . . . . . . . . . . . . . . . . 75 + 9.4.1. Confidence-Threshold . . . . . . . . . . . . . . . . 77 + 9.4.2. Sensitivity-Level . . . . . . . . . . . . . . . . . 77 + 9.4.3. Speed-Vs-Accuracy . . . . . . . . . . . . . . . . . 77 + 9.4.4. N-Best-List-Length . . . . . . . . . . . . . . . . . 78 + 9.4.5. Input-Type . . . . . . . . . . . . . . . . . . . . . 78 + 9.4.6. No-Input-Timeout . . . . . . . . . . . . . . . . . . 78 + 9.4.7. Recognition-Timeout . . . . . . . . . . . . . . . . 79 + 9.4.8. Waveform-URI . . . . . . . . . . . . . . . 
. . . . . 79 + 9.4.9. Media-Type . . . . . . . . . . . . . . . . . . . . . 80 + 9.4.10. Input-Waveform-URI . . . . . . . . . . . . . . . . . 80 + 9.4.11. Completion-Cause . . . . . . . . . . . . . . . . . . 80 + 9.4.12. Completion-Reason . . . . . . . . . . . . . . . . . 83 + 9.4.13. Recognizer-Context-Block . . . . . . . . . . . . . . 83 + 9.4.14. Start-Input-Timers . . . . . . . . . . . . . . . . . 83 + 9.4.15. Speech-Complete-Timeout . . . . . . . . . . . . . . 84 + 9.4.16. Speech-Incomplete-Timeout . . . . . . . . . . . . . 84 + 9.4.17. DTMF-Interdigit-Timeout . . . . . . . . . . . . . . 85 + 9.4.18. DTMF-Term-Timeout . . . . . . . . . . . . . . . . . 85 + 9.4.19. DTMF-Term-Char . . . . . . . . . . . . . . . . . . . 85 + 9.4.20. Failed-URI . . . . . . . . . . . . . . . . . . . . . 86 + 9.4.21. Failed-URI-Cause . . . . . . . . . . . . . . . . . . 86 + 9.4.22. Save-Waveform . . . . . . . . . . . . . . . . . . . 86 + 9.4.23. New-Audio-Channel . . . . . . . . . . . . . . . . . 86 + 9.4.24. Speech-Language . . . . . . . . . . . . . . . . . . 87 + 9.4.25. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 87 + 9.4.26. Recognition-Mode . . . . . . . . . . . . . . . . . . 87 + 9.4.27. Cancel-If-Queue . . . . . . . . . . . . . . . . . . 88 + 9.4.28. Hotword-Max-Duration . . . . . . . . . . . . . . . . 88 + 9.4.29. Hotword-Min-Duration . . . . . . . . . . . . . . . . 88 + 9.4.30. Interpret-Text . . . . . . . . . . . . . . . . . . . 89 + 9.4.31. DTMF-Buffer-Time . . . . . . . . . . . . . . . . . . 89 + 9.4.32. Clear-DTMF-Buffer . . . . . . . . . . . . . . . . . 89 + 9.4.33. Early-No-Match . . . . . . . . . . . . . . . . . . . 90 + 9.4.34. Num-Min-Consistent-Pronunciations . . . . . . . . . 90 + 9.4.35. Consistency-Threshold . . . . . . . . . . . . . . . 90 + 9.4.36. Clash-Threshold . . . . . . . . . . . . . . . . . . 90 + 9.4.37. Personal-Grammar-URI . . . . . . . . . . . . . . . . 91 + 9.4.38. Enroll-Utterance . . . . . . . . . . . . . . . . . . 91 + 9.4.39. 
Phrase-Id . . . . . . . . . . . . . . . . . . . . . 91 + 9.4.40. Phrase-NL . . . . . . . . . . . . . . . . . . . . . 92 + + + +Burnett & Shanmugham Standards Track [Page 4] + +RFC 6787 MRCPv2 November 2012 + + + 9.4.41. Weight . . . . . . . . . . . . . . . . . . . . . . . 92 + 9.4.42. Save-Best-Waveform . . . . . . . . . . . . . . . . . 92 + 9.4.43. New-Phrase-Id . . . . . . . . . . . . . . . . . . . 93 + 9.4.44. Confusable-Phrases-URI . . . . . . . . . . . . . . . 93 + 9.4.45. Abort-Phrase-Enrollment . . . . . . . . . . . . . . 93 + 9.5. Recognizer Message Body . . . . . . . . . . . . . . . . 93 + 9.5.1. Recognizer Grammar Data . . . . . . . . . . . . . . 93 + 9.5.2. Recognizer Result Data . . . . . . . . . . . . . . . 97 + 9.5.3. Enrollment Result Data . . . . . . . . . . . . . . . 98 + 9.5.4. Recognizer Context Block . . . . . . . . . . . . . . 98 + 9.6. Recognizer Results . . . . . . . . . . . . . . . . . . . 99 + 9.6.1. Markup Functions . . . . . . . . . . . . . . . . . . 99 + 9.6.2. Overview of Recognizer Result Elements and Their + Relationships . . . . . . . . . . . . . . . . . . . 100 + 9.6.3. Elements and Attributes . . . . . . . . . . . . . . 101 + 9.7. Enrollment Results . . . . . . . . . . . . . . . . . . . 106 + 9.7.1. <num-clashes> Element . . . . . . . . . . . . . . . 106 + 9.7.2. <num-good-repetitions> Element . . . . . . . . . . . 106 + 9.7.3. <num-repetitions-still-needed> Element . . . . . . . 107 + 9.7.4. <consistency-status> Element . . . . . . . . . . . . 107 + 9.7.5. <clash-phrase-ids> Element . . . . . . . . . . . . . 107 + 9.7.6. <transcriptions> Element . . . . . . . . . . . . . . 107 + 9.7.7. <confusable-phrases> Element . . . . . . . . . . . . 107 + 9.8. DEFINE-GRAMMAR . . . . . . . . . . . . . . . . . . . . . 107 + 9.9. RECOGNIZE . . . . . . . . . . . . . . . . . . . . . . . 111 + 9.10. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 118 + 9.11. GET-RESULT . . . . . . . . . . . . . . . . . . . . . . . 119 + 9.12. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 120 + 9.13. START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 120 + 9.14. 
RECOGNITION-COMPLETE . . . . . . . . . . . . . . . . . . 120 + 9.15. START-PHRASE-ENROLLMENT . . . . . . . . . . . . . . . . 123 + 9.16. ENROLLMENT-ROLLBACK . . . . . . . . . . . . . . . . . . 124 + 9.17. END-PHRASE-ENROLLMENT . . . . . . . . . . . . . . . . . 124 + 9.18. MODIFY-PHRASE . . . . . . . . . . . . . . . . . . . . . 125 + 9.19. DELETE-PHRASE . . . . . . . . . . . . . . . . . . . . . 125 + 9.20. INTERPRET . . . . . . . . . . . . . . . . . . . . . . . 125 + 9.21. INTERPRETATION-COMPLETE . . . . . . . . . . . . . . . . 127 + 9.22. DTMF Detection . . . . . . . . . . . . . . . . . . . . . 128 + 10. Recorder Resource . . . . . . . . . . . . . . . . . . . . . . 129 + 10.1. Recorder State Machine . . . . . . . . . . . . . . . . . 129 + 10.2. Recorder Methods . . . . . . . . . . . . . . . . . . . . 130 + 10.3. Recorder Events . . . . . . . . . . . . . . . . . . . . 130 + 10.4. Recorder Header Fields . . . . . . . . . . . . . . . . . 130 + 10.4.1. Sensitivity-Level . . . . . . . . . . . . . . . . . 130 + 10.4.2. No-Input-Timeout . . . . . . . . . . . . . . . . . . 131 + 10.4.3. Completion-Cause . . . . . . . . . . . . . . . . . . 131 + 10.4.4. Completion-Reason . . . . . . . . . . . . . . . . . 132 + 10.4.5. Failed-URI . . . . . . . . . . . . . . . . . . . . . 132 + + + +Burnett & Shanmugham Standards Track [Page 5] + +RFC 6787 MRCPv2 November 2012 + + + 10.4.6. Failed-URI-Cause . . . . . . . . . . . . . . . . . . 132 + 10.4.7. Record-URI . . . . . . . . . . . . . . . . . . . . . 132 + 10.4.8. Media-Type . . . . . . . . . . . . . . . . . . . . . 133 + 10.4.9. Max-Time . . . . . . . . . . . . . . . . . . . . . . 133 + 10.4.10. Trim-Length . . . . . . . . . . . . . . . . . . . . 134 + 10.4.11. Final-Silence . . . . . . . . . . . . . . . . . . . 134 + 10.4.12. Capture-On-Speech . . . . . . . . . . . . . . . . . 134 + 10.4.13. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 134 + 10.4.14. Start-Input-Timers . . . . . . . . . . . . . . . . . 135 + 10.4.15. 
New-Audio-Channel . . . . . . . . . . . . . . . . . 135 + 10.5. Recorder Message Body . . . . . . . . . . . . . . . . . 135 + 10.6. RECORD . . . . . . . . . . . . . . . . . . . . . . . . . 135 + 10.7. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 136 + 10.8. RECORD-COMPLETE . . . . . . . . . . . . . . . . . . . . 137 + 10.9. START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 138 + 10.10. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 138 + 11. Speaker Verification and Identification . . . . . . . . . . . 139 + 11.1. Speaker Verification State Machine . . . . . . . . . . . 140 + 11.2. Speaker Verification Methods . . . . . . . . . . . . . . 142 + 11.3. Verification Events . . . . . . . . . . . . . . . . . . 144 + 11.4. Verification Header Fields . . . . . . . . . . . . . . . 144 + 11.4.1. Repository-URI . . . . . . . . . . . . . . . . . . . 144 + 11.4.2. Voiceprint-Identifier . . . . . . . . . . . . . . . 145 + 11.4.3. Verification-Mode . . . . . . . . . . . . . . . . . 145 + 11.4.4. Adapt-Model . . . . . . . . . . . . . . . . . . . . 146 + 11.4.5. Abort-Model . . . . . . . . . . . . . . . . . . . . 146 + 11.4.6. Min-Verification-Score . . . . . . . . . . . . . . . 147 + 11.4.7. Num-Min-Verification-Phrases . . . . . . . . . . . . 147 + 11.4.8. Num-Max-Verification-Phrases . . . . . . . . . . . . 147 + 11.4.9. No-Input-Timeout . . . . . . . . . . . . . . . . . . 148 + 11.4.10. Save-Waveform . . . . . . . . . . . . . . . . . . . 148 + 11.4.11. Media-Type . . . . . . . . . . . . . . . . . . . . . 148 + 11.4.12. Waveform-URI . . . . . . . . . . . . . . . . . . . . 148 + 11.4.13. Voiceprint-Exists . . . . . . . . . . . . . . . . . 149 + 11.4.14. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 149 + 11.4.15. Input-Waveform-URI . . . . . . . . . . . . . . . . . 149 + 11.4.16. Completion-Cause . . . . . . . . . . . . . . . . . . 150 + 11.4.17. Completion-Reason . . . . . . . . . . . . . . . . . 151 + 11.4.18. Speech-Complete-Timeout . . . . 
. . . . . . . . . . 151 + 11.4.19. New-Audio-Channel . . . . . . . . . . . . . . . . . 152 + 11.4.20. Abort-Verification . . . . . . . . . . . . . . . . . 152 + 11.4.21. Start-Input-Timers . . . . . . . . . . . . . . . . . 152 + 11.5. Verification Message Body . . . . . . . . . . . . . . . 152 + 11.5.1. Verification Result Data . . . . . . . . . . . . . . 152 + 11.5.2. Verification Result Elements . . . . . . . . . . . . 153 + 11.6. START-SESSION . . . . . . . . . . . . . . . . . . . . . 157 + 11.7. END-SESSION . . . . . . . . . . . . . . . . . . . . . . 158 + 11.8. QUERY-VOICEPRINT . . . . . . . . . . . . . . . . . . . . 159 + + + +Burnett & Shanmugham Standards Track [Page 6] + +RFC 6787 MRCPv2 November 2012 + + + 11.9. DELETE-VOICEPRINT . . . . . . . . . . . . . . . . . . . 160 + 11.10. VERIFY . . . . . . . . . . . . . . . . . . . . . . . . . 160 + 11.11. VERIFY-FROM-BUFFER . . . . . . . . . . . . . . . . . . . 160 + 11.12. VERIFY-ROLLBACK . . . . . . . . . . . . . . . . . . . . 164 + 11.13. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 164 + 11.14. START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 165 + 11.15. VERIFICATION-COMPLETE . . . . . . . . . . . . . . . . . 165 + 11.16. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 166 + 11.17. CLEAR-BUFFER . . . . . . . . . . . . . . . . . . . . . . 166 + 11.18. GET-INTERMEDIATE-RESULT . . . . . . . . . . . . . . . . 167 + 12. Security Considerations . . . . . . . . . . . . . . . . . . . 168 + 12.1. Rendezvous and Session Establishment . . . . . . . . . . 168 + 12.2. Control Channel Protection . . . . . . . . . . . . . . . 168 + 12.3. Media Session Protection . . . . . . . . . . . . . . . . 169 + 12.4. Indirect Content Access . . . . . . . . . . . . . . . . 169 + 12.5. Protection of Stored Media . . . . . . . . . . . . . . . 170 + 12.6. DTMF and Recognition Buffers . . . . . . . . . . . . . . 171 + 12.7. Client-Set Server Parameters . . . . . . . . . . . . . . 171 + 12.8. 
DELETE-VOICEPRINT and Authorization . . . . . . . . . . 171 + 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 171 + 13.1. New Registries . . . . . . . . . . . . . . . . . . . . . 171 + 13.1.1. MRCPv2 Resource Types . . . . . . . . . . . . . . . 171 + 13.1.2. MRCPv2 Methods and Events . . . . . . . . . . . . . 172 + 13.1.3. MRCPv2 Header Fields . . . . . . . . . . . . . . . . 173 + 13.1.4. MRCPv2 Status Codes . . . . . . . . . . . . . . . . 176 + 13.1.5. Grammar Reference List Parameters . . . . . . . . . 176 + 13.1.6. MRCPv2 Vendor-Specific Parameters . . . . . . . . . 176 + 13.2. NLSML-Related Registrations . . . . . . . . . . . . . . 177 + 13.2.1. 'application/nlsml+xml' Media Type Registration . . 177 + 13.3. NLSML XML Schema Registration . . . . . . . . . . . . . 178 + 13.4. MRCPv2 XML Namespace Registration . . . . . . . . . . . 178 + 13.5. Text Media Type Registrations . . . . . . . . . . . . . 178 + 13.5.1. text/grammar-ref-list . . . . . . . . . . . . . . . 178 + 13.6. 'session' URI Scheme Registration . . . . . . . . . . . 180 + 13.7. SDP Parameter Registrations . . . . . . . . . . . . . . 181 + 13.7.1. Sub-Registry "proto" . . . . . . . . . . . . . . . . 181 + 13.7.2. Sub-Registry "att-field (media-level)" . . . . . . . 182 + 14. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 183 + 14.1. Message Flow . . . . . . . . . . . . . . . . . . . . . . 183 + 14.2. Recognition Result Examples . . . . . . . . . . . . . . 192 + 14.2.1. Simple ASR Ambiguity . . . . . . . . . . . . . . . . 192 + 14.2.2. Mixed Initiative . . . . . . . . . . . . . . . . . . 192 + 14.2.3. DTMF Input . . . . . . . . . . . . . . . . . . . . . 193 + 14.2.4. Interpreting Meta-Dialog and Meta-Task Utterances . 194 + 14.2.5. Anaphora and Deixis . . . . . . . . . . . . . . . . 195 + 14.2.6. Distinguishing Individual Items from Sets with + One Member . . . . . . . . . . . . . . . . . . . . . 195 + 14.2.7. Extensibility . . . . . . . . . . . . . . . . . . . 
196 + + + +Burnett & Shanmugham Standards Track [Page 7] + +RFC 6787 MRCPv2 November 2012 + + + 15. ABNF Normative Definition . . . . . . . . . . . . . . . . . . 196 + 16. XML Schemas . . . . . . . . . . . . . . . . . . . . . . . . . 211 + 16.1. NLSML Schema Definition . . . . . . . . . . . . . . . . 211 + 16.2. Enrollment Results Schema Definition . . . . . . . . . . 213 + 16.3. Verification Results Schema Definition . . . . . . . . . 214 + 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 218 + 17.1. Normative References . . . . . . . . . . . . . . . . . . 218 + 17.2. Informative References . . . . . . . . . . . . . . . . . 220 + Appendix A. Contributors . . . . . . . . . . . . . . . . . . . . 223 + Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . 223 + +1. Introduction + + MRCPv2 is designed to allow a client device to control media + processing resources on the network. Some of these media processing + resources include speech recognition engines, speech synthesis + engines, speaker verification, and speaker identification engines. + MRCPv2 enables the implementation of distributed Interactive Voice + Response platforms using VoiceXML [W3C.REC-voicexml20-20040316] + browsers or other client applications while maintaining separate + back-end speech processing capabilities on specialized speech + processing servers. MRCPv2 is based on the earlier Media Resource + Control Protocol (MRCP) [RFC4463] developed jointly by Cisco Systems, + Inc., Nuance Communications, and Speechworks, Inc. Although some of + the method names are similar, the way in which these methods are + communicated is different. There are also more resources and more + methods for each resource. The first version of MRCP was essentially + taken only as input to the development of this protocol. There is no + expectation that an MRCPv2 client will work with an MRCPv1 server or + vice versa. There is no migration plan or gateway definition between + the two protocols. 
+ + The protocol requirements of Speech Services Control (SPEECHSC) + [RFC4313] include that the solution be capable of reaching a media + processing server, setting up communication channels to the media + resources, and sending and receiving control messages and media + streams to/from the server. The Session Initiation Protocol (SIP) + [RFC3261] meets these requirements. + + The proprietary version of MRCP ran over the Real Time Streaming + Protocol (RTSP) [RFC2326]. At the time work on MRCPv2 was begun, the + consensus was that this use of RTSP would break the RTSP protocol or + cause backward-compatibility problems, something forbidden by Section + 3.2 of [RFC4313]. This is the reason why MRCPv2 does not run over + RTSP. + + + + + + +Burnett & Shanmugham Standards Track [Page 8] + +RFC 6787 MRCPv2 November 2012 + + + MRCPv2 leverages these capabilities by building upon SIP and the + Session Description Protocol (SDP) [RFC4566]. MRCPv2 uses SIP to set + up and tear down media and control sessions with the server. In + addition, the client can use a SIP re-INVITE method (an INVITE dialog + sent within an existing SIP session) to change the characteristics of + these media and control session while maintaining the SIP dialog + between the client and server. SDP is used to describe the + parameters of the media sessions associated with that dialog. It is + mandatory to support SIP as the session establishment protocol to + ensure interoperability. Other protocols can be used for session + establishment by prior agreement. This document only describes the + use of SIP and SDP. + + MRCPv2 uses SIP and SDP to create the speech client/server dialog and + set up the media channels to the server. It also uses SIP and SDP to + establish MRCPv2 control sessions between the client and the server + for each media processing resource required for that dialog. The + MRCPv2 protocol exchange between the client and the media resource is + carried on that control session. 
MRCPv2 exchanges do not change the + state of the SIP dialog, the media sessions, or other parameters of + the dialog initiated via SIP. It controls and affects the state of + the media processing resource associated with the MRCPv2 session(s). + + MRCPv2 defines the messages to control the different media processing + resources and the state machines required to guide their operation. + It also describes how these messages are carried over a transport- + layer protocol such as the Transmission Control Protocol (TCP) + [RFC0793] or the Transport Layer Security (TLS) Protocol [RFC5246]. + (Note: the Stream Control Transmission Protocol (SCTP) [RFC4960] is a + viable transport for MRCPv2 as well, but the mapping onto SCTP is not + described in this specification.) + +2. Document Conventions + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in RFC 2119 [RFC2119]. + + Since many of the definitions and syntax are identical to those for + the Hypertext Transfer Protocol -- HTTP/1.1 [RFC2616], this + specification refers to the section where they are defined rather + than copying it. For brevity, [HX.Y] is to be taken to refer to + Section X.Y of RFC 2616. + + All the mechanisms specified in this document are described in both + prose and an augmented Backus-Naur form (ABNF [RFC5234]). + + + + + +Burnett & Shanmugham Standards Track [Page 9] + +RFC 6787 MRCPv2 November 2012 + + + The complete message format in ABNF form is provided in Section 15 + and is the normative format definition. Note that productions may be + duplicated within the main body of the document for reading + convenience. If a production in the body of the text conflicts with + one in the normative definition, the latter rules. + +2.1. Definitions + + Media Resource + An entity on the speech processing server that can be + controlled through MRCPv2. 
+ + MRCP Server + Aggregate of one or more "Media Resource" entities on + a server, exposed through MRCPv2. Often, 'server' in + this document refers to an MRCP server. + + MRCP Client + An entity controlling one or more Media Resources + through MRCPv2 ("Client" for short). + + DTMF + Dual-Tone Multi-Frequency; a method of transmitting + key presses in-band, either as actual tones (Q.23 + [Q.23]) or as named tone events (RFC 4733 [RFC4733]). + + Endpointing + The process of automatically detecting the beginning + and end of speech in an audio stream. This is + critical both for speech recognition and for automated + recording as one would find in voice mail systems. + + Hotword Mode + A mode of speech recognition where a stream of + utterances is evaluated for match against a small set + of command words. This is generally employed either + to trigger some action or to control the subsequent + grammar to be used for further recognition. + +2.2. State-Machine Diagrams + + The state-machine diagrams in this document do not show every + possible method call. Rather, they reflect the state of the resource + based on the methods that have moved to IN-PROGRESS or COMPLETE + states (see Section 5.3). Note that since PENDING requests + essentially have not affected the resource yet and are in the queue + to be processed, they are not reflected in the state-machine + diagrams. + + + +Burnett & Shanmugham Standards Track [Page 10] + +RFC 6787 MRCPv2 November 2012 + + +2.3. URI Schemes + + This document defines many protocol headers that contain URIs + (Uniform Resource Identifiers [RFC3986]) or lists of URIs for + referencing media. The entire document, including the Security + Considerations section (Section 12), assumes that HTTP or HTTP over + TLS (HTTPS) [RFC2818] will be used as the URI addressing scheme + unless otherwise stated. 
However, implementations MAY support other + schemes (such as 'file'), provided they have addressed any security + considerations described in this document and any others particular + to the specific scheme. For example, implementations where the + client and server both reside on the same physical hardware and the + file system is secured by traditional user-level file access controls + could be reasonable candidates for supporting the 'file' scheme. + +3. Architecture + + A system using MRCPv2 consists of a client that requires the + generation and/or consumption of media streams and a media resource + server that has the resources or "engines" to process these streams + as input or generate these streams as output. The client uses SIP + and SDP to establish an MRCPv2 control channel with the server to use + its media processing resources. MRCPv2 servers are addressed using + SIP URIs. + + SIP uses SDP with the offer/answer model described in RFC 3264 + [RFC3264] to set up the MRCPv2 control channels and describe their + characteristics. A separate MRCPv2 session is needed to control each + of the media processing resources associated with the SIP dialog + between the client and server. Within a SIP dialog, the individual + resource control channels for the different resources are added or + removed through SDP offer/answer carried in a SIP re-INVITE + transaction. + + The server, through the SDP exchange, provides the client with a + difficult-to-guess, unambiguous channel identifier and a TCP port + number (see Section 4.2). The client MAY then open a new TCP + connection with the server on this port number. Multiple MRCPv2 + channels can share a TCP connection between the client and the + server. All MRCPv2 messages exchanged between the client and the + server carry the specified channel identifier that the server MUST + ensure is unambiguous among all MRCPv2 control channels that are + active on that server. 
The client uses this channel identifier to + indicate the media processing resource associated with that channel. + For information on message framing, see Section 5. + + SIP also establishes the media sessions between the client (or other + source/sink of media) and the MRCPv2 server using SDP "m=" lines. + + + +Burnett & Shanmugham Standards Track [Page 11] + +RFC 6787 MRCPv2 November 2012 + + + One or more media processing resources may share a media session + under a SIP session, or each media processing resource may have its + own media session. + + The following diagram shows the general architecture of a system that + uses MRCPv2. To simplify the diagram, only a few resources are + shown. + + MRCPv2 client MRCPv2 Media Resource Server +|--------------------| |------------------------------------| +||------------------|| ||----------------------------------|| +|| Application Layer|| ||Synthesis|Recognition|Verification|| +||------------------|| || Engine | Engine | Engine || +||Media Resource API|| || || | || | || || +||------------------|| ||Synthesis|Recognizer | Verifier || +|| SIP | MRCPv2 || ||Resource | Resource | Resource || +||Stack | || || Media Resource Management || +|| | || ||----------------------------------|| +||------------------|| || SIP | MRCPv2 || +|| TCP/IP Stack ||---MRCPv2---|| Stack | || +|| || ||----------------------------------|| +||------------------||----SIP-----|| TCP/IP Stack || +|--------------------| || || + | ||----------------------------------|| + SIP |------------------------------------| + | / +|-------------------| RTP +| | / +| Media Source/Sink |------------/ +| | +|-------------------| + + Figure 1: Architectural Diagram + +3.1. MRCPv2 Media Resource Types + + An MRCPv2 server may offer one or more of the following media + processing resources to its clients. 
+ + Basic Synthesizer + A speech synthesizer resource that has very limited + capabilities and can generate its media stream + exclusively from concatenated audio clips. The speech + data is described using a limited subset of the Speech + Synthesis Markup Language (SSML) + [W3C.REC-speech-synthesis-20040907] elements. A basic + synthesizer MUST support the SSML tags <speak>, +