1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
|
Network Working Group M. Froumentin
Request for Comments: 4267 W3C
Category: Informational November 2005
The W3C Speech Interface Framework Media Types:
application/voicexml+xml, application/ssml+xml, application/srgs,
application/srgs+xml, application/ccxml+xml, and application/pls+xml
Status of This Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
This document defines the media types for the languages of the W3C
Speech Interface Framework, as designed by the Voice Browser Working
Group in the following specifications: the Voice Extensible Markup
Language (VoiceXML), the Speech Synthesis Markup Language (SSML), the
Speech Recognition Grammar Specification (SRGS), the Call Control XML
(CCXML), and the Pronunciation Lexicon Specification (PLS).
Table of Contents
1. Introduction ....................................................2
2. Registration of application/voicexml+xml, application/ssml+xml,
application/srgs+xml, application/ccxml+xml, and
application/pls+xml .............................................3
2.1. Encoding Considerations ....................................3
2.2. Interoperability Considerations ............................3
2.3. Published Specifications ...................................3
2.4. Applications that Use These Media Types ....................4
2.5. Security Considerations ....................................4
2.6. Additional Information .....................................4
2.6.1. Magic Numbers .......................................4
2.6.2. File Extensions .....................................4
2.6.3. Fragment Identifiers ................................5
2.6.4. Macintosh File Type Code ............................5
2.6.5. Person and Email Address to Contact for
Further Information .................................5
2.6.6. Intended Usage ......................................5
2.6.7. Change Controller ...................................5
Froumentin Informational [Page 1]
^L
RFC 4267 W3C Speech Interface Media Types November 2005
3. Registration of application/srgs ................................5
3.1. Encoding Considerations ....................................5
3.2. Interoperability Considerations ............................5
3.3. Published Specifications ...................................5
3.4. Applications That Use This Media Type ......................6
3.5. Security Considerations ....................................6
3.6. Additional Information .....................................6
3.6.1. Magic Numbers .......................................6
3.6.2. File Extensions .....................................6
3.6.3. Macintosh File Type Code ............................6
3.6.4. Person and Email Address to Contact for
Further Information .................................7
3.6.5. Intended Usage ......................................7
3.6.6. Change Controller ...................................7
4. IANA Considerations .............................................7
5. Normative References ............................................7
1. Introduction
This specification defines the media types of the Voice Extensible
Markup Language (VoiceXML), the Speech Synthesis Markup Language
(SSML), the Speech Recognition Grammar Specification (SRGS), the Call
Control XML (CCXML), and the Pronunciation Lexicon Specification
(PLS), the specifications of the W3C Speech Interface Framework.
VoiceXML ([VoiceXML2.0]) is an Extensible Markup Language (XML)
designed for creating audio dialogs that feature synthesized speech,
digitized audio, recognition of spoken and DTMF key input, recording
of spoken input, telephony, and mixed initiative conversations. The
associated media type defined in this document is
"application/voicexml+xml".
The Speech Synthesis Markup Language specification (SSML) defines an
XML-based markup language for assisting the generation of synthetic
speech in Web and other applications. The essential role of SSML is
to provide authors of synthesizable content a standard way to control
aspects of speech such as pronunciation, volume, pitch, and rate,
across different synthesis-capable platforms. The associated media
type defined in this document is "application/ssml+xml".
The Speech Recognition Grammar Specification (SRGS) defines syntax
for representing grammars for use in speech recognition so that
developers can specify the words and patterns of words to be listened
for by a speech recognizer. The syntax of the grammar format exists
in two forms, an Augmented BNF (ABNF) Form and an XML Form. The
respective media types defined in this document are
"application/srgs" and "application/srgs+xml".
Froumentin Informational [Page 2]
^L
RFC 4267 W3C Speech Interface Media Types November 2005
The Call Control EXtensible Markup Language (CCXML) is an XML
designed to provide telephony call control support for dialog
systems, such as VoiceXML. The associated media type defined in this
document is "application/ccxml+xml".
The Pronunciation Lexicon Specification (PLS) defines an XML syntax
for specifying pronunciation lexicons to be used by speech
recognition and speech synthesis engines in voice browser
applications. The associated media type defined in this document is
"application/pls+xml".
2. Registration of application/voicexml+xml, application/ssml+xml,
application/srgs+xml, application/ccxml+xml, and application/pls+xml
MIME media type name: application
MIME subtype names: voicexml+xml, ssml+xml, srgs+xml, ccxml+xml,
pls+xml
Required parameters: none
Optional parameters:
"charset": This parameter has identical semantics to the charset
parameter of the "application/xml" media type as specified in RFC
3023 [RFC3023].
2.1. Encoding Considerations
Identical to those of "application/xml" as described in RFC 3023
[RFC3023], section 3.2.
2.2. Interoperability Considerations
There are no known interoperability issues.
2.3. Published Specifications
Voice Extensible Markup Language 2.0 [VoiceXML2.0]
Voice Extensible Markup Language 2.1 [VoiceXML2.1]
Speech Synthesis Markup Language (SSML) Version 1.0 [SSML]
Speech Recognition Grammar Specification Version 1.0 [SRGS]
Voice Browser Call Control: CCXML Version 1.0 [CCXML]
Froumentin Informational [Page 3]
^L
RFC 4267 W3C Speech Interface Media Types November 2005
Pronunciation Lexicon Specification (PLS) Version 1.0 [PLS]
2.4. Applications that Use These Media Types
Various W3C Speech Interface Framework implementations use these
media types.
2.5. Security Considerations
Several instructions in the cited specifications may cause arbitrary
Uniform Resource Identifiers (URIs) to be dereferenced. In this
case, the security issues of [RFC3986], section 7, should be
considered.
In addition, because of the extensibility features of those
specifications, it is possible that the registered media types may
describe content that has security implications beyond those
described here. However, if the processor follows only the normative
semantics of the specifications, this content will be ignored. Only
in the case where the processor recognizes and processes the
additional content, or where further processing of that content is
dispatched to other processors, would security issues potentially
arise. And in that case, they would fall outside the domain of this
registration document.
2.6. Additional Information
2.6.1. Magic Numbers
Although no byte sequences can be counted on to always be present,
XML MIME entities in ASCII-compatible charsets (including UTF-8)
often begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"), and those in
UTF-16 often begin with hexadecimal FE FF 00 3C 00 3F 00 78 00 6D 00
6C or FF FE 3C 00 3F 00 78 00 6D 00 6C 00 (the Byte Order Mark (BOM)
followed by "<?xml"). For more information, see Appendix F of [XML].
2.6.2. File Extensions
VoiceXML files: .vxml
SSML files: .ssml
SRGS files (XML syntax): .grxml
CCXML files: .ccxml
PLS files: .pls
Froumentin Informational [Page 4]
^L
RFC 4267 W3C Speech Interface Media Types November 2005
2.6.3. Fragment Identifiers
Identical to that of "application/xml" as described in RFC 3023
[RFC3023], section 5.
2.6.4. Macintosh File Type Code
"TEXT"
2.6.5. Person and Email Address to Contact for Further Information
World Wide Web Consortium <web-human@w3.org>
2.6.6. Intended Usage
COMMON
2.6.7. Change Controller
The Speech Interface Framework specifications set is a work product
of the World Wide Web Consortium's Voice Browser Working Group. The
W3C has change control over these specifications.
3. Registration of application/srgs
MIME media type name: application
MIME subtype names: srgs
Required parameters: none
Optional parameters: none
3.1. Encoding Considerations
The ABNF Form of SRGS follows the character encoding handling defined
for XML: an ABNF grammar processor must accept both the UTF-8 and
UTF-16 encodings of ISO/IEC 10646 and may support other character
encodings.
3.2. Interoperability Considerations
There are no known interoperability issues.
3.3. Published Specifications
Speech Recognition Grammar Specification Version 1.0 [SRGS]
Froumentin Informational [Page 5]
^L
RFC 4267 W3C Speech Interface Media Types November 2005
3.4. Applications That Use This Media Type
Various SRGS implementations use this media type.
3.5. Security Considerations
Several instructions in SRGS may cause arbitrary URIs to be
dereferenced. In this case, the security issues of [RFC3986],
section 7, should be considered.
In addition, because of the extensibility features of SRGS, it is
possible that the registered media types may describe content that
has security implications beyond those described here. However, if
the processor follows only the normative semantics of the
specifications, this content will be ignored. Only in the case where
the processor recognizes and processes the additional content, or
where further processing of that content is dispatched to other
processors, would security issues potentially arise. In that case,
they would fall outside the domain of this registration document.
3.6. Additional Information
3.6.1. Magic Numbers
The ABNF self-identifying header must be present in any legal stand-
alone ABNF Form grammar document. The first character of an ABNF
document must be the "#" symbol (x23) unless preceded by an optional
XML 1.0 byte order mark. The ABNF byte order mark follows the XML
definition and requirements. For example, documents encoded in UTF-
16 must begin with the byte order mark. The optional byte order mark
and required "#" symbol must be followed immediately by the exact
string "ABNF" (x41 x42 x4d x46) or the appropriate equivalent for the
document's encoding (e.g., for UTF-16 little-endian: x23 x00 x41 x00
x42 x00 x4d x00 x46 x00). If the byte order mark is absent on a
grammar encoded in UTF-16, then the grammar processor should perform
auto-detection of character encoding in a manner analogous to auto-
detection of character encoding in XML. Next follows a single-space
character (x20) and the required version number, which is "1.0" for
this specification (x31 x2e x30).
3.6.2. File Extensions
.gram
3.6.3. Macintosh File Type Code
"TEXT"
Froumentin Informational [Page 6]
^L
RFC 4267 W3C Speech Interface Media Types November 2005
3.6.4. Person and Email Address to Contact for Further Information
World Wide Web Consortium <web-human@w3.org>
3.6.5. Intended Usage
COMMON
3.6.6. Change Controller
The SRGS specification is a work product of the World Wide Web
Consortium's Voice Browser Working Group. The W3C has change control
over the SRGS specification.
4. IANA Considerations
This document registers six new MIME media types, according to the
registrations in Section 2 and Section 3.
5. Normative References
[CCXML] Auburn, RJ., Ed., "Voice Browser Call Control: CCXML
Version 1.0, W3C Working Draft", January 2005,
<http://www.w3.org/TR/2005/WD-ccxml-20050111/>.
[PLS] Baggia, P., Ed., "Pronunciation Lexicon Specification
(PLS) Version 1.0, W3C Working Draft", February 2005,
<http://w3.org/TR/2005/WD-pronunciation-lexicon-
20050214/>.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter,
"Uniform Resource Identifier (URI): Generic Syntax",
STD 66, RFC 3986, January 2005.
[RFC3023] Murata, M., St. Laurent, S., and D. Kohn, "XML Media
Types", RFC 3023, January 2001.
[SRGS] Hunt, A., Ed. and S. McGlashan, Ed., "Speech
Recognition Grammar Specification Version 1.0, W3C
Recommendation", March 2004,
<http://www.w3.org/TR/2004/REC-speech-grammar-
20040316/>.
[SSML] Burnett, D., Ed., Walker, M., Ed., and A. Hunt, Ed.,
"Speech Synthesis Markup Language (SSML) Version 1.0,
W3C Recommendation", September 2004,
<http://www.w3.org/TR/2004/REC-speech-synthesis-
20040907/>.
Froumentin Informational [Page 7]
^L
RFC 4267 W3C Speech Interface Media Types November 2005
[VoiceXML2.0] McGlashan, S., Ed., "Voice Extensible Markup Language
(VoiceXML) Version 2.0, W3C Recommendation", March
2004, <http://www.w3.org/TR/2004/REC-voicexml20-
20040316/>.
[VoiceXML2.1] Oshry, M., Ed., "Voice Extensible Markup Language
(VoiceXML) Version 2.1, W3C Working Draft", July 2004,
<http://www.w3.org/TR/2004/WD-voicexml21-20040728/>.
[XML] Bray, T., Paoli, J., Sperberg-McQueen, C., Maler, E.,
and F. Yergeau, "Extensible Markup Language (XML) 1.0
(Third Edition)", February 2004,
<http://www.w3.org/TR/2004/REC-xml-20040204/>.
Author's Address
Max Froumentin
World Wide Web Consortium
EMail: mf@w3.org
Froumentin Informational [Page 8]
^L
RFC 4267 W3C Speech Interface Media Types November 2005
Full Copyright Statement
Copyright (C) The Internet Society (2005).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at ietf-
ipr@ietf.org.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.
Froumentin Informational [Page 9]
^L
|