1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
|
Internet Engineering Task Force (IETF) L. Romary
Request for Comments: 6129 TEI Consortium and INRIA
Category: Informational S. Lundberg
ISSN: 2070-1721 The Royal Library, Copenhagen
February 2011
The 'application/tei+xml' Media Type
Abstract
This document defines the 'application/tei+xml' media type for markup
languages defined in accordance with the Text Encoding and
Interchange guidelines.
Status of This Memo
This document is not an Internet Standards Track specification; it is
published for informational purposes.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are a candidate for any level of Internet
Standard; see Section 2 of RFC 5741.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc6129.
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Romary & Lundberg Informational [Page 1]
^L
RFC 6129 The 'application/tei+xml' Media Type February 2011
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Recognizing TEI Files . . . . . . . . . . . . . . . . . . . . . 2
3. Fragment Identifier . . . . . . . . . . . . . . . . . . . . . . 4
4. Security Considerations . . . . . . . . . . . . . . . . . . . . 4
4.1. Harmful Content . . . . . . . . . . . . . . . . . . . . . . 4
4.2. Intellectual Property Rights . . . . . . . . . . . . . . . 4
4.3. Authenticity and confidentiality . . . . . . . . . . . . . 5
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 5
5.1. Registration of MIME Type 'application/tei+xml' . . . . . . 5
6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 6
6.1. Normative References . . . . . . . . . . . . . . . . . . . 6
6.2. Informative References . . . . . . . . . . . . . . . . . . 7
1. Introduction
Text Encoding and Interchange (TEI) is an international and
interdisciplinary standard that is widely used by libraries, museums,
publishers, and individual scholars to represent all kinds of textual
material for online research and teaching [TEI].
This document defines the 'application/tei+xml' media type in
accordance with [RFC3023] in order to enable generic processing of
such documents on the Internet using eXtensible Markup Language (XML)
[W3C.REC-xml-20081126] technologies.
2. Recognizing TEI Files
TEI files are XML documents or fragments having the root element (as
defined in [W3C.REC-xml-20081126]) in a TEI namespace. TEI namespace
names are defined as a Universal Resource Identifier (URI) [RFC3986]
in accordance with [W3C.REC-xml-names-20091208] and begins with
http://www.tei-c.org/ns/ followed by the version number of the
namespace. The current namespace is http://www.tei-c.org/ns/1.0
The most common root element names for TEI documents are
<TEI>
<teiCorpus>
Romary & Lundberg Informational [Page 2]
^L
RFC 6129 The 'application/tei+xml' Media Type February 2011
The teiCorpus documents provide the ability to bundle multiple
documents into a single file.
Examples:
A document having <TEI> root element
<?xml version="1.0" encoding="UTF-8" ?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
...
</teiHeader>
<text>
...
</text>
</TEI>
A document having <teiCorpus> root element
<?xml version="1.0" encoding="UTF-8" ?>
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
...
</teiHeader>
<TEI>
<teiHeader>
...
</teiHeader>
<text>
...
</text>
</TEI>
<TEI>
... second document ...
</TEI>
<TEI>
... third document ...
</TEI>
</teiCorpus>
TEI and teiCorpus files are often given the extensions .tei and
.teiCorpus, respectively. There is a third type of file, which often
is given the suffix .odd. ODD ("One Document Does it All") is a TEI
XML document that includes schema fragments, prose documentation, and
reference documentation. It is used for the definition and
documentation of XML-based languages, and primarily for the TEI
Guidelines [ODD]. In other words, ODD files do not differ from other
TEI files in syntax, only in function.
Romary & Lundberg Informational [Page 3]
^L
RFC 6129 The 'application/tei+xml' Media Type February 2011
3. Fragment Identifier
Documents having the media type 'application/tei+xml' use the
fragment identifier notation as specified in [RFC3023] for the media
type 'application/xml'.
4. Security Considerations
An XML resource does not in itself compromise data security. When
being available on a network simply through the dereferencing of an
Internationalized Resource Identifier (IRI) [RFC3987] or a URI, care
must be taken to properly interpret the data to prevent unintended
access. Hence the security issues of [RFC3986], Section 7, apply.
In addition, as this media type uses the "+xml" convention, it shares
the same security considerations as described in RFC 3023 [RFC3023],
Section 10. In general, security issues related to the use of XML in
IETF protocols are treated in RFC 3470 [RFC3470], Section 7. We will
not try to duplicate this material, but review some aspects that are
important for document-centric XML as applied to text encoding.
4.1. Harmful Content
Any application accepting submitted or retrieving TEI XML for
processing has to be aware of risks connected with injection of
harmful scripts and executable XML. XML inclusion
[W3C.REC-xinclude-20061115] and the use of external entities are
vulnerable to various forms of spoofing, and can also reveal aspects
of a service in a way that may compromise its security. Any
vulnerability of these kinds are, however, application specific. The
TEI namespaces do not contain such elements.
4.2. Intellectual Property Rights
TEI documents often arise in digitization of cultural heritage
materials. Texts made accessible in TEI format may be unrestricted
in the sense that their distribution may be unlimited by Digital
Rights Management [DRM] or Intellectual Property Rights [IPR]
constraints. However, TEI documents are heterogeneous. Some parts
of a document may be unrestricted, whereas others, such as editorial
text and annotations, may be subject to DRM restrictions.
The TEI format provides means for highly granular attribution, down
to the content of individual XML elements. Software agents
participating in the exchange or processing TEI may be required to
honour markup of this kind. Even when there are no IPR constraints,
intellectual property attribution alone requires that document users
be able to tell the difference between content from different
sources.
Romary & Lundberg Informational [Page 4]
^L
RFC 6129 The 'application/tei+xml' Media Type February 2011
4.3. Authenticity and confidentiality
Historical archival records are often encoded in TEI and legal
document may be binding centuries after they were written.
Digitization and encoding of legal texts may require technologies for
assuring authenticity, such as cryptographic checksums and electronic
signatures.
Similarly, historical documents may in part or in their entirety be
confidential. This may be required by law or by the terms and
conditions, such as in the case of donated or deposited text from
private sources. A text archive may need content filtering or
cryptographic technologies to meet such requirements.
5. IANA Considerations
5.1. Registration of MIME Type 'application/tei+xml'
MIME media type name: application
MIME subtype name: tei+xml
Required parameters: None
Optional parameters: charset
the parameter has identical semantics to the charset parameter
of the "application/xml" media type as specified in RFC 3023
[RFC3023].
Encoding considerations:
Identical to those for 'application/xml'. See RFC 3023
[RFC3023], Section 3.2.
Security considerations:
See Security Considerations (Section 4) in this specification.
Interoperability considerations:
TEI documents are often given the extension '.xml', which is
not uncommon for other XML document formats.
Published specification:
This media type registration is for TEI documents [TEI] as
described here. TEI syntax is defined in a schema [TEIschema].
Romary & Lundberg Informational [Page 5]
^L
RFC 6129 The 'application/tei+xml' Media Type February 2011
Applications which use this media type:
There are currently no known applications using the media type
'application/tei+xml'.
Additional information:
Magic number(s):
There is no single initial octet sequence that is always
present in TEI documents.
file extension(s):
Common extensions are '.tei', '.teiCorpus' and '.odd'. See
Recognizing TEI files (Section 2) in this specification.
Macintosh File Type Code(s)
TEXT
Object Identifier(s) or OID(s)
Not applicable
6. References
6.1. Normative References
[RFC3023] Murata, M., St. Laurent, S., and D. Kohn, "XML Media
Types", RFC 3023, January 2001.
[RFC3470] Hollenbeck, S., Rose, M., and L. Masinter, "Guidelines for
the Use of Extensible Markup Language (XML)
within IETF Protocols", BCP 70, RFC 3470, January 2003.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66,
RFC 3986, January 2005.
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
Identifiers (IRIs)", RFC 3987, January 2005.
[TEI] "TEI Guidelines", <http://www.tei-c.org/Vault/P5/1.8.0/
doc/tei-p5-doc/en/html/>.
Romary & Lundberg Informational [Page 6]
^L
RFC 6129 The 'application/tei+xml' Media Type February 2011
[TEIschema]
"Schema generated from ODD source", <http://www.tei-c.org/
release/xml/tei/custom/schema/relaxng/tei_all.rng>.
[W3C.REC-xml-20081126]
Paoli, J., Yergeau, F., Sperberg-McQueen, C., Maler, E.,
and T. Bray, "Extensible Markup Language (XML) 1.0 (Fifth
Edition)", World Wide Web Consortium Recommendation REC-
xml-20081126, November 2008,
<http://www.w3.org/TR/2008/REC-xml-20081126>.
[W3C.REC-xml-names-20091208]
Bray, T., Hollander, D., Layman, A., Tobin, R., and H.
Thompson, "Namespaces in XML 1.0 (Third Edition)", World
Wide Web Consortium Recommendation REC-xml-names-20091208,
December 2009,
<http://www.w3.org/TR/2009/REC-xml-names-20091208>.
6.2. Informative References
[DRM] "Digital rights management", <http://en.wikipedia.org/w/
index.php?title=Digital_rights_management&
oldid=412653591>.
[IPR] "Intellectual property", <http://en.wikipedia.org/w/
index.php?title=Intellectual_property&oldid=411690322>.
[ODD] "Getting Started with P5 ODDs",
<http://www.tei-c.org/Guidelines/Customization/odds.xml>.
[W3C.REC-xinclude-20061115]
Marsh, J., Orchard, D., and D. Veillard, "XML Inclusions
(XInclude) Version 1.0 (Second Edition)", World Wide Web
Consortium Recommendation REC-xinclude-20061115,
November 2006,
<http://www.w3.org/TR/2006/REC-xinclude-20061115>.
Romary & Lundberg Informational [Page 7]
^L
RFC 6129 The 'application/tei+xml' Media Type February 2011
Authors' Addresses
Laurent Romary
TEI Consortium and INRIA
EMail: laurent.romary@inria.fr
URI: http://www.tei-c.org/
Sigfrid Lundberg
The Royal Library, Copenhagen
Postbox 2149
1016 Koebenhavn K
Denmark
EMail: slu@kb.dk
URI: http://sigfrid-lundberg.se/
Romary & Lundberg Informational [Page 8]
^L
|