Network Working Group                                           N. Freed
Request for Comments: 2278                                      Innosoft
BCP: 19                                                        J. Postel
Category: Best Current Practice                                      ISI
                                                            January 1998

                              IANA Charset
                        Registration Procedures

Status of this Memo

   This document specifies an Internet Best Current Practices for the
   Internet Community, and requests discussion and suggestions for
   improvements.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (1998).  All Rights Reserved.

1.  Abstract

   MIME [RFC-2045, RFC-2046, RFC-2047, RFC-2184] and various other
   modern Internet protocols are capable of using many different
   charsets. This in turn means that the ability to label different
   charsets is essential. This registration procedure exists solely to
   associate a specific name or names with a given charset and to give
   an indication of whether or not a given charset can be used in MIME
   text objects. In particular, the general applicability and
   appropriateness of a given registered charset is a protocol issue,
   not a registration issue, and is not dealt with by this registration
   procedure.

2.  Definitions and Notation

   The following sections define various terms used in this document.

2.1.  Requirements Notation

   This document occasionally uses terms that appear in capital letters.
   When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
   appear capitalized, they are being used to indicate particular
   requirements of this specification. A discussion of the meanings of
   these terms appears in [RFC-2119].








Freed & Postel           Best Current Practice                  [Page 1]
^L
RFC 2278                  Charset Registration              January 1998


2.2.  Character

   A member of a set of elements used for the organisation, control, or
   representation of data.

2.3.  Charset

   The term "charset" (see historical note below) is used here to refer
   to a method of converting a sequence of octets into a sequence of
   characters. This conversion may also optionally produce additional
   control information such as directionality indicators.

   Note that unconditional and unambiguous conversion in the other
   direction is not required, in that not all characters may be
   representable by a given charset and a charset may provide more than
   one sequence of octets to represent a particular sequence of
   characters.
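
   As an informal illustration (not part of the registration
   procedure), the following Python sketch uses the standard library
   codecs to show a charset acting as an octets-to-characters
   conversion, and why the reverse direction is neither unconditional
   nor unambiguous. The choice of ISO-8859-1, US-ASCII, and UTF-7 as
   examples is an assumption made purely for illustration.

```python
# A charset converts a sequence of octets into a sequence of characters.
octets = b"caf\xe9"
text = octets.decode("iso-8859-1")   # octets -> characters

# The reverse direction is not unconditional: U+00E9 has no
# representation in US-ASCII, so encoding fails.
try:
    text.encode("us-ascii")
    representable = True
except UnicodeEncodeError:
    representable = False

# Nor is it unambiguous: in UTF-7 the character "A" may be written
# directly or as a base64 shift sequence.
direct = b"A".decode("utf-7")
shifted = b"+AEE-".decode("utf-7")
```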

   This definition is intended to allow various kinds of mappings, from
   simple single-table mappings such as US-ASCII to complex table
   switching methods such as those that use ISO 2022's techniques, to
   be used as charsets.  However, the definition associated with a
   charset name must fully specify the mapping to be performed.  In
   particular, use of external profiling information to determine the
   exact mapping is not permitted.

   HISTORICAL NOTE: The term "character set" was originally used in MIME
   to describe such straightforward schemes as US-ASCII and ISO-8859-1
   which consist of a small set of characters and a simple one-to-one
   mapping from single octets to single characters. Multi-octet
   character encoding schemes and switching techniques make the
   situation much more complex. As such, the definition of this term was
   revised to emphasize the conversion aspect of the process, and the
   term itself has been changed to "charset" to emphasize that it is
   not, after all, just a set of characters. A discussion of these
   issues as well as specification of standard terminology for use in
   the IETF appears in RFC 2130.

2.4.  Coded Character Set

   A Coded Character Set (CCS) is a mapping from a set of abstract
   characters to a set of integers. Examples of coded character sets are
   ISO 10646 [ISO-10646], US-ASCII [US-ASCII], and the ISO-8859 series
   [ISO-8859].

2.5.  Character Encoding Scheme

   A Character Encoding Scheme (CES) is a mapping from a Coded Character
   Set or several coded character sets to a set of octets. A given CES
   is typically associated with a single CCS; for example, UTF-8 applies
   only to ISO 10646.
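
   The distinction between a CCS and a CES can be sketched in Python as
   follows. This is only an illustration: it assumes that Python's
   notion of a code point may stand in for an ISO 10646 code position,
   and the character chosen is arbitrary.

```python
# CCS: abstract character -> integer (here, an ISO 10646 code position).
char = "\u00e9"                 # LATIN SMALL LETTER E WITH ACUTE
code_position = ord(char)       # 0xE9

# CES: integer(s) -> octets. UTF-8 is a CES that applies to ISO 10646.
utf8_octets = char.encode("utf-8")

# A charset built from the two maps the octets back to the character.
round_trip = utf8_octets.decode("utf-8")
```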

3.  Registration Requirements

   Registered charsets are expected to conform to a number of
   requirements as described below.

3.1.  Required Characteristics

   Registered charsets MUST conform to the definition of a "charset"
   given above.  In addition, charsets intended for use in MIME content
   types under the "text" top-level type must conform to the
   restrictions on that type described in RFC 2045. All registered
   charsets MUST note whether or not they are suitable for use in MIME.

   All charsets which are constructed as a composition of a CCS and a
   CES MUST either include the CCS and CES they are based on in their
   registration or else cite a definition of their CCS and CES that
   appears elsewhere.

   All registered charsets MUST be specified in a stable, openly
   available specification. Registration of charsets whose
   specifications aren't stable and openly available is forbidden.

3.2.  New Charsets

   This registration mechanism is not intended to be a vehicle for the
   definition of entirely new charsets, because the registration
   process does NOT contain adequate review mechanisms for such
   undertakings.

   As such, only charsets defined by other processes and standards
   bodies, or specific profiles of such charsets, are eligible for
   registration.

3.3.  Naming Requirements

   One or more names MUST be assigned to all registered charsets.
   Multiple names for the same charset are permitted, but if multiple
   names are assigned a single primary name for the charset MUST be
   identified. All other names are considered to be aliases for the
   primary name and use of the primary name is preferred over use of any
   of the aliases.

   Each assigned name MUST uniquely identify a single charset.  All
   charset names MUST be suitable for use as the value of a MIME content
   type charset parameter and hence MUST conform to MIME parameter value
   syntax. This applies even if the specific charset being registered is
   not suitable for use with the "text" media type.
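
   The naming rules above (one primary name, any number of aliases,
   and each name identifying exactly one charset) might be modelled as
   in the following sketch. The class "CharsetNames" and its methods
   are hypothetical and do not correspond to any IANA interface.
   Comparing names without regard to case is an assumption borrowed
   from MIME's treatment of charset parameter values.

```python
class CharsetNames:
    """Hypothetical in-memory table of charset names (not an IANA API)."""

    def __init__(self):
        self._primary_of = {}   # lowercased name -> primary name

    def register(self, primary, aliases=()):
        names = [primary, *aliases]
        keys = [n.lower() for n in names]
        # Each assigned name must identify exactly one charset.
        if len(set(keys)) != len(keys):
            raise ValueError("duplicate name within registration")
        for key in keys:
            if key in self._primary_of:
                raise ValueError(f"name already assigned: {key}")
        for key in keys:
            self._primary_of[key] = primary

    def primary_name(self, name):
        """Resolve any registered name to its preferred primary name."""
        return self._primary_of[name.lower()]


names = CharsetNames()
names.register("ISO-8859-1", aliases=["latin1", "l1"])
```

   With this table, names.primary_name("LATIN1") resolves to
   "ISO-8859-1", and a later attempt to register "latin1" for a
   different charset is refused.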

   Finally, charsets being registered for use with the "text" media type
   MUST have a primary name that conforms to the more restrictive syntax
   of the charset field in MIME encoded-words [RFC-2047, RFC-2184] and
   MIME extended parameter values [RFC-2184]. A combined ABNF definition
   for such names is as follows:

   mime-charset = 1*<Any CHAR except SPACE, CTLs, and cspecials>

   cspecials    = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" /
                  <"> / "/" / "[" / "]" / "?" / "." / "=" / "*"

   CHAR         =  <any ASCII character>        ; (  0-177,  0.-127.)
   SPACE        =  <ASCII SP, space>            ; (     40,      32.)
   CTL          =  <any ASCII control           ; (  0- 37,  0.- 31.)
                    character and DEL>          ; (    177,     127.)
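
   A name can be checked against the mime-charset production with a
   few lines of code. The sketch below is one possible reading of the
   ABNF, not a normative implementation. The function name is
   hypothetical, and the backslash is treated as a cspecial on the
   assumption that cspecials extends the tspecials of RFC 2045.

```python
# Characters excluded by the ABNF. The backslash is included on the
# assumption that cspecials extends the tspecials of RFC 2045.
CSPECIALS = set('()<>@,;:\\"/[]?.=*')


def is_mime_charset(name: str) -> bool:
    """Return True if name matches the mime-charset production."""
    if not name:
        return False                      # 1* requires at least one CHAR
    for ch in name:
        code = ord(ch)
        if code > 127:                    # not an ASCII CHAR
            return False
        if code <= 32 or code == 127:     # CTLs (0-31, DEL) and SPACE
            return False
        if ch in CSPECIALS:
            return False
    return True
```

   Under this reading, "ISO-8859-1" and "US-ASCII" are accepted, while
   names containing a space, "/", "." or "*" are rejected.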

3.4.  Functionality Requirement

   Charsets must function as actual charsets: Registration of things
   that are better thought of as a transfer encoding, as a media type,
   or as a collection of separate entities of another type, is not
   allowed.  For example, although HTML could theoretically be thought
   of as a charset, it is really better thought of as a media type and
   as such it cannot be registered as a charset.

3.5.  Usage and Implementation Requirements

   Use of a large number of charsets in a given protocol may hamper
   interoperability. However, the use of a large number of undocumented
   and/or unlabelled charsets hampers interoperability even more.

   A charset should therefore be registered ONLY if it adds significant
   functionality that is valuable to a large community, OR if it
   documents existing practice in a large community. Note that charsets
   registered for the second reason should be explicitly marked as being
   of limited or specialized use and should only be used in Internet
   messages with prior bilateral agreement.

3.6.  Publication Requirements

   Charset registrations can be published in RFCs; however, RFC
   publication is not required to register a new charset.



   The registration of a charset does not imply endorsement, approval,
   or recommendation by the IANA, IESG, or IETF, or even certification
   that the specification is adequate. It is expected that applicability
   statements for particular applications will be published from time to
   time that recommend implementation of, and support for, charsets that
   have proven particularly useful in those contexts.

3.7.  MIBenum Requirements

   Each registered charset MUST also be assigned a unique enumerated
   integer value. These "MIBenum" values are defined by and used in the
   Printer MIB [RFC-1759].

   A MIBenum value for each charset will be assigned by IANA at the time
   of registration.
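
   The uniqueness requirement on MIBenum values can be illustrated
   with a small lookup table. The helper "assign_mibenum" is
   hypothetical. The three values shown are those listed for these
   charsets in the IANA character-sets registry, keyed here by their
   MIME names for brevity.

```python
# Illustrative subset of charset name -> MIBenum value.
MIBENUM = {
    "US-ASCII": 3,
    "ISO-8859-1": 4,
    "UTF-8": 106,
}


def assign_mibenum(table, name, value):
    """Add an entry, refusing to reuse an already assigned MIBenum."""
    if name in table:
        raise ValueError(f"{name} already has a MIBenum value")
    if value in table.values():
        raise ValueError(f"MIBenum {value} is already assigned")
    table[name] = value
```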

4.  Registration Procedure

   The following procedure has been implemented by the IANA for review
   and approval of new charsets.  This is not a formal standards
   process, but rather an administrative procedure intended to allow
   community comment and sanity checking without excessive time delay.

4.1.  Present the Charset to the Community

   Send the proposed charset registration to the
   "ietf-charsets@iana.org" mailing list.  This mailing list has been
   established for the sole purpose of reviewing proposed charset
   registrations. Proposed charsets are not formally registered and must
   not be used; the "x-" prefix specified in RFC 2045 can be used until
   registration is complete.
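
   Software that handles charset labels may wish to recognise names
   that are still in this provisional state. A minimal sketch follows.
   The function name is hypothetical, and accepting either "x-" or
   "X-" is an assumption based on the RFC 2045 convention.

```python
def is_experimental_name(name: str) -> bool:
    """True for names carrying the "x-" prefix of unregistered values."""
    return name[:2].lower() == "x-"
```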

   The intent of the public posting is to solicit comments and feedback
   on the definition of the charset and the name chosen for it over a
   two-week period.

4.2.  Charset Reviewer

   When the two-week period has passed and the registration proposer is
   convinced that consensus has been achieved, the registration
   application should be submitted to IANA and the charset reviewer. The
   charset reviewer, who is appointed by the IETF Applications Area
   Director(s), either approves the request for registration or rejects
   it.  Rejection may occur because of significant objections raised on
   the list or objections raised externally.  If the charset reviewer
   considers the registration sufficiently important and controversial,
   a last call for comments may be issued to the full IETF. The charset
   reviewer may also recommend standards track processing (before or
   after registration) when that appears appropriate and the level of
   specification of the charset is adequate.

   Decisions made by the reviewer must be posted to the ietf-charsets
   mailing list within 14 days. Decisions made by the reviewer may be
   appealed to the IESG.

4.3.  IANA Registration

   Provided that the charset registration has either passed review or
   has been successfully appealed to the IESG, the IANA will register
   the charset, assign a MIBenum value, and make its registration
   available to the community.

5.  Location of Registered Charset List

   Charset registrations will be posted in the anonymous FTP file
   "ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets" and all
   registered charsets will be listed in the periodically issued
   "Assigned Numbers" RFC [currently RFC-1700].  The description of the
   charset may also be published as an Informational RFC by sending it
   to "rfc-editor@isi.edu" (please follow the instructions to RFC
   authors [RFC-2223]).

6.  Registration Template

   To: ietf-charsets@iana.org
   Subject: Registration of new charset

   Charset name(s):

   (All names must be suitable for use as the value of a MIME
   content-type parameter.)

   Published specification(s):

   (A specification for the charset must be openly available that
   accurately describes what is being registered. If a charset is
   defined as a composition of a CCS and a CES then these definitions
   must either be included or referenced.)

   Person & email address to contact for further information:
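
   A proposer who prefers to generate the message programmatically
   could fill in the template along the following lines. This is only
   a sketch: the function and its argument names are hypothetical, and
   the way multiple names or specifications are joined on one line is
   an assumption, since the template does not prescribe a layout.

```python
def render_registration(names, specifications, contact):
    """Fill in the template fields; argument names are illustrative."""
    lines = [
        "To: ietf-charsets@iana.org",
        "Subject: Registration of new charset",
        "",
        "Charset name(s): " + ", ".join(names),
        "",
        "Published specification(s): " + "; ".join(specifications),
        "",
        "Person & email address to contact for further information: "
        + contact,
    ]
    return "\n".join(lines)
```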








7.  Security Considerations

   This registration procedure is not known to raise any sort of
   security considerations that are appreciably different from those
   already existing in the protocols that employ registered charsets.

8.  References

   [ISO-2022]
        International Standard -- Information Processing -- Character
        Code Structure and Extension Techniques, ISO/IEC 2022:1994, 4th
        ed.

   [ISO-8859]
        International Standard -- Information Processing -- 8-bit
        Single-Byte Coded Graphic Character Sets
        - Part 1: Latin Alphabet No. 1, ISO 8859-1:1987, 1st ed.
        - Part 2: Latin Alphabet No. 2, ISO 8859-2:1987, 1st ed.
        - Part 3: Latin Alphabet No. 3, ISO 8859-3:1988, 1st ed.
        - Part 4: Latin Alphabet No. 4, ISO 8859-4:1988, 1st ed.
        - Part 5: Latin/Cyrillic Alphabet, ISO 8859-5:1988, 1st
        ed.
        - Part 6: Latin/Arabic Alphabet, ISO 8859-6:1987, 1st ed.
        - Part 7: Latin/Greek Alphabet, ISO 8859-7:1987, 1st ed.
        - Part 8: Latin/Hebrew Alphabet, ISO 8859-8:1988, 1st ed.
        - Part 9: Latin Alphabet No. 5, ISO/IEC 8859-9:1989, 1st
        ed.
        International Standard -- Information Technology -- 8-bit
        Single-Byte Coded Graphic Character Sets
        - Part 10: Latin Alphabet No. 6, ISO/IEC 8859-10:1992,
        1st ed.

   [ISO-10646]
        ISO/IEC 10646-1:1993(E),  "Information technology --
        Universal Multiple-Octet Coded Character Set (UCS) --
        Part 1: Architecture and Basic Multilingual Plane",
        JTC1/SC2, 1993.

   [RFC-2048]
        Freed, N., Klensin, J., and J. Postel, "Multipurpose Internet
        Mail Extensions (MIME) Part Four: Registration Procedures", RFC
        2048, November 1996.

   [RFC-1700]
        Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
        1700, October 1994.

   [RFC-1759]
        Smith, R., Wright, F., Hastings, T., Zilles, S., and J.
        Gyllenskog, "Printer MIB", RFC 1759, March 1995.

   [RFC-2045]
        Freed, N., and N. Borenstein, "Multipurpose Internet Mail
        Extensions (MIME) Part One: Format of Internet Message Bodies",
        RFC 2045, November 1996.

   [RFC-2046]
        Freed, N., and N. Borenstein, "Multipurpose Internet Mail
        Extensions (MIME) Part Two: Media Types", RFC 2046, November
        1996.

   [RFC-2047]
        Moore, K., "Multipurpose Internet Mail Extensions (MIME) Part
        Three: Representation of Non-Ascii Text in Internet Message
        Headers", RFC 2047, November 1996.

   [RFC-2119]
        Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [RFC-2130]
        Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson,
        R., Crispin, M., and P. Svanberg, "Report from the IAB Character
        Set Workshop", RFC 2130, April 1997.

   [RFC-2184]
        Freed, N., and K. Moore, "MIME Parameter Value and Encoded Word
        Extensions: Character Sets, Languages, and Continuations", RFC
        2184, August 1997.

   [US-ASCII]
        Coded Character Set -- 7-Bit American Standard Code for
        Information Interchange, ANSI X3.4-1986.

9.  Authors' Addresses

   Ned Freed
   Innosoft International, Inc.
   1050 Lakes Drive
   West Covina, CA 91790
   USA

   Phone: +1 626 919 3600
   Fax:   +1 626 919 3614
   EMail: ned.freed@innosoft.com


   Jon Postel
   USC/Information Sciences Institute
   4676 Admiralty Way
   Marina del Rey, CA  90292
   USA

   Phone: +1 310 822 1511
   Fax:   +1 310 823 6714
   EMail: Postel@ISI.EDU

Full Copyright Statement

   Copyright (C) The Internet Society (1998).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.