summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc2278.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc2278.txt')
-rw-r--r--doc/rfc/rfc2278.txt563
1 files changed, 563 insertions, 0 deletions
diff --git a/doc/rfc/rfc2278.txt b/doc/rfc/rfc2278.txt
new file mode 100644
index 0000000..d4c9863
--- /dev/null
+++ b/doc/rfc/rfc2278.txt
@@ -0,0 +1,563 @@
+
+
+
+
+
+
+Network Working Group N. Freed
+Request for Comments: 2278 Innosoft
+BCP: 19 J. Postel
+Category: Best Current Practice ISI
+ January 1998
+
+ IANA Charset
+ Registration Procedures
+
+Status of this Memo
+
+ This document specifies an Internet Best Current Practices for the
+ Internet Community, and requests discussion and suggestions for
+ improvements. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (1998). All Rights Reserved.
+
+1. Abstract
+
+ MIME [RFC-2045, RFC-2046, RFC-2047, RFC-2184] and various other
+ modern Internet protocols are capable of using many different
+ charsets. This in turn means that the ability to label different
+ charsets is essential. This registration procedure exists solely to
+ associate a specific name or names with a given charset and to give
+ an indication of whether or not a given charset can be used in MIME
+ text objects. In particular, the general applicability and
+ appropriateness of a given registered charset is a protocol issue,
+ not a registration issue, and is not dealt with by this registration
+ procedure.
+
+2. Definitions and Notation
+
+ The following sections define various terms used in this document.
+
+2.1. Requirements Notation
+
+ This document occasionally uses terms that appear in capital letters.
+ When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
+ appear capitalized, they are being used to indicate particular
+ requirements of this specification. A discussion of the meanings of
+ these terms appears in [RFC-2119].
+
+
+
+
+
+
+
+
+Freed & Postel Best Current Practice [Page 1]
+
+RFC 2278 Charset Registration January 1998
+
+
+2.2. Character
+
+ A member of a set of elements used for the organisation, control, or
+ representation of data.
+
+2.3. Charset
+
+ The term "charset" (see historical note below) is used here to refer
+ to a method of converting a sequence of octets into a sequence of
+ characters. This conversion may also optionally produce additional
+ control information such as directionality indicators.
+
+ Note that unconditional and unambiguous conversion in the other
+ direction is not required, in that not all characters may be
+ representable by a given charset and a charset may provide more than
+ one sequence of octets to represent a particular sequence of
+ characters.
+
+ This definition is intended to allow charsets to be defined in a
+ variety of different ways, from simple single-table mappings such as
+ US-ASCII to complex table switching methods such as those that use
+ ISO 2022's techniques, to be used as charsets. However, the
+ definition associated with a charset name must fully specify the
+ mapping to be performed. In particular, use of external profiling
+ information to determine the exact mapping is not permitted.
+
+ HISTORICAL NOTE: The term "character set" was originally used in MIME
+ to describe such straightforward schemes as US-ASCII and ISO-8859-1
+ which consist of a small set of characters and a simple one-to-one
+ mapping from single octets to single characters. Multi-octet
+ character encoding schemes and switching techniques make the
+ situation much more complex. As such, the definition of this term was
+ revised to emphasize both the conversion aspect of the process, and
+ the term itself has been changed to "charset" to emphasize that it is
+ not, after all, just a set of characters. A discussion of these
+ issues as well as specification of standard terminology for use in
+ the IETF appears in RFC 2130.
+
+2.4. Coded Character Set
+
+ A Coded Character Set (CCS) is a mapping from a set of abstract
+ characters to a set of integers. Examples of coded character sets are
+ ISO 10646 [ISO-10646], US-ASCII [US-ASCII], and the ISO-8859 series
+ [ISO-8859].
+
+
+
+
+
+
+
+Freed & Postel Best Current Practice [Page 2]
+
+RFC 2278 Charset Registration January 1998
+
+
+2.5. Character Encoding Scheme
+
+ A Character Encoding Scheme (CES) is a mapping from a Coded Character
+ Set or several coded character sets to a set of octets. A given CES
+ is typically associated with a single CCS; for example, UTF-8 applies
+ only to ISO 10646.
+
+3. Registration Requirements
+
+ Registered charsets are expected to conform to a number of
+ requirements as described below.
+
+3.1. Required Characteristics
+
+ Registered charsets MUST conform to the definition of a "charset"
+ given above. In addition, charsets intended for use in MIME content
+ types under the "text" top-level type must conform to the
+ restrictions on that type described in RFC 2045. All registered
+ charsets MUST note whether or not they are suitable for use in MIME.
+
+ All charsets which are constructed as a composition of a CCS and a
+ CES MUST either include the CCS and CES they are based on in their
+ registration or else cite a definition of their CCS and CES that
+ appears elsewhere.
+
+ All registered charsets MUST be specified in a stable, openly
+ available specification. Registration of charsets whose
+ specifications aren't stable and openly available is forbidden.
+
+3.2. New Charsets
+
+ This registration mechanism is not intended to be a vehicle for the
+ definition of entirely new charsets. This is due to the fact that the
+ registration process does NOT contain adequate review mechanisims for
+ such undertakings.
+
+ As such, only charsets defined by other processes and standards
+ bodies, or specific profiles of such charsets, are eligible for
+ registration.
+
+3.3. Naming Requirements
+
+ One or more names MUST be assigned to all registered charsets.
+ Multiple names for the same charset are permitted, but if multiple
+ names are assigned a single primary name for the charset MUST be
+ identified. All other names are considered to be aliases for the
+ primary name and use of the primary name is preferred over use of any
+ of the aliases.
+
+
+
+Freed & Postel Best Current Practice [Page 3]
+
+RFC 2278 Charset Registration January 1998
+
+
+ Each assigned name MUST uniquely identify a single charset. All
+ charset names MUST be suitable for use as the value of a MIME content
+ type charset parameter and hence MUST conform to MIME parameter value
+ syntax. This applies even if the specific charset being registered is
+ not suitable for use with the "text" media type.
+
+ Finally, charsets being registered for use with the "text" media type
+ MUST have a primary name that conforms to the more restrictive syntax
+ of the charset field in MIME encoded-words [RFC-2047, RFC-2184] and
+ MIME extended parameter values [RFC-2184]. A combined ABNF definition
+ for such names is as follows:
+
+ mime-charset = 1*<Any CHAR except SPACE, CTLs, and cspecials>
+
+ cspecials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
+ <"> / "/" / "[" / "]" / "?" / "." / "=" / "*"
+
+ CHAR = <any ASCII character> ; ( 0-177, 0.-127.)
+ SPACE = <ASCII SP, space> ; ( 40, 32.)
+ CTL = <any ASCII control ; ( 0- 37, 0.- 31.)
+ character and DEL> ; ( 177, 127.)
+
+3.4. Functionality Requirement
+
+ Charsets must function as actual charsets: Registration of things
+ that are better thought of as a transfer encoding, as a media type,
+ or as a collection of separate entities of another type, is not
+ allowed. For example, although HTML could theoretically be thought
+ of as a charset, it is really better thought of as a media type and
+ as such it cannot be registered as a charset.
+
+3.5. Usage and Implementation Requirements
+
+ Use of a large number of charsets in a given protocol may hamper
+ interoperability. However, the use of a large number of undocumented
+ and/or unlabelled charsets hampers interoperability even more.
+
+ A charset should therefore be registered ONLY if it adds significant
+ functionality that is valuable to a large community, OR if it
+ documents existing practice in a large community. Note that charsets
+ registered for the second reason should be explicitly marked as being
+ of limited or specialized use and should only be used in Internet
+ messages with prior bilateral agreement.
+
+3.6. Publication Requirements
+
+ Charset registrations can be published in RFCs, however, RFC
+ publication is not required to register a new charset.
+
+
+
+Freed & Postel Best Current Practice [Page 4]
+
+RFC 2278 Charset Registration January 1998
+
+
+ The registration of a charset does not imply endorsement, approval,
+ or recommendation by the IANA, IESG, or IETF, or even certification
+ that the specification is adequate. It is expected that applicability
+ statements for particular applications will be published from time to
+ time that recommend implementation of, and support for, charsets that
+ have proven particularly useful in those contexts.
+
+3.7. MIBenum Requirements
+
+ Each registered charset MUST also be assigned a unique enumerated
+ integer value. These "MIBenum" values are defined by and used in the
+ Printer MIB [RFC-1759].
+
+ A MIBenum value for each charset will be assigned by IANA at the time
+ of registration.
+
+4. Registration Procedure
+
+ The following procedure has been implemented by the IANA for review
+ and approval of new charsets. This is not a formal standards
+ process, but rather an administrative procedure intended to allow
+ community comment and sanity checking without excessive time delay.
+
+4.1. Present the Charset to the Community
+
+ Send the proposed charset registration to the "ietf-
+ charsets@iana.org" mailing list. This mailing list has been
+ established for the sole purpose of reviewing proposed charset
+ registrations. Proposed charsets are not formally registered and must
+ not be used; the "x-" prefix specified in RFC 2045 can be used until
+ registration is complete.
+
+ The intent of the public posting is to solicit comments and feedback
+ on the definition of the charset and the name chosen for it over a
+ two week period.
+
+4.2. Charset Reviewer
+
+ When the two week period has passed and the registration proposer is
+ convinced that consensus has been achieved, the registration
+ application should be submitted to IANA and the charset reviewer. The
+ charset reviewer, who is appointed by the IETF Applications Area
+ Director(s), either approves the request for registration or rejects
+ it. Rejection may occur because of significant objections raised on
+ the list or objections raised externally. If the charset reviewer
+ considers the registration sufficiently important and controversial,
+ a last call for comments may be issued to the full IETF. The charset
+
+
+
+
+Freed & Postel Best Current Practice [Page 5]
+
+RFC 2278 Charset Registration January 1998
+
+
+ reviewer may also recommend standards track processing (before or
+ after registration) when that appears appropriate and the level of
+ specification of the charset is adequate.
+
+ Decisions made by the reviewer must be posted to the ietf-charsets
+ mailing list within 14 days. Decisions made by the reviewer may be
+ appealed to the IESG.
+
+4.3. IANA Registration
+
+ Provided that the charset registration has either passed review or
+ has been successfully appealed to the IESG, the IANA will register
+ the charset, assign a MIBenum value, and make its registration
+ available to the community.
+
+5. Location of Registered Charset List
+
+ Charset registrations will be posted in the anonymous FTP file
+ "ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets" and all
+ registered charsets will be listed in the periodically issued
+ "Assigned Numbers" RFC [currently RFC-1700]. The description of the
+ charset may also be published as an Informational RFC by sending it
+ to "rfc-editor@isi.edu" (please follow the instructions to RFC
+ authors [RFC-2223]).
+
+6. Registration Template
+
+ To: ietf-charsets@iana.org
+ Subject: Registration of new charset
+
+ Charset name(s):
+
+ (All names must be suitable for use as the value of a MIME content-
+ type parameter.)
+
+ Published specification(s):
+
+ (A specification for the charset must be openly available that
+ accurately describes what is being registered. If a charset is
+ defined as a composition of a CCS and a CES then these defintions
+ must either be included or referenced.)
+
+ Person & email address to contact for further information:
+
+
+
+
+
+
+
+
+Freed & Postel Best Current Practice [Page 6]
+
+RFC 2278 Charset Registration January 1998
+
+
+7. Security Considerations
+
+ This registration procedure is not known to raise any sort of
+ security considerations that are appreciably different from those
+ already existing in the protocols that employ registered charsets.
+
+8. References
+
+ [ISO-2022]
+ International Standard -- Information Processing -- Character
+ Code Structure and Extension Techniques, ISO/IEC 2022:1994, 4th
+ ed.
+
+ [ISO-8859]
+ International Standard -- Information Processing -- 8-bit
+ Single-Byte Coded Graphic Character Sets
+ - Part 1: Latin Alphabet No. 1, ISO 8859-1:1987, 1st ed.
+ - Part 2: Latin Alphabet No. 2, ISO 8859-2:1987, 1st ed.
+ - Part 3: Latin Alphabet No. 3, ISO 8859-3:1988, 1st ed.
+ - Part 4: Latin Alphabet No. 4, ISO 8859-4:1988, 1st ed.
+ - Part 5: Latin/Cyrillic Alphabet, ISO 8859-5:1988, 1st
+ ed.
+ - Part 6: Latin/Arabic Alphabet, ISO 8859-6:1987, 1st ed.
+ - Part 7: Latin/Greek Alphabet, ISO 8859-7:1987, 1st ed.
+ - Part 8: Latin/Hebrew Alphabet, ISO 8859-8:1988, 1st ed.
+ - Part 9: Latin Alphabet No. 5, ISO/IEC 8859-9:1989, 1st
+ ed.
+ International Standard -- Information Technology -- 8-bit
+ Single-Byte Coded Graphic Character Sets
+ - Part 10: Latin Alphabet No. 6, ISO/IEC 8859-10:1992,
+ 1st ed.
+
+ [ISO-10646]
+ ISO/IEC 10646-1:1993(E), "Information technology --
+ Universal Multiple-Octet Coded Character Set (UCS) --
+ Part 1: Architecture and Basic Multilingual Plane",
+ JTC1/SC2, 1993.
+
+ [RFC-2048]
+ Freed, N., Klensin, J., and J. Postel, "Multipurpose Internet
+ Mail Extensions (MIME) Part Four: Registration Procedures", RFC
+ 2048, November 1996.
+
+ [RFC-1700]
+ Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
+ 1700, October 1994.
+
+
+
+
+
+Freed & Postel Best Current Practice [Page 7]
+
+RFC 2278 Charset Registration January 1998
+
+
+ [RFC-1759]
+ Smith, R., Wright, F., Hastings, T., Zilles, S., and J.
+ Gyllenskog, "Printer MIB", RFC 1759, March 1995.
+
+ [RFC-2045]
+ Freed, N., and N. Borenstein, "Multipurpose Internet Mail
+ Extensions (MIME) Part One: Format of Internet Message Bodies",
+ RFC 2045, November 1996.
+
+ [RFC-2046]
+ Freed, N., and N. Borenstein, "Multipurpose Internet Mail
+ Extensions (MIME) Part Two: Media Types", RFC 2046, November
+ 1996.
+
+ [RFC-2047]
+ Moore, K., "Multipurpose Internet Mail Extensions (MIME) Part
+ Three: Representation of Non-Ascii Text in Internet Message
+ Headers", RFC 2047, November 1996.
+
+ [RFC-2119]
+ Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [RFC-2130]
+ Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson,
+ R., Crispin, M., and P. Svanberg, "Report from the IAB Character
+ Set Workshop", RFC 2130, April 1997.
+
+ [RFC-2184]
+ Freed, N., and K. Moore, "MIME Parameter Value and Encoded Word
+ Extensions: Character Sets, Languages, and Continuations", RFC
+ 2184, August 1997.
+
+ [US-ASCII]
+ Coded Character Set -- 7-Bit American Standard Code for
+ Information Interchange, ANSI X3.4-1986.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Freed & Postel Best Current Practice [Page 8]
+
+RFC 2278 Charset Registration January 1998
+
+
+9. Authors' Addresses
+
+ Ned Freed
+ Innosoft International, Inc.
+ 1050 Lakes Drive
+ West Covina, CA 91790
+ USA
+
+ Phone: +1 626 919 3600
+ Fax: +1 626 919 3614
+ EMail: ned.freed@innosoft.com
+
+
+ Jon Postel
+ USC/Information Sciences Institute
+ 4676 Admiralty Way
+ Marina del Rey, CA 90292
+ USA
+
+ Phone: +1 310 822 1511
+ Fax: +1 310 823 6714
+ EMail: Postel@ISI.EDU
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Freed & Postel Best Current Practice [Page 9]
+
+RFC 2278 Charset Registration January 1998
+
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (1998). All Rights Reserved.
+
+ This document and translations of it may be copied and furnished to
+ others, and derivative works that comment on or otherwise explain it
+ or assist in its implementation may be prepared, copied, published
+ and distributed, in whole or in part, without restriction of any
+ kind, provided that the above copyright notice and this paragraph are
+ included on all such copies and derivative works. However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assigns.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Freed & Postel Best Current Practice [Page 10]
+