summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc1815.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc1815.txt')
-rw-r--r--doc/rfc/rfc1815.txt339
1 files changed, 339 insertions, 0 deletions
diff --git a/doc/rfc/rfc1815.txt b/doc/rfc/rfc1815.txt
new file mode 100644
index 0000000..c3c7722
--- /dev/null
+++ b/doc/rfc/rfc1815.txt
@@ -0,0 +1,339 @@
+
+
+
+
+
+
+Network Working Group M. Ohta
+Request For Comments: 1815 Tokyo Institute of Technology
+Category: Informational July 1995
+
+
+ Character Sets ISO-10646 and ISO-10646-J-1
+
+Status of this Memo
+
+ This memo provides information for the Internet community. This memo
+ does not specify an Internet standard of any kind. Distribution of
+ this memo is unlimited.
+
+Abstract
+
+ Though the ISO character set standard of ISO 10646 is specified
+ reasonably well about European characters, it is not so useful in an
+ fully internationalized environment.
+
+ For the practical use of ISO 10646, a lot of external profiling such
+ as restriction of characters, restriction of combination of
+ characters and addition of language information is necessary.
+
+ This memo provides information on such profiling, along with charset
+ names to each profiled instance.
+
+ Though all the effort is done to make the resulting charset as useful
+ 10646 based charset as possible, the result is not so good. So, the
+ charsets defined in this memo are only for reference purpose and its
+ use for practical purpose is strongly discouraged.
+
+Introduction
+
+ This memo describes two text encoding schemes based on ISO 10646
+ [10646].
+
+ As ISO 10646 specifies too little about how text is visualized, to
+ practically use ISO 10646, it is necessary to restrict the standard
+ minimally and then add some amount of profiling information.
+
+ For ISO 2022 [ISO2022] based national standards, sufficient profiling
+ information is provided by national standardization bodies, but, for
+ ISO 10646, such a profiling is not yet provided.
+
+ As the profiling of ISO 10646 largely affects which character or
+ combination of characters could be properly displayed, changes of
+ profiling of ISO 10646 are as significant as additions of new
+ character sets of ISO 2022.
+
+
+
+M. Ohta Informational [Page 1]
+
+RFC 1815 Character Sets ISO-10646 and ISO-10646-J-1 July 1995
+
+
+ That is, it's impractical to support the entirety of ISO 10646 (new
+ restriction or profiling can always be added), so a client needs to
+ know whether some restriction or profiling is being used before it
+ can decide whether to display the body part. Thus, it is necessary to
+ provide multiple charset names to each variation of ISO 10646.
+
+ For example, in Japan with Japanese windows NT, only those Han
+ characters already supported by MS Kanji code (mostly equivalent to
+ JIS X 0208 [JISX0208]) can be displayed, because no other font
+ pattern is commonly provided.
+
+ The other problem of ISO 10646 for Han characters is that, to display
+ them in quality required for daily plain text processing in
+ China/Japan/Korea, it is necessary to add profiling information on
+ which one of Chinese/Japanese/Korean the text is using. It should be
+ noted that this feature makes multilingual mixed
+ Chinese/Japanese/Korean text with ISO 10646 impractical.
+
+ Also, just as [RFC1521] was unclear about how bi-directionality
+ should be supported with "ISO-8859-6" and "ISO-8859-8" which was
+ corrected by [RFC1556], it is also unclear how bi-directionality
+ could be supported with ISO 10646. There are too much ways to
+ support bi- directionality. So, until some bi-directionality
+ mechanism(s) becomes widely supported, it is necessary to exclude
+ characters for languages which requires bi-directionality support
+ from the minimal variation. It should be noted that, though ISO
+ 10646 is intended to be free from long term states, save for some
+ profiling information, introduction of bi-directionality with ISO
+ 10646 do requires the long term states.
+
+ Combining characters also cause problems. In many countries where
+ combining characters based on [ISO2022] is used, there are
+ restrictions on how combining characters are ordered [TIS]. Without
+ such restriction, the result of combination is completely meaningless
+ which is the current state of ISO 10646. That is, if some
+ combination is allowed in some implementation while the other does
+ not support it, communication between them is difficult unless ISO
+ 10646 is profiled to be least common set of widely supported
+ combinations. So, again, until combination restriction will be
+ developed for each language, it is necessary to exclude characters
+ for such languages from the minimal variation.
+
+ Conjoining characters also, may or may not be supported, which
+ requires another profiling.
+
+ According to those considerations, this memo defines two variations
+ of ISO 10646. They are "ISO-10646" as the minimal basic variation and
+ "ISO-10646-J-1" as the variation which could be useful in Japan.
+
+
+
+M. Ohta Informational [Page 2]
+
+RFC 1815 Character Sets ISO-10646 and ISO-10646-J-1 July 1995
+
+
+ Finally, this memo, by no means, promotes the use of ISO 10646 on the
+ Internet. It's use is strongly discouraged, when there are other
+ charsets which can encode the same information, Families of ISO 10646
+ based charsets, like ISO 2022 based charsets, only forms set of
+ mutually incompatible encoding systems and, unlike ISO 2022 based
+ charsets [2022INT], they can not be merged together to be the single
+ world wide charset.
+
+Description of "ISO-10646"
+
+ ISO-10646 is profiled to be the most basic part of the family of
+ encodings based on ISO 10646 and contains the following minimal
+ graphic characters:
+
+ collection number and name positions further restriction
+ ------------------------------------------------------------------
+ 1 BASIC LATIN 0020-007E
+ 2 LATIN-1 SUPPLEMENT 00A0-00FF
+
+ C0 and C1 control characters may also be used as specified in the
+ section 16 of ISO 10646.
+
+ The text with "ISO-10646" encodes text in 16 bit big endian form.
+
+ As no combining characters are included, "ISO-10646" can be used with
+ applications at implementation level 1.
+
+ Left-to-right directionality should be used.
+
+ The encoding is implemented by Windows/NT.
+
+ For practical communication, use of "ISO-10646" is discouraged.
+ "ISO-8859-1" [RFC1345] should be used instead.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+M. Ohta Informational [Page 3]
+
+RFC 1815 Character Sets ISO-10646 and ISO-10646-J-1 July 1995
+
+
+Description of "ISO-10646-J-1"
+
+ ISO-10646-J-1 is profiled to be useful for Japanese PC users who use
+ Japanese version of Windows/NT and contains the following graphic
+ characters:
+
+ collection number and name positions further restrictions
+ ------------------------------------------------------------------
+ 1 BASIC LATIN 0020-007E
+ 2 LATIN-1 SUPPLEMENT 00A0-00FF
+ 8 BASIC GREEK 0370-03CF
+ 10 CYRILLIC 0400-04FF
+ 32 GENERAL PUNCTUATION 2000-206F See note 1, below.
+ 39 MATHEMATICAL OPERATORS 2200-22FF See note 1, below.
+ 44 BOX DRAWING 2500-257F
+ 49 CJK SYMBOLS AND PUNCTUATION 3000-303F See note 1, below.
+ 50 HIRAGANA 3040-309F
+ 51 KATAKANA 30A0-30FF
+ 60 CJK UNIFIED IDEOGRAPHS 4E00-9FFF See note 1, below.
+ 62 CJK COMPATIBILITY IDEOGRAPHS F900-FAFF See note 1, below.
+ 66 CJK COMPATIBILITY FORMS FE30-FE4F
+ 69 HALFWIDTH AND FULLWIDTH FORMS FF00-FFEF
+
+ Note 1: Most of the characters are excluded. That is, only those
+ characters of JIS X 0208 [JISX0208] are included. The reason is that
+ the Japanese version of Windows/NT have fonts for them only and most
+ of the users can not read messages which contains other characters.
+
+ C0 and C1 control characters may also be used as specified in the
+ section 16 of ISO 10646.
+
+ The text with "ISO-10646-J-1" encodes text in 16 bit big endian form.
+
+ Shapes of Han characters should be of Japanese Han, that is, those of
+ column "J" in section 26 of ISO 10646.
+
+ As no combining characters are included, "ISO-10646-J-1" can be used
+ with applications at implementation level 1.
+
+ Characters in "HALFWIDTH AND FULLWIDTH FORMS" compared to be
+ different characters to the normal width characters.
+
+ When text is displayed horizontally, left-to-right directionality
+ should be used.
+
+ For practical communication, use of "ISO-10646-J-1" is discouraged.
+ ISO-2022-JP" [2022JP] should be used instead.
+
+
+
+
+M. Ohta Informational [Page 4]
+
+RFC 1815 Character Sets ISO-10646 and ISO-10646-J-1 July 1995
+
+
+MIME Considerations
+
+ The names given to the character encoding methods described in this
+ memo are, respectively, "ISO-10646" and "ISO-10646-J-1". This name
+ is intended to be used in MIME messages as follows:
+
+ Content-Type: text/plain; charset=iso-10646
+
+ The ISO-10646 and ISO-10646-J-1 encoding are in 16-bit form, so it is
+ often necessary to use a Content-Transfer-Encoding header. Base64
+ should be useful.
+
+ The ISO-10646 and ISO-10646-J-1 may also be used in MIME Part 2
+ headers [RFC1522]. The "B" encoding should be used with them.
+
+References
+
+ [10646] International Organization for Standardization (ISO),
+ "Universal Multiple-Octet Coded Character Set (UCS)",
+ International Standard, Ref. No. ISO/IEC 10646-1:1993
+ (E).
+
+ [2022INT] (An Internet Draft "draft-ohta-text-encoding-*.txt" may
+ be available).
+
+ [2022JP] Murai, J., Crispin, M., and E. van der Poel, "Japanese
+ Character Encoding for Internet Messages", RFC 1468, June
+ 1993.
+
+ [ISO2022] International Organization for Standardization (ISO),
+ "Information processing -- ISO 7-bit and 8-bit coded
+ character sets -- Code extension techniques",
+ International Standard, Ref. No. ISO 2022-1986 (E).
+
+ [JISX0208] Japanese Standards Association, "Code of the Japanese
+ graphic character set for information interchange", JIS X
+ 0208-1990.
+
+ [RFC1345] Simonsen, K., "Character Mnemonics & Character Sets",
+ RFC-1345, Rationel Almen Planlaegning, June 1992.
+
+ [RFC1521] Borenstein, N., and Freed, N., "MIME (Multipurpose
+ Internet Mail Extensions) Part One: Mechanisms for
+ Specifying and Describing the Format of Internet Message
+ Bodies", RFC 1521, September 1993.
+
+
+
+
+
+
+M. Ohta Informational [Page 5]
+
+RFC 1815 Character Sets ISO-10646 and ISO-10646-J-1 July 1995
+
+
+ [RFC1522] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
+ Part Two: Message Header Extensions for Non-ASCII Text",
+ RFC 1522, September 1993.
+
+ [RFC1556] Nussbacher, H., "Handling of Bi-directional Texts in
+ MIME" RFC 1556, Israeli Inter-University Computer Center,
+ December 1993.
+
+ [TIS] Thai Industrial Standard for Thai Character Code for
+ Computer, TIS 620-2533:1990.
+
+Security Considerations
+
+ Security issues are not discussed in this memo.
+
+Author's Address
+
+ Masataka Ohta
+ Tokyo Institute of Technology
+ 2-12-1, O-okayama, Meguro-ku,
+ Tokyo 152, JAPAN
+
+ Phone: +81-3-5499-7084
+ Fax: +81-3-3729-1940
+ EMail: mohta@cc.titech.ac.jp
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+M. Ohta Informational [Page 6]
+