Diffstat (limited to 'doc/rfc/rfc1815.txt')
-rw-r--r-- | doc/rfc/rfc1815.txt | 339 |
1 files changed, 339 insertions, 0 deletions
diff --git a/doc/rfc/rfc1815.txt b/doc/rfc/rfc1815.txt
new file mode 100644
index 0000000..c3c7722
--- /dev/null
+++ b/doc/rfc/rfc1815.txt
@@ -0,0 +1,339 @@
+
+
+
+
+
+
+Network Working Group                                            M. Ohta
+Request For Comments: 1815                 Tokyo Institute of Technology
+Category: Informational                                        July 1995
+
+
+               Character Sets ISO-10646 and ISO-10646-J-1
+
+Status of this Memo
+
+   This memo provides information for the Internet community. This memo
+   does not specify an Internet standard of any kind. Distribution of
+   this memo is unlimited.
+
+Abstract
+
+   Though the ISO character set standard of ISO 10646 is specified
+   reasonably well about European characters, it is not so useful in an
+   fully internationalized environment.
+
+   For the practical use of ISO 10646, a lot of external profiling such
+   as restriction of characters, restriction of combination of
+   characters and addition of language information is necessary.
+
+   This memo provides information on such profiling, along with charset
+   names to each profiled instance.
+
+   Though all the effort is done to make the resulting charset as useful
+   10646 based charset as possible, the result is not so good. So, the
+   charsets defined in this memo are only for reference purpose and its
+   use for practical purpose is strongly discouraged.
+
+Introduction
+
+   This memo describes two text encoding schemes based on ISO 10646
+   [10646].
+
+   As ISO 10646 specifies too little about how text is visualized, to
+   practically use ISO 10646, it is necessary to restrict the standard
+   minimally and then add some amount of profiling information.
+
+   For ISO 2022 [ISO2022] based national standards, sufficient profiling
+   information is provided by national standardization bodies, but, for
+   ISO 10646, such a profiling is not yet provided.
+
+   As the profiling of ISO 10646 largely affects which character or
+   combination of characters could be properly displayed, changes of
+   profiling of ISO 10646 are as significant as additions of new
+   character sets of ISO 2022.
+
+
+
+M. Ohta                      Informational                      [Page 1]
+
+RFC 1815       Character Sets ISO-10646 and ISO-10646-J-1      July 1995
+
+
+   That is, it's impractical to support the entirety of ISO 10646 (new
+   restriction or profiling can always be added), so a client needs to
+   know whether some restriction or profiling is being used before it
+   can decide whether to display the body part. Thus, it is necessary to
+   provide multiple charset names to each variation of ISO 10646.
+
+   For example, in Japan with Japanese windows NT, only those Han
+   characters already supported by MS Kanji code (mostly equivalent to
+   JIS X 0208 [JISX0208]) can be displayed, because no other font
+   pattern is commonly provided.
+
+   The other problem of ISO 10646 for Han characters is that, to display
+   them in quality required for daily plain text processing in
+   China/Japan/Korea, it is necessary to add profiling information on
+   which one of Chinese/Japanese/Korean the text is using. It should be
+   noted that this feature makes multilingual mixed
+   Chinese/Japanese/Korean text with ISO 10646 impractical.
+
+   Also, just as [RFC1521] was unclear about how bi-directionality
+   should be supported with "ISO-8859-6" and "ISO-8859-8" which was
+   corrected by [RFC1556], it is also unclear how bi-directionality
+   could be supported with ISO 10646. There are too much ways to
+   support bi- directionality. So, until some bi-directionality
+   mechanism(s) becomes widely supported, it is necessary to exclude
+   characters for languages which requires bi-directionality support
+   from the minimal variation. It should be noted that, though ISO
+   10646 is intended to be free from long term states, save for some
+   profiling information, introduction of bi-directionality with ISO
+   10646 do requires the long term states.
+
+   Combining characters also cause problems. In many countries where
+   combining characters based on [ISO2022] is used, there are
+   restrictions on how combining characters are ordered [TIS]. Without
+   such restriction, the result of combination is completely meaningless
+   which is the current state of ISO 10646. That is, if some
+   combination is allowed in some implementation while the other does
+   not support it, communication between them is difficult unless ISO
+   10646 is profiled to be least common set of widely supported
+   combinations. So, again, until combination restriction will be
+   developed for each language, it is necessary to exclude characters
+   for such languages from the minimal variation.
+
+   Conjoining characters also, may or may not be supported, which
+   requires another profiling.
+
+   According to those considerations, this memo defines two variations
+   of ISO 10646. They are "ISO-10646" as the minimal basic variation and
+   "ISO-10646-J-1" as the variation which could be useful in Japan.
+
+
+
+M. Ohta                      Informational                      [Page 2]
+
+RFC 1815       Character Sets ISO-10646 and ISO-10646-J-1      July 1995
+
+
+   Finally, this memo, by no means, promotes the use of ISO 10646 on the
+   Internet. It's use is strongly discouraged, when there are other
+   charsets which can encode the same information, Families of ISO 10646
+   based charsets, like ISO 2022 based charsets, only forms set of
+   mutually incompatible encoding systems and, unlike ISO 2022 based
+   charsets [2022INT], they can not be merged together to be the single
+   world wide charset.
+
+Description of "ISO-10646"
+
+   ISO-10646 is profiled to be the most basic part of the family of
+   encodings based on ISO 10646 and contains the following minimal
+   graphic characters:
+
+   collection number and name        positions   further restriction
+   ------------------------------------------------------------------
+   1  BASIC LATIN                    0020-007E
+   2  LATIN-1 SUPPLEMENT             00A0-00FF
+
+   C0 and C1 control characters may also be used as specified in the
+   section 16 of ISO 10646.
+
+   The text with "ISO-10646" encodes text in 16 bit big endian form.
+
+   As no combining characters are included, "ISO-10646" can be used with
+   applications at implementation level 1.
+
+   Left-to-right directionality should be used.
+
+   The encoding is implemented by Windows/NT.
+
+   For practical communication, use of "ISO-10646" is discouraged.
+   "ISO-8859-1" [RFC1345] should be used instead.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+M. Ohta                      Informational                      [Page 3]
+
+RFC 1815       Character Sets ISO-10646 and ISO-10646-J-1      July 1995
+
+
+Description of "ISO-10646-J-1"
+
+   ISO-10646-J-1 is profiled to be useful for Japanese PC users who use
+   Japanese version of Windows/NT and contains the following graphic
+   characters:
+
+   collection number and name        positions   further restrictions
+   ------------------------------------------------------------------
+   1  BASIC LATIN                    0020-007E
+   2  LATIN-1 SUPPLEMENT             00A0-00FF
+   8  BASIC GREEK                    0370-03CF
+   10 CYRILLIC                       0400-04FF
+   32 GENERAL PUNCTUATION            2000-206F   See note 1, below.
+   39 MATHEMATICAL OPERATORS         2200-22FF   See note 1, below.
+   44 BOX DRAWING                    2500-257F
+   49 CJK SYMBOLS AND PUNCTUATION    3000-303F   See note 1, below.
+   50 HIRAGANA                       3040-309F
+   51 KATAKANA                       30A0-30FF
+   60 CJK UNIFIED IDEOGRAPHS         4E00-9FFF   See note 1, below.
+   62 CJK COMPATIBILITY IDEOGRAPHS   F900-FAFF   See note 1, below.
+   66 CJK COMPATIBILITY FORMS        FE30-FE4F
+   69 HALFWIDTH AND FULLWIDTH FORMS  FF00-FFEF
+
+   Note 1: Most of the characters are excluded. That is, only those
+   characters of JIS X 0208 [JISX0208] are included. The reason is that
+   the Japanese version of Windows/NT have fonts for them only and most
+   of the users can not read messages which contains other characters.
+
+   C0 and C1 control characters may also be used as specified in the
+   section 16 of ISO 10646.
+
+   The text with "ISO-10646-J-1" encodes text in 16 bit big endian form.
+
+   Shapes of Han characters should be of Japanese Han, that is, those of
+   column "J" in section 26 of ISO 10646.
+
+   As no combining characters are included, "ISO-10646-J-1" can be used
+   with applications at implementation level 1.
+
+   Characters in "HALFWIDTH AND FULLWIDTH FORMS" compared to be
+   different characters to the normal width characters.
+
+   When text is displayed horizontally, left-to-right directionality
+   should be used.
+
+   For practical communication, use of "ISO-10646-J-1" is discouraged.
+   ISO-2022-JP" [2022JP] should be used instead.
+
+
+
+
+M. Ohta                      Informational                      [Page 4]
+
+RFC 1815       Character Sets ISO-10646 and ISO-10646-J-1      July 1995
+
+
+MIME Considerations
+
+   The names given to the character encoding methods described in this
+   memo are, respectively, "ISO-10646" and "ISO-10646-J-1". This name
+   is intended to be used in MIME messages as follows:
+
+      Content-Type: text/plain; charset=iso-10646
+
+   The ISO-10646 and ISO-10646-J-1 encoding are in 16-bit form, so it is
+   often necessary to use a Content-Transfer-Encoding header. Base64
+   should be useful.
+
+   The ISO-10646 and ISO-10646-J-1 may also be used in MIME Part 2
+   headers [RFC1522]. The "B" encoding should be used with them.
+
+References
+
+   [10646]    International Organization for Standardization (ISO),
+              "Universal Multiple-Octet Coded Character Set (UCS)",
+              International Standard, Ref. No. ISO/IEC 10646-1:1993
+              (E).
+
+   [2022INT]  (An Internet Draft "draft-ohta-text-encoding-*.txt" may
+              be available).
+
+   [2022JP]   Murai, J., Crispin, M., and E. van der Poel, "Japanese
+              Character Encoding for Internet Messages", RFC 1468, June
+              1993.
+
+   [ISO2022]  International Organization for Standardization (ISO),
+              "Information processing -- ISO 7-bit and 8-bit coded
+              character sets -- Code extension techniques",
+              International Standard, Ref. No. ISO 2022-1986 (E).
+
+   [JISX0208] Japanese Standards Association, "Code of the Japanese
+              graphic character set for information interchange", JIS X
+              0208-1990.
+
+   [RFC1345]  Simonsen, K., "Character Mnemonics & Character Sets",
+              RFC-1345, Rationel Almen Planlaegning, June 1992.
+
+   [RFC1521]  Borenstein, N., and Freed, N., "MIME (Multipurpose
+              Internet Mail Extensions) Part One: Mechanisms for
+              Specifying and Describing the Format of Internet Message
+              Bodies", RFC 1521, September 1993.
+
+
+
+
+
+
+M. Ohta                      Informational                      [Page 5]
+
+RFC 1815       Character Sets ISO-10646 and ISO-10646-J-1      July 1995
+
+
+   [RFC1522]  Moore, K., "MIME (Multipurpose Internet Mail Extensions)
+              Part Two: Message Header Extensions for Non-ASCII Text",
+              RFC 1522, September 1993.
+
+   [RFC1556]  Nussbacher, H., "Handling of Bi-directional Texts in
+              MIME" RFC 1556, Israeli Inter-University Computer Center,
+              December 1993.
+
+   [TIS]      Thai Industrial Standard for Thai Character Code for
+              Computer, TIS 620-2533:1990.
+
+Security Considerations
+
+   Security issues are not discussed in this memo.
+
+Author's Address
+
+   Masataka Ohta
+   Tokyo Institute of Technology
+   2-12-1, O-okayama, Meguro-ku,
+   Tokyo 152, JAPAN
+
+   Phone: +81-3-5499-7084
+   Fax: +81-3-3729-1940
+   EMail: mohta@cc.titech.ac.jp
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+M. Ohta                      Informational                      [Page 6]
+
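
Illustration (not part of the commit or of RFC 1815): the memo's "ISO-10646" charset is described as 16 bit big endian text restricted to BASIC LATIN, LATIN-1 SUPPLEMENT and the C0/C1 controls, carried in MIME with Base64 and, in headers, with the RFC 1522 "B" encoding. A minimal Python sketch of those steps might look like the following. The function names are hypothetical, and the sketch assumes that Python's "utf-16-be" codec yields the same octets as the memo's 16 bit form, which should hold here because no character in this repertoire needs a surrogate pair. It does not split long header words, although RFC 1522 limits an encoded-word to 75 characters.

```python
import base64

# Graphic character ranges listed for "ISO-10646" in the memo:
# collection 1 (BASIC LATIN) and collection 2 (LATIN-1 SUPPLEMENT).
ISO_10646_RANGES = [(0x0020, 0x007E), (0x00A0, 0x00FF)]


def in_iso_10646_repertoire(text):
    """Return True if every character is allowed by the memo's profile.

    Hypothetical helper. C0 (0000-001F) and C1 (0080-009F) controls are
    accepted because the memo says they may also be used.
    """
    for ch in text:
        cp = ord(ch)
        if cp <= 0x1F or 0x80 <= cp <= 0x9F:
            continue
        if not any(lo <= cp <= hi for lo, hi in ISO_10646_RANGES):
            return False
    return True


def encode_body(text):
    """Encode text as 16 bit big endian octets, then Base64 for transfer."""
    if not in_iso_10646_repertoire(text):
        raise ValueError("character outside the ISO-10646 profile")
    # Assumption: utf-16-be equals the 16 bit big endian form for this
    # repertoire, since all code points are below U+0100.
    raw = text.encode("utf-16-be")
    return base64.b64encode(raw).decode("ascii")


def encode_header_word(text):
    """Build an RFC 1522 style encoded-word using the "B" encoding.

    No folding or splitting is attempted, so long inputs can exceed the
    75 character limit on an encoded-word.
    """
    return "=?ISO-10646?B?" + encode_body(text) + "?="


if __name__ == "__main__":
    body = encode_body("Hello, world\r\n")
    print("Content-Type: text/plain; charset=iso-10646")
    print("Content-Transfer-Encoding: base64")
    print("Subject: " + encode_header_word("Caf\u00e9"))
    print()
    print(body)
```

The repertoire check comes first because the memo defines the charset by its restrictions; text outside the two collections would need a different charset name, such as "ISO-10646-J-1" with its own table. The memo itself discourages both charsets for practical communication, so this is for reference only.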