Diffstat (limited to 'doc/rfc/rfc2237.txt')
-rw-r--r-- | doc/rfc/rfc2237.txt | 339 |
1 files changed, 339 insertions, 0 deletions
diff --git a/doc/rfc/rfc2237.txt b/doc/rfc/rfc2237.txt
new file mode 100644
index 0000000..21dac0a
--- /dev/null
+++ b/doc/rfc/rfc2237.txt
@@ -0,0 +1,339 @@
+
+
+
+
+
+
+Network Working Group                                          K. Tamaru
+Request for Comments: 2237                         Microsoft Corporation
+Category: Informational                                    November 1997
+
+
+
+           Japanese Character Encoding for Internet Messages
+
+
+Status of this Memo
+
+   This memo provides information for the Internet community. It does
+   not specify an Internet standard of any kind. Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (1997). All Rights Reserved.
+
+1. Abstract
+
+   This memo defines an encoding scheme for Japanese characters,
+   "ISO-2022-JP-1", which is used in electronic mail [RFC-822] and
+   network news [RFC 1036]. This memo also provides a listing of the
+   Japanese character sets that can be used in this encoding scheme.
+
+2. Requirements Notation
+
+   This document uses terms that appear in capital letters to indicate
+   particular requirements of this specification. Those terms are
+   "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY". The meaning of
+   each term is found in [RFC-2119].
+
+3. Introduction
+
+   RFC 1468 defines how Japanese characters are encoded, as does this
+   memo. RFC 1468 defines the use of JIS X 0208 as the double-byte
+   character set in ISO-2022-JP text.
+
+   Today, many operating systems support proprietary extended Japanese
+   characters or JIS X 0212. This includes the Unicode character set,
+   which conforms to neither JIS X 0201 nor JIS X 0208. Therefore, this
+   limits the ability to communicate and correspond precise information
+   because of the limited availability of Kanji characters.
+   Fortunately, JIS (Japanese Industrial Standards) defines JIS X 0212
+   as "code of the
+
+
+
+
+
+Tamaru                       Informational                      [Page 1]
+
+RFC 2237              Japanese Character Encoding          November 1997
+
+
+   supplementary Japanese graphic character set for information
+   interchange". Most Japanese characters used in regular electronic
+   mail can be accommodated in JIS X 0201, JIS X 0208 and JIS X 0212.
+
+   It is also recognized that there is a tendency to use Unicode;
+   however, Unicode is not yet widely used, and older electronic mail
+   systems have certain limitations. The purpose of this memo is to add
+   the capability of writing out JIS X 0212.
+
+   This memo does not describe any representation of iso-2022-jp-1
+   version information in addition to JIS X 0212 support.
+
+4. Description
+
+   In "ISO-2022-JP-1" text, the initial character code of the message is
+   in ASCII. The "double-byte-seq" (see the "Formal Syntax" section)
+   (ESC "$" "B" / ESC "$" "@" / ESC "$" "(" "D") is the only designator
+   that indicates that the following characters are double-byte, and it
+   is valid until another escape sequence appears. Using (ESC "$" "@")
+   for double-byte character encoding is strongly discouraged; new
+   implementations SHOULD use (ESC "$" "B") instead.
+
+   The end of "ISO-2022-JP-1" text MUST be in ASCII. It is also strongly
+   recommended to switch back to ASCII, rather than JIS X 0201-Roman, at
+   the end of each line if there is any non-ASCII character in the
+   middle of the line.
+
+   Since "ISO-2022-JP-1" is designed to add the capability of writing
+   out JIS X 0212, if the message does not contain any JIS X 0212
+   characters, "ISO-2022-JP" text MUST be used.
+
+   JIS X 0201-Roman is not identical to ASCII; the two differ in two
+   characters.
+
+   The following list shows the escape sequences and character sets
+   that can be used in "ISO-2022-JP-1" text.
+   The registered number in the ISO 2375 Register, which allows
+   double-byte ideographic scripts to be encoded within the ISO/IEC 2022
+   code structure, is indicated as reg# below.
+
+       reg#  character set      ESC sequence                designated to
+       6     ASCII              ESC 2/8 4/2      ESC ( B    G0
+       42    JIS X 0208-1978    ESC 2/4 4/0      ESC $ @    G0
+       87    JIS X 0208-1983    ESC 2/4 4/2      ESC $ B    G0
+       14    JIS X 0201-Roman   ESC 2/8 4/10     ESC ( J    G0
+       159   JIS X 0212-1990    ESC 2/4 2/8 4/4  ESC $ ( D  G0
+
+   Other restrictions are given in the Formal Syntax below.
+
+5. Formal Syntax
+
+   The notational conventions used here are identical to those used in
+   STD 11, RFC 822 [RFC822].
+
+   The * (asterisk) convention is as follows:
+       l*m something
+   meaning at least l and at most m somethings, with l and m taking
+   default values of 0 and infinity, respectively.
+
+   iso-2022-jp-1-text  = *( line CRLF ) [line]
+
+   line                = (*single-byte-char *segment
+                          single-byte-seq *single-byte-char) /
+                         *single-byte-char
+
+   segment             = single-byte-segment / double-byte-segment
+
+   single-byte-segment = single-byte-seq *single-byte-char
+   double-byte-segment = double-byte-seq *(one-of-94 one-of-94)
+
+   reset-seq           = ESC "(" ( "B" / "J" )
+   single-byte-seq     = ESC "(" ( "B" / "J" )
+   double-byte-seq     = (ESC "$" ( "@" / "B" )) /
+                         (ESC "$" "(" "D" )
+
+                                                    ; ( Octal, Decimal.)
+   CRLF                = CR LF
+   ESC                 = <ISO 2022 ESC, escape>     ; (    33,      27.)
+   SI                  = <ISO 2022 SI, shift-in>    ; (    17,      15.)
+   SO                  = <ISO 2022 SO, shift-out>   ; (    16,      14.)
+   CR                  = <ASCII CR, carriage return>; (    15,      13.)
+   LF                  = <ASCII LF, linefeed>       ; (    12,      10.)
+   one-of-94           = <any one of 94 values>     ; (41-176, 33.-126.)
+   one-of-96           = <any one of 96 values>     ; (40-177, 32.-127.)
+   7BIT                = <any 7-bit value>          ; ( 0-177,  0.-127.)
+   single-byte-char    = <any 7BIT, including bare CR & bare LF,
+                          but NOT including CRLF, and not including
+                          ESC, SI, SO>
+
+6. Security Considerations
+
+   This memo raises no known security issues.
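As an illustration of the designators in Section 4 and the grammar in Section 5, the following is a minimal Python sketch of a segmenter that walks a byte string and reports which character set each run belongs to. It is not part of RFC 2237: the names `DESIGNATORS` and `segment_iso2022jp1` are hypothetical, and the sketch treats its input as a complete "ISO-2022-JP-1" text, so it also enforces the rule that the text must end in ASCII.

```python
# Illustrative sketch only; DESIGNATORS and segment_iso2022jp1 are
# hypothetical names chosen for this example, not defined by RFC 2237.
ESC, SO, SI = 0x1B, 0x0E, 0x0F

# Bytes that follow ESC -> (character set, bytes per character),
# taken from the escape sequence table in Section 4.
DESIGNATORS = {
    b"(B": ("ASCII", 1),
    b"(J": ("JIS X 0201-Roman", 1),
    b"$@": ("JIS X 0208-1978", 2),
    b"$B": ("JIS X 0208-1983", 2),
    b"$(D": ("JIS X 0212-1990", 2),
}


def segment_iso2022jp1(data: bytes):
    """Split ISO-2022-JP-1 bytes into (character set, payload) runs."""
    charset, width = "ASCII", 1          # the text starts in ASCII
    runs, buf, i = [], bytearray(), 0
    while i < len(data):
        if data[i] == ESC:
            # Find which designator follows the ESC byte.
            for seq, (name, w) in DESIGNATORS.items():
                if data.startswith(seq, i + 1):
                    break
            else:
                raise ValueError(f"unknown escape sequence at offset {i}")
            if buf:                      # close the current run
                runs.append((charset, bytes(buf)))
                buf = bytearray()
            charset, width = name, w
            i += 1 + len(seq)
            continue
        chunk = data[i:i + width]
        if width == 2:
            # double-byte-segment: pairs of one-of-94 values (0x21-0x7E)
            if len(chunk) != 2 or not all(0x21 <= b <= 0x7E for b in chunk):
                raise ValueError(f"bad double-byte character at offset {i}")
        elif chunk[0] > 0x7F or chunk[0] in (SO, SI):
            # single-byte-char: any 7-bit value except ESC, SI, SO
            raise ValueError(f"invalid single-byte value at offset {i}")
        buf += chunk
        i += width
    if buf:
        runs.append((charset, bytes(buf)))
    if charset != "ASCII":
        raise ValueError("ISO-2022-JP-1 text must end in ASCII")
    return runs


if __name__ == "__main__":
    sample = b"abc\x1b$B0!\x1b$(D0!\x1b(Bxyz"
    for name, payload in segment_iso2022jp1(sample):
        print(name, payload)
```

The sketch only classifies bytes; it does not map them to characters. For actual decoding, Python's standard library provides an `iso2022_jp_1` codec that can be passed to `bytes.decode`.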
+
+7. MIME Considerations
+
+   The name to be used for the Japanese encoding scheme in content is
+   "ISO-2022-JP-1". When this name is used in the MIME message form, it
+   would be:
+
+       Content-Type: text/plain; charset=iso-2022-jp-1
+
+   Since "ISO-2022-JP-1" is a 7-bit encoding, it is unnecessary to
+   encode it in another format by specifying the
+   "Content-Transfer-Encoding" header. Also, applying Base64 or
+   Quoted-Printable encoding MAY cause today's software to fail to
+   decode the message.
+
+   "ISO-2022-JP-1" can be used in MIME headers. Also "ISO-2022-JP-1"
+   text can be used with Base64 or Quoted-Printable encoding.
+
+8. Additional Information
+
+   As long as mail systems are capable of writing out Unicode, it is
+   recommended to also write out Unicode text in addition to
+   "ISO-2022-JP-1" text. Writing out "ISO-2022-JP" text in addition to
+   "ISO-2022-JP-1" is also strongly encouraged for backward
+   compatibility reasons.
+
+   Some mail systems write out 8-bit characters in 'parameter' and
+   'value' defined in [RFC 822] and [RFC 1521]. 8-bit characters MUST
+   NOT be used in those fields. Implementations of future mail systems
+   SHOULD support them only for interoperability reasons.
+
+9. References
+
+   [ISO2022]
+        International Organization for Standardization (ISO),
+        "Information processing -- ISO 7-bit and 8-bit coded
+        character sets -- Code extension techniques",
+        International Standard, Ref. No. ISO 2022-1986 (E).
+
+   [ISOREG]
+        International Organization for Standardization (ISO),
+        "International Register of Coded Character Sets To Be Used
+        With Escape Sequences".
+
+   [RFC-822]
+        Crocker, D., "Standard for the Format of ARPA Internet
+        Text Messages", STD 11, RFC 822, August 1982.
+
+   [RFC-1468]
+        Murai, J., Crispin, M., and E.
+        van der Poel, "Japanese Character Encoding for Internet
+        Messages", RFC 1468, June 1993.
+
+   [RFC-1766]
+        Alvestrand, H., "Tags for the Identification of
+        Languages", RFC 1766, March 1995.
+
+   [RFC-2045]
+        Freed, N., and N. Borenstein, "Multipurpose Internet Mail
+        Extensions (MIME) Part One: Format of Internet Message
+        Bodies", RFC 2045, December 1996.
+
+   [RFC-2046]
+        Freed, N., and N. Borenstein, "Multipurpose Internet Mail
+        Extensions (MIME) Part Two: Media Types", RFC 2046,
+        December 1996.
+
+   [RFC-2047]
+        Moore, K., "Multipurpose Internet Mail Extensions (MIME)
+        Part Three: Representation of Non-ASCII Text in Internet
+        Message Headers", RFC 2047, December 1996.
+
+   [RFC-2048]
+        Freed, N., Klensin, J. and J. Postel, "Multipurpose
+        Internet Mail Extensions (MIME) Part Four: MIME
+        Registration Procedures", RFC 2048, December 1996.
+
+   [RFC-2049]
+        Freed, N., and N. Borenstein, "Multipurpose Internet Mail
+        Extensions (MIME) Part Five: Conformance Criteria and
+        Examples", RFC 2049, December 1996.
+
+   [RFC-2119]
+        Bradner, S., "Key words for use in RFCs to Indicate
+        Requirement Levels", RFC 2119, March 1997.
+
+Author's Address
+
+   Kenzaburo Tamaru
+   Microsoft Corporation
+   One Microsoft Way
+   Redmond, WA 98052-6399
+
+   EMail: kenzat@microsoft.com
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (1997). All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.
+   However, this document itself may not be modified in any way, such
+   as by removing the copyright notice or references to the Internet
+   Society or other Internet organizations, except as needed for the
+   purpose of developing Internet standards in which case the
+   procedures for copyrights defined in the Internet Standards process
+   must be followed, or as required to translate it into languages
+   other than English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.