Diffstat (limited to 'doc/rfc/rfc1468.txt')
-rw-r--r-- | doc/rfc/rfc1468.txt | 339 |
1 files changed, 339 insertions, 0 deletions
diff --git a/doc/rfc/rfc1468.txt b/doc/rfc/rfc1468.txt
new file mode 100644
index 0000000..e8cb5b8
--- /dev/null
+++ b/doc/rfc/rfc1468.txt
@@ -0,0 +1,339 @@
+
+
+
+
+
+
+Network Working Group                                           J. Murai
+Request for Comments: 1468                               Keio University
+                                                              M. Crispin
+                                                       Panda Programming
+                                                         E. van der Poel
+                                                               June 1993
+
+
+           Japanese Character Encoding for Internet Messages
+
+Status of this Memo
+
+   This memo provides information for the Internet community. It does
+   not specify an Internet standard. Distribution of this memo is
+   unlimited.
+
+Introduction
+
+   This document describes the encoding used in electronic mail [RFC822]
+   and network news [RFC1036] messages in several Japanese networks. It
+   was first specified by and used in JUNET [JUNET]. The encoding is now
+   also widely used in Japanese IP communities.
+
+   The name given to this encoding is "ISO-2022-JP", which is intended
+   to be used in the "charset" parameter field of MIME headers (see
+   [MIME1] and [MIME2]).
+
+Description
+
+   The text starts in ASCII [ASCII], and switches to Japanese characters
+   through an escape sequence. For example, the escape sequence ESC $ B
+   (three bytes, hexadecimal values: 1B 24 42) indicates that the bytes
+   following this escape sequence are Japanese characters, which are
+   encoded in two bytes each. To switch back to ASCII, the escape
+   sequence ESC ( B is used.
+
+   The following table gives the escape sequences and the character sets
+   used in ISO-2022-JP messages. The ISOREG number is the registration
+   number in ISO's registry [ISOREG].
+
+       Esc Seq    Character Set                  ISOREG
+
+       ESC ( B    ASCII                             6
+       ESC ( J    JIS X 0201-1976 ("Roman" set)    14
+       ESC $ @    JIS X 0208-1978                  42
+       ESC $ B    JIS X 0208-1983                  87
+
+   Note that JIS X 0208 was called JIS C 6226 until the name was changed
+
+
+
+Murai, Crispin & van der Poel                                   [Page 1]
+
+RFC 1468    Japanese Character Encoding for Internet Messages   June 1993
+
+
+   on March 1st, 1987. Likewise, JIS C 6220 was renamed JIS X 0201.
+
+   The "Roman" character set of JIS X 0201 [JISX0201] is identical to
+   ASCII except for backslash (\) and tilde (~). The backslash is
+   replaced by the Yen sign, and the tilde is replaced by overline. This
+   set is Japan's national variant of ISO 646 [ISO646].
+
+   The JIS X 0208 [JISX0208] character sets consist of Kanji, Hiragana,
+   Katakana and some other symbols and characters. Each character takes
+   up two bytes.
+
+   For further details about the JIS Japanese national character set
+   standards, refer to [JISX0201] and [JISX0208]. For further
+   information about the escape sequences, see [ISO2022] and [ISOREG].
+
+   If there are JIS X 0208 characters on a line, there must be a switch
+   to ASCII or to the "Roman" set of JIS X 0201 before the end of the
+   line (i.e., before the CRLF). This means that the next line starts in
+   the character set that was switched to before the end of the previous
+   line.
+
+   Also, the text must end in ASCII.
+
+   Other restrictions are given in the Formal Syntax below.
+
+Formal Syntax
+
+   The notational conventions used here are identical to those used in
+   RFC 822 [RFC822].
+
+   The * (asterisk) convention is as follows:
+
+      l*m something
+
+   meaning at least l and at most m somethings, with l and m taking
+   default values of 0 and infinity, respectively.
+
+
+   message             = headers 1*( CRLF *single-byte-char *segment
+                         single-byte-seq *single-byte-char )
+                                           ; see also [MIME1] "body-part"
+                                           ; note: must end in ASCII
+
+   headers             = <see [RFC822] "fields" and [MIME1] "body-part">
+
+   segment             = single-byte-segment / double-byte-segment
+
+   single-byte-segment = single-byte-seq 1*single-byte-char
+
+
+
+Murai, Crispin & van der Poel                                   [Page 2]
+
+RFC 1468    Japanese Character Encoding for Internet Messages   June 1993
+
+
+   double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )
+
+   single-byte-seq     = ESC "(" ( "B" / "J" )
+
+   double-byte-seq     = ESC "$" ( "@" / "B" )
+
+   CRLF                = CR LF
+
+                                                    ; ( Octal, Decimal.)
+
+   ESC                 = <ISO 2022 ESC, escape>     ; (    33,      27.)
+
+   SI                  = <ISO 2022 SI, shift-in>    ; (    17,      15.)
+
+   SO                  = <ISO 2022 SO, shift-out>   ; (    16,      14.)
+
+   CR                  = <ASCII CR, carriage return>; (    15,      13.)
+
+   LF                  = <ASCII LF, linefeed>       ; (    12,      10.)
+
+   one-of-94           = <any one of 94 values>     ; (41-176, 33.-126.)
+
+   7BIT                = <any 7-bit value>          ; ( 0-177,  0.-127.)
+
+   single-byte-char    = <any 7BIT, including bare CR & bare LF, but NOT
+                          including CRLF, and not including ESC, SI, SO>
+
+MIME Considerations
+
+   The name given to the JUNET character encoding is "ISO-2022-JP". This
+   name is intended to be used in MIME messages as follows:
+
+      Content-Type: text/plain; charset=iso-2022-jp
+
+   The ISO-2022-JP encoding is already in 7-bit form, so it is not
+   necessary to use a Content-Transfer-Encoding header. It should be
+   noted that applying the Base64 or Quoted-Printable encoding will
+   render the message unreadable in current JUNET software.
+
+   ISO-2022-JP may also be used in MIME Part 2 headers. The "B"
+   encoding should be used with ISO-2022-JP text.
+
+Background Information
+
+   The JUNET encoding was described in the JUNET User's Guide [JUNET]
+   (JUNET Riyou No Tebiki Dai Ippan).
+
+   The encoding is based on the particular usage of ISO 2022 announced
+
+
+
+Murai, Crispin & van der Poel                                   [Page 3]
+
+RFC 1468    Japanese Character Encoding for Internet Messages   June 1993
+
+
+   by 4/1 (see [ISO2022] for details). However, the escape sequence
+   normally used for this announcement is not included in ISO-2022-JP
+   messages.
+
+   The Kana set of JIS X 0201 is not used in ISO-2022-JP messages.
+
+   In the past, some systems erroneously used the escape sequence ESC (
+   H in JUNET messages. This escape sequence is officially registered
+   for a Swedish character set [ISOREG], and should not be used in ISO-
+   2022-JP messages.
+
+   Some systems do not distinguish between ESC ( B and ESC ( J or
+   between ESC $ @ and ESC $ B for display. However, when relaying a
+   message to another system, the escape sequences must not be altered
+   in any way.
+
+   The human user (not implementor) should try to keep lines within 80
+   display columns, or, preferably, within 75 (or so) columns, to allow
+   insertion of ">" at the beginning of each line in excerpts. Each JIS
+   X 0208 character takes up two columns, and the escape sequences do
+   not take up any columns. The implementor is reminded that JIS X 0208
+   characters take up two bytes and should not be split in the middle to
+   break lines for displaying, etc.
+
+   The JIS X 0208 standard was revised in 1990, to add two characters at
+   the end of the table. Although ISO 2022 specifies special additional
+   escape sequences to indicate the use of revised character sets, it is
+   suggested here not to make use of this special escape sequence in
+   ISO-2022-JP text, even if the two characters added to JIS X 0208 in
+   1990 are used.
+
+   For further information about Japanese character encodings such as PC
+   codes, FTP locations of implementations, etc, see "Electronic
+   Handling of Japanese Text" [JPN.INF].
+
+References
+
+   [ASCII] American National Standards Institute, "Coded character set
+   -- 7-bit American national standard code for information
+   interchange", ANSI X3.4-1986.
+
+   [ISO646] International Organization for Standardization (ISO),
+   "Information technology -- ISO 7-bit coded character set for
+   information interchange", International Standard, Ref. No. ISO/IEC
+   646:1991.
+
+   [ISO2022] International Organization for Standardization (ISO),
+   "Information processing -- ISO 7-bit and 8-bit coded character sets
+
+
+
+Murai, Crispin & van der Poel                                   [Page 4]
+
+RFC 1468    Japanese Character Encoding for Internet Messages   June 1993
+
+
+   -- Code extension techniques", International Standard, Ref. No. ISO
+   2022-1986 (E).
+
+   [ISOREG] International Organization for Standardization (ISO),
+   "International Register of Coded Character Sets To Be Used With
+   Escape Sequences".
+
+   [JISX0201] Japanese Standards Association, "Code for Information
+   Interchange", JIS X 0201-1976.
+
+   [JISX0208] Japanese Standards Association, "Code of the Japanese
+   graphic character set for information interchange", JIS X 0208-1978,
+   -1983 and -1990.
+
+   [JPN.INF] Ken R. Lunde <lunde@adobe.com>, "Electronic Handling of
+   Japanese Text", March 1992,
+   msi.umn.edu(128.101.24.1):pub/lunde/japan[123].inf
+
+   [JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
+   Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET
+   User's Guide (First Edition)"), February 1988.
+
+   [MIME1] Borenstein N., and N. Freed, "MIME (Multipurpose
+   Internet Mail Extensions): Mechanisms for Specifying and
+   Describing the Format of Internet Message Bodies", RFC 1341,
+   Bellcore, Innosoft, June 1992.
+
+   [MIME2] Moore, K., "Representation of Non-ASCII Text in Internet
+   Message Headers", RFC 1342, University of Tennessee, June 1992.
+
+   [RFC822] Crocker, D., "Standard for the Format of ARPA Internet
+   Text Messages", STD 11, RFC 822, UDEL, August 1982.
+
+   [RFC1036] Horton M., and R. Adams, "Standard for Interchange of USENET
+   Messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic
+   Studies, December 1987.
+
+Acknowledgements
+
+   Many people assisted in drafting this document. The authors wish to
+   thank in particular Akira Kato, Masahiro Sekiguchi and Ken'ichi
+   Handa.
+
+Security Considerations
+
+   Security issues are not discussed in this memo.
+
+
+
+
+
+Murai, Crispin & van der Poel                                   [Page 5]
+
+RFC 1468    Japanese Character Encoding for Internet Messages   June 1993
+
+
+Authors' Addresses
+
+   Jun Murai
+   Keio University
+   5322 Endo, Fujisawa
+   Kanagawa 252 Japan
+
+   Fax: +81 466 49 1101
+   EMail: jun@wide.ad.jp
+
+
+   Mark Crispin
+   Panda Programming
+   6158 Lariat Loop NE
+   Bainbridge Island, WA 98110-2098
+   USA
+
+   Phone: +1 206 842 2385
+   EMail: MRC@PANDA.COM
+
+
+   Erik M. van der Poel
+   A-105 Park Avenue
+   4-4-10 Ohta, Kisarazu
+   Chiba 292 Japan
+
+   Phone: +81 438 22 5836
+   Fax: +81 438 22 5837
+   EMail: erik@poel.juice.or.jp
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Murai, Crispin & van der Poel                                   [Page 6]
+
\ No newline at end of file
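
The escape-sequence table and segment rules in the added rfc1468.txt can be exercised with a short Python sketch. It is an illustration only and not part of this change: the helper name split_segments and the ESCAPES table are hypothetical, and the example input relies on Python's built-in iso2022_jp codec, which is assumed here to emit only ESC $ B and ESC ( B for this particular string.

```python
ESC = 0x1B

# Escape sequences and character sets, copied from the table in RFC 1468.
ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-1976 Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def split_segments(data: bytes):
    """Split ISO-2022-JP bytes into (charset, payload) pairs.

    The text starts in ASCII. Any escape sequence outside the table,
    such as ESC ( H, is rejected with ValueError.
    """
    segments = []
    charset = "ASCII"
    start = 0
    i = 0
    while i < len(data):
        if data[i] == ESC:
            seq = data[i:i + 3]
            if seq not in ESCAPES:
                raise ValueError(f"escape sequence not allowed: {seq!r}")
            if i > start:
                segments.append((charset, data[start:i]))
            charset = ESCAPES[seq]
            i += 3
            start = i
        else:
            i += 1
    if start < len(data):
        segments.append((charset, data[start:]))
    return segments


if __name__ == "__main__":
    # "abc" followed by three Kanji, encoded with the standard-library codec.
    encoded = "abc\u65e5\u672c\u8a9e".encode("iso2022_jp")
    print(split_segments(encoded))
```

Scanning for ESC byte by byte is safe even inside a double-byte segment, because the formal syntax limits double-byte payload bytes to the one-of-94 range (33 to 126 decimal), which excludes ESC (27). The sketch does not check the per-line rule that a line must switch back to a single-byte set before CRLF, nor that the text ends in ASCII.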