diff options
Diffstat (limited to 'doc/rfc/rfc1554.txt')
-rw-r--r-- | doc/rfc/rfc1554.txt | 339 |
1 files changed, 339 insertions, 0 deletions
diff --git a/doc/rfc/rfc1554.txt b/doc/rfc/rfc1554.txt new file mode 100644 index 0000000..544b34e --- /dev/null +++ b/doc/rfc/rfc1554.txt @@ -0,0 +1,339 @@ + + + + + + +Network Working Group M. Ohta +Request for Comments: 1554 Tokyo Institute of Technology +Category: Informational K. Handa + ETL + December 1993 + + + ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP + +Status of this Memo + + This memo provides information for the Internet community. This memo + does not specify an Internet standard of any kind. Distribution of + this memo is unlimited. + +Introduction + + This memo describes a text encoding scheme: "ISO-2022-JP-2", which is + used experimentally for electronic mail [RFC822] and network news + [RFC1036] messages in several Japanese networks. The encoding is a + multilingual extension of "ISO-2022-JP", the existing encoding for + Japanese [2022JP]. The encoding is supported by an Emacs based + multilingual text editor: MULE [MULE]. + + The name, "ISO-2022-JP-2", is intended to be used in the "charset" + parameter field of MIME headers (see [MIME1] and [MIME2]). + +Description + + The text with "ISO-2022-JP-2" starts in ASCII [ASCII], and switches + to other character sets of ISO 2022 [ISO2022] through limited + combinations of escape sequences. All the characters are encoded + with 7 bits only. + + At the beginning of text, the existence of an announcer sequence: + "ESC 2/0 4/1 ESC 2/0 4/6 ESC 2/0 5/10" is (though omitted) assumed. + Thus, characters of 94 character sets are designated to G0 and + invoked as GL. C1 control characters are represented with 7 bits. + Characters of 96 character sets are designated to G2 and invoked with + SS2 (single shift two, "ESC 4/14" or "ESC N"). + + For example, the escape sequence "ESC 2/4 2/8 4/3" or "ESC $ ( C" + indicates that the bytes following the escape sequence are Korean KSC + characters, which are encoded in two bytes each. The escape sequence + "ESC 2/14 4/1" or "ESC . A" indicates that ISO 8859-1 is designated + to G2. After the designation, the single shifted sequence "ESC 4/14 + 4/1" or "ESC N A" is interpreted to represent a character "A with + acute". + + + +Ohta & Handa [Page 1] + +RFC 1554 Multilingual Extension of ISO-2022-JP December 1993 + + + The following table gives the escape sequences and the character sets + used in "ISO-2022-JP-2" messages. The reg# is the registration number + in ISO's registry [ISOREG]. + + 94 character sets + reg# character set ESC sequence designated to + ------------------------------------------------------------------ + 6 ASCII ESC 2/8 4/2 ESC ( B G0 + 42 JIS X 0208-1978 ESC 2/4 4/0 ESC $ @ G0 + 87 JIS X 0208-1983 ESC 2/4 4/2 ESC $ B G0 + 14 JIS X 0201-Roman ESC 2/8 4/10 ESC ( J G0 + 58 GB2312-1980 ESC 2/4 4/1 ESC $ A G0 + 149 KSC5601-1987 ESC 2/4 2/8 4/3 ESC $ ( C G0 + 159 JIS X 0212-1990 ESC 2/4 2/8 4/4 ESC $ ( D G0 + + 96 character sets + reg# character set ESC sequence designated to + ------------------------------------------------------------------ + 100 ISO8859-1 ESC 2/14 4/1 ESC . A G2 + 126 ISO8859-7(Greek) ESC 2/14 4/6 ESC . F G2 + + For further information about the character sets and the escape + sequences, see [ISO2022] and [ISOREG]. + + If there is any G0 designation in text, there must be a switch to + ASCII or to JIS X 0201-Roman before a space character (but not + necessarily before "ESC 4/14 2/0" or "ESC N ' '") or control + characters such as tab or CRLF. This means that the next line starts + in the character set that was switched to before the end of the + previous line. Though the designation to JIS X 0201-Roman is allowed + for backward compatibility to "ISO-2022-JP", its use is discouraged. + Applications such as pagers and editors which randomly seek within a + text file encoded with "ISO-2022-JP-2" may assume that all the lines + begin with ASCII, not with JIS X 0201-Roman. + + At the beginning of a line, information on G2 designation of the + previous line is cleared. New designation must be given before a + character in 96 character sets is used in the line. + + The text must end in ASCII designated to G0. + + As the "ISO-2022-JP", and thus, "ISO-2022-JP-2", is designed to + represent English and modern Japanese, left-to-right directionality + is assumed if the text is displayed horizontally. + + Users of "ISO-2022-JP-2" must be aware that some common transport + such as old Bnews can not relay a 7-bit value 7/15 (decimal 127), + which is used to encode, say, "y with diaeresis" of ISO 8859-1. + + + +Ohta & Handa [Page 2] + +RFC 1554 Multilingual Extension of ISO-2022-JP December 1993 + + + Other restrictions are given in the Formal Syntax section below. + +Formal Syntax + + The notational conventions used here are identical to those used in + STD 11, RFC 822 [RFC822]. + + The * (asterisk) convention is as follows: + + l*m something + + meaning at least l and at most m somethings, with l and m taking + default values of 0 and infinity, respectively. + + message = headers 1*(CRLF text) + ; see also [MIME1] "body-part" + ; note: must end in ASCII + + text = *(single-byte-char / + g2-desig-seq / + single-shift-char) + [*segment + reset-seq + *(single-byte-char / + g2-desig-seq / + single-shift-char ) ] + ; note: g2-desig-seq must + ; precede single-shift-char + + headers = <see [RFC822] "fields" and [MIME1] "body-part"> + + segment = single-byte-segment / double-byte-segment + + single-byte-segment = single-byte-seq + *(single-byte-char / + g2-desig-seq / + single-shift-char ) + + double-byte-segment = double-byte-seq + *((one-of-94 one-of-94) / + g2-desig-seq / + single-shift-char ) + + reset-seq = ESC "(" ( "B" / "J" ) + + single-byte-seq = ESC "(" ( "B" / "J" ) + + double-byte-seq = (ESC "$" ( "@" / "A" / "B" )) / + + + +Ohta & Handa [Page 3] + +RFC 1554 Multilingual Extension of ISO-2022-JP December 1993 + + + (ESC "$" "(" ( "C" / "D" )) + + g2-desig-seq = ESC "." ( "A" / "F" ) + + single-shift-seq = ESC "N" + + single-shift-char = single-shift-seq one-of-96 + + CRLF = CR LF + + ; ( Octal, Decimal.) + + ESC = <ISO 2022 ESC, escape> ; ( 33, 27.) + + SI = <ISO 2022 SI, shift-in> ; ( 17, 15.) + + SO = <ISO 2022 SO, shift-out> ; ( 16, 14.) + + CR = <ASCII CR, carriage return>; ( 15, 13.) + + LF = <ASCII LF, linefeed> ; ( 12, 10.) + + one-of-94 = <any one of 94 values> ; (41-176, 33.-126.) + + one-of-96 = <any one of 96 values> ; (40-177, 32.-127.) + + 7BIT = <any 7-bit value> ; ( 0-177, 0.-127.) + + single-byte-char = <any 7BIT, including bare CR & bare LF, but NOT + including CRLF, and not including ESC, SI, SO> + +MIME Considerations + + The name given to the character encoding is "ISO-2022-JP-2". This + name is intended to be used in MIME messages as follows: + + Content-Type: text/plain; charset=iso-2022-jp-2 + + The "ISO-2022-JP-2" encoding is already in 7-bit form, so it is not + necessary to use a Content-Transfer-Encoding header. It should be + noted that applying the Base64 or Quoted-Printable encoding will + render the message unreadable in non-MIME-compliant software. + + "ISO-2022-JP-2" may also be used in MIME headers. Both "B" and "Q" + encoding could be useful with "ISO-2022-JP-2" text. + + + + + + +Ohta & Handa [Page 4] + +RFC 1554 Multilingual Extension of ISO-2022-JP December 1993 + + +References + + [ASCII] American National Standards Institute, "Coded character set + -- 7-bit American national standard code for information + interchange", ANSI X3.4-1986. + + + [ISO2022] International Organization for Standardization (ISO), + "Information processing -- ISO 7-bit and 8-bit coded + character sets -- Code extension techniques", + International Standard, Ref. No. ISO 2022-1986 (E). + + [ISOREG] International Organization for Standardization (ISO), + "International Register of Coded Character Sets To Be Used + With Escape Sequences". + + [MIME1] Borenstein, N., and N. Freed, "MIME (Multipurpose Internet + Mail Extensions) Part One: Mechanisms for Specifying and + Describing the Format of Internet Message Bodies", RFC 1521, + September 1993. + + [MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part + Two: Message Header Extensions for Non-ASCII Text", RFC 1522, + September 1993. + + [RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text + Messages", STD 11, RFC 1522, UDEL, August 1982. + + [RFC1036] Horton M., and R. Adams, "Standard for Interchange of + USENET Messages", RFC 1036, AT&T Bell Laboratories, Center + for Seismic Studies, December 1987. + + [2022JP] Murai, J., Crispin, M., and E. van der Poel, "Japanese + Character Encoding for Internet Messages", RFC 1468, June + 1993. + + [MULE] Nishikimi, M., Handa, K., and S. Tomura, "Mule: MULtilingual + Enhancement to GNU Emacs", Proc. of INET'93, August, 1993. + +Acknowledgements + + This memo is the result of discussion between various people in a + news group: fj.kanji and is reviewed by a mailing list: jp-msg + @iij.ad.jp. The Authors wish to thank in particular Prof. Eiichi + Wada for his suggestions based on profound knowledge in ISO 2022 and + related standards. + + + + + +Ohta & Handa [Page 5] + +RFC 1554 Multilingual Extension of ISO-2022-JP December 1993 + + +Security Considerations + + Security issues are not discussed in this memo. + +Authors' Addresses + + Masataka Ohta + Tokyo Institute of Technology + 2-12-1, O-okayama, Meguro-ku, + Tokyo 152, JAPAN + + Phone: +81-3-5499-7084 + Fax: +81-3-3729-1940 + EMail: mohta@cc.titech.ac.jp + + + Ken'ichi Handa + Electrotechnical Laboratory + Umezono 1-1-4, Tsukuba, + Ibaraki 305, JAPAN + + Phone: +81-298-58-5916 + Fax: +81-298-58-5918 + EMail: handa@etl.go.jp + + + + + + + + + + + + + + + + + + + + + + + + + + + +Ohta & Handa [Page 6] +
\ No newline at end of file |