diff --git a/doc/rfc/rfc1922.txt b/doc/rfc/rfc1922.txt
new file mode 100644
index 0000000..981b91a
--- /dev/null
+++ b/doc/rfc/rfc1922.txt
@@ -0,0 +1,1515 @@
+
+
+
+
+
+
+Network Working Group HF. Zhu
+Request for Comments: 1922 Tsinghua U
+Category: Informational DY. Hu
+ Tsinghua U
+ ZG. Wang
+ CITS
+ TC. Kao
+ III
+ WCH. Chang
+ III
+ M. Crispin
+ U Washington
+ March 1996
+
+
+ Chinese Character Encoding for Internet Messages
+
+Status of this Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard. Distribution of this memo is
+ unlimited.
+
+Abstract
+
+ This memo describes methods of transporting Chinese characters in
+ Internet services which transport text, such as electronic mail
+ [RFC-822], network news [RFC-1036], telnet [RFC-854] and the World
+ Wide Web [RFC-1866].
+
+Introduction
+
+ As the Internet reaches more and more Chinese-speaking people around
+ the world, the need to send documents containing Chinese characters
+ over the Internet has increased. The methods described in this
+ document provide means of transporting existing Chinese character
+ sets while leaving room for future extension.
+
+ This document describes two encodings, ISO-2022-CN and
+ ISO-2022-CN-EXT. These are designed with interoperability in mind
+ and are encouraged in this document for current Chinese interchange;
+ they are 7-bit, support both simplified and traditional characters
+ using both GB and CNS/Big5, and do not impose any unusual quoting
+ requirements on ASCII characters.
+
+ As important related issues, this document gives detailed
+ descriptions of the two encodings CN-GB and CN-Big5, and a brief
+ description of ISO/IEC 10646 [ISO-10646]. CN-GB and CN-Big5 are
+
+
+
+Zhu, et al Informational [Page 1]
+
+RFC 1922 Chinese Character Encoding March 1996
+
+
+ currently used as the internal codes for Chinese documents.
+ ISO-10646 is the universal multi-octet character set defined by ISO;
+ we feel that in the future it may become the preferred technology for
+ Chinese documents and electronic mail when it is widely available.
+
+Specification
+
+1. 7-bit Chinese encodings: ISO-2022-CN and ISO-2022-CN-EXT
+
+1.1. Description
+
+ ISO-2022-CN is based on ISO 2022 [ISO-2022], similar to earlier work
+ on ISO-2022-JP [RFC-1468] and ISO-2022-KR [RFC-1557] for the Japanese
+ and Korean languages respectively. It is 7-bit, and supports both
+ simplified Chinese characters using GB 2312-80 [GB-2312] and
+ traditional Chinese characters using the first two planes of CNS
+ 11643 [CNS-11643], as well as ASCII [ASCII] characters.
+
+ ISO-2022-CN-EXT is a superset of ISO-2022-CN that additionally
+ supports other GB character sets and planes of CNS 11643.
+
+ Since ISO-2022-CN and ISO-2022-CN-EXT are 7-bit encodings, they do
+ not require the 8-bit SMTP extensions. ISO-2022-CN supports all the
+ Chinese characters that appear in Big5 [BIG5].
+
+1.2. ISO-2022-CN
+
+ The starting code of ISO-2022-CN is ASCII. ASCII and Chinese
+ characters are distinguished by designations (ESC sequences) and
+ shift functions.
+
+ Designations define the Chinese character sets used in the text.
+ There are three kinds of designations: SOdesignation, SS2designation
+ and SS3designation.
+
+ The SOdesignation is in the form ESC $ ) <F>, where <F> is the "final
+ character" assigned to the character set by ISO (refer to the ISO
+ registry [ISOREG] for more details). The SS2designation is in the
+ form ESC $ * <F>, and the SS3designation is in the form ESC $ + <F>.
+ A designation overrides any previous designation for subsequent bytes
+ in the text.
+
+ There are four kinds of shifts: SI, SO, SS2 and SS3. Shift functions
+ specify how to interpret the subsequent bytes.
+
+ The shift SI (one byte with hexadecimal value 0F) declares that
+ subsequent bytes are interpreted in ASCII.
+
+ The shift SO (one byte with hexadecimal value 0E) declares that
+ subsequent bytes are interpreted in the character set defined by
+ SOdesignation.
+
+ The shift SS2 (two bytes with hexadecimal values 1B 4E) declares that
+ the subsequent TWO bytes are interpreted in the character set defined
+ by SS2designation, after which the previous interpretation (from SI
+ or SO) is restored.
+
+ The shift SS3 (two bytes with hexadecimal values 1B 4F) declares that
+ the subsequent TWO bytes are interpreted in the character set defined
+ by SS3designation, after which the previous interpretation (from SI
+ or SO) is restored.
+
+ The escape sequences, shift functions and character sets used in an
+ ISO-2022-CN text are as follows:
+
+ Character sets Shift in with
+ --------------------------------------------------------------------
+ ASCII SI
+ GB 2312, CNS 11643-plane-1 SO
+ CNS 11643-plane-2 SS2
+
+ ESC $ ) A Indicates the bytes following SO are Chinese
+ characters as defined in GB 2312-80, until
+ another SOdesignation appears
+
+ ESC $ ) G Indicates the bytes following SO are as defined
+ in CNS 11643-plane-1, until another
+ SOdesignation appears
+
+ ESC $ * H Indicates the two bytes immediately following
+ SS2 form a Chinese character as defined in CNS
+ 11643-plane-2, until another SS2designation
+ appears
+
+ If there are any GB or CNS characters on a line, a designation for
+ the corresponding character set must appear on that line, so that
+ each line carries its own character set information and the text is
+ displayed correctly when scrolled back in a window. Also, there must
+ be a shift to ASCII (SI) before the end of the line (i.e., before the
+ CRLF). In other words, each line starts in ASCII and ends in ASCII.
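The per-line rules above can be sketched as a small encoder. This is a minimal illustration under stated assumptions, not a reference implementation. The caller is assumed to have already split the line into segments. The labels "ASCII", "GB2312", "CNS1" and "CNS2" and the function `encode_line` are hypothetical names. Designations are tracked per line, so every line re-announces the character sets it uses and ends with SI before the CRLF.

```python
ESC, SO, SI = b"\x1b", b"\x0e", b"\x0f"
SS2 = ESC + b"N"  # single shift two: hex 1B 4E

# Hypothetical labels for the three ISO-2022-CN sets -> (designation, shift).
SETS = {
    "GB2312": (ESC + b"$)A", "SO"),  # SOdesignation for GB 2312-80
    "CNS1": (ESC + b"$)G", "SO"),    # SOdesignation for CNS 11643 plane 1
    "CNS2": (ESC + b"$*H", "SS2"),   # SS2designation for CNS 11643 plane 2
}


def encode_line(segments):
    """Encode one line, given as (label, bytes) segments, as ISO-2022-CN.

    Chinese bytes are the 7-bit (0x21-0x7E) two-byte codes. Nothing is
    treated as designated at the start of the line, and the line is
    returned to ASCII with SI before the CRLF.
    """
    out = bytearray()
    so_current = ss2_current = None
    shifted_out = False
    for label, data in segments:
        if label == "ASCII":
            if shifted_out:
                out += SI
                shifted_out = False
            out += data
            continue
        designation, shift = SETS[label]
        if len(data) % 2:
            raise ValueError("Chinese text must be whole two-byte characters")
        if shift == "SO":
            if so_current != designation:
                out += designation  # designate only when the set changes
                so_current = designation
            if not shifted_out:
                out += SO
                shifted_out = True
            out += data
        else:
            if ss2_current != designation:
                out += designation
                ss2_current = designation
            for i in range(0, len(data), 2):
                out += SS2 + data[i:i + 2]  # SS2 covers one character only
    if shifted_out:
        out += SI
    return bytes(out) + b"\r\n"
```

Applied to the segments of the example that follows (GB 2312-80 bytes 3d 3b 3b 3b, then CNS 11643 plane-1 bytes 47 28 5f 50), this sketch reproduces that example's byte sequence, followed by CRLF.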
+
+ Example: the hex sequence
+
+ 1b 24 29 41 0e 3d 3b 3b 3b 1b 24 29 47 47 28 5f 50 0f
+
+ represents the Chinese word for "Interchange" (jiao huan) twice;
+ the first time in simplified form using GB-2312 (the 3d 3b 3b 3b
+ sequence above), and the second time in traditional form using
+ CNS-11643 (the 47 28 5f 50 sequence above). The sequence 1b 24 29
+ 41 is the SOdesignation for GB-2312, the 0e is SO to switch to
+ Chinese from ASCII, the 1b 24 29 47 is the SOdesignation for
+ CNS-11643 plane 1, and finally the 0f is the SI to return to ASCII
+ at the end of the line.
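Decoding reverses the walk through the example above. The sketch below is illustrative only. `decode_line` and the (charset, bytes) pairs it returns are hypothetical. It handles only the ISO-2022-CN character sets. Mapping the two-byte codes to glyphs or to another codeset is out of scope. The sketch enforces two of the rules stated earlier: SO and SS2 require a prior designation on the same line, and the line must end in ASCII.

```python
SO, SI = 0x0E, 0x0F

# Final characters allowed by ISO-2022-CN (section 1.2); labels are illustrative.
SO_FINALS = {ord("A"): "GB 2312-80", ord("G"): "CNS 11643-plane-1"}
SS2_FINALS = {ord("H"): "CNS 11643-plane-2"}


def decode_line(line):
    """Split one ISO-2022-CN line (CRLF removed) into (charset, bytes) runs.

    An unknown final character raises KeyError.
    """
    so_set = ss2_set = None  # each line must carry its own designations
    shifted_out = False
    runs = []

    def emit(charset, data):
        if runs and runs[-1][0] == charset:
            runs[-1] = (charset, runs[-1][1] + data)  # merge adjacent runs
        else:
            runs.append((charset, data))

    i = 0
    while i < len(line):
        head = line[i:i + 3]
        if head == b"\x1b$)":  # SOdesignation: ESC $ ) <F>
            so_set = SO_FINALS[line[i + 3]]
            i += 4
        elif head == b"\x1b$*":  # SS2designation: ESC $ * <F>
            ss2_set = SS2_FINALS[line[i + 3]]
            i += 4
        elif line[i:i + 2] == b"\x1bN":  # SS2: applies to the next two bytes only
            if ss2_set is None:
                raise ValueError("SS2 before any SS2designation")
            emit(ss2_set, line[i + 2:i + 4])
            i += 4
        elif line[i] == SO:
            if so_set is None:
                raise ValueError("SO before any SOdesignation")
            shifted_out = True
            i += 1
        elif line[i] == SI:
            shifted_out = False
            i += 1
        elif shifted_out:
            emit(so_set, line[i:i + 2])  # one two-byte Chinese character
            i += 2
        else:
            emit("ASCII", line[i:i + 1])
            i += 1
    if shifted_out:
        raise ValueError("line does not end in ASCII (missing SI)")
    return runs
```

For the example above, this returns one GB 2312-80 run (3d 3b 3b 3b) followed by one CNS 11643-plane-1 run (47 28 5f 50).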
+
+ The name given to this character encoding is "ISO-2022-CN". This name
+ is intended to be used as the "charset" parameter in MIME [MIME-1,
+ MIME-2] messages.
+
+ Content-Type: text/plain; charset=iso-2022-cn
+
+ The ISO-2022-CN encoding is already in 7-bit form, so it is not
+ necessary to use a Content-Transfer-Encoding header.
+
+ Other restrictions are given in the "Formal Syntax of ISO-2022-CN"
+ (Section 7.1 of this document).
+
+1.3. ISO-2022-CN-EXT
+
+ ISO-2022-CN-EXT supports all characters in existing GB, Big5 and CNS
+ 11643 character sets.
+
+ The escape sequences, shift functions and character sets used in an
+ ISO-2022-CN-EXT text are as follows:
+
+ Character sets Shift in with
+ --------------------------------------------------------------------
+ ASCII SI
+ GB 2312, GB 12345, CNS 11643-plane-1, ISO-IR-165 SO
+ GB 7589, GB 13131, CNS 11643-plane-2 SS2
+ GB 7590, GB 13132 or other new GBs, CNS 11643-plane-3 or SS3
+ higher planes of CNS 11643
+
+ Note: Currently, there are some GB sets that have not been
+ registered in ISO. Here <X7589>, <X7590>, <X12345>, <X13131> and
+ <X13132> represent the final character that will be assigned by
+ ISO for those sets. These GB sets shall only be used once these
+ final characters are assigned.
+
+ ESC $ ) A Indicates the bytes following SO are Chinese
+ characters as defined in GB 2312-80, until
+ another SOdesignation appears
+
+ ESC $ * <X7589> Indicates the two bytes immediately following
+ SS2 form a Chinese character as defined in GB
+ 7589-87 [GB-7589], until another SS2designation
+ appears
+
+ ESC $ + <X7590> Indicates the two bytes immediately following
+ SS3 form a Chinese character as defined in GB
+ 7590-87 [GB-7590], until another SS3designation
+ appears
+
+ ESC $ ) <X12345> Indicates the bytes following SO are as defined
+ in GB 12345-90 [GB-12345], until another
+ SOdesignation appears
+
+ ESC $ * <X13131> Indicates the two bytes immediately following
+ SS2 form a Chinese character as defined in GB
+ 13131-91 [GB-13131], until another
+ SS2designation appears
+
+ ESC $ + <X13132> Indicates the two bytes immediately following
+ SS3 form a Chinese character as defined in GB
+ 13132-91 [GB-13132], until another
+ SS3designation appears
+
+ ESC $ ) E Indicates the bytes following SO are as defined
+ in ISO-IR-165 (for details, see section 2.1),
+ until another SOdesignation appears
+
+ ESC $ ) G Indicates the bytes following SO are as defined
+ in CNS 11643-plane-1, until another
+ SOdesignation appears
+
+ ESC $ * H Indicates the two bytes immediately following
+ SS2 form a Chinese character as defined in CNS
+ 11643-plane-2, until another SS2designation
+ appears
+
+ ESC $ + I Indicates the two bytes immediately following
+ SS3 form a Chinese character as defined in CNS
+ 11643-plane-3, until another SS3designation
+ appears
+
+ ESC $ + J Indicates the two bytes immediately following
+ SS3 form a Chinese character as defined in CNS
+ 11643-plane-4, until another SS3designation
+ appears
+
+ ESC $ + K Indicates the two bytes immediately following
+ SS3 form a Chinese character as defined in CNS
+ 11643-plane-5, until another SS3designation
+ appears
+
+ ESC $ + L Indicates the two bytes immediately following
+ SS3 form a Chinese character as defined in CNS
+ 11643-plane-6, until another SS3designation
+ appears
+
+ ESC $ + M Indicates the two bytes immediately following
+ SS3 form a Chinese character as defined in CNS
+ 11643-plane-7, until another SS3designation
+ appears
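Each designation pairs an intermediate character with a final character: ")" for SO, "*" for SS2 and "+" for SS3. A receiver can therefore recognize the ISO-2022-CN-EXT designations with a table lookup. The minimal sketch below covers only designations whose final characters are already registered. The <X7589>-style finals are still unassigned, so they are left out. `parse_designation` is a hypothetical helper.

```python
# Intermediate byte of ESC $ <I> <F> -> shift function that invokes the set.
INTERMEDIATE_TO_SHIFT = {ord(")"): "SO", ord("*"): "SS2", ord("+"): "SS3"}

# (shift, final character) combinations listed for ISO-2022-CN-EXT,
# registered finals only.
EXT_DESIGNATIONS = {
    ("SO", "A"): "GB 2312-80",
    ("SO", "G"): "CNS 11643-plane-1",
    ("SO", "E"): "ISO-IR-165",
    ("SS2", "H"): "CNS 11643-plane-2",
    ("SS3", "I"): "CNS 11643-plane-3",
    ("SS3", "J"): "CNS 11643-plane-4",
    ("SS3", "K"): "CNS 11643-plane-5",
    ("SS3", "L"): "CNS 11643-plane-6",
    ("SS3", "M"): "CNS 11643-plane-7",
}


def parse_designation(seq):
    """Map a 4-byte designation (ESC $ <I> <F>) to (shift, charset name)."""
    if len(seq) != 4 or seq[:2] != b"\x1b$":
        raise ValueError("not a two-byte-set designation: %r" % seq)
    shift = INTERMEDIATE_TO_SHIFT.get(seq[2])
    charset = EXT_DESIGNATIONS.get((shift, chr(seq[3])))
    if charset is None:
        raise ValueError("unknown or unregistered designation: %r" % seq)
    return shift, charset
```

Keying the table on the (shift, final) pair rejects combinations this document does not list. For example, ESC $ * A would place GB 2312-80 under SS2, which is not permitted.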
+
+ As in ISO-2022-CN, each line starts in ASCII, and ends in ASCII, and
+ has its own designation information before any Chinese characters
+ appear.
+
+ The name given to this character encoding is "ISO-2022-CN-EXT". This
+ name is intended to be used as the "charset" parameter in MIME
+ messages.
+
+ Content-Type: text/plain; charset=ISO-2022-CN-EXT
+
+ The ISO-2022-CN-EXT encoding is also in 7-bit form, so it is not
+ necessary to use a Content-Transfer-Encoding header.
+
+ Other restrictions are given in the "Formal Syntax of
+ ISO-2022-CN-EXT" (Section 7.2 of this document).
+
+1.4. How to Support Big5 or other internal codesets with ISO-2022-CN
+ and ISO-2022-CN-EXT
+
+ There are many different Chinese internal coding systems [CJKINF],
+ such as EUC GB, Big5, CCCII (an encoding for library systems mainly
+ used in Taiwan) and GBK (the new standard specification for Chinese
+ internal code, which is also the code page of Microsoft simplified
+ Chinese Windows 95). ISO-2022-CN and ISO-2022-CN-EXT are 7-bit and
+ lose no information when text is exchanged between different
+ codesets, so they facilitate interchange between the various Chinese
+ coding systems on the Internet.
+
+ For instance, ISO-2022-CN and ISO-2022-CN-EXT can be used to support
+ the popular Big5 codeset, because the first two planes of CNS-11643
+ contain the same Chinese characters as Big5's "common part", except
+ for two duplicate characters. By the "common part" we mean the part
+ that is not specific to any Big5 vendor: 5401 more frequently used
+ characters in Big5 range 0xA440-0xC67E, 7652 less frequently used
+ characters in Big5 range 0xC940-0xF9D5, and 441 other symbols in Big5
+ range 0xA140-0xA3E0, as defined in the Institute for Information
+ Industry's (III) technical report C-26 (see also [BIG5]). The
+ appendix of this document presents a conversion table for converting
+ Big5 into CNS-11643, including specific extensions of some popular
+ vendors. For other extensions, vendors and implementors of Big5
+ products are ENCOURAGED to create detailed conversion tables, in
+ order to increase interoperability between different coding systems.
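The three ranges and counts of the Big5 "common part" can be cross-checked with a short script. The script assumes the usual Big5 byte layout, which this document does not spell out: lead bytes 0xA1-0xFE and trail bytes 0x40-0x7E or 0xA1-0xFE, giving 157 trail values per lead byte. Counting only the valid codes in each range reproduces the totals quoted above. `big5_codes` is a hypothetical helper.

```python
def is_big5_trail(byte):
    """Assumed Big5 trail-byte ranges: 0x40-0x7E and 0xA1-0xFE."""
    return 0x40 <= byte <= 0x7E or 0xA1 <= byte <= 0xFE


def big5_codes(first, last):
    """Yield every valid two-byte Big5 code in the inclusive range first..last."""
    for code in range(first, last + 1):
        lead, trail = code >> 8, code & 0xFF
        if 0xA1 <= lead <= 0xFE and is_big5_trail(trail):
            yield code


# Ranges of the Big5 "common part" as given in the text above.
COMMON_PART = {
    "symbols": (0xA140, 0xA3E0),
    "frequent": (0xA440, 0xC67E),
    "less frequent": (0xC940, 0xF9D5),
}

counts = {name: sum(1 for _ in big5_codes(lo, hi))
          for name, (lo, hi) in COMMON_PART.items()}
print(counts)  # {'symbols': 441, 'frequent': 5401, 'less frequent': 7652}
```

Each range ends partway through its last lead byte, so a plain numeric subtraction of the range bounds would overcount. Only the trail-byte filter yields the published figures.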
+
+ Public domain software (binary or C source code) for conversion
+ between Big5 and CNS-11643 is available on many Internet sites. At
+ the time of this writing, the following FTP sites and software are
+ advertised:
+
+ 1) Beijing:
+ ftp://ftp.net.tsinghua.edu.cn/pub/Chinese/convert/big5cns.zip
+ (IP address: 166.111.1.6)
+
+ 2) Xi'an:
+ ftp://ftp.xanet.edu.cn
+ /pub/chinese-soft/unix/convert/BeTTY-1.534.tar.gz
+ (IP address: 202.112.11.131)
+
+ 3) Taiwan:
+ ftp://ftp.seed.net.tw/Pub/Chinese/DOS/code-convert/chcode.zip
+ (IP address: 140.92.1.65)
+
+ 4) US:
+ ftp://ftp.ifcss.org/pub/software/unix/convert/BeTTY-1.534.tar.gz
+ (IP address: 128.123.1.55)
+
+ 5) Japan:
+ ftp://etlport.etl.go.jp/pub/iso-2022-cn/convert/big5cns.zip
+ (IP address: 192.31.197.99)
+
+2. 8-bit Chinese encodings: CN-GB and CN-Big5
+
+ The CN-GB and CN-Big5 MIME charsets are defined below.
+
+ Note: the use of 8-bit character sets requires the use of either
+ an 8-to-7 Content-Transfer-Encoding mechanism such as "BASE64" or
+ "QUOTED-PRINTABLE" if the network is not 8-bit clean, or the 8-bit
+ SMTP extensions [SMTPEXT] with the "8BIT"
+ Content-Transfer-Encoding on 8-bit clean networks. Otherwise, an
+ 8-bit message that passes through a 7-bit mailer is likely to have
+ the 8th bit stripped, resulting in an unreadable message.
+ Although "just send 8-bit data" has been common practice in the
+ past, it is incorrect according to the Internet standards and
+ causes interoperability problems.
+
+2.1. CN-GB
+
+ E-mail using CN-GB characters is sent in this way:
+
+ GB 2312-80 characters are used with ASCII characters, not GB 1988-89
+ [GB-1988].
+
+ GB 2312-80 is itself a 7-bit code, so its bytes would conflict with
+ ASCII. To avoid this, the MSB (bit 8) of each byte of a GB 2312-80
+ character is set to 1, making it an 8-bit character. Any byte with
+ the MSB clear is interpreted as ASCII. The result is the character
+ set known as "GB Internal Code".
+
+ This is also the encoding used by the .gb files on the Internet.
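The MSB rule can be expressed directly. This minimal sketch assumes the input is already isolated GB 2312-80 character bytes, for example the bytes between SO and SI in an ISO-2022-CN line. `gb_to_cn_gb` and `cn_gb_to_gb` are hypothetical names. Under this rule, the GB bytes 3d 3b 3b 3b from the ISO-2022-CN example in section 1.2 become bd bb bb bb in CN-GB.

```python
def gb_to_cn_gb(gb7):
    """Set the MSB of each 7-bit GB 2312-80 byte to get CN-GB bytes."""
    if any(not 0x21 <= b <= 0x7E for b in gb7):
        raise ValueError("7-bit GB 2312-80 bytes must be in 0x21-0x7E")
    return bytes(b | 0x80 for b in gb7)


def cn_gb_to_gb(data):
    """Inverse: clear the MSB of each byte of CN-GB Chinese text."""
    if any(not 0xA1 <= b <= 0xFE for b in data):
        raise ValueError("CN-GB Chinese bytes must be in 0xA1-0xFE")
    return bytes(b & 0x7F for b in data)


print(gb_to_cn_gb(bytes.fromhex("3d3b3b3b")).hex())  # bdbbbbbb
```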
+
+ To use this character scheme with MIME, CN-GB is used as the value
+ for the charset parameter:
+
+ Content-Type: text/plain; charset=cn-gb; charset-edition=1980
+
+ Note: The "charset-edition" is a new MIME parameter described in
+ section 4.1 of the "Specification" part of this document.
+
+ GB 12345-90 is the traditional form of GB 2312; the charset name
+ given to this set is CN-GB-12345, with a charset-edition of 1990.
+
+ There are also character sets that can only be used with other GB
+ sets. For example, GB 8565-88 [GB-8565] is used with GB 2312 and
+ some other characters to form the ISO-IR-165 set (also known as GB
+ 2312 + GB 8565.2). ISO-IR-165 contains all characters from GB
+ 2312-80 as revised by GB 6345.1-86 and GB 8565.2-88. Its MIME
+ charset name is CN-GB-ISOIR165 with the charset-edition of 1992.
+
+ CN-GB-12345 and CN-GB-ISOIR165 support ASCII in a similar manner to
+ CN-GB; the MSB of Chinese characters is set to 1 to distinguish from
+ ASCII.
+
+ Note: There are some supplementary character sets in GB, i.e. GB
+ 7589-87, GB 7590-87, GB 13131-91 and GB 13132-91. Normally, they
+ are not used independently of GB 2312 or GB 12345, so they need
+ not be registered. Characters in these standards could be
+ supported with ISO-2022-CN and ISO-2022-CN-EXT. If, in the future,
+ they need to be used with "charset" names, it is the
+ responsibility of any interested third party (the standardization
+ organization or anybody else) to write the necessary documents and
+ register the charset with the IANA. Such charset names are
+ encouraged to take the form CN-GB-<number>, such as CN-GB-12345,
+ where <number> is the GB standard number. A charset-edition should
+ also be given. All CN-GB-<number> sets should be coded in 8-bit in
+ a similar fashion to CN-GB.
+
+ To ensure interoperability, the CN-GB charset should be used whenever
+ possible instead of a CN-GB-<number> charset.
+
+2.2. CN-Big5
+
+ Big5 is a two-byte character set of traditional Chinese characters,
+ widely used in Taiwan and overseas. E-mail of CN-Big5 is sent in
+ this way:
+
+ Big5 is used with ASCII. The MSB of ASCII characters is always 0.
+ The MSB of the first byte of a Big5 character is always 1; this
+ distinguishes it from an ASCII character. The second byte has 8
+ significant bits. Therefore, CN-Big5 is an 8-bit encoding with a
+ 15-bit codespace.
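Because the second byte of a Big5 character may have its MSB clear, a receiver must consume CN-Big5 text in order, treating any byte with the MSB set as the start of a two-byte character. This sketch works under that rule. `split_cn_big5` is a hypothetical helper that only segments the bytes. It does not validate them against the Big5 tables.

```python
def split_cn_big5(data):
    """Split CN-Big5 bytes into 1-byte ASCII and 2-byte Big5 characters.

    A byte with MSB 0 is ASCII; a byte with MSB 1 starts a two-byte
    character whose second byte may have its MSB set or clear.
    """
    chars = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            chars.append(data[i:i + 1])  # ASCII
            i += 1
        else:
            if i + 1 >= len(data):
                raise ValueError("truncated Big5 character at offset %d" % i)
            chars.append(data[i:i + 2])  # lead byte plus any trail byte
            i += 2
    return chars
```

A trail byte such as 0x5C is also the ASCII backslash. For this reason, scanning CN-Big5 text backwards, or searching it for ASCII bytes, is unreliable without first segmenting it from the start.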
+
+ To use this character scheme with MIME, CN-Big5 is used as the value
+ for the charset parameter:
+
+ Content-Type: text/plain; charset=cn-big5; charset-edition=1984
+
+ Note: The "charset-edition" is a new MIME parameter described in
+ section 4.1 of the "Specification" part of this document.
+
+3. Universal Multilingual Character Set: ISO/IEC-10646/Unicode
+
+ ISO/IEC 10646 defines a 32-bit character space with the intent to
+ encode all characters in the world. Currently, only the lowest 16-bit
+ plane of ISO 10646, the Basic Multilingual Plane (BMP), is defined.
+ The BMP is code-by-code identical to Unicode [Unicode 1.1]. It
+ contains a large repertoire of Chinese characters (it currently
+ includes all the characters of GB 2312-80, GB 12345-90, GB 8565-89,
+ CNS 11643's plane 1 and 2, and part of some other standards) and
+ therefore can be used to transport Chinese characters in the Internet
+ community. This document does not give any details on how to do
+ this, as this has been done elsewhere. For details of using Unicode
+ with MIME, refer to RFC 1641 [RFC-1641] and RFC 1642 [RFC-1642]. For
+ the assigned names of the 10646 character sets, refer to STD 2,
+ "Assigned Numbers", currently RFC 1700 [RFC-1700]. For more
+ up-to-date assigned numbers, please check:
+
+ ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
+
+4. Two New MIME parameters
+
+ Here we define two new MIME parameters to be used with "charset"
+ parameters.
+
+4.1. "charset-edition"
+
+ This parameter is used after the MIME "charset" parameter. Its value
+ is a four-digit year (AD) giving the edition of the character set
+ standard named in "charset". Its use is optional. Implementations
+ should ignore this parameter unless the implementation has specific
+ support for that particular character set edition.
+
+ The reason for defining this parameter is that the set of defined
+ characters often differs between editions of a character set
+ standard. Sometimes the difference cannot be ignored without causing
+ problems for implementations that process the text. In the current
+ MIME syntax there are only two ways to indicate this difference. One
+ way is to indicate the edition in the charset name, such as
+ CN-GB-1988-80 (the 1980 edition of GB 1988). The other way is to
+ define a new optional parameter such as "charset-edition". The latter
+ way is better because receiving applications that can only process
+ an older edition can still recognize the character set and offer to
+ display the text in the older edition. This display may have a few
+ mistakes, but it is better than refusing to display any text at all
+ or defaulting to an inappropriate character set such as US-ASCII or
+ ISO-8859-1.
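A receiver can honor the "ignore unless supported" rule by reading the parameter and discarding any edition it has no special handling for. The sketch below uses Python's standard `email.message.Message` class to parse the Content-Type header. `charset_and_edition` and the `supported` table are hypothetical. The four-digit check follows the 4DIGIT rule in the formal syntax of section 4.3.

```python
import re
from email.message import Message

FOUR_DIGITS = re.compile(r"[0-9]{4}")  # charset-edition := 4DIGIT


def charset_and_edition(content_type, supported):
    """Return (charset, edition or None) for a Content-Type header value.

    `supported` maps a lowercase charset name to the set of editions this
    implementation handles specially; any other edition is ignored.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased, or None if absent
    edition = msg.get_param("charset-edition")  # None if absent
    if not isinstance(edition, str) or not FOUR_DIGITS.fullmatch(edition):
        return charset, None  # missing or malformed: treat as absent
    if edition not in supported.get(charset, set()):
        return charset, None  # no specific support: ignore the parameter
    return charset, edition
```

Falling back to `None` rather than rejecting the message keeps the charset usable. This matches the rationale above that an older-edition reader should still display the text.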
+
+4.2. "charset-extension"
+
+ This parameter is also used after the MIME "charset" parameter. It
+ is case-insensitive and optional, and any value of this parameter
+ should be registered with the IANA. Unregistered values should start
+ with "x-", as with any MIME extension-token. Implementations should
+ ignore this parameter unless the implementation has specific support for
+ that particular character set extension.
+
+ A character set extension assigns displayable glyphs to code points
+ that are unassigned in the base character set; vendor-specific
+ extensions of standard character sets are an example. This parameter
+ provides the option of using these extensions. Although character
+ set extensions may cause interoperability problems, we recognize
+ that such extensions exist.
+
+ For example:
+ Content-Type: text/plain; charset=CN-Big5; charset-edition=1984;
+ charset-extension=ETen-2.00.03-DOS
+
+ This may indicate the ETen company's extension of Big5, ETen 2.00.03
+ for DOS, assuming that "ETen-2.00.03-DOS" is registered with the IANA.
+
+4.3. Formal Syntax:
+
+ The following changes and additions are made to the MIME syntax:
+
+ charset-edition := "charset-edition" "=" 4DIGIT
+ ; year of edition in four digits
+
+ charset-extension := "charset-extension" "=" extension-token
+
+5. Background Information
+
+5.1. Writing systems and their encodings in Chinese-speaking nations and
+ regions
+
+ The mainland provinces of China use simplified Chinese characters in
+ daily life. GB is the standard electronic character set, and it is
+ the main means of communication among people around the world who
+ use simplified Chinese characters.
+
+ Taiwan uses traditional Chinese characters in daily life. CNS-11643
+ is the formal character set for information interchange in Taiwan;
+ however, Big5, a widely-used character set of traditional Chinese
+ characters, is the de-facto internal code standard in Taiwan.
+
+ Hong Kong uses traditional Chinese characters in daily life, but uses
+ both GB and Big5 in electronic form, because Hong Kong people often
+ communicate with people in all of China's provinces.
+
+ Singapore seldom uses Chinese characters, and uses the simplified
+ form when Chinese characters are used. In electronic form, Unicode
+ is more popular, although GB is also used.
+
+5.2. Miscellaneous information about Chinese character sets
+
+ The GB 1988-89 character set is identical to ISO 646 [ISO-646] except
+ for the currency symbol and the tilde, which are replaced by the Yuan
+ sign and the overline respectively. This set is GB's variant of ISO
+ 646. This character set and CNS 5205 [CNS-5205] are not encouraged
+ for use on the Internet, since ASCII combined with GB 2312 or with
+ CNS 11643 planes 1 and 2 contains all of their characters.
+
+ The GB 2312-80 character set consists of simplified Chinese
+ characters, digits, and the Latin, Greek and Russian alphabets, and
+ some other symbols; in all, 7445 characters. Each character is
+ represented with two bytes.
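Both bytes of a GB 2312-80 character come from the same 94-value range (the c_char rule in section 7). Each character is therefore commonly identified by a row and cell number from 1 to 94. The conversion below assumes that conventional row/cell (qu wei) numbering, which this document does not itself define. The function names are hypothetical.

```python
def row_cell_to_bytes(row, cell, eight_bit=False):
    """Convert a GB 2312-80 row/cell pair (each 1-94) to its two-byte code.

    7-bit form (as inside ISO-2022-CN): byte = number + 0x20 (0x21-0x7E).
    8-bit CN-GB form also sets the MSB: byte = number + 0xA0 (0xA1-0xFE).
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    offset = 0xA0 if eight_bit else 0x20
    return bytes([row + offset, cell + offset])


def bytes_to_row_cell(code):
    """Inverse of row_cell_to_bytes; accepts either the 7-bit or 8-bit form."""
    hi, lo = code[0] & 0x7F, code[1] & 0x7F
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError("not a GB 2312-80 code")
    return hi - 0x20, lo - 0x20
```

For instance, the first character of the section 1.2 example, bytes 3d 3b, is row 29, cell 27, and in CN-GB it is bd bb.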
+
+ GB 13000-95 [GB-13000] is GB's variant of ISO 10646. However, for
+ interoperability in the Internet, assigned names for ISO 10646 are
+ encouraged instead.
+
+ Currently both sides of the Taiwan Straits are cooperating closely in
+ promoting the use of ISO 10646's BMP and in continuing its
+ development together with other organizations under ISO.
+
+5.3. Miscellaneous implementation information
+
+ For maximum interoperability, implementations SHOULD at least support
+ sending and receiving ISO-2022-CN. Supporting all registered
+ character sets in ISO-2022-CN-EXT is greatly encouraged.
+
+ To meet current usage, support of CN-GB (the status quo for
+ simplified Chinese e-mail) or CN-Big5 (the status quo for
+ traditional Chinese e-mail) may be necessary. However, sending
+ documents directly in these internal codes is not reliable, so
+ sending ISO-2022-CN messages is always encouraged whenever possible.
+
+ To the maximum extent possible, implementations should be capable of
+ receiving messages in any of the encodings described in this
+ document, even if they only transmit messages in one form.
+
+ Preferably, the implementation should display the characters with
+ glyphs appropriate to the typographic tradition implied by the
+ encoding of the received text. Implementations may also translate
+ these encodings into the encoding that their platform supports.
+
+ The human user (not implementor) should try to keep lines within 80
+ display columns, or, preferably, within 75 (or so) columns, to allow
+ insertion of ">" at the beginning of each line in excerpts. Each
+ Chinese character takes up two columns, and the shift sequences do
+ not take up any columns. The implementor is reminded that Chinese
+ characters take up two bytes and should not be split in the middle to
+ break lines for displaying, etc.
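The column rule above can be applied mechanically when wrapping or quoting text. This sketch assumes a single ISO-2022-CN or ISO-2022-CN-EXT line without its CRLF. `display_columns` is a hypothetical helper. It counts ASCII bytes as one column and each Chinese character as two, and gives designations and shift functions zero width.

```python
def display_columns(line):
    """Count display columns of one ISO-2022-CN(-EXT) line, CRLF excluded."""
    columns = 0
    shifted_out = False
    i = 0
    while i < len(line):
        if line[i:i + 2] == b"\x1b$":  # designation ESC $ <I> <F>: no width
            i += 4
        elif line[i:i + 2] in (b"\x1bN", b"\x1bO"):  # SS2/SS3 + one character
            columns += 2
            i += 4
        elif line[i] == 0x0E:  # SO: no width
            shifted_out = True
            i += 1
        elif line[i] == 0x0F:  # SI: no width
            shifted_out = False
            i += 1
        elif shifted_out:  # two-byte Chinese character
            columns += 2
            i += 2
        else:  # ASCII
            columns += 1
            i += 1
    return columns
```

For the section 1.2 example line, which contains four Chinese characters, this returns 8.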
+
+ Freely available fonts of Chinese characters:
+
+ Beijing:
+ ftp://ftp.net.tsinghua.edu.cn/pub/Chinese/fonts/
+
+ Xi'an:
+ ftp://ftp.xanet.edu.cn/pub/chinese-soft/fonts/
+
+ Taiwan:
+ ftp://ftp.edu.tw/Chinese/ifcss/software/fonts/
+ ftp://ftp.ntu.edu.tw/Chinese/ifcss/software/fonts/
+
+ Hong Kong:
+ ftp://ftp.cuhk.hk/pub/chinese/ifcss/software/fonts/
+
+ Singapore:
+ ftp://ftp.technet.sg:/pub/chinese/fonts/
+
+ US:
+ ftp://ftp.ifcss.org/pub/software/fonts/
+ http://ccic.ifcss.org/www/pub/software/fonts/
+
+6. X.400 Considerations
+
+ X.400 can carry different character sets in a message by using the
+ "GeneralText" body part defined by ISO/IEC 10021-7 [ISO-10021].
+
+ The X.400 ASN.1 definition of the GeneralText body part is:
+
+ general-text-body-part EXTENDED-BODY-PART-TYPE
+ PARAMETERS GeneralTextParameters IDENTIFIED BY id-ep-general-text
+ DATA GeneralTextData
+ ::= id-et-general-text
+
+ GeneralTextParameters ::= SET OF CharacterSetRegistration
+
+ CharacterSetRegistration ::= INTEGER (1..32767)
+
+ GeneralTextData ::= GeneralString
+
+ Therefore, to use ISO-2022-CN, set the "CharacterSetRegistration"
+ part as { 6 58 171 172 }, and add an ESC sequence of ESC ( B (three
+ bytes, hexadecimal values: 1B 28 42) before the beginning of each
+ line of ISO-2022-CN text.
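The X.400 steps above amount to fixing the registration set and prefixing every line. This minimal sketch assumes CRLF-terminated ISO-2022-CN text. `to_general_text` is a hypothetical helper. The ASN.1 encoding of the GeneralText body part itself is out of scope.

```python
# CharacterSetRegistration values for ISO-2022-CN, as given above.
ISO_2022_CN_REGISTRATIONS = (6, 58, 171, 172)
ESC_PAREN_B = b"\x1b(B"  # ESC ( B, hexadecimal 1B 28 42


def to_general_text(text):
    """Return (registrations, data) with ESC ( B before every line."""
    if text and not text.endswith(b"\r\n"):
        raise ValueError("ISO-2022-CN text must end with CRLF")
    lines = text.split(b"\r\n")[:-1]  # drop the empty piece after the last CRLF
    data = b"".join(ESC_PAREN_B + line + b"\r\n" for line in lines)
    return ISO_2022_CN_REGISTRATIONS, data
```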
+
+ Similarly, to use ISO-2022-CN-EXT, set the registered numbers of all
+ character sets in the "CharacterSetRegistration" part and add ESC ( B
+ at the beginning of each line. For the registered numbers, please
+ refer to the ISO registry. In addition to the character sets
+ supported by ISO-2022-CN, the currently registered numbers are:
+
+ ISO IR 165 (GB 2312+GB 8565.2): 165
+ CNS 11643-plane 3: 183
+ CNS 11643-plane 4: 184
+ CNS 11643-plane 5: 185
+ CNS 11643-plane 6: 186
+ CNS 11643-plane 7: 187
+
+ 176 is the registered number for the BASESET of ISO/IEC 10646-1:1993
+ UCS-2 with implementation level 3; the escape sequence ESC % / E
+ (four bytes, hexadecimal values 1B 25 2F 45) indicates the start of
+ this codeset.
+
+ For CN-GB and CN-Big5 character sets, there are no formal methods
+ that could be used in X.400 yet.
+
+ For detail about X.400 use of character sets, please refer to RFC
+ 1502 [RFC-1502].
+
+7. Formal Syntax of ISO-2022-CN and ISO-2022-CN-EXT
+
+ The notational conventions used here are identical to those used in
+ RFC 822.
+
+7.1. Formal Syntax of ISO-2022-CN
+
+ body ::= * ( ascii_line / c_line )
+
+ ascii_line ::= *char CRLF
+
+ c_line ::= *char 1*(1*designation 1*(*char 1*c_text *char)) CRLF
+
+ designation ::= SOdesignation / SS2designation
+
+ SOdesignation ::= ESC "$" ")" finalchar_for_SO
+
+ SS2designation ::= ESC "$" "*" finalchar_for_SS2
+
+ finalchar_for_SO ::= "A" / "G"
+
+ finalchar_for_SS2 ::= "H"
+
+ c_text ::= 1* ( SO-SI-segment / SS2segment )
+
+ SO-SI-segment ::= SO 1*c_char *designation *c_segment SI
+
+ c_segment ::= 1* ( c_char / SS2segment )
+
+ SS2segment ::= SS2 c_char
+
+ c_char ::= one_of_94 one_of_94
+
+ ; ( Octal, Decimal.)
+
+ ESC ::= <ISO-646 ESC, escape> ; ( 33, 27.)
+
+ SI ::= <ASCII SI, shift in> ; ( 17, 15.)
+
+ SO ::= <ASCII SO, shift out> ; ( 16, 14.)
+
+ SS2 ::= <ISO 2022 Single_shift two> ; ( 33 116, 27 78.)
+
+ one_of_94 ::= <any char in 94_char set> ; ( 41-176, 33-126. )
+
+ char ::= <any char in 96_char_set> ; ( 40-177, 32-127. )
+
+7.2. Formal Syntax of ISO-2022-CN-EXT
+
+ body ::= * ( ascii_line / c_line )
+
+ ascii_line ::= *char CRLF
+
+ c_line ::= *char 1*(1*designation 1*(*char 1*c_text *char)) CRLF
+
+ designation ::= SOdesignation / SS2designation / SS3designation
+
+ SOdesignation ::= ESC "$" ")" finalchar_for_SO
+
+ SS2designation ::= ESC "$" "*" finalchar_for_SS2
+
+ SS3designation ::= ESC "$" "+" finalchar_for_SS3
+
+ finalchar_for_SO ::= "A" / <X12345> / "G" / "E"
+
+ finalchar_for_SS2 ::= <X7589> / <X13131> / "H"
+
+ finalchar_for_SS3 ::= <X7590> / <X13132> / "I" / "J" / "K" / "L"
+ / "M"
+
+ c_text ::= 1* ( SO-SI-segment / SS2segment / SS3segment )
+
+ SO-SI-segment ::= SO 1*c_char *designation *c_segment SI
+
+ c_segment ::= 1* ( c_char / SS2segment / SS3segment )
+
+ SS2segment ::= SS2 c_char
+
+ SS3segment ::= SS3 c_char
+
+ c_char ::= one_of_94 one_of_94
+
+ ; ( Octal, Decimal.)
+
+ ESC ::= <ISO-646 ESC, escape> ; ( 33, 27.)
+
+ SI ::= <ASCII SI, shift in> ; ( 17, 15.)
+
+ SO ::= <ASCII SO, shift out> ; ( 16, 14.)
+
+ SS2 ::= <ISO 2022 Single_shift two> ; ( 33 116, 27 78.)
+
+ SS3 ::= <ISO 2022 Single_shift three>; ( 33 117, 27 79.)
+
+ one_of_94 ::= <any char in 94_char set> ; ( 41-176, 33-126. )
+
+ char ::= <any char in 96_char_set> ; ( 40-177, 32-127. )
+
+
+8. Registration of New "charset"s and New MIME parameter
+
+8.1. This document defines the following MIME "charset" names for
+ Chinese text:
+
+ ISO-2022-CN, ISO-2022-CN-EXT
+ CN-GB, CN-Big5
+ CN-GB-12345
+ CN-GB-ISOIR165
+
+8.2. This document defines two new MIME parameters:
+
+ charset-edition
+ charset-extension
+
+Acknowledgments
+
+ This document is the result of cooperation in APNG-CC, the Chinese
+ Character sub-working group of the I18N/L10N (Internationalization
+ and Localization) working group of APNG (Asia-Pacific Networking
+ Group), coordinator Zhu Haifeng <zhf@net.tsinghua.edu.cn>. The
+ membership of APNG-CC consists of individuals from both sides of the
+ Taiwan Strait, Hong Kong, and from Singapore and other countries. We
+ wish to thank all members of APNG-CC.
+
+ Prof. Yao Shiquan (Deputy chair of CITS--China Information Technology
+ Standardization Technical Committee), Ms. Lin Ning (Secretary-General
+ of CITS), Mr. Guo Chengzhong of the Office of the Joint Conference of
+ China Economic Information, and Prof. Zhao Jingrong, Prof. Wu
+ Jianping, Prof. Li Xing, and Mr. You Yue (Tsinghua University) and
+ other experts from CERNET Expert Committee, Prof. Meng Qingyu (China
+ Computer Software & Technology Services Corporation), Prof. Cao
+ Jinwen and Mr. Yu Jun (IBM Beijing) gave a lot of support and help in
+ many aspects.
+
+ Special thanks go to Prof. Yang Tianxing (Chair of CITS) for the
+ support given to APNG-CC.
+
+ Prof. Ding ZyKaan from Academia Sinica of Taiwan, Mr. C. J. Cherng
+ and Mr. C. K. Fan of III (Institute for Information Industry), Mr.
+ Chang JingShin from Tsinghua University in Hsinchu, Taiwan, Ms. C. C.
+ Hsu from IBM Taiwan, and Ms. Tong-Lee Anita Lin from Microsoft Taiwan
+ gave much support and many contributions to APNG-CC's work. In
+ particular, Ms. C. C. Hsu put much effort into completing the
+ Appendix of this document.
+
+ We also wish to thank the following people, who contributed to this
+ document in many ways.
+
+ Zhang Zhoucai Martin J. Duerst
+ Zhang Ling Kenichi Handa
+ Zhu Bin Lu Chin
+ Sun Yufang Nelson Chin
+ Chen Shuyi Mao Yonggang
+ Masataka Ohta Ken Lunde
+ Lua Kim Teng Victor Cheng
+ Stephen G. Simpson Yuan Jiang
+ Liu Huifang Harald T. Alvestrand
+ Qian Hualin Jiang Lin
+ Lu Ming Emily Hsu
+ Wu Jian Zhu Shuang
+ Zheng Long Zhang Hailin
+ Yonggang Zhang Feng Hui
+ Yao Jian
+
+Security Considerations
+
+ Security issues are not discussed in this memo.
+
+Authors' Addresses
+
+ Zhu Haifeng (HF. Zhu)
+ 216 Central Main Building
+ Tsinghua University
+ Beijing, 100084
+ China
+
+ Tel: +86-10-2561144 ext. 3492
+ Fax: +86-10-2564173
+ EMail: zhf@net.tsinghua.edu.cn, zhf@net.edu.cn
+
+
+ Hu Daoyuan (DY. Hu)
+ Tsinghua Networking Center
+ Tsinghua University
+ Beijing, 100084
+ China
+
+ Tel: +86-10-2594016
+ Fax: +86-10-2564173
+ EMail: hdy@tsinghua.edu.cn
+
+
+ Wang Zhiguan (ZG. Wang)
+ Beijing 1101 MailBox
+ Subcommittee 2 (SC2)
+ China Information Technology Standardization Technical Committee
+ (CITS)
+ Beijing, 100007
+ China
+
+ Tel: +86-10-4012392
+ Fax: +86-10-4010601
+
+
+ Kao Tien-cheu (TC. Kao)
+ I.T. Promotion Division
+ Institute for Information Industry (III)
+ Taipei
+ Taiwan
+
+ Tel: +886-2-5631688
+ Fax: +886-2-563-4209
+ EMail: tckao@iiidns.iii.org.tw
+
+
+ Chang Wen-chung (WCH. Chang)
+ Institute for Information Industry (III)
+ Taipei
+ Taiwan
+
+ Tel: +886-2-7327771
+ Fax: +886-2-7370188
+ EMail: chung@iiidns.iii.org.tw
+
+
+ Mark R. Crispin
+ Networks and Distributed Computing
+ University of Washington
+ 4545 15th Avenue NE
+ Seattle, WA 98105-4527
+ USA
+
+ Tel: +1 (206) 543-5762
+ Fax: +1 (206) 685-4045
+ EMail: MRC@CAC.Washington.EDU
+
+
+Appendix -- Conversion Table for ISO-2022-CN (EXT) and Big5
+
+ This appendix gives a conversion table between the Chinese
+ characters in Big5's common part and ISO-2022-CN/-EXT, including all
+ the vendor-specific characters from ETen, Microsoft and IBM. III
+ provides on-line services with conversion source code and binaries
+ for Big5 (see the ftp site listed in section 1.4); [CJKINF] is also a
+ good reference.
+
+A.1. Big5 (ETen, IBM, and Microsoft version) symbol set correspondence
+ to CNS 11643 Plane 1:
+
+ 0xA140-0xA1F5 <-> 0x2121-0x2256
+ 0xA1F6 <-> 0x2258
+ 0xA1F7 <-> 0x2257
+ 0xA1F8-0xA2AE <-> 0x2259-0x234E
+ 0xA2AF-0xA3BF <-> 0x2421-0x2570
+ 0xA3C0-0xA3E0 <-> 0x4221-0x4241 (ETen and Microsoft
+ defined as reserved area)
+
+A.2. Big5 (ETen, IBM, and Microsoft version) Level 1 correspondence to
+ CNS 11643-1992 Plane 1:
+
+ 0xA440-0xACFD <-> 0x4421-0x5322
+ 0xACFE <-> 0x5753
+ 0xAD40-0xAFCF <-> 0x5323-0x5752
+ 0xAFD0-0xBBC7 <-> 0x5754-0x6B4F
+ 0xBBC8-0xBE51 <-> 0x6B51-0x6F5B
+ 0xBE52 <-> 0x6B50
+ 0xBE53-0xC1AA <-> 0x6F5C-0x7534
+ 0xC1AB-0xC2CA <-> 0x7536-0x7736
+ 0xC2CB <-> 0x7535
+ 0xC2CC-0xC360 <-> 0x7737-0x782C
+ 0xC361-0xC3B8 <-> 0x782E-0x7863
+ 0xC3B9 <-> 0x7865
+ 0xC3BA <-> 0x7864
+ 0xC3BB-0xC455 <-> 0x7866-0x7961
+ 0xC456 <-> 0x782D
+ 0xC457-0xC67E <-> 0x7962-0x7D4B
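+
+ Each row above pairs a run of consecutive Big5 codes with a run of
+ consecutive CNS 11643 codes, so a converter can store the rows
+ instead of every individual code. The Python sketch below is a
+ hypothetical illustration that copies three A.2 rows. It assumes that
+ "consecutive" means consecutive in code order. Big5 trail octets run
+ 0x40-0x7E and then 0xA1-0xFE (157 per lead octet). CNS 11643 octets
+ run 0x21-0x7E (94 per row). Under that reading, 0xA440-0xACFD and
+ 0x4421-0x5322 both cover 1412 codes.
```python
from typing import Optional

# Hypothetical helpers, not defined by this document.
# Big5 trail octets: 0x40-0x7E, then 0xA1-0xFE (157 per lead octet).
# CNS 11643 octets: 0x21-0x7E (94 per row).


def big5_index(code: int) -> int:
    """Position of a Big5 double-byte code in lead/trail order."""
    lead, trail = code >> 8, code & 0xFF
    if 0x40 <= trail <= 0x7E:
        column = trail - 0x40
    elif 0xA1 <= trail <= 0xFE:
        column = 63 + (trail - 0xA1)
    else:
        raise ValueError(f"invalid Big5 trail octet in {code:#06x}")
    return (lead - 0xA1) * 157 + column


def cns_index(code: int) -> int:
    """Position of a CNS 11643 code within its plane."""
    return ((code >> 8) - 0x21) * 94 + ((code & 0xFF) - 0x21)


def cns_code(index: int) -> int:
    """Inverse of cns_index."""
    row, column = divmod(index, 94)
    return ((row + 0x21) << 8) | (column + 0x21)


# (first Big5, last Big5, first CNS): three rows copied from A.2 (Plane 1).
PLANE1_ROWS = [
    (0xA440, 0xACFD, 0x4421),
    (0xACFE, 0xACFE, 0x5753),
    (0xAD40, 0xAFCF, 0x5323),
]


def big5_to_cns_plane1(code: int) -> Optional[int]:
    """Return the Plane 1 code for a Big5 code, or None if no row covers it."""
    position = big5_index(code)
    for first, last, cns_first in PLANE1_ROWS:
        if big5_index(first) <= position <= big5_index(last):
            return cns_code(cns_index(cns_first) + position - big5_index(first))
    return None


print(hex(big5_to_cns_plane1(0xA461)))  # 0x4442
```
+
+ A complete converter would load every row of A.1 through A.7 and
+ would also record the CNS plane for each row.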
+
+A.3. Big5 (ETen, IBM, and Microsoft version) Level 2 correspondence to
+ CNS 11643-1992 Plane 2:
+
+ 0xC940-0xC949 <-> 0x2121-0x212A
+ 0xC94A <-> 0x4442 # duplicate of Level 1's 0xA461
+ 0xC94B-0xC96B <-> 0x212B-0x214B
+ 0xC96C-0xC9BD <-> 0x214D-0x217C
+ 0xC9BE <-> 0x214C
+ 0xC9BF-0xC9EC <-> 0x217D-0x224C
+ 0xC9ED-0xCAF6 <-> 0x224E-0x2438
+ 0xCAF7 <-> 0x224D
+ 0xCAF8-0xD779 <-> 0x2439-0x387D
+ 0xD77A <-> 0x3F6A
+ 0xD77B-0xDBA6 <-> 0x387E-0x3F69
+ 0xDBA7-0xDDFB <-> 0x3F6B-0x4423
+ 0xDDFC <-> 0x4176 # duplicate of 0xDCD1
+ 0xDDFD-0xE8A2 <-> 0x4424-0x554A
+ 0xE8A3-0xE975 <-> 0x554C-0x5721
+ 0xE976-0xEB5A <-> 0x5723-0x5A27
+ 0xEB5B-0xEBF0 <-> 0x5A29-0x5B3E
+ 0xEBF1 <-> 0x554B
+ 0xEBF2-0xECDD <-> 0x5B3F-0x5C69
+ 0xECDE <-> 0x5722
+ 0xECDF-0xEDA9 <-> 0x5C6A-0x5D73
+ 0xEDAA-0xEEEA <-> 0x5D75-0x6038
+ 0xEEEB <-> 0x642F
+ 0xEEEC-0xF055 <-> 0x6039-0x6242
+ 0xF056 <-> 0x5D74
+ 0xF057-0xF0CA <-> 0x6243-0x6336
+ 0xF0CB <-> 0x5A28
+ 0xF0CC-0xF162 <-> 0x6337-0x642E
+ 0xF163-0xF16A <-> 0x6430-0x6437
+ 0xF16B <-> 0x6761
+ 0xF16C-0xF267 <-> 0x6438-0x6572
+ 0xF268 <-> 0x6934
+ 0xF269-0xF2C2 <-> 0x6573-0x664C
+ 0xF2C3-0xF374 <-> 0x664E-0x6760
+ 0xF375-0xF465 <-> 0x6762-0x6933
+ 0xF466-0xF4B4 <-> 0x6935-0x6961
+ 0xF4B5 <-> 0x664D
+ 0xF4B6-0xF4FC <-> 0x6962-0x6A4A
+ 0xF4FD-0xF662 <-> 0x6A4C-0x6C51
+ 0xF663 <-> 0x6A4B
+ 0xF664-0xF976 <-> 0x6C52-0x7165
+ 0xF977-0xF9C3 <-> 0x7167-0x7233
+ 0xF9C4 <-> 0x7166
+ 0xF9C5 <-> 0x7234
+ 0xF9C6 <-> 0x7240
+ 0xF9C7-0xF9D1 <-> 0x7235-0x723F
+ 0xF9D2-0xF9D5 <-> 0x7241-0x7244
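+
+ The comments in A.3 mark two Big5 codes as duplicates: 0xC94A
+ repeats Level 1's 0xA461, and 0xDDFC repeats 0xDCD1. Converting from
+ CNS 11643 to Big5 therefore requires choosing one source code per
+ target. The Python sketch below is a hypothetical illustration. It
+ reads the first comment as referring to Plane 1 code 0x4442 and takes
+ 0xDCD1's Plane 2 code 0x4176 from the 0xDDFC entry. It keeps the lower
+ Big5 code, which in both cases is the code the comments treat as the
+ original.
```python
from collections import defaultdict

# (Big5, CNS plane, CNS code) pairs implied by the A.3 duplicate comments.
PAIRS = [
    (0xA461, 1, 0x4442),
    (0xC94A, 1, 0x4442),  # duplicate of Level 1's 0xA461
    (0xDCD1, 2, 0x4176),
    (0xDDFC, 2, 0x4176),  # duplicate of 0xDCD1
]


def build_reverse(pairs):
    """Map (plane, cns) -> Big5, keeping the lowest Big5 code, and
    collect every target that more than one Big5 code reaches."""
    sources = defaultdict(list)
    for big5, plane, cns in sorted(pairs):  # sorted by Big5 code
        sources[(plane, cns)].append(big5)
    reverse = {key: codes[0] for key, codes in sources.items()}
    duplicates = {key: codes for key, codes in sources.items()
                  if len(codes) > 1}
    return reverse, duplicates


reverse, duplicates = build_reverse(PAIRS)
print(hex(reverse[(2, 0x4176)]))  # 0xdcd1
```
+
+ A full table might instead carry an explicit "canonical" flag per
+ row. Reporting the duplicates keeps the choice visible either way.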
+
+
+A.4. Big5 (ETen and IBM version) specific numeric symbols correspondence
+ to CNS 11643 Plane 1 (the Microsoft version defines this area as UDC,
+ User-Defined Characters):
+
+ 0xC6A1-0xC6BE <-> 0x2621-0x263E
+
+A.5. Big5 (ETen and IBM version) specific KangXi radicals
+ correspondence to CNS 11643 Plane 1 (the Microsoft version defines
+ this area as UDC, User-Defined Characters):
+
+ 0xC6BF <-> 0x2723
+ 0xC6C0 <-> 0x2724
+ 0xC6C1 <-> 0x2726
+ 0xC6C2 <-> 0x2728
+ 0xC6C3 <-> 0x272D
+ 0xC6C4 <-> 0x272E
+ 0xC6C5 <-> 0x272F
+ 0xC6C6 <-> 0x2734
+ 0xC6C7 <-> 0x2737
+ 0xC6C8 <-> 0x273A
+ 0xC6C9 <-> 0x273C
+ 0xC6CA <-> 0x2742
+ 0xC6CB <-> 0x2747
+ 0xC6CC <-> 0x274E
+ 0xC6CD <-> 0x2753
+ 0xC6CE <-> 0x2754
+ 0xC6CF <-> 0x2755
+ 0xC6D0 <-> 0x2759
+ 0xC6D1 <-> 0x275A
+ 0xC6D2 <-> 0x2761
+ 0xC6D3 <-> 0x2766
+ 0xC6D4 <-> 0x2829
+ 0xC6D5 <-> 0x282A
+ 0xC6D6 <-> 0x2863
+ 0xC6D7 <-> 0x286C
+
+A.6. Big5 (ETen and Microsoft version) specific ideographs
+ correspondence to CNS 11643 Plane 3 (the IBM version defines this
+ area as UDC):
+
+ 0xF9D6 <-> 0x4337
+ 0xF9D7 <-> 0x4F50
+ 0xF9D8 <-> 0x444E
+ 0xF9D9 <-> 0x504A
+ 0xF9DA <-> 0x2C5D
+ 0xF9DB <-> 0x3D7E
+ 0xF9DC <-> 0x4B5C
+
+
+A.7. Big5 (ETen version only) specific symbols correspondence to CNS
+ 11643 Plane 4:
+
+ 0xC879 <-> 0x2123
+ 0xC87B <-> 0x2124
+ 0xC87D <-> 0x212A
+ 0xC8A2 <-> 0x2152
+
+A.8. Other Big5 specific symbols which cannot be mapped to CNS 11643:
+
+ 0xC6D8-0xC878 <-> none (ETen and IBM version)
+ 0xC87A <-> none (ETen version only)
+ 0xC87C <-> none (ETen version only)
+ 0xC87E-0xC8A1 <-> none (ETen version only)
+ 0xC8A3-0xC8CC <-> none (ETen version only)
+ 0xC8CD-0xC8D3 <-> none (ETen and IBM version)
+ 0xF9DD-0xF9FE <-> none (ETen and Microsoft version)
+
+ Note: Most of them can, however, be mapped to GB 2312. For example,
+ the Big5 (ETen and IBM version) Hiragana, Katakana, and Cyrillic
+ symbols correspond to GB 2312 as follows:
+
+ 0xC6E7-0xC77A <-> 0x2421-0x2473 # Japanese Hiragana
+ 0xC77B-0xC7F2 <-> 0x2521-0x2576 # Japanese Katakana
+ 0xC7F3-0xC854 <-> 0x2721-0x2741 # Cyrillic uppercase
+ 0xC855-0xC875 <-> 0x2751-0x2771 # Cyrillic lowercase
+
+ Many other symbols can also be represented in GB 2312; for details,
+ please refer to the ftp sites listed in section 1.4 of the
+ "Specification" part of this document.
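+
+ GB 2312 codes are commonly written in one of two forms. The 7-bit
+ form is used for the Hiragana and Katakana entries above (e.g.
+ 0x2421). The 8-bit EUC-CN form sets the high bit of each octet (e.g.
+ 0xA4A1). The Python sketch below uses a hypothetical helper, not
+ defined by this document, to convert the first form into the second.
+ It then decodes the result with Python's standard "gb2312" codec,
+ which reads EUC-CN.
```python
def gb2312_to_euc_cn(code: int) -> bytes:
    """Convert a 7-bit GB 2312 code such as 0x2421 into its two EUC-CN
    octets by setting the high bit of each octet."""
    hi, lo = code >> 8, code & 0xFF
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError(f"not a 7-bit 94x94 code: {code:#06x}")
    return bytes([hi | 0x80, lo | 0x80])


# First Hiragana and first Katakana code in GB 2312, shown as code points.
print(hex(ord(gb2312_to_euc_cn(0x2421).decode("gb2312"))))  # 0x3041
print(hex(ord(gb2312_to_euc_cn(0x2521).decode("gb2312"))))  # 0x30a1
```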
+
+References
+
+ [ASCII] American National Standards Institute, "Coded character set:
+ 7-bit American National Standard Code for Information Interchange",
+ ANSI X3.4-1986.
+
+ [BIG5] Institute for Information Industry, "Chinese Coded Character
+ Set in Computer", March 1984.
+
+ [CJKINF] Ken Lunde, On-line documentation of Chinese/Japanese/Korean
+ Information Processing, 1995, available at:
+ ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
+
+ [CNS-5205] "Information processing: 7-Bit Coded Character Set For
+ Information Interchange", CNS-5205.
+
+ [CNS-11643] "Chinese Standard Interchange Code", CNS-11643 version
+ 1992; "Standard Interchange Code for Generally-Used Chinese
+ Characters", CNS 11643 version 1986.
+
+ [GB-1988] "7-bit Coding Character Set for Information Interchange",
+ GB 1988-89.
+
+ [GB-2312] "Coding of Chinese Ideogram Set for Information Interchange
+ Basic Set", GB 2312-80.
+
+ [GB-7589] "Code of Chinese Ideograms Set for Information Interchange,
+ the 2nd Supplementary Set", UDC 681.3.048, GB 7589-87.
+
+ [GB-7590] "Code of Chinese Ideogram Set for Information Interchange,
+ the 4th Supplementary Set", UDC 681.3.048, GB 7590-87.
+
+ [GB-8565] "Information Processing Coded Character Sets for Text
+ Communication", UDC 681.3, GB 8565-88.
+
+ [GB-12345] "Code of Chinese Ideogram Set for Information Interchange
+ Supplementary Set", GB/T 12345-90.
+
+ [GB-13000] "Information Technology: Universal Multiple-Octet Coded
+ Character Set (UCS) Part 1: Architecture and Basic Multilingual
+ Plane", GB 13000.1.
+
+ [GB-13131] "Code of Chinese Ideogram Set for Information Interchange,
+ the 3rd Supplementary Set", GB 13131-91.
+
+ [GB-13132] "Code of Chinese Ideogram Set for Information Interchange,
+ the 5th Supplementary Set", GB 13132-91.
+
+ [ISO-646] International Organization for Standardization (ISO),
+ "Information Technology: ISO 7-bit Coded Character Set for
+ Information Interchange", International Standard, Ref. No. ISO/IEC
+ 646:1991.
+
+ [ISO-2022] International Organization for Standardization (ISO),
+ "Information Processing: ISO 7-bit and 8-bit coded character sets:
+ Code extension techniques", International Standard, Ref. No. ISO
+ 2022-1986 (E).
+
+ [ISO-10021] "Information Technology: Text communication:
+ Message-Oriented Text Interchange Systems (MOTIS)", ISO 10021,
+ October 1988.
+
+ [ISO-10646] ISO/IEC 10646-1:1993(E), "Information Technology:
+ Universal Multiple-Octet Coded Character Set (UCS) Part 1:
+ Architecture and Basic Multilingual Plane".
+
+ [ISOREG] International Organization for Standardization (ISO),
+ "International Register of Coded Character Sets To Be Used With
+ Escape Sequences".
+
+ [MIME-1] Borenstein, N., and Freed, N., "MIME (Multipurpose Internet
+ Mail Extensions) Part One: Mechanisms for Specifying and Describing
+ the Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
+ September 1993.
+
+ [MIME-2] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
+ Part Two: Message Header Extensions for Non-ASCII Text", RFC 1522,
+ University of Tennessee, September 1993.
+
+ [RFC-822] Crocker, D., "Standard for the Format of ARPA Internet Text
+ Messages", STD 11, RFC 822, University of Delaware, August 1982.
+
+ [RFC-854] Postel, J., and Reynolds, J., "Telnet Protocol
+ Specification", RFC 854, ISI, May 1983.
+
+ [RFC-1036] Horton, M., and Adams, R., "Standard for Interchange of
+ USENET Messages", RFC 1036, AT&T Bell Laboratories, Center for
+ Seismic Studies, December 1987.
+
+ [RFC-1468] Murai, J., Crispin, M., and van der Poel, E., "Japanese
+ Character Encoding for Internet Messages", RFC 1468, June 1993.
+
+ [RFC-1557] Choi, U., Chon, K., and Park, H., "Korean Character
+ Encoding for Internet Messages", RFC 1557, December 1993.
+
+ [RFC-1641] Goldsmith, D., and Davis, M., "Using Unicode with MIME",
+ RFC 1641, Taligent Inc., July 1994.
+
+ [RFC-1642] Goldsmith, D., and Davis, M., "UTF-7, A Mail-Safe
+ Transformation Format of Unicode", RFC 1642, July 1994.
+
+ [RFC-1700] Reynolds, J., and Postel, J., "Assigned Numbers", STD 2,
+ RFC 1700, ISI, October 1994.
+
+ [SMTP] Postel, J. B., "Simple Mail Transfer Protocol", STD 10, RFC
+ 821, USC/Information Sciences Institute, August 1982.
+
+ [SMTPEXT] Klensin, J., Freed, N., Rose, M., Stefferud, E., and
+ Crocker, D., "SMTP Service Extensions", RFC 1651, July 1994.
+
+ [Unicode 1.1] "The Unicode Standard, Version 1.1", Addison-Wesley,
+ Reading, MA (to be published; the contents of this standard are
+ currently available by combining [Unicode92], [Unicode93], and
+ [Unicode4]).
+
+ [Unicode92] The Unicode Consortium, "The Unicode Standard: Worldwide
+ Character Encoding: Version 1.0", Volume 1, Addison-Wesley, Reading,
+ MA, 1992 (ISBN 0-201-56788-1).
+
+ [Unicode93] The Unicode Consortium, "The Unicode Standard: Worldwide
+ Character Encoding: Version 1.0", Volume 2, Addison-Wesley, Reading,
+ MA, 1992 (ISBN 0-201-60845-6).
+
+ [Unicode4] The Unicode Consortium, "The Unicode Standard: Version 1.1
+ (Prepublication Edition)", Unicode Technical Report #4 (available
+ from the Unicode Consortium).
+