summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc1842.txt
diff options
context:
space:
mode:
authorThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committerThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
treee3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc1842.txt
parentea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc1842.txt')
-rw-r--r--doc/rfc/rfc1842.txt675
1 files changed, 675 insertions, 0 deletions
diff --git a/doc/rfc/rfc1842.txt b/doc/rfc/rfc1842.txt
new file mode 100644
index 0000000..254aa22
--- /dev/null
+++ b/doc/rfc/rfc1842.txt
@@ -0,0 +1,675 @@
+
+
+
+
+
+
+Network Working Group Y. Wei
+Request for Comments: 1842 AsiaInfo Services Inc.
+Category: Informational Y. Zhang
+ Harvard Univ.
+ J. Li
+ Rice Univ.
+ J. Ding
+ AsiaInfo Services Inc.
+ Y. Jiang
+ Univ. of Maryland
+ August 1995
+
+
+ ASCII Printable Characters-Based Chinese Character Encoding
+ for Internet Messages
+
+Status of this Memo
+
+ This memo provides information for the Internet community. This memo
+ does not specify an Internet standard of any kind. Distribution of
+ this memo is unlimited.
+
+Abstract
+
+ This document describes the encoding used in electronic mail [RFC822]
+ and network news [RFC1036] messages over the Internet. The 7-bit
+ representation of GB 2312 Chinese text was specified by Fung Fung Lee
+ of Stanford University [Lee89] and implemented in various software
+ packages under different platforms (see appendix for a partial list
+ of the available software packages that support this encoding
+ method). It is further tested and used in the usenet newsgroups
+ alt.chinese.text and chinese.* as well as various other network
+ forums with considerable success. Future extensions of this encoding
+ method can accommodate additional GB character sets and other east
+ asian language character sets [Wei94].
+
+ The name given to this encoding is "HZ-GB-2312", which is intended to
+ be used in the "charset" parameter field of MIME headers (see [MIME1]
+ and [MIME2]).
+
+
+
+
+
+
+
+
+
+
+
+
+Wei, et al Informational [Page 1]
+
+RFC 1842 ASCII/Chinese Character Encoding August 1995
+
+
+Table of Contents
+
+ 1. Introduction................................................ 2
+ 2. Description................................................. 3
+ 3. Formal Syntax............................................... 4
+ 4. MIME Considerations......................................... 5
+ 5. Background Information...................................... 5
+ 6. References.................................................. 6
+ 7. Acknowledgements............................................ 6
+ 8. Security Considerations..................................... 7
+ 9. Authors' Addresses.......................................... 7
+ 10. Appendix: List of Software Implementing HZ Representation... 9
+
+1. Introduction
+
+ Chinese (and other east Asia languages) characters are encoded with
+ multiple bytes to guarantee sufficient coding space for the large
+ number of glyphs these languages contain. With the prolification of
+ internetwork traffic around the world, it becomes necessary to define
+ ways to facilitate the transfer of text in multiple-byte character-
+ set languages (hereafter as Chinese text) over internet.
+
+ There are two layers of concerns need to be addressed by any
+ mechanism whose purpose is to transfer Chinese text over internet.
+ The first is on application layer, in which concerned applications
+ should be able to recognize the encoding of the text and/or discern
+ different character sets which might be mixed in the text and handle
+ it accordingly. The second layer is the actual transport of Chinese
+ text between point A to point B over the Internet. Because the
+ prevailing mail transport protocol used over internet, the Simple
+ Mail Transport Protocol (aka. SMTP) was designed originally for ASCII
+ character set only, many internet mail agents are not 8 bit clean and
+ therefore introduce challenges for any attempt to actually implement
+ a mechanism for the transport of Chinese text over internet.
+
+ Here we describe a mechanism for transmission of Chinese text over IP
+ network. This described mechanism has being implemented by various
+ software package dealing with multi-language support and has been
+ tested on USENET newsgroups and other types of internet forums over
+ the last two years. The test results shows that the HZ representation
+ can pass through almost all existing mail delivery agents without
+ being corrupted. The HZ representation currently handles GB2312-80
+ Chinese character set only. Further expansion to other Chinese
+ encoding systems and to other East Asia Language is under
+ consideration.
+
+
+
+
+
+
+Wei, et al Informational [Page 2]
+
+RFC 1842 ASCII/Chinese Character Encoding August 1995
+
+
+2. Description
+
+ For an arbitrary mixed text with both Chinese coded text strings and
+ ASCII text strings, we designate to two distinguishable text modes,
+ ASCII mode and HZ mode, as the only two states allowed in the text.
+ At any given time, the text is in either one of these two modes or in
+ the transition from one to the other. In the HZ mode, only printable
+ ASCII characters (0x21-0x7E) are meanful with the size of basic text
+ unit being two bytes long.
+
+ In the ASCII mode, the size of basic text unit is one (1) byte with
+ the exception '~~', which is the special sequence representing the
+ ASCII character '~'. In both ASCII mode and HZ mode, '~' leads an
+ escape sequence. However, as HZ mode has basic size of text unit
+ being 2 bytes long, only the '~' character which appears at the first
+ byte of the the two-byte character frame are considered as the start
+ of an escape sequence.
+
+ The default mode is ASCII mode. Each line of text starts with the
+ default ASCII mode. Therefore, all Chinese character strings are to
+ be enclosed with '~{' and '~}' pair in the same text line.
+
+ The escape sequences defined are as the following:
+
+ ~{ ---- escape from ASCII mode to GB2312 HZ mode
+ ~} ---- escape from HZ mode to ASCII mode
+ ~~ ---- ASCII character '~' in ASCII mode
+ ~\n ---- line continuation in ASCII mode
+ ~[!-z|] ---- reserved for future HZ mode character sets
+
+
+ A few examples of the 7 bit representation of Chinese GB coded test
+ taken directly from [Lee89] are listed as the following:
+
+ Example 1: (Suppose there is no line size limit.)
+ This sentence is in ASCII.
+ The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.
+
+ Example 2: (Suppose the maximum line size is 42.)
+ This sentence is in ASCII.
+ The next sentence is in GB.~{<:Ky2;S{#,~}~
+ ~{NpJ)l6HK!#~}Bye.
+
+ Example 3: (Suppose a new line is started for every mode switch.)
+ This sentence is in ASCII.
+ The next sentence is in GB.~
+ ~{<:Ky2;S{#,NpJ)l6HK!#~}~
+ Bye.
+
+
+
+Wei, et al Informational [Page 3]
+
+RFC 1842 ASCII/Chinese Character Encoding August 1995
+
+
+3. Formal Syntax
+
+ The notational conventions used here are identical to those used in
+ RFC 822 [RFC822].
+
+ The * (asterisk) convention is as follows:
+
+ l*m something
+
+ meaning at least l and at most m somethings, with l and m taking
+ default values of 0 and infinity, respectively.
+
+
+ message = headers 1*( CRLF *single-byte-char *segment
+ single-byte-seq *single-byte-char )
+ ; see also [MIME1] "body-part"
+ ; note: must end in ASCII
+
+ headers = <see [RFC822] "fields" and [MIME1] "body-part">
+
+ segment = single-byte-segment / double-byte-segment
+
+ single-byte-segment = 1*single-byte-char
+
+ double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )
+
+ single-byte-seq = "~}"
+
+ double-byte-seq = "~{"
+
+ CRLF = CR LF
+ ; ( Octal, Decimal.)
+
+ CR = <ASCII CR, carriage return>; ( 15, 13.)
+
+ LF = <ASCII LF, linefeed> ; ( 12, 10.)
+
+ one-of-94 = <any one of 94 values> ; (41-176, 33.-126.)
+
+ single-byte-char = <any 7BIT, including bare CR & bare LF, but NOT
+ including CRLF, not including > / "~~">;
+
+ 7BIT = <any 7-bit value> ; ( 0-177, 0.-127.)
+
+
+
+
+
+
+
+
+Wei, et al Informational [Page 4]
+
+RFC 1842 ASCII/Chinese Character Encoding August 1995
+
+
+4. MIME Considerations
+
+ The name given to the HZ character encoding is "HZ-GB-2312". This
+ name is intended to be used in MIME messages as follows:
+
+ Content-Type: text/plain; charset=HZ-GB-2312
+
+ The HZ-GB-2312 encoding is already in 7-bit form, so it is not
+ necessary to use a Content-Transfer-Encoding header.
+
+5. Background Information
+
+ A GB code is a two byte character withe the first byte is in the
+ range of 0x21-0x77 and the second byte in the range 0x21-0x7E. As the
+ printable ASCII subset of characters are single byte character in the
+ range of 0x21--0x7E, two printable ASCII characters can represent a
+ two byte GB coded Chinese character if proper escape sequence is used
+ to indicate the proper text mode. This form the base of the above
+ described HZ 7-bit representation methods. Further, with the use of a
+ printable ASCII character, '~', as the leading byte of the escape
+ sequence, the HZ representation eliminated the need of reserving any
+ non-printable ASCII characters, which are commonly used by
+ application programs (as well as system environment) for various
+ control function or other special signaling. Therefore, the HZ
+ representation method described here posses the least probability of
+ interfering with the host and network environment. This is also a
+ convenient for application for implementing the HZ coding method.
+
+ HZ representation method has been implemented in various Chinese
+ software across computer hardware platforms. It has also being tested
+ for more than two years over USENET newsgroups, alt.chinese.text and
+ chinese.*, for the transmission of Chinese texts over the internet.
+ The original points of those transferred Chinese texts are
+ geographically scattered around the world and under the constraints
+ of vast different system and network environments. Therefore, such a
+ test group may well represent a rather complete sample of the real
+ internet world. The successful test of the HZ representation method
+ therefore builds up the confidence that it is well suited for
+ transmitting multi-byte text messages over the internet.
+
+ Under HZ representation, ASCII text remain as 7-bit characters and
+ therefore HZ representation together with the 7-bit ASCII character
+ set can be viewed as forming a superset of characters.
+
+
+
+
+
+
+
+
+Wei, et al Informational [Page 5]
+
+RFC 1842 ASCII/Chinese Character Encoding August 1995
+
+
+6. References
+
+ [ASCII] American National Standards Institute, "Coded character set
+ -- 7-bit American national standard code for information
+ interchange", ANSI X3.4-1986.
+
+ [GB 2312] Technical Administrative Bureau of P.R.China, "Coding of
+ Chinese Ideogram Set for Information Interchange Basic Set",
+ GB 2312-80.
+
+ [Lee89] Lee, F., "HZ - A Data Format for Exchanging Files of
+ Arbitrarily Mixed Chinese and ASCII characters", RFC 1843,
+ Stanford University, August 1995.
+
+ [MIME1] Borenstein N., and N. Freed, "MIME (Multipurpose Internet
+ Mail Extensions) Part One: Mechanisms for Specifying and Describing
+ the Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
+ September 1993.
+
+ [MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
+ Part Two: Message Header Extensions for Non-ASCII Text", RFC 1522,
+ University of Tennessee, September 1993.
+
+ [RFC822] Crocker, D., "Standard for the Format of ARPA Internet
+ Text Messages", STD 11, RFC 822, UDEL, August 1982.
+
+ [RFC1036] Horton M., and R. Adams, "Standard for Interchange of
+ USENET Messages", RFC 1036, AT&T Bell Laboratories, Center for
+ Seismic Studies, December 1987.
+
+ [Wei94] Wei, Yagui, "A Proposal for a Consolidated Collection of
+ East Asian Language Coding Standards Using Solely ASCII Printable
+ Characters", June 30, 1994.
+
+7. Acknowledgements
+
+ Many people have involved the design and specification of the HZ 7-
+ bit Chinese representation system at different stages. Most notable
+ among them are Ed Lai, Chunqing Cheng, Fung Fung Lee, and Ricky
+ Yeung. This document is merely a recollection of thoughts and efforts
+ made collectively by this group of people whose devotion has led to
+ the current success of the HZ Chinese representation over the
+ Internet. Further, the authors wish to thank AsiaInfo Services Inc.
+ for sponsoring the preparation of this document and for facilitate
+ the communication need to refine this document.
+
+
+
+
+
+
+Wei, et al Informational [Page 6]
+
+RFC 1842 ASCII/Chinese Character Encoding August 1995
+
+
+8. Security Considerations
+
+ Security issues are not discussed in this memo.
+
+9. Authors' Addresses
+
+ Ya-Gui Wei
+ AsiaInfo Services Inc.
+ One Galleria Tower
+ 13355 Noel Rd. Suite 1340
+ Dallas, TX 75240
+
+ Phone: (214) 788-4141
+ Fax: (214) 788-0729
+ EMail: HZRFC@usai.asiainfo.com
+
+
+ Yun Fei Zhang
+ CfA
+ Harvard University
+ MS 66
+ 60 Garden St.
+ Cambridge, MA 02138
+
+ Phone: (617)-860-9444
+ EMail: zhang@orion.harvard.edu
+
+
+ Jian Q. Li
+ Rice University
+ ONS - MS 119
+ P.O. Box 1892
+ Houston, Texas 77251-1892
+
+ Phone: (713)285-5328
+ EMail: jian@is.rice.edu
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Wei, et al Informational [Page 7]
+
+RFC 1842 ASCII/Chinese Character Encoding August 1995
+
+
+ Jian Ding
+ ISTIC Bldg, Room 431
+ 15 Fuxing Road,
+ Beijing, China 100038
+
+ Phone: 86 10 853-7120
+ Fax: 86 10 853-7123
+ EMail: ding@Beijing.AsiaInfo.com
+
+
+ Yuan Jiang
+ Electrical Engineering Department
+ University of Maryland
+ College Park, MD 200742
+
+ Phone: 301-405-3729
+ EMail: yjj@eng.umd.edu
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Wei, et al Informational [Page 8]
+
+RFC 1842 ASCII/Chinese Character Encoding August 1995
+
+
+10. Appendix: List of Software Implementing HZ Representation
+
+ In the following, we compiled a list on software packages support the
+ HZ Chinese representation method. Though this list is far from
+ complete, it is visible that support for HZ representation has be
+ implemented for major hardware and software platforms. For more
+ information on the listed software packages (and for other
+ information pertain to Chinese computing), please refer to the
+ internet site: ftp://ftp.ifcss.org/pub/software/ or its mirrors at
+ the following sites:
+
+ at Beijing, China: ftp://info.bta.net.cn:/pub/software/;
+ at Shanghai, China: ftp://info.bta.net.cn:/pub/software/;
+ at Taiwan: ftp://nctuccca.edu.tw/pub/Chinese/ifcss/;
+ or ftp://ftp.edu.tw:/Chinese/ifcss/software/;
+ At Singapore: ftp://ftp.technet.sg:/pub/chinese/;
+ at California, U.S.A.: ftp://cnd.org/pub/software/.
+
+ The software in the next section are listed by its name and followed
+ by the current version number, release date (in parenthesis) and the
+ author(s) of the software. A brief description of the functionality
+ of the software starts at the line immediately after the headline and
+ lead by character string "--". Two consecutive packages are separated
+ by a blank line.
+
+ zwdos (V2.2, March 5, 1993) by Wei Ya-Gui
+ -- MS-DOS kernal extension that gives DOS text mode programs the
+ ability to enter, display, manipulate and print 'zW' and HZ
+ Chinese text. Small memory requirement. Supports EGA,
+ VGA or Hercules Monographic displays.
+
+ HZ (V2.0, Feb. 7, 1995) by Fung F. Lee
+ -- Conversion from HZ to GB, GB to HZ, and zW to HZ respectively.
+ Versions for PC, Mac and Unix exist.
+
+ XingXing (V4.2, Mar 29. 1995) by Wang Xiangdong
+ -- chinese word processor for PC.
+
+ NJStar (V3.00, Feb. 10, 1994 by Hongbo Ni)
+ -- GB Word Processor (Viewer, editor, printing, converter)
+ Supports EGA/(mono)VGA/SuperVGA monitors, and various
+ printers, Chinese<->English dictionary lookup, HanziInfo
+ and glossary; Includes more than 20 Chinese input methods
+ with Intelligent LianXiang and fuzzy Pinyin; Speed up with
+ sentence based Pinyin; Reads and writes GB,Hz,zW & Big5 files;
+ DOS Shell; Configurable.
+
+
+
+
+
+Wei, et al Informational [Page 9]
+
+RFC 1842 ASCII/Chinese Character Encoding August 1995
+
+
+ QuickStar (V3.0, June 7, 1995) by Anthony Mai
+ -- Compact size Chinese edit software for PC. PinYin, CiZu,
+ WuBi, GuoBiao, ASCII etc input method. Translate to/from GB,
+ HZ and Big5 coded Chinese files.
+
+ cnprint (V2.6, Jan. 25, 95) by Yidao Cai
+ -- print GB/Hz/BIG5/JIS/KSC/UTF8 etc or convert to PostScript
+ (conforms to EPSF-3.0). Both DOS and UNIX version available.
+
+ dm24 (V2.0, Sept. 1993) by Gongquan Chen)
+ -- Chinese GB/HZ printing program for EPSON 24pin printer
+
+ HXLASER (V2.6, Feb. 1994) by Chen, Gongquan
+ -- A GB/HZ/BIG5 file printing program for HP LaserJet plus and
+ later model printers.
+
+ CNVIEW (V3.0, Jan. 1, 1995) by Jifang Lin
+ -- View GB/Hz/Big5 encoded Chinese text file on IBM-PC
+ & compatibles
+
+ ZWLIST (V1.1, Nov. 24, 1993) by Gongquan Chen
+ -- Chinese HZ/GB/BIG5 File Browser for ZWDOS
+
+ zwTool (V1.0, Oct. 30,1993) by Gongquan Chen
+ -- a MSDOS TSR program for input of Chinese characters in text
+ mode; Developed primarily for Chinese programmers using IDE
+ (Integrated Development Environment, like Borland's Turbo
+ languages); Supports GB/HZ; EGA/VGA required;
+
+ DateStar (V1.1) by Youzhen Cheng
+ -- Chinese Calendar Producer. Displays Chinese and western
+ calendar in ASCII code, BIG-5 code, GuoBiao code (PRC
+ Standard), and HZ code (Network)
+
+ MacViewHZ (V2.21 Dec. 93) by Xiaodong Chen
+ -- Display and print GB/HZ or BIG5 coded Chinese text files on
+ Macintosh without Chinese OS system, with easy to use Mac
+ user interface including multiple windows and simple editing
+ features such as delete, copy, cut and paste.
+
+ MacHZTerm (V0.52) by Xin Xu
+ -- a communication program using CommToolBox, capable of
+ displaying GB, HZ, Big5 texts on line. No Chinese OS required.
+ System 7 recommended.
+
+ HanziTerm (V0.5) by Ricky Yeung
+ -- A terminal emulator for Mac Chinese OS 6.0.x or later.
+ Support 8-bit character code, HZ, and zW.
+
+
+
+Wei, et al Informational [Page 10]
+
+RFC 1842 ASCII/Chinese Character Encoding August 1995
+
+
+ Tex-Edit-HZ (V1.0, Dec. 18 1993) by Tom Bender and Tie Zeng.
+ -- A MAC WorldScript savvy Text editor with HZ<->GB conversion
+ feature.
+
+
+ MacBlue Telnet (V2.6.6, Feb 16, 1995) by MacBlue
+ -- A Telnet program that can handle all Chinese encodings
+ (such as HZ, GB, Big5, ET etc), EUC-JIS and EUC-KSC; based on
+ NCSA Telnet with built-in hanzi input methods.
+
+ rnMac (V1.3b5) by Roy Wood
+ -- Offline Newsreader including GB <-> HZ conversion
+
+ Weiqi267 (V2.67) by Xiangbo Kang
+ -- record Weiqi games and transfer them through net.
+ GB, HZ 100 % compatible (but Russian char disabled).
+ There is a user guide in HZ coding.
+ * Now can also be used for Chinese Chess.
+
+ TwinBridge (V3.2, Nov. 16, 1994) by Twinbridge Software Corporation
+ -- an interface between Windows and applications, it allows
+ Chinese character processing in Windows applications like
+ Word for Windows, Ami Pro, Excel, etc.
+ You can edit Chinese characters like English characters
+ in most of applications.
+
+ WinHZ (V1.1, April 13, 1995) by Tian Bogang
+ -- HZ extension for Chinese systems for Windows
+
+ HZcomm (V1.5, Nov. 14, 1993) by Nick Ke Ning.
+ -- HZ coding supported communication program under Chinese
+ Windows System (GB internal coded). Good for reading/writing
+ HZ coded E-mail and news(alt.chinese.text) on line in
+ Windows 3.1 for PCs.
+
+ SimpTerm (V0.8.0) by Jianqing Hu
+ -- A Chinese communication program for MS-Windows 3.1
+ with build in support for BIG5, HZ and GB encoded text.
+
+ ChPad (V1.31) by Tian Bogang
+ -- GUO BIAO and HZ file browser for MS WINDOWS 3.1
+
+ SilkRoad (V1.0) by Antony C. Hu
+ -- GB/HZ Viewer for MS-Windows 3.1
+
+ gnus-chinese (V1.0, Apr. 26 1994) by Ning Mosberger-Tang
+ -- convert HZ articles to the code understandable by your
+ terminal automatically in GNUS newsreader (for GNU EMACS).
+
+
+
+Wei, et al Informational [Page 11]
+
+RFC 1842 ASCII/Chinese Character Encoding August 1995
+
+
+ requires conversion program (e.g. hz2gb and gb2hz) to do the
+ actual conversion.
+
+ irchat (V2.4jp4cn0) by HIROSE Tutomu
+ -- irc client e-lisp program on Mule
+ patched to handle HZ and Big5
+ now we can read/write all JIS/HZ/Big5 simultaneously on irc
+
+ hztty (V2.0 Jan 29, 1994) by Yongguang Zhang
+ -- This program turns a tty session from one encoding to another.
+ For example, running hztty on cxterm can allow you to
+ read/write Chinese in HZ format.
+
+ BeTTY/CCF/B5Encode package (V1.534, 1995.03.22) by Jing-Shin Chang
+ -- a chinese code conversion package for codes widely used
+ in Taiwan and the GB code widely used in Mainland, plus
+ a 7-bit Big5 encoding method (B5Encode3/B5E3, an extension
+ to HZ encoding for GB),
+ including off-line converters (CCF/Chinese Code Filters and
+ B5E/B5Encode) and an on-line converter (BeTTY) which simulates
+ your native chinese terminal to become aware of the coding
+ systems widely used in Taiwan and GB, HZ encoding.
+
+ gb2jis & jis2gb (V1.5, 1995.5.11) by Koichi Yasuoka
+ -- convert GB (or HZ) to/from JIS with two-letter pinyin
+
+
+ gb2ps (V2.02) by Wei SUN
+ -- convert GB/HZ to postscript, supports simple page formatting
+ (change chinese fonts and font size, cover page, page
+ number, etc). Five chinese fonts are provided in this
+ release, they are Song, Kai, Fang Song, Hei and FanTi
+ The HZ ENCODING is also supported.
+
+ ChiRK (V1.2a) by Bo Yang
+ -- GB/HZ/BIG5 text viewer on terminals (or emulations) capable
+ of displaying Tektronics 401x graphics, such as GraphOn,DEC
+ VT240/330, Xterm, Tektool on Sun, EM4105 on PC,
+ VersaTerm-Pro on Mac, etc.
+
+ Multi-Localization Enhancement of NCSA Mosaic X 2.4 (V2.4.0)
+ by TAKADA, Toshihiro
+ -- a patch to make use of various nat'l character sets in NCSA
+ Mosaic for X 2.4. You can switch between char-sets in one
+ Mosaic. Support ISO 8859-X, KOI-8, GB, HZ, BIG5, KSC & JIS.
+
+
+
+
+
+
+Wei, et al Informational [Page 12]
+