Network Working Group Y. Wei
Request for Comments: 1842 AsiaInfo Services Inc.
Category: Informational Y. Zhang
Harvard Univ.
J. Li
Rice Univ.
J. Ding
AsiaInfo Services Inc.
Y. Jiang
Univ. of Maryland
August 1995
ASCII Printable Characters-Based Chinese Character Encoding
for Internet Messages
Status of this Memo
This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.
Abstract
This document describes the encoding used in electronic mail [RFC822]
and network news [RFC1036] messages over the Internet. The 7-bit
representation of GB 2312 Chinese text was specified by Fung Fung Lee
of Stanford University [Lee89] and has been implemented in various
software packages on different platforms (see the appendix for a
partial list of the available software packages that support this
encoding method). It has been further tested and used in the USENET
newsgroups alt.chinese.text and chinese.*, as well as various other
network forums, with considerable success. Future extensions of this
encoding method can accommodate additional GB character sets and other
East Asian language character sets [Wei94].
The name given to this encoding is "HZ-GB-2312", which is intended to
be used in the "charset" parameter field of MIME headers (see [MIME1]
and [MIME2]).
Wei, et al Informational [Page 1]
RFC 1842 ASCII/Chinese Character Encoding August 1995
Table of Contents
1. Introduction
2. Description
3. Formal Syntax
4. MIME Considerations
5. Background Information
6. References
7. Acknowledgements
8. Security Considerations
9. Authors' Addresses
10. Appendix: List of Software Implementing HZ Representation
1. Introduction
Characters of Chinese (and other East Asian languages) are encoded
with multiple bytes to guarantee sufficient coding space for the large
number of glyphs these languages contain. With the proliferation of
internetwork traffic around the world, it has become necessary to
define ways to facilitate the transfer of text in multiple-byte
character-set languages (hereafter referred to as Chinese text) over
the Internet.
Two layers of concern need to be addressed by any mechanism whose
purpose is to transfer Chinese text over the Internet. The first is
the application layer, in which the applications concerned should be
able to recognize the encoding of the text and/or discern different
character sets that might be mixed in the text, and handle it
accordingly. The second layer is the actual transport of Chinese text
from point A to point B over the Internet. Because the prevailing mail
transport protocol used over the Internet, the Simple Mail Transfer
Protocol (SMTP), was originally designed for the ASCII character set
only, many Internet mail agents are not 8-bit clean and therefore pose
challenges for any attempt to implement a mechanism for the transport
of Chinese text over the Internet.
Here we describe a mechanism for the transmission of Chinese text over
IP networks. This mechanism has been implemented by various software
packages dealing with multi-language support and has been tested on
USENET newsgroups and other types of Internet forums over the last two
years. The test results show that the HZ representation can pass
through almost all existing mail delivery agents without being
corrupted. The HZ representation currently handles the GB2312-80
Chinese character set only. Further expansion to other Chinese
encoding systems and to other East Asian languages is under
consideration.
2. Description
For an arbitrary mixed text containing both Chinese coded text strings
and ASCII text strings, we designate two distinguishable text modes,
ASCII mode and HZ mode, as the only two states allowed in the text.
At any given time, the text is in one of these two modes or in
transition from one to the other. In HZ mode, only printable ASCII
characters (0x21-0x7E) are meaningful, and the basic text unit is two
bytes long.
In ASCII mode, the basic text unit is one (1) byte long, with the
exception of '~~', which is the special sequence representing the
ASCII character '~'. In both ASCII mode and HZ mode, '~' introduces an
escape sequence. However, because the basic text unit in HZ mode is
two bytes long, only a '~' character that appears as the first byte of
a two-byte character frame is considered the start of an escape
sequence.
The default mode is ASCII mode. Each line of text starts in the
default ASCII mode. Therefore, all Chinese character strings are to
be enclosed in a '~{' and '~}' pair within the same text line.
The escape sequences defined are as follows:
~{ ---- escape from ASCII mode to GB2312 HZ mode
~} ---- escape from HZ mode to ASCII mode
~~ ---- ASCII character '~' in ASCII mode
~\n ---- line continuation in ASCII mode
~[!-z|] ---- reserved for future HZ mode character sets
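As an illustration only, the escape sequences above can be sketched in
Python as a small encoder. The function name encode_hz is hypothetical,
and the use of Python's standard "gb2312" codec (which yields the 8-bit
form of each GB code) is an assumption of this sketch, not something
this memo prescribes. Line-length limits and the '~\n' continuation are
not handled.

```python
def encode_hz(text: str) -> str:
    """Encode text as HZ (hypothetical helper, not part of the memo).

    ASCII characters pass through, '~' is written as '~~', and runs of
    GB 2312 characters are wrapped in '~{' ... '~}'.
    """
    out = []
    in_hz = False                      # ASCII mode is the default
    for ch in text:
        if ord(ch) < 0x80:             # ASCII character, including newline
            if in_hz:
                out.append("~}")       # escape back to ASCII mode
                in_hz = False
            out.append("~~" if ch == "~" else ch)
        else:
            # Raises UnicodeEncodeError if ch is not in GB 2312.
            gb = ch.encode("gb2312")   # two bytes, each with the high bit set
            if not in_hz:
                out.append("~{")       # escape into HZ mode
                in_hz = True
            # Clear the high bit of both bytes to get printable ASCII.
            out.append(chr(gb[0] & 0x7F) + chr(gb[1] & 0x7F))
    if in_hz:
        out.append("~}")               # the text must end in ASCII mode
    return "".join(out)
```

Because a newline is an ASCII character, the sketch always closes an
open '~{' before the end of a line, so each line starts in ASCII mode.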
A few examples of the 7-bit representation of Chinese GB coded text,
taken directly from [Lee89], are listed below:
Example 1: (Suppose there is no line size limit.)
This sentence is in ASCII.
The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.
Example 2: (Suppose the maximum line size is 42.)
This sentence is in ASCII.
The next sentence is in GB.~{<:Ky2;S{#,~}~
~{NpJ)l6HK!#~}Bye.
Example 3: (Suppose a new line is started for every mode switch.)
This sentence is in ASCII.
The next sentence is in GB.~
~{<:Ky2;S{#,NpJ)l6HK!#~}~
Bye.
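The three examples differ only in where lines are broken, so a decoder
should produce the same text for each of them. The following Python
sketch shows one way this could be checked. The name decode_hz and the
EXAMPLE_* constants are hypothetical, and Python's standard "gb2312"
codec is assumed for the final conversion from GB codes to characters.

```python
def decode_hz(text: str) -> str:
    """Decode HZ text (a str of 7-bit characters) into ordinary text."""
    out = []            # decoded pieces, in order
    gb = bytearray()    # GB bytes collected inside the current ~{ ... ~}
    in_hz = False       # ASCII mode is the default
    i = 0
    while i < len(text):
        ch, nxt = text[i], text[i + 1:i + 2]
        if not in_hz:
            if ch != "~":
                out.append(ch)          # ordinary one-byte unit
                i += 1
                continue
            if nxt == "~":
                out.append("~")         # "~~" is a literal tilde
            elif nxt == "{":
                in_hz = True            # switch to HZ mode
            elif nxt != "\n":           # "~\n" is a continuation: emit nothing
                raise ValueError(f"bad escape at offset {i}")
        elif ch == "~":
            if nxt != "}":
                raise ValueError(f"bad escape in HZ mode at offset {i}")
            out.append(gb.decode("gb2312"))
            gb.clear()
            in_hz = False               # back to ASCII mode
        elif len(nxt) == 1 and "!" <= ch <= "}" and "!" <= nxt <= "~":
            # Set the high bit of both bytes to recover the 8-bit GB code.
            gb += bytes((ord(ch) | 0x80, ord(nxt) | 0x80))
        else:
            raise ValueError(f"bad two-byte unit at offset {i}")
        i += 2                          # escapes and HZ units are two bytes
    if in_hz:
        raise ValueError("text ends in HZ mode")
    return "".join(out)


# The second sentence of Examples 1-3, with line breaks written as "\n".
EXAMPLE_1 = "The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye."
EXAMPLE_2 = "The next sentence is in GB.~{<:Ky2;S{#,~}~\n~{NpJ)l6HK!#~}Bye."
EXAMPLE_3 = "The next sentence is in GB.~\n~{<:Ky2;S{#,NpJ)l6HK!#~}~\nBye."

same = decode_hz(EXAMPLE_1) == decode_hz(EXAMPLE_2) == decode_hz(EXAMPLE_3)
```

The sketch rejects a line break inside HZ mode, following the rule that
a '~{' and '~}' pair must sit on the same text line.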
3. Formal Syntax
The notational conventions used here are identical to those used in
RFC 822 [RFC822].
The * (asterisk) convention is as follows:
l*m something
meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.
message = headers 1*( CRLF *single-byte-char *segment
single-byte-seq *single-byte-char )
; see also [MIME1] "body-part"
; note: must end in ASCII
headers = <see [RFC822] "fields" and [MIME1] "body-part">
segment = single-byte-segment / double-byte-segment
single-byte-segment = 1*single-byte-char
double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )
single-byte-seq = "~}"
double-byte-seq = "~{"
CRLF = CR LF
; ( Octal, Decimal.)
CR = <ASCII CR, carriage return>; ( 15, 13.)
LF = <ASCII LF, linefeed> ; ( 12, 10.)
one-of-94 = <any one of 94 values> ; (41-176, 33.-126.)
single-byte-char = <any 7BIT, including bare CR & bare LF, but NOT
including CRLF, and not including "~"> / "~~"
7BIT = <any 7-bit value> ; ( 0-177, 0.-127.)
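For a single line of text, the grammar above can be approximated by a
regular expression. The Python sketch below is an assumption-laden
reading of the syntax rather than a normative checker: HZ_LINE and
is_valid_hz_line are hypothetical names, the line ending is expected to
have been removed already, a trailing '~' is accepted as the start of a
'~\n' line continuation, and the first byte of a two-byte unit is not
allowed to be '~'.

```python
import re

# One line of HZ text, without its line ending (hypothetical pattern).
HZ_LINE = re.compile(
    r"""
    (?:
        [\x00-\x7d\x7f]                         # one ASCII-mode byte other than tilde
      | ~~                                      # escaped tilde
      | ~\{ (?: [\x21-\x7d][\x21-\x7e] )+ ~\}   # double-byte segment
    )*
    ~?                                          # trailing tilde: line continuation
    """,
    re.VERBOSE,
)


def is_valid_hz_line(line: str) -> bool:
    """Return True if the line fits the single-line pattern above."""
    return HZ_LINE.fullmatch(line) is not None
```

For instance, both lines of Example 2 in Section 2 fit the pattern,
while a segment with an odd number of bytes such as "~{abc~}" does not.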
4. MIME Considerations
The name given to the HZ character encoding is "HZ-GB-2312". This
name is intended to be used in MIME messages as follows:
Content-Type: text/plain; charset=HZ-GB-2312
The HZ-GB-2312 encoding is already in 7-bit form, so it is not
necessary to use a Content-Transfer-Encoding header.
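As a sketch of how such a message might be assembled, the following
Python fragment uses the standard email.message.Message class and sets
the Content-Type header by hand. The helper name build_hz_message, the
subject line, and the body text are hypothetical; the body is assumed
to be HZ-encoded already.

```python
from email.message import Message


def build_hz_message(subject: str, hz_body: str) -> Message:
    """Wrap an already HZ-encoded body in a minimal MIME message."""
    msg = Message()
    msg["Subject"] = subject
    msg["MIME-Version"] = "1.0"
    msg["Content-Type"] = "text/plain; charset=HZ-GB-2312"
    # The body is 7-bit text, so it is stored unchanged; calling
    # set_payload() without a charset adds no Content-Transfer-Encoding.
    msg.set_payload(hz_body)
    return msg


msg = build_hz_message("HZ test", "Hello ~{VPND~} world.\n")
wire_text = msg.as_string()   # headers, a blank line, then the body
```

Setting the header directly, rather than passing a charset to the
library, keeps the library from choosing a transfer encoding of its
own for the body.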
5. Background Information
A GB code is a two-byte character with the first byte in the range
0x21-0x77 and the second byte in the range 0x21-0x7E. As the printable
ASCII subset consists of single-byte characters in the range
0x21-0x7E, two printable ASCII characters can represent a two-byte GB
coded Chinese character, provided a proper escape sequence is used to
indicate the text mode. This forms the basis of the HZ 7-bit
representation method described above. Further, by using a printable
ASCII character, '~', as the leading byte of the escape sequence, the
HZ representation eliminates the need to reserve any non-printable
ASCII characters, which are commonly used by application programs (as
well as system environments) for various control functions or other
special signaling. Therefore, the HZ representation method described
here poses the least probability of interfering with the host and
network environment. This also makes it convenient for applications
to implement the HZ coding method.
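The byte arithmetic behind this can be shown with a short Python
sketch. The function names are hypothetical. Python's standard
"gb2312" codec is assumed; it produces the 8-bit form of a GB code, in
which both bytes have the high bit set, so subtracting 0x80 from each
byte gives the 7-bit form discussed here. Subtracting 0x20 from each
7-bit byte gives the row and cell number in the 94 x 94 GB 2312 table.

```python
from typing import Tuple


def gb_to_hz_pair(ch: str) -> str:
    """Return the two printable ASCII characters that stand for ch."""
    # Fails for characters that do not encode to exactly two GB bytes.
    hi, lo = ch.encode("gb2312")
    return chr(hi - 0x80) + chr(lo - 0x80)


def hz_pair_to_row_cell(pair: str) -> Tuple[int, int]:
    """Return the (row, cell) position of a two-character HZ unit."""
    return ord(pair[0]) - 0x20, ord(pair[1]) - 0x20


pair = gb_to_hz_pair("\u4e2d")            # U+4E2D, 8-bit GB code D6 D0
row, cell = hz_pair_to_row_cell(pair)
first_byte_in_range = 0x21 <= ord(pair[0]) <= 0x77
```

Here the 8-bit bytes 0xD6 0xD0 become 0x56 0x50, the ASCII characters
"VP", which correspond to row 54, cell 48 of the table.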
The HZ representation method has been implemented in various Chinese
software packages across computer hardware platforms. It has also been
tested for more than two years over the USENET newsgroups
alt.chinese.text and chinese.* for the transmission of Chinese text
over the Internet. The points of origin of the transferred Chinese
texts are geographically scattered around the world and subject to the
constraints of vastly different system and network environments.
Therefore, such a test group may well represent a rather complete
sample of the real Internet world. The successful test of the HZ
representation method therefore builds confidence that it is well
suited for transmitting multi-byte text messages over the Internet.
Under the HZ representation, ASCII text remains as 7-bit characters,
and therefore the HZ representation together with the 7-bit ASCII
character set can be viewed as forming a superset of characters.
6. References
[ASCII] American National Standards Institute, "Coded character set
-- 7-bit American national standard code for information
interchange", ANSI X3.4-1986.
[GB 2312] Technical Administrative Bureau of P.R.China, "Coding of
Chinese Ideogram Set for Information Interchange Basic Set",
GB 2312-80.
[Lee89] Lee, F., "HZ - A Data Format for Exchanging Files of
Arbitrarily Mixed Chinese and ASCII characters", RFC 1843,
Stanford University, August 1995.
[MIME1] Borenstein, N., and N. Freed, "MIME (Multipurpose Internet
Mail Extensions) Part One: Mechanisms for Specifying and Describing
the Format of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
September 1993.
[MIME2] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
Part Two: Message Header Extensions for Non-ASCII Text", RFC 1522,
University of Tennessee, September 1993.
[RFC822] Crocker, D., "Standard for the Format of ARPA Internet
Text Messages", STD 11, RFC 822, UDEL, August 1982.
[RFC1036] Horton, M., and R. Adams, "Standard for Interchange of
USENET Messages", RFC 1036, AT&T Bell Laboratories, Center for
Seismic Studies, December 1987.
[Wei94] Wei, Yagui, "A Proposal for a Consolidated Collection of
East Asian Language Coding Standards Using Solely ASCII Printable
Characters", June 30, 1994.
7. Acknowledgements
Many people have been involved in the design and specification of the
HZ 7-bit Chinese representation system at different stages. Most
notable among them are Ed Lai, Chunqing Cheng, Fung Fung Lee, and
Ricky Yeung. This document is merely a recollection of thoughts and
efforts made collectively by this group of people, whose devotion has
led to the current success of the HZ Chinese representation over the
Internet. Further, the authors wish to thank AsiaInfo Services Inc.
for sponsoring the preparation of this document and for facilitating
the communication needed to refine this document.
8. Security Considerations
Security issues are not discussed in this memo.
9. Authors' Addresses
Ya-Gui Wei
AsiaInfo Services Inc.
One Galleria Tower
13355 Noel Rd. Suite 1340
Dallas, TX 75240
Phone: (214) 788-4141
Fax: (214) 788-0729
EMail: HZRFC@usai.asiainfo.com
Yun Fei Zhang
CfA
Harvard University
MS 66
60 Garden St.
Cambridge, MA 02138
Phone: (617)-860-9444
EMail: zhang@orion.harvard.edu
Jian Q. Li
Rice University
ONS - MS 119
P.O. Box 1892
Houston, Texas 77251-1892
Phone: (713)285-5328
EMail: jian@is.rice.edu
Jian Ding
ISTIC Bldg, Room 431
15 Fuxing Road,
Beijing, China 100038
Phone: 86 10 853-7120
Fax: 86 10 853-7123
EMail: ding@Beijing.AsiaInfo.com
Yuan Jiang
Electrical Engineering Department
University of Maryland
College Park, MD 20742
Phone: 301-405-3729
EMail: yjj@eng.umd.edu
10. Appendix: List of Software Implementing HZ Representation
In the following, we have compiled a list of software packages that
support the HZ Chinese representation method. Though this list is far
from complete, it is evident that support for the HZ representation
has been implemented on major hardware and software platforms. For
more information on the listed software packages (and for other
information pertaining to Chinese computing), please refer to the
Internet site ftp://ftp.ifcss.org/pub/software/ or its mirrors at
the following sites:
at Beijing, China: ftp://info.bta.net.cn:/pub/software/;
at Shanghai, China: ftp://info.bta.net.cn:/pub/software/;
at Taiwan: ftp://nctuccca.edu.tw/pub/Chinese/ifcss/;
or ftp://ftp.edu.tw:/Chinese/ifcss/software/;
at Singapore: ftp://ftp.technet.sg:/pub/chinese/;
at California, U.S.A.: ftp://cnd.org/pub/software/.
The software packages in the next section are listed by name, followed
by the current version number, release date (in parentheses), and the
author(s) of the software. A brief description of the functionality
of the software starts on the line immediately after the headline and
is led by the character string "--". Two consecutive packages are
separated by a blank line.
zwdos (V2.2, March 5, 1993) by Wei Ya-Gui
-- MS-DOS kernel extension that gives DOS text mode programs the
ability to enter, display, manipulate and print 'zW' and HZ
Chinese text. Small memory requirement. Supports EGA,
VGA or Hercules Monographic displays.
HZ (V2.0, Feb. 7, 1995) by Fung F. Lee
-- Conversion from HZ to GB, GB to HZ, and zW to HZ respectively.
Versions for PC, Mac and Unix exist.
XingXing (V4.2, Mar. 29, 1995) by Wang Xiangdong
-- Chinese word processor for PC.
NJStar (V3.00, Feb. 10, 1994) by Hongbo Ni
-- GB Word Processor (Viewer, editor, printing, converter)
Supports EGA/(mono)VGA/SuperVGA monitors, and various
printers, Chinese<->English dictionary lookup, HanziInfo
and glossary; Includes more than 20 Chinese input methods
with Intelligent LianXiang and fuzzy Pinyin; Speed up with
sentence based Pinyin; Reads and writes GB,Hz,zW & Big5 files;
DOS Shell; Configurable.
QuickStar (V3.0, June 7, 1995) by Anthony Mai
-- Compact size Chinese edit software for PC. PinYin, CiZu,
WuBi, GuoBiao, ASCII etc input method. Translate to/from GB,
HZ and Big5 coded Chinese files.
cnprint (V2.6, Jan. 25, 1995) by Yidao Cai
-- print GB/Hz/BIG5/JIS/KSC/UTF8 etc or convert to PostScript
(conforms to EPSF-3.0). Both DOS and UNIX version available.
dm24 (V2.0, Sept. 1993) by Gongquan Chen
-- Chinese GB/HZ printing program for EPSON 24pin printer
HXLASER (V2.6, Feb. 1994) by Chen, Gongquan
-- A GB/HZ/BIG5 file printing program for HP LaserJet plus and
later model printers.
CNVIEW (V3.0, Jan. 1, 1995) by Jifang Lin
-- View GB/Hz/Big5 encoded Chinese text file on IBM-PC
& compatibles
ZWLIST (V1.1, Nov. 24, 1993) by Gongquan Chen
-- Chinese HZ/GB/BIG5 File Browser for ZWDOS
zwTool (V1.0, Oct. 30,1993) by Gongquan Chen
-- a MSDOS TSR program for input of Chinese characters in text
mode; Developed primarily for Chinese programmers using IDE
(Integrated Development Environment, like Borland's Turbo
languages); Supports GB/HZ; EGA/VGA required;
DateStar (V1.1) by Youzhen Cheng
-- Chinese Calendar Producer. Displays Chinese and western
calendar in ASCII code, BIG-5 code, GuoBiao code (PRC
Standard), and HZ code (Network)
MacViewHZ (V2.21 Dec. 93) by Xiaodong Chen
-- Display and print GB/HZ or BIG5 coded Chinese text files on
Macintosh without Chinese OS system, with easy to use Mac
user interface including multiple windows and simple editing
features such as delete, copy, cut and paste.
MacHZTerm (V0.52) by Xin Xu
-- a communication program using CommToolBox, capable of
displaying GB, HZ, Big5 texts on line. No Chinese OS required.
System 7 recommended.
HanziTerm (V0.5) by Ricky Yeung
-- A terminal emulator for Mac Chinese OS 6.0.x or later.
Support 8-bit character code, HZ, and zW.
Tex-Edit-HZ (V1.0, Dec. 18 1993) by Tom Bender and Tie Zeng.
-- A MAC WorldScript savvy Text editor with HZ<->GB conversion
feature.
MacBlue Telnet (V2.6.6, Feb 16, 1995) by MacBlue
-- A Telnet program that can handle all Chinese encodings
(such as HZ, GB, Big5, ET etc), EUC-JIS and EUC-KSC; based on
NCSA Telnet with built-in hanzi input methods.
rnMac (V1.3b5) by Roy Wood
-- Offline Newsreader including GB <-> HZ conversion
Weiqi267 (V2.67) by Xiangbo Kang
-- record Weiqi games and transfer them through net.
GB, HZ 100 % compatible (but Russian char disabled).
There is a user guide in HZ coding.
* Now can also be used for Chinese Chess.
TwinBridge (V3.2, Nov. 16, 1994) by Twinbridge Software Corporation
-- an interface between Windows and applications, it allows
Chinese character processing in Windows applications like
Word for Windows, Ami Pro, Excel, etc.
You can edit Chinese characters like English characters
in most applications.
WinHZ (V1.1, April 13, 1995) by Tian Bogang
-- HZ extension for Chinese systems for Windows
HZcomm (V1.5, Nov. 14, 1993) by Nick Ke Ning.
-- HZ coding supported communication program under Chinese
Windows System (GB internal coded). Good for reading/writing
HZ coded E-mail and news (alt.chinese.text) on line in
Windows 3.1 for PCs.
SimpTerm (V0.8.0) by Jianqing Hu
-- A Chinese communication program for MS-Windows 3.1
with built-in support for BIG5, HZ and GB encoded text.
ChPad (V1.31) by Tian Bogang
-- GUO BIAO and HZ file browser for MS WINDOWS 3.1
SilkRoad (V1.0) by Antony C. Hu
-- GB/HZ Viewer for MS-Windows 3.1
gnus-chinese (V1.0, Apr. 26 1994) by Ning Mosberger-Tang
-- convert HZ articles to the code understandable by your
terminal automatically in GNUS newsreader (for GNU EMACS).
Requires a conversion program (e.g. hz2gb and gb2hz) to do the
actual conversion.
irchat (V2.4jp4cn0) by HIROSE Tutomu
-- irc client e-lisp program on Mule
patched to handle HZ and Big5
now we can read/write all JIS/HZ/Big5 simultaneously on irc
hztty (V2.0 Jan 29, 1994) by Yongguang Zhang
-- This program turns a tty session from one encoding to another.
For example, running hztty on cxterm can allow you to
read/write Chinese in HZ format.
BeTTY/CCF/B5Encode package (V1.534, 1995.03.22) by Jing-Shin Chang
-- a Chinese code conversion package for codes widely used
in Taiwan and the GB code widely used in Mainland, plus
a 7-bit Big5 encoding method (B5Encode3/B5E3, an extension
to HZ encoding for GB),
including off-line converters (CCF/Chinese Code Filters and
B5E/B5Encode) and an on-line converter (BeTTY) which makes
your native Chinese terminal aware of the coding
systems widely used in Taiwan and of GB and HZ encoding.
gb2jis & jis2gb (V1.5, 1995.5.11) by Koichi Yasuoka
-- convert GB (or HZ) to/from JIS with two-letter pinyin
gb2ps (V2.02) by Wei SUN
-- convert GB/HZ to PostScript, supports simple page formatting
(change Chinese fonts and font size, cover page, page
number, etc). Five Chinese fonts are provided in this
release: Song, Kai, Fang Song, Hei and FanTi.
The HZ encoding is also supported.
ChiRK (V1.2a) by Bo Yang
-- GB/HZ/BIG5 text viewer on terminals (or emulations) capable
of displaying Tektronix 401x graphics, such as GraphOn, DEC
VT240/330, Xterm, Tektool on Sun, EM4105 on PC,
VersaTerm-Pro on Mac, etc.
Multi-Localization Enhancement of NCSA Mosaic X 2.4 (V2.4.0)
by TAKADA, Toshihiro
-- a patch to make use of various nat'l character sets in NCSA
Mosaic for X 2.4. You can switch between char-sets in one
Mosaic. Support ISO 8859-X, KOI-8, GB, HZ, BIG5, KSC & JIS.