Network Working Group          Vietnamese Standardization Working Group
Request for Comments: 1456                                     May 1993


            Conventions for Encoding the Vietnamese Language
      VISCII: VIetnamese Standard Code for Information Interchange
             VIQR: VIetnamese Quoted-Readable Specification
                              Revision 1.1

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard.  Distribution of this memo is
   unlimited.

Abstract

   This document provides information to the Internet community on the
   currently used conventions for encoding Vietnamese characters into
   7-bit US ASCII and in an 8-bit form.  These conventions are widely
   used by the overseas Vietnamese who are on the Internet and are
   active in USENET.  This document only provides information and
   specifies no level of standard.

1. Introduction

   In this paper we describe two conventions for representing Vietnamese
   characters.  VISCII (pronounced "visky") is an 8-bit character
   encoding that is similar to that used with ISO-8859.  VIQR
   (pronounced "vicker") is a mnemonic encoding of Vietnamese characters
   into US ASCII for use on 7-bit systems.  A substantial body of
   freely distributable software implementing these conventions for
   UNIX and personal computers is already available online.  These
   encodings enable
   Vietnamese-language users to take full advantage of powerful tools
   already developed for the English-speaking world, eliminating
   unnecessary reinvention.  This paper describes these conventions in
   part so that MIME-compliant software might also support the
   Vietnamese language.

   NOTE: The accented Vietnamese letters are herein represented by their
   VIQR equivalents, offset by enclosing angle brackets.  For example,
   the single letter "a acute" is written as <a'>, where the apostrophe
   is the mnemonic symbol for the acute.

2. LINGUISTIC OVERVIEW

   As a romanized language, Vietnamese appears to lend itself readily to
   integration into existing English-based systems.  To cite a simple



Vietnamese Standardization Working Group                        [Page 1]
^L
RFC 1456          Conventions for Encoding Vietnamese           May 1993


   example, consider implementing support for French in such systems.
   One can allocate code positions in the 8-bit space necessary for
   accented letters such as <e^> or <e'>, then provide a means for users
   to access these codes through the keyboard.  The required number of
   "extra" code positions is small (see, e.g., ISO-8859/Latin-1 [1]),
   and the relatively low frequency of occurrence of accented letters
   does not place heavy demand on efficient keyboard input schemes.  The
   same cannot be said for Vietnamese, where both the number and the
   frequency of occurrence of accented letters are large.  Apart from the
   alphabetics already available in ASCII, Vietnamese requires an
   additional 134 combinations of a letter and diacritical symbols.
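
   The figure of 134 can be checked by enumeration.  The sketch below
   assumes the usual inventory of 12 vowel forms, 5 tone marks plus the
   unmarked form, and the letter <dd>; this breakdown is not spelled
   out in this memo, and the variable names are illustrative only.

```python
# Vowel forms in VIQR notation: plain vowels plus breve, circumflex, and horn
# variants (an assumed inventory, not quoted from the memo).
BASES = ["a", "a(", "a^", "e", "e^", "i", "o", "o^", "o+", "u", "u+", "y"]

# The unmarked form plus the five tone marks: acute, grave, hook above,
# tilde, dot below.
TONES = ["", "'", "`", "?", "~", "."]

# Every lower-case vowel form combined with every tone.
lowercase = [base + tone for base in BASES for tone in TONES]

# Plain vowels that ASCII already provides.
ascii_plain = {"a", "e", "i", "o", "u", "y"}

# Letters that need a code position outside ASCII, plus d bar.
extra_lower = [letter for letter in lowercase if letter not in ascii_plain]
extra_lower.append("dd")

# Each lower-case letter has an upper-case counterpart.
total = 2 * len(extra_lower)
print(len(lowercase), len(extra_lower), total)
```

   Under these assumptions the count is 12 x 6 = 72 vowel forms, less
   the 6 already in ASCII, plus <dd>, giving 67 letters per case.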

   Note that one can resort to a composite encoding scheme to reduce
   this requirement, but that would mean giving up on integration into
   today's computing platforms which for the most part do not support
   such schemes.  In addition, the heavy use of diacritical marks in
   Vietnamese text calls for a keyboard input scheme that does not
   require extra keystrokes such as a special "compose" key to generate
   accented letters.  Because of the large number of possible
   combinations, the scheme should also be easily learned and memorized.

   Finally, to integrate Vietnamese into current electronic mail systems
   which are still limited to 7 bits, there should be a representation
   for Vietnamese text that is readily readable in its 7-bit form.

   The Viet-Std group, an electronic standardization roundtable, has
   worked over the past few years to draft proposals addressing these
   issues.  This has culminated in the conventions to be described
   briefly in the next two sections.  The detailed technical
   considerations have been reported elsewhere [2].  In this memo we
   give a brief outline of the working standards and describe supporting
   software availability.

3. SPECIFICATION OF VISCII

   VISCII stands for VIetnamese Standard Code for Information
   Interchange, an 8-bit encoding specification.  Its salient features
   are:

    1.  Encoding of all Vietnamese letters as single units
        rather than separating base vowels and diacritical
        marks.

    2.  Retention of the complete ASCII graphics repertoire
        in order to facilitate integration.

    3.  Encoding of the 6 least-often-used upper-case letters in the
        6 least problematic C0 (control) code positions.

    4.  Character placement has been designed with
        consideration for Unix/X integration, ISO-8859/Latin-1
        compatibility, coexistence with a wide array of
        existing software, including provisions for single-
        and double-line drawing characters in the IBM graphic
        character set.

   The 8-bit VISCII encoding is shown below.  Because of the limitations
   of the 7-bit US ASCII character set, here we use the mnemonic form to
   represent Vietnamese glyphs.  See the VIQR specification below for
   clarification of how diacritical marks are applied.  The online
   PostScript version of reference [2] may also be useful as it does
   display each character correctly.

               Table 1.  VISCII 8-bit Encoding Table (v1.1)
*=======================================================================*
|    | 0x  1x  2x  3x  4x  5x  6x  7x | 8x  9x  Ax  Bx  Cx  Dx  Ex  Fx  |
|====|==================================================================|
| x0 | nul dle sp  0   @   P   `   p  | A.  O^` O~  o^` A`  DD  a`  dd  |
| x1 | soh dc1 !   1   A   Q   a   q  | A(' O^? a(' o^? A'  u+' a'  u+. |
| x2 | A(? dc2 "   2   B   R   b   r  | A(` O^~ a(` o^~ A^  O`  a^  o`  |
| x3 | etx dc3 #   3   C   S   c   s  | A(. O^. a(. O+~ A~  O'  a~  o'  |
| x4 | eot Y?  $   4   D   T   d   t  | A^' O+. a^' O+  A?  O^  a?  o^  |
| x5 | A(~ nak %   5   E   U   e   u  | A^` O+' a^` o^. A(  a.  a(  o~  |
| x6 | A^~ syn &   6   F   V   f   v  | A^? O+` a^? o+` a(? y?  u+~ o?  |
| x7 | bel etb '   7   G   W   g   w  | A^. O+? a^. o+? a(~ u+` a^~ o.  |
| x8 | bs  can (   8   H   X   h   x  | E~  I.  e~  i.  E`  u+? e`  u.  |
| x9 | ht  Y~  )   9   I   Y   i   y  | E.  O?  e.  U+. E'  U`  e'  u`  |
| xA | lf  sub *   :   J   Z   j   z  | E^' O.  e^' U+' E^  U'  e^  u'  |
| xB | vt  esc +   ;   K   [   k   {  | E^` I?  e^` U+` E?  y~  e?  u~  |
| xC | ff  fs  ,   <   L   \   l   |  | E^? U?  e^? U+? I`  y.  i`  u?  |
| xD | cr  gs  -   =   M   ]   m   }  | E^~ U~  e^~ o+  I'  Y'  i'  y'  |
| xE | so  Y.  .   >   N   ^   n   ~  | E^. U.  e^. o+' I~  o+~ i~  o+. |
| xF | si  us  /   ?   O   _   o   DEL| O^' Y`  o^' U+  y`  u+  i?  U+~ |
*=======================================================================*
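
   As an illustration, Table 1 can be turned into a decoding table
   mechanically.  The Python sketch below is not part of the VISCII
   specification.  The mapping from VIQR marks to Unicode combining
   characters, the use of NFC normalization to obtain precomposed
   letters, and all names (COMBINING, HIGH_HALF, C0_LETTERS,
   decode_viscii) are assumptions made for this example.

```python
import unicodedata

# VIQR mark -> Unicode combining character (assumed mapping for this sketch).
COMBINING = {
    "(": "\u0306", "^": "\u0302", "+": "\u031b",
    "'": "\u0301", "`": "\u0300", "?": "\u0309", "~": "\u0303", ".": "\u0323",
}

# Columns 8x..Fx of Table 1, in code order 0x80..0xFF, as VIQR mnemonics.
HIGH_HALF = """
A.  A(' A(` A(. A^' A^` A^? A^. E~  E.  E^' E^` E^? E^~ E^. O^'
O^` O^? O^~ O^. O+. O+' O+` O+? I.  O?  O.  I?  U?  U~  U.  Y`
O~  a(' a(` a(. a^' a^` a^? a^. e~  e.  e^' e^` e^? e^~ e^. o^'
o^` o^? o^~ O+~ O+  o^. o+` o+? i.  U+. U+' U+` U+? o+  o+' U+
A`  A'  A^  A~  A?  A(  a(? a(~ E`  E'  E^  E?  I`  I'  I~  y`
DD  u+' O`  O'  O^  a.  y?  u+` u+? U`  U'  y~  y.  Y'  o+~ u+
a`  a'  a^  a~  a?  a(  u+~ a^~ e`  e'  e^  e?  i`  i'  i~  i?
dd  u+. o`  o'  o^  o~  o?  o.  u.  u`  u'  u~  u?  y'  o+. U+~
""".split()

# The six upper-case letters that Table 1 places in C0 control positions.
C0_LETTERS = {0x02: "A(?", 0x05: "A(~", 0x06: "A^~",
              0x14: "Y?", 0x19: "Y~", 0x1E: "Y."}


def mnemonic_to_char(mnemonic: str) -> str:
    """Turn one Table 1 mnemonic into a single precomposed Unicode letter."""
    if mnemonic == "DD":
        return "\u0110"
    if mnemonic == "dd":
        return "\u0111"
    marks = "".join(COMBINING[mark] for mark in mnemonic[1:])
    return unicodedata.normalize("NFC", mnemonic[0] + marks)


def build_table() -> dict:
    """Map every VISCII byte value to a Unicode character."""
    table = {code: chr(code) for code in range(0x80)}  # ASCII half
    for code, mnemonic in C0_LETTERS.items():
        table[code] = mnemonic_to_char(mnemonic)
    for offset, mnemonic in enumerate(HIGH_HALF):
        table[0x80 + offset] = mnemonic_to_char(mnemonic)
    return table


VISCII_TO_UNICODE = build_table()


def decode_viscii(data: bytes) -> str:
    """Decode VISCII-encoded bytes into a Unicode string."""
    return "".join(VISCII_TO_UNICODE[byte] for byte in data)


print(decode_viscii(b"N\xdf\xbec Vi\xaet Nam"))
```

   Building the table from the mnemonics keeps the example tied to
   Table 1 itself rather than to a separately typed list of Unicode
   code points.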

4. SPECIFICATION OF VIQR MNEMONICS

   VIQR, VIetnamese Quoted-Readable specification, is not an encoding
   convention but is rather a convention for typing, reading, and
   transferring Vietnamese data using only the 7-bit ASCII character
   set.  With VIQR, accented Vietnamese letters are represented by the
   vowel followed by ASCII characters whose appearances resemble those
   of the corresponding Vietnamese diacritical marks.  For example, the
   phrase "N<u+><o+'>c Vi<e^.>t Nam" is represented in 7 bits by
   "Nu+o+'c Vie^.t Nam".  The complete list of diacritical mark
   equivalents is given in Table 2.  There is also provision in the VIQR
   specification to prevent undesirable composition, for example, to
   avoid getting "How are you?" composed into "How are yo<u?>".  For
   details, please see [2].  VIQR therefore serves the following
   purposes:

  1.  It provides for a mnemonic, readable representation of
      Vietnamese in 7-bit form, which makes it easy to
      transfer Vietnamese electronic mail without special
      conversion.  The originator and recipient can
      communicate in Vietnamese without the need for an
      8-bit environment at any point in the data chain.

  2.  It provides a bridge for translation between 7- and 8-bit
      environments.  In this context, typing in both 7-bit
      and 8-bit systems requires exactly the same keystrokes;
      the only difference is that the 8-bit user gets to see
      actual Vietnamese on-screen, whereas the 7-bit user
      sees a mnemonic representation thereof.  The same
      options are available for the 7-bit and 8-bit recipients
      of Vietnamese text.

   Because of its mnemonic nature, the VIQR typing method is easy to
   learn and remember.  In pure 8-bit environments, special-purpose
   software developers may wish to devise more efficient input schemes,
   but the intent is for all Vietnamese keyboard software to support the
   basic VIQR method to minimize learning time for Vietnamese who will
   already be familiar with the mnemonic method described here.

             Table 2.  VIQR Mnemonics for Vietnamese Diacritics
          *=====================================================*
          | Diacritic   | Char |  ASCII Code        | D<a^'>u   |
          |=====================================================|
          | breve       |  (   |  0x28, left paren  | tr<a(>ng  |
          | circumflex  |  ^   |  0x5E, caret       | m<u~>     |
          | horn        |  +   |  0x2B, plus sign   | m<o'>c    |
          |-------------+------+--------------------+-----------|
          | acute       |  '   |  0x27, apostrophe  | s<a('>c   |
          | grave       |  `   |  0x60, backquote   | huy<e^`>n |
          | hook above  |  ?   |  0x3F, question    | h<o?>i    |
          | tilde       |  ~   |  0x7E, tilde       | ng<a~>    |
          | dot below   |  .   |  0x2E, period      | n<a(.>ng  |
          |-------------+------+--------------------+-----------|
          | d bar       |  dd  |  (repeated d)      | <dd>      |
          | D bar       |  DD  |  (repeated D)      | <DD>      |
          *=====================================================*
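
   The mnemonics of Table 2 can be applied mechanically.  The Python
   sketch below is a minimal illustration, not the VIQR specification:
   the regular expression, the mapping to Unicode combining characters,
   and the names VIQR_LETTER and viqr_to_unicode are assumptions made
   for this example.  It composes greedily and deliberately omits the
   provision against undesirable composition, whose details are given
   in [2], so it reproduces the "How are you?" problem noted above.

```python
import re
import unicodedata

# VIQR mark -> Unicode combining character (assumed mapping for this sketch).
COMBINING = {
    "(": "\u0306", "^": "\u0302", "+": "\u031b",
    "'": "\u0301", "`": "\u0300", "?": "\u0309", "~": "\u0303", ".": "\u0323",
}

# A d bar, or a vowel with an optional breve/circumflex/horn that is valid
# for that vowel, followed by an optional tone mark.
VIQR_LETTER = re.compile(
    r"DD|dd|(?:[Aa][(^]?|[Ee]\^?|[Oo][+^]?|[Uu]\+?|[IiYy])['`?~.]?"
)


def _compose(match: re.Match) -> str:
    """Replace one VIQR token with its precomposed Unicode letter."""
    token = match.group(0)
    if token == "DD":
        return "\u0110"
    if token == "dd":
        return "\u0111"
    marks = "".join(COMBINING[mark] for mark in token[1:])
    return unicodedata.normalize("NFC", token[0] + marks)


def viqr_to_unicode(text: str) -> str:
    """Greedily compose VIQR mnemonics; no escape handling is attempted."""
    return VIQR_LETTER.sub(_compose, text)


print(viqr_to_unicode("Nu+o+'c Vie^.t Nam"))
print(viqr_to_unicode("How are you?"))  # the final "u?" is composed as well
```

   A practical converter would also need the anti-composition provision
   and some way to tell Vietnamese text from other text; both are
   outside the scope of this sketch.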

5. SUPPORTING SOFTWARE

   VISCII & VIQR have been successfully implemented on various
   platforms.  The work has been carried out primarily by the TriChlor
   software group, a non-profit spin-off from Viet-Std.  Software by
   other individuals and groups has also been developed.  In addition,
   commercial software entities have indicated that they would support
   the standards in the form of VISCII-compliant keyboards and fonts.

   The current software selection from the TriChlor group enables users
   to use Vietnamese on existing Unix, MS-DOS, and Windows systems,
   including such operations as Vietnamese file naming, Vietnamese
   keyboarding within any application, electronic mail and news filters
   for Unix, printing to various printer languages, incorporating
   Vietnamese in such document preparation systems as TeX, Word for
   Windows, and WordPerfect, and using Vietnamese in databases (e.g.,
   Paradox) and spreadsheets (e.g., SC on Unix or Excel in Windows).
   Vietnamese-specific applications are also available and include a
   large song lyric database, several poetry collections in hypertext
   format, a Windows-based fortune teller, a text-based multiple-choice
   test program in Vietnamese, etc.  In short, software exists that
   supports thorough integration of Vietnamese into existing platforms,
   allowing Vietnamese users to take advantage of all the powerful tools
   already available in English-only environments.

   Translation between 8-bit VISCII 1.1 and other character sets,
   particularly ISO-10646/Unicode 1.1, has been included in the Plan 9
   operating system's tcs utility, made available by Andrew Hume of
   AT&T Bell Laboratories.

6. MIME CONSIDERATIONS

   For use with MIME-compliant software, the value "VISCII" has been
   registered as a charset with the Internet Assigned Numbers Authority
   for the VISCII encoding convention described above, and the value
   "VIQR" has been registered with the Internet Assigned Numbers
   Authority as a charset for the VIQR mnemonic encoding convention
   described above.  Implementation of support for these two MIME
   character set types is not mandatory to comply with RFC-1341.  If the
   encoding conventions described above are used in MIME email or news,
   the appropriate MIME character set type value should be used to label
   the body-part containing such text.
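
   As a hedged illustration of such labeling, the Python sketch below
   builds a single-part message whose body is VIQR text and marks it
   with the "VIQR" charset value.  It uses the standard library's
   email.message.Message class; the function name make_viqr_message and
   the choice of a "7bit" transfer encoding are assumptions of this
   example, not requirements stated in this memo.

```python
from email.message import Message


def make_viqr_message(body: str) -> Message:
    """Build a text/plain message labeled with the VIQR charset."""
    # VIQR text is plain 7-bit ASCII; this raises UnicodeEncodeError otherwise.
    body.encode("ascii")

    msg = Message()
    msg["MIME-Version"] = "1.0"
    # Label the body part with the registered charset value.
    msg.add_header("Content-Type", "text/plain", charset="VIQR")
    msg["Content-Transfer-Encoding"] = "7bit"
    msg.set_payload(body)
    return msg


message = make_viqr_message("Nu+o+'c Vie^.t Nam\n")
print(message.as_string())
```

   The header is set by hand because a general-purpose mail library
   cannot be assumed to know a codec named "VIQR"; the body is passed
   through unchanged as ASCII.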

7. SECURITY CONSIDERATIONS

   Security issues are not discussed in this memo.

REFERENCES

     [1] International Organization for Standardization. ISO 8859/x:
         8-bit International Code Sets.  ISO, 1977.

     [2] Viet-Std, "A Unified Framework for Vietnamese Information
         Processing-v1.1," published on the Internet, available for FTP
         from Sonygate.Sony.COM:tin/viet-std, September 1992.

AUTHORS' ADDRESSES

   Cuong T. Nguyen
   Center for Integrated Systems
   CIS 062--MC 4070
   Stanford, CA 94305-4070

   Phone: (415) 725-3721
   Email: cuong@haydn.Stanford.EDU


   Hoc D. Ngo
   Vista Research, Inc.
   100 View St, Suite 200
   P.O. Box 998
   Mountain View, CA 94042

   Phone: (415) 966-1171
   Email:  uunet!vri280!hoc


   Cuong M. Bui
   National Semiconductor Corp.
   3388 Burgundy Dr.
   San Jose, CA 95132

   Phone: (408) 721-6873
   Email: bui@berlioz.nsc.com


   Thanh van Nguyen
   Roche Image Analysis Systems
   95 First Str Suite 110
   Los Altos, CA 94022

   Phone: 415-917-2022
   Fax:   415-917-2025
   Email: thanh@rias.com

   For more information, please contact the authors at:
   viet-std@haydn.stanford.edu