diff --git a/doc/rfc/rfc1456.txt b/doc/rfc/rfc1456.txt
new file mode 100644
index 0000000..fd02610
--- /dev/null
+++ b/doc/rfc/rfc1456.txt
@@ -0,0 +1,395 @@
+
+
+
+
+
+
+Network Working Group           Vietnamese Standardization Working Group
+Request for Comments: 1456                                      May 1993
+
+
+ Conventions for Encoding the Vietnamese Language
+ VISCII: VIetnamese Standard Code for Information Interchange
+ VIQR: VIetnamese Quoted-Readable Specification
+ Revision 1.1
+
+Status of this Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard. Distribution of this memo is
+ unlimited.
+
+Abstract
+
+ This document provides information to the Internet community on the
+ currently used conventions for encoding Vietnamese characters into
+ 7-bit US ASCII and in an 8-bit form. These conventions are widely
+ used by the overseas Vietnamese who are on the Internet and are
+ active in USENET. This document only provides information and
+ specifies no level of standard.
+
+1. INTRODUCTION
+
+ In this paper we describe two conventions for representing Vietnamese
+ characters. VISCII (pronounced "visky") is an 8-bit character
+ encoding that is similar to that used with ISO-8859. VIQR
+ (pronounced "vicker") is a mnemonic encoding of Vietnamese characters
+ into US ASCII for use on 7-bit systems. There is substantial
+ existing online freely distributable software that implements these
+ conventions for UNIX and personal computers. These encodings enable
+ Vietnamese-language users to take full advantage of powerful tools
+ already developed for the English-speaking world, eliminating
+ unnecessary reinvention. This paper describes these conventions in
+ part so that MIME-compliant software might also support the
+ Vietnamese language.
+
+ NOTE: The accented Vietnamese letters are herein represented by their
+ VIQR equivalents, offset by enclosing angle brackets. For example,
+ the single letter "a acute" is written as <a'>, where the apostrophe
+ is the mnemonic symbol for the acute.
+
+2. LINGUISTIC OVERVIEW
+
+ As a romanized language, Vietnamese appears to lend itself readily to
+ integration into existing English-based systems. To cite a simple
+
+
+
+Vietnamese Standardization Working Group [Page 1]
+
+RFC 1456 Conventions for Encoding Vietnamese May 1993
+
+
+ example, consider implementing support for French in such systems.
+ One can allocate code positions in the 8-bit space necessary for
+ accented letters such as <e^> or <e'>, then provide a means for users
+ to access these codes through the keyboard. The required number of
+ "extra" code positions is small (see, e.g., ISO-8859/Latin-1 [1]),
+ and the relatively low frequency of occurrence of accented letters
+ does not place heavy demand on efficient keyboard input schemes. The
+ same cannot be said for Vietnamese, where both the number and
+ occurrence frequency of accented letters are large. Apart from the
+ alphabetics already available in ASCII, Vietnamese requires an
+ additional 134 combinations of a letter and diacritical symbols.
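
The figure of 134 can be checked by enumeration. The sketch below assumes the usual inventory of 12 Vietnamese vowels, each taking either no tone mark or one of five tone marks, plus the barred d, in both cases; the names VOWELS, TONES, and accented_letters are illustrative and do not come from the memo.

```python
# Twelve Vietnamese vowels in VIQR notation (base letter plus optional modifier).
VOWELS = ["a", "a(", "a^", "e", "e^", "i", "o", "o^", "o+", "u", "u+", "y"]

# No tone mark, then acute, grave, hook above, tilde, dot below.
TONES = ["", "'", "`", "?", "~", "."]


def accented_letters():
    """Return every letter that is not already a plain ASCII letter."""
    lower = [vowel + tone for vowel in VOWELS for tone in TONES]
    # Drop the six plain vowels a, e, i, o, u, y, which ASCII already has.
    lower = [letter for letter in lower if len(letter) > 1]
    lower.append("dd")  # barred d
    upper = [letter.upper() for letter in lower]
    return lower + upper


if __name__ == "__main__":
    print(len(accented_letters()))
```

Per case this gives 12 x 6 = 72 vowel forms, less the 6 already in ASCII, plus the barred d, for 67; doubling for upper case gives 134.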
+
+ Note that one can resort to a composite encoding scheme to reduce
+ this requirement, but that would mean giving up on integration into
+ today's computing platforms which for the most part do not support
+ such schemes. In addition, the heavy use of diacritical marks in
+ Vietnamese text calls for a keyboard input scheme that does not
+ require extra keystrokes such as a special "compose" key to generate
+ accented letters. Because of the large number of possible
+ combinations, the scheme should also be easily learned and memorized.
+
+ Finally, to integrate Vietnamese into current electronic mail systems
+ which are still limited to 7 bits, there should be a representation
+ for Vietnamese text that is readily readable in its 7-bit form.
+
+ The Viet-Std group, an electronic standardization roundtable, has
+ worked over the past few years to draft proposals addressing these
+ issues. This has culminated in the conventions to be described
+ briefly in the next two sections. The detailed technical
+ considerations have been reported elsewhere [2]. In this memo we
+ give a brief outline of the working standards and describe supporting
+ software availability.
+
+3. SPECIFICATION OF VISCII
+
+ VISCII stands for VIetnamese Standard Code for Information
+ Interchange, an 8-bit encoding specification. Its salient features
+ are:
+
+ 1. Encoding of all Vietnamese letters as single units
+ rather than separating base vowels and diacritical
+ marks.
+
+ 2. Retention of the complete ASCII graphics repertoire
+ in order to facilitate integration.
+
+ 3. Encoding of the 6 least-often-used upper-case letters
+ in the 6 least problematic C0 (control) positions.
+
+ 4. Character placement has been designed with
+ consideration for Unix/X integration, ISO-8859/Latin-1
+ compatibility, coexistence with a wide array of
+ existing software, including provisions for single-
+ and double-line drawing characters in the IBM graphic
+ character set.
+
+ The 8-bit VISCII encoding is shown below. Because of the limitations
+ of the 7-bit US ASCII character set, here we use the mnemonic form to
+ represent Vietnamese glyphs. See the VIQR specification below for
+ clarification of how diacritical marks are applied. The online
+ PostScript version of reference [2] may also be useful as it does
+ display each character correctly.
+
+               Table 1. VISCII 8-bit Encoding Table (v1.1)
+*=======================================================================*
+|    | 0x  1x  2x  3x  4x  5x  6x  7x | 8x  9x  Ax  Bx  Cx  Dx  Ex  Fx  |
+|====|==================================================================|
+| x0 | nul dle sp  0   @   P   `   p  | A.  O^` O~  o^` A`  DD  a`  dd  |
+| x1 | soh dc1 !   1   A   Q   a   q  | A(' O^? a(' o^? A'  u+' a'  u+. |
+| x2 | A(? dc2 "   2   B   R   b   r  | A(` O^~ a(` o^~ A^  O`  a^  o`  |
+| x3 | etx dc3 #   3   C   S   c   s  | A(. O^. a(. O+~ A~  O'  a~  o'  |
+| x4 | eot Y?  $   4   D   T   d   t  | A^' O+. a^' O+  A?  O^  a?  o^  |
+| x5 | A(~ nak %   5   E   U   e   u  | A^` O+' a^` o^. A(  a.  a(  o~  |
+| x6 | A^~ syn &   6   F   V   f   v  | A^? O+` a^? o+` a(? y?  u+~ o?  |
+| x7 | bel etb '   7   G   W   g   w  | A^. O+? a^. o+? a(~ u+` a^~ o.  |
+| x8 | bs  can (   8   H   X   h   x  | E~  I.  e~  i.  E`  u+? e`  u.  |
+| x9 | ht  Y~  )   9   I   Y   i   y  | E.  O?  e.  U+. E'  U`  e'  u`  |
+| xA | lf  sub *   :   J   Z   j   z  | E^' O.  e^' U+' E^  U'  e^  u'  |
+| xB | vt  esc +   ;   K   [   k   {  | E^` I?  e^` U+` E?  y~  e?  u~  |
+| xC | ff  fs  ,   <   L   \   l   |  | E^? U?  e^? U+? I`  y.  i`  u?  |
+| xD | cr  gs  -   =   M   ]   m   }  | E^~ U~  e^~ o+  I'  Y'  i'  y'  |
+| xE | so  Y.  .   >   N   ^   n   ~  | E^. U.  e^. o+' I~  o+~ i~  o+. |
+| xF | si  us  /   ?   O   _   o   DEL| O^' Y`  o^' U+  y`  u+  i?  U+~ |
+*=======================================================================*
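
As an illustration of how Table 1 is read, the following sketch maps VISCII bytes to the mnemonic notation used in the table. It is a minimal example rather than a reference implementation: the names HIGH_ROWS, LOW_SPECIAL, and viscii_to_mnemonic are made up for this sketch, and the output is the table's mnemonic form only, with no handling of the VIQR provisions for literal punctuation.

```python
# High half of Table 1: one string per row x0..xF, columns 8x..Fx.
HIGH_ROWS = [
    "A. O^` O~ o^` A` DD a` dd",
    "A(' O^? a(' o^? A' u+' a' u+.",
    "A(` O^~ a(` o^~ A^ O` a^ o`",
    "A(. O^. a(. O+~ A~ O' a~ o'",
    "A^' O+. a^' O+ A? O^ a? o^",
    "A^` O+' a^` o^. A( a. a( o~",
    "A^? O+` a^? o+` a(? y? u+~ o?",
    "A^. O+? a^. o+? a(~ u+` a^~ o.",
    "E~ I. e~ i. E` u+? e` u.",
    "E. O? e. U+. E' U` e' u`",
    "E^' O. e^' U+' E^ U' e^ u'",
    "E^` I? e^` U+` E? y~ e? u~",
    "E^? U? e^? U+? I` y. i` u?",
    "E^~ U~ e^~ o+ I' Y' i' y'",
    "E^. U. e^. o+' I~ o+~ i~ o+.",
    "O^' Y` o^' U+ y` u+ i? U+~",
]
HIGH = [row.split() for row in HIGH_ROWS]

# The six upper-case letters placed in C0 (control) positions.
LOW_SPECIAL = {
    0x02: "A(?", 0x05: "A(~", 0x06: "A^~",
    0x14: "Y?", 0x19: "Y~", 0x1E: "Y.",
}


def viscii_to_mnemonic(data: bytes) -> str:
    """Render VISCII bytes in the mnemonic notation of Table 1."""
    out = []
    for byte in data:
        if byte in LOW_SPECIAL:
            out.append(LOW_SPECIAL[byte])
        elif byte < 0x80:
            out.append(chr(byte))  # ordinary ASCII is unchanged
        else:
            row = byte & 0x0F        # low nibble selects the table row
            col = (byte >> 4) - 8    # high nibble selects column 8x..Fx
            out.append(HIGH[row][col])
    return "".join(out)


if __name__ == "__main__":
    print(viscii_to_mnemonic(b"N\xdf\xbec Vi\xaet Nam"))
```

Storing the rows as strings keeps the code visually comparable with the table, so a transcription slip is easy to spot.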
+
+4. SPECIFICATION OF VIQR MNEMONICS
+
+ VIQR, VIetnamese Quoted-Readable specification, is not an encoding
+ convention but is rather a convention for typing, reading, and
+ transferring Vietnamese data using only the 7-bit ASCII character
+ set. With VIQR, accented Vietnamese letters are represented by the
+ vowel followed by ASCII characters whose appearances resemble those
+ of the corresponding Vietnamese diacritical marks. For example, the
+ phrase "N<u+><o+'>c Vi<e^.>t Nam" is represented in 7 bits by
+ "Nu+o+'c Vie^.t Nam". The complete list of diacritical mark
+ equivalents is given in Table 2. There is also provision in the VIQR
+ specification to prevent undesirable composition, for example, to
+ avoid getting "How are you?" composed into "How are yo<u?>". For
+ details, please see [2]. VIQR therefore serves the following
+ purposes:
+
+ 1. It provides for a mnemonic, readable representation of
+ Vietnamese in 7-bit form, which makes it easy to
+ transfer Vietnamese electronic mail without special
+ conversion. The originator and recipient can
+ communicate in Vietnamese without the need for an
+ 8-bit environment at any point in the data chain.
+
+ 2. It provides a bridge for translation between 7- and 8-bit
+ environments. In this context, typing in both 7-bit
+ and 8-bit systems requires exactly the same keystrokes;
+ the only difference is that the 8-bit user gets to see
+ actual Vietnamese on-screen, whereas the 7-bit user
+ sees a mnemonic representation thereof. The same
+ options are available for the 7-bit and 8-bit recipients
+ of Vietnamese text.
+
+ Because of its mnemonic nature, the VIQR typing method is easy to
+ learn and remember. In pure 8-bit environments, special-purpose
+ software developers may wish to devise more efficient input schemes,
+ but the intent is for all Vietnamese keyboard software to support the
+ basic VIQR method to minimize learning time for Vietnamese who will
+ already be familiar with the mnemonic method described here.
+
+    Table 2. VIQR Mnemonics for Vietnamese Diacritics
+ *=====================================================*
+ | Diacritic   | Char | ASCII Code         | D<a^'>u   |
+ |=====================================================|
+ | breve       | (    | 0x28, left paren   | tr<a(>ng  |
+ | circumflex  | ^    | 0x5E, caret        | m<u~>     |
+ | horn        | +    | 0x2B, plus sign    | m<o'>c    |
+ |-------------+------+--------------------+-----------|
+ | acute       | '    | 0x27, apostrophe   | s<a('>c   |
+ | grave       | `    | 0x60, backquote    | huy<e^`>n |
+ | hook above  | ?    | 0x3F, question     | h<o?>i    |
+ | tilde       | ~    | 0x7E, tilde        | ng<a~>    |
+ | dot below   | .    | 0x2E, period       | n<a(.>ng  |
+ |-------------+------+--------------------+-----------|
+ | d bar       | dd   | (repeated d)       | <dd>      |
+ | D bar       | DD   | (repeated D)       | <DD>      |
+ *=====================================================*
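
The mnemonics in Table 2 can be turned into displayable text with a short routine. The sketch below is one possible approach, not the method of the memo: it maps each mnemonic to a Unicode combining mark and lets NFC normalization build the precomposed letter. The names MODIFIERS, TONES, ALLOWED, and viqr_to_unicode are assumptions of this example, and the provisions for preventing unwanted composition (see [2]) are deliberately left out.

```python
import unicodedata

# Vowel modifiers and tone marks from Table 2, as Unicode combining marks.
MODIFIERS = {"(": "\u0306", "^": "\u0302", "+": "\u031b"}
TONES = {"'": "\u0301", "`": "\u0300", "?": "\u0309",
         "~": "\u0303", ".": "\u0323"}

# Which modifier characters each base vowel accepts.
ALLOWED = {"a": "(^", "e": "^", "o": "^+", "u": "+", "i": "", "y": ""}


def viqr_to_unicode(text: str) -> str:
    """Naively compose VIQR mnemonics; no escape handling."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if text[i:i + 2] in ("dd", "DD"):
            out.append("\u0111" if ch == "d" else "\u0110")
            i += 2
            continue
        out.append(ch)
        i += 1
        allowed = ALLOWED.get(ch.lower())
        if allowed is None:
            continue  # not a vowel, so nothing can attach to it
        if i < n and text[i] in allowed:
            out.append(MODIFIERS[text[i]])
            i += 1
        if i < n and text[i] in TONES:
            out.append(TONES[text[i]])
            i += 1
    # NFC reorders the combining marks and builds precomposed letters.
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    print(viqr_to_unicode("Nu+o+'c Vie^.t Nam"))
    print(viqr_to_unicode("How are you?"))
```

The second call shows why the specification needs a way to suppress composition: this naive routine turns the final "u?" of the English sentence into a u with hook above.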
+
+5. SUPPORTING SOFTWARE
+
+ VISCII & VIQR have been successfully implemented on various
+ platforms. The work has been carried out primarily by the TriChlor
+ software group, a non-profit spin-off from Viet-Std. Software by
+ other individuals and groups has also been developed. In addition,
+ commercial software entities have indicated that they would support
+ the standards in the form of VISCII-compliant keyboards and fonts.
+
+ The current software selection from the TriChlor group enables users
+ to use Vietnamese on existing Unix, MS-DOS, and Windows systems,
+ including such operations as Vietnamese file naming, Vietnamese
+ keyboarding within any application, electronic mail and news filters
+ for Unix, printing to various printer languages, incorporating
+ Vietnamese in such document preparation systems as TeX, Word for
+ Windows, and WordPerfect, and using Vietnamese in databases (e.g.,
+ Paradox) and spreadsheets (e.g., SC on Unix or Excel in Windows).
+ Vietnamese-specific applications are also available and include a
+ large song lyric database, several poetry collections in hypertext
+ format, a Windows-based fortune teller, a text-based multiple-choice
+ test program in Vietnamese, etc. In short, software exists that
+ supports thorough integration of Vietnamese into existing platforms,
+ allowing Vietnamese users to take advantage of all the powerful tools
+ already available in English-only environments.
+
+ Translation between 8-bit VISCII 1.1 and other character sets,
+ particularly ISO-10646/Unicode 1.1, has been included in the tcs
+ utility of the Plan 9 operating system, which has been made available
+ by Andrew Hume of AT&T Bell Laboratories.
+
+6. MIME CONSIDERATIONS
+
+ For use with MIME-compliant software, the value "VISCII" has been
+ registered as a charset with the Internet Assigned Numbers Authority
+ for the VISCII encoding convention described above, and the value
+ "VIQR" has been registered with the Internet Assigned Numbers
+ Authority as a charset for the VIQR mnemonic encoding convention
+ described above. Implementation of support for these two MIME
+ character set types is not mandatory to comply with RFC-1341. If the
+ encoding conventions described above are used in MIME email or news,
+ the appropriate MIME character set type value should be used to label
+ the body-part containing such text.
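
As a rough illustration of the labeling described above, the sketch below builds a message whose body is VIQR text and whose Content-Type carries the registered charset value. It uses Python's standard email package; the helper name label_viqr and the choice of a 7bit transfer encoding are assumptions of this example, not requirements stated in the memo.

```python
from email.message import Message


def label_viqr(body: str) -> Message:
    """Wrap 7-bit VIQR text in a message labeled with charset VIQR."""
    # VIQR text is plain ASCII; this raises UnicodeEncodeError otherwise.
    body.encode("ascii")
    msg = Message()
    msg["MIME-Version"] = "1.0"
    msg["Content-Type"] = "text/plain; charset=VIQR"
    msg["Content-Transfer-Encoding"] = "7bit"
    msg.set_payload(body)
    return msg


if __name__ == "__main__":
    print(label_viqr("Nu+o+'c Vie^.t Nam").as_string())
```

An 8-bit VISCII body would be labeled with charset=VISCII in the same way, but it would also need a transfer encoding suited to 8-bit data, which this sketch does not cover.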
+
+7. SECURITY CONSIDERATIONS
+
+ Security issues are not discussed in this memo.
+
+REFERENCES
+
+ [1] International Organization for Standardization. ISO 8859/x:
+ 8-bit International Code Sets. ISO, 1977.
+
+ [2] Viet-Std, "A Unified Framework for Vietnamese Information
+ Processing-v1.1," published on the Internet, available for FTP
+ from Sonygate.Sony.COM:tin/viet-std, September 1992.
+
+AUTHORS' ADDRESSES
+
+ Cuong T. Nguyen
+ Center for Integrated Systems
+ CIS 062--MC 4070
+ Stanford, CA 94305-4070
+
+ Phone: (415) 725-3721
+ Email: cuong@haydn.Stanford.EDU
+
+
+ Hoc D. Ngo
+ Vista Research, Inc.
+ 100 View St, Suite 200
+ P.O. Box 998
+ Mountain View, CA 94042
+
+ Phone: (415) 966-1171
+ Email: uunet!vri280!hoc
+
+
+ Cuong M. Bui
+ National Semiconductor Corp.
+ 3388 Burgundy Dr.
+ San Jose, CA 95132
+
+ Phone: (408) 721-6873
+ Email: bui@berlioz.nsc.com
+
+
+ Thanh van Nguyen
+ Roche Image Analysis Systems
+ 95 First Str Suite 110
+ Los Altos, CA 94022
+
+ Phone: 415-917-2022
+ Fax: 415-917-2025
+ Email: thanh@rias.com
+
+ For more information, please contact the authors at:
+ viet-std@haydn.stanford.edu
+ \ No newline at end of file