1 files changed, 787 insertions, 0 deletions
diff --git a/doc/rfc/rfc1642.txt b/doc/rfc/rfc1642.txt
new file mode 100644
index 0000000..5fefec0
--- /dev/null
+++ b/doc/rfc/rfc1642.txt
@@ -0,0 +1,787 @@
+
+
+
+
+
+
+Network Working Group                                       D. Goldsmith
+Request for Comments: 1642                                      M. Davis
+Category: Experimental                                    Taligent, Inc.
+                                                               July 1994
+
+
+                                 UTF-7
+
+
+              A Mail-Safe Transformation Format of Unicode
+
+Status of this Memo
+
+   This memo defines an Experimental Protocol for the Internet
+   community.  This memo does not specify an Internet standard of any
+   kind.  Distribution of this memo is unlimited.
+
+Abstract
+
+   The Unicode Standard, version 1.1, and ISO/IEC 10646-1:1993(E)
+   jointly define a 16 bit character set (hereafter referred to as
+   Unicode) which encompasses most of the world's writing systems.
+   However, Internet mail (STD 11, RFC 822) currently supports only 7-
+   bit US ASCII as a character set. MIME (RFC 1521 and RFC 1522) extends
+   Internet mail to support different media types and character sets,
+   and thus could support Unicode in mail messages. MIME neither defines
+   Unicode as a permitted character set nor specifies how it would be
+   encoded, although it does provide for the registration of additional
+   character sets over time.
+
+   This document describes a new transformation format of Unicode that
+   contains only 7-bit ASCII characters and is intended to be readable
+   by humans in the limiting case that the document consists of
+   characters from the US-ASCII repertoire. It also specifies how this
+   transformation format is used in the context of RFC 1521, RFC 1522,
+   and the document "Using Unicode with MIME".
+
+Motivation
+
+   Although other transformation formats of Unicode exist and could
+   conceivably be used in this context (most notably UTF-1 and UTF-8,
+   also known as UTF-2 or UTF-FSS), they suffer the disadvantage that
+   they use octets in the range decimal 128 through 255 to encode
+   Unicode characters outside the US-ASCII range. Thus, in the context
+   of mail, those octets must themselves be encoded. This requires
+   putting text through two successive encoding processes, and leads to
+   a significant expansion of characters outside the US-ASCII range,
+   putting non-English speakers at a disadvantage. For example, using
+
+
+
+Goldsmith & Davis                                               [Page 1]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+   UTF-FSS together with the Quoted-Printable content transfer encoding
+   of MIME represents US-ASCII characters in one octet, but other
+   characters may require up to nine octets.
+
+Overview
+
+   UTF-7 encodes Unicode characters as US-ASCII, together with shift
+   sequences to encode characters outside that range. For this purpose,
+   one of the characters in the US-ASCII repertoire is reserved for use
+   as a shift character.
+
+   Many mail gateways and systems cannot handle the entire US-ASCII
+   character set (those based on EBCDIC, for example), and so UTF-7
+   contains provisions for encoding characters within US-ASCII in a way
+   that all mail systems can accomodate.
+
+   UTF-7 should normally be used only in the context of 7 bit
+   transports, such as mail and news. In other contexts, straight
+   Unicode or UTF-8 is preferred.
+
+   See the document "Using Unicode with MIME" for the overall
+   specification on usage of Unicode transformation formats with MIME.
+
+Definitions
+
+   First, the definition of Unicode:
+
+      The 16 bit character set Unicode is defined by "The Unicode
+      Standard, Version 1.1". This character set is identical with the
+      character repertoire and coding of the international standard
+      ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2;
+      Subset=300; Implementation Level=3.
+
+      Note. Unicode 1.1 further specifies the use and interaction of
+      these character codes beyond the ISO standard. However, any valid
+      10646 BMP (Basic Multilingual Plane) sequence is a valid Unicode
+      sequence, and vice versa; Unicode supplies interpretations of
+      sequences on which the ISO standard is silent as to
+      interpretation.
+
+   Next, some handy definitions of US-ASCII character subsets:
+
+      Set D (directly encoded characters) consists of the following
+      characters (derived from RFC 1521, Appendix B): the upper and
+      lower case letters A through Z and a through z, the 10 digits 0-9,
+      and the following nine special characters (note that "+" and "="
+      are omitted):
+
+
+
+
+Goldsmith & Davis                                               [Page 2]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+               Character   ASCII & Unicode Value (decimal)
+                  '           39
+                  (           40
+                  )           41
+                  ,           44
+                  -           45
+                  .           46
+                  /           47
+                  :           58
+                  ?           63
+
+      Set O (optional direct characters) consists of the following
+      characters (note that "\" and "~" are omitted):
+
+               Character   ASCII & Unicode Value (decimal)
+                  !           33
+                  "           34
+                  #           35
+                  $           36
+                  %           37
+                  &           38
+                  *           42
+                  ;           59
+                  <           60
+                  =           61
+                  >           62
+                  @           64
+                  [           91
+                  ]           93
+                  ^           94
+                  _           95
+                  `           96
+                  {           123
+                  |           124
+                  }           125
+
+   Rationale. The characters "\" and "~" are omitted because they are
+   often redefined in variants of ASCII.
+
+   Set B (Modified Base 64) is the set of characters in the Base64
+   alphabet defined in RFC 1521, excluding the pad character "="
+   (decimal value 61).
+
+   Rationale. The pad character = is excluded because UTF-7 is designed
+   for use within header fields as set forth in RFC 1522. Since the only
+   readable encoding in RFC 1522 is "Q" (based on RFC 1521's Quoted-
+   Printable), the "=" character is not available for use (without a lot
+   of escape sequences). This was very unfortunate but unavoidable. The
+
+
+
+Goldsmith & Davis                                               [Page 3]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+   "=" character could otherwise have been used as the UTF-7 escape
+   character as well (rather than using "+").
+
+   Note that all characters in US-ASCII have the same value in Unicode
+   when zero-extended to 16 bits.
+
+UTF-7 Definition
+
+   A UTF-7 stream represents 16-bit Unicode characters in 7-bit US-ASCII
+   as follows:
+
+      Rule 1: (direct encoding) Unicode characters in set D above may be
+      encoded directly as their ASCII equivalents. Unicode characters in
+      Set O may optionally be encoded directly as their ASCII
+      equivalents, bearing in mind that many of these characters are
+      illegal in header fields, or may not pass correctly through some
+      mail gateways.
+
+      Rule 2: (Unicode shifted encoding) Any Unicode character sequence
+      may be encoded using a sequence of characters in set B, when
+      preceded by the shift character "+" (US-ASCII character value
+      decimal 43). The "+" signals that subsequent octets are to be
+      interpreted as elements of the Modified Base64 alphabet until a
+      character not in that alphabet is encountered. Such characters
+      include control characters such as carriage returns and line
+      feeds; thus, a Unicode shifted sequence always terminates at the
+      end of a line. As a special case, if the sequence terminates with
+      the character "-" (US-ASCII decimal 45) then that character is
+      absorbed; other terminating characters are not absorbed and are
+      processed normally.
+
+      Rationale. A terminating character is necessary for cases where
+      the next character after the Modified Base64 sequence is part of
+      character set B. It can also enhance readability by delimiting
+      encoded sequences.
+
+      Also as a special case, the sequence "+-" may be used to encode
+      the character "+". A "+" character followed immediately by any
+      character other than members of set B or "-" is an ill-formed
+      sequence.
+
+      Unicode is encoded using Modified Base64 by first converting
+      Unicode 16-bit quantities to an octet stream (with the most
+      significant octet first). Text with an odd number of octets is
+      ill-formed.
+
+      Rationale. ISO/IEC 10646-1:1993(E) specifies that when characters
+      in the UCS-2 form are serialized as octets, that the most
+
+
+
+Goldsmith & Davis                                               [Page 4]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+      significant octet appear first.  This is also in keeping with
+      common network practice of choosing a canonical format for
+      transmission.
+
+      Next, the octet stream is encoded by applying the Base64 content
+      transfer encoding algorithm as defined in RFC 1521, modified to
+      omit the "=" pad character. Instead, when encoding, zero bits are
+      added to pad to a Base64 character boundary. When decoding, any
+      bits at the end of the Modified Base64 sequence that do not
+      constitute a complete 16-bit Unicode character are discarded. If
+      such discarded bits are non-zero the sequence is ill-formed.
+
+      Rationale. The pad character "=" is not used when encoding
+      Modified Base64 because of the conflict with its use as an escape
+      character for the Q content transfer encoding in RFC 1522 header
+      fields, as mentioned above.
+
+      Rule 3: The space (decimal 32), tab (decimal 9), carriage return
+      (decimal 13), and line feed (decimal 10) characters may be
+      directly represented by their ASCII equivalents. However, note
+      that MIME content transfer encodings have rules concerning the use
+      of such characters. Usage that does not conform to the
+      restrictions of RFC 822, for example, would have to be encoded
+      using MIME content transfer encodings other than 7bit or 8bit,
+      such as quoted-printable, binary, or base64.
+
+   Given this set of rules, Unicode characters which may be encoded via
+   rules 1 or 3 take one octet per character, and other Unicode
+   characters are encoded on average with 2 2/3 octets per character
+   plus one octet to switch into Modified Base64 and an optional octet
+   to switch out.
+
+      Example. The Unicode sequence "A<NOT IDENTICAL TO><ALPHA>."
+      (hexadecimal 0041,2262,0391,002E) may be encoded as follows:
+
+            A+ImIDkQ.
+
+      Example. The Unicode sequence "Hi Mom <WHITE SMILING FACE>!"
+      (hexadecimal 0048, 0069, 0020, 004D, 006F, 004D, 0020, 263A, 0021)
+      may be encoded as follows:
+
+            Hi Mom +Jjo-!
+
+      Example. The Unicode sequence representing the Han characters for
+      the Japanese word "nihongo" (hexadecimal 65E5,672C,8A9E) may be
+      encoded as follows:
+
+            +ZeVnLIqe-
+
+
+
+Goldsmith & Davis                                               [Page 5]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+Use of Character Set UTF-7 Within MIME
+
+   Character set UTF-7 is safe for mail transmission and therefore may
+   be used with any content transfer encoding in MIME (except where line
+   length and line break restrictions are violated). Specifically, the 7
+   bit encoding for bodies and the Q encoding for headers are both
+   acceptable. The MIME character set identifier is UNICODE-1-1-UTF-7.
+
+      Example. Here is a text portion of a MIME message containing the
+      Unicode sequence "Hi Mom <WHITE SMILING FACE>!" (hexadecimal 0048,
+      0069, 0020, 004D, 006F, 004D, 0020, 263A, 0021).
+
+      Content-Type: text/plain; charset=UNICODE-1-1-UTF-7
+
+      Hi Mom +Jjo-!
+
+      Example. Here is a text portion of a MIME message containing the
+      Unicode sequence representing the Han characters for the Japanese
+      word "nihongo" (hexadecimal 65E5,672C,8A9E).
+
+      Content-Type: text/plain; charset=UNICODE-1-1-UTF-7
+
+      +ZeVnLIqe-
+
+      Example. Here is a text portion of a MIME message containing the
+      Unicode sequence "A<NOT IDENTICAL TO><ALPHA>." (hexadecimal
+      0041,2262,0391,002E).
+
+      Content-Type: text/plain; charset=UNICODE-1-1-UTF-7
+
+      A+ImIDkQ.
+
+      Example. Here is a text portion of a MIME message containing the
+      Unicode sequence "Item 3 is <POUND SIGN>1."  (hexadecimal 0049,
+      0074, 0065, 006D, 0020, 0033, 0020, 0069, 0073, 0020, 00A3, 0031,
+      002E).
+
+      Content-Type: text/plain; charset=UNICODE-1-1-UTF-7
+
+      Item 3 is +AKM-1.
+
+   Note that to achieve the best interoperability with systems that may
+   not support Unicode or MIME, when preparing text for mail
+   transmission line breaks should follow Internet conventions. This
+   means that lines should be short and terminated with the proper SMTP
+   CRLF sequence. Unicode LINE SEPARATOR (hexadecimal 2028) and
+   PARAGRAPH SEPARATOR (hexadecimal 2029) should be converted to SMTP
+   line breaks. Ideally, this would be handled transparently by a
+
+
+
+Goldsmith & Davis                                               [Page 6]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+   Unicode-aware user agent.
+
+   This preparation is not absolutely necessary, since UTF-7 and the
+   appropriate MIME content transfer encoding can handle text that does
+   not follow Internet conventions, but readability by systems without
+   Unicode or MIME will be impaired. See RFC 1521 for an in-depth
+   discussion of mail interoperability issues.
+
+   Lines should never be broken in the middle of a UTF-7 shifted
+   sequence, since such sequences may not cross line breaks. Therefore,
+   UTF-7 encoding should take place after line breaking. If a line
+   containing a shifted sequence is too long after encoding, a MIME
+   content transfer encoding such as Quoted Printable can be used to
+   encode the text. Another possibility is to perform line breaking and
+   UTF-7 encoding at the same time, so that lines containing shifted
+   sequences already conform to length restrictions.
+
+Discussion
+
+   In this section we will motivate the introduction of UTF-7 as opposed
+   to the alternative of using the existing transformation formats of
+   Unicode (e.g., UTF-8) with MIME's content transfer encodings. Before
+   discussing this, it will be useful to list some assumptions about
+   character frequency within typical natural language text strings that
+   we use to estimate typical storage requirements:
+
+   1. Most Western European languages use roughly 7/8 of their letters
+      from US-ASCII and 1/8 from Latin 1 (ISO-8859-1).
+
+   2. Most non-European alphabet-based languages (e.g., Greek) use about
+      1/6 of their letters from ASCII (since white space is in the 7-bit
+      area) and the rest from their alphabets.
+
+   3. East Asian ideographic-based languages (including Japanese) use
+      essentially all of their characters from the Han or CJK syllabary
+      area.
+
+   4. Non-directly encoded punctuation characters do not occur
+      frequently enough to affect the results.
+
+   Notice that current 8 bit standards, such as ISO-8859-x, require use
+   of a content transfer encoding. For comparison with the subsequent
+   discussion, the costs break down as follows (note that many of these
+   figures are approximate since they depend on the exact composition of
+   the text):
+
+
+
+
+
+
+Goldsmith & Davis                                               [Page 7]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+   8859-x in Base64
+
+      Text type          Average octets/character
+      All                      1.33
+
+   8859-x in Quoted Printable
+
+      Text type          Average octets/character
+      US-ASCII                 1
+      Western European         1.25
+      Other                    2.67
+
+   Note also that Unicode encoded in Base64 takes a constant 2.67 octets
+   per character. For purposes of comparison, we will look at UTF-8 in
+   Base64 and Quoted Printable, and UTF-7. UTF-1 gives results
+   substantially similar to UTF-8.  Also note that fixed overhead for
+   long strings is relative to 1/n, where n is the encoded string length
+   in octets.
+
+   UTF-8 in Base64
+
+      Text type          Average octets/character
+      US-ASCII                 1.33
+      Western European         1.5
+      Some Alphabetics         2.44
+      All others               4
+
+   UTF-8 in Quoted Printable
+
+      Text type          Average octets/character
+      US-ASCII                 1
+      Western European         1.63
+      Some Alphabetics         5.17
+      All others               7-9
+
+   UTF-7
+
+      Text type          Average octets/character
+      Most US-ASCII            1
+      Western European         1.5
+      All others               2.67+2/n
+
+   We feel that the UTF-8 in Quoted Printable option is not viable due
+   to the very large expansion of all text except Western European. This
+   would only be viable in texts consisting of large expanses of US-
+   ASCII or Latin characters with occasional other characters
+   interspersed. We would prefer to introduce one encoding that works
+   reasonably well for all users.
+
+
+
+Goldsmith & Davis                                               [Page 8]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+   We also feel that UTF-8 in Base64 has high expansion for non-
+   Western-European users, and is less desirable because it cannot be
+   read directly, even when the content is largely US-ASCII. The base
+   encoding of UTF-7 gives competitive results and is readable for ASCII
+   text.
+
+   UTF-7 gives results competitive with ISO-8859-x, with access to all
+   of the Unicode character set. We believe this justifies the
+   introduction of a new transformation format of Unicode.
+
+   As an alternative to use of UTF-7, it is possible to intermix Unicode
+   characters with other character sets using an existing MIME
+   mechanism, the multipart/mixed content type (thanks to Nathaniel
+   Borenstein for pointing this out). For instance (repeating an earlier
+   example):
+
+      Content-type: multipart/mixed; boundary=foo
+
+      --foo
+      Content-type: text/plain; charset=us-ascii
+
+      Hi Mom
+      --foo
+      Content-type: text/plain; charset=UNICODE-1-1
+      Content-transfer-encoding: base64
+
+      Jjo=
+      --foo
+      Content-type: text/plain; charset=us-ascii
+
+      !
+      --foo--
+
+   Theoretically, this removes the need for UTF-7 in message bodies
+   (multipart may not be used in header fields). However, we feel that
+   as use of the Unicode character set becomes more widespread,
+   intermittent use of specialized Unicode characters (such as dingbats
+   and mathematical symbols) will occur, and that text will also
+   typically include small snippets from other scripts, such as
+   Cyrillic, Greek, or East Asian languages (anything in the Roman
+   script is already handled adequately by existing MIME character
+   sets). Although the multipart technique works well for large chunks
+   of text in alternating character sets, we feel it does not adequately
+   support the kinds of uses just discussed, and so we still believe the
+   introduction of UTF-7 is justified.
+
+
+
+
+
+
+Goldsmith & Davis                                               [Page 9]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+Summary
+
+   The UTF-7 encoding allows Unicode characters to be encoded within the
+   US-ASCII 7 bit character set. It is most effective for Unicode
+   sequences which contain relatively long strings of US-ASCII
+   characters interspersed with either single Unicode characters or
+   strings of Unicode characters, as it allows the US-ASCII portions to
+   be read on systems without direct Unicode support.
+
+   UTF-7 should only be used with 7 bit transports such as mail and
+   news. In other contexts, use of straight Unicode or UTF-8 is
+   preferred.
+
+Acknowledgements
+
+   Many thanks to the following people for their contributions,
+   comments, and suggestions. If we have omitted anyone it was through
+   oversight and not intentionally.
+
+         Glenn Adams
+         Harald T. Alvestrand
+         Nathaniel Borenstein
+         Lee Collins
+         Jim Conklin
+         Dave Crocker
+         Steve Dorner
+         Dana S. Emery
+         Ned Freed
+         Kari E. Hurtta
+         John H. Jenkins
+         John C. Klensin
+         Valdis Kletnieks
+         Keith Moore
+         Masataka Ohta
+         Einar Stefferud
+         Erik M. van der Poel
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Goldsmith & Davis                                              [Page 10]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+Appendix A -- Examples
+
+   Here is a longer example, taken from a document originally in Big5
+   code. It has been condensed for brevity. There are two versions: the
+   first uses optional characters from set O (and thus may not pass
+   through some mail gateways), and the second uses no optional
+   characters.
+
+   Content-type: text/plain; charset=unicode-1-1-utf-7
+
+   Below is the full Chinese text of the Analects (+itaKng-).
+
+   The sources for the text are:
+
+   "The sayings of Confucius," James R. Ware, trans.  +U/BTFw-:
+   +ZYeB9FH6ckh5Pg-, 1980.  (Chinese text with English translation)
+
+   +Vttm+E6UfZM-, +W4tRQ066bOg-, +UxdOrA-:  +Ti1XC2b4Xpc-, 1990.
+
+   "The Chinese Classics with a Translation, Critical and
+   Exegetical Notes, Prolegomena, and Copius Indexes," James
+   Legge, trans., Taipei:  Southern Materials Center Publishing,
+   Inc., 1991.  (Chinese text with English translation)
+
+   Big Five and GB versions of the text are being made available
+   separately.
+
+   Neither the Big Five nor GB contain all the characters used in
+   this text.  Missing characters have been indicated using their
+   Unicode/ISO 10646 code points.  "U+-" followed by four
+   hexadecimal digits indicates a Unicode/10646 code (e.g.,
+   U+-9F08).  There is no good solution to the problem of the small
+   size of the Big Five/GB character sets; this represents the
+   solution I find personally most satisfactory.
+
+   (omitted...)
+
+   I have tried to minimize this problem by using variant
+   characters where they were available and the character
+   actually in the text was not.  Only variants listed as such in
+   the +XrdxmVtXUXg- were used.
+
+   (omitted...)
+
+   John H. Jenkins
+   +TpVPXGBG-
+   John_Jenkins@taligent.com
+   5 January 1993
+
+
+
+Goldsmith & Davis                                              [Page 11]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+   (omitted...)
+
+
+   Content-type: text/plain; charset=unicode-1-1-utf-7
+
+   Below is the full Chinese text of the Analects (+itaKng-).
+
+   The sources for the text are:
+
+   +ACI-The sayings of Confucius,+ACI- James R. Ware, trans.  +U/BTFw-:
+   +ZYeB9FH6ckh5Pg-, 1980.  (Chinese text with English translation)
+
+   +Vttm+E6UfZM-, +W4tRQ066bOg-, +UxdOrA-:  +Ti1XC2b4Xpc-, 1990.
+
+   +ACI-The Chinese Classics with a Translation, Critical and
+   Exegetical Notes, Prolegomena, and Copius Indexes,+ACI- James
+   Legge, trans., Taipei:  Southern Materials Center Publishing,
+   Inc., 1991.  (Chinese text with English translation)
+
+   Big Five and GB versions of the text are being made available
+   separately.
+
+   Neither the Big Five nor GB contain all the characters used in
+   this text.  Missing characters have been indicated using their
+   Unicode/ISO 10646 code points.  +ACI-U+-+ACI- followed by four
+   hexadecimal digits indicates a Unicode/10646 code (e.g.,
+   U+-9F08).  There is no good solution to the problem of the small
+   size of the Big Five/GB character sets+ADs- this represents the
+   solution I find personally most satisfactory.
+
+   (omitted...)
+
+   I have tried to minimize this problem by using variant
+   characters where they were available and the character
+   actually in the text was not.  Only variants listed as such in
+   the +XrdxmVtXUXg- were used.
+
+   (omitted...)
+
+   John H. Jenkins
+   +TpVPXGBG-
+   John+AF8-Jenkins+AEA-taligent.com
+   5 January 1993
+   (omitted...)
+
+
+
+
+
+
+
+Goldsmith & Davis                                              [Page 12]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+Security Considerations
+
+   Security issues are not discussed in this memo.
+
+References
+
+[UNICODE 1.1]  "The Unicode Standard, Version 1.1": Version 1.0, Volume
+               1 (ISBN 0-201-56788-1), Version 1.0, Volume 2 (ISBN 0-
+               201-60845-6), and "Unicode Technical Report #4, The
+               Unicode Standard, Version 1.1" (available from The
+               Unicode Consortium, and soon to be published by Addison-
+               Wesley).
+
+[ISO 10646]    ISO/IEC 10646-1:1993(E) Information Technology--Universal
+               Multiple-octet Coded Character Set (UCS).
+
+[MIME/UNICODE] Goldsmith, D., and M. Davis, "Using Unicode with MIME",
+               RFC 1641, Taligent, Inc., July 1994.
+
+[US-ASCII]     Coded Character Set--7-bit American Standard Code for
+               Information Interchange, ANSI X3.4-1986.
+
+[ISO-8859]     Information Processing -- 8-bit Single-Byte Coded Graphic
+               Character Sets -- Part 1: Latin Alphabet No. 1, ISO
+               8859-1:1987.  Part 2: Latin alphabet No.  2, ISO 8859-2,
+               1987.  Part 3: Latin alphabet No. 3, ISO 8859-3, 1988.
+               Part 4: Latin alphabet No.  4, ISO 8859-4, 1988.  Part 5:
+               Latin/Cyrillic alphabet, ISO 8859-5, 1988.  Part 6:
+               Latin/Arabic alphabet, ISO 8859-6, 1987.  Part 7:
+               Latin/Greek alphabet, ISO 8859-7, 1987.  Part 8:
+               Latin/Hebrew alphabet, ISO 8859-8, 1988.  Part 9: Latin
+               alphabet No. 5, ISO 8859-9, 1990.
+
+[RFC822]       Crocker, D., "Standard for the Format of ARPA Internet
+               Text Messages", STD 11, RFC 822, UDEL, August 1982.
+
+[RFC-1521]     Borenstein N., and N. Freed, "MIME (Multipurpose Internet
+               Mail Extensions) Part One:  Mechanisms for Specifying and
+               Describing the Format of Internet Message Bodies", RFC
+               1521, Bellcore, Innosoft, September 1993.
+
+[RFC-1522]     Moore, K., "Representation of Non-Ascii Text in Internet
+               Message Headers" RFC 1522, University of Tennessee,
+               September 1993.
+
+
+
+
+
+
+
+Goldsmith & Davis                                              [Page 13]
+
+RFC 1642                         UTF-7                         July 1994
+
+
+[UTF-8]        X/Open Company Ltd., "File System Safe UCS Transformation
+               Format (FSS_UTF)", X/Open Preliminary Specification,
+               Document Number: P316. This information also appears in
+               Unicode Technical Report #4, and in a forthcoming annex
+               to ISO/IEC 10646.
+
+Authors' Addresses
+
+   David Goldsmith
+   Taligent, Inc.
+   10201 N. DeAnza Blvd.
+   Cupertino, CA 95014-2233
+
+   Phone: 408-777-5225
+   Fax: 408-777-5081
+   EMail: david_goldsmith@taligent.com
+
+
+   Mark Davis
+   Taligent, Inc.
+   10201 N. DeAnza Blvd.
+   Cupertino, CA 95014-2233
+
+   Phone: 408-777-5116
+   Fax: 408-777-5081
+   EMail: mark_davis@taligent.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Goldsmith & Davis                                              [Page 14]
+