summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc2152.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc2152.txt')
-rw-r--r--doc/rfc/rfc2152.txt843
1 files changed, 843 insertions, 0 deletions
diff --git a/doc/rfc/rfc2152.txt b/doc/rfc/rfc2152.txt
new file mode 100644
index 0000000..e5ad077
--- /dev/null
+++ b/doc/rfc/rfc2152.txt
@@ -0,0 +1,843 @@
+
+
+
+
+
+
+Network Working Group D. Goldsmith
+Request for Comments: 2152 Apple Computer, Inc.
+Obsoletes: RFC 1642 M. Davis
+Category: Informational Taligent, Inc.
+ May 1997
+
+
+ UTF-7
+
+ A Mail-Safe Transformation Format of Unicode
+
+Status of this Memo
+
+ This memo provides information for the Internet community. This memo
+ does not specify an Internet standard of any kind. Distribution of
+ this memo is unlimited.
+
+Abstract
+
+ The Unicode Standard, version 2.0, and ISO/IEC 10646-1:1993(E) (as
+ amended) jointly define a character set (hereafter referred to as
+ Unicode) which encompasses most of the world's writing systems.
+ However, Internet mail (STD 11, RFC 822) currently supports only 7-
+ bit US ASCII as a character set. MIME (RFC 2045 through 2049) extends
+ Internet mail to support different media types and character sets,
+ and thus could support Unicode in mail messages. MIME neither defines
+ Unicode as a permitted character set nor specifies how it would be
+ encoded, although it does provide for the registration of additional
+ character sets over time.
+
+ This document describes a transformation format of Unicode that
+ contains only 7-bit ASCII octets and is intended to be readable by
+ humans in the limiting case that the document consists of characters
+ from the US-ASCII repertoire. It also specifies how this
+ transformation format is used in the context of MIME and RFC 1641,
+ "Using Unicode with MIME".
+
+Motivation
+
+ Although other transformation formats of Unicode exist and could
+ conceivably be used in this context (most notably UTF-8, also known
+ as UTF-2 or UTF-FSS), they suffer the disadvantage that they use
+ octets in the range decimal 128 through 255 to encode Unicode
+ characters outside the US-ASCII range. Thus, in the context of mail,
+ those octets must themselves be encoded. This requires putting text
+ through two successive encoding processes, and leads to a significant
+ expansion of characters outside the US-ASCII range, putting non-
+ English speakers at a disadvantage. For example, using UTF-8 together
+
+
+
+Goldsmith & Davis Informational [Page 1]
+
+RFC 2152 UTF-7 May 1997
+
+
+ with the Quoted-Printable content transfer encoding of MIME
+ represents US-ASCII characters in one octet, but other characters may
+ require up to nine octets.
+
+Overview
+
+ UTF-7 encodes Unicode characters as US-ASCII octets, together with
+ shift sequences to encode characters outside that range. For this
+ purpose, one of the characters in the US-ASCII repertoire is reserved
+ for use as a shift character.
+
+ Many mail gateways and systems cannot handle the entire US-ASCII
+ character set (those based on EBCDIC, for example), and so UTF-7
+ contains provisions for encoding characters within US-ASCII in a way
+ that all mail systems can accomodate.
+
+ UTF-7 should normally be used only in the context of 7 bit
+ transports, such as mail. In other contexts, straight Unicode or
+ UTF-8 is preferred.
+
+ See RFC 1641, "Using Unicode with MIME" for the overall specification
+ on usage of Unicode transformation formats with MIME.
+
+Definitions
+
+ First, the definition of Unicode:
+
+ The 16 bit character set Unicode is defined by "The Unicode
+ Standard, Version 2.0". This character set is identical with the
+ character repertoire and coding of the international standard
+ ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2;
+ Subset=300; Implementation Level=3, including the first 7
+ amendments to 10646 plus editorial corrections.
+
+ Note. Unicode 2.0 further specifies the use and interaction of
+ these character codes beyond the ISO standard. However, any valid
+ 10646 sequence is a valid Unicode sequence, and vice versa;
+ Unicode supplies interpretations of sequences on which the ISO
+ standard is silent as to interpretation.
+
+ Next, some handy definitions of US-ASCII character subsets:
+
+ Set D (directly encoded characters) consists of the following
+ characters (derived from RFC 1521, Appendix B, which no longer
+ appears in RFC 2045): the upper and lower case letters A through Z
+ and a through z, the 10 digits 0-9, and the following nine special
+ characters (note that "+" and "=" are omitted):
+
+
+
+
+Goldsmith & Davis Informational [Page 2]
+
+RFC 2152 UTF-7 May 1997
+
+
+ Character ASCII & Unicode Value (decimal)
+ ' 39
+ ( 40
+ ) 41
+ , 44
+ - 45
+ . 46
+ / 47
+ : 58
+ ? 63
+
+ Set O (optional direct characters) consists of the following
+ characters (note that "\" and "~" are omitted):
+
+ Character ASCII & Unicode Value (decimal)
+ ! 33
+ " 34
+ # 35
+ $ 36
+ % 37
+ & 38
+ * 42
+ ; 59
+ < 60
+ = 61
+ > 62
+ @ 64
+ [ 91
+ ] 93
+ ^ 94
+ _ 95
+ ' 96
+ { 123
+ | 124
+ } 125
+
+ Rationale. The characters "\" and "~" are omitted because they are
+ often redefined in variants of ASCII.
+
+ Set B (Modified Base 64) is the set of characters in the Base64
+ alphabet defined in RFC 2045, excluding the pad character "="
+ (decimal value 61).
+
+
+
+
+
+
+
+
+
+Goldsmith & Davis Informational [Page 3]
+
+RFC 2152 UTF-7 May 1997
+
+
+ Rationale. The pad character = is excluded because UTF-7 is designed
+ for use within header fields as set forth in RFC 2047. Since the only
+ readable encoding in RFC 2047 is "Q" (based on RFC 2045's Quoted-
+ Printable), the "=" character is not available for use (without a lot
+ of escape sequences). This was very unfortunate but unavoidable. The
+ "=" character could otherwise have been used as the UTF-7 escape
+ character as well (rather than using "+").
+
+ Note that all characters in US-ASCII have the same value in Unicode
+ when zero-extended to 16 bits.
+
+UTF-7 Definition
+
+ A UTF-7 stream represents 16-bit Unicode characters using 7-bit US-
+ ASCII octets as follows:
+
+ Rule 1: (direct encoding) Unicode characters in set D above may be
+ encoded directly as their ASCII equivalents. Unicode characters in
+ Set O may optionally be encoded directly as their ASCII
+ equivalents, bearing in mind that many of these characters are
+ illegal in header fields, or may not pass correctly through some
+ mail gateways.
+
+ Rule 2: (Unicode shifted encoding) Any Unicode character sequence
+ may be encoded using a sequence of characters in set B, when
+ preceded by the shift character "+" (US-ASCII character value
+ decimal 43). The "+" signals that subsequent octets are to be
+ interpreted as elements of the Modified Base64 alphabet until a
+ character not in that alphabet is encountered. Such characters
+ include control characters such as carriage returns and line
+ feeds; thus, a Unicode shifted sequence always terminates at the
+ of a line. As a special case, if the sequence terminates with the
+ character "-" (US-ASCII decimal 45) then that character is
+ absorbed; other terminating characters are not absorbed and are
+ processed normally.
+
+ Note that if the first character after the shifted sequence is "-"
+ then an extra "-" must be present to terminate the shifted
+ sequence so that the actual "-" is not itself absorbed.
+
+ Rationale. A terminating character is necessary for cases where
+ the next character after the Modified Base64 sequence is part of
+ character set B or is itself the terminating character. It can
+ also enhance readability by delimiting encoded sequences.
+
+
+
+
+
+
+
+Goldsmith & Davis Informational [Page 4]
+
+RFC 2152 UTF-7 May 1997
+
+
+ Also as a special case, the sequence "+-" may be used to encode
+ the character "+". A "+" character followed immediately by any
+ character other than members of set B or "-" is an ill-formed
+ sequence.
+
+ Unicode is encoded using Modified Base64 by first converting
+ Unicode 16-bit quantities to an octet stream (with the most
+ significant octet first). Surrogate pairs (UTF-16) are converted
+ by treating each half of the pair as a separate 16 bit quantity
+ (i.e., no special treatment). Text with an odd number of octets is
+ ill-formed. ISO 10646 characters outside the range addressable via
+ surrogate pairs cannot be encoded.
+
+ Rationale. ISO/IEC 10646-1:1993(E) specifies that when characters
+ the UCS-2 form are serialized as octets, that the most significant
+ octet appear first. This is also in keeping with common network
+ practice of choosing a canonical format for transmission.
+
+ Rationale. The policy for code point allocation within ISO 10646
+ and Unicode is that the repertoires be kept synchronized. No code
+ points will be allocated in ISO 10646 outside the range
+ addressable by surrogate pairs.
+
+ Next, the octet stream is encoded by applying the Base64 content
+ transfer encoding algorithm as defined in RFC 2045, modified to
+ omit the "=" pad character. Instead, when encoding, zero bits are
+ added to pad to a Base64 character boundary. When decoding, any
+ bits at the end of the Modified Base64 sequence that do not
+ constitute a complete 16-bit Unicode character are discarded. If
+ such discarded bits are non-zero the sequence is ill-formed.
+
+ Rationale. The pad character "=" is not used when encoding
+ Modified Base64 because of the conflict with its use as an escape
+ character for the Q content transfer encoding in RFC 2047 header
+ fields, as mentioned above.
+
+ Rule 3: The space (decimal 32), tab (decimal 9), carriage return
+ (decimal 13), and line feed (decimal 10) characters may be
+ directly represented by their ASCII equivalents. However, note
+ that MIME content transfer encodings have rules concerning the use
+ of such characters. Usage that does not conform to the
+ restrictions of RFC 822, for example, would have to be encoded
+ using MIME content transfer encodings other than 7bit or 8bit,
+ such as quoted-printable, binary, or base64.
+
+ Given this set of rules, Unicode characters which may be encoded via
+ rules 1 or 3 take one octet per character, and other Unicode
+ characters are encoded on average with 2 2/3 octets per character
+
+
+
+Goldsmith & Davis Informational [Page 5]
+
+RFC 2152 UTF-7 May 1997
+
+
+ plus one octet to switch into Modified Base64 and an optional octet
+ to switch out.
+
+ Example. The Unicode sequence "A<NOT IDENTICAL TO><ALPHA>."
+ (hexadecimal 0041,2262,0391,002E) may be encoded as follows:
+
+ A+ImIDkQ.
+
+ Example. The Unicode sequence "Hi Mom -<WHITE SMILING FACE>-!"
+ (hexadecimal 0048, 0069, 0020, 004D, 006F, 006D, 0020, 002D, 263A,
+ 002D, 0021) may be encoded as follows:
+
+ Hi Mom -+Jjo--!
+
+ Example. The Unicode sequence representing the Han characters for
+ the Japanese word "nihongo" (hexadecimal 65E5,672C,8A9E) may be
+ encoded as follows:
+
+ +ZeVnLIqe-
+
+Use of Character Set UTF-7 Within MIME
+
+ Character set UTF-7 is safe for mail transmission and therefore may
+ be used with any content transfer encoding in MIME (except where line
+ length and line break restrictions are violated). Specifically, the 7
+ bit encoding for bodies and the Q encoding for headers are both
+ acceptable. The MIME character set tag is UTF-7. This signifies any
+ version of Unicode equal to or greater than 2.0.
+
+ Example. Here is a text portion of a MIME message containing the
+ Unicode sequence "Hi Mom <WHITE SMILING FACE>!" (hexadecimal 0048,
+ 0069, 0020, 004D, 006F, 006D, 0020, 263A, 0021).
+
+ Content-Type: text/plain; charset=UTF-7
+
+ Hi Mom +Jjo-!
+
+ Example. Here is a text portion of a MIME message containing the
+ Unicode sequence representing the Han characters for the Japanese
+ word "nihongo" (hexadecimal 65E5,672C,8A9E).
+
+ Content-Type: text/plain; charset=UTF-7
+
+ +ZeVnLIqe-
+
+ Example. Here is a text portion of a MIME message containing the
+ Unicode sequence "A<NOT IDENTICAL TO><ALPHA>." (hexadecimal
+ 0041,2262,0391,002E).
+
+
+
+Goldsmith & Davis Informational [Page 6]
+
+RFC 2152 UTF-7 May 1997
+
+
+ Content-Type: text/plain; charset=utf-7
+
+ A+ImIDkQ.
+
+ Example. Here is a text portion of a MIME message containing the
+ Unicode sequence "Item 3 is <POUND SIGN>1." (hexadecimal 0049,
+ 0074, 0065, 006D, 0020, 0033, 0020, 0069, 0073, 0020, 00A3, 0031,
+ 002E).
+
+ Content-Type: text/plain; charset=UTF-7
+
+ Item 3 is +AKM-1.
+
+ Note that to achieve the best interoperability with systems that may
+ not support Unicode or MIME, when preparing text for mail
+ transmission line breaks should follow Internet conventions. This
+ means that lines should be short and terminated with the proper SMTP
+ CRLF sequence. Unicode LINE SEPARATOR (hexadecimal 2028) and
+ PARAGRAPH SEPARATOR (hexadecimal 2029) should be converted to SMTP
+ line breaks. Ideally, this would be handled transparently by a
+ Unicode-aware user agent.
+
+ This preparation is not absolutely necessary, since UTF-7 and the
+ appropriate MIME content transfer encoding can handle text that does
+ not follow Internet conventions, but readability by systems without
+ Unicode or MIME will be impaired. See RFC 2045 for a discussion of
+ mail interoperability issues.
+
+ Lines should never be broken in the middle of a UTF-7 shifted
+ sequence, since such sequences may not cross line breaks. Therefore,
+ UTF-7 encoding should take place after line breaking. If a line
+ containing a shifted sequence is too long after encoding, a MIME
+ content transfer encoding such as Quoted Printable can be used to
+ encode the text. Another possibility is to perform line breaking and
+ UTF-7 encoding at the same time, so that lines containing shifted
+ sequences already conform to length restrictions.
+
+Discussion
+
+ In this section we will motivate the introduction of UTF-7 as opposed
+ to the alternative of using the existing transformation formats of
+ Unicode (e.g., UTF-8) with MIME's content transfer encodings. Before
+ discussing this, it will be useful to list some assumptions about
+ character frequency within typical natural language text strings that
+ we use to estimate typical storage requirements:
+
+ 1. Most Western European languages use roughly 7/8 of their letters
+ from US-ASCII and 1/8 from Latin 1 (ISO-8859-1).
+
+
+
+Goldsmith & Davis Informational [Page 7]
+
+RFC 2152 UTF-7 May 1997
+
+
+ 2. Most non-Roman alphabet-based languages (e.g., Greek) use about
+ 1/6 of their letters from ASCII (since white space is in the 7-bit
+ area) and the rest from their alphabets.
+
+ 3. East Asian ideographic-based languages (including Japanese) use
+ essentially all of their characters from the Han or CJK syllabary
+ area.
+
+ 4. Non-directly encoded punctuation characters do not occur
+ frequently enough to affect the results.
+
+ Notice that current 8 bit standards, such as ISO-8859-x, require use
+ of a content transfer encoding. For comparison with the subsequent
+ discussion, the costs break down as follows (note that many of these
+ figures are approximate since they depend on the exact composition of
+ the text):
+
+ 8859-x in Base64
+
+ Text type Average octets/character
+ All 1.33
+
+ 8859-x in Quoted Printable
+
+ Text type Average octets/character
+ US-ASCII 1
+ Western European 1.25
+ Other 2.67
+
+ Note also that Unicode encoded in Base64 takes a constant 2.67 octets
+ per character. For purposes of comparison, we will look at UTF-8 in
+ Base64 and Quoted Printable, and UTF-7. Also note that fixed overhead
+ for long strings is relative to 1/n, where n is the encoded string
+ length in octets.
+
+ UTF-8 in Base64
+
+ Text type Average octets/character
+ US-ASCII 1.33
+ Western European 1.5
+ Some Alphabetics 2.44
+ All others 4
+
+
+
+
+
+
+
+
+
+Goldsmith & Davis Informational [Page 8]
+
+RFC 2152 UTF-7 May 1997
+
+
+ UTF-8 in Quoted Printable
+
+ Text type Average octets/character
+ US-ASCII 1
+ Western European 1.63
+ Some Alphabetics 5.17
+ All others 7-9
+
+ UTF-7
+
+ Text type Average octets/character
+ Most US-ASCII 1
+ Western European 1.5
+ All others 2.67+2/n
+
+ We feel that the UTF-8 in Quoted Printable option is not viable due
+ to the very large expansion of all text except Western European. This
+ would only be viable in texts consisting of large expanses of US-
+ ASCII or Latin characters with occasional other characters
+ interspersed. We would prefer to introduce one encoding that works
+ reasonably well for all users.
+
+ We also feel that UTF-8 in Base64 has high expansion for non-
+ Western-European users, and is less desirable because it cannot be
+ read directly, even when the content is largely US-ASCII. The base
+ encoding of UTF-7 gives competitive results and is readable for ASCII
+ text.
+
+ UTF-7 gives results competitive with ISO-8859-x, with access to all
+ of the Unicode character set. We believe this justifies the
+ introduction of a new transformation format of Unicode.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Goldsmith & Davis Informational [Page 9]
+
+RFC 2152 UTF-7 May 1997
+
+
+ As an alternative to use of UTF-7, it might be possible to intermix
+ Unicode characters with other character sets using an existing MIME
+ mechanism, the multipart/mixed content type, ignoring for the moment
+ the issues with line breaks (thanks to Nathaniel Borenstein for
+ suggesting this). For instance (repeating an earlier example):
+
+ Content-type: multipart/mixed; boundary=foo
+ Content-Disposition: inline
+
+ --foo
+ Content-type: text/plain; charset=us-ascii
+
+ Hi Mom
+ --foo
+ Content-type: text/plain; charset=UNICODE-2-0
+ Content-transfer-encoding: base64
+
+ Jjo=
+ --foo
+ Content-type: text/plain; charset=us-ascii
+
+ !
+ --foo--
+
+ Theoretically, this removes the need for UTF-7 in message bodies
+ (multipart may not be used in header fields). However, we feel that
+ as use of the Unicode character set becomes more widespread,
+ intermittent use of specialized Unicode characters (such as dingbats
+ and mathematical symbols) will occur, and that text will also
+ typically include small snippets from other scripts, such as
+ Cyrillic, Greek, or East Asian languages (anything in the Roman
+ script is already handled adequately by existing MIME character
+ sets). Although the multipart technique works well for large chunks
+ of text in alternating character sets, we feel it does not adequately
+ support the kinds of uses just discussed, and so we still believe the
+ introduction of UTF-7 is justified.
+
+Summary
+
+ The UTF-7 encoding allows Unicode characters to be encoded within the
+ US-ASCII 7 bit character set. It is most effective for Unicode
+ sequences which contain relatively long strings of US-ASCII
+ characters interspersed with either single Unicode characters or
+ strings of Unicode characters, as it allows the US-ASCII portions to
+ be read on systems without direct Unicode support.
+
+ UTF-7 should only be used with 7 bit transports such as mail. In
+ other contexts, use of straight Unicode or UTF-8 is preferred.
+
+
+
+Goldsmith & Davis Informational [Page 10]
+
+RFC 2152 UTF-7 May 1997
+
+
+Acknowledgements
+
+ Many thanks to the following people for their contributions,
+ comments, and suggestions. If we have omitted anyone it was through
+ oversight and not intentionally.
+
+ Glenn Adams
+ Harald T. Alvestrand
+ Nathaniel Borenstein
+ Lee Collins
+ Jim Conklin
+ Dave Crocker
+ Steve Dorner
+ Dana S. Emery
+ Ned Freed
+ Kari E. Hurtta
+ John H. Jenkins
+ John C. Klensin
+ Valdis Kletnieks
+ Keith Moore
+ Masataka Ohta
+ Einar Stefferud
+ Erik M. van der Poel
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Goldsmith & Davis Informational [Page 11]
+
+RFC 2152 UTF-7 May 1997
+
+
+Appendix A -- Examples
+
+ Here is a longer example, taken from a document originally in Big5
+ code. It has been condensed for brevity. There are two versions: the
+ first uses optional characters from set O (and so may not pass
+ through some mail gateways), and the second does not.
+
+ Content-type: text/plain; charset=utf-7
+
+ Below is the full Chinese text of the Analects (+itaKng-).
+
+ The sources for the text are:
+
+ "The sayings of Confucius," James R. Ware, trans. +U/BTFw-:
+ +ZYeB9FH6ckh5Pg-, 1980. (Chinese text with English translation)
+
+ +Vttm+E6UfZM-, +W4tRQ066bOg-, +UxdOrA-: +Ti1XC2b4Xpc-, 1990.
+
+ "The Chinese Classics with a Translation, Critical and Exegetical
+ Notes, Prolegomena, and Copius Indexes," James Legge, trans., Taipei:
+ Southern Materials Center Publishing, Inc., 1991. (Chinese text with
+ English translation)
+
+ Big Five and GB versions of the text are being made available
+ separately.
+
+ Neither the Big Five nor GB contain all the characters used in this
+ text. Missing characters have been indicated using their Unicode/ISO
+ 10646 code points. "U+-" followed by four hexadecimal digits
+ indicates a Unicode/10646 code (e.g., U+-9F08). There is no good
+ solution to the problem of the small size of the Big Five/GB
+ character sets; this represents the solution I find personally most
+ satisfactory.
+
+ (omitted...)
+
+ I have tried to minimize this problem by using variant characters
+ where they were available and the character actually in the text was
+ not. Only variants listed as such in the +XrdxmVtXUXg- were used.
+
+ (omitted...)
+
+ John H. Jenkins +TpVPXGBG- jenkins@apple.com 5 January 1993
+ (omitted...)
+
+ Content-type: text/plain; charset=utf-7
+
+ Below is the full Chinese text of the Analects (+itaKng-).
+
+
+
+Goldsmith & Davis Informational [Page 12]
+
+RFC 2152 UTF-7 May 1997
+
+
+ The sources for the text are:
+
+ +ACI-The sayings of Confucius,+ACI- James R. Ware, trans. +U/BTFw-:
+ +ZYeB9FH6ckh5Pg-, 1980. (Chinese text with English translation)
+
+ +Vttm+E6UfZM-, +W4tRQ066bOg-, +UxdOrA-: +Ti1XC2b4Xpc-, 1990.
+
+ +ACI-The Chinese Classics with a Translation, Critical and Exegetical
+ Notes, Prolegomena, and Copius Indexes,+ACI- James Legge, trans.,
+ Taipei: Southern Materials Center Publishing, Inc., 1991. (Chinese
+ text with English translation)
+
+ Big Five and GB versions of the text are being made available
+ separately.
+
+ Neither the Big Five nor GB contain all the characters used in this
+ text. Missing characters have been indicated using their Unicode/ISO
+ 10646 code points. +ACI-U+-+ACI- followed by four hexadecimal digits
+ indicates a Unicode/10646 code (e.g., U+-9F08). There is no good
+ solution to the problem of the small size of the Big Five/GB
+ character sets+ADs- this represents the solution I find personally
+ most satisfactory.
+
+ (omitted...)
+
+ I have tried to minimize this problem by using variant characters
+ where they were available and the character actually in the text was
+ not. Only variants listed as such in the +XrdxmVtXUXg- were used.
+ (omitted...)
+
+ John H. Jenkins +TpVPXGBG- jenkins+AEA-apple.com 5 January 1993
+ (omitted...)
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Goldsmith & Davis Informational [Page 13]
+
+RFC 2152 UTF-7 May 1997
+
+
+Security Considerations
+
+ Security issues are not discussed in this memo.
+
+References
+
+[UNICODE 2.0] "The Unicode Standard, Version 2.0", The Unicode
+ Consortium, Addison-Wesley, 1996. ISBN 0-201-48345-9.
+
+[ISO 10646] ISO/IEC 10646-1:1993(E) Information Technology--Universal
+ Multiple-octet Coded Character Set (UCS). See also
+ amendments 1 through 7, plus editorial corrections.
+
+[RFC-1641] Goldsmith, D., and M. Davis, "Using Unicode with MIME",
+ RFC 1641, Taligent, Inc., July 1994.
+
+[US-ASCII] Coded Character Set--7-bit American Standard Code for
+ Information Interchange, ANSI X3.4-1986.
+
+[ISO-8859] Information Processing -- 8-bit Single-Byte Coded Graphic
+ Character Sets -- Part 1: Latin Alphabet No. 1, ISO
+ 8859-1:1987. Part 2: Latin alphabet No. 2, ISO 8859-2,
+ 1987. Part 3: Latin alphabet No. 3, ISO 8859-3, 1988.
+ Part 4: Latin alphabet No. 4, ISO 8859-4, 1988. Part 5:
+ Latin/Cyrillic alphabet, ISO 8859-5, 1988. Part 6:
+ Latin/Arabic alphabet, ISO 8859-6, 1987. Part 7:
+ Latin/Greek alphabet, ISO 8859-7, 1987. Part 8:
+ Latin/Hebrew alphabet, ISO 8859-8, 1988. Part 9: Latin
+ alphabet No. 5, ISO 8859-9, 1990.
+
+[RFC822] Crocker, D., "Standard for the Format of ARPA Internet
+ Text Messages", STD 11, RFC 822, UDEL, August 1982.
+
+[MIME] Borenstein N., N. Freed, K. Moore, J. Klensin, and J.
+ Postel, "MIME (Multipurpose Internet Mail Extensions)
+ Parts One through Five", RFC 2045, 2046, 2047, 2048, and
+ 2049, November 1996.
+
+Authors' Addresses
+
+ David Goldsmith
+ Apple Computer, Inc.
+ 2 Infinite Loop, MS: 302-2IS
+ Cupertino, CA 95014
+
+ Phone: 408-974-1957
+ Fax: 408-862-4566
+ EMail: goldsmith@apple.com
+
+
+
+Goldsmith & Davis Informational [Page 14]
+
+RFC 2152 UTF-7 May 1997
+
+
+ Mark Davis
+ Taligent, Inc.
+ 10201 N. DeAnza Blvd.
+ Cupertino, CA 95014-2233
+
+ Phone: 408-777-5116
+ Fax: 408-777-5081
+ EMail: mark_davis@taligent.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Goldsmith & Davis Informational [Page 15]
+