1 files changed, 1739 insertions, 0 deletions
diff --git a/doc/rfc/rfc2130.txt b/doc/rfc/rfc2130.txt
new file mode 100644
index 0000000..932172c
--- /dev/null
+++ b/doc/rfc/rfc2130.txt
@@ -0,0 +1,1739 @@
+
+
+
+
+
+
+Network Working Group                                        C. Weider
+Request for Comments: 2130                                   Microsoft
+Category: Informational                                     C. Preston
+                                                       Preston & Lynch
+                                                           K. Simonsen
+                                                                 DKUUG
+                                                         H. Alvestrand
+                                                               UNINETT
+                                                           R. Atkinson
+                                                         Cisco Systems
+                                                            M. Crispin
+                                              University of Washington
+                                                           P. Svanberg
+                                                                   KTH
+                                                            April 1997
+
+              The Report of the IAB Character Set Workshop
+                    held 29 February - 1 March, 1996
+
+Status of this Memo
+
+   This memo provides information for the Internet community.  This memo
+   does not specify an Internet standard of any kind.  Distribution of
+   this memo is unlimited.
+
+Acknowledgments
+
+   The authors would like to sincerely thank Information Sciences
+   Institute (ISI), and in particular Joyce K. Reynolds for graciously
+   hosting this event; Joe Kemp and Jeanine Yamazaki of ISI made sure
+   the facilities met our needs.  We also wish to thank the Internet
+   Society, which underwrote travel for participants who might not
+   otherwise have been able to attend.  Of course, we also wish to thank
+   the many experts who participated in the workshop and on the mailing
+   list; a complete list of these people can be found in Appendix F.
+   Bunyip Information Systems was kind enough to provide mailing list
+   facilities for this work.
+
+Table of Contents
+
+   Abstract
+   0:    Executive summary..........................................   2
+   1:    Introduction...............................................   3
+   2:    Character sets on the Internet -- the problem..............   3
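+   2.1:  Related topics deemed out of scope for this workshop.......   4
+   2.2:  Character set handling in existing protocols...............   4
+   2.3:  Architectural requirements.................................   6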
+   3:    Architectural model........................................   6
+   3.1:  Segments defined...........................................   7
+   3.2:  On the wire................................................   8
+
+
+
+Weider, et. al.              Informational                      [Page 1]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+   3.3:  Determining which values of CCS, CES, and TES are used.....   9
+   3.4:  Recommended Defaults.......................................  10
+   3.5:  Guidelines for conversions between coded character sets....  13
+   4:    Presentation issues........................................  14
+   5:    Open issues................................................  14
+   5.1:  Language tags..............................................  15
+   5.2:  Public identifiers.........................................  16
+   5.3:  Bi-directionality..........................................  16
+   6:    Security Considerations....................................  16
+   7:    Conclusions................................................  16
+   8:    Recommendations............................................  17
+   8.1:  To the IAB.................................................  17
+   8.2:  For new Internet protocols.................................  18
+   8.3:  For registration of new character sets.....................  18
+   Appendix A: List of protocols affected by character set issues...  20
+   Appendix B: Acronyms.............................................  23
+   Appendix C: Glossary.............................................  24
+   Appendix D: References...........................................  25
+   Appendix E: Recommended reading..................................  27
+   Appendix F: Workshop attendee list...............................  29
+   Appendix G: Authors' Addresses...................................  30
+
+Abstract
+
+   This report details the conclusions of an IAB-sponsored invitational
+   workshop held 29 February - 1 March, 1996, to discuss the use of
+   character sets on the Internet.  It motivates the need to have
+   character set handling in Internet protocols which transmit text,
+   provides a conceptual framework for specifying character sets,
+   recommends the use of MIME tagging for transmitted text, recommends a
+   default character set *without* stating that there is no need for
+   other character sets, and makes a series of recommendations to the
+   IAB, IANA, and the IESG for furthering the integration of the
+   character set framework into text transmission protocols.
+
+0: Executive summary
+
+   The term 'Character Set' means many things to many people. Even the
+   MIME registry of character sets registers items that have great
+   differences in semantics and applicability. This workshop provides
+   guidance to the IAB and IETF about the use of character sets on the
+   Internet and provides a common framework for interoperability between
+   the many character sets in use there.
+
+   The framework consists of four components: an architecture model,
+   which specifies components necessary for on-the-wire transmission of
+   text; recommendations for tagging transmitted (and stored) text;
+   recommended defaults for each level of the model; and a set of
+
+
+
+Weider, et. al.              Informational                      [Page 2]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+   recommendations to the IAB, IANA, and the IESG for furthering the
+   integration of this framework into text transmission protocols.
+
+   The architectural model specifies 7 layers, of which only three are
+   required for on-the-wire transmission. The Coded Character Set is a
+   mapping from a set of abstract characters to a set of integers. The
+   Character Encoding Scheme is a mapping from a Coded Character Set (or
+   several) to a set of octets. The Transfer Encoding Syntax is a
+   transformation applied to data which has been encoded using a
+   Character Encoding Scheme to allow it to be transmitted. These layers
+   should be specified in a transmitted text stream by using the MIME
+   encoding mechanisms.
+
+   This report recommends the use of ISO 10646 as the default Coded
+   Character Set, and UTF-8 as the default Character Encoding Scheme in
+   the creation of new protocols or new versions of old protocols which
+   transmit text. These defaults do not deprecate the use of other
+   character sets when and where they are needed; they are simply
+   intended to provide guidance and a specification for
+   interoperability.
+
+1:  Introduction
+
+   This is the report of an IAB-sponsored invitational workshop on the
+   use of Character Sets on the Internet, held 29 February - 1 March
+   1996 at Information Sciences Institute (ISI) in Marina del Rey,
+   California.  In addition, this report covers the discussion on the
+   mailing list up to and slightly beyond the workshop itself.  The
+   goals of this workshop were to provide guidance to the IAB and the
+   IETF about the use of character sets on the Internet, and if possible
+   a common framework for interoperability between the many character
+   sets in use there.  Both goals were achieved.
+
+2:  Character sets on the Internet - the problem
+
+   The term 'character set' is typically applied to the contents of a
+   wide variety of text transmission and display protocols used on the
+   Internet.  Because the term is used to mean different things,
+   confusion has arisen.  For example, the MIME registry of character
+   sets [MIME] contains items that may differ greatly in their
+   applicability and semantics in various Internet protocols.
+
+   In addition, there is a vast profusion of different text encoding
+   schemes in use on the Internet.  This per se is not a problem; each
+   scheme has evolved to meet real needs.  However, information
+   applications such as mail, directories, and the World Wide Web have
+   each developed different techniques for dealing with the growing
+   number of schemes.  A robust information architecture for the
+
+
+
+Weider, et. al.              Informational                      [Page 3]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+   Internet requires as much interoperability between these techniques
+   as possible.
+
+2.1:  Related topics deemed out of scope for this workshop
+
+   Successful display of plain text transmitted over the Internet
+   requires a lot of information about the text itself, such as the
+   underlying character set, language, and so forth.  An additional set
+   of formatting information is needed if the receiving application
+   wishes to use local (cultural) conventions when it presents the data
+   to the user.  This formatting information provides the data
+   necessary to render certain types of textual data (dates, times,
+   numbers and monetary notation) in a form which is familiar to the
+   user.  The POSIX [POSIX] notion of locale encompasses language,
+   coded character set and cultural conventions.
+
+   To avoid unfruitful discussion, and to make the best use of the time
+   available for the workshop, we declared the following issues out of
+   scope for the purposes of this workshop:
+
+   -  glyphs
+   -  sorting
+   -  culture (e.g. do we present the American or British spelling?)
+   -  user interface issues
+   -  internal representation of textual data
+   -  included characters (why aren't certain characters available in
+          any character set?)
+   -  locale (in the POSIX sense)
+   -  font registration
+   -  semantics
+   -  user input/output issues
+   -  Han unification issues
+
+   There are some related issues which were included for discussion,
+   most importantly the 'locale' components necessary for transport and
+   identification of multilingual texts.
+
+2.2:  Character Set handling in existing protocols
+
+   One of the group's overriding concerns was that the framework
+   developed for character set handling not break existing protocols.
+   With that in mind, the way character sets are being used in existing
+   protocols was examined.  See Appendix A for a list of those protocols
+   and some recommendations for change.
+
+2.2.1:  General comments
+
+   The problem areas here fall into three main categories: protocols,
+
+
+
+Weider, et. al.              Informational                      [Page 4]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+   identifiers, and data.
+
+2.2.1.1:  Protocols
+
+   The protocol machinery SHOULD NOT be changed; allowing, for instance,
+   SMTP [SMTP] to use both MAIL FROM and POST FRA is dangerous to the
+   protocols' stability.  However, many protocols carry error messages
+   and other information that is intended for human consumption; it
+   MIGHT be an advantage to allow these to be localized into a specific
+   language and character set, rather than staying in English and US-
+   ASCII [ASCII].  If this is done, new extensions should follow the
+   framework outlined below.
+
+2.2.1.2:  Identifiers.
+
+   There is a strong statement of direction from the IAB, RFC 1958 [RFC
+   1958],  which states:
+
+        4.3 Public (i.e. widely visible) names should be in case
+            independent ASCII.  Specifically, this refers to DNS names,
+            and to protocol elements that are transmitted in text format.
+            ...
+        5.4 Designs should be fully international, with support for
+            localization (adaptation to local character sets). In
+            particular, there should be a uniform approach to character
+            set tagging for information content.
+
+   In protocols that up to now have used US-ASCII only, UTF-8 [UTF-8]
+   forms a simple upgrade path; however, its use should be negotiated
+   either by negotiating a protocol version or by negotiating charset
+   usage, and a fallback to a US-ASCII compatible representation such as
+   UTF-7 [UTF-7] MUST be available.
+
+   The need for passing application data such as language on individual
+   identifiers varies between applications; protocols SHOULD attempt to
+   evaluate this need when designing mechanisms.  Applying the ASCII
+   requirement for identifiers that are only used in a local context
+   (such as private mailbox folder names) is both unrealistic and
+   unreasonable; in such cases, methods for consistency in the handling
+   of character sets should be considered.
+
+2.2.1.3:  Data
+
+   Data that requires character set handling includes, for example,
+   text, databases, and HTML [HTML] pages.  For such data, support for
+   multiple character sets and proper application information is
+   absolutely vital and MUST be provided.
+
+
+
+
+Weider, et. al.              Informational                      [Page 5]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+2.3:  Architectural requirements
+
+   To address the issues enumerated for this work, first an
+   architectural model was created which establishes the components that
+   are required to fully specify the transmission of textual data. Many
+   of these components are already familiar to the users of encoding
+   protocols such as MIME.  Not all of these are discussed in detail in
+   this report; we restrict ourselves primarily to those components
+   which are required to specify the 'on-the-wire' phase of text
+   transmission.
+
+   Mandating a single, all-encompassing character set would not fit well
+   with the IETF philosophy of planning for architectural diversity.
+   So, the best that can be done is to provide a common *framework* for
+   identifying and using the multitude of character sets available on
+   the Internet.  It would be an advantage if the total number of Coded
+   Character Sets could be kept to a minimum.  This framework should
+   meet the following requirements:
+
+   -  it should not break existing protocols (because then the likelihood
+        of deployment is very small),
+   -  it should allow the use of character sets currently used on the
+        Internet, and
+   -  it should be relatively easy to build into new protocols.
+
+3:  Architectural model
+
+   The basic architectural model which guided our discussions is shown
+   below.  A distinction was made between those segments which were
+   necessary to successfully transmit character set data on-the-wire and
+   those needed to present that data to a user in a comprehensible
+   manner.  The discussions were primarily restricted to those segments
+   of the model which specify the 'on-the-wire' transmission of textual
+   data.
+
+   User interface issues: these are briefly discussed in Section 3.1.1.
+        Layout
+        Culture
+        Locale
+        Language
+   On-the-wire: see section 3.2 for detailed discussion.
+        Transfer Encoding Syntax
+        Character Encoding Scheme
+        Coded Character Set
+
+
+
+
+
+
+
+Weider, et. al.              Informational                      [Page 6]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+3.1:  Segments defined
+
+3.1.1:  User interface
+
+3.1.1.1:  Layout
+
+   Layout includes the elements needed for displaying text to the user,
+   such as font selection, word-wrapping, etc.  It is similar to the
+   'presentation' layer in the 7-layer ISO telecommunications model
+   [ISO-7498].
+
+3.1.1.2:  Culture
+
+   Culture includes information about cultural preferences, which affect
+   spelling, word choice, and so forth.
+
+3.1.1.3:  Locale
+
+   The locale component includes the information necessary to make
+   choices about text manipulation which will present the text to the
+   user in an expected format.  This information may include the display
+   of date, time and monetary symbol preferences.  Notice that locale
+   modifications are typically applied to a text stream before it is
+   presented to the user, although they also are used to specify input
+   formats.
+
+3.1.1.4:  Language
+
+   This component specifies the language of the transmitted text.  At
+   times and in specific cases, language information may be required to
+   achieve a particular level of quality for the purpose of displaying a
+   text stream.  For example, UTF-8 encoded Han may require transmission
+   of a language tag to select the specific glyphs to be displayed at a
+   particular level of quality.
+
+   Note that information other than language may be used to achieve the
+   required level of quality in a display process.  In particular, a
+   font tag is sufficient to produce identical results.  However, the
+   association of a language with a specific block of text has
+   usefulness far beyond its use in display.  In particular, as the
+   amount of information available in multiple languages on the World
+   Wide Web grows, it becomes critical to specify which language is in
+   use in particular documents, to assist automatic indexing and
+   retrieval of relevant documents.
+
+
+
+
+
+
+
+Weider, et. al.              Informational                      [Page 7]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+   The term 'language tag' should be reserved for the short identifier
+   of RFC 1766 [RFC-1766] that only serves to identify the language.
+   While there may be other text attributes intimately associated with
+   the language of the document, such as desired font or text direction,
+   these should be specified with other identifiers rather than
+   overloading the language tag.
+
+3.2:  On the wire
+
+   There are three segments of the model which are required for
+   completely specifying the content of a transmitted text stream (with
+   the occasional exception of the Language component, mentioned above).
+   These components are:
+
+   1)  Coded Character Set,
+   2)  Character Encoding Scheme, and
+   3)  Transfer Encoding Syntax.
+
+   Each of these abstract components must be explicitly specified by the
+   transmitter when the data is sent.  There may be instances of an
+   implicit specification due to the protocol/standard being used (e.g.
+   ANSI/NISO Z39.50).  Also, in MIME, the Coded Character Set and
+   Character Encoding Scheme are specified by the Charset parameter to
+   the Content-Type header field, and Transfer Encoding Syntax is
+   specified by the Content-Transfer-Encoding header field.
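+
+   As a rough illustration of the MIME labelling described above, the
+   following Python sketch builds a message with the standard library
+   "email" package and reads the two labels back.  The sample text and
+   the choice of base64 are assumptions made for the example; they do
+   not come from the workshop.

```python
from email.message import EmailMessage

# Sample text chosen for illustration only.
text = "Bl\u00e5b\u00e6rsyltet\u00f8y"

msg = EmailMessage()
# charset labels the CCS/CES pair; cte selects the Transfer Encoding Syntax.
msg.set_content(text, charset="utf-8", cte="base64")

charset_label = msg.get_content_charset()
tes_label = msg["Content-Transfer-Encoding"]

print(msg["Content-Type"])
print(tes_label)
```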
+
+3.2.1:  Coded Character Set
+
+   A Coded Character Set (CCS) is a mapping from a set of abstract
+   characters to a set of integers.  Examples of coded character sets
+   are ISO 10646 [ISO-10646], US-ASCII [ASCII], and the ISO-8859 series
+   [ISO-8859].
+
+3.2.2:  Character Encoding Scheme
+
+   A Character Encoding Scheme (CES) is a mapping from a Coded Character
+   Set or several coded character sets to a set of octets. Examples of
+   Character Encoding Schemes are ISO 2022 [ISO-2022] and UTF-8 [UTF-8].
+   A given CES is typically associated with a single CCS; for example,
+   UTF-8 applies only to ISO 10646.
+
+
+
+
+
+
+
+
+
+
+
+Weider, et. al.              Informational                      [Page 8]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+3.2.3:  Transfer Encoding Syntax
+
+   It is frequently necessary to transform encoded text into a format
+   which is transmissible by specific protocols.  The Transfer Encoding
+   Syntax (TES) is a transformation applied to character data encoded
+   using a CCS and possibly a CES to allow it to be transmitted.
+   Examples of Transfer Encoding Syntaxes include Base64 encoding
+   [Base64] and gzip encoding.
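+
+   The three on-the-wire layers can be sketched in a few lines of
+   Python.  This is only an illustration under stated assumptions: the
+   sample character, the use of UTF-8 as the CES, and the use of Base64
+   as the TES are choices made for the example.

```python
import base64

# One abstract character: LATIN SMALL LETTER A WITH RING ABOVE.
text = "\u00e5"

# CCS: abstract character -> integer (its ISO 10646 code position).
code_points = [ord(ch) for ch in text]

# CES: integers -> octets (UTF-8 is assumed here).
octets = text.encode("utf-8")

# TES: octets -> a form safe for a 7-bit transport (Base64 is assumed).
wire = base64.b64encode(octets)

print(code_points, octets, wire)

# The receiver undoes the layers in the opposite order.
received = base64.b64decode(wire).decode("utf-8")
```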
+
+3.3:  Determining which values of CCS, CES, and TES are used
+
+   To completely specify which CCS, CES, and TES are used in a specific
+   text transmission, there needs to be a consistent set of labels for
+   each of them.  Once the appropriate mechanisms have been selected,
+   there are six techniques for attaching these labels to the data.
+
+   The labels themselves are named and registered, either with IANA
+   [IANA] or with some other registry.  Ideally, their definitions are
+   retrievable from some registration authority.
+
+   Labels may be determined in one of the following ways:
+
+   -  Determined by guessing, where the receiver of the text has to
+      guess the values of the CCS, CES, and TES. For example: "I got
+      this from Sweden so it's probably ISO-8859-1."  This is
+      obviously not a reliable way to decode text.
+   -  Determined by the standard, where the protocol used to transmit
+      the data has made documented choices of CCS, CES, and TES in the
+      standard. Thus, the encodings used are known through the
+      access protocol; for example, HTTP [HTTP] uses (but is not
+      limited to) ISO-8859-1, and SMTP uses US-ASCII.
+   -  Attached to the transfer envelope, where the descriptive labels are
+      attached to the wrapper placed around the text for transport.
+      MIME headers are a good example of this technique.
+   -  Included in the data stream, where the data stream itself has
+      been encoded in such a way as to signal the character set used.
+      For example, ISO-2022 encodes the data with escape sequences to
+      provide information on the character subset currently being used.
+   -  Agreed by prior bilateral agreement, where some out-of-band
+      negotiation has allowed the text transmitter and receiver to
+      determine the CCS, CES, and TES for the transmitted text.
+   -  Agreed to by negotiation during some phase of the protocol,
+      typically initialization.
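+
+   Two of the techniques listed above can be illustrated with a short
+   Python sketch: why guessing is unreliable, and how ISO-2022 style
+   encodings carry their own in-band signals.  The sample strings are
+   assumptions, and Python's "iso2022_jp" codec stands in for the
+   ISO-2022 family.

```python
# The sender encodes Swedish text with UTF-8 but attaches no label.
octets = "sm\u00f6rg\u00e5s".encode("utf-8")

# A receiver guessing ISO-8859-1 gets garbled text, not an error.
guessed = octets.decode("iso-8859-1")

# A receiver that knows the label recovers the original text.
labelled = octets.decode("utf-8")

# ISO-2022-JP switches character subsets with in-band escape sequences.
in_band = "\u65e5\u672c".encode("iso2022_jp")

print(repr(guessed), repr(labelled), in_band)
```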
+
+
+
+
+
+
+
+Weider, et. al.              Informational                      [Page 9]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+3.3.1:  Recommendations for value specification mechanisms
+
+   While each of these techniques (with the exception of guessing) is
+   useful in particular situations, interoperability requires a more
+   consistent set of techniques.  Thus, we recommend that MIME
+   registered values be used for all tagging of character sets and
+   languages UNLESS there is an existing mechanism for determining the
+   required information using one of the other techniques (except
+   guessing).  This recommendation will require a fair bit of work on
+   the part of protocol designers, implementors, the IETF, the IESG, and
+   the IAB.
+
+   However, it is important to point out that the MIME concept of
+   'charset' in some cases cuts across several layers of components in
+   our model.  While this can be accepted in existing registrations, we
+   also recommend that the MIME registration procedure for character
+   sets be modified to show how a proposed character set deals with the
+   CCS and the CES. Most 'charsets' have a well defined CCS and CES;
+   they should merely be teased apart for the registration.
+
+   There are a number of other recommendations, but these will be
+   covered in the next sections.
+
+3.4:  Recommended Defaults
+
+   For a number of reasons, one cannot define a mandatory set of
+   defaults for all Internet protocols.  There is a mass of current
+   practice; future protocols are likely to have different purposes,
+   which may determine their handling of text; and protocols may need
+   specific variation support.  For example, in mail, text is a
+   predominant data type and coded character sets then become a major
+   issue for the protocol.  Also, since e-mail is ubiquitous and users
+   expect to be able to send it to everyone, the mail protocols need to
+   be quite adept at handling different character set encodings.  On the
+   other hand, if strings are seldom used in a given protocol, there is
+   no need to weigh the protocol down with a sophisticated apparatus for
+   handling multiple character sets, assuming that the predicated
+   character set can handle all the protocol's needs. This observation
+   also applies to the specification techniques for character set
+   parameters.  If only one character set encoding is needed, it can be
+   made explicit in the protocol specification.  Protocols with a
+   greater need for character set support will need a more elaborate
+   specification technique.
+
+
+
+
+
+
+
+
+Weider, et. al.              Informational                     [Page 10]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+3.4.1:  Clarity of specification
+
+   We recommend that each protocol clearly specify what it is using for
+   each of the layers of the transmission model.  Users (or clients)
+   should never have to guess what the parameter is for a given layer.
+
+3.4.2:  Default Coded Character Set
+
+   The default Coded Character Set is the repertoire of ISO-10646.
+
+3.4.3:  Default Character Encoding Scheme
+
+   For text-oriented protocols, new protocols should use UTF-8, and
+   protocols that have a backwards compatibility requirement should use
+   the default of the existing protocol, e.g. US-ASCII for mail, and
+   ISO-8859-1 for HTTP.  The recommended specification scheme is the
+   MIME "charset" specification, using the IANA "charset"
+   specifications.  The MIME specifications will need to be clarified to
+   meet this model in the future.
+
+   For other protocols, the default should be UTF-8 as this initially
+   allows US-ASCII to be entered as-is, and enables the full repertoire
+   of ISO 10646.
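+
+   The claim that UTF-8 allows US-ASCII to be entered as-is can be
+   checked with a small Python sketch.  The sample strings are
+   assumptions made for the example.

```python
# Pure US-ASCII text yields identical octets under both encodings.
ascii_text = "MAIL FROM:<user@example.com>"
same_octets = ascii_text.encode("utf-8") == ascii_text.encode("us-ascii")

# Characters outside US-ASCII become multi-octet sequences in which
# every octet has the high bit set, so they never collide with ASCII.
encoded = "\u00e5".encode("utf-8")
high_bit_only = all(octet >= 0x80 for octet in encoded)

print(same_octets, encoded, high_bit_only)
```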
+
+   Some protocols, such as those descended from SGML [SGML], have other
+   natural notations for characters outside their "natural" repertoire;
+   for instance, HTML [HTML] allows the use of &#nnnn; to refer to any
+   ISO 10646 character.  Note that this, like all other encodings that
+   depend on "escape characters", redefines at least one character from
+   the base character set for use as an indicator of "foreign"
+   characters.  Use of this approach must be weighed very carefully.
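+
+   A minimal Python sketch of the numeric reference notation follows.
+   It uses the standard library's "xmlcharrefreplace" error handler and
+   html.unescape; the sample strings are assumptions.  The last line
+   shows the cost noted above: "&" no longer stands for itself and has
+   to be written as a reference.

```python
import html

text = "sm\u00f6rg\u00e5s"

# Write characters outside US-ASCII as decimal numeric references.
escaped = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Resolve the references back to ISO 10646 characters.
restored = html.unescape(escaped)

# The escape character itself must now be written as a reference.
literal_ampersand = html.unescape("AT&amp;T")

print(escaped, restored, literal_ampersand)
```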
+
+3.4.4:  Default Transfer Encoding Syntax
+
+   There is no recommended default for this level.  For plain text
+   oriented protocols, the bytestream transport format should be 8-bit
+   clean, possibly with normalization of end-of-line indicators.
+   Special cases may be made for protocols that are not 8-bit clean,
+   such as encoding the text for transport over 7-bit connections.
+   For binary data, the same recommendation holds.  The specification
+   technique should either be defined in the protocol, if only one way
+   is permitted, or use MIME content-transfer-encoding (CTE)
+   techniques, with IANA registered values.
+
+
+
+
+
+
+
+
+Weider, et. al.              Informational                     [Page 11]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+3.4.5:  Default Language
+
+   There is no recommended default for the language level.  For human
+   readable text, there should always be a way to specify the natural
+   language. The specification technique should be a MIME identifier
+   with IANA registered values for languages.  If headers are used, the
+   header should be 'Content-Language'.
+
+3.4.6:  Default Locale
+
+   The default should be the POSIX locale.  The specification technique
+   should use the Cultural register of CEN ENV 12005 [CEN] for the
+   values.  If headers are used, the header should be 'Content-Locale'.
+
+3.4.7:  Default Culture
+
+   There is no recommended default for the Culture level.  The
+   specification technique should be a MIME or MIME-like identifier
+   (e.g. Content-Culture) and should use the Cultural register of CEN
+   ENV 12005 for its values.
+
+3.4.8:  Default Presentation
+
+   There is no recommended default for the Presentation level.  The
+   specification technique should be a MIME or MIME-like identifier
+   (e.g. Content-Layout) and use the glyph register of ISO 10036 and
+   other registers for its values.
+
+3.4.9:  Multiplexing
+
+   In some cases, text transmission may require the use of a number of
+   different values for a given parameter; for example, English
+   annotation of Japanese text might well require shifting the Content-
+   Language parameter.  The way to switch the value of parameters within
+   a single body of text depends on the application.  For instance, the
+   HTML I18N [I18N] work defines a language attribute on most of its
+   elements, including <SPAN>, <HTML>, and <BODY>, for the purpose of
+   switching between different languages.  When only one value is
+   needed, this value should be as general as possible, and specified in
+   the protocol standard with reference to the IANA or other registry
+   value.  All levels should be specified explicitly.
+
+3.4.10:  Storage
+
+   Because stored text may very well be stored without any of the
+   additional information necessary for decoding, stored text SHOULD be
+   tagged in a MIME compliant fashion.  This alleviates the problem of
+   being unable to interpret text which has been stored for a long time,
+
+
+
+Weider, et. al.              Informational                     [Page 12]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+   or text whose provenance is not available.
+
+3.5:  Guidelines for conversions between coded character sets
+
+   This section covers various algorithms to convert a source text S,
+   encoded in the coded character set CCS(S), to a target text T,
+   encoded in the coded character set CCS(T).
+
+   Rep(X) is the character repertoire of coded character set X, i.e. the
+   set of characters which can be represented with X.
+
+3.5.1:  Exact conversion
+
+   When Rep(CCS(S)) and Rep(CCS(T)) are equal or Rep(CCS(S)) is a subset
+   of Rep(CCS(T)), exact conversion is possible; i.e. T represents
+   exactly the same sequence of characters as S, and only the octets
+   need to be remapped.  The algorithm for performing this remapping is
+   simple if the IANA-registered definition tables for CCS(S) and
+   CCS(T) are available.
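+
+   A minimal sketch of exact conversion in Python is shown below.  It
+   assumes that Python's built-in codec tables stand in for the
+   registered definition tables; the helper name "convert" and the
+   sample octets are hypothetical.  ISO-8859-1 is used as the source
+   because its repertoire is a subset of that of ISO 10646.

```python
def convert(octets, source, target):
    """Remap octets from one encoding to another via abstract characters."""
    return octets.decode(source).encode(target)


# "Blåbær" encoded in ISO-8859-1 (sample data for illustration).
latin1_octets = b"Bl\xe5b\xe6r"

# The source repertoire is a subset of the target's, so this is exact.
utf8_octets = convert(latin1_octets, "iso-8859-1", "utf-8")

# Converting back loses nothing because every character exists in both.
round_trip = convert(utf8_octets, "utf-8", "iso-8859-1")

print(utf8_octets, round_trip == latin1_octets)
```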
+
+3.5.2:  Approximate conversion
+
+   In all other cases, any conversion creates a text T which differs
+   from S.  There are different principles for how this inevitable
+   difference should be handled.  A choice between them should be made,
+   depending on the purpose and requirements of the conversion.  Where
+   possible, the client application should be given mechanisms to
+   determine what has been done to the text.
+
+3.5.2.1:  Length-modifying conversion for human display
+
+   When the length of the target text T is allowed to differ from the
+   length of the source text S, one should use a conversion method in
+   which each source character is converted to one or several target
+   characters, using a best-resemblance criterion in the choice of the
+   target character(s).
+
+   Examples:
+      LATIN CAPITAL LETTER AE [*] ->  AE
+      COPYRIGHT SIGN          [*] -> (c)
+
+3.5.2.2:  Length-preserving conversion for human display
+
+   Where the text T must be presented and the length of T cannot differ
+   from the length of S, one should use a conversion method where each
+   source character is converted to one target character, using some
+   kind of best-resemblance criterion in the choice of target character.
+
+
+
+
+
+Weider, et. al.              Informational                     [Page 13]
+
+RFC 2130             Character Set Workshop Report            April 1997
+
+
+   Examples:
+     LATIN CAPITAL LETTER  [*] -> A
+     COPYRIGHT SIGN        [*] -> C
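+
+   When the length must be preserved, a one-to-one translation table is
+   enough.  The following sketch uses Python's str.translate; the table
+   contents and the function name are assumptions made for the example,
+   as is the reading of [*] as LATIN CAPITAL LETTER AE.
+

```python
# Hypothetical one-to-one table: every replacement is a single character.
ONE_TO_ONE = str.maketrans({
    "\u00c6": "A",      # LATIN CAPITAL LETTER AE (assumed)
    "\u00a9": "C",      # COPYRIGHT SIGN
})


def to_ascii_same_length(text: str, unknown: str = "?") -> str:
    """Convert text to US-ASCII without changing its length."""
    if len(unknown) != 1:
        raise ValueError("the substitute must be exactly one character")
    mapped = text.translate(ONE_TO_ONE)
    # Anything still outside US-ASCII gets the single substitute character.
    result = "".join(ch if ord(ch) < 0x80 else unknown for ch in mapped)
    assert len(result) == len(text)
    return result


converted = to_ascii_same_length("\u00c6sop \u00a9 1996")
```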
+
+3.5.2.3:  Conversion without data loss
+
+   Where the conversion of the text S into T must be completely
+   reversible, apply a Character Encoding Syntax or other reversible
+   transformation method.  This case is most frequently met in data
+   storage requirements.
+
+   Examples:
+     LATIN CAPITAL LETTER [*] -> &AE
+     COPYRIGHT SIGN       [*] -> &(C
+
+   An alternate method can be used if the size of Rep(CCS(T)) is at
+   least the size of Rep(CCS(S)): for each character in Rep(CCS(S))
+   which is not present in Rep(CCS(T)), define a mapping into a
+   character in Rep(CCS(T)) which is not present in Rep(CCS(S)).
+
+   Examples:
+     LATIN CAPITAL LETTER  [*] -> CYRILLIC CAPITAL LETTER [*]
+     COPYRIGHT SIGN  [*] -> PARTIAL DIFFERENTIAL SIGN [*]
+
+   Note that conversion without data loss requires redefining some
+   member of T to indicate "the introduction of character data outside
+   T".  This effectively adds another level of CES on top of CES(T).
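+
+   The escape-based method can be sketched as follows.  The sketch is
+   not a definition of any registered scheme: the two-character
+   mnemonics are copied from the examples above, "&" is taken as the
+   redefined member of the target set, and writing a literal "&" as
+   "&&" is an assumption added here so that the result can be reversed.
+

```python
# Hypothetical mnemonic table; every mnemonic is exactly two characters.
MNEMONICS = {"\u00c6": "AE", "\u00a9": "(C"}
REVERSE = {mnemonic: ch for ch, mnemonic in MNEMONICS.items()}
INTRO = "&"     # introduces character data outside the target set


def escape(text: str) -> str:
    out = []
    for ch in text:
        if ch == INTRO:
            out.append(INTRO + INTRO)           # literal "&" (assumption)
        elif ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append(INTRO + MNEMONICS[ch])   # KeyError if no mnemonic
    return "".join(out)


def unescape(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        if text[i] != INTRO:
            out.append(text[i])
            i += 1
        elif text[i + 1:i + 2] == INTRO:
            out.append(INTRO)
            i += 2
        else:
            out.append(REVERSE[text[i + 1:i + 3]])
            i += 3
    return "".join(out)


source = "\u00c6 & \u00a9"
encoded = escape(source)
```

+
+   Because the introducer itself has to be escaped, the target text is
+   no longer plain CES(T); this is the extra level of CES that the note
+   above refers to.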
+
+4: Presentation issues
+
+   There are a number of considerations to make in selecting the base
+   character set.  One such consideration is the protocol's convenience
+   to users with limited equipment (for example only ISO 8859-1 or a
+   keyboard without the ability to enter all the characters in ISO
+   10646).  Alternative representations should be considered for these
+   users, both for input and output.  Possible options for the
+   representation of characters that cannot be displayed include
+   transliteration (a la CEN/TC304 or ISO TC46/SC2), RFC 1345
+   [RFC-1345] mnemonics, representative icons, or the WG2 short name
+   (u+xxxx).
+
+5: Open issues
+
+   In addition to the issues declared out of scope and enumerated in
+   section 2.1, the following issues are still open and will need to be
+   addressed in other forums.  These issues (language tags, public
+   identifiers such as URL names, and bi-directionality) are briefly
+   discussed below, as they repeatedly encroached on the discussion.
+
+5.1: Language tags
+
+   Although the workshop decided not to explicitly address the so-called
+   "CJK issue", a few members felt it was necessary to have some
+   mechanism to address the problem of correct Han character display in
+   ISO-10646, and that saying that it was a "font issue" would not
+   suffice.
+
+   The "CJK issue" refers to the extended discussion about "Han
+   unification", the use of a single ISO-10646 codepoint to represent
+   multiple national variants of a Chinese (Han) character.  ISO-10646
+   can map uniquely to any single CJK national character set, but in the
+   absence of additional information an application cannot display an
+   ISO-10646 text using the proper national variants for that text.
+
+   It was agreed that language tags would be sufficient to disambiguate
+   unified characters. There was not, in our opinion, a significant
+   technical difference between the use of different coded character
+   sets with overlapping codepoints, and a single coded character set
+   with language tags.  Either way, the application has sufficient
+   information to display the text properly.
+
+   It was observed that in contemporary usage of MIME charsets, the
+   language is implied as well as the coded character set and the
+   character encoding syntax.  We agreed that this is excessive
+   overloading of MIME charsets.
+
+   To specify the language used in a particular block of text, we
+   recommend that the MIME tag "Content-Language" be used.  There are a
+   number of questions about this approach that need to be worked out,
+   however:
+
+   -  Is Content-Language: actually suitable?
+   -  Is there an overload between this function and the other
+        intended functions of Content-Language: as described in RFC
+        1766?
+   -  What, precisely, does "Content-Language: zh-tw, ja, ko, zh-cn"
+        mean in this context? We believe it means that, in drawing a
+        Han character, the Taiwanese variant (presumably traditional
+        Han) is preferred, followed by the Japanese, Korean, and
+        mainland Chinese (presumably simplified Han) variants. It does
+        *NOT* mean "mixed text containing Taiwanese, Japanese, Korean,
+        and mainland Chinese text with all the national variants in
+        each of these".
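+
+   Under the interpretation given in the last item (an ordered list of
+   preferences, which the workshop believed but did not settle), a
+   renderer could select a Han variant as sketched below.  The function
+   names and the notion of a set of "available" variants are
+   hypothetical.
+

```python
from typing import Iterable, List, Optional


def preference_order(content_language: str) -> List[str]:
    """Split a Content-Language value into tags, most preferred first."""
    tags = (tag.strip().lower() for tag in content_language.split(","))
    return [tag for tag in tags if tag]


def choose_variant(content_language: str,
                   available: Iterable[str]) -> Optional[str]:
    """Return the first preferred tag for which a variant is available."""
    have = {tag.lower() for tag in available}
    for tag in preference_order(content_language):
        if tag in have:
            return tag
    return None


order = preference_order("zh-tw, ja, ko, zh-cn")
choice = choose_variant("zh-tw, ja, ko, zh-cn", ["ko", "zh-cn"])
```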
+
+   Mixed CJK text that simultaneously displays different variants
+   occupying the same codepoint requires language tags embedded in the
+   data.  Ohta and Handa propose in RFC 1554 [RFC-1554] a MIME charset
+   using ISO-2022 shifts between multiple coded character sets; in
+   effect this is an encoding that uses coded character sets for
+   displaying the appropriate glyphs.
+
+   Some speculate that mixed CJK text is relatively infrequent, and
+   that it is therefore acceptable to require that such text be
+   represented using a rich text format that can support language
+   tags.  In other words, a simplifying assumption can be made for
+   TEXT/PLAIN in email using ISO-10646 that will not require multiple
+   display representations for the same codepoint.  A mechanism such as
+   RFC 1554 could address this need if it were important, although
+   arguably RFC 1554 should really be identified as TEXT/ISO-2022.
+
+   Note again that we recommend that support for language tagging SHOULD
+   be built into new protocols, as this will become a critical component
+   of the automated indexing and retrieval in information applications
+   of the future.
+
+5.2:   Public identifiers
+
+   There is a considerable demand from the user community for the
+   ability to use non-ASCII characters in URL names, IMAP mailbox names,
+   file names, and other public identifiers. This is still an open
+   problem.
+
+5.3:   Bi-directionality
+
+   It was realized that a consistent framework for bi-directional text
+   was needed but there was no attempt to work on it in this workshop.
+
+6:  Security Considerations
+
+   There are no security considerations associated with character sets.
+
+7:  Conclusions
+
+   This paper provides a conceptual framework and a set of
+   recommendations which, if adopted, should provide a solid foundation
+   for interoperability on the Internet. There are, however, a number of
+   open issues which will need to be addressed to provide ever better
+   use of text on the Internet.
+
+8:  Recommendations
+
+8.1:  To the IAB
+
+   There were a number of recommendations to the IAB about making the
+   standards process more aware of the need for character set
+   interoperability, and about the framework itself.
+
+   A: The IAB should trigger the examination of all RFCs to determine
+   the way they handle character sets, and obsolete or annotate the
+   RFCs where necessary.
+
+   B: The IESG should trigger the recommendation of procedures to the
+   RFC editor to encourage RFCs to specify character set handling if
+   they specify the transmission of text.
+
+   C: The IAB should trigger the production of a perspectives document
+   on the character set work that has gone on in the past and relate it
+   to the current framework.
+
+   D: Full ISO 10646 has a sufficiently broad repertoire, and scope for
+   further extension, that it is sufficient for use in Internet
+   Protocols (without excluding the use of existing alternatives).
+   There is no need for specific development of character set standards
+   for the Internet.
+
+   E: The IAB should encourage the IRTF to create a research group to
+   explore the open issues of character sets on the Internet. This group
+   should set its sights much higher than this workshop did.
+
+   F: The IANA (perhaps with the help of an IETF or IRTF group) should
+   develop procedures for the registration of new character sets for
+   use in the Internet.
+
+   G: Register UTF-8 as a Character Encoding Scheme for MIME.
+
+   H: The current use of the "x-*" format for distinguishing
+   experimental tags should be continued for private use among
+   consenting parties. All other namespaces should be allocated by IANA.
+
+   I: Application Protocol RFCs SHOULD include a section on
+   "Multilingual Considerations".
+
+   J: Application Protocol RFCs SHOULD indicate how to transfer 'on the
+   wire' all characters in the character sets they use. They SHOULD also
+   specify how to transfer other information that applications may need
+   to know about the data.
+
+   K: The IESG should trigger a set of extensions to RFC 1522 to allow
+   language tagging of the free text parts of message headers.
+
+8.2:  For new Internet protocols
+
+   New protocols do not suffer from the need to be compatible with old
+   7-bit pipes.  New protocol specifications SHOULD use ISO 10646 as the
+   base charset unless there is an overriding need to use a different
+   base character set.
+
+   New protocols SHOULD use values from the IANA registries when
+   referring to parameter values.  The way these values are carried in
+   the protocols is protocol dependent; if the protocol uses RFC-822-
+   like headers, the header names already in use SHOULD be used.
+
+   For protocols with only a single choice for each component, the
+   protocol should use the most general specification and should be
+   specified with reference to the registered value in the protocol
+   standard.
+
+   Protocols SHOULD tag text streams with the language of the text.
+
+8.3:  For the registration of new character sets
+
+   Ned Freed will be releasing a new MIME registration document in
+   conjunction with this paper.
+
+8.3.1:   A definition table for a coded character set
+
+   A definition table for a coded character set A must, for each
+   character C that is in the repertoire of A, give:
+
+   a) If C is present in ISO 10646, the code value (in hexadecimal form)
+        for that character.
+
+   b) If C is not present in ISO 10646, but may be constructed using ISO
+        10646 combining characters, the series of code values (in
+        hexadecimal form) used to construct that character.
+
+   c) If C is not present in ISO 10646 and cannot be constructed as in
+        b), a textual description of the character, and a reference to
+        its origin.
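+
+   One possible in-memory shape for such a table is sketched below.
+   The class, its field names, and the sample rows are all invented for
+   the example; the rows describe an imaginary single-octet coded
+   character set, not a registered one.
+

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TableEntry:
    """One row of a definition table (all names are hypothetical)."""
    code_values: Tuple[int, ...] = ()    # ISO 10646 code value(s), if any
    description: Optional[str] = None    # used when no code values exist
    origin: Optional[str] = None         # reference to the origin

    def case(self) -> str:
        """Return which of the cases a), b) or c) the row falls under."""
        if len(self.code_values) == 1:
            return "a"
        if len(self.code_values) > 1:
            return "b"
        if self.description and self.origin:
            return "c"
        raise ValueError("row has neither code values nor a description")

    def hex_form(self) -> str:
        """Render the code values in hexadecimal form."""
        return " ".join(f"{value:04X}" for value in self.code_values)


# Invented excerpt, keyed by the octet used in the imaginary CCS.
TABLE: Dict[int, TableEntry] = {
    0xC6: TableEntry((0x00C6,)),            # LATIN CAPITAL LETTER AE
    0xD0: TableEntry((0x006D, 0x0303)),     # "m" plus COMBINING TILDE
    0xFF: TableEntry(description="vendor logo",
                     origin="imaginary vendor manual"),
}
```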
+
+8.3.2:   A definition of a character encoding scheme
+
+   A definition of a character encoding scheme consists of:
+
+   -  A description of an algorithm which transforms every possible
+        sequence of octets to either a sequence of pairs <CCS, code
+        value> or to the error state "illegal octet sequence".
+   -  Specifications, either by reference to CCSs registered by IANA or
+        in text, of each CCS upon which this CES is based.
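+
+   The shape of such an algorithm can be illustrated with UTF-8, whose
+   only underlying CCS is ISO 10646.  The sketch below leans on
+   Python's built-in decoder, which follows the current definition of
+   UTF-8 rather than the 1993 text, so it is an approximation; the
+   function name and the "ISO-10646" label are assumptions.
+

```python
from typing import List, Tuple, Union

ILLEGAL = "illegal octet sequence"


def utf8_ces(octets: bytes) -> Union[List[Tuple[str, int]], str]:
    """Map octets to <CCS, code value> pairs, or to the error state."""
    try:
        text = octets.decode("utf-8")
    except UnicodeDecodeError:
        return ILLEGAL
    return [("ISO-10646", ord(ch)) for ch in text]


pairs = utf8_ces(b"A\xc3\x86")      # "A" followed by U+00C6
broken = utf8_ces(b"\xc3")          # truncated two-octet sequence
```

+
+   A CES built on several coded character sets, such as one using
+   ISO-2022 shifts, would return pairs with differing first members.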
+
+Appendix A:
+
+A-1:  IETF Protocols
+
+   The following list describes how various existing protocols handle
+   multiple character set information.
+
+   Email
+
+      SMTP
+        See 8.2. ESMTP makes it easy to negotiate the use of alternate
+        language and encoding if it is needed.
+      Headers
+        RFC 1522 forms an adequate framework for supporting text; UTF-8
+        alone is not a possible solution, because the mail pathways are
+        assumed to be 7-bit 'forever'. However, RFC 1522 should be
+        extended to allow language tagging of the free text parts of
+        message headers.
+      Bodies
+        Selection of charset parameters for Email text bodies is
+        reasonably well covered by the charset= parameter on Text/* MIME
+        types.  Language is defined by the Content-Language header of
+        RFC 1766.  Other information will have to be added using body
+        part headers; due to the way MIME differentiates between body
+        part headers and message headers, these will all have to have
+        names starting with Content- .
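+
+      As a sketch of the header mechanism, Python's standard
+      email.header module produces and parses the "encoded-word" form.
+      That module follows RFC 2047, the successor of RFC 1522 that uses
+      the same syntax; the sample subject text is invented.
+

```python
from email.header import Header, decode_header

# A header value containing characters outside US-ASCII.
subject = Header("Bl\u00e5b\u00e6rsyltet\u00f8y", charset="iso-8859-1")

# The wire form contains only 7-bit characters, so it survives
# mail pathways that are not 8-bit clean.
wire_form = subject.encode()

# The receiver recovers the octets and the charset label.
[(raw, charset)] = decode_header(wire_form)
restored = raw.decode(charset)
```

+
+      Note that the encoded-word carries a charset label but no
+      language, which is the gap the proposed extension would fill.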
+
+   NetNews
+
+      NNTP
+        See 8.2. No strong tradition for negotiation of encoding in NNTP
+        exists.
+      NetNews Messages
+        These should be able to leverage off the mechanisms defined for
+        Email.  One difference is that nearly all NNTP channels are 8-
+        bit clean; some NNTP newsgroups have a tradition of using 8-bit
+        charsets in both headers and bodies. Defining character set
+        default on a per newsgroup basis might be a suitable approach.
+
+   RTCP
+        The identifiers carried as information about parties are already
+        defined to be in UTF-8.
+
+   FTP
+      Protocol
+        See 8.2. The common use of welcome banners in the login response
+        means that there might be strong reason here to allow client and
+        server to negotiate a language different from the default for
+        greetings and error messages. This should be a simple protocol
+        extension.
+      Filenames
+        Many fileservers now have the capability of using non-ASCII
+        characters in filenames, while the "dir" and "get" commands
+        are defined in terms of US-ASCII only. One possible solution
+        would be to define a "UTF-8" mode for the transfer of filenames
+        and directory information; this would need to be a negotiated
+        facility, with fallback to US-ASCII if not negotiated. The
+        important point here is consistency between all implementations;
+        a single charset is better here than the ability to handle
+        multiple charsets.
+
+   World Wide Web
+      HTTP
+        See 8.2. The single-shot style of HTTP makes negotiation more
+        complex than it would otherwise be.
+      HTML
+        Internationalization of HTML [I18N] seems fairly well covered in
+        the current "I18N" document. It needs review to see if it needs
+        more specific details in order to carry application information
+        apart from the language.
+
+   URLs
+        URLs are "input identifiers", and powerful arguments should be
+        made if they are ever to be anything but US-ASCII.
+
+   IMAP
+        IMAP's information objects are MIME Email objects, and therefore
+        are able to use that standard's methods. However, IMAP folder
+        names are local identifiers; there is strong reason to allow
+        non-ASCII characters in these. A UTF-8 negotiation might be the
+        most appropriate thing; however, UTF-8 is awkward to use.
+        Unfortunately, UTF-7 isn't suitable because it conflicts with
+        popular hierarchy delimiters. The most recent IMAP work in
+        progress specification describes a modified UTF-7 which avoids
+        this problem.
+
+   DNS
+        DNS names are the prime example of identifiers that need to stay
+        in US-ASCII for global interoperability. However, some DNS
+        information, in particular TXT records, may represent
+        information (such as names) that is outside the ASCII range. A
+        single solution is the best; problems resulting from UTF-8
+        should be investigated.
+
+   WHOIS++
+        WHOIS++ version 1 is defined to use ISO 8859-1. The next version
+        will use UTF-8. The currently designed changes will also allow
+        the specification of individual attributes on attribute names;
+        these will make the passing of application information about the
+        values (such as language) easier. No immediate action seems
+        necessary.
+
+   WHOIS
+        This has been a stable protocol for so many years now that it
+        seems unwise to suggest that it be modified. Furthermore,
+        compatible extensions exist in RWHOIS and WHOIS++; modification
+        should rather be made to these protocols than to the WHOIS
+        protocol itself.
+
+   Telnet
+        This is a prime example of a protocol where character set support
+        is necessary and nonexistent. The current work in progress on
+        character set negotiation in Telnet seems adequate to the task;
+        the question of passing other application data that might be
+        useful is still open.
+
+A-2: Non-IETF protocols
+
+   For these protocols, the IETF does not have any power to change them.
+   However, the guidelines developed by the workshop may still be useful
+   as input to the further development of the protocols.
+
+   Gopher: Gopher, Gopher+
+
+   Prospero (Archie)
+
+   NFS:  Filesystem
+
+   CORBA, Finger, GEDI, IRC, ISO 10160/1, Kerberos, LPR, RSTAT, RWhois,
+   SGML, TFTP, X11, X.500, Z39.50
+
+Appendix B: Acronyms
+
+   ASCII       American Standard Code for Information Interchange
+   CCS         Coded Character Set
+   CEN ENV     European Committee for Standardisation (CEN) European
+                 pre-standard (ENV)
+   CES         Character Encoding Scheme
+   CJK         Chinese Japanese Korean
+   CORBA       Common Object Request Broker Architecture
+   CTE         Content Transfer Encoding
+   DNS         Domain Name System
+   ESMTP       Extended SMTP
+   FTP         File Transfer Protocol
+   HTML        Hypertext Markup Language
+   I18N        Internationalization (18 characters between the first
+                 (I) and last (N) characters)
+   IAB         Internet Architecture Board
+   IANA        Internet Assigned Numbers Authority
+   IESG        Internet Engineering Steering Group
+   IETF        Internet Engineering Task Force
+   IMAP        Internet Message Access Protocol
+   IRC         Internet Relay Chat
+   IRTF        Internet Research Task Force
+   ISI         Information Sciences Institute
+   ISO         International Organization for Standardization
+   MIME        Multipurpose Internet Mail Extensions
+   NFS         Network File System
+   NNTP        Network News Transfer Protocol
+   POSIX       Portable Operating System Interface
+   RFC         Request for Comments (Internet standards documents)
+   RPC         Remote Procedure Call
+   RSTAT       Remote Statistics
+   RTCP        Real-Time Transport Control Protocol
+   Rwhois      Referral Whois
+   SGML        Standard Generalized Mark-up Language
+   SMTP        Simple Mail Transfer Protocol
+   TES         Transfer Encoding Syntax
+   TFTP        Trivial File Transfer Protocol
+   URL         Uniform Resource Locator
+   UTF         UCS Transformation Format
+
+Appendix C:  Glossary
+
+   Bi-directionality - A property of some text where text written
+         right-to-left (e.g. Arabic or Hebrew) and text written
+         left-to-right (e.g. Latin) are intermixed in one and the same
+         line.
+
+   Character - A single graphic symbol represented by a sequence of one
+        or more bytes.
+
+   Character Encoding Scheme - The mapping from a coded character set to
+        an encoding which may be more suitable for a specific purpose.
+        For example, UTF-8 is a character encoding scheme for ISO 10646.
+
+   Character Set - An enumerated group of symbols (e.g., letters, numbers
+        or glyphs).
+
+   Coded Character Set - The mapping from a set of integers to the
+        characters of a character set.
+
+   Culture - Preferences in the display of text based on cultural norms,
+        such as spelling and word choice.
+
+   Language - The words and combinations of words that constitute a
+        system of expression and communication among people with a
+        shared history or set of traditions.
+
+   Layout - Information needed to display text to the user, similar to
+        the presentation layer in the ISO telecommunications model.
+
+   Locale - The attributes of communication, such as language, character
+        set and cultural conventions.
+
+   On-the-wire -  The data that actually gets put into packets for
+        transmission to other computers.
+
+   Transfer Encoding Syntax -  The mapping from a coded character set
+        which has been encoded in a Character Encoding Scheme to an
+        encoding which may be more suitable for transmission using
+        specific protocols. For example, Base64 is a transfer encoding
+        syntax.
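+
+   The layering of the three terms (coded character set, character
+   encoding scheme, transfer encoding syntax) can be traced for a
+   single character.  The sketch uses Python's built-in UTF-8 codec and
+   base64 module; the choice of sample character is arbitrary.
+

```python
import base64

character = "\u00c6"                    # LATIN CAPITAL LETTER AE

# Coded character set: ISO 10646 assigns the character an integer.
code_value = ord(character)

# Character encoding scheme: UTF-8 turns the integer into octets.
ces_octets = character.encode("utf-8")

# Transfer encoding syntax: Base64 makes the octets safe for 7-bit paths.
tes_form = base64.b64encode(ces_octets).decode("ascii")

# Undoing each layer in reverse order restores the character.
restored = base64.b64decode(tes_form).decode("utf-8")
```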
+
+Appendix D:  References
+
+[*]  Non-ASCII character
+
+[ASCII]  ANSI X3.4:1986, "Coded Character Sets - 7 Bit American
+     National Standard Code for Information Interchange (7-bit ASCII)".
+
+[Base64]  Freed, N., and N. Borenstein, "Multipurpose Internet
+     Mail Extensions (MIME) Part One: Format of Internet Message
+     Bodies", RFC 2045, November 1996.
+
+[CEN]  see http://tobbi.iti.is/TC304/welcome.html for current status.
+
+[HTML]  Berners-Lee, T., and D. Connolly, "Hypertext Markup Language -
+     2.0", RFC 1866, November 1995.
+
+[HTTP]  Berners-Lee, T., Fielding, R., and H. Nielsen, "Hypertext
+     Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996.
+
+[I18N]  Yergeau, F., et al., "Internationalization of the Hypertext
+     Markup Language", RFC 2070, January 1997.
+
+[IANA] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC
+     1700, ISI, October 1994.
+
+[ISO-2022]  ISO/IEC 2022:1994,  "Information technology -- Character
+     Code Structure and Extension Techniques",  JTC1/SC2.
+
+[ISO-7498]  ISO/IEC 7498-1:1994,  "Information technology - Open Systems
+     Interconnection - Basic Reference Model:  The Basic Model".
+
+[ISO-8859]  Information Processing -- 8-bit Single-Byte Coded Graphic
+     Character Sets -- Part 1: Latin Alphabet no. 1,
+     ISO 8859-1:1987(E). Part 2: Latin Alphabet no. 2,
+     ISO 8859-2:1987(E). Part 3: Latin Alphabet no. 3,
+     ISO 8859-3:1988(E). Part 4: Latin Alphabet no. 4,
+     ISO 8859-4:1988(E). Part 5: Latin/Cyrillic Alphabet,
+     ISO 8859-5:1988(E). Part 6: Latin/Arabic Alphabet,
+     ISO 8859-6:1987(E). Part 7: Latin/Greek Alphabet,
+     ISO 8859-7:1987(E). Part 8: Latin/Hebrew Alphabet,
+     ISO 8859-8:1988(E). Part 9: Latin Alphabet no. 5,
+     ISO 8859-9:1990(E). Part 10: Latin Alphabet no. 6,
+     ISO 8859-10:1992(E).
+
+[ISO-10646]  ISO/IEC 10646-1:1993(E), "Information technology --
+     Universal Multiple-Octet Coded Character Set (UCS) -- Part 1:
+     Architecture and Basic Multilingual Plane", JTC1/SC2, 1993.
+
+[MIME]  See [Base64]
+
+[POSIX]  Institute of Electrical and Electronics Engineers.  "IEEE
+     standard interpretations for IEEE standard portable operating
+     systems interface for computer environments". IEEE Std 1003.1
+     -1988/Int, 1992 edition.  Sponsor, Technical Committee on Operating
+     Systems of the IEEE Computer Society.  New York, NY: Institute of
+     Electrical and Electronic Engineers, 1992.
+
+RFC 1340  See [IANA]
+
+[RFC-1345]  Simonsen, K., "Character Mnemonics & Character Sets",
+     RFC 1345, Rationel Almen Planlaegning, June 1992.
+
+[RFC-1554]  Ohta, M., and K. Handa, "ISO-2022-JP-2: Multilingual
+     Extension of ISO-2022-JP", RFC 1554, Tokyo Institute of
+     Technology, ETL, December 1993.
+
+RFC 1642  See [UTF-7]
+
+[RFC-1766]  Alvestrand, H., "Tags for the Identification of Languages",
+     RFC 1766, UNINETT, March 1995.
+
+[RFC 1958]  Carpenter, B. (ed.) "Architectural Principles of the
+     Internet", RFC 1958, IAB, June 1996.
+
+[SGML] ISO 8879:1986 "Information Processing - Text and Office Systems
+     - Standard Generalized Markup Language (SGML)"
+
+[SMTP]   Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC 821,
+     August, 1982.
+
+[Unicode]  Unicode Consortium, "The Unicode Standard, Version 2.0",
+     Reading, Mass.: Addison-Wesley Developers Press, 1996.
+
+[UTF-7]  Goldsmith, D., and M. Davis, "UTF-7: A Mail Safe
+     Transformation Format of Unicode", RFC 1642, Taligent, Inc., July
+     1994.
+
+[UTF-8]  International Standards Organization, Joint Technical
+     Committee 1 (ISO/JTC1), "Amendment 2:1993, UCS Transformation
+     Format 8 (UTF-8)", in ISO/IEC 10646-1:1993 Information technology
+     - Universal Multiple-Octet Coded Character Set (UCS) -- Part 1:
+     Architecture and Basic Multilingual Plane.  JTC1/SC2, 1993.
+
+Appendix E:  Recommended reading
+
+Alvestrand, H., "Tags for the Identification of Languages", RFC 1766,
+     UNINETT, March 1995.
+
+Alvestrand, H., "X.400 Use of Extended Character Sets", RFC 1502,
+     SINTEF DELAB, August 1993.
+
+Borenstein, N.,  "Implications of MIME for Internet Mail Gateways",
+     RFC 1344, Bellcore, June 1992.
+
+Freed, N., and N. Borenstein, "Multipurpose Internet
+     Mail Extensions (MIME) Part One: Format of Internet Message
+     Bodies", RFC 2045, November 1996.
+
+Chernov, A., "Registration of a Cyrillic Character Set", RFC 1489,
+     RELCOM Development Team, July 1993.
+
+Choi, U., and K. Chan, "Korean Character Encoding for Internet
+     Messages", RFC 1557, KAIST, December 1993.
+
+Freed, N., and N. Borenstein, "Multipurpose Internet Mail Extensions
+     (MIME) Part Two: Media Types", RFC 2046, November 1996.
+
+Goldsmith, D., and M. Davis, "UTF-7: A Mail Safe Transformation Format
+     of Unicode", RFC 1642, Taligent, Inc., July 1994.
+
+Goldsmith, D., and M. Davis, "Using Unicode with MIME", RFC 1641,
+     Taligent, Inc., July 1994.
+
+Jerman-Blazic, B. "Character handling in computer communication" in
+     "user needs in information technology standards", Computer Weekly
+     Professional service, eds. C.D. Evans, B.L. Meed & R.S. Walker,
+     P.C. Butterworth Heineman, 1993, Oxford, Boston, p. 102-129.
+
+Jerman-Blazic, B. "Tool supporting the internationalization of the
+     generic network services", Computer Networks and ISDN Systems,
+     No. 27 (1994), p. 429-435.
+
+Jerman-Blazic, B., A. Gogala and D. Gabrijelcic, "Transparent language
+     processing: A solution for internationalization of Internet
+     services", The LISA Forum Newsletter, 5 (1996) p. 12-21
+
+Lee, F., "HZ - A Data Format for Exchanging Files of Arbitrarily Mixed
+     Chinese and ASCII Characters", RFC 1843, Stanford University,
+     August 1995.
+
+McCarthy, J., "Arbitrary Character Sets", RFC 373, Stanford
+     University, July 1972.
+
+Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Two:
+     Message Header Extensions for Non-ASCII Text", RFC 1522,
+     September 1993.  (Obsoleted by RFC 2047.)
+
+Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Three:
+     Message Header Extensions for Non-ASCII Text", RFC 2047,
+     University of Tennessee, November 1996.
+
+Murai, J., Crispin, M., and E. van der Poel, "Japanese Character
+     Encoding for Internet Messages", RFC 1468, Keio University &
+     Panda Programming, June 1993.
+
+Nussbacher, H., "Handling of Bi-directional Texts in MIME", RFC 1556,
+     Israeli Inter-University, December 1993.
+
+Nussbacher, H., and Y. Bourvine, "Hebrew Character Encoding for
+     Internet Messages", RFC 1555, Israeli Inter-University and
+     Hebrew University, December 1993.
+
+Ohta, M., "Character Sets ISO-10646 and ISO-10646-J-1", RFC 1815,
+     Tokyo Institute of Technology, July 1995.
+
+Postel, J., and J. Reynolds, "File Transfer Protocol (FTP)", STD 9,
+     RFC 959, ISI, October 1985.
+
+Postel, J., and J. Reynolds, "Telnet Protocol Specification", STD 8,
+     RFC 854, ISI, May 1983.
+
+Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC 1700,
+     ISI, October 1994. p.100-117.
+
+Rose, M., "The Internet Message", Prentice Hall, 1992.
+
+Simonsen, K., "Character Mnemonics & Character Sets", RFC 1345,
+     Rationel Almen Planlaegning, June 1992.
+
+Unicode Consortium, "The Unicode Standard, Version 2.0", Reading,
+     Mass.: Addison-Wesley Developers Press, 1996.
+
+Wei, U., et.al.  "ASCII Printable Characters-Based Chinese Character
+     Encoding for Internet Messages", RFC 1842, AsiInfo Services,
+     Inc., et.al.  August 1995.
+
+Yergeau, F. "UTF-8, a transformation format of Unicode and ISO 10646",
+     RFC 2044, ALIS Technologies, October 1996.
+
+Zhu, H., et.al., "Chinese Character Encoding for Internet Messages",
+     RFC 1922, Tsinghua University, et.al., March 1996.
+
+Appendix F: Workshop attendee list
+
+   These people were participants on the workshop mailing list.
+   An * indicates that the person attended the workshop in person.
+
+     Glenn Adams <glenn@spyglass.com>
+   * Joan Aliprand <joan@unicode.org>
+   * Harald Alvestrand <Harald.T.Alvestrand@uninett.no>
+   * Ran Atkinson <ran@cisco.com>
+   * Bert Bos <bert@w3.org>
+   * Brian Carpenter <brian@dxcoms.cern.ch>
+   * Mark Crispin <mrc@panda.com>
+     Makx Dekkers <dekkers@pica.nl>
+     Robert Elz <kre@munnari.oz.au>
+     Patrik Faltstrom <paf@paf.se>
+   * Zhu Haifeng <zhf@net.tsinghua.edu.cn>
+     Keniichi Handa <handa@etl.go.jp>
+     Olle Jarnefors <ojarnef@admin.kth.se>
+     Borka Jerman-Blazic <borka@e5.ijs.si>
+     John Klensin <klensin@mail1.reston.mci.net>
+   * Larry Masinter <masinter@parc.xerox.com>
+   * Rick McGowan <Rick_McGowan@next.com>
+   * Keith Moore <moore+charsets@cs.utk.edu>
+   * Lisa Moore <lisam@vnet.ibm.com>
+     Ruth Moulton <ruth@muswell.demon.co.uk>
+   * Cecilia Preston <cecilia@well.com>
+   * Joyce K. Reynolds <jkrey@isi.edu>
+   * Keld Simonsen <keld@dkuug.dk>
+   * Gary Smith <Gary_Smith@oclc.org>
+   * Peter Svanberg <psv@nada.kth.se>
+   * Chris Weider <cweider@microsoft.com>
+
+Appendix G: Authors' Addresses
+
+   Chris Weider
+   Microsoft Corp.
+   1 Microsoft Way
+   Redmond, WA 98052
+   USA
+
+   EMail: cweider@microsoft.com
+
+
+   Cecilia Preston
+   Preston & Lynch
+   PO Box 8310
+   Emeryville, CA 94662
+   USA
+
+   EMail: cecilia@well.com
+
+
+   Keld Simonsen
+   DKUUG
+   Freubjergvey 3
+   DK-2100 København Ø
+   Danmark
+
+   EMail: Keld@dkuug.dk
+
+
+   Harald T. Alvestrand
+   UNINETT
+   P.O.Box 6883 Elgeseter
+   N-7002 TRONDHEIM
+   NORWAY
+
+   EMail: Harald.T.Alvestrand@uninett.no
+
+
+   Randall Atkinson
+   cisco Systems
+   170 West Tasman Drive
+   San Jose, CA 95134-1706
+   USA
+
+   EMail: rja@cisco.com
+
+
+   Mark Crispin
+   Networks & Distributed Computing
+   University of Washington
+   4545 15th Avenue NE
+   Seattle, WA  98105-4527
+   USA
+
+   EMail: mrc@cac.washington.edu
+
+
+   Peter Svanberg
+   Dept. of Numerical Analysis and Computing Science (Nada)
+   Royal Institute of Technology
+   SE-100 44 STOCKHOLM
+   SWEDEN
+
+   EMail: psv@nada.kth.se