diff options
Diffstat (limited to 'doc/rfc/rfc2130.txt')
-rw-r--r-- | doc/rfc/rfc2130.txt | 1739 |
1 files changed, 1739 insertions, 0 deletions
diff --git a/doc/rfc/rfc2130.txt b/doc/rfc/rfc2130.txt new file mode 100644 index 0000000..932172c --- /dev/null +++ b/doc/rfc/rfc2130.txt @@ -0,0 +1,1739 @@ + + + + + + +Network Working Group C. Weider +Request for Comments: 2130 Microsoft +Category: Informational C. Preston + Preston & Lynch + K. Simonsen + DKUUG + H. Alvestrand + UNINETT + R. Atkinson + Cisco Systems + M. Crispin + University of Washington + P. Svanberg + KTH + April 1997 + + The Report of the IAB Character Set Workshop + held 29 February - 1 March, 1996 + +Status of this Memo + + This memo provides information for the Internet community. This memo + does not specify an Internet standard of any kind. Distribution of + this memo is unlimited. + +Acknowledgments + + The authors would like to sincerely thank Information Sciences + Institute (ISI), and in particular Joyce K. Reynolds for graciously + hosting this event; Joe Kemp and Jeanine Yamazaki of ISI made sure + the facilities met our needs. We also wish to thank the Internet + Society, which underwrote travel for participants who might not + otherwise have been able to attend. Of course, we also wish to thank + the many experts who participated in the workshop and on the mailing + list; a complete list of these people can be found in Appendix D. + Bunyip Information Systems was kind enough to provide mailing list + facilities for this work. + +Table of Contents + + Abstract + 0: Executive summary.......................................... 2 + 1: Introduction............................................... 3 + 2: Character sets on the Internet -- the problem.............. 3 + 2.1: Character set handling in existing protocols............... 4 + 3: Architectural model........................................ 6 + 3.1: Segments defined........................................... 7 + 3.2: On the wire................................................ 8 + + + +Weider, et. al. Informational [Page 1] + +RFC 2130 Character Set Workshop Report April 1997 + + + 3.3: Determining which values of CCS, CES, and TES are used..... 9 + 3.4: Recommended Defaults....................................... 10 + 3.5: Guidelines for conversions between coded character sets.... 13 + 4: Presentation issues........................................ 14 + 5: Open issues................................................ 14 + 5.1: Language tags.............................................. 15 + 5.2: Public identifiers......................................... 16 + 5.3: Bi-directionality.......................................... 16 + 6: Security Considerations.................................... 16 + 7: Conclusions................................................ 16 + 8: Recommendations............................................ 17 + 8.1: To the IAB................................................. 17 + 8.2: For new Internet protocols................................. 18 + 8.3: For registration of new character sets..................... 18 + Appendix A: List of protocols affected by character set issues... 20 + Appendix B: Acronyms............................................. 23 + Appendix C: Glossary............................................. 24 + Appendix D: References........................................... 25 + Appendix E: Recommended reading.................................. 27 + Appendix F: Workshop attendee list............................... 29 + Appendix G: Authors' Addresses................................... 30 + +Abstract + + This report details the conclusions of an IAB-sponsored invitational + workshop held 29 February - 1 March, 1996, to discuss the use of + character sets on the Internet. It motivates the need to have + character set handling in Internet protocols which transmit text, + provides a conceptual framework for specifying character sets, + recommends the use of MIME tagging for transmitted text, recommends a + default character set *without* stating that there is no need for + other character sets, and makes a series of recommendations to the + IAB, IANA, and the IESG for furthering the integration of the + character set framework into text transmission protocols. + +0: Executive summary + + The term 'Character Set' means many things to many people. Even the + MIME registry of character sets registers items that have great + differences in semantics and applicability. This workshop provides + guidance to the IAB and IETF about the use of character sets on the + Internet and provides a common framework for interoperability between + the many characters in use there. + + The framework consists of four components: an architecture model, + which specifies components necessary for on-the-wire transmission of + text; recommendations for tagging transmitted (and stored) text; + recommended defaults for each level of the model; and a set of + + + +Weider, et. al. Informational [Page 2] + +RFC 2130 Character Set Workshop Report April 1997 + + + recommendations to the IAB, IANA, and the IESG for furthering the + integration of this framework into text transmission protocols. + + The architectural model specifies 7 layers, of which only three are + required for on-the-wire transmission. The Coded Character Set is a + mapping from a set of abstract characters to a set of integers. The + Character Encoding Scheme is a mapping from a Coded Character Set (or + several) to a set of octets. The Transfer Encoding Syntax is a + transformation applied to data which has been encoded using a + Character Encoding Scheme to allow it to be transmitted. These layers + should be specified in a transmitted text stream by using the MIME + encoding mechanisms. + + This report recommends the use of ISO 10646 as the default Coded + Character Set, and UTF-8 as the default Character Encoding Scheme in + the creation of new protocols or new version of old protocols which + transmit text. These defaults do not deprecate the use of other + character sets when and where they are needed; they are simply + intended to provide guidance and a specification for + interoperability. + +1: Introduction + + This is the report of an IAB-sponsored invitational workshop on the + use of Character Sets on the Internet, held 29 February - 1 March + 1996 at Information Sciences Institute (ISI) in Marina del Rey, + California. In addition, this report covers the discussion on the + mailing list up to and slightly beyond the workshop itself. The + goals of this workshop were to provide guidance to the IAB and the + IETF about the use of character sets on the Internet, and if possible + a common framework for interoperability between the many character + sets in use there. Both goals were achieved. + +2: Character sets on the Internet - the problem + + The term 'character set' is typically applied to the contents of a + wide variety of text transmission and display protocols used on the + Internet. Because the term is used to mean different things, + confusion has arisen. For example, the MIME registry of character + sets [MIME] contains items that may differ greatly in their + applicability and semantics in various Internet protocols. + + In addition, there is a vast profusion of different text encoding + schemes in use on the Internet. This per se is not a problem; each + scheme has evolved to meet real needs. However, information + applications such as mail, directories, and the World Wide Web have + each developed different techniques for dealing with the growing + number of schemes. A robust information architecture for the + + + +Weider, et. al. Informational [Page 3] + +RFC 2130 Character Set Workshop Report April 1997 + + + Internet requires as much interoperability between these techniques + as possible. + +2.1: Related topics deemed out of scope for this workshop + + Successful display of plain text transmitted over the Internet + requires a lot of information about the text itself, such as the + underlying character set, language, and so forth. An additional set + of formatting information is needed if the receiving application + wishes to use local (cultural) conventions when it presents the data + to the user. This formatting includes information, that provides the + data necessary to format certain types of textual data (dates, + times, numbers and monetary notation) into a form which is familiar + to the user. The POSIX [POSIX] notation of locale encompasses + language, coded character set and cultural conventions. + + To avoid unfruitful discussion, and to make the best use of the time + available for the workshop, we declared the following issues out of + scope for the purposes of this workshop: + + - glyphs + - sorting + - culture (e.g. do we present the American or British spelling?) + - user interface issues + - internal representation of textual data + - included characters (why aren't certain characters available in + any character set?) + - locale (in the POSIX sense) + - font registration + - semantics + - user input/output issues + - Han unification issues + + There are some related issues which were included for discussion, + most importantly the 'locale' components necessary for transport and + identification of multilingual texts. + +2.2: Character Set handling in existing protocols + + One of the group's overriding concerns was that the framework + developed for character set handling not break existing protocols. + With that in mind, the way character sets are being used in existing + protocols was examined. See Appendix A for a list of those protocols + and some recommendations for change. + +2.2.1: General comments + + The problem areas here fall into three main categories: protocols, + + + +Weider, et. al. Informational [Page 4] + +RFC 2130 Character Set Workshop Report April 1997 + + + identifiers, and data. + +2.2.1.1: Protocols + + The protocol machinery SHOULD NOT be changed; allowing, for instance, + SMTP [SMTP] to use both MAIL FROM and POST FRA is dangerous to the + protocols' stability. However, many protocols carry error messages + and other information that is intended for human consumption; it + MIGHT be an advantage to allow these to be localized into a specific + language and character set, rather than staying in English and US- + ASCII [ASCII]. If this is done, new extensions should follow the + framework outlined below. + +2.2.1.2: Identifiers. + + There is a strong statement of direction from the IAB, RFC 1958 [RFC + 1958], which states: + + 4.3 Public (i.e. widely visible) names should be in case + independent ASCII. Specifically, this refers to DNS names, + and to protocol elements that are transmitted in text format. + ... + 5.4 Designs should be fully international, with support for + localization (adaptation to local character sets). In + particular, there should be a uniform approach to character + set tagging for information content. + + In protocols that up to now have used US-ASCII only, UTF-8 [UTF-8] + forms a simple upgrade path; however, its use should be negotiated + either by negotiating a protocol version or by negotiating charset + usage, and a fallback to a US-ASCII compatible representation such as + UTF-7 [UTF-7] MUST be available. + + The need for passing application data such as language on individual + identifiers varies between applications; protocols SHOULD attempt to + evaluate this need when designing mechanisms. Applying the ASCII + requirement for identifiers that are only used in a local context + (such as private mailbox folder names) is both unrealistic and + unreasonable; in such cases, methods for consistency in the handling + of character set should be considered. + +2.2.1.3: Data + + Data that require character set handling includes text, databases, + and HTML [HTML] pages, for example. In these the support for + multiple character sets and proper application information is + absolutely vital, and MUST be supported. + + + + +Weider, et. al. Informational [Page 5] + +RFC 2130 Character Set Workshop Report April 1997 + + +2.3: Architectural requirements + + To address the issues enumerated for this work, first an + architectural model was created which establishes the components that + are required to fully specify the transmission of textual data. Many + of these components are already familiar to the users of encoding + protocols such as MIME. Not all of these are discussed in detail in + this report; we restrict ourselves primarily to those components + which are required to specify the 'on-the-wire' phase of text + transmission. + + Mandating a single, all-encompassing character set would not fit well + with the IETF philosophy of planning for architectural diversity. + So, the best that can be done is to provide a common *framework* for + identifying and using the multitude of character sets available on + the Internet. It would be an advantage if the total number of Coded + Character Sets could be kept to a minimum. This framework should + meet the following requirements: + + - it should not break existing protocols (because then the likelihood + of deployment is very small), + - it should allow the use of character sets currently used on the + Internet, and + - it should be relatively easy to build into new protocols. + +3: Architectural model + + The basic architectural model which guided our discussions is shown + in below. A distinction was made between those segments which were + necessary to successfully transmit character set data on-the-wire and + those needed to present that data to a user in a comprehensible + manner. The discussions were primarily restricted to those segments + of the model which specify the 'on-the-wire' transmission of textual + data. + + User interface issues: these are briefly discussed in Section 3.1.1. + Layout + Culture + Locale + Language + On-the-wire: see section 3.2 for detailed discussion. + Transfer Syntax + Character Encoding Scheme + Coded Character Set + + + + + + + +Weider, et. al. Informational [Page 6] + +RFC 2130 Character Set Workshop Report April 1997 + + +3.1: Segments defined + +3.1:1: User interface + +3.1.1.1: Layout + + Layout includes the elements needed for displaying text to the user, + such as font selection, word-wrapping, etc. It is similar to the + 'presentation' layer in the 7-layer ISO telecommunications model + [ISO-7498]. + +3.1.1.2: Culture + + Culture includes information about cultural preferences, which affect + spelling, word choice, and so forth. + +3.1.1.3: Locale + + The locale component includes the information necessary to make + choices about text manipulation which will present the text to the + user in an expected format. This information may include the display + of date, time and monetary symbol preferences. Notice that locale + modifications are typically applied to a text stream before it is + presented to the user, although they also are used to specify input + formats. + +3.1.1.4: Language + + This component specifies the language of the transmitted text. At + times and in specific cases, language information may be required to + achieve a particular level of quality for the purpose of displaying a + text stream. For example, UTF-8 encoded Han may require transmission + of a language tag to select the specific glyphs to be displayed at a + particular level of quality. + + Note that information other than language may be used to achieve the + required level of quality in a display process. In particular, a + font tag is sufficient to produce identical results. However, the + association of a language with a specific block of text has + usefulness far beyond its use in display. In particular, as the + amount of information available in multiple languages on the World + Wide Web grows, it becomes critical to specify which language is in + use in particular documents, to assist automatic indexing and + retrieval of relevant documents. + + + + + + + +Weider, et. al. Informational [Page 7] + +RFC 2130 Character Set Workshop Report April 1997 + + + The term 'language tag' should be reserved for the short identifier + of RFC 1766 [RFC-1766] that only serves to identify the language. + While there may be other text attributes intimately associated with + the language of the document, such as desired font or text direction, + these should be specified with other identifiers rather than + overloading the language tag. + +3.2: On the wire + + There are three segments of the model which are required for + completely specifying the content of a transmitted text stream (with + the occasional exception of the Language component, mentioned above). + These components are: + + 1) Coded Character Set, + 2) Character Encoding Scheme, and + 3) Transfer Encoding Syntax. + + Each of these abstract components must be explicitly specified by the + transmitter when the data is sent. There may be instances of an + implicit specification due to the protocol/standard being used (i.e. + ANSI/NISO Z39.50). Also, in MIME, the Coded Character Set and + Character Encoding Scheme are specified by the Charset parameter to + the Content-Type header field, and Transfer Encoding Syntax is + specified by the Content-Transfer-Encoding header field. + +3.2.1: Coded Character Set + + A Coded Character Set (CCS) is a mapping from a set of abstract + characters to a set of integers. Examples of coded character sets + are ISO 10646 [ISO-10646], US-ASCII [ASCII], and ISO-8859 series + [ISO-8859]. + +3.2.2: Character Encoding Scheme + + A Character Encoding Scheme (CES) is a mapping from a Coded Character + Set or several coded character sets to a set of octets. Examples of + Character Encoding Schemes are ISO 2022 [ISO-2022] and UTF-8 [UTF-8]. + A given CES is typically associated with a single CCS; for example, + UTF-8 applies only to ISO 10646. + + + + + + + + + + + +Weider, et. al. Informational [Page 8] + +RFC 2130 Character Set Workshop Report April 1997 + + +3.2.3: Transfer Encoding Syntax + + It is frequently necessary to transform encoded text into a format + which is transmissible by specific protocols. The Transfer Encoding + Syntax (TES) is a transformation applied to character data encoded + using a CCS and possibly a CES to allow it to be transmitted. + Examples of Transfer Encoding Syntaxes are Base64 Encoding [Base64], + gzip encoding, and so forth. + +3.3: Determining which values of CCS, CES, and TES are used + + To completely specify which CCS, CES, and TES are used in a specific + text transmission, there needs to be a consistent set of labels for + specifying which CCS, CES, and TES are used. Once the appropriate + mechanisms have been selected, there are six techniques for attaching + these labels to the data. + + The labels themselves are named and registered, either with IANA + [IANA] or with some other registry. Ideally, their definitions are + retrievable from some registration authority. + + Labels may be determined in one of the following ways: + + - Determined by guessing, where the receiver of the text has to + guess the values of the CCS, CES, and TES. For example: "I got + this from Sweden so it's probably ISO-8859-1." This is + obviously not a very foolproof way to decode text. + - Determined by the standard, where the protocol used to transmit + the data has made documented choices of CCS, CES, and TES in the + standard. Thus, the encodings used are known through the + access protocol, for example HTTP [HTTP] uses (but is not + limited to) ISO-8859-1, SMTP uses US-ASCII. + - Attached to the transfer envelope, where the descriptive labels are + attached to the wrapper placed around the text for transport. + MIME headers are a good example of this technique. + - Included in the data stream, where the data stream itself has + been encoded in such a way as to signal the character set used. + For example, ISO-2022 encodes the data with escape sequences to + provide information on the character subset currently being used. + - Agreed by prior bilateral agreement, where some out-of-band + negotiation has allowed the text transmitter and receiver to + determine the CCS, CES, and TES for the transmitted text. + - Agreed to by negotiation during some phase, typically + initialization of the protocol. + + + + + + + +Weider, et. al. Informational [Page 9] + +RFC 2130 Character Set Workshop Report April 1997 + + +3.3.1: Recommendations for value specification mechanisms + + While each of these techniques (with the exception of guessing) is + useful in particular situations, interoperability requires a more + consistent set of techniques. Thus, we recommend that MIME + registered values be used for all tagging of character sets and + languages UNLESS there is an existing mechanism for determining the + required information using one of the other techniques (except + guessing). This recommendation will require a fair bit of work on + the part of protocol designers, implementors, the IETF, the IESG, and + the IAB. + + However, it is important to point out that the MIME concept of + 'charset' in some cases cuts across several layers of components in + our model. While this can be accepted in existing registrations, we + also recommend that the MIME registration procedure for character + sets be modified to show how a proposed character set deals with the + CCS and the CES. Most 'charsets' have a well defined CCS and CES, + they should merely be teased apart for the registration. + + There are a number of other recommendations, but these will be + covered in the next sections. + +3.4: Recommended Defaults + + For a number of reasons, one cannot define a mandatory set of + defaults for all Internet protocols. There is a mass of current + practice, future protocols are likely to have different purposes, + which may determine their handling of text, and protocols may need + specific variation support. For example, in mail, text is a + predominant data type and coded character sets then become a major + issue for the protocol. Also, since e-mail is ubiquitous and users + expect to be able to send it to everyone, the mail protocols need to + be quite adept at handling different character set encodings. On the + other hand, if strings are seldom used in a given protocol, there is + no need to weigh the protocol down with a sophisticated apparatus for + handling multiple character sets, assuming that the predicated + character set can handle all the protocol's needs. This observation + also applies to the specification techniques for character set + parameters. If only one character set encoding is needed, it can be + made explicit in the protocol specification. Protocols with a + greater need for character set support will need a more elaborate + specification technique. + + + + + + + + +Weider, et. al. Informational [Page 10] + +RFC 2130 Character Set Workshop Report April 1997 + + +3.4.1: Clarity of specification + + We recommend that each protocol clearly specify what it is using for + each of the layers of the transmission model. Users (or clients) + should never have to guess what the parameter is for a given layer. + +3.4.2: Default Coded Character Set: + + The default Coded Character Set is the repertoire of ISO-10646. + +3.4.3: Default Character Encoding Scheme + + For text-oriented protocols, new protocols should use UTF-8, and + protocols that have a backwards compatibility requirement should use + the default of the existing protocol, e.g. US-ASCII for mail, and + ISO-8859-1 for HTTP. The recommended specification scheme is the + MIME "charset" specification, using the IANA "charset" + specifications. The MIME specifications will need to be clarified to + meet this model in the future. + + For other protocols, the default should be UTF-8 as this initially + allows US-ASCII to be entered as-is, and enables the full repertoire + of ISO 10646. + + Some protocols, such as those descended from SGML [SGML], have other + natural notations for characters outside their "natural" repertoire; + for instance, HTML [HTML] allows the use of &#nnnn to refer to any + ISO 10646 character. Note that this, like all other encodings that + depend on "escape characters", redefines at least one character from + the base character set for use as an indicator of "foreign" + characters. Use of this approach must be weighed very carefully. + +3.4.4: Default Transport Encoding Scheme + + There is no recommended default for this level. For plain text + oriented protocols, the bytestream transport format should be 8-bit + clean, possibly with normalization of end-of-line indicators. Some + special cases could be made for protocols that are not 8-bit clean, + such as encoding it for transport over 7-bit connections. For binary + the same recommendation holds as above. The specification technique + should either be defined in the protocol, if only one way is + permitted, or by use of MIME content-transfer-encoding (CTE) + techniques, using IANA registered values. + + + + + + + + +Weider, et. al. Informational [Page 11] + +RFC 2130 Character Set Workshop Report April 1997 + + +3.4.5: Default Language + + There is no recommended default for the language level. For human + readable text, there should always be a way to specify the natural + language. The specification technique should be a MIME identifier + with IANA registered values for languages. If headers are used, the + header should be 'Content-Language'. + +3.4.6: Default Locale + + The default should be the POSIX locale. The specification technique + should use the Cultural register of CEN ENV 12005 [CEN] for the + values. If headers are used, the header should be 'Content-Locale'. + +3.4.7: Default Culture + + There is no recommended default for the Culture level. The + specification technique should be a MIME or MIME-like identifier + (e.g. Content-Culture) and should use the Cultural register of CEN + ENV 12005 for its values. + +3.4.8: Default Presentation + + There is no recommended default for the Presentation level. The + specification technique should be a MIME or MIME-like identifier + (e.g. Content-Layout) and use the glyph register of ISO 10036 and + other registers for its values. + +3.4.9: Multiplexing + + In some cases, text transmission may require the use of a number of + different values for a given parameter; for example, English + annotation of Japanese text might well require shifting the Content- + Language parameter. The way to switch the value of parameters within + a single body of text depends on the application. For instance, the + HTML I18N [I18N] work defines a language attribute on most of its + elements, including <SPAN>, <HTML>, and <BODY>, for the purpose of + switching between different languages. When only one value is + needed, this value should be as general as possible, and specified in + the protocol standard with reference to the IANA or other registry + value. All levels should be specified explicitly. + +3.4.10: Storage + + Because stored text may very well be stored without any of the + additional information necessary for decoding, stored text SHOULD be + tagged in a MIME compliant fashion. This alleviates the problem of + being unable to interpret text which has been stored for a long time, + + + +Weider, et. al. Informational [Page 12] + +RFC 2130 Character Set Workshop Report April 1997 + + + or text whose provenance is not available. + +3.5: Guidelines for conversions between coded character sets + + This section covers various algorithms to convert a source text S, + encoded in the coded character set CCS(S), to a target text T, + encoded in the coded character set CCS(T). + + Rep(X) is the character repertoire of coded character set X, i.e. the + set of characters which can be represented with X. + +3.5.1: Exact conversion + + When Rep(CCS(S)) and Rep(CCS(T)) are equal or Rep(CCS(S)) is a subset + of Rep(CCS(T)), exact conversion is possible; i.e. T is equal to S. + The octets just need to be remapped. The algorithm for performing + this remapping is simple, if the IANA-registered definition tables + for CCS(S) and CCS(T) are available. + +3.5.2: Approximate conversion + + In all other cases, any conversion creates a text T which differs + from S. There are different principles for how this inevitable + difference should be handled. A choice between them should be made, + depending on the purpose and requirements of the conversion. Where + possible, the client application should be given mechanisms to + determine what has been done to the text. + + 3.5.2.1: Length-modifying conversion for human display + + When the length of the target text T is allowed to differ from the + length of the source text S, one should use a conversion method in + which each source character is converted to one or several target + character(s), using a best resemblance criteria in the choice of that + target character(s). + + Examples: + LATIN CAPITAL LETTER [*] -> AE + COPYRIGHT SIGN [*] -> (c) + +3.5.2.2: Length-preserving conversion for human display + + Where the text T must be presented and the length of T cannot differ + from the length of S, one should use a conversion method where each + source character is converted to one target character, using some + kind of best resemblance criteria in the choice of target character. + + + + + +Weider, et. al. Informational [Page 13] + +RFC 2130 Character Set Workshop Report April 1997 + + + Examples: + LATIN CAPITAL LETTER [*] -> A + COPYRIGHT SIGN [*] -> C + +3.5.2.3: Conversion without data loss + + Where the conversion of the text S into T must be completely + reversible, apply a Character Encoding Syntax or other reversible + transformation method. This case is most frequently met in data + storage requirements. + + Examples: + LATIN CAPITAL LETTER [*] -> &AE + COPYRIGHT SIGN [*] -> &(C + + An alternate method, which can be used if the size of Rep(CCS(T)) >= + Rep(CCS(S)), then for each character in Rep(CCS(S)) which is not + present in Rep(CCS(T)), define a mapping into a character in + Rep(CCS(T)) which is not present in Rep(CCS(S)). + + Examples: + LATIN CAPITAL LETTER [*] -> CYRILLIC CAPITAL LETTER [*] + COPYRIGHT SIGN [*] -> PARTIAL DIFFERENTIAL SIGN [*] + + Note that conversion without data loss requires redefining some + member of T to indicate "the introduction of character data outside + T". This effectively adds another level of CES on top of CES(T). + +4: Presentation issues + + There are a number of considerations to make in selecting the base + character set. One such consideration is the protocol's convenience + to users with limited equipment (for example only ISO 8859-1 or a + keyboard without the ability to enter all the characters in ISO + 10646). Alternative representation should be considered for these + users, both for input and output. Possible options for the + representation of characters that can not be displayed include + transliteration (a la CEN/TC304 or ISO TC46/SC2 ), RFC 1345 [RFC- + 1345] representative icons, or the WG2 short name (u+xxxx). + +5: Open issues + + In addition to the issues declared out of scope and enumerated in + section 2.1, the following issues are still open and will need to be + addressed in other forums. These issues: language tags, public + identifiers such as URL names, and bi-directionality are briefly + discussed below as they repeatedly encroached the discussion. + + + + +Weider, et. al. Informational [Page 14] + +RFC 2130 Character Set Workshop Report April 1997 + + +5.1: Language tags + + Although the workshop decided not to explicitly address the so-called + "CJK issue", a few members felt it was necessary to have some + mechanism to address the problem of correct Han character display in + the ISO-10646 issue, and that saying that it was a "font issue" would + not suffice. + + The "CJK issue" refers to the extended discussion about "Han + unification", the use of a single ISO-10646 codepoint to represent + multiple national variants of a Chinese (Han) character. ISO-10646 + can map uniquely to any single CJK national character set, but in the + absence of additional information an application can not display an + ISO-10646 text using the proper national variants for that text. + + It was agreed that language tags would be sufficient to disambiguate + unified characters. There was not, in our opinion, a significant + technical difference between the use of different coded character + sets with overlapping codepoints, and a single coded character set + with language tags. Either way, the application has sufficient + information to display the text properly. + + It was observed that in contemporary usage of MIME charsets, the + language is implied as well as the coded character set and the + character encoding syntax. We agreed that this is excessive + overloading of MIME charsets. + + To specify the language used in a particular block of text, we + recommend that the MIME tag "Content-Language" be used. There are a + number of questions about this approach that need to be worked out, + however: + + - Is Content-Language: actually suitable? + - Is there an overload between this function and the other + intended functions of Content-Language: as described in RFC + 1766? + - What, precisely, does "Content-Language: zh-tw, ja, ko, zh-cn" + mean in this context? We believe it means that, in drawing a + Han character, the Taiwanese variant (presumably traditional + Han) is preferred, followed by the Japanese, Korean, and + mainland Chinese (presumably simplified Han) variants. It does + *NOT* mean "mixed text containing Taiwanese, Japanese, Korean, + and mainland Chinese text with all the national variants in + each of these". + + Mixed CJK text, that simultaneously displays different variants + occupying the same codepoint, requires language tags embedded in the + data. Ohta and Handa propose in RFC 1554 [RFC-1554] a MIME charset + + + +Weider, et. al. Informational [Page 15] + +RFC 2130 Character Set Workshop Report April 1997 + + + using ISO-2022 shifts between multiple coded character sets; in + effect this is an encoding that uses coded character sets for + displaying the appropriate glyphs. + + There is some speculation that states that mixed CJK text is + relatively infrequent, and that therefore it is acceptable to require + that such text be represented using a rich text format that can + support language tags. In other words, that a simplifying assumption + can be made for TEXT/PLAIN in email using ISO-10646 that will not + require multiple display representations for the same codepoint. A + mechanism such as RFC 1554 could address this need if it was + important; although arguably RFC 1554 should really be identified as + TEXT/ISO-2022. + + Note again that we recommend that support for language tagging SHOULD + be built into new protocols, as this will become a critical component + of the automated indexing and retrieval in information applications + of the future. + +5.2: Public identifiers + + There is a considerable demand from the user community for the + ability to use non-ASCII characters in URL names, IMAP mailbox names, + file names, and other public identifiers. This is still an open + problem. + +5.3: Bi-directionality + + It was realized that a consistent framework for bi-directional text + was needed but there was no attempt to work on it in this workshop. + +6: Security Considerations + + There are no security considerations associated with character sets. + +7: Conclusions + + This paper provides a conceptual framework and a set of + recommendations which, if adopted, should provide a solid foundation + for interoperability on the Internet. There are, however, a number of + open issues which will need to be addressed to provide ever better + use of text on the Internet. + + + + + + + + + +Weider, et. al. Informational [Page 16] + +RFC 2130 Character Set Workshop Report April 1997 + + +8: Recommendations + +8.1: To the IAB + + There were a number of recommendations to the IAB about making the + standards process more aware of the need for character set + interoperability, and about the framework itself. + + A: The IAB should trigger the examination of all RFCs to determine + the way they handle character sets, and obsolete or annotate the + RFCs where necessary. + + B: The IESG should trigger the recommendation of procedures to the + RFC editor to encourage RFCs to specify character set handling if + they specify the transmission of text. + + C: The IAB should trigger the production of a perspectives document + on the character set work that has gone on in the past and relate it + to the current framework. + + D: Full ISO 10646 has a sufficiently broad repertoire, and scope for + further extension, that it is sufficient for use in Internet + Protocols (without excluding the use of existing alternatives). + There is no need for specific development of character set standards + for the Internet. + + E: The IAB should encourage the IRTF to create a research group to + explore the open issues of character sets on the Internet. This group + should set its sights much higher than this workshop did. + + F: The IANA (perhaps with the help of an IETF or IRTF group) should + develop procedures for the registration of new character sets for + use in the Internet. + + G: Register UTF-8 as a Character Encoding Scheme for MIME. + + H: The current use of the "x-*" format for distinguishing + experimental tags should be continued for private use among + consenting parties. All other namespaces should be allocated by IANA. + + I: Application protocol RFCs SHOULD include a section on + "multilingual Considerations". + + J: Application Protocol RFCs SHOULD indicate how to transfer 'on the + wire' all characters in the character sets they use. They SHOULD also + specify how to transfer other information that applications may need + to know about the data. + + + + +Weider, et. al. Informational [Page 17] + +RFC 2130 Character Set Workshop Report April 1997 + + + K: The IESG should trigger a set of extensions to RFC 1522 to allow + language tagging of the free text parts of message headers. + +8.2: For new Internet protocols + + New protocols do not suffer from the need to be compatible with old + 7-bit pipes. New protocol specifications SHOULD use ISO 10646 as the + base charset unless there is an overriding need to use a different + base character set. + + New protocols SHOULD use values from the IANA registries when + referring to parameter values. The way these values are carried in + the protocols is protocol dependent; if the protocol uses RFC-822- + like headers, the header names already in use SHOULD be used. + + For protocols with only a single choice for each component, the + protocol should use the most general specification and should be + specified with reference to the registered value in the protocol + standard. + + Protocols SHOULD tag text streams with the language of the text. + +8.3: For the registration of new character sets + + Ned Freed will be releasing a new MIME registration document in + conjunction with this paper. + +8.3.1: A definition table for a coded character set + + A definition table for a coded character set A must for each + character C that is in the repertoire of A give: + + a) if C is present in ISO 10646, the code value (in hexadecimal form) + for that character. + + b) If C is not present in ISO 10646, but may be constructed using ISO + 10646 combining characters, the series of code values (in + hexadecimal form) used to construct that character. + + c) if C is not present in ISO 10646, a textual description of the + character, and a reference to its origin. + + + + + + + + + + +Weider, et. al. Informational [Page 18] + +RFC 2130 Character Set Workshop Report April 1997 + + +8.3.2: A definition of a character encoding scheme + + A definition of a character encoding scheme consists of: + + - A description of an algorithm which transforms every possible + sequence of octets to either a sequence of pairs <CCS, code + value> or to the error state "illegal octet sequence" + - Specifications, either by reference to CCS's registered by IANA or + in text, of each CCS upon which this CES is based. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Weider, et. al. Informational [Page 19] + +RFC 2130 Character Set Workshop Report April 1997 + + +Appendix A: + +A-1: IETF Protocols + + The following list describes how various existing protocols handle + multiple character set information. + + Email + + SMTP + See 8.2. ESMTP makes it easy to negotiate the use of alternate + language and encoding if it is needed. + Headers + RFC 1522 forms an adequate framework for supporting text; UTF-8 + alone is not a possible solution, because the mail pathways are + assumed to be 7-bit 'forever'. However, RFC 1522 should be + extended to allow language tagging of the free text parts of + message headers. + Bodies + Selection of charset parameters for Email text bodies is + reasonably well covered by the charset= parameter on Text/* MIME + types. Language is defined by the Content-language header of + RFC 1766. Other information will have to be added using body + part headers; due to the way MIME differentiates between body + part headers and message headers, these will all have to have + names starting with Content- . + + NetNews + + NNTP + See 8.2. No strong tradition for negotiation of encoding in NNTP + exists. + NetNews Messages + These should be able to leverage off the mechanisms defined for + Email. One difference is that nearly all NNTP channels are 8- + bit clean; some NNTP newsgroups have a tradition of using 8-bit + charsets in both headers and bodies. Defining character set + default on a per newsgroup basis might be a suitable approach. + + RTCP + The identifiers carried as information about parties are already + defined to be in UTF-8. + + + + + + + + + +Weider, et. al. Informational [Page 20] + +RFC 2130 Character Set Workshop Report April 1997 + + + FTP + Protocol + See 8.2. The common use of welcome banners in the login response + means that there might be strong reason here to allow client and + server to negotiate a language different from the default for + greetings and error messages. This should be a simple protocol + extension. + Filenames + Many fileservers now how have the capability of using non-ASCII + characters in filenames, while the "dir" and "get" commands of + are defined in terms of US-ASCII only. One possible solution + would be to define a "UTF-8" mode for the transfer of filenames + and directory information; this would need to be a negotiated + facility, with fallback to US-ASCII if not negotiated. The + important point here is consistency between all implementations; + a single charset is better here than the ability to handle + multiple charsets. + + World Wide Web + HTTP + See 8.2. The single-shot stype of HTTP makes negotiation more + complex than it would otherwise be. + HTML + Internationalization of HTML [I18N] seems fairly well covered in + the current "I18N" document. It needs review to see if it needs + more specific details in order to carry application information + apart from the language. + + URLs + URLs are "input identifiers", and powerful arguments should be + made if they are ever to be anything but US-ASCII. + + IMAP + IMAP's information objects are MIME Email objects, and therefore + are able to use that standard's methods. However, IMAP folder + names are local identifiers; there is strong reason to allow + non-ASCII characters in these. A UTF-8 negotiation might be the + most appropriate thing, however, UTF-8 is awkward to use. + Unfortunately, UTF-7 isn't suitable because it conflicts with + popular hierarchy delimiters. The most recent IMAP work in + progress specification describes a modified UTF-7 which avoids + this problem. + + + + + + + + + +Weider, et. al. Informational [Page 21] + +RFC 2130 Character Set Workshop Report April 1997 + + + DNS + DNS names are the prime example of identifiers that need to stay + in US-ASCII for global interoperability. However, some DNS + information, in particular TXT records, may represent + information (such as names) that is outside the ASCII range. A + single solution is the best; problems resulting from UTF-8 + should be investigated. + + WHOIS++ + WHOIS++ version 1 is defined to use ISO 8859-1. The next version + will use UTF-8. The currently designed changes will also allow + the specification of individual attributes on attribute names; + these will make the passing of application information about the + values (such as language) easier. No immediate action seems + necessary. + + WHOIS + This has been a stable protocol for so many years now that it + seems unwise to suggest that it be modified. Furthermore, + compatible extensions exist in RWHOIS and WHOIS++; modification + should rather be made to these protocols than to the WHOIS + protocol itself. + + Telnet + This is a prime example of protocol where character set support + is necessary and nonexistent. The current work in progress on + character set negotiation in Telnet seems adequate to the task; + the question of passing other application data that might be + useful is still open. + +A-2: Non-IETF protocols + + For these protocols, the IETF does not have any power to change them. + However, the guidelines developed by the workshop may still be useful + as input to the further development of the protocols. + + Gopher: Gopher, Gopher+ + + Prospero (Archie) + + NFS: Filesystem + + CORBA, Finger, GEDI, IRC, ISO 10160/1, Kerberos, LPR, RSTAT, RWhois, + SGML, TFTP, X11, X.500, Z39.50 + + + + + + + +Weider, et. al. Informational [Page 22] + +RFC 2130 Character Set Workshop Report April 1997 + + +Appendix B: Acronyms + + ASCII American National Standard Code for Information Character + Sets + CCS Coded Character Sets + CEN ENV European Committee for Standardisation (CEN) European + pre-standard (ENV) + CES Character Encoding Scheme + CJK Chinese Japanese Korean + CORBA Common Object Request Broker Architecture + CTE Content Transfer Encoding + DNS Domain Name Service + ESMTP Extended SMTP + FTP File Transfer Protocol + HTML Hypertext Transfer Protocol + I18N Internationalization (or 18 characters between the first + (I) and last (n)character) + IAB Internet Activities Board + IANA Internet Assigned Numbers Authority + IESG Internet Engineering Steering Group + IETF Internet Engineering Task Force + IMAP Internet Message Access Protocol + IRC Internet Relay Chat + IRTF Internet Research Task Force + ISI Information Sciences Institute + ISO International Standards Organization + MIME Multipurpose Internet Mail Extensions + NFS Networked File Server + NNTP Net News Transfer Protocol + POSIX Portable Operating System Interface + RFC Request for Comments (Internet standards documents) + RPC Remote Procedure Call + RSTAT Remote Statistics + RTCP Real-Time Transport Control Protocol + Rwhois Referral Whois + SGML Standard Generalized Mark-up Language + SMTP Simple Mail Transfer Protocol + TES Transfer Encoding Syntax + TFTP Trivial File Transfer Protocol + URL Uniform Resource Locator + UTF Universal Text/Translation Format + + + + + + + + + + +Weider, et. al. Informational [Page 23] + +RFC 2130 Character Set Workshop Report April 1997 + + +Appendix C: Glossary + + Bi-directionality - A property of some text where text written right- + to- left (Arabic or Hebrew) and text written left-to-right + (e.g. Latin) are intermixed in one and the same line. + + Character - A single graphic symbol represented by sequence of one or + more bytes. + + Character Encoding Scheme - The mapping from a coded character set to + an encoding which may be more suitable for specific purpose. For + example, UTF-8 is a character encoding scheme for ISO 10646. + + Character Set - An enumerated group of symbols (e.g., letters, numbers + or glyphs) + + Coded Character Set - The mapping from a set of integers to the + characters of a character set. + + Culture - Preferences in the display of text based on cultural norms, + such as spelling and word choice. + + Language - The words and combinations of words the constitute a system + of expression and communication among people with a shared + history or set of traditions. + + Layout - Information needed to display text to the user, similar to + the presentation layer in the ISO telecommunications model. + + Locale - The attributes of communication, such as language, character + set and cultural conventions. + + On-the-wire - The data that actually gets put into packets for + transmission to other computers. + + Transfer Encoding Syntax - The mapping from a coded character set + which has been encoded in a Character Encoding Scheme to an + encoding which may be more suitable for transmission using + specific protocols. For example, Base64 is a transfer encoding + syntax. + + + + + + + + + + + +Weider, et. al. Informational [Page 24] + +RFC 2130 Character Set Workshop Report April 1997 + + +Appendix D: References + +[*] Non-ASCII character + +[ASCII] ANSI X3.4:1986 "Coded Character Sets - 7 Bit American + National Standard Code for Information Interchange (7-bit ASCII)" + +[Base64] Freed, N., and N. Borenstein, "Multipurpose Internet + Mail Extensions (MIME) Part One: Format of Internet Message + Bodies", RFC 2045, November 1996. + +[CEN] see http://tobbi.iti.is/TC304/welcome.html for current status. + +[HTML] Berners-Lee, T., and D. Connolly, "Hypertext Markup Language - + 2.0", RFC 1866, November 1995. + +[HTTP] Berners-Lee, T., Fielding, R., and H. Nielsen, "Hypertext + Transfer Protocol -- HTTP/1.0", RFC 1945, May 1996. + +[I18N] Yergeau, F., et.al., "Internationalization of the Hypertext + Markup Language", RFC 2070, January 1997. + +[IANA] Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC + 1700, ISI, October 1994. + +[ISO-2022] ISO/IEC 2022:1994, "Information technology -- Character + Code Structure and Extension Techniques", JTC1/SC2. + +[ISO-7498] ISO/IEC 7498-1:1994, "Information technology - Open Systems + Interconnection - Basic Reference Model: The Basic Model". + +[ISO-8859] Information Processing -- 8-bit Single-Byte Coded Graphic + Character Sets -- Part 1: Latin Alphabet no. 1, + ISO 8859-1:1987(E). Part 2: Latin Alphabet no. 2, ISO 8859-2 + 1987(E). Part 3: Latin Alphabet no. 3, ISO 8859-3:1988(E). + Part 4: Latin Alphabet no. 4, ISO 8859-4, 1988(E). Part 5: + Latin/Cyrillic Alphabet ISO 8859-5, 1988(E). Part 6: + Latin/Arabic Alphabet, ISO 8859-6, 1987(E). Part 7: Latin/Greek + Alphabet, ISO 8859-7, 1987(E). Part 8: Latin/Hebrew Alphabet, ISO + 8859-8-1988(E).Part 9: Latin Alphabet no. 5, ISO 8859-9, 1990(E). + Part 10: Latin Alphabet no. 6, ISO 8859-10:1992(E). + +[ISO-10646] ISO/IEC 10646-1:1993(E ), "Information technology -- + Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: + Architecture and Basic Multilingual Plane". JTC1/SC2, 1993 + + + + + + +Weider, et. al. Informational [Page 25] + +RFC 2130 Character Set Workshop Report April 1997 + + +[MIME] See [Base64] + +[POSIX] Institute of Electrical and Electronics Engineers. "IEEE + standard interpretations for IEEE standard portable operating + systems interface for computer environments". IEEE Std 1003.1 + -1988/Int, 1992 edition. Sponsor, Technical Committee on Operating + Systems of the IEEE Computer Society. New York, NY: Institute of + Electrical and Electronic Engineers, 1992. + +RFC 1340 See [IANA] + +[RFC-1345] Simonsen, K., "Character Mnemonics & Character Sets", + RFC 1345, Rationel Alim Planlaegning, June 1992. + +[RFC-1554] Ohta, M., and K. Handa, "ISO-2022-JP-2: Multilingual + Extension of ISO-2022-JP", Tokyo Institute of Technology, ETL, + December 1993. + +RFC 1642 See [UTF-7] + +[RFC-1766] Alvestrad, H., "Tags for the Identification of Languages", + RFC 1766, UNINETT, March 1995. + +[RFC 1958] Carpenter, B. (ed.) "Architectural Principles of the + Internet", RFC 1958, IAB, June 1996. + +[SGML] ISO 8879:1986 "Information Processing - Text and Office Systems + - Standard Generalized Markup Language (SGML)" + +[SMTP] Postel, J., "Simple Mail Transfer Protocol", STD 10, RFC 821, + August, 1982. + +[Unicode] "The Unicode standard, version 2.0. Unicode Consortium. + Reading, Mass.: Addison-Wesley Developers Press, 1996 + +[UTF-7] Goldsmith, D., and M. Davis, "UTF-7: A Mail Safe + Transformation Format of Unicode", RFC 1642, Taligent, Inc., July + 1994. + +[UTF-8] International Standards Organization, Joint Technical + Committee 1 (ISO/JTC1), "Amendment 2:1993, UCS Transformation + Format 8 (UTF-8)", in ISO/IEC 10646-1:1993 Information technology + - Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: + Architecture and Basic Multilingual Plane. JTC1/SC2, 1993. + + + + + + + +Weider, et. al. Informational [Page 26] + +RFC 2130 Character Set Workshop Report April 1997 + + +Appendix E: Recommended reading + +Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, + UNINETT, March 1995. + +Alvestrand, H., "X.400 Use of Extended Character Sets", RFC 1502, + SINTEF DELAB, August 1993. + +Borenstein, N., "Implications of MIME for Internet Mail Gateways", + RFC 1344, Bellcore, June 1992. + +Freed, N., and N. Borenstein, "Multipurpose Internet + Mail Extensions (MIME) Part One: Format of Internet Message + Bodies", RFC 2045, November 1996. + +Chernov, A., "Registration of a Cyrillic Character Set", RFC 1489, + RELCOM Development Team, July 1993. + +Choi, U., and K. Chan, "Korean Character Encoding for Internet + Messages", RFC 1557, KAIST, December 1993. + +Freed, N., and N. Borenstein, "Multipurpose Internet Mail Extensions + (MIME) Part Two: Media Types", RFC 2046, November 1996. + +Goldsmith, D., and M. Davis, "Transformation Format for Unicode", + RFC 1642, Taligent, Inc., July 1994. + +Goldsmith, D., and M. Davis, "Using Unicode with MIME", RFC 1641, + Taligent, Inc., July 1994. + +Jerman-Blazic, B. "Character handling in computer communication" in + "user needs in information technology standards", Computer Weekly + Professional service, eds. C.D. Evans, B.L. Meed & R.S. Walker, + P.C. Butterworth Heineman, 1993, Oxford, Boston, p. 102-129. + +Jerman-Blazic, B. "Tool supporting the internationalization of the + generic network services", Computer Networks and ISDN Systems, + No. 27 (1994), p. 429-435. + +Jerman-Blazic, B., A. Gogala and D. Gabrijelcic, "Transparent language + processing: A solution for internationalization of Internet + services", The LISA Forum Newsletter, 5 (1996) p. 12-21 + +Lee, F., "HZ - A Data Format for Exchanging Files of Arbitrarily Mixed + Chinese and ASCII Characters", RFC 1843, Stanford University, + August 1995. + + + + + +Weider, et. al. Informational [Page 27] + +RFC 2130 Character Set Workshop Report April 1997 + + +McCarthy, J., "Arbitrary Character Sets", RFC 373, Stanford + University, July 1972. + +Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Two: + Message Header Extensions for Non-ASCII Text", RFC 1522, + September 1993. (Obsoleted by RFC 2047.) + +Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Three: + Message Header Extensions for Non-ASCII Text", RFC 2047, + University of Tennessee, November 1996. + +Murai, J., Crispin, M., and E. von der Poel. "Japanese Character + Encoding for Internet Messages", RFC 1468, Keio University & + Panda Programming, June 1993. + +Nussbacher, H., "Handling of Bi-directional Texts in MIME", Israeli + Inter-University, December 1993. + +Nussbacher, H., and Y. Bourvine, "Hebrew Character Encoding for + Internet Messages", RFC 1555, Israeli Inter-University and + Hebrew University, December 1993. + +Ohta, M., "Character Sets ISO-10646 and ISO-10646-J-1", RFC 1815, + Tokyo Institute of Technology, July 1995. + +Postel, J., and J. Reynolds, "File Transfer Protocol (FTP)", STD 9, + RFC 959, ISI, October 1985. + +Postel, J., and J. Reynolds, "Telnet Protocol Specification", STD 8, + RFC 854, ISI, May 1983. + +Reynolds, J., and J. Postel, "Assigned Numbers", STD 2, RFC 1700, + ISI, October 1994. p.100-117. + +Rose, M., "The Internet Message", Prentice Hall, 1992. + +Simonsen, K., "Character Mnemonics & Character Sets", RFC 1345, + Rationel Almen Planlaegning, June 1992. + +Unicode Consortium. "The Unicode standard, version 2.0. Reading, + Mass.: Addison-Wesley Developers Press, 1996 + +Wei, U., et.al. "ASCII Printable Characters-Based Chinese Character + Encoding for Internet Messages", RFC 1842, AsiInfo Services, + Inc., et.al. August 1995. + +Yergeau, F. "UTF-8, a transformation format of Unicode and ISO 10646", + RFC 2044, ALIS Technologies, October 1996. + + + +Weider, et. al. Informational [Page 28] + +RFC 2130 Character Set Workshop Report April 1997 + + +Zhu, H., et.al., "Chinese Character Encoding for Internet Messages", + RFC 1922, Tsinghua University, et.al., March 1996. + +Appendix F: Workshop attendee list + + These people were participants on the workshop mailing list. + An * indicates that the person attended the workshop in person. + + Glenn Adams <glenn@spyglass.com> + * Joan Aliprand <joan@unicode.org> + * Harald Alvestrand <Harald.T.Alvestrand@uninett.no> + * Ran Atkinson <ran@cisco.com> + * Bert Bos <bert@w3.org> + * Brian Carpenter <brian@dxcoms.cern.ch> + * Mark Crispin <mrc@panda.com> + Makx Dekkers <dekkers@pica.nl> + Robert Elz <kre@munnari.oz.au> + Patrik Faltstrom <paf@paf.se> + * Zhu Haifeng <zhf@net.tsinghua.edu.cn> + Keniichi Handa<handa@etl.go.jp> + Olle Jarnefors <ojarnef@admin.kth.se> + Borka Jerman-Blazic <borka@e5.ijs.si> + John Klensin <klensin@mail1.reston.mci.net> + * Larry Masinter <masinter@parc.xerox.com> + * Rick McGowan <Rick_McGowan@next.com> + * Keith Moore <moore+charsets@cs.utk.edu> + * Lisa Moore <lisam@vnet.ibm.com> + Ruth Moulton <ruth@muswell.demon.co.uk> + * Cecilia Preston <cecilia@well.com> + * Joyce K. Reynolds <jkrey@isi.edu> + * Keld Simonsen <keld@dkuug.dk> + * Gary Smith <Gary_Smith@oclc.org> + * Peter Svanberg <psv@nada.kth.se> + * Chris Weider <cweider@microsoft.com > + + + + + + + + + + + + + + + + + +Weider, et. al. Informational [Page 29] + +RFC 2130 Character Set Workshop Report April 1997 + + +Appendix G: Authors' Addresses + + Chris Weider + Microsoft Corp. + 1 Microsoft Way + Redmond, WA 98052 + USA + + EMail: cweider@microsoft.com + + + Cecilia Preston + Preston & Lynch + PO Box 8310 + Emeryville, CA 94662 + USA + + EMail: cecilia@well.com + + + Keld Simonsen + DKUUG + Freubjergvey 3 + DK-2100 Kxbenhavn X + Danmark + + EMail: Keld@dkuug.dk + + + Harald T. Alvestrand + UNINETT + P.O.Box 6883 Elgeseter + N-7002 TRONDHEIM + NORWAY + + EMail: Harald.T.Alvestrand@uninett.no + + + Randall Atkinson + cisco Systems + 170 West Tasman Drive + San Jose, CA 95134-1706 + USA + + EMail: rja@cisco.com + + + + + + +Weider, et. al. Informational [Page 30] + +RFC 2130 Character Set Workshop Report April 1997 + + + Mark Crispin + Networks & Distributed Computing + University of Washington + 4545 15th Avenue NE + Seattle, WA 98105-4527 + USA + + EMail: mrc@cac.washington.edu + + + Peter Svanberg + Dept. of Numberical Analysis and Computing Science (Nada) + Royal Institute of Technology + SE-100 44 STOCKHOLM + SWEDEN + + EMail: psv@nada.kth.se + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Weider, et. al. Informational [Page 31] + |