From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc1766.txt | 507 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 507 insertions(+) create mode 100644 doc/rfc/rfc1766.txt (limited to 'doc/rfc/rfc1766.txt') diff --git a/doc/rfc/rfc1766.txt b/doc/rfc/rfc1766.txt new file mode 100644 index 0000000..901c50e --- /dev/null +++ b/doc/rfc/rfc1766.txt @@ -0,0 +1,507 @@ + + + + + + +Network Working Group H. Alvestrand +Request for Comments: 1766 UNINETT +Category: Standards Track March 1995 + + + Tags for the Identification of Languages + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Abstract + + This document describes a language tag for use in cases where it is + desired to indicate the language used in an information object. + + It also defines a Content-language: header, for use in the case where + one desires to indicate the language of something that has RFC-822- + like headers, like MIME body parts or Web documents, and a new + parameter to the Multipart/Alternative type, to aid in the usage of + the Content-Language: header. + +1. Introduction + + There are a number of languages spoken by human beings in this world. + + A great number of these people would prefer to have information + presented in a language that they understand. + + In some contexts, it is possible to have information in more than one + language, or it might be possible to provide tools for assisting in + the understanding of a language (like dictionaries). 
+ + A prerequisite for any such function is a means of labelling the + information content with an identifier for the language in which it + is written. + + In the tradition of solving only problems that we think we + understand, this document specifies an identifier mechanism, and one + possible use for it. + + + + + + + +Alvestrand [Page 1] + +RFC 1766 Language Tag March 1995 + + +2. The Language tag + + The language tag is composed of 1 or more parts: A primary language + tag and a (possibly empty) series of subtags. + + The syntax of this tag in RFC-822 EBNF is: + + Language-Tag = Primary-tag *( "-" Subtag ) + Primary-tag = 1*8ALPHA + Subtag = 1*8ALPHA + + Whitespace is not allowed within the tag. + + All tags are to be treated as case insensitive; there exist + conventions for capitalization of some of them, but these should not + be taken to carry meaning. + + The namespace of language tags is administered by the IANA according + to the rules in section 5 of this document. + + The following registrations are predefined: + + In the primary language tag: + + - All 2-letter tags are interpreted according to ISO standard + 639, "Code for the representation of names of languages" [ISO + 639]. + + - The value "i" is reserved for IANA-defined registrations + + - The value "x" is reserved for private use. Subtags of "x" + will not be registered by the IANA. + + - Other values cannot be assigned except by updating this + standard. + + The reason for reserving all other tags is to be open towards new + revisions of ISO 639; the use of "i" and "x" is the minimum we can do + here to be able to extend the mechanism to meet our requirements. + + In the first subtag: + + - All 2-letter codes are interpreted as ISO 3166 alpha-2 + country codes denoting the area in which the language is + used. 
+ + - Codes of 3 to 8 letters may be registered with the IANA by + anyone who feels a need for it, according to the rules in + + + +Alvestrand [Page 2] + +RFC 1766 Language Tag March 1995 + + + chapter 5 of this document. + + The information in the subtag may for instance be: + + - Country identification, such as en-US (this usage is + described in ISO 639) + + - Dialect or variant information, such as no-nynorsk or en- + cockney + + - Languages not listed in ISO 639 that are not variants of + any listed language, which can be registered with the i- + prefix, such as i-cherokee + + - Script variations, such as az-arabic and az-cyrillic + + In the second and subsequent subtag, any value can be registered. + + NOTE: The ISO 639/ISO 3166 convention is that language names are + written in lower case, while country codes are written in upper case. + This convention is recommended, but not enforced; the tags are case + insensitive. + + NOTE: ISO 639 defines a registration authority for additions to and + changes in the list of languages in ISO 639. This authority is: + + International Information Centre for Terminology (Infoterm) + P.O. Box 130 + A-1021 Wien + Austria + Phone: +43 1 26 75 35 Ext. 312 + Fax: +43 1 216 32 72 + + The following codes have been added in 1989 (nothing later): ug + (Uigur), iu (Inuktitut, also called Eskimo), za (Zhuang), he (Hebrew, + replacing iw), yi (Yiddish, replacing ji), and id (Indonesian, + replacing in). + + NOTE: The registration agency for ISO 3166 (country codes) is: + + ISO 3166 Maintenance Agency Secretariat + c/o DIN Deutsches Institut fuer Normung + Burggrafenstrasse 6 + Postfach 1107 + D-10787 Berlin + Germany + Phone: +49 30 26 01 320 + Fax: +49 30 26 01 231 + + + +Alvestrand [Page 3] + +RFC 1766 Language Tag March 1995 + + + The country codes AA, QM-QZ, XA-XZ and ZZ are reserved by ISO 3166 as + user-assigned codes. + +2.1. 
Meaning of the language tag + + The language tag always defines a language as spoken (or written) by + human beings for communication of information to other human beings. + Computer languages are explicitly excluded. + + There is no guaranteed relationship between languages whose tags + start out with the same series of subtags; especially, they are NOT + guaranteed to be mutually comprehensible, although this will + sometimes be the case. + + Applications should always treat language tags as a single token; the + division into main tag and subtags is an administrative mechanism, + not a navigation aid. + + The relationship between the tag and the information it relates to is + defined by the standard describing the context in which it appears. + So, this section can only give possible examples of its usage. + + - For a single information object, it should be taken as the + set of languages that is required for a complete + comprehension of the complete object. Example: Simple text. + + - For an aggregation of information objects, it should be taken + as the set of languages used inside components of that + aggregation. Examples: Document stores and libraries. + + - For information objects whose purpose in life is providing + alternatives, it should be regarded as a hint that the + material inside is provided in several languages, and that + one has to inspect each of the alternatives in order to find + its language or languages. In this case, multiple languages + need not mean that one needs to be multilingual to get + complete understanding of the document. Example: MIME + multipart/alternative. + + - It would be possible to define (for instance) an SGML DTD + that defines a tag for indicating that following or + contained text is written in this language, such that one + could write "C'est la vie"; the Norwegian- + speaking user could then access a French-Norwegian dictionary + to find out what the quote meant. 
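Editorial sketch, not part of the RFC text: the grammar in section 2 and the "single token" rule in section 2.1 can be illustrated with a small Python validator. The function names (parse_language_tag, primary_tag_kind, same_tag) and the returned labels are hypothetical and chosen only for this illustration. The sketch checks syntax and the predefined primary-tag classes; it does not consult ISO 639, ISO 3166, or the IANA registry.

```python
import re

# Language-Tag = Primary-tag *( "-" Subtag ); every part is 1*8ALPHA.
# No whitespace is allowed anywhere inside the tag.
_TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def parse_language_tag(tag):
    """Return (primary, subtags) in lower case, or raise ValueError."""
    if not _TAG_RE.fullmatch(tag):
        raise ValueError(f"not a valid language tag: {tag!r}")
    # Tags are case insensitive, so normalise before splitting.
    primary, *subtags = tag.lower().split("-")
    return primary, subtags


def primary_tag_kind(primary):
    """Classify a primary tag by the predefined registrations.

    The returned labels are hypothetical names used only in this sketch.
    """
    primary = primary.lower()
    if len(primary) == 2:
        return "iso639"    # interpreted according to ISO 639
    if primary == "i":
        return "iana"      # IANA-defined registrations
    if primary == "x":
        return "private"   # private use, never registered
    return "reserved"      # not assignable without updating the standard


def same_tag(a, b):
    """Compare two tags as whole tokens, ignoring case."""
    return parse_language_tag(a) == parse_language_tag(b)
```

Under these assumptions, same_tag("EN-us", "en-US") holds, while "en" and "en-US" compare as different tags: the sketch deliberately does no prefix matching, in line with the statement that the division into subtags is not a navigation aid.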
+ + + + + + +Alvestrand [Page 4] + +RFC 1766 Language Tag March 1995 + + +3. The Content-language header + + The Language header is intended for use in the case where one desires + to indicate the language(s) of something that has RFC-822-like + headers, like MIME body parts or Web documents. + + The RFC-822 EBNF of the Language header is: + + Language-Header = "Content-Language" ":" 1#Language-tag + + Note that the Language-Header is allowed to list several languages in + a comma-separated list. + + Whitespace is allowed, which means also that one can place + parenthesized comments anywhere in the language sequence. + +3.1. Examples of Content-language values + + NOTE: NONE of the subtags shown in this document have actually been + assigned; they are used for illustration purposes only. + + Norwegian official document, with parallel text in both official + versions of Norwegian. (Both versions are readable by all + Norwegians). + + Content-Type: multipart/alternative; + differences=content-language + Content-Language: no-nynorsk, no-bokmaal + + Voice recording from the London docks + + Content-type: audio/basic + Content-Language: en-cockney + + Document in Sami, which does not have an ISO 639 code, and is spoken + in several countries, but with about half the speakers in Norway, + with six different, mutually incomprehensible dialects: + + Content-type: text/plain; charset=iso-8859-10 + Content-Language: i-sami-no (North Sami) + + An English-French dictionary + + Content-type: application/dictionary + Content-Language: en, fr (This is a dictionary) + + An official EC document (in a few of its official languages) + + + + +Alvestrand [Page 5] + +RFC 1766 Language Tag March 1995 + + + Content-type: multipart/alternative + Content-Language: en, fr, de, da, el, it + + An excerpt from Star Trek + + Content-type: video/mpeg + Content-Language: x-klingon + +4. 
Use of Content-Language with Multipart/Alternative + + When using the Multipart/Alternative body part of MIME, it is + possible to have the body parts giving the same information content + in different languages. In this case, one should put a Content- + Language header on each of the body parts, and a summary Content- + Language header onto the Multipart/Alternative itself. + +4.1. The differences parameter to multipart/alternative + + As defined in RFC 1521, Multipart/Alternative only has one parameter: + boundary. + + The common usage of Multipart/Alternative is to have more than one + format of the same message (e.g. PostScript and ASCII). + + The use of language tags to differentiate between different + alternatives will certainly not lead all MIME UAs to present the most + sensible body part as default. + + Therefore, a new parameter is defined, to allow the configuration of + MIME readers to handle language differences in a sensible manner. + + Name: Differences + Value: One or more of + Content-Type + Content-Language + + Further values can be registered with IANA; it must be the name of a + header for which a definition exists in a published RFC. If not + present, Differences=Content-Type is assumed. + + The intent is that the MIME reader can look at these headers of the + message component to do an intelligent choice of what to present to + the user, based on knowledge about the user preferences and + capabilities. + + (The intent of having registration with IANA of the fields used in + this context is to maintain a list of usages that a mail UA may + expect to see, not to reject usages.) + + + +Alvestrand [Page 6] + +RFC 1766 Language Tag March 1995 + + + (NOTE: The MIME specification [RFC 1521], section 7.2, states that + headers not beginning with "Content-" are generally to be ignored in + body parts. People defining a header for use with "differences=" + should take note of this.) 
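Editorial sketch, not part of the RFC text: a MIME reader that wants to use these headers first has to read them. The Python below is a minimal illustration under stated assumptions. All three function names are hypothetical. The comment stripper handles nested parentheses but ignores RFC-822 quoted-pairs and quoted strings. The memo does not say how several names are written in one differences value, so the comma-separated form used here is an assumption. `params` stands for a dict of Content-Type parameters that some other code has already parsed, with lower-cased keys.

```python
def strip_comments(value):
    """Remove parenthesized comments, including nested ones.

    Simplification: backslash quoted-pairs and quoted strings are not handled.
    """
    out, depth = [], 0
    for ch in value:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def content_languages(value):
    """Split a Content-Language value (1#Language-tag) into lower-cased tags."""
    parts = strip_comments(value).split(",")
    return [p.strip().lower() for p in parts if p.strip()]


def differences(params):
    """Return the header names named by the differences parameter.

    If the parameter is absent, Content-Type is assumed, as the memo states.
    Assumption: multiple names are separated by commas.
    """
    raw = params.get("differences", "Content-Type")
    return [name.strip().lower() for name in raw.split(",") if name.strip()]
```

A reader could call differences() on the multipart/alternative parameters and, when "content-language" is listed, compare content_languages() of each body part against the user's preferences. How that comparison picks a part is left open here, as it is in the memo.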
+ + The mechanism for deciding which body part to present is outside the + scope of this document. + + MIME EXAMPLE: + + Content-Type: multipart/alternative; differences=Content-Language; + boundary="limit" + Content-Language: en, fr, de + + --limit + Content-Language: fr + + Le renard brun et agile saute par dessus le chien paresseux + --limit + Content-Language: de + Content-Type: text/plain; charset=iso-8859-1 + Content-Transfer-encoding: quoted-printable + + Der schnelle braune Fuchs h=FCpft =FCber den faulen Hund + --limit + Content-Language: en + + The quick brown fox jumps over the lazy dog + --limit-- + + When composing a message, the choice of sequence may be somewhat + arbitrary. However, non-MIME mail readers will show the first body + part first, meaning that this should most likely be the language + understood by most of the recipients. + +5. IANA registration procedure for language tags + + Any language tag must start with an existing tag, and extend it. + + This registration form should be used by anyone who wants to use a + language tag not defined by ISO or IANA. + + + + + + + + + + +Alvestrand [Page 7] + +RFC 1766 Language Tag March 1995 + + +---------------------------------------------------------------------- +LANGUAGE TAG REGISTRATION FORM + +Name of requester : +E-mail address of requester: +Tag to be registered : + +English name of language : + +Native name of language (transcribed into ASCII): + +Reference to published description of the language (book or article): +---------------------------------------------------------------------- + + The language form must be sent to for a 2- + week review period before submitting it to IANA. (This is an open + list. Requests to be added should be sent to .) + + When the two week period has passed, the language tag reviewer, who + is appointed by the IETF Applications Area Director, either forwards + the request to IANA@ISI.EDU, or rejects it because of significant + objections raised on the list. 
+ + Decisions made by the reviewer may be appealed to the IESG. + + All registered forms are available online in the directory + ftp://ftp.isi.edu/in-notes/iana/assignments/languages/ + +6. Security Considerations + + Security issues are not discussed in this memo. + +7. Character set considerations + + Codes may always be expressed using the US-ASCII character repertoire + (a-z), which is present in most character sets. + + The issue of deciding upon the rendering of a character set based on + the language tag is not addressed in this memo; however, it is + thought impossible to make such a decision correctly for all cases + unless means of switching language in the middle of a text are + defined (for example, a rendering engine that decides font based on + Japanese or Chinese language will fail to work when a mixed + Japanese-Chinese text is encountered) + + + + + + +Alvestrand [Page 8] + +RFC 1766 Language Tag March 1995 + + +8. Acknowledgements + + This document has benefited from innumerable rounds of review and + comments in various fora of the IETF and the Internet working groups. + As such, any list of contributors is bound to be incomplete; please + regard the following as only a selection from the group of people who + have contributed to make this document what it is today. + + In alphabetical order: + + Tim Berners-Lee, Nathaniel Borenstein, Jim Conklin, Dave Crocker, + Ned Freed, Tim Goodwin, Olle Jarnefors, John Klensin, Keith Moore, + Masataka Ohta, Keld Jorn Simonsen, Rhys Weatherley, and many, many + others. + +9. Author's Address + + Harald Tveit Alvestrand + UNINETT + Pb. 6883 Elgeseter + N-7002 TRONDHEIM + NORWAY + + EMail: Harald.T.Alvestrand@uninett.no + Phone: +47 73 59 70 94 + +10. References + + [ISO 639] + ISO 639:1988 (E/F) - Code for the representation of names of + languages - The International Organization for + Standardization, 1st edition, 1988 17 pages Prepared by + ISO/TC 37 - Terminology (principles and coordination). 
+ + [ISO 3166] + ISO 3166:1988 (E/F) - Codes for the representation of names + of countries - The International Organization for + Standardization, 3rd edition, 1988-08-15. + + [RFC 1521] + Borenstein, N., and N. Freed, "MIME Part One: Mechanisms for + Specifying and Describing the Format of Internet Message + Bodies", RFC 1521, Bellcore, Innosoft, September 1993. + + [RFC 1327] + Kille, S., "Mapping between X.400(1988) / ISO 10021 and RFC + 822", RFC 1327, University College London, May 1992. + + + + +Alvestrand [Page 9] + -- cgit v1.2.3