diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc4646.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc4646.txt')
-rw-r--r-- | doc/rfc/rfc4646.txt | 3307 |
1 files changed, 3307 insertions, 0 deletions
diff --git a/doc/rfc/rfc4646.txt b/doc/rfc/rfc4646.txt new file mode 100644 index 0000000..466d547 --- /dev/null +++ b/doc/rfc/rfc4646.txt @@ -0,0 +1,3307 @@ + + + + + + +Network Working Group A. Phillips, Ed. +Request for Comments: 4646 Yahoo! Inc. +BCP: 47 M. Davis, Ed. +Obsoletes: 3066 Google +Category: Best Current Practice September 2006 + + + Tags for Identifying Languages + +Status of This Memo + + This document specifies an Internet Best Current Practices for the + Internet Community, and requests discussion and suggestions for + improvements. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2005). + +Abstract + + This document describes the structure, content, construction, and + semantics of language tags for use in cases where it is desirable to + indicate the language used in an information object. It also + describes how to register values for use in language tags and the + creation of user-defined extensions for private interchange. This + document, in combination with RFC 4647, replaces RFC 3066, which + replaced RFC 1766. + + + + + + + + + + + + + + + + + + + + + + + +Phillips & Davis Best Current Practice [Page 1] + +RFC 4646 Tags for Identifying Languages September 2006 + + +Table of Contents + + 1. Introduction ....................................................3 + 2. The Language Tag ................................................4 + 2.1. Syntax .....................................................4 + 2.2. Language Subtag Sources and Interpretation .................7 + 2.2.1. Primary Language Subtag .............................8 + 2.2.2. Extended Language Subtags ..........................10 + 2.2.3. Script Subtag ......................................11 + 2.2.4. Region Subtag ......................................11 + 2.2.5. Variant Subtags ....................................13 + 2.2.6. Extension Subtags ..................................14 + 2.2.7. Private Use Subtags ................................16 + 2.2.8. Preexisting RFC 3066 Registrations .................16 + 2.2.9. Classes of Conformance .............................17 + 3. Registry Format and Maintenance ................................18 + 3.1. Format of the IANA Language Subtag Registry ...............18 + 3.2. Language Subtag Reviewer ..................................24 + 3.3. Maintenance of the Registry ...............................24 + 3.4. Stability of IANA Registry Entries ........................25 + 3.5. Registration Procedure for Subtags ........................29 + 3.6. Possibilities for Registration ............................32 + 3.7. Extensions and Extensions Registry ........................34 + 3.8. Initialization of the Registries ..........................37 + 4. Formation and Processing of Language Tags ......................38 + 4.1. Choice of Language Tag ....................................38 + 4.2. Meaning of the Language Tag ...............................40 + 4.3. Length Considerations .....................................41 + 4.3.1. Working with Limited Buffer Sizes ..................42 + 4.3.2. Truncation of Language Tags ........................43 + 4.4. Canonicalization of Language Tags .........................44 + 4.5. Considerations for Private Use Subtags ....................45 + 5. IANA Considerations ............................................46 + 5.1. Language Subtag Registry ..................................46 + 5.2. Extensions Registry .......................................47 + 6. Security Considerations ........................................48 + 7. Character Set Considerations ...................................48 + 8. Changes from RFC 3066 ..........................................49 + 9. References .....................................................52 + 9.1. Normative References ......................................52 + 9.2. Informative References ....................................53 + Appendix A. Acknowledgements ......................................55 + Appendix B. Examples of Language Tags (Informative) ...............56 + + + + + + + + +Phillips & Davis Best Current Practice [Page 2] + +RFC 4646 Tags for Identifying Languages September 2006 + + +1. Introduction + + Human beings on our planet have, past and present, used a number of + languages. There are many reasons why one would want to identify the + language used when presenting or requesting information. + + A user's language preferences often need to be identified so that + appropriate processing can be applied. For example, the user's + language preferences in a Web browser can be used to select Web pages + appropriately. Language preferences can also be used to select among + tools (such as dictionaries) to assist in the processing or + understanding of content in different languages. + + In addition, knowledge about the particular language used by some + piece of information content might be useful or even required by some + types of processing; for example, spell-checking, computer- + synthesized speech, Braille transcription, or high-quality print + renderings. + + One means of indicating the language used is by labeling the + information content with an identifier or "tag". These tags can be + used to specify user preferences when selecting information content, + or for labeling additional attributes of content and associated + resources. + + Tags can also be used to indicate additional language attributes of + content. For example, indicating specific information about the + dialect, writing system, or orthography used in a document or + resource may enable the user to obtain information in a form that + they can understand, or it can be important in processing or + rendering the given content into an appropriate form or style. + + This document specifies a particular identifier mechanism (the + language tag) and a registration function for values to be used to + form tags. It also defines a mechanism for private use values and + future extension. + + This document, in combination with [RFC4647], replaces [RFC3066], + which replaced [RFC1766]. For a list of changes in this document, + see Section 8. + + The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in [RFC2119]. + + + + + + + +Phillips & Davis Best Current Practice [Page 3] + +RFC 4646 Tags for Identifying Languages September 2006 + + +2. The Language Tag + + Language tags are used to help identify languages, whether spoken, + written, signed, or otherwise signaled, for the purpose of + communication. This includes constructed and artificial languages, + but excludes languages not intended primarily for human + communication, such as programming languages. + +2.1. Syntax + + The language tag is composed of one or more parts, known as + "subtags". Each subtag consists of a sequence of alphanumeric + characters. Subtags are distinguished and separated from one another + by a hyphen ("-", ABNF [RFC4234] %x2D). A language tag consists of a + "primary language" subtag and a (possibly empty) series of subsequent + subtags, each of which refines or narrows the range of languages + identified by the overall tag. + + Usually, each type of subtag is distinguished by length, position in + the tag, and content: subtags can be recognized solely by these + features. The only exception to this is a fixed list of + grandfathered tags registered under RFC 3066 [RFC3066]. This makes + it possible to construct a parser that can extract and assign some + semantic information to the subtags, even if the specific subtag + values are not recognized. Thus, a parser need not have an up-to- + date copy (or any copy at all) of the subtag registry to perform most + searching and matching operations. + + + + + + + + + + + + + + + + + + + + + + + + +Phillips & Davis Best Current Practice [Page 4] + +RFC 4646 Tags for Identifying Languages September 2006 + + + The syntax of the language tag in ABNF [RFC4234] is: + + Language-Tag = langtag + / privateuse ; private use tag + / grandfathered ; grandfathered registrations + + langtag = (language + ["-" script] + ["-" region] + *("-" variant) + *("-" extension) + ["-" privateuse]) + + language = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code + / 4ALPHA ; reserved for future use + / 5*8ALPHA ; registered language subtag + + extlang = *3("-" 3ALPHA) ; reserved for future use + + script = 4ALPHA ; ISO 15924 code + + region = 2ALPHA ; ISO 3166 code + / 3DIGIT ; UN M.49 code + + variant = 5*8alphanum ; registered variants + / (DIGIT 3alphanum) + + extension = singleton 1*("-" (2*8alphanum)) + + singleton = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT + ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9" + ; Single letters: x/X is reserved for private use + + privateuse = ("x"/"X") 1*("-" (1*8alphanum)) + + grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum)) + ; grandfathered registration + ; Note: i is the only singleton + ; that starts a grandfathered tag + + alphanum = (ALPHA / DIGIT) ; letters and numbers + + Figure 1: Language Tag ABNF + + Note: There is a subtlety in the ABNF for 'variant': variants + starting with a digit MAY be four characters long, while those + starting with a letter MUST be at least five characters long. + + + + +Phillips & Davis Best Current Practice [Page 5] + +RFC 4646 Tags for Identifying Languages September 2006 + + + All subtags have a maximum length of eight characters and whitespace + is not permitted in a language tag. For examples of language tags, + see Appendix B. + + Note that although [RFC4234] refers to octets, the language tags + described in this document are sequences of characters from the + US-ASCII [ISO646] repertoire. Language tags MAY be used in documents + and applications that use other encodings, so long as these encompass + the US-ASCII repertoire. An example of this would be an XML document + that uses the UTF-16LE [RFC2781] encoding of [Unicode]. + + The tags and their subtags, including private use and extensions, are + to be treated as case insensitive: there exist conventions for the + capitalization of some of the subtags, but these MUST NOT be taken to + carry meaning. + + For example: + + o [ISO639-1] recommends that language codes be written in lowercase + ('mn' Mongolian). + + o [ISO3166-1] recommends that country codes be capitalized ('MN' + Mongolia). + + o [ISO15924] recommends that script codes use lowercase with the + initial letter capitalized ('Cyrl' Cyrillic). + + However, in the tags defined by this document, the uppercase US-ASCII + letters in the range 'A' through 'Z' are considered equivalent and + mapped directly to their US-ASCII lowercase equivalents in the range + 'a' through 'z'. Thus, the tag "mn-Cyrl-MN" is not distinct from + "MN-cYRL-mn" or "mN-cYrL-Mn" (or any other combination), and each of + these variations conveys the same meaning: Mongolian written in the + Cyrillic script as used in Mongolia. + + Although case distinctions do not carry meaning in language tags, + consistent formatting and presentation of the tags will aid users. + The format of the tags and subtags in the registry is RECOMMENDED. + In this format, all non-initial two-letter subtags are uppercase, all + non-initial four-letter subtags are titlecase, and all other subtags + are lowercase. + + + + + + + + + + +Phillips & Davis Best Current Practice [Page 6] + +RFC 4646 Tags for Identifying Languages September 2006 + + +2.2. Language Subtag Sources and Interpretation + + The namespace of language tags and their subtags is administered by + the Internet Assigned Numbers Authority (IANA) [RFC2860] according to + the rules in Section 5 of this document. The Language Subtag + Registry maintained by IANA is the source for valid subtags: other + standards referenced in this section provide the source material for + that registry. + + Terminology in this section: + + o Tag or tags refers to a complete language tag, such as + "fr-Latn-CA". Examples of tags in this document are enclosed in + double-quotes ("en-US"). + + o Subtag refers to a specific section of a tag, delimited by hyphen, + such as the subtag 'Latn' in "fr-Latn-CA". Examples of subtags in + this document are enclosed in single quotes ('Latn'). + + o Code or codes refers to values defined in external standards (and + that are used as subtags in this document). For example, 'Latn' + is an [ISO15924] script code that was used to define the 'Latn' + script subtag for use in a language tag. Examples of codes in + this document are enclosed in single quotes ('en', 'Latn'). + + The definitions in this section apply to the various subtags within + the language tags defined by this document, excepting those + "grandfathered" tags defined in Section 2.2.8. + + Language tags are designed so that each subtag type has unique length + and content restrictions. These make identification of the subtag's + type possible, even if the content of the subtag itself is + unrecognized. This allows tags to be parsed and processed without + reference to the latest version of the underlying standards or the + IANA registry and makes the associated exception handling when + parsing tags simpler. + + Subtags in the IANA registry that do not come from an underlying + standard can only appear in specific positions in a tag. + Specifically, they can only occur as primary language subtags or as + variant subtags. + + Note that sequences of private use and extension subtags MUST occur + at the end of the sequence of subtags and MUST NOT be interspersed + with subtags defined elsewhere in this document. + + Single-letter and single-digit subtags are reserved for current or + future use. These include the following current uses: + + + +Phillips & Davis Best Current Practice [Page 7] + +RFC 4646 Tags for Identifying Languages September 2006 + + + o The single-letter subtag 'x' is reserved to introduce a sequence + of private use subtags. The interpretation of any private use + subtags is defined solely by private agreement and is not defined + by the rules in this section or in any standard or registry + defined in this document. + + o All other single-letter subtags are reserved to introduce + standardized extension subtag sequences as described in + Section 3.7. + + The single-letter subtag 'i' is used by some grandfathered tags, such + as "i-enochian", where it always appears in the first position and + cannot be confused with an extension. + +2.2.1. Primary Language Subtag + + The primary language subtag is the first subtag in a language tag + (with the exception of private use and certain grandfathered tags) + and cannot be omitted. The following rules apply to the primary + language subtag: + + 1. All two-character language subtags were defined in the IANA + registry according to the assignments found in the standard ISO + 639 Part 1, "ISO 639-1:2002, Codes for the representation of + names of languages -- Part 1: Alpha-2 code" [ISO639-1], or using + assignments subsequently made by the ISO 639 Part 1 maintenance + agency or governing standardization bodies. + + 2. All three-character language subtags were defined in the IANA + registry according to the assignments found in ISO 639 Part 2, + "ISO 639-2:1998 - Codes for the representation of names of + languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2], or + assignments subsequently made by the ISO 639 Part 2 maintenance + agency or governing standardization bodies. + + 3. The subtags in the range 'qaa' through 'qtz' are reserved for + private use in language tags. These subtags correspond to codes + reserved by ISO 639-2 for private use. These codes MAY be used + for non-registered primary language subtags (instead of using + private use subtags following 'x-'). Please refer to Section 4.5 + for more information on private use subtags. + + 4. All four-character language subtags are reserved for possible + future standardization. + + 5. All language subtags of 5 to 8 characters in length in the IANA + registry were defined via the registration process in Section 3.5 + and MAY be used to form the primary language subtag. At the time + + + +Phillips & Davis Best Current Practice [Page 8] + +RFC 4646 Tags for Identifying Languages September 2006 + + + this document was created, there were no examples of this kind of + subtag and future registrations of this type will be discouraged: + primary languages are strongly RECOMMENDED for registration with + ISO 639, and proposals rejected by ISO 639/RA will be closely + scrutinized before they are registered with IANA. + + 6. The single-character subtag 'x' as the primary subtag indicates + that the language tag consists solely of subtags whose meaning is + defined by private agreement. For example, in the tag "x-fr-CH", + the subtags 'fr' and 'CH' SHOULD NOT be taken to represent the + French language or the country of Switzerland (or any other value + in the IANA registry) unless there is a private agreement in + place to do so. See Section 4.5. + + 7. The single-character subtag 'i' is used by some grandfathered + tags (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other + grandfathered tags have a primary language subtag in their first + position.) + + 8. Other values MUST NOT be assigned to the primary subtag except by + revision or update of this document. + + Note: For languages that have both an ISO 639-1 two-character code + and an ISO 639-2 three-character code, only the ISO 639-1 two- + character code is defined in the IANA registry. + + Note: For languages that have no ISO 639-1 two-character code and for + which the ISO 639-2/T (Terminology) code and the ISO 639-2/B + (Bibliographic) codes differ, only the Terminology code is defined in + the IANA registry. At the time this document was created, all + languages that had both kinds of three-character code were also + assigned a two-character code; it is not expected that future + assignments of this nature will occur. + + Note: To avoid problems with versioning and subtag choice as + experienced during the transition between RFC 1766 and RFC 3066, as + well as the canonical nature of subtags defined by this document, the + ISO 639 Registration Authority Joint Advisory Committee (ISO 639/ + RA-JAC) has included the following statement in [iso639.prin]: + + "A language code already in ISO 639-2 at the point of freezing ISO + 639-1 shall not later be added to ISO 639-1. This is to ensure + consistency in usage over time, since users are directed in Internet + applications to employ the alpha-3 code when an alpha-2 code for that + language is not available." + + + + + + +Phillips & Davis Best Current Practice [Page 9] + +RFC 4646 Tags for Identifying Languages September 2006 + + + In order to avoid instability in the canonical form of tags, if a + two-character code is added to ISO 639-1 for a language for which a + three-character code was already included in ISO 639-2, the two- + character code MUST NOT be registered. See Section 3.4. + + For example, if some content were tagged with 'haw' (Hawaiian), which + currently has no two-character code, the tag would not be invalidated + if ISO 639-1 were to assign a two-character code to the Hawaiian + language at a later date. + + For example, one of the grandfathered IANA registrations is + "i-enochian". The subtag 'enochian' could be registered in the IANA + registry as a primary language subtag (assuming that ISO 639 does not + register this language first), making tags such as "enochian-AQ" and + "enochian-Latn" valid. + +2.2.2. Extended Language Subtags + + The following rules apply to the extended language subtags: + + 1. Three-letter subtags immediately following the primary subtag are + reserved for future standardization, anticipating work that is + currently under way on ISO 639. + + 2. Extended language subtags MUST follow the primary subtag and + precede any other subtags. + + 3. There MAY be up to three extended language subtags. + + 4. Extended language subtags MUST NOT be registered or used to form + language tags. Their syntax is described here so that + implementations can be compatible with any future revision of + this document that does provide for their registration. + + Extended language subtag records, once they appear in the registry, + MUST include exactly one 'Prefix' field indicating an appropriate + language subtag or sequence of subtags that MUST always appear as a + prefix to the extended language subtag. + + Example: In a future revision or update of this document, the tag + "zh-gan" (registered under RFC 3066) might become a valid non- + grandfathered (that is, redundant) tag in which the subtag 'gan' + might represent the Chinese dialect 'Gan'. + + + + + + + + +Phillips & Davis Best Current Practice [Page 10] + +RFC 4646 Tags for Identifying Languages September 2006 + + +2.2.3. Script Subtag + + Script subtags are used to indicate the script or writing system + variations that distinguish the written forms of a language or its + dialects. The following rules apply to the script subtags: + + 1. All four-character subtags were defined according to + [ISO15924]--"Codes for the representation of names of scripts": + alpha-4 script codes, or subsequently assigned by the ISO 15924 + maintenance agency or governing standardization bodies, denoting + the script or writing system used in conjunction with this + language. + + 2. Script subtags MUST immediately follow the primary language + subtag and all extended language subtags and MUST occur before + any other type of subtag described below. + + 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private + use in language tags. These subtags correspond to codes reserved + by ISO 15924 for private use. These codes MAY be used for non- + registered script values. Please refer to Section 4.5 for more + information on private use subtags. + + 4. Script subtags MUST NOT be registered using the process in + Section 3.5 of this document. Variant subtags MAY be considered + for registration for that purpose. + + 5. There MUST be at most one script subtag in a language tag, and + the script subtag SHOULD be omitted when it adds no + distinguishing value to the tag or when the primary language + subtag's record includes a Suppress-Script field listing the + applicable script subtag. + + Example: "sr-Latn" represents Serbian written using the Latin script. + +2.2.4. Region Subtag + + Region subtags are used to indicate linguistic variations associated + with or appropriate to a specific country, territory, or region. + Typically, a region subtag is used to indicate regional dialects or + usage, or region-specific spelling conventions. A region subtag can + also be used to indicate that content is expressed in a way that is + appropriate for use throughout a region, for instance, Spanish + content tailored to be useful throughout Latin America. + + + + + + + +Phillips & Davis Best Current Practice [Page 11] + +RFC 4646 Tags for Identifying Languages September 2006 + + + The following rules apply to the region subtags: + + 1. Region subtags MUST follow any language, extended language, or + script subtags and MUST precede all other subtags. + + 2. All two-character subtags following the primary subtag were + defined in the IANA registry according to the assignments found + in [ISO3166-1] ("Codes for the representation of names of + countries and their subdivisions -- Part 1: Country codes") using + the list of alpha-2 country codes, or using assignments + subsequently made by the ISO 3166 maintenance agency or governing + standardization bodies. + + 3. All three-character subtags consisting of digit (numeric) + characters following the primary subtag were defined in the IANA + registry according to the assignments found in UN Standard + Country or Area Codes for Statistical Use [UN_M.49] or + assignments subsequently made by the governing standards body. + Note that not all of the UN M.49 codes are defined in the IANA + registry. The following rules define which codes are entered + into the registry as valid subtags: + + A. UN numeric codes assigned to 'macro-geographical + (continental)' or sub-regions MUST be registered in the + registry. These codes are not associated with an assigned + ISO 3166 alpha-2 code and represent supra-national areas, + usually covering more than one nation, state, province, or + territory. + + B. UN numeric codes for 'economic groupings' or 'other + groupings' MUST NOT be registered in the IANA registry and + MUST NOT be used to form language tags. + + C. UN numeric codes for countries or areas with ambiguous ISO + 3166 alpha-2 codes, when entered into the registry, MUST be + defined according to the rules in Section 3.4 and MUST be + used to form language tags that represent the country or + region for which they are defined. + + D. UN numeric codes for countries or areas for which there is an + associated ISO 3166 alpha-2 code in the registry MUST NOT be + entered into the registry and MUST NOT be used to form + language tags. Note that the ISO 3166-based subtag in the + registry MUST actually be associated with the UN M.49 code in + question. + + + + + + +Phillips & Davis Best Current Practice [Page 12] + +RFC 4646 Tags for Identifying Languages September 2006 + + + E. UN numeric codes and ISO 3166 alpha-2 codes for countries or + areas listed as eligible for registration in [RFC4645] but + not presently registered MAY be entered into the IANA + registry via the process described in Section 3.5. Once + registered, these codes MAY be used to form language tags. + + F. All other UN numeric codes for countries or areas that do not + have an associated ISO 3166 alpha-2 code MUST NOT be entered + into the registry and MUST NOT be used to form language tags. + For more information about these codes, see Section 3.4. + + 4. Note: The alphanumeric codes in Appendix X of the UN document + MUST NOT be entered into the registry and MUST NOT be used to + form language tags. (At the time this document was created, + these values matched the ISO 3166 alpha-2 codes.) + + 5. There MUST be at most one region subtag in a language tag and the + region subtag MAY be omitted, as when it adds no distinguishing + value to the tag. + + 6. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are + reserved for private use in language tags. These subtags + correspond to codes reserved by ISO 3166 for private use. These + codes MAY be used for private use region subtags (instead of + using a private use subtag sequence). Please refer to + Section 4.5 for more information on private use subtags. + + "de-CH" represents German ('de') as used in Switzerland ('CH'). + + "sr-Latn-CS" represents Serbian ('sr') written using Latin script + ('Latn') as used in Serbia and Montenegro ('CS'). + + "es-419" represents Spanish ('es') appropriate to the UN-defined + Latin America and Caribbean region ('419'). + +2.2.5. Variant Subtags + + Variant subtags are used to indicate additional, well-recognized + variations that define a language or its dialects that are not + covered by other available subtags. The following rules apply to the + variant subtags: + + 1. Variant subtags are not associated with any external standard. + Variant subtags and their meanings are defined by the + registration process defined in Section 3.5. + + 2. Variant subtags MUST follow all of the other defined subtags, but + precede any extension or private use subtag sequences. + + + +Phillips & Davis Best Current Practice [Page 13] + +RFC 4646 Tags for Identifying Languages September 2006 + + + 3. More than one variant MAY be used to form the language tag. + + 4. Variant subtags MUST be registered with IANA according to the + rules in Section 3.5 of this document before being used to form + language tags. In order to distinguish variants from other types + of subtags, registrations MUST meet the following length and + content restrictions: + + 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be + at least five characters long. + + 2. Variant subtags that begin with a digit (0-9) MUST be at + least four characters long. + + Variant subtag records in the language subtag registry MAY include + one or more 'Prefix' fields, which indicate the language tag or tags + that would make a suitable prefix (with other subtags, as + appropriate) in forming a language tag with the variant. For + example, the subtag 'nedis' has a Prefix of "sl", making it suitable + to form language tags such as "sl-nedis" and "sl-IT-nedis", but not + suitable for use in a tag such as "zh-nedis" or "it-IT-nedis". + + "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian. + + "de-CH-1996" represents German as used in Switzerland and as written + using the spelling reform beginning in the year 1996 C.E. + + Most variants that share a prefix are mutually exclusive. For + example, the German orthographic variations '1996' and '1901' SHOULD + NOT be used in the same tag, as they represent the dates of different + spelling reforms. A variant that can meaningfully be used in + combination with another variant SHOULD include a 'Prefix' field in + its registry record that lists that other variant. For example, if + another German variant 'example' were created that made sense to use + with '1996', then 'example' should include two Prefix fields: "de" + and "de-1996". + +2.2.6. Extension Subtags + + Extensions provide a mechanism for extending language tags for use in + various applications. See Section 3.7. The following rules apply to + extensions: + + 1. Extension subtags are separated from the other subtags defined + in this document by a single-character subtag ("singleton"). + The singleton MUST be one allocated to a registration authority + via the mechanism described in Section 3.7 and MUST NOT be the + letter 'x', which is reserved for private use subtag sequences. + + + +Phillips & Davis Best Current Practice [Page 14] + +RFC 4646 Tags for Identifying Languages September 2006 + + + 2. Note: Private use subtag sequences starting with the singleton + subtag 'x' are described in Section 2.2.7 below. + + 3. An extension MUST follow at least a primary language subtag. + That is, a language tag cannot begin with an extension. + Extensions extend language tags, they do not override or replace + them. For example, "a-value" is not a well-formed language tag, + while "de-a-value" is. + + 4. Each singleton subtag MUST appear at most one time in each tag + (other than as a private use subtag). That is, singleton + subtags MUST NOT be repeated. For example, the tag + "en-a-bbb-a-ccc" is invalid because the subtag 'a' appears + twice. Note that the tag "en-a-bbb-x-a-ccc" is valid because + the second appearance of the singleton 'a' is in a private use + sequence. + + 5. Extension subtags MUST meet all of the requirements for the + content and format of subtags defined in this document. + + 6. Extension subtags MUST meet whatever requirements are set by the + document that defines their singleton prefix and whatever + requirements are provided by the maintaining authority. + + 7. Each extension subtag MUST be from two to eight characters long + and consist solely of letters or digits, with each subtag + separated by a single '-'. + + 8. Each singleton MUST be followed by at least one extension + subtag. For example, the tag "tlh-a-b-foo" is invalid because + the first singleton 'a' is followed immediately by another + singleton 'b'. + + 9. Extension subtags MUST follow all language, extended language, + script, region, and variant subtags in a tag. + + 10. All subtags following the singleton and before another singleton + are part of the extension. Example: In the tag "fr-a-Latn", the + subtag 'Latn' does not represent the script subtag 'Latn' + defined in the IANA Language Subtag Registry. Its meaning is + defined by the extension 'a'. + + 11. In the event that more than one extension appears in a single + tag, the tag SHOULD be canonicalized as described in + Section 4.4. + + + + + + +Phillips & Davis Best Current Practice [Page 15] + +RFC 4646 Tags for Identifying Languages September 2006 + + + For example, if the prefix singleton 'r' and the shown subtags were + defined, then the following tag would be a valid example: + "en-Latn-GB-boont-r-extended-sequence-x-private". + +2.2.7. Private Use Subtags + + Private use subtags are used to indicate distinctions in language + important in a given context by private agreement. The following + rules apply to private use subtags: + + 1. Private use subtags are separated from the other subtags defined + in this document by the reserved single-character subtag 'x'. + + 2. Private use subtags MUST conform to the format and content + constraints defined in the ABNF for all subtags. + + 3. Private use subtags MUST follow all language, extended language, + script, region, variant, and extension subtags in the tag. + Another way of saying this is that all subtags following the + singleton 'x' MUST be considered private use. Example: The + subtag 'US' in the tag "en-x-US" is a private use subtag. + + 4. A tag MAY consist entirely of private use subtags. + + 5. No source is defined for private use subtags. Use of private use + subtags is by private agreement only. + + 6. Private use subtags are NOT RECOMMENDED where alternatives exist + or for general interchange. See Section 4.5 for more information + on private use subtag choice. + + For example: Users who wished to utilize codes from the Ethnologue + publication of SIL International for language identification might + agree to exchange tags such as "az-Arab-x-AZE-derbend". This example + contains two private use subtags. The first is 'AZE' and the second + is 'derbend'. + +2.2.8. Preexisting RFC 3066 Registrations + + Existing IANA-registered language tags from RFC 1766 and/or RFC 3066 + maintain their validity. These tags will be maintained in the + registry in records of either the "grandfathered" or "redundant" + type. Grandfathered tags contain one or more subtags that are not + defined in the Language Subtag Registry (see Section 3). Redundant + tags consist entirely of subtags defined above and whose independent + registration is superseded by this document. For more information, + see Section 3.8. + + + + +Phillips & Davis Best Current Practice [Page 16] + +RFC 4646 Tags for Identifying Languages September 2006 + + + It is important to note that all language tags formed under the + guidelines in this document were either legal, well-formed tags or + could have been registered under RFC 3066. + +2.2.9. Classes of Conformance + + Implementations sometimes need to describe their capabilities with + regard to the rules and practices described in this document. There + are two classes of conforming implementations described by this + document: "well-formed" processors and "validating" processors. + Claims of conformance SHOULD explicitly reference one of these + definitions. + + An implementation that claims to check for well-formed language tags + MUST: + + o Check that the tag and all of its subtags, including extension and + private use subtags, conform to the ABNF or that the tag is on the + list of grandfathered tags. + + o Check that singleton subtags that identify extensions do not + repeat. For example, the tag "en-a-xx-b-yy-a-zz" is not well- + formed. + + Well-formed processors are strongly encouraged to implement the + canonicalization rules contained in Section 4.4. + + An implementation that claims to be validating MUST: + + o Check that the tag is well-formed. + + o Specify the particular registry date for which the implementation + performs validation of subtags. + + o Check that either the tag is a grandfathered tag, or that all + language, script, region, and variant subtags consist of valid + codes for use in language tags according to the IANA registry as + of the particular date specified by the implementation. + + o Specify which, if any, extension RFCs as defined in Section 3.7 + are supported, including version, revision, and date. + + o For any such extensions supported, check that all subtags used in + that extension are valid. + + o For variant and extended language subtags, if the registry + contains one or more 'Prefix' fields for that subtag, check that + the tag matches at least one prefix. The tag matches if all the + + + +Phillips & Davis Best Current Practice [Page 17] + +RFC 4646 Tags for Identifying Languages September 2006 + + + subtags in the 'Prefix' also appear in the tag. For example, the + prefix "es-CO" matches the tag "es-Latn-CO-x-private" because both + the 'es' language subtag and 'CO' region subtag appear in the tag. + +3. Registry Format and Maintenance + + This section defines the Language Subtag Registry and the maintenance + and update procedures associated with it, as well as a registry for + extensions to language tags (Section 3.7). + + The Language Subtag Registry contains a comprehensive list of all of + the subtags valid in language tags. This allows implementers a + straightforward and reliable way to validate language tags. The + Language Subtag Registry will be maintained so that, except for + extension subtags, it is possible to validate all of the subtags that + appear in a language tag under the provisions of this document or its + revisions or successors. In addition, the meaning of the various + subtags will be unambiguous and stable over time. (The meaning of + private use subtags, of course, is not defined by the IANA registry.) + +3.1. Format of the IANA Language Subtag Registry + + The IANA Language Subtag Registry ("the registry") consists of a text + file that is machine readable in the format described in this + section, plus copies of the registration forms approved in accordance + with the process described in Section 3.5. The existing registration + forms for grandfathered and redundant tags taken from RFC 3066 will + be maintained as part of the obsolete RFC 3066 registry. The + remaining set of initial subtags will not have registration forms + created for them. + + The registry is in the text format described below. This format was + based on the record-jar format described in [record-jar]. + + Each line of text is limited to 72 characters, including all + whitespace. Records are separated by lines containing only the + sequence "%%" (%x25.25). + + Each field can be viewed as a single, logical line of ASCII + characters, comprising a field-name and a field-body separated by a + COLON character (%x3A). For convenience, the field-body portion of + this conceptual entity can be split into a multiple-line + representation; this is called "folding". The format of the registry + is described by the following ABNF (per [RFC4234]): + + + + + + + +Phillips & Davis Best Current Practice [Page 18] + +RFC 4646 Tags for Identifying Languages September 2006 + + + registry = record *("%%" CRLF record) + record = 1*( field-name *SP ":" *SP field-body CRLF ) + field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)] + field-body = *(ASCCHAR/LWSP) + ASCCHAR = %x21-25 / %x27-7E / UNICHAR ; Note: AMPERSAND is %x26 + UNICHAR = "&#x" 2*6HEXDIG ";" + + Figure 2: Registry Format ABNF + + The sequence '..' (%x2E.2E) in a field-body denotes a range of + values. Such a range represents all subtags of the same length that + are in alphabetic or numeric order within that range, including the + values explicitly mentioned. For example 'a..c' denotes the values + 'a', 'b', and 'c' and '11..13' denotes the values '11', '12', and + '13'. + + Characters from outside the US-ASCII [ISO646] repertoire, as well as + the AMPERSAND character ("&", %x26) when it occurs in a field-body, + are represented by a "Numeric Character Reference" using hexadecimal + notation in the style used by [XML10] (see + <http://www.w3.org/TR/REC-xml/#dt-charref>). This consists of the + sequence "&#x" (%x26.23.78) followed by a hexadecimal representation + of the character's code point in [ISO10646] followed by a closing + semicolon (%x3B). For example, the EURO SIGN, U+20AC, would be + represented by the sequence "€". Note that the hexadecimal + notation MAY have between two and six digits. + + All fields whose field-body contains a date value use the "full-date" + format specified in [RFC3339]. For example: "2004-06-28" represents + June 28, 2004, in the Gregorian calendar. + + The first record in the file contains the single field whose field- + name is "File-Date" (see Figure 3). The field-body of this record + contains the last modification date of this copy of the registry, + making it possible to compare different versions of the registry. + The registry on the IANA website is the most current. Versions with + an older date than that one are not up-to-date. + + File-Date: 2004-06-28 + %% + + Figure 3: Example of the File-Date Record + + Subsequent records represent subtags in the registry. Each of the + fields in each record MUST occur no more than once, unless otherwise + noted below. Each record MUST contain the following fields: + + + + + +Phillips & Davis Best Current Practice [Page 19] + +RFC 4646 Tags for Identifying Languages September 2006 + + + o 'Type' + + * Type's field-value MUST consist of one of the following + strings: "language", "extlang", "script", "region", "variant", + "grandfathered", and "redundant" and denotes the type of tag or + subtag. + + o Either 'Subtag' or 'Tag' + + * Subtag's field-value contains the subtag being defined. This + field MUST only appear in records of whose 'Type' has one of + these values: "language", "extlang", "script", "region", or + "variant". + + * Tag's field-value contains a complete language tag. This field + MUST only appear in records whose 'Type' has one of these + values: "grandfathered" or "redundant". Note that the field- + value will always follow the 'grandfathered' production in the + ABNF in Section 2.1 + + o Description + + * Description's field-value contains a non-normative description + of the subtag or tag. + + o Added + + * Added's field-value contains the date the record was added to + the registry. + + The 'Subtag' or 'Tag' field MUST use lowercase letters to form the + subtag or tag, with two exceptions. Subtags whose 'Type' field is + 'script' (in other words, subtags defined by ISO 15924) MUST use + titlecase. Subtags whose 'Type' field is 'region' (in other words, + subtags defined by ISO 3166) MUST use uppercase. These exceptions + mirror the use of case in the underlying standards. + + The field 'Description' MAY appear more than one time and contains a + description of the tag or subtag in the record. At least one of the + 'Description' fields MUST be written or transcribed into the Latin + script; the same or additional fields MAY also include a description + in a non-Latin script. The 'Description' field is used for + identification purposes and SHOULD NOT be taken to represent the + actual native name of the language or variation or to be in any + particular language. Most descriptions are taken directly from + source standards such as ISO 639 or ISO 3166. + + + + + +Phillips & Davis Best Current Practice [Page 20] + +RFC 4646 Tags for Identifying Languages September 2006 + + + Note: Descriptions in registry entries that correspond to ISO 639, + ISO 15924, ISO 3166, or UN M.49 codes are intended only to indicate + the meaning of that identifier as defined in the source standard at + the time it was added to the registry. The description does not + replace the content of the source standard itself. The descriptions + are not intended to be the English localized names for the subtags. + Localization or translation of language tag and subtag descriptions + is out of scope of this document. + + Each record MAY also contain the following fields: + + o Preferred-Value + + * For fields of type 'language', 'extlang', 'script', 'region', + and 'variant', 'Preferred-Value' contains the subtag of the + same 'Type' that is preferred for forming the language tag. + + * For fields of type 'grandfathered' and 'redundant', a canonical + mapping to a complete language tag. + + o Deprecated + + * Deprecated's field-value contains the date the record was + deprecated. + + o Prefix + + * Prefix's field-value contains a language tag with which this + subtag MAY be used to form a new language tag, perhaps with + other subtags as well. This field MUST only appear in records + whose 'Type' field-value is 'variant' or 'extlang'. For + example, the 'Prefix' for the variant 'nedis' is 'sl', meaning + that the tags "sl-nedis" and "sl-IT-nedis" might be appropriate + while the tag "is-nedis" is not. + + o Comments + + * Comments contains additional information about the subtag, as + deemed appropriate for understanding the registry and + implementing language tags using the subtag or tag. + + o Suppress-Script + + * Suppress-Script contains a script subtag that SHOULD NOT be + used to form language tags with the associated primary language + subtag. This field MUST only appear in records whose 'Type' + field-value is 'language'. See Section 4.1. + + + + +Phillips & Davis Best Current Practice [Page 21] + +RFC 4646 Tags for Identifying Languages September 2006 + + + The field 'Deprecated' MAY be added to any record via the maintenance + process described in Section 3.3 or via the registration process + described in Section 3.5. Usually, the addition of a 'Deprecated' + field is due to the action of one of the standards bodies, such as + ISO 3166, withdrawing a code. In some historical cases, it might not + have been possible to reconstruct the original deprecation date. For + these cases, an approximate date appears in the registry. Although + valid in language tags, subtags and tags with a 'Deprecated' field + are deprecated and validating processors SHOULD NOT generate these + subtags. Note that a record that contains a 'Deprecated' field and + no corresponding 'Preferred-Value' field has no replacement mapping. + + The field 'Preferred-Value' contains a mapping between the record in + which it appears and another tag or subtag. The value in this field + is STRONGLY RECOMMENDED as the best choice to represent the value of + this record when selecting a language tag. These values form three + groups: + + 1. ISO 639 language codes that were later withdrawn in favor of + other codes. These values are mostly a historical curiosity. + + 2. ISO 3166 region codes that have been withdrawn in favor of a new + code. This sometimes happens when a country changes its name or + administration in such a way that warrants a new region code. + + 3. Tags grandfathered from RFC 3066. In many cases, these tags have + become obsolete because the values they represent were later + encoded by ISO 639. + + Records that contain a 'Preferred-Value' field MUST also have a + 'Deprecated' field. This field contains a date of deprecation. + Thus, a language tag processor can use the registry to construct the + valid, non-deprecated set of subtags for a given date. In addition, + for any given tag, a processor can construct the set of valid + language tags that correspond to that tag for all dates up to the + date of the registry. The ability to do these mappings MAY be + beneficial to applications that are matching, selecting, for + filtering content based on its language tags. + + Note that 'Preferred-Value' mappings in records of type 'region' + sometimes do not represent exactly the same meaning as the original + value. There are many reasons for a country code to be changed, and + the effect this has on the formation of language tags will depend on + the nature of the change in question. + + In particular, the 'Preferred-Value' field does not imply retagging + content that uses the affected subtag. + + + + +Phillips & Davis Best Current Practice [Page 22] + +RFC 4646 Tags for Identifying Languages September 2006 + + + The field 'Preferred-Value' MUST NOT be modified once created in the + registry. The field MAY be added to records of type "grandfathered" + and "region" according to the rules in Section 3.3. Otherwise the + field MUST NOT be added to any record already in the registry. + + The 'Preferred-Value' field in records of type "grandfathered" and + "redundant" contains whole language tags that are strongly + RECOMMENDED for use in place of the record's value. In many cases, + the mappings were created by deprecation of the tags during the + period before this document was adopted. For example, the tag + "no-nyn" was deprecated in favor of the ISO 639-1-defined language + code 'nn'. + + Records of type 'variant' MAY have more than one field of type + 'Prefix'. Additional fields of this type MAY be added to a 'variant' + record via the registration process. + + Records of type 'extlang' MUST have _exactly_ one 'Prefix' field. + + The field-value of the 'Prefix' field consists of a language tag + whose subtags are appropriate to use with this subtag. For example, + the variant subtag '1996' has a 'Prefix' field of "de". This means + that tags starting with the sequence "de-" are appropriate with this + subtag, so "de-Latg-1996" and "de-CH-1996" are both acceptable, while + the tag "fr-1996" is an inappropriate choice. + + The field of type 'Prefix' MUST NOT be removed from any record. The + field-value for this type of field MUST NOT be modified. + + The field 'Comments' MAY appear more than once per record. This + field MAY be inserted or changed via the registration process and no + guarantee of stability is provided. The content of this field is not + restricted, except by the need to register the information, the + suitability of the request, and by reasonable practical size + limitations. + + The field 'Suppress-Script' MUST only appear in records whose 'Type' + field-value is 'language'. This field MUST NOT appear more than one + time in a record. This field indicates a script used to write the + overwhelming majority of documents for the given language and that + therefore adds no distinguishing information to a language tag. It + helps ensure greater compatibility between the language tags + generated according to the rules in this document and language tags + and tag processors or consumers based on RFC 3066. For example, + virtually all Icelandic documents are written in the Latin script, + making the subtag 'Latn' redundant in the tag "is-Latn". + + + + + +Phillips & Davis Best Current Practice [Page 23] + +RFC 4646 Tags for Identifying Languages September 2006 + + +3.2. Language Subtag Reviewer + + The Language Subtag Reviewer is appointed by the IESG for an + indefinite term, subject to removal or replacement at the IESG's + discretion. The Language Subtag Reviewer moderates the ietf- + languages mailing list, responds to requests for registration, and + performs the other registry maintenance duties described in + Section 3.3. Only the Language Subtag Reviewer is permitted to + request IANA to change, update, or add records to the Language Subtag + Registry. + + The performance or decisions of the Language Subtag Reviewer MAY be + appealed to the IESG under the same rules as other IETF decisions + (see [RFC2026]). The IESG can reverse or overturn the decision of + the Language Subtag Reviewer, provide guidance, or take other + appropriate actions. + +3.3. Maintenance of the Registry + + Maintenance of the registry requires that as codes are assigned or + withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language + Subtag Reviewer MUST evaluate each change, determine whether it + conflicts with existing registry entries, and submit the information + to IANA for inclusion in the registry. If a change takes place and + the Language Subtag Reviewer does not do this in a timely manner, + then any interested party MAY use the procedure in Section 3.5 to + register the appropriate update. + + Note: The redundant and grandfathered entries together are the + complete list of tags registered under [RFC3066]. The redundant tags + are those that can now be formed using the subtags defined in the + registry together with the rules of Section 2.2. The grandfathered + entries include those that can never be legal under those same + provisions. + + The set of redundant and grandfathered tags is permanent and stable: + new entries in this section MUST NOT be added and existing entries + MUST NOT be removed. Records of type 'grandfathered' MAY have their + type converted to 'redundant'; see item 12 in Section 3.6 for more + information. The decision-making process about which tags were + initially grandfathered and which were made redundant is described in + [RFC4645]. + + RFC 3066 tags that were deprecated prior to the adoption of this + document are part of the list of grandfathered tags, and their + component subtags were not included as registered variants (although + they remain eligible for registration). For example, the tag + "art-lojban" was deprecated in favor of the language subtag 'jbo'. + + + +Phillips & Davis Best Current Practice [Page 24] + +RFC 4646 Tags for Identifying Languages September 2006 + + + The Language Subtag Reviewer MUST ensure that new subtags meet the + requirements in Section 4.1 or submit an appropriate alternate subtag + as described in that section. When either a change or addition to + the registry is needed, the Language Subtag Reviewer MUST prepare the + complete record, including all fields, and forward it to IANA for + insertion into the registry. Each record being modified or inserted + MUST be forwarded in a separate message. + + If a record represents a new subtag that does not currently exist in + the registry, then the message's subject line MUST include the word + "INSERT". If the record represents a change to an existing subtag, + then the subject line of the message MUST include the word "MODIFY". + The message MUST contain both the record for the subtag being + inserted or modified and the new File-Date record. Here is an + example of what the body of the message might contain: + + LANGUAGE SUBTAG MODIFICATION + File-Date: 2005-01-02 + %% + Type: variant + Subtag: nedis + Description: Natisone dialect + Description: Nadiza dialect + Added: 2003-10-09 + Prefix: sl + Comments: This is a comment shown + as an example. + %% + + Figure 4: Example of a Language Subtag Modification Form + + Whenever an entry is created or modified in the registry, the + 'File-Date' record at the start of the registry is updated to reflect + the most recent modification date in the [RFC3339] "full-date" + format. + + Before forwarding a new registration to IANA, the Language Subtag + Reviewer MUST ensure that values in the 'Subtag' field match case + according to the description in Section 3.1. + +3.4. Stability of IANA Registry Entries + + The stability of entries and their meaning in the registry is + critical to the long-term stability of language tags. The rules in + this section guarantee that a specific language tag's meaning is + stable over time and will not change. + + + + + +Phillips & Davis Best Current Practice [Page 25] + +RFC 4646 Tags for Identifying Languages September 2006 + + + These rules specifically deal with how changes to codes (including + withdrawal and deprecation of codes) maintained by ISO 639, ISO + 15924, ISO 3166, and UN M.49 are reflected in the IANA Language + Subtag Registry. Assignments to the IANA Language Subtag Registry + MUST follow the following stability rules: + + 1. Values in the fields 'Type', 'Subtag', 'Tag', 'Added', + 'Deprecated' and 'Preferred-Value' MUST NOT be changed and are + guaranteed to be stable over time. + + 2. Values in the 'Description' field MUST NOT be changed in a way + that would invalidate previously-existing tags. They MAY be + broadened somewhat in scope, changed to add information, or + adapted to the most common modern usage. For example, countries + occasionally change their official names; a historical example + of this would be "Upper Volta" changing to "Burkina Faso". + + 3. Values in the field 'Prefix' MAY be added to records of type + 'variant' via the registration process. + + 4. Values in the field 'Prefix' MAY be modified, so long as the + modifications broaden the set of prefixes. That is, a prefix + MAY be replaced by one of its own prefixes. For example, the + prefix "en-US" could be replaced by "en", but not by the + prefixes "en-Latn", "fr", or "en-US-boont". If one of those + prefixes were needed, a new Prefix SHOULD be registered. + + 5. Values in the field 'Prefix' MUST NOT be removed. + + 6. The field 'Comments' MAY be added, changed, modified, or removed + via the registration process or any of the processes or + considerations described in this section. + + 7. The field 'Suppress-Script' MAY be added or removed via the + registration process. + + 8. Codes assigned by ISO 639, ISO 15924, and ISO 3166 that do not + conflict with existing subtags of the associated type and whose + meaning is not the same as an existing subtag of the same type + are entered into the IANA registry as new records. + + 9. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that are + withdrawn by their respective maintenance or registration + authority remain valid in language tags. A 'Deprecated' field + containing the date of withdrawal is added to the record. If a + new record of the same type is added that represents a + + + + + +Phillips & Davis Best Current Practice [Page 26] + +RFC 4646 Tags for Identifying Languages September 2006 + + + replacement value, then a 'Preferred-Value' field MAY also be + added. The registration process MAY be used to add comments + about the withdrawal of the code by the respective standard. + + Example + The region code 'TL' was assigned to the country 'Timor- + Leste', replacing the code 'TP' (which was assigned to 'East + Timor' when it was under administration by Portugal). The + subtag 'TP' remains valid in language tags, but its record + contains the a 'Preferred-Value' of 'TL' and its field + 'Deprecated' contains the date the new code was assigned + ('2004-07-06'). + + 10. Codes assigned by ISO 639, ISO 15924, or ISO 3166 that conflict + with existing subtags of the associated type, including subtags + that are deprecated, MUST NOT be entered into the registry. The + following additional considerations apply to subtag values that + are reassigned: + + A. For ISO 639 codes, if the newly assigned code's meaning is + not represented by a subtag in the IANA registry, the + Language Subtag Reviewer, as described in Section 3.5, SHALL + prepare a proposal for entering in the IANA registry as soon + as practical a registered language subtag as an alternate + value for the new code. The form of the registered language + subtag will be at the discretion of the Language Subtag + Reviewer and MUST conform to other restrictions on language + subtags in this document. + + B. For all subtags whose meaning is derived from an external + standard (i.e., ISO 639, ISO 15924, ISO 3166, or UN M.49), + if a new meaning is assigned to an existing code and the new + meaning broadens the meaning of that code, then the meaning + for the associated subtag MAY be changed to match. The + meaning of a subtag MUST NOT be narrowed, however, as this + can result in an unknown proportion of the existing uses of + a subtag becoming invalid. Note: ISO 639 maintenance + agency/registration authority (MA/RA) has adopted a similar + stability policy. + + C. For ISO 15924 codes, if the newly assigned code's meaning is + not represented by a subtag in the IANA registry, the + Language Subtag Reviewer, as described in Section 3.5, SHALL + prepare a proposal for entering in the IANA registry as soon + as practical a registered variant subtag as an alternate + value for the new code. The form of the registered variant + + + + + +Phillips & Davis Best Current Practice [Page 27] + +RFC 4646 Tags for Identifying Languages September 2006 + + + subtag will be at the discretion of the Language Subtag + Reviewer and MUST conform to other restrictions on variant + subtags in this document. + + D. For ISO 3166 codes, if the newly assigned code's meaning is + associated with the same UN M.49 code as another 'region' + subtag, then the existing region subtag remains as the + preferred value for that region and no new entry is created. + A comment MAY be added to the existing region subtag + indicating the relationship to the new ISO 3166 code. + + E. For ISO 3166 codes, if the newly assigned code's meaning is + associated with a UN M.49 code that is not represented by an + existing region subtag, then the Language Subtag Reviewer, + as described in Section 3.5, SHALL prepare a proposal for + entering the appropriate UN M.49 country code as an entry in + the IANA registry. + + F. For ISO 3166 codes, if there is no associated UN numeric + code, then the Language Subtag Reviewer SHALL petition the + UN to create one. If there is no response from the UN + within ninety days of the request being sent, the Language + Subtag Reviewer SHALL prepare a proposal for entering in the + IANA registry as soon as practical a registered variant + subtag as an alternate value for the new code. The form of + the registered variant subtag will be at the discretion of + the Language Subtag Reviewer and MUST conform to other + restrictions on variant subtags in this document. This + situation is very unlikely to ever occur. + + 11. UN M.49 has codes for both countries and areas (such as '276' + for Germany) and geographical regions and sub-regions (such as + '150' for Europe). UN M.49 country or area codes for which + there is no corresponding ISO 3166 code SHOULD NOT be + registered, except as a surrogate for an ISO 3166 code that is + blocked from registration by an existing subtag. If such a code + becomes necessary, then the registration authority for ISO 3166 + SHOULD first be petitioned to assign a code to the region. If + the petition for a code assignment by ISO 3166 is refused or not + acted on in a timely manner, the registration process described + in Section 3.5 MAY then be used to register the corresponding UN + M.49 code. At the time this document was written, there were + only four such codes: 830 (Channel Islands), 831 (Guernsey), 832 + (Jersey), and 833 (Isle of Man). This way, UN M.49 codes remain + available as the value of last resort in cases where ISO 3166 + reassigns a deprecated value in the registry. + + + + + +Phillips & Davis Best Current Practice [Page 28] + +RFC 4646 Tags for Identifying Languages September 2006 + + + 12. Stability provisions apply to grandfathered tags with this + exception: should all of the subtags in a grandfathered tag + become valid subtags in the IANA registry, then the field 'Type' + in that record is changed from 'grandfathered' to 'redundant'. + Note that this will not affect language tags that match the + grandfathered tag, since these tags will now match valid + generative subtag sequences. For example, if the subtag 'gan' + in the language tag "zh-gan" were to be registered as an + extended language subtag, then the grandfathered tag "zh-gan" + would be deprecated (but existing content or implementations + that use "zh-gan" would remain valid). + +3.5. Registration Procedure for Subtags + + The procedure given here MUST be used by anyone who wants to use a + subtag not currently in the IANA Language Subtag Registry. + + Only subtags of type 'language' and 'variant' will be considered for + independent registration of new subtags. Handling of subtags needed + for stability and subtags necessary to keep the registry synchronized + with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits + defined by this document are described in Section 3.3. Stability + provisions are described in Section 3.4. + + This procedure MAY also be used to register or alter the information + for the 'Description', 'Comments', 'Deprecated', or 'Prefix' fields + in a subtag's record as described in Section 3.4. Changes to all + other fields in the IANA registry are NOT permitted. + + Registering a new subtag or requesting modifications to an existing + tag or subtag starts with the requester filling out the registration + form reproduced below. Note that each response is not limited in + size so that the request can adequately describe the registration. + The fields in the "Record Requested" section SHOULD follow the + requirements in Section 3.1. + + + + + + + + + + + + + + + + +Phillips & Davis Best Current Practice [Page 29] + +RFC 4646 Tags for Identifying Languages September 2006 + + + LANGUAGE SUBTAG REGISTRATION FORM + 1. Name of requester: + 2. E-mail address of requester: + 3. Record Requested: + + Type: + Subtag: + Description: + Prefix: + Preferred-Value: + Deprecated: + Suppress-Script: + Comments: + + 4. Intended meaning of the subtag: + 5. Reference to published description + of the language (book or article): + 6. Any other relevant information: + + Figure 5: The Language Subtag Registration Form + + The subtag registration form MUST be sent to + <ietf-languages@iana.org> for a two-week review period before it can + be submitted to IANA. (This is an open list and can be joined by + sending a request to <ietf-languages-request@iana.org>.) + + Variant subtags are usually registered for use with a particular + range of language tags. For example, the subtag 'rozaj' is intended + for use with language tags that start with the primary language + subtag "sl", since Resian is a dialect of Slovenian. Thus, the + subtag 'rozaj' would be appropriate in tags such as "sl-Latn-rozaj" + or "sl-IT-rozaj". This information is stored in the 'Prefix' field + in the registry. Variant registration requests SHOULD include at + least one 'Prefix' field in the registration form. + + Extended language subtags are reserved for future standardization. + These subtags will be REQUIRED to include exactly one 'Prefix' field + once they are allowed for registration. + + The 'Prefix' field for a given registered subtag exists in the IANA + registry as a guide to usage. Additional prefixes MAY be added by + filing an additional registration form. In that form, the "Any other + relevant information:" field MUST indicate that it is the addition of + a prefix. + + Requests to add a prefix to a variant subtag that imply a different + semantic meaning will probably be rejected. For example, a request + to add the prefix "de" to the subtag 'nedis' so that the tag + + + +Phillips & Davis Best Current Practice [Page 30] + +RFC 4646 Tags for Identifying Languages September 2006 + + + "de-nedis" represented some German dialect would be rejected. The + 'nedis' subtag represents a particular Slovenian dialect and the + additional registration would change the semantic meaning assigned to + the subtag. A separate subtag SHOULD be proposed instead. + + The 'Description' field MUST contain a description of the tag being + registered written or transcribed into the Latin script; it MAY also + include a description in a non-Latin script. Non-ASCII characters + MUST be escaped using the syntax described in Section 3.1. The + 'Description' field is used for identification purposes and doesn't + necessarily represent the actual native name of the language or + variation or to be in any particular language. + + While the 'Description' field itself is not guaranteed to be stable + and errata corrections MAY be undertaken from time to time, attempts + to provide translations or transcriptions of entries in the registry + itself will probably be frowned upon by the community or rejected + outright, as changes of this nature have an impact on the provisions + in Section 3.4. + + When the two-week period has passed, the Language Subtag Reviewer + either forwards the record to be inserted or modified to + iana@iana.org according to the procedure described in Section 3.3, or + rejects the request because of significant objections raised on the + list or due to problems with constraints in this document (which MUST + be explicitly cited). The Language Subtag Reviewer MAY also extend + the review period in two-week increments to permit further + discussion. The Language Subtag Reviewer MUST indicate on the list + whether the registration has been accepted, rejected, or extended + following each two-week period. + + Note that the Language Subtag Reviewer MAY raise objections on the + list if he or she so desires. The important thing is that the + objection MUST be made publicly. + + The applicant is free to modify a rejected application with + additional information and submit it again; this restarts the two- + week comment period. + + Decisions made by the Language Subtag Reviewer MAY be appealed to the + IESG [RFC2028] under the same rules as other IETF decisions + [RFC2026]. + + All approved registration forms are available online in the directory + http://www.iana.org/numbers.html under "languages". + + + + + + +Phillips & Davis Best Current Practice [Page 31] + +RFC 4646 Tags for Identifying Languages September 2006 + + + Updates or changes to existing records follow the same procedure as + new registrations. The Language Subtag Reviewer decides whether + there is consensus to update the registration following the two-week + review period; normally, objections by the original registrant will + carry extra weight in forming such a consensus. + + Registrations are permanent and stable. Once registered, subtags + will not be removed from the registry and will remain a valid way in + which to specify a specific language or variant. + + Note: The purpose of the "Description" in the registration form is to + aid people trying to verify whether a language is registered or what + language or language variation a particular subtag refers to. In + most cases, reference to an authoritative grammar or dictionary of + that language will be useful; in cases where no such work exists, + other well-known works describing that language or in that language + MAY be appropriate. The Language Subtag Reviewer decides what + constitutes "good enough" reference material. This requirement is + not intended to exclude particular languages or dialects due to the + size of the speaker population or lack of a standardized orthography. + Minority languages will be considered equally on their own merits. + +3.6. Possibilities for Registration + + Possibilities for registration of subtags or information about + subtags include: + + o Primary language subtags for languages not listed in ISO 639 that + are not variants of any listed or registered language MAY be + registered. At the time this document was created, there were no + examples of this form of subtag. Before attempting to register a + language subtag, there MUST be an attempt to register the language + with ISO 639. Subtags MUST NOT be registered for codes that exist + in ISO 639-1 or ISO 639-2, that are under consideration by the ISO + 639 maintenance or registration authorities, or that have never + been attempted for registration with those authorities. If ISO + 639 has previously rejected a language for registration, it is + reasonable to assume that there must be additional, very + compelling evidence of need before it will be registered in the + IANA registry (to the extent that it is very unlikely that any + subtags will be registered of this type). + + o Dialect or other divisions or variations within a language, its + orthography, writing system, regional or historical usage, + transliteration or other transformation, or distinguishing + variation MAY be registered as variant subtags. An example is the + 'rozaj' subtag (the Resian dialect of Slovenian). + + + + +Phillips & Davis Best Current Practice [Page 32] + +RFC 4646 Tags for Identifying Languages September 2006 + + + o The addition or maintenance of fields (generally of an + informational nature) in Tag or Subtag records as described in + Section 3.1 and subject to the stability provisions in + Section 3.4. This includes descriptions, comments, deprecation + and preferred values for obsolete or withdrawn codes, or the + addition of script or extlang information to primary language + subtags. + + o The addition of records and related field value changes necessary + to reflect assignments made by ISO 639, ISO 15924, ISO 3166, and + UN M.49 as described in Section 3.4. + + Subtags proposed for registration that would cause all or part of a + grandfathered tag to become redundant but whose meaning conflicts + with or alters the meaning of the grandfathered tag MUST be rejected. + + This document leaves the decision on what subtags or changes to + subtags are appropriate (or not) to the registration process + described in Section 3.5. + + Note: four-character primary language subtags are reserved to allow + for the possibility of alpha4 codes in some future addition to the + ISO 639 family of standards. + + ISO 639 defines a maintenance agency for additions to and changes in + the list of languages in ISO 639. This agency is: + + International Information Centre for Terminology (Infoterm) + Aichholzgasse 6/12, AT-1120 + Wien, Austria + Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72 + + ISO 639-2 defines a maintenance agency for additions to and changes + in the list of languages in ISO 639-2. This agency is: + + Library of Congress + Network Development and MARC Standards Office + Washington, D.C. 20540 USA + Phone: +1 202 707 6237 Fax: +1 202 707 0115 + URL: http://www.loc.gov/standards/iso639-2 + + + + + + + + + + + +Phillips & Davis Best Current Practice [Page 33] + +RFC 4646 Tags for Identifying Languages September 2006 + + + The maintenance agency for ISO 3166 (country codes) is: + + ISO 3166 Maintenance Agency + c/o International Organization for Standardization + Case postale 56 + CH-1211 Geneva 20 Switzerland + Phone: +41 22 749 72 33 Fax: +41 22 749 73 49 + URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html + + The registration authority for ISO 15924 (script codes) is: + + Unicode Consortium Box 391476 + Mountain View, CA 94039-1476, USA + URL: http://www.unicode.org/iso15924 + + The Statistics Division of the United Nations Secretariat maintains + the Standard Country or Area Codes for Statistical Use and can be + reached at: + + Statistical Services Branch + Statistics Division + United Nations, Room DC2-1620 + New York, NY 10017, USA + + Fax: +1-212-963-0623 + E-mail: statistics@un.org + URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm + +3.7. Extensions and Extensions Registry + + Extension subtags are those introduced by single-character subtags + ("singletons") other than 'x'. They are reserved for the generation + of identifiers that contain a language component and are compatible + with applications that understand language tags. + + The structure and form of extensions are defined by this document so + that implementations can be created that are forward compatible with + applications that might be created using singletons in the future. + In addition, defining a mechanism for maintaining singletons will + lend stability to this document by reducing the likely need for + future revisions or updates. + + Single-character subtags are assigned by IANA using the "IETF + Consensus" policy defined by [RFC2434]. This policy requires the + development of an RFC, which SHALL define the name, purpose, + processes, and procedures for maintaining the subtags. The + maintaining or registering authority, including name, contact email, + + + + +Phillips & Davis Best Current Practice [Page 34] + +RFC 4646 Tags for Identifying Languages September 2006 + + + discussion list email, and URL location of the registry, MUST be + indicated clearly in the RFC. The RFC MUST specify or include each + of the following: + + o The specification MUST reference the specific version or revision + of this document that governs its creation and MUST reference this + section of this document. + + o The specification and all subtags defined by the specification + MUST follow the ABNF and other rules for the formation of tags and + subtags as defined in this document. In particular, it MUST + specify that case is not significant and that subtags MUST NOT + exceed eight characters in length. + + o The specification MUST specify a canonical representation. + + o The specification of valid subtags MUST be available over the + Internet and at no cost. + + o The specification MUST be in the public domain or available via a + royalty-free license acceptable to the IETF and specified in the + RFC. + + o The specification MUST be versioned, and each version of the + specification MUST be numbered, dated, and stable. + + o The specification MUST be stable. That is, extension subtags, + once defined by a specification, MUST NOT be retracted or change + in meaning in any substantial way. + + o The specification MUST include in a separate section the + registration form reproduced in this section (below) to be used in + registering the extension upon publication as an RFC. + + o IANA MUST be informed of changes to the contact information and + URL for the specification. + + IANA will maintain a registry of allocated single-character + (singleton) subtags. This registry MUST use the record-jar format + described by the ABNF in Section 3.1. Upon publication of an + extension as an RFC, the maintaining authority defined in the RFC + MUST forward this registration form to iesg@ietf.org, who MUST + forward the request to iana@iana.org. The maintaining authority of + the extension MUST maintain the accuracy of the record by sending an + updated full copy of the record to iana@iana.org with the subject + line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only + the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY + be modified in these updates. + + + +Phillips & Davis Best Current Practice [Page 35] + +RFC 4646 Tags for Identifying Languages September 2006 + + + Failure to maintain this record, maintain the corresponding registry, + or meet other conditions imposed by this section of this document MAY + be appealed to the IESG [RFC2028] under the same rules as other IETF + decisions (see [RFC2026]) and MAY result in the authority to maintain + the extension being withdrawn or reassigned by the IESG. + + %% + Identifier: + Description: + Comments: + Added: + RFC: + Authority: + Contact_Email: + Mailing_List: + URL: + %% + + Figure 6: Format of Records in the Language Tag Extensions Registry + + 'Identifier' contains the single-character subtag (singleton) + assigned to the extension. The Internet-Draft submitted to define + the extension SHOULD specify which letter or digit to use, although + the IESG MAY change the assignment when approving the RFC. + + 'Description' contains the name and description of the extension. + + 'Comments' is an OPTIONAL field and MAY contain a broader description + of the extension. + + 'Added' contains the date the RFC was published in the "full-date" + format specified in [RFC3339]. For example: 2004-06-28 represents + June 28, 2004, in the Gregorian calendar. + + 'RFC' contains the RFC number assigned to the extension. + + 'Authority' contains the name of the maintaining authority for the + extension. + + 'Contact_Email' contains the email address used to contact the + maintaining authority. + + 'Mailing_List' contains the URL or subscription email address of the + mailing list used by the maintaining authority. + + 'URL' contains the URL of the registry for this extension. + + + + + +Phillips & Davis Best Current Practice [Page 36] + +RFC 4646 Tags for Identifying Languages September 2006 + + + The determination of whether an Internet-Draft meets the above + conditions and the decision to grant or withhold such authority rests + solely with the IESG and is subject to the normal review and appeals + process associated with the RFC process. + + Extension authors are strongly cautioned that many (including most + well-formed) processors will be unaware of any special relationships + or meaning inherent in the order of extension subtags. Extension + authors SHOULD avoid subtag relationships or canonicalization + mechanisms that interfere with matching or with length restrictions + that sometimes exist in common protocols where the extension is used. + In particular, applications MAY truncate the subtags in doing + matching or in fitting into limited lengths, so it is RECOMMENDED + that the most significant information be in the most significant + (left-most) subtags and that the specification gracefully handle + truncated subtags. + + When a language tag is to be used in a specific, known, protocol, it + is RECOMMENDED that the language tag not contain extensions not + supported by that protocol. In addition, note that some protocols + MAY impose upper limits on the length of the strings used to store or + transport the language tag. + +3.8. Initialization of the Registries + + Upon adoption of this document, an initial version of the Language + Subtag Registry containing the various subtags initially valid in a + language tag is necessary. This collection of subtags, along with a + description of the process used to create it, is described by + [RFC4645]. IANA SHALL publish the initial version of the registry + described by this document from the content of [RFC4645]. Once + published by IANA, the maintenance procedures, rules, and + registration processes described in this document will be available + for new registrations or updates. + + Registrations that are in process under the rules defined in + [RFC3066] when this document is adopted MAY be completed under the + former rules, at the discretion of the Language Tag Reviewer (as + described in [RFC3066]). Until the IESG officially appoints a + Language Subtag Reviewer, the existing Language Tag Reviewer SHALL + serve as the Language Subtag Reviewer. + + Any new registrations submitted using the RFC 3066 forms or format + after the adoption of this document and publication of the registry + by IANA MUST be rejected. + + + + + + +Phillips & Davis Best Current Practice [Page 37] + +RFC 4646 Tags for Identifying Languages September 2006 + + + An initial version of the Language Tag Extensions Registry described + in Section 3.7 is also needed. The Language Tag Extensions Registry + SHALL be initialized with a single record containing a single field + of type "File-Date" as a placeholder for future assignments. + +4. Formation and Processing of Language Tags + + This section addresses how to use the information in the registry + with the tag syntax to choose, form, and process language tags. + +4.1. Choice of Language Tag + + One is sometimes faced with the choice between several possible tags + for the same body of text. + + Interoperability is best served when all users use the same language + tag in order to represent the same language. If an application has + requirements that make the rules here inapplicable, then that + application risks damaging interoperability. It is strongly + RECOMMENDED that users not define their own rules for language tag + choice. + + Subtags SHOULD only be used where they add useful distinguishing + information; extraneous subtags interfere with the meaning, + understanding, and processing of language tags. In particular, users + and implementations SHOULD follow the 'Prefix' and 'Suppress-Script' + fields in the registry (defined in Section 3.1): these fields provide + guidance on when specific additional subtags SHOULD (and SHOULD NOT) + be used in a language tag. + + Of particular note, many applications can benefit from the use of + script subtags in language tags, as long as the use is consistent for + a given context. Script subtags were not formally defined in RFC + 3066 and their use can affect matching and subtag identification by + implementations of RFC 3066, as these subtags appear between the + primary language and region subtags. For example, if a user requests + content in an implementation of Section 2.5 of [RFC3066] using the + language range "en-US", content labeled "en-Latn-US" will not match + the request. Therefore, it is important to know when script subtags + will customarily be used and when they ought not be used. In the + registry, the Suppress-Script field helps ensure greater + compatibility between the language tags generated according to the + rules in this document and language tags and tag processors or + consumers based on RFC 3066 by defining when users SHOULD NOT include + a script subtag with a particular primary language subtag. + + + + + + +Phillips & Davis Best Current Practice [Page 38] + +RFC 4646 Tags for Identifying Languages September 2006 + + + Extended language subtags (type 'extlang' in the registry; see + Section 3.1) also appear between the primary language and region + subtags and are reserved for future standardization. Applications + might benefit from their judicious use in forming language tags in + the future. Similar recommendations are expected to apply to their + use as apply to script subtags. + + Standards, protocols, and applications that reference this document + normatively but apply different rules to the ones given in this + section MUST specify how the procedure varies from the one given + here. + + The choice of subtags used to form a language tag SHOULD be guided by + the following rules: + + 1. Use as precise a tag as possible, but no more specific than is + justified. Avoid using subtags that are not important for + distinguishing content in an application. + + * For example, 'de' might suffice for tagging an email written + in German, while "de-CH-1996" is probably unnecessarily + precise for such a task. + + 2. The script subtag SHOULD NOT be used to form language tags unless + the script adds some distinguishing information to the tag. The + field 'Suppress-Script' in the primary language record in the + registry indicates which script subtags do not add distinguishing + information for most applications. + + * For example, the subtag 'Latn' should not be used with the + primary language 'en' because nearly all English documents are + written in the Latin script and it adds no distinguishing + information. However, if a document were written in English + mixing Latin script with another script such as Braille + ('Brai'), then it might be appropriate to choose to indicate + both scripts to aid in content selection, such as the + application of a style sheet. + + 3. If a tag or subtag has a 'Preferred-Value' field in its registry + entry, then the value of that field SHOULD be used to form the + language tag in preference to the tag or subtag in which the + preferred value appears. + + * For example, use 'he' for Hebrew in preference to 'iw'. + + + + + + + +Phillips & Davis Best Current Practice [Page 39] + +RFC 4646 Tags for Identifying Languages September 2006 + + + 4. The 'und' (Undetermined) primary language subtag SHOULD NOT be + used to label content, even if the language is unknown. Omitting + the language tag altogether is preferred to using a tag with a + primary language subtag of 'und'. The 'und' subtag MAY be useful + for protocols that require a language tag to be provided. The + 'und' subtag MAY also be useful when matching language tags in + certain situations. + + 5. The 'mul' (Multiple) primary language subtag SHOULD NOT be used + whenever the protocol allows the separate tags for multiple + languages, as is the case for the Content-Language header in + HTTP. The 'mul' subtag conveys little useful information: + content in multiple languages SHOULD individually tag the + languages where they appear or otherwise indicate the actual + language in preference to the 'mul' subtag. + + 6. The same variant subtag SHOULD NOT be used more than once within + a language tag. + + * For example, do not use "de-DE-1901-1901". + + To ensure consistent backward compatibility, this document contains + several provisions to account for potential instability in the + standards used to define the subtags that make up language tags. + These provisions mean that no language tag created under the rules in + this document will become obsolete. + +4.2. Meaning of the Language Tag + + The relationship between the tag and the information it relates to is + defined by the context in which the tag appears. Accordingly, this + section gives only possible examples of its usage. + + o For a single information object, the associated language tags + might be interpreted as the set of languages that is necessary for + a complete comprehension of the complete object. Example: Plain + text documents. + + o For an aggregation of information objects, the associated language + tags could be taken as the set of languages used inside components + of that aggregation. Examples: Document stores and libraries. + + o For information objects whose purpose is to provide alternatives, + the associated language tags could be regarded as a hint that the + content is provided in several languages and that one has to + inspect each of the alternatives in order to find its language or + languages. In this case, the presence of multiple tags might not + mean that one needs to be multi-lingual to get complete + + + +Phillips & Davis Best Current Practice [Page 40] + +RFC 4646 Tags for Identifying Languages September 2006 + + + understanding of the document. Example: MIME multipart/ + alternative. + + o In markup languages, such as HTML and XML, language information + can be added to each part of the document identified by the markup + structure (including the whole document itself). For example, one + could write <span lang="fr">C'est la vie.</span> inside a + Norwegian document; the Norwegian-speaking user could then access + a French-Norwegian dictionary to find out what the marked section + meant. If the user were listening to that document through a + speech synthesis interface, this formation could be used to signal + the synthesizer to appropriately apply French text-to-speech + pronunciation rules to that span of text, instead of applying the + inappropriate Norwegian rules. + + Language tags are related when they contain a similar sequence of + subtags. For example, if a language tag B contains language tag A as + a prefix, then B is typically "narrower" or "more specific" than A. + Thus, "zh-Hant-TW" is more specific than "zh-Hant". + + This relationship is not guaranteed in all cases: specifically, + languages that begin with the same sequence of subtags are NOT + guaranteed to be mutually intelligible, although they might be. For + example, the tag "az" shares a prefix with both "az-Latn" + (Azerbaijani written using the Latin script) and "az-Cyrl" + (Azerbaijani written using the Cyrillic script). A person fluent in + one script might not be able to read the other, even though the text + might be identical. Content tagged as "az" most probably is written + in just one script and thus might not be intelligible to a reader + familiar with the other script. + +4.3. Length Considerations + + [RFC3066] did not provide an upper limit on the size of language + tags. While RFC 3066 did define the semantics of particular subtags + in such a way that most language tags consisted of language and + region subtags with a combined total length of up to six characters, + larger registered tags were not only possible but were actually + registered. + + Neither the language tag syntax nor other requirements in this + document impose a fixed upper limit on the number of subtags in a + language tag (and thus an upper bound on the size of a tag). The + language tag syntax suggests that, depending on the specific + language, more subtags (and thus a longer tag) are sometimes + necessary to completely identify the language for certain + applications; thus, it is possible to envision long or complex subtag + sequences. + + + +Phillips & Davis Best Current Practice [Page 41] + +RFC 4646 Tags for Identifying Languages September 2006 + + +4.3.1. Working with Limited Buffer Sizes + + Some applications and protocols are forced to allocate fixed buffer + sizes or otherwise limit the length of a language tag. A conformant + implementation or specification MAY refuse to support the storage of + language tags that exceed a specified length. Any such limitation + SHOULD be clearly documented, and such documentation SHOULD include + what happens to longer tags (for example, whether an error value is + generated or the language tag is truncated). A protocol that allows + tags to be truncated at an arbitrary limit, without giving any + indication of what that limit is, has the potential for causing harm + by changing the meaning of tags in substantial ways. + + In practice, most language tags do not require more than a few + subtags and will not approach reasonably sized buffer limitations; + see Section 4.1. + + Some specifications or protocols have limits on tag length but do not + have a fixed length limitation. For example, [RFC2231] has no + explicit length limitation: the length available for the language tag + is constrained by the length of other header components (such as the + charset's name) coupled with the 76-character limit in [RFC2047]. + Thus, the "limit" might be 50 or more characters, but it could + potentially be quite small. + + The considerations for assigning a buffer limit are: + + Implementations SHOULD NOT truncate language tags unless the + meaning of the tag is purposefully being changed, or unless the + tag does not fit into a limited buffer size specified by a + protocol for storage or transmission. + + Implementations SHOULD warn the user when a tag is truncated since + truncation changes the semantic meaning of the tag. + + Implementations of protocols or specifications that are space + constrained but do not have a fixed limit SHOULD use the longest + possible tag in preference to truncation. + + Protocols or specifications that specify limited buffer sizes for + language tags MUST allow for language tags of up to 33 characters. + + Protocols or specifications that specify limited buffer sizes for + language tags SHOULD allow for language tags of at least 42 + characters. + + + + + + +Phillips & Davis Best Current Practice [Page 42] + +RFC 4646 Tags for Identifying Languages September 2006 + + + The following illustration shows how the 42-character recommendation + was derived. The combination of language and extended language + subtags was chosen for future compatibility. At up to 15 characters, + this combination is longer than the longest possible primary language + subtag (8 characters): + + language = 3 (ISO 639-2; ISO 639-1 requires 2) + extlang1 = 4 (each subsequent subtag includes '-') + extlang2 = 4 (unlikely: needs prefix="language-extlang1") + extlang3 = 4 (extremely unlikely) + script = 5 (if not suppressed: see Section 4.1) + region = 4 (UN M.49; ISO 3166 requires 3) + variant1 = 9 (MUST have language as a prefix) + variant2 = 9 (MUST have language-variant1 as a prefix) + + total = 42 characters + + Figure 7: Derivation of the Limit on Tag Length + +4.3.2. Truncation of Language Tags + + Truncation of a language tag alters the meaning of the tag, and thus + SHOULD be avoided. However, truncation of language tags is sometimes + necessary due to limited buffer sizes. Such truncation MUST NOT + permit a subtag to be chopped off in the middle or the formation of + invalid tags (for example, one ending with the "-" character). + + This means that applications or protocols that truncate tags MUST do + so by progressively removing subtags along with their preceding "-" + from the right side of the language tag until the tag is short enough + for the given buffer. If the resulting tag ends with a single- + character subtag, that subtag and its preceding "-" MUST also be + removed. For example: + + Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1 + 1. zh-Latn-CN-variant1-a-extend1-x-wadegile + 2. zh-Latn-CN-variant1-a-extend1 + 3. zh-Latn-CN-variant1 + 4. zh-Latn-CN + 5. zh-Latn + 6. zh + + Figure 8: Example of Tag Truncation + + + + + + + + +Phillips & Davis Best Current Practice [Page 43] + +RFC 4646 Tags for Identifying Languages September 2006 + + +4.4. Canonicalization of Language Tags + + Since a particular language tag is sometimes used by many processes, + language tags SHOULD always be created or generated in a canonical + form. + + A language tag is in canonical form when: + + 1. The tag is well-formed according the rules in Section 2.1 and + Section 2.2. + + 2. Subtags of type 'Region' that have a Preferred-Value mapping in + the IANA registry (see Section 3.1) SHOULD be replaced with their + mapped value. Note: In rare cases, the mapped value will also + have a Preferred-Value. + + 3. Redundant or grandfathered tags that have a Preferred-Value + mapping in the IANA registry (see Section 3.1) MUST be replaced + with their mapped value. These items either are deprecated + mappings created before the adoption of this document (such as + the mapping of "no-nyn" to "nn" or "i-klingon" to "tlh") or are + the result of later registrations or additions to this document + (for example, "zh-guoyu" might be mapped to a language-extlang + combination such as "zh-cmn" by some future update of this + document). + + 4. Other subtags that have a Preferred-Value mapping in the IANA + registry (see Section 3.1) MUST be replaced with their mapped + value. These items consist entirely of clerical corrections to + ISO 639-1 in which the deprecated subtags have been maintained + for compatibility purposes. + + 5. If more than one extension subtag sequence exists, the extension + sequences are ordered into case-insensitive ASCII order by + singleton subtag. + + Example: The language tag "en-A-aaa-B-ccc-bbb-x-xyz" is in canonical + form, while "en-B-ccc-bbb-A-aaa-X-xyz" is well-formed but not in + canonical form. + + Example: The language tag "en-BU" (English as used in Burma) is not + canonical because the 'BU' subtag has a canonical mapping to 'MM' + (Myanmar), although the tag "en-BU" maintains its validity. + + Canonicalization of language tags does not imply anything about the + use of upper or lowercase letters when processing or comparing + subtags (and as described in Section 2.1). All comparisons MUST be + performed in a case-insensitive manner. + + + +Phillips & Davis Best Current Practice [Page 44] + +RFC 4646 Tags for Identifying Languages September 2006 + + + When performing canonicalization of language tags, processors MAY + regularize the case of the subtags (that is, this process is + OPTIONAL), following the case used in the registry. Note that this + corresponds to the following casing rules: uppercase all non-initial + two-letter subtags; titlecase all non-initial four-letter subtags; + lowercase everything else. + + Note: Case folding of ASCII letters in certain locales, unless + carefully handled, sometimes produces non-ASCII character values. + The Unicode Character Database file "SpecialCasing.txt" defines the + specific cases that are known to cause problems with this. In + particular, the letter 'i' (U+0069) in Turkish and Azerbaijani is + uppercased to U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE). + Implementers SHOULD specify a locale-neutral casing operation to + ensure that case folding of subtags does not produce this value, + which is illegal in language tags. For example, if one were to + uppercase the region subtag 'in' using Turkish locale rules, the + sequence U+0130 U+004E would result instead of the expected 'IN'. + + Note: if the field 'Deprecated' appears in a registry record without + an accompanying 'Preferred-Value' field, then that tag or subtag is + deprecated without a replacement. Validating processors SHOULD NOT + generate tags that include these values, although the values are + canonical when they appear in a language tag. + + An extension MUST define any relationships that exist between the + various subtags in the extension and thus MAY define an alternate + canonicalization scheme for the extension's subtags. Extensions MAY + define how the order of the extension's subtags are interpreted. For + example, an extension could define that its subtags are in canonical + order when the subtags are placed into ASCII order: that is, + "en-a-aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension + might define that the order of the subtags influences their semantic + meaning (so that "en-b-ccc-bbb-aaa" has a different value from + "en-b-aaa-bbb-ccc"). However, extension specifications SHOULD be + designed so that they are tolerant of the typical processes described + in Section 3.7. + +4.5. Considerations for Private Use Subtags + + Private use subtags, like all other subtags, MUST conform to the + format and content constraints in the ABNF. Private use subtags have + no meaning outside the private agreement between the parties that + intend to use or exchange language tags that employ them. The same + subtags MAY be used with a different meaning under a separate private + agreement. They SHOULD NOT be used where alternatives exist and + SHOULD NOT be used in content or protocols intended for general use. + + + + +Phillips & Davis Best Current Practice [Page 45] + +RFC 4646 Tags for Identifying Languages September 2006 + + + Private use subtags are simply useless for information exchange + without prior arrangement. The value and semantic meaning of private + use tags and of the subtags used within such a language tag are not + defined by this document. + + Subtags defined in the IANA registry as having a specific private use + meaning convey more information that a purely private use tag + prefixed by the singleton subtag 'x'. For applications, this + additional information MAY be useful. + + For example, the region subtags 'AA', 'ZZ', and in the ranges + 'QM'-'QZ' and 'XA'-'XZ' (derived from ISO 3166 private use codes) MAY + be used to form a language tag. A tag such as "zh-Hans-XQ" conveys a + great deal of public, interchangeable information about the language + material (that it is Chinese in the simplified Chinese script and is + suitable for some geographic region 'XQ'). While the precise + geographic region is not known outside of private agreement, the tag + conveys far more information than an opaque tag such as "x-someLang", + which contains no information about the language subtag or script + subtag outside of the private agreement. + + However, in some cases content tagged with private use subtags MAY + interact with other systems in a different and possibly unsuitable + manner compared to tags that use opaque, privately defined subtags, + so the choice of the best approach sometimes depends on the + particular domain in question. + +5. IANA Considerations + + This section deals with the processes and requirements necessary for + IANA to undertake to maintain the subtag and extension registries as + defined by this document and in accordance with the requirements of + [RFC2434]. + + The impact on the IANA maintainers of the two registries defined by + this document will be a small increase in the frequency of new + entries or updates. + +5.1. Language Subtag Registry + + Upon adoption of this document, the registry will be initialized by a + companion document: [RFC4645]. The criteria and process for + selecting the initial set of records are described in that document. + The initial set of records represents no impact on IANA, since the + work to create it will be performed externally. + + + + + + +Phillips & Davis Best Current Practice [Page 46] + +RFC 4646 Tags for Identifying Languages September 2006 + + + The new registry MUST be listed under "Language Tags" at + <http://www.iana.org/numbers.html>, replacing the existing + registrations defined by [RFC3066]. The existing set of registration + forms and RFC 3066 registrations MUST be relabeled as "Language Tags + (Obsolete)" and maintained (but not added to or modified). + + Future work on the Language Subtag Registry SHALL be limited to + inserting or replacing whole records preformatted for IANA by the + Language Subtag Reviewer as described in Section 3.3 of this document + and archiving the forwarded registration form. + + Each record MUST be sent to iana@iana.org with a subject line + indicating whether the enclosed record is an insertion of a new + record (indicated by the word "INSERT" in the subject line) or a + replacement of an existing record (indicated by the word "MODIFY" in + the subject line). Records MUST NOT be deleted from the registry. + IANA MUST place any inserted or modified records into the appropriate + section of the language subtag registry, grouping the records by + their 'Type' field. Inserted records MAY be placed anywhere in the + appropriate section; there is no guarantee of the order of the + records beyond grouping them together by 'Type'. Modified records + MUST overwrite the record they replace. + + Included in any request to insert or modify records MUST be a new + File-Date record. This record MUST be placed first in the registry. + In the event that the File-Date record present in the registry has a + later date than the record being inserted or modified, the existing + record MUST be preserved. + +5.2. Extensions Registry + + The Language Tag Extensions Registry will also be generated and sent + to IANA as described in Section 3.7. This registry can contain at + most 35 records, and thus changes to this registry are expected to be + very infrequent. + + Future work by IANA on the Language Tag Extensions Registry is + limited to two cases. First, the IESG MAY request that new records + be inserted into this registry from time to time. These requests + MUST include the record to insert in the exact format described in + Section 3.7. In addition, there MAY be occasional requests from the + maintaining authority for a specific extension to update the contact + information or URLs in the record. These requests MUST include the + complete, updated record. IANA is not responsible for validating the + information provided, only that it is properly formatted. It should + reasonably be seen to come from the maintaining authority named in + the record present in the registry. + + + + +Phillips & Davis Best Current Practice [Page 47] + +RFC 4646 Tags for Identifying Languages September 2006 + + +6. Security Considerations + + Language tags used in content negotiation, like any other information + exchanged on the Internet, might be a source of concern because they + might be used to infer the nationality of the sender, and thus + identify potential targets for surveillance. + + This is a special case of the general problem that anything sent is + visible to the receiving party and possibly to third parties as well. + It is useful to be aware that such concerns can exist in some cases. + + The evaluation of the exact magnitude of the threat, and any possible + countermeasures, is left to each application protocol (see BCP 72 + [RFC3552] for best current practice guidance on security threats and + defenses). + + The language tag associated with a particular information item is of + no consequence whatsoever in determining whether that content might + contain possible homographs. The fact that a text is tagged as being + in one language or using a particular script subtag provides no + assurance whatsoever that it does not contain characters from scripts + other than the one(s) associated with or specified by that language + tag. + + Since there is no limit to the number of variant, private use, and + extension subtags, and consequently no limit on the possible length + of a tag, implementations need to guard against buffer overflow + attacks. See Section 4.3 for details on language tag truncation, + which can occur as a consequence of defenses against buffer overflow. + + Although the specification of valid subtags for an extension (see + Section 3.7) MUST be available over the Internet, implementations + SHOULD NOT mechanically depend on it being always accessible, to + prevent denial-of-service attacks. + +7. Character Set Considerations + + The syntax in this document requires that language tags use only the + characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most + character sets, so the composition of language tags should not have + any character set issues. + + Rendering of characters based on the content of a language tag is not + addressed in this memo. Historically, some languages have relied on + the use of specific character sets or other information in order to + infer how a specific character should be rendered (notably this + applies to language- and culture-specific variations of Han + ideographs as used in Japanese, Chinese, and Korean). When language + + + +Phillips & Davis Best Current Practice [Page 48] + +RFC 4646 Tags for Identifying Languages September 2006 + + + tags are applied to spans of text, rendering engines sometimes use + that information in deciding which font to use in the absence of + other information, particularly where languages with distinct writing + traditions use the same characters. + +8. Changes from RFC 3066 + + The main goals for this revision of language tags were the following: + + *Compatibility.* All RFC 3066 language tags (including those in the + IANA registry) remain valid in this specification. The changes in + this document represent additional constraints on language tags. + That is, in no case is the syntax more permissive and processors + based on the ABNF and other provisions of RFC 3066 (such as those + described in [XMLSchema]) will be able to process the tags described + by this document. In addition, this document defines language tags + in such as way as to ensure future compatibility. + + *Stability.* Because of changes in the past in the underlying ISO + standards, a valid RFC 3066 language tag could become invalid or have + its meaning change. This has the potential of invalidating content + that may have an extensive shelf-life. In this specification, once a + language tag is valid, it remains valid forever. + + *Validity.* The structure of language tags defined by this document + makes it possible to determine if a particular tag is well-formed + without regard for the actual content or "meaning" of the tag as a + whole. This is important because the registry grows and underlying + standards change over time. In addition, it must be possible to + determine if a tag is valid (or not) for a given point in time in + order to provide reproducible, testable results. This process must + not be error-prone; otherwise implementations might give different + results. By having an authoritative registry with specific + versioning information, the validity of language tags at any point in + time can be precisely determined (instead of interpolating values + from many separate sources). + + *Utility.* It is sometimes important to be able to differentiate + between written forms of a language -- for many implementations this + is more important than distinguishing between the spoken variants of + a language. Languages are written in a wide variety of different + scripts, so this document provides for the generative use of ISO + 15924 script codes. Like the generative use of ISO language and + country codes in RFC 3066, this allows combinations to be produced + without resorting to the registration process. The addition of UN + M.49 codes provides for the generation of language tags with regional + scope, which is also required by some applications. + + + + +Phillips & Davis Best Current Practice [Page 49] + +RFC 4646 Tags for Identifying Languages September 2006 + + + The recast of the registry from containing whole language tags to + subtags is a key part of this. An important feature of RFC 3066 was + that it allowed generative use of subtags. This allows people to + meaningfully use generated tags, without the delays in registering + whole tags or the need to register all of the combinations that might + be useful. + + The choice of placing the extended language and script subtags + between the primary language and region subtags was widely debated. + This design was chosen because the prevalent matching and content + negotiation schemes rely on the subtags being arranged in order of + increasing specificity. That is, the subtags that mark a greater + barrier to mutual intelligibility appear left-most in a tag. For + example, when selecting content written in Azerbaijani, the script + (Arabic, Cyrillic, or Latin) represents a greater barrier to + understanding than any regional variations (those associated with + Azerbaijan or Iran, for example). Individuals who prefer documents + in a particular script, but can deal with the minor regional + differences, can therefore select appropriate content. Applications + that do not deal with written content will continue to omit these + subtags. + + *Extensibility.* Because of the widespread use of language tags, it + is disruptive to have periodic revisions of the core specification, + even in the face of demonstrated need. The extension mechanism + provides for a way for independent RFCs to define extensions to + language tags. These extensions have a very constrained, well- + defined structure that prevents extensions from interfering with + implementations of language tags defined in this document. + + The document also anticipates features of ISO 639-3 with the addition + of the extended language subtags, as well as the possibility of other + ISO 639 parts becoming useful for the formation of language tags in + the future. + + The use and definition of private use tags have also been modified, + to allow people to use private use subtags to extend or modify + defined tags and to move as much information as possible out of + private use and into the regular structure. + + The goal for each of these modifications is to reduce or eliminate + the need for future revisions of this document. + + + + + + + + + +Phillips & Davis Best Current Practice [Page 50] + +RFC 4646 Tags for Identifying Languages September 2006 + + + The specific changes in this document to meet these goals are: + + o Defines the ABNF and rules for subtags so that the category of all + subtags can be determined without reference to the registry. + + o Adds the concept of well-formed vs. validating processors, + defining the rules by which an implementation can claim to be one + or the other. + + o Replaces the IANA language tag registry with a language subtag + registry that provides a complete list of valid subtags in the + IANA registry. This allows for robust implementation and ease of + maintenance. The language subtag registry becomes the canonical + source for forming language tags. + + o Provides a process that guarantees stability of language tags, by + handling reuse of values by ISO 639, ISO 15924, and ISO 3166 in + the event that they register a previously used value for a new + purpose. + + o Allows ISO 15924 script code subtags and allows them to be used + generatively. Defines a method for indicating in the registry + when script subtags are necessary for a given language tag. + + o Adds the concept of a variant subtag and allows variants to be + used generatively. + + o Adds the ability to use a class of UN M.49 tags for supra-national + regions and to resolve conflicts in the assignment of ISO 3166 + codes. + + o Defines the private use tags in ISO 639, ISO 15924, and ISO 3166 + as the mechanism for creating private use language, script, and + region subtags, respectively. + + o Adds a well-defined extension mechanism. + + o Defines an extended language subtag, possibly for use with certain + anticipated features of ISO 639-3. + + + + + + + + + + + + +Phillips & Davis Best Current Practice [Page 51] + +RFC 4646 Tags for Identifying Languages September 2006 + + +9. References + +9.1. Normative References + + [ISO10646] International Organization for Standardization, + "ISO/IEC 10646:2003. Information technology -- + Universal Multiple-Octet Coded Character Set (UCS)", + 2003. + + [ISO15924] International Organization for Standardization, "ISO + 15924:2004. Information and documentation -- Codes for + the representation of names of scripts", January 2004. + + [ISO3166-1] International Organization for Standardization, "ISO + 3166-1:1997. Codes for the representation of names of + countries and their subdivisions -- Part 1: Country + codes", 1997. + + [ISO639-1] International Organization for Standardization, "ISO + 639-1:2002. Codes for the representation of names of + languages -- Part 1: Alpha-2 code", 2002. + + [ISO639-2] International Organization for Standardization, "ISO + 639-2:1998. Codes for the representation of names of + languages -- Part 2: Alpha-3 code, first edition", + 1998. + + [ISO646] International Organization for Standardization, + "ISO/IEC 646:1991, Information technology -- ISO 7-bit + coded character set for information interchange.", + 1991. + + [RFC2026] Bradner, S., "The Internet Standards Process -- + Revision 3", BCP 9, RFC 2026, October 1996. + + [RFC2028] Hovey, R. and S. Bradner, "The Organizations Involved + in the IETF Standards Process", BCP 11, RFC 2028, + October 1996. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing + an IANA Considerations Section in RFCs", BCP 26, + RFC 2434, October 1998. + + + + + + +Phillips & Davis Best Current Practice [Page 52] + +RFC 4646 Tags for Identifying Languages September 2006 + + + [RFC2860] Carpenter, B., Baker, F., and M. Roberts, "Memorandum + of Understanding Concerning the Technical Work of the + Internet Assigned Numbers Authority", RFC 2860, + June 2000. + + [RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the + Internet: Timestamps", RFC 3339, July 2002. + + [RFC4234] Crocker, D., Ed. and P. Overell, "Augmented BNF for + Syntax Specifications: ABNF", RFC 4234, October 2005. + + [UN_M.49] Statistics Division, United Nations, "Standard Country + or Area Codes for Statistical Use", UN Standard + Country or Area Codes for Statistical Use, Revision 4 + (United Nations publication, Sales No. 98.XVII.9, + June 1999. + +9.2. Informative References + + [RFC1766] Alvestrand, H., "Tags for the Identification of + Languages", RFC 1766, March 1995. + + [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail + Extensions) Part Three: Message Header Extensions for + Non-ASCII Text", RFC 2047, November 1996. + + [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and + Encoded Word Extensions: Character Sets, Languages, + and Continuations", RFC 2231, November 1997. + + [RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of + ISO 10646", RFC 2781, February 2000. + + [RFC3066] Alvestrand, H., "Tags for the Identification of + Languages", BCP 47, RFC 3066, January 2001. + + [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing + RFC Text on Security Considerations", BCP 72, + RFC 3552, July 2003. + + [RFC4645] Ewell, D., Ed., "Initial Language Subtag Registry", + RFC 4645, September 2006. + + [RFC4647] Phillips, A., Ed. and M. Davis, Ed., "Matching of + Language Tags", BCP 47, RFC 4647, September 2006. + + + + + + +Phillips & Davis Best Current Practice [Page 53] + +RFC 4646 Tags for Identifying Languages September 2006 + + + [Unicode] Unicode Consortium, "The Unicode Standard, Version + 5.0", Boston, MA, Addison-Wesley, 2007. ISBN 0-321- + 48091-0. + + [XML10] Bray (et al), T., "Extensible Markup Language (XML) + 1.0", 02 2004. + + [XMLSchema] Biron, P., Ed. and A. Malhotra, Ed., "XML Schema Part + 2: Datatypes Second Edition", 10 2004, < + http://www.w3.org/TR/xmlschema-2/>. + + [iso639.prin] ISO 639 Joint Advisory Committee, "ISO 639 Joint + Advisory Committee: Working principles for ISO 639 + maintenance", March 2000, <http://www.loc.gov/ + standards/iso639-2/iso639jac_n3r.html>. + + [record-jar] Raymond, E., "The Art of Unix Programming", 2003, + <urn:isbn:0-13-142901-9>. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Phillips & Davis Best Current Practice [Page 54] + +RFC 4646 Tags for Identifying Languages September 2006 + + +Appendix A. Acknowledgements + + Any list of contributors is bound to be incomplete; please regard the + following as only a selection from the group of people who have + contributed to make this document what it is today. + + The contributors to RFC 3066 and RFC 1766, the precursors of this + document, made enormous contributions directly or indirectly to this + document and are generally responsible for the success of language + tags. + + The following people (in alphabetical order) contributed to this + document or to RFCs 1766 and 3066: + + Glenn Adams, Harald Tveit Alvestrand, Tim Berners-Lee, Marc Blanchet, + Nathaniel Borenstein, Karen Broome, Eric Brunner, Sean M. Burke, M.T. + Carrasco Benitez, Jeremy Carroll, John Clews, Jim Conklin, Peter + Constable, John Cowan, Mark Crispin, Dave Crocker, Elwyn Davies, + Martin Duerst, Frank Ellerman, Michael Everson, Doug Ewell, Ned + Freed, Tim Goodwin, Dirk-Willem van Gulik, Marion Gunn, Joel Halpren, + Elliotte Rusty Harold, Paul Hoffman, Scott Hollenbeck, Richard + Ishida, Olle Jarnefors, Kent Karlsson, John Klensin, Erkki + Kolehmainen, Alain LaBonte, Eric Mader, Ira McDonald, Keith Moore, + Chris Newman, Masataka Ohta, Dylan Pierce, Randy Presuhn, George + Rhoten, Felix Sasaki, Markus Scherer, Keld Jorn Simonsen, Thierry + Sourbier, Otto Stolz, Tex Texin, Andrea Vine, Rhys Weatherley, Misha + Wolf, Francois Yergeau and many, many others. + + Very special thanks must go to Harald Tveit Alvestrand, who + originated RFCs 1766 and 3066, and without whom this document would + not have been possible. Special thanks must go to Michael Everson, + who has served as Language Tag Reviewer for almost the complete + period since the publication of RFC 1766. Special thanks to Doug + Ewell, for his production of the first complete subtag registry, and + his work in producing a test parser for verifying language tags. + + + + + + + + + + + + + + + + +Phillips & Davis Best Current Practice [Page 55] + +RFC 4646 Tags for Identifying Languages September 2006 + + +Appendix B. Examples of Language Tags (Informative) + + Simple language subtag: + + de (German) + + fr (French) + + ja (Japanese) + + i-enochian (example of a grandfathered tag) + + Language subtag plus Script subtag: + + zh-Hant (Chinese written using the Traditional Chinese script) + + zh-Hans (Chinese written using the Simplified Chinese script) + + sr-Cyrl (Serbian written using the Cyrillic script) + + sr-Latn (Serbian written using the Latin script) + + Language-Script-Region: + + zh-Hans-CN (Chinese written using the Simplified script as used in + mainland China) + + sr-Latn-CS (Serbian written using the Latin script as used in + Serbia and Montenegro) + + Language-Variant: + + sl-rozaj (Resian dialect of Slovenian + + sl-nedis (Nadiza dialect of Slovenian) + + Language-Region-Variant: + + de-CH-1901 (German as used in Switzerland using the 1901 variant + [orthography]) + + sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect) + + + + + + + + + +Phillips & Davis Best Current Practice [Page 56] + +RFC 4646 Tags for Identifying Languages September 2006 + + + Language-Script-Region-Variant: + + sl-Latn-IT-nedis (Nadiza dialect of Slovenian written using the + Latin script as used in Italy. Note that this tag is NOT + RECOMMENDED because subtag 'sl' has a Suppress-Script value of + 'Latn') + + Language-Region: + + de-DE (German for Germany) + + en-US (English as used in the United States) + + es-419 (Spanish appropriate for the Latin America and Caribbean + region using the UN region code) + + Private use subtags: + + de-CH-x-phonebk + + az-Arab-x-AZE-derbend + + Extended language subtags (examples ONLY: extended languages MUST be + defined by revision or update to this document): + + zh-min + + zh-min-nan-Hant-CN + + Private use registry values: + + x-whatever (private use using the singleton 'x') + + qaa-Qaaa-QM-x-southern (all private tags) + + de-Qaaa (German, with a private script) + + sr-Latn-QM (Serbian, Latin-script, private region) + + sr-Qaaa-CS (Serbian, private script, for Serbia and Montenegro) + + Tags that use extensions (examples ONLY: extensions MUST be defined + by revision or update to this document or by RFC): + + en-US-u-islamCal + + zh-CN-a-myExt-x-private + + + + +Phillips & Davis Best Current Practice [Page 57] + +RFC 4646 Tags for Identifying Languages September 2006 + + + en-a-myExt-b-another + + Some Invalid Tags: + + de-419-DE (two region tags) + + a-DE (use of a single-character subtag in primary position; note + that there are a few grandfathered tags that start with "i-" that + are valid) + + ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter + prefix) + +Authors' Addresses + + Addison Phillips (Editor) + Yahoo! Inc. + + EMail: addison@inter-locale.com + + + Mark Davis (Editor) + Google + + EMail: mark.davis@macchiato.com or mark.davis@google.com + + + + + + + + + + + + + + + + + + + + + + + + + + +Phillips & Davis Best Current Practice [Page 58] + +RFC 4646 Tags for Identifying Languages September 2006 + + +Full Copyright Statement + + Copyright (C) The Internet Society (2006). + + This document is subject to the rights, licenses and restrictions + contained in BCP 78, and except as set forth therein, the authors + retain all their rights. + + This document and the information contained herein are provided on an + "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS + OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET + ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, + INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE + INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED + WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Intellectual Property + + The IETF takes no position regarding the validity or scope of any + Intellectual Property Rights or other rights that might be claimed to + pertain to the implementation or use of the technology described in + this document or the extent to which any license under such rights + might or might not be available; nor does it represent that it has + made any independent effort to identify any such rights. Information + on the procedures with respect to rights in RFC documents can be + found in BCP 78 and BCP 79. + + Copies of IPR disclosures made to the IETF Secretariat and any + assurances of licenses to be made available, or the result of an + attempt made to obtain a general license or permission for the use of + such proprietary rights by implementers or users of this + specification can be obtained from the IETF on-line IPR repository at + http://www.ietf.org/ipr. + + The IETF invites any interested party to bring to its attention any + copyrights, patents or patent applications, or other proprietary + rights that may cover technology that may be required to implement + this standard. Please address the information to the IETF at + ietf-ipr@ietf.org. + +Acknowledgement + + Funding for the RFC Editor function is provided by the IETF + Administrative Support Activity (IASA). + + + + + + + +Phillips & Davis Best Current Practice [Page 59] + |