From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc5646.txt | 4707 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 4707 insertions(+) create mode 100644 doc/rfc/rfc5646.txt (limited to 'doc/rfc/rfc5646.txt') diff --git a/doc/rfc/rfc5646.txt b/doc/rfc/rfc5646.txt new file mode 100644 index 0000000..327b832 --- /dev/null +++ b/doc/rfc/rfc5646.txt @@ -0,0 +1,4707 @@ + + + + + + +Network Working Group A. Phillips, Ed. +Request for Comments: 5646 Lab126 +BCP: 47 M. Davis, Ed. +Obsoletes: 4646 Google +Category: Best Current Practice September 2009 + + + Tags for Identifying Languages + +Abstract + + This document describes the structure, content, construction, and + semantics of language tags for use in cases where it is desirable to + indicate the language used in an information object. It also + describes how to register values for use in language tags and the + creation of user-defined extensions for private interchange. + +Status of This Memo + + This document specifies an Internet Best Current Practices for the + Internet Community, and requests discussion and suggestions for + improvements. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (c) 2009 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents in effect on the date of + publication of this document (http://trustee.ietf.org/license-info). + Please review these documents carefully, as they describe your rights + and restrictions with respect to this document. + + This document may contain material from IETF Documents or IETF + Contributions published or made publicly available before November + 10, 2008. 
The person(s) controlling the copyright in some of this + material may not have granted the IETF Trust the right to allow + modifications of such material outside the IETF Standards Process. + Without obtaining an adequate license from the person(s) controlling + the copyright in such materials, this document may not be modified + outside the IETF Standards Process, and derivative works of it may + not be created outside the IETF Standards Process, except to format + it for publication as an RFC or to translate it into languages other + than English. + + + + + + +Phillips & Davis Best Current Practice [Page 1] + +RFC 5646 Language Tags September 2009 + + +Table of Contents + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 + 2. The Language Tag . . . . . . . . . . . . . . . . . . . . . . . 4 + 2.1. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 4 + 2.1.1. Formatting of Language Tags . . . . . . . . . . . . . 6 + 2.2. Language Subtag Sources and Interpretation . . . . . . . . 8 + 2.2.1. Primary Language Subtag . . . . . . . . . . . . . . . . 9 + 2.2.2. Extended Language Subtags . . . . . . . . . . . . . . 11 + 2.2.3. Script Subtag . . . . . . . . . . . . . . . . . . . . 12 + 2.2.4. Region Subtag . . . . . . . . . . . . . . . . . . . . 13 + 2.2.5. Variant Subtags . . . . . . . . . . . . . . . . . . . 15 + 2.2.6. Extension Subtags . . . . . . . . . . . . . . . . . . 16 + 2.2.7. Private Use Subtags . . . . . . . . . . . . . . . . . 18 + 2.2.8. Grandfathered and Redundant Registrations . . . . . . 18 + 2.2.9. Classes of Conformance . . . . . . . . . . . . . . . . 19 + 3. Registry Format and Maintenance . . . . . . . . . . . . . . . 21 + 3.1. Format of the IANA Language Subtag Registry . . . . . . . 21 + 3.1.1. File Format . . . . . . . . . . . . . . . . . . . . . 21 + 3.1.2. Record and Field Definitions . . . . . . . . . . . . . 23 + 3.1.3. Type Field . . . . . . . . . . . . . . . . . . . . . . 26 + 3.1.4. Subtag and Tag Fields . . . . . 
. . . . . . . . . . . 26 + 3.1.5. Description Field . . . . . . . . . . . . . . . . . . 26 + 3.1.6. Deprecated Field . . . . . . . . . . . . . . . . . . . 28 + 3.1.7. Preferred-Value Field . . . . . . . . . . . . . . . . 28 + 3.1.8. Prefix Field . . . . . . . . . . . . . . . . . . . . . 31 + 3.1.9. Suppress-Script Field . . . . . . . . . . . . . . . . 32 + 3.1.10. Macrolanguage Field . . . . . . . . . . . . . . . . . 32 + 3.1.11. Scope Field . . . . . . . . . . . . . . . . . . . . . 33 + 3.1.12. Comments Field . . . . . . . . . . . . . . . . . . . . 34 + 3.2. Language Subtag Reviewer . . . . . . . . . . . . . . . . . 35 + 3.3. Maintenance of the Registry . . . . . . . . . . . . . . . 35 + 3.4. Stability of IANA Registry Entries . . . . . . . . . . . . 36 + 3.5. Registration Procedure for Subtags . . . . . . . . . . . . 41 + 3.6. Possibilities for Registration . . . . . . . . . . . . . . 46 + 3.7. Extensions and the Extensions Registry . . . . . . . . . . 49 + 3.8. Update of the Language Subtag Registry . . . . . . . . . . 52 + 3.9. Applicability of the Subtag Registry . . . . . . . . . . . 52 + 4. Formation and Processing of Language Tags . . . . . . . . . . 53 + 4.1. Choice of Language Tag . . . . . . . . . . . . . . . . . . 53 + 4.1.1. Tagging Encompassed Languages . . . . . . . . . . . . 58 + 4.1.2. Using Extended Language Subtags . . . . . . . . . . . 59 + 4.2. Meaning of the Language Tag . . . . . . . . . . . . . . . 61 + 4.3. Lists of Languages . . . . . . . . . . . . . . . . . . . . 63 + 4.4. Length Considerations . . . . . . . . . . . . . . . . . . 63 + 4.4.1. Working with Limited Buffer Sizes . . . . . . . . . . 64 + 4.4.2. Truncation of Language Tags . . . . . . . . . . . . . 65 + 4.5. Canonicalization of Language Tags . . . . . . . . . . . . 66 + + + +Phillips & Davis Best Current Practice [Page 2] + +RFC 5646 Language Tags September 2009 + + + 4.6. Considerations for Private Use Subtags . . . . . . . . . . 68 + 5. IANA Considerations . . . . . . . . . . 
. . . . . . . . . . . 69 + 5.1. Language Subtag Registry . . . . . . . . . . . . . . . . . 69 + 5.2. Extensions Registry . . . . . . . . . . . . . . . . . . . 71 + 6. Security Considerations . . . . . . . . . . . . . . . . . . . 71 + 7. Character Set Considerations . . . . . . . . . . . . . . . . . 72 + 8. Changes from RFC 4646 . . . . . . . . . . . . . . . . . . . . 73 + 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 76 + 9.1. Normative References . . . . . . . . . . . . . . . . . . . 76 + 9.2. Informative References . . . . . . . . . . . . . . . . . . 78 + Appendix A. Examples of Language Tags (Informative) . . . . . . . 80 + Appendix B. Examples of Registration Forms . . . . . . . . . . . 82 + Appendix C. Acknowledgements . . . . . . . . . . . . . . . . . . 83 + +1. Introduction + + Human beings on our planet have, past and present, used a number of + languages. There are many reasons why one would want to identify the + language used when presenting or requesting information. + + The language of an information item or a user's language preferences + often need to be identified so that appropriate processing can be + applied. For example, the user's language preferences in a Web + browser can be used to select Web pages appropriately. Language + information can also be used to select among tools (such as + dictionaries) to assist in the processing or understanding of content + in different languages. Knowledge about the particular language used + by some piece of information content might be useful or even required + by some types of processing, for example, spell-checking, computer- + synthesized speech, Braille transcription, or high-quality print + renderings. + + One means of indicating the language used is by labeling the + information content with an identifier or "tag". 
These tags can also + be used to specify the user's preferences when selecting information + content or to label additional attributes of content and associated + resources. + + Sometimes language tags are used to indicate additional language + attributes of content. For example, indicating specific information + about the dialect, writing system, or orthography used in a document + or resource may enable the user to obtain information in a form that + they can understand, or it can be important in processing or + rendering the given content into an appropriate form or style. + + This document specifies a particular identifier mechanism (the + language tag) and a registration function for values to be used to + + + + +Phillips & Davis Best Current Practice [Page 3] + +RFC 5646 Language Tags September 2009 + + + form tags. It also defines a mechanism for private use values and + future extensions. + + This document replaces [RFC4646] (which obsoleted [RFC3066] which, in + turn, replaced [RFC1766]). This document, in combination with + [RFC4647], comprises BCP 47. For a list of changes in this document, + see Section 8. + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in [RFC2119]. + +2. The Language Tag + + Language tags are used to help identify languages, whether spoken, + written, signed, or otherwise signaled, for the purpose of + communication. This includes constructed and artificial languages + but excludes languages not intended primarily for human + communication, such as programming languages. + +2.1. Syntax + + A language tag is composed from a sequence of one or more "subtags", + each of which refines or narrows the range of language identified by + the overall tag. 
Subtags, in turn, are a sequence of alphanumeric + characters (letters and digits), distinguished and separated from + other subtags in a tag by a hyphen ("-", [Unicode] U+002D). + + There are different types of subtag, each of which is distinguished + by length, position in the tag, and content: each subtag's type can + be recognized solely by these features. This makes it possible to + extract and assign some semantic information to the subtags, even if + the specific subtag values are not recognized. Thus, a language tag + processor need not have a list of valid tags or subtags (that is, a + copy of some version of the IANA Language Subtag Registry) in order + to perform common searching and matching operations. The only + exceptions to this ability to infer meaning from subtag structure are + the grandfathered tags listed in the productions 'regular' and + 'irregular' below. These tags were registered under [RFC3066] and + are a fixed list that can never change. + + The syntax of the language tag in ABNF [RFC5234] is: + + Language-Tag = langtag ; normal language tags + / privateuse ; private use tag + / grandfathered ; grandfathered tags + + + + + +Phillips & Davis Best Current Practice [Page 4] + +RFC 5646 Language Tags September 2009 + + + langtag = language + ["-" script] + ["-" region] + *("-" variant) + *("-" extension) + ["-" privateuse] + + language = 2*3ALPHA ; shortest ISO 639 code + ["-" extlang] ; sometimes followed by + ; extended language subtags + / 4ALPHA ; or reserved for future use + / 5*8ALPHA ; or registered language subtag + + extlang = 3ALPHA ; selected ISO 639 codes + *2("-" 3ALPHA) ; permanently reserved + + script = 4ALPHA ; ISO 15924 code + + region = 2ALPHA ; ISO 3166-1 code + / 3DIGIT ; UN M.49 code + + variant = 5*8alphanum ; registered variants + / (DIGIT 3alphanum) + + extension = singleton 1*("-" (2*8alphanum)) + + ; Single alphanumerics + ; "x" reserved for private use + singleton = DIGIT ; 0 - 9 + / %x41-57 ; A - W + / %x59-5A ; 
Y - Z + / %x61-77 ; a - w + / %x79-7A ; y - z + + privateuse = "x" 1*("-" (1*8alphanum)) + + grandfathered = irregular ; non-redundant tags registered + / regular ; during the RFC 3066 era + + irregular = "en-GB-oed" ; irregular tags do not match + / "i-ami" ; the 'langtag' production and + / "i-bnn" ; would not otherwise be + / "i-default" ; considered 'well-formed' + / "i-enochian" ; These tags are all valid, + / "i-hak" ; but most are deprecated + / "i-klingon" ; in favor of more modern + / "i-lux" ; subtags or subtag + / "i-mingo" ; combination + + + +Phillips & Davis Best Current Practice [Page 5] + +RFC 5646 Language Tags September 2009 + + + / "i-navajo" + / "i-pwn" + / "i-tao" + / "i-tay" + / "i-tsu" + / "sgn-BE-FR" + / "sgn-BE-NL" + / "sgn-CH-DE" + + regular = "art-lojban" ; these tags match the 'langtag' + / "cel-gaulish" ; production, but their subtags + / "no-bok" ; are not extended language + / "no-nyn" ; or variant subtags: their meaning + / "zh-guoyu" ; is defined by their registration + / "zh-hakka" ; and all of these are deprecated + / "zh-min" ; in favor of a more modern + / "zh-min-nan" ; subtag or sequence of subtags + / "zh-xiang" + + alphanum = (ALPHA / DIGIT) ; letters and numbers + + Figure 1: Language Tag ABNF + + For examples of language tags, see Appendix A. + + All subtags have a maximum length of eight characters. Whitespace is + not permitted in a language tag. There is a subtlety in the ABNF + production 'variant': a variant starting with a digit has a minimum + length of four characters, while those starting with a letter have a + minimum length of five characters. + + Although [RFC5234] refers to octets, the language tags described in + this document are sequences of characters from the US-ASCII [ISO646] + repertoire. Language tags MAY be used in documents and applications + that use other encodings, so long as these encompass the relevant + part of the US-ASCII repertoire. 
An example of this would be an XML + document that uses the UTF-16LE [RFC2781] encoding of [Unicode]. + +2.1.1. Formatting of Language Tags + + At all times, language tags and their subtags, including private use + and extensions, are to be treated as case insensitive: there exist + conventions for the capitalization of some of the subtags, but these + MUST NOT be taken to carry meaning. + + Thus, the tag "mn-Cyrl-MN" is not distinct from "MN-cYRL-mn" or "mN- + cYrL-Mn" (or any other combination), and each of these variations + + + + +Phillips & Davis Best Current Practice [Page 6] + +RFC 5646 Language Tags September 2009 + + + conveys the same meaning: Mongolian written in the Cyrillic script as + used in Mongolia. + + The ABNF syntax also does not distinguish between upper- and + lowercase: the uppercase US-ASCII letters in the range 'A' through + 'Z' are always considered equivalent and mapped directly to their US- + ASCII lowercase equivalents in the range 'a' through 'z'. So the tag + "I-AMI" is considered equivalent to that value "i-ami" in the + 'irregular' production. + + Although case distinctions do not carry meaning in language tags, + consistent formatting and presentation of language tags will aid + users. The format of subtags in the registry is RECOMMENDED as the + form to use in language tags. This format generally corresponds to + the common conventions for the various ISO standards from which the + subtags are derived. + + These conventions include: + + o [ISO639-1] recommends that language codes be written in lowercase + ('mn' Mongolian). + + o [ISO15924] recommends that script codes use lowercase with the + initial letter capitalized ('Cyrl' Cyrillic). + + o [ISO3166-1] recommends that country codes be capitalized ('MN' + Mongolia). + + An implementation can reproduce this format without accessing the + registry as follows. 
All subtags, including extension and private + use subtags, use lowercase letters with two exceptions: two-letter + and four-letter subtags that neither appear at the start of the tag + nor occur after singletons. Such two-letter subtags are all + uppercase (as in the tags "en-CA-x-ca" or "sgn-BE-FR") and four- + letter subtags are titlecase (as in the tag "az-Latn-x-latn"). + + Note: Case folding of ASCII letters in certain locales, unless + carefully handled, sometimes produces non-ASCII character values. + The Unicode Character Database file "SpecialCasing.txt" + [SpecialCasing] defines the specific cases that are known to cause + problems with this. In particular, the letter 'i' (U+0069) in + Turkish and Azerbaijani is uppercased to U+0130 (LATIN CAPITAL LETTER + I WITH DOT ABOVE). Implementers SHOULD specify a locale-neutral + casing operation to ensure that case folding of subtags does not + produce this value, which is illegal in language tags. For example, + if one were to uppercase the region subtag 'in' using Turkish locale + rules, the sequence U+0130 U+004E would result, instead of the + expected 'IN'. + + + +Phillips & Davis Best Current Practice [Page 7] + +RFC 5646 Language Tags September 2009 + + +2.2. Language Subtag Sources and Interpretation + + The namespace of language tags and their subtags is administered by + the Internet Assigned Numbers Authority (IANA) according to the rules + in Section 5 of this document. The Language Subtag Registry + maintained by IANA is the source for valid subtags: other standards + referenced in this section provide the source material for that + registry. + + Terminology used in this document: + + o "Tag" refers to a complete language tag, such as "sr-Latn-RS" or + "az-Arab-IR". Examples of tags in this document are enclosed in + double-quotes ("en-US"). + + o "Subtag" refers to a specific section of a tag, delimited by a + hyphen, such as the subtags 'zh', 'Hant', and 'CN' in the tag "zh- + Hant-CN". 
Examples of subtags in this document are enclosed in + single quotes ('Hant'). + + o "Code" refers to values defined in external standards (and that + are used as subtags in this document). For example, 'Hant' is an + [ISO15924] script code that was used to define the 'Hant' script + subtag for use in a language tag. Examples of codes in this + document are enclosed in single quotes ('en', 'Hant'). + + Language tags are designed so that each subtag type has unique length + and content restrictions. These make identification of the subtag's + type possible, even if the content of the subtag itself is + unrecognized. This allows tags to be parsed and processed without + reference to the latest version of the underlying standards or the + IANA registry and makes the associated exception handling when + parsing tags simpler. + + Some of the subtags in the IANA registry do not come from an + underlying standard. These can only appear in specific positions in + a tag: they can only occur as primary language subtags or as variant + subtags. + + Sequences of private use and extension subtags MUST occur at the end + of the sequence of subtags and MUST NOT be interspersed with subtags + defined elsewhere in this document. These sequences are introduced + by single-character subtags, which are reserved as follows: + + o The single-letter subtag 'x' introduces a sequence of private use + subtags. The interpretation of any private use subtag is defined + + + + + +Phillips & Davis Best Current Practice [Page 8] + +RFC 5646 Language Tags September 2009 + + + solely by private agreement and is not defined by the rules in + this section or in any standard or registry defined in this + document. + + o The single-letter subtag 'i' is used by some grandfathered tags, + such as "i-default", where it always appears in the first position + and cannot be confused with an extension. 
+ + o All other single-letter and single-digit subtags are reserved to + introduce standardized extension subtag sequences as described in + Section 3.7. + +2.2.1. Primary Language Subtag + + The primary language subtag is the first subtag in a language tag and + cannot be omitted, with two exceptions: + + o The single-character subtag 'x' as the primary subtag indicates + that the language tag consists solely of subtags whose meaning is + defined by private agreement. For example, in the tag "x-fr-CH", + the subtags 'fr' and 'CH' do not represent the French language or + the country of Switzerland (or any other value in the IANA + registry) unless there is a private agreement in place to do so. + See Section 4.6. + + o The single-character subtag 'i' is used by some grandfathered tags + (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other + grandfathered tags have a primary language subtag in their first + position.) + + The following rules apply to the primary language subtag: + + 1. Two-character primary language subtags were defined in the IANA + registry according to the assignments found in the standard "ISO + 639-1:2002, Codes for the representation of names of languages -- + Part 1: Alpha-2 code" [ISO639-1], or using assignments + subsequently made by the ISO 639-1 registration authority (RA) or + governing standardization bodies. + + 2. Three-character primary language subtags in the IANA registry + were defined according to the assignments found in one of these + additional ISO 639 parts or assignments subsequently made by the + relevant ISO 639 registration authorities or governing + standardization bodies: + + A. "ISO 639-2:1998 - Codes for the representation of names of + languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2] + + + + +Phillips & Davis Best Current Practice [Page 9] + +RFC 5646 Language Tags September 2009 + + + B. 
"ISO 639-3:2007 - Codes for the representation of names of + languages -- Part 3: Alpha-3 code for comprehensive coverage + of languages" [ISO639-3] + + C. "ISO 639-5:2008 - Codes for the representation of names of + languages -- Part 5: Alpha-3 code for language families and + groups" [ISO639-5] + + 3. The subtags in the range 'qaa' through 'qtz' are reserved for + private use in language tags. These subtags correspond to codes + reserved by ISO 639-2 for private use. These codes MAY be used + for non-registered primary language subtags (instead of using + private use subtags following 'x-'). Please refer to Section 4.6 + for more information on private use subtags. + + 4. Four-character language subtags are reserved for possible future + standardization. + + 5. Any language subtags of five to eight characters in length in the + IANA registry were defined via the registration process in + Section 3.5 and MAY be used to form the primary language subtag. + An example of what such a registration might include is the + grandfathered IANA registration "i-enochian". The subtag + 'enochian' could be registered in the IANA registry as a primary + language subtag (assuming that ISO 639 does not register this + language first), making tags such as "enochian-AQ" and "enochian- + Latn" valid. + + At the time this document was created, there were no examples of + this kind of subtag. Future registrations of this type are + discouraged: an attempt to register any new proposed primary + language MUST be made to the ISO 639 registration authority. + Proposals rejected by the ISO 639 registration authority are + unlikely to meet the criteria for primary language subtags and + are thus unlikely to be registered. + + 6. Other values MUST NOT be assigned to the primary subtag except by + revision or update of this document. 
+ + When languages have both an ISO 639-1 two-character code and a three- + character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5), only + the ISO 639-1 two-character code is defined in the IANA registry. + + When a language has no ISO 639-1 two-character code and the ISO + 639-2/T (Terminology) code and the ISO 639-2/B (Bibliographic) code + for that language differ, only the Terminology code is defined in the + IANA registry. At the time this document was created, all languages + that had both kinds of three-character codes were also assigned a + + + +Phillips & Davis Best Current Practice [Page 10] + +RFC 5646 Language Tags September 2009 + + + two-character code; it is expected that future assignments of this + nature will not occur. + + In order to avoid instability in the canonical form of tags, if a + two-character code is added to ISO 639-1 for a language for which a + three-character code was already included in either ISO 639-2 or ISO + 639-3, the two-character code MUST NOT be registered. See + Section 3.4. + + For example, if some content were tagged with 'haw' (Hawaiian), which + currently has no two-character code, the tag would not need to be + changed if ISO 639-1 were to assign a two-character code to the + Hawaiian language at a later date. + + To avoid these problems with versioning and subtag choice (as + experienced during the transition between RFC 1766 and RFC 3066), as + well as to ensure the canonical nature of subtags defined by this + document, the ISO 639 Registration Authority Joint Advisory Committee + (ISO 639/RA-JAC) has included the following statement in + [iso639.prin]: + + "A language code already in ISO 639-2 at the point of freezing ISO + 639-1 shall not later be added to ISO 639-1. This is to ensure + consistency in usage over time, since users are directed in + Internet applications to employ the alpha-3 code when an alpha-2 + code for that language is not available." + +2.2.2. 
Extended Language Subtags + + Extended language subtags are used to identify certain specially + selected languages that, for various historical and compatibility + reasons, are closely identified with or tagged using an existing + primary language subtag. Extended language subtags are always used + with their enclosing primary language subtag (indicated with a + 'Prefix' field in the registry) when used to form the language tag. + All languages that have an extended language subtag in the registry + also have an identical primary language subtag record in the + registry. This primary language subtag is RECOMMENDED for forming + the language tag. The following rules apply to the extended language + subtags: + + 1. Extended language subtags consist solely of three-letter subtags. + All extended language subtag records defined in the registry were + defined according to the assignments found in [ISO639-3]. + Language collections and groupings, such as defined in + [ISO639-5], are specifically excluded from being extended + language subtags. + + + + +Phillips & Davis Best Current Practice [Page 11] + +RFC 5646 Language Tags September 2009 + + + 2. Extended language subtag records MUST include exactly one + 'Prefix' field indicating an appropriate subtag or sequence of + subtags for that extended language subtag. + + 3. Extended language subtag records MUST include a 'Preferred- + Value'. The 'Preferred-Value' and 'Subtag' fields MUST be + identical. + + 4. Although the ABNF production 'extlang' permits up to three + extended language tags in the language tag, extended language + subtags MUST NOT include another extended language subtag in + their 'Prefix'. That is, the second and third extended language + subtag positions in a language tag are permanently reserved and + tags that include those subtags in that position are, and will + always remain, invalid. + + For example, the macrolanguage Chinese ('zh') encompasses a number of + languages. 
For compatibility reasons, each of these languages has + both a primary and extended language subtag in the registry. A few + selected examples of these include Gan Chinese ('gan'), Cantonese + Chinese ('yue'), and Mandarin Chinese ('cmn'). Each is encompassed + by the macrolanguage 'zh' (Chinese). Therefore, they each have the + prefix "zh" in their registry records. Thus, Gan Chinese is + represented with tags beginning "zh-gan" or "gan", Cantonese with + tags beginning either "yue" or "zh-yue", and Mandarin Chinese with + "zh-cmn" or "cmn". The language subtag 'zh' can still be used + without an extended language subtag to label a resource as some + unspecified variety of Chinese, while the primary language subtag + ('gan', 'yue', 'cmn') is preferred to using the extended language + form ("zh-gan", "zh-yue", "zh-cmn"). + +2.2.3. Script Subtag + + Script subtags are used to indicate the script or writing system + variations that distinguish the written forms of a language or its + dialects. The following rules apply to the script subtags: + + 1. Script subtags MUST follow any primary and extended language + subtags and MUST precede any other type of subtag. + + 2. Script subtags consist of four letters and were defined according + to the assignments found in [ISO15924] ("Information and + documentation -- Codes for the representation of names of + scripts"), or subsequently assigned by the ISO 15924 registration + authority or governing standardization bodies. Only codes + assigned by ISO 15924 will be considered for registration. + + + + + +Phillips & Davis Best Current Practice [Page 12] + +RFC 5646 Language Tags September 2009 + + + 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private + use in language tags. These subtags correspond to codes reserved + by ISO 15924 for private use. These codes MAY be used for non- + registered script values. Please refer to Section 4.6 for more + information on private use subtags. + + 4. 
There MUST be at most one script subtag in a language tag, and + the script subtag SHOULD be omitted when it adds no + distinguishing value to the tag or when the primary or extended + language subtag's record in the subtag registry includes a + 'Suppress-Script' field listing the applicable script subtag. + + For example: "sr-Latn" represents Serbian written using the Latin + script. + +2.2.4. Region Subtag + + Region subtags are used to indicate linguistic variations associated + with or appropriate to a specific country, territory, or region. + Typically, a region subtag is used to indicate variations such as + regional dialects or usage, or region-specific spelling conventions. + It can also be used to indicate that content is expressed in a way + that is appropriate for use throughout a region, for instance, + Spanish content tailored to be useful throughout Latin America. + + The following rules apply to the region subtags: + + 1. Region subtags MUST follow any primary language, extended + language, or script subtags and MUST precede any other type of + subtag. + + 2. Two-letter region subtags were defined according to the + assignments found in [ISO3166-1] ("Codes for the representation + of names of countries and their subdivisions -- Part 1: Country + codes"), using the list of alpha-2 country codes or using + assignments subsequently made by the ISO 3166-1 maintenance + agency or governing standardization bodies. In addition, the + codes that are "exceptionally reserved" (as opposed to + "assigned") in ISO 3166-1 were also defined in the registry, with + the exception of 'UK', which is an exact synonym for the assigned + code 'GB'. + + 3. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are + reserved for private use in language tags. These subtags + correspond to codes reserved by ISO 3166 for private use. These + codes MAY be used for private use region subtags (instead of + using a private use subtag sequence). 
Please refer to + Section 4.6 for more information on private use subtags. + + + +Phillips & Davis Best Current Practice [Page 13] + +RFC 5646 Language Tags September 2009 + + + 4. Three-character region subtags consist solely of digit (number) + characters and were defined according to the assignments found in + the UN Standard Country or Area Codes for Statistical Use + [UN_M.49] or assignments subsequently made by the governing + standards body. Not all of the UN M.49 codes are defined in the + IANA registry. The following rules define which codes are + entered into the registry as valid subtags: + + A. UN numeric codes assigned to 'macro-geographical + (continental)' or sub-regions MUST be registered in the + registry. These codes are not associated with an assigned + ISO 3166-1 alpha-2 code and represent supra-national areas, + usually covering more than one nation, state, province, or + territory. + + B. UN numeric codes for 'economic groupings' or 'other + groupings' MUST NOT be registered in the IANA registry and + MUST NOT be used to form language tags. + + C. When ISO 3166-1 reassigns a code formerly used for one + country or area to another country or area and that code + already is present in the registry, the UN numeric code for + that country or area MUST be registered in the registry as + described in Section 3.4 and MUST be used to form language + tags that represent the country or region for which it is + defined (rather than the recycled ISO 3166-1 code). + + D. UN numeric codes for countries or areas for which there is an + associated ISO 3166-1 alpha-2 code in the registry MUST NOT + be entered into the registry and MUST NOT be used to form + language tags. Note that the ISO 3166-based subtag in the + registry MUST actually be associated with the UN M.49 code in + question. + + E. 
For historical reasons, the UN numeric code 830 (Channel + Islands), which was not registered at the time this document + was adopted and had, at that time, no corresponding ISO + 3166-1 code, MAY be entered into the IANA registry via the + process described in Section 3.5, provided no ISO 3166-1 code + with that exact meaning has been previously registered. + + F. All other UN numeric codes for countries or areas that do not + have an associated ISO 3166-1 alpha-2 code MUST NOT be + entered into the registry and MUST NOT be used to form + language tags. For more information about these codes, see + Section 3.4. + + + + + +Phillips & Davis Best Current Practice [Page 14] + +RFC 5646 Language Tags September 2009 + + + 5. The alphanumeric codes in Appendix X of the UN document MUST NOT + be entered into the registry and MUST NOT be used to form + language tags. (At the time this document was created, these + values matched the ISO 3166-1 alpha-2 codes.) + + 6. There MUST be at most one region subtag in a language tag and the + region subtag MAY be omitted, as when it adds no distinguishing + value to the tag. + + For example: + + "de-AT" represents German ('de') as used in Austria ('AT'). + + "sr-Latn-RS" represents Serbian ('sr') written using Latin script + ('Latn') as used in Serbia ('RS'). + + "es-419" represents Spanish ('es') appropriate to the UN-defined + Latin America and Caribbean region ('419'). + +2.2.5. Variant Subtags + + Variant subtags are used to indicate additional, well-recognized + variations that define a language or its dialects that are not + covered by other available subtags. The following rules apply to the + variant subtags: + + 1. Variant subtags MUST follow any primary language, extended + language, script, or region subtags and MUST precede any + extension or private use subtag sequences. + + 2. Variant subtags, as a collection, are not associated with any + particular external standard. 
The meaning of variant subtags in + the registry is defined in the course of the registration process + defined in Section 3.5. Note that any particular variant subtag + might be associated with some external standard. However, + association with a standard is not required for registration. + + 3. More than one variant MAY be used to form the language tag. + + 4. Variant subtags MUST be registered with IANA according to the + rules in Section 3.5 of this document before being used to form + language tags. In order to distinguish variants from other types + of subtags, registrations MUST meet the following length and + content restrictions: + + 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be + at least five characters long. + + + + +Phillips & Davis Best Current Practice [Page 15] + +RFC 5646 Language Tags September 2009 + + + 2. Variant subtags that begin with a digit (0-9) MUST be at + least four characters long. + + 5. The same variant subtag MUST NOT be used more than once within a + language tag. + + * For example, the tag "de-DE-1901-1901" is not valid. + + Variant subtag records in the Language Subtag Registry MAY include + one or more 'Prefix' (Section 3.1.8) fields. Each 'Prefix' indicates + a suitable sequence of subtags for forming (with other subtags, as + appropriate) a language tag when using the variant. + + Most variants that share a prefix are mutually exclusive. For + example, the German orthographic variations '1996' and '1901' SHOULD + NOT be used in the same tag, as they represent the dates of different + spelling reforms. A variant that can meaningfully be used in + combination with another variant SHOULD include a 'Prefix' field in + its registry record that lists that other variant. For example, if + another German variant 'example' were created that made sense to use + with '1996', then 'example' should include two 'Prefix' fields: "de" + and "de-1996". 
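The length restrictions in rule 4 and the no-repeat rule in rule 5 above can be checked mechanically. The following Python sketch is illustrative only: the function names are hypothetical, and the eight-character upper bound is taken from the 'variant' production of the ABNF in Section 2.1 rather than from the rules in this section. It tests only the shape of a subtag and says nothing about whether the variant is actually registered.

```python
import re

# Shape of a variant per the Section 2.1 ABNF:
#   variant = 5*8alphanum / (DIGIT 3alphanum)
# Letter-initial variants need 5 to 8 characters; digit-initial ones may be 4.
_VARIANT_SHAPE = re.compile(r"[0-9A-Za-z]{5,8}|[0-9][0-9A-Za-z]{3}")


def is_variant_shaped(subtag):
    """Return True if 'subtag' has the length and content of a variant."""
    return _VARIANT_SHAPE.fullmatch(subtag) is not None


def has_duplicate_variants(variants):
    """Return True if the same variant occurs twice, ignoring case."""
    lowered = [v.lower() for v in variants]
    return len(set(lowered)) != len(lowered)
```

With these helpers, "1996" and "nedis" both have the shape of a variant, "abcd" does not (it begins with a letter and is only four characters long), and the variant list taken from "de-DE-1901-1901" is reported as containing a duplicate.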
+ + For example: + + "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian. + + "de-CH-1996" represents German as used in Switzerland and as + written using the spelling reform beginning in the year 1996 C.E. + +2.2.6. Extension Subtags + + Extensions provide a mechanism for extending language tags for use in + various applications. They are intended to identify information that + is commonly used in association with languages or language tags but + that is not part of language identification. See Section 3.7. The + following rules apply to extensions: + + 1. An extension MUST follow at least a primary language subtag. + That is, a language tag cannot begin with an extension. + Extensions extend language tags, they do not override or replace + them. For example, "a-value" is not a well-formed language tag, + while "de-a-value" is. Note that extensions cannot be used in + tags that are entirely private use (that is, tags starting with + "x-"). + + + + + + +Phillips & Davis Best Current Practice [Page 16] + +RFC 5646 Language Tags September 2009 + + + 2. Extension subtags are separated from the other subtags defined in + this document by a single-character subtag (called a + "singleton"). The singleton MUST be one allocated to a + registration authority via the mechanism described in Section 3.7 + and MUST NOT be the letter 'x', which is reserved for private use + subtag sequences. + + 3. Each singleton subtag MUST appear at most one time in each tag + (other than as a private use subtag). That is, singleton subtags + MUST NOT be repeated. For example, the tag "en-a-bbb-a-ccc" is + invalid because the subtag 'a' appears twice. Note that the tag + "en-a-bbb-x-a-ccc" is valid because the second appearance of the + singleton 'a' is in a private use sequence. + + 4. Extension subtags MUST meet whatever requirements are set by the + document that defines their singleton prefix and whatever + requirements are provided by the maintaining authority. 
Note + that there might not be a registry of these subtags and + validating processors are not required to validate extensions. + + 5. Each extension subtag MUST be from two to eight characters long + and consist solely of letters or digits, with each subtag + separated by a single '-'. Case distinctions are ignored in + extensions (as with any language subtag) and normalized subtags + of this type are expected to be in lowercase. + + 6. Each singleton MUST be followed by at least one extension subtag. + For example, the tag "tlh-a-b-foo" is invalid because the first + singleton 'a' is followed immediately by another singleton 'b'. + + 7. Extension subtags MUST follow all primary language, extended + language, script, region, and variant subtags in a tag and MUST + precede any private use subtag sequences. + + 8. All subtags following the singleton and before another singleton + are part of the extension. Example: In the tag "fr-a-Latn", the + subtag 'Latn' does not represent the script subtag 'Latn' defined + in the IANA Language Subtag Registry. Its meaning is defined by + the extension 'a'. + + 9. In the event that more than one extension appears in a single + tag, the tag SHOULD be canonicalized as described in Section 4.5, + by ordering the various extension sequences into case-insensitive + ASCII order. + + For example, if an extension were defined for the singleton 'r' and + it defined the subtags shown, then the following tag would be a valid + example: "en-Latn-GB-boont-r-extended-sequence-x-private". + + + +Phillips & Davis Best Current Practice [Page 17] + +RFC 5646 Language Tags September 2009 + + +2.2.7. Private Use Subtags + + Private use subtags are used to indicate distinctions in language + that are important in a given context by private agreement. The + following rules apply to private use subtags: + + 1. Private use subtags are separated from the other subtags defined + in this document by the reserved single-character subtag 'x'. + + 2. 
Private use subtags MUST conform to the format and content + constraints defined in the ABNF for all subtags; that is, they + MUST consist solely of letters and digits and not exceed eight + characters in length. + + 3. Private use subtags MUST follow all primary language, extended + language, script, region, variant, and extension subtags in the + tag. Another way of saying this is that all subtags following + the singleton 'x' MUST be considered private use. Example: The + subtag 'US' in the tag "en-x-US" is a private use subtag. + + 4. A tag MAY consist entirely of private use subtags. + + 5. No source is defined for private use subtags. Use of private use + subtags is by private agreement only. + + 6. Private use subtags are NOT RECOMMENDED where alternatives exist + or for general interchange. See Section 4.6 for more information + on private use subtag choice. + + For example, suppose a group of scholars is studying some texts in + medieval Greek. They might agree to use some collection of private + use subtags to identify different styles of writing in the texts. + For example, they might use 'el-x-koine' for documents in the + "common" style while using 'el-x-attic' for other documents that + mimic the Attic style. These subtags would not be recognized by + outside processes or systems, but might be useful in categorizing + various texts for study by those in the group. + + In the registry, there are also subtags derived from codes reserved + by ISO 639, ISO 15924, or ISO 3166 for private use. Do not confuse + these with private use subtag sequences following the subtag 'x'. + See Section 4.6. + +2.2.8. Grandfathered and Redundant Registrations + + Prior to RFC 4646, whole language tags were registered according to + the rules in RFC 1766 and/or RFC 3066. All of these registered tags + remain valid as language tags. 
+ + + +Phillips & Davis Best Current Practice [Page 18] + +RFC 5646 Language Tags September 2009 + + + Many of these registered tags were made redundant by the advent of + either RFC 4646 or this document. A redundant tag is a grandfathered + registration whose individual subtags appear with the same semantic + meaning in the registry. For example, the tag "zh-Hant" (Traditional + Chinese) can now be composed from the subtags 'zh' (Chinese) and + 'Hant' (Han script traditional variant). These redundant tags are + maintained in the registry as records of type 'redundant', mostly as + a matter of historical curiosity. + + The remainder of the previously registered tags are "grandfathered". + These tags are classified into two groups: 'regular' and 'irregular'. + + Grandfathered tags that (appear to) match the 'langtag' production in + Figure 1 are considered 'regular' grandfathered tags. These tags + contain one or more subtags that either do not individually appear in + the registry or appear but with a different semantic meaning: each + tag, in its entirety, represents a language or collection of + languages. + + Grandfathered tags that do not match the 'langtag' production in the + ABNF and would otherwise be invalid are considered 'irregular' + grandfathered tags. With the exception of "en-GB-oed", which is a + variant of "en-GB", each of them, in its entirety, represents a + language. + + Many of the grandfathered tags have been superseded by the subsequent + addition of new subtags: each superseded record contains a + 'Preferred-Value' field that ought to be used to form language tags + representing that value. For example, the tag "art-lojban" is + superseded by the primary language subtag 'jbo'. + +2.2.9. Classes of Conformance + + Implementations sometimes need to describe their capabilities with + regard to the rules and practices described in this document. 
Tags + can be checked or verified in a number of ways, but two particular + classes of tag conformance are formally defined here. + + A tag is considered "well-formed" if it conforms to the ABNF + (Section 2.1). Language tags may be well-formed in terms of syntax + but not valid in terms of content. However, many operations + involving language tags work well without knowing anything about the + meaning or validity of the subtags. + + A tag is considered "valid" if it satisfies these conditions: + + o The tag is well-formed. + + + + +Phillips & Davis Best Current Practice [Page 19] + +RFC 5646 Language Tags September 2009 + + + o Either the tag is in the list of grandfathered tags or all of its + primary language, extended language, script, region, and variant + subtags appear in the IANA Language Subtag Registry as of the + particular registry date. + + o There are no duplicate variant subtags. + + o There are no duplicate singleton (extension) subtags. + + Note that a tag's validity depends on the date of the registry used + to validate the tag. A more recent copy of the registry might + contain a subtag that an older version does not. + + A tag is considered valid for a given extension (Section 3.7) (as of + a particular version, revision, and date) if it meets the criteria + for "valid" above and also satisfies this condition: + + Each subtag used in the extension part of the tag is valid + according to the extension. + + Older specifications or language tag implementations sometimes + reference [RFC3066]. A wider array of tags was considered well- + formed under that document. Any tags that were valid for use under + RFC 3066 are both well-formed and valid under this document's syntax; + only invalid or illegal tags were well-formed under the earlier + definition but no longer are. 
The language tag syntax under RFC 3066 + was: + + obs-language-tag = primary-subtag *( "-" subtag ) + primary-subtag = 1*8ALPHA + subtag = 1*8(ALPHA / DIGIT) + + Figure 2: RFC 3066 Language Tag Syntax + + Subtags designated for private use as well as private use sequences + introduced by the 'x' subtag are available for cases in which no + assigned subtags are available and registration is not a suitable + option. For example, one might use a tag such as "no-QQ", where 'QQ' + is one of a range of private use ISO 3166-1 codes to indicate an + otherwise undefined region. Users MUST NOT assign language tags that + use subtags that do not appear in the registry other than in private + use sequences (such as the subtag 'personal' in the tag "en-x- + personal"). Besides producing a tag that is not valid, the user also + risks collision with a possible future assignment or registration. + + Note well: although the 'Language-Tag' production appearing in this + document is functionally equivalent to the one in [RFC4646], it has + + + + +Phillips & Davis Best Current Practice [Page 20] + +RFC 5646 Language Tags September 2009 + + + been changed to prevent certain errors in well-formedness arising + from the old 'grandfathered' production. + +3. Registry Format and Maintenance + + The IANA Language Subtag Registry ("the registry") contains a + comprehensive list of all of the subtags valid in language tags. + This allows implementers a straightforward and reliable way to + validate language tags. The registry will be maintained so that, + except for extension subtags, it is possible to validate all of the + subtags that appear in a language tag under the provisions of this + document or its revisions or successors. In addition, the meaning of + the various subtags will be unambiguous and stable over time. (The + meaning of private use subtags, of course, is not defined by the + registry.)
+ + This section defines the registry along with the maintenance and + update procedures associated with it, as well as a registry for + extensions to language tags (Section 3.7). + +3.1. Format of the IANA Language Subtag Registry + + The IANA Language Subtag Registry is a machine-readable file in the + format described in this section, plus copies of the registration + forms approved in accordance with the process described in + Section 3.5. + + The existing registration forms for grandfathered and redundant tags + taken from RFC 3066 have been maintained as part of the obsolete RFC + 3066 registry. The subtags added to the registry by either [RFC4645] + or [RFC5645] do not have separate registration forms (so no forms are + archived for these additions). + +3.1.1. File Format + + The registry is a [Unicode] text file and consists of a series of + records in a format based on "record-jar" (described in + [record-jar]). Each record, in turn, consists of a series of fields + that describe the various subtags and tags. The actual registry file + is encoded using the UTF-8 [RFC3629] character encoding. + + Each field can be considered a single, logical line of characters. + Each field contains a "field-name" and a "field-body". These are + separated by a "field-separator". The field-separator is a COLON + character (U+003A) plus any surrounding whitespace. Each field is + terminated by the newline sequence CRLF. The text in each field MUST + be in Unicode Normalization Form C (NFC). + + + + +Phillips & Davis Best Current Practice [Page 21] + +RFC 5646 Language Tags September 2009 + + + A collection of fields forms a "record". Records are separated by + lines containing only the sequence "%%" (U+0025 U+0025). + + Although fields are logically a single line of text, each line of + text in the file format is limited to 72 bytes in length. To + accommodate this, the field-body can be split into a multiple-line + representation; this is called "folding". 
Folding is done according + to customary conventions for line-wrapping. This is typically on + whitespace boundaries, but can occur between other characters when + the value does not include spaces, such as when a language does not + use whitespace between words. In any event, there MUST NOT be breaks + inside a multibyte UTF-8 sequence or in the middle of a combining + character sequence. For more information, see [UAX14]. + + Although the file format uses the Unicode character set and the file + itself is encoded using the UTF-8 encoding, fields are restricted to + the printable characters from the US-ASCII [ISO646] repertoire unless + otherwise indicated in the description of a specific field + (Section 3.1.2). + + The format of the registry is described by the following ABNF + [RFC5234]. Character numbers (code points) are taken from Unicode, + and terminals in the ABNF productions are in terms of characters + rather than bytes. + + registry = record *("%%" CRLF record) + record = 1*field + field = ( field-name field-sep field-body CRLF ) + field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)] + field-sep = *SP ":" *SP + field-body = *([[*SP CRLF] 1*SP] 1*CHARS) + CHARS = (%x21-10FFFF) ; Unicode code points + + Figure 3: Registry Format ABNF + + The sequence '..' (U+002E U+002E) in a field-body denotes a range of + values. Such a range represents all subtags of the same length that + are in alphabetic or numeric order within that range, including the + values explicitly mentioned. For example, 'a..c' denotes the values + 'a', 'b', and 'c', and '11..13' denotes the values '11', '12', and + '13'. + + All fields whose field-body contains a date value use the "full-date" + format specified in [RFC3339]. For example, "2004-06-28" represents + June 28, 2004, in the Gregorian calendar. + + + + + + +Phillips & Davis Best Current Practice [Page 22] + +RFC 5646 Language Tags September 2009 + + +3.1.2. 
Record and Field Definitions + + There are three types of records in the registry: "File-Date", + "Subtag", and "Tag". + + The first record in the registry is always the "File-Date" record. + This record occurs only once in the file and contains a single field + whose field-name is "File-Date". The field-body of this record + contains a date (see Section 5.1), making it possible to easily + recognize different versions of the registry. + + File-Date: 2004-06-28 + %% + + Figure 4: Example of the File-Date Record + + Subsequent records contain multiple fields and represent information + about either subtags or tags. Both types of records have an + identical structure, except that "Subtag" records contain a field + with a field-name of "Subtag", while, unsurprisingly, "Tag" records + contain a field with a field-name of "Tag". Field-names MUST NOT + occur more than once per record, with the exception of the + 'Description', 'Comments', and 'Prefix' fields. + + Each record MUST contain at least one of each of the following + fields: + + o 'Type' + + * Type's field-body MUST consist of one of the following strings: + "language", "extlang", "script", "region", "variant", + "grandfathered", and "redundant"; it denotes the type of tag or + subtag. + + o Either 'Subtag' or 'Tag' + + * Subtag's field-body contains the subtag being defined. This + field MUST appear in all records whose 'Type' has one of these + values: "language", "extlang", "script", "region", or + "variant". + + * Tag's field-body contains a complete language tag. This field + MUST appear in all records whose 'Type' has one of these + values: "grandfathered" or "redundant". If the 'Type' is + "grandfathered", then the 'Tag' field-body will be one of the + tags listed in either the 'regular' or 'irregular' production + found in Section 2.1. 
+ + + + +Phillips & Davis Best Current Practice [Page 23] + +RFC 5646 Language Tags September 2009 + + + o 'Description' + + * Description's field-body contains a non-normative description + of the subtag or tag. + + o 'Added' + + * Added's field-body contains the date the record was registered + or, in the case of grandfathered or redundant tags, the date + the corresponding tag was registered under the rules of + [RFC1766] or [RFC3066]. + + Each record MAY also contain the following fields: + + o 'Deprecated' + + * Deprecated's field-body contains the date the record was + deprecated. In some cases, this value is earlier than that of + the 'Added' field in the same record. That is, the date of + deprecation preceded the addition of the record to the + registry. + + o 'Preferred-Value' + + * Preferred-Value's field-body contains a canonical mapping from + this record's value to a modern equivalent that is preferred in + its place. Depending on the value of the 'Type' field, this + value can take different forms: + + + For fields of type 'language', 'Preferred-Value' contains + the primary language subtag that is preferred when forming + the language tag. + + + For fields of type 'script', 'region', or 'variant', + 'Preferred-Value' contains the subtag of the same type that + is preferred for forming the language tag. + + + For fields of type 'extlang', 'grandfathered', or + 'redundant', 'Preferred-Value' contains an "extended + language range" [RFC4647] that is preferred for forming the + language tag. That is, the preferred language tag will + contain, in order, each of the subtags that appears in the + 'Preferred-Value'; additional fields can be included in a + language tag, as described elsewhere in this document. 
For + example, the replacement for the grandfathered tag "zh-min- + nan" (Min Nan Chinese) is "nan", which can be used as the + + + + + +Phillips & Davis Best Current Practice [Page 24] + +RFC 5646 Language Tags September 2009 + + + basis for tags such as "nan-Hant" or "nan-TW" (note that the + extended language subtag form such as "zh-nan-Hant" or "zh- + nan-TW" can also be used). + + o 'Prefix' + + * Prefix's field-body contains a valid language tag that is + RECOMMENDED as one possible prefix to this record's subtag. + This field MAY appear in records whose 'Type' field-body is + either 'extlang' or 'variant' (it MUST NOT appear in any other + record type). + + o 'Suppress-Script' + + * Suppress-Script's field-body contains a script subtag that + SHOULD NOT be used to form language tags with the associated + primary or extended language subtag. This field MUST appear + only in records whose 'Type' field-body is 'language' or + 'extlang'. See Section 4.1. + + o 'Macrolanguage' + + * Macrolanguage's field-body contains a primary language subtag + defined by ISO 639 as the "macrolanguage" that encompasses this + language subtag. This field MUST appear only in records whose + 'Type' field-body is either 'language' or 'extlang'. + + o 'Scope' + + * Scope's field-body contains information about a primary or + extended language subtag indicating the type of language code + according to ISO 639. The values permitted in this field are + "macrolanguage", "collection", "special", and "private-use". + This field only appears in records whose 'Type' field-body is + either 'language' or 'extlang'. When this field is omitted, + the language is an individual language. + + o 'Comments' + + * Comments's field-body contains additional information about the + subtag, as deemed appropriate for understanding the registry + and implementing language tags using the subtag or tag. 
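The file format of Section 3.1.1 and the field rules listed above can be combined into a small reader. The Python sketch below is one possible approach, not a reference implementation: the function and constant names are hypothetical, the sample records carry made-up dates, and folded lines are assumed to be rejoined with a single space. It splits registry text into records at "%%" lines, unfolds continuation lines, and reports missing or improperly repeated fields.

```python
from collections import defaultdict

REPEATABLE = {"Description", "Comments", "Prefix"}
SUBTAG_TYPES = {"language", "extlang", "script", "region", "variant"}
TAG_TYPES = {"grandfathered", "redundant"}


def parse_registry(text):
    """Return a list of records; each maps a field-name to its field-bodies."""
    records, current, last_name = [], defaultdict(list), None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.strip() == "%%":
            # Record separator: close the current record.
            records.append(dict(current))
            current, last_name = defaultdict(list), None
        elif line[0] in " \t" and last_name is not None:
            # Folded continuation line (assumption: rejoin with one space).
            current[last_name][-1] += " " + line.strip()
        elif ":" in line:
            name, _, body = line.partition(":")
            last_name = name.strip()
            current[last_name].append(body.strip())
    if current:
        records.append(dict(current))
    return records


def record_problems(record):
    """List rule violations for one Subtag or Tag record."""
    problems = []
    types = record.get("Type", [])
    if not types:
        problems.append("missing Type")
    elif types[0] in SUBTAG_TYPES:
        if "Subtag" not in record:
            problems.append("missing Subtag")
    elif types[0] in TAG_TYPES:
        if "Tag" not in record:
            problems.append("missing Tag")
    else:
        problems.append("unknown Type")
    for name in ("Description", "Added"):
        if name not in record:
            problems.append("missing " + name)
    for name, bodies in record.items():
        if len(bodies) > 1 and name not in REPEATABLE:
            problems.append("repeated " + name)
    return problems


# Illustrative data only; the dates are placeholders, not registry values.
SAMPLE = """File-Date: 2004-06-28
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2004-06-28
%%
Type: grandfathered
Tag: i-klingon
Description: Klingon
Added: 2004-06-28
Comments: an example of a field-body that is folded
  onto a second line
"""
```

Calling parse_registry(SAMPLE) yields three records, the first being the File-Date record, which is not subject to the checks in record_problems. Fields that are not defined in this document are kept as ordinary dictionary entries, so a caller can simply ignore them.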
+ + Future versions of this document might add additional fields to the + registry; implementations SHOULD ignore fields found in the registry + that are not defined in this document. + + + + + +Phillips & Davis Best Current Practice [Page 25] + +RFC 5646 Language Tags September 2009 + + +3.1.3. Type Field + + The field 'Type' contains the string identifying the record type in + which it appears. Values for the 'Type' field-body are: "language" + (Section 2.2.1); "extlang" (Section 2.2.2); "script" (Section 2.2.3); + "region" (Section 2.2.4); "variant" (Section 2.2.5); "grandfathered" + or "redundant" (Section 2.2.8). + +3.1.4. Subtag and Tag Fields + + The field 'Subtag' contains the subtag defined in the record. The + field 'Tag' appears in records whose 'Type' is either 'grandfathered' + or 'redundant' and contains a tag registered under [RFC3066]. + + The 'Subtag' field-body MUST follow the casing conventions described + in Section 2.1.1. All subtags use lowercase letters in the field- + body, with two exceptions: + + Subtags whose 'Type' field is 'script' (in other words, subtags + defined by ISO 15924) MUST use titlecase. + + Subtags whose 'Type' field is 'region' (in other words, the non- + numeric region subtags defined by ISO 3166-1) MUST use all + uppercase. + + The 'Tag' field-body MUST be formatted according to the rules + described in Section 2.1.1. + +3.1.5. Description Field + + The field 'Description' contains a description of the tag or subtag + in the record. The 'Description' field MAY appear more than once per + record. The 'Description' field MAY include the full range of + Unicode characters. At least one of the 'Description' fields MUST be + written or transcribed into the Latin script; additional + 'Description' fields MAY be in any script or language. + + The 'Description' field is used for identification purposes. 
+ Descriptions SHOULD contain all and only that information necessary + to distinguish one subtag from others with which it might be + confused. They are not intended to provide general background + information or to provide all possible alternate names or + designations. 'Description' fields don't necessarily represent the + actual native name of the item in the record, nor are any of the + descriptions guaranteed to be in any particular language (such as + English or French, for example). + + + + + +Phillips & Davis Best Current Practice [Page 26] + +RFC 5646 Language Tags September 2009 + + + Descriptions in the registry that correspond to ISO 639, ISO 15924, + ISO 3166-1, or UN M.49 codes are intended only to indicate the + meaning of that identifier as defined in the source standard at the + time it was added to the registry or as subsequently modified, within + the bounds of the stability rules (Section 3.4), via subsequent + registration. The 'Description' does not replace the content of the + source standard itself. 'Description' fields are not intended to be + the localized English names for the subtags. Localization or + translation of language tag and subtag descriptions is out of scope + of this document. + + For subtags taken from a source standard (such as ISO 639 or ISO + 15924), the 'Description' fields in the record are also initially + taken from that source standard. Multiple descriptions in the source + standard are split into separate 'Description' fields. The source + standard's descriptions MAY be edited or modified, either prior to + insertion or via the registration process, and additional or + extraneous descriptions omitted or removed. Each 'Description' field + MUST be unique within the record in which it appears, and formatting + variations of the same description SHOULD NOT occur in that specific + record. 
For example, while the ISO 639-1 code 'fy' has both the + description "Western Frisian" and the description "Frisian, Western" + in that standard, only one of these descriptions appears in the + registry. + + To help ensure that users do not become confused about which subtag + to use, 'Description' fields assigned to a record of any specific + type ('language', 'extlang', 'script', and so on) MUST be unique + within that given record type with the following exception: if a + particular 'Description' field occurs in multiple records of a given + type, then at most one of the records can omit the 'Deprecated' + field. All deprecated records that share a 'Description' MUST have + the same 'Preferred-Value', and all non-deprecated records MUST be + that 'Preferred-Value'. This means that two records of the same type + that share a 'Description' are also semantically equivalent and no + more than one record with a given 'Description' is preferred for that + meaning. + + For example, consider the 'language' subtags 'zza' (Zaza) and 'diq' + (Dimli). It so happens that 'zza' is a macrolanguage enclosing 'diq' + and thus also has a description in ISO 639-3 of "Dimli". This + description was edited to read "Dimli (macrolanguage)" in the + registry record for 'zza' to prevent a collision. + + By contrast, the subtags 'he' and 'iw' share a 'Description' value of + "Hebrew"; this is permitted because 'iw' is deprecated and its + 'Preferred-Value' is 'he'. + + + + +Phillips & Davis Best Current Practice [Page 27] + +RFC 5646 Language Tags September 2009 + + + For fields of type 'language', the first 'Description' field + appearing in the registry corresponds whenever possible to the + Reference Name assigned by ISO 639-3. This helps facilitate cross- + referencing between ISO 639 and the registry. 
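The rule on shared 'Description' values stated earlier in this section lends itself to an automated check. The Python sketch below is a hedged illustration: the record layout (plain dictionaries with a list under 'Description') and the function name are assumptions made for the example, and the deprecation date shown is a placeholder. A group of same-type records sharing a description is accepted only if at most one record lacks 'Deprecated', all deprecated records agree on 'Preferred-Value', and any non-deprecated record is that preferred value.

```python
from collections import defaultdict


def description_conflicts(records):
    """Return (type, description) pairs that break the sharing rule."""
    groups = defaultdict(list)
    for rec in records:
        for desc in rec["Description"]:
            groups[(rec["Type"], desc)].append(rec)

    conflicts = []
    for key, recs in groups.items():
        if len(recs) < 2:
            continue  # a description used once cannot collide
        live = [r for r in recs if "Deprecated" not in r]
        dead = [r for r in recs if "Deprecated" in r]
        preferred = {r.get("Preferred-Value") for r in dead}
        ok = (
            len(live) <= 1
            and len(preferred) <= 1
            and all(r["Subtag"] in preferred for r in live)
        )
        if not ok:
            conflicts.append(key)
    return conflicts


# 'iw' is deprecated in favor of 'he', so sharing "Hebrew" is permitted.
HEBREW = [
    {"Type": "language", "Subtag": "he", "Description": ["Hebrew"]},
    {"Type": "language", "Subtag": "iw", "Description": ["Hebrew"],
     "Deprecated": "2004-06-28",  # placeholder date
     "Preferred-Value": "he"},
]

# What 'zza' and 'diq' would look like had the description not been edited.
DIMLI_UNEDITED = [
    {"Type": "language", "Subtag": "zza", "Description": ["Zaza", "Dimli"]},
    {"Type": "language", "Subtag": "diq", "Description": ["Dimli"]},
]
```

Here description_conflicts(HEBREW) finds nothing to report, while description_conflicts(DIMLI_UNEDITED) flags ("language", "Dimli"), which is the collision that the edit to "Dimli (macrolanguage)" avoids.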
 + + When creating or updating a record due to the action of one of the + source standards, the Language Subtag Reviewer MAY edit descriptions + to correct irregularities in formatting (such as misspellings, + inappropriate apostrophes or other punctuation, or excessive or + missing spaces) prior to submitting the proposed record to the + ietf-languages@iana.org list for consideration. + +3.1.6. Deprecated Field + + The field 'Deprecated' contains the date the record was deprecated + and MAY be added, changed, or removed from any record via the + maintenance process described in Section 3.3 or via the registration + process described in Section 3.5. Usually, the addition of a + 'Deprecated' field is due to the action of one of the standards + bodies, such as ISO 3166, withdrawing a code. Although valid in + language tags, subtags and tags with a 'Deprecated' field are + deprecated, and validating processors SHOULD NOT generate these + subtags. Note that a record that contains a 'Deprecated' field and + no corresponding 'Preferred-Value' field has no replacement mapping. + + In some historical cases, it might not have been possible to + reconstruct the original deprecation date. For these cases, an + approximate date appears in the registry. Some subtags and some + grandfathered or redundant tags were deprecated before the initial + creation of the registry. The exact rules for this appear in Section + 2 of [RFC4645]. Note that these records have a 'Deprecated' field + with an earlier date than the corresponding 'Added' field! + +3.1.7. Preferred-Value Field + + The field 'Preferred-Value' contains a mapping between the record in + which it appears and another tag or subtag (depending on the record's + 'Type'). The value in this field is used for canonicalization (see + Section 4.5).
In cases where the subtag or tag also has a + 'Deprecated' field, then the 'Preferred-Value' is RECOMMENDED as the + best choice to represent the value of this record when selecting a + language tag. + + Records containing a 'Preferred-Value' fall into one of these four + groups: + + + + + + +Phillips & Davis Best Current Practice [Page 28] + +RFC 5646 Language Tags September 2009 + + + 1. ISO 639 language codes that were later withdrawn in favor of + other codes. These values are mostly a historical curiosity. + The 'he'/'iw' pairing above is an example of this. + + 2. Subtags (with types other than language or extlang) taken from + codes or values that have been withdrawn in favor of a new code. + In particular, this applies to region subtags taken from ISO + 3166-1, because sometimes a country will change its name or + administration in such a way that warrants a new region code. In + some cases, countries have reverted to an older name, which might + already be encoded. For example, the subtag 'ZR' (Zaire) was + replaced by the subtag 'CD' (Democratic Republic of the Congo) + when that country's name was changed. + + 3. Tags or subtags that have become obsolete because the values they + represent were later encoded. Many of the grandfathered or + redundant tags were later encoded by ISO 639, for example, and + fall into this grouping. For example, "i-klingon" was deprecated + when the subtag 'tlh' was added. The record for "i-klingon" has + a 'Preferred-Value' of 'tlh'. + + 4. Extended language subtags always have a mapping to their + identical primary language subtag. For example, the extended + language subtag 'yue' (Cantonese) can be used to form the tag + "zh-yue". It has a 'Preferred-Value' mapping to the primary + language subtag 'yue', meaning that a tag such as + "zh-yue-Hant-HK" can be canonicalized to "yue-Hant-HK". + + Records other than those of type 'extlang' that contain a 'Preferred- + Value' field MUST also have a 'Deprecated' field. 
This field + contains the date on which the tag or subtag was deprecated in favor + of the preferred value. + + For records of type 'extlang', the 'Preferred-Value' field appears + without a corresponding 'Deprecated' field. An implementation MAY + ignore these preferred value mappings, although if it ignores the + mapping, it SHOULD do so consistently. It SHOULD also treat the + 'Preferred-Value' as equivalent to the mapped item. For example, the + tags "zh-yue-Hant-HK" and "yue-Hant-HK" are semantically equivalent + and ought to be treated as if they were the same tag. + + Occasionally, the deprecated code is preferred in certain contexts. + For example, both "iw" and "he" can be used in the Java programming + language, but "he" is converted on input to "iw", which is thus the + canonical form in Java. + + + + + + +Phillips & Davis Best Current Practice [Page 29] + +RFC 5646 Language Tags September 2009 + + + 'Preferred-Value' mappings in records of type 'region' sometimes do + not represent exactly the same meaning as the original value. There + are many reasons for a country code to be changed, and the effect + this has on the formation of language tags will depend on the nature + of the change in question. For example, the region subtag 'YD' + (Democratic Yemen) was deprecated in favor of the subtag 'YE' (Yemen) + when those two countries unified in 1990. + + A 'Preferred-Value' MAY be added to, changed, or removed from records + according to the rules in Section 3.3. Addition, modification, or + removal of a 'Preferred-Value' field in a record does not imply that + content using the affected subtag needs to be retagged. + + The 'Preferred-Value' fields in records of type "grandfathered" and + "redundant" each contain an "extended language range" [RFC4647] that + is strongly RECOMMENDED for use in place of the record's value. In + many cases, these mappings were created via deprecation of the tags + during the period before [RFC4646] was adopted. 
For example, the tag + "no-nyn" was deprecated in favor of the ISO 639-1-defined language + code 'nn'. + + The 'Preferred-Value' field in subtag records of type "extlang" also + contains an "extended language range". This allows the subtag to be + deprecated in favor of either a single primary language subtag or a + new language-extlang sequence. + + Usually, the addition, removal, or change of a 'Preferred-Value' + field for a subtag is done to reflect changes in one of the source + standards. For example, if an ISO 3166-1 region code is deprecated + in favor of another code, that SHOULD result in the addition of a + 'Preferred-Value' field. + + Changes to one subtag can affect other subtags as well: when + proposing changes to the registry, the Language Subtag Reviewer MUST + review the registry for such effects and propose the necessary + changes using the process in Section 3.5, although anyone MAY request + such changes. For example: + + Suppose that subtag 'XX' has a 'Preferred-Value' of 'YY'. If 'YY' + later changes to have a 'Preferred-Value' of 'ZZ', then the + 'Preferred-Value' for 'XX' MUST also change to be 'ZZ'. + + Suppose that a registered language subtag 'dialect' represents a + language not yet available in any part of ISO 639. The later + addition of a corresponding language code in ISO 639 SHOULD result + in the addition of a 'Preferred-Value' for 'dialect'. + + + + + +Phillips & Davis Best Current Practice [Page 30] + +RFC 5646 Language Tags September 2009 + + +3.1.8. Prefix Field + + The field 'Prefix' contains a valid language tag that is RECOMMENDED + as one possible prefix to this record's subtag, perhaps with other + subtags. 
That is, when including an extended language or a variant + subtag that has at least one 'Prefix' in a language tag, the + resulting tag SHOULD match at least one of the subtag's 'Prefix' + fields using the "Extended Filtering" algorithm (see [RFC4647]), and + each of the subtags in that 'Prefix' SHOULD appear before the subtag + itself. + + The 'Prefix' field MUST appear exactly once in a record of type + 'extlang'. The 'Prefix' field MAY appear multiple times (or not at + all) in records of type 'variant'. Additional fields of this type + MAY be added to a 'variant' record via the registration process, + provided the 'variant' record already has at least one 'Prefix' + field. + + Each 'Prefix' field indicates a particular sequence of subtags that + form a meaningful tag with this subtag. For example, the extended + language subtag 'cmn' (Mandarin Chinese) only makes sense with its + prefix 'zh' (Chinese). Similarly, 'rozaj' (Resian, a dialect of + Slovenian) would be appropriate when used with its prefix 'sl' + (Slovenian), while tags such as "is-1994" are not appropriate (and + probably not meaningful). Although the 'Prefix' for 'rozaj' is "sl", + other subtags might appear between them. For example, the tag "sl- + IT-rozaj" (Slovenian, Italy, Resian) matches the 'Prefix' "sl". + + The 'Prefix' also indicates when variant subtags make sense when used + together (many that otherwise share a 'Prefix' are mutually + exclusive) and what the relative ordering of variants is supposed to + be. For example, the variant '1994' (Standardized Resian + orthography) has several 'Prefix' fields in the registry ("sl-rozaj", + "sl-rozaj-biske", "sl-rozaj-njiva", "sl-rozaj-osojs", and "sl-rozaj- + solba"). This indicates not only that '1994' is appropriate to use + with each of these five Resian variant subtags ('rozaj', 'biske', + 'njiva', 'osojs', and 'solba'), but also that it SHOULD appear + following any of these variants in a tag. 
Thus, the language tag + ought to take the form "sl-rozaj-biske-1994", rather than "sl-1994- + rozaj-biske" or "sl-rozaj-1994-biske". + + If a record includes no 'Prefix' field, a 'Prefix' field MUST NOT be + added to the record at a later date. Otherwise, changes (additions, + deletions, or modifications) to the set of 'Prefix' fields MAY be + registered, as long as they strictly widen the range of language tags + that are recommended. For example, a 'Prefix' with the value "be- + Latn" (Belarusian, Latin script) could be replaced by the value "be" + (Belarusian) but not by the value "ru-Latn" (Russian, Latin script) + + + +Phillips & Davis Best Current Practice [Page 31] + +RFC 5646 Language Tags September 2009 + + + or the value "be-Latn-BY" (Belarusian, Latin script, Belarus), since + these latter either change or narrow the range of suggested tags. + + The field-body of the 'Prefix' field MUST NOT conflict with any + 'Prefix' already registered for a given record. Such a conflict + would occur when no valid tag could be constructed that would contain + the prefix, such as when two subtags each have a 'Prefix' that + contains the other subtag. For example, suppose that the subtag + 'avariant' has the prefix "es-bvariant". Then the subtag 'bvariant' + cannot be assigned the prefix 'avariant', for that would require a + tag of the form "es-avariant-bvariant-avariant", which would not be + valid. + +3.1.9. Suppress-Script Field + + The field 'Suppress-Script' contains a script subtag (whose record + appears in the registry). The field 'Suppress-Script' MUST appear + only in records whose 'Type' field-body is either 'language' or + 'extlang'. This field MUST NOT appear more than one time in a + record. + + This field indicates a script used to write the overwhelming majority + of documents for the given language. 
The subtag for such a script + therefore adds no distinguishing information to a language tag and + thus SHOULD NOT be used for most documents in that language. + Omitting the script subtag indicated by this field helps ensure + greater compatibility between the language tags generated according + to the rules in this document and language tags and tag processors or + consumers based on RFC 3066. For example, virtually all Icelandic + documents are written in the Latin script, making the subtag 'Latn' + redundant in the tag "is-Latn". + + Many language subtag records do not have a 'Suppress-Script' field. + The lack of a 'Suppress-Script' might indicate that the language is + customarily written in more than one script or that the language is + not customarily written at all. It might also mean that sufficient + information was not available when the record was created and thus + remains a candidate for future registration. + +3.1.10. Macrolanguage Field + + The field 'Macrolanguage' contains a primary language subtag (whose + record appears in the registry). This field indicates a language + that encompasses this subtag's language according to assignments made + by ISO 639-3. + + ISO 639-3 labels some languages in the registry as "macrolanguages". + ISO 639-3 defines the term "macrolanguage" to mean "clusters of + + + +Phillips & Davis Best Current Practice [Page 32] + +RFC 5646 Language Tags September 2009 + + + closely-related language varieties that [...] can be considered + distinct individual languages, yet in certain usage contexts a single + language identity for all is needed". These correspond to codes + registered in ISO 639-2 as individual languages that were found to + correspond to more than one language in ISO 639-3. + + A language contained within a macrolanguage is called an "encompassed + language". The record for each encompassed language contains a + 'Macrolanguage' field in the registry; the macrolanguages themselves + are not specially marked. 
Note that some encompassed languages have + ISO 639-1 or ISO 639-2 codes. + + The 'Macrolanguage' field can only occur in records of type + 'language' or 'extlang'. Only values assigned by ISO 639-3 will be + considered for inclusion. 'Macrolanguage' fields MAY be added or + removed via the normal registration process whenever ISO 639-3 + defines new values or withdraws old values. Macrolanguages are + informational, and MAY be removed or changed if ISO 639-3 changes the + values. For more information on the use of this field and choosing + between macrolanguage and encompassed language subtags, see + Section 4.1.1. + + For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn' + (Norwegian Nynorsk) each have a 'Macrolanguage' field with a value of + 'no' (Norwegian). For more information, see Section 4.1. + +3.1.11. Scope Field + + The field 'Scope' contains classification information about a primary + or extended language subtag derived from ISO 639. Most languages + have a scope of 'individual', which means that the language is not a + macrolanguage, collection, special code, or private use. That is, it + is what one would normally consider to be 'a language'. Any primary + or extended language subtag that has no 'Scope' field is an + individual language. + + 'Scope' information can sometimes be helpful in selecting language + tags, since it indicates the purpose or "scope" of the code + assignment within ISO 639. The available values are: + + o 'macrolanguage' - Indicates a macrolanguage as defined by ISO + 639-3 (see Section 3.1.10). A macrolanguage is a cluster of + closely related languages that are sometimes considered to be a + single language. + + o 'collection' - Indicates a subtag that represents a collection of + languages, typically related by some type of historical, + geographical, or linguistic association. 
Unlike a macrolanguage, + + + +Phillips & Davis Best Current Practice [Page 33] + +RFC 5646 Language Tags September 2009 + + + a collection can contain languages that are only loosely related + and a collection cannot be used interchangeably with languages + that belong to it. + + o 'special' - Indicates a special language code. These are subtags + used for identifying linguistic attributes not particularly + associated with a concrete language. These include codes for when + the language is undetermined or for non-linguistic content. + + o 'private-use' - Indicates a code reserved for private use in the + underlying standard. Subtags with this scope can be used to + indicate a primary language for which no ISO 639 or registered + assignment exists. + + The 'Scope' field MAY appear in records of type 'language' or + 'extlang'. Note that many of the prefixes for extended language + subtags will have a 'Scope' of 'macrolanguage' (although some will + not) and that many languages that have a 'Scope' of 'macrolanguage' + will have extended language subtags associated with them. + + The 'Scope' field MAY be added, modified, or removed via the + registration process, provided the change mirrors changes made by ISO + 639 to the assignment's classification. Such a change is expected to + be rare. + + For example, the primary language subtag 'zh' (Chinese) has a 'Scope' + of 'macrolanguage', while its enclosed language 'nan' (Min Nan + Chinese) has a 'Scope' of 'individual'. The special value 'und' + (Undetermined) has a 'Scope' of 'special'. The ISO 639-5 collection + 'gem' (Germanic languages) has a 'Scope' of 'collection'. + +3.1.12. Comments Field + + The field 'Comments' contains additional information about the record + and MAY appear more than once per record. The field-body MAY include + the full range of Unicode characters and is not restricted to any + particular script. 
This field MAY be inserted or changed via the + registration process, and no guarantee of stability is provided. + + The content of this field is not restricted, except by the need to + register the information, the suitability of the request, and by + reasonable practical size limitations. The primary reason for the + 'Comments' field is subtag identification -- to help distinguish the + subtag from others with which it might be confused as an aid to + usage. Large amounts of information about the use, history, or + general background of a subtag are frowned upon, as these generally + belong in a registration request rather than in the registry. + + + + +Phillips & Davis Best Current Practice [Page 34] + +RFC 5646 Language Tags September 2009 + + +3.2. Language Subtag Reviewer + + The Language Subtag Reviewer moderates the ietf-languages@iana.org + mailing list, responds to requests for registration, and performs the + other registry maintenance duties described in Section 3.3. Only the + Language Subtag Reviewer is permitted to request IANA to change, + update, or add records to the Language Subtag Registry. The Language + Subtag Reviewer MAY delegate list moderation and other clerical + duties as needed. + + The Language Subtag Reviewer is appointed by the IESG for an + indefinite term, subject to removal or replacement at the IESG's + discretion. The IESG will solicit nominees for the position (upon + adoption of this document or upon a vacancy) and then solicit + feedback on the nominees' qualifications. Qualified candidates + should be familiar with BCP 47 and its requirements; be willing to + fairly, responsively, and judiciously administer the registration + process; and be suitably informed about the issues of language + identification so that the reviewer can assess the claims and draw + upon the contributions of language experts and subtag requesters. 
+ + The subsequent performance or decisions of the Language Subtag + Reviewer MAY be appealed to the IESG under the same rules as other + IETF decisions (see [RFC2026]). The IESG can reverse or overturn the + decisions of the Language Subtag Reviewer, provide guidance, or take + other appropriate actions. + +3.3. Maintenance of the Registry + + Maintenance of the registry requires that, as codes are assigned or + withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language + Subtag Reviewer MUST evaluate each change and determine the + appropriate course of action according to the rules in this document. + Such updates follow the registration process described in + Section 3.5. Usually, the Language Subtag Reviewer will start the + process for the new or updated record by filling in the registration + form and submitting it. If a change to one of these standards takes + place and the Language Subtag Reviewer does not do this in a timely + manner, then any interested party MAY submit the form. Thereafter, + the registration process continues normally. + + Note that some registrations affect other subtags--perhaps more than + one--as when a region subtag is being deprecated in favor of a new + value. The Language Subtag Reviewer is responsible for ensuring that + any such changes are properly registered, with each change requiring + its own registration form. + + + + + +Phillips & Davis Best Current Practice [Page 35] + +RFC 5646 Language Tags September 2009 + + + The Language Subtag Reviewer MUST ensure that new subtags meet the + requirements elsewhere in this document (and most especially in + Section 3.4) or submit an appropriate registration form for an + alternate subtag as described in that section. Each individual + subtag affected by a change MUST be sent to the + ietf-languages@iana.org list with its own registration form and in a + separate message. + +3.4. 
Stability of IANA Registry Entries + + The stability of entries and their meaning in the registry is + critical to the long-term stability of language tags. The rules in + this section guarantee that a specific language tag's meaning is + stable over time and will not change. + + These rules specifically deal with how changes to codes (including + withdrawal and deprecation of codes) maintained by ISO 639, ISO + 15924, ISO 3166, and UN M.49 are reflected in the IANA Language + Subtag Registry. Assignments to the IANA Language Subtag Registry + MUST follow the following stability rules: + + 1. Values in the fields 'Type', 'Subtag', 'Tag', and 'Added' MUST + NOT be changed and are guaranteed to be stable over time. + + 2. Values in the fields 'Preferred-Value' and 'Deprecated' MAY be + added, altered, or removed via the registration process. These + changes SHOULD be limited to changes necessary to mirror changes + in one of the underlying standards (ISO 639, ISO 15924, ISO + 3166-1, or UN M.49) and typically alteration or removal of a + 'Preferred-Value' is limited specifically to region codes. + + 3. Values in the 'Description' field MUST NOT be changed in a way + that would invalidate any existing tags. The description MAY be + broadened somewhat in scope, changed to add information, or + adapted to the most common modern usage. For example, countries + occasionally change their names; a historical example of this is + "Upper Volta" changing to "Burkina Faso". + + 4. Values in the field 'Prefix' MAY be added to existing records of + type 'variant' via the registration process, provided the + 'variant' already has at least one 'Prefix'. A 'Prefix' field + SHALL NOT be registered for any 'variant' that has no existing + 'Prefix' field. If a prefix is added to a variant record, + 'Comments' fields MAY be used to explain different usages with + the various prefixes.
+ + + + + + +Phillips & Davis Best Current Practice [Page 36] + +RFC 5646 Language Tags September 2009 + + + 5. Values in the field 'Prefix' in records of type 'variant' MAY + also be modified, so long as the modifications broaden the set + of prefixes. That is, a prefix MAY be replaced by one of its + own prefixes. For example, the prefix "en-US" could be replaced + by "en", but not by the prefixes "en-Latn", "fr", or "en-US- + boont". If one of those prefix values were needed, it would + have to be separately registered. + + 6. Values in the field 'Prefix' in records of type 'extlang' MUST + NOT be added, modified, or removed. + + 7. The field 'Prefix' MUST NOT be removed from any record in which + it appears. This field SHOULD be included in the initial + registration of any records of type 'variant' and MUST be + included in any records of type 'extlang'. + + 8. The field 'Comments' MAY be added, changed, modified, or removed + via the registration process or any of the processes or + considerations described in this section. + + 9. The field 'Suppress-Script' MAY be added or removed via the + registration process. + + 10. The field 'Macrolanguage' MAY be added or removed via the + registration process, but only in response to changes made by + ISO 639. The 'Macrolanguage' field appears whenever a language + has a corresponding macrolanguage in ISO 639. That is, the + 'Macrolanguage' fields in the registry exactly match those of + ISO 639. No other macrolanguage mappings will be considered for + registration. + + 11. The field 'Scope' MAY be added or removed from a primary or + extended language subtag after initial registration, and it MAY + be modified in order to match any changes made by ISO 639. + Changes to the 'Scope' field MUST mirror changes made by ISO + 639. Note that primary or extended language subtags whose + records do not contain a 'Scope' field (that is, most of them) + are individual languages as described in Section 3.1.11. + + 12. 
Primary and extended language subtags (other than independently + registered values created using the registration process) are + created according to the assignments of the various parts of ISO + 639, as follows: + + A. Codes assigned by ISO 639-1 that do not conflict with + existing two-letter primary language subtags and that have + no corresponding three-letter primary defined in the + registry are entered into the IANA registry as new records + + + +Phillips & Davis Best Current Practice [Page 37] + +RFC 5646 Language Tags September 2009 + + + of type 'language'. Note that languages given an ISO 639-1 + code cannot be given extended language subtags, even if + encompassed by a macrolanguage. + + B. Codes assigned by ISO 639-3 or ISO 639-5 that do not + conflict with existing three-letter primary language subtags + and that do not have ISO 639-1 codes assigned (or expected + to be assigned) are entered into the IANA registry as new + records of type 'language'. Note that these two standards + now comprise a superset of ISO 639-2 codes. Codes that have + a defined 'macrolanguage' mapping at the time of their + registration MUST contain a 'Macrolanguage' field. + + C. Codes assigned by ISO 639-3 MAY also be considered for an + extended language subtag registration. Note that they MUST + be assigned a primary language subtag record of type + 'language' even when an 'extlang' record is proposed. When + considering extended language subtag assignment, these + criteria apply: + + 1. If a language has a macrolanguage mapping, and that + macrolanguage has other encompassed languages that are + assigned extended language subtags, then the new + language SHOULD have an 'extlang' record assigned to it + as well. For example, any language with a macrolanguage + of 'zh' or 'ar' would be assigned an 'extlang' record. + + 2. 'Extlang' records SHOULD NOT be created for languages if + other languages encompassed by the macrolanguage do not + also include 'extlang' records. 
For example, if a new + Serbo-Croatian ('sh') language were registered, it would + not get an extlang record because other languages + encompassed, such as Serbian ('sr'), do not include one + in the registry. + + 3. Sign languages SHOULD have an 'extlang' record with a + 'Prefix' of 'sgn'. + + 4. 'Extlang' records MUST NOT be created for items already + in the registry. Extended language subtags will only be + considered at the time of initial registration. + + 5. Extended language subtag records MUST include the fields + 'Prefix' and 'Preferred-Value' with field values + assigned as described in Section 2.2.2. + + D. Any other codes assigned by ISO 639-2 that do not conflict + with existing three-letter primary or extended language + + + +Phillips & Davis Best Current Practice [Page 38] + +RFC 5646 Language Tags September 2009 + + + subtags and that do not have ISO 639-1 two-letter codes + assigned are entered into the IANA registry as new records + of type 'language'. This type of registration is not + supposed to occur in the future. + + 13. Codes assigned by ISO 15924 and ISO 3166-1 that do not conflict + with existing subtags of the associated type and whose meaning + is not the same as an existing subtag of the same type are + entered into the IANA registry as new records. + + 14. Codes assigned by ISO 639, ISO 15924, or ISO 3166-1 that are + withdrawn by their respective maintenance or registration + authority remain valid in language tags. A 'Deprecated' field + containing the date of withdrawal MUST be added to the record. + If a new record of the same type is added that represents a + replacement value, then a 'Preferred-Value' field MAY also be + added. The registration process MAY be used to add comments + about the withdrawal of the code by the respective standard. 
+ + For example: the region code 'TL' was assigned to the country + 'Timor-Leste', replacing the code 'TP' (which was assigned to + 'East Timor' when it was under administration by Portugal). + The subtag 'TP' remains valid in language tags, but its + record contains the 'Preferred-Value' of 'TL' and its field + 'Deprecated' contains the date the new code was assigned + ('2004-07-06'). + + 15. Codes assigned by ISO 639, ISO 15924, or ISO 3166-1 that + conflict with existing subtags of the associated type, including + subtags that are deprecated, MUST NOT be entered into the + registry. The following additional considerations apply to + subtag values that are reassigned: + + A. For ISO 639 codes, if the newly assigned code's meaning is + not represented by a subtag in the IANA registry, the + Language Subtag Reviewer, as described in Section 3.5, SHALL + prepare a proposal for entering in the IANA registry, as + soon as practical, a registered language subtag as an + alternate value for the new code. The form of the + registered language subtag will be at the discretion of the + Language Subtag Reviewer and MUST conform to other + restrictions on language subtags in this document. + + B. For all subtags whose meaning is derived from an external + standard (that is, by ISO 639, ISO 15924, ISO 3166-1, or UN + M.49), if a new meaning is assigned to an existing code and + the new meaning broadens the meaning of that code, then the + meaning for the associated subtag MAY be changed to match. + + + +Phillips & Davis Best Current Practice [Page 39] + +RFC 5646 Language Tags September 2009 + + + The meaning of a subtag MUST NOT be narrowed, however, as + this can result in an unknown proportion of the existing + uses of a subtag becoming invalid. Note: the ISO 639 + registration authority (RA) has adopted a similar stability + policy. + + C. 
For ISO 15924 codes, if the newly assigned code's meaning is + not represented by a subtag in the IANA registry, the + Language Subtag Reviewer, as described in Section 3.5, SHALL + prepare a proposal for entering in the IANA registry, as + soon as practical, a registered variant subtag as an + alternate value for the new code. The form of the + registered variant subtag will be at the discretion of the + Language Subtag Reviewer and MUST conform to other + restrictions on variant subtags in this document. + + D. For ISO 3166-1 codes, if the newly assigned code's meaning + is associated with the same UN M.49 code as another 'region' + subtag, then the existing region subtag remains as the + preferred value for that region and no new entry is created. + A comment MAY be added to the existing region subtag + indicating the relationship to the new ISO 3166-1 code. + + E. For ISO 3166-1 codes, if the newly assigned code's meaning + is associated with a UN M.49 code that is not represented by + an existing region subtag, then the Language Subtag + Reviewer, as described in Section 3.5, SHALL prepare a + proposal for entering the appropriate UN M.49 country code + as an entry in the IANA registry. + + F. For ISO 3166-1 codes, if there is no associated UN numeric + code, then the Language Subtag Reviewer SHALL petition the + UN to create one. If there is no response from the UN + within 90 days of the request being sent, the Language + Subtag Reviewer SHALL prepare a proposal for entering in the + IANA registry, as soon as practical, a registered variant + subtag as an alternate value for the new code. The form of + the registered variant subtag will be at the discretion of + the Language Subtag Reviewer and MUST conform to other + restrictions on variant subtags in this document. This + situation is very unlikely to ever occur. + + 16. 
UN M.49 has codes for both "countries and areas" (such as '276' + for Germany) and "geographical regions and sub-regions" (such as + '150' for Europe). UN M.49 country or area codes for which + there is no corresponding ISO 3166-1 code MUST NOT be + registered, except as a surrogate for an ISO 3166-1 code that is + blocked from registration by an existing subtag. + + + +Phillips & Davis Best Current Practice [Page 40] + +RFC 5646 Language Tags September 2009 + + + If such a code becomes necessary, then the maintenance agency + for ISO 3166-1 SHALL first be petitioned to assign a code to the + region. If the petition for a code assignment by ISO 3166-1 is + refused or not acted on in a timely manner, the registration + process described in Section 3.5 can then be used to register + the corresponding UN M.49 code. This way, UN M.49 codes remain + available as the value of last resort in cases where ISO 3166-1 + reassigns a deprecated value in the registry. + + 17. The redundant and grandfathered entries together form the + complete list of tags registered under [RFC3066]. The redundant + tags are those previously registered tags that can now be formed + using the subtags defined in the registry. The grandfathered + entries include those that can never be legal because they are + 'irregular' (that is, they do not match the 'langtag' production + in Figure 1), are limited by rule (subtags such as 'nyn' and + 'min' look like the extlang production, but cannot be registered + as extended language subtags), or their subtags are + inappropriate for registration. All of the grandfathered tags + are listed in either the 'regular' or the 'irregular' + productions in the ABNF. Under [RFC4646] it was possible for + grandfathered tags to become redundant. However, all of the + tags for which this was possible became redundant before this + document was produced. 
So the set of redundant and + grandfathered tags is now permanent and immutable: new entries + of either type MUST NOT be added and existing entries MUST NOT + be removed. The decision-making process about which tags were + initially grandfathered and which were made redundant is + described in [RFC4645]. + + Many of the grandfathered tags are deprecated -- indeed, they + were deprecated even before [RFC4646]. For example, the tag + "art-lojban" was deprecated in favor of the primary language + subtag 'jbo'. These tags could have been made 'redundant' by + registering some of their subtags as 'variants'. The 'variant- + like' subtags in the grandfathered registrations SHALL NOT be + registered in the future, even with a similar or identical + meaning. + +3.5. Registration Procedure for Subtags + + The procedure given here MUST be used by anyone who wants to use a + subtag not currently in the IANA Language Subtag Registry or who + wishes to add, modify, update, or remove information in existing + records as permitted by this document. + + Only subtags of type 'language' and 'variant' will be considered for + independent registration of new subtags. Subtags needed for + + + +Phillips & Davis Best Current Practice [Page 41] + +RFC 5646 Language Tags September 2009 + + + stability and subtags necessary to keep the registry synchronized + with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits + defined by this document also use this process, as described in + Section 3.3 and subject to stability provisions as described in + Section 3.4. + + Registration requests are accepted relating to information in the + 'Comments', 'Deprecated', 'Description', 'Prefix', 'Preferred-Value', + 'Macrolanguage', or 'Suppress-Script' fields in a subtag's record as + described in Section 3.4. Changes to all other fields in the IANA + registry are NOT permitted. 
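As an illustration of the field restriction above, the following minimal sketch checks a proposed record change against the list of fields that registration requests may address. The record representation (a dict mapping field names to lists of values) and the names `MODIFIABLE_FIELDS` and `disallowed_changes` are assumptions made for this example; they are not defined by this document or by IANA.

```python
# Fields that Section 3.5 lists as open to registration requests.
MODIFIABLE_FIELDS = frozenset({
    "Comments", "Deprecated", "Description", "Prefix",
    "Preferred-Value", "Macrolanguage", "Suppress-Script",
})


def disallowed_changes(old_record, new_record):
    """Return the sorted names of changed fields that may not be changed.

    Both records are hypothetical dicts mapping a field name to a list
    of values, e.g. {"Type": ["variant"], "Subtag": ["rozaj"]}.
    """
    names = set(old_record) | set(new_record)
    return sorted(
        name for name in names
        if old_record.get(name) != new_record.get(name)
        and name not in MODIFIABLE_FIELDS
    )
```

An empty result means the request touches only fields that are open to change; it says nothing about whether the change also satisfies the stability rules in Section 3.4.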
+ + Registering a new subtag or requesting modifications to an existing + tag or subtag starts with the requester filling out the registration + form reproduced below. Note that each response is not limited in + size so that the request can adequately describe the registration. + The fields in the "Record Requested" section need to follow the + requirements in Section 3.1 before the record will be approved. + + LANGUAGE SUBTAG REGISTRATION FORM + 1. Name of requester: + 2. E-mail address of requester: + 3. Record Requested: + + Type: + Subtag: + Description: + Prefix: + Preferred-Value: + Deprecated: + Suppress-Script: + Macrolanguage: + Comments: + + 4. Intended meaning of the subtag: + 5. Reference to published description + of the language (book or article): + 6. Any other relevant information: + + Figure 5: The Language Subtag Registration Form + + Examples of completed registration forms can be found in Appendix B. + A complete list of approved registration forms is online through + http://www.iana.org; readers should note that the Language Tag + Registry is now obsolete and should instead look for the Language + Subtag Registry. + + + + + +Phillips & Davis Best Current Practice [Page 42] + +RFC 5646 Language Tags September 2009 + + + The subtag registration form MUST be sent to + . Registration requests receive a two-week + review period before being approved and submitted to IANA for + inclusion in the registry. If modifications are made to the request + during the course of the registration process (such as corrections to + meet the requirements in Section 3.1 or to make the 'Description' + fields unique for the given record type), the modified form MUST also + be sent to at least one week prior to + submission to IANA. + + The ietf-languages list is an open list and can be joined by sending + a request to . The list can be + hosted by IANA or any third party at the request of IESG. 
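The timing rules for a request (a two-week review period, plus at least one week on the list for any modified form before submission to IANA) can be sketched as a small date calculation. This is only an illustration: the function name and parameters are hypothetical, and the sketch does not model extensions of the review period granted by the Language Subtag Reviewer.

```python
from datetime import date, timedelta

REVIEW_PERIOD = timedelta(weeks=2)         # two-week review period
MODIFIED_FORM_NOTICE = timedelta(weeks=1)  # modified form: one week on the list


def earliest_forwarding_date(submitted, last_modified=None):
    """Earliest date a request could be forwarded to IANA.

    submitted: date the original form was sent to the list.
    last_modified: date the final modified form was sent, if any.
    """
    earliest = submitted + REVIEW_PERIOD
    if last_modified is not None:
        earliest = max(earliest, last_modified + MODIFIED_FORM_NOTICE)
    return earliest
```

For a form sent on 2009-09-01 and modified on 2009-09-12, the one-week notice for the modified form is the binding constraint, so the result is later than the end of the two-week review.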
+ + Before forwarding any registration to IANA, the Language Subtag + Reviewer MUST ensure that all requirements in this document are met. + This includes ensuring that values in the 'Subtag' field match case + according to the description in Section 3.1.4 and that 'Description' + fields are unique for the given record type as described in + Section 3.1.5. The Reviewer MUST also ensure that an appropriate + File-Date record is included in the request, to assist IANA when + updating the registry (see Section 5.1). + + Some fields in both the registration form and the registry + record itself permit the use of non-ASCII characters. Registration + requests SHOULD use the UTF-8 encoding for consistency and clarity. + However, since some mail clients do not support this encoding, other + encodings MAY be used for the registration request. The Language + Subtag Reviewer is responsible for ensuring that the proper Unicode + characters appear in both the archived request form and the registry + record. In the case of a transcription or encoding error by IANA, + the Language Subtag Reviewer will request that the registry be + repaired, providing any necessary information to assist IANA. + + Extended language subtags (type 'extlang'), by definition, are always + encompassed by another language. All records of type 'extlang' MUST, + therefore, contain a 'Prefix' field at the time of registration. + This 'Prefix' field can never be altered or removed, and requests to + do so MUST be rejected. + + Variant subtags are usually registered for use with a particular + range of language tags, and variant subtags based on the terminology + of the language to which they apply are encouraged. For example, + the subtag 'rozaj' (Resian) is intended for use with language tags + that start with the primary language subtag "sl" (Slovenian), since + Resian is a dialect of Slovenian. Thus, the subtag 'rozaj' would be + appropriate in tags such as "sl-Latn-rozaj" or "sl-IT-rozaj".
This + information is stored in the 'Prefix' field in the registry. Variant + registration requests SHOULD include at least one 'Prefix' field in + the registration form. + + Requests to assign an additional record of a given type with an + existing subtag value MUST be rejected. For example, the variant + subtag 'rozaj' already exists in the registry, so adding a second + record of type 'variant' with the subtag 'rozaj' is prohibited. + + The 'Prefix' field for a given registered variant subtag exists in + the IANA registry as a guide to usage. Additional 'Prefix' fields + MAY be added by filing an additional registration form. In that + form, the "Any other relevant information:" field MUST indicate that + it is the addition of a prefix. + + Requests to add a 'Prefix' field to a variant subtag that imply a + different semantic meaning SHOULD be rejected. For example, a + request to add the prefix "de" to the subtag '1994' so that the tag + "de-1994" represented some German dialect or orthographic form would + be rejected. The '1994' subtag represents a particular Slovenian + orthography, and the additional registration would change or blur the + semantic meaning assigned to the subtag. A separate subtag SHOULD be + proposed instead. + + Requests to add a 'Prefix' to a variant subtag that has no current + 'Prefix' field MUST be rejected. Variants are registered with no + prefix because they are potentially useful with many or even all + languages. Adding one or more 'Prefix' fields would be potentially + harmful to the use of the variant, since it dramatically reduces the + scope of the subtag (which is not allowed under the stability rules + (Section 3.4)) as opposed to broadening the scope of the subtag, which + is what the addition of a 'Prefix' normally does.
An example of such + a "no-prefix" variant is the subtag 'fonipa', which represents the + International Phonetic Alphabet, a scheme that can be used to + transcribe many languages. + + The 'Description' fields provided in the request MUST contain at + least one description written or transcribed into the Latin script; + the request MAY also include additional 'Description' fields in any + script or language. The 'Description' field is used for + identification purposes and doesn't necessarily represent the actual + native name of the language or variation. It also doesn't have to be + in any particular language, but SHOULD be both suitable and + sufficient to identify the item in the record. The Language Subtag + Reviewer will check and edit any proposed 'Description' fields so as + to ensure uniqueness and prevent collisions with 'Description' fields + in other records of the same type. If this occurs in an independent + registration request, the Language Subtag Reviewer MUST resubmit the + record to , treating it as a modification of + + + +Phillips & Davis Best Current Practice [Page 44] + +RFC 5646 Language Tags September 2009 + + + a request due to discussion, as described in Section 3.5, unless the + request's sole purpose is to introduce a duplicate 'Description' + field, in which case the request SHALL be rejected. + + The 'Description' field is not guaranteed to be stable. Corrections + or clarifications of intent are examples of possible changes. + Attempts to provide translations or transcriptions of entries in the + registry (which, by definition, provide no new information) are + unlikely to be approved. + + Soon after the two-week review period has passed, the Language Subtag + Reviewer MUST take one of the following actions: + + o Explicitly accept the request and forward the form containing the + record to be inserted or modified to according to + the procedure described in Section 3.3. 
+ + o Explicitly reject the request because of significant objections + raised on the list or due to problems with constraints in this + document (which MUST be explicitly cited). + + o Extend the review period by granting an additional two-week + increment to permit further discussion. After each two-week + increment, the Language Subtag Reviewer MUST indicate on the list + whether the registration has been accepted, rejected, or extended. + + Note that the Language Subtag Reviewer MAY raise objections on the + list if he or she so desires. The important thing is that the + objection MUST be made publicly. + + Sometimes the request needs to be modified as a result of discussion + during the review period or due to requirements in this document. + The applicant, Language Subtag Reviewer, or others MAY submit a + modified version of the completed registration form, which will be + considered in lieu of the original request with the explicit approval + of the applicant. Such changes do not restart the two-week + discussion period, although an application containing the final + record submitted to IANA MUST appear on the list at least one week + prior to the Language Subtag Reviewer forwarding the record to IANA. + The applicant MAY modify a rejected application with more appropriate + or additional information and submit it again; this starts a new two- + week comment period. + + Registrations initiated due to the provisions of Section 3.3 or + Section 3.4 SHALL NOT be rejected altogether (since they have to + ultimately appear in the registry) and SHOULD be completed as quickly + as possible. The review process allows list members to comment on + the specific information in the form and the record it contains and + thus help ensure that it is correct and consistent.
The Language + Subtag Reviewer MAY reject a specific version of the form, but MUST + propose a suitable replacement, extending the review period as + described above, until the form is in a format worthy of the + reviewer's approval and meets with rough consensus of the list. + + Decisions made by the Language Subtag Reviewer MAY be appealed to the + IESG [RFC2028] under the same rules as other IETF decisions + [RFC2026]. This includes a decision to extend the review period or + the failure to announce a decision in a clear and timely manner. + + The approved records appear in the Language Subtag Registry. The + approved registration forms are available online from + http://www.iana.org. + + Updates or changes to existing records follow the same procedure as + new registrations. The Language Subtag Reviewer decides whether + there is consensus to update the registration following the two-week + review period; normally, objections by the original registrant will + carry extra weight in forming such a consensus. + + Registrations are permanent and stable. Once registered, subtags + will not be removed from the registry and will remain a valid way in + which to specify a specific language or variant. + + Note: The purpose of the "Reference to published description" section + in the registration form is to aid in verifying whether a language is + registered or to which language or language variation a particular + subtag refers. In most cases, reference to an authoritative grammar + or dictionary of that language will be useful; in cases where no such + work exists, other well-known works describing that language or in + that language MAY be appropriate. The Language Subtag Reviewer + decides what constitutes "good enough" reference material. This + requirement is not intended to exclude particular languages or + dialects due to the size of the speaker population or lack of a + standardized orthography. 
Minority languages will be considered + equally on their own merits. + +3.6. Possibilities for Registration + + Possibilities for registration of subtags or information about + subtags include: + + o Primary language subtags for languages not listed in ISO 639 that + are not variants of any listed or registered language MAY be + registered. At the time this document was created, there were no + examples of this form of subtag. Before attempting to register a + language subtag, there MUST be an attempt to register the language + + + +Phillips & Davis Best Current Practice [Page 46] + +RFC 5646 Language Tags September 2009 + + + with ISO 639. Subtags MUST NOT be registered for languages + defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3; + that are under consideration by the ISO 639 registration + authorities; or that have never been attempted for registration + with those authorities. If ISO 639 has previously rejected a + language for registration, it is reasonable to assume that there + must be additional, very compelling evidence of need before it + will be registered as a primary language subtag in the IANA + registry (to the extent that it is very unlikely that any subtags + will be registered of this type). + + o Dialect or other divisions or variations within a language, its + orthography, writing system, regional or historical usage, + transliteration or other transformation, or distinguishing + variation MAY be registered as variant subtags. An example is the + 'rozaj' subtag (the Resian dialect of Slovenian). + + o The addition or maintenance of fields (generally of an + informational nature) in tag or subtag records as described in + Section 3.1 is allowed. Such changes are subject to the stability + provisions in Section 3.4. 
This includes 'Description', + 'Comments', 'Deprecated', and 'Preferred-Value' fields for + obsolete or withdrawn codes, or the addition of 'Suppress-Script' + or 'Macrolanguage' fields to primary language subtags, as well as + other changes permitted by this document, such as the addition of + an appropriate 'Prefix' field to a variant subtag. + + o The addition of records and related field value changes necessary + to reflect assignments made by ISO 639, ISO 15924, ISO 3166-1, and + UN M.49 as described in Section 3.4 is allowed. + + Subtags proposed for registration that would cause all or part of a + grandfathered tag to become redundant but whose meaning conflicts + with or alters the meaning of the grandfathered tag MUST be rejected. + + This document leaves the decision on what subtags or changes to + subtags are appropriate (or not) to the registration process + described in Section 3.5. + + Note: Four-character primary language subtags are reserved to allow + for the possibility of alpha4 codes in some future addition to the + ISO 639 family of standards. + + ISO 639 defines a registration authority for additions to and changes + in the list of languages in ISO 639. This agency is: + + + + + + +Phillips & Davis Best Current Practice [Page 47] + +RFC 5646 Language Tags September 2009 + + + International Information Centre for Terminology (Infoterm) + Aichholzgasse 6/12, AT-1120 + Wien, Austria + Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72 + + ISO 639-2 defines a registration authority for additions to and + changes in the list of languages in ISO 639-2. This agency is: + + Library of Congress + Network Development and MARC Standards Office + Washington, DC 20540, USA + Phone: +1 202 707 6237 Fax: +1 202 707 0115 + URL: http://www.loc.gov/standards/iso639-2 + + ISO 639-3 defines a registration authority for additions to and + changes in the list of languages in ISO 639-3. This agency is: + + SIL International + ISO 639-3 Registrar + 7500 W. 
Camp Wisdom Rd. + Dallas, TX 75236, USA + Phone: +1 972 708 7400, ext. 2293 + Fax: +1 972 708 7546 + Email: iso639-3@sil.org + URL: http://www.sil.org/iso639-3 + + ISO 639-5 defines a registration authority for additions to and + changes in the list of languages in ISO 639-5. This agency is the + same as for ISO 639-2 and is: + + Library of Congress + Network Development and MARC Standards Office + Washington, DC 20540, USA + Phone: +1 202 707 6237 + Fax: +1 202 707 0115 + URL: http://www.loc.gov/standards/iso639-5 + + The maintenance agency for ISO 3166-1 (country codes) is: + + ISO 3166 Maintenance Agency + c/o International Organization for Standardization + Case postale 56 + CH-1211 Geneva 20, Switzerland + Phone: +41 22 749 72 33 Fax: +41 22 749 73 49 + URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html + + + + + + +Phillips & Davis Best Current Practice [Page 48] + +RFC 5646 Language Tags September 2009 + + + The registration authority for ISO 15924 (script codes) is: + + Unicode Consortium + Box 391476 + Mountain View, CA 94039-1476, USA + URL: http://www.unicode.org/iso15924 + + The Statistics Division of the United Nations Secretariat maintains + the Standard Country or Area Codes for Statistical Use and can be + reached at: + + Statistical Services Branch + Statistics Division + United Nations, Room DC2-1620 + New York, NY 10017, USA + Fax: +1-212-963-0623 + Email: statistics@un.org + URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm + +3.7. Extensions and the Extensions Registry + + Extension subtags are those introduced by single-character subtags + ("singletons") other than 'x'. They are reserved for the generation + of identifiers that contain a language component and are compatible + with applications that understand language tags. 
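To make the definition of extension singletons concrete, the following sketch lists the singletons that introduce extensions in a tag. It is a simplified illustration rather than a validating parser: the function name is hypothetical, the input is assumed to be an already well-formed tag, and everything from a private use 'x' singleton onward is ignored because those subtags are not extensions.

```python
def extension_singletons(tag):
    """Return the singletons that introduce extensions in a language tag.

    The first subtag is skipped because it is the primary language
    position (or the 'x' or 'i' of a private use or grandfathered tag).
    Scanning stops at 'x', which starts the private use sequence.
    """
    subtags = tag.lower().split("-")
    found = []
    for subtag in subtags[1:]:
        if len(subtag) != 1:
            continue  # not a singleton
        if subtag == "x":
            break  # private use: later single characters are not extensions
        found.append(subtag)
    return found
```

Lowercasing before comparison reflects the rule that case is not significant in language tags, so "en-A-bbb" and "en-a-bbb" yield the same result.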
+ + The structure and form of extensions are defined by this document so + that implementations can be created that are forward compatible with + applications that might be created using singletons in the future. + In addition, defining a mechanism for maintaining singletons will + lend stability to this document by reducing the likely need for + future revisions or updates. + + Single-character subtags are assigned by IANA using the "IETF Review" + policy defined by [RFC5226]. This policy requires the development of + an RFC, which SHALL define the name, purpose, processes, and + procedures for maintaining the subtags. The maintaining or + registering authority, including name, contact email, discussion list + email, and URL location of the registry, MUST be indicated clearly in + the RFC. The RFC MUST specify or include each of the following: + + o The specification MUST reference the specific version or revision + of this document that governs its creation and MUST reference this + section of this document. + + o The specification and all subtags defined by the specification + MUST follow the ABNF and other rules for the formation of tags and + subtags as defined in this document. In particular, it MUST + + + +Phillips & Davis Best Current Practice [Page 49] + +RFC 5646 Language Tags September 2009 + + + specify that case is not significant and that subtags MUST NOT + exceed eight characters in length. + + o The specification MUST specify a canonical representation. + + o The specification of valid subtags MUST be available over the + Internet and at no cost. + + o The specification MUST be in the public domain or available via a + royalty-free license acceptable to the IETF and specified in the + RFC. + + o The specification MUST be versioned, and each version of the + specification MUST be numbered, dated, and stable. + + o The specification MUST be stable. 
That is, extension subtags, + once defined by a specification, MUST NOT be retracted or change + in meaning in any substantial way. + + o The specification MUST include, in a separate section, the + registration form reproduced in this section (below) to be used in + registering the extension upon publication as an RFC. + + o IANA MUST be informed of changes to the contact information and + URL for the specification. + + IANA will maintain a registry of allocated single-character + (singleton) subtags. This registry MUST use the record-jar format + described by the ABNF in Section 3.1.1. Upon publication of an + extension as an RFC, the maintaining authority defined in the RFC + MUST forward this registration form to , who MUST + forward the request to . The maintaining authority of + the extension MUST maintain the accuracy of the record by sending an + updated full copy of the record to with the subject + line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only + the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY + be modified in these updates. + + Failure to maintain this record, maintain the corresponding registry, + or meet other conditions imposed by this section of this document MAY + be appealed to the IESG [RFC2028] under the same rules as other IETF + decisions (see [RFC2026]) and MAY result in the authority to maintain + the extension being withdrawn or reassigned by the IESG. + + + + + + + + +Phillips & Davis Best Current Practice [Page 50] + +RFC 5646 Language Tags September 2009 + + + %% + Identifier: + Description: + Comments: + Added: + RFC: + Authority: + Contact_Email: + Mailing_List: + URL: + %% + + Figure 6: Format of Records in the Language Tag Extensions Registry + + 'Identifier' contains the single-character subtag (singleton) + assigned to the extension. 
The Internet-Draft submitted to define + the extension SHOULD specify which letter or digit to use, although + the IESG MAY change the assignment when approving the RFC. + + 'Description' contains the name and description of the extension. + + 'Comments' is an OPTIONAL field and MAY contain a broader description + of the extension. + + 'Added' contains the date the extension's RFC was published in the + "full-date" format specified in [RFC3339]. For example: 2004-06-28 + represents June 28, 2004, in the Gregorian calendar. + + 'RFC' contains the RFC number assigned to the extension. + + 'Authority' contains the name of the maintaining authority for the + extension. + + 'Contact_Email' contains the email address used to contact the + maintaining authority. + + 'Mailing_List' contains the URL or subscription email address of the + mailing list used by the maintaining authority. + + 'URL' contains the URL of the registry for this extension. + + The determination of whether an Internet-Draft meets the above + conditions and the decision to grant or withhold such authority rests + solely with the IESG and is subject to the normal review and appeals + process associated with the RFC process. + + Extension authors are strongly cautioned that many (including most + well-formed) processors will be unaware of any special relationships + + + +Phillips & Davis Best Current Practice [Page 51] + +RFC 5646 Language Tags September 2009 + + + or meaning inherent in the order of extension subtags. Extension + authors SHOULD avoid subtag relationships or canonicalization + mechanisms that interfere with matching or with length restrictions + that sometimes exist in common protocols where the extension is used. 
+ In particular, applications MAY truncate the subtags in doing + matching or in fitting into limited lengths, so it is RECOMMENDED + that the most significant information be in the most significant + (left-most) subtags and that the specification gracefully handle + truncated subtags. + + When a language tag is to be used in a specific, known protocol, it + is RECOMMENDED that the language tag not contain extensions not + supported by that protocol. In addition, note that some protocols + MAY impose upper limits on the length of the strings used to store or + transport the language tag. + +3.8. Update of the Language Subtag Registry + + After the adoption of this document, the IANA Language Subtag + Registry needed an update so that it would contain the complete set + of subtags valid in a language tag. [RFC5645] describes the process + used to create this update. + + Registrations that are in process under the rules defined in + [RFC4646] when this document is adopted MUST be completed under the + rules contained in this document. + +3.9. Applicability of the Subtag Registry + + The Language Subtag Registry is the source of data elements used to + construct language tags, following the rules described in this + document. Language tags are designed for indicating linguistic + attributes of various content, including not only text but also most + media formats, such as video or audio. They also form the basis for + language and locale negotiation in various protocols and APIs. + + The registry is therefore applicable to many applications that need + some form of language identification, with these limitations: + + o It is not designed to be the sole data source in the creation of a + language-selection user interface. For example, the registry does + not contain translations for subtag descriptions or for tags + composed from the subtags. Sources for localized data based on + the registry are generally available, notably [CLDR]. 
Nor does + the registry indicate which subtag combinations are particularly + useful or relevant. + + + + + +Phillips & Davis Best Current Practice [Page 52] + +RFC 5646 Language Tags September 2009 + + + o It does not provide information indicating relationships between + different languages, such as might be used in a user interface to + select language tags hierarchically, regionally, or on some other + organizational model. + + o It does not supply information about potential overlap between + different language tags, as the notion of what constitutes a + language is not precise: several different language tags might be + reasonable choices for the same given piece of content. + + o It does not contain information about appropriate fallback choices + when performing language negotiation. A good fallback language + might be linguistically unrelated to the specified language. The + fact that one language is often used as a fallback language for + another is usually a result of outside factors, such as geography, + history, or culture -- factors that might not apply in all cases. + For example, most people who use Breton (a Celtic language used in + the Northwest of France) would probably prefer to be served French + (a Romance language) if Breton isn't available. + +4. Formation and Processing of Language Tags + + This section addresses how to use the information in the registry + with the tag syntax to choose, form, and process language tags. + +4.1. Choice of Language Tag + + The guiding principle in forming language tags is to "tag content + wisely." Sometimes there is a choice between several possible tags + for the same content. The choice of which tag to use depends on the + content and application in question, and some amount of judgment + might be necessary when selecting a tag. + + Interoperability is best served when the same language tag is used + consistently to represent the same language. 
If an application has + requirements that make the rules here inapplicable, then that + application risks damaging interoperability. It is strongly + RECOMMENDED that users not define their own rules for language tag + choice. + + Standards, protocols, and applications that reference this document + normatively but apply different rules to the ones given in this + section MUST specify how language tag selection varies from the + guidelines given here. + + To ensure consistent backward compatibility, this document contains + several provisions to account for potential instability in the + standards used to define the subtags that make up language tags. + + + +Phillips & Davis Best Current Practice [Page 53] + +RFC 5646 Language Tags September 2009 + + + These provisions mean that no valid language tag can become invalid, + nor will a language tag have a narrower scope in the future (it may + have a broader scope). The most appropriate language tag for a given + application or content item might evolve over time, but once applied, + the tag itself cannot become invalid or have its meaning wholly + change. + + A subtag SHOULD only be used when it adds useful distinguishing + information to the tag. Extraneous subtags interfere with the + meaning, understanding, and processing of language tags. In + particular, users and implementations SHOULD follow the 'Prefix' and + 'Suppress-Script' fields in the registry (defined in Section 3.1): + these fields provide guidance on when specific additional subtags + SHOULD be used or avoided in a language tag. + + The choice of subtags used to form a language tag SHOULD follow these + guidelines: + + 1. Use as precise a tag as possible, but no more specific than is + justified. Avoid using subtags that are not important for + distinguishing content in an application. + + * For example, 'de' might suffice for tagging an email written + in German, while "de-CH-1996" is probably unnecessarily + precise for such a task. 
+ + * Note that some subtag sequences might not represent the + language a casual user might expect. For example, the Swiss + German (Schweizerdeutsch) language is represented by "gsw-CH" + and not by "de-CH". This latter tag represents German ('de') + as used in Switzerland ('CH'), also known as Swiss High German + (Schweizer Hochdeutsch). Both are real languages, and + distinguishing between them could be important to an + application. + + 2. The script subtag SHOULD NOT be used to form language tags unless + the script adds some distinguishing information to the tag. + Script subtags were first formally defined in [RFC4646]. Their + use can affect matching and subtag identification for + implementations of [RFC1766] or [RFC3066] (which are obsoleted by + this document), as these subtags appear between the primary + language and region subtags. Some applications can benefit from + the use of script subtags in language tags, as long as the use is + consistent for a given context. Script subtags are never + appropriate for unwritten content (such as audio recordings). + The field 'Suppress-Script' in the primary or extended language + record in the registry indicates script subtags that do not add + distinguishing information for most applications; this field + + + +Phillips & Davis Best Current Practice [Page 54] + +RFC 5646 Language Tags September 2009 + + + defines when users SHOULD NOT include a script subtag with a + particular primary language subtag. + + For example, if an implementation selects content using Basic + Filtering [RFC4647] (originally described in Section 14.4 of + [RFC2616]) and the user requested the language range "en-US", + content labeled "en-Latn-US" will not match the request and thus + not be selected. Therefore, it is important to know when script + subtags will customarily be used and when they ought not be used. 
+ + For example: + + * The subtag 'Latn' should not be used with the primary language + 'en' because nearly all English documents are written in the + Latin script and it adds no distinguishing information. + However, if a document were written in English mixing Latin + script with another script such as Braille ('Brai'), then it + might be appropriate to choose to indicate both scripts to aid + in content selection, such as the application of a style + sheet. + + * When labeling content that is unwritten (such as a recording + of human speech), the script subtag should not be used, even + if the language is customarily written in several scripts. + Thus, the subtitles to a movie might use the tag "uz-Arab" + (Uzbek, Arabic script), but the audio track for the same + language would be tagged simply "uz". (The tag "uz-Zxxx" + could also be used where content is not written, as the subtag + 'Zxxx' represents the "Code for unwritten documents".) + + 3. If a tag or subtag has a 'Preferred-Value' field in its registry + entry, then the value of that field SHOULD be used to form the + language tag in preference to the tag or subtag in which the + preferred value appears. + + * For example, use 'jbo' for Lojban in preference to the + grandfathered tag "art-lojban". + + 4. Use subtags or sequences of subtags for individual languages in + preference to subtags for language collections. A "language + collection" is a group of languages that are descended from a + common ancestor, are spoken in the same geographical area, or are + otherwise related. Certain language collections are assigned + codes by [ISO639-5] (and some of these [ISO639-5] codes are also + defined as collections in [ISO639-2]). These codes are included + as primary language subtags in the registry. Subtags for a + language collection in the registry have a 'Scope' field with a + value of 'collection'. 
A subtag for a language collection is + + + +Phillips & Davis Best Current Practice [Page 55] + +RFC 5646 Language Tags September 2009 + + + always preferred to less specific alternatives such as 'mul' and + 'und' (see below), and a subtag representing a language + collection MAY be used when more specific language information is + not available. However, most users and implementations do not + know there is a relationship between the collection and its + individual languages. In addition, the relationship between the + individual languages in the collection is not well defined; in + particular, the languages are usually not mutually intelligible. + Since the subtags are different, a request for the collection + will typically only produce items tagged with the collection's + subtag, not items tagged with subtags for the individual + languages contained in the collection. + + * For example, collections are interpreted inclusively, so the + subtag 'gem' (Germanic languages) could, but SHOULD NOT, be + used with content that would be better tagged with "en" + (English), "de" (German), or "gsw" (Swiss German, Alemannic). + While 'gem' collects all of these (and other) languages, most + implementations will not match 'gem' to the individual + languages; thus, using the subtag will not produce the desired + result. + + 5. [ISO639-2] has defined several codes included in the subtag + registry that require additional care when choosing language + tags. In most of these cases, where omitting the language tag is + permitted, such omission is preferable to using these codes. + Language tags SHOULD NOT incorporate these subtags as a prefix, + unless the additional information conveys some value to the + application. + + * The 'mul' (Multiple) primary language subtag identifies + content in multiple languages. This subtag SHOULD NOT be used + when a list of languages or individual tags for each content + element can be used instead. 
For example, the 'Content- + Language' header [RFC3282] allows a list of languages to be + used, not just a single language tag. + + * The 'und' (Undetermined) primary language subtag identifies + linguistic content whose language is not determined. This + subtag SHOULD NOT be used unless a language tag is required + and language information is not available or cannot be + determined. Omitting the language tag (where permitted) is + preferred. The 'und' subtag might be useful for protocols + that require a language tag to be provided or where a primary + language subtag is required (such as in "und-Latn"). The + 'und' subtag MAY also be useful when matching language tags in + certain situations. + + + + +Phillips & Davis Best Current Practice [Page 56] + +RFC 5646 Language Tags September 2009 + + + * The 'zxx' (Non-Linguistic, Not Applicable) primary language + subtag identifies content for which a language classification + is inappropriate or does not apply. Some examples might + include instrumental or electronic music; sound recordings + consisting of nonverbal sounds; audiovisual materials with no + narration, dialog, printed titles, or subtitles; machine- + readable data files consisting of machine languages or + character codes; or programming source code. + + * The 'mis' (Uncoded) primary language subtag identifies content + whose language is known but that does not currently have a + corresponding subtag. This subtag SHOULD NOT be used. + Because the addition of other codes in the future can render + its application invalid, it is inherently unstable and hence + incompatible with the stability goals of BCP 47. It is always + preferable to use other subtags: either 'und' or (with prior + agreement) private use subtags. + + 6. Use variant subtags sparingly and in the correct order. Most + variant subtags have one or more 'Prefix' fields in the registry + that express the list of subtags with which they are appropriate. 
+ Variants SHOULD only be used with subtags that appear in one of + these 'Prefix' fields. If a variant lists a second variant in + one of its 'Prefix' fields, the first variant SHOULD appear + directly after the second variant in any language tag where both + occur. General purpose variants (those with no 'Prefix' fields + at all) SHOULD appear after any other variant subtags. Order any + remaining variants by placing the most significant subtag first. + If none of the subtags is more significant or no relationship can + be determined, alphabetize the subtags. Because variants are + very specialized, using many of them together generally makes the + tag so narrow as to override the additional precision gained. + Putting the subtags into another order interferes with + interoperability, as well as the overall interpretation of the + tag. + + For example: + + * The tag "en-scotland-fonipa" (English, Scottish dialect, IPA + phonetic transcription) is correctly ordered because + 'scotland' has a 'Prefix' of "en", while 'fonipa' has no + 'Prefix' field. + + * The tag "sl-IT-rozaj-biske-1994" is correctly ordered: 'rozaj' + lists "sl" as its sole 'Prefix'; 'biske' lists "sl-rozaj" as + its sole 'Prefix'. The subtag '1994' has several prefixes, + + + + + +Phillips & Davis Best Current Practice [Page 57] + +RFC 5646 Language Tags September 2009 + + + including "sl-rozaj". However, it follows both 'rozaj' and + 'biske' because one of its 'Prefix' fields is "sl-rozaj- + biske". + + 7. The grandfathered tag "i-default" (Default Language) was + originally registered according to [RFC1766] to meet the needs of + [RFC2277]. It is not used to indicate a specific language, but + rather to identify the condition or content used where the + language preferences of the user cannot be established. It + SHOULD NOT be used except as a means of labeling the default + content for applications or protocols that require default + language content to be labeled with that specific tag. 
It MAY + also be used by an application or protocol to identify when the + default language content is being returned. + +4.1.1. Tagging Encompassed Languages + + Some primary language records in the registry have a 'Macrolanguage' + field (Section 3.1.10) that contains a mapping from each "encompassed + language" to its macrolanguage. The 'Macrolanguage' mapping doesn't + define what the relationship between the encompassed language and its + macrolanguage is, nor does it define how languages encompassed by the + same macrolanguage are related to each other. Two different + languages encompassed by the same macrolanguage may differ from one + another more than, say, French and Spanish do. + + A few specific macrolanguages, such as Chinese ('zh') and Arabic + ('ar'), are handled differently. See Section 4.1.2. + + The more specific encompassed language subtag SHOULD be used to form + the language tag, although either the macrolanguage's primary + language subtag or the encompassed language's subtag MAY be used. + This means, for example, tagging Plains Cree with 'crk' rather than + 'cr' (Cree), and so forth. + + Each macrolanguage subtag's scope, by definition, includes all of its + encompassed languages. Since the relationship between encompassed + languages varies, users cannot assume that the macrolanguage subtag + means any particular encompassed language, nor that any given pair of + encompassed languages are mutually intelligible or otherwise + interchangeable. + + Applications MAY use macrolanguage information to improve matching or + language negotiation. For example, the information that 'sr' + (Serbian) and 'hr' (Croatian) share a macrolanguage expresses a + closer relation between those languages than between, say, 'sr' + (Serbian) and 'mk' (Macedonian). However, this relationship is not + guaranteed nor is it exclusive.
For example, Romanian ('ro') and + + + +Phillips & Davis Best Current Practice [Page 58] + +RFC 5646 Language Tags September 2009 + + + Moldavian ('mo') do not share a macrolanguage, but are far more + closely related to each other than Cantonese ('yue') and Wu ('wuu'), + which do share a macrolanguage. + +4.1.2. Using Extended Language Subtags + + To accommodate language tag forms used prior to the adoption of this + document, language tags provide a special compatibility mechanism: + the extended language subtag. Selected languages have been provided + with both primary and extended language subtags. These include + macrolanguages, such as Malay ('ms') and Uzbek ('uz'), that have a + specific dominant variety that is generally synonymous with the + macrolanguage. Other languages, such as the Chinese ('zh') and + Arabic ('ar') macrolanguages and the various sign languages ('sgn'), + have traditionally used their primary language subtag, possibly + coupled with various region subtags or as part of a registered + grandfathered tag, to indicate the language. + + With the adoption of this document, specific ISO 639-3 subtags became + available to identify the languages contained within these diverse + language families or groupings. This presents a choice of language + tags where previously none existed: + + o Each encompassed language's subtag SHOULD be used as the primary + language subtag. For example, a document in Mandarin Chinese + would be tagged "cmn" (the subtag for Mandarin Chinese) in + preference to "zh" (Chinese). + + o If compatibility is desired or needed, the encompassed subtag MAY + be used as an extended language subtag. For example, a document + in Mandarin Chinese could be tagged "zh-cmn" instead of either + "cmn" or "zh". + + o The macrolanguage or prefixing subtag MAY still be used to form + the tag instead of the more specific encompassed language subtag. + That is, tags such as "zh-HK" or "sgn-RU" are still valid. 
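The three choices in the list above can be illustrated with a short Python sketch. The helper name `tag_choices` and its parameters are hypothetical and are not defined by this document; the function only assembles the three candidate spellings and performs no registry lookup or validation.

```python
def tag_choices(encompassed, prefix, rest=()):
    """Return three candidate tags, most preferred first.

    encompassed: the encompassed language subtag, e.g. "cmn"
    prefix:      the macrolanguage or prefixing subtag, e.g. "zh"
    rest:        any further subtags (script, region, ...), already ordered
    """
    tail = list(rest)
    return [
        "-".join([encompassed] + tail),          # encompassed subtag as primary
        "-".join([prefix, encompassed] + tail),  # extended language form
        "-".join([prefix] + tail),               # macrolanguage only
    ]


print(tag_choices("cmn", "zh"))
print(tag_choices("yue", "zh", rest=("HK",)))
```

The last candidate is the least precise: a tag such as "zh-HK" covers any type of Chinese as used in Hong Kong, not Cantonese specifically.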
+ + Chinese ('zh') provides a useful illustration of this. In the past, + various content has used tags beginning with the 'zh' subtag, with + application-specific meaning being associated with region codes, + private use sequences, or grandfathered registered values. This is + because historically only the macrolanguage subtag 'zh' was available + for forming language tags. However, the languages encompassed by the + Chinese subtag 'zh' are, in the main, not mutually intelligible when + spoken, and the written forms of these languages also show wide + variation in form and usage. + + + + + +Phillips & Davis Best Current Practice [Page 59] + +RFC 5646 Language Tags September 2009 + + + To provide compatibility, Chinese languages encompassed by the 'zh' + subtag are in the registry both as primary language subtags and as + extended language subtags. For example, the ISO 639-3 code for + Cantonese is 'yue'. Content in Cantonese might historically have + used a tag such as "zh-HK" (since Cantonese is commonly spoken in + Hong Kong), although that tag actually means any type of Chinese as + used in Hong Kong. With the availability of ISO 639-3 codes in the + registry, content in Cantonese can be directly tagged using the 'yue' + subtag. The content can use it as a primary language subtag, as in + the tag "yue-HK" (Cantonese, Hong Kong). Or it can use an extended + language subtag with 'zh', as in the tag "zh-yue-Hant" (Chinese, + Cantonese, Traditional script). + + As noted above, applications can choose to use the macrolanguage + subtag to form the tag instead of using the more specific encompassed + language subtag. For example, an application with large quantities + of data already using tags with the 'zh' (Chinese) subtag might + continue to use this more general subtag even for new data, even + though the content could be more precisely tagged with 'cmn' + (Mandarin), 'yue' (Cantonese), 'wuu' (Wu), and so on. 
Similarly, an + application already using tags that start with the 'ar' (Arabic) + subtag might continue to use this more general subtag even for new + data, which could be more precisely tagged with 'arb' (Standard + Arabic). + + In some cases, the encompassed languages had tags registered for them + during the RFC 3066 era. Those grandfathered tags not already + deprecated or rendered redundant were deprecated in the registry upon + adoption of this document. As grandfathered values, they remain + valid for use, and some content or applications might use them. As + with other grandfathered tags, since implementations might not be + able to associate the grandfathered tags with the encompassed + language subtag equivalents that are recommended by this document, + implementations are encouraged to canonicalize tags for comparison + purposes. Some examples of this include the tags "zh-hakka" (Hakka) + and "zh-guoyu" (Mandarin or Standard Chinese). + + Sign languages share a mode of communication rather than a linguistic + heritage. There are many sign languages that have developed + independently, and the subtag 'sgn' indicates only the presence of a + sign language. A number of sign languages also had grandfathered + tags registered for them during the RFC 3066 era. For example, the + grandfathered tag "sgn-US" was registered to represent 'American Sign + Language' specifically, without reference to the United States. This + is still valid, but deprecated: a document in American Sign Language + can be labeled either "ase" or "sgn-ase" (the 'ase' subtag is for the + language called 'American Sign Language'). + + + + +Phillips & Davis Best Current Practice [Page 60] + +RFC 5646 Language Tags September 2009 + + +4.2. Meaning of the Language Tag + + The meaning of a language tag is related to the meaning of the + subtags that it contains. 
Each subtag, in turn, implies a certain + range of expectations one might have for related content, although it + is not a guarantee. For example, the use of a script subtag such as + 'Arab' (Arabic script) does not mean that the content contains only + Arabic characters. It does mean that the language involved is + predominantly in the Arabic script. Thus, a language tag and its + subtags can encompass a very wide range of variation and yet remain + appropriate in each particular instance. + + Validity of a tag is not the only factor determining its usefulness. + While every valid tag has a meaning, it might not represent any real- + world language usage. This is unavoidable in a system in which + subtags can be combined freely. For example, tags such as + "ar-Cyrl-CO" (Arabic, Cyrillic script, as used in Colombia) or "tlh- + Kore-AQ-fonipa" (Klingon, Korean script, as used in Antarctica, IPA + phonetic transcription) are both valid and unlikely to represent a + useful combination of language attributes. + + The meaning of a given tag doesn't depend on the context in which it + appears. The relationship between a tag's meaning and the + information objects to which that tag is applied, however, can vary. + + o For a single information object, the associated language tags + might be interpreted as the set of languages that is necessary for + a complete comprehension of the complete object. Example: Plain + text documents. + + o For an aggregation of information objects, the associated language + tags could be taken as the set of languages used inside components + of that aggregation. Examples: Document stores and libraries. + + o For information objects whose purpose is to provide alternatives, + the associated language tags could be regarded as a hint that the + content is provided in several languages and that one has to + inspect each of the alternatives in order to find its language or + languages. 
In this case, the presence of multiple tags might not + mean that one needs to be multilingual to get complete + understanding of the document. Example: MIME multipart/ + alternative [RFC2046]. + + o For markup languages, such as HTML and XML, language information + can be added to each part of the document identified by the markup + structure (including the whole document itself). For example, one + could write <span lang="fr">C'est la vie.</span> inside a German + document; the German-speaking user could then access a French- + + + +Phillips & Davis Best Current Practice [Page 61] + +RFC 5646 Language Tags September 2009 + + + German dictionary to find out what the marked section meant. If + the user were listening to that document through a speech + synthesis interface, this formation could be used to signal the + synthesizer to appropriately apply French text-to-speech + pronunciation rules to that span of text, instead of applying the + inappropriate German rules. + + o For markup languages and document formats that allow the audience + to be identified, a language tag could indicate the audience(s) + appropriate for that document. For example, the same HTML + document described in the preceding bullet might have an HTTP + header "Content-Language: de" to indicate that the intended + audience for the file is German (even though three words appear + and are identified as being in French within it). + + o For systems and APIs, language tags form the basis for most + implementations of locale identifiers. For example, see Unicode's + CLDR (Common Locale Data Repository) (see UTS #35 [UTS35]) + project. + + Language tags are related when they contain a similar sequence of + subtags. For example, if a language tag B contains language tag A as + a prefix, then B is typically "narrower" or "more specific" than A. + Thus, "zh-Hant-TW" is more specific than "zh-Hant".
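As a minimal sketch of the prefix relationship just described, the following Python function compares two tags subtag by subtag rather than character by character, and ignores case. The name `is_more_specific` is an assumption made for illustration; it is not defined by this document.

```python
def is_more_specific(b, a):
    """Return True if tag `a` is a proper prefix of tag `b`.

    The comparison works on whole subtags, so "zh-Han" is not treated
    as a prefix of "zh-Hant", and it ignores case.
    """
    a_parts = a.lower().split("-")
    b_parts = b.lower().split("-")
    return len(b_parts) > len(a_parts) and b_parts[:len(a_parts)] == a_parts


print(is_more_specific("zh-Hant-TW", "zh-Hant"))
print(is_more_specific("zh-Hant", "zh-Han"))
```

Comparing whole subtags avoids false matches that a plain string prefix test would produce.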
+ + This relationship is not guaranteed in all cases: specifically, + languages that begin with the same sequence of subtags are NOT + guaranteed to be mutually intelligible, although they might be. For + example, the tag "az" shares a prefix with both "az-Latn" + (Azerbaijani written using the Latin script) and "az-Cyrl" + (Azerbaijani written using the Cyrillic script). A person fluent in + one script might not be able to read the other, even though the + linguistic content (e.g., what would be heard if both texts were read + aloud) might be identical. Content tagged as "az" most probably is + written in just one script and thus might not be intelligible to a + reader familiar with the other script. + + Similarly, not all subtags specify an actual distinction in language. + For example, the tags "en-US" and "en-CA" mean, roughly, English with + features generally thought to be characteristic of the United States + and Canada, respectively. They do not imply that a significant + dialectal boundary exists between any arbitrarily selected point in + the United States and any arbitrarily selected point in Canada. + Neither does a particular region subtag imply that linguistic + distinctions do not exist within that region. + + + + + + +Phillips & Davis Best Current Practice [Page 62] + +RFC 5646 Language Tags September 2009 + + +4.3. Lists of Languages + + In some applications, a single content item might best be associated + with more than one language tag. Examples of such a usage include: + + o Content items that contain multiple, distinct varieties. Often + this is used to indicate an appropriate audience for a given + content item when multiple choices might be appropriate. Examples + of this could include: + + * Metadata about the appropriate audience for a movie title. For + example, a DVD might label its individual audio tracks 'de' + (German), 'fr' (French), and 'es' (Spanish), but the overall + title would list "de, fr, es" as its overall audience.
+ + * A French/English, English/French dictionary tagged as both "en" + and "fr" to specify that it applies equally to French and + English. + + * A side-by-side or interlinear translation of a document, as is + commonly done with classical works in Latin or Greek. + + o Content items that contain a single language but that require + multiple levels of specificity. For example, a library might wish + to classify a particular work as both Norwegian ('no') and as + Nynorsk ('nn') for audiences capable of appreciating the + distinction or needing to select content more narrowly. + +4.4. Length Considerations + + There is no defined upper limit on the size of language tags. While + historically most language tags have consisted of language and region + subtags with a combined total length of up to six characters, larger + tags have always been both possible and have actually appeared in + use. + + Neither the language tag syntax nor other requirements in this + document impose a fixed upper limit on the number of subtags in a + language tag (and thus an upper bound on the size of a tag). The + language tag syntax suggests that, depending on the specific + language, more subtags (and thus a longer tag) are sometimes + necessary to completely identify the language for certain + applications; thus, it is possible to envision long or complex subtag + sequences. + + + + + + + +Phillips & Davis Best Current Practice [Page 63] + +RFC 5646 Language Tags September 2009 + + +4.4.1. Working with Limited Buffer Sizes + + Some applications and protocols are forced to allocate fixed buffer + sizes or otherwise limit the length of a language tag. A conformant + implementation or specification MAY refuse to support the storage of + language tags that exceed a specified length. Any such limitation + SHOULD be clearly documented, and such documentation SHOULD include + what happens to longer tags (for example, whether an error value is + generated or the language tag is truncated). 
A protocol that allows + tags to be truncated at an arbitrary limit, without giving any + indication of what that limit is, has the potential to cause harm by + changing the meaning of tags in substantial ways. + + In practice, most language tags do not require more than a few + subtags and will not approach reasonably sized buffer limitations; + see Section 4.1. + + Some specifications or protocols have limits on tag length but do not + have a fixed length limitation. For example, [RFC2231] has no + explicit length limitation: the length available for the language tag + is constrained by the length of other header components (such as the + charset's name) coupled with the 76-character limit in [RFC2047]. + Thus, the "limit" might be 50 or more characters, but it could + potentially be quite small. + + The considerations for assigning a buffer limit are: + + Implementations SHOULD NOT truncate language tags unless the + meaning of the tag is purposefully being changed, or unless the + tag does not fit into a limited buffer size specified by a + protocol for storage or transmission. + + Implementations SHOULD warn the user when a tag is truncated since + truncation changes the semantic meaning of the tag. + + Implementations of protocols or specifications that are space + constrained but do not have a fixed limit SHOULD use the longest + possible tag in preference to truncation. + + Protocols or specifications that specify limited buffer sizes for + language tags MUST allow for language tags of at least 35 + characters. Note that [RFC4646] recommended a minimum field size + of 42 characters because it included all three elements of the + 'extlang' production. Two of these are now permanently reserved, + so a registered primary language subtag of the maximum length of 8 + characters is now longer than the longest language-extlang + combination. 
Protocols or specifications that commonly use + + + + +Phillips & Davis Best Current Practice [Page 64] + +RFC 5646 Language Tags September 2009 + + + extensions or private use subtags might wish to reserve or + recommend a longer "minimum buffer" size. + + The following illustration shows how the 35-character recommendation + was derived: + + language = 8 ; longest allowed registered value + ; longer than primary+extlang + ; which requires 7 characters + script = 5 ; if not suppressed: see Section 4.1 + region = 4 ; UN M.49 numeric region code + ; ISO 3166-1 codes require 3 + variant1 = 9 ; needs 'language' as a prefix + variant2 = 9 ; very rare, as it needs + ; 'language-variant1' as a prefix + + total = 35 characters + + Figure 7: Derivation of the Limit on Tag Length + +4.4.2. Truncation of Language Tags + + Truncation of a language tag alters the meaning of the tag, and thus + SHOULD be avoided. However, truncation of language tags is sometimes + necessary due to limited buffer sizes. Such truncation MUST NOT + permit a subtag to be chopped off in the middle or the formation of + invalid tags (for example, one ending with the "-" character). + + This means that applications or protocols that truncate tags MUST do + so by progressively removing subtags along with their preceding "-" + from the right side of the language tag until the tag is short enough + for the given buffer. If the resulting tag ends with a single- + character subtag, that subtag and its preceding "-" MUST also be + removed. For example: + + Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1 + 1. zh-Latn-CN-variant1-a-extend1-x-wadegile + 2. zh-Latn-CN-variant1-a-extend1 + 3. zh-Latn-CN-variant1 + 4. zh-Latn-CN + 5. zh-Latn + 6. zh + + Figure 8: Example of Tag Truncation + + + + + + + +Phillips & Davis Best Current Practice [Page 65] + +RFC 5646 Language Tags September 2009 + + +4.5. 
Canonicalization of Language Tags + + Since a particular language tag can be used by many processes, + language tags SHOULD always be created or generated in canonical + form. + + A language tag is in 'canonical form' when the tag is well-formed + according to the rules in Sections 2.1 and 2.2 and it has been + canonicalized by applying each of the following steps in order, using + data from the IANA registry (see Section 3.1): + + 1. Extension sequences are ordered into case-insensitive ASCII order + by singleton subtag. + + * For example, the subtag sequence '-a-babble' comes before + '-b-warble'. + + 2. Redundant or grandfathered tags are replaced by their 'Preferred- + Value', if there is one. + + * The field-body of the 'Preferred-Value' for grandfathered and + redundant tags is an "extended language range" [RFC4647] and + might consist of more than one subtag. + + * 'Preferred-Value' fields in the registry provide mappings from + deprecated tags to modern equivalents. Many of these were + created before the adoption of this document (such as the + mapping of "no-nyn" to "nn" or "i-klingon" to "tlh"). Others + are the result of later registrations or additions to the + registry as permitted or required by this document (for + example, "zh-hakka" was deprecated in favor of the ISO 639-3 + code 'hak' when this document was adopted). + + 3. Subtags are replaced by their 'Preferred-Value', if there is one. + For extlangs, the original primary language subtag is also + replaced if there is a primary language subtag in the 'Preferred- + Value'. + + * The field-body of the 'Preferred-Value' for extlangs is an + "extended language range" and typically maps to a primary + language subtag. For example, the subtag sequence "zh-hak" + (Chinese, Hakka) is replaced with the subtag 'hak' (Hakka). + + * Most of the non-extlang subtags are either Region subtags + where the country name or designation has changed or clerical + corrections to ISO 639-1. 
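The three steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a complete implementation:

- The three mapping tables are a small hand-copied excerpt of registry data; a real implementation would load the full IANA registry.
- The function name `canonicalize` and the table names are hypothetical.
- The sketch checks whole-tag replacements (Step 2) before ordering extensions (Step 1), because grandfathered tags such as "i-klingon" do not follow the regular subtag layout that the rest of the code assumes.
- Step 3 is limited here to extlang and region replacements.

```python
# Hand-copied excerpt of registry data, for illustration only.
TAG_PREFERRED = {          # grandfathered or redundant tag -> Preferred-Value
    "i-klingon": "tlh",
    "no-nyn": "nn",
    "zh-hakka": "hak",
    "art-lojban": "jbo",
}
EXTLANG_PREFERRED = {("zh", "hak"): "hak"}   # (Prefix, extlang) -> primary
REGION_PREFERRED = {"bu": "MM"}              # region -> Preferred-Value


def canonicalize(tag):
    """Apply the canonicalization steps to a well-formed language tag."""
    # Step 2 (checked first in this sketch): whole-tag replacements.
    if tag.lower() in TAG_PREFERRED:
        return TAG_PREFERRED[tag.lower()]

    subtags = tag.split("-")
    if subtags[0].lower() == "x":      # the whole tag is private use
        return tag

    # Split into head (language through variants), extension sequences,
    # and the trailing private use sequence introduced by 'x'.
    head, extensions, private = [], [], []
    i = 0
    while i < len(subtags) and len(subtags[i]) > 1:
        head.append(subtags[i])
        i += 1
    while i < len(subtags):
        if subtags[i].lower() == "x":
            private = subtags[i:]
            break
        j = i + 1
        while j < len(subtags) and len(subtags[j]) > 1:
            j += 1
        extensions.append(subtags[i:j])
        i = j

    # Step 1: order extension sequences by singleton, ignoring case.
    extensions.sort(key=lambda ext: ext[0].lower())

    # Step 3: replace subtags that have a Preferred-Value. For an
    # extlang, the original primary language subtag is replaced too.
    if len(head) > 1:
        key = (head[0].lower(), head[1].lower())
        if key in EXTLANG_PREFERRED:
            head[0:2] = [EXTLANG_PREFERRED[key]]
    # After the primary language, a two-character subtag is a region.
    head = head[:1] + [
        REGION_PREFERRED.get(s.lower(), s) if len(s) == 2 else s
        for s in head[1:]
    ]

    return "-".join(head + [s for ext in extensions for s in ext] + private)


print(canonicalize("zh-hak-CN"))
print(canonicalize("en-b-ccc-bbb-a-aaa-x-xyz"))
```

The sketch leaves the case of each subtag unchanged, so callers still need to compare results case-insensitively.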
+ + + + + +Phillips & Davis Best Current Practice [Page 66] + +RFC 5646 Language Tags September 2009 + + + The canonical form contains no 'extlang' subtags. There is an + alternate 'extlang form' that maintains or reinstates extlang + subtags. This form can be useful in environments where the presence + of the 'Prefix' subtag is considered beneficial in matching or + selection (see Section 4.1.2). + + A language tag is in 'extlang form' when the tag is well-formed + according to the rules in Sections 2.1 and 2.2 and it has been + processed by applying each of the following two steps in order, using + data from the IANA registry: + + 1. The language tag is first transformed into canonical form, as + described above. + + 2. If the language tag starts with a primary language subtag that is + also an extlang subtag, then the language tag is prepended with + the extlang's 'Prefix'. + + * For example, "hak-CN" (Hakka, China) has the primary language + subtag 'hak', which in turn has an 'extlang' record with a + 'Prefix' 'zh' (Chinese). The extlang form is "zh-hak-CN" + (Chinese, Hakka, China). + + * Note that Step 2 (prepending a prefix) can restore a subtag + that was removed by Step 1 (canonicalizing). + + Example: The language tag "en-a-aaa-b-ccc-bbb-x-xyz" is in canonical + form, while "en-b-ccc-bbb-a-aaa-X-xyz" is well-formed and potentially + valid (extensions 'a' and 'b' are not defined as of the publication + of this document) but not in canonical form (the extensions are not + in alphabetical order). + + Example: Although the tag "en-BU" (English as used in Burma) + maintains its validity, the language tag "en-BU" is not in canonical + form because the 'BU' subtag has a canonical mapping to 'MM' + (Myanmar). + + Canonicalization of language tags does not imply anything about the + use of upper- or lowercase letters when processing or comparing + subtags (and as described in Section 2.1). All comparisons MUST be + performed in a case-insensitive manner. 
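The 'extlang form' procedure described earlier in this section can be sketched in Python as shown below. The function name `to_extlang_form` is hypothetical, and the `EXTLANG_PREFIX` table is a small hand-copied excerpt of extlang records rather than the full registry. The sketch assumes that its input is already in canonical form (Step 1 of the procedure) and only performs Step 2. The lookup lowercases the primary language subtag so that the match is case-insensitive.

```python
# extlang subtag -> 'Prefix' of its extlang record (illustrative excerpt).
EXTLANG_PREFIX = {"cmn": "zh", "hak": "zh", "yue": "zh", "ase": "sgn"}


def to_extlang_form(canonical_tag):
    """Prepend the extlang 'Prefix' when the primary language subtag
    of an already canonical tag is also registered as an extlang."""
    primary = canonical_tag.split("-", 1)[0]
    prefix = EXTLANG_PREFIX.get(primary.lower())
    if prefix is None:
        return canonical_tag
    return prefix + "-" + canonical_tag


print(to_extlang_form("hak-CN"))
print(to_extlang_form("en-US"))
```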
+ + When performing canonicalization of language tags, processors MAY + regularize the case of the subtags (that is, this process is + OPTIONAL), following the case used in the registry (see + Section 2.1.1). + + + + + +Phillips & Davis Best Current Practice [Page 67] + +RFC 5646 Language Tags September 2009 + + + If more than one variant appears within a tag, processors MAY reorder + the variants to obtain better matching behavior or more consistent + presentation. Reordering of the variants SHOULD follow the + recommendations for variant ordering in Section 4.1. + + If the field 'Deprecated' appears in a registry record without an + accompanying 'Preferred-Value' field, then that tag or subtag is + deprecated without a replacement. These values are canonical when + they appear in a language tag. However, tags that include these + values SHOULD NOT be selected by users or generated by + implementations. + + An extension MUST define any relationships that exist between the + various subtags in the extension and thus MAY define an alternate + canonicalization scheme for the extension's subtags. Extensions MAY + define how the order of the extension's subtags is interpreted. For + example, an extension could define that its subtags are in canonical + order when the subtags are placed into ASCII order: that is, "en-a- + aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension might + define that the order of the subtags influences their semantic + meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b- + aaa-bbb-ccc"). However, extension specifications SHOULD be designed + so that they are tolerant of the typical processes described in + Section 3.7. + +4.6. Considerations for Private Use Subtags + + Private use subtags, like all other subtags, MUST conform to the + format and content constraints in the ABNF. 
Private use subtags have + no meaning outside the private agreement between the parties that + intend to use or exchange language tags that employ them. The same + subtags MAY be used with a different meaning under a separate private + agreement. They SHOULD NOT be used where alternatives exist and + SHOULD NOT be used in content or protocols intended for general use. + + Private use subtags are simply useless for information exchange + without prior arrangement. The value and semantic meaning of private + use tags and of the subtags used within such a language tag are not + defined by this document. + + Private use sequences introduced by the 'x' singleton are completely + opaque to users or implementations outside of the private use + agreement. So, in addition to private use subtag sequences + introduced by the singleton subtag 'x', the Language Subtag Registry + provides private use language, script, and region subtags derived + from the private use codes assigned by the underlying standards. + These subtags are valid for use in forming language tags; they are + RECOMMENDED over the 'x' singleton private use subtag sequences + + + +Phillips & Davis Best Current Practice [Page 68] + +RFC 5646 Language Tags September 2009 + + + because they convey more information via their linkage to the + language tag's inherent structure. + + For example, the region subtags 'AA', 'ZZ', and those in the ranges + 'QM'-'QZ' and 'XA'-'XZ' (derived from the ISO 3166-1 private use + codes) can be used to form a language tag. A tag such as + "zh-Hans-XQ" conveys a great deal of public, interchangeable + information about the language material (that it is Chinese in the + simplified Chinese script and is suitable for some geographic region + 'XQ'). While the precise geographic region is not known outside of + private agreement, the tag conveys far more information than an + opaque tag such as "x-somelang" or even "zh-Hans-x-xq" (where the + 'xq' subtag's meaning is entirely opaque). 
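The private use region ranges listed above can be checked mechanically. The following is a small sketch in Python; the function name `is_private_use_region` is an assumption introduced for illustration, and the check covers only the region codes named in the text ('AA', 'ZZ', 'QM'-'QZ', 'XA'-'XZ'). A full validator would consult the registry instead.

```python
def is_private_use_region(subtag: str) -> bool:
    """Return True if a region subtag falls in the private use ranges
    'AA', 'ZZ', 'QM'-'QZ', or 'XA'-'XZ' (compared case-insensitively)."""
    s = subtag.upper()
    # Only two-letter ASCII alphabetic codes can match these ranges.
    if len(s) != 2 or not s.isascii() or not s.isalpha():
        return False
    return s in ("AA", "ZZ") or "QM" <= s <= "QZ" or "XA" <= s <= "XZ"
```

Applied to the example tag "zh-Hans-XQ", the region subtag 'XQ' is recognized as private use, while the language and script subtags remain publicly interpretable.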
+ + However, in some cases content tagged with private use subtags can + interact with other systems in a different and possibly unsuitable + manner compared to tags that use opaque, privately defined subtags, + so the choice of the best approach sometimes depends on the + particular domain in question. + +5. IANA Considerations + + This section deals with the processes and requirements necessary for + IANA to maintain the subtag and extension registries as defined by + this document and in accordance with the requirements of [RFC5226]. + + The impact on the IANA maintainers of the two registries defined by + this document will be a small increase in the frequency of new + entries or updates. IANA also is required to create a new mailing + list (described below in Section 5.1) to announce registry changes + and updates. + +5.1. Language Subtag Registry + + IANA updated the registry using instructions and content provided in + a companion document [RFC5645]. The criteria and process for + selecting the updated set of records are described in that document. + The updated set of records represents no impact on IANA, since the + work to create it will be performed externally. + + Future work on the Language Subtag Registry includes the following + activities: + + o Inserting or replacing whole records. These records are + preformatted for IANA by the Language Subtag Reviewer, as + described in Section 3.3. + + o Archiving and making publicly available the registration forms. + + + +Phillips & Davis Best Current Practice [Page 69] + +RFC 5646 Language Tags September 2009 + + + o Announcing each updated version of the registry on the + "ietf-languages-announcements@iana.org" mailing list. + + Each registration form sent to IANA contains a single record for + incorporation into the registry. The form will be sent to + by the Language Subtag Reviewer. 
It will have a + subject line indicating whether the enclosed form represents an + insertion of a new record (indicated by the word "INSERT" in the + subject line) or a replacement of an existing record (indicated by + the word "MODIFY" in the subject line). At no time can a record be + deleted from the registry. + + IANA will extract the record from the form and place the inserted or + modified record into the appropriate section of the Language Subtag + Registry, grouping the records by their 'Type' field. Inserted + records can be placed anywhere within the appropriate section; there + is no guarantee that the registry's records will be placed in any + particular order except that they will always be grouped by 'Type'. + Modified records overwrite the record they replace. + + Whenever an entry is created or modified in the registry, the 'File- + Date' record at the start of the registry is updated to reflect the + most recent modification date. The date format SHALL be the "full- + date" format of [RFC3339]. The date SHALL be the date on which that + version of the registry was first published by IANA. There SHALL be + at most one version of the registry published in a day. A 'File- + Date' record is also included in each request to IANA to insert or + modify records, indicating the acceptance date of the records in the + request. + + The updated registry file MUST use the UTF-8 character encoding, and + IANA MUST check the registry file for proper encoding. Non-ASCII + characters can be sent to IANA by attaching the registration form to + the email message or by using various encodings in the mail message + body (UTF-8 is recommended). IANA will verify any unclear or + corrupted characters with the Language Subtag Reviewer prior to + posting the updated registry. + + IANA will also archive and make publicly available from + http://www.iana.org each registration form. Note that multiple + registrations can pertain to the same record in the registry. 
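Because the 'File-Date' record uses the "full-date" format of [RFC3339] (YYYY-MM-DD), a local copy of the registry can be compared against a newer one by date alone. The sketch below is illustrative only: the function names `read_file_date` and `registry_is_newer` are assumptions, and the sample text is a hypothetical excerpt that assumes the 'File-Date' record comes first and that records are separated by "%%" lines.

```python
from datetime import date


def read_file_date(registry_text: str) -> date:
    """Return the date in the 'File-Date' record at the start of the
    registry text. Only the first record is inspected."""
    for line in registry_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "File-Date":
            # "full-date" is YYYY-MM-DD, which fromisoformat accepts.
            return date.fromisoformat(value.strip())
        if line.strip() == "%%":
            break  # end of the first record
    raise ValueError("no File-Date record found")


def registry_is_newer(candidate_text: str, local_text: str) -> bool:
    """True if the candidate registry was published after the local copy."""
    return read_file_date(candidate_text) > read_file_date(local_text)


# Hypothetical excerpts, for illustration only.
LOCAL = "File-Date: 2009-09-01\n%%\nType: language\nSubtag: aa\n"
REMOTE = "File-Date: 2009-09-02\n%%\nType: language\nSubtag: aa\n"
```

Since at most one version of the registry is published per day, comparing dates is enough to tell two versions apart.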
+ + Developers who are dependent upon the Language Subtag Registry + sometimes would like to be informed of changes in the registry so + that they can update their implementations. When any change is made + to the Language Subtag Registry, IANA will send an announcement + message to (a self- + subscribing list to which only IANA can post). + + + +Phillips & Davis Best Current Practice [Page 70] + +RFC 5646 Language Tags September 2009 + + +5.2. Extensions Registry + + The Language Tag Extensions Registry can contain at most 35 records, + and thus changes to this registry are expected to be very infrequent. + + Future work by IANA on the Language Tag Extensions Registry is + limited to two cases. First, the IESG MAY request that new records + be inserted into this registry from time to time. These requests + MUST include the record to insert in the exact format described in + Section 3.7. In addition, there MAY be occasional requests from the + maintaining authority for a specific extension to update the contact + information or URLs in the record. These requests MUST include the + complete, updated record. IANA is not responsible for validating the + information provided, only that it is properly formatted. IANA + SHOULD take reasonable steps to ascertain that the request comes from + the maintaining authority named in the record present in the + registry. + +6. Security Considerations + + Language tags used in content negotiation, like any other information + exchanged on the Internet, might be a source of concern because they + might be used to infer the nationality of the sender, and thus + identify potential targets for surveillance. + + This is a special case of the general problem that anything sent is + visible to the receiving party and possibly to third parties as well. + It is useful to be aware that such concerns can exist in some cases. 
+ + The evaluation of the exact magnitude of the threat, and any possible + countermeasures, is left to each application protocol (see BCP 72 + [RFC3552] for best current practice guidance on security threats and + defenses). + + The language tag associated with a particular information item is of + no consequence whatsoever in determining whether that content might + contain possible homographs. The fact that a text is tagged as being + in one language or using a particular script subtag provides no + assurance whatsoever that it does not contain characters from scripts + other than the one(s) associated with or specified by that language + tag. + + Since there is no limit to the number of variant, private use, and + extension subtags, and consequently no limit on the possible length + of a tag, implementations need to guard against buffer overflow + attacks. See Section 4.4 for details on language tag truncation, + which can occur as a consequence of defenses against buffer overflow. + + + + +Phillips & Davis Best Current Practice [Page 71] + +RFC 5646 Language Tags September 2009 + + + To prevent denial-of-service attacks, applications SHOULD NOT depend + on either the Language Subtag Registry or the Language Tag Extensions + Registry being always accessible. Additionally, although the + specification of valid subtags for an extension (see Section 3.7) + MUST be available over the Internet, implementations SHOULD NOT + mechanically depend on those sources being always accessible. + + The registries specified in this document are not suitable for + frequent or real-time access to, or retrieval of, the full registry + contents. Most applications do not need registry data at all. For + others, being able to validate or canonicalize language tags as of a + particular registry date will be sufficient, as the registry contents + change only occasionally. Changes are announced to + . 
This mailing list is + intended for interested organizations and individuals, not for bulk + subscription to trigger automatic software updates. The size of the + registry makes it unsuitable for automatic software updates. + Implementers considering integrating the Language Subtag Registry in + an automatic updating scheme are strongly advised to distribute only + suitably encoded differences, and only via their own infrastructure + -- not directly from IANA. + + Changes, or the absence thereof, can also easily be detected by + looking at the 'File-Date' record at the start of the registry, or by + using features of the protocol used for downloading, without having + to download the full registry. At the time of publication of this + document, IANA is making the Language Tag Registry available over + HTTP 1.1. The proper way to update a local copy of the Language + Subtag Registry using HTTP 1.1 is to use a conditional GET [RFC2616]. + +7. Character Set Considerations + + The syntax in this document requires that language tags use only the + characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most + character sets, so the composition of language tags shouldn't have + any character set issues. + + The rendering of text based on the language tag is not addressed + here. Historically, some processes have relied on the use of + character set/encoding information (or other external information) in + order to infer how a specific string of characters should be + rendered. Notably, this applies to language- and culture-specific + variations of Han ideographs as used in Japanese, Chinese, and + Korean, where use of, for example, a Japanese character encoding such + as EUC-JP implies that the text itself is in Japanese. 
When language + tags are applied to spans of text, rendering engines might be able to + use that information to better select fonts or make other rendering + + + + +Phillips & Davis Best Current Practice [Page 72] + +RFC 5646 Language Tags September 2009 + + + choices, particularly where languages with distinct writing + traditions use the same characters. + +8. Changes from RFC 4646 + + The main goal for this revision of RFC 4646 was to incorporate two + new parts of ISO 639 (ISO 639-3 and ISO 639-5) and their attendant + sets of language codes into the IANA Language Subtag Registry. This + permits the identification of many more languages and language + collections than previously supported. + + The specific changes in this document to meet these goals are: + + o Defined the incorporation of ISO 639-3 and ISO 639-5 codes for use + as primary and extended language subtags. It also permanently + reserves and disallows the use of additional 'extlang' subtags. + The changes necessary to achieve this were: + + * Modified the ABNF comments. + + * Updated various registration and stability requirements + sections to reference ISO 639-3 and ISO 639-5 in addition to + ISO 639-1 and ISO 639-2. + + * Edited the text to eliminate references to extended language + subtags where they are no longer used. + + * Explained the change in the section on extended language + subtags. + + o Changed the ABNF related to grandfathered tags. The irregular + tags are now listed. Well-formed grandfathered tags are now + described by the 'langtag' production, and the 'grandfathered' + production was removed as a result. Also: added description of + both types of grandfathered tags to Section 2.2.8. + + o Added the paragraph on "collections" to Section 4.1. + + o Changed the capitalization rules for 'Tag' fields in Section 3.1. + + o Split Section 3.1 up into subsections. 
+ + o Modified Section 3.5 to allow 'Suppress-Script' fields to be + added, modified, or removed via the registration process. This + was an erratum from RFC 4646. + + o Modified examples that used region code 'CS' (formerly Serbia and + Montenegro) to use 'RS' (Serbia) instead. + + + +Phillips & Davis Best Current Practice [Page 73] + +RFC 5646 Language Tags September 2009 + + + o Modified the rules for creating and maintaining record + 'Description' fields to prevent duplicates, including inverted + duplicates. + + o Removed the lengthy description of why RFC 4646 was created from + this section, which also caused the removal of the reference to + XML Schema. + + o Modified the text in Section 2.1 to place more emphasis on the + fact that language tags are not case sensitive. + + o Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS" + and "az-Arab-IR" because "fr-Latn-CA" doesn't respect the + 'Suppress-Script' on 'Latn' with 'fr'. + + o Changed the requirements for well-formedness to make singleton + repetition checking optional (it is required for validity + checking) in Section 2.2.9. + + o Changed the text in Section 2.2.9 referring to grandfathered + checking to note that the list is now included in the ABNF. + + o Modified and added text to Section 3.2. The job description was + placed first. A note was added making clear that the Language + Subtag Reviewer may delegate various non-critical duties, + including list moderation. Finally, additional text was added to + make the appointment process clear and to clarify that decisions + and performance of the reviewer are appealable. + + o Added text to Section 3.5 clarifying that the + ietf-languages@iana.org list is operated by whomever the IESG + appoints. + + o Added text to Section 3.1.5 clarifying that the first Description + in a 'language' record matches the corresponding Reference Name + for the language in ISO 639-3. 
+ + o Modified Section 2.2.9 to define classes of conformance related to + specific tags (formerly 'well-formed' and 'valid' referred to + implementations). Notes were added about the removal of 'extlang' + from the ABNF provided in RFC 4646, allowing for well-formedness + using this older definition. Reference to RFC 3066 well- + formedness was also added. + + o Added text to the end of Section 3.1.2 noting that future versions + of this document might add new field types to the registry format + and recommending that implementations ignore any unrecognized + fields. + + + +Phillips & Davis Best Current Practice [Page 74] + +RFC 5646 Language Tags September 2009 + + + o Added text about what the lack of a 'Suppress-Script' field means + in a record to Section 3.1.9. + + o Added text allowing the correction of misspellings and typographic + errors to Section 3.1.5. + + o Added text to Section 3.1.8 disallowing 'Prefix' field conflicts + (such as circular prefix references). + + o Modified text in Section 3.5 to require the subtag reviewer to + announce his/her decision (or extension) following the two-week + period. Also clarified that any decision or failure to decide can + be appealed. + + o Modified text in Section 4.1 to include the (heretofore anecdotal) + guiding principle of tag choice, and clarifying the non-use of + script subtags in non-written applications. + + o Prohibited multiple use of the same variant in a tag (i.e., "de- + 1901-1901"). Previously, this was only a recommendation + ("SHOULD"). + + o Removed inappropriate [RFC2119] language from the illustration in + Section 4.4.1. + + o Replaced the example of deprecating "zh-guoyu" with "zh- + hakka"->"hak" in Section 4.5, noting that it was this document + that caused the change. + + o Replaced the section in Section 4.1 dealing with "mul"/"und" to + include the subtags 'zxx' and 'mis', as well as the tag + "i-default". A normative reference to RFC 2277 was added. 
+ + o Added text to Section 3.5 clarifying that any modifications of a + registration request must be sent to the + list before submission to IANA. + + o Changed the ABNF for the record-jar format from using the LWSP + production to use a folding whitespace production similar to obs- + FWS in [RFC5234]. This effectively prevents unintentional blank + lines inside a field. + + o Clarified and revised text in Sections 3.3, 3.5, and 5.1 to + clarify that the Language Subtag Reviewer sends the complete + registration forms to IANA, that IANA extracts the record from the + form, and that the forms must also be archived separately from the + registry. + + + + +Phillips & Davis Best Current Practice [Page 75] + +RFC 5646 Language Tags September 2009 + + + o Added text to Section 5 requiring IANA to send an announcement to + an ietf-languages-announcements list whenever the registry is + updated. + + o Modification of the registry to use UTF-8 as its character + encoding. This also entails additional instructions to IANA and + the Language Subtag Reviewer in the registration process. + + o Modified the rules in Section 2.2.4 so that "exceptionally + reserved" ISO 3166-1 codes other than 'UK' were included into the + registry. In particular, this allows the code 'EU' (European + Union) to be used to form language tags or (more commonly) for + applications that use the registry for region codes to reference + this subtag. + + o Modified the IANA considerations section (Section 5) to remove + unnecessary normative [RFC2119] language. + +9. References + +9.1. Normative References + + [ISO15924] International Organization for Standardization, "ISO + 15924:2004. Information and documentation -- Codes + for the representation of names of scripts", + January 2004. + + [ISO3166-1] International Organization for Standardization, "ISO + 3166-1:2006. Codes for the representation of names + of countries and their subdivisions -- Part 1: + Country codes", November 2006. 
+ + [ISO639-1] International Organization for Standardization, "ISO + 639-1:2002. Codes for the representation of names + of languages -- Part 1: Alpha-2 code", July 2002. + + [ISO639-2] International Organization for Standardization, "ISO + 639-2:1998. Codes for the representation of names + of languages -- Part 2: Alpha-3 code", October 1998. + + [ISO639-3] International Organization for Standardization, "ISO + 639-3:2007. Codes for the representation of names + of languages -- Part 3: Alpha-3 code for + comprehensive coverage of languages", February 2007. + + + + + + + +Phillips & Davis Best Current Practice [Page 76] + +RFC 5646 Language Tags September 2009 + + + [ISO639-5] International Organization for Standardization, "ISO + 639-5:2008. Codes for the representation of names of + languages -- Part 5: Alpha-3 code for language + families and groups", May 2008. + + [ISO646] International Organization for Standardization, + "ISO/IEC 646:1991, Information technology -- ISO + 7-bit coded character set for information + interchange.", 1991. + + [RFC2026] Bradner, S., "The Internet Standards Process -- + Revision 3", BCP 9, RFC 2026, October 1996. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and + Languages", BCP 18, RFC 2277, January 1998. + + [RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the + Internet: Timestamps", RFC 3339, July 2002. + + [RFC4647] Phillips, A. and M. Davis, "Matching of Language + Tags", BCP 47, RFC 4647, September 2006. + + [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for + Writing an IANA Considerations Section in RFCs", + BCP 26, RFC 5226, May 2008. + + [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for + Syntax Specifications: ABNF", STD 68, RFC 5234, + January 2008. + + [SpecialCasing] The Unicode Consortium, "Unicode Character Database, + Special Casing Properties", March 2008, . 
+ + [UAX14] Freitag, A., "Unicode Standard Annex #14: Line + Breaking Properties", August 2006, + . + + [UN_M.49] Statistics Division, United Nations, "Standard + Country or Area Codes for Statistical Use", Revision + 4 (United Nations publication, Sales No. 98.XVII.9), + June 1999. + + + + + + +Phillips & Davis Best Current Practice [Page 77] + +RFC 5646 Language Tags September 2009 + + + [Unicode] Unicode Consortium, "The Unicode Consortium. The + Unicode Standard, Version 5.0, (Boston, MA, Addison- + Wesley, 2003. ISBN 0-321-49081-0)", January 2007. + +9.2. Informative References + + [CLDR] "The Common Locale Data Repository Project", + . + + [RFC1766] Alvestrand, H., "Tags for the Identification of + Languages", RFC 1766, March 1995. + + [RFC2028] Hovey, R. and S. Bradner, "The Organizations + Involved in the IETF Standards Process", BCP 11, + RFC 2028, October 1996. + + [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet + Mail Extensions (MIME) Part Two: Media Types", + RFC 2046, November 1996. + + [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail + Extensions) Part Three: Message Header Extensions + for Non-ASCII Text", RFC 2047, November 1996. + + [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and + Encoded Word Extensions: + Character Sets, Languages, and Continuations", + RFC 2231, November 1997. + + [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., + Masinter, L., Leach, P., and T. Berners-Lee, + "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, + June 1999. + + [RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of + ISO 10646", RFC 2781, February 2000. + + [RFC3066] Alvestrand, H., "Tags for the Identification of + Languages", RFC 3066, January 2001. + + [RFC3282] Alvestrand, H., "Content Language Headers", + RFC 3282, May 2002. + + [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing + RFC Text on Security Considerations", BCP 72, + RFC 3552, July 2003. 
+ + + + + +Phillips & Davis Best Current Practice [Page 78] + +RFC 5646 Language Tags September 2009 + + + [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO + 10646", STD 63, RFC 3629, November 2003. + + [RFC4645] Ewell, D., "Initial Language Subtag Registry", + RFC 4645, September 2006. + + [RFC4646] Phillips, A. and M. Davis, "Tags for Identifying + Languages", BCP 47, RFC 4646, September 2006. + + [RFC5645] Ewell, D., Ed., "Update to the Language Subtag + Registry", September 2009. + + [UTS35] Davis, M., "Unicode Technical Standard #35: Locale + Data Markup Language (LDML)", December 2007, + . + + [iso639.prin] ISO 639 Joint Advisory Committee, "ISO 639 Joint + Advisory Committee: Working principles for ISO 639 + maintenance", March 2000, . + + [record-jar] Raymond, E., "The Art of Unix Programming", 2003, + . + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Phillips & Davis Best Current Practice [Page 79] + +RFC 5646 Language Tags September 2009 + + +Appendix A. Examples of Language Tags (Informative) + + Simple language subtag: + + de (German) + + fr (French) + + ja (Japanese) + + i-enochian (example of a grandfathered tag) + + Language subtag plus Script subtag: + + zh-Hant (Chinese written using the Traditional Chinese script) + + zh-Hans (Chinese written using the Simplified Chinese script) + + sr-Cyrl (Serbian written using the Cyrillic script) + + sr-Latn (Serbian written using the Latin script) + + Extended language subtags and their primary language subtag + counterparts: + + zh-cmn-Hans-CN (Chinese, Mandarin, Simplified script, as used in + China) + + cmn-Hans-CN (Mandarin Chinese, Simplified script, as used in + China) + + zh-yue-HK (Chinese, Cantonese, as used in Hong Kong SAR) + + yue-HK (Cantonese Chinese, as used in Hong Kong SAR) + + Language-Script-Region: + + zh-Hans-CN (Chinese written using the Simplified script as used in + mainland China) + + sr-Latn-RS (Serbian written using the Latin script as used in + Serbia) + + + + 
+ + + + + +Phillips & Davis Best Current Practice [Page 80] + +RFC 5646 Language Tags September 2009 + + + Language-Variant: + + sl-rozaj (Resian dialect of Slovenian) + + sl-rozaj-biske (San Giorgio dialect of Resian dialect of + Slovenian) + + sl-nedis (Nadiza dialect of Slovenian) + + Language-Region-Variant: + + de-CH-1901 (German as used in Switzerland using the 1901 variant + [orthography]) + + sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect) + + Language-Script-Region-Variant: + + hy-Latn-IT-arevela (Eastern Armenian written in Latin script, as + used in Italy) + + Language-Region: + + de-DE (German for Germany) + + en-US (English as used in the United States) + + es-419 (Spanish appropriate for the Latin America and Caribbean + region using the UN region code) + + Private use subtags: + + de-CH-x-phonebk + + az-Arab-x-AZE-derbend + + Private use registry values: + + x-whatever (private use using the singleton 'x') + + qaa-Qaaa-QM-x-southern (all private tags) + + de-Qaaa (German, with a private script) + + sr-Latn-QM (Serbian, Latin script, private region) + + sr-Qaaa-RS (Serbian, private script, for Serbia) + + + + +Phillips & Davis Best Current Practice [Page 81] + +RFC 5646 Language Tags September 2009 + + + Tags that use extensions (examples ONLY -- extensions MUST be defined + by revision or update to this document, or by RFC): + + en-US-u-islamcal + + zh-CN-a-myext-x-private + + en-a-myext-b-another + + Some Invalid Tags: + + de-419-DE (two region tags) + + a-DE (use of a single-character subtag in primary position; note + that there are a few grandfathered tags that start with "i-" that + are valid) + + ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter + prefix) + +Appendix B. Examples of Registration Forms + + LANGUAGE SUBTAG REGISTRATION FORM + + 1. Name of requester: Han Steenwijk + 2. E-mail address of requester: han.steenwijk @ unipd.it + 3. 
Record Requested: + + Type: variant + Subtag: biske + Description: The San Giorgio dialect of Resian + Description: The Bila dialect of Resian + Prefix: sl-rozaj + Comments: The dialect of San Giorgio/Bila is one of the + four major local dialects of Resian + + 4. Intended meaning of the subtag: + + The local variety of Resian as spoken in San Giorgio/Bila + + 5. Reference to published description of the language (book or + article): + + -- Jan I.N. Baudouin de Courtenay - Opyt fonetiki rez'janskich + govorov, Varsava - Peterburg: Vende - Kozancikov, 1875. + + + + + + +Phillips & Davis Best Current Practice [Page 82] + +RFC 5646 Language Tags September 2009 + + + LANGUAGE SUBTAG REGISTRATION FORM + + 1. Name of requester: Jaska Zedlik + 2. E-mail address of requester: jz53 @ zedlik.com + 3. Record Requested: + + Type: variant + Subtag: tarask + Description: Belarusian in Taraskievica orthography + Prefix: be + Comments: The subtag represents Branislau Taraskievic's Belarusian + orthography as published in "Bielaruski klasycny pravapis" by + Juras Buslakou, Vincuk Viacorka, Zmicier Sanko, and Zmicier Sauka + (Vilnia-Miensk 2005). + + 4. Intended meaning of the subtag: + + The subtag is intended to represent the Belarusian orthography as + published in "Bielaruski klasycny pravapis" by Juras Buslakou, Vincuk + Viacorka, Zmicier Sanko, and Zmicier Sauka (Vilnia-Miensk 2005). + + 5. Reference to published description of the language (book or + article): + + Taraskievic, Branislau. Bielaruskaja gramatyka dla skol. Vilnia: Vyd. + "Bielaruskaha kamitetu", 1929, 5th edition. + + Buslakou, Juras; Viacorka, Vincuk; Sanko, Zmicier; Sauka, Zmicier. + Bielaruski klasycny pravapis. Vilnia-Miensk, 2005. + + 6. Any other relevant information: + + Belarusian in Taraskievica orthography became widely used, especially + in Belarusian-speaking Internet segment, but besides this some books + and newspapers are also printed using this orthography of Belarusian. + +Appendix C. 
Acknowledgements + + Any list of contributors is bound to be incomplete; please regard the + following as only a selection from the group of people who have + contributed to make this document what it is today. + + The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the + precursors of this document, made enormous contributions directly or + indirectly to this document and are generally responsible for the + success of language tags. + + + + + +Phillips & Davis Best Current Practice [Page 83] + +RFC 5646 Language Tags September 2009 + + + The following people contributed to this document: + + Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan, + Martin Duerst, Frank Ellerman, Doug Ewell, Deborah Garside, Marion + Gunn, Alfred Hoenes, Kent Karlsson, Chris Newman, Randy Presuhn, + Stephen Silver, Shawn Steele, and many, many others. + + Very special thanks must go to Harald Tveit Alvestrand, who + originated RFCs 1766 and 3066, and without whom this document would + not have been possible. + + Special thanks go to Michael Everson, who served as the Language Tag + Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as + the Language Subtag Reviewer since the adoption of RFC 4646. + + Special thanks also go to Doug Ewell, for his production of the first + complete subtag registry, his work to support and maintain new + registrations, and his careful editorship of both RFC 4645 and + [RFC5645]. + +Authors' Addresses + + Addison Phillips (editor) + Lab126 + + EMail: addison@inter-locale.com + URI: http://www.inter-locale.com + + + Mark Davis (editor) + Google + + EMail: markdavis@google.com + + + + + + + + + + + + + + + + + + +Phillips & Davis Best Current Practice [Page 84] + -- cgit v1.2.3