Diffstat (limited to 'doc/rfc/rfc5646.txt')
-rw-r--r-- doc/rfc/rfc5646.txt 4707
1 files changed, 4707 insertions, 0 deletions
diff --git a/doc/rfc/rfc5646.txt b/doc/rfc/rfc5646.txt
new file mode 100644
index 0000000..327b832
--- /dev/null
+++ b/doc/rfc/rfc5646.txt
@@ -0,0 +1,4707 @@
+
+
+
+
+
+
+Network Working Group                                   A. Phillips, Ed.
+Request for Comments: 5646                                        Lab126
+BCP: 47                                                    M. Davis, Ed.
+Obsoletes: 4646                                                   Google
+Category: Best Current Practice                           September 2009
+
+
+ Tags for Identifying Languages
+
+Abstract
+
+ This document describes the structure, content, construction, and
+ semantics of language tags for use in cases where it is desirable to
+ indicate the language used in an information object. It also
+ describes how to register values for use in language tags and the
+ creation of user-defined extensions for private interchange.
+
+Status of This Memo
+
+ This document specifies an Internet Best Current Practices for the
+ Internet Community, and requests discussion and suggestions for
+ improvements. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (c) 2009 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents in effect on the date of
+ publication of this document (http://trustee.ietf.org/license-info).
+ Please review these documents carefully, as they describe your rights
+ and restrictions with respect to this document.
+
+ This document may contain material from IETF Documents or IETF
+ Contributions published or made publicly available before November
+ 10, 2008. The person(s) controlling the copyright in some of this
+ material may not have granted the IETF Trust the right to allow
+ modifications of such material outside the IETF Standards Process.
+ Without obtaining an adequate license from the person(s) controlling
+ the copyright in such materials, this document may not be modified
+ outside the IETF Standards Process, and derivative works of it may
+ not be created outside the IETF Standards Process, except to format
+ it for publication as an RFC or to translate it into languages other
+ than English.
+
+
+
+
+
+
+Phillips & Davis Best Current Practice [Page 1]
+
+RFC 5646 Language Tags September 2009
+
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
+ 2. The Language Tag . . . . . . . . . . . . . . . . . . . . . . . 4
+ 2.1. Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . 4
+ 2.1.1. Formatting of Language Tags . . . . . . . . . . . . . 6
+ 2.2. Language Subtag Sources and Interpretation . . . . . . . . 8
+ 2.2.1. Primary Language Subtag . . . . . . . . . . . . . . . . 9
+ 2.2.2. Extended Language Subtags . . . . . . . . . . . . . . 11
+ 2.2.3. Script Subtag . . . . . . . . . . . . . . . . . . . . 12
+ 2.2.4. Region Subtag . . . . . . . . . . . . . . . . . . . . 13
+ 2.2.5. Variant Subtags . . . . . . . . . . . . . . . . . . . 15
+ 2.2.6. Extension Subtags . . . . . . . . . . . . . . . . . . 16
+ 2.2.7. Private Use Subtags . . . . . . . . . . . . . . . . . 18
+ 2.2.8. Grandfathered and Redundant Registrations . . . . . . 18
+ 2.2.9. Classes of Conformance . . . . . . . . . . . . . . . . 19
+ 3. Registry Format and Maintenance . . . . . . . . . . . . . . . 21
+ 3.1. Format of the IANA Language Subtag Registry . . . . . . . 21
+ 3.1.1. File Format . . . . . . . . . . . . . . . . . . . . . 21
+ 3.1.2. Record and Field Definitions . . . . . . . . . . . . . 23
+ 3.1.3. Type Field . . . . . . . . . . . . . . . . . . . . . . 26
+ 3.1.4. Subtag and Tag Fields . . . . . . . . . . . . . . . . 26
+ 3.1.5. Description Field . . . . . . . . . . . . . . . . . . 26
+ 3.1.6. Deprecated Field . . . . . . . . . . . . . . . . . . . 28
+ 3.1.7. Preferred-Value Field . . . . . . . . . . . . . . . . 28
+ 3.1.8. Prefix Field . . . . . . . . . . . . . . . . . . . . . 31
+ 3.1.9. Suppress-Script Field . . . . . . . . . . . . . . . . 32
+ 3.1.10. Macrolanguage Field . . . . . . . . . . . . . . . . . 32
+ 3.1.11. Scope Field . . . . . . . . . . . . . . . . . . . . . 33
+ 3.1.12. Comments Field . . . . . . . . . . . . . . . . . . . . 34
+ 3.2. Language Subtag Reviewer . . . . . . . . . . . . . . . . . 35
+ 3.3. Maintenance of the Registry . . . . . . . . . . . . . . . 35
+ 3.4. Stability of IANA Registry Entries . . . . . . . . . . . . 36
+ 3.5. Registration Procedure for Subtags . . . . . . . . . . . . 41
+ 3.6. Possibilities for Registration . . . . . . . . . . . . . . 46
+ 3.7. Extensions and the Extensions Registry . . . . . . . . . . 49
+ 3.8. Update of the Language Subtag Registry . . . . . . . . . . 52
+ 3.9. Applicability of the Subtag Registry . . . . . . . . . . . 52
+ 4. Formation and Processing of Language Tags . . . . . . . . . . 53
+ 4.1. Choice of Language Tag . . . . . . . . . . . . . . . . . . 53
+ 4.1.1. Tagging Encompassed Languages . . . . . . . . . . . . 58
+ 4.1.2. Using Extended Language Subtags . . . . . . . . . . . 59
+ 4.2. Meaning of the Language Tag . . . . . . . . . . . . . . . 61
+ 4.3. Lists of Languages . . . . . . . . . . . . . . . . . . . . 63
+ 4.4. Length Considerations . . . . . . . . . . . . . . . . . . 63
+ 4.4.1. Working with Limited Buffer Sizes . . . . . . . . . . 64
+ 4.4.2. Truncation of Language Tags . . . . . . . . . . . . . 65
+ 4.5. Canonicalization of Language Tags . . . . . . . . . . . . 66
+ 4.6. Considerations for Private Use Subtags . . . . . . . . . . 68
+ 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 69
+ 5.1. Language Subtag Registry . . . . . . . . . . . . . . . . . 69
+ 5.2. Extensions Registry . . . . . . . . . . . . . . . . . . . 71
+ 6. Security Considerations . . . . . . . . . . . . . . . . . . . 71
+ 7. Character Set Considerations . . . . . . . . . . . . . . . . . 72
+ 8. Changes from RFC 4646 . . . . . . . . . . . . . . . . . . . . 73
+ 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 76
+ 9.1. Normative References . . . . . . . . . . . . . . . . . . . 76
+ 9.2. Informative References . . . . . . . . . . . . . . . . . . 78
+ Appendix A. Examples of Language Tags (Informative) . . . . . . . 80
+ Appendix B. Examples of Registration Forms . . . . . . . . . . . 82
+ Appendix C. Acknowledgements . . . . . . . . . . . . . . . . . . 83
+
+1. Introduction
+
+ Human beings on our planet have, past and present, used a number of
+ languages. There are many reasons why one would want to identify the
+ language used when presenting or requesting information.
+
+ The language of an information item or a user's language preferences
+ often need to be identified so that appropriate processing can be
+ applied. For example, the user's language preferences in a Web
+ browser can be used to select Web pages appropriately. Language
+ information can also be used to select among tools (such as
+ dictionaries) to assist in the processing or understanding of content
+ in different languages. Knowledge about the particular language used
+ by some piece of information content might be useful or even required
+ by some types of processing, for example, spell-checking, computer-
+ synthesized speech, Braille transcription, or high-quality print
+ renderings.
+
+ One means of indicating the language used is by labeling the
+ information content with an identifier or "tag". These tags can also
+ be used to specify the user's preferences when selecting information
+ content or to label additional attributes of content and associated
+ resources.
+
+ Sometimes language tags are used to indicate additional language
+ attributes of content. For example, indicating specific information
+ about the dialect, writing system, or orthography used in a document
+ or resource may enable the user to obtain information in a form that
+ they can understand, or it can be important in processing or
+ rendering the given content into an appropriate form or style.
+
+ This document specifies a particular identifier mechanism (the
+ language tag) and a registration function for values to be used to
+ form tags. It also defines a mechanism for private use values and
+ future extensions.
+
+ This document replaces [RFC4646] (which obsoleted [RFC3066] which, in
+ turn, replaced [RFC1766]). This document, in combination with
+ [RFC4647], comprises BCP 47. For a list of changes in this document,
+ see Section 8.
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [RFC2119].
+
+2. The Language Tag
+
+ Language tags are used to help identify languages, whether spoken,
+ written, signed, or otherwise signaled, for the purpose of
+ communication. This includes constructed and artificial languages
+ but excludes languages not intended primarily for human
+ communication, such as programming languages.
+
+2.1. Syntax
+
+ A language tag is composed from a sequence of one or more "subtags",
+ each of which refines or narrows the range of language identified by
+ the overall tag. Subtags, in turn, are a sequence of alphanumeric
+ characters (letters and digits), distinguished and separated from
+ other subtags in a tag by a hyphen ("-", [Unicode] U+002D).
+
+ There are different types of subtag, each of which is distinguished
+ by length, position in the tag, and content: each subtag's type can
+ be recognized solely by these features. This makes it possible to
+ extract and assign some semantic information to the subtags, even if
+ the specific subtag values are not recognized. Thus, a language tag
+ processor need not have a list of valid tags or subtags (that is, a
+ copy of some version of the IANA Language Subtag Registry) in order
+ to perform common searching and matching operations. The only
+ exceptions to this ability to infer meaning from subtag structure are
+ the grandfathered tags listed in the productions 'regular' and
+ 'irregular' below. These tags were registered under [RFC3066] and
+ are a fixed list that can never change.
+
+ The syntax of the language tag in ABNF [RFC5234] is:
+
+ Language-Tag  = langtag             ; normal language tags
+               / privateuse          ; private use tag
+               / grandfathered       ; grandfathered tags
+
+ langtag       = language
+                 ["-" script]
+                 ["-" region]
+                 *("-" variant)
+                 *("-" extension)
+                 ["-" privateuse]
+
+ language      = 2*3ALPHA            ; shortest ISO 639 code
+                 ["-" extlang]       ; sometimes followed by
+                                     ; extended language subtags
+               / 4ALPHA              ; or reserved for future use
+               / 5*8ALPHA            ; or registered language subtag
+
+ extlang       = 3ALPHA              ; selected ISO 639 codes
+                 *2("-" 3ALPHA)      ; permanently reserved
+
+ script        = 4ALPHA              ; ISO 15924 code
+
+ region        = 2ALPHA              ; ISO 3166-1 code
+               / 3DIGIT              ; UN M.49 code
+
+ variant       = 5*8alphanum         ; registered variants
+               / (DIGIT 3alphanum)
+
+ extension     = singleton 1*("-" (2*8alphanum))
+
+                                     ; Single alphanumerics
+                                     ; "x" reserved for private use
+ singleton     = DIGIT               ; 0 - 9
+               / %x41-57             ; A - W
+               / %x59-5A             ; Y - Z
+               / %x61-77             ; a - w
+               / %x79-7A             ; y - z
+
+ privateuse    = "x" 1*("-" (1*8alphanum))
+
+ grandfathered = irregular           ; non-redundant tags registered
+               / regular             ; during the RFC 3066 era
+
+ irregular     = "en-GB-oed"         ; irregular tags do not match
+               / "i-ami"             ; the 'langtag' production and
+               / "i-bnn"             ; would not otherwise be
+               / "i-default"         ; considered 'well-formed'
+               / "i-enochian"        ; These tags are all valid,
+               / "i-hak"             ; but most are deprecated
+               / "i-klingon"         ; in favor of more modern
+               / "i-lux"             ; subtags or subtag
+               / "i-mingo"           ; combination
+               / "i-navajo"
+               / "i-pwn"
+               / "i-tao"
+               / "i-tay"
+               / "i-tsu"
+               / "sgn-BE-FR"
+               / "sgn-BE-NL"
+               / "sgn-CH-DE"
+
+ regular       = "art-lojban"        ; these tags match the 'langtag'
+               / "cel-gaulish"       ; production, but their subtags
+               / "no-bok"            ; are not extended language
+               / "no-nyn"            ; or variant subtags: their meaning
+               / "zh-guoyu"          ; is defined by their registration
+               / "zh-hakka"          ; and all of these are deprecated
+               / "zh-min"            ; in favor of a more modern
+               / "zh-min-nan"        ; subtag or sequence of subtags
+               / "zh-xiang"
+
+ alphanum      = (ALPHA / DIGIT)     ; letters and numbers
+
+ Figure 1: Language Tag ABNF
+
+ For examples of language tags, see Appendix A.
+
+ All subtags have a maximum length of eight characters. Whitespace is
+ not permitted in a language tag. There is a subtlety in the ABNF
+ production 'variant': a variant starting with a digit has a minimum
+ length of four characters, while those starting with a letter have a
+ minimum length of five characters.
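
Editorial sketch (not part of RFC 5646): the ABNF in Figure 1 can be approximated with a regular expression to test whether a tag is well-formed. The Python below is a minimal illustration under stated assumptions. The names is_well_formed, LANGTAG, PRIVATEUSE, IRREGULAR, and REGULAR are hypothetical, and the check covers syntax only. It says nothing about whether the subtags are registered.

```python
import re

# Grandfathered tags copied (lowercased) from the 'irregular' and
# 'regular' productions of Figure 1.
IRREGULAR = {
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
}
REGULAR = {
    "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu",
    "zh-hakka", "zh-min", "zh-min-nan", "zh-xiang",
}

# One alternative per line, in the order used by the 'langtag' production.
LANGTAG = re.compile(
    r"""
    (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})  # language, extlang
    (?:-[a-z]{4})?                                         # script
    (?:-(?:[a-z]{2}|[0-9]{3}))?                            # region
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*               # variants
    (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*                   # extensions
    (?:-x(?:-[a-z0-9]{1,8})+)?                             # private use
    """,
    re.VERBOSE,
)
PRIVATEUSE = re.compile(r"x(?:-[a-z0-9]{1,8})+")


def is_well_formed(tag: str) -> bool:
    """Return True if the tag matches the Language-Tag production."""
    if not tag.isascii():
        return False  # tags are restricted to the US-ASCII repertoire
    lowered = tag.lower()  # tags are case insensitive
    return (
        lowered in IRREGULAR
        or lowered in REGULAR
        or LANGTAG.fullmatch(lowered) is not None
        or PRIVATEUSE.fullmatch(lowered) is not None
    )
```

Under this sketch, "de-1996" is accepted because a variant that starts with a digit may be four characters long, while "de-a996" is rejected because a variant that starts with a letter needs at least five.
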
+
+ Although [RFC5234] refers to octets, the language tags described in
+ this document are sequences of characters from the US-ASCII [ISO646]
+ repertoire. Language tags MAY be used in documents and applications
+ that use other encodings, so long as these encompass the relevant
+ part of the US-ASCII repertoire. An example of this would be an XML
+ document that uses the UTF-16LE [RFC2781] encoding of [Unicode].
+
+2.1.1. Formatting of Language Tags
+
+ At all times, language tags and their subtags, including private use
+ and extensions, are to be treated as case insensitive: there exist
+ conventions for the capitalization of some of the subtags, but these
+ MUST NOT be taken to carry meaning.
+
+ Thus, the tag "mn-Cyrl-MN" is not distinct from "MN-cYRL-mn" or
+ "mN-cYrL-Mn" (or any other combination), and each of these variations
+ conveys the same meaning: Mongolian written in the Cyrillic script as
+ used in Mongolia.
+
+ The ABNF syntax also does not distinguish between upper- and
+ lowercase: the uppercase US-ASCII letters in the range 'A' through
+ 'Z' are always considered equivalent and mapped directly to their
+ US-ASCII lowercase equivalents in the range 'a' through 'z'. So the
+ tag "I-AMI" is considered equivalent to the value "i-ami" in the
+ 'irregular' production.
+
+ Although case distinctions do not carry meaning in language tags,
+ consistent formatting and presentation of language tags will aid
+ users. The format of subtags in the registry is RECOMMENDED as the
+ form to use in language tags. This format generally corresponds to
+ the common conventions for the various ISO standards from which the
+ subtags are derived.
+
+ These conventions include:
+
+ o [ISO639-1] recommends that language codes be written in lowercase
+ ('mn' Mongolian).
+
+ o [ISO15924] recommends that script codes use lowercase with the
+ initial letter capitalized ('Cyrl' Cyrillic).
+
+ o [ISO3166-1] recommends that country codes be capitalized ('MN'
+ Mongolia).
+
+ An implementation can reproduce this format without accessing the
+ registry as follows. All subtags, including extension and private
+ use subtags, use lowercase letters with two exceptions: two-letter
+ and four-letter subtags that neither appear at the start of the tag
+ nor occur after singletons. Such two-letter subtags are all
+ uppercase (as in the tags "en-CA-x-ca" or "sgn-BE-FR") and four-
+ letter subtags are titlecase (as in the tag "az-Latn-x-latn").
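
Editorial sketch (not part of RFC 5646): the registry-free formatting rule in the paragraph above can be written as a short Python function. The name format_tag is hypothetical, and the input is assumed to be an ASCII, well-formed tag. The function does not validate anything.

```python
def format_tag(tag: str) -> str:
    """Apply the recommended capitalization without consulting the registry."""
    formatted = []
    seen_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        lowered = subtag.lower()
        if len(lowered) == 1:
            # A singleton ('x', 'i', or an extension letter or digit):
            # everything after it stays lowercase.
            seen_singleton = True
            formatted.append(lowered)
        elif index == 0 or seen_singleton:
            formatted.append(lowered)
        elif len(lowered) == 2:
            formatted.append(lowered.upper())       # e.g. region 'CA'
        elif len(lowered) == 4:
            formatted.append(lowered.capitalize())  # e.g. script 'Latn'
        else:
            formatted.append(lowered)
    return "-".join(formatted)
```

For instance, format_tag("MN-cYRL-mn") yields "mn-Cyrl-MN", and format_tag("AZ-LATN-X-LATN") yields "az-Latn-x-latn" because the final subtag follows a singleton.
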
+
+ Note: Case folding of ASCII letters in certain locales, unless
+ carefully handled, sometimes produces non-ASCII character values.
+ The Unicode Character Database file "SpecialCasing.txt"
+ [SpecialCasing] defines the specific cases that are known to cause
+ problems with this. In particular, the letter 'i' (U+0069) in
+ Turkish and Azerbaijani is uppercased to U+0130 (LATIN CAPITAL LETTER
+ I WITH DOT ABOVE). Implementers SHOULD specify a locale-neutral
+ casing operation to ensure that case folding of subtags does not
+ produce this value, which is illegal in language tags. For example,
+ if one were to uppercase the region subtag 'in' using Turkish locale
+ rules, the sequence U+0130 U+004E would result, instead of the
+ expected 'IN'.
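
Editorial sketch (not part of RFC 5646): one way to obtain a locale-neutral casing operation is to map only the 26 ASCII letters and leave every other character untouched. The helper names ascii_upper and ascii_lower are hypothetical.

```python
import string

# Translation tables that cover only 'a'-'z' and 'A'-'Z'.
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_upper(subtag: str) -> str:
    """Uppercase ASCII letters only; other characters are left unchanged."""
    return subtag.translate(_TO_UPPER)


def ascii_lower(subtag: str) -> str:
    """Lowercase ASCII letters only; other characters are left unchanged."""
    return subtag.translate(_TO_LOWER)
```

With these helpers the region subtag 'in' always becomes 'IN', and a stray U+0130 in the input is passed through unchanged so that a later well-formedness check can reject it.
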
+
+2.2. Language Subtag Sources and Interpretation
+
+ The namespace of language tags and their subtags is administered by
+ the Internet Assigned Numbers Authority (IANA) according to the rules
+ in Section 5 of this document. The Language Subtag Registry
+ maintained by IANA is the source for valid subtags: other standards
+ referenced in this section provide the source material for that
+ registry.
+
+ Terminology used in this document:
+
+ o "Tag" refers to a complete language tag, such as "sr-Latn-RS" or
+ "az-Arab-IR". Examples of tags in this document are enclosed in
+ double-quotes ("en-US").
+
+ o "Subtag" refers to a specific section of a tag, delimited by a
+ hyphen, such as the subtags 'zh', 'Hant', and 'CN' in the tag
+ "zh-Hant-CN". Examples of subtags in this document are enclosed in
+ single quotes ('Hant').
+
+ o "Code" refers to values defined in external standards (and that
+ are used as subtags in this document). For example, 'Hant' is an
+ [ISO15924] script code that was used to define the 'Hant' script
+ subtag for use in a language tag. Examples of codes in this
+ document are enclosed in single quotes ('en', 'Hant').
+
+ Language tags are designed so that each subtag type has unique length
+ and content restrictions. These make identification of the subtag's
+ type possible, even if the content of the subtag itself is
+ unrecognized. This allows tags to be parsed and processed without
+ reference to the latest version of the underlying standards or the
+ IANA registry and makes the associated exception handling when
+ parsing tags simpler.
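
Editorial sketch (not part of RFC 5646): because each subtag type has its own length and content restrictions, a tag can be split into typed parts without any registry data. The Python below is a minimal illustration. The names parse_langtag and take are hypothetical, grandfathered tags and tags made up solely of private use subtags are not handled, and all returned values are lowercased.

```python
import re


def parse_langtag(tag: str) -> dict:
    """Split a tag matching the 'langtag' production into typed parts."""
    if not tag.isascii():
        raise ValueError("language tags are limited to US-ASCII")
    rest = tag.lower().split("-")
    parts = {"language": None, "extlang": [], "script": None,
             "region": None, "variants": [], "extensions": [],
             "privateuse": []}

    def take(pattern):
        # Consume the next subtag only if its shape matches the pattern.
        if rest and re.fullmatch(pattern, rest[0]):
            return rest.pop(0)
        return None

    parts["language"] = take(r"[a-z]{2,8}")
    if parts["language"] is None:
        raise ValueError("missing primary language subtag")
    if len(parts["language"]) <= 3:
        while len(parts["extlang"]) < 3:
            ext = take(r"[a-z]{3}")
            if ext is None:
                break
            parts["extlang"].append(ext)
    parts["script"] = take(r"[a-z]{4}")
    parts["region"] = take(r"[a-z]{2}|[0-9]{3}")
    while True:
        variant = take(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")
        if variant is None:
            break
        parts["variants"].append(variant)
    while rest and re.fullmatch(r"[0-9a-wy-z]", rest[0]):
        singleton, values = rest.pop(0), []
        while True:
            value = take(r"[a-z0-9]{2,8}")
            if value is None:
                break
            values.append(value)
        if not values:
            raise ValueError("extension singleton without subtags")
        parts["extensions"].append((singleton, values))
    if rest and rest[0] == "x":
        rest.pop(0)
        if not rest or not all(re.fullmatch(r"[a-z0-9]{1,8}", s) for s in rest):
            raise ValueError("malformed private use sequence")
        parts["privateuse"] = rest[:]
        rest.clear()
    if rest:
        raise ValueError("unexpected subtag: " + rest[0])
    return parts
```

Note that the function assigns a type to subtags it has never seen: parse_langtag("xx-Yyyy-ZZ") reports a language, a script, and a region even though the function knows nothing about those particular values.
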
+
+ Some of the subtags in the IANA registry do not come from an
+ underlying standard. These can only appear in specific positions in
+ a tag: they can only occur as primary language subtags or as variant
+ subtags.
+
+ Sequences of private use and extension subtags MUST occur at the end
+ of the sequence of subtags and MUST NOT be interspersed with subtags
+ defined elsewhere in this document. These sequences are introduced
+ by single-character subtags, which are reserved as follows:
+
+ o The single-letter subtag 'x' introduces a sequence of private use
+ subtags. The interpretation of any private use subtag is defined
+ solely by private agreement and is not defined by the rules in
+ this section or in any standard or registry defined in this
+ document.
+
+ o The single-letter subtag 'i' is used by some grandfathered tags,
+ such as "i-default", where it always appears in the first position
+ and cannot be confused with an extension.
+
+ o All other single-letter and single-digit subtags are reserved to
+ introduce standardized extension subtag sequences as described in
+ Section 3.7.
+
+2.2.1. Primary Language Subtag
+
+ The primary language subtag is the first subtag in a language tag and
+ cannot be omitted, with two exceptions:
+
+ o The single-character subtag 'x' as the primary subtag indicates
+ that the language tag consists solely of subtags whose meaning is
+ defined by private agreement. For example, in the tag "x-fr-CH",
+ the subtags 'fr' and 'CH' do not represent the French language or
+ the country of Switzerland (or any other value in the IANA
+ registry) unless there is a private agreement in place to do so.
+ See Section 4.6.
+
+ o The single-character subtag 'i' is used by some grandfathered tags
+ (see Section 2.2.8) such as "i-klingon" and "i-bnn". (Other
+ grandfathered tags have a primary language subtag in their first
+ position.)
+
+ The following rules apply to the primary language subtag:
+
+ 1. Two-character primary language subtags were defined in the IANA
+ registry according to the assignments found in the standard "ISO
+ 639-1:2002, Codes for the representation of names of languages --
+ Part 1: Alpha-2 code" [ISO639-1], or using assignments
+ subsequently made by the ISO 639-1 registration authority (RA) or
+ governing standardization bodies.
+
+ 2. Three-character primary language subtags in the IANA registry
+ were defined according to the assignments found in one of these
+ additional ISO 639 parts or assignments subsequently made by the
+ relevant ISO 639 registration authorities or governing
+ standardization bodies:
+
+ A. "ISO 639-2:1998 - Codes for the representation of names of
+ languages -- Part 2: Alpha-3 code - edition 1" [ISO639-2]
+
+ B. "ISO 639-3:2007 - Codes for the representation of names of
+ languages -- Part 3: Alpha-3 code for comprehensive coverage
+ of languages" [ISO639-3]
+
+ C. "ISO 639-5:2008 - Codes for the representation of names of
+ languages -- Part 5: Alpha-3 code for language families and
+ groups" [ISO639-5]
+
+ 3. The subtags in the range 'qaa' through 'qtz' are reserved for
+ private use in language tags. These subtags correspond to codes
+ reserved by ISO 639-2 for private use. These codes MAY be used
+ for non-registered primary language subtags (instead of using
+ private use subtags following 'x-'). Please refer to Section 4.6
+ for more information on private use subtags.
+
+ 4. Four-character language subtags are reserved for possible future
+ standardization.
+
+ 5. Any language subtags of five to eight characters in length in the
+ IANA registry were defined via the registration process in
+ Section 3.5 and MAY be used to form the primary language subtag.
+ An example of what such a registration might include is the
+ grandfathered IANA registration "i-enochian". The subtag
+ 'enochian' could be registered in the IANA registry as a primary
+ language subtag (assuming that ISO 639 does not register this
+ language first), making tags such as "enochian-AQ" and
+ "enochian-Latn" valid.
+
+ At the time this document was created, there were no examples of
+ this kind of subtag. Future registrations of this type are
+ discouraged: an attempt to register any new proposed primary
+ language MUST be made to the ISO 639 registration authority.
+ Proposals rejected by the ISO 639 registration authority are
+ unlikely to meet the criteria for primary language subtags and
+ are thus unlikely to be registered.
+
+ 6. Other values MUST NOT be assigned to the primary subtag except by
+ revision or update of this document.
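
Editorial sketch (not part of RFC 5646): the length-based rules above can be summarized in a small Python classifier. The function name primary_language_kind and the returned labels are hypothetical, and the result describes only where a subtag of that shape would come from. It does not confirm that the subtag is actually registered.

```python
import re


def primary_language_kind(subtag: str) -> str:
    """Classify a primary language subtag by its length and content."""
    lowered = subtag.lower()
    if not subtag.isascii() or not re.fullmatch(r"[a-z]{2,8}", lowered):
        return "not a primary language subtag"
    if len(lowered) == 2:
        return "ISO 639-1"
    if len(lowered) == 3:
        # 'qaa' through 'qtz' are reserved for private use (rule 3).
        if "qaa" <= lowered <= "qtz":
            return "private use"
        return "ISO 639-2, 639-3, or 639-5"
    if len(lowered) == 4:
        return "reserved for future standardization"
    return "registered via Section 3.5"
```
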
+
+ When languages have both an ISO 639-1 two-character code and a three-
+ character code (assigned by ISO 639-2, ISO 639-3, or ISO 639-5), only
+ the ISO 639-1 two-character code is defined in the IANA registry.
+
+ When a language has no ISO 639-1 two-character code and the ISO
+ 639-2/T (Terminology) code and the ISO 639-2/B (Bibliographic) code
+ for that language differ, only the Terminology code is defined in the
+ IANA registry. At the time this document was created, all languages
+ that had both kinds of three-character codes were also assigned a
+ two-character code; it is expected that future assignments of this
+ nature will not occur.
+
+ In order to avoid instability in the canonical form of tags, if a
+ two-character code is added to ISO 639-1 for a language for which a
+ three-character code was already included in either ISO 639-2 or ISO
+ 639-3, the two-character code MUST NOT be registered. See
+ Section 3.4.
+
+ For example, if some content were tagged with 'haw' (Hawaiian), which
+ currently has no two-character code, the tag would not need to be
+ changed if ISO 639-1 were to assign a two-character code to the
+ Hawaiian language at a later date.
+
+ To avoid these problems with versioning and subtag choice (as
+ experienced during the transition between RFC 1766 and RFC 3066), as
+ well as to ensure the canonical nature of subtags defined by this
+ document, the ISO 639 Registration Authority Joint Advisory Committee
+ (ISO 639/RA-JAC) has included the following statement in
+ [iso639.prin]:
+
+ "A language code already in ISO 639-2 at the point of freezing ISO
+ 639-1 shall not later be added to ISO 639-1. This is to ensure
+ consistency in usage over time, since users are directed in
+ Internet applications to employ the alpha-3 code when an alpha-2
+ code for that language is not available."
+
+2.2.2. Extended Language Subtags
+
+ Extended language subtags are used to identify certain specially
+ selected languages that, for various historical and compatibility
+ reasons, are closely identified with or tagged using an existing
+ primary language subtag. Extended language subtags are always used
+ with their enclosing primary language subtag (indicated with a
+ 'Prefix' field in the registry) when used to form the language tag.
+ All languages that have an extended language subtag in the registry
+ also have an identical primary language subtag record in the
+ registry. This primary language subtag is RECOMMENDED for forming
+ the language tag. The following rules apply to the extended language
+ subtags:
+
+ 1. Extended language subtags consist solely of three-letter subtags.
+ All extended language subtag records defined in the registry were
+ defined according to the assignments found in [ISO639-3].
+ Language collections and groupings, such as defined in
+ [ISO639-5], are specifically excluded from being extended
+ language subtags.
+
+ 2. Extended language subtag records MUST include exactly one
+ 'Prefix' field indicating an appropriate subtag or sequence of
+ subtags for that extended language subtag.
+
+ 3. Extended language subtag records MUST include a 'Preferred-
+ Value'. The 'Preferred-Value' and 'Subtag' fields MUST be
+ identical.
+
+ 4. Although the ABNF production 'extlang' permits up to three
+ extended language subtags in the language tag, extended language
+ subtags MUST NOT include another extended language subtag in
+ their 'Prefix'. That is, the second and third extended language
+ subtag positions in a language tag are permanently reserved and
+ tags that include those subtags in that position are, and will
+ always remain, invalid.
+
+ For example, the macrolanguage Chinese ('zh') encompasses a number of
+ languages. For compatibility reasons, each of these languages has
+ both a primary and extended language subtag in the registry. A few
+ selected examples of these include Gan Chinese ('gan'), Cantonese
+ Chinese ('yue'), and Mandarin Chinese ('cmn'). Each is encompassed
+ by the macrolanguage 'zh' (Chinese). Therefore, they each have the
+ prefix "zh" in their registry records. Thus, Gan Chinese is
+ represented with tags beginning "zh-gan" or "gan", Cantonese with
+ tags beginning either "yue" or "zh-yue", and Mandarin Chinese with
+ "zh-cmn" or "cmn". The language subtag 'zh' can still be used
+ without an extended language subtag to label a resource as some
+ unspecified variety of Chinese, while the primary language subtag
+ ('gan', 'yue', 'cmn') is preferred to using the extended language
+ form ("zh-gan", "zh-yue", "zh-cmn").
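
Editorial sketch (not part of RFC 5646): since the primary language subtag is preferred over the extended language form, a processor can rewrite "zh-yue" as "yue" once it knows the extlang record's 'Prefix'. The table below is a hypothetical three-entry excerpt built from the examples in the text. A real implementation would load these records from the IANA registry. The names EXTLANG_PREFIX and prefer_primary are assumptions.

```python
# Extended language subtag -> its single 'Prefix' (excerpt only).
EXTLANG_PREFIX = {"gan": "zh", "yue": "zh", "cmn": "zh"}


def prefer_primary(tag: str) -> str:
    """Replace 'prefix-extlang' with the equivalent primary language subtag."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        prefix, extlang = subtags[0].lower(), subtags[1].lower()
        if EXTLANG_PREFIX.get(extlang) == prefix:
            # Drop the prefix; the extlang becomes the primary language.
            return "-".join([extlang] + subtags[2:])
    return tag
```
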
+
+2.2.3. Script Subtag
+
+ Script subtags are used to indicate the script or writing system
+ variations that distinguish the written forms of a language or its
+ dialects. The following rules apply to the script subtags:
+
+ 1. Script subtags MUST follow any primary and extended language
+ subtags and MUST precede any other type of subtag.
+
+ 2. Script subtags consist of four letters and were defined according
+ to the assignments found in [ISO15924] ("Information and
+ documentation -- Codes for the representation of names of
+ scripts"), or subsequently assigned by the ISO 15924 registration
+ authority or governing standardization bodies. Only codes
+ assigned by ISO 15924 will be considered for registration.
+
+ 3. The script subtags 'Qaaa' through 'Qabx' are reserved for private
+ use in language tags. These subtags correspond to codes reserved
+ by ISO 15924 for private use. These codes MAY be used for non-
+ registered script values. Please refer to Section 4.6 for more
+ information on private use subtags.
+
+ 4. There MUST be at most one script subtag in a language tag, and
+ the script subtag SHOULD be omitted when it adds no
+ distinguishing value to the tag or when the primary or extended
+ language subtag's record in the subtag registry includes a
+ 'Suppress-Script' field listing the applicable script subtag.
+
+ For example: "sr-Latn" represents Serbian written using the Latin
+ script.
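
Editorial sketch (not part of RFC 5646): rules 3 and 4 above lend themselves to two small helpers, one that recognizes the private use script range and one that drops a script listed in a 'Suppress-Script' field. The single SUPPRESS_SCRIPT entry is an illustrative assumption standing in for data that would normally be read from the registry, and both function names are hypothetical.

```python
# Language subtag -> script named in its 'Suppress-Script' field.
# Illustrative excerpt only; real values come from the registry.
SUPPRESS_SCRIPT = {"en": "Latn"}


def is_private_use_script(subtag: str) -> bool:
    """True for the private use script subtags 'Qaaa' through 'Qabx'."""
    lowered = subtag.lower()
    return (
        subtag.isascii()
        and len(lowered) == 4
        and lowered.isalpha()
        and "qaaa" <= lowered <= "qabx"
    )


def drop_suppressed_script(language: str, script):
    """Return None when the registry says the script SHOULD be omitted."""
    suppressed = SUPPRESS_SCRIPT.get(language.lower())
    if script is not None and suppressed is not None:
        if suppressed.lower() == script.lower():
            return None
    return script
```
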
+
+2.2.4. Region Subtag
+
+ Region subtags are used to indicate linguistic variations associated
+ with or appropriate to a specific country, territory, or region.
+ Typically, a region subtag is used to indicate variations such as
+ regional dialects or usage, or region-specific spelling conventions.
+ It can also be used to indicate that content is expressed in a way
+ that is appropriate for use throughout a region, for instance,
+ Spanish content tailored to be useful throughout Latin America.
+
+ The following rules apply to the region subtags:
+
+ 1. Region subtags MUST follow any primary language, extended
+ language, or script subtags and MUST precede any other type of
+ subtag.
+
+ 2. Two-letter region subtags were defined according to the
+ assignments found in [ISO3166-1] ("Codes for the representation
+ of names of countries and their subdivisions -- Part 1: Country
+ codes"), using the list of alpha-2 country codes or using
+ assignments subsequently made by the ISO 3166-1 maintenance
+ agency or governing standardization bodies. In addition, the
+ codes that are "exceptionally reserved" (as opposed to
+ "assigned") in ISO 3166-1 were also defined in the registry, with
+ the exception of 'UK', which is an exact synonym for the assigned
+ code 'GB'.
+
+ 3. The region subtags 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are
+ reserved for private use in language tags. These subtags
+ correspond to codes reserved by ISO 3166 for private use. These
+ codes MAY be used for private use region subtags (instead of
+ using a private use subtag sequence). Please refer to
+ Section 4.6 for more information on private use subtags.
+
+ 4. Three-character region subtags consist solely of digit (number)
+ characters and were defined according to the assignments found in
+ the UN Standard Country or Area Codes for Statistical Use
+ [UN_M.49] or assignments subsequently made by the governing
+ standards body. Not all of the UN M.49 codes are defined in the
+ IANA registry. The following rules define which codes are
+ entered into the registry as valid subtags:
+
+ A. UN numeric codes assigned to 'macro-geographical
+ (continental)' or sub-regions MUST be registered in the
+ registry. These codes are not associated with an assigned
+ ISO 3166-1 alpha-2 code and represent supra-national areas,
+ usually covering more than one nation, state, province, or
+ territory.
+
+ B. UN numeric codes for 'economic groupings' or 'other
+ groupings' MUST NOT be registered in the IANA registry and
+ MUST NOT be used to form language tags.
+
+ C. When ISO 3166-1 reassigns a code formerly used for one
+ country or area to another country or area and that code
+ already is present in the registry, the UN numeric code for
+ that country or area MUST be registered in the registry as
+ described in Section 3.4 and MUST be used to form language
+ tags that represent the country or region for which it is
+ defined (rather than the recycled ISO 3166-1 code).
+
+ D. UN numeric codes for countries or areas for which there is an
+ associated ISO 3166-1 alpha-2 code in the registry MUST NOT
+ be entered into the registry and MUST NOT be used to form
+ language tags. Note that the ISO 3166-based subtag in the
+ registry MUST actually be associated with the UN M.49 code in
+ question.
+
+ E. For historical reasons, the UN numeric code 830 (Channel
+ Islands), which was not registered at the time this document
+ was adopted and had, at that time, no corresponding ISO
+ 3166-1 code, MAY be entered into the IANA registry via the
+ process described in Section 3.5, provided no ISO 3166-1 code
+ with that exact meaning has been previously registered.
+
+ F. All other UN numeric codes for countries or areas that do not
+ have an associated ISO 3166-1 alpha-2 code MUST NOT be
+ entered into the registry and MUST NOT be used to form
+ language tags. For more information about these codes, see
+ Section 3.4.
+
+ 5. The alphanumeric codes in Appendix X of the UN document MUST NOT
+ be entered into the registry and MUST NOT be used to form
+ language tags. (At the time this document was created, these
+ values matched the ISO 3166-1 alpha-2 codes.)
+
+ 6. There MUST be at most one region subtag in a language tag and the
+ region subtag MAY be omitted, as when it adds no distinguishing
+ value to the tag.
+
+ For example:
+
+ "de-AT" represents German ('de') as used in Austria ('AT').
+
+ "sr-Latn-RS" represents Serbian ('sr') written using Latin script
+ ('Latn') as used in Serbia ('RS').
+
+ "es-419" represents Spanish ('es') appropriate to the UN-defined
+ Latin America and Caribbean region ('419').
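+
+ As an illustration of rules 3 and 4 above, the following Python
+ sketch classifies a candidate region subtag by its shape alone. The
+ function name and the labels it returns are hypothetical, and the
+ sketch does not consult the registry: a result of "iso-3166-1" or
+ "un-m49" only means the subtag has the right form, not that it is
+ registered.
+
```python
import re

_FLAGS = re.ASCII | re.IGNORECASE  # subtags are ASCII and case-insensitive
_NUMERIC = re.compile(r"[0-9]{3}", _FLAGS)
_ALPHA2 = re.compile(r"[a-z]{2}", _FLAGS)
# Rule 3: 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ' are reserved for private use.
_PRIVATE_USE = re.compile(r"aa|q[m-z]|x[a-z]|zz", _FLAGS)


def classify_region(subtag: str) -> str:
    """Classify a candidate region subtag by shape only (hypothetical labels)."""
    if _NUMERIC.fullmatch(subtag):
        return "un-m49"
    if _PRIVATE_USE.fullmatch(subtag):
        return "private-use"
    if _ALPHA2.fullmatch(subtag):
        return "iso-3166-1"
    return "not-a-region"
```
+
+ Under these assumptions, classify_region("QQ") yields "private-use"
+ and classify_region("419") yields "un-m49".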
+
+2.2.5. Variant Subtags
+
+ Variant subtags are used to indicate additional, well-recognized
+ variations that define a language or its dialects that are not
+ covered by other available subtags. The following rules apply to the
+ variant subtags:
+
+ 1. Variant subtags MUST follow any primary language, extended
+ language, script, or region subtags and MUST precede any
+ extension or private use subtag sequences.
+
+ 2. Variant subtags, as a collection, are not associated with any
+ particular external standard. The meaning of variant subtags in
+ the registry is defined in the course of the registration process
+ defined in Section 3.5. Note that any particular variant subtag
+ might be associated with some external standard. However,
+ association with a standard is not required for registration.
+
+ 3. More than one variant MAY be used to form the language tag.
+
+ 4. Variant subtags MUST be registered with IANA according to the
+ rules in Section 3.5 of this document before being used to form
+ language tags. In order to distinguish variants from other types
+ of subtags, registrations MUST meet the following length and
+ content restrictions:
+
+ 1. Variant subtags that begin with a letter (a-z, A-Z) MUST be
+ at least five characters long.
+
+ 2. Variant subtags that begin with a digit (0-9) MUST be at
+ least four characters long.
+
+ 5. The same variant subtag MUST NOT be used more than once within a
+ language tag.
+
+ * For example, the tag "de-DE-1901-1901" is not valid.
+
+ Variant subtag records in the Language Subtag Registry MAY include
+ one or more 'Prefix' (Section 3.1.8) fields. Each 'Prefix' indicates
+ a suitable sequence of subtags for forming (with other subtags, as
+ appropriate) a language tag when using the variant.
+
+ Most variants that share a prefix are mutually exclusive. For
+ example, the German orthographic variations '1996' and '1901' SHOULD
+ NOT be used in the same tag, as they represent the dates of different
+ spelling reforms. A variant that can meaningfully be used in
+ combination with another variant SHOULD include a 'Prefix' field in
+ its registry record that lists that other variant. For example, if
+ another German variant 'example' were created that made sense to use
+ with '1996', then 'example' should include two 'Prefix' fields: "de"
+ and "de-1996".
+
+ For example:
+
+ "sl-nedis" represents the Natisone or Nadiza dialect of Slovenian.
+
+ "de-CH-1996" represents German as used in Switzerland and as
+ written using the spelling reform beginning in the year 1996 C.E.
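+
+ The length and repetition rules above (rules 4 and 5) can be
+ sketched as a small Python check. This is an illustration only: the
+ function name is hypothetical, the upper bound of eight characters
+ is assumed from the ABNF in Section 2.1, and no registry lookup or
+ 'Prefix' check is performed.
+
```python
import re

# Assumed shape: 5 to 8 letters or digits, or a digit followed by exactly 3.
_VARIANT = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}", re.ASCII | re.IGNORECASE)


def variant_problems(variants):
    """Return a list of problems found in a tag's variant subtags."""
    problems = []
    seen = set()
    for variant in variants:
        key = variant.lower()  # comparison ignores case
        if not _VARIANT.fullmatch(variant):
            problems.append(f"{variant!r} is not shaped like a variant")
        if key in seen:
            problems.append(f"{variant!r} appears more than once")
        seen.add(key)
    return problems
```
+
+ For the variants of "de-DE-1901-1901", variant_problems(["1901",
+ "1901"]) reports one problem, the repetition of '1901'.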
+
+2.2.6. Extension Subtags
+
+ Extensions provide a mechanism for extending language tags for use in
+ various applications. They are intended to identify information that
+ is commonly used in association with languages or language tags but
+ that is not part of language identification. See Section 3.7. The
+ following rules apply to extensions:
+
+ 1. An extension MUST follow at least a primary language subtag.
+ That is, a language tag cannot begin with an extension.
+ Extensions extend language tags; they do not override or replace
+ them. For example, "a-value" is not a well-formed language tag,
+ while "de-a-value" is. Note that extensions cannot be used in
+ tags that are entirely private use (that is, tags starting with
+ "x-").
+
+ 2. Extension subtags are separated from the other subtags defined in
+ this document by a single-character subtag (called a
+ "singleton"). The singleton MUST be one allocated to a
+ registration authority via the mechanism described in Section 3.7
+ and MUST NOT be the letter 'x', which is reserved for private use
+ subtag sequences.
+
+ 3. Each singleton subtag MUST appear at most one time in each tag
+ (other than as a private use subtag). That is, singleton subtags
+ MUST NOT be repeated. For example, the tag "en-a-bbb-a-ccc" is
+ invalid because the subtag 'a' appears twice. Note that the tag
+ "en-a-bbb-x-a-ccc" is valid because the second appearance of the
+ singleton 'a' is in a private use sequence.
+
+ 4. Extension subtags MUST meet whatever requirements are set by the
+ document that defines their singleton prefix and whatever
+ requirements are provided by the maintaining authority. Note
+ that there might not be a registry of these subtags and
+ validating processors are not required to validate extensions.
+
+ 5. Each extension subtag MUST be from two to eight characters long
+ and consist solely of letters or digits, with each subtag
+ separated by a single '-'. Case distinctions are ignored in
+ extensions (as with any language subtag) and normalized subtags
+ of this type are expected to be in lowercase.
+
+ 6. Each singleton MUST be followed by at least one extension subtag.
+ For example, the tag "tlh-a-b-foo" is invalid because the first
+ singleton 'a' is followed immediately by another singleton 'b'.
+
+ 7. Extension subtags MUST follow all primary language, extended
+ language, script, region, and variant subtags in a tag and MUST
+ precede any private use subtag sequences.
+
+ 8. All subtags following the singleton and before another singleton
+ are part of the extension. Example: In the tag "fr-a-Latn", the
+ subtag 'Latn' does not represent the script subtag 'Latn' defined
+ in the IANA Language Subtag Registry. Its meaning is defined by
+ the extension 'a'.
+
+ 9. In the event that more than one extension appears in a single
+ tag, the tag SHOULD be canonicalized as described in Section 4.5,
+ by ordering the various extension sequences into case-insensitive
+ ASCII order.
+
+ For example, if an extension were defined for the singleton 'r' and
+ it defined the subtags shown, then the following tag would be a valid
+ example: "en-Latn-GB-boont-r-extended-sequence-x-private".
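+
+ The following Python sketch shows one way the structural rules above
+ (1, 3, 5, 6, and 9) might be checked. The function names are
+ hypothetical. The sketch lowercases the tag, does not check the
+ subtags that precede the first singleton, and does not check whether
+ a singleton has actually been allocated.
+
```python
import re

_EXT_SUBTAG = re.compile(r"[a-z0-9]{2,8}", re.ASCII)


def split_extensions(tag):
    """Split a tag into (base subtags, {singleton: subtags}, private use subtags).

    Raises ValueError when rules 1, 3, 5, or 6 above are violated.
    """
    subtags = tag.lower().split("-")
    private = []
    if "x" in subtags:
        # Everything after the first 'x' is private use, including singletons.
        cut = subtags.index("x")
        subtags, private = subtags[:cut], subtags[cut + 1:]
    # The base runs up to the first single-character subtag.
    start = next((i for i, s in enumerate(subtags) if len(s) == 1), len(subtags))
    if subtags and start == 0:
        raise ValueError("a tag cannot begin with an extension")
    base, rest = subtags[:start], subtags[start:]
    extensions = {}
    current = None
    for subtag in rest:
        if len(subtag) == 1:
            if current is not None and not extensions[current]:
                raise ValueError(f"singleton {current!r} has no subtags")
            if subtag in extensions:
                raise ValueError(f"singleton {subtag!r} is repeated")
            current = subtag
            extensions[current] = []
        elif _EXT_SUBTAG.fullmatch(subtag):
            extensions[current].append(subtag)
        else:
            raise ValueError(f"{subtag!r} is not a valid extension subtag")
    if current is not None and not extensions[current]:
        raise ValueError(f"singleton {current!r} has no subtags")
    return base, extensions, private


def canonical_extensions(extensions):
    """Order extension sequences by singleton (rule 9); keys are lowercase."""
    return [[key, *extensions[key]] for key in sorted(extensions)]
```
+
+ With this sketch, "en-a-bbb-a-ccc" and "tlh-a-b-foo" raise
+ ValueError, while "en-a-bbb-x-a-ccc" is accepted because the second
+ 'a' falls inside the private use sequence.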
+
+2.2.7. Private Use Subtags
+
+ Private use subtags are used to indicate distinctions in language
+ that are important in a given context by private agreement. The
+ following rules apply to private use subtags:
+
+ 1. Private use subtags are separated from the other subtags defined
+ in this document by the reserved single-character subtag 'x'.
+
+ 2. Private use subtags MUST conform to the format and content
+ constraints defined in the ABNF for all subtags; that is, they
+ MUST consist solely of letters and digits and not exceed eight
+ characters in length.
+
+ 3. Private use subtags MUST follow all primary language, extended
+ language, script, region, variant, and extension subtags in the
+ tag. Another way of saying this is that all subtags following
+ the singleton 'x' MUST be considered private use. Example: The
+ subtag 'US' in the tag "en-x-US" is a private use subtag.
+
+ 4. A tag MAY consist entirely of private use subtags.
+
+ 5. No source is defined for private use subtags. Use of private use
+ subtags is by private agreement only.
+
+ 6. Private use subtags are NOT RECOMMENDED where alternatives exist
+ or for general interchange. See Section 4.6 for more information
+ on private use subtag choice.
+
+ For example, suppose a group of scholars is studying some texts in
+ medieval Greek. They might agree to use some collection of private
+ use subtags to identify different styles of writing in the texts.
+ For example, they might use 'el-x-koine' for documents in the
+ "common" style while using 'el-x-attic' for other documents that
+ mimic the Attic style. These subtags would not be recognized by
+ outside processes or systems, but might be useful in categorizing
+ various texts for study by those in the group.
+
+ In the registry, there are also subtags derived from codes reserved
+ by ISO 639, ISO 15924, or ISO 3166 for private use. Do not confuse
+ these with private use subtag sequences following the subtag 'x'.
+ See Section 4.6.
+
+2.2.8. Grandfathered and Redundant Registrations
+
+ Prior to RFC 4646, whole language tags were registered according to
+ the rules in RFC 1766 and/or RFC 3066. All of these registered tags
+ remain valid as language tags.
+
+ Many of these registered tags were made redundant by the advent of
+ either RFC 4646 or this document. A redundant tag is a grandfathered
+ registration whose individual subtags appear with the same semantic
+ meaning in the registry. For example, the tag "zh-Hant" (Traditional
+ Chinese) can now be composed from the subtags 'zh' (Chinese) and
+ 'Hant' (Han script traditional variant). These redundant tags are
+ maintained in the registry as records of type 'redundant', mostly as
+ a matter of historical curiosity.
+
+ The remainder of the previously registered tags are "grandfathered".
+ These tags are classified into two groups: 'regular' and 'irregular'.
+
+ Grandfathered tags that (appear to) match the 'langtag' production in
+ Figure 1 are considered 'regular' grandfathered tags. These tags
+ contain one or more subtags that either do not individually appear in
+ the registry or appear but with a different semantic meaning: each
+ tag, in its entirety, represents a language or collection of
+ languages.
+
+ Grandfathered tags that do not match the 'langtag' production in the
+ ABNF and would otherwise be invalid are considered 'irregular'
+ grandfathered tags. With the exception of "en-GB-oed", which is a
+ variant of "en-GB", each of them, in its entirety, represents a
+ language.
+
+ Many of the grandfathered tags have been superseded by the subsequent
+ addition of new subtags: each superseded record contains a
+ 'Preferred-Value' field that ought to be used to form language tags
+ representing that value. For example, the tag "art-lojban" is
+ superseded by the primary language subtag 'jbo'.
+
+2.2.9. Classes of Conformance
+
+ Implementations sometimes need to describe their capabilities with
+ regard to the rules and practices described in this document. Tags
+ can be checked or verified in a number of ways, but two particular
+ classes of tag conformance are formally defined here.
+
+ A tag is considered "well-formed" if it conforms to the ABNF
+ (Section 2.1). Language tags may be well-formed in terms of syntax
+ but not valid in terms of content. However, many operations
+ involving language tags work well without knowing anything about the
+ meaning or validity of the subtags.
+
+ A tag is considered "valid" if it satisfies these conditions:
+
+ o The tag is well-formed.
+
+ o Either the tag is in the list of grandfathered tags or all of its
+ primary language, extended language, script, region, and variant
+ subtags appear in the IANA Language Subtag Registry as of the
+ particular registry date.
+
+ o There are no duplicate variant subtags.
+
+ o There are no duplicate singleton (extension) subtags.
+
+ Note that a tag's validity depends on the date of the registry used
+ to validate the tag. A more recent copy of the registry might
+ contain a subtag that an older version does not.
+
+ A tag is considered valid for a given extension (Section 3.7) (as of
+ a particular version, revision, and date) if it meets the criteria
+ for "valid" above and also satisfies this condition:
+
+ Each subtag used in the extension part of the tag is valid
+ according to the extension.
+
+ Older specifications or language tag implementations sometimes
+ reference [RFC3066]. A wider array of tags was considered well-
+ formed under that document. Any tags that were valid for use under
+ RFC 3066 are both well-formed and valid under this document's syntax;
+ only invalid or illegal tags were well-formed under the earlier
+ definition but no longer are. The language tag syntax under RFC 3066
+ was:
+
+ obs-language-tag = primary-subtag *( "-" subtag )
+ primary-subtag   = 1*8ALPHA
+ subtag           = 1*8(ALPHA / DIGIT)
+
+ Figure 2: RFC 3066 Language Tag Syntax
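+
+ The syntax in Figure 2 translates directly into a regular
+ expression. The Python sketch below is an illustration with a
+ hypothetical function name. It tests only the older, looser RFC 3066
+ well-formedness and says nothing about well-formedness or validity
+ under this document.
+
```python
import re

# obs-language-tag = primary-subtag *( "-" subtag )
_RFC3066 = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_well_formed(tag: str) -> bool:
    """Return True if the tag matches the RFC 3066 syntax of Figure 2."""
    return _RFC3066.fullmatch(tag) is not None
```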
+
+ Subtags designated for private use as well as private use sequences
+ introduced by the 'x' subtag are available for cases in which no
+ assigned subtags are available and registration is not a suitable
+ option. For example, one might use a tag such as "no-QQ", where 'QQ'
+ is one of a range of private use ISO 3166-1 codes to indicate an
+ otherwise undefined region. Users MUST NOT assign language tags that
+ use subtags that do not appear in the registry other than in private
+ use sequences (such as the subtag 'personal' in the tag
+ "en-x-personal"). Besides being invalid, such tags risk collision
+ with a possible future assignment or registration.
+
+ Note well: although the 'Language-Tag' production appearing in this
+ document is functionally equivalent to the one in [RFC4646], it has
+ been changed to prevent certain errors in well-formedness arising
+ from the old 'grandfathered' production.
+
+3. Registry Format and Maintenance
+
+ The IANA Language Subtag Registry ("the registry") contains a
+ comprehensive list of all of the subtags valid in language tags.
+ This allows implementers a straightforward and reliable way to
+ validate language tags. The registry will be maintained so that,
+ except for extension subtags, it is possible to validate all of the
+ subtags that appear in a language tag under the provisions of this
+ document or its revisions or successors. In addition, the meaning of
+ the various subtags will be unambiguous and stable over time. (The
+ meaning of private use subtags, of course, is not defined by the
+ registry.)
+
+ This section defines the registry along with the maintenance and
+ update procedures associated with it, as well as a registry for
+ extensions to language tags (Section 3.7).
+
+3.1. Format of the IANA Language Subtag Registry
+
+ The IANA Language Subtag Registry is a machine-readable file in the
+ format described in this section, plus copies of the registration
+ forms approved in accordance with the process described in
+ Section 3.5.
+
+ The existing registration forms for grandfathered and redundant tags
+ taken from RFC 3066 have been maintained as part of the obsolete RFC
+ 3066 registry. The subtags added to the registry by either [RFC4645]
+ or [RFC5645] do not have separate registration forms (so no forms are
+ archived for these additions).
+
+3.1.1. File Format
+
+ The registry is a [Unicode] text file and consists of a series of
+ records in a format based on "record-jar" (described in
+ [record-jar]). Each record, in turn, consists of a series of fields
+ that describe the various subtags and tags. The actual registry file
+ is encoded using the UTF-8 [RFC3629] character encoding.
+
+ Each field can be considered a single, logical line of characters.
+ Each field contains a "field-name" and a "field-body". These are
+ separated by a "field-separator". The field-separator is a COLON
+ character (U+003A) plus any surrounding whitespace. Each field is
+ terminated by the newline sequence CRLF. The text in each field MUST
+ be in Unicode Normalization Form C (NFC).
+
+ A collection of fields forms a "record". Records are separated by
+ lines containing only the sequence "%%" (U+0025 U+0025).
+
+ Although fields are logically a single line of text, each line of
+ text in the file format is limited to 72 bytes in length. To
+ accommodate this, the field-body can be split into a multiple-line
+ representation; this is called "folding". Folding is done according
+ to customary conventions for line-wrapping. This is typically on
+ whitespace boundaries, but can occur between other characters when
+ the value does not include spaces, such as when a language does not
+ use whitespace between words. In any event, there MUST NOT be breaks
+ inside a multibyte UTF-8 sequence or in the middle of a combining
+ character sequence. For more information, see [UAX14].
+
+ Although the file format uses the Unicode character set and the file
+ itself is encoded using the UTF-8 encoding, fields are restricted to
+ the printable characters from the US-ASCII [ISO646] repertoire unless
+ otherwise indicated in the description of a specific field
+ (Section 3.1.2).
+
+ The format of the registry is described by the following ABNF
+ [RFC5234]. Character numbers (code points) are taken from Unicode,
+ and terminals in the ABNF productions are in terms of characters
+ rather than bytes.
+
+ registry   = record *("%%" CRLF record)
+ record     = 1*field
+ field      = ( field-name field-sep field-body CRLF )
+ field-name = (ALPHA / DIGIT) [*(ALPHA / DIGIT / "-") (ALPHA / DIGIT)]
+ field-sep  = *SP ":" *SP
+ field-body = *([[*SP CRLF] 1*SP] 1*CHARS)
+ CHARS      = (%x21-10FFFF)      ; Unicode code points
+
+ Figure 3: Registry Format ABNF
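+
+ A reader for the format in Figure 3 might look like the Python
+ sketch below. It is a minimal illustration, not a reference parser:
+ the function name is hypothetical, folded lines are rejoined with a
+ single space (an assumption, since folds can also fall between
+ non-space characters), and no field-level validation is attempted.
+ Each record is kept as a list of pairs because some field-names can
+ repeat within a record.
+
```python
def parse_registry(text):
    """Parse registry text into records, each a list of (name, body) pairs."""
    records, fields = [], []
    for line in text.replace("\r\n", "\n").split("\n"):
        stripped = line.strip()
        if not stripped:
            continue  # ignore empty lines, such as the one after the final CRLF
        if stripped == "%%":
            # Record separator: close the current record.
            records.append(fields)
            fields = []
        elif line[0] == " " and fields:
            # Folded line: continue the previous field-body.
            name, body = fields[-1]
            fields[-1] = (name, f"{body} {stripped}")
        else:
            # Split at the first colon; surrounding whitespace is part of
            # the field-separator and is dropped.
            name, _, body = line.partition(":")
            fields.append((name.strip(), body.strip()))
    if fields:
        records.append(fields)
    return records
```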
+
+ The sequence '..' (U+002E U+002E) in a field-body denotes a range of
+ values. Such a range represents all subtags of the same length that
+ are in alphabetic or numeric order within that range, including the
+ values explicitly mentioned. For example, 'a..c' denotes the values
+ 'a', 'b', and 'c', and '11..13' denotes the values '11', '12', and
+ '13'.
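+
+ The range notation can be expanded mechanically. The Python sketch
+ below is one possible reading of the rule, with a hypothetical
+ function name: numeric ranges are counted with their width kept, and
+ alphabetic ranges are treated as fixed-width base-26 values. Reusing
+ the letter casing of the first endpoint is an assumption of the
+ sketch.
+
```python
def expand_range(body):
    """Expand a field-body such as 'a..c' or '11..13' into individual values."""
    start, sep, end = body.partition("..")
    if not sep:
        return [body]  # not a range
    if len(start) != len(end) or not (start.isascii() and end.isascii()):
        raise ValueError("range endpoints must be ASCII and of equal length")
    width = len(start)
    if start.isdigit() and end.isdigit():
        return [f"{n:0{width}d}" for n in range(int(start), int(end) + 1)]
    if not (start.isalpha() and end.isalpha()):
        raise ValueError("range endpoints must be all letters or all digits")

    def to_number(value):
        # Read the letters as a base-26 number, ignoring case.
        number = 0
        for ch in value.lower():
            number = number * 26 + ord(ch) - ord("a")
        return number

    def to_letters(number):
        digits = []
        for _ in range(width):
            number, remainder = divmod(number, 26)
            digits.append(chr(ord("a") + remainder))
        # Reuse the casing of the start value, position by position.
        return "".join(
            ch.upper() if ref.isupper() else ch
            for ch, ref in zip(reversed(digits), start)
        )

    return [to_letters(n) for n in range(to_number(start), to_number(end) + 1)]
```
+
+ With this sketch, expand_range("a..c") returns ['a', 'b', 'c'] and
+ expand_range("11..13") returns ['11', '12', '13'].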
+
+ All fields whose field-body contains a date value use the "full-date"
+ format specified in [RFC3339]. For example, "2004-06-28" represents
+ June 28, 2004, in the Gregorian calendar.
+
+3.1.2. Record and Field Definitions
+
+ There are three types of records in the registry: "File-Date",
+ "Subtag", and "Tag".
+
+ The first record in the registry is always the "File-Date" record.
+ This record occurs only once in the file and contains a single field
+ whose field-name is "File-Date". The field-body of this record
+ contains a date (see Section 5.1), making it possible to easily
+ recognize different versions of the registry.
+
+ File-Date: 2004-06-28
+ %%
+
+ Figure 4: Example of the File-Date Record
+
+ Subsequent records contain multiple fields and represent information
+ about either subtags or tags. Both types of records have an
+ identical structure, except that "Subtag" records contain a field
+ with a field-name of "Subtag", while, unsurprisingly, "Tag" records
+ contain a field with a field-name of "Tag". Field-names MUST NOT
+ occur more than once per record, with the exception of the
+ 'Description', 'Comments', and 'Prefix' fields.
+
+ Each record MUST contain at least one of each of the following
+ fields:
+
+ o 'Type'
+
+ * Type's field-body MUST consist of one of the following strings:
+ "language", "extlang", "script", "region", "variant",
+ "grandfathered", and "redundant"; it denotes the type of tag or
+ subtag.
+
+ o Either 'Subtag' or 'Tag'
+
+ * Subtag's field-body contains the subtag being defined. This
+ field MUST appear in all records whose 'Type' has one of these
+ values: "language", "extlang", "script", "region", or
+ "variant".
+
+ * Tag's field-body contains a complete language tag. This field
+ MUST appear in all records whose 'Type' has one of these
+ values: "grandfathered" or "redundant". If the 'Type' is
+ "grandfathered", then the 'Tag' field-body will be one of the
+ tags listed in either the 'regular' or 'irregular' production
+ found in Section 2.1.
+
+ o 'Description'
+
+ * Description's field-body contains a non-normative description
+ of the subtag or tag.
+
+ o 'Added'
+
+ * Added's field-body contains the date the record was registered
+ or, in the case of grandfathered or redundant tags, the date
+ the corresponding tag was registered under the rules of
+ [RFC1766] or [RFC3066].
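+
+ The required-field rules above can be checked mechanically. The
+ Python sketch below is an illustration with hypothetical names. It
+ takes one record as a list of (field-name, field-body) pairs, is
+ meant for "Subtag" and "Tag" records rather than the "File-Date"
+ record, and does not check the content of any field-body other than
+ 'Type'.
+
```python
SUBTAG_TYPES = {"language", "extlang", "script", "region", "variant"}
TAG_TYPES = {"grandfathered", "redundant"}
REPEATABLE = {"Description", "Comments", "Prefix"}


def record_problems(fields):
    """Return a sorted list of problems for one record's (name, body) pairs."""
    problems = []
    names = [name for name, _ in fields]
    for name in set(names):
        if names.count(name) > 1 and name not in REPEATABLE:
            problems.append(f"{name} occurs more than once")
    values = dict(fields)  # for repeated names the last body wins
    record_type = values.get("Type")
    if record_type in SUBTAG_TYPES:
        identifier = "Subtag"
    elif record_type in TAG_TYPES:
        identifier = "Tag"
    else:
        identifier = None
        problems.append(f"unknown or missing Type: {record_type!r}")
    for name in (identifier, "Description", "Added"):
        if name is not None and name not in values:
            problems.append(f"missing {name}")
    return sorted(problems)
```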
+
+ Each record MAY also contain the following fields:
+
+ o 'Deprecated'
+
+ * Deprecated's field-body contains the date the record was
+ deprecated. In some cases, this value is earlier than that of
+ the 'Added' field in the same record. That is, the date of
+ deprecation preceded the addition of the record to the
+ registry.
+
+ o 'Preferred-Value'
+
+ * Preferred-Value's field-body contains a canonical mapping from
+ this record's value to a modern equivalent that is preferred in
+ its place. Depending on the value of the 'Type' field, this
+ value can take different forms:
+
+ + For records of type 'language', 'Preferred-Value' contains
+ the primary language subtag that is preferred when forming
+ the language tag.
+
+ + For records of type 'script', 'region', or 'variant',
+ 'Preferred-Value' contains the subtag of the same type that
+ is preferred for forming the language tag.
+
+ + For records of type 'extlang', 'grandfathered', or
+ 'redundant', 'Preferred-Value' contains an "extended
+ language range" [RFC4647] that is preferred for forming the
+ language tag. That is, the preferred language tag will
+ contain, in order, each of the subtags that appears in the
+ 'Preferred-Value'; additional fields can be included in a
+ language tag, as described elsewhere in this document. For
+ example, the replacement for the grandfathered tag "zh-min-
+ nan" (Min Nan Chinese) is "nan", which can be used as the
+ basis for tags such as "nan-Hant" or "nan-TW" (note that the
+ extended language subtag form such as "zh-nan-Hant" or "zh-
+ nan-TW" can also be used).
+
+ o 'Prefix'
+
+ * Prefix's field-body contains a valid language tag that is
+ RECOMMENDED as one possible prefix to this record's subtag.
+ This field MAY appear in records whose 'Type' field-body is
+ either 'extlang' or 'variant' (it MUST NOT appear in any other
+ record type).
+
+ o 'Suppress-Script'
+
+ * Suppress-Script's field-body contains a script subtag that
+ SHOULD NOT be used to form language tags with the associated
+ primary or extended language subtag. This field MUST appear
+ only in records whose 'Type' field-body is 'language' or
+ 'extlang'. See Section 4.1.
+
+ o 'Macrolanguage'
+
+ * Macrolanguage's field-body contains a primary language subtag
+ defined by ISO 639 as the "macrolanguage" that encompasses this
+ language subtag. This field MUST appear only in records whose
+ 'Type' field-body is either 'language' or 'extlang'.
+
+ o 'Scope'
+
+ * Scope's field-body contains information about a primary or
+ extended language subtag indicating the type of language code
+ according to ISO 639. The values permitted in this field are
+ "macrolanguage", "collection", "special", and "private-use".
+ This field only appears in records whose 'Type' field-body is
+ either 'language' or 'extlang'. When this field is omitted,
+ the language is an individual language.
+
+ o 'Comments'
+
+ * Comments's field-body contains additional information about the
+ subtag, as deemed appropriate for understanding the registry
+ and implementing language tags using the subtag or tag.
+
+ Future versions of this document might add additional fields to the
+ registry; implementations SHOULD ignore fields found in the registry
+ that are not defined in this document.
+
+3.1.3. Type Field
+
+ The field 'Type' contains the string identifying the record type in
+ which it appears. Values for the 'Type' field-body are: "language"
+ (Section 2.2.1); "extlang" (Section 2.2.2); "script" (Section 2.2.3);
+ "region" (Section 2.2.4); "variant" (Section 2.2.5); "grandfathered"
+ or "redundant" (Section 2.2.8).
+
+3.1.4. Subtag and Tag Fields
+
+ The field 'Subtag' contains the subtag defined in the record. The
+ field 'Tag' appears in records whose 'Type' is either 'grandfathered'
+ or 'redundant' and contains a tag registered under [RFC3066].
+
+ The 'Subtag' field-body MUST follow the casing conventions described
+ in Section 2.1.1. All subtags use lowercase letters in the field-
+ body, with two exceptions:
+
+ Subtags whose 'Type' field is 'script' (in other words, subtags
+ defined by ISO 15924) MUST use titlecase.
+
+ Subtags whose 'Type' field is 'region' (in other words, the non-
+ numeric region subtags defined by ISO 3166-1) MUST use all
+ uppercase.
+
+ The 'Tag' field-body MUST be formatted according to the rules
+ described in Section 2.1.1.
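+
+ The casing convention for the 'Subtag' field-body can be expressed
+ as a small Python sketch. The function name is hypothetical, and the
+ second argument is assumed to be the record's 'Type' field-body.
+
```python
def registry_case(subtag: str, record_type: str) -> str:
    """Return the subtag in the case used by the registry for its type."""
    if record_type == "script":
        return subtag.capitalize()  # titlecase, e.g. 'Latn'
    if record_type == "region":
        return subtag.upper()  # digits in numeric region subtags are unchanged
    return subtag.lower()
```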
+
+3.1.5. Description Field
+
+ The field 'Description' contains a description of the tag or subtag
+ in the record. The 'Description' field MAY appear more than once per
+ record. The 'Description' field MAY include the full range of
+ Unicode characters. At least one of the 'Description' fields MUST be
+ written or transcribed into the Latin script; additional
+ 'Description' fields MAY be in any script or language.
+
+ The 'Description' field is used for identification purposes.
+ Descriptions SHOULD contain all and only that information necessary
+ to distinguish one subtag from others with which it might be
+ confused. They are not intended to provide general background
+ information or to provide all possible alternate names or
+ designations. 'Description' fields don't necessarily represent the
+ actual native name of the item in the record, nor are any of the
+ descriptions guaranteed to be in any particular language (such as
+ English or French, for example).
+
+ Descriptions in the registry that correspond to ISO 639, ISO 15924,
+ ISO 3166-1, or UN M.49 codes are intended only to indicate the
+ meaning of that identifier as defined in the source standard at the
+ time it was added to the registry or as subsequently modified, within
+ the bounds of the stability rules (Section 3.4), via subsequent
+ registration. The 'Description' does not replace the content of the
+ source standard itself. 'Description' fields are not intended to be
+ the localized English names for the subtags. Localization or
+ translation of language tag and subtag descriptions is out of scope
+ of this document.
+
+ For subtags taken from a source standard (such as ISO 639 or ISO
+ 15924), the 'Description' fields in the record are also initially
+ taken from that source standard. Multiple descriptions in the source
+ standard are split into separate 'Description' fields. The source
+ standard's descriptions MAY be edited or modified, either prior to
+ insertion or via the registration process, and additional or
+ extraneous descriptions omitted or removed. Each 'Description' field
+ MUST be unique within the record in which it appears, and formatting
+ variations of the same description SHOULD NOT occur in that specific
+ record. For example, while the ISO 639-1 code 'fy' has both the
+ description "Western Frisian" and the description "Frisian, Western"
+ in that standard, only one of these descriptions appears in the
+ registry.
+
+ To help ensure that users do not become confused about which subtag
+ to use, 'Description' fields assigned to a record of any specific
+ type ('language', 'extlang', 'script', and so on) MUST be unique
+ within that given record type with the following exception: if a
+ particular 'Description' field occurs in multiple records of a given
+ type, then at most one of the records can omit the 'Deprecated'
+ field. All deprecated records that share a 'Description' MUST have
+ the same 'Preferred-Value', and the one non-deprecated record (if
+ any) MUST be the record named by that 'Preferred-Value'. This means
+ that two records of the same type that share a 'Description' are also
+ semantically equivalent and no more than one record with a given
+ 'Description' is preferred for that meaning.
+
+ For example, consider the 'language' subtags 'zza' (Zaza) and 'diq'
+ (Dimli). It so happens that 'zza' is a macrolanguage enclosing 'diq'
+ and thus also has a description in ISO 639-3 of "Dimli". This
+ description was edited to read "Dimli (macrolanguage)" in the
+ registry record for 'zza' to prevent a collision.
+
+ By contrast, the subtags 'he' and 'iw' share a 'Description' value of
+ "Hebrew"; this is permitted because 'iw' is deprecated and its
+ 'Preferred-Value' is 'he'.
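+
+ The shared-description rule can be sketched as a consistency check
+ over parsed records. The Python below is an illustration only: the
+ function name and the in-memory record shape (a dict holding a list
+ under 'Description') are assumptions of the sketch, and a deprecated
+ record without a 'Preferred-Value' is treated as disagreeing with
+ any non-deprecated record, which is one possible reading of the
+ rule.
+
```python
from collections import defaultdict


def description_conflicts(records):
    """Return messages for records that break the shared-Description rule."""
    groups = defaultdict(list)
    for record in records:
        for description in record["Description"]:
            groups[(record["Type"], description)].append(record)

    conflicts = []
    for (record_type, description), group in groups.items():
        current = [r for r in group if "Deprecated" not in r]
        preferred = {r.get("Preferred-Value") for r in group if "Deprecated" in r}
        label = f"{record_type} description {description!r}"
        if len(current) > 1:
            conflicts.append(f"{label}: more than one non-deprecated record")
        if len(preferred) > 1:
            conflicts.append(f"{label}: deprecated records disagree on Preferred-Value")
        for record in current:
            name = record.get("Subtag", record.get("Tag"))
            if preferred and preferred != {name}:
                conflicts.append(f"{label}: {name!r} is not the Preferred-Value")
    return conflicts
```
+
+ Applied to the examples above, the 'he' and 'iw' records produce no
+ conflict, while two non-deprecated 'language' records that both
+ carried the description "Dimli" would be reported.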
+
+ For records of type 'language', the first 'Description' field
+ appearing in the registry corresponds whenever possible to the
+ Reference Name assigned by ISO 639-3. This helps facilitate cross-
+ referencing between ISO 639 and the registry.
+
+ When creating or updating a record due to the action of one of the
+ source standards, the Language Subtag Reviewer MAY edit descriptions
+ to correct irregularities in formatting (such as misspellings,
+ inappropriate apostrophes or other punctuation, or excessive or
+ missing spaces) prior to submitting the proposed record to the
+ ietf-languages@iana.org list for consideration.
+
+3.1.6. Deprecated Field
+
+ The field 'Deprecated' contains the date the record was deprecated
+ and MAY be added, changed, or removed from any record via the
+ maintenance process described in Section 3.3 or via the registration
+ process described in Section 3.5. Usually, the addition of a
+ 'Deprecated' field is due to the action of one of the standards
+ bodies, such as ISO 3166, withdrawing a code. Although valid in
+ language tags, subtags and tags with a 'Deprecated' field are
+ deprecated, and validating processors SHOULD NOT generate these
+ subtags. Note that a record that contains a 'Deprecated' field and
+ no corresponding 'Preferred-Value' field has no replacement mapping.
+
+ In some historical cases, it might not have been possible to
+ reconstruct the original deprecation date. For these cases, an
+ approximate date appears in the registry. Some subtags and some
+ grandfathered or redundant tags were deprecated before the initial
+ creation of the registry. The exact rules for this appear in Section
+ 2 of [RFC4645]. Note that these records have a 'Deprecated' field
+ with an earlier date than the corresponding 'Added' field!
+
+3.1.7. Preferred-Value Field
+
+ The field 'Preferred-Value' contains a mapping between the record in
+ which it appears and another tag or subtag (depending on the record's
+ 'Type'). The value in this field is used for canonicalization (see
+ Section 4.5). In cases where the subtag or tag also has a
+ 'Deprecated' field, then the 'Preferred-Value' is RECOMMENDED as the
+ best choice to represent the value of this record when selecting a
+ language tag.
+
+ Records containing a 'Preferred-Value' fall into one of these four
+ groups:
+
+ 1. ISO 639 language codes that were later withdrawn in favor of
+ other codes. These values are mostly a historical curiosity.
+ The 'he'/'iw' pairing above is an example of this.
+
+ 2. Subtags (with types other than language or extlang) taken from
+ codes or values that have been withdrawn in favor of a new code.
+ In particular, this applies to region subtags taken from ISO
+ 3166-1, because sometimes a country will change its name or
+ administration in such a way that warrants a new region code. In
+ some cases, countries have reverted to an older name, which might
+ already be encoded. For example, the subtag 'ZR' (Zaire) was
+ replaced by the subtag 'CD' (Democratic Republic of the Congo)
+ when that country's name was changed.
+
+ 3. Tags or subtags that have become obsolete because the values they
+ represent were later encoded. Many of the grandfathered or
+ redundant tags were later encoded by ISO 639, for example, and
+ fall into this grouping. For example, "i-klingon" was deprecated
+ when the subtag 'tlh' was added. The record for "i-klingon" has
+ a 'Preferred-Value' of 'tlh'.
+
+ 4. Extended language subtags always have a mapping to their
+ identical primary language subtag. For example, the extended
+ language subtag 'yue' (Cantonese) can be used to form the tag
+ "zh-yue". It has a 'Preferred-Value' mapping to the primary
+ language subtag 'yue', meaning that a tag such as
+ "zh-yue-Hant-HK" can be canonicalized to "yue-Hant-HK".
+
+ Records other than those of type 'extlang' that contain a
+ 'Preferred-Value' field MUST also have a 'Deprecated' field. This
+ field contains the date on which the tag or subtag was deprecated in
+ favor of the preferred value.
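+
+ The pairing rule above can be checked mechanically. The following
+ is a minimal sketch, assuming each registry record has already been
+ parsed into a dictionary keyed by field name. The function name and
+ the sample records are hypothetical, and the date shown is a
+ placeholder rather than a real registry value.
+
```python
def preferred_value_problems(record: dict) -> list:
    """Return rule violations for one record (a field-name -> value dict)."""
    problems = []
    # Only 'extlang' records may carry Preferred-Value without Deprecated.
    if ("Preferred-Value" in record
            and record.get("Type") != "extlang"
            and "Deprecated" not in record):
        problems.append("Preferred-Value requires a Deprecated field")
    return problems


# Hypothetical records modeled on examples in this section.
iw = {"Type": "language", "Subtag": "iw",
      "Preferred-Value": "he", "Deprecated": "YYYY-MM-DD"}  # placeholder date
yue = {"Type": "extlang", "Subtag": "yue", "Prefix": "zh",
       "Preferred-Value": "yue"}
broken = {"Type": "region", "Subtag": "ZR", "Preferred-Value": "CD"}

print(preferred_value_problems(iw))
print(preferred_value_problems(yue))
print(preferred_value_problems(broken))
```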
+
+ For records of type 'extlang', the 'Preferred-Value' field appears
+ without a corresponding 'Deprecated' field. An implementation MAY
+ ignore these preferred value mappings, although if it ignores the
+ mapping, it SHOULD do so consistently. It SHOULD also treat the
+ 'Preferred-Value' as equivalent to the mapped item. For example, the
+ tags "zh-yue-Hant-HK" and "yue-Hant-HK" are semantically equivalent
+ and ought to be treated as if they were the same tag.
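+
+ As an illustration of the extlang mapping, the sketch below replaces
+ a leading language-extlang pair with the extlang's preferred primary
+ language subtag. It assumes a well-formed tag and a lookup table
+ built from the registry; the table here holds only the single entry
+ from the example above, and the names are hypothetical.
+
```python
# (prefix, extlang) -> preferred primary language subtag.
# Illustrative excerpt only; a real table would come from the registry.
EXTLANG_PREFERRED = {("zh", "yue"): "yue"}


def replace_extlang(tag: str) -> str:
    """Rewrite 'language-extlang-...' to 'preferred-...' when a mapping exists."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        key = (subtags[0].lower(), subtags[1].lower())
        if key in EXTLANG_PREFERRED:
            return "-".join([EXTLANG_PREFERRED[key]] + subtags[2:])
    return tag


print(replace_extlang("zh-yue-Hant-HK"))
print(replace_extlang("yue-Hant-HK"))
```
+
+ An implementation that chooses to ignore these mappings can still
+ compare tags for equivalence by applying this function to both
+ sides before comparing.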
+
+ Occasionally, the deprecated code is preferred in certain contexts.
+ For example, both "iw" and "he" can be used in the Java programming
+ language, but "he" is converted on input to "iw", which is thus the
+ canonical form in Java.
+
+ 'Preferred-Value' mappings in records of type 'region' sometimes do
+ not represent exactly the same meaning as the original value. There
+ are many reasons for a country code to be changed, and the effect
+ this has on the formation of language tags will depend on the nature
+ of the change in question. For example, the region subtag 'YD'
+ (Democratic Yemen) was deprecated in favor of the subtag 'YE' (Yemen)
+ when those two countries unified in 1990.
+
+ A 'Preferred-Value' MAY be added to, changed, or removed from records
+ according to the rules in Section 3.3. Addition, modification, or
+ removal of a 'Preferred-Value' field in a record does not imply that
+ content using the affected subtag needs to be retagged.
+
+ The 'Preferred-Value' fields in records of type "grandfathered" and
+ "redundant" each contain an "extended language range" [RFC4647] that
+ is strongly RECOMMENDED for use in place of the record's value. In
+ many cases, these mappings were created via deprecation of the tags
+ during the period before [RFC4646] was adopted. For example, the tag
+ "no-nyn" was deprecated in favor of the ISO 639-1-defined language
+ code 'nn'.
+
+ The 'Preferred-Value' field in subtag records of type "extlang" also
+ contains an "extended language range". This allows the subtag to be
+ deprecated in favor of either a single primary language subtag or a
+ new language-extlang sequence.
+
+ Usually, the addition, removal, or change of a 'Preferred-Value'
+ field for a subtag is done to reflect changes in one of the source
+ standards. For example, if an ISO 3166-1 region code is deprecated
+ in favor of another code, that SHOULD result in the addition of a
+ 'Preferred-Value' field.
+
+ Changes to one subtag can affect other subtags as well: when
+ proposing changes to the registry, the Language Subtag Reviewer MUST
+ review the registry for such effects and propose the necessary
+ changes using the process in Section 3.5, although anyone MAY request
+ such changes. For example:
+
+ Suppose that subtag 'XX' has a 'Preferred-Value' of 'YY'. If 'YY'
+ later changes to have a 'Preferred-Value' of 'ZZ', then the
+ 'Preferred-Value' for 'XX' MUST also change to be 'ZZ'.
+
+ Suppose that a registered language subtag 'dialect' represents a
+ language not yet available in any part of ISO 639. The later
+ addition of a corresponding language code in ISO 639 SHOULD result
+ in the addition of a 'Preferred-Value' for 'dialect'.
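+
+ The first example amounts to keeping the mapping free of chains.
+ The sketch below assumes the 'Preferred-Value' fields have been
+ collected into a dictionary; it follows each chain to its end so
+ that every entry points directly at its final value. The function
+ name is hypothetical, and the cycle guard is a defensive assumption
+ rather than something the registry is expected to need.
+
```python
def flatten_preferred(mapping: dict) -> dict:
    """Point every subtag directly at the end of its Preferred-Value chain."""
    flat = {}
    for subtag, target in mapping.items():
        seen = {subtag}
        # Follow the chain; stop if a subtag repeats (defensive cycle guard).
        while target in mapping and target not in seen:
            seen.add(target)
            target = mapping[target]
        flat[subtag] = target
    return flat


# 'XX', 'YY', and 'ZZ' are the placeholder subtags used in the text.
print(flatten_preferred({"XX": "YY", "YY": "ZZ"}))
```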
+
+3.1.8. Prefix Field
+
+ The field 'Prefix' contains a valid language tag that is RECOMMENDED
+ as one possible prefix to this record's subtag, perhaps with other
+ subtags. That is, when including an extended language or a variant
+ subtag that has at least one 'Prefix' in a language tag, the
+ resulting tag SHOULD match at least one of the subtag's 'Prefix'
+ fields using the "Extended Filtering" algorithm (see [RFC4647]), and
+ each of the subtags in that 'Prefix' SHOULD appear before the subtag
+ itself.
+
+ The 'Prefix' field MUST appear exactly once in a record of type
+ 'extlang'. The 'Prefix' field MAY appear multiple times (or not at
+ all) in records of type 'variant'. Additional fields of this type
+ MAY be added to a 'variant' record via the registration process,
+ provided the 'variant' record already has at least one 'Prefix'
+ field.
+
+ Each 'Prefix' field indicates a particular sequence of subtags that
+ form a meaningful tag with this subtag. For example, the extended
+ language subtag 'cmn' (Mandarin Chinese) only makes sense with its
+ prefix 'zh' (Chinese). Similarly, 'rozaj' (Resian, a dialect of
+ Slovenian) would be appropriate when used with its prefix 'sl'
+ (Slovenian), while tags such as "is-1994" are not appropriate (and
+ probably not meaningful). Although the 'Prefix' for 'rozaj' is "sl",
+ other subtags might appear between them. For example, the tag "sl-
+ IT-rozaj" (Slovenian, Italy, Resian) matches the 'Prefix' "sl".
+
+ The 'Prefix' also indicates when variant subtags make sense when used
+ together (many that otherwise share a 'Prefix' are mutually
+ exclusive) and what the relative ordering of variants is supposed to
+ be. For example, the variant '1994' (Standardized Resian
+ orthography) has several 'Prefix' fields in the registry ("sl-rozaj",
+ "sl-rozaj-biske", "sl-rozaj-njiva", "sl-rozaj-osojs", and
+ "sl-rozaj-solba"). This indicates not only that '1994' is appropriate
+ to use with each of these five Resian variant subtags ('rozaj',
+ 'biske', 'njiva', 'osojs', and 'solba'), but also that it SHOULD
+ appear following any of these variants in a tag. Thus, the language
+ tag ought to take the form "sl-rozaj-biske-1994", rather than
+ "sl-1994-rozaj-biske" or "sl-rozaj-1994-biske".
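+
+ The ordering and matching behavior described above can be sketched
+ as follows. This is a simplified stand-in for the "Extended
+ Filtering" algorithm of [RFC4647], not an implementation of it: it
+ assumes well-formed tags, ignores wildcards and singleton handling,
+ and only checks that the subtags of a 'Prefix' appear, in order,
+ before the variant. The function name is hypothetical.
+
```python
def matches_prefix(tag: str, variant: str, prefix: str) -> bool:
    """True if the prefix's subtags appear in order before the variant in tag."""
    subtags = [s.lower() for s in tag.split("-")]
    variant = variant.lower()
    if variant not in subtags:
        return False
    before = subtags[:subtags.index(variant)]
    wanted = prefix.lower().split("-")
    # The first subtag of the prefix must be the first subtag of the tag.
    if not before or before[0] != wanted[0]:
        return False
    pos = 1
    for subtag in wanted[1:]:
        try:
            # Other subtags may intervene, but order must be preserved.
            pos = before.index(subtag, pos) + 1
        except ValueError:
            return False
    return True


print(matches_prefix("sl-IT-rozaj", "rozaj", "sl"))
print(matches_prefix("sl-rozaj-biske-1994", "1994", "sl-rozaj-biske"))
print(matches_prefix("sl-1994-rozaj-biske", "1994", "sl-rozaj-biske"))
```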
+
+ If a record includes no 'Prefix' field, a 'Prefix' field MUST NOT be
+ added to the record at a later date. Otherwise, changes (additions,
+ deletions, or modifications) to the set of 'Prefix' fields MAY be
+ registered, as long as they strictly widen the range of language tags
+ that are recommended. For example, a 'Prefix' with the value
+ "be-Latn" (Belarusian, Latin script) could be replaced by the value
+ "be" (Belarusian) but not by the value "ru-Latn" (Russian, Latin script)
+ or the value "be-Latn-BY" (Belarusian, Latin script, Belarus), since
+ these latter either change or narrow the range of suggested tags.
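+
+ One way to read the widening rule is that a replacement 'Prefix'
+ has to be a proper leading portion of the value it replaces. The
+ sketch below encodes that reading; the function name is
+ hypothetical, and the comparison is done subtag by subtag rather
+ than on raw strings.
+
```python
def widens(old_prefix: str, new_prefix: str) -> bool:
    """True if new_prefix is a proper leading run of old_prefix's subtags."""
    old = old_prefix.lower().split("-")
    new = new_prefix.lower().split("-")
    return len(new) < len(old) and old[:len(new)] == new


print(widens("be-Latn", "be"))          # permitted replacement
print(widens("be-Latn", "ru-Latn"))     # changes the range
print(widens("be-Latn", "be-Latn-BY"))  # narrows the range
```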
+
+ The field-body of the 'Prefix' field MUST NOT conflict with any
+ 'Prefix' already registered for a given record. Such a conflict
+ would occur when no valid tag could be constructed that would contain
+ the prefix, such as when two subtags each have a 'Prefix' that
+ contains the other subtag. For example, suppose that the subtag
+ 'avariant' has the prefix "es-bvariant". Then the subtag 'bvariant'
+ cannot be assigned the prefix 'avariant', for that would require a
+ tag of the form "es-avariant-bvariant-avariant", which would not be
+ valid.
+
+3.1.9. Suppress-Script Field
+
+ The field 'Suppress-Script' contains a script subtag (whose record
+ appears in the registry). The field 'Suppress-Script' MUST appear
+ only in records whose 'Type' field-body is either 'language' or
+ 'extlang'. This field MUST NOT appear more than one time in a
+ record.
+
+ This field indicates a script used to write the overwhelming majority
+ of documents for the given language. The subtag for such a script
+ therefore adds no distinguishing information to a language tag and
+ thus SHOULD NOT be used for most documents in that language.
+ Omitting the script subtag indicated by this field helps ensure
+ greater compatibility between the language tags generated according
+ to the rules in this document and language tags and tag processors or
+ consumers based on RFC 3066. For example, virtually all Icelandic
+ documents are written in the Latin script, making the subtag 'Latn'
+ redundant in the tag "is-Latn".
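+
+ A tag generator might apply this field as sketched below. The
+ sketch assumes a well-formed tag in which the script subtag, if
+ present, immediately follows the primary language subtag (that is,
+ no extended language subtag is present). The table holds only the
+ Icelandic example from the text, and the names are hypothetical.
+
```python
# Primary language subtag -> Suppress-Script value.
# Illustrative excerpt only; a real table would come from the registry.
SUPPRESS_SCRIPT = {"is": "Latn"}


def drop_suppressed_script(tag: str) -> str:
    """Remove the script subtag when it equals the language's Suppress-Script."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
        if suppressed and subtags[1].lower() == suppressed.lower():
            del subtags[1]
    return "-".join(subtags)


print(drop_suppressed_script("is-Latn"))
print(drop_suppressed_script("is-Latn-IS"))
print(drop_suppressed_script("is-Cyrl"))
```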
+
+ Many language subtag records do not have a 'Suppress-Script' field.
+ The lack of a 'Suppress-Script' might indicate that the language is
+ customarily written in more than one script or that the language is
+ not customarily written at all. It might also mean that sufficient
+ information was not available when the record was created and thus
+ remains a candidate for future registration.
+
+3.1.10. Macrolanguage Field
+
+ The field 'Macrolanguage' contains a primary language subtag (whose
+ record appears in the registry). This field indicates a language
+ that encompasses this subtag's language according to assignments made
+ by ISO 639-3.
+
+ ISO 639-3 labels some languages in the registry as "macrolanguages".
+ ISO 639-3 defines the term "macrolanguage" to mean "clusters of
+ closely-related language varieties that [...] can be considered
+ distinct individual languages, yet in certain usage contexts a single
+ language identity for all is needed". These correspond to codes
+ registered in ISO 639-2 as individual languages that were found to
+ correspond to more than one language in ISO 639-3.
+
+ A language contained within a macrolanguage is called an "encompassed
+ language". The record for each encompassed language contains a
+ 'Macrolanguage' field in the registry; the macrolanguages themselves
+ are not specially marked. Note that some encompassed languages have
+ ISO 639-1 or ISO 639-2 codes.
+
+ The 'Macrolanguage' field can only occur in records of type
+ 'language' or 'extlang'. Only values assigned by ISO 639-3 will be
+ considered for inclusion. 'Macrolanguage' fields MAY be added or
+ removed via the normal registration process whenever ISO 639-3
+ defines new values or withdraws old values. 'Macrolanguage' fields
+ are informational and MAY be removed or changed if ISO 639-3 changes the
+ values. For more information on the use of this field and choosing
+ between macrolanguage and encompassed language subtags, see
+ Section 4.1.1.
+
+ For example, the language subtags 'nb' (Norwegian Bokmal) and 'nn'
+ (Norwegian Nynorsk) each have a 'Macrolanguage' field with a value of
+ 'no' (Norwegian). For more information, see Section 4.1.
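+
+ Because only the encompassed languages carry the field, finding the
+ members of a macrolanguage takes a reverse lookup. The sketch below
+ assumes the 'Macrolanguage' fields have been collected into a
+ dictionary; it contains only the two entries from the example
+ above, and the names are hypothetical.
+
```python
# Encompassed language subtag -> Macrolanguage field value.
# Illustrative excerpt only; a real table would come from the registry.
MACROLANGUAGE = {"nb": "no", "nn": "no"}


def encompassed_by(macro: str) -> list:
    """List the subtags whose Macrolanguage field names the given subtag."""
    macro = macro.lower()
    return sorted(s for s, m in MACROLANGUAGE.items() if m == macro)


print(encompassed_by("no"))
print(MACROLANGUAGE.get("no"))  # the macrolanguage itself has no such field
```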
+
+3.1.11. Scope Field
+
+ The field 'Scope' contains classification information about a primary
+ or extended language subtag derived from ISO 639. Most languages
+ have a scope of 'individual', which means that the language is not a
+ macrolanguage, collection, special code, or private use. That is, it
+ is what one would normally consider to be 'a language'. Any primary
+ or extended language subtag that has no 'Scope' field is an
+ individual language.
+
+ 'Scope' information can sometimes be helpful in selecting language
+ tags, since it indicates the purpose or "scope" of the code
+ assignment within ISO 639. The available values are:
+
+ o 'macrolanguage' - Indicates a macrolanguage as defined by ISO
+ 639-3 (see Section 3.1.10). A macrolanguage is a cluster of
+ closely related languages that are sometimes considered to be a
+ single language.
+
+ o 'collection' - Indicates a subtag that represents a collection of
+ languages, typically related by some type of historical,
+ geographical, or linguistic association. Unlike a macrolanguage,
+ a collection can contain languages that are only loosely related
+ and a collection cannot be used interchangeably with languages
+ that belong to it.
+
+ o 'special' - Indicates a special language code. These are subtags
+ used for identifying linguistic attributes not particularly
+ associated with a concrete language. These include codes for when
+ the language is undetermined or for non-linguistic content.
+
+ o 'private-use' - Indicates a code reserved for private use in the
+ underlying standard. Subtags with this scope can be used to
+ indicate a primary language for which no ISO 639 or registered
+ assignment exists.
+
+ The 'Scope' field MAY appear in records of type 'language' or
+ 'extlang'. Note that many of the prefixes for extended language
+ subtags will have a 'Scope' of 'macrolanguage' (although some will
+ not) and that many languages that have a 'Scope' of 'macrolanguage'
+ will have extended language subtags associated with them.
+
+ The 'Scope' field MAY be added, modified, or removed via the
+ registration process, provided the change mirrors changes made by ISO
+ 639 to the assignment's classification. Such a change is expected to
+ be rare.
+
+ For example, the primary language subtag 'zh' (Chinese) has a 'Scope'
+ of 'macrolanguage', while its encompassed language 'nan' (Min Nan
+ Chinese) has a 'Scope' of 'individual'. The special value 'und'
+ (Undetermined) has a 'Scope' of 'special'. The ISO 639-5 collection
+ 'gem' (Germanic languages) has a 'Scope' of 'collection'.
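+
+ Since a missing 'Scope' field means an individual language, a
+ lookup can supply that default. The sketch below assumes the
+ argument is a primary or extended language subtag and uses only the
+ values from the example above; the names are hypothetical.
+
```python
# Subtag -> Scope field value, for records that carry the field.
# Illustrative excerpt only; a real table would come from the registry.
SCOPE = {"zh": "macrolanguage", "und": "special", "gem": "collection"}


def scope_of(language_subtag: str) -> str:
    """Return the Scope value, treating a missing field as 'individual'."""
    return SCOPE.get(language_subtag.lower(), "individual")


for subtag in ("zh", "nan", "und", "gem"):
    print(subtag, scope_of(subtag))
```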
+
+3.1.12. Comments Field
+
+ The field 'Comments' contains additional information about the record
+ and MAY appear more than once per record. The field-body MAY include
+ the full range of Unicode characters and is not restricted to any
+ particular script. This field MAY be inserted or changed via the
+ registration process, and no guarantee of stability is provided.
+
+ The content of this field is not restricted, except by the need to
+ register the information, the suitability of the request, and by
+ reasonable practical size limitations. The primary reason for the
+ 'Comments' field is subtag identification -- to help distinguish the
+ subtag from others with which it might be confused as an aid to
+ usage. Large amounts of information about the use, history, or
+ general background of a subtag are frowned upon, as these generally
+ belong in a registration request rather than in the registry.
+
+3.2. Language Subtag Reviewer
+
+ The Language Subtag Reviewer moderates the ietf-languages@iana.org
+ mailing list, responds to requests for registration, and performs the
+ other registry maintenance duties described in Section 3.3. Only the
+ Language Subtag Reviewer is permitted to request IANA to change,
+ update, or add records to the Language Subtag Registry. The Language
+ Subtag Reviewer MAY delegate list moderation and other clerical
+ duties as needed.
+
+ The Language Subtag Reviewer is appointed by the IESG for an
+ indefinite term, subject to removal or replacement at the IESG's
+ discretion. The IESG will solicit nominees for the position (upon
+ adoption of this document or upon a vacancy) and then solicit
+ feedback on the nominees' qualifications. Qualified candidates
+ should be familiar with BCP 47 and its requirements; be willing to
+ fairly, responsively, and judiciously administer the registration
+ process; and be suitably informed about the issues of language
+ identification so that the reviewer can assess the claims and draw
+ upon the contributions of language experts and subtag requesters.
+
+ The subsequent performance or decisions of the Language Subtag
+ Reviewer MAY be appealed to the IESG under the same rules as other
+ IETF decisions (see [RFC2026]). The IESG can reverse or overturn the
+ decisions of the Language Subtag Reviewer, provide guidance, or take
+ other appropriate actions.
+
+3.3. Maintenance of the Registry
+
+ Maintenance of the registry requires that, as codes are assigned or
+ withdrawn by ISO 639, ISO 15924, ISO 3166, and UN M.49, the Language
+ Subtag Reviewer MUST evaluate each change and determine the
+ appropriate course of action according to the rules in this document.
+ Such updates follow the registration process described in
+ Section 3.5. Usually, the Language Subtag Reviewer will start the
+ process for the new or updated record by filling in the registration
+ form and submitting it. If a change to one of these standards takes
+ place and the Language Subtag Reviewer does not do this in a timely
+ manner, then any interested party MAY submit the form. Thereafter,
+ the registration process continues normally.
+
+ Note that some registrations affect other subtags--perhaps more than
+ one--as when a region subtag is being deprecated in favor of a new
+ value. The Language Subtag Reviewer is responsible for ensuring that
+ any such changes are properly registered, with each change requiring
+ its own registration form.
+
+ The Language Subtag Reviewer MUST ensure that new subtags meet the
+ requirements elsewhere in this document (and most especially in
+ Section 3.4) or submit an appropriate registration form for an
+ alternate subtag as described in that section. Each individual
+ subtag affected by a change MUST be sent to the
+ ietf-languages@iana.org list with its own registration form and in a
+ separate message.
+
+3.4. Stability of IANA Registry Entries
+
+ The stability of entries and their meaning in the registry is
+ critical to the long-term stability of language tags. The rules in
+ this section guarantee that a specific language tag's meaning is
+ stable over time and will not change.
+
+ These rules specifically deal with how changes to codes (including
+ withdrawal and deprecation of codes) maintained by ISO 639, ISO
+ 15924, ISO 3166, and UN M.49 are reflected in the IANA Language
+ Subtag Registry. Assignments to the IANA Language Subtag Registry
+ MUST follow the following stability rules:
+
+ 1. Values in the fields 'Type', 'Subtag', 'Tag', and 'Added' MUST
+ NOT be changed and are guaranteed to be stable over time.
+
+ 2. Values in the fields 'Preferred-Value' and 'Deprecated' MAY be
+ added, altered, or removed via the registration process. These
+ changes SHOULD be limited to changes necessary to mirror changes
+ in one of the underlying standards (ISO 639, ISO 15924, ISO
+ 3166-1, or UN M.49) and typically alteration or removal of a
+ 'Preferred-Value' is limited specifically to region codes.
+
+ 3. Values in the 'Description' field MUST NOT be changed in a way
+ that would invalidate any existing tags. The description MAY be
+ broadened somewhat in scope, changed to add information, or
+ adapted to the most common modern usage. For example, countries
+ occasionally change their names; a historical example of this is
+ "Upper Volta" changing to "Burkina Faso".
+
+ 4. Values in the field 'Prefix' MAY be added to existing records of
+ type 'variant' via the registration process, provided the
+ 'variant' already has at least one 'Prefix'. A 'Prefix' field
+ SHALL NOT be registered for any 'variant' that has no existing
+ 'Prefix' field. If a prefix is added to a variant record,
+ 'Comments' fields MAY be used to explain different usages with
+ the various prefixes.
+
+ 5. Values in the field 'Prefix' in records of type 'variant' MAY
+ also be modified, so long as the modifications broaden the set
+ of prefixes. That is, a prefix MAY be replaced by one of its
+ own prefixes. For example, the prefix "en-US" could be replaced
+ by "en", but not by the prefixes "en-Latn", "fr", or
+ "en-US-boont". If one of those prefix values were needed, it
+ would have to be separately registered.
+
+ 6. Values in the field 'Prefix' in records of type 'extlang' MUST
+ NOT be added, modified, or removed.
+
+ 7. The field 'Prefix' MUST NOT be removed from any record in which
+ it appears. This field SHOULD be included in the initial
+ registration of any records of type 'variant' and MUST be
+ included in any records of type 'extlang'.
+
+ 8. The field 'Comments' MAY be added, changed, modified, or removed
+ via the registration process or any of the processes or
+ considerations described in this section.
+
+ 9. The field 'Suppress-Script' MAY be added or removed via the
+ registration process.
+
+ 10. The field 'Macrolanguage' MAY be added or removed via the
+ registration process, but only in response to changes made by
+ ISO 639. The 'Macrolanguage' field appears whenever a language
+ has a corresponding macrolanguage in ISO 639. That is, the
+ 'Macrolanguage' fields in the registry exactly match those of
+ ISO 639. No other macrolanguage mappings will be considered for
+ registration.
+
+ 11. The field 'Scope' MAY be added or removed from a primary or
+ extended language subtag after initial registration, and it MAY
+ be modified in order to match any changes made by ISO 639.
+ Changes to the 'Scope' field MUST mirror changes made by ISO
+ 639. Note that primary or extended language subtags whose
+ records do not contain a 'Scope' field (that is, most of them)
+ are individual languages as described in Section 3.1.11.
+
+ 12. Primary and extended language subtags (other than independently
+ registered values created using the registration process) are
+ created according to the assignments of the various parts of ISO
+ 639, as follows:
+
+ A. Codes assigned by ISO 639-1 that do not conflict with
+ existing two-letter primary language subtags and that have
+ no corresponding three-letter primary defined in the
+ registry are entered into the IANA registry as new records
+ of type 'language'. Note that languages given an ISO 639-1
+ code cannot be given extended language subtags, even if
+ encompassed by a macrolanguage.
+
+ B. Codes assigned by ISO 639-3 or ISO 639-5 that do not
+ conflict with existing three-letter primary language subtags
+ and that do not have ISO 639-1 codes assigned (or expected
+ to be assigned) are entered into the IANA registry as new
+ records of type 'language'. Note that these two standards
+ now comprise a superset of ISO 639-2 codes. Codes that have
+ a defined 'macrolanguage' mapping at the time of their
+ registration MUST contain a 'Macrolanguage' field.
+
+ C. Codes assigned by ISO 639-3 MAY also be considered for an
+ extended language subtag registration. Note that they MUST
+ be assigned a primary language subtag record of type
+ 'language' even when an 'extlang' record is proposed. When
+ considering extended language subtag assignment, these
+ criteria apply:
+
+ 1. If a language has a macrolanguage mapping, and that
+ macrolanguage has other encompassed languages that are
+ assigned extended language subtags, then the new
+ language SHOULD have an 'extlang' record assigned to it
+ as well. For example, any language with a macrolanguage
+ of 'zh' or 'ar' would be assigned an 'extlang' record.
+
+ 2. 'Extlang' records SHOULD NOT be created for languages if
+ other languages encompassed by the macrolanguage do not
+ also include 'extlang' records. For example, if a new
+ Serbo-Croatian ('sh') language were registered, it would
+ not get an extlang record because other languages
+ encompassed, such as Serbian ('sr'), do not include one
+ in the registry.
+
+ 3. Sign languages SHOULD have an 'extlang' record with a
+ 'Prefix' of 'sgn'.
+
+ 4. 'Extlang' records MUST NOT be created for items already
+ in the registry. Extended language subtags will only be
+ considered at the time of initial registration.
+
+ 5. Extended language subtag records MUST include the fields
+ 'Prefix' and 'Preferred-Value' with field values
+ assigned as described in Section 2.2.2.
+
+ D. Any other codes assigned by ISO 639-2 that do not conflict
+ with existing three-letter primary or extended language
+ subtags and that do not have ISO 639-1 two-letter codes
+ assigned are entered into the IANA registry as new records
+ of type 'language'. This type of registration is not
+ supposed to occur in the future.
+
+ 13. Codes assigned by ISO 15924 and ISO 3166-1 that do not conflict
+ with existing subtags of the associated type and whose meaning
+ is not the same as an existing subtag of the same type are
+ entered into the IANA registry as new records.
+
+ 14. Codes assigned by ISO 639, ISO 15924, or ISO 3166-1 that are
+ withdrawn by their respective maintenance or registration
+ authority remain valid in language tags. A 'Deprecated' field
+ containing the date of withdrawal MUST be added to the record.
+ If a new record of the same type is added that represents a
+ replacement value, then a 'Preferred-Value' field MAY also be
+ added. The registration process MAY be used to add comments
+ about the withdrawal of the code by the respective standard.
+
+ For example: the region code 'TL' was assigned to the country
+ 'Timor-Leste', replacing the code 'TP' (which was assigned to
+ 'East Timor' when it was under administration by Portugal).
+ The subtag 'TP' remains valid in language tags, but its
+ record contains the 'Preferred-Value' of 'TL' and its field
+ 'Deprecated' contains the date the new code was assigned
+ ('2004-07-06').
+
+ 15. Codes assigned by ISO 639, ISO 15924, or ISO 3166-1 that
+ conflict with existing subtags of the associated type, including
+ subtags that are deprecated, MUST NOT be entered into the
+ registry. The following additional considerations apply to
+ subtag values that are reassigned:
+
+ A. For ISO 639 codes, if the newly assigned code's meaning is
+ not represented by a subtag in the IANA registry, the
+ Language Subtag Reviewer, as described in Section 3.5, SHALL
+ prepare a proposal for entering in the IANA registry, as
+ soon as practical, a registered language subtag as an
+ alternate value for the new code. The form of the
+ registered language subtag will be at the discretion of the
+ Language Subtag Reviewer and MUST conform to other
+ restrictions on language subtags in this document.
+
+ B. For all subtags whose meaning is derived from an external
+ standard (that is, by ISO 639, ISO 15924, ISO 3166-1, or UN
+ M.49), if a new meaning is assigned to an existing code and
+ the new meaning broadens the meaning of that code, then the
+ meaning for the associated subtag MAY be changed to match.
+ The meaning of a subtag MUST NOT be narrowed, however, as
+ this can result in an unknown proportion of the existing
+ uses of a subtag becoming invalid. Note: the ISO 639
+ registration authority (RA) has adopted a similar stability
+ policy.
+
+ C. For ISO 15924 codes, if the newly assigned code's meaning is
+ not represented by a subtag in the IANA registry, the
+ Language Subtag Reviewer, as described in Section 3.5, SHALL
+ prepare a proposal for entering in the IANA registry, as
+ soon as practical, a registered variant subtag as an
+ alternate value for the new code. The form of the
+ registered variant subtag will be at the discretion of the
+ Language Subtag Reviewer and MUST conform to other
+ restrictions on variant subtags in this document.
+
+ D. For ISO 3166-1 codes, if the newly assigned code's meaning
+ is associated with the same UN M.49 code as another 'region'
+ subtag, then the existing region subtag remains as the
+ preferred value for that region and no new entry is created.
+ A comment MAY be added to the existing region subtag
+ indicating the relationship to the new ISO 3166-1 code.
+
+ E. For ISO 3166-1 codes, if the newly assigned code's meaning
+ is associated with a UN M.49 code that is not represented by
+ an existing region subtag, then the Language Subtag
+ Reviewer, as described in Section 3.5, SHALL prepare a
+ proposal for entering the appropriate UN M.49 country code
+ as an entry in the IANA registry.
+
+ F. For ISO 3166-1 codes, if there is no associated UN numeric
+ code, then the Language Subtag Reviewer SHALL petition the
+ UN to create one. If there is no response from the UN
+ within 90 days of the request being sent, the Language
+ Subtag Reviewer SHALL prepare a proposal for entering in the
+ IANA registry, as soon as practical, a registered variant
+ subtag as an alternate value for the new code. The form of
+ the registered variant subtag will be at the discretion of
+ the Language Subtag Reviewer and MUST conform to other
+ restrictions on variant subtags in this document. This
+ situation is very unlikely to ever occur.
+
+ 16. UN M.49 has codes for both "countries and areas" (such as '276'
+ for Germany) and "geographical regions and sub-regions" (such as
+ '150' for Europe). UN M.49 country or area codes for which
+ there is no corresponding ISO 3166-1 code MUST NOT be
+ registered, except as a surrogate for an ISO 3166-1 code that is
+ blocked from registration by an existing subtag.
+
+ If such a code becomes necessary, then the maintenance agency
+ for ISO 3166-1 SHALL first be petitioned to assign a code to the
+ region. If the petition for a code assignment by ISO 3166-1 is
+ refused or not acted on in a timely manner, the registration
+ process described in Section 3.5 can then be used to register
+ the corresponding UN M.49 code. This way, UN M.49 codes remain
+ available as the value of last resort in cases where ISO 3166-1
+ reassigns a deprecated value in the registry.
+
+ 17. The redundant and grandfathered entries together form the
+ complete list of tags registered under [RFC3066]. The redundant
+ tags are those previously registered tags that can now be formed
+ using the subtags defined in the registry. The grandfathered
+ entries include those that can never be legal because they are
+ 'irregular' (that is, they do not match the 'langtag' production
+ in Figure 1), are limited by rule (subtags such as 'nyn' and
+ 'min' look like the extlang production, but cannot be registered
+ as extended language subtags), or their subtags are
+ inappropriate for registration. All of the grandfathered tags
+ are listed in either the 'regular' or the 'irregular'
+ productions in the ABNF. Under [RFC4646] it was possible for
+ grandfathered tags to become redundant. However, all of the
+ tags for which this was possible became redundant before this
+ document was produced. So the set of redundant and
+ grandfathered tags is now permanent and immutable: new entries
+ of either type MUST NOT be added and existing entries MUST NOT
+ be removed. The decision-making process about which tags were
+ initially grandfathered and which were made redundant is
+ described in [RFC4645].
+
+ Many of the grandfathered tags are deprecated -- indeed, they
+ were deprecated even before [RFC4646]. For example, the tag
+ "art-lojban" was deprecated in favor of the primary language
+ subtag 'jbo'. These tags could have been made 'redundant' by
+ registering some of their subtags as 'variants'. The 'variant-
+ like' subtags in the grandfathered registrations SHALL NOT be
+ registered in the future, even with a similar or identical
+ meaning.
+
+3.5. Registration Procedure for Subtags
+
+ The procedure given here MUST be used by anyone who wants to use a
+ subtag not currently in the IANA Language Subtag Registry or who
+ wishes to add, modify, update, or remove information in existing
+ records as permitted by this document.
+
+ Only subtags of type 'language' and 'variant' will be considered for
+ independent registration of new subtags. Subtags needed for
+ stability and subtags necessary to keep the registry synchronized
+ with ISO 639, ISO 15924, ISO 3166, and UN M.49 within the limits
+ defined by this document also use this process, as described in
+ Section 3.3 and subject to stability provisions as described in
+ Section 3.4.
+
+ Registration requests are accepted relating to information in the
+ 'Comments', 'Deprecated', 'Description', 'Prefix', 'Preferred-Value',
+ 'Macrolanguage', or 'Suppress-Script' fields in a subtag's record as
+ described in Section 3.4. Changes to all other fields in the IANA
+ registry are NOT permitted.
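+
+   As a non-normative illustration, the field restriction above can be
+   sketched in Python.  The field names are quoted from the paragraph
+   above; the constant and function names are hypothetical and are not
+   defined by this document or by IANA.
+

```python
# Fields that a registration request may touch, as listed in the text above.
MODIFIABLE_FIELDS = frozenset({
    "Comments", "Deprecated", "Description", "Prefix",
    "Preferred-Value", "Macrolanguage", "Suppress-Script",
})


def rejected_fields(requested_changes):
    """Return, sorted, the requested field names that may not be changed.

    Hypothetical helper: `requested_changes` is any iterable of field
    names taken from a modification request for an existing record.
    """
    return sorted(set(requested_changes) - MODIFIABLE_FIELDS)


# A request that tries to alter 'Type' and 'Subtag' is flagged.
print(rejected_fields(["Description", "Type", "Subtag"]))
```

+
+   An empty result means only that the request stays within the
+   permitted fields; the stability rules in Section 3.4 still apply.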
+
+ Registering a new subtag or requesting modifications to an existing
+ tag or subtag starts with the requester filling out the registration
+ form reproduced below. Note that each response is not limited in
+ size so that the request can adequately describe the registration.
+ The fields in the "Record Requested" section need to follow the
+ requirements in Section 3.1 before the record will be approved.
+
+ LANGUAGE SUBTAG REGISTRATION FORM
+ 1. Name of requester:
+ 2. E-mail address of requester:
+ 3. Record Requested:
+
+ Type:
+ Subtag:
+ Description:
+ Prefix:
+ Preferred-Value:
+ Deprecated:
+ Suppress-Script:
+ Macrolanguage:
+ Comments:
+
+ 4. Intended meaning of the subtag:
+ 5. Reference to published description
+ of the language (book or article):
+ 6. Any other relevant information:
+
+ Figure 5: The Language Subtag Registration Form
+
+ Examples of completed registration forms can be found in Appendix B.
+ A complete list of approved registration forms is online through
+ http://www.iana.org; readers should note that the Language Tag
+ Registry is now obsolete and should instead look for the Language
+ Subtag Registry.
+
+ The subtag registration form MUST be sent to
+ <ietf-languages@iana.org>. Registration requests receive a two-week
+ review period before being approved and submitted to IANA for
+ inclusion in the registry. If modifications are made to the request
+ during the course of the registration process (such as corrections to
+ meet the requirements in Section 3.1 or to make the 'Description'
+ fields unique for the given record type), the modified form MUST also
+ be sent to <ietf-languages@iana.org> at least one week prior to
+ submission to IANA.
+
+ The ietf-languages list is an open list and can be joined by sending
+ a request to <ietf-languages-request@iana.org>. The list can be
+ hosted by IANA or any third party at the request of IESG.
+
+ Before forwarding any registration to IANA, the Language Subtag
+ Reviewer MUST ensure that all requirements in this document are met.
+ This includes ensuring that values in the 'Subtag' field match case
+ according to the description in Section 3.1.4 and that 'Description'
+ fields are unique for the given record type as described in
+ Section 3.1.5. The Reviewer MUST also ensure that an appropriate
+ File-Date record is included in the request, to assist IANA when
+ updating the registry (see Section 5.1).
+
+   Some fields in both the registration form and the registry record
+   itself permit the use of non-ASCII characters.  Registration
+ requests SHOULD use the UTF-8 encoding for consistency and clarity.
+ However, since some mail clients do not support this encoding, other
+ encodings MAY be used for the registration request. The Language
+ Subtag Reviewer is responsible for ensuring that the proper Unicode
+ characters appear in both the archived request form and the registry
+ record. In the case of a transcription or encoding error by IANA,
+ the Language Subtag Reviewer will request that the registry be
+ repaired, providing any necessary information to assist IANA.
+
+ Extended language subtags (type 'extlang'), by definition, are always
+ encompassed by another language. All records of type 'extlang' MUST,
+ therefore, contain a 'Prefix' field at the time of registration.
+ This 'Prefix' field can never be altered or removed, and requests to
+ do so MUST be rejected.
+
+ Variant subtags are usually registered for use with a particular
+ range of language tags, and variant subtags based on the terminology
+   of the language to which they apply are encouraged.  For example,
+ the subtag 'rozaj' (Resian) is intended for use with language tags
+ that start with the primary language subtag "sl" (Slovenian), since
+ Resian is a dialect of Slovenian. Thus, the subtag 'rozaj' would be
+ appropriate in tags such as "sl-Latn-rozaj" or "sl-IT-rozaj". This
+ information is stored in the 'Prefix' field in the registry. Variant
+ registration requests SHOULD include at least one 'Prefix' field in
+ the registration form.
+
+ Requests to assign an additional record of a given type with an
+ existing subtag value MUST be rejected. For example, the variant
+ subtag 'rozaj' already exists in the registry, so adding a second
+ record of type 'variant' with the subtag 'rozaj' is prohibited.
+
+ The 'Prefix' field for a given registered variant subtag exists in
+ the IANA registry as a guide to usage. Additional 'Prefix' fields
+ MAY be added by filing an additional registration form. In that
+ form, the "Any other relevant information:" field MUST indicate that
+ it is the addition of a prefix.
+
+ Requests to add a 'Prefix' field to a variant subtag that imply a
+ different semantic meaning SHOULD be rejected. For example, a
+ request to add the prefix "de" to the subtag '1994' so that the tag
+ "de-1994" represented some German dialect or orthographic form would
+ be rejected. The '1994' subtag represents a particular Slovenian
+ orthography, and the additional registration would change or blur the
+ semantic meaning assigned to the subtag. A separate subtag SHOULD be
+ proposed instead.
+
+ Requests to add a 'Prefix' to a variant subtag that has no current
+ 'Prefix' field MUST be rejected. Variants are registered with no
+ prefix because they are potentially useful with many or even all
+ languages. Adding one or more 'Prefix' fields would be potentially
+ harmful to the use of the variant, since it dramatically reduces the
+   scope of the subtag (which is not allowed under the stability rules
+   in Section 3.4), as opposed to broadening the scope of the subtag,
+   which is what the addition of a 'Prefix' normally does.  An example
+   of such a "no-prefix" variant is the subtag 'fonipa', which
+   represents the International Phonetic Alphabet, a scheme that can be
+   used to transcribe many languages.
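+
+   The mechanical part of the 'Prefix' rules for variants can be
+   sketched as follows (Python, non-normative).  The function name and
+   its return strings are hypothetical.  Whether a new prefix implies a
+   different semantic meaning is a judgment left to the review process
+   and is deliberately not modeled here.
+

```python
def screen_prefix_addition(current_prefixes, new_prefix):
    """Screen a request to add a 'Prefix' field to a variant record.

    Hypothetical helper returning one of:
      "reject" - the variant has no 'Prefix' field, so none may be added
      "no-op"  - the prefix is already present (compared case-insensitively)
      "review" - the request goes on to the normal review process
    """
    existing = {prefix.lower() for prefix in current_prefixes}
    if not existing:
        # Variants registered without a prefix must stay prefix-free.
        return "reject"
    if new_prefix.lower() in existing:
        return "no-op"
    return "review"


# 'fonipa' is registered with no prefix; 'rozaj' carries the prefix "sl".
print(screen_prefix_addition([], "en"))      # fonipa-like record
print(screen_prefix_addition(["sl"], "de"))  # rozaj-like record
```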
+
+ The 'Description' fields provided in the request MUST contain at
+ least one description written or transcribed into the Latin script;
+ the request MAY also include additional 'Description' fields in any
+ script or language. The 'Description' field is used for
+ identification purposes and doesn't necessarily represent the actual
+ native name of the language or variation. It also doesn't have to be
+ in any particular language, but SHOULD be both suitable and
+ sufficient to identify the item in the record. The Language Subtag
+ Reviewer will check and edit any proposed 'Description' fields so as
+ to ensure uniqueness and prevent collisions with 'Description' fields
+ in other records of the same type. If this occurs in an independent
+ registration request, the Language Subtag Reviewer MUST resubmit the
+ record to <ietf-languages@iana.org>, treating it as a modification of
+ a request due to discussion, as described in Section 3.5, unless the
+ request's sole purpose is to introduce a duplicate 'Description'
+ field, in which case the request SHALL be rejected.
+
+ The 'Description' field is not guaranteed to be stable. Corrections
+ or clarifications of intent are examples of possible changes.
+ Attempts to provide translations or transcriptions of entries in the
+ registry (which, by definition, provide no new information) are
+ unlikely to be approved.
+
+ Soon after the two-week review period has passed, the Language Subtag
+ Reviewer MUST take one of the following actions:
+
+ o Explicitly accept the request and forward the form containing the
+ record to be inserted or modified to <iana@iana.org> according to
+ the procedure described in Section 3.3.
+
+ o Explicitly reject the request because of significant objections
+ raised on the list or due to problems with constraints in this
+ document (which MUST be explicitly cited).
+
+ o Extend the review period by granting an additional two-week
+ increment to permit further discussion. After each two-week
+ increment, the Language Subtag Reviewer MUST indicate on the list
+ whether the registration has been accepted, rejected, or extended.
+
+ Note that the Language Subtag Reviewer MAY raise objections on the
+ list if he or she so desires. The important thing is that the
+ objection MUST be made publicly.
+
+ Sometimes the request needs to be modified as a result of discussion
+ during the review period or due to requirements in this document.
+ The applicant, Language Subtag Reviewer, or others MAY submit a
+ modified version of the completed registration form, which will be
+ considered in lieu of the original request with the explicit approval
+ of the applicant. Such changes do not restart the two-week
+ discussion period, although an application containing the final
+ record submitted to IANA MUST appear on the list at least one week
+ prior to the Language Subtag Reviewer forwarding the record to IANA.
+ The applicant MAY modify a rejected application with more appropriate
+ or additional information and submit it again; this starts a new two-
+ week comment period.
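+
+   The timing constraints above can be illustrated with a small date
+   calculation (Python, non-normative).  It assumes that "two-week" and
+   "one week" mean 14 and 7 calendar days and that each extension adds
+   exactly one two-week increment; the function and parameter names are
+   hypothetical.  The result is only a lower bound, since the Language
+   Subtag Reviewer acts "soon after" the review period ends.
+

```python
from datetime import date, timedelta

REVIEW_PERIOD = timedelta(weeks=2)        # initial review, and each extension
FINAL_RECORD_NOTICE = timedelta(weeks=1)  # final record must be on the list this long


def earliest_forwarding_date(submitted, final_record_posted, extensions=0):
    """Earliest date on which a record could be forwarded to IANA.

    submitted           - date the request first reached the list
    final_record_posted - date the final version of the record reached the list
    extensions          - number of additional two-week increments granted
    """
    review_ends = submitted + REVIEW_PERIOD * (1 + extensions)
    # Modifications do not restart the review period, but the final record
    # still needs a full week on the list before it is forwarded.
    return max(review_ends, final_record_posted + FINAL_RECORD_NOTICE)


print(earliest_forwarding_date(date(2009, 9, 1), date(2009, 9, 12)))
```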
+
+ Registrations initiated due to the provisions of Section 3.3 or
+ Section 3.4 SHALL NOT be rejected altogether (since they have to
+ ultimately appear in the registry) and SHOULD be completed as quickly
+ as possible. The review process allows list members to comment on
+ the specific information in the form and the record it contains and
+ thus help ensure that it is correct and consistent. The Language
+ Subtag Reviewer MAY reject a specific version of the form, but MUST
+ propose a suitable replacement, extending the review period as
+ described above, until the form is in a format worthy of the
+ reviewer's approval and meets with rough consensus of the list.
+
+ Decisions made by the Language Subtag Reviewer MAY be appealed to the
+ IESG [RFC2028] under the same rules as other IETF decisions
+ [RFC2026]. This includes a decision to extend the review period or
+ the failure to announce a decision in a clear and timely manner.
+
+ The approved records appear in the Language Subtag Registry. The
+ approved registration forms are available online from
+ http://www.iana.org.
+
+ Updates or changes to existing records follow the same procedure as
+ new registrations. The Language Subtag Reviewer decides whether
+ there is consensus to update the registration following the two-week
+ review period; normally, objections by the original registrant will
+ carry extra weight in forming such a consensus.
+
+ Registrations are permanent and stable. Once registered, subtags
+ will not be removed from the registry and will remain a valid way in
+ which to specify a specific language or variant.
+
+ Note: The purpose of the "Reference to published description" section
+ in the registration form is to aid in verifying whether a language is
+ registered or to which language or language variation a particular
+ subtag refers. In most cases, reference to an authoritative grammar
+ or dictionary of that language will be useful; in cases where no such
+ work exists, other well-known works describing that language or in
+ that language MAY be appropriate. The Language Subtag Reviewer
+ decides what constitutes "good enough" reference material. This
+ requirement is not intended to exclude particular languages or
+ dialects due to the size of the speaker population or lack of a
+ standardized orthography. Minority languages will be considered
+ equally on their own merits.
+
+3.6. Possibilities for Registration
+
+ Possibilities for registration of subtags or information about
+ subtags include:
+
+ o Primary language subtags for languages not listed in ISO 639 that
+ are not variants of any listed or registered language MAY be
+ registered. At the time this document was created, there were no
+ examples of this form of subtag. Before attempting to register a
+ language subtag, there MUST be an attempt to register the language
+ with ISO 639. Subtags MUST NOT be registered for languages
+ defined by codes that exist in ISO 639-1, ISO 639-2, or ISO 639-3;
+ that are under consideration by the ISO 639 registration
+ authorities; or that have never been attempted for registration
+ with those authorities. If ISO 639 has previously rejected a
+ language for registration, it is reasonable to assume that there
+ must be additional, very compelling evidence of need before it
+ will be registered as a primary language subtag in the IANA
+ registry (to the extent that it is very unlikely that any subtags
+ will be registered of this type).
+
+ o Dialect or other divisions or variations within a language, its
+ orthography, writing system, regional or historical usage,
+ transliteration or other transformation, or distinguishing
+ variation MAY be registered as variant subtags. An example is the
+ 'rozaj' subtag (the Resian dialect of Slovenian).
+
+ o The addition or maintenance of fields (generally of an
+ informational nature) in tag or subtag records as described in
+ Section 3.1 is allowed. Such changes are subject to the stability
+ provisions in Section 3.4. This includes 'Description',
+ 'Comments', 'Deprecated', and 'Preferred-Value' fields for
+ obsolete or withdrawn codes, or the addition of 'Suppress-Script'
+ or 'Macrolanguage' fields to primary language subtags, as well as
+ other changes permitted by this document, such as the addition of
+ an appropriate 'Prefix' field to a variant subtag.
+
+ o The addition of records and related field value changes necessary
+ to reflect assignments made by ISO 639, ISO 15924, ISO 3166-1, and
+ UN M.49 as described in Section 3.4 is allowed.
+
+ Subtags proposed for registration that would cause all or part of a
+ grandfathered tag to become redundant but whose meaning conflicts
+ with or alters the meaning of the grandfathered tag MUST be rejected.
+
+ This document leaves the decision on what subtags or changes to
+ subtags are appropriate (or not) to the registration process
+ described in Section 3.5.
+
+ Note: Four-character primary language subtags are reserved to allow
+ for the possibility of alpha4 codes in some future addition to the
+ ISO 639 family of standards.
+
+ ISO 639 defines a registration authority for additions to and changes
+ in the list of languages in ISO 639. This agency is:
+
+ International Information Centre for Terminology (Infoterm)
+ Aichholzgasse 6/12, AT-1120
+ Wien, Austria
+ Phone: +43 1 26 75 35 Ext. 312 Fax: +43 1 216 32 72
+
+ ISO 639-2 defines a registration authority for additions to and
+ changes in the list of languages in ISO 639-2. This agency is:
+
+ Library of Congress
+ Network Development and MARC Standards Office
+ Washington, DC 20540, USA
+ Phone: +1 202 707 6237 Fax: +1 202 707 0115
+ URL: http://www.loc.gov/standards/iso639-2
+
+ ISO 639-3 defines a registration authority for additions to and
+ changes in the list of languages in ISO 639-3. This agency is:
+
+ SIL International
+ ISO 639-3 Registrar
+ 7500 W. Camp Wisdom Rd.
+ Dallas, TX 75236, USA
+ Phone: +1 972 708 7400, ext. 2293
+ Fax: +1 972 708 7546
+ Email: iso639-3@sil.org
+ URL: http://www.sil.org/iso639-3
+
+ ISO 639-5 defines a registration authority for additions to and
+ changes in the list of languages in ISO 639-5. This agency is the
+ same as for ISO 639-2 and is:
+
+ Library of Congress
+ Network Development and MARC Standards Office
+ Washington, DC 20540, USA
+ Phone: +1 202 707 6237
+ Fax: +1 202 707 0115
+ URL: http://www.loc.gov/standards/iso639-5
+
+ The maintenance agency for ISO 3166-1 (country codes) is:
+
+ ISO 3166 Maintenance Agency
+ c/o International Organization for Standardization
+ Case postale 56
+ CH-1211 Geneva 20, Switzerland
+ Phone: +41 22 749 72 33 Fax: +41 22 749 73 49
+ URL: http://www.iso.org/iso/en/prods-services/iso3166ma/index.html
+
+ The registration authority for ISO 15924 (script codes) is:
+
+ Unicode Consortium
+ Box 391476
+ Mountain View, CA 94039-1476, USA
+ URL: http://www.unicode.org/iso15924
+
+ The Statistics Division of the United Nations Secretariat maintains
+ the Standard Country or Area Codes for Statistical Use and can be
+ reached at:
+
+ Statistical Services Branch
+ Statistics Division
+ United Nations, Room DC2-1620
+ New York, NY 10017, USA
+ Fax: +1-212-963-0623
+ Email: statistics@un.org
+ URL: http://unstats.un.org/unsd/methods/m49/m49alpha.htm
+
+3.7. Extensions and the Extensions Registry
+
+ Extension subtags are those introduced by single-character subtags
+ ("singletons") other than 'x'. They are reserved for the generation
+ of identifiers that contain a language component and are compatible
+ with applications that understand language tags.
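+
+   As a non-normative sketch, the following Python function collects
+   the extension sequences in a tag, that is, the subtags introduced by
+   each singleton other than 'x'.  It assumes the input is already a
+   well-formed tag, skips the first subtag (so that a private-use-only
+   tag beginning with "x-" is not misread), and stops at the first 'x'.
+   The function name and the singleton 'a' in the example are
+   hypothetical.
+

```python
def extension_sequences(tag):
    """Map each extension singleton in `tag` to the subtags that follow it."""
    subtags = tag.lower().split("-")
    extensions = {}
    current = None
    for subtag in subtags[1:]:      # the first subtag is never an extension
        if subtag == "x":
            break                   # everything after 'x' is private use
        if len(subtag) == 1:
            current = subtag        # a new singleton starts a new extension
            extensions[current] = []
        elif current is not None:
            extensions[current].append(subtag)
    return extensions


print(extension_sequences("en-a-aaa-bbb-x-c-ddd"))
```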
+
+ The structure and form of extensions are defined by this document so
+ that implementations can be created that are forward compatible with
+ applications that might be created using singletons in the future.
+ In addition, defining a mechanism for maintaining singletons will
+ lend stability to this document by reducing the likely need for
+ future revisions or updates.
+
+ Single-character subtags are assigned by IANA using the "IETF Review"
+ policy defined by [RFC5226]. This policy requires the development of
+ an RFC, which SHALL define the name, purpose, processes, and
+ procedures for maintaining the subtags. The maintaining or
+ registering authority, including name, contact email, discussion list
+ email, and URL location of the registry, MUST be indicated clearly in
+ the RFC. The RFC MUST specify or include each of the following:
+
+ o The specification MUST reference the specific version or revision
+ of this document that governs its creation and MUST reference this
+ section of this document.
+
+ o The specification and all subtags defined by the specification
+ MUST follow the ABNF and other rules for the formation of tags and
+ subtags as defined in this document. In particular, it MUST
+ specify that case is not significant and that subtags MUST NOT
+ exceed eight characters in length.
+
+ o The specification MUST specify a canonical representation.
+
+ o The specification of valid subtags MUST be available over the
+ Internet and at no cost.
+
+ o The specification MUST be in the public domain or available via a
+ royalty-free license acceptable to the IETF and specified in the
+ RFC.
+
+ o The specification MUST be versioned, and each version of the
+ specification MUST be numbered, dated, and stable.
+
+ o The specification MUST be stable. That is, extension subtags,
+ once defined by a specification, MUST NOT be retracted or change
+ in meaning in any substantial way.
+
+ o The specification MUST include, in a separate section, the
+ registration form reproduced in this section (below) to be used in
+ registering the extension upon publication as an RFC.
+
+ o IANA MUST be informed of changes to the contact information and
+ URL for the specification.
+
+ IANA will maintain a registry of allocated single-character
+ (singleton) subtags. This registry MUST use the record-jar format
+ described by the ABNF in Section 3.1.1. Upon publication of an
+ extension as an RFC, the maintaining authority defined in the RFC
+ MUST forward this registration form to <iesg@ietf.org>, who MUST
+ forward the request to <iana@iana.org>. The maintaining authority of
+ the extension MUST maintain the accuracy of the record by sending an
+ updated full copy of the record to <iana@iana.org> with the subject
+ line "LANGUAGE TAG EXTENSION UPDATE" whenever content changes. Only
+ the 'Comments', 'Contact_Email', 'Mailing_List', and 'URL' fields MAY
+ be modified in these updates.
+
+ Failure to maintain this record, maintain the corresponding registry,
+ or meet other conditions imposed by this section of this document MAY
+ be appealed to the IESG [RFC2028] under the same rules as other IETF
+ decisions (see [RFC2026]) and MAY result in the authority to maintain
+ the extension being withdrawn or reassigned by the IESG.
+
+ %%
+ Identifier:
+ Description:
+ Comments:
+ Added:
+ RFC:
+ Authority:
+ Contact_Email:
+ Mailing_List:
+ URL:
+ %%
+
+ Figure 6: Format of Records in the Language Tag Extensions Registry
+
+ 'Identifier' contains the single-character subtag (singleton)
+ assigned to the extension. The Internet-Draft submitted to define
+ the extension SHOULD specify which letter or digit to use, although
+ the IESG MAY change the assignment when approving the RFC.
+
+ 'Description' contains the name and description of the extension.
+
+ 'Comments' is an OPTIONAL field and MAY contain a broader description
+ of the extension.
+
+ 'Added' contains the date the extension's RFC was published in the
+ "full-date" format specified in [RFC3339]. For example: 2004-06-28
+ represents June 28, 2004, in the Gregorian calendar.
+
+ 'RFC' contains the RFC number assigned to the extension.
+
+ 'Authority' contains the name of the maintaining authority for the
+ extension.
+
+ 'Contact_Email' contains the email address used to contact the
+ maintaining authority.
+
+ 'Mailing_List' contains the URL or subscription email address of the
+ mailing list used by the maintaining authority.
+
+ 'URL' contains the URL of the registry for this extension.
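+
+   A reader of this registry might parse its records along the
+   following lines (Python, non-normative).  The sketch assumes only
+   the basic record-jar conventions: records delimited by "%%", one
+   "Name: body" field per line, and continuation lines that begin with
+   whitespace.  The function name is hypothetical, and every value in
+   the sample record is a placeholder rather than a real registration.
+

```python
from datetime import date

SAMPLE = """\
%%
Identifier: q
Description: Hypothetical example extension
Comments: Placeholder values for illustration only;
  not a real registration
Added: 2004-06-28
RFC: 0000
Authority: Example Authority
Contact_Email: contact@example.org
Mailing_List: list@example.org
URL: http://example.org/registry
%%
"""


def parse_extension_records(text):
    """Parse record-jar text into a list of {field name: body} dicts."""
    records, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue                              # ignore blank lines
        if line.strip() == "%%":
            if current:
                records.append(current)           # close the open record
            current, last = {}, None
        elif line[0] in " \t" and last is not None:
            current[last] += " " + line.strip()   # folded continuation line
        elif ":" in line:
            name, _, body = line.partition(":")   # split on the first colon only
            last = name.strip()
            current[last] = body.strip()
    if current:
        records.append(current)
    return records


record = parse_extension_records(SAMPLE)[0]
# 'Added' uses the RFC 3339 "full-date" form, which fromisoformat accepts.
print(record["Identifier"], date.fromisoformat(record["Added"]))
```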
+
+ The determination of whether an Internet-Draft meets the above
+ conditions and the decision to grant or withhold such authority rests
+ solely with the IESG and is subject to the normal review and appeals
+ process associated with the RFC process.
+
+ Extension authors are strongly cautioned that many (including most
+ well-formed) processors will be unaware of any special relationships
+ or meaning inherent in the order of extension subtags. Extension
+ authors SHOULD avoid subtag relationships or canonicalization
+ mechanisms that interfere with matching or with length restrictions
+ that sometimes exist in common protocols where the extension is used.
+ In particular, applications MAY truncate the subtags in doing
+ matching or in fitting into limited lengths, so it is RECOMMENDED
+ that the most significant information be in the most significant
+ (left-most) subtags and that the specification gracefully handle
+ truncated subtags.
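+
+   The kind of truncation an application might perform can be sketched
+   as follows (Python, non-normative).  The sketch removes whole
+   subtags from the right until the tag fits and also drops a singleton
+   left dangling at the end, on the assumption that a singleton is
+   meaningless without a following subtag.  The function name and the
+   singleton 'a' in the example are hypothetical.
+

```python
def truncate_tag(tag, max_len):
    """Shorten `tag` to at most `max_len` characters at subtag boundaries.

    Returns an empty string if not even the first subtag fits.
    """
    subtags = tag.split("-")
    while subtags and len("-".join(subtags)) > max_len:
        subtags.pop()                       # drop the least significant subtag
        # A trailing singleton would be left without any subtags of its own.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    return "-".join(subtags)


print(truncate_tag("en-a-aaa-bbb", 8))
print(truncate_tag("en-a-aaa-bbb", 7))
```

+
+   Because the right-most subtags disappear first, an extension that
+   places its most significant information left-most degrades more
+   gracefully under this kind of processing.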
+
+ When a language tag is to be used in a specific, known protocol, it
+ is RECOMMENDED that the language tag not contain extensions not
+ supported by that protocol. In addition, note that some protocols
+ MAY impose upper limits on the length of the strings used to store or
+ transport the language tag.
+
+3.8. Update of the Language Subtag Registry
+
+ After the adoption of this document, the IANA Language Subtag
+ Registry needed an update so that it would contain the complete set
+ of subtags valid in a language tag. [RFC5645] describes the process
+ used to create this update.
+
+ Registrations that are in process under the rules defined in
+ [RFC4646] when this document is adopted MUST be completed under the
+ rules contained in this document.
+
+3.9. Applicability of the Subtag Registry
+
+ The Language Subtag Registry is the source of data elements used to
+ construct language tags, following the rules described in this
+ document. Language tags are designed for indicating linguistic
+ attributes of various content, including not only text but also most
+ media formats, such as video or audio. They also form the basis for
+ language and locale negotiation in various protocols and APIs.
+
+ The registry is therefore applicable to many applications that need
+ some form of language identification, with these limitations:
+
+ o It is not designed to be the sole data source in the creation of a
+ language-selection user interface. For example, the registry does
+ not contain translations for subtag descriptions or for tags
+ composed from the subtags. Sources for localized data based on
+ the registry are generally available, notably [CLDR]. Nor does
+ the registry indicate which subtag combinations are particularly
+ useful or relevant.
+
+ o It does not provide information indicating relationships between
+ different languages, such as might be used in a user interface to
+ select language tags hierarchically, regionally, or on some other
+ organizational model.
+
+ o It does not supply information about potential overlap between
+ different language tags, as the notion of what constitutes a
+ language is not precise: several different language tags might be
+ reasonable choices for the same given piece of content.
+
+ o It does not contain information about appropriate fallback choices
+ when performing language negotiation. A good fallback language
+ might be linguistically unrelated to the specified language. The
+ fact that one language is often used as a fallback language for
+ another is usually a result of outside factors, such as geography,
+ history, or culture -- factors that might not apply in all cases.
+ For example, most people who use Breton (a Celtic language used in
+ the Northwest of France) would probably prefer to be served French
+ (a Romance language) if Breton isn't available.
+
+4. Formation and Processing of Language Tags
+
+ This section addresses how to use the information in the registry
+ with the tag syntax to choose, form, and process language tags.
+
+4.1. Choice of Language Tag
+
+ The guiding principle in forming language tags is to "tag content
+ wisely." Sometimes there is a choice between several possible tags
+ for the same content. The choice of which tag to use depends on the
+ content and application in question, and some amount of judgment
+ might be necessary when selecting a tag.
+
+ Interoperability is best served when the same language tag is used
+ consistently to represent the same language. If an application has
+ requirements that make the rules here inapplicable, then that
+ application risks damaging interoperability. It is strongly
+ RECOMMENDED that users not define their own rules for language tag
+ choice.
+
+ Standards, protocols, and applications that reference this document
+   normatively but apply rules different from the ones given in this
+ section MUST specify how language tag selection varies from the
+ guidelines given here.
+
+ To ensure consistent backward compatibility, this document contains
+ several provisions to account for potential instability in the
+ standards used to define the subtags that make up language tags.
+ These provisions mean that no valid language tag can become invalid,
+ nor will a language tag have a narrower scope in the future (it may
+ have a broader scope). The most appropriate language tag for a given
+ application or content item might evolve over time, but once applied,
+ the tag itself cannot become invalid or have its meaning wholly
+ change.
+
+ A subtag SHOULD only be used when it adds useful distinguishing
+ information to the tag. Extraneous subtags interfere with the
+ meaning, understanding, and processing of language tags. In
+ particular, users and implementations SHOULD follow the 'Prefix' and
+ 'Suppress-Script' fields in the registry (defined in Section 3.1):
+ these fields provide guidance on when specific additional subtags
+ SHOULD be used or avoided in a language tag.
+
+ The choice of subtags used to form a language tag SHOULD follow these
+ guidelines:
+
+ 1. Use as precise a tag as possible, but no more specific than is
+ justified. Avoid using subtags that are not important for
+ distinguishing content in an application.
+
+ * For example, 'de' might suffice for tagging an email written
+ in German, while "de-CH-1996" is probably unnecessarily
+ precise for such a task.
+
+ * Note that some subtag sequences might not represent the
+ language a casual user might expect. For example, the Swiss
+ German (Schweizerdeutsch) language is represented by "gsw-CH"
+ and not by "de-CH". This latter tag represents German ('de')
+ as used in Switzerland ('CH'), also known as Swiss High German
+ (Schweizer Hochdeutsch). Both are real languages, and
+ distinguishing between them could be important to an
+ application.
+
+ 2. The script subtag SHOULD NOT be used to form language tags unless
+ the script adds some distinguishing information to the tag.
+ Script subtags were first formally defined in [RFC4646]. Their
+ use can affect matching and subtag identification for
+ implementations of [RFC1766] or [RFC3066] (which are obsoleted by
+ this document), as these subtags appear between the primary
+ language and region subtags. Some applications can benefit from
+ the use of script subtags in language tags, as long as the use is
+ consistent for a given context. Script subtags are never
+ appropriate for unwritten content (such as audio recordings).
+ The field 'Suppress-Script' in the primary or extended language
+ record in the registry indicates script subtags that do not add
+ distinguishing information for most applications; this field
+ defines when users SHOULD NOT include a script subtag with a
+ particular primary language subtag.
+
+ For example, if an implementation selects content using Basic
+ Filtering [RFC4647] (originally described in Section 14.4 of
+ [RFC2616]) and the user requested the language range "en-US",
+ content labeled "en-Latn-US" will not match the request and thus
+ not be selected. Therefore, it is important to know when script
+ subtags will customarily be used and when they ought not be used.
+
+ For example:
+
+ * The subtag 'Latn' should not be used with the primary language
+ 'en' because nearly all English documents are written in the
+ Latin script and it adds no distinguishing information.
+ However, if a document were written in English mixing Latin
+ script with another script such as Braille ('Brai'), then it
+ might be appropriate to choose to indicate both scripts to aid
+ in content selection, such as the application of a style
+ sheet.
+
+ * When labeling content that is unwritten (such as a recording
+ of human speech), the script subtag should not be used, even
+ if the language is customarily written in several scripts.
+ Thus, the subtitles to a movie might use the tag "uz-Arab"
+ (Uzbek, Arabic script), but the audio track for the same
+ language would be tagged simply "uz". (The tag "uz-Zxxx"
+ could also be used where content is not written, as the subtag
+ 'Zxxx' represents the "Code for unwritten documents".)
+
+ 3. If a tag or subtag has a 'Preferred-Value' field in its registry
+ entry, then the value of that field SHOULD be used to form the
+ language tag in preference to the tag or subtag in which the
+ preferred value appears.
+
+ * For example, use 'jbo' for Lojban in preference to the
+ grandfathered tag "art-lojban".
+
+ 4. Use subtags or sequences of subtags for individual languages in
+ preference to subtags for language collections. A "language
+ collection" is a group of languages that are descended from a
+ common ancestor, are spoken in the same geographical area, or are
+ otherwise related. Certain language collections are assigned
+ codes by [ISO639-5] (and some of these [ISO639-5] codes are also
+ defined as collections in [ISO639-2]). These codes are included
+ as primary language subtags in the registry. Subtags for a
+ language collection in the registry have a 'Scope' field with a
+ value of 'collection'. A subtag for a language collection is
+ always preferred to less specific alternatives such as 'mul' and
+ 'und' (see below), and a subtag representing a language
+ collection MAY be used when more specific language information is
+ not available. However, most users and implementations do not
+ know there is a relationship between the collection and its
+ individual languages. In addition, the relationship between the
+ individual languages in the collection is not well defined; in
+ particular, the languages are usually not mutually intelligible.
+ Since the subtags are different, a request for the collection
+ will typically only produce items tagged with the collection's
+ subtag, not items tagged with subtags for the individual
+ languages contained in the collection.
+
+ * For example, collections are interpreted inclusively, so the
+ subtag 'gem' (Germanic languages) could, but SHOULD NOT, be
+ used with content that would be better tagged with "en"
+ (English), "de" (German), or "gsw" (Swiss German, Alemannic).
+ While 'gem' collects all of these (and other) languages, most
+ implementations will not match 'gem' to the individual
+ languages; thus, using the subtag will not produce the desired
+ result.
+
+ 5. [ISO639-2] has defined several codes included in the subtag
+ registry that require additional care when choosing language
+ tags. In most of these cases, where omitting the language tag is
+ permitted, such omission is preferable to using these codes.
+ Language tags SHOULD NOT incorporate these subtags as a prefix,
+ unless the additional information conveys some value to the
+ application.
+
+ * The 'mul' (Multiple) primary language subtag identifies
+ content in multiple languages. This subtag SHOULD NOT be used
+ when a list of languages or individual tags for each content
+ element can be used instead. For example, the 'Content-
+ Language' header [RFC3282] allows a list of languages to be
+ used, not just a single language tag.
+
+ * The 'und' (Undetermined) primary language subtag identifies
+ linguistic content whose language is not determined. This
+ subtag SHOULD NOT be used unless a language tag is required
+ and language information is not available or cannot be
+ determined. Omitting the language tag (where permitted) is
+ preferred. The 'und' subtag might be useful for protocols
+ that require a language tag to be provided or where a primary
+ language subtag is required (such as in "und-Latn"). The
+ 'und' subtag MAY also be useful when matching language tags in
+ certain situations.
+
+ * The 'zxx' (Non-Linguistic, Not Applicable) primary language
+ subtag identifies content for which a language classification
+ is inappropriate or does not apply. Some examples might
+ include instrumental or electronic music; sound recordings
+ consisting of nonverbal sounds; audiovisual materials with no
+ narration, dialog, printed titles, or subtitles; machine-
+ readable data files consisting of machine languages or
+ character codes; or programming source code.
+
+ * The 'mis' (Uncoded) primary language subtag identifies content
+ whose language is known but that does not currently have a
+ corresponding subtag. This subtag SHOULD NOT be used.
+ Because the addition of other codes in the future can render
+ its application invalid, it is inherently unstable and hence
+ incompatible with the stability goals of BCP 47. It is always
+ preferable to use other subtags: either 'und' or (with prior
+ agreement) private use subtags.
+
+ 6. Use variant subtags sparingly and in the correct order. Most
+ variant subtags have one or more 'Prefix' fields in the registry
+ that express the list of subtags with which they are appropriate.
+ Variants SHOULD only be used with subtags that appear in one of
+ these 'Prefix' fields. If a variant lists a second variant in
+ one of its 'Prefix' fields, the first variant SHOULD appear
+ directly after the second variant in any language tag where both
+ occur. General purpose variants (those with no 'Prefix' fields
+ at all) SHOULD appear after any other variant subtags. Order any
+ remaining variants by placing the most significant subtag first.
+ If none of the subtags is more significant or no relationship can
+ be determined, alphabetize the subtags. Because variants are
+ very specialized, using many of them together generally makes the
+ tag so narrow as to override the additional precision gained.
+ Putting the subtags into another order interferes with
+ interoperability, as well as the overall interpretation of the
+ tag.
+
+ For example:
+
+ * The tag "en-scotland-fonipa" (English, Scottish dialect, IPA
+ phonetic transcription) is correctly ordered because
+ 'scotland' has a 'Prefix' of "en", while 'fonipa' has no
+ 'Prefix' field.
+
+ * The tag "sl-IT-rozaj-biske-1994" is correctly ordered: 'rozaj'
+ lists "sl" as its sole 'Prefix'; 'biske' lists "sl-rozaj" as
+ its sole 'Prefix'. The subtag '1994' has several prefixes,
+ including "sl-rozaj". However, it follows both 'rozaj' and
+ 'biske' because one of its 'Prefix' fields is "sl-rozaj-
+ biske".
+
+ 7. The grandfathered tag "i-default" (Default Language) was
+ originally registered according to [RFC1766] to meet the needs of
+ [RFC2277]. It is not used to indicate a specific language, but
+ rather to identify the condition or content used where the
+ language preferences of the user cannot be established. It
+ SHOULD NOT be used except as a means of labeling the default
+ content for applications or protocols that require default
+ language content to be labeled with that specific tag. It MAY
+ also be used by an application or protocol to identify when the
+ default language content is being returned.
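+
+ As an illustration of guideline 2 above, the following Python sketch
+ shows why content labeled "en-Latn-US" is not selected for the
+ language range "en-US" under Basic Filtering. It assumes the
+ case-insensitive prefix rule of [RFC4647]; the function name is
+ hypothetical, and the sketch is not a complete matching
+ implementation.

```python
def basic_filter_matches(language_range: str, tag: str) -> bool:
    """Return True if `tag` matches `language_range` under Basic Filtering.

    A range matches a tag when the two are equal, or when the range is a
    prefix of the tag and the next character of the tag is "-".
    Comparison is case-insensitive; the range "*" matches any tag.
    """
    rng, candidate = language_range.lower(), tag.lower()
    return rng == "*" or candidate == rng or candidate.startswith(rng + "-")


# The script subtag sits between 'en' and 'US', so the prefix test fails.
print(basic_filter_matches("en-US", "en-Latn-US"))
# The shorter range "en" is still a prefix of the longer tag.
print(basic_filter_matches("en", "en-Latn-US"))
```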
+
+4.1.1. Tagging Encompassed Languages
+
+ Some primary language records in the registry have a 'Macrolanguage'
+ field (Section 3.1.10) that contains a mapping from each "encompassed
+ language" to its macrolanguage. The 'Macrolanguage' mapping doesn't
+ define what the relationship between the encompassed language and its
+ macrolanguage is, nor does it define how languages encompassed by the
+ same macrolanguage are related to each other. Two different
+ languages encompassed by the same macrolanguage may differ from one
+ another more than, say, French and Spanish do.
+
+ A few specific macrolanguages, such as Chinese ('zh') and Arabic
+ ('ar'), are handled differently. See Section 4.1.2.
+
+ The more specific encompassed language subtag SHOULD be used to form
+ the language tag, although either the macrolanguage's primary
+ language subtag or the encompassed language's subtag MAY be used.
+ This means, for example, tagging Plains Cree with 'crk' rather than
+ 'cr' (Cree), and so forth.
+
+ Each macrolanguage subtag's scope, by definition, includes all of its
+ encompassed languages. Since the relationship between encompassed
+ languages varies, users cannot assume that the macrolanguage subtag
+ means any particular encompassed language, nor that any given pair of
+ encompassed languages are mutually intelligible or otherwise
+ interchangeable.
+
+ Applications MAY use macrolanguage information to improve matching or
+ language negotiation. For example, the information that 'sr'
+ (Serbian) and 'hr' (Croatian) share a macrolanguage expresses a
+ closer relation between those languages than between, say, 'sr'
+ (Serbian) and 'mk' (Macedonian). However, this relationship is not
+ guaranteed nor is it exclusive. For example, Romanian ('ro') and
+ Moldavian ('mo') do not share a macrolanguage, but are far more
+ closely related to each other than Cantonese ('yue') and Wu ('wuu'),
+ which do share a macrolanguage.
+
+4.1.2. Using Extended Language Subtags
+
+ To accommodate language tag forms used prior to the adoption of this
+ document, language tags provide a special compatibility mechanism:
+ the extended language subtag. Selected languages have been provided
+ with both primary and extended language subtags. These include
+ macrolanguages, such as Malay ('ms') and Uzbek ('uz'), that have a
+ specific dominant variety that is generally synonymous with the
+ macrolanguage. Other languages, such as the Chinese ('zh') and
+ Arabic ('ar') macrolanguages and the various sign languages ('sgn'),
+ have traditionally used their primary language subtag, possibly
+ coupled with various region subtags or as part of a registered
+ grandfathered tag, to indicate the language.
+
+ With the adoption of this document, specific ISO 639-3 subtags became
+ available to identify the languages contained within these diverse
+ language families or groupings. This presents a choice of language
+ tags where previously none existed:
+
+ o Each encompassed language's subtag SHOULD be used as the primary
+ language subtag. For example, a document in Mandarin Chinese
+ would be tagged "cmn" (the subtag for Mandarin Chinese) in
+ preference to "zh" (Chinese).
+
+ o If compatibility is desired or needed, the encompassed subtag MAY
+ be used as an extended language subtag. For example, a document
+ in Mandarin Chinese could be tagged "zh-cmn" instead of either
+ "cmn" or "zh".
+
+ o The macrolanguage or prefixing subtag MAY still be used to form
+ the tag instead of the more specific encompassed language subtag.
+ That is, tags such as "zh-HK" or "sgn-RU" are still valid.
+
+ Chinese ('zh') provides a useful illustration of this. In the past,
+ various content has used tags beginning with the 'zh' subtag, with
+ application-specific meaning being associated with region codes,
+ private use sequences, or grandfathered registered values. This is
+ because historically only the macrolanguage subtag 'zh' was available
+ for forming language tags. However, the languages encompassed by the
+ Chinese subtag 'zh' are, in the main, not mutually intelligible when
+ spoken, and the written forms of these languages also show wide
+ variation in form and usage.
+
+ To provide compatibility, Chinese languages encompassed by the 'zh'
+ subtag are in the registry both as primary language subtags and as
+ extended language subtags. For example, the ISO 639-3 code for
+ Cantonese is 'yue'. Content in Cantonese might historically have
+ used a tag such as "zh-HK" (since Cantonese is commonly spoken in
+ Hong Kong), although that tag actually means any type of Chinese as
+ used in Hong Kong. With the availability of ISO 639-3 codes in the
+ registry, content in Cantonese can be directly tagged using the 'yue'
+ subtag. The content can use it as a primary language subtag, as in
+ the tag "yue-HK" (Cantonese, Hong Kong). Or it can use an extended
+ language subtag with 'zh', as in the tag "zh-yue-Hant" (Chinese,
+ Cantonese, Traditional script).
+
+ As noted above, applications can choose to use the macrolanguage
+ subtag to form the tag instead of using the more specific encompassed
+ language subtag. For example, an application with large quantities
+ of data already using tags with the 'zh' (Chinese) subtag might
+ continue to use this more general subtag even for new data, even
+ though the content could be more precisely tagged with 'cmn'
+ (Mandarin), 'yue' (Cantonese), 'wuu' (Wu), and so on. Similarly, an
+ application already using tags that start with the 'ar' (Arabic)
+ subtag might continue to use this more general subtag even for new
+ data, which could be more precisely tagged with 'arb' (Standard
+ Arabic).
+
+ In some cases, the encompassed languages had tags registered for them
+ during the RFC 3066 era. Those grandfathered tags not already
+ deprecated or rendered redundant were deprecated in the registry upon
+ adoption of this document. As grandfathered values, they remain
+ valid for use, and some content or applications might use them. As
+ with other grandfathered tags, since implementations might not be
+ able to associate the grandfathered tags with the encompassed
+ language subtag equivalents that are recommended by this document,
+ implementations are encouraged to canonicalize tags for comparison
+ purposes. Some examples of this include the tags "zh-hakka" (Hakka)
+ and "zh-guoyu" (Mandarin or Standard Chinese).
+
+ Sign languages share a mode of communication rather than a linguistic
+ heritage. There are many sign languages that have developed
+ independently, and the subtag 'sgn' indicates only the presence of a
+ sign language. A number of sign languages also had grandfathered
+ tags registered for them during the RFC 3066 era. For example, the
+ grandfathered tag "sgn-US" was registered to represent 'American Sign
+ Language' specifically, without reference to the United States. This
+ is still valid, but deprecated: a document in American Sign Language
+ can be labeled either "ase" or "sgn-ase" (the 'ase' subtag is for the
+ language called 'American Sign Language').
+
+4.2. Meaning of the Language Tag
+
+ The meaning of a language tag is related to the meaning of the
+ subtags that it contains. Each subtag, in turn, implies a certain
+ range of expectations one might have for related content, although it
+ is not a guarantee. For example, the use of a script subtag such as
+ 'Arab' (Arabic script) does not mean that the content contains only
+ Arabic characters. It does mean that the language involved is
+ predominantly in the Arabic script. Thus, a language tag and its
+ subtags can encompass a very wide range of variation and yet remain
+ appropriate in each particular instance.
+
+ Validity of a tag is not the only factor determining its usefulness.
+ While every valid tag has a meaning, it might not represent any real-
+ world language usage. This is unavoidable in a system in which
+ subtags can be combined freely. For example, tags such as
+ "ar-Cyrl-CO" (Arabic, Cyrillic script, as used in Colombia) or "tlh-
+ Kore-AQ-fonipa" (Klingon, Korean script, as used in Antarctica, IPA
+ phonetic transcription) are both valid and unlikely to represent a
+ useful combination of language attributes.
+
+ The meaning of a given tag doesn't depend on the context in which it
+ appears. The relationship between a tag's meaning and the
+ information objects to which that tag is applied, however, can vary.
+
+ o For a single information object, the associated language tags
+ might be interpreted as the set of languages that is necessary for
+ a complete comprehension of the entire object. Example: Plain
+ text documents.
+
+ o For an aggregation of information objects, the associated language
+ tags could be taken as the set of languages used inside components
+ of that aggregation. Examples: Document stores and libraries.
+
+ o For information objects whose purpose is to provide alternatives,
+ the associated language tags could be regarded as a hint that the
+ content is provided in several languages and that one has to
+ inspect each of the alternatives in order to find its language or
+ languages. In this case, the presence of multiple tags might not
+ mean that one needs to be multilingual to get complete
+ understanding of the document. Example: MIME multipart/
+ alternative [RFC2046].
+
+ o For markup languages, such as HTML and XML, language information
+ can be added to each part of the document identified by the markup
+ structure (including the whole document itself). For example, one
+ could write <span lang="fr">C'est la vie.</span> inside a German
+ document; the German-speaking user could then access a French-
+ German dictionary to find out what the marked section meant. If
+ the user were listening to that document through a speech
+ synthesis interface, this information could be used to signal the
+ synthesizer to appropriately apply French text-to-speech
+ pronunciation rules to that span of text, instead of applying the
+ inappropriate German rules.
+
+ o For markup languages and document formats that allow the audience
+ to be identified, a language tag could indicate the audience(s)
+ appropriate for that document. For example, the same HTML
+ document described in the preceding bullet might have an HTTP
+ header "Content-Language: de" to indicate that the intended
+ audience for the file is German (even though three words appear
+ and are identified as being in French within it).
+
+ o For systems and APIs, language tags form the basis for most
+ implementations of locale identifiers. For example, see Unicode's
+ CLDR (Common Locale Data Repository) (see UTS #35 [UTS35])
+ project.
+
+ Language tags are related when they contain a similar sequence of
+ subtags. For example, if a language tag B contains language tag A as
+ a prefix, then B is typically "narrower" or "more specific" than A.
+ Thus, "zh-Hant-TW" is more specific than "zh-Hant".
+
+ This relationship is not guaranteed in all cases: specifically,
+ content labeled with tags that begin with the same sequence of subtags
+ is NOT guaranteed to be mutually intelligible, although it might be. For
+ example, the tag "az" shares a prefix with both "az-Latn"
+ (Azerbaijani written using the Latin script) and "az-Cyrl"
+ (Azerbaijani written using the Cyrillic script). A person fluent in
+ one script might not be able to read the other, even though the
+ linguistic content (e.g., what would be heard if both texts were read
+ aloud) might be identical. Content tagged as "az" most probably is
+ written in just one script and thus might not be intelligible to a
+ reader familiar with the other script.
+
+ Similarly, not all subtags specify an actual distinction in language.
+ For example, the tags "en-US" and "en-CA" mean, roughly, English with
+ features generally thought to be characteristic of the United States
+ and Canada, respectively. They do not imply that a significant
+ dialectal boundary exists between any arbitrarily selected point in
+ the United States and any arbitrarily selected point in Canada.
+ Neither does a particular region subtag imply that linguistic
+ distinctions do not exist within that region.
+
+4.3. Lists of Languages
+
+ In some applications, a single content item might best be associated
+ with more than one language tag. Examples of such a usage include:
+
+ o Content items that contain multiple, distinct varieties. Often
+ this is used to indicate an appropriate audience for a given
+ content item when multiple choices might be appropriate. Examples
+ of this could include:
+
+ * Metadata about the appropriate audience for a movie title. For
+ example, a DVD might label its individual audio tracks 'de'
+ (German), 'fr' (French), and 'es' (Spanish), but the overall
+ title would list "de, fr, es" as its overall audience.
+
+ * A French/English, English/French dictionary tagged as both "en"
+ and "fr" to specify that it applies equally to French and
+ English.
+
+ * A side-by-side or interlinear translation of a document, as is
+ commonly done with classical works in Latin or Greek.
+
+ o Content items that contain a single language but that require
+ multiple levels of specificity. For example, a library might wish
+ to classify a particular work as both Norwegian ('no') and as
+ Nynorsk ('nn') for audiences capable of appreciating the
+ distinction or needing to select content more narrowly.
+
+4.4. Length Considerations
+
+ There is no defined upper limit on the size of language tags. While
+ historically most language tags have consisted of language and region
+ subtags with a combined total length of up to six characters, larger
+ tags have always been possible and have actually appeared in use.
+
+ Neither the language tag syntax nor other requirements in this
+ document impose a fixed upper limit on the number of subtags in a
+ language tag (and thus an upper bound on the size of a tag). The
+ language tag syntax suggests that, depending on the specific
+ language, more subtags (and thus a longer tag) are sometimes
+ necessary to completely identify the language for certain
+ applications; thus, it is possible to envision long or complex subtag
+ sequences.
+
+4.4.1. Working with Limited Buffer Sizes
+
+ Some applications and protocols are forced to allocate fixed buffer
+ sizes or otherwise limit the length of a language tag. A conformant
+ implementation or specification MAY refuse to support the storage of
+ language tags that exceed a specified length. Any such limitation
+ SHOULD be clearly documented, and such documentation SHOULD include
+ what happens to longer tags (for example, whether an error value is
+ generated or the language tag is truncated). A protocol that allows
+ tags to be truncated at an arbitrary limit, without giving any
+ indication of what that limit is, has the potential to cause harm by
+ changing the meaning of tags in substantial ways.
+
+ In practice, most language tags do not require more than a few
+ subtags and will not approach reasonably sized buffer limitations;
+ see Section 4.1.
+
+ Some specifications or protocols have limits on tag length but do not
+ have a fixed length limitation. For example, [RFC2231] has no
+ explicit length limitation: the length available for the language tag
+ is constrained by the length of other header components (such as the
+ charset's name) coupled with the 76-character limit in [RFC2047].
+ Thus, the "limit" might be 50 or more characters, but it could
+ potentially be quite small.
+
+ The considerations for assigning a buffer limit are:
+
+ Implementations SHOULD NOT truncate language tags unless the
+ meaning of the tag is purposefully being changed, or unless the
+ tag does not fit into a limited buffer size specified by a
+ protocol for storage or transmission.
+
+ Implementations SHOULD warn the user when a tag is truncated since
+ truncation changes the semantic meaning of the tag.
+
+ Implementations of protocols or specifications that are space
+ constrained but do not have a fixed limit SHOULD use the longest
+ possible tag in preference to truncation.
+
+ Protocols or specifications that specify limited buffer sizes for
+ language tags MUST allow for language tags of at least 35
+ characters. Note that [RFC4646] recommended a minimum field size
+ of 42 characters because it included all three elements of the
+ 'extlang' production. Two of these are now permanently reserved,
+ so a registered primary language subtag of the maximum length of 8
+ characters is now longer than the longest language-extlang
+ combination. Protocols or specifications that commonly use
+ extensions or private use subtags might wish to reserve or
+ recommend a longer "minimum buffer" size.
+
+ The following illustration shows how the 35-character recommendation
+ was derived:
+
+       language      =  8 ; longest allowed registered value
+                          ;   longer than primary+extlang
+                          ;   which requires 7 characters
+       script        =  5 ; if not suppressed: see Section 4.1
+       region        =  4 ; UN M.49 numeric region code
+                          ;   ISO 3166-1 codes require 3
+       variant1      =  9 ; needs 'language' as a prefix
+       variant2      =  9 ; very rare, as it needs
+                          ;   'language-variant1' as a prefix
+
+       total         = 35 characters
+
+ Figure 7: Derivation of the Limit on Tag Length
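+
+ The arithmetic behind Figure 7 can be checked with a short Python
+ sketch. The names below are hypothetical; the only assumption beyond
+ the figure is that every width after 'language' counts the "-" that
+ precedes the subtag.

```python
# Widths taken from Figure 7. Each entry after 'language' includes the
# "-" separator that precedes the subtag.
FIGURE_7_WIDTHS = {
    "language": 8,  # longest allowed registered primary language subtag
    "script": 5,    # "-" plus 4 letters
    "region": 4,    # "-" plus 3 digits (UN M.49)
    "variant1": 9,  # "-" plus up to 8 characters
    "variant2": 9,  # "-" plus up to 8 characters
}

MIN_BUFFER = sum(FIGURE_7_WIDTHS.values())


def fits_minimum_buffer(tag: str) -> bool:
    """Return True if `tag` fits in the smallest buffer a protocol may specify."""
    return len(tag) <= MIN_BUFFER


print(MIN_BUFFER)
print(fits_minimum_buffer("sl-IT-rozaj-biske-1994"))
```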
+
+4.4.2. Truncation of Language Tags
+
+ Truncation of a language tag alters the meaning of the tag, and thus
+ SHOULD be avoided. However, truncation of language tags is sometimes
+ necessary due to limited buffer sizes. Such truncation MUST NOT chop
+ a subtag off in the middle or produce an invalid tag (for example,
+ one ending with the "-" character).
+
+ This means that applications or protocols that truncate tags MUST do
+ so by progressively removing subtags along with their preceding "-"
+ from the right side of the language tag until the tag is short enough
+ for the given buffer. If the resulting tag ends with a single-
+ character subtag, that subtag and its preceding "-" MUST also be
+ removed. For example:
+
+ Tag to truncate: zh-Latn-CN-variant1-a-extend1-x-wadegile-private1
+ 1. zh-Latn-CN-variant1-a-extend1-x-wadegile
+ 2. zh-Latn-CN-variant1-a-extend1
+ 3. zh-Latn-CN-variant1
+ 4. zh-Latn-CN
+ 5. zh-Latn
+ 6. zh
+
+ Figure 8: Example of Tag Truncation
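+
+ The truncation procedure described above can be sketched in Python as
+ follows. The function name and the choice to raise an error when even
+ the primary language subtag does not fit are assumptions made for
+ illustration; this document does not prescribe either.

```python
def truncate_tag(tag: str, max_len: int) -> str:
    """Shorten `tag` to at most `max_len` characters by dropping subtags.

    Subtags are removed from the right together with their preceding "-".
    If that leaves a single-character subtag (a singleton such as "a" or
    "x") at the end, the singleton is removed as well.
    """
    subtags = tag.split("-")
    while len("-".join(subtags)) > max_len and len(subtags) > 1:
        subtags.pop()
        # A trailing singleton would make the truncated tag invalid.
        while len(subtags) > 1 and len(subtags[-1]) == 1:
            subtags.pop()
    result = "-".join(subtags)
    if len(result) > max_len:
        raise ValueError("buffer too small for the primary language subtag")
    return result


TAG = "zh-Latn-CN-variant1-a-extend1-x-wadegile-private1"
for limit in (48, 39, 28, 18, 9, 6):
    print(limit, truncate_tag(TAG, limit))
```

+ Run against the tag in Figure 8, successively smaller limits walk
+ through the same six forms listed in the figure.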
+
+4.5. Canonicalization of Language Tags
+
+ Since a particular language tag can be used by many processes,
+ language tags SHOULD always be created or generated in canonical
+ form.
+
+ A language tag is in 'canonical form' when the tag is well-formed
+ according to the rules in Sections 2.1 and 2.2 and it has been
+ canonicalized by applying each of the following steps in order, using
+ data from the IANA registry (see Section 3.1):
+
+ 1. Extension sequences are ordered into case-insensitive ASCII order
+ by singleton subtag.
+
+ * For example, the subtag sequence '-a-babble' comes before
+ '-b-warble'.
+
+ 2. Redundant or grandfathered tags are replaced by their 'Preferred-
+ Value', if there is one.
+
+ * The field-body of the 'Preferred-Value' for grandfathered and
+ redundant tags is an "extended language range" [RFC4647] and
+ might consist of more than one subtag.
+
+ * 'Preferred-Value' fields in the registry provide mappings from
+ deprecated tags to modern equivalents. Many of these were
+ created before the adoption of this document (such as the
+ mapping of "no-nyn" to "nn" or "i-klingon" to "tlh"). Others
+ are the result of later registrations or additions to the
+ registry as permitted or required by this document (for
+ example, "zh-hakka" was deprecated in favor of the ISO 639-3
+ code 'hak' when this document was adopted).
+
+ 3. Subtags are replaced by their 'Preferred-Value', if there is one.
+ For extlangs, the original primary language subtag is also
+ replaced if there is a primary language subtag in the 'Preferred-
+ Value'.
+
+ * The field-body of the 'Preferred-Value' for extlangs is an
+ "extended language range" and typically maps to a primary
+ language subtag. For example, the subtag sequence "zh-hak"
+ (Chinese, Hakka) is replaced with the subtag 'hak' (Hakka).
+
+ * Most of the non-extlang subtags are either Region subtags
+ where the country name or designation has changed or clerical
+ corrections to ISO 639-1.
+
+ The canonical form contains no 'extlang' subtags. There is an
+ alternate 'extlang form' that maintains or reinstates extlang
+ subtags. This form can be useful in environments where the presence
+ of the 'Prefix' subtag is considered beneficial in matching or
+ selection (see Section 4.1.2).
+
+ A language tag is in 'extlang form' when the tag is well-formed
+ according to the rules in Sections 2.1 and 2.2 and it has been
+ processed by applying each of the following two steps in order, using
+ data from the IANA registry:
+
+ 1. The language tag is first transformed into canonical form, as
+ described above.
+
+ 2. If the language tag starts with a primary language subtag that is
+ also an extlang subtag, then the language tag is prepended with
+ the extlang's 'Prefix'.
+
+ * For example, "hak-CN" (Hakka, China) has the primary language
+ subtag 'hak', which in turn has an 'extlang' record with a
+ 'Prefix' 'zh' (Chinese). The extlang form is "zh-hak-CN"
+ (Chinese, Hakka, China).
+
+ * Note that Step 2 (prepending a prefix) can restore a subtag
+ that was removed by Step 1 (canonicalizing).
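+
+ Step 2 of the extlang form can be sketched in Python as follows. The
+ table is a hypothetical excerpt holding only extlang prefixes that
+ appear in examples in this document; a real implementation would
+ read the 'Prefix' field of each extlang record in the IANA registry.
+ The input is assumed to be in canonical form already (Step 1).

```python
# Hypothetical excerpt: extlang subtag -> its 'Prefix' in the registry.
EXTLANG_PREFIX = {"hak": "zh", "yue": "zh", "cmn": "zh", "ase": "sgn"}


def to_extlang_form(canonical_tag: str) -> str:
    """Prepend the extlang 'Prefix' when the primary subtag is also an extlang."""
    primary = canonical_tag.split("-", 1)[0].lower()
    prefix = EXTLANG_PREFIX.get(primary)
    return f"{prefix}-{canonical_tag}" if prefix else canonical_tag


print(to_extlang_form("hak-CN"))
print(to_extlang_form("en-US"))
```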
+
+ Example: The language tag "en-a-aaa-b-ccc-bbb-x-xyz" is in canonical
+ form, while "en-b-ccc-bbb-a-aaa-X-xyz" is well-formed and potentially
+ valid (extensions 'a' and 'b' are not defined as of the publication
+ of this document) but not in canonical form (the extensions are not
+ in alphabetical order).
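+
+ The extension ordering step can be sketched in Python as follows.
+ The sketch assumes a well-formed tag, treats everything from the 'x'
+ singleton onward as private use, and leaves the case of each subtag
+ unchanged. The function name is hypothetical.

```python
def order_extensions(tag: str) -> str:
    """Sort extension sequences by singleton, case-insensitively."""
    subtags = tag.split("-")
    # A tag that starts with a singleton is private use or grandfathered.
    if len(subtags[0]) == 1:
        return tag
    # Extensions begin at the first single-character subtag.
    start = next((i for i, s in enumerate(subtags) if len(s) == 1), None)
    if start is None:
        return tag
    head, rest = subtags[:start], subtags[start:]
    sequences, private_use = [], []
    i = 0
    while i < len(rest):
        if rest[i].lower() == "x":
            private_use = rest[i:]  # private use always stays last
            break
        j = i + 1
        while j < len(rest) and len(rest[j]) != 1:
            j += 1
        sequences.append(rest[i:j])  # one singleton plus its subtags
        i = j
    sequences.sort(key=lambda seq: seq[0].lower())
    ordered = head + [s for seq in sequences for s in seq] + private_use
    return "-".join(ordered)


print(order_extensions("en-b-ccc-bbb-a-aaa-X-xyz"))
```

+ Only the order of whole extension sequences changes; the order of
+ the subtags inside each extension is left to that extension's own
+ specification.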
+
+ Example: Although the tag "en-BU" (English as used in Burma)
+ maintains its validity, the language tag "en-BU" is not in canonical
+ form because the 'BU' subtag has a canonical mapping to 'MM'
+ (Myanmar).
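+
+ The 'Preferred-Value' replacements in Steps 2 and 3 can be sketched
+ in Python as follows. The tables are hypothetical excerpts limited to
+ mappings named in this document; a real implementation would load
+ them from the IANA registry. The sketch handles whole tags, an
+ extlang in second position, and region subtags only.

```python
# Hypothetical excerpts of registry 'Preferred-Value' data.
TAG_PREFERRED = {
    "i-klingon": "tlh",
    "no-nyn": "nn",
    "zh-hakka": "hak",
    "art-lojban": "jbo",
}
EXTLANG_PREFERRED = {("zh", "hak"): "hak"}  # (Prefix, extlang) -> value
REGION_PREFERRED = {"bu": "MM"}


def apply_preferred_values(tag: str) -> str:
    """Replace deprecated tags and subtags with their preferred values."""
    whole = TAG_PREFERRED.get(tag.lower())
    if whole is not None:
        return whole  # Step 2: grandfathered or redundant tag
    subtags = tag.split("-")
    if len(subtags[0]) == 1:
        return tag  # private use or unmapped grandfathered tag
    if len(subtags) >= 2:
        key = (subtags[0].lower(), subtags[1].lower())
        if key in EXTLANG_PREFERRED:
            # Step 3: the extlang's value replaces the primary subtag too.
            subtags[:2] = [EXTLANG_PREFERRED[key]]
    for i in range(1, len(subtags)):
        if len(subtags[i]) == 1:
            break  # extensions and private use are left untouched
        if len(subtags[i]) == 2:
            # A two-letter subtag after the first position is a region.
            subtags[i] = REGION_PREFERRED.get(subtags[i].lower(), subtags[i])
    return "-".join(subtags)


print(apply_preferred_values("en-BU"))
print(apply_preferred_values("zh-hak-CN"))
```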
+
+ Canonicalization of language tags does not imply anything about the
+ use of upper- or lowercase letters when processing or comparing
+ subtags (as described in Section 2.1). All comparisons MUST be
+ performed in a case-insensitive manner.
+
+ When performing canonicalization of language tags, processors MAY
+ regularize the case of the subtags (that is, this process is
+ OPTIONAL), following the case used in the registry (see
+ Section 2.1.1).
+
+
+
+
+
+Phillips & Davis Best Current Practice [Page 67]
+
+RFC 5646 Language Tags September 2009
+
+
+ If more than one variant appears within a tag, processors MAY reorder
+ the variants to obtain better matching behavior or more consistent
+ presentation. Reordering of the variants SHOULD follow the
+ recommendations for variant ordering in Section 4.1.
+
+ If the field 'Deprecated' appears in a registry record without an
+ accompanying 'Preferred-Value' field, then that tag or subtag is
+ deprecated without a replacement. These values are canonical when
+ they appear in a language tag. However, tags that include these
+ values SHOULD NOT be selected by users or generated by
+ implementations.
+
+ An extension MUST define any relationships that exist between the
+ various subtags in the extension and thus MAY define an alternate
+ canonicalization scheme for the extension's subtags. Extensions MAY
+ define how the order of the extension's subtags is interpreted. For
+ example, an extension could define that its subtags are in canonical
+ order when the subtags are placed into ASCII order: that is, "en-a-
+ aaa-bbb-ccc" instead of "en-a-ccc-bbb-aaa". Another extension might
+ define that the order of the subtags influences their semantic
+ meaning (so that "en-b-ccc-bbb-aaa" has a different value from "en-b-
+ aaa-bbb-ccc"). However, extension specifications SHOULD be designed
+ so that they are tolerant of the typical processes described in
+ Section 3.7.
+
+4.6. Considerations for Private Use Subtags
+
+ Private use subtags, like all other subtags, MUST conform to the
+ format and content constraints in the ABNF. Private use subtags have
+ no meaning outside the private agreement between the parties that
+ intend to use or exchange language tags that employ them. The same
+ subtags MAY be used with a different meaning under a separate private
+ agreement. They SHOULD NOT be used where alternatives exist and
+ SHOULD NOT be used in content or protocols intended for general use.
+
+ Private use subtags are simply useless for information exchange
+ without prior arrangement. The value and semantic meaning of private
+ use tags and of the subtags used within such a language tag are not
+ defined by this document.
+
+ Private use sequences introduced by the 'x' singleton are completely
+ opaque to users or implementations outside of the private use
+ agreement. So, in addition to private use subtag sequences
+ introduced by the singleton subtag 'x', the Language Subtag Registry
+ provides private use language, script, and region subtags derived
+ from the private use codes assigned by the underlying standards.
+ These subtags are valid for use in forming language tags; they are
+ RECOMMENDED over the 'x' singleton private use subtag sequences
+
+
+
+Phillips & Davis Best Current Practice [Page 68]
+
+RFC 5646 Language Tags September 2009
+
+
+ because they convey more information via their linkage to the
+ language tag's inherent structure.
+
+ For example, the region subtags 'AA', 'ZZ', and those in the ranges
+ 'QM'-'QZ' and 'XA'-'XZ' (derived from the ISO 3166-1 private use
+ codes) can be used to form a language tag. A tag such as
+ "zh-Hans-XQ" conveys a great deal of public, interchangeable
+ information about the language material (that it is Chinese in the
+ simplified Chinese script and is suitable for some geographic region
+ 'XQ'). While the precise geographic region is not known outside of
+ private agreement, the tag conveys far more information than an
+ opaque tag such as "x-somelang" or even "zh-Hans-x-xq" (where the
+ 'xq' subtag's meaning is entirely opaque).
+
+ However, in some cases content tagged with private use subtags can
+ interact with other systems in a different and possibly unsuitable
+ manner compared to tags that use opaque, privately defined subtags,
+ so the choice of the best approach sometimes depends on the
+ particular domain in question.
+
+5. IANA Considerations
+
+ This section deals with the processes and requirements necessary for
+ IANA to maintain the subtag and extension registries as defined by
+ this document and in accordance with the requirements of [RFC5226].
+
+ The impact on the IANA maintainers of the two registries defined by
+ this document will be a small increase in the frequency of new
+ entries or updates. IANA also is required to create a new mailing
+ list (described below in Section 5.1) to announce registry changes
+ and updates.
+
+5.1. Language Subtag Registry
+
+ IANA updated the registry using instructions and content provided in
+ a companion document [RFC5645]. The criteria and process for
+ selecting the updated set of records are described in that document.
+ The updated set of records represents no impact on IANA, since the
+ work to create it will be performed externally.
+
+ Future work on the Language Subtag Registry includes the following
+ activities:
+
+ o Inserting or replacing whole records. These records are
+ preformatted for IANA by the Language Subtag Reviewer, as
+ described in Section 3.3.
+
+ o Archiving and making publicly available the registration forms.
+
+
+
+Phillips & Davis Best Current Practice [Page 69]
+
+RFC 5646 Language Tags September 2009
+
+
+ o Announcing each updated version of the registry on the
+ "ietf-languages-announcements@iana.org" mailing list.
+
+ Each registration form sent to IANA contains a single record for
+ incorporation into the registry. The form will be sent to
+ <iana@iana.org> by the Language Subtag Reviewer. It will have a
+ subject line indicating whether the enclosed form represents an
+ insertion of a new record (indicated by the word "INSERT" in the
+ subject line) or a replacement of an existing record (indicated by
+ the word "MODIFY" in the subject line). At no time can a record be
+ deleted from the registry.
+
+ IANA will extract the record from the form and place the inserted or
+ modified record into the appropriate section of the Language Subtag
+ Registry, grouping the records by their 'Type' field. Inserted
+ records can be placed anywhere within the appropriate section; there
+ is no guarantee that the registry's records will be placed in any
+ particular order except that they will always be grouped by 'Type'.
+ Modified records overwrite the record they replace.
+
+ Whenever an entry is created or modified in the registry, the 'File-
+ Date' record at the start of the registry is updated to reflect the
+ most recent modification date. The date format SHALL be the "full-
+ date" format of [RFC3339]. The date SHALL be the date on which that
+ version of the registry was first published by IANA. There SHALL be
+ at most one version of the registry published in a day. A 'File-
+ Date' record is also included in each request to IANA to insert or
+ modify records, indicating the acceptance date of the records in the
+ request.
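+
+ As a rough sketch of how a consumer might read the 'File-Date'
+ record, the Python below parses the first line of a registry file
+ and compares it with the date of a local copy. The function name
+ parse_file_date and the sample text are made up for illustration;
+ only the "full-date" (YYYY-MM-DD) format comes from the text above.
+

```python
from datetime import date


def parse_file_date(registry_text: str) -> date:
    """Return the date carried by the leading 'File-Date' record."""
    first_line = registry_text.lstrip().splitlines()[0]
    name, _, value = first_line.partition(":")
    if name.strip() != "File-Date":
        raise ValueError("registry does not start with a File-Date record")
    # RFC 3339 "full-date" is YYYY-MM-DD, which fromisoformat accepts.
    return date.fromisoformat(value.strip())


# Made-up sample; not an actual registry excerpt.
sample = "File-Date: 2009-09-01\n%%\nType: language\nSubtag: de\n"
local_copy_date = date(2009, 7, 29)
print(parse_file_date(sample) > local_copy_date)  # True: the sample is newer
```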
+
+ The updated registry file MUST use the UTF-8 character encoding, and
+ IANA MUST check the registry file for proper encoding. Non-ASCII
+ characters can be sent to IANA by attaching the registration form to
+ the email message or by using various encodings in the mail message
+ body (UTF-8 is recommended). IANA will verify any unclear or
+ corrupted characters with the Language Subtag Reviewer prior to
+ posting the updated registry.
+
+ IANA will also archive and make publicly available from
+ http://www.iana.org each registration form. Note that multiple
+ registrations can pertain to the same record in the registry.
+
+ Developers who are dependent upon the Language Subtag Registry
+ sometimes would like to be informed of changes in the registry so
+ that they can update their implementations. When any change is made
+ to the Language Subtag Registry, IANA will send an announcement
+ message to <ietf-languages-announcements@iana.org> (a self-
+ subscribing list to which only IANA can post).
+
+
+
+Phillips & Davis Best Current Practice [Page 70]
+
+RFC 5646 Language Tags September 2009
+
+
+5.2. Extensions Registry
+
+ The Language Tag Extensions Registry can contain at most 35 records,
+ and thus changes to this registry are expected to be very infrequent.
+
+ Future work by IANA on the Language Tag Extensions Registry is
+ limited to two cases. First, the IESG MAY request that new records
+ be inserted into this registry from time to time. These requests
+ MUST include the record to insert in the exact format described in
+ Section 3.7. In addition, there MAY be occasional requests from the
+ maintaining authority for a specific extension to update the contact
+ information or URLs in the record. These requests MUST include the
+ complete, updated record. IANA is not responsible for validating the
+ information provided, only that it is properly formatted. IANA
+ SHOULD take reasonable steps to ascertain that the request comes from
+ the maintaining authority named in the record present in the
+ registry.
+
+6. Security Considerations
+
+ Language tags used in content negotiation, like any other information
+ exchanged on the Internet, might be a source of concern because they
+ might be used to infer the nationality of the sender, and thus
+ identify potential targets for surveillance.
+
+ This is a special case of the general problem that anything sent is
+ visible to the receiving party and possibly to third parties as well.
+ It is useful to be aware that such concerns can exist in some cases.
+
+ The evaluation of the exact magnitude of the threat, and any possible
+ countermeasures, is left to each application protocol (see BCP 72
+ [RFC3552] for best current practice guidance on security threats and
+ defenses).
+
+ The language tag associated with a particular information item is of
+ no consequence whatsoever in determining whether that content might
+ contain possible homographs. The fact that a text is tagged as being
+ in one language or using a particular script subtag provides no
+ assurance whatsoever that it does not contain characters from scripts
+ other than the one(s) associated with or specified by that language
+ tag.
+
+ Since there is no limit to the number of variant, private use, and
+ extension subtags, and consequently no limit on the possible length
+ of a tag, implementations need to guard against buffer overflow
+ attacks. See Section 4.4 for details on language tag truncation,
+ which can occur as a consequence of defenses against buffer overflow.
+
+
+
+
+Phillips & Davis Best Current Practice [Page 71]
+
+RFC 5646 Language Tags September 2009
+
+
+ To prevent denial-of-service attacks, applications SHOULD NOT depend
+ on either the Language Subtag Registry or the Language Tag Extensions
+ Registry being always accessible. Additionally, although the
+ specification of valid subtags for an extension (see Section 3.7)
+ MUST be available over the Internet, implementations SHOULD NOT
+ mechanically depend on those sources being always accessible.
+
+ The registries specified in this document are not suitable for
+ frequent or real-time access to, or retrieval of, the full registry
+ contents. Most applications do not need registry data at all. For
+ others, being able to validate or canonicalize language tags as of a
+ particular registry date will be sufficient, as the registry contents
+ change only occasionally. Changes are announced to
+ <ietf-languages-announcements@iana.org>. This mailing list is
+ intended for interested organizations and individuals, not for bulk
+ subscription to trigger automatic software updates. The size of the
+ registry makes it unsuitable for automatic software updates.
+ Implementers considering integrating the Language Subtag Registry in
+ an automatic updating scheme are strongly advised to distribute only
+ suitably encoded differences, and only via their own infrastructure
+ -- not directly from IANA.
+
+ Changes, or the absence thereof, can also easily be detected by
+ looking at the 'File-Date' record at the start of the registry, or by
+ using features of the protocol used for downloading, without having
+ to download the full registry. At the time of publication of this
+ document, IANA is making the Language Subtag Registry available over
+ HTTP 1.1. The proper way to update a local copy of the Language
+ Subtag Registry using HTTP 1.1 is to use a conditional GET [RFC2616].
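+
+ A conditional GET of this kind might be built as in the Python
+ sketch below, using only the standard library. The URL is a
+ placeholder and the helper name build_conditional_get is
+ hypothetical. The request is constructed but not sent; a server
+ that answers 304 (Not Modified) is indicating that the local copy
+ is still current.
+

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Placeholder location; substitute the URL actually used for the registry.
REGISTRY_URL = "http://www.example.org/language-subtag-registry"


def build_conditional_get(url: str, last_fetched: datetime) -> Request:
    """Build a GET request asking for the body only if the resource
    changed after `last_fetched` (a timezone-aware datetime)."""
    as_utc = last_fetched.astimezone(timezone.utc)
    http_date = format_datetime(as_utc, usegmt=True)  # HTTP-date in GMT
    return Request(url, headers={"If-Modified-Since": http_date})


request = build_conditional_get(
    REGISTRY_URL, datetime(2009, 9, 1, tzinfo=timezone.utc)
)
print(request.get_method())
# urllib stores header names capitalized as "If-modified-since".
print(request.get_header("If-modified-since"))
```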
+
+7. Character Set Considerations
+
+ The syntax in this document requires that language tags use only the
+ characters A-Z, a-z, 0-9, and HYPHEN-MINUS, which are present in most
+ character sets, so the composition of language tags shouldn't have
+ any character set issues.
+
+ The rendering of text based on the language tag is not addressed
+ here. Historically, some processes have relied on the use of
+ character set/encoding information (or other external information) in
+ order to infer how a specific string of characters should be
+ rendered. Notably, this applies to language- and culture-specific
+ variations of Han ideographs as used in Japanese, Chinese, and
+ Korean, where use of, for example, a Japanese character encoding such
+ as EUC-JP implies that the text itself is in Japanese. When language
+ tags are applied to spans of text, rendering engines might be able to
+ use that information to better select fonts or make other rendering
+
+
+
+
+Phillips & Davis Best Current Practice [Page 72]
+
+RFC 5646 Language Tags September 2009
+
+
+ choices, particularly where languages with distinct writing
+ traditions use the same characters.
+
+8. Changes from RFC 4646
+
+ The main goal for this revision of RFC 4646 was to incorporate two
+ new parts of ISO 639 (ISO 639-3 and ISO 639-5) and their attendant
+ sets of language codes into the IANA Language Subtag Registry. This
+ permits the identification of many more languages and language
+ collections than previously supported.
+
+ The specific changes in this document to meet these goals are:
+
+ o Defined the incorporation of ISO 639-3 and ISO 639-5 codes for use
+ as primary and extended language subtags. It also permanently
+ reserves and disallows the use of additional 'extlang' subtags.
+ The changes necessary to achieve this were:
+
+ * Modified the ABNF comments.
+
+ * Updated various registration and stability requirements
+ sections to reference ISO 639-3 and ISO 639-5 in addition to
+ ISO 639-1 and ISO 639-2.
+
+ * Edited the text to eliminate references to extended language
+ subtags where they are no longer used.
+
+ * Explained the change in the section on extended language
+ subtags.
+
+ o Changed the ABNF related to grandfathered tags. The irregular
+ tags are now listed. Well-formed grandfathered tags are now
+ described by the 'langtag' production, and the 'grandfathered'
+ production was removed as a result. Also: added description of
+ both types of grandfathered tags to Section 2.2.8.
+
+ o Added the paragraph on "collections" to Section 4.1.
+
+ o Changed the capitalization rules for 'Tag' fields in Section 3.1.
+
+ o Split Section 3.1 up into subsections.
+
+ o Modified Section 3.5 to allow 'Suppress-Script' fields to be
+ added, modified, or removed via the registration process. This
+ was an erratum from RFC 4646.
+
+ o Modified examples that used region code 'CS' (formerly Serbia and
+ Montenegro) to use 'RS' (Serbia) instead.
+
+
+
+Phillips & Davis Best Current Practice [Page 73]
+
+RFC 5646 Language Tags September 2009
+
+
+ o Modified the rules for creating and maintaining record
+ 'Description' fields to prevent duplicates, including inverted
+ duplicates.
+
+ o Removed the lengthy description of why RFC 4646 was created from
+ this section, which also caused the removal of the reference to
+ XML Schema.
+
+ o Modified the text in Section 2.1 to place more emphasis on the
+ fact that language tags are not case sensitive.
+
+ o Replaced the example "fr-Latn-CA" in Section 2.1 with "sr-Latn-RS"
+ and "az-Arab-IR" because "fr-Latn-CA" doesn't respect the
+ 'Suppress-Script' on 'Latn' with 'fr'.
+
+ o Changed the requirements for well-formedness to make singleton
+ repetition checking optional (it is required for validity
+ checking) in Section 2.2.9.
+
+ o Changed the text in Section 2.2.9 referring to grandfathered
+ checking to note that the list is now included in the ABNF.
+
+ o Modified and added text to Section 3.2. The job description was
+ placed first. A note was added making clear that the Language
+ Subtag Reviewer may delegate various non-critical duties,
+ including list moderation. Finally, additional text was added to
+ make the appointment process clear and to clarify that decisions
+ and performance of the reviewer are appealable.
+
+ o Added text to Section 3.5 clarifying that the
+ ietf-languages@iana.org list is operated by whomever the IESG
+ appoints.
+
+ o Added text to Section 3.1.5 clarifying that the first Description
+ in a 'language' record matches the corresponding Reference Name
+ for the language in ISO 639-3.
+
+ o Modified Section 2.2.9 to define classes of conformance related to
+ specific tags (formerly 'well-formed' and 'valid' referred to
+ implementations). Notes were added about the removal of 'extlang'
+ from the ABNF provided in RFC 4646, allowing for well-formedness
+ using this older definition. Reference to RFC 3066 well-
+ formedness was also added.
+
+ o Added text to the end of Section 3.1.2 noting that future versions
+ of this document might add new field types to the registry format
+ and recommending that implementations ignore any unrecognized
+ fields.
+
+
+
+Phillips & Davis Best Current Practice [Page 74]
+
+RFC 5646 Language Tags September 2009
+
+
+ o Added text about what the lack of a 'Suppress-Script' field means
+ in a record to Section 3.1.9.
+
+ o Added text allowing the correction of misspellings and typographic
+ errors to Section 3.1.5.
+
+ o Added text to Section 3.1.8 disallowing 'Prefix' field conflicts
+ (such as circular prefix references).
+
+ o Modified text in Section 3.5 to require the subtag reviewer to
+ announce his/her decision (or extension) following the two-week
+ period. Also clarified that any decision or failure to decide can
+ be appealed.
+
+ o Modified text in Section 4.1 to include the (heretofore anecdotal)
+ guiding principle of tag choice, and to clarify the non-use of
+ script subtags in non-written applications.
+
+ o Prohibited multiple use of the same variant in a tag (i.e., "de-
+ 1901-1901"). Previously, this was only a recommendation
+ ("SHOULD").
+
+ o Removed inappropriate [RFC2119] language from the illustration in
+ Section 4.4.1.
+
+ o Replaced the example of deprecating "zh-guoyu" with "zh-
+ hakka"->"hak" in Section 4.5, noting that it was this document
+ that caused the change.
+
+ o Replaced the section in Section 4.1 dealing with "mul"/"und" to
+ include the subtags 'zxx' and 'mis', as well as the tag
+ "i-default". A normative reference to RFC 2277 was added.
+
+ o Added text to Section 3.5 clarifying that any modifications of a
+ registration request must be sent to the <ietf-languages@iana.org>
+ list before submission to IANA.
+
+ o Changed the ABNF for the record-jar format from using the LWSP
+ production to use a folding whitespace production similar to obs-
+ FWS in [RFC5234]. This effectively prevents unintentional blank
+ lines inside a field.
+
+ o Clarified and revised text in Sections 3.3, 3.5, and 5.1 to
+ clarify that the Language Subtag Reviewer sends the complete
+ registration forms to IANA, that IANA extracts the record from the
+ form, and that the forms must also be archived separately from the
+ registry.
+
+
+
+
+Phillips & Davis Best Current Practice [Page 75]
+
+RFC 5646 Language Tags September 2009
+
+
+ o Added text to Section 5 requiring IANA to send an announcement to
+ an ietf-languages-announcements list whenever the registry is
+ updated.
+
+ o Modified the registry to use UTF-8 as its character
+ encoding. This also entails additional instructions to IANA and
+ the Language Subtag Reviewer in the registration process.
+
+ o Modified the rules in Section 2.2.4 so that "exceptionally
+ reserved" ISO 3166-1 codes other than 'UK' were included into the
+ registry. In particular, this allows the code 'EU' (European
+ Union) to be used to form language tags or (more commonly) for
+ applications that use the registry for region codes to reference
+ this subtag.
+
+ o Modified the IANA considerations section (Section 5) to remove
+ unnecessary normative [RFC2119] language.
+
+9. References
+
+9.1. Normative References
+
+ [ISO15924] International Organization for Standardization, "ISO
+ 15924:2004. Information and documentation -- Codes
+ for the representation of names of scripts",
+ January 2004.
+
+ [ISO3166-1] International Organization for Standardization, "ISO
+ 3166-1:2006. Codes for the representation of names
+ of countries and their subdivisions -- Part 1:
+ Country codes", November 2006.
+
+ [ISO639-1] International Organization for Standardization, "ISO
+ 639-1:2002. Codes for the representation of names
+ of languages -- Part 1: Alpha-2 code", July 2002.
+
+ [ISO639-2] International Organization for Standardization, "ISO
+ 639-2:1998. Codes for the representation of names
+ of languages -- Part 2: Alpha-3 code", October 1998.
+
+ [ISO639-3] International Organization for Standardization, "ISO
+ 639-3:2007. Codes for the representation of names
+ of languages -- Part 3: Alpha-3 code for
+ comprehensive coverage of languages", February 2007.
+
+
+
+
+
+
+
+Phillips & Davis Best Current Practice [Page 76]
+
+RFC 5646 Language Tags September 2009
+
+
+ [ISO639-5] International Organization for Standardization, "ISO
+ 639-5:2008. Codes for the representation of names of
+ languages -- Part 5: Alpha-3 code for language
+ families and groups", May 2008.
+
+ [ISO646] International Organization for Standardization,
+ "ISO/IEC 646:1991, Information technology -- ISO
+ 7-bit coded character set for information
+ interchange.", 1991.
+
+ [RFC2026] Bradner, S., "The Internet Standards Process --
+ Revision 3", BCP 9, RFC 2026, October 1996.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+ [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
+ Languages", BCP 18, RFC 2277, January 1998.
+
+ [RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the
+ Internet: Timestamps", RFC 3339, July 2002.
+
+ [RFC4647] Phillips, A. and M. Davis, "Matching of Language
+ Tags", BCP 47, RFC 4647, September 2006.
+
+ [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for
+ Writing an IANA Considerations Section in RFCs",
+ BCP 26, RFC 5226, May 2008.
+
+ [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for
+ Syntax Specifications: ABNF", STD 68, RFC 5234,
+ January 2008.
+
+ [SpecialCasing] The Unicode Consortium, "Unicode Character Database,
+ Special Casing Properties", March 2008, <http://
+ unicode.org/Public/UNIDATA/SpecialCasing.txt>.
+
+ [UAX14] Freitag, A., "Unicode Standard Annex #14: Line
+ Breaking Properties", August 2006,
+ <http://www.unicode.org/reports/tr14/>.
+
+ [UN_M.49] Statistics Division, United Nations, "Standard
+ Country or Area Codes for Statistical Use", Revision
+ 4 (United Nations publication, Sales No. 98.XVII.9),
+ June 1999.
+
+
+
+
+
+
+Phillips & Davis Best Current Practice [Page 77]
+
+RFC 5646 Language Tags September 2009
+
+
+ [Unicode] Unicode Consortium, "The Unicode Consortium. The
+ Unicode Standard, Version 5.0, (Boston, MA, Addison-
+ Wesley, 2003. ISBN 0-321-49081-0)", January 2007.
+
+9.2. Informative References
+
+ [CLDR] "The Common Locale Data Repository Project",
+ <http://cldr.unicode.org>.
+
+ [RFC1766] Alvestrand, H., "Tags for the Identification of
+ Languages", RFC 1766, March 1995.
+
+ [RFC2028] Hovey, R. and S. Bradner, "The Organizations
+ Involved in the IETF Standards Process", BCP 11,
+ RFC 2028, October 1996.
+
+ [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet
+ Mail Extensions (MIME) Part Two: Media Types",
+ RFC 2046, November 1996.
+
+ [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail
+ Extensions) Part Three: Message Header Extensions
+ for Non-ASCII Text", RFC 2047, November 1996.
+
+ [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and
+ Encoded Word Extensions:
+ Character Sets, Languages, and Continuations",
+ RFC 2231, November 1997.
+
+ [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
+ Masinter, L., Leach, P., and T. Berners-Lee,
+ "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616,
+ June 1999.
+
+ [RFC2781] Hoffman, P. and F. Yergeau, "UTF-16, an encoding of
+ ISO 10646", RFC 2781, February 2000.
+
+ [RFC3066] Alvestrand, H., "Tags for the Identification of
+ Languages", RFC 3066, January 2001.
+
+ [RFC3282] Alvestrand, H., "Content Language Headers",
+ RFC 3282, May 2002.
+
+ [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing
+ RFC Text on Security Considerations", BCP 72,
+ RFC 3552, July 2003.
+
+
+
+
+
+Phillips & Davis Best Current Practice [Page 78]
+
+RFC 5646 Language Tags September 2009
+
+
+ [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
+ 10646", STD 63, RFC 3629, November 2003.
+
+ [RFC4645] Ewell, D., "Initial Language Subtag Registry",
+ RFC 4645, September 2006.
+
+ [RFC4646] Phillips, A. and M. Davis, "Tags for Identifying
+ Languages", BCP 47, RFC 4646, September 2006.
+
+ [RFC5645] Ewell, D., Ed., "Update to the Language Subtag
+ Registry", RFC 5645, September 2009.
+
+ [UTS35] Davis, M., "Unicode Technical Standard #35: Locale
+ Data Markup Language (LDML)", December 2007,
+ <http://www.unicode.org/reports/tr35/>.
+
+ [iso639.prin] ISO 639 Joint Advisory Committee, "ISO 639 Joint
+ Advisory Committee: Working principles for ISO 639
+ maintenance", March 2000, <http://www.loc.gov/
+ standards/iso639-2/iso639jac_n3r.html>.
+
+ [record-jar] Raymond, E., "The Art of Unix Programming", 2003,
+ <urn:isbn:0-13-142901-9>.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Phillips & Davis Best Current Practice [Page 79]
+
+RFC 5646 Language Tags September 2009
+
+
+Appendix A. Examples of Language Tags (Informative)
+
+ Simple language subtag:
+
+ de (German)
+
+ fr (French)
+
+ ja (Japanese)
+
+ i-enochian (example of a grandfathered tag)
+
+ Language subtag plus Script subtag:
+
+ zh-Hant (Chinese written using the Traditional Chinese script)
+
+ zh-Hans (Chinese written using the Simplified Chinese script)
+
+ sr-Cyrl (Serbian written using the Cyrillic script)
+
+ sr-Latn (Serbian written using the Latin script)
+
+ Extended language subtags and their primary language subtag
+ counterparts:
+
+ zh-cmn-Hans-CN (Chinese, Mandarin, Simplified script, as used in
+ China)
+
+ cmn-Hans-CN (Mandarin Chinese, Simplified script, as used in
+ China)
+
+ zh-yue-HK (Chinese, Cantonese, as used in Hong Kong SAR)
+
+ yue-HK (Cantonese Chinese, as used in Hong Kong SAR)
+
+ Language-Script-Region:
+
+ zh-Hans-CN (Chinese written using the Simplified script as used in
+ mainland China)
+
+ sr-Latn-RS (Serbian written using the Latin script as used in
+ Serbia)
+
+
+
+
+
+
+
+
+
+Phillips & Davis Best Current Practice [Page 80]
+
+RFC 5646 Language Tags September 2009
+
+
+ Language-Variant:
+
+ sl-rozaj (Resian dialect of Slovenian)
+
+ sl-rozaj-biske (San Giorgio dialect of Resian dialect of
+ Slovenian)
+
+ sl-nedis (Nadiza dialect of Slovenian)
+
+ Language-Region-Variant:
+
+ de-CH-1901 (German as used in Switzerland using the 1901 variant
+ [orthography])
+
+ sl-IT-nedis (Slovenian as used in Italy, Nadiza dialect)
+
+ Language-Script-Region-Variant:
+
+ hy-Latn-IT-arevela (Eastern Armenian written in Latin script, as
+ used in Italy)
+
+ Language-Region:
+
+ de-DE (German for Germany)
+
+ en-US (English as used in the United States)
+
+ es-419 (Spanish appropriate for the Latin America and Caribbean
+ region using the UN region code)
+
+ Private use subtags:
+
+ de-CH-x-phonebk
+
+ az-Arab-x-AZE-derbend
+
+ Private use registry values:
+
+ x-whatever (private use using the singleton 'x')
+
+ qaa-Qaaa-QM-x-southern (all private tags)
+
+ de-Qaaa (German, with a private script)
+
+ sr-Latn-QM (Serbian, Latin script, private region)
+
+ sr-Qaaa-RS (Serbian, private script, for Serbia)
+
+
+
+
+Phillips & Davis Best Current Practice [Page 81]
+
+RFC 5646 Language Tags September 2009
+
+
+ Tags that use extensions (examples ONLY -- extensions MUST be defined
+ by revision or update to this document, or by RFC):
+
+ en-US-u-islamcal
+
+ zh-CN-a-myext-x-private
+
+ en-a-myext-b-another
+
+ Some Invalid Tags:
+
+ de-419-DE (two region tags)
+
+ a-DE (use of a single-character subtag in primary position; note
+ that there are a few grandfathered tags that start with "i-" that
+ are valid)
+
+ ar-a-aaa-b-bbb-a-ccc (two extensions with same single-letter
+ prefix)
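+
+ The last invalid example (a repeated extension singleton) can be
+ detected with a check like the Python sketch below. It is a
+ simplified illustration, not a validator: the function name
+ has_repeated_singleton is hypothetical, the tag is assumed to be
+ otherwise well-formed, and no registry lookups are performed.
+

```python
def has_repeated_singleton(tag: str) -> bool:
    """Report whether any extension singleton occurs more than once."""
    seen = set()
    subtags = tag.lower().split("-")  # compare case-insensitively
    for subtag in subtags[1:]:  # the primary position is not a singleton
        if len(subtag) != 1:
            continue
        if subtag == "x":
            break  # everything after 'x' is private use
        if subtag in seen:
            return True
        seen.add(subtag)
    return False


print(has_repeated_singleton("ar-a-aaa-b-bbb-a-ccc"))  # True
print(has_repeated_singleton("en-a-myext-b-another"))  # False
```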
+
+Appendix B. Examples of Registration Forms
+
+ LANGUAGE SUBTAG REGISTRATION FORM
+
+ 1. Name of requester: Han Steenwijk
+ 2. E-mail address of requester: han.steenwijk @ unipd.it
+ 3. Record Requested:
+
+ Type: variant
+ Subtag: biske
+ Description: The San Giorgio dialect of Resian
+ Description: The Bila dialect of Resian
+ Prefix: sl-rozaj
+ Comments: The dialect of San Giorgio/Bila is one of the
+ four major local dialects of Resian
+
+ 4. Intended meaning of the subtag:
+
+ The local variety of Resian as spoken in San Giorgio/Bila
+
+ 5. Reference to published description of the language (book or
+ article):
+
+ -- Jan I.N. Baudouin de Courtenay - Opyt fonetiki rez'janskich
+ govorov, Varsava - Peterburg: Vende - Kozancikov, 1875.
+
+
+
+
+
+
+Phillips & Davis Best Current Practice [Page 82]
+
+RFC 5646 Language Tags September 2009
+
+
+ LANGUAGE SUBTAG REGISTRATION FORM
+
+ 1. Name of requester: Jaska Zedlik
+ 2. E-mail address of requester: jz53 @ zedlik.com
+ 3. Record Requested:
+
+ Type: variant
+ Subtag: tarask
+ Description: Belarusian in Taraskievica orthography
+ Prefix: be
+ Comments: The subtag represents Branislau Taraskievic's Belarusian
+ orthography as published in "Bielaruski klasycny pravapis" by
+ Juras Buslakou, Vincuk Viacorka, Zmicier Sanko, and Zmicier Sauka
+ (Vilnia-Miensk 2005).
+
+ 4. Intended meaning of the subtag:
+
+ The subtag is intended to represent the Belarusian orthography as
+ published in "Bielaruski klasycny pravapis" by Juras Buslakou, Vincuk
+ Viacorka, Zmicier Sanko, and Zmicier Sauka (Vilnia-Miensk 2005).
+
+ 5. Reference to published description of the language (book or
+ article):
+
+ Taraskievic, Branislau. Bielaruskaja gramatyka dla skol. Vilnia: Vyd.
+ "Bielaruskaha kamitetu", 1929, 5th edition.
+
+ Buslakou, Juras; Viacorka, Vincuk; Sanko, Zmicier; Sauka, Zmicier.
+ Bielaruski klasycny pravapis. Vilnia-Miensk, 2005.
+
+ 6. Any other relevant information:
+
+ Belarusian in Taraskievica orthography became widely used, especially
+ in Belarusian-speaking Internet segment, but besides this some books
+ and newspapers are also printed using this orthography of Belarusian.
+
+Appendix C. Acknowledgements
+
+ Any list of contributors is bound to be incomplete; please regard the
+ following as only a selection from the group of people who have
+ contributed to make this document what it is today.
+
+ The contributors to RFC 4646, RFC 4647, RFC 3066, and RFC 1766, the
+ precursors of this document, made enormous contributions directly or
+ indirectly to this document and are generally responsible for the
+ success of language tags.
+
+
+
+
+
+Phillips & Davis Best Current Practice [Page 83]
+
+RFC 5646 Language Tags September 2009
+
+
+ The following people contributed to this document:
+
+ Stephane Bortzmeyer, Karen Broome, Peter Constable, John Cowan,
+ Martin Duerst, Frank Ellerman, Doug Ewell, Deborah Garside, Marion
+ Gunn, Alfred Hoenes, Kent Karlsson, Chris Newman, Randy Presuhn,
+ Stephen Silver, Shawn Steele, and many, many others.
+
+ Very special thanks must go to Harald Tveit Alvestrand, who
+ originated RFCs 1766 and 3066, and without whom this document would
+ not have been possible.
+
+ Special thanks go to Michael Everson, who served as the Language Tag
+ Reviewer for almost the entire RFC 1766/RFC 3066 period, as well as
+ the Language Subtag Reviewer since the adoption of RFC 4646.
+
+ Special thanks also go to Doug Ewell, for his production of the first
+ complete subtag registry, his work to support and maintain new
+ registrations, and his careful editorship of both RFC 4645 and
+ [RFC5645].
+
+Authors' Addresses
+
+ Addison Phillips (editor)
+ Lab126
+
+ EMail: addison@inter-locale.com
+ URI: http://www.inter-locale.com
+
+
+ Mark Davis (editor)
+ Google
+
+ EMail: markdavis@google.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Phillips & Davis Best Current Practice [Page 84]
+