summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc5890.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc5890.txt')
-rw-r--r--doc/rfc/rfc5890.txt1291
1 files changed, 1291 insertions, 0 deletions
diff --git a/doc/rfc/rfc5890.txt b/doc/rfc/rfc5890.txt
new file mode 100644
index 0000000..8ca47e2
--- /dev/null
+++ b/doc/rfc/rfc5890.txt
@@ -0,0 +1,1291 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) J. Klensin
+Request for Comments: 5890 August 2010
+Obsoletes: 3490
+Category: Standards Track
+ISSN: 2070-1721
+
+
+ Internationalized Domain Names for Applications (IDNA):
+ Definitions and Document Framework
+
+Abstract
+
+ This document is one of a collection that, together, describe the
+ protocol and usage context for a revision of Internationalized Domain
+ Names for Applications (IDNA), superseding the earlier version. It
+ describes the document collection and provides definitions and other
+ material that are common to the set.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc5890.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Klensin Standards Track [Page 1]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+Copyright Notice
+
+ Copyright (c) 2010 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+ This document may contain material from IETF Documents or IETF
+ Contributions published or made publicly available before November
+ 10, 2008. The person(s) controlling the copyright in some of this
+ material may not have granted the IETF Trust the right to allow
+ modifications of such material outside the IETF Standards Process.
+ Without obtaining an adequate license from the person(s) controlling
+ the copyright in such materials, this document may not be modified
+ outside the IETF Standards Process, and derivative works of it may
+ not be created outside the IETF Standards Process, except to format
+ it for publication as an RFC or to translate it into languages other
+ than English.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Klensin Standards Track [Page 2]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
+ 1.1. IDNA2008 . . . . . . . . . . . . . . . . . . . . . . . . . 4
+ 1.1.1. Audiences . . . . . . . . . . . . . . . . . . . . . . 4
+ 1.1.2. Normative Language . . . . . . . . . . . . . . . . . . 5
+ 1.2. Road Map of IDNA2008 Documents . . . . . . . . . . . . . . 5
+ 2. Definitions and Terminology . . . . . . . . . . . . . . . . . 6
+ 2.1. Characters and Character Sets . . . . . . . . . . . . . . 6
+ 2.2. DNS-Related Terminology . . . . . . . . . . . . . . . . . 6
+ 2.3. Terminology Specific to IDNA . . . . . . . . . . . . . . . 7
+ 2.3.1. LDH Label . . . . . . . . . . . . . . . . . . . . . . 7
+ 2.3.2. Terms for IDN Label Codings . . . . . . . . . . . . . 11
+ 2.3.2.1. IDNA-valid strings, A-label, and U-label . . . . . 11
+ 2.3.2.2. NR-LDH Label . . . . . . . . . . . . . . . . . . . 13
+ 2.3.2.3. Internationalized Domain Name and
+ Internationalized Label . . . . . . . . . . . . . 13
+ 2.3.2.4. Label Equivalence . . . . . . . . . . . . . . . . 14
+ 2.3.2.5. ACE Prefix . . . . . . . . . . . . . . . . . . . . 14
+ 2.3.2.6. Domain Name Slot . . . . . . . . . . . . . . . . . 14
+ 2.3.3. Order of Characters in Labels . . . . . . . . . . . . 15
+ 2.3.4. Punycode is an Algorithm, Not a Name or Adjective . . 15
+ 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16
+ 4. Security Considerations . . . . . . . . . . . . . . . . . . . 16
+ 4.1. General Issues . . . . . . . . . . . . . . . . . . . . . . 16
+ 4.2. U-label Lengths . . . . . . . . . . . . . . . . . . . . . 16
+ 4.3. Local Character Set Issues . . . . . . . . . . . . . . . . 17
+ 4.4. Visually Similar Characters . . . . . . . . . . . . . . . 17
+ 4.5. IDNA Lookup, Registration, and the Base DNS
+ Specifications . . . . . . . . . . . . . . . . . . . . . . 18
+ 4.6. Legacy IDN Label Strings . . . . . . . . . . . . . . . . . 18
+ 4.7. Security Differences from IDNA2003 . . . . . . . . . . . . 19
+ 4.8. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 20
+ 5. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 20
+ 6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
+ 6.1. Normative References . . . . . . . . . . . . . . . . . . . 20
+ 6.2. Informative References . . . . . . . . . . . . . . . . . . 21
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Klensin Standards Track [Page 3]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+1. Introduction
+
+1.1. IDNA2008
+
+ This document is one of a collection that, together, describe the
+ protocol and usage context for a revision of Internationalized Domain
+ Names for Applications (IDNA) that was largely completed in 2008,
+ known within the series and elsewhere as "IDNA2008". The series
+ replaces an earlier version of IDNA [RFC3490] [RFC3491]. For
+ convenience, that version of IDNA is referred to in these documents
+ as "IDNA2003". The newer version continues to use the Punycode
+ algorithm [RFC3492] and ACE (ASCII-compatible encoding) prefix from
+ that earlier version. The document collection is described in
+ Section 1.2. As indicated there, this document provides definitions
+ and other material that are common to the set.
+
+1.1.1. Audiences
+
+ While many IETF specifications are directed exclusively to protocol
+ implementers, the character of IDNA requires that it be understood
+ and properly used by those whose responsibilities include making
+ decisions about:
+
+ o what names are permitted in DNS zone files,
+
+ o policies related to names and naming, and
+
+ o the handling of domain name strings in files and systems, even
+ with no immediate intention of looking them up.
+
+ This document and those documents concerned with the protocol
+ definition, rules for handling strings that include characters
+ written right to left, and the actual list of characters and
+ categories will be of primary interest to protocol implementers.
+ This document and the one containing explanatory material will be of
+ primary interest to others, although they may have to fill in some
+ details by reference to other documents in the set.
+
+ This document and the associated ones are written from the
+ perspective of an IDNA-aware user, application, or implementation.
+ While they may reiterate fundamental DNS rules and requirements for
+ the convenience of the reader, they make no attempt to be
+ comprehensive about DNS principles and should not be considered as a
+ substitute for a thorough understanding of the DNS protocols and
+ specifications.
+
+
+
+
+
+
+Klensin Standards Track [Page 4]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+1.1.2. Normative Language
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [RFC2119].
+
+1.2. Road Map of IDNA2008 Documents
+
+ IDNA2008 consists of the following documents:
+
+ o This document, containing definitions and other material that are
+ needed for understanding other documents in the set. It is
+ referred to informally in other documents in the set as "Defs" or
+ "Definitions".
+
+ o A document, RFC 5894 [RFC5894], that provides an overview of the
+ protocol and associated tables together with explanatory material
+ and some rationale for the decisions that led to IDNA2008. That
+ document also contains advice for registry operations and those
+ who use Internationalized Domain Names (IDNs). It is referred to
+ informally in other documents in the set as "Rationale". It is
+ not normative.
+
+ o A document, RFC 5891 [RFC5891], that describes the core IDNA2008
+ protocol and its operations. In combination with the Bidi
+ document, described immediately below, it explicitly updates and
+ replaces RFC 3490. It is referred to informally in other
+ documents in the set as "Protocol".
+
+ o A document, RFC 5893 [RFC5893], that specifies special rules
+ (Bidi) for labels that contain characters that are written from
+ right to left.
+
+ o A specification, RFC 5892 [RFC5892], of the categories and rules
+ that identify the code points allowed in a label written in native
+ character form (defined more specifically as a "U-label" in
+ Section 2.3.2.1 below), based on Unicode 5.2 [Unicode52] code
+ point assignments and additional rules unique to IDNA2008. The
+ Unicode-based rules are expected to be stable across Unicode
+ updates and hence independent of Unicode versions. That
+ specification obsoletes RFC 3941 and IDN use of the tables to
+ which it refers. It is referred to informally in other documents
+ in the set as "Tables".
+
+
+
+
+
+
+
+
+Klensin Standards Track [Page 5]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ o A document [IDNA2008-Mapping] that discusses the issue of mapping
+ characters into other characters and that provides guidance for
+ doing so when that is appropriate. That document, referred to
+ informally as "Mapping", provides advice; it is not a required
+ part of IDNA.
+
+2. Definitions and Terminology
+
+2.1. Characters and Character Sets
+
+ A code point is an integer value in the codespace of a coded
+ character set. In Unicode, these are integers from 0 to 0x10FFFF.
+
+ Unicode [Unicode52] is a coded character set containing somewhat over
+ 100,000 characters assigned to code points as of version 5.2. A
+ single Unicode code point is denoted in these documents by "U+"
+ followed by four to six hexadecimal digits, while a range of Unicode
+ code points is denoted by two four to six digit hexadecimal numbers
+ separated by "..", with no prefixes.
+
+ ASCII means US-ASCII [ASCII], a coded character set containing 128
+ characters associated with code points in the range 0000..007F.
+ Unicode is a superset of ASCII and may be thought of as a
+ generalization of it; it includes all the ASCII characters and
+ associates them with the equivalent code points.
+
+ "Letters" are, informally, generalizations from the ASCII and
+ common-sense understanding of that term, i.e., characters that are
+ used to write text and that are not digits, symbols, or punctuation.
+ Formally, they are characters with a Unicode General Category value
+ starting in "L" (see Section 4.5 of The Unicode Standard
+ [Unicode52]).
+
+2.2. DNS-Related Terminology
+
+ When discussing the DNS, this document generally assumes the
+ terminology used in the DNS specifications [RFC1034] [RFC1035] as
+ subsequently modified [RFC1123] [RFC2181]. The term "lookup" is used
+ to describe the combination of operations performed by the IDNA2008
+ protocol and those actually performed by a DNS resolver. The process
+ of placing an entry into the DNS is referred to as "registration".
+ This is similar to common contemporary usage of that term in other
+ contexts. Consequently, any DNS zone administration is described as
+ a "registry", and the terms "registry" and "zone administrator" are
+ used interchangeably, regardless of the actual administrative
+ arrangements or level in the DNS tree. More details about that
+ relationship are included in the Rationale document.
+
+
+
+
+Klensin Standards Track [Page 6]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ The term "LDH code point" is defined in this document to refer to the
+ code points associated with ASCII letters (Unicode code points
+ 0041..005A and 0061..007A), digits (0030..0039), and the hyphen-minus
+ (U+002D). "LDH" is an abbreviation for "letters, digits, hyphen" but
+ is used specifically in this document to refer to the set of naming
+ rules described in Section 2.3.1 below.
+
+ The base DNS specifications [RFC1034] [RFC1035] discuss "domain
+ names" and "hostnames", but many people use the terms
+ interchangeably, as do sections of these specifications. Lack of
+ clarity about that terminology has contributed to confusion about
+ intent in some cases. These documents generally use the term "domain
+ name". When they refer to, e.g., hostname syntax restrictions, they
+ explicitly cite the relevant defining documents. The remaining
+ definitions in this subsection are essentially a review: if there is
+ any perceived difference between those definitions and the
+ definitions in the base DNS documents or those cited below, the
+ definitions in the other documents take precedence.
+
+ A label is an individual component of a domain name. Labels are
+ usually shown separated by dots; for example, the domain name
+ "www.example.com" is composed of three labels: "www", "example", and
+ "com". (The complete name convention using a trailing dot described
+ in RFC 1123 [RFC1123], which can be explicit as in "www.example.com."
+ or implicit as in "www.example.com", is not considered in this
+ specification.) IDNA extends the set of usable characters in labels
+ that are treated as text (as distinct from the binary string labels
+ discussed in RFC 1035 and RFC 2181 [RFC2181] and bitstring ones
+ [RFC2673]), but only in certain contexts. The different contexts for
+ different sets of usable characters are outlined in the next section.
+ For the rest of this document and in the related ones, the term
+ "label" is shorthand for "text label", and "every label" means "every
+ text label", including the expanded context.
+
+2.3. Terminology Specific to IDNA
+
+ This section defines some terminology to reduce dependence on terms
+ and definitions that have been problematic in the past. The
+ relationships among these definitions are illustrated in Figure 1 and
+ Figure 2. In the first of those figures, the parenthesized numbers
+ refer to the notes below the figure.
+
+2.3.1. LDH Label
+
+ This is the classical label form used, albeit with some additional
+ restrictions, in hostnames [RFC0952]. Its syntax is identical to
+ that described as the "preferred name syntax" in Section 3.5 of RFC
+ 1034 [RFC1034] as modified by RFC 1123 [RFC1123]. Briefly, it is a
+
+
+
+Klensin Standards Track [Page 7]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ string consisting of ASCII letters, digits, and the hyphen with the
+ further restriction that the hyphen cannot appear at the beginning or
+ end of the string. Like all DNS labels, its total length must not
+ exceed 63 octets.
+
+ LDH labels include the specialized labels used by IDNA (described as
+ "A-labels" below) and some additional restricted forms (also
+ described below).
+
+ To facilitate clear description, two new subsets of LDH labels are
+ created by the introduction of IDNA. These are called Reserved LDH
+ labels (R-LDH labels) and Non-Reserved LDH labels (NR-LDH labels).
+ Reserved LDH labels, known as "tagged domain names" in some other
+ contexts, have the property that they contain "--" in the third and
+ fourth characters but which otherwise conform to LDH label rules.
+ Only a subset of the R-LDH labels can be used in IDNA-aware
+ applications. That subset consists of the class of labels that begin
+ with the prefix "xn--" (case independent), but otherwise conform to
+ the rules for LDH labels. That subset is called "XN-labels" in this
+ set of documents. XN-labels are further divided into those whose
+ remaining characters (after the "xn--") are valid output of the
+ Punycode algorithm [RFC3492] and those that are not (see below). The
+ XN-labels that are valid Punycode output are known as "A-labels" if
+ they also meet the other criteria for IDNA-validity described below.
+ Because LDH labels (and, indeed, any DNS label) must not be more than
+ 63 octets in length, the portion of an XN-label derived from the
+ Punycode algorithm is limited to no more than 59 ASCII characters.
+ Non-Reserved LDH labels are the set of valid LDH labels that do not
+ have "--" in the third and fourth positions.
+
+ A consequence of the restrictions on valid characters in the native
+ Unicode character form (see U-labels) turns out to be that mixed-case
+ annotation, of the sort outlined in Appendix A of RFC 3492 [RFC3492],
+ is never useful. Therefore, since a valid A-label is the result of
+ Punycode encoding of a U-label, A-labels should be produced only in
+ lowercase, despite matching other (mixed-case or uppercase) potential
+ labels in the DNS.
+
+ Some strings that are prefixed with "xn--" to form labels may not be
+ the output of the Punycode algorithm, may fail the other tests
+ outlined below, or may violate other IDNA restrictions and thus are
+ also not valid IDNA labels. They are called "Fake A-labels" for
+ convenience.
+
+ Labels within the class of R-LDH labels that are not prefixed with
+ "xn--" are also not valid IDNA labels. To allow for future use of
+ mechanisms similar to IDNA, those labels MUST NOT be processed as
+
+
+
+
+Klensin Standards Track [Page 8]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ ordinary LDH labels by IDNA-conforming programs and SHOULD NOT be
+ mixed with IDNA labels in the same zone.
+
+ These distinctions among possible LDH labels are only of significance
+ for software that is IDNA-aware or for future extensions that use
+ extensions based on the same "prefix and encoding" model. For
+ IDNA-aware systems, the valid label types are: A-labels, U-labels,
+ and NR-LDH labels.
+
+ IDNA labels come in two flavors: an ACE-encoded form and a Unicode
+ (native character) form. These are referred to as A-labels and
+ U-labels, respectively, and are described in detail in the next
+ section.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Klensin Standards Track [Page 9]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ ASCII Label
+ __________________________________________________________________
+ | |
+ | ____________________ LDH Label (1) (4) ________________ |
+ | | ___________________________________ | |
+ | | |IDN Reserved LDH Labels | | |
+ | | | ("??--") or R-LDH Labels | _______________ | |
+ | | | _______________________________ | |NON-RESERVED | | |
+ | | | | XN-labels | | | LDH Labels | | |
+ | | | | _____________ ___________ | | | (NR-LDH | | |
+ | | | | | A-labels | | Fake (3) || | | labels) | | |
+ | | | | | "xn--"(2) | | A-labels || | |_____________| | |
+ | | | | |___________| |__________|| | | |
+ | | | |_____________________________| | | |
+ | | |_________________________________| | |
+ | |_______________________________________________________| |
+ | |
+ | _____________NON-LDH label________ |
+ | | ______________________ | |
+ | | | Underscore labels | | |
+ | | | e.g., _tcp | | |
+ | | |____________________| | |
+ | | | Labels with leading| | |
+ | | | or trailing | | |
+ | | | hyphens "-abcd" | | |
+ | | | or "xyz-" | | |
+ | | | or "-uvw-" | | |
+ | | |____________________| | |
+ | | | Labels with other | | |
+ | | | non-LDH ASCII chars| | |
+ | | | e.g., #$%_ | | |
+ | | |____________________| | |
+ | |________________________________| |
+ |________________________________________________________________|
+
+ (1) ASCII letters (uppercase and lowercase), digits,
+ hyphen. Hyphen may not appear in first or last
+ position. No more than 63 octets.
+ (2) Note that the string following "xn--" must
+ be the valid output of the Punycode algorithm
+ and must be convertible into valid U-label form.
+ (3) Note that a Fake A-label has a prefix "xn--"
+ but the remainder of the label is NOT the valid
+ output of the Punycode algorithm.
+ (4) LDH label subtypes are indistinguishable to
+ applications that are not IDNA-aware.
+
+ Figure 1: IDNA and Related DNS Terminology Space -- ASCII Labels
+
+
+
+Klensin Standards Track [Page 10]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ __________________________
+ | Non-ASCII |
+ | |
+ | ___________________ |
+ | | U-label (5) | |
+ | |_________________| |
+ | | | |
+ | | Binary Label | |
+ | | (including | |
+ | | high bit on) | |
+ | |_________________| |
+ | | | |
+ | | Bit String | |
+ | | Label | |
+ | |_________________| |
+ |________________________|
+
+ (5) To applications that are not IDNA-aware, U-labels
+ are indistinguishable from Binary ones.
+
+ Figure 2: Non-ASCII Labels
+
+2.3.2. Terms for IDN Label Codings
+
+2.3.2.1. IDNA-valid strings, A-label, and U-label
+
+ For IDNA-aware applications, the three types of valid labels are
+ "A-labels", "U-labels", and "NR-LDH labels", each of which is defined
+ below. The relationships among them are illustrated in Figure 1 and
+ Figure 2.
+
+ o A string is "IDNA-valid" if it meets all of the requirements of
+ these specifications for an IDNA label. IDNA-valid strings may
+ appear in either of the two forms defined immediately below, or
+ may be drawn from the NR-LDH label subset. IDNA-valid strings
+ must also conform to all basic DNS requirements for labels. These
+ documents make specific reference to the form appropriate to any
+ context in which the distinction is important.
+
+ o An "A-label" is the ASCII-Compatible Encoding (ACE, see
+ Section 2.3.2.5) form of an IDNA-valid string. It must be a
+ complete label: IDNA is defined for labels, not for parts of them
+ and not for complete domain names. This means, by definition,
+ that every A-label will begin with the IDNA ACE prefix, "xn--"
+ (see Section 2.3.2.5), followed by a string that is a valid output
+ of the Punycode algorithm [RFC3492] and hence a maximum of 59
+ ASCII characters in length. The prefix and string together must
+ conform to all requirements for a label that can be stored in the
+
+
+
+Klensin Standards Track [Page 11]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ DNS including conformance to the rules for LDH labels
+ (Section 2.3.1). If and only if a string meeting the above
+ requirements can be decoded into a U-label is it an A-label.
+
+ o A "U-label" is an IDNA-valid string of Unicode characters, in
+ Normalization Form C (NFC) and including at least one non-ASCII
+ character, expressed in a standard Unicode Encoding Form (such as
+ UTF-8). It is also subject to the constraints about permitted
+ characters that are specified in Section 4.2 of the Protocol
+ document and the rules in the Sections 2 and 3 of the Tables
+ document, the Bidi constraints in that document if it contains any
+ character from scripts that are written right to left, and the
+ symmetry constraint described immediately below. Conversions
+ between U-labels and A-labels are performed according to the
+ "Punycode" specification [RFC3492], adding or removing the ACE
+ prefix as needed.
+
+ To be valid, U-labels and A-labels must obey an important symmetry
+ constraint. While that constraint may be tested in any of several
+ ways, an A-label A1 must be capable of being produced by conversion
+ from a U-label U1, and that U-label U1 must be capable of being
+ produced by conversion from A-label A1. Among other things, this
+ implies that both U-labels and A-labels must be strings in Unicode
+ NFC [Unicode-UAX15] normalized form. These strings MUST contain only
+ characters specified elsewhere in this document series, and only in
+ the contexts indicated as appropriate.
+
+ Any rules or conventions that apply to DNS labels in general apply to
+ whichever of the U-label or A-label would be more restrictive. There
+ are two exceptions to this principle. First, the restriction to
+ ASCII characters does not apply to the U-label. Second, expansion of
+ the A-label form to a U-label may produce strings that are much
+ longer than the normal 63 octet DNS limit (potentially up to 252
+ characters) due to the compression efficiency of the Punycode
+ algorithm. Such extended-length U-labels are valid from the
+ standpoint of IDNA, but caution should be exercised as shorter limits
+ may be imposed by some applications.
+
+ For context, applications that are not IDNA-aware treat all LDH
+ labels as valid for appearance in DNS zone files and queries and some
+ of them may permit additional types of labels (i.e., not impose the
+ LDH restriction). IDNA-aware applications permit only A-labels and
+ NR-LDH labels to appear in zone files and queries. U-labels can
+ appear, along with the other two, in presentation and user interface
+ forms, and in protocols that use IDNA forms but that do not involve
+ the DNS itself.
+
+
+
+
+
+Klensin Standards Track [Page 12]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ Specifically, for IDNA-aware applications and contexts, the three
+ allowed categories are A-label, U-label, and NR-LDH label. Of the
+ Reserved LDH labels (R-LDH labels) only A-labels are valid for IDNA
+ use.
+
+ Strings that appear to be A-labels or U-labels are processed in
+ various operations of the Protocol document [RFC5891]. Those strings
+ are not yet demonstrably conformant with the conditions outlined
+ above because they are in the process of validation. Such strings
+ may be referred to as "unvalidated", "putative", or "apparent", or as
+ being "in the form of" one of the label types to indicate that they
+ have not been verified to meet the specified conformance
+ requirements.
+
+ Unvalidated A-labels are known only to be XN-labels, while Fake
+ A-labels have been demonstrated to fail some of the A-label tests.
+ Similarly, unvalidated U-labels are simply non-ASCII labels that may
+ or may not meet the requirements for U-labels.
+
+2.3.2.2. NR-LDH Label
+
+ These specifications use the term "NR-LDH label" strictly to refer to
+ an all-ASCII label that obeys the LDH label syntax discussed in
+ Section 2.3.1 and that is neither an IDN nor a label form reserved by
+ IDNA (R-LDH label). It should be stressed that all A-labels obey the
+ "hostname" [RFC0952] rules other than the length restriction in those
+ rules.
+
+2.3.2.3. Internationalized Domain Name and Internationalized Label
+
+ An "internationalized domain name" (IDN) is a domain name that
+ contains at least one A-label or U-label, but that otherwise may
+ contain any mixture of NR-LDH labels, A-labels, or U-labels. Just as
+ has been the case with ASCII names, some DNS zone administrators may
+ impose restrictions, beyond those imposed by DNS or IDNA, on the
+ characters or strings that may be registered as labels in their
+ zones. Because of the diversity of characters that can be used in a
+ U-label and the confusion they might cause, such restrictions are
+ mandatory for IDN registries and zones even though the particular
+ restrictions are not part of these specifications (the issue is
+ discussed in more detail in Section 4.3 of the Protocol document
+ [RFC5891]. Because these restrictions, commonly known as "registry
+ restrictions", only affect what can be registered and not lookup
+ processing, they have no effect on the syntax or semantics of DNS
+ protocol messages; a query for a name that matches no records will
+ yield the same response regardless of the reason why it is not in the
+ zone. Clients issuing queries or interpreting responses cannot be
+
+
+
+
+Klensin Standards Track [Page 13]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ assumed to have any knowledge of zone-specific restrictions or
+ conventions. See the section on registration policy in the Rationale
+ document [RFC5894] for additional discussion.
+
+ "Internationalized label" is used when a term is needed to refer to a
+ single label of an IDN, i.e., one that might be any of an NR-LDH
+ label, A-label, or U-label. There are some standardized DNS label
+ formats, such as the "underscore labels" used for service location
+ (SRV) records [RFC2782], that do not fall into any of the three
+ categories and hence are not internationalized labels.
+
+2.3.2.4. Label Equivalence
+
+ In IDNA, equivalence of labels is defined in terms of the A-labels.
+ If the A-labels are equal in a case-independent comparison, then the
+ labels are considered equivalent, no matter how they are represented.
+ Because of the isomorphism of A-labels and U-labels in IDNA2008, it
+ is possible to compare U-labels directly; see the Protocol document
+ [RFC5891] for details. Traditional LDH labels already have a notion
+ of equivalence: within that list of characters, uppercase and
+ lowercase are considered equivalent. The IDNA notion of equivalence
+ is an extension of that older notion but, because the protocol does
+ not specify any mandatory mapping and only those isomorphic forms are
+ considered, the only equivalents are:
+
+ o Exact (bit-string identity) matches between a pair of U-labels.
+
+ o Matches between a pair of A-labels, using normal DNS
+ case-insensitive matching rules.
+
+ o Equivalence between a U-label and an A-label determined by
+ translating the U-label form into an A-label form and then testing
+ for a match between the A-labels using normal DNS case-insensitive
+ matching rules.
+
+2.3.2.5. ACE Prefix
+
+ The "ACE prefix" is defined in this document to be a string of ASCII
+ characters, "xn--", that appears at the beginning of every A-label.
+ "ACE" stands for "ASCII-Compatible Encoding".
+
+2.3.2.6. Domain Name Slot
+
+ A "domain name slot" is defined in this document to be a protocol
+ element or a function argument or a return value (and so on)
+ explicitly designated for carrying a domain name. Examples of domain
+ name slots include the QNAME field of a DNS query; the name argument
+ of the gethostbyname() or getaddrinfo() standard C library functions;
+
+
+
+Klensin Standards Track [Page 14]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ the part of an email address following the at sign ("@") in the
+ parameter to the SMTP MAIL or RCPT commands or the "From:" field of
+ an email message header; and the host portion of the URI in the "src"
+ attribute of an HTML "<IMG>" tag. A string that has the syntax of a
+ domain name but that appears in general text is not in a domain name
+ slot. For example, a domain name appearing in the plain text body of
+ an email message is not occupying a domain name slot.
+
+ An "IDNA-aware domain name slot" is defined for this set of documents
+ to be a domain name slot explicitly designated for carrying an
+ internationalized domain name as defined in this document. The
+ designation may be static (for example, in the specification of the
+ protocol or interface) or dynamic (for example, as a result of
+ negotiation in an interactive session).
+
+ Name slots that are not IDNA-aware obviously include any domain name
+ slot whose specification predates IDNA. Note that the requirements
+ of some protocols that use the DNS for data storage prevent the use
+ of IDNs. For example, the format required for the underscore labels
+ used by the service location protocol [RFC2782] precludes
+ representation of a non-ASCII label in the DNS using A-labels because
+ those SRV-related labels must start with underscores. Of course,
+ non-ASCII IDN labels may be part of a domain name that also includes
+ underscore labels.
+
+2.3.3. Order of Characters in Labels
+
+ Because IDN labels may contain characters that are read, and
+ preferentially displayed, from right to left, there is a potential
+ ambiguity about which character in a label is "first". For the
+ purposes of these specifications, labels are considered, and
+ characters numbered, strictly in the order in which they appear "on
+ the wire". That order is equivalent to the leftmost character being
+ treated as first in a label that is read left to right and to the
+ rightmost character being first in a label that is read right to
+ left. The Bidi specification contains additional discussion of the
+ conditions that influence reading order.
+
+2.3.4. Punycode is an Algorithm, Not a Name or Adjective
+
+ There has been some confusion about whether a "Punycode string" does
+ or does not include the ACE prefix and about whether it is required
+ that such strings could have been the output of the ToASCII operation
+ (see RFC 3490, Section 4 [RFC3490]). This specification discourages
+ the use of the term "Punycode" to describe anything but the encoding
+ method and algorithm of RFC 3492 [RFC3492]. The terms defined above
+ are preferred as much more clear than the term "Punycode string".
+
+
+
+
+Klensin Standards Track [Page 15]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+3. IANA Considerations
+
+ IANA actions for this version of IDNA (IDNA2008) are specified in the
+ Tables document [RFC5892]. An overview of the relationships among
+ the various IANA registries appears in the Rationale document
+ [RFC5894]. This document does not specify any actions for IANA.
+
+4. Security Considerations
+
+4.1. General Issues
+
+ Security on the Internet partly relies on the DNS. Thus, any change
+ to the characteristics of the DNS can change the security of much of
+ the Internet.
+
+ Domain names are used by users to identify and connect to Internet
+ hosts and other network resources. The security of the Internet is
+ compromised if a user entering a single internationalized name is
+ connected to different servers based on different interpretations of
+ the internationalized domain name. In addition to characters that
+ are permitted by IDNA2003 and its mapping conventions (see
+ Section 4.6), the current specification changes the interpretation of
+ a few characters that were mapped to others in the earlier version;
+ zone administrators should be aware of the problems that this might
+ raise and take appropriate measures. The context for this issue is
+ discussed in more detail in the Rationale document [RFC5894].
+
+ In addition to the Security Considerations material that appears in
+ this document, the Bidi document [RFC5893] contains a discussion of
+ security issues specific to labels containing characters from scripts
+ that are normally written right to left.
+
+4.2. U-label Lengths
+
+ Labels associated with the DNS have traditionally been limited to 63
+ octets by the general restrictions in RFC 1035 and by the need to
+ treat them as a six-bit string length followed by the string in
+ actual calls to the DNS. That format is used in some other
+ applications and, in general, that representations of domain names as
+ dot-separated labels and as length-string pairs have been treated as
+ interchangeable. Because A-labels (the form actually used in the
+ DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
+ in general, more compressed that UTF-16 or UTF-32), U-labels that
+ obey all of the relevant symmetry (and other) constraints of these
+ documents may be quite a bit longer, potentially up to 252 characters
+ (Unicode code points). A fully-qualified domain name containing
+ several such labels can obviously also exceed the nominal 255 octet
+
+
+
+
+Klensin Standards Track [Page 16]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ limit for such names. Application authors using U-labels must exert
+ due caution to avoid buffer overflow and truncation errors and
+ attacks in contexts where shorter strings are expected.
+
+4.3. Local Character Set Issues
+
+ When systems use local character sets other than ASCII and Unicode,
+ these specifications leave the problem of converting between the
+ local character set and Unicode up to the application or local
+ system. If different applications (or different versions of one
+ application) implement different rules for conversions among coded
+ character sets, they could interpret the same name differently and
+ contact different servers. This problem is not solved by security
+ protocols, such as Transport Layer Security (TLS) [RFC5246], that do
+ not take local character sets into account.
+
+4.4. Visually Similar Characters
+
+ To help prevent confusion between characters that are visually
+ similar (sometimes called "confusables"), it is suggested that
+ implementations provide visual indications where a domain name
+ contains multiple scripts, especially when the scripts contain
+ characters that are easily confused visually, such as an omicron in
+ Greek mixed with Latin text. Such mechanisms can also be used to
+ show when a name contains a mixture of Simplified Chinese characters
+ with Traditional ones that have Simplified forms, or to distinguish
+ zero and one from uppercase "O" and lowercase "L". DNS zone
+ administrators may impose restrictions (subject to the limitations
+ identified elsewhere in these documents) that try to minimize
+ characters that have similar appearance or similar interpretations.
+
+ If multiple characters appear in a label and the label consists only
+ of characters in one script, individual characters that might be
+ confused with others if compared separately may be unambiguous and
+ non-confusing. On the other hand, that observation makes labels
+ containing characters from more than one script (often called "mixed-
+ script labels") even more risky -- users will tend to see what they
+ expect to see and context is a powerful reinforcement to perception.
+ At the same time, while the risks associated with mixed-script labels
+ are clear, simply prohibiting them will not eliminate problems,
+ especially where closely related scripts are involved. For example,
+ there are many strings that are entirely in Greek or Cyrillic scripts
+ that can be confused with each other or with Latin script strings.
+
+ It is worth noting that there are no comprehensive technical
+ solutions to the problems of confusable characters. One can reduce
+ the extent of the problems in various ways, but probably never
+
+
+
+
+Klensin Standards Track [Page 17]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ eliminate it. Some specific suggestions about identification and
+ handling of confusable characters appear in a Unicode Consortium
+ publication [Unicode-UTR36].
+
+4.5. IDNA Lookup, Registration, and the Base DNS Specifications
+
+ The Protocol specification [RFC5891] describes procedures for
+ registering and looking up labels that are not compatible with the
+ preferred syntax described in the base DNS specifications (see
+ Section 2.3.1) because they contain non-ASCII characters. These
+ procedures depend on the use of a special ASCII-compatible encoding
+ form that contains only characters permitted in hostnames by those
+ earlier specifications. The encoding used is Punycode [RFC3492]. No
+ security issues such as string length increases or new allowed values
+ are introduced by the encoding process or the use of these encoded
+ values, apart from those introduced by the ACE encoding itself.
+
+ Domain names (or portions of them) are sometimes compared against a
+ set of domains to be given special treatment if a match occurs, e.g.,
+ treated as more privileged than others or blocked in some way. In
+ such situations, it is especially important that the comparisons be
+ done properly, as specified in the "Requirements" section of the
+ Protocol document [RFC5891]. For labels already in ASCII form, the
+ proper comparison reduces to the same case-insensitive ASCII
+ comparison that has always been used for ASCII labels although
+ IDNA-aware applications are expected to look up only A-labels and
+ NR-LDH labels, i.e., to avoid looking up R-LDH labels that are not
+ A-labels.
+
+ The introduction of IDNA meant that any existing labels that start
+ with the ACE prefix would be construed as A-labels, at least until
+ they failed one of the relevant tests, whether or not that was the
+ intent of the zone administrator or registrant. There is no evidence
+ that this has caused any practical problems since RFC 3490 was
+ adopted, but the risk still exists in principle.
+
+4.6. Legacy IDN Label Strings
+
+ The URI Standard [RFC3986] and a number of application specifications
+ (e.g., SMTP [RFC5321] and HTTP [RFC2616]) do not permit non-ASCII
+ labels in DNS names used with those protocols, i.e., only the A-label
+ form of IDNs is permitted in those contexts. If only A-labels are
+ used, differences in interpretation between IDNA2003 and this version
+ arise only for characters whose interpretation have actually changed
+ (e.g., characters, such as ZWJ and ZWNJ, that were mapped to nothing
+ in IDNA2003 and that are considered legitimate in some contexts by
+ these specifications). Despite that prohibition, there are a
+ significant number of files and databases on the Internet in which
+
+
+
+Klensin Standards Track [Page 18]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ domain name strings appear in native-character form; a subset of
+ those strings use native-character labels that require IDNA2003
+ mapping to produce valid A-labels. The treatment of such labels will
+ vary by types of applications and application-designer preference: in
+ some situations, warnings to the user or outright rejection may be
+ appropriate; in others, it may be preferable to attempt to apply the
+ earlier mappings if lookup strictly conformant to these
+ specifications fails or even to do lookups under both sets of rules.
+ This general situation is discussed in more detail in the Rationale
+ document [RFC5894]. However, in the absence of care by registries
+ about how strings that could have different interpretations under
+ IDNA2003 and the current specification are handled, it is possible
+ that the differences could be used as a component of name-matching or
+ name-confusion attacks. Such care is therefore appropriate.
+
+4.7. Security Differences from IDNA2003
+
+ The registration and lookup models described in this set of documents
+ change the mechanisms available for lookup applications to determine
+ the validity of labels they encounter. In some respects, the ability
+ to test is strengthened. For example, putative labels that contain
+ unassigned code points will now be rejected, while IDNA2003 permitted
+ them (see the Rationale document [RFC5894] for a discussion of the
+ reasons for this). On the other hand, the Protocol specification no
+ longer assumes that the application that looks up a name will be able
+ to determine, and apply, information about the protocol version used
+ in registration. In theory, that may increase risk since the
+ application will be able to do less pre-lookup validation. In
+ practice, the protection afforded by that test has been largely
+ illusory for reasons explained in RFC 4690 [RFC4690] and elsewhere in
+ these documents.
+
+ Any change to the Stringprep [RFC3454] procedure that is profiled and
+ used in IDNA2003, or, more broadly, the IETF's model of the use of
+ internationalized character strings in different protocols, creates
+ some risk of inadvertent changes to those protocols, invalidating
+ deployed applications or databases, and so on. But these
+ specifications do not change Stringprep at all; they merely bypass
+ it. Because these documents do not depend on Stringprep, the
+ question of upgrading other protocols that do have that dependency
+ can be left to experts on those protocols: the IDNA changes and
+ possible upgrades to security protocols or conventions are
+ independent issues.
+
+
+
+
+
+
+
+
+Klensin Standards Track [Page 19]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+4.8. Summary
+
+ No mechanism involving names or identifiers alone can protect against
+ a wide variety of security threats and attacks that are largely
+ independent of the naming or identification system. These attacks
+ include spoofed pages, DNS query trapping and diversion, and so on.
+
+5. Acknowledgments
+
+ The initial version of this document was created largely by
+ extracting text from early draft versions of the Rationale document
+ [RFC5894]. See the section of this name and the one entitled
+ "Contributors", in it.
+
+ Specific textual suggestions after the extraction process came from
+ Vint Cerf, Lisa Dusseault, Bill McQuillan, Andrew Sullivan, and Ken
+ Whistler. Other changes were made in response to more general
+ comments, lists of concerns or specific errors from participants in
+ the Working Group and other observers, including Lyman Chapin, James
+ Mitchell, Subramanian Moonesamy, and Dan Winship.
+
+6. References
+
+6.1. Normative References
+
+ [ASCII] American National Standards Institute (formerly United
+ States of America Standards Institute), "USA Code for
+ Information Interchange", ANSI X3.4-1968, 1968. ANSI
+ X3.4-1968 has been replaced by newer versions with
+ slight modifications, but the 1968 version remains
+ definitive for the Internet.
+
+ [RFC1034] Mockapetris, P., "Domain names - concepts and
+ facilities", STD 13, RFC 1034, November 1987.
+
+ [RFC1035] Mockapetris, P., "Domain names - implementation and
+ specification", STD 13, RFC 1035, November 1987.
+
+ [RFC1123] Braden, R., "Requirements for Internet Hosts -
+ Application and Support", STD 3, RFC 1123, October 1989.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+
+
+
+
+
+
+
+Klensin Standards Track [Page 20]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ [Unicode-UAX15]
+ The Unicode Consortium, "Unicode Standard Annex #15:
+ Unicode Normalization Forms, Revision 31",
+ September 2009,
+ <http://www.unicode.org/reports/tr15/tr15-31.html>.
+
+ [Unicode52] The Unicode Consortium. The Unicode Standard, Version
+ 5.2.0, defined by: "The Unicode Standard, Version
+ 5.2.0", (Mountain View, CA: The Unicode Consortium,
+ 2009. ISBN 978-1-936213-00-9).
+ <http://www.unicode.org/versions/Unicode5.2.0/>.
+
+6.2. Informative References
+
+ [IDNA2008-Mapping]
+ Resnick, P. and P. Hoffman, "Mapping Characters in
+ Internationalized Domain Names for Applications (IDNA)",
+ Work in Progress, April 2010.
+
+ [RFC0952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD
+ Internet host table specification", RFC 952,
+ October 1985.
+
+ [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
+ Specification", RFC 2181, July 1997.
+
+ [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
+ Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
+ Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
+
+ [RFC2673] Crawford, M., "Binary Labels in the Domain Name System",
+ RFC 2673, August 1999.
+
+ [RFC2782] Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
+ specifying the location of services (DNS SRV)",
+ RFC 2782, February 2000.
+
+ [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
+ Internationalized Strings ("stringprep")", RFC 3454,
+ December 2002.
+
+ [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
+ "Internationalizing Domain Names in Applications
+ (IDNA)", RFC 3490, March 2003.
+
+ [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
+ Profile for Internationalized Domain Names (IDN)",
+ RFC 3491, March 2003.
+
+
+
+Klensin Standards Track [Page 21]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+ [RFC3492] Costello, A., "Punycode: A Bootstring encoding of
+ Unicode for Internationalized Domain Names in
+ Applications (IDNA)", RFC 3492, March 2003.
+
+ [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
+ Resource Identifier (URI): Generic Syntax", STD 66,
+ RFC 3986, January 2005.
+
+ [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
+ and Recommendations for Internationalized Domain Names
+ (IDNs)", RFC 4690, September 2006.
+
+ [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer
+ Security (TLS) Protocol Version 1.2", RFC 5246,
+ August 2008.
+
+ [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
+ October 2008.
+
+ [RFC5891] Klensin, J., "Internationalized Domain Names in
+ Applications (IDNA): Protocol", RFC 5891, August 2010.
+
+ [RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and
+ Internationalized Domain Names for Applications (IDNA)",
+ RFC 5892, August 2010.
+
+ [RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
+ Internationalized Domain Names for Applications (IDNA)",
+ RFC 5893, August 2010.
+
+ [RFC5894] Klensin, J., "Internationalized Domain Names for
+ Applications (IDNA): Background, Explanation, and
+ Rationale", RFC 5894, August 2010.
+
+ [Unicode-UTR36]
+ The Unicode Consortium, "Unicode Technical Report #36:
+ Unicode Security Considerations, Revision 7", July 2008,
+ <http://www.unicode.org/reports/tr36/tr36-7.html>.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Klensin Standards Track [Page 22]
+
+RFC 5890 IDNA Definitions August 2010
+
+
+Author's Address
+
+ John C Klensin
+ 1770 Massachusetts Ave, Ste 322
+ Cambridge, MA 02140
+ USA
+
+ Phone: +1 617 245 1457
+ EMail: john+ietf@jck.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Klensin Standards Track [Page 23]
+