1 files changed, 1291 insertions, 0 deletions
diff --git a/doc/rfc/rfc5890.txt b/doc/rfc/rfc5890.txt
new file mode 100644
index 0000000..8ca47e2
--- /dev/null
+++ b/doc/rfc/rfc5890.txt
@@ -0,0 +1,1291 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF)                        J. Klensin
+Request for Comments: 5890                                   August 2010
+Obsoletes: 3490
+Category: Standards Track
+ISSN: 2070-1721
+
+
+        Internationalized Domain Names for Applications (IDNA):
+                   Definitions and Document Framework
+
+Abstract
+
+   This document is one of a collection that, together, describe the
+   protocol and usage context for a revision of Internationalized Domain
+   Names for Applications (IDNA), superseding the earlier version.  It
+   describes the document collection and provides definitions and other
+   material that are common to the set.
+
+Status of This Memo
+
+   This is an Internet Standards Track document.
+
+   This document is a product of the Internet Engineering Task Force
+   (IETF).  It represents the consensus of the IETF community.  It has
+   received public review and has been approved for publication by the
+   Internet Engineering Steering Group (IESG).  Further information on
+   Internet Standards is available in Section 2 of RFC 5741.
+
+   Information about the current status of this document, any errata,
+   and how to provide feedback on it may be obtained at
+   http://www.rfc-editor.org/info/rfc5890.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Klensin                      Standards Track                    [Page 1]
+
+RFC 5890                    IDNA Definitions                 August 2010
+
+
+Copyright Notice
+
+   Copyright (c) 2010 IETF Trust and the persons identified as the
+   document authors.  All rights reserved.
+
+   This document is subject to BCP 78 and the IETF Trust's Legal
+   Provisions Relating to IETF Documents
+   (http://trustee.ietf.org/license-info) in effect on the date of
+   publication of this document.  Please review these documents
+   carefully, as they describe your rights and restrictions with respect
+   to this document.  Code Components extracted from this document must
+   include Simplified BSD License text as described in Section 4.e of
+   the Trust Legal Provisions and are provided without warranty as
+   described in the Simplified BSD License.
+
+   This document may contain material from IETF Documents or IETF
+   Contributions published or made publicly available before November
+   10, 2008.  The person(s) controlling the copyright in some of this
+   material may not have granted the IETF Trust the right to allow
+   modifications of such material outside the IETF Standards Process.
+   Without obtaining an adequate license from the person(s) controlling
+   the copyright in such materials, this document may not be modified
+   outside the IETF Standards Process, and derivative works of it may
+   not be created outside the IETF Standards Process, except to format
+   it for publication as an RFC or to translate it into languages other
+   than English.
+
+Table of Contents
+
+   1.  Introduction
+     1.1.  IDNA2008
+       1.1.1.  Audiences
+       1.1.2.  Normative Language
+     1.2.  Road Map of IDNA2008 Documents
+   2.  Definitions and Terminology
+     2.1.  Characters and Character Sets
+     2.2.  DNS-Related Terminology
+     2.3.  Terminology Specific to IDNA
+       2.3.1.  LDH Label
+       2.3.2.  Terms for IDN Label Codings
+         2.3.2.1.  IDNA-valid strings, A-label, and U-label
+         2.3.2.2.  NR-LDH Label
+         2.3.2.3.  Internationalized Domain Name and
+                   Internationalized Label
+         2.3.2.4.  Label Equivalence
+         2.3.2.5.  ACE Prefix
+         2.3.2.6.  Domain Name Slot
+       2.3.3.  Order of Characters in Labels
+       2.3.4.  Punycode is an Algorithm, Not a Name or Adjective
+   3.  IANA Considerations
+   4.  Security Considerations
+     4.1.  General Issues
+     4.2.  U-label Lengths
+     4.3.  Local Character Set Issues
+     4.4.  Visually Similar Characters
+     4.5.  IDNA Lookup, Registration, and the Base DNS
+           Specifications
+     4.6.  Legacy IDN Label Strings
+     4.7.  Security Differences from IDNA2003
+     4.8.  Summary
+   5.  Acknowledgments
+   6.  References
+     6.1.  Normative References
+     6.2.  Informative References
+
+1.  Introduction
+
+1.1.  IDNA2008
+
+   This document is one of a collection that, together, describe the
+   protocol and usage context for a revision of Internationalized Domain
+   Names for Applications (IDNA) that was largely completed in 2008,
+   known within the series and elsewhere as "IDNA2008".  The series
+   replaces an earlier version of IDNA [RFC3490] [RFC3491].  For
+   convenience, that version of IDNA is referred to in these documents
+   as "IDNA2003".  The newer version continues to use the Punycode
+   algorithm [RFC3492] and ACE (ASCII-compatible encoding) prefix from
+   that earlier version.  The document collection is described in
+   Section 1.2.  As indicated there, this document provides definitions
+   and other material that are common to the set.
+
+1.1.1.  Audiences
+
+   While many IETF specifications are directed exclusively to protocol
+   implementers, the character of IDNA requires that it be understood
+   and properly used by those whose responsibilities include making
+   decisions about:
+
+   o  what names are permitted in DNS zone files,
+
+   o  policies related to names and naming, and
+
+   o  the handling of domain name strings in files and systems, even
+      with no immediate intention of looking them up.
+
+   This document and those documents concerned with the protocol
+   definition, rules for handling strings that include characters
+   written right to left, and the actual list of characters and
+   categories will be of primary interest to protocol implementers.
+   This document and the one containing explanatory material will be of
+   primary interest to others, although they may have to fill in some
+   details by reference to other documents in the set.
+
+   This document and the associated ones are written from the
+   perspective of an IDNA-aware user, application, or implementation.
+   While they may reiterate fundamental DNS rules and requirements for
+   the convenience of the reader, they make no attempt to be
+   comprehensive about DNS principles and should not be considered as a
+   substitute for a thorough understanding of the DNS protocols and
+   specifications.
+
+1.1.2.  Normative Language
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in RFC 2119 [RFC2119].
+
+1.2.  Road Map of IDNA2008 Documents
+
+   IDNA2008 consists of the following documents:
+
+   o  This document, containing definitions and other material that are
+      needed for understanding other documents in the set.  It is
+      referred to informally in other documents in the set as "Defs" or
+      "Definitions".
+
+   o  A document, RFC 5894 [RFC5894], that provides an overview of the
+      protocol and associated tables together with explanatory material
+      and some rationale for the decisions that led to IDNA2008.  That
+      document also contains advice for registry operations and those
+      who use Internationalized Domain Names (IDNs).  It is referred to
+      informally in other documents in the set as "Rationale".  It is
+      not normative.
+
+   o  A document, RFC 5891 [RFC5891], that describes the core IDNA2008
+      protocol and its operations.  In combination with the Bidi
+      document, described immediately below, it explicitly updates and
+      replaces RFC 3490.  It is referred to informally in other
+      documents in the set as "Protocol".
+
+   o  A document, RFC 5893 [RFC5893], that specifies special rules
+      (Bidi) for labels that contain characters that are written from
+      right to left.
+
+   o  A specification, RFC 5892 [RFC5892], of the categories and rules
+      that identify the code points allowed in a label written in native
+      character form (defined more specifically as a "U-label" in
+      Section 2.3.2.1 below), based on Unicode 5.2 [Unicode52] code
+      point assignments and additional rules unique to IDNA2008.  The
+      Unicode-based rules are expected to be stable across Unicode
+      updates and hence independent of Unicode versions.  That
+      specification obsoletes RFC 3491 and IDN use of the tables to
+      which it refers.  It is referred to informally in other documents
+      in the set as "Tables".
+
+   o  A document [IDNA2008-Mapping] that discusses the issue of mapping
+      characters into other characters and that provides guidance for
+      doing so when that is appropriate.  That document, referred to
+      informally as "Mapping", provides advice; it is not a required
+      part of IDNA.
+
+2.  Definitions and Terminology
+
+2.1.  Characters and Character Sets
+
+   A code point is an integer value in the codespace of a coded
+   character set.  In Unicode, these are integers from 0 to 0x10FFFF.
+
+   Unicode [Unicode52] is a coded character set containing somewhat over
+   100,000 characters assigned to code points as of version 5.2.  A
+   single Unicode code point is denoted in these documents by "U+"
+   followed by four to six hexadecimal digits, while a range of Unicode
+   code points is denoted by two hexadecimal numbers of four to six
+   digits each, separated by ".." and with no prefixes.
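+
+   As an illustration only, the following Python sketch renders code
+   points and ranges in this notation.  The function names are
+   hypothetical, and the uppercase hexadecimal digits follow the
+   examples in these documents rather than any stated rule.
+
+```python
+def format_code_point(cp: int) -> str:
+    """Render one code point as "U+" plus four to six hex digits."""
+    if not 0 <= cp <= 0x10FFFF:
+        raise ValueError("outside the Unicode codespace")
+    return f"U+{cp:04X}"
+
+
+def format_range(first: int, last: int) -> str:
+    """Render an inclusive range as XXXX..YYYY, with no prefixes."""
+    if not 0 <= first <= last <= 0x10FFFF:
+        raise ValueError("invalid code point range")
+    return f"{first:04X}..{last:04X}"
+
+
+print(format_code_point(0x2D))       # U+002D
+print(format_code_point(0x10FFFF))   # U+10FFFF
+print(format_range(0x0041, 0x005A))  # 0041..005A
+```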
+
+   ASCII means US-ASCII [ASCII], a coded character set containing 128
+   characters associated with code points in the range 0000..007F.
+   Unicode is a superset of ASCII and may be thought of as a
+   generalization of it; it includes all the ASCII characters and
+   associates them with the equivalent code points.
+
+   "Letters" are, informally, generalizations from the ASCII and
+   common-sense understanding of that term, i.e., characters that are
+   used to write text and that are not digits, symbols, or punctuation.
+   Formally, they are characters with a Unicode General Category value
+   starting in "L" (see Section 4.5 of The Unicode Standard
+   [Unicode52]).
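+
+   The formal definition can be tested with Python's standard
+   unicodedata module, as in the minimal sketch below (the function
+   name is hypothetical).  That module follows a later Unicode version
+   than 5.2, so results for characters assigned after 5.2 may differ
+   from those of [Unicode52].
+
+```python
+import unicodedata
+
+
+def is_letter(ch: str) -> bool:
+    """True if the General Category of ch starts with "L"."""
+    return unicodedata.category(ch).startswith("L")
+
+
+# a (Ll), e-acute (Ll), Hebrew alef (Lo), digit, hyphen-minus, dollar
+for ch in ["a", "\u00e9", "\u05d0", "7", "-", "$"]:
+    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_letter(ch))
+```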
+
+2.2.  DNS-Related Terminology
+
+   When discussing the DNS, this document generally assumes the
+   terminology used in the DNS specifications [RFC1034] [RFC1035] as
+   subsequently modified [RFC1123] [RFC2181].  The term "lookup" is used
+   to describe the combination of operations performed by the IDNA2008
+   protocol and those actually performed by a DNS resolver.  The process
+   of placing an entry into the DNS is referred to as "registration".
+   This is similar to common contemporary usage of that term in other
+   contexts.  Consequently, any DNS zone administration is described as
+   a "registry", and the terms "registry" and "zone administrator" are
+   used interchangeably, regardless of the actual administrative
+   arrangements or level in the DNS tree.  More details about that
+   relationship are included in the Rationale document.
+
+   The term "LDH code point" is defined in this document to refer to the
+   code points associated with ASCII letters (Unicode code points
+   0041..005A and 0061..007A), digits (0030..0039), and the hyphen-minus
+   (U+002D).  "LDH" is an abbreviation for "letters, digits, hyphen" but
+   is used specifically in this document to refer to the set of naming
+   rules described in Section 2.3.1 below.
+
+   The base DNS specifications [RFC1034] [RFC1035] discuss "domain
+   names" and "hostnames", but many people use the terms
+   interchangeably, as do sections of these specifications.  Lack of
+   clarity about that terminology has contributed to confusion about
+   intent in some cases.  These documents generally use the term "domain
+   name".  When they refer to, e.g., hostname syntax restrictions, they
+   explicitly cite the relevant defining documents.  The remaining
+   definitions in this subsection are essentially a review: if there is
+   any perceived difference between those definitions and the
+   definitions in the base DNS documents or those cited below, the
+   definitions in the other documents take precedence.
+
+   A label is an individual component of a domain name.  Labels are
+   usually shown separated by dots; for example, the domain name
+   "www.example.com" is composed of three labels: "www", "example", and
+   "com".  (The complete name convention using a trailing dot described
+   in RFC 1123 [RFC1123], which can be explicit as in "www.example.com."
+   or implicit as in "www.example.com", is not considered in this
+   specification.)  IDNA extends the set of usable characters in labels
+   that are treated as text (as distinct from the binary string labels
+   discussed in RFC 1035 and RFC 2181 [RFC2181] and bitstring ones
+   [RFC2673]), but only in certain contexts.  The different contexts for
+   different sets of usable characters are outlined in the next section.
+   For the rest of this document and in the related ones, the term
+   "label" is shorthand for "text label", and "every label" means "every
+   text label", including the expanded context.
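+
+   A hypothetical Python helper that splits a domain name into labels
+   in this sense might look like the sketch below.  It treats only the
+   ASCII full stop as a separator and drops a single trailing dot,
+   since the complete-name convention is outside the scope of IDNA.
+
+```python
+def split_labels(domain: str) -> list[str]:
+    """Split a domain name on "." after dropping one trailing dot."""
+    return domain.removesuffix(".").split(".")
+
+
+print(split_labels("www.example.com"))   # ['www', 'example', 'com']
+print(split_labels("www.example.com."))  # ['www', 'example', 'com']
+```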
+
+2.3.  Terminology Specific to IDNA
+
+   This section defines some terminology to reduce dependence on terms
+   and definitions that have been problematic in the past.  The
+   relationships among these definitions are illustrated in Figure 1 and
+   Figure 2.  In the first of those figures, the parenthesized numbers
+   refer to the notes below the figure.
+
+2.3.1.  LDH Label
+
+   This is the classical label form used, albeit with some additional
+   restrictions, in hostnames [RFC0952].  Its syntax is identical to
+   that described as the "preferred name syntax" in Section 3.5 of RFC
+   1034 [RFC1034] as modified by RFC 1123 [RFC1123].  Briefly, it is a
+   string consisting of ASCII letters, digits, and the hyphen with the
+   further restriction that the hyphen cannot appear at the beginning or
+   end of the string.  Like all DNS labels, its total length must not
+   exceed 63 octets.
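+
+   The LDH syntax summarized above can be expressed as a regular
+   expression.  The Python sketch below is an illustration, not a
+   pattern taken from the cited RFCs; it allows a leading digit, as RFC
+   1123 does, and counts characters, which for ASCII equal octets.
+
+```python
+import re
+
+# One label: letters, digits, hyphen; no hyphen first or last;
+# 1 to 63 characters in total.
+_LDH_LABEL = re.compile(
+    r"[A-Za-z0-9]"                         # first: letter or digit
+    r"(?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"  # rest: no trailing hyphen
+)
+
+
+def is_ldh_label(label: str) -> bool:
+    """Check one label against the LDH syntax summarized above."""
+    return _LDH_LABEL.fullmatch(label) is not None
+
+
+for label in ["www", "3com", "xn--bcher-kva", "-abcd", "xyz-", "_tcp"]:
+    print(repr(label), is_ldh_label(label))
+print(is_ldh_label("a" * 63), is_ldh_label("a" * 64))  # True False
+```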
+
+   LDH labels include the specialized labels used by IDNA (described as
+   "A-labels" below) and some additional restricted forms (also
+   described below).
+
+   To facilitate clear description, two new subsets of LDH labels are
+   created by the introduction of IDNA.  These are called Reserved LDH
+   labels (R-LDH labels) and Non-Reserved LDH labels (NR-LDH labels).
+   Reserved LDH labels, known as "tagged domain names" in some other
+   contexts, have the property that they contain "--" in the third and
+   fourth character positions but otherwise conform to LDH label rules.
+   Only a subset of the R-LDH labels can be used in IDNA-aware
+   applications.  That subset consists of the class of labels that begin
+   with the prefix "xn--" (case independent), but otherwise conform to
+   the rules for LDH labels.  That subset is called "XN-labels" in this
+   set of documents.  XN-labels are further divided into those whose
+   remaining characters (after the "xn--") are valid output of the
+   Punycode algorithm [RFC3492] and those that are not (see below).  The
+   XN-labels that are valid Punycode output are known as "A-labels" if
+   they also meet the other criteria for IDNA-validity described below.
+   Because LDH labels (and, indeed, any DNS label) must not be more than
+   63 octets in length, the portion of an XN-label derived from the
+   Punycode algorithm is limited to no more than 59 ASCII characters.
+   Non-Reserved LDH labels are the set of valid LDH labels that do not
+   have "--" in the third and fourth positions.
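+
+   The subsets of LDH labels defined above can be told apart
+   syntactically.  In the hypothetical Python sketch below, an
+   "XN-label" result says only that the "xn--" prefix is present
+   (compared case-independently); distinguishing an A-label from a
+   Fake A-label requires the Punycode and validity tests described in
+   Section 2.3.2.1.
+
+```python
+import re
+
+# LDH syntax: letters, digits, hyphen; no hyphen at either end; 1-63.
+_LDH = re.compile(
+    r"[A-Za-z0-9]"
+    r"(?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
+)
+
+
+def classify_ascii_label(label: str) -> str:
+    """Place an ASCII label in the label classes of this section."""
+    if not _LDH.fullmatch(label):
+        return "non-LDH"
+    if label[2:4] != "--":
+        return "NR-LDH"          # no "--" in positions three and four
+    if label[:4].lower() == "xn--":
+        return "XN-label"        # R-LDH with the IDNA prefix
+    return "other R-LDH"         # reserved; not usable by IDNA
+
+
+for label in ["example", "xn--bcher-kva", "XN--BCHER-KVA", "ab--cd",
+              "_tcp"]:
+    print(label, classify_ascii_label(label))
+```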
+
+   A consequence of the restrictions on valid characters in the native
+   Unicode character form (see U-labels) is that mixed-case annotation,
+   of the sort outlined in Appendix A of RFC 3492 [RFC3492], is never
+   useful.  Therefore, since a valid A-label is the result of Punycode
+   encoding of a U-label, A-labels should be produced only in
+   lowercase, even though DNS case-insensitive matching makes them
+   match mixed-case or uppercase forms of the same label.
+
+   Some strings that are prefixed with "xn--" to form labels may not be
+   the output of the Punycode algorithm, may fail the other tests
+   outlined below, or may violate other IDNA restrictions and thus are
+   also not valid IDNA labels.  They are called "Fake A-labels" for
+   convenience.
+
+   Labels within the class of R-LDH labels that are not prefixed with
+   "xn--" are also not valid IDNA labels.  To allow for future use of
+   mechanisms similar to IDNA, those labels MUST NOT be processed as
+
+   ordinary LDH labels by IDNA-conforming programs and SHOULD NOT be
+   mixed with IDNA labels in the same zone.
+
+   These distinctions among possible LDH labels are only of significance
+   for software that is IDNA-aware or for future extensions based on the
+   same "prefix and encoding" model.  For IDNA-aware systems, the valid
+   label types are: A-labels, U-labels, and NR-LDH labels.
+
+   IDNA labels come in two flavors: an ACE-encoded form and a Unicode
+   (native character) form.  These are referred to as A-labels and
+   U-labels, respectively, and are described in detail in the next
+   section.
+
+                                    ASCII Label
+      __________________________________________________________________
+      |                                                                |
+      |     ____________________ LDH Label (1) (4) ________________    |
+      |    |  ___________________________________                  |   |
+      |    |  |IDN Reserved LDH Labels          |                  |   |
+      |    |  | ("??--") or R-LDH Labels        | _______________  |   |
+      |    |  | _______________________________ | |NON-RESERVED |  |   |
+      |    |  | |       XN-labels             | | | LDH Labels  |  |   |
+      |    |  | | _____________   ___________ | | | (NR-LDH     |  |   |
+      |    |  | | | A-labels  |   | Fake (3) || | |   labels)   |  |   |
+      |    |  | | | "xn--"(2) |   | A-labels || | |_____________|  |   |
+      |    |  | | |___________|   |__________|| |                  |   |
+      |    |  | |_____________________________| |                  |   |
+      |    |  |_________________________________|                  |   |
+      |    |_______________________________________________________|   |
+      |                                                                |
+      |       _____________NON-LDH label________                       |
+      |       |      ______________________    |                       |
+      |       |      | Underscore labels  |    |                       |
+      |       |      |  e.g., _tcp        |    |                       |
+      |       |      |____________________|    |                       |
+      |       |      | Labels with leading|    |                       |
+      |       |      | or trailing        |    |                       |
+      |       |      | hyphens "-abcd"    |    |                       |
+      |       |      | or "xyz-"          |    |                       |
+      |       |      | or "-uvw-"         |    |                       |
+      |       |      |____________________|    |                       |
+      |       |      | Labels with other  |    |                       |
+      |       |      | non-LDH ASCII chars|    |                       |
+      |       |      | e.g., #$%_         |    |                       |
+      |       |      |____________________|    |                       |
+      |       |________________________________|                       |
+      |________________________________________________________________|
+
+             (1) ASCII letters (uppercase and lowercase), digits,
+                    hyphen.  Hyphen may not appear in first or last
+                    position.  No more than 63 octets.
+             (2) Note that the string following "xn--" must
+                    be the valid output of the Punycode algorithm
+                    and must be convertible into valid U-label form.
+             (3) Note that a Fake A-label has a prefix "xn--"
+                    but the remainder of the label is NOT the valid
+                    output of the Punycode algorithm.
+             (4) LDH label subtypes are indistinguishable to
+                    applications that are not IDNA-aware.
+
+    Figure 1: IDNA and Related DNS Terminology Space -- ASCII Labels
+
+                        __________________________
+                        |  Non-ASCII             |
+                        |                        |
+                        |    ___________________ |
+                        |    | U-label (5)     | |
+                        |    |_________________| |
+                        |    |                 | |
+                        |    |  Binary Label   | |
+                        |    | (including      | |
+                        |    |  high bit on)   | |
+                        |    |_________________| |
+                        |    |                 | |
+                        |    | Bit String      | |
+                        |    |   Label         | |
+                        |    |_________________| |
+                        |________________________|
+
+             (5) To applications that are not IDNA-aware, U-labels
+                    are indistinguishable from Binary ones.
+
+                        Figure 2: Non-ASCII Labels
+
+2.3.2.  Terms for IDN Label Codings
+
+2.3.2.1.  IDNA-valid strings, A-label, and U-label
+
+   For IDNA-aware applications, the three types of valid labels are
+   "A-labels", "U-labels", and "NR-LDH labels", each of which is defined
+   below.  The relationships among them are illustrated in Figure 1 and
+   Figure 2.
+
+   o  A string is "IDNA-valid" if it meets all of the requirements of
+      these specifications for an IDNA label.  IDNA-valid strings may
+      appear in either of the two forms defined immediately below, or
+      may be drawn from the NR-LDH label subset.  IDNA-valid strings
+      must also conform to all basic DNS requirements for labels.  These
+      documents make specific reference to the form appropriate to any
+      context in which the distinction is important.
+
+   o  An "A-label" is the ASCII-Compatible Encoding (ACE, see
+      Section 2.3.2.5) form of an IDNA-valid string.  It must be a
+      complete label: IDNA is defined for labels, not for parts of them
+      and not for complete domain names.  This means, by definition,
+      that every A-label will begin with the IDNA ACE prefix, "xn--"
+      (see Section 2.3.2.5), followed by a string that is a valid output
+      of the Punycode algorithm [RFC3492] and hence a maximum of 59
+      ASCII characters in length.  The prefix and string together must
+      conform to all requirements for a label that can be stored in the
+      DNS including conformance to the rules for LDH labels
+      (Section 2.3.1).  If and only if a string meeting the above
+      requirements can be decoded into a U-label is it an A-label.
+
+   o  A "U-label" is an IDNA-valid string of Unicode characters, in
+      Normalization Form C (NFC) and including at least one non-ASCII
+      character, expressed in a standard Unicode Encoding Form (such as
+      UTF-8).  It is also subject to the constraints about permitted
+      characters that are specified in Section 4.2 of the Protocol
+      document, to the rules in Sections 2 and 3 of the Tables document,
+      to the constraints in the Bidi document [RFC5893] if it contains
+      any character from a script that is written right to left, and to
+      the symmetry constraint described immediately below.  Conversions
+      between U-labels and A-labels are performed according to the
+      "Punycode" specification [RFC3492], adding or removing the ACE
+      prefix as needed.
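+
+   The conversions can be sketched with Python's built-in "punycode"
+   codec, which implements RFC 3492.  This is a sketch under stated
+   assumptions: the helper names are hypothetical, the input is
+   presumed to be an already validated U-label, and none of the
+   Protocol, Tables, or Bidi checks are performed.  (Python's separate
+   built-in "idna" codec follows IDNA2003 [RFC3490], not IDNA2008.)
+
+```python
+import unicodedata
+
+ACE_PREFIX = "xn--"
+
+
+def u_label_to_a_label(u_label: str) -> str:
+    """Punycode-encode a presumed-valid U-label and add the prefix."""
+    if unicodedata.normalize("NFC", u_label) != u_label:
+        raise ValueError("a U-label must be in NFC")
+    # The codec emits lowercase Punycode digits; basic (ASCII) code
+    # points keep their case, which in a valid U-label is lowercase.
+    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")
+
+
+def a_label_to_u_label(a_label: str) -> str:
+    """Remove the ACE prefix (any case) and Punycode-decode the rest."""
+    if a_label[:4].lower() != ACE_PREFIX:
+        raise ValueError("missing ACE prefix")
+    return a_label[4:].encode("ascii").decode("punycode")
+
+
+print(u_label_to_a_label("b\u00fccher"))            # xn--bcher-kva
+print(ascii(a_label_to_u_label("xn--bcher-kva")))  # 'b\xfccher'
+```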
+
+   To be valid, U-labels and A-labels must obey an important symmetry
+   constraint.  While that constraint may be tested in any of several
+   ways, an A-label A1 must be capable of being produced by conversion
+   from a U-label U1, and that U-label U1 must be capable of being
+   produced by conversion from A-label A1.  Among other things, this
+   implies that both U-labels and A-labels must be strings in Unicode
+   NFC [Unicode-UAX15] normalized form.  These strings MUST contain only
+   characters specified elsewhere in this document series, and only in
+   the contexts indicated as appropriate.
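+
+   One way to test the symmetry constraint, sketched in Python under
+   the same assumptions as the conversion helpers (hypothetical
+   function name, built-in "punycode" codec), is a round trip: decode
+   the apparent A-label, require a non-ASCII NFC result, re-encode, and
+   compare with the original case-independently.  This covers only the
+   symmetry and NFC parts of the definition; the permitted-character,
+   contextual, and Bidi rules are not checked.
+
+```python
+import unicodedata
+
+
+def passes_round_trip(a_label: str) -> bool:
+    """Partial A-label check: prefix, Punycode, NFC, and symmetry."""
+    if a_label[:4].lower() != "xn--":
+        return False
+    try:
+        u_label = a_label[4:].encode("ascii").decode("punycode")
+    except (ValueError, OverflowError):
+        # UnicodeError is a ValueError: not ASCII or not valid Punycode
+        return False
+    if u_label.isascii():
+        return False  # a U-label needs at least one non-ASCII character
+    if unicodedata.normalize("NFC", u_label) != u_label:
+        return False  # decodes to a string that is not in NFC
+    again = "xn--" + u_label.encode("punycode").decode("ascii")
+    return again.lower() == a_label.lower()
+
+
+decomposed = "bu\u0308cher"  # u + COMBINING DIAERESIS, not NFC
+fake = "xn--" + decomposed.encode("punycode").decode("ascii")
+print(passes_round_trip("xn--bcher-kva"))  # True
+print(passes_round_trip(fake))             # False
+```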
+
+   Any rules or conventions that apply to DNS labels in general apply to
+   whichever of the U-label or A-label would be more restrictive.  There
+   are two exceptions to this principle.  First, the restriction to
+   ASCII characters does not apply to the U-label.  Second, expansion of
+   the A-label form to a U-label may produce strings that are much
+   longer than the normal 63 octet DNS limit (potentially up to 252
+   characters) due to the compression efficiency of the Punycode
+   algorithm.  Such extended-length U-labels are valid from the
+   standpoint of IDNA, but caution should be exercised as shorter limits
+   may be imposed by some applications.
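+
+   The length asymmetry can be seen with a hypothetical, deliberately
+   repetitive label, again using Python's built-in "punycode" codec:
+   because Punycode encodes the repeated character compactly, the
+   A-label stays within 63 octets while the UTF-8 form of the U-label
+   does not.
+
+```python
+# Hypothetical label: fifty copies of U+00FC (LATIN SMALL LETTER U
+# WITH DIAERESIS), each of which takes two octets in UTF-8.
+u_label = "\u00fc" * 50
+a_label = "xn--" + u_label.encode("punycode").decode("ascii")
+
+print(len(a_label), len(a_label) <= 63)  # A-label fits the DNS limit
+print(len(u_label.encode("utf-8")))      # 100 octets for the U-label
+```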
+
+   For context, applications that are not IDNA-aware treat all LDH
+   labels as valid for appearance in DNS zone files and queries and some
+   of them may permit additional types of labels (i.e., not impose the
+   LDH restriction).  IDNA-aware applications permit only A-labels and
+   NR-LDH labels to appear in zone files and queries.  U-labels can
+   appear, along with the other two, in presentation and user interface
+   forms, and in protocols that use IDNA forms but that do not involve
+   the DNS itself.
+
+   Specifically, for IDNA-aware applications and contexts, the three
+   allowed categories are A-label, U-label, and NR-LDH label.  Of the
+   Reserved LDH labels (R-LDH labels) only A-labels are valid for IDNA
+   use.
+
+   Strings that appear to be A-labels or U-labels are processed in
+   various operations of the Protocol document [RFC5891].  Those strings
+   are not yet demonstrably conformant with the conditions outlined
+   above because they are in the process of validation.  Such strings
+   may be referred to as "unvalidated", "putative", or "apparent", or as
+   being "in the form of" one of the label types to indicate that they
+   have not been verified to meet the specified conformance
+   requirements.
+
+   Unvalidated A-labels are known only to be XN-labels, while Fake
+   A-labels have been demonstrated to fail some of the A-label tests.
+   Similarly, unvalidated U-labels are simply non-ASCII labels that may
+   or may not meet the requirements for U-labels.
+
+2.3.2.2.  NR-LDH Label
+
+   These specifications use the term "NR-LDH label" strictly to refer to
+   an all-ASCII label that obeys the LDH label syntax discussed in
+   Section 2.3.1 and that is neither an IDN nor a label form reserved by
+   IDNA (R-LDH label).  It should be stressed that all A-labels obey the
+   "hostname" [RFC0952] rules other than the length restriction in those
+   rules.
+
+2.3.2.3.  Internationalized Domain Name and Internationalized Label
+
+   An "internationalized domain name" (IDN) is a domain name that
+   contains at least one A-label or U-label, but that otherwise may
+   contain any mixture of NR-LDH labels, A-labels, or U-labels.  Just as
+   has been the case with ASCII names, some DNS zone administrators may
+   impose restrictions, beyond those imposed by DNS or IDNA, on the
+   characters or strings that may be registered as labels in their
+   zones.  Because of the diversity of characters that can be used in a
+   U-label and the confusion they might cause, such restrictions are
+   mandatory for IDN registries and zones even though the particular
+   restrictions are not part of these specifications (the issue is
+   discussed in more detail in Section 4.3 of the Protocol document
+   [RFC5891]).  Because these restrictions, commonly known as "registry
+   restrictions", only affect what can be registered and not lookup
+   processing, they have no effect on the syntax or semantics of DNS
+   protocol messages; a query for a name that matches no records will
+   yield the same response regardless of the reason why it is not in the
+   zone.  Clients issuing queries or interpreting responses cannot be
+
+   assumed to have any knowledge of zone-specific restrictions or
+   conventions.  See the section on registration policy in the Rationale
+   document [RFC5894] for additional discussion.
+
+   "Internationalized label" is used when a term is needed to refer to a
+   single label of an IDN, i.e., one that might be any of an NR-LDH
+   label, A-label, or U-label.  There are some standardized DNS label
+   formats, such as the "underscore labels" used for service location
+   (SRV) records [RFC2782], that do not fall into any of the three
+   categories and hence are not internationalized labels.
+
+2.3.2.4.  Label Equivalence
+
+   In IDNA, equivalence of labels is defined in terms of the A-labels.
+   If the A-labels are equal in a case-independent comparison, then the
+   labels are considered equivalent, no matter how they are represented.
+   Because of the isomorphism of A-labels and U-labels in IDNA2008, it
+   is possible to compare U-labels directly; see the Protocol document
+   [RFC5891] for details.  Traditional LDH labels already have a notion
+   of equivalence: within that list of characters, uppercase and
+   lowercase are considered equivalent.  The IDNA notion of equivalence
+   is an extension of that older notion but, because the protocol does
+   not specify any mandatory mapping and only those isomorphic forms are
+   considered, the only equivalents are:
+
+   o  Exact (bit-string identity) matches between a pair of U-labels.
+
+   o  Matches between a pair of A-labels, using normal DNS
+      case-insensitive matching rules.
+
+   o  Equivalence between a U-label and an A-label determined by
+      translating the U-label form into an A-label form and then testing
+      for a match between the A-labels using normal DNS case-insensitive
+      matching rules.
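+
+   The three cases above can be expressed as a comparison function.
+   The Python sketch below is illustrative only.  The names to_a_label()
+   and labels_equivalent() are hypothetical.  The conversion uses only
+   the standard library's "punycode" codec (RFC 3492), and none of the
+   IDNA2008 validity tests are performed.  The inputs are therefore
+   assumed to be valid U-labels, A-labels, or NR-LDH labels already.
+
```python
def to_a_label(label: str) -> str:
    """Hypothetical helper: ASCII labels pass through unchanged; other
    labels are Punycode-encoded and given the "xn--" ACE prefix.
    No IDNA2008 validity checks are made."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def labels_equivalent(a: str, b: str) -> bool:
    """Apply the three equivalence cases listed above."""
    if not a.isascii() and not b.isascii():
        # Two U-labels: exact (bit-string identity) match only.
        return a == b
    # A-label vs A-label, or U-label vs A-label: compare A-label forms with
    # DNS case-insensitive matching.  Both forms are pure ASCII here, so
    # str.lower() folds only the letters A-Z.
    return to_a_label(a).lower() == to_a_label(b).lower()
```
+
+   With this sketch, "bücher" and "XN--BCHER-KVA" compare as
+   equivalent, while "bücher" and "bucher" do not.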
+
+2.3.2.5.  ACE Prefix
+
+   The "ACE prefix" is defined in this document to be a string of ASCII
+   characters, "xn--", that appears at the beginning of every A-label.
+   "ACE" stands for "ASCII-Compatible Encoding".
+
+2.3.2.6.  Domain Name Slot
+
+   A "domain name slot" is defined in this document to be a protocol
+   element or a function argument or a return value (and so on)
+   explicitly designated for carrying a domain name.  Examples of domain
+   name slots include the QNAME field of a DNS query; the name argument
+   of the gethostbyname() or getaddrinfo() standard C library functions;
+   the part of an email address following the at sign ("@") in the
+   parameter to the SMTP MAIL or RCPT commands or the "From:" field of
+   an email message header; and the host portion of the URI in the "src"
+   attribute of an HTML "<IMG>" tag.  A string that has the syntax of a
+   domain name but that appears in general text is not in a domain name
+   slot.  For example, a domain name appearing in the plain text body of
+   an email message is not occupying a domain name slot.
+
+   An "IDNA-aware domain name slot" is defined for this set of documents
+   to be a domain name slot explicitly designated for carrying an
+   internationalized domain name as defined in this document.  The
+   designation may be static (for example, in the specification of the
+   protocol or interface) or dynamic (for example, as a result of
+   negotiation in an interactive session).
+
+   Name slots that are not IDNA-aware obviously include any domain name
+   slot whose specification predates IDNA.  Note that the requirements
+   of some protocols that use the DNS for data storage prevent the use
+   of IDNs.  For example, the format required for the underscore labels
+   used by the service location protocol [RFC2782] precludes
+   representation of a non-ASCII label in the DNS using A-labels because
+   those SRV-related labels must start with underscores.  Of course,
+   non-ASCII IDN labels may be part of a domain name that also includes
+   underscore labels.
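+
+   As an illustration, a name can combine underscore labels with an IDN
+   label, but a non-ASCII underscore label has no A-label form.  The
+   Python sketch below uses a hypothetical helper, name_to_ascii().  It
+   converts only the non-ASCII labels and skips all IDNA2008 validity
+   checks.
+
```python
def name_to_ascii(name: str) -> str:
    """Hypothetical helper: convert each non-ASCII label to "xn--" form,
    leaving ASCII labels (including underscore labels) untouched.
    No IDNA2008 validity checks are made."""
    out = []
    for label in name.split("."):
        if label.isascii():
            out.append(label)
        elif label.startswith("_"):
            # A label must start with "_" for SRV use and with "xn--" to be
            # an A-label; it cannot do both, so there is no DNS form.
            raise ValueError(f"non-ASCII underscore label: {label!r}")
        else:
            out.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(out)
```
+
+   For example, "_sip._tcp.bücher.example" becomes
+   "_sip._tcp.xn--bcher-kva.example", while "_ü.example" is rejected.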
+
+2.3.3.  Order of Characters in Labels
+
+   Because IDN labels may contain characters that are read, and
+   preferentially displayed, from right to left, there is a potential
+   ambiguity about which character in a label is "first".  For the
+   purposes of these specifications, labels are considered, and
+   characters numbered, strictly in the order in which they appear "on
+   the wire".  That order is equivalent to the leftmost character being
+   treated as first in a label that is read left to right and to the
+   rightmost character being first in a label that is read right to
+   left.  The Bidi specification contains additional discussion of the
+   conditions that influence reading order.
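+
+   For a stored string, "first" is the character at index 0 of the
+   label in logical (wire) order, whatever its display direction.  This
+   short Python sketch uses a Hebrew label as an assumed example.  The
+   rule in the Bidi document [RFC5893] that constrains the Bidi property
+   of a label's first character refers to this same order.
+
```python
import unicodedata

# Hebrew "shalom", stored (and transmitted) in logical order:
# SHIN, LAMED, VAV, FINAL MEM.  Right-to-left display shows SHIN rightmost.
shalom = "\u05e9\u05dc\u05d5\u05dd"

first = shalom[0]  # first character in wire order
print(unicodedata.name(first), unicodedata.bidirectional(first))
# prints: HEBREW LETTER SHIN R

# In UTF-8, the first octets on the wire also belong to that character.
assert shalom.encode("utf-8").startswith(first.encode("utf-8"))
```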
+
+2.3.4.  Punycode is an Algorithm, Not a Name or Adjective
+
+   There has been some confusion about whether a "Punycode string" does
+   or does not include the ACE prefix and about whether it is required
+   that such strings could have been the output of the ToASCII operation
+   (see RFC 3490, Section 4 [RFC3490]).  This specification discourages
+   the use of the term "Punycode" to describe anything but the encoding
+   method and algorithm of RFC 3492 [RFC3492].  The terms defined above
+   are preferred because they are much clearer than the term "Punycode
+   string".
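+
+   The distinction is visible in code.  In this Python sketch, the
+   function names are hypothetical and no validity checks are made.  The
+   standard library's "punycode" codec implements only the RFC 3492
+   algorithm, so its output lacks the ACE prefix.  An A-label is that
+   output with "xn--" prepended.  The sketch avoids Python's "idna"
+   codec, which implements IDNA2003 (RFC 3490) rather than these
+   specifications.
+
```python
ACE_PREFIX = "xn--"


def punycode_encode(u_label: str) -> str:
    """The RFC 3492 algorithm alone: the result has no ACE prefix."""
    return u_label.encode("punycode").decode("ascii")


def a_label_from_u_label(u_label: str) -> str:
    """ACE prefix + Punycode output (IDNA2008 validity checks omitted)."""
    return ACE_PREFIX + punycode_encode(u_label)


def u_label_from_a_label(a_label: str) -> str:
    """Strip the ACE prefix (matched case-insensitively), then decode."""
    if a_label[:4].lower() != ACE_PREFIX:
        raise ValueError(f"no ACE prefix: {a_label!r}")
    return a_label[4:].encode("ascii").decode("punycode")
```
+
+   Here punycode_encode("bücher") yields "bcher-kva", the A-label is
+   "xn--bcher-kva", and decoding that A-label returns "bücher".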
+
+3.  IANA Considerations
+
+   IANA actions for this version of IDNA (IDNA2008) are specified in the
+   Tables document [RFC5892].  An overview of the relationships among
+   the various IANA registries appears in the Rationale document
+   [RFC5894].  This document does not specify any actions for IANA.
+
+4.  Security Considerations
+
+4.1.  General Issues
+
+   Security on the Internet partly relies on the DNS.  Thus, any change
+   to the characteristics of the DNS can change the security of much of
+   the Internet.
+
+   Domain names are used by users to identify and connect to Internet
+   hosts and other network resources.  The security of the Internet is
+   compromised if a user entering a single internationalized name is
+   connected to different servers based on different interpretations of
+   the internationalized domain name.  In addition to characters that
+   are permitted by IDNA2003 and its mapping conventions (see
+   Section 4.6), the current specification changes the interpretation of
+   a few characters that were mapped to others in the earlier version;
+   zone administrators should be aware of the problems that this might
+   raise and take appropriate measures.  The context for this issue is
+   discussed in more detail in the Rationale document [RFC5894].
+
+   In addition to the Security Considerations material that appears in
+   this document, the Bidi document [RFC5893] contains a discussion of
+   security issues specific to labels containing characters from scripts
+   that are normally written right to left.
+
+4.2.  U-label Lengths
+
+   Labels associated with the DNS have traditionally been limited to 63
+   octets by the general restrictions in RFC 1035 and by the need to
+   treat them as a six-bit string length followed by the string in
+   actual calls to the DNS.  That format is used in some other
+   applications, and, in general, representations of domain names as
+   dot-separated labels and as length-string pairs have been treated as
+   interchangeable.  Because A-labels (the form actually used in the
+   DNS) are potentially much more compressed than UTF-8 (and UTF-8 is,
+   in general, more compressed than UTF-16 or UTF-32), U-labels that
+   obey all of the relevant symmetry (and other) constraints of these
+   documents may be quite a bit longer, potentially up to 252 characters
+   (Unicode code points).  A fully-qualified domain name containing
+   several such labels can obviously also exceed the nominal 255 octet
+   limit for such names.  Application authors using U-labels must exert
+   due caution to avoid buffer overflow and truncation errors and
+   attacks in contexts where shorter strings are expected.
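+
+   One approach is to enforce the DNS limits on the A-label form and to
+   size any U-label buffers separately.  The Python sketch below is a
+   hedged illustration.  check_name_lengths() and to_a_label() are
+   hypothetical helpers, and no IDNA2008 validity checks are made.  The
+   wire-length arithmetic assumes the RFC 1035 format: one length octet
+   per label, followed by a terminating zero octet.
+
```python
MAX_LABEL_OCTETS = 63   # six-bit length field per label (RFC 1035)
MAX_NAME_OCTETS = 255   # limit on the whole wire-format name (RFC 1035)


def to_a_label(label: str) -> str:
    """Hypothetical helper; no IDNA2008 validity checks are made."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def check_name_lengths(name: str) -> str:
    """Return the ASCII form of `name`, applying the DNS limits to the
    A-label form rather than to the (possibly longer) U-label form."""
    labels = [to_a_label(part) for part in name.rstrip(".").split(".")]
    for label in labels:
        if not 0 < len(label) <= MAX_LABEL_OCTETS:
            raise ValueError(f"label length {len(label)} out of range")
    # One length octet per label, the label octets, then a zero octet.
    wire_len = sum(1 + len(label) for label in labels) + 1
    if wire_len > MAX_NAME_OCTETS:
        raise ValueError(f"name is {wire_len} octets on the wire")
    return ".".join(labels)
```
+
+   For example, a label of 40 repetitions of U+00FC is 80 octets in
+   UTF-8, yet its A-label fits within the 63-octet limit.  A 63-octet
+   buffer that is adequate for the A-label would therefore overflow if
+   the UTF-8 U-label were copied into it.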
+
+4.3.  Local Character Set Issues
+
+   When systems use local character sets other than ASCII and Unicode,
+   these specifications leave the problem of converting between the
+   local character set and Unicode up to the application or local
+   system.  If different applications (or different versions of one
+   application) implement different rules for conversions among coded
+   character sets, they could interpret the same name differently and
+   contact different servers.  This problem is not solved by security
+   protocols, such as Transport Layer Security (TLS) [RFC5246], that do
+   not take local character sets into account.
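+
+   The hazard can be reproduced with two decoders.  In this Python
+   sketch, the octet string and the choice of ISO 8859-1 and ISO 8859-5
+   as "local" character sets are assumptions for illustration, and
+   a_label_for() is a hypothetical helper.  The same stored octets
+   decode to different labels and therefore to different A-labels.
+
```python
def a_label_for(raw: bytes, local_charset: str) -> str:
    """Hypothetical: decode octets from a local charset, then Punycode them.
    No IDNA2008 validity checks are made."""
    label = raw.decode(local_charset)
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


raw = b"b\xfccher"  # the same octets, as stored by some local system
print(a_label_for(raw, "latin-1"))    # decodes to "bücher": xn--bcher-kva
print(a_label_for(raw, "iso8859_5"))  # a Cyrillic letter replaces the "ü"
```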
+
+4.4.  Visually Similar Characters
+
+   To help prevent confusion between characters that are visually
+   similar (sometimes called "confusables"), it is suggested that
+   implementations provide visual indications where a domain name
+   contains multiple scripts, especially when the scripts contain
+   characters that are easily confused visually, such as an omicron in
+   Greek mixed with Latin text.  Such mechanisms can also be used to
+   show when a name contains a mixture of Simplified Chinese characters
+   with Traditional ones that have Simplified forms, or to distinguish
+   zero and one from uppercase "O" and lowercase "L".  DNS zone
+   administrators may impose restrictions (subject to the limitations
+   identified elsewhere in these documents) that try to minimize
+   characters that have similar appearance or similar interpretations.
+
+   If multiple characters appear in a label and the label consists only
+   of characters in one script, individual characters that might be
+   confused with others if compared separately may be unambiguous and
+   non-confusing.  On the other hand, that observation makes labels
+   containing characters from more than one script (often called "mixed-
+   script labels") even more risky -- users will tend to see what they
+   expect to see and context is a powerful reinforcement to perception.
+   At the same time, while the risks associated with mixed-script labels
+   are clear, simply prohibiting them will not eliminate problems,
+   especially where closely related scripts are involved.  For example,
+   there are many strings that are entirely in Greek or Cyrillic scripts
+   that can be confused with each other or with Latin script strings.
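+
+   A rough mixed-script indicator can be sketched in Python.  This is a
+   heuristic with explicit assumptions, not a recommended algorithm.  It
+   guesses each character's script from the first word of its Unicode
+   character name, because the standard library does not expose the
+   Unicode Script property that a real implementation should use.  As
+   the preceding paragraph notes, it cannot detect single-script strings
+   that are confusable with Latin ones.
+
```python
import unicodedata

# Heuristic only: a real check should use the Unicode Script property.
_SCRIPT_WORDS = {"LATIN", "GREEK", "CYRILLIC", "ARMENIAN", "HEBREW",
                 "ARABIC", "CJK", "HIRAGANA", "KATAKANA", "HANGUL"}


def rough_scripts(label: str) -> set:
    """Guess the scripts in a label from character-name prefixes.
    Digits and hyphens ("DIGIT ...", "HYPHEN-MINUS") are ignored."""
    scripts = set()
    for ch in label:
        word = unicodedata.name(ch, "").split(" ")[0]
        if word in _SCRIPT_WORDS:
            scripts.add(word)
    return scripts


def is_mixed_script(label: str) -> bool:
    return len(rough_scripts(label)) > 1
```
+
+   With this heuristic, "p\u0430ypal" (containing a Cyrillic "а") is
+   flagged, but the all-Cyrillic "\u0440\u0430\u0443\u0430", which looks
+   like Latin "paya", is not.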
+
+   It is worth noting that there are no comprehensive technical
+   solutions to the problems of confusable characters.  One can reduce
+   the extent of the problems in various ways, but probably never
+   eliminate them.  Some specific suggestions about identification and
+   handling of confusable characters appear in a Unicode Consortium
+   publication [Unicode-UTR36].
+
+4.5.  IDNA Lookup, Registration, and the Base DNS Specifications
+
+   The Protocol specification [RFC5891] describes procedures for
+   registering and looking up labels that are not compatible with the
+   preferred syntax described in the base DNS specifications (see
+   Section 2.3.1) because they contain non-ASCII characters.  These
+   procedures depend on the use of a special ASCII-compatible encoding
+   form that contains only characters permitted in hostnames by those
+   earlier specifications.  The encoding used is Punycode [RFC3492].  No
+   security issues such as string length increases or new allowed values
+   are introduced by the encoding process or the use of these encoded
+   values, apart from those introduced by the ACE encoding itself.
+
+   Domain names (or portions of them) are sometimes compared against a
+   set of domains to be given special treatment if a match occurs, e.g.,
+   treated as more privileged than others or blocked in some way.  In
+   such situations, it is especially important that the comparisons be
+   done properly, as specified in the "Requirements" section of the
+   Protocol document [RFC5891].  For labels already in ASCII form, the
+   proper comparison reduces to the same case-insensitive ASCII
+   comparison that has always been used for ASCII labels, although
+   IDNA-aware applications are expected to look up only A-labels and
+   NR-LDH labels, i.e., to avoid looking up R-LDH labels that are not
+   A-labels.
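+
+   A comparison against such a list might be sketched as follows.  The
+   Python functions canonical_label() and is_listed() are hypothetical.
+   The sketch assumes that the list already holds names in lowercase
+   A-label form and that any non-ASCII input labels are valid U-labels.
+   No IDNA2008 validity tests are run.  The sketch reduces every label
+   to its A-label or NR-LDH form and compares case-insensitively.  It
+   refuses R-LDH labels other than those carrying the ACE prefix; full
+   A-label validation is omitted.
+
```python
def canonical_label(label: str) -> str:
    """Hypothetical: reduce one label to the form used for comparison."""
    if not label.isascii():
        # Assumed to be a valid U-label: convert it to its A-label.
        return "xn--" + label.encode("punycode").decode("ascii")
    folded = label.lower()  # ASCII-only string, so this folds just A-Z
    if folded[2:4] == "--" and not folded.startswith("xn--"):
        raise ValueError(f"R-LDH label that is not an A-label: {label!r}")
    return folded


def is_listed(name: str, listed: set) -> bool:
    """`listed` is assumed to contain names already in canonical form."""
    canon = ".".join(canonical_label(p) for p in name.rstrip(".").split("."))
    return canon in listed
```
+
+   For example, with the list {"xn--bcher-kva.example"}, both
+   "bücher.Example" and "XN--BCHER-KVA.example." match.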
+
+   The introduction of IDNA meant that any existing labels that start
+   with the ACE prefix would be construed as A-labels, at least until
+   they failed one of the relevant tests, whether or not that was the
+   intent of the zone administrator or registrant.  There is no evidence
+   that this has caused any practical problems since RFC 3490 was
+   adopted, but the risk still exists in principle.
+
+4.6.  Legacy IDN Label Strings
+
+   The URI Standard [RFC3986] and a number of application specifications
+   (e.g., SMTP [RFC5321] and HTTP [RFC2616]) do not permit non-ASCII
+   labels in DNS names used with those protocols, i.e., only the A-label
+   form of IDNs is permitted in those contexts.  If only A-labels are
+   used, differences in interpretation between IDNA2003 and this version
+   arise only for characters whose interpretation has actually changed
+   (e.g., characters, such as ZWJ and ZWNJ, that were mapped to nothing
+   in IDNA2003 and that are considered legitimate in some contexts by
+   these specifications).  Despite that prohibition, there are a
+   significant number of files and databases on the Internet in which
+   domain name strings appear in native-character form; a subset of
+   those strings use native-character labels that require IDNA2003
+   mapping to produce valid A-labels.  The treatment of such labels will
+   vary with the type of application and with application-designer
+   preference: in some situations, warnings to the user or outright
+   rejection may be
+   appropriate; in others, it may be preferable to attempt to apply the
+   earlier mappings if lookup strictly conformant to these
+   specifications fails or even to do lookups under both sets of rules.
+   This general situation is discussed in more detail in the Rationale
+   document [RFC5894].  However, in the absence of care by registries
+   about how strings that could have different interpretations under
+   IDNA2003 and the current specification are handled, it is possible
+   that the differences could be used as a component of name-matching or
+   name-confusion attacks.  Such care is therefore appropriate.
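+
+   The German sharp s (U+00DF) is a concrete case.  IDNA2003 maps it to
+   "ss", while these specifications treat it as a valid character in
+   its own right.  The Python sketch below relies on the fact that the
+   standard library's "idna" codec implements IDNA2003, including the
+   Nameprep mappings.  The helper names are hypothetical.  The IDNA2008
+   side simply Punycode-encodes the label, assuming it is already a
+   valid U-label.
+
```python
def idna2003_to_ascii(label: str) -> str:
    # Python's built-in "idna" codec implements IDNA2003 (RFC 3490) with
    # Nameprep (RFC 3491), which maps U+00DF to "ss".
    return label.encode("idna").decode("ascii")


def idna2008_style_a_label(label: str) -> str:
    # Encodes the label unchanged, as IDNA2008 does for a valid U-label
    # (the IDNA2008 validity checks themselves are omitted here).
    return "xn--" + label.encode("punycode").decode("ascii")


legacy = "fa\u00df"  # "faß"
print(idna2003_to_ascii(legacy))       # fass
print(idna2008_style_a_label(legacy))  # xn--fa-hia
```
+
+   A lookup that falls back to the IDNA2003 mapping would therefore
+   query "fass" rather than "xn--fa-hia".  These are two distinct names
+   that could be registered by different parties.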
+
+4.7.  Security Differences from IDNA2003
+
+   The registration and lookup models described in this set of documents
+   change the mechanisms available for lookup applications to determine
+   the validity of labels they encounter.  In some respects, the ability
+   to test is strengthened.  For example, putative labels that contain
+   unassigned code points will now be rejected, while IDNA2003 permitted
+   them (see the Rationale document [RFC5894] for a discussion of the
+   reasons for this).  On the other hand, the Protocol specification no
+   longer assumes that the application that looks up a name will be able
+   to determine, and apply, information about the protocol version used
+   in registration.  In theory, that may increase risk since the
+   application will be able to do less pre-lookup validation.  In
+   practice, the protection afforded by that test has been largely
+   illusory for reasons explained in RFC 4690 [RFC4690] and elsewhere in
+   these documents.
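+
+   A partial version of the unassigned-code-point test can be sketched
+   with Python's unicodedata module.  This illustration rests on several
+   assumptions.  unicodedata reflects the Unicode version of the running
+   Python (unicodedata.unidata_version), not the Unicode 5.2 baseline.
+   IDNA2008 itself derives its UNASSIGNED property from the Tables
+   document [RFC5892].  The function names are hypothetical.
+
```python
import unicodedata


def unassigned_code_points(label: str) -> list:
    """Code points with General_Category Cn in this Python's Unicode data."""
    return [f"U+{ord(ch):04X}" for ch in label
            if unicodedata.category(ch) == "Cn"]


def reject_unassigned(label: str) -> str:
    """Raise if the label contains unassigned code points; else return it."""
    bad = unassigned_code_points(label)
    if bad:
        raise ValueError("unassigned code points: " + ", ".join(bad))
    return label
```
+
+   U+0378, for example, is unassigned and would be rejected, while a
+   label such as "bücher" passes this particular test.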
+
+   Any change to the Stringprep [RFC3454] procedure that is profiled and
+   used in IDNA2003, or, more broadly, the IETF's model of the use of
+   internationalized character strings in different protocols, creates
+   some risk of inadvertent changes to those protocols, invalidating
+   deployed applications or databases, and so on.  But these
+   specifications do not change Stringprep at all; they merely bypass
+   it.  Because these documents do not depend on Stringprep, the
+   question of upgrading other protocols that do have that dependency
+   can be left to experts on those protocols: the IDNA changes and
+   possible upgrades to security protocols or conventions are
+   independent issues.
+
+4.8.  Summary
+
+   No mechanism involving names or identifiers alone can protect against
+   a wide variety of security threats and attacks that are largely
+   independent of the naming or identification system.  These attacks
+   include spoofed pages, DNS query trapping and diversion, and so on.
+
+5.  Acknowledgments
+
+   The initial version of this document was created largely by
+   extracting text from early draft versions of the Rationale document
+   [RFC5894].  See the sections of that document entitled
+   "Acknowledgments" and "Contributors".
+
+   Specific textual suggestions after the extraction process came from
+   Vint Cerf, Lisa Dusseault, Bill McQuillan, Andrew Sullivan, and Ken
+   Whistler.  Other changes were made in response to more general
+   comments, lists of concerns or specific errors from participants in
+   the Working Group and other observers, including Lyman Chapin, James
+   Mitchell, Subramanian Moonesamy, and Dan Winship.
+
+6.  References
+
+6.1.  Normative References
+
+   [ASCII]      American National Standards Institute (formerly United
+                States of America Standards Institute), "USA Code for
+                Information Interchange", ANSI X3.4-1968, 1968.  ANSI
+                X3.4-1968 has been replaced by newer versions with
+                slight modifications, but the 1968 version remains
+                definitive for the Internet.
+
+   [RFC1034]    Mockapetris, P., "Domain names - concepts and
+                facilities", STD 13, RFC 1034, November 1987.
+
+   [RFC1035]    Mockapetris, P., "Domain names - implementation and
+                specification", STD 13, RFC 1035, November 1987.
+
+   [RFC1123]    Braden, R., "Requirements for Internet Hosts -
+                Application and Support", STD 3, RFC 1123, October 1989.
+
+   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
+                Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [Unicode-UAX15]
+                The Unicode Consortium, "Unicode Standard Annex #15:
+                Unicode Normalization Forms, Revision 31",
+                September 2009,
+                <http://www.unicode.org/reports/tr15/tr15-31.html>.
+
+   [Unicode52]  The Unicode Consortium.  The Unicode Standard, Version
+                5.2.0, defined by: "The Unicode Standard, Version
+                5.2.0", (Mountain View, CA: The Unicode Consortium,
+                2009. ISBN 978-1-936213-00-9).
+                <http://www.unicode.org/versions/Unicode5.2.0/>.
+
+6.2.  Informative References
+
+   [IDNA2008-Mapping]
+                Resnick, P. and P. Hoffman, "Mapping Characters in
+                Internationalized Domain Names for Applications (IDNA)",
+                Work in Progress, April 2010.
+
+   [RFC0952]    Harrenstien, K., Stahl, M., and E. Feinler, "DoD
+                Internet host table specification", RFC 952,
+                October 1985.
+
+   [RFC2181]    Elz, R. and R. Bush, "Clarifications to the DNS
+                Specification", RFC 2181, July 1997.
+
+   [RFC2616]    Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
+                Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
+                Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
+
+   [RFC2673]    Crawford, M., "Binary Labels in the Domain Name System",
+                RFC 2673, August 1999.
+
+   [RFC2782]    Gulbrandsen, A., Vixie, P., and L. Esibov, "A DNS RR for
+                specifying the location of services (DNS SRV)",
+                RFC 2782, February 2000.
+
+   [RFC3454]    Hoffman, P. and M. Blanchet, "Preparation of
+                Internationalized Strings ("stringprep")", RFC 3454,
+                December 2002.
+
+   [RFC3490]    Faltstrom, P., Hoffman, P., and A. Costello,
+                "Internationalizing Domain Names in Applications
+                (IDNA)", RFC 3490, March 2003.
+
+   [RFC3491]    Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
+                Profile for Internationalized Domain Names (IDN)",
+                RFC 3491, March 2003.
+
+   [RFC3492]    Costello, A., "Punycode: A Bootstring encoding of
+                Unicode for Internationalized Domain Names in
+                Applications (IDNA)", RFC 3492, March 2003.
+
+   [RFC3986]    Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
+                Resource Identifier (URI): Generic Syntax", STD 66,
+                RFC 3986, January 2005.
+
+   [RFC4690]    Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review
+                and Recommendations for Internationalized Domain Names
+                (IDNs)", RFC 4690, September 2006.
+
+   [RFC5246]    Dierks, T. and E. Rescorla, "The Transport Layer
+                Security (TLS) Protocol Version 1.2", RFC 5246,
+                August 2008.
+
+   [RFC5321]    Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
+                October 2008.
+
+   [RFC5891]    Klensin, J., "Internationalized Domain Names in
+                Applications (IDNA): Protocol", RFC 5891, August 2010.
+
+   [RFC5892]    Faltstrom, P., Ed., "The Unicode Code Points and
+                Internationalized Domain Names for Applications (IDNA)",
+                RFC 5892, August 2010.
+
+   [RFC5893]    Alvestrand, H. and C. Karp, "Right-to-Left Scripts for
+                Internationalized Domain Names for Applications (IDNA)",
+                RFC 5893, August 2010.
+
+   [RFC5894]    Klensin, J., "Internationalized Domain Names for
+                Applications (IDNA): Background, Explanation, and
+                Rationale", RFC 5894, August 2010.
+
+   [Unicode-UTR36]
+                The Unicode Consortium, "Unicode Technical Report #36:
+                Unicode Security Considerations, Revision 7", July 2008,
+                <http://www.unicode.org/reports/tr36/tr36-7.html>.
+
+Author's Address
+
+   John C Klensin
+   1770 Massachusetts Ave, Ste 322
+   Cambridge, MA  02140
+   USA
+
+   Phone: +1 617 245 1457
+   EMail: john+ietf@jck.com
+