Diffstat (limited to 'doc/rfc/rfc3490.txt')
-rw-r--r-- | doc/rfc/rfc3490.txt | 1235 |
1 files changed, 1235 insertions, 0 deletions
diff --git a/doc/rfc/rfc3490.txt b/doc/rfc/rfc3490.txt new file mode 100644 index 0000000..d2e0b3b --- /dev/null +++ b/doc/rfc/rfc3490.txt @@ -0,0 +1,1235 @@ + + + + + + +Network Working Group P. Faltstrom +Request for Comments: 3490 Cisco +Category: Standards Track P. Hoffman + IMC & VPNC + A. Costello + UC Berkeley + March 2003 + + + Internationalizing Domain Names in Applications (IDNA) + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2003). All Rights Reserved. + +Abstract + + Until now, there has been no standard method for domain names to use + characters outside the ASCII repertoire. This document defines + internationalized domain names (IDNs) and a mechanism called + Internationalizing Domain Names in Applications (IDNA) for handling + them in a standard fashion. IDNs use characters drawn from a large + repertoire (Unicode), but IDNA allows the non-ASCII characters to be + represented using only the ASCII characters already allowed in so- + called host names today. This backward-compatible representation is + required in existing protocols like DNS, so that IDNs can be + introduced with no changes to the existing infrastructure. IDNA is + only meant for processing domain names, not free text. + +Table of Contents + + 1. Introduction.................................................. 2 + 1.1 Problem Statement......................................... 3 + 1.2 Limitations of IDNA....................................... 3 + 1.3 Brief overview for application developers................. 4 + 2. Terminology................................................... 5 + 3. 
Requirements and applicability................................ 7 + 3.1 Requirements.............................................. 7 + 3.2 Applicability............................................. 8 + 3.2.1. DNS resource records................................ 8 + + + +Faltstrom, et al. Standards Track [Page 1] + +RFC 3490 IDNA March 2003 + + + 3.2.2. Non-domain-name data types stored in domain names... 9 + 4. Conversion operations......................................... 9 + 4.1 ToASCII................................................... 10 + 4.2 ToUnicode................................................. 11 + 5. ACE prefix.................................................... 12 + 6. Implications for typical applications using DNS............... 13 + 6.1 Entry and display in applications......................... 14 + 6.2 Applications and resolver libraries....................... 15 + 6.3 DNS servers............................................... 15 + 6.4 Avoiding exposing users to the raw ACE encoding........... 16 + 6.5 DNSSEC authentication of IDN domain names................ 16 + 7. Name server considerations.................................... 17 + 8. Root server considerations.................................... 17 + 9. References.................................................... 18 + 9.1 Normative References...................................... 18 + 9.2 Informative References.................................... 18 + 10. Security Considerations...................................... 19 + 11. IANA Considerations.......................................... 20 + 12. Authors' Addresses........................................... 21 + 13. Full Copyright Statement..................................... 22 + +1. Introduction + + IDNA works by allowing applications to use certain ASCII name labels + (beginning with a special prefix) to represent non-ASCII name labels. 
+ Lower-layer protocols need not be aware of this; therefore IDNA does + not depend on changes to any infrastructure. In particular, IDNA + does not depend on any changes to DNS servers, resolvers, or protocol + elements, because the ASCII name service provided by the existing DNS + is entirely sufficient for IDNA. + + This document does not require any applications to conform to IDNA, + but applications can elect to use IDNA in order to support IDN while + maintaining interoperability with existing infrastructure. If an + application wants to use non-ASCII characters in domain names, IDNA + is the only currently-defined option. Adding IDNA support to an + existing application entails changes to the application only, and + leaves room for flexibility in the user interface. + + A great deal of the discussion of IDN solutions has focused on + transition issues and how IDN will work in a world where not all of + the components have been updated. Proposals that were not chosen by + the IDN Working Group would depend on user applications, resolvers, + and DNS servers being updated in order for a user to use an + internationalized domain name. Rather than rely on widespread + updating of all components, IDNA depends on updates to user + applications only; no changes are needed to the DNS protocol or any + DNS servers or the resolvers on user's computers. + + + +Faltstrom, et al. Standards Track [Page 2] + +RFC 3490 IDNA March 2003 + + +1.1 Problem Statement + + The IDNA specification solves the problem of extending the repertoire + of characters that can be used in domain names to include the Unicode + repertoire (with some restrictions). + + IDNA does not extend the service offered by DNS to the applications. + Instead, the applications (and, by implication, the users) continue + to see an exact-match lookup service. Either there is a single + exactly-matching name or there is no match. 
This model has served + the existing applications well, but it requires, with or without + internationalized domain names, that users know the exact spelling of + the domain names that the users type into applications such as web + browsers and mail user agents. The introduction of the larger + repertoire of characters potentially makes the set of misspellings + larger, especially given that in some cases the same appearance, for + example on a business card, might visually match several Unicode code + points or several sequences of code points. + + IDNA allows the graceful introduction of IDNs not only by avoiding + upgrades to existing infrastructure (such as DNS servers and mail + transport agents), but also by allowing some rudimentary use of IDNs + in applications by using the ASCII representation of the non-ASCII + name labels. While such names are very user-unfriendly to read and + type, and hence are not suitable for user input, they allow (for + instance) replying to email and clicking on URLs even though the + domain name displayed is incomprehensible to the user. In order to + allow user-friendly input and output of the IDNs, the applications + need to be modified to conform to this specification. + + IDNA uses the Unicode character repertoire, which avoids the + significant delays that would be inherent in waiting for a different + and specific character set be defined for IDN purposes by some other + standards developing organization. + +1.2 Limitations of IDNA + + The IDNA protocol does not solve all linguistic issues with users + inputting names in different scripts. Many important language-based + and script-based mappings are not covered in IDNA and need to be + handled outside the protocol. For example, names that are entered in + a mix of traditional and simplified Chinese characters will not be + mapped to a single canonical name. 
Another example is Scandinavian + names that are entered with U+00F6 (LATIN SMALL LETTER O WITH + DIAERESIS) will not be mapped to U+00F8 (LATIN SMALL LETTER O WITH + STROKE). + + + + + +Faltstrom, et al. Standards Track [Page 3] + +RFC 3490 IDNA March 2003 + + + An example of an important issue that is not considered in detail in + IDNA is how to provide a high probability that a user who is entering + a domain name based on visual information (such as from a business + card or billboard) or aural information (such as from a telephone or + radio) would correctly enter the IDN. Similar issues exist for ASCII + domain names, for example the possible visual confusion between the + letter 'O' and the digit zero, but the introduction of the larger + repertoire of characters creates more opportunities of similar + looking and similar sounding names. Note that this is a complex + issue relating to languages, input methods on computers, and so on. + Furthermore, the kind of matching and searching necessary for a high + probability of success would not fit the role of the DNS and its + exact matching function. + +1.3 Brief overview for application developers + + Applications can use IDNA to support internationalized domain names + anywhere that ASCII domain names are already supported, including DNS + master files and resolver interfaces. (Applications can also define + protocols and interfaces that support IDNs directly using non-ASCII + representations. IDNA does not prescribe any particular + representation for new protocols, but it still defines which names + are valid and how they are compared.) + + The IDNA protocol is contained completely within applications. It is + not a client-server or peer-to-peer protocol: everything is done + inside the application itself. When used with a DNS resolver + library, IDNA is inserted as a "shim" between the application and the + resolver library. 
When used for writing names into a DNS zone, IDNA + is used just before the name is committed to the zone. + + There are two operations described in section 4 of this document: + + - The ToASCII operation is used before sending an IDN to something + that expects ASCII names (such as a resolver) or writing an IDN + into a place that expects ASCII names (such as a DNS master file). + + - The ToUnicode operation is used when displaying names to users, + for example names obtained from a DNS zone. + + It is important to note that the ToASCII operation can fail. If it + fails when processing a domain name, that domain name cannot be used + as an internationalized domain name and the application has to have + some method of dealing with this failure. + + IDNA requires that implementations process input strings with + Nameprep [NAMEPREP], which is a profile of Stringprep [STRINGPREP], + and then with Punycode [PUNYCODE]. Implementations of IDNA MUST + + + +Faltstrom, et al. Standards Track [Page 4] + +RFC 3490 IDNA March 2003 + + + fully implement Nameprep and Punycode; neither Nameprep nor Punycode + are optional. + +2. Terminology + + The key words "MUST", "SHALL", "REQUIRED", "SHOULD", "RECOMMENDED", + and "MAY" in this document are to be interpreted as described in BCP + 14, RFC 2119 [RFC2119]. + + A code point is an integer value associated with a character in a + coded character set. + + Unicode [UNICODE] is a coded character set containing tens of + thousands of characters. A single Unicode code point is denoted by + "U+" followed by four to six hexadecimal digits, while a range of + Unicode code points is denoted by two hexadecimal numbers separated + by "..", with no prefixes. + + ASCII means US-ASCII [USASCII], a coded character set containing 128 + characters associated with code points in the range 0..7F. Unicode + is an extension of ASCII: it includes all the ASCII characters and + associates them with the same code points. 
+ + The term "LDH code points" is defined in this document to mean the + code points associated with ASCII letters, digits, and the hyphen- + minus; that is, U+002D, 30..39, 41..5A, and 61..7A. "LDH" is an + abbreviation for "letters, digits, hyphen". + + [STD13] talks about "domain names" and "host names", but many people + use the terms interchangeably. Further, because [STD13] was not + terribly clear, many people who are sure they know the exact + definitions of each of these terms disagree on the definitions. In + this document the term "domain name" is used in general. This + document explicitly cites [STD3] whenever referring to the host name + syntax restrictions defined therein. + + A label is an individual part of a domain name. Labels are usually + shown separated by dots; for example, the domain name + "www.example.com" is composed of three labels: "www", "example", and + "com". (The zero-length root label described in [STD13], which can + be explicit as in "www.example.com." or implicit as in + "www.example.com", is not considered a label in this specification.) + IDNA extends the set of usable characters in labels that are text. + For the rest of this document, the term "label" is shorthand for + "text label", and "every label" means "every text label". + + + + + + +Faltstrom, et al. Standards Track [Page 5] + +RFC 3490 IDNA March 2003 + + + An "internationalized label" is a label to which the ToASCII + operation (see section 4) can be applied without failing (with the + UseSTD3ASCIIRules flag unset). This implies that every ASCII label + that satisfies the [STD13] length restriction is an internationalized + label. Therefore the term "internationalized label" is a + generalization, embracing both old ASCII labels and new non-ASCII + labels. Although most Unicode characters can appear in + internationalized labels, ToASCII will fail for some input strings, + and such strings are not valid internationalized labels. 
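The terminology above (the "U+" notation, the LDH code points, and the splitting of a domain name into labels) can be illustrated with a minimal Python sketch. The function names are hypothetical and do not come from the RFC; only the code point ranges and the "www.example.com" example are taken from the text.

```python
# LDH code points as listed in the Terminology section:
# U+002D, 30..39, 41..5A, and 61..7A.
LDH_RANGES = ((0x2D, 0x2D), (0x30, 0x39), (0x41, 0x5A), (0x61, 0x7A))


def is_ldh(code_point):
    """Return True if the integer code point is a letter, digit, or hyphen-minus."""
    return any(low <= code_point <= high for low, high in LDH_RANGES)


def u_plus(code_point):
    """Format a code point as "U+" followed by four to six hexadecimal digits."""
    return f"U+{code_point:04X}"


def split_labels(name):
    """Split a dotted ASCII domain name into its text labels.

    The zero-length root label (an explicit trailing dot) is dropped,
    because this specification does not count it as a label.
    """
    labels = name.split(".")
    if labels and labels[-1] == "":
        labels.pop()
    return labels
```

With these helpers, split_labels("www.example.com") and split_labels("www.example.com.") both yield the three labels "www", "example", and "com".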
+ + An "internationalized domain name" (IDN) is a domain name in which + every label is an internationalized label. This implies that every + ASCII domain name is an IDN (which implies that it is possible for a + name to be an IDN without it containing any non-ASCII characters). + This document does not attempt to define an "internationalized host + name". Just as has been the case with ASCII names, some DNS zone + administrators may impose restrictions, beyond those imposed by DNS + or IDNA, on the characters or strings that may be registered as + labels in their zones. Such restrictions have no impact on the + syntax or semantics of DNS protocol messages; a query for a name that + matches no records will yield the same response regardless of the + reason why it is not in the zone. Clients issuing queries or + interpreting responses cannot be assumed to have any knowledge of + zone-specific restrictions or conventions. + + In IDNA, equivalence of labels is defined in terms of the ToASCII + operation, which constructs an ASCII form for a given label, whether + or not the label was already an ASCII label. Labels are defined to + be equivalent if and only if their ASCII forms produced by ToASCII + match using a case-insensitive ASCII comparison. ASCII labels + already have a notion of equivalence: upper case and lower case are + considered equivalent. The IDNA notion of equivalence is an + extension of that older notion. Equivalent labels in IDNA are + treated as alternate forms of the same label, just as "foo" and "Foo" + are treated as alternate forms of the same label. + + To allow internationalized labels to be handled by existing + applications, IDNA uses an "ACE label" (ACE stands for ASCII + Compatible Encoding). An ACE label is an internationalized label + that can be rendered in ASCII and is equivalent to an + internationalized label that cannot be rendered in ASCII. 
Given any + internationalized label that cannot be rendered in ASCII, the ToASCII + operation will convert it to an equivalent ACE label (whereas an + ASCII label will be left unaltered by ToASCII). ACE labels are + unsuitable for display to users. The ToUnicode operation will + convert any label to an equivalent non-ACE label. In fact, an ACE + label is formally defined to be any label that the ToUnicode + operation would alter (whereas non-ACE labels are left unaltered by + + + +Faltstrom, et al. Standards Track [Page 6] + +RFC 3490 IDNA March 2003 + + + ToUnicode). Every ACE label begins with the ACE prefix specified in + section 5. The ToASCII and ToUnicode operations are specified in + section 4. + + The "ACE prefix" is defined in this document to be a string of ASCII + characters that appears at the beginning of every ACE label. It is + specified in section 5. + + A "domain name slot" is defined in this document to be a protocol + element or a function argument or a return value (and so on) + explicitly designated for carrying a domain name. Examples of domain + name slots include: the QNAME field of a DNS query; the name argument + of the gethostbyname() library function; the part of an email address + following the at-sign (@) in the From: field of an email message + header; and the host portion of the URI in the src attribute of an + HTML <IMG> tag. General text that just happens to contain a domain + name is not a domain name slot; for example, a domain name appearing + in the plain text body of an email message is not occupying a domain + name slot. + + An "IDN-aware domain name slot" is defined in this document to be a + domain name slot explicitly designated for carrying an + internationalized domain name as defined in this document. The + designation may be static (for example, in the specification of the + protocol or interface) or dynamic (for example, as a result of + negotiation in an interactive session). 
+ + An "IDN-unaware domain name slot" is defined in this document to be + any domain name slot that is not an IDN-aware domain name slot. + Obviously, this includes any domain name slot whose specification + predates IDNA. + +3. Requirements and applicability + +3.1 Requirements + + IDNA conformance means adherence to the following four requirements: + + 1) Whenever dots are used as label separators, the following + characters MUST be recognized as dots: U+002E (full stop), U+3002 + (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61 + (halfwidth ideographic full stop). + + 2) Whenever a domain name is put into an IDN-unaware domain name slot + (see section 2), it MUST contain only ASCII characters. Given an + internationalized domain name (IDN), an equivalent domain name + satisfying this requirement can be obtained by applying the + + + + +Faltstrom, et al. Standards Track [Page 7] + +RFC 3490 IDNA March 2003 + + + ToASCII operation (see section 4) to each label and, if dots are + used as label separators, changing all the label separators to + U+002E. + + 3) ACE labels obtained from domain name slots SHOULD be hidden from + users when it is known that the environment can handle the non-ACE + form, except when the ACE form is explicitly requested. When it + is not known whether or not the environment can handle the non-ACE + form, the application MAY use the non-ACE form (which might fail, + such as by not being displayed properly), or it MAY use the ACE + form (which will look unintelligible to the user). Given an + internationalized domain name, an equivalent domain name + containing no ACE labels can be obtained by applying the ToUnicode + operation (see section 4) to each label. When requirements 2 and + 3 both apply, requirement 2 takes precedence. 
+ + 4) Whenever two labels are compared, they MUST be considered to match + if and only if they are equivalent, that is, their ASCII forms + (obtained by applying ToASCII) match using a case-insensitive + ASCII comparison. Whenever two names are compared, they MUST be + considered to match if and only if their corresponding labels + match, regardless of whether the names use the same forms of label + separators. + +3.2 Applicability + + IDNA is applicable to all domain names in all domain name slots + except where it is explicitly excluded. + + This implies that IDNA is applicable to many protocols that predate + IDNA. Note that IDNs occupying domain name slots in those protocols + MUST be in ASCII form (see section 3.1, requirement 2). + +3.2.1. DNS resource records + + IDNA does not apply to domain names in the NAME and RDATA fields of + DNS resource records whose CLASS is not IN. This exclusion applies + to every non-IN class, present and future, except where future + standards override this exclusion by explicitly inviting the use of + IDNA. + + There are currently no other exclusions on the applicability of IDNA + to DNS resource records; it depends entirely on the CLASS, and not on + the TYPE. This will remain true, even as new types are defined, + unless there is a compelling reason for a new type to complicate + matters by imposing type-specific rules. + + + + + +Faltstrom, et al. Standards Track [Page 8] + +RFC 3490 IDNA March 2003 + + +3.2.2. Non-domain-name data types stored in domain names + + Although IDNA enables the representation of non-ASCII characters in + domain names, that does not imply that IDNA enables the + representation of non-ASCII characters in other data types that are + stored in domain names. For example, an email address local part is + sometimes stored in a domain label (hostmaster@example.com would be + represented as hostmaster.example.com in the RDATA field of an SOA + record). 
IDNA does not update the existing email standards, which + allow only ASCII characters in local parts. Therefore, unless the + email standards are revised to invite the use of IDNA for local + parts, a domain label that holds the local part of an email address + SHOULD NOT begin with the ACE prefix, and even if it does, it is to + be interpreted literally as a local part that happens to begin with + the ACE prefix. + +4. Conversion operations + + An application converts a domain name put into an IDN-unaware slot or + displayed to a user. This section specifies the steps to perform in + the conversion, and the ToASCII and ToUnicode operations. + + The input to ToASCII or ToUnicode is a single label that is a + sequence of Unicode code points (remember that all ASCII code points + are also Unicode code points). If a domain name is represented using + a character set other than Unicode or US-ASCII, it will first need to + be transcoded to Unicode. + + Starting from a whole domain name, the steps that an application + takes to do the conversions are: + + 1) Decide whether the domain name is a "stored string" or a "query + string" as described in [STRINGPREP]. If this conversion follows + the "queries" rule from [STRINGPREP], set the flag called + "AllowUnassigned". + + 2) Split the domain name into individual labels as described in + section 3.1. The labels do not include the separator. + + 3) For each label, decide whether or not to enforce the restrictions + on ASCII characters in host names [STD3]. (Applications already + faced this choice before the introduction of IDNA, and can + continue to make the decision the same way they always have; IDNA + makes no new recommendations regarding this choice.) If the + restrictions are to be enforced, set the flag called + "UseSTD3ASCIIRules" for that label. + + + + + +Faltstrom, et al. 
Standards Track [Page 9] + +RFC 3490 IDNA March 2003 + + + 4) Process each label with either the ToASCII or the ToUnicode + operation as appropriate. Typically, you use the ToASCII + operation if you are about to put the name into an IDN-unaware + slot, and you use the ToUnicode operation if you are displaying + the name to a user; section 3.1 gives greater detail on the + applicable requirements. + + 5) If ToASCII was applied in step 4 and dots are used as label + separators, change all the label separators to U+002E (full stop). + + The following two subsections define the ToASCII and ToUnicode + operations that are used in step 4. + + This description of the protocol uses specific procedure names, names + of flags, and so on, in order to facilitate the specification of the + protocol. These names, as well as the actual steps of the + procedures, are not required of an implementation. In fact, any + implementation which has the same external behavior as specified in + this document conforms to this specification. + +4.1 ToASCII + + The ToASCII operation takes a sequence of Unicode code points that + make up one label and transforms it into a sequence of code points in + the ASCII range (0..7F). If ToASCII succeeds, the original sequence + and the resulting sequence are equivalent labels. + + It is important to note that the ToASCII operation can fail. ToASCII + fails if any step of it fails. If any step of the ToASCII operation + fails on any label in a domain name, that domain name MUST NOT be + used as an internationalized domain name. The method for dealing + with this failure is application-specific. + + The inputs to ToASCII are a sequence of code points, the + AllowUnassigned flag, and the UseSTD3ASCIIRules flag. The output of + ToASCII is either a sequence of ASCII code points or a failure + condition. + + ToASCII never alters a sequence of code points that are all in the + ASCII range to begin with (although it could fail). 
Applying the + ToASCII operation multiple times has exactly the same effect as + applying it just once. + + ToASCII consists of the following steps: + + 1. If the sequence contains any code points outside the ASCII range + (0..7F) then proceed to step 2, otherwise skip to step 3. + + + + +Faltstrom, et al. Standards Track [Page 10] + +RFC 3490 IDNA March 2003 + + + 2. Perform the steps specified in [NAMEPREP] and fail if there is an + error. The AllowUnassigned flag is used in [NAMEPREP]. + + 3. If the UseSTD3ASCIIRules flag is set, then perform these checks: + + (a) Verify the absence of non-LDH ASCII code points; that is, the + absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F. + + (b) Verify the absence of leading and trailing hyphen-minus; that + is, the absence of U+002D at the beginning and end of the + sequence. + + 4. If the sequence contains any code points outside the ASCII range + (0..7F) then proceed to step 5, otherwise skip to step 8. + + 5. Verify that the sequence does NOT begin with the ACE prefix. + + 6. Encode the sequence using the encoding algorithm in [PUNYCODE] and + fail if there is an error. + + 7. Prepend the ACE prefix. + + 8. Verify that the number of code points is in the range 1 to 63 + inclusive. + +4.2 ToUnicode + + The ToUnicode operation takes a sequence of Unicode code points that + make up one label and returns a sequence of Unicode code points. If + the input sequence is a label in ACE form, then the result is an + equivalent internationalized label that is not in ACE form, otherwise + the original sequence is returned unaltered. + + ToUnicode never fails. If any step fails, then the original input + sequence is returned immediately in that step. + + The ToUnicode output never contains more code points than its input. + Note that the number of octets needed to represent a sequence of code + points depends on the particular character encoding used. 
+ + The inputs to ToUnicode are a sequence of code points, the + AllowUnassigned flag, and the UseSTD3ASCIIRules flag. The output of + ToUnicode is always a sequence of Unicode code points. + + 1. If all code points in the sequence are in the ASCII range (0..7F) + then skip to step 3. + + + + + +Faltstrom, et al. Standards Track [Page 11] + +RFC 3490 IDNA March 2003 + + + 2. Perform the steps specified in [NAMEPREP] and fail if there is an + error. (If step 3 of ToASCII is also performed here, it will not + affect the overall behavior of ToUnicode, but it is not + necessary.) The AllowUnassigned flag is used in [NAMEPREP]. + + 3. Verify that the sequence begins with the ACE prefix, and save a + copy of the sequence. + + 4. Remove the ACE prefix. + + 5. Decode the sequence using the decoding algorithm in [PUNYCODE] and + fail if there is an error. Save a copy of the result of this + step. + + 6. Apply ToASCII. + + 7. Verify that the result of step 6 matches the saved copy from step + 3, using a case-insensitive ASCII comparison. + + 8. Return the saved copy from step 5. + +5. ACE prefix + + The ACE prefix, used in the conversion operations (section 4), is two + alphanumeric ASCII characters followed by two hyphen-minuses. It + cannot be any of the prefixes already used in earlier documents, + which includes the following: "bl--", "bq--", "dq--", "lq--", "mq--", + "ra--", "wq--" and "zq--". The ToASCII and ToUnicode operations MUST + recognize the ACE prefix in a case-insensitive manner. + + The ACE prefix for IDNA is "xn--" or any capitalization thereof. + + This means that an ACE label might be "xn--de-jg4avhby1noc0d", where + "de-jg4avhby1noc0d" is the part of the ACE label that is generated by + the encoding steps in [PUNYCODE]. + + While all ACE labels begin with the ACE prefix, not all labels + beginning with the ACE prefix are necessarily ACE labels. Non-ACE + labels that begin with the ACE prefix will confuse users and SHOULD + NOT be allowed in DNS zones. 
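The ToASCII and ToUnicode steps of section 4, together with the dot handling and label comparison of section 3.1, can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions, not a conforming implementation: it relies on CPython's standard-library nameprep function (from encodings.idna) and "punycode" codec as stand-ins for [NAMEPREP] and [PUNYCODE], it does not model the AllowUnassigned flag, and all function and exception names are hypothetical.

```python
import re
from encodings.idna import nameprep  # stdlib stand-in for [NAMEPREP] (assumption)

ACE_PREFIX = "xn--"
# Section 3.1, requirement 1: all four characters are label separators.
DOT_PATTERN = re.compile("[\u002e\u3002\uff0e\uff61]")


class ToASCIIError(ValueError):
    """Hypothetical exception type signalling that ToASCII failed."""


def _is_ascii(label):
    return all(ord(ch) <= 0x7F for ch in label)


def to_ascii(label, use_std3_ascii_rules=False):
    """Sketch of ToASCII (section 4.1) for a single label."""
    try:
        # Steps 1-2: run Nameprep only if a non-ASCII code point is present.
        if not _is_ascii(label):
            label = nameprep(label)
        # Step 3: optional host name checks from [STD3].
        if use_std3_ascii_rules:
            if any(ord(ch) <= 0x7F and not (ch.isalnum() or ch == "-")
                   for ch in label):
                raise ToASCIIError("non-LDH ASCII code point")
            if label.startswith("-") or label.endswith("-"):
                raise ToASCIIError("leading or trailing hyphen-minus")
        # Steps 4-7: if still non-ASCII, reject an existing ACE prefix,
        # Punycode-encode, and prepend the prefix.
        if not _is_ascii(label):
            if label[:4].lower() == ACE_PREFIX:
                raise ToASCIIError("label already begins with the ACE prefix")
            label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    except UnicodeError as exc:
        raise ToASCIIError(str(exc)) from exc
    # Step 8: the result must be 1 to 63 code points long.
    if not 1 <= len(label) <= 63:
        raise ToASCIIError("label length outside 1..63")
    return label


def to_unicode(label):
    """Sketch of ToUnicode (section 4.2); returns the input on any failure."""
    original = label
    try:
        if not _is_ascii(label):                      # steps 1-2
            label = nameprep(label)
        if label[:4].lower() != ACE_PREFIX:           # step 3
            return original
        decoded = label[4:].encode("ascii").decode("punycode")  # steps 4-5
        if to_ascii(decoded).lower() != label.lower():          # steps 6-7
            return original
        return decoded                                # step 8
    except ValueError:  # covers UnicodeError and ToASCIIError
        return original


def domain_to_ascii(name):
    """Split on any recognized dot, apply ToASCII per label, rejoin with U+002E."""
    labels = DOT_PATTERN.split(name)
    has_root = len(labels) > 1 and labels[-1] == ""
    if has_root:
        labels.pop()  # the zero-length root label is not a text label
    ascii_name = ".".join(to_ascii(label) for label in labels)
    return (ascii_name + ".") if has_root else ascii_name


def labels_match(a, b):
    """Section 3.1, requirement 4: compare ASCII forms case-insensitively."""
    return to_ascii(a).lower() == to_ascii(b).lower()
```

In this sketch a failure of ToASCII surfaces as an exception, which the caller has to handle in an application-specific way, while to_unicode simply hands back its input, mirroring the rule that ToUnicode never fails.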
+ + + + + + + + + + + +Faltstrom, et al. Standards Track [Page 12] + +RFC 3490 IDNA March 2003 + + +6. Implications for typical applications using DNS + + In IDNA, applications perform the processing needed to input + internationalized domain names from users, display internationalized + domain names to users, and process the inputs and outputs from DNS + and other protocols that carry domain names. + + The components and interfaces between them can be represented + pictorially as: + + +------+ + | User | + +------+ + ^ + | Input and display: local interface methods + | (pen, keyboard, glowing phosphorus, ...) + +-------------------|-------------------------------+ + | v | + | +-----------------------------+ | + | | Application | | + | | (ToASCII and ToUnicode | | + | | operations may be | | + | | called here) | | + | +-----------------------------+ | + | ^ ^ | End system + | | | | + | Call to resolver: | | Application-specific | + | ACE | | protocol: | + | v | ACE unless the | + | +----------+ | protocol is updated | + | | Resolver | | to handle other | + | +----------+ | encodings | + | ^ | | + +-----------------|----------|----------------------+ + DNS protocol: | | + ACE | | + v v + +-------------+ +---------------------+ + | DNS servers | | Application servers | + +-------------+ +---------------------+ + + The box labeled "Application" is where the application splits a + domain name into labels, sets the appropriate flags, and performs the + ToASCII and ToUnicode operations. This is described in section 4. + + + + + + + +Faltstrom, et al. Standards Track [Page 13] + +RFC 3490 IDNA March 2003 + + +6.1 Entry and display in applications + + Applications can accept domain names using any character set or sets + desired by the application developer, and can display domain names in + any charset. That is, the IDNA protocol does not affect the + interface between users and applications. 
+ + An IDNA-aware application can accept and display internationalized + domain names in two formats: the internationalized character set(s) + supported by the application, and as an ACE label. ACE labels that + are displayed or input MUST always include the ACE prefix. + Applications MAY allow input and display of ACE labels, but are not + encouraged to do so except as an interface for special purposes, + possibly for debugging, or to cope with display limitations as + described in section 6.4. ACE encoding is opaque and ugly, and + should thus only be exposed to users who absolutely need it. Because + name labels encoded as ACE name labels can be rendered either as the + encoded ASCII characters or the proper decoded characters, the + application MAY have an option for the user to select the preferred + method of display; if it does, rendering the ACE SHOULD NOT be the + default. + + Domain names are often stored and transported in many places. For + example, they are part of documents such as mail messages and web + pages. They are transported in many parts of many protocols, such as + both the control commands and the RFC 2822 body parts of SMTP, and + the headers and the body content in HTTP. It is important to + remember that domain names appear both in domain name slots and in + the content that is passed over protocols. + + In protocols and document formats that define how to handle + specification or negotiation of charsets, labels can be encoded in + any charset allowed by the protocol or document format. If a + protocol or document format only allows one charset, the labels MUST + be given in that charset. + + In any place where a protocol or document format allows transmission + of the characters in internationalized labels, internationalized + labels SHOULD be transmitted using whatever character encoding and + escape mechanism that the protocol or document format uses at that + place. 
+ + All protocols that use domain name slots already have the capacity + for handling domain names in the ASCII charset. Thus, ACE labels + (internationalized labels that have been processed with the ToASCII + operation) can inherently be handled by those protocols. + + + + + +Faltstrom, et al. Standards Track [Page 14] + +RFC 3490 IDNA March 2003 + + +6.2 Applications and resolver libraries + + Applications normally use functions in the operating system when they + resolve DNS queries. Those functions in the operating system are + often called "the resolver library", and the applications communicate + with the resolver libraries through a programming interface (API). + + Because these resolver libraries today expect only domain names in + ASCII, applications MUST prepare labels that are passed to the + resolver library using the ToASCII operation. Labels received from + the resolver library contain only ASCII characters; internationalized + labels that cannot be represented directly in ASCII use the ACE form. + ACE labels always include the ACE prefix. + + An operating system might have a set of libraries for performing the + ToASCII operation. The input to such a library might be in one or + more charsets that are used in applications (UTF-8 and UTF-16 are + likely candidates for almost any operating system, and script- + specific charsets are likely for localized operating systems). + + IDNA-aware applications MUST be able to work with both non- + internationalized labels (those that conform to [STD13] and [STD3]) + and internationalized labels. + + It is expected that new versions of the resolver libraries in the + future will be able to accept domain names in other charsets than + ASCII, and application developers might one day pass not only domain + names in Unicode, but also in local script to a new API for the + resolver libraries in the operating system. 
Thus the ToASCII and + ToUnicode operations might be performed inside these new versions of + the resolver libraries. + + Domain names passed to resolvers or put into the question section of + DNS requests follow the rules for "queries" from [STRINGPREP]. + +6.3 DNS servers + + Domain names stored in zones follow the rules for "stored strings" + from [STRINGPREP]. + + For internationalized labels that cannot be represented directly in + ASCII, DNS servers MUST use the ACE form produced by the ToASCII + operation. All IDNs served by DNS servers MUST contain only ASCII + characters. + + If a signaling system which makes negotiation possible between old + and new DNS clients and servers is standardized in the future, the + encoding of the query in the DNS protocol itself can be changed from + + + +Faltstrom, et al. Standards Track [Page 15] + +RFC 3490 IDNA March 2003 + + + ACE to something else, such as UTF-8. The question whether or not + this should be used is, however, a separate problem and is not + discussed in this memo. + +6.4 Avoiding exposing users to the raw ACE encoding + + Any application that might show the user a domain name obtained from + a domain name slot, such as from gethostbyaddr or part of a mail + header, will need to be updated if it is to prevent users from seeing + the ACE. + + If an application decodes an ACE name using ToUnicode but cannot show + all of the characters in the decoded name, such as if the name + contains characters that the output system cannot display, the + application SHOULD show the name in ACE format (which always includes + the ACE prefix) instead of displaying the name with the replacement + character (U+FFFD). This is to make it easier for the user to + transfer the name correctly to other programs. 
Programs that by + default show the ACE form when they cannot show all the characters in + a name label SHOULD also have a mechanism to show the name that is + produced by the ToUnicode operation with as many characters as + possible and replacement characters in the positions where characters + cannot be displayed. + + The ToUnicode operation does not alter labels that are not valid ACE + labels, even if they begin with the ACE prefix. After ToUnicode has + been applied, if a label still begins with the ACE prefix, then it is + not a valid ACE label, and is not equivalent to any of the + intermediate Unicode strings constructed by ToUnicode. + +6.5 DNSSEC authentication of IDN domain names + + DNS Security [RFC2535] is a method for supplying cryptographic + verification information along with DNS messages. Public Key + Cryptography is used in conjunction with digital signatures to + provide a means for a requester of domain information to authenticate + the source of the data. This ensures that it can be traced back to a + trusted source, either directly, or via a chain of trust linking the + source of the information to the top of the DNS hierarchy. + + IDNA specifies that all internationalized domain names served by DNS + servers that cannot be represented directly in ASCII must use the ACE + form produced by the ToASCII operation. This operation must be + performed prior to a zone being signed by the private key for that + zone. Because of this ordering, it is important to recognize that + DNSSEC authenticates the ASCII domain name, not the Unicode form or + + + + + +Faltstrom, et al. Standards Track [Page 16] + +RFC 3490 IDNA March 2003 + + + the mapping between the Unicode form and the ASCII form. In the + presence of DNSSEC, this is the name that MUST be signed in the zone + and MUST be validated against. 
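Returning to the display rule of section 6.4, the following is a minimal sketch of the fallback it describes. It again relies on Python's standard encodings.idna module. The function name display_label and its parameters are hypothetical, and "the output system can display a character" is approximated here by "the character can be encoded in a given charset", which is an assumption made only for illustration.

```python
from encodings import idna  # standard-library ToUnicode (RFC 3490)


def _displayable(ch: str, charset: str) -> bool:
    """Stand-in test: treat a character as displayable if the charset has it."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def display_label(label: str, charset: str, partial: bool = False) -> str:
    """Pick the form of one label to show on an output limited to charset."""
    try:
        decoded = idna.ToUnicode(label)
    except UnicodeError:
        # Not a valid ACE label: per the RFC it is left unaltered.
        return label
    if all(_displayable(ch, charset) for ch in decoded):
        return decoded
    if partial:
        # Optional mechanism: show what can be shown, U+FFFD elsewhere.
        return "".join(
            ch if _displayable(ch, charset) else "\ufffd" for ch in decoded
        )
    # Default: show the ACE form so the user can copy the name correctly.
    return label
```

With a Latin-1 output, display_label("xn--bcher-kva", "latin-1") yields the decoded label, while an ASCII-only output gets the ACE form back unless partial display is requested.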
+ + One consequence of this for sites deploying IDNA in the presence of + DNSSEC is that any special purpose proxies or forwarders used to + transform user input into IDNs must be earlier in the resolution flow + than DNSSEC authenticating nameservers for DNSSEC to work. + +7. Name server considerations + + Existing DNS servers do not know the IDNA rules for handling non- + ASCII forms of IDNs, and therefore need to be shielded from them. + All existing channels through which names can enter a DNS server + database (for example, master files [STD13] and DNS update messages + [RFC2136]) are IDN-unaware because they predate IDNA, and therefore + requirement 2 of section 3.1 of this document provides the needed + shielding, by ensuring that internationalized domain names entering + DNS server databases through such channels have already been + converted to their equivalent ASCII forms. + + It is imperative that there be only one ASCII encoding for a + particular domain name. Because of the design of the ToASCII and + ToUnicode operations, there are no ACE labels that decode to ASCII + labels, and therefore name servers cannot contain multiple ASCII + encodings of the same domain name. + + [RFC2181] explicitly allows domain labels to contain octets beyond + the ASCII range (0..7F), and this document does not change that. + Note, however, that there is no defined interpretation of octets + 80..FF as characters. If labels containing these octets are returned + to applications, unpredictable behavior could result. The ASCII form + defined by ToASCII is the only standard representation for + internationalized labels in the current DNS protocol. + +8. Root server considerations + + IDNs are likely to be somewhat longer than current domain names, so + the bandwidth needed by the root servers is likely to go up by a + small amount. 
Also, queries and responses for IDNs will probably be + somewhat longer than typical queries today, so more queries and + responses may be forced to go to TCP instead of UDP. + + + + + + + + + +Faltstrom, et al. Standards Track [Page 17] + +RFC 3490 IDNA March 2003 + + +9. References + +9.1 Normative References + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + [STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of + Internationalized Strings ("stringprep")", RFC 3454, + December 2002. + + [NAMEPREP] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep + Profile for Internationalized Domain Names (IDN)", RFC + 3491, March 2003. + + [PUNYCODE] Costello, A., "Punycode: A Bootstring encoding of + Unicode for use with Internationalized Domain Names in + Applications (IDNA)", RFC 3492, March 2003. + + [STD3] Braden, R., "Requirements for Internet Hosts -- + Communication Layers", STD 3, RFC 1122, and + "Requirements for Internet Hosts -- Application and + Support", STD 3, RFC 1123, October 1989. + + [STD13] Mockapetris, P., "Domain names - concepts and + facilities", STD 13, RFC 1034 and "Domain names - + implementation and specification", STD 13, RFC 1035, + November 1987. + +9.2 Informative References + + [RFC2535] Eastlake, D., "Domain Name System Security Extensions", + RFC 2535, March 1999. + + [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS + Specification", RFC 2181, July 1997. + + [UAX9] Unicode Standard Annex #9, The Bidirectional Algorithm, + <http://www.unicode.org/unicode/reports/tr9/>. + + [UNICODE] The Unicode Consortium. The Unicode Standard, Version + 3.2.0 is defined by The Unicode Standard, Version 3.0 + (Reading, MA, Addison-Wesley, 2000. ISBN 0-201-61633-5), + as amended by the Unicode Standard Annex #27: Unicode + 3.1 (http://www.unicode.org/reports/tr27/) and by the + Unicode Standard Annex #28: Unicode 3.2 + (http://www.unicode.org/reports/tr28/). 
+ + + + +Faltstrom, et al. Standards Track [Page 18] + +RFC 3490 IDNA March 2003 + + + [USASCII] Cerf, V., "ASCII format for Network Interchange", RFC + 20, October 1969. + +10. Security Considerations + + Security on the Internet partly relies on the DNS. Thus, any change + to the characteristics of the DNS can change the security of much of + the Internet. + + This memo describes an algorithm which encodes characters that are + not valid according to STD3 and STD13 into octet values that are + valid. No security issues such as string length increases or new + allowed values are introduced by the encoding process or the use of + these encoded values, apart from those introduced by the ACE encoding + itself. + + Domain names are used by users to identify and connect to Internet + servers. The security of the Internet is compromised if a user + entering a single internationalized name is connected to different + servers based on different interpretations of the internationalized + domain name. + + When systems use local character sets other than ASCII and Unicode, + this specification leaves the problem of transcoding between the + local character set and Unicode up to the application. If different + applications (or different versions of one application) implement + different transcoding rules, they could interpret the same name + differently and contact different servers. This problem is not + solved by security protocols like TLS that do not take local + character sets into account. + + Because this document normatively refers to [NAMEPREP], [PUNYCODE], + and [STRINGPREP], it includes the security considerations from those + documents as well. + + If or when this specification is updated to use a more recent Unicode + normalization table, the new normalization table will need to be + compared with the old to spot backwards incompatible changes.
If + there are such changes, they will need to be handled somehow, or + there will be security as well as operational implications. Methods + to handle the conflicts could include keeping the old normalization, + or taking care of the conflicting characters by operational means, or + some other method. + + Implementations MUST NOT use more recent normalization tables than + the one referenced from this document, even though more recent tables + may be provided by operating systems. If an application is unsure of + which version of the normalization tables is in the operating + + + +Faltstrom, et al. Standards Track [Page 19] + +RFC 3490 IDNA March 2003 + + + system, the application needs to include the normalization tables + itself. Using normalization tables other than the one referenced + from this specification could have security and operational + implications. + + To help prevent confusion between characters that are visually + similar, it is suggested that implementations provide visual + indications where a domain name contains multiple scripts. Such + mechanisms can also be used to show when a name contains a mixture of + simplified and traditional Chinese characters, or to distinguish zero + and one from O and l. DNS zone administrators may impose restrictions + (subject to the limitations in section 2) that try to minimize + homographs. + + Domain names (or portions of them) are sometimes compared against a + set of privileged or anti-privileged domains. In such situations it + is especially important that the comparisons be done properly, as + specified in section 3.1 requirement 4. For labels already in ASCII + form, the proper comparison reduces to the same case-insensitive + ASCII comparison that has always been used for ASCII labels.
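The comparison rule referred to above (section 3.1, requirement 4) can be sketched as follows: apply ToASCII to both labels and compare the results as case-insensitive ASCII. The sketch uses Python's standard encodings.idna module; the helper names labels_match and names_match are hypothetical, and the code assumes names are split on U+002E only and have no trailing dot.

```python
from encodings import idna  # standard-library ToASCII (RFC 3490)


def labels_match(a: str, b: str) -> bool:
    """Compare two labels via ToASCII, ignoring ASCII case."""
    return idna.ToASCII(a).lower() == idna.ToASCII(b).lower()


def names_match(a: str, b: str) -> bool:
    """Compare two domain names label by label."""
    la, lb = a.split("."), b.split(".")
    return len(la) == len(lb) and all(
        labels_match(x, y) for x, y in zip(la, lb)
    )


# An upper-case Unicode spelling and the ACE spelling name the same domain.
same = names_match("B\u00dcCHER.example", "xn--bcher-kva.EXAMPLE")
```

For labels that are already ASCII, ToASCII leaves the characters as they are, so the check degenerates to the familiar case-insensitive comparison mentioned in the text. A list of privileged domains would typically be stored in lower-cased ACE form so that each lookup needs only one ToASCII pass.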
+ + The introduction of IDNA means that any existing labels that start + with the ACE prefix and would be altered by ToUnicode will + automatically be ACE labels, and will be considered equivalent to + non-ASCII labels, whether or not that was the intent of the zone + administrator or registrant. + +11. IANA Considerations + + IANA has assigned the ACE prefix in consultation with the IESG. + + + + + + + + + + + + + + + + + + + + + +Faltstrom, et al. Standards Track [Page 20] + +RFC 3490 IDNA March 2003 + + +12. Authors' Addresses + + Patrik Faltstrom + Cisco Systems + Arstaangsvagen 31 J + S-117 43 Stockholm Sweden + + EMail: paf@cisco.com + + + Paul Hoffman + Internet Mail Consortium and VPN Consortium + 127 Segre Place + Santa Cruz, CA 95060 USA + + EMail: phoffman@imc.org + + + Adam M. Costello + University of California, Berkeley + + URL: http://www.nicemice.net/amc/ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Faltstrom, et al. Standards Track [Page 21] + +RFC 3490 IDNA March 2003 + + +13. Full Copyright Statement + + Copyright (C) The Internet Society (2003). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English.
+ + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + + + + + + + + + + + + + +Faltstrom, et al. Standards Track [Page 22] +