Diffstat (limited to 'doc/rfc/rfc6055.txt')
-rw-r--r-- doc/rfc/rfc6055.txt 1347
1 files changed, 1347 insertions, 0 deletions
diff --git a/doc/rfc/rfc6055.txt b/doc/rfc/rfc6055.txt
new file mode 100644
index 0000000..81e7098
--- /dev/null
+++ b/doc/rfc/rfc6055.txt
@@ -0,0 +1,1347 @@
+
+
+
+
+
+
+Internet Architecture Board (IAB)                              D. Thaler
+Request for Comments: 6055                                     Microsoft
+Updates: 2130                                                 J. Klensin
+Category: Informational
+ISSN: 2070-1721                                              S. Cheshire
+                                                                   Apple
+                                                           February 2011
+
+
+ IAB Thoughts on Encodings for Internationalized Domain Names
+
+Abstract
+
+ This document explores issues with Internationalized Domain Names
+ (IDNs) that result from the use of various encoding schemes such as
+ UTF-8 and the ASCII-Compatible Encoding produced by the Punycode
+ algorithm. It focuses on the importance of agreeing on a single
+ encoding and how complicated the state of affairs ends up being as a
+ result of using different encodings today.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for informational purposes.
+
+ This document is a product of the Internet Architecture Board (IAB)
+ and represents information that the IAB has deemed valuable to
+ provide for permanent record. Documents approved for publication by
+ the IAB are not a candidate for any level of Internet Standard; see
+ Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc6055.
+
+Copyright Notice
+
+ Copyright (c) 2011 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document.
+
+
+
+
+
+Thaler, et al. Informational [Page 1]
+
+RFC 6055 IDN Encodings February 2011
+
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2
+ 1.1. APIs . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
+ 2. Use of Non-DNS Protocols . . . . . . . . . . . . . . . . . . . 9
+ 3. Use of Non-ASCII in DNS . . . . . . . . . . . . . . . . . . . 10
+ 3.1. Examples . . . . . . . . . . . . . . . . . . . . . . . . . 14
+ 4. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 16
+ 5. Security Considerations . . . . . . . . . . . . . . . . . . . 18
+ 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19
+ 7. IAB Members at the Time of Approval . . . . . . . . . . . . . 19
+ 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
+ 8.1. Normative References . . . . . . . . . . . . . . . . . . . 20
+ 8.2. Informative References . . . . . . . . . . . . . . . . . . 20
+
+1. Introduction
+
+ The goal of this document is to explore what can be learned from some
+ current difficulties in implementing Internationalized Domain Names
+ (IDNs).
+
+ A domain name consists of a sequence of labels, conventionally
+ written separated by dots. An IDN is a domain name that contains one
+ or more labels that, in turn, contain one or more non-ASCII
+ characters. Just as with plain ASCII domain names, each IDN label
+ must be encoded using some mechanism before it can be transmitted in
+ network packets, stored in memory, stored on disk, etc. These
+ encodings need to be reversible, but they need not store domain names
+ the same way humans conventionally write them on paper. For example,
+ when transmitted over the network in DNS packets, domain name labels
+ are *not* separated with dots.
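+
+ As a rough illustration of the last point, the sketch below encodes
+ a name the way it travels in a DNS message: each label is preceded
+ by a one-octet length, and a zero octet (the root label) ends the
+ name. The function name to_wire() is hypothetical, and the length
+ limits are the ones quoted from [RFC2181] in Section 3.

```python
def to_wire(labels):
    """Encode a list of byte-string labels in DNS wire format.

    Each label is preceded by a one-octet length; a zero octet
    (the empty root label) terminates the name. No dots appear.
    """
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("label must be 1 to 63 octets long")
        out.append(len(label))
        out += label
    out.append(0)  # the root label
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 octets")
    return bytes(out)


wire = to_wire([b"www", b"example", b"com"])
print(wire)
```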
+
+ Internationalized Domain Names for Applications (IDNA), discussed
+ later in this document, is the standard that defines the use and
+ coding of internationalized domain names for use on the public
+ Internet [RFC5890]. An earlier version of IDNA [RFC3490] is now
+ being phased out. Except where noted, the two versions are
+ approximately the same with regard to the issues discussed in this
+ document. However, some explanations appeared in the earlier
+ documents that were no longer considered useful when the later
+ revision was created; they are quoted here from the documents in
+ which they appear. In addition, the terminology of the two versions
+ differs somewhat; this document reflects the terminology of the
+ current version.
+
+ Unicode [Unicode] is a list of characters (including non-spacing
+ marks that are used to form some other characters), where each
+ character is assigned an integer value, called a code point. In
+ simple terms a Unicode string is a string of integer code point
+ values in the range 0 to 1,114,111 (10FFFF in base 16). These
+ integer code points must be encoded using some mechanism before they
+ can be transmitted in network packets, stored in memory, stored on
+ disk, etc. Some common ways of encoding these integer code point
+ values in computer systems include UTF-8, UTF-16, and UTF-32. In
+ addition to the material below, those forms and the tradeoffs among
+ them are discussed in Chapter 2 of The Unicode Standard [Unicode].
+
+ UTF-8 is a mechanism for encoding a Unicode code point in a variable
+ number of 8-bit octets, where an ASCII code point is preserved as-is.
+ Those octets encode a string of integer code point values, which
+ represent a string of Unicode characters. The authoritative
+ definition of UTF-8 is in Sections 3.9 and 3.10 of The Unicode
+ Standard [Unicode], but a description of UTF-8 encoding can also be
+ found in RFC 3629 [RFC3629]. Descriptions and formulae can also be
+ found in Annex D of ISO/IEC 10646-1 [10646].
+
+ UTF-16 is a mechanism for encoding a Unicode code point in one or two
+ 16-bit integers, described in detail in Sections 3.9 and 3.10 of The
+ Unicode Standard [Unicode]. A UTF-16 string encodes a string of
+ integer code point values that represent a string of Unicode
+ characters.
+
+ UTF-32 (formerly UCS-4), also described in Sections 3.9 and 3.10 of
+ The Unicode Standard [Unicode], is a mechanism for encoding a Unicode
+ code point in a single 32-bit integer. A UTF-32 string is thus a
+ string of 32-bit integer code point values, which represent a string
+ of Unicode characters.
+
+ Note that UTF-16 results in some all-zero octets when code points
+ occur early in the Unicode sequence, and UTF-32 always has all-zero
+ octets.
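+
+ The differences among the three encoding forms can be seen by
+ encoding a few sample characters, as in the sketch below. The helper
+ name encoded_forms() and the sample characters are illustrative
+ choices, and the big-endian variants are used only so that no byte
+ order mark is added to the output.

```python
def encoded_forms(ch):
    """Return the UTF-8, UTF-16, and UTF-32 octets for one character."""
    # Big-endian variants are used so that no byte order mark is emitted.
    return {enc: ch.encode(enc) for enc in ("utf-8", "utf-16-be", "utf-32-be")}


# An ASCII letter, a Latin-1 letter, a CJK ideograph, and a code point
# beyond U+FFFF (which needs two 16-bit units in UTF-16).
for ch in ("A", "\u00e9", "\u65e5", "\U0001F600"):
    forms = encoded_forms(ch)
    print("U+%04X" % ord(ch), {enc: octets.hex() for enc, octets in forms.items()})
```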
+
+ IDNA specifies validity of a label, such as what characters it can
+ contain, relationships among them, and so on, in Unicode terms.
+ Valid labels can be in either "U-label" or "A-label" form, with the
+ appropriate one determined by particular protocols or by context.
+ U-label form is a direct representation of the Unicode characters
+ using one of the encoding forms discussed above. This document
+ discusses UTF-8 strings in many places. While all U-labels can be
+ represented by UTF-8 strings, not all UTF-8 strings are valid
+ U-labels (see Section 2.3.2 of the IDNA Definitions document
+ [RFC5890] for a discussion of these distinctions). A-label form uses
+ a compressed, ASCII-compatible encoding (an "ACE" in IDNA and other
+ terminology) produced by an algorithm called Punycode. U-labels and
+ A-labels are duals of each other: transformations from one to the
+ other do not lose information. The transformation mechanisms are
+ specified in the IDNA Protocol document [RFC5891].
+
+ Punycode [RFC3492] is thus a mechanism for encoding a Unicode string
+ in an ASCII-compatible encoding, i.e., using only letters, digits,
+ and hyphens from the ASCII character set. When a Unicode label that
+ is valid under the IDNA rules (a U-label) is encoded with Punycode
+ for IDNA purposes, it is prefixed with "xn--"; the result is called
+ an A-label. The prefix convention assumes that no other DNS labels
+ (at least no other DNS labels in IDNA-aware applications) are allowed
+ to start with these four characters. Consequently, when A-label
+ encoding is assumed, any DNS labels beginning with "xn--" now have a
+ different meaning (the Punycode encoding of a label containing one or
+ more non-ASCII characters) or no defined meaning at all (in the case
+ of labels that are not IDNA-compliant, i.e., are not well-formed
+ A-labels).
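+
+ The relationship between the two label forms can be sketched with
+ the "punycode" codec in the Python standard library. This is only
+ an approximation under stated assumptions: the helper names
+ to_a_label() and to_u_label() are hypothetical, the input is assumed
+ to be an already valid, lowercase U-label, and none of the IDNA
+ validity checks are performed.

```python
ACE_PREFIX = "xn--"


def to_a_label(u_label):
    """Convert a label to A-label form (no IDNA validation is done)."""
    if u_label.isascii():
        return u_label  # plain ASCII labels are left untouched
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label):
    """Reverse to_a_label() for labels carrying the ACE prefix."""
    if a_label.lower().startswith(ACE_PREFIX):
        return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return a_label


for label in ("\u043f\u0440\u0438\u043c\u0435\u0440", "com"):
    a = to_a_label(label)
    print(a, to_u_label(a) == label)
```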
+
+ ISO-2022-JP [RFC1468] is a mechanism for encoding a string of ASCII
+ and Japanese characters, where an ASCII character is preserved as-is.
+ ISO-2022-JP is stateful: special sequences are used to switch between
+ character coding tables. As a result, if there are lost or mangled
+ characters in a character stream, it is extremely difficult to
+ recover the original stream after such a lost character encoding
+ shift.
+
+ Comparison of Unicode strings is not as easy as comparing ASCII
+ strings. First, there are a multitude of ways to represent a string
+ of Unicode characters. Second, in many languages and scripts, the
+ actual definition of "same" is very context-dependent. Because of
+ this, comparison of two Unicode strings must take into account how
+ the Unicode strings are encoded. Regardless of the encoding,
+ however, comparison cannot simply be done by comparing the encoded
+ Unicode strings byte by byte. The only time that is possible is when
+ the strings are both mapped into some canonical form and encoded the
+ same way.
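+
+ A small sketch of why byte-by-byte comparison fails: the same
+ visible string can be written with a precomposed character or with
+ a base letter plus a combining mark. Normalization Form C is used
+ below purely as an example of "some canonical form"; that choice,
+ and the helper name same_after_nfc(), are assumptions made for
+ illustration.

```python
import unicodedata

composed = "caf\u00e9"      # e with acute as a single code point
decomposed = "cafe\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_after_nfc(a, b):
    """Compare two strings after mapping both to NFC and UTF-8."""
    na = unicodedata.normalize("NFC", a).encode("utf-8")
    nb = unicodedata.normalize("NFC", b).encode("utf-8")
    return na == nb


print(composed.encode("utf-8") == decomposed.encode("utf-8"))
print(same_after_nfc(composed, decomposed))
```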
+
+ In 1996 the IAB sponsored a workshop on character sets and encodings
+ [RFC2130]. This document adds to that discussion and focuses on the
+ importance of agreeing on a single encoding and how complicated the
+ state of affairs ends up being as a result of using different
+ encodings today.
+
+ Different applications, APIs, and protocols use different encoding
+ schemes today. Many of them were originally defined to use only
+ ASCII. Internationalizing Domain Names in Applications (IDNA)
+ [RFC5890] defines a mechanism that requires changes to applications,
+ but in an attempt not to change APIs or servers, specifies that the
+ A-label format is to be used in many contexts. In some ways this
+ could be seen as not changing the existing APIs, in the sense that
+ the strings being passed to and from the APIs are still apparently
+ ASCII strings. In other ways it is a very profound change to the
+ existing APIs, because while those strings are still syntactically
+ valid ASCII strings, they no longer mean the same thing that they
+ used to. What looks like a plain ASCII string to one piece of
+ software or library could be seen by another piece of software or
+ library (with the application of out-of-band information) to be in
+ fact an encoding of a Unicode string.
+
+ Section 1.3 of the original IDNA specification [RFC3490] states:
+
+ The IDNA protocol is contained completely within applications. It
+ is not a client-server or peer-to-peer protocol: everything is
+ done inside the application itself. When used with a DNS resolver
+ library, IDNA is inserted as a "shim" between the application and
+ the resolver library. When used for writing names into a DNS
+ zone, IDNA is used just before the name is committed to the zone.
+
+ Figure 1 depicts a simplistic architecture that a naive reader might
+ assume from the paragraph quoted above. (A variant of this same
+ picture appears in Section 6 of the original IDNA specification
+ [RFC3490], further strengthening this assumption.)
+
+ +-----------------------------------------+
+ |Host                                     |
+ |             +-------------+             |
+ |             | Application |             |
+ |             +------+------+             |
+ |                    |                    |
+ |               +----+----+               |
+ |               |   DNS   |               |
+ |               | Resolver|               |
+ |               | Library |               |
+ |               +----+----+               |
+ |                    |                    |
+ +-----------------------------------------+
+                      |
+             _________|_________
+            /                   \
+           /                     \
+          /                       \
+         |        Internet         |
+          \                       /
+           \                     /
+            \___________________/
+
+           Simplistic Architecture
+
+                  Figure 1
+
+ There are, however, two problems with this simplistic architecture
+ that cause it to differ from reality.
+
+ First, resolver APIs on Operating Systems (OSs) today (Mac OS,
+ Windows, Linux, etc.) are not DNS-specific. They typically provide a
+ layer of indirection so that the application can work independent of
+ the name resolution mechanism, which could be DNS, mDNS
+ [DNS-MULTICAST], LLMNR [RFC4795], NetBIOS-over-TCP
+ [RFC1001][RFC1002], hosts table [RFC0952], NIS [NIS], or anything
+ else. For example, "Basic Socket Interface Extensions for IPv6"
+ [RFC3493] specifies the getaddrinfo() API and contains many phrases
+ like "For example, when using the DNS" and "any type of name
+ resolution service (for example, the DNS)". Importantly, DNS is
+ mentioned only as an example, and the application has no knowledge as
+ to whether DNS or some other protocol will be used.
+
+ Second, even with the DNS protocol, private namespaces (sometimes
+ including private uses of the DNS) do not necessarily use the same
+ character set encoding scheme as the public Internet namespace.
+
+ We will discuss each of the above issues in subsequent sections. For
+ reference, Figure 2 depicts a more realistic architecture on typical
+ hosts today (which don't have IDNA inserted as a shim immediately
+ above the DNS resolver library). More generally, the host may be
+ attached to one or more local networks, each of which may or may not
+ be connected to the public Internet and may or may not have a private
+ namespace.
+
+ +-----------------------------------------+
+ |Host                                     |
+ |             +-------------+             |
+ |             | Application |             |
+ |             +------+------+             |
+ |                    |                    |
+ |             +------+------+             |
+ |             |   Generic   |             |
+ |             |    Name     |             |
+ |             | Resolution  |             |
+ |             |     API     |             |
+ |             +------+------+             |
+ |                    |                    |
+ |   +-----+------+---+--+-------+-----+   |
+ |   |     |      |      |       |     |   |
+ | +-+-++--+--++--+-++---+---++--+--++-+-+ |
+ | |DNS||LLMNR||mDNS||NetBIOS||hosts||...| |
+ | +---++-----++----++-------++-----++---+ |
+ |                                         |
+ +-----------------------------------------+
+                      |
+                ______|______
+               /             \
+              /               \
+             /      local      \
+             \     network     /
+              \               /
+               \_____________/
+                      |
+             _________|_________
+            /                   \
+           /                     \
+          /                       \
+         |        Internet         |
+          \                       /
+           \                     /
+            \___________________/
+
+           Realistic Architecture
+
+                  Figure 2
+
+1.1. APIs
+
+ Section 6.2 of the original IDNA specification [RFC3490] states
+ (where ToASCII and ToUnicode below refer to conversions using the
+ Punycode algorithm):
+
+ It is expected that new versions of the resolver libraries in the
+ future will be able to accept domain names in other charsets than
+ ASCII, and application developers might one day pass not only
+ domain names in Unicode, but also in local script to a new API for
+ the resolver libraries in the operating system. Thus the ToASCII
+ and ToUnicode operations might be performed inside these new
+ versions of the resolver libraries.
+
+ Resolver APIs such as getaddrinfo() and its predecessor
+ gethostbyname() were defined to accept C-Language "char *" arguments,
+ meaning they accept a string of bytes, terminated with a NUL (0)
+ byte. Because of the use of a NUL octet as a string terminator,
+ this is sufficient for ASCII strings (including A-labels) and even
+ ISO-2022-JP [RFC1468] and UTF-8 strings (unless an implementation
+ artificially precludes them), but not UTF-16 or UTF-32 strings
+ because a NUL octet could appear in the middle of strings using
+ these encodings. Several operating systems historically used in
+ Japan will accept (and expect) ISO-2022-JP strings in such APIs.
+ Some platforms used worldwide also have new versions of the APIs
+ (e.g., GetAddrInfoW() on Windows) that accept other encoding schemes
+ such as UTF-16.
+
+ It is worth noting that an API using C-Language "char *" arguments
+ can distinguish between conventional ASCII "hostname" labels,
+ A-labels, ISO-2022-JP, and UTF-8 labels in names if the coding is
+ known to be one of those four, and the label is intact (no lost or
+ mangled characters). If a stateful encoding like ISO-2022-JP is
+ used, applications extracting labels from text must take special
+ precautions to be sure that the appropriate state-setting characters
+ are included in the string passed to the API.
+
+ An example method for distinguishing among such codings is as
+ follows:
+
+ o if the label contains an ESC (0x1B) byte, the label is
+ ISO-2022-JP; otherwise,
+
+ o if any byte in the label has the high bit set, the label is UTF-8;
+ otherwise,
+
+ o if the label starts with "xn--", then it is presumed to be an
+ A-label; otherwise,
+
+ o the label is ASCII (and therefore, by definition, the label is
+ also UTF-8, since ASCII is a subset of UTF-8).
+
+ Again this assumes that ASCII labels never start with "xn--", and
+ also that UTF-8 strings never contain an ESC character. Also the
+ above is merely an illustration; UTF-8 can be detected and
+ distinguished from other 8-bit encodings with good accuracy [MJD].
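+
+ The example method above can be written down directly. The sketch
+ below follows the four steps in order and carries the same
+ assumptions (the coding is one of the four, and the label is
+ intact). The function name classify_label() is hypothetical, and
+ matching the "xn--" prefix without regard to case is an added
+ assumption not stated in the list.

```python
ESC = 0x1B


def classify_label(label: bytes) -> str:
    """Guess the coding of one label using the four-step example method."""
    if ESC in label:
        return "ISO-2022-JP"
    if any(octet & 0x80 for octet in label):
        return "UTF-8"
    # Assumption: the ACE prefix is compared case-insensitively.
    if label[:4].lower() == b"xn--":
        return "A-label"
    return "ASCII"


samples = [
    "\u65e5\u672c".encode("iso2022_jp"),
    "\u65e5\u672c".encode("utf-8"),
    b"xn--e1afmkfd",
    b"www",
]
for sample in samples:
    print(sample, classify_label(sample))
```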
+
+ It is more difficult or impossible to distinguish the ISO 8859
+ character sets [ISO8859] from each other, because they differ in up
+ to about 90 characters that have exactly the same encodings, and a
+ short string is very unlikely to contain enough characters to allow a
+ receiver to deduce the character set. Similarly, it is not possible
+ in general to distinguish between ISO-2022-JP and any other encoding
+ based on ISO 2022 code table switching.
+
+ Although it is possible (as in the example above) to distinguish some
+ encodings when not explicitly specified, it is cleaner to have the
+ encodings specified explicitly, such as specifying UTF-16 for
+ GetAddrInfoW(), or specifying explicitly which APIs expect UTF-8
+ strings.
+
+2. Use of Non-DNS Protocols
+
+ As noted earlier, typical name resolution libraries are not
+ DNS-specific. Furthermore, some protocols are defined to use
+ encoding forms other than IDNA A-labels. For example, mDNS
+ [DNS-MULTICAST] specifies that UTF-8 be used. Indeed, the IETF
+ policy on character sets and languages [RFC2277] (which followed the
+ 1996 IAB-sponsored workshop [RFC2130]) states:
+
+ Protocols MUST be able to use the UTF-8 charset, which consists of
+ the ISO 10646 coded character set combined with the UTF-8
+ character encoding scheme, as defined in [10646] Annex R
+ (published in Amendment 2), for all text.
+
+ Protocols MAY specify, in addition, how to use other charsets or
+ other character encoding schemes for ISO 10646, such as UTF-16,
+ but lack of an ability to use UTF-8 is a violation of this policy;
+ such a violation would need a variance procedure ([BCP9] section
+ 9) with clear and solid justification in the protocol
+ specification document before being entered into or advanced upon
+ the standards track.
+
+ For existing protocols or protocols that move data from existing
+ datastores, support of other charsets, or even using a default
+ other than UTF-8, may be a requirement. This is acceptable, but
+ UTF-8 support MUST be possible.
+
+ Applications that convert an IDN to A-label form before calling
+ getaddrinfo() will result in name resolution failures if the Punycode
+ name is directly used in such protocols. Having libraries or
+ protocols to convert from A-labels to the encoding scheme defined by
+ the protocol (e.g., UTF-8) would require changes to APIs and/or
+ servers, which IDNA was intended to avoid.
+
+ As a result, applications have problems when they assume that
+ non-ASCII names are resolved using the public DNS and blindly convert
+ them to A-labels without knowing which protocol the name resolution
+ library will select. Furthermore, name resolution
+ libraries often try multiple protocols until one succeeds, because
+ they are defined to use a common namespace. For example, the hosts
+ file [RFC0952], NetBIOS-over-TCP [RFC1001], and DNS [RFC1034] are
+ all defined to be able to share a common syntax. This means that
+ when an application passes a name to be resolved, resolution may in
+ fact be attempted using multiple protocols, each with a potentially
+ different encoding scheme. For this to work successfully, the name
+ must be converted to the appropriate encoding scheme only after the
+ choice is made to use that protocol. In general, this cannot be done
+ by the application since the choice of protocol is not made by the
+ application.
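+
+ The failure mode can be sketched with a toy resolver. Everything in
+ the example is hypothetical: the protocol names, the one-entry mDNS
+ table, the address, and the helper names. It only contrasts encoding
+ the name before the protocol is known with encoding it after the
+ protocol has been chosen.

```python
def encode_for(protocol, name):
    """Encode a Unicode name in the form the chosen protocol expects."""
    if protocol == "public-dns":
        # A-label form; no IDNA validation is attempted in this sketch.
        labels = [
            lab if lab.isascii()
            else "xn--" + lab.encode("punycode").decode("ascii")
            for lab in name.split(".")
        ]
        return ".".join(labels).encode("ascii")
    if protocol == "mdns":
        return name.encode("utf-8")
    raise ValueError("unknown protocol: " + protocol)


# Toy mDNS table: names are registered as UTF-8 octets.
MDNS_NAMES = {"\u043f\u0440\u0438\u043c\u0435\u0440.local".encode("utf-8"): "192.0.2.7"}


def resolve_mdns(query):
    return MDNS_NAMES.get(query)


name = "\u043f\u0440\u0438\u043c\u0435\u0440.local"
early = encode_for("public-dns", name)  # converted before the protocol is known
late = encode_for("mdns", name)         # converted once mDNS has been selected
print(early, resolve_mdns(early))
print(late, resolve_mdns(late))
```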
+
+3. Use of Non-ASCII in DNS
+
+ A common misconception is that DNS only supports names that can be
+ expressed using letters, digits, and hyphens.
+
+ This misconception originally stems from the 1985 definition of an
+ "Internet hostname" (and net, gateway, and domain name) for use in
+ the "hosts" file [RFC0952]. An Internet hostname was defined therein
+ as including only letters, digits, and hyphens, where uppercase and
+ lowercase letters were to be treated as identical. The DNS
+ specification [RFC1034], Section 3.5 entitled "Preferred name syntax"
+ then repeated this definition in 1987, saying that this "syntax will
+ result in fewer problems with many applications that use domain names
+ (e.g., mail, TELNET)".
+
+ This left confusion as to whether the "preferred" name syntax was a
+ mandatory restriction in DNS or merely "preferred".
+
+ The definition of an Internet hostname was updated in 1989
+ ([RFC1123], Section 2.1) to allow names starting with a digit.
+ However, it did not address the increasing confusion as to whether
+ all names in DNS are "hostnames", or whether a "hostname" is merely a
+ special case of a DNS name.
+
+ By 1997, things had progressed to a state where it was necessary to
+ clarify these areas of confusion. "Clarifications to the DNS
+ Specification" [RFC2181], Section 11 states:
+
+ The DNS itself places only one restriction on the particular
+ labels that can be used to identify resource records. That one
+ restriction relates to the length of the label and the full name.
+ The length of any one label is limited to between 1 and 63 octets.
+ A full domain name is limited to 255 octets (including the
+ separators). The zero length full name is defined as representing
+ the root of the DNS tree, and is typically written and displayed
+ as ".". Those restrictions aside, any binary string whatever can
+ be used as the label of any resource record. Similarly, any
+ binary string can serve as the value of any record that includes a
+ domain name as some or all of its value (SOA, NS, MX, PTR, CNAME,
+ and any others that may be added). Implementations of the DNS
+ protocols must not place any restrictions on the labels that can
+ be used.
+
+ Hence, it clarified that the restriction to letters, digits, and
+ hyphens does not apply to DNS names in general, nor to records that
+ include "domain names". Hence, the "preferred" name syntax described
+ in the original DNS specification [RFC1034] is indeed merely
+ "preferred", not mandatory.
+
+ Since there is no restriction even to ASCII, let alone letter-digit-
+ hyphen use, DNS does not violate the subsequent IETF requirement to
+ allow UTF-8 [RFC2277].
+
+ Using UTF-16 or UTF-32 encoding, however, would not be ideal for use
+ in DNS packets or C-Language "char *" APIs because existing software
+ already uses ASCII, and UTF-16 and UTF-32 strings can contain
+ all-zero octets that existing software will interpret as the end of
+ the string. To use UTF-16 or UTF-32, one would need some way of
+ knowing whether the string was encoded using ASCII, UTF-16, or
+ UTF-32, and indeed for UTF-16 or UTF-32 whether it was big-endian or
+ little-endian encoding. In contrast, UTF-8 works well because any
+ 7-bit ASCII string is also a UTF-8 string representing the same
+ characters.
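+
+ The effect on C-Language "char *" APIs can be imitated by cutting a
+ buffer at its first zero octet, as a C string routine would. The
+ helper name as_c_string() is hypothetical; the sketch only shows
+ that an ASCII name survives as UTF-8 but not as UTF-16.

```python
def as_c_string(buf: bytes) -> bytes:
    """Return what a C string routine sees: octets before the first zero."""
    return buf.split(b"\x00", 1)[0]


name = "example"
print(name.encode("ascii") == name.encode("utf-8"))  # identical octets
print(as_c_string(name.encode("utf-8")))             # whole name survives
print(as_c_string(name.encode("utf-16-le")))         # cut after the first letter
print(as_c_string(name.encode("utf-16-be")))         # cut before the first letter
```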
+
+ If a private namespace is defined to use UTF-8 (and not other
+ encodings such as UTF-16 or UTF-32), there's no need for a mechanism
+ to know whether a string was encoded using ASCII or UTF-8, because
+ (for any string that can be represented using ASCII) the
+ representations are exactly the same. In other words, for any string
+ that can be represented using ASCII, it doesn't matter whether it is
+ interpreted as ASCII or UTF-8 because both encodings are the same,
+ and for any string that can't be represented using ASCII, it's
+ obviously UTF-8. In addition, unlike UTF-16 and UTF-32, ASCII and
+ UTF-8 are both byte-oriented encodings so the question of big-endian
+ or little-endian encoding doesn't apply.
+
+ While implementations of the DNS protocol must not place any
+ restrictions on the labels that can be used, applications that use
+ the DNS are free to impose whatever restrictions they like, and many
+ have. The above rules permit a domain name label that contains
+ unusual characters, such as embedded spaces, which many applications
+ consider a bad idea. For example, the original specification
+ [RFC0821] of the SMTP protocol [RFC5321] constrains the character set
+ usable in email addresses. There is now an effort underway to define
+ an extension to SMTP to support internationalized email addresses and
+ headers. See the EAI framework [RFC4952] for more discussion on this
+ topic.
+
+ Shortly after the DNS Clarifications [RFC2181] and IETF character
+ sets and languages policy [RFC2277] were published, the need for
+ internationalized names within private namespaces (i.e., within
+ enterprises) arose. The current (and past, predating IDNA and the
+ prefixed ACE conventions) practice within enterprises that support
+ other languages is to put UTF-8 names in their internal DNS servers
+ in a private namespace. For example, "Using the UTF-8 Character Set
+ in the Domain Name System" [UTF8-DNS] was first written in 1997, and
+ was then widely deployed in Windows. The use of UTF-8 names in DNS
+ was similarly implemented and deployed in Mac OS, simply by virtue of
+ the fact that applications blindly passed UTF-8 strings to the name
+ resolution APIs, the name resolution APIs blindly passed those UTF-8
+ strings to the DNS servers, and the DNS servers correctly answered
+ those queries. From the user's point of view, everything worked
+ properly without any special new code being written, except that
+ ASCII is matched case-insensitively whereas UTF-8 is not (although
+ some enterprise DNS servers reportedly attempt to do case-insensitive
+ matching on UTF-8 within private namespaces, an action that causes
+ other problems and violates a subsequent prohibition [RFC4343]).
+ Within a private namespace, and especially in light of the IETF UTF-8
+ policy [RFC2277], it was reasonable to assume that binary strings
+ were encoded in UTF-8.
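+
+ The difference in matching behavior can be sketched as follows. The
+ comparison folds only the ASCII letters A-Z and leaves every other
+ octet untouched, which is the behavior described above for servers
+ that follow [RFC4343]. The helper names are hypothetical.

```python
def fold_ascii(label: bytes) -> bytes:
    """Lowercase ASCII letters only; all other octets are left as-is."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in label)


def dns_labels_match(a: bytes, b: bytes) -> bool:
    return fold_ascii(a) == fold_ascii(b)


upper = "\u041f\u0420\u0418\u041c\u0415\u0420".encode("utf-8")  # Cyrillic capitals
lower = "\u043f\u0440\u0438\u043c\u0435\u0440".encode("utf-8")  # same word, small letters
print(dns_labels_match(b"EXAMPLE", b"example"))
print(dns_labels_match(upper, lower))
```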
+
+ As implied earlier, there are also issues with mapping strings to
+ some canonical form, independent of the encoding. Such issues are
+ not discussed in detail in this document. They are discussed to some
+ extent in, for example, Section 3 of "Unicode Format for Network
+ Interchange" [RFC5198], and are left as opportunities for elaboration
+ in other documents.
+
+ A few years after UTF-8 was already in use in private namespaces in
+ DNS, the strategy of using a reserved prefix and an ASCII-compatible
+ encoding (ACE) was developed for IDNA. That strategy included the
+ Punycode algorithm, which began to be developed (during the period
+ from 2002 [IDN-PUNYCODE] to 2003 [RFC3492]) for use in the public DNS
+ namespace. There were a number of reasons for this. One reason the
+ prefixed ACE strategy was selected for the public DNS namespace was
+ that other encodings such as ISO 8859-1 were also in use in DNS and
+ the various encodings were not necessarily distinguishable from each
+ other. Another reason had to do with
+ concerns about whether the details of IDNA, including the use of the
+ Punycode algorithm, were an adequate solution to the problems that
+ were posed. If either the Punycode algorithm or fundamental aspects
+ of character handling were wrong, and had to be changed to something
+ incompatible, it would be possible to switch to a new prefix or adopt
+ another model entirely. Only the part of the public DNS namespace
+ that starts a label with "xn--" would be polluted.
+
+ Today the algorithm is seen as being about as good as it can
+ realistically be, so moving to a different encoding (UTF-8 as
+ suggested in this document) that can be viewed as "native" would not
+ be as risky as it would have been in 2002.
+
+ In any case, the publication of Punycode [RFC3492] and the
+ dependencies on it in the IDNA Protocol document [RFC5891] and the
+ earlier IDNA specification [RFC3490] thus resulted in having to use
+ different encodings for different namespaces (where UTF-8 for private
+ namespaces was already deployed). Hence, referring back to Figure 2,
+ a different encoding scheme may be in use on the Internet vs. a local
+ network.
+
+ In general, a host may be connected to zero or more networks using
+ private namespaces, plus potentially the public namespace.
+ Applications that convert a U-label form IDN to an A-label before
+ calling getaddrinfo() will incur name resolution failures if the name
+ is actually registered in a private namespace in some other encoding
+ (e.g., UTF-8). Having libraries or protocols convert from A-labels
+ to the encoding used by a private namespace (e.g., UTF-8) would
+ require changes to APIs and/or servers, which IDNA was intended to
+ avoid.
+
+ Also, a fully-qualified domain name (FQDN) to be resolved may be
+ obtained directly from an application, or it may be composed by the
+ DNS resolver itself from a single label obtained from an application
+ by using a configured suffix search list, and the resulting FQDN may
+ use multiple encodings in different labels. For more information on
+ the suffix search list, see Section 6 of "Common DNS Implementation
+ Errors and Suggested Fixes" [RFC1536], the DHCP Domain Search Option
+ [RFC3397], and Section 4 of "DNS Configuration options for DHCPv6"
+ [RFC3646].
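+
+ A minimal sketch of how mixed encodings can arise: the application
+ hands the resolver a single label already in A-label form, while the
+ configured suffix search list holds a UTF-8 suffix from a private
+ namespace. The function name search_candidates() and both suffixes
+ are hypothetical.

```python
def search_candidates(label: bytes, suffixes):
    """Append each configured suffix to a single-label name."""
    return [label + b"." + suffix for suffix in suffixes]


label = b"xn--e1afmkfd"  # A-label supplied by the application
suffixes = [
    "\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435.example".encode("utf-8"),
    b"example.com",
]
for fqdn in search_candidates(label, suffixes):
    mixed = fqdn.startswith(b"xn--") and any(octet & 0x80 for octet in fqdn)
    print(fqdn, "mixed encodings" if mixed else "single encoding")
```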
+
+ As noted in Section 6 of "Common DNS Implementation Errors and
+ Suggested Fixes" [RFC1536], the community has had bad experiences
+ (e.g., security problems [RFC1535]) with "searching" for domain names
+ by trying multiple variations or appending different suffixes. Such
+ searching can yield inconsistent results depending on the order in
+ which alternatives are tried. Nonetheless, the practice is
+ widespread and must be considered.
+
+ The practice of searching for names, whether by the use of a suffix
+ search list or by searching in different namespaces, can yield
+ inconsistent results. For example, even when a suffix search list is
+ only used when an application provides a name containing no dots, two
+ clients with different configured suffix search lists can get
+ different answers, and the same client could get different answers at
+ different times if it changes its configuration (e.g., when moving to
+ another network). A deeper discussion of this topic is outside the
+ scope of this document.
+
+3.1. Examples
+
+ Some examples of cases that can happen in existing implementations
+ today (where {non-ASCII} below represents some user-entered non-ASCII
+ string) are:
+
+ o User types in {non-ASCII}.{non-ASCII}.com, and the application
+ passes it, in the form of a UTF-8 string, to getaddrinfo() or
+ gethostbyname() or equivalent.
+
+ 1. The DNS resolver passes the (UTF-8) string unmodified to a DNS
+ server.
+
+ o User types in {non-ASCII}.{non-ASCII}.com, and the application
+ passes it to a name resolution API that accepts strings in some
+ other encoding such as UTF-16, e.g., GetAddrInfoW() on Windows.
+
+ 1. The name resolution API decides to pass the string to DNS (and
+ possibly other protocols).
+
+ 2. The DNS resolver converts the name from UTF-16 to UTF-8 and
+ passes the query to a DNS server.
+
+ o User types in {non-ASCII}.{non-ASCII}.com, but the application
+ first converts it to A-label form such that the name that is
+ passed to name resolution APIs is (say)
+ xn--e1afmkfd.xn--80akhbyknj4f.com.
+
+ 1. The name resolution API decides to pass the string to DNS (and
+ possibly other protocols).
+
+ 2. The DNS resolver passes the string unmodified to a DNS server.
+
+ 3. If the name is not found in DNS, the name resolution API
+ decides to try another protocol, say mDNS.
+
+ 4. The query goes out in mDNS, but since mDNS specified that
+ names are to be registered in UTF-8, the name isn't found
+ since it was encoded as an A-label in the query.
+
+ o User types in {non-ASCII}, and the application passes it, in the
+ form of a UTF-8 string, to getaddrinfo() or equivalent.
+
+ 1. The name resolution API decides to pass the string to DNS (and
+ possibly other protocols).
+
+ 2. The DNS resolver will append suffixes in the suffix search
+ list, which may contain UTF-8 characters if the local network
+ uses a private namespace.
+
+ 3. Each FQDN in turn will then be sent in a query to a DNS
+ server, until one succeeds.
+
+ o User types in {non-ASCII}, but the application first converts it
+ to an A-label, such that the name that is passed to getaddrinfo()
+ or equivalent is (say) xn--e1afmkfd.
+
+ 1. The name resolution API decides to pass the string to DNS (and
+ possibly other protocols).
+
+ 2. The DNS stub resolver will append suffixes in the suffix
+ search list, which may contain UTF-8 characters if the local
+ network uses a private namespace, resulting in (say)
+ xn--e1afmkfd.{non-ASCII}.com
+
+ 3. Each FQDN in turn will then be sent in a query to a DNS
+ server, until one succeeds.
+
+ 4. Since the private namespace in this case uses UTF-8, the above
+ queries fail, since the A-label version of the name was not
+ registered in that namespace.
+
+ o User types in {non-ASCII1}.{non-ASCII2}.{non-ASCII3}.com, where
+ {non-ASCII3}.com is a public namespace using IDNA and A-labels,
+ but {non-ASCII2}.{non-ASCII3}.com is a private namespace using
+ UTF-8, which is accessible to the user. The application passes
+ the name, in the form of a UTF-8 string, to getaddrinfo() or
+ equivalent.
+
+ 1. The name resolution API decides to pass the string to DNS (and
+ possibly other protocols).
+
+ 2. The DNS resolver tries to locate the authoritative server, but
+ fails the lookup because it cannot find a server for the UTF-8
+ encoding of {non-ASCII3}.com, even though it would have access
+ to the private namespace. (To make this work, the private
+ namespace would need to include the UTF-8 encoding of
+ {non-ASCII3}.com.)
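+
+ The encoding mismatch behind several of these cases can be sketched
+ in Python as follows. This is an illustration only: it relies on
+ Python's built-in "idna" codec, which implements the original IDNA
+ specification [RFC3490], and the zone contents, the corp.example
+ suffix, and the address are hypothetical.
+

```python
# The non-ASCII label whose A-label form is xn--e1afmkfd, written with
# escapes so that this example stays in ASCII.
LABEL = "\u043f\u0440\u0438\u043c\u0435\u0440"

utf8_form = LABEL.encode("utf-8")  # what a UTF-8 namespace registers
a_label = LABEL.encode("idna")     # what a converting application sends

# Hypothetical private namespace keyed by the UTF-8 bytes of each name.
private_zone = {utf8_form + b".corp.example": "192.0.2.10"}


def lookup(zone, qname):
    """Exact byte-for-byte match, with no encoding conversion."""
    return zone.get(qname)


found = lookup(private_zone, utf8_form + b".corp.example")
missed = lookup(private_zone, a_label + b".corp.example")
print(a_label, found, missed)
```

+
+ The two queries name the same Unicode label, yet only the UTF-8 form
+ matches the registered record.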
+
+ When users use multiple applications, some of which do A-label
+ conversion prior to passing a name to name resolution APIs, and some
+ of which do not, odd behavior can result, which at best violates the
+ Principle of Least Surprise, and at worst can result in security
+ vulnerabilities.
+
+ First consider two competing applications, such as web browsers, that
+ are designed to achieve the same task. If the user types the same
+ name into each browser, one may successfully resolve the name (and
+ hence access the desired content) because the encoding scheme is
+ correct, while the other may fail name resolution because the
+ encoding scheme is incorrect. Hence the issue can incent users to
+ switch to another application (which in some cases means switching to
+ an IDNA application, and in other cases means switching away from an
+ IDNA application).
+
+ Next consider two separate applications where one is designed to be
+ launched from the other, for example a web browser launching a media
+ player application when the link to a media file is clicked. If both
+ types of content (web pages and media files in this example) are
+ hosted at the same IDN in a private namespace, but one application
+ converts to A-labels before calling name resolution APIs and the
+ other does not, the user may be able to access a web page and click on
+ the media file, causing the media player to launch and attempt to
+ retrieve the media file; that retrieval then fails because the IDN
+ encoding scheme was incorrect. Or even worse, if an attacker is able
+ to register the same name in the other encoding scheme, the user may
+ get the content from the attacker's machine. This is similar to a
+ normal phishing attack, except that the two names represent exactly
+ the same Unicode characters.
+
+4. Recommendations
+
+ On many platforms, the name resolution library will automatically use
+ a variety of protocols to search a variety of namespaces, which might
+ be using UTF-8 or other encodings. In addition, even when only the
+ DNS protocol is used, in many operational environments, a private DNS
+ namespace using UTF-8 is also deployed and is automatically searched
+ by the name resolution library.
+
+ As explained earlier, using multiple canonical formats, and multiple
+ encodings in different protocols or even in different places in the
+ same namespace creates problems. Because of this, and the fact that
+ both IDNA A-labels and UTF-8 are in use as encoding mechanisms for
+ domain names today, we make the recommendations described below.
+
+ It is inappropriate for an application that calls a general-purpose
+ name resolution library to convert a name to an A-label unless the
+ application is absolutely certain that, in all environments where the
+ application might be used, only the global DNS that uses IDNA
+ A-labels actually will be used to resolve the name.
+
+ Instead, conversion to A-label form, or any other special encoding
+ required by a particular name-lookup protocol, should be done only by
+ an entity that knows which protocol will be used (e.g., the DNS
+ resolver, or getaddrinfo() upon deciding to pass the name to DNS),
+ rather than by general applications that call protocol-independent
+ name resolution APIs. (Of course, applications that store strings
+ internally in a format different from that required by those APIs
+ need to convert strings from their own internal format to the format
+ required by the API.) Similarly, even if an application can know
+ that DNS is to be used, the conversion to A-labels should be done
+ only by an entity that knows which part of the DNS namespace will be
+ used.
+
+ That is, a more intelligent DNS resolver would be more liberal in
+ what it would accept from an application and be able to query for
+ both a name in A-label form (e.g., over the Internet) and a UTF-8
+ name (e.g., over a corporate network with a private namespace) in
+ case the server only recognizes one. However, we might also take
+ into account that the various resolution behaviors discussed earlier
+ could also occur with record updates (e.g., with Dynamic Update
+ [RFC2136]), resulting in some names being registered in a local
+ network's private namespace by applications doing conversion to
+ A-labels, and other names being registered using UTF-8. Hence, a
+ name might have to be queried with both encodings to be sure to
+ succeed without changes to DNS servers.
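+
+ One way such a resolver might derive both query forms from whatever
+ the application supplies is sketched below in Python. The function
+ name is hypothetical, and the sketch assumes Python's built-in
+ "idna" codec (the original IDNA specification [RFC3490]) as a
+ stand-in for the conversion step; it does not send any queries.
+

```python
def query_forms(name):
    """Return (A-label form, UTF-8 form) for one name.

    The input may already contain A-labels, Unicode labels, or a mix;
    both are reduced to the same pair of wire candidates.
    """
    a_label_form = name.encode("idna")           # ASCII labels pass through
    unicode_form = a_label_form.decode("idna")   # xn-- labels back to Unicode
    return a_label_form, unicode_form.encode("utf-8")


# The same name, supplied once as an A-label and once as Unicode.
from_a_label = query_forms("xn--e1afmkfd.com")
from_unicode = query_forms("\u043f\u0440\u0438\u043c\u0435\u0440.com")
print(from_a_label == from_unicode)
```

+
+ Because both inputs reduce to the same pair, the resolver can try
+ the A-label form and the UTF-8 form regardless of which encoding the
+ application happened to use.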
+
+ Similarly, a more intelligent stub resolver would also be more
+ liberal in what it would accept from a response as the value of a
+ record (e.g., PTR) in that it would accept either UTF-8 (U-labels in
+ the case of IDNA) or A-labels and convert them to whatever encoding
+ is used by the application APIs to return strings to applications.
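+
+ A minimal Python sketch of this liberal acceptance follows. It
+ assumes that the name in the response is either UTF-8 or ASCII with
+ A-labels (possibly mixed per label), that the API returns Python
+ strings, and that Python's built-in "idna" codec [RFC3490] is an
+ acceptable decoder for illustration. The function name is
+ hypothetical.
+

```python
def name_for_api(wire_name: bytes) -> str:
    """Convert a name from a response into the API's string type.

    Each label is handled separately, so a name that mixes A-labels
    and UTF-8 labels is still converted consistently.
    """
    text = wire_name.decode("utf-8")  # ASCII is a subset of UTF-8
    labels = []
    for label in text.split("."):
        lowered = label.lower()
        if lowered.startswith("xn--"):
            # A-label: decode it to its Unicode form.
            label = lowered.encode("ascii").decode("idna")
        labels.append(label)
    return ".".join(labels)


unicode_label = "\u043f\u0440\u0438\u043c\u0435\u0440"
mixed = b"xn--e1afmkfd." + unicode_label.encode("utf-8") + b".com"
print(name_for_api(mixed) == f"{unicode_label}.{unicode_label}.com")
```

+
+ A malformed A-label raises an exception in this sketch; a real
+ resolver would need to decide whether to fail or to return the label
+ unconverted.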
+
+ Indeed the choice of conversion within the resolver libraries is
+ consistent with the quote from Section 6.2 of the original IDNA
+ specification [RFC3490] stating that conversion using the Punycode
+ algorithm (i.e., to A-labels) "might be performed inside these new
+ versions of the resolver libraries".
+
+ That said, some application-layer protocols (e.g., EPP Domain Name
+ Mapping [RFC5731]) are defined to use A-labels rather than simply
+ using UTF-8 as recommended by the IETF character sets and languages
+ policy [RFC2277]. In this case, an application may receive a string
+ containing A-labels and want to pass it to name resolution APIs.
+ Again the recommendation that a resolver library be more liberal in
+ what it would accept from an application would mean that such a name
+ would be accepted and re-encoded as needed, rather than requiring the
+ application to do so.
+
+ It is important that any APIs used by applications to pass names
+ specify what encoding(s) the API uses. For example, GetAddrInfoW()
+ on Windows specifies that it accepts UTF-16 and only UTF-16. In
+ contrast, the original specification of getaddrinfo() [RFC3493] does
+ not, and hence platforms vary in what they use (e.g., Mac OS uses
+ UTF-8 whereas Windows uses Windows code pages).
+
+ Finally, the question remains about what, if anything, a DNS server
+ should do to handle cases where some existing applications or hosts
+ do IDNA queries using A-labels within the local network using a
+ private namespace, and other existing applications or hosts send
+ UTF-8 queries. It is undesirable to store different records for
+ different encodings of the same name, since this introduces the
+ possibility of inconsistency between them. Instead, a new DNS
+ server serving a private namespace using UTF-8 could potentially
+ treat encoding conversion in the same way as case-insensitive
+ comparison, which a DNS server is already required to do, as long as
+ the DNS server has some way to know what the encoding is. Two
+ encodings are, in this sense, two representations of the same name,
+ just as two case-different strings are. However, whereas case
+ comparison of non-ASCII characters is complicated by ambiguities (as
+ explained in the IAB's Review and Recommendations for
+ Internationalized Domain Names [RFC4690]), encoding conversion
+ between A-labels and U-labels is unambiguous.
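+
+ The idea of folding encodings together, much as case is folded, can
+ be sketched in Python as follows. This is not a description of any
+ existing server: the helper names, the zone contents, and the
+ address are hypothetical, queries are assumed to arrive as either
+ UTF-8 or ASCII with A-labels, and Python's built-in "idna" codec
+ [RFC3490] stands in for the conversion.
+

```python
def canonical_key(qname: bytes) -> bytes:
    """Map a UTF-8 or A-label name to one lowercase A-label key."""
    text = qname.decode("utf-8")
    # Non-ASCII labels become A-labels; ASCII labels pass through and
    # are then case-folded, as DNS comparison already requires.
    return text.encode("idna").lower()


zone = {}


def add_record(name: bytes, address: str) -> None:
    """Store a single record, whatever encoding the name arrived in."""
    zone[canonical_key(name)] = address


def find_record(name: bytes):
    """Look up a record by either encoding of its name."""
    return zone.get(canonical_key(name))


utf8_name = "\u043f\u0440\u0438\u043c\u0435\u0440.corp.example".encode("utf-8")
add_record(utf8_name, "192.0.2.10")
print(find_record(utf8_name), find_record(b"XN--E1AFMKFD.Corp.Example"))
```

+
+ Only one record is stored, so the two encodings cannot drift apart,
+ and a query in either form reaches it.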
+
+5. Security Considerations
+
+ Having applications convert names to prefixed ACE format (A-labels)
+ before calling name resolution can result in security
+ vulnerabilities. If the name is resolved by protocols or in zones
+ for which records are registered using other encoding schemes, an
+ attacker can claim the A-label version of the same name and hence
+ trick the victim into accessing a different destination. This can be
+ done for any non-ASCII name, even when there is no possible confusion
+ due to case, language, or other issues. Other types of confusion
+ beyond those resulting simply from the choice of encoding scheme are
+ discussed in "Review and Recommendations for IDNs" [RFC4690].
+
+ Designers and users of encodings that represent Unicode strings in
+ terms of ASCII should also consider whether trademark protection or
+ phishing are issues, e.g., if one name would be encoded in a way that
+ would be naturally associated with another organization or product.
+
+6. Acknowledgements
+
+ The authors wish to thank Patrik Faltstrom, Martin Duerst, JFC
+ Morfin, Ran Atkinson, S. Moonesamy, Paul Hoffman, and Stephane
+ Bortzmeyer for their careful review and helpful suggestions. It is
+ also interesting to note that none of the first three individuals'
+ names above can be spelled out and written correctly in ASCII text.
+ Furthermore, one of the IAB members' names below (Andrei Robachevsky)
+ cannot be written in the script as it appears on his birth
+ certificate.
+
+7. IAB Members at the Time of Approval
+
+ Bernard Aboba
+ Marcelo Bagnulo
+ Ross Callon
+ Spencer Dawkins
+ Vijay Gill
+ Russ Housley
+ John Klensin
+ Olaf Kolkman
+ Danny McPherson
+ Jon Peterson
+ Andrei Robachevsky
+ Dave Thaler
+ Hannes Tschofenig
+
+8. References
+
+8.1. Normative References
+
+ [10646] International Organization for Standardization,
+ "Information Technology - Universal Multiple-octet
+ coded Character Set (UCS)".
+
+ ISO/IEC Standard 10646, comprised of ISO/IEC 10646-
+ 1:2000, "Information technology -- Universal
+ Multiple-Octet Coded Character Set (UCS) -- Part 1:
+ Architecture and Basic Multilingual Plane", ISO/IEC
+ 10646-2:2001, "Information technology -- Universal
+ Multiple-Octet Coded Character Set (UCS) -- Part 2:
+ Supplementary Planes" and ISO/IEC 10646-1:2000/Amd
+ 1:2002, "Mathematical symbols and other characters".
+
+ [Unicode] The Unicode Consortium. The Unicode Standard,
+ Version 5.1.0, defined by: "The Unicode Standard,
+ Version 5.0", Boston, MA, Addison-Wesley, 2007, ISBN
+ 0-321-48091-0, as amended by Unicode 5.1.0
+ (http://www.unicode.org/versions/Unicode5.1.0/).
+
+8.2. Informative References
+
+ [DNS-MULTICAST] Cheshire, S. and M. Krochmal, "Multicast DNS", Work
+ in Progress, February 2011.
+
+ [IDN-PUNYCODE] Costello, A., "Punycode version 0.3.3", Work
+ in Progress, January 2002.
+
+ [ISO8859] International Organization for Standardization,
+ "Information technology -- 8-bit single-byte coded
+ graphic character sets".
+
+ ISO/IEC Standard 8859, comprised of ISO/IEC 8859-
+ 1:1998, Part 1: Latin alphabet No. 1 - ISO/IEC 8859-
+ 2:1999, Part 2: Latin alphabet No. 2 - ISO/IEC 8859-
+ 3:1999, Part 3: Latin alphabet No. 3 - ISO/IEC 8859-
+ 4:1998, Part 4: Latin alphabet No. 4 - ISO/IEC 8859-
+ 5:1999, Part 5: Latin/Cyrillic alphabet - ISO/IEC
+ 8859-6:1999, Part 6: Latin/Arabic alphabet - ISO/IEC
+ 8859-7:2003, Part 7: Latin/Greek alphabet - ISO/IEC
+ 8859-8:1999, Part 8: Latin/Hebrew alphabet - ISO/IEC
+ 8859-9:1999, Part 9: Latin alphabet No. 5 - ISO/IEC
+ 8859-10:1998, Part 10: Latin alphabet No. 6 - ISO/
+ IEC 8859-11:2001, Part 11: Latin/Thai alphabet -
+ ISO/IEC 8859-13:1998, Part 13: Latin alphabet No. 7
+ - ISO/IEC 8859-14:1998, Part 14: Latin alphabet No.
+ 8 (Celtic) - ISO/IEC 8859-15:1999, Part 15: Latin
+ alphabet No. 9 - ISO/IEC 8859-16:2001, Part 16:
+ Latin alphabet No. 10.
+
+ [MJD] Duerst, M., "The Properties and Promises of UTF-8",
+ 11th International Unicode Conference, San Jose,
+ September 1997, <http://www.ifi.unizh.ch/mml/
+ mduerst/papers/PDF/IUC11-UTF-8.pdf>.
+
+ [NIS] Sun Microsystems, "System and Network
+ Administration", March 1990.
+
+ [RFC0821] Postel, J., "Simple Mail Transfer Protocol", STD 10,
+ RFC 821, August 1982.
+
+ [RFC0952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD
+ Internet host table specification", RFC 952,
+ October 1985.
+
+ [RFC1001] NetBIOS Working Group, "Protocol standard for a
+ NetBIOS service on a TCP/UDP transport: Concepts and
+ methods", STD 19, RFC 1001, March 1987.
+
+ [RFC1002] NetBIOS Working Group, "Protocol standard for a
+ NetBIOS service on a TCP/UDP transport: Detailed
+ specifications", STD 19, RFC 1002, March 1987.
+
+ [RFC1034] Mockapetris, P., "Domain names - concepts and
+ facilities", STD 13, RFC 1034, November 1987.
+
+ [RFC1123] Braden, R., "Requirements for Internet Hosts -
+ Application and Support", STD 3, RFC 1123,
+ October 1989.
+
+ [RFC1468] Murai, J., Crispin, M., and E. van der Poel,
+ "Japanese Character Encoding for Internet Messages",
+ RFC 1468, June 1993.
+
+ [RFC1535] Gavron, E., "A Security Problem and Proposed
+ Correction With Widely Deployed DNS Software",
+ RFC 1535, October 1993.
+
+ [RFC1536] Kumar, A., Postel, J., Neuman, C., Danzig, P., and
+ S. Miller, "Common DNS Implementation Errors and
+ Suggested Fixes", RFC 1536, October 1993.
+
+ [RFC2130] Weider, C., Preston, C., Simonsen, K., Alvestrand,
+ H., Atkinson, R., Crispin, M., and P. Svanberg, "The
+ Report of the IAB Character Set Workshop held 29
+ February - 1 March, 1996", RFC 2130, April 1997.
+
+ [RFC2136] Vixie, P., Thomson, S., Rekhter, Y., and J. Bound,
+ "Dynamic Updates in the Domain Name System (DNS
+ UPDATE)", RFC 2136, April 1997.
+
+ [RFC2181] Elz, R. and R. Bush, "Clarifications to the DNS
+ Specification", RFC 2181, July 1997.
+
+ [RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
+ Languages", BCP 18, RFC 2277, January 1998.
+
+ [RFC3397] Aboba, B. and S. Cheshire, "Dynamic Host
+ Configuration Protocol (DHCP) Domain Search Option",
+ RFC 3397, November 2002.
+
+ [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
+ "Internationalizing Domain Names in Applications
+ (IDNA)", RFC 3490, March 2003.
+
+ [RFC3492] Costello, A., "Punycode: A Bootstring encoding of
+ Unicode for Internationalized Domain Names in
+ Applications (IDNA)", RFC 3492, March 2003.
+
+ [RFC3493] Gilligan, R., Thomson, S., Bound, J., McCann, J.,
+ and W. Stevens, "Basic Socket Interface Extensions
+ for IPv6", RFC 3493, February 2003.
+
+ [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
+ 10646", STD 63, RFC 3629, November 2003.
+
+ [RFC3646] Droms, R., "DNS Configuration options for Dynamic
+ Host Configuration Protocol for IPv6 (DHCPv6)",
+ RFC 3646, December 2003.
+
+ [RFC4343] Eastlake, D., "Domain Name System (DNS) Case
+ Insensitivity Clarification", RFC 4343,
+ January 2006.
+
+ [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB,
+ "Review and Recommendations for Internationalized
+ Domain Names (IDNs)", RFC 4690, September 2006.
+
+ [RFC4795] Aboba, B., Thaler, D., and L. Esibov, "Link-local
+ Multicast Name Resolution (LLMNR)", RFC 4795,
+ January 2007.
+
+ [RFC4952] Klensin, J. and Y. Ko, "Overview and Framework for
+ Internationalized Email", RFC 4952, July 2007.
+
+ [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for
+ Network Interchange", RFC 5198, March 2008.
+
+ [RFC5321] Klensin, J., "Simple Mail Transfer Protocol",
+ RFC 5321, October 2008.
+
+ [RFC5731] Hollenbeck, S., "Extensible Provisioning Protocol
+ (EPP) Domain Name Mapping", STD 69, RFC 5731,
+ August 2009.
+
+ [RFC5890] Klensin, J., "Internationalized Domain Names for
+ Applications (IDNA): Definitions and Document
+ Framework", RFC 5890, August 2010.
+
+ [RFC5891] Klensin, J., "Internationalized Domain Names in
+ Applications (IDNA): Protocol", RFC 5891,
+ August 2010.
+
+ [UTF8-DNS] Kwan, S. and J. Gilroy, "Using the UTF-8 Character
+ Set in the Domain Name System", Work in Progress,
+ November 1997.
+
+Authors' Addresses
+
+ Dave Thaler
+ Microsoft Corporation
+ One Microsoft Way
+ Redmond, WA 98052
+ USA
+
+ Phone: +1 425 703 8835
+ EMail: dthaler@microsoft.com
+
+
+ John C Klensin
+ 1770 Massachusetts Ave, Ste 322
+ Cambridge, MA 02140
+
+ Phone: +1 617 245 1457
+ EMail: john+ietf@jck.com
+
+
+ Stuart Cheshire
+ Apple Inc.
+ 1 Infinite Loop
+ Cupertino, CA 95014
+
+ Phone: +1 408 974 3207
+ EMail: cheshire@apple.com
+