Diffstat (limited to 'doc/rfc/rfc4185.txt')
-rw-r--r-- doc/rfc/rfc4185.txt 1067
1 file changed, 1067 insertions, 0 deletions
diff --git a/doc/rfc/rfc4185.txt b/doc/rfc/rfc4185.txt
new file mode 100644
index 0000000..e21e40d
--- /dev/null
+++ b/doc/rfc/rfc4185.txt
@@ -0,0 +1,1067 @@
+
+
+
+
+
+
+Network Working Group J. Klensin
+Request for Comments: 4185 October 2005
+Category: Informational
+
+
+ National and Local Characters for DNS Top Level Domain (TLD) Names
+
+Status of This Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2005).
+
+IESG Note
+
+ This RFC is not a candidate for any level of Internet Standard. The
+ IETF disclaims any knowledge of the fitness of this RFC for any
+ purpose and notes that the decision to publish is not based on IETF
+ review apart from IESG review for conflict with IETF work. The RFC
+ Editor has chosen to publish this document at its discretion. See
+ RFC 3932 [RFC3932] for more information.
+
+Abstract
+
+ In the context of work on internationalizing the Domain Name System
+ (DNS), there have been extensive discussions about "multilingual" or
+ "internationalized" top level domain names (TLDs), especially for
+ countries whose predominant language is not written in a Roman-based
+ script. This document reviews some of the motivations for such
+ domains, several suggestions that have been made to provide needed
+ functionality, and the constraints that the DNS imposes. It then
+ suggests an alternative, local translation, that may solve a superset
+ of the problem while avoiding protocol changes, serious deployment
+ delays, and other difficulties. The suggestion utilizes a
+ localization technique in applications to permit any TLD to be
+ accessed using the vocabulary and characters of any language. It is
+ not restricted to language- or country-specific "multilingual" TLDs
+ in the language(s) and script(s) of that country.
+
+
+
+
+
+
+
+
+
+Klensin Informational [Page 1]
+
+RFC 4185 Characters for DNS TLD Names October 2005
+
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 1.1. Terminology ................................................3
+ 1.2. Background on the "Multilingual Name" Problem ..............3
+ 1.2.1. Approaches to the Requirement .......................3
+ 1.2.2. Writing the Name of One's Country in its Own
+ Characters ..........................................4
+ 1.2.3. Countries with Multiple Languages and
+ Countries with Multiple Names .......................5
+ 1.2.4. Availability of Non-ASCII Characters in Programs ....5
+ 1.3. Domain Name System Constraints .............................6
+ 1.3.1. Administrative Hierarchy ............................6
+ 1.3.2. Aliases .............................................6
+ 1.4. Internationalization and Localization ......................7
+ 2. Client-Side Solutions ...........................................7
+ 2.1. IDNA and the Client ........................................8
+ 2.2. Local Translation Tables for TLD Names .....................8
+ 3. Advantages and Disadvantages of Local Translation ...............9
+ 3.1. Every TLD Appears in the Local Language and Character Set ..9
+ 3.2. Unification of Country Code Domains .......................10
+ 3.3. User Understanding of Local and Global References .........11
+ 3.4. Limits on Expansion of the Number of TLDs .................11
+ 3.5. Standardization of the Translations .......................12
+ 3.6. Implications for Future New Domain Names ..................13
+ 3.7. Mapping for TLDs, Not Domain Names or Keywords ............13
+ 4. Information Interchange, IDNs, Comparisons, and Translations ...13
+ 5. Internationalization Considerations ............................15
+ 6. Security Considerations ........................................15
+ 7. Acknowledgements ...............................................16
+ 8. Informative References .........................................17
+
+
+1. Introduction
+
+1.1. Terminology
+
+ This document assumes the conventional terminology used to discuss
+ the domain name system (DNS) and its hierarchical arrangements.
+ Terms such as "top level domain" (or just "TLD"), "subdomain",
+ "subtree", and "zone file" are used without further explanation. In
+ addition, the term "ccTLD" is used to denote a "country code top
+ level domain" and "gTLD" is used to denote a "generic top level
+ domain" as described in [RFC1591] and in common usage.
+
+1.2. Background on the "Multilingual Name" Problem
+
+ People who share a language usually prefer to communicate in it,
+ using whatever characters are normally used to write that language,
+ rather than in some "foreign" one. There have been standards for
+ using mutually-agreed characters and languages in electronic mail
+ message bodies and selected headers since the introduction of MIME in
+ 1992 [MIME] and the Web has permitted multilingual text since its
+ inception, also using MIME. Actual use of non-Roman-character
+ content came even earlier, using private conventions. However,
+ domain names are exposed to users in email addresses and URLs.
+ Corresponding arrangements, typically also exposing domain names, are
+ made for other application protocols. The combination of exposed
+ domain names with internationalization requirements led rapidly to
+ demands to permit domain names in applications that used characters
+ other than those of the very restrictive, ASCII-subset, "hostname"
+ (or "letter-digit-hyphen" ("LDH")) conventions recommended in the DNS
+ specifications [RFC1035]. The effort to do this soon became known as
+ "multilingual domain names". That was actually a misnomer, since the
+ DNS deals only with characters and identifier strings, and not,
+ except by accident or local registration conventions, with what
+ people usually think of as "names". There has also been little
+ interest in what would actually be a "multilingual name", i.e., a
+ name that contains components from more than one language. Instead,
+ interest has focused on the use, in the context of the DNS, of
+ strings that conform to specific individual languages.
+
+1.2.1. Approaches to the Requirement
+
+ When the requirement was seen, not as "modifying the DNS", but as
+ "providing users with access to the DNS from a variety of languages
+ and character sets", three sets of proposals emerged in the IETF and
+ elsewhere. They were:
+
+ 1. Perform processing in client software that recodes a user-visible
+ string into an ASCII-compatible form that can safely be passed
+ through the DNS protocols and stored in the DNS. This is the
+ approach used, for example, in the IETF's "IDNA" protocol
+ [RFC3490].
+
+ 2. Modify the DNS to be more hospitable to non-ASCII names and
+ strings. There have been a variety of proposals to do this,
+ using several different techniques. Some of these have been
+ implemented on a proprietary basis by various vendors. None of
+ them have gained acceptance in the IETF community, primarily
+ because they would take a long time to deploy, would leave many
+ problems unsolved, and have been shown to cause problems with
+ deployed approaches that had not yet been upgraded.
+
+ 3. Move the problem out of the DNS entirely, relying instead on a
+ "directory" or "presentation" layer to handle
+ internationalization. The rationale for this approach is
+ discussed in [RFC3467].
+
+ This document proposes a fourth approach, applicable to the top level
+ domains (TLDs) only (see Section 1.3.1 for a discussion of the
+ special issues that make TLDs both problematic and a special
+ opportunity). That approach involves having the user interface of
+ applications map non-ASCII names for TLDs to existing TLDs and could
+ be used as an alternative or supplement to the strategies summarized
+ above.
+
+1.2.2. Writing the Name of One's Country in its Own Characters
+
+ An early focus of the "multilingual domain name" efforts was
+ expressed in statements such as "users in my country, in which ASCII
+ is rarely used, should be able to write an entire domain name in
+ their own character set". In particular, since all top-level domain
+ names, at present, follow the LDH rules, the modified naming rules
+ discussed in [RFC1123], and the coding conventions specified in
+ [RFC1591], all fully-qualified DNS names were effectively required to
+ contain at least one ASCII label (the TLD name). Some advocates for
+ internationalized names have considered the presence of any ASCII
+ labels inappropriate. One should, instead, be able to write the name
+ of the ccTLD for China in Chinese, the name of the ccTLD for Saudi
+ Arabia in Arabic, the name for Spain in Spanish, and so on.
+
+ That much could be accomplished, given updated applications, by using
+ a new TLD name with IDNA encoding. Of course, adding such a TLD
+ would raise new questions: what to do about gTLDs, how to handle
+ countries with several official languages (perhaps even using
+ different scripts), how name strings should be chosen, and whether
+ there should be an attempt to coordinate the contents of the local-
+ language TLD zone and the traditional ISO 3166-coded one. A few of
+ these issues are addressed below. But, if one examines (or even
+ thinks about) user behavior and preferences, it is almost as
+ important that one be able to write the name of the ccTLD for China
+ in Arabic and that of Saudi Arabia in Chinese: true
+ internationalization implies that, at least to the extent to which
+ ambiguity and conflicts can be avoided, people should be able to use
+ the languages and character sets they prefer. For the same reasons
+ that one would like to have all-Chinese domain names available in
+ China, it is important to have the capability to have an apparent
+ Chinese-language TLD for a domain whose second level and beyond are
+ Chinese characters, even when the TLD itself serves predominantly
+ non-Chinese-speaking registrants and users.
+
+1.2.3. Countries with Multiple Languages and Countries with Multiple
+ Names
+
+ From a user interface standpoint, writing ccTLD names in local
+ characters is a problem. As discussed below in Section 1.3.2, the
+ DNS itself does not easily permit a domain to be referred to by more
+ than one name (or spelling or translation of a name). Countries with
+ more than one official language would require that the country name
+ be represented in each of those languages. And, just as it is
+ important that a user in China be able to represent the name of the
+ Chinese ccTLD in Chinese characters, she should be able to access a
+ Chinese-language site in France using Chinese characters. That would
+ require that she be able to write the name of the French ccTLD in
+ Chinese characters rather than in a form based on a Roman character
+ set.
+
+1.2.4. Availability of Non-ASCII Characters in Programs
+
+ Over the years, computer users have gotten used to the fact that not
+ every computer has a full set of characters available to every
+ program. An extreme example is an Arabic speaker using a public
+ kiosk computer in an airport in the United States: there is only a
+ small chance that the web browser there will be able to input and
+ render Arabic correctly. This has a direct effect on the
+ multilingual TLD problem in that it is not possible to simply change
+ the name of a ccTLD in the DNS to one of the country's non-ASCII
+ names without possibly preventing people elsewhere in the world from
+ entering that name.
+
+1.3. Domain Name System Constraints
+
+1.3.1. Administrative Hierarchy
+
+ The domain name system is firmly rooted in the idea of an
+ "administrative hierarchy", with the entity responsible for a given
+ node of the hierarchy responsible for policies applicable to its
+ subhierarchies (Cf. [RFC1034], [RFC1035], and [RFC1591]). The model
+ works quite well for the domain and subdomains of a particular
+ enterprise. In an enterprise situation, the hierarchy can be
+ organized to match the organizational structure; there are
+ established ways to set policies; and there are, at least presumably,
+ shared assumptions about overall goals and objectives among all
+ registrants in the domain. It is more problematic when a domain is
+ shared by unrelated entities that lack common policy assumptions
+ because it is difficult to reach agreement on rules that should apply
+ to all of the entities and subdomains of such a domain. In general,
+ the unrelated entities situation always prevails for the labels
+ registered in a TLD (second-level names). Exceptions occur in those
+ TLDs for which the second level is structural (e.g., the .CO, .AC,
+ .GOV conventions in many ccTLDs or in the historical geographical
+ organization of .US [RFC1480]). In those cases, the same situation
+ prevails for the labels registered within that structural level.
+
+ TLDs may, but need not, have consistent registration policies for
+ those second (or third) level names. Countries (or ccTLD
+ administrators) have often adopted rules about what entities may
+ register in their ccTLDs, and what forms the names may take. RFC
+ 1591 outlined registration norms for most of the then-extant gTLDs;
+ however, those norms have been largely ignored in recent years. Some
+ recent "sponsored" and purpose-specific domains are based on quite
+ specific rules about appropriate registrations. Homogeneous
+ registration rules for the root are, by contrast, impossible: almost
+ by definition, the subdomains registered in the root (TLDs) are
+ diverse, and no single policy about types and formats of names
+ applying to all root subdomains is feasible.
+
+1.3.2. Aliases
+
+ In an environment different from the DNS, a rational way to permit
+ assigning local-language names to a country code (or other) domain
+ would be to set up an alias for the name, or to use some sort of "see
+ instead" reference. But the DNS does not have facilities for either.
+ Instead, it supports a "CNAME" record, whose label can refer only to
+ a particular label and not to a subtree. For example, if A.B.C is a
+ fully-qualified name, then a CNAME reference in B.C from X to A would
+ make X.B.C appear to have the same values as A.B.C. However, a CNAME
+ reference from Y to C in the root would not make A.B.Y referenceable
+
+ (or even defined) at all. A second record type, DNAME [RFC2672], can
+ provide an alias for a portion of the tree. But many believe that it
+ is problematic technically. At a minimum, it can cause
+ synchronization issues when references across zones occur, and its
+ use has been discouraged within the IETF, except as a means of
+ enabling a transition from one domain to another. Even if the design
+ of yet another alias-type record type were contemplated, DNS
+ technical constraints of query-response integrity and DNSSEC zone
+ signing (cf. [RFC4033], [RFC4034], and [RFC4035]) make it extremely
+ unlikely that one could be defined that would meet the desired
+ requirements for "see instead" or true synonym references.
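+
+ The CNAME limitation above can be illustrated with a toy lookup
+ table. This is only a sketch, not a real resolver: the RECORDS
+ table, the resolve() helper, and the address are hypothetical, and
+ the names A, B, C, X, and Y are the ones used in the example above.
+
```python
# Toy model of CNAME behaviour; not a real DNS resolver.
# Records are keyed by their exact, fully-qualified owner name.
RECORDS = {
    "A.B.C": ("A", "192.0.2.1"),     # illustrative address record
    "X.B.C": ("CNAME", "A.B.C"),     # alias for one name inside B.C
    "Y": ("CNAME", "C"),             # attempted alias for the TLD "C"
}


def resolve(name, depth=0):
    """Return the address for name, following CNAMEs on exact matches only."""
    if depth > 8:                    # guard against CNAME loops
        return None
    record = RECORDS.get(name.upper())
    if record is None:
        return None                  # nothing at this exact owner name
    rtype, value = record
    if rtype == "CNAME":
        return resolve(value, depth + 1)
    return value


x_result = resolve("X.B.C")   # follows the alias to A.B.C
y_result = resolve("A.B.Y")   # the CNAME at Y does not cover names below Y
```
+
+ In this model the lookup for X.B.C succeeds through the alias, while
+ the lookup for A.B.Y finds nothing, because the CNAME at Y applies
+ to the single name Y and not to the subtree beneath it.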
+
+1.4. Internationalization and Localization
+
+ It has often been observed that, while many people talk about
+ "internationalization", they often really mean, and want,
+ "localization". "Internationalization", in this context, suggests
+ making something globally accessible while incorporating a broad-
+ range "universal" character set and conventions appropriate to all
+ languages and cultures. "Localization", by contrast, involves having
+ things work well in a particular locality or for a broad range of
+ localities, although aspects of the style of operation might differ
+ for each locality. Anything that actually involves the DNS must be
+ global, and hence internationalized, since the DNS cannot
+ meaningfully support different responses or query and matching models
+ based, e.g., on the location of the user making a query. While the
+ DNS cannot support localization internally, many of the features
+ discussed earlier in this section are much more easily thought about
+ in local terms -- whether localized to a geographical area, users of
+ a language, or using some other criteria -- than in global ones.
+
+2. Client-Side Solutions
+
+ Traditionally, the IETF avoided becoming involved in standardization
+ for actions that take place strictly on individual hosts on the
+ network, instead confining itself to behavior that is observable "on
+ the wire", i.e., in protocols between network hosts. Exceptions to
+ this general principle have been made when different clients were
+ required to utilize data or interpret values in compatible ways to
+ preserve interoperability: the standards for email and web body
+ formats, and IDNA itself, are examples of these exceptions.
+ Regardless of what is required to be standardized, it is almost never
+ required, and often unwise, that a user interface present "on the
+ wire" formats to the user, at least by default (debugging options
+ that show the wire formats are common and often quite useful).
+ However, in most cases when the presentation format and the wire
+ format differ, the client program must take precautions to ensure
+ that the wire format can be reconstructed from user input, or to keep
+ the wire format, while hidden, bound to the presentation mechanism so
+ that it can be reconstructed. While it is rarely a goal in itself,
+ it is often necessary that the user be at least vaguely aware that
+ the wire ("real") format is different from the presentation one and
+ that the wire format be available for debugging.
+
+ In fact, the DNS itself is an excellent example of the difference
+ between the wire format and the user presentation format. Most
+ Internet users do not realize that the wire format for DNS queries
+ and responses does not include the "." character. Instead, each
+ label is represented by a length in bytes of the label, followed by
+ the label itself.
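+
+ As a minimal sketch of that difference, the following converts the
+ dotted presentation form into length-prefixed labels. The function
+ name to_wire() is hypothetical. The 63-octet label limit and the
+ terminating zero-length root label come from [RFC1035] rather than
+ from the text above, and name compression is not modeled.
+
```python
def to_wire(name: str) -> bytes:
    """Encode a dotted ASCII domain name as length-prefixed labels."""
    wire = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:      # RFC 1035 label length limit
            raise ValueError(f"bad label length: {label!r}")
        wire.append(len(raw))            # one length octet ...
        wire += raw                      # ... followed by the label itself
    wire.append(0)                       # zero-length root label ends the name
    return bytes(wire)


encoded = to_wire("www.example.com")
```
+
+ For "www.example.com" the result is the octets 3, "www", 7,
+ "example", 3, "com", 0; no "." character appears anywhere in it.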
+
+2.1. IDNA and the Client
+
+ As mentioned above, IDNA itself is entirely a client-side protocol.
+ It works by performing some mappings and then encoding labels to be
+ placed into the DNS in a special format called "punycode" [RFC3492].
+ When labels in that format are encountered, they are transformed, by
+ the client, back into internationalized (normally Unicode [ISO10646])
+ characters. In the context of this document, the important
+ observation about IDNA is that any application program that supports
+ it is already doing considerable transformation work in the client;
+ it is not simply presenting the "on the wire" formats to the user.
+ It is also the case that, if an application implementation makes
+ different mappings than those called for by IDNA, it is likely to be
+ detected only when, and if, users complain about unexpected behavior.
+ As long as the punycode strings sent to it are valid, the server
+ cannot tell what mappings were applied to develop those strings.
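+
+ The client-side round trip can be sketched with the "idna" codec in
+ the Python standard library, which implements the IDNA of [RFC3490].
+ The label chosen here is only an illustration.
+
```python
# Python's built-in "idna" codec applies Nameprep and then punycode,
# adding the "xn--" prefix defined by RFC 3490.
label = "b\u00fccher"                  # "bucher" with u-umlaut
ace = label.encode("idna")             # the form placed in the DNS
restored = ace.decode("idna")          # back to Unicode for display

# Nameprep case-folds before encoding, so different user input can
# yield the same string on the wire; the server sees only the result.
same = "B\u00dcCHER".encode("idna") == ace
```
+
+ Both spellings produce the same ASCII-compatible label, which is why
+ a server cannot tell which mappings the client applied.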
+
+2.2. Local Translation Tables for TLD Names
+
+ We suggest that, in addition to maintaining the code and tables
+ required to support IDNA, authors of application programs may want to
+ maintain a table that contains a list of TLDs and locally-desirable
+ names for each one. For ccTLDs, these might be the names (or
+ locally-standard abbreviations) by which the relevant countries are
+ known locally (whether in ASCII characters or others). With some
+ care on the part of the application designer (e.g., to ensure that
+ local forms do not conflict with the actual TLD names), a particular
+ TLD name input from the user could be either in local or standard
+ form without special tagging or problems. When DNS names are
+ received by these client programs, the TLD labels would be mapped to
+ local form before IDNA is applied to the rest of the name; when names
+ are received from users, local TLD names would be mapped to the
+ global ones before applying IDNA or being used in other DNS
+ processing.
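+
+ A minimal sketch of such a table and the two mappings follows. The
+ table contents and the names LOCAL_TO_GLOBAL, to_dns(), and
+ to_display() are assumptions made for illustration, not part of
+ this proposal or of any existing library. IDNA handling uses the
+ "idna" codec from the Python standard library.
+
```python
# Hypothetical per-locale table: local display name -> TLD in the DNS.
LOCAL_TO_GLOBAL = {
    "\u4e2d\u56fd": "cn",    # "China" written in Chinese
    "\u6cd5\u56fd": "fr",    # "France" written in Chinese
}
GLOBAL_TO_LOCAL = {tld: local for local, tld in LOCAL_TO_GLOBAL.items()}

# Local forms must not collide with names that are actual TLDs here.
assert not set(LOCAL_TO_GLOBAL) & set(GLOBAL_TO_LOCAL)


def to_dns(name: str) -> str:
    """User input -> DNS form: map a local TLD name, then apply IDNA."""
    *rest, tld = name.rstrip(".").split(".")
    tld = LOCAL_TO_GLOBAL.get(tld, tld)       # standard forms pass through
    return ".".join(rest + [tld]).encode("idna").decode("ascii")


def to_display(name: str) -> str:
    """DNS form -> display: map the TLD to local form, decode the rest."""
    *rest, tld = name.rstrip(".").split(".")
    local = GLOBAL_TO_LOCAL.get(tld.lower(), tld)
    decoded = ".".join(rest).encode("ascii").decode("idna") if rest else ""
    return ".".join(part for part in (decoded, local) if part)


wire_name = to_dns("b\u00fccher.\u6cd5\u56fd")
shown_name = to_display(wire_name)
```
+
+ Input given in standard form, such as "example.cn", passes through
+ to_dns() unchanged, so users can type either form without tagging.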
+
+3. Advantages and Disadvantages of Local Translation
+
+3.1. Every TLD Appears in the Local Language and Character Set
+
+ The notion of a top-level domain whose name matches, e.g., the name
+ that is used for a country in that country or the name of a language
+ in that language, as mentioned above, is immediately appealing.
+ But most of the reasons for it argue equally strongly for other TLDs
+ being accessible from that language. A user in Korea who can access
+ the national ccTLD in the Korean language and character set has every
+ reason to expect that both generic top level domains and domains
+ associated with other countries would be similarly accessible,
+ especially if the second-level domains bear Korean names. A user
+ native to Spain or Portugal, or in Latin America, would presumably
+ have similar expectations, but would expect to use Spanish or
+ Portuguese names, not Korean ones.
+
+ That level of local optimization is not realistic -- some would argue
+ not possible -- with the DNS since it would ultimately require that
+ every top level domain be replicated for each of the world's
+ languages. That replication process would involve not just the top
+ level domain itself; in principle, all of its subtrees would need to
+ be completely replicated as well. Perhaps in practice, not all
+ subtrees would require replication, but only those for which a
+ language variation or translation was significant. But, while that
+ restriction would change the scale of the problem, it would not alter
+ its basic nature. The administrative hierarchy characteristics of
+ the DNS (see Section 1.3.1) turn the replication process into an
+ administrative nightmare: every administrator of a second-level
+ domain in the world would be forced to maintain dozens, probably
+ hundreds, of similar zone files for the replicates of the domain.
+ Even if only the zones relevant to a particular country or language
+ were replicated, the administrative and tracking problems to bind
+ these to the appropriate top-level domain and keep all of the
+ replicas synchronized would be extremely difficult at best. And many
+ administrators of third- and fourth-level domains, and beyond, would
+ be faced with similar problems.
+
+ By contrast, dealing with the names of TLDs as a localization
+ problem, using local translation, is fairly simple, although it
+ places some burden of understanding on the user (see Section 4).
+ Each function represented by a TLD -- a country, generic
+ registrations, or purpose-specific registrations -- could be
+ represented in the local language and character set as needed. And,
+ for countries with many languages -- or users living, working in, or
+ visiting countries where their language is not dominant -- "local"
+ could be defined in terms of the needs or wishes of each particular
+ user.
+
+ An additional benefit is that, if two countries called themselves by
+ the same name in their local languages -- if, e.g., Western Slobbovia
+ and Eastern Slobbovia both called themselves "Slobland" -- local
+ conventions could be followed as long as users understood that only
+ internal forms (in this case, the ISO 3166-based ccTLD name) could be
+ exported outside the country (see Section 3.3).
+
+ Note that this proposal is to allow mapping of native-language
+ strings to existing TLDs. It would almost certainly be ill-advised
+ to stretch this idea too far and try to map strings that local users
+ would be unlikely to guess into TLDs. For example, there are
+ probably no languages in which the country known in English as
+ "Finland" is called "FI". Thus, one would not want to create a
+ mapping from two characters that look or sound like a Roman "F" and a
+ Roman "I" to the ccTLD ".fi".
+
+3.2. Unification of Country Code Domains
+
+ It follows from some of the comments above that, while there appears
+ to be some immediate appeal from having (at least) two domains for
+ each country, one using the ISO 3166-1 code [ISO3166] and another one
+ using a name based on the national name in the national language,
+ such a situation would create considerable problems for registrants
+ in both domains. For registrants maintaining enterprise or
+ organizational subdomains, ease of administration of a single family
+ of zone files will usually make a registration in a single top-level
+ domain preferable to replicated sets of them, at least as long as
+ their functional requirements (such as local-language access) are met
+ by the unified structure. For those registrants with no interest in
+ any Internet function or protocols other than use of the HTTP/HTTPS-
+ based web, this problem can be dealt with at the applications level
+ by the use of redirects but, in the general case, that is not a
+ feasible solution.
+
+ For countries with multiple national languages that are considered
+ equal and legally equivalent, the advantages of a translation-based
+ approach, rather than multiple registrations and replicated trees,
+ would be even more significant. Actually installing and maintaining
+ a separate TLD for each language would be an administrative
+ nightmare, especially if it was intended that the associated zones be
+ kept synchronized. The oft-suggested proposal to adopt an "exactly
+ one extra domain for each country" rule would essentially require
+ some of the multiple-official-language countries to violate their own
+ constitutions. Conversely, having multiple domains for a given
+ country, based on the number of official languages and without any
+ expectation of synchronization, would give some countries an
+ additional allocation of TLDs that others would certainly consider
+ unfair.
+
+ Of course, having replicated domains might be popular with some
+ registries and registrars, since replication would almost inevitably
+ increase the total number of domains to be registered. Helping that
+ group of registries and registrars, while hurting Internet users by
+ adding administrative overhead and confusion, is not a goal of this
+ document.
+
+3.3. User Understanding of Local and Global References
+
+ While the IDNA tables (actually Nameprep [RFC3491] and Stringprep
+ [RFC3454]) must be identical globally for IDNA to work reliably, the
+ tables for mapping between local names and TLD names could be locally
+ determined, and differ from one locale to another, as long as users
+ understood that international interchange of names required using the
+ standard forms. That understanding puts some additional burden of
+ learning on users, although part of it could be assisted by software
+ (see Section 4).
+
+ In any event, at least in the foreseeable future, it is likely that
+ DNS names being passed among users in different countries, or using
+ different languages, will be forced to be in punycode form to
+ guarantee compatibility, since those users would not, in general,
+ have the ability to read each other's scripts or have appropriate
+ input facilities (keyboards, etc.) for them. So the marginal
+ knowledge or effort needed to put TLD names into standard form and
+ transmit them in that way would actually be fairly small.
+
+3.4. Limits on Expansion of the Number of TLDs
+
+ The concept of using local translation does have one side effect that
+ some portions of the Internet community might consider undesirable.
+ The size and complexity of translation tables, and maintaining those
+ tables, will be, to a considerable extent, a function of the number
+ of top-level domains of interest, the frequency with which new
+ domains are added, and the number of domains added at a time. A
+ country or other locale that wished to maintain a complete set of
+ translations (i.e., so that every TLD had a representation in the
+ local language) would presumably find setting up a table for the
+ current collection of a few hundred domains to be a task that would
+ take some days. If the number of TLDs were relatively stable, with a
+ relatively small number being added at infrequent intervals, the
+ updates could probably be dealt with on an ad hoc basis. But, if
+ large numbers of domains were added frequently, or if the total
+ number of TLDs became very large, maintaining the table might require
+ dedicated staff if each new TLD is to be accommodated. Worse,
+ updating the tables stored on client machines might require update
+ and synchronization protocols and all of the complexities that tend
+ to go with them (see [RFC3696] for a discussion of some related
+ issues in applications).
+
+ In practice, there will be little requirement to translate every TLD
+ into a local language. There are already existing TLDs for which
+ there are no obvious translations in many languages (most notably,
+ ".arpa") or where the translation will be far from obvious to typical
+ users (for example, ".int" and ".aero"). Of course, these could be
+ translated by function: ".arpa" to the local term for
+ "infrastructure", ".int" with "international" or "international
+ organization", ".aero" with "aeronautical" or "airlines", and so on;
+ but it is not clear whether doing so would have significant value.
+ For almost every language, there are dozens of ccTLDs for which there
+ are no translations of the country names into the local language that
+ would be known by anyone other than geographers. If new TLDs are
+ added, there might not be a strong need (or even capability) to have
+ language-specific equivalents for each.
+
+3.5. Standardization of the Translations
+
+ An immediate question when proposals such as this one are considered
+ is whether the names for the various TLDs that do not match the
+ strings that are actually in the DNS should be standardized and, if
+ so, by what mechanism. Standardization would promote communication
+ within a country or among people sharing a language. However, it is
+ likely to be very difficult to reach appropriate international
+ agreements to which wide conformance could be expected. Exceptions
+ might arise within particular countries or language groups but, even
+ then, there might be advantages to users being able to specify
+ additional synonymous names that are easy for them to remember. As
+ with IDNA-based IDNs, users who wish to transmit information about
+ domain names to people whose exact capabilities and software are
+ unknown, and to do so with minimal risk of confusion, will probably
+ confine themselves to the names that actually appear in the DNS,
+ i.e., the "punycode" representations.
+
+ In any event, neither standardization nor uniform use of either the
+ system outlined here or of a specific collection of names is required
+ to make the system work for those who would find it useful.
+ Similarly, mechanisms for country-wide coordination, and examination
+ of the appropriateness or inappropriateness of such mechanisms, are
+ beyond the scope of this document.
+
+
+
+
+
+
+
+
+Klensin Informational [Page 12]
+
+RFC 4185 Characters for DNS TLD Names October 2005
+
+
+3.6. Implications for Future New Domain Names
+
+ Applications that implement the proposal in this document are likely
+ to make the subsequent creation and acceptance of new IDNA-based TLDs
+ significantly more difficult. If this proposal becomes widely
+ adopted, local language names mapped as it suggests will be generally
+ expected by users of those languages to mean the same as a current
+ TLD. Creating a new, stand-alone IDNA-based TLD will then require
+ more deliberation and care to avoid conflicts and, when executed,
+ will require all the application software that maps the name to the
+ existing TLD to change the mapping tables.
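+
+ As a non-normative illustration, the check that such deliberation
+ implies can be sketched in Python. The table contents, the
+ normalize() helper (NFKC plus case folding, only a rough stand-in
+ for nameprep), and find_conflict() are assumptions made for this
+ sketch; none of them is defined by this document.
+
```python
import unicodedata

# Hypothetical local table: local-language name -> TLD actually in the DNS.
# The entries are illustrative only.
LOCAL_TLD_MAP = {
    "alemania": "de",
    "francia": "fr",
}


def normalize(label):
    """Approximate nameprep with NFKC and case folding (an assumption)."""
    return unicodedata.normalize("NFKC", label).casefold()


def find_conflict(new_tld, table):
    """Return the existing TLD that a proposed new name already maps to.

    Returns None when the proposed name is not used as a local synonym.
    """
    key = normalize(new_tld)
    for local_name, target in table.items():
        if normalize(local_name) == key:
            return target
    return None
```
+
+ A proposed name for which find_conflict() returns a value is one
+ that applications using this table already treat as a synonym for
+ an existing TLD, so those tables would have to change before the
+ new TLD could be reached.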
+
+ For several reasons, this problem may not be as serious in practice
+ as it might first appear. For ccTLDs allocated according to the ISO
+ 3166-1 list, there will presumably be no problem at all: not only are
+ the 3166-1 alpha-2 codes strictly in ASCII, but general trends, such
+ as those embodied in ICANN's "GAC Recommendations" against using
+ country names or codes for any purpose not associated with those
+ specific countries, make conflicts with internationalized names
+ extremely unlikely. Because the DNS does not currently have a usable
+ aliasing function (see Section 1.3.2), it is likely that new IDNA-
+ based TLDs will be allocated only after there is considerable
+ opportunity for countries and other individual entities to identify
+ any problems they see with proposed new names.
+
+3.7. Mapping for TLDs, Not Domain Names or Keywords
+
+ It should be clear to anyone who has read this far that the mapping
+ described in this document is limited to TLDs, not full domain names
+ or keywords. In particular, nothing here should be construed as
+ applying to anything other than TLDs, due at least in part to the
+ limitations described in Section 3.1. Further, this document is only
+ about the domain name system (DNS), not about any keyword system.
+ The interactions between particular keyword systems and the proposals
+ here are left as a (possibly very difficult) exercise for the reader
+ or implementer of such systems. However, for the subset of such
+ systems whose intent is to entirely hide DNS names or URIs from the
+ user, their output would presumably be the LDH names that actually
+ appeared in the DNS, i.e., in punycode form for IDNA names and
+ without any application processing of the type contemplated here.
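+
+ The restriction to TLDs can be made concrete with a small Python
+ sketch. The table and the function name to_dns_name() are
+ hypothetical, and the sketch ignores a trailing root dot; it is
+ meant only to show that the rightmost label alone is examined.
+
```python
# Hypothetical local table: local-language name -> TLD actually in the DNS.
LOCAL_TLD_MAP = {"alemania": "de", "francia": "fr"}


def to_dns_name(presented, table=LOCAL_TLD_MAP):
    """Map only the rightmost label; every other label is left untouched."""
    head, sep, last = presented.rpartition(".")
    mapped = table.get(last.casefold(), last)
    return head + sep + mapped
```
+
+ With this table, "example.alemania" becomes "example.de", while
+ "alemania.example.com" is returned unchanged because the local name
+ does not appear in the TLD position.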
+
+4. Information Interchange, IDNs, Comparisons, and Translations
+
+ This specification is based on a pair of fairly explicit assumptions.
+ The first is that the greatest and most important impact and value of
+ any internationalization or localization technique is to permit users
+ who share a language or culture to communicate with others who also
+ share that language or culture. Communication among users from
+ different cultures, using different languages or different scripts is
+ inherently more difficult, and still more difficult if they cannot
+ easily identify languages and scripts in common. The reasons for
+ those difficulties are age-old issues in language translation and
+ differences among languages and scripts, not problems associated with
+ the DNS or IDNs, however they are represented. That is the second
+ assumption: when communication across language or cultural groups is
+ required, the users who need to do it -- typically a much smaller
+ number than those communicating within the same language and culture
+ -- are going to need to rely on commonly-understood languages and
+ scripts and will need to exert somewhat more care and effort than
+ within their own groups.
+
+ As outlined in the sections above, the suggestions made in this
+ document could clearly be turned into major problems by misuse or
+ misunderstanding. For example, if two applications on the same host
+ used different translation tables, a situation could easily result
+ that would be very confusing to the user. However, in some cases,
+ this would be only slightly worse than some of the alternatives. For
+ example, if, on a given system, IDNs are expressed in native script,
+ but ASCII TLD names are used, cutting and pasting from one
+ application to another may not work as expected, unless both
+ applications and the underlying operating system are all Unicode-
+ based and use the same encoding model for Unicode. Some application
+ writers have already discovered, even without significant use of
+ IDNs, that they need to support separate "copy string" and "copy link
+ location", and the corresponding "paste" operations. Any use of IDNs
+ or Internationalized Resource Identifiers (IRIs, see [RFC3987]) may
+ require similar operations, or extensions to those operations, to
+ force strings into internal ("punycode" or URI) form on the copy
+ operation and to translate them back on paste. Were that done, the
+ appropriate translations could be performed as part of the same
+ process. If this author's hypothesis is correct -- that these
+ operations are likely to be required on many systems whether this
+ proposal is adopted or not -- then the additional translation
+ operations are likely to be invisible to the user.
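+
+ A minimal Python sketch of such a copy and paste pair follows. It
+ relies on Python's built-in "idna" codec, which implements the
+ RFC 3490 conversions; the mapping tables and the function names
+ copy_internal() and paste_display() are assumptions made for the
+ illustration and are not part of any existing application.
+
```python
# Hypothetical local table and its inverse, used on copy and on paste.
LOCAL_TO_DNS = {"alemania": "de"}
DNS_TO_LOCAL = {dns: local for local, dns in LOCAL_TO_DNS.items()}


def copy_internal(displayed):
    """On copy: map the TLD back to its DNS form, then apply ToASCII."""
    head, sep, tld = displayed.rpartition(".")
    tld = LOCAL_TO_DNS.get(tld.casefold(), tld)
    return (head + sep + tld).encode("idna").decode("ascii")


def paste_display(internal):
    """On paste: apply ToUnicode, then present the TLD in local form."""
    unicode_name = internal.encode("ascii").decode("idna")
    head, sep, tld = unicode_name.rpartition(".")
    return head + sep + DNS_TO_LOCAL.get(tld.casefold(), tld)
```
+
+ Under these assumptions, copying "b\u00fccher.alemania" places
+ "xn--bcher-kva.de" on the clipboard, and pasting that string into
+ an application with the same table restores the presentation form.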
+
+ In particular, precisely because the translated names proposed here
+ are part of a presentation form, rather than the internal form names,
+ they are inappropriate in a number of circumstances in which a
+ globally-unique, internal-form name is actually required. It would
+ be a poor, indeed dangerous, idea to use these names in security
+ contexts such as names in certificates, access lists, or other
+ contexts in which accurate comparisons are necessary.
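+
+ One way to enforce that rule is to refuse anything other than an
+ internal-form name at the point of comparison. The Python sketch
+ below assumes an access list stored as lower-case LDH names; the
+ helper names is_internal_form() and acl_allows() are hypothetical.
+
```python
def is_internal_form(name):
    """True only for non-empty ASCII letter-digit-hyphen labels."""
    labels = name.rstrip(".").split(".")
    return all(
        label
        and label.isascii()
        and all(ch.isalnum() or ch == "-" for ch in label)
        for label in labels
    )


def acl_allows(name, acl):
    """Compare internal forms only; reject presentation-form input."""
    if not is_internal_form(name):
        raise ValueError("presentation-form name used in a security context")
    return name.rstrip(".").lower() in acl
```
+
+ Rejecting a translated or native-script name outright, rather than
+ silently converting it, keeps the result of the comparison
+ independent of whichever translation table happens to be installed.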
+
+ A more general issue exists when DNS or IRI references are
+ transferred among users whose systems may be localized for different
+ languages or conventions. In general, a user in one part of the
+
+ world will not actually know how another user's systems are set up,
+ precisely what software is being used, etc., nor should users be
+ expected or forced to learn that information. But, if the user
+ transmitting an internationalized reference doesn't know that the
+ receiving system supports the same characters and fonts, and that the
+ receiving user is prepared to deal with them, the prudent user will
+ transmit the internal form of the reference in addition to, or even
+ instead of, the native-character form. And, of course, if the
+ reference is transmitted on paper, on a sign, in some coded character
+ set other than Unicode, or even as an image, rather than as a Unicode
+ string, supplementing it with the internal form becomes even more
+ important. The addition of a translation requirement for TLD labels
+ makes the availability of internal forms in interchange significantly
+ more important, but does not actually change the requirement to
+ provide them.
+
+ It may be helpful to note that, in a different networking model than
+ that used in the Internet, both this proposal and IDNA itself are
+ essentially "presentation layer" approaches rather than constructions
+ that can be expected to work well in interchange.
+
+5. Internationalization Considerations
+
+ This entire specification addresses issues in internationalization
+ and especially the boundaries between internationalization and
+ localization and between network protocols and client/user interface
+ actions.
+
+6. Security Considerations
+
+ IDNA provides a client-based mechanism for presenting Unicode names
+ in applications while passing only ASCII-based names on the wire. As
+ such, it constitutes a major step along the path of introducing a
+ client-based presentation layer into the Internet. Client-based
+ presentation layer transformations introduce risks from non-
+ conforming tables that can change meaning without external
+ protection. For example, if a mapping table normally maps A onto C,
+ and that table is altered by an attacker so that A maps onto D
+ instead, much mischief can be committed. On the other hand, these
+ are not the usual sort of network attacks: they may be thought of as
+ falling into the "users can always cause harm to themselves"
+ category. The local translation model outlined here does not
+ significantly increase the risks over those associated with IDNA, but
+ may provide some new avenues for exploiting them.
+
+ Both this approach and IDNA rely on having updated programs present
+ information to the user in a very different form than the one in
+ which it is transmitted on the wire. Unless the internal (wire) form
+
+ is always used in interchange, or at least made available when DNS
+ names are exchanged, there are possibilities for ambiguity and
+ confusion about references. As with IDNA itself, if only the "wire"
+ form is presented, the user will perceive that nothing of value has
+ been done, i.e., that no internationalization or localization has
+ occurred. So presentation of the "wire" form to eliminate the
+ potential ambiguities is unlikely to be considered an acceptable
+ solution, regardless of its security advantages.
+
+ If the translation tables associated with the technique suggested
+ here are obtained from a server, or translations are obtained from a
+ remote machine using some protocol, the mechanisms used should ensure
+ that the values received are authentic, i.e., that neither they, nor
+ the query for them, have been intercepted and tampered with in any
+ way.
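+
+ As one possible approach, and not one specified by this document,
+ a client could authenticate a downloaded table with a keyed hash.
+ The Python sketch below assumes that a shared key has been
+ provisioned out of band and that the table is serialized as JSON;
+ sign_table() and load_verified_table() are hypothetical names. It
+ protects the table itself, not the query that requested it.
+
```python
import hashlib
import hmac
import json


def sign_table(table, key):
    """Server side: serialize the table and compute an HMAC-SHA-256 tag."""
    payload = json.dumps(table, sort_keys=True).encode("ascii")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def load_verified_table(payload, tag, key):
    """Client side: accept the table only if the tag matches."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("translation table failed authentication")
    return json.loads(payload)
```
+
+ A deployment might instead rely on a secured transport or on
+ digital signatures; the point of the sketch is only that a table
+ altered in transit is rejected before any mapping is applied.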
+
+7. Acknowledgements
+
+ This document was inspired by a number of conversations in ICANN,
+ IETF, MINC, and private contexts about the future evolution and
+ internationalization of top level domains. Unknown to the author,
+ but unsurprisingly (the general concept should be obvious to anyone
+ even slightly skilled in the relevant technologies), the concept has
+ apparently been developed independently in other groups but, as far
+ as this author knows, not written up for general comment.
+ Discussions within, and about, the ICANN IDN Committee were
+ particularly helpful, although several of the participants in that
+ committee may be surprised about where those discussions led. Email
+ correspondence with several people after the first version of this
+ document was posted, notably Richard Hill, Paul Hoffman, Lee
+ XiaoDong, and Soobok Lee, led to considerable clarification in the
+ subsequent versions. The author is particularly grateful to Paul
+ Hoffman for extensive comments and additional text for the third
+ version and to Patrik Faltstrom, Joel Halpern, Sam Hartman, and Russ
+ Housley for suggestions incorporated into the final one.
+
+ The first version of this document was posted on October 21, 2002.
+
+8. Informative References
+
+ [ISO10646] International Organization for Standardization,
+ "Information Technology - Universal Multiple-octet coded
+ Character Set (UCS) - Part 1: Architecture and Basic
+ Multilingual Plane", ISO Standard 10646-1, May 1993.
+
+ [ISO3166] International Organization for Standardization, "Codes for
+ the representation of names of countries and their
+ subdivisions -- Part 1: Country codes", ISO Standard
+ 3166-1:1997, 1997.
+
+ [MIME] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet
+ Mail Extensions): Mechanisms for Specifying and Describing
+ the Format of Internet Message Bodies", RFC 1341, June
+ 1992.
+
+ Updated and replaced by Freed, N. and N. Borenstein,
+ "Multipurpose Internet Mail Extensions (MIME) Part One:
+ Format of Internet Message Bodies", RFC 2045, November
+ 1996. Also, Moore, K., "Representation of Non-ASCII Text
+ in Internet Message Headers", RFC 1342, June 1992.
+ Updated and replaced by Moore, K., "MIME (Multipurpose
+ Internet Mail Extensions) Part Three: Message Header
+ Extensions for Non-ASCII Text", RFC 2047, November 1996.
+
+ [RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
+ STD 13, RFC 1034, November 1987.
+
+ [RFC1035] Mockapetris, P., "Domain names - implementation and
+ specification", STD 13, RFC 1035, November 1987.
+
+ [RFC1123] Braden, R., "Requirements for Internet Hosts - Application
+ and Support", STD 3, RFC 1123, October 1989.
+
+ [RFC1480] Cooper, A. and J. Postel, "The US Domain", RFC 1480, June
+ 1993.
+
+ [RFC1591] Postel, J., "Domain Name System Structure and Delegation",
+ RFC 1591, March 1994.
+
+ [RFC2672] Crawford, M., "Non-Terminal DNS Name Redirection", RFC
+ 2672, August 1999.
+
+ [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
+ Internationalized Strings ("stringprep")", RFC 3454,
+ December 2002.
+
+ [RFC3467] Klensin, J., "Role of the Domain Name System (DNS)", RFC
+ 3467, February 2003.
+
+ [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
+ "Internationalizing Domain Names in Applications (IDNA)",
+ RFC 3490, March 2003.
+
+ [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
+ Profile for Internationalized Domain Names (IDN)", RFC
+ 3491, March 2003.
+
+ [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode
+ for Internationalized Domain Names in Applications
+ (IDNA)", RFC 3492, March 2003.
+
+ [RFC3696] Klensin, J., "Application Techniques for Checking and
+ Transformation of Names", RFC 3696, February 2004.
+
+ [RFC3932] Alvestrand, H., "The IESG and RFC Editor Documents:
+ Procedures", BCP 92, RFC 3932, October 2004.
+
+ [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource
+ Identifiers (IRIs)", RFC 3987, January 2005.
+
+ [RFC4033] Arends, R., Austein, R., Larson, M., Massey, D., and S.
+ Rose, "DNS Security Introduction and Requirements", RFC
+ 4033, March 2005.
+
+ [RFC4034] Arends, R., Austein, R., Larson, M., Massey, D., and S.
+ Rose, "Resource Records for the DNS Security Extensions",
+ RFC 4034, March 2005.
+
+ [RFC4035] Arends, R., Austein, R., Larson, M., Massey, D., and S.
+ Rose, "Protocol Modifications for the DNS Security
+ Extensions", RFC 4035, March 2005.
+
+Author's Address
+
+ John C Klensin
+ 1770 Massachusetts Ave, #322
+ Cambridge, MA 02140
+ USA
+
+ Phone: +1 617 491 5735
+ EMail: john-ietf@jck.com
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2005).
+
+ This document is subject to the rights, licenses and restrictions
+ contained in BCP 78, and except as set forth therein, the authors
+ retain all their rights.
+
+ This document and the information contained herein are provided on an
+ "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+ OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+ ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+ INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+ INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+ WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+ The IETF takes no position regarding the validity or scope of any
+ Intellectual Property Rights or other rights that might be claimed to
+ pertain to the implementation or use of the technology described in
+ this document or the extent to which any license under such rights
+ might or might not be available; nor does it represent that it has
+ made any independent effort to identify any such rights. Information
+ on the procedures with respect to rights in RFC documents can be
+ found in BCP 78 and BCP 79.
+
+ Copies of IPR disclosures made to the IETF Secretariat and any
+ assurances of licenses to be made available, or the result of an
+ attempt made to obtain a general license or permission for the use of
+ such proprietary rights by implementers or users of this
+ specification can be obtained from the IETF on-line IPR repository at
+ http://www.ietf.org/ipr.
+
+ The IETF invites any interested party to bring to its attention any
+ copyrights, patents or patent applications, or other proprietary
+ rights that may cover technology that may be required to implement
+ this standard. Please address the information to the IETF at ietf-
+ ipr@ietf.org.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is currently provided by the
+ Internet Society.
+
+