diff options
Diffstat (limited to 'doc/rfc/rfc4185.txt')
-rw-r--r-- | doc/rfc/rfc4185.txt | 1067 |
1 files changed, 1067 insertions, 0 deletions
diff --git a/doc/rfc/rfc4185.txt b/doc/rfc/rfc4185.txt new file mode 100644 index 0000000..e21e40d --- /dev/null +++ b/doc/rfc/rfc4185.txt @@ -0,0 +1,1067 @@ + + + + + + +Network Working Group J. Klensin +Request for Comments: 4185 October 2005 +Category: Informational + + + National and Local Characters for DNS Top Level Domain (TLD) Names + +Status of This Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2005). + +IESG Note + + This RFC is not a candidate for any level of Internet Standard. The + IETF disclaims any knowledge of the fitness of this RFC for any + purpose and notes that the decision to publish is not based on IETF + review apart from IESG review for conflict with IETF work. The RFC + Editor has chosen to publish this document at its discretion. See + RFC 3932 [RFC3932] for more information. + +Abstract + + In the context of work on internationalizing the Domain Name System + (DNS), there have been extensive discussions about "multilingual" or + "internationalized" top level domain names (TLDs), especially for + countries whose predominant language is not written in a Roman-based + script. This document reviews some of the motivations for such + domains, several suggestions that have been made to provide needed + functionality, and the constraints that the DNS imposes. It then + suggests an alternative, local translation, that may solve a superset + of the problem while avoiding protocol changes, serious deployment + delays, and other difficulties. The suggestion utilizes a + localization technique in applications to permit any TLD to be + accessed using the vocabulary and characters of any language. It is + not restricted to language- or country-specific "multilingual" TLDs + in the language(s) and script(s) of that country. + + + + + + + + + +Klensin Informational [Page 1] + +RFC 4185 Characters for DNS TLD Names October 2005 + + +Table of Contents + + 1. Introduction ....................................................3 + 1.1. Terminology ................................................3 + 1.2. Background on the "Multilingual Name" Problem ..............3 + 1.2.1. Approaches to the Requirement .......................3 + 1.2.2. Writing the Name of One's Country in its Own + Characters ..........................................4 + 1.2.3. Countries with Multiple Languages and + Countries with Multiple .............................5 + 1.2.4. Availability of Non-ASCII Characters in Programs ....5 + 1.3. Domain Name System Constraints .............................6 + 1.3.1. Administrative Hierarchy ............................6 + 1.3.2. Aliases .............................................6 + 1.4. Internationalization and Localization ......................7 + 2. Client-Side Solutions ...........................................7 + 2.1. IDNA and the Client ........................................8 + 2.2. Local Translation Tables for TLD Names .....................8 + 3. Advantages and Disadvantages of Local Translation ...............9 + 3.1. Every TLD Appears in the Local Language and Character Set ..9 + 3.2. Unification of Country Code Domains .......................10 + 3.3. User Understanding of Local and Global References .........11 + 3.4. Limits on Expansion of the Number of TLDs .................11 + 3.5. Standardization of the Translations .......................12 + 3.6. Implications for Future New Domain Names ..................13 + 3.7. Mapping for TLDs, Not Domain Names or Keywords ............13 + 4. Information Interchange, IDNs, Comparisons, and Translations ...13 + 5. Internationalization Considerations ............................15 + 6. Security Considerations ........................................15 + 7. Acknowledgements ...............................................16 + 8. Informative References .........................................17 + + + + + + + + + + + + + + + + + + + + +Klensin Informational [Page 2] + +RFC 4185 Characters for DNS TLD Names October 2005 + + +1. Introduction + +1.1. Terminology + + This document assumes the conventional terminology used to discuss + the domain name system (DNS) and its hierarchical arrangements. + Terms such as "top level domain" (or just "TLD"), "subdomain", + "subtree", and "zone file" are used without further explanation. In + addition, the term "ccTLD" is used to denote a "country code top + level domain" and "gTLD" is used to denote a "generic top level + domain" as described in [RFC1591] and in common usage. + +1.2. Background on the "Multilingual Name" Problem + + People who share a language usually prefer to communicate in it, + using whatever characters are normally used to write that language, + rather than in some "foreign" one. There have been standards for + using mutually-agreed characters and languages in electronic mail + message bodies and selected headers since the introduction of MIME in + 1992 [MIME] and the Web has permitted multilingual text since its + inception, also using MIME. Actual use of non-Roman-character + content came even earlier, using private conventions. However, + domain names are exposed to users in email addresses and URLs. + Corresponding arrangements, typically also exposing domain names, are + made for other application protocols. The combination of exposed + domain names with internationalization requirements led rapidly to + demands to permit domain names in applications that used characters + other than those of the very restrictive, ASCII-subset, "hostname" + (or "letter-digit-hyphen" ("LDH")) conventions recommended in the DNS + specifications [RFC1035]. The effort to do this soon became known as + "multilingual domain names". That was actually a misnomer, since the + DNS deals only with characters and identifier strings, and not, + except by accident or local registration conventions, with what + people usually think of as "names". There has also been little + interest in what would actually be a "multilingual name", i.e., a + name that contains components from more than one language. Instead, + interest has focused on the use, in the context of the DNS, of + strings that conform to specific individual languages. + +1.2.1. Approaches to the Requirement + + When the requirement was seen, not as "modifying the DNS", but as + "providing users with access to the DNS from a variety of languages + and character sets", three sets of proposals emerged in the IETF and + elsewhere. They were: + + + + + + +Klensin Informational [Page 3] + +RFC 4185 Characters for DNS TLD Names October 2005 + + + 1. Perform processing in client software that recodes a user-visible + string into an ASCII-compatible form that can safely be passed + through the DNS protocols and stored in the DNS. This is the + approach used, for example, in the IETF's "IDNA" protocol + [RFC3490]. + + 2. Modify the DNS to be more hospitable to non-ASCII names and + strings. There have been a variety of proposals to do this, + using several different techniques. Some of these have been + implemented on a proprietary basis by various vendors. None of + them have gained acceptance in the IETF community, primarily + because they would take a long time to deploy, would leave many + problems unsolved, and have been shown to cause problems with + deployed approaches that had not yet been upgraded. + + 3. Move the problem out of the DNS entirely, relying instead on a + "directory" or "presentation" layer to handle + internationalization. The rationale for this approach is + discussed in [RFC3467]. + + This document proposes a fourth approach, applicable to the top level + domains (TLDs) only (see Section 1.3.1 for a discussion of the + special issues that make TLDs both problematic and a special + opportunity). That approach involves having the user interface of + applications map non-ASCII names for TLDs to existing TLDs and could + be used as an alternate or supplement to the strategies summarized + above. + +1.2.2. Writing the Name of One's Country in its Own Characters + + An early focus of the "multilingual domain name" efforts was + expressed in statements such as "users in my country, in which ASCII + is rarely used, should be able to write an entire domain name in + their own character set". In particular, since all top-level domain + names, at present, follow the LDH rules, the modified naming rules + discussed in [RFC1123], and the coding conventions specified in + [RFC1591], all fully-qualified DNS names were effectively required to + contain at least one ASCII label (the TLD name). Some advocates for + internationalized names have considered the presence of any ASCII + labels inappropriate. One should, instead, be able to write the name + of the ccTLD for China in Chinese, the name of the ccTLD for Saudi + Arabia in Arabic, the name for Spain in Spanish, and so on. + + That much could be accomplished, given updated applications, by using + a new TLD name with IDNA encoding. Of course, adding such a TLD + would raise new questions: what to do about gTLDs, how to handle + countries with several official languages (perhaps even using + different scripts), how should name strings be chosen, and whether + + + +Klensin Informational [Page 4] + +RFC 4185 Characters for DNS TLD Names October 2005 + + + there should be an attempt to coordinate the contents of the local- + language TLD zone and the traditional ISO 3166-coded one. A few of + these issues are addressed below. But, if one examines (or even + thinks about) user behavior and preferences, it is almost as + important that one be able to write the name of the ccTLD for China + in Arabic and that of Saudi Arabia in Chinese: true + internationalization implies that, at least to the extent to which + ambiguity and conflicts can be avoided, people should be able to use + the languages and character sets they prefer. For the same reasons + that one would like to have all-Chinese domain names available in + China, it is important to have the capability to have an apparent + Chinese-language TLD for a domain whose second level and beyond are + Chinese characters, even when the TLD itself serves predominantly + non-Chinese-speaking registrants and users. + +1.2.3. Countries with Multiple Languages and Countries with Multiple + Names + + From a user interface standpoint, writing ccTLD names in local + characters is a problem. As discussed below in Section 1.3.2, the + DNS itself does not easily permit a domain to be referred to by more + than one name (or spelling or translation of a name). Countries with + more than one official language would require that the country name + be represented in each of those languages. And, just as it is + important that a user in China be able to represent the name of the + Chinese ccTLD in Chinese characters, she should be able to access a + Chinese-language site in France using Chinese characters. That would + require that she be able to write the name of the French ccTLD in + Chinese characters rather than in a form based on a Roman character + set. + +1.2.4. Availability of Non-ASCII Characters in Programs + + Over the years, computer users have gotten used to the fact that not + every computer has a full set of characters available to every + program. An extreme example is an Arabic speaker using a public + kiosk computer in an airport in the United States: there is only a + small chance that the web browser there will be able to input and + render Arabic correctly. This has a direct effect on the + multilingual TLD problem in that it is not possible to simply change + a name of the ccTLDs in the DNS to be one of a given country's non- + ASCII names without possibly preventing people from entering those + names throughout the world. + + + + + + + + +Klensin Informational [Page 5] + +RFC 4185 Characters for DNS TLD Names October 2005 + + +1.3. Domain Name System Constraints + +1.3.1. Administrative Hierarchy + + The domain name system is firmly rooted in the idea of an + "administrative hierarchy", with the entity responsible for a given + node of the hierarchy responsible for policies applicable to its + subhierarchies (Cf. [RFC1034], [RFC1035], and [RFC1591]). The model + works quite well for the domain and subdomains of a particular + enterprise. In an enterprise situation, the hierarchy can be + organized to match the organizational structure; there are + established ways to set policies; and there are, at least presumably, + shared assumptions about overall goals and objectives among all + registrants in the domain. It is more problematic when a domain is + shared by unrelated entities that lack common policy assumptions + because it is difficult to reach agreement on rules that should apply + to all of the entities and subdomains of such a domain. In general, + the unrelated entities situation always prevails for the labels + registered in a TLD (second-level names). Exceptions occur in those + TLDs for which the second level is structural (e.g., the .CO, .AC, + .GOV conventions in many ccTLDs or in the historical geographical + organization of .US [RFC1480]). In those cases, it exists for the + labels within that structural level. + + TLDs may, but need not, have consistent registration policies for + those second (or third) level names. Countries (or ccTLD + administrators) have often adopted rules about what entities may + register in their ccTLDs, and what forms the names may take. RFC + 1591 outlined registration norms for most of the then-extant gTLDs; + however, those norms have been largely ignored in recent years. Some + recent "sponsored" and purpose-specific domains are based on quite + specific rules about appropriate registrations. Homogeneous + registration rules for the root are, by contrast, impossible: almost + by definition, the subdomains registered in the root (TLDs) are + diverse, and no single policy about types and formats of names + applying to all root subdomains is feasible. + +1.3.2. Aliases + + In an environment different from the DNS, a rational way to permit + assigning local-language names to a country code (or other) domain + would be to set up an alias for the name, or to use some sort of "see + instead" reference. But the DNS does not have facilities for either. + Instead, it supports a "CNAME" record, whose label can refer only to + a particular label and not to a subtree. For example, if A.B.C is a + fully-qualified name, then a CNAME reference in B.C from X to A would + make X.B.C appear to have the same values as A.B.C. However, a CNAME + reference from Y to C in the root would not make A.B.Y referenceable + + + +Klensin Informational [Page 6] + +RFC 4185 Characters for DNS TLD Names October 2005 + + + (or even defined) at all. A second record type, DNAME [RFC2672], can + provide an alias for a portion of the tree. But many believe that it + is problematic technically. At a minimum, it can cause + synchronization issues when references across zones occur, and its + use has been discouraged within the IETF, except as a means of + enabling a transition from one domain to another. Even if the design + of yet another alias-type record type were contemplated, DNS + technical constraints of query-response integrity and DNSSec zone + signing (cf. [RFC4033], [RFC4034], and [RFC4035]) make it extremely + unlikely that one could be defined that would meet the desired + requirements for "see instead" or true synonym references. + +1.4. Internationalization and Localization + + It has often been observed that, while many people talk about + "internationalization", they often really mean, and want, + "localization". "Internationalization", in this context, suggests + making something globally accessible while incorporating a broad- + range "universal" character set and conventions appropriate to all + languages and cultures. "Localization", by contrast, involves having + things work well in a particular locality or for a broad range of + localities, although aspects of the style of operation might differ + for each locality. Anything that actually involves the DNS must be + global, and hence internationalized, since the DNS cannot + meaningfully support different responses or query and matching models + based, e.g., on the location of the user making a query. While the + DNS cannot support localization internally, many of the features + discussed earlier in this section are much more easily thought about + in local terms -- whether localized to a geographical area, users of + a language, or using some other criteria -- than in global ones. + +2. Client-Side Solutions + + Traditionally, the IETF avoided becoming involved in standardization + for actions that take place strictly on individual hosts on the + network, instead confining itself to behavior that is observable "on + the wire", i.e., in protocols between network hosts. Exceptions to + this general principle have been made when different clients were + required to utilize data or interpret values in compatible ways to + preserve interoperability: the standards for email and web body + formats, and IDNA itself, are examples of these exceptions. + Regardless of what is required to be standardized, it is almost never + required, and often unwise, that a user interface present "on the + wire" formats to the user, at least by default (debugging options + that show the wire formats are common and often quite useful). + However, in most cases when the presentation format and the wire + format differ, the client program must take precautions to ensure + that the wire format can be reconstructed from user input, or to keep + + + +Klensin Informational [Page 7] + +RFC 4185 Characters for DNS TLD Names October 2005 + + + the wire format, while hidden, bound to the presentation mechanism so + that it can be reconstructed. While it is rarely a goal in itself, + it is often necessary that the user be at least vaguely aware that + the wire ("real") format is different from the presentation one and + that the wire format be available for debugging. + + In fact, the DNS itself is an excellent example of the difference + between the wire format and the user presentation format. Most + Internet users do not realize that the wire format for DNS queries + and responses does not include the "." character. Instead, each + label is represented by a length in bytes of the label, followed by + the label itself. + +2.1. IDNA and the Client + + As mentioned above, IDNA itself is entirely a client-side protocol. + It works by performing some mappings and then encoding labels to be + placed into the DNS in a special format called "punycode" [RFC3492]. + When labels in that format are encountered, they are transformed, by + the client, back into internationalized (normally Unicode [ISO10646]) + characters. In the context of this document, the important + observation about IDNA is that any application program that supports + it is already doing considerable transformation work in the client; + it is not simply presenting the "on the wire" formats to the user. + It is also the case that, if an application implementation makes + different mappings than those called for by IDNA, it is likely to be + detected only when, and if, users complain about unexpected behavior. + As long as the punycode strings sent to it are valid, the server + cannot tell what mappings were applied to develop those strings. + +2.2. Local Translation Tables for TLD Names + + We suggest that, in addition to maintaining the code and tables + required to support IDNA, authors of application programs may want to + maintain a table that contains a list of TLDs and locally-desirable + names for each one. For ccTLDs, these might be the names (or + locally-standard abbreviations) by which the relevant countries are + known locally (whether in ASCII characters or others). With some + care on the part of the application designer (e.g., to ensure that + local forms do not conflict with the actual TLD names), a particular + TLD name input from the user could be either in local or standard + form without special tagging or problems. When DNS names are + received by these client programs, the TLD labels would be mapped to + local form before IDNA is applied to the rest of the name; when names + are received from users, local TLD names would be mapped to the + global ones before applying IDNA or being used in other DNS + processing. + + + + +Klensin Informational [Page 8] + +RFC 4185 Characters for DNS TLD Names October 2005 + + +3. Advantages and Disadvantages of Local Translation + +3.1. Every TLD Appears in the Local Language and Character Set + + The notion of a top-level domain whose name matches, e.g., the name + that is used for a country in that country or the name of a language + in that language as, as mentioned above, is immediately appealing. + But most of the reasons for it argue equally strongly for other TLDs + being accessible from that language. A user in Korea who can access + the national ccTLD in the Korean language and character set has every + reason to expect that both generic top level domains and domains + associated with other countries would be similarly accessible, + especially if the second-level domains bear Korean names. A user + native to Spain or Portugal, or in Latin America, would presumably + have similar expectations, but would expect to use Spanish or + Portuguese names, not Korean ones. + + That level of local optimization is not realistic -- some would argue + not possible -- with the DNS since it would ultimately require that + every top level domain be replicated for each of the world's + languages. That replication process would involve not just the top + level domain itself; in principle, all of its subtrees would need to + be completely replicated as well. Perhaps in practice, not all + subtrees would require replication, but only those for which a + language variation or translation was significant. But, while that + restriction would change the scale of the problem, it would not alter + its basic nature. The administrative hierarchy characteristics of + the DNS (see Section 1.3.1) turn the replication process into an + administrative nightmare: every administrator of a second-level + domain in the world would be forced to maintain dozens, probably + hundreds, of similar zone files for the replicates of the domain. + Even if only the zones relevant to a particular country or language + were replicated, the administrative and tracking problems to bind + these to the appropriate top-level domain and keep all of the + replicas synchronized would be extremely difficult at best. And many + administrators of third- and fourth-level domains, and beyond, would + be faced with similar problems. + + By contrast, dealing with the names of TLDs as a localization + problem, using local translation, is fairly simple, although it + places some burden of understanding on the user (see Section 4). + Each function represented by a TLD -- a country, generic + registrations, or purpose-specific registrations -- could be + represented in the local language and character set as needed. And, + for countries with many languages -- or users living, working in, or + visiting countries where their language is not dominant -- "local" + could be defined in terms of the needs or wishes of each particular + user. + + + +Klensin Informational [Page 9] + +RFC 4185 Characters for DNS TLD Names October 2005 + + + An additional benefit is that, if two countries called themselves by + the same name in their local languages -- if, e.g., Western Slobbovia + and Eastern Slobbovia both called themselves "Slobland" -- local + conventions could be followed as long as users understood that only + internal forms (in this case, the ISO 3166-based ccTLD name) could be + exported outside the country (see Section 3.3). + + Note that this proposal is to allow mapping of native-language + strings to existing TLDs. It would almost certainly be ill-advised + to stretch this idea too far and try to map strings that local users + would be unlikely to guess into TLDs. For example, there are + probably no languages in which the country known in English as + "Finland" is called "FI". Thus, one would not want to create a + mapping from two characters that look or sound like a Roman "F" and a + Roman "I" to the ccTLD ".fi". + +3.2. Unification of Country Code Domains + + It follows from some of the comments above that, while there appears + to be some immediate appeal from having (at least) two domains for + each country, one using the ISO 3166-1 code [ISO3166] and another one + using a name based on the national name in the national language, + such a situation would create considerable problems for registrants + in both domains. For registrants maintaining enterprise or + organizational subdomains, ease of administration of a single family + of zone files will usually make a registration in a single top-level + domain preferable to replicated sets of them, at least as long as + their functional requirements (such a local-language access) are met + by the unified structure. For those registrants with no interest in + any Internet function or protocols other than use of the HTTP/HTTPS- + based web, this problem can be dealt with at the applications level + by the use of redirects but, in the general case, that is not a + feasible solution. + + For countries with multiple national languages that are considered + equal and legally equivalent, the advantages of a translation-based + approach, rather than multiple registrations and replicated trees, + would be even more significant. Actually installing and maintaining + a separate TLD for each language would be an administrative + nightmare, especially if it was intended that the associated zones be + kept synchronized. The oft-suggested proposal to adopt an "exactly + one extra domain for each country" rule would essentially require + some of the multiple-official-language countries to violate their own + constitutions. Conversely, having multiple domains for a given + country, based on the number of official languages and without any + expectation of synchronization, would give some countries an + additional allocation of TLDs that others would certainly consider + unfair. + + + +Klensin Informational [Page 10] + +RFC 4185 Characters for DNS TLD Names October 2005 + + + Of course, having replicated domains might be popular with some + registries and registrars, since replication would almost inevitably + increase the total number of domains to be registered. Helping that + group of registries and registrars, while hurting Internet users by + adding administrative overhead and confusion, is not a goal of this + document. + +3.3. User Understanding of Local and Global References + + While the IDNA tables (actually Nameprep [RFC3491] and Stringprep + [RFC3454]) must be identical globally for IDNA to work reliably, the + tables for mapping between local names and TLD names could be locally + determined, and differ from one locale to another, as long as users + understood that international interchange of names required using the + standard forms. That understanding puts some additional burden of + learning on users, although part of it could be assisted by software + (see Section 4). + + In any event, at least in the foreseeable future, it is likely that + DNS names being passed among users in different countries, or using + different languages, will be forced to be in punycode form to + guarantee compatibility, since those users would not, in general, + have the ability to read each other's scripts or have appropriate + input facilities (keyboards, etc.) for then. So the marginal + knowledge or effort needed to put TLD names into standard form and + transmit them in that way would actually be fairly small. + +3.4. Limits on Expansion of the Number of TLDs + + The concept of using local translation does have one side effect that + some portions of the Internet community might consider undesirable. + The size and complexity of translation tables, and maintaining those + tables, will be, to a considerable extent, a function of the number + of top-level domains of interest, the frequency with which new + domains are added, and the number of domains added at a time. A + country or other locale that wished to maintain a complete set of + translations (i.e., so that every TLD had a representation in the + local language) would presumably find setting up a table for the + current collection of a few hundred domains to be a task that would + take some days. If the number of TLDs were relatively stable, with a + relatively small number being added at infrequent intervals, the + updates could probably be dealt with on an ad hoc basis. But, if + large numbers of domains were added frequently, or if the total + number of TLDs became very large, maintaining the table might require + dedicated staff if each new TLD is to be accommodated. Worse, + updating the tables stored on client machines might require update + + + + + +Klensin Informational [Page 11] + +RFC 4185 Characters for DNS TLD Names October 2005 + + + and synchronization protocols and all of the complexities that tend + to go with them (see [RFC3696] for a discussion of some related + issues in applications). + + In practice, there will be little requirement to translate every TLD + into a local language. There are already existing TLDs for which + there is no obvious translations in many languages (most notably, + ".arpa") or where the translation will be far from obvious to typical + users (for example, ".int" and ".aero"). Of course, these could be + translated by function: ".arpa" to the local term for + "infrastructure", ".int" with "international" or "international + organization", ".aero" with "aeronautical" or "airlines", and so on; + but it is not clear whether doing so would have significant value. + For almost every language, there are dozens of ccTLDs for which there + are no translations of the country names into the local language that + would be known by anyone other than geographers. If new TLDs are + added, there might not be a strong need (or even capability) to have + language-specific equivalents for each. + +3.5. Standardization of the Translations + + An immediate question when proposals such as this one are considered + is whether the names for the various TLDs that do not match the + strings that are actually in the DNS should be standardized and, if + so, by what mechanism. Standardization would promote communication + within a country or among people sharing a language. However, it is + likely to be very difficult to reach appropriate international + agreements to which wide conformance could be expected. Exceptions + might arise within particular countries or language groups but, even + then, there might be advantages to users being able to specify + additional synonymous names that are easy for them to remember. As + with IDNA-based IDNs, users who wish to transmit information about + domain names to people whose exact capabilities and software are + unknown, and to do so with minimal risk of confusion, will probably + confine themselves to the names that actually appear in the DNS, + i.e., the "punycode" representations. + + In any event, neither standardization nor uniform use of either the + system outlined here or of a specific collection of names is required + to make the system work for those who would find it useful. + Similarly, mechanisms for country-wide coordination, and examination + of the appropriateness or inappropriateness of such mechanisms, is + beyond the scope of this document. + + + + + + + + +Klensin Informational [Page 12] + +RFC 4185 Characters for DNS TLD Names October 2005 + + +3.6. Implications for Future New Domain Names + + Applications that implement the proposal in this document are likely + to make the subsequent creation and acceptance of new IDNA-based TLDs + significantly more difficult. If this proposal becomes widely + adopted, local language names mapped as it suggests will be generally + expected by users of those languages to mean the same as a current + TLD. Creating a new, stand-alone IDNA-based TLD will then require + more deliberation and care to avoid conflicts and, when executed, + will require all the application software that maps the name to the + existing TLD to change the mapping tables. + + For several reasons, this problem may not be as serious in practice + as it might first appear. For ccTLDs allocated according to the ISO + 3166-1 list, there will presumably be no problem at all: not only are + the 3166-1 alpha-2 codes strictly in ASCII, but general trends, such + as those embodied in ICANN's "GAC Recommendations" against using + country names or codes for any purpose not associated with those + specific countries, make conflicts with internationalized names + extremely unlikely. Because the DNS does not currently have a usable + aliasing function (see Section 1.3.2), it is likely that new IDNA- + based TLDs will be allocated only after there is considerable + opportunity for countries and other individual entities to identify + any problems they see with proposed new names. + +3.7. Mapping for TLDs, Not Domain Names or Keywords + + It should be clear to anyone who has read this far that the mapping + described in this document is limited to TLDs, not full domain names + or keywords. In particular, nothing here should be construed as + applying to anything other than TLDs, due at least in part to the + limitations described in Section 3.1. Further, this document is only + about the domain name system (DNS), not about any keyword system. + The interactions between particular keyword systems and the proposals + here are left as a (possibly very difficult) exercise for the reader + or implementer of such systems. However, for the subset of such + systems whose intent is to entirely hide DNS names or URIs from the + user, their output would presumably be the LDH names that actually + appeared in the DNS, i.e., in punycode form for IDNA names and + without any application processing of the type contemplated here. + +4. Information Interchange, IDNs, Comparisons, and Translations + + This specification is based on a pair of fairly explicit assumptions. + The first is that the greatest and most important impact and value of + any internationalization or localization technique is to permit users + who share a language or culture to communicate with others who also + share that language or culture. Communication among users from + + + +Klensin Informational [Page 13] + +RFC 4185 Characters for DNS TLD Names October 2005 + + + different cultures, using different languages or different scripts is + inherently more difficult, and still more difficult if they cannot + easily identify languages and scripts in common. The reason for + those difficulties are age-old issues in language translation and + differences among languages and scripts, not problems associated with + the DNS or IDNs, however they are represented. That is the second + assumption: when communication across language or cultural groups is + required, the users who need to do it -- typically a much smaller + number than those communicating within the same language and culture + -- are going to need to rely on commonly-understood languages and + scripts and will need to exert somewhat more care and effort than + within their own groups. + + As outlined in the sections above, the suggestions made in this + document could clearly be turned into major problems by misuse or + misunderstanding. For example, if two applications on the same host + used different translation tables, a situation could easily result + that would be very confusing to the user. However, in some cases, + this would be only slightly worse than some of the alternatives. For + example, if, on a given system, IDNs are expressed in native script, + but ASCII TLD names are used, cutting and pasting from one + application to another may not work as expected, unless both + applications and the underlying operating system are all Unicode- + based and use the same encoding model for Unicode. Some applications + writers have already discovered, even without significant use of + IDNs, that they need to support separate "copy string" and "copy link + location", and the corresponding "paste" operations. Any use of IDNs + or Internationalized Resource Identifiers (IRIs, see [RFC3987]) may + require similar operations, or extensions to those operations, to + force strings into internal ("punycode" or URI) form on the copy + operation and to translate them back on paste. Were that done, the + appropriate translations could be performed as part of the same + process. If this author's hypothesis is correct -- that these + operations are likely to be required on many systems whether this + proposal is adopted or not -- then the additional translation + operations are likely to be invisible to the user. + + In particular, precisely because the translated names proposed here + are part of a presentation form, rather than the internal form names, + they are inappropriate in a number of circumstances in which a + globally-unique, internal-form name is actually required. It would + be a poor, indeed dangerous, idea to use these names in security + contexts such as names in certificates, access lists, or other + contexts in which accurate comparisons are necessary. + + A more general issue exists when DNS or IRI references are + transferred among users whose systems may be localized for different + languages or conventions. In general, a user in one part of the + + + +Klensin Informational [Page 14] + +RFC 4185 Characters for DNS TLD Names October 2005 + + + world will not actually know how another user's systems are set up, + precisely what software is being used, etc., nor should users be + expected or forced to learn that information. But, if the user + transmitting an internationalized reference doesn't know that the + receiving system supports the same characters and fonts, and that the + receiving user is prepared to deal with them, the prudent user will + transmit the internal form of the reference in addition to, or even + instead of, the native-character form. And, of course, if the + reference is transmitted on paper, on a sign, in some coded character + set other than Unicode, or even as an image, rather than as a Unicode + string, the importance of supplementing it with the internal form + becomes even more important. The addition of a translation + requirement for TLD labels makes availability of internal forms in + interchange significantly more important, but does not actually + change the requirement to do so. + + It may be helpful to note that, in a different networking model than + that used in the Internet, both this proposal and IDNA itself are + essentially "presentation layer" approaches rather than constructions + that can be expected to work well in interchange. + +5. Internationalization Considerations + + This entire specification addresses issues in internationalization + and especially the boundaries between internationalization and + localization and between network protocols and client/user interface + actions. + +6. Security Considerations + + IDNA provides a client-based mechanism for presenting Unicode names + in applications while passing only ASCII-based names on the wire. As + such, it constitutes a major step along the path of introducing a + client-based presentation layer into the Internet. Client-based + presentation layer transformations introduce risks from non- + conforming tables that can change meaning without external + protection. For example, if a mapping table normally maps A onto C, + and that table is altered by an attacker so that A maps onto D + instead, much mischief can be committed. On the other hand, these + are not the usual sort of network attacks: they may be thought of as + falling into the "users can always cause harm to themselves" + category. The local translation model outlined here does not + significantly increase the risks over those associated with IDNA, but + may provide some new avenues for exploiting them. + + Both this approach and IDNA rely on having updated programs present + information to the user in a very different form than the one in + which it is transmitted on the wire. Unless the internal (wire) form + + + +Klensin Informational [Page 15] + +RFC 4185 Characters for DNS TLD Names October 2005 + + + is always used in interchange, or at least made available when DNS + names are exchanged, there are possibilities for ambiguity and + confusion about references. As with IDNA itself, if only the "wire" + form is presented, the user will perceive that nothing of value has + been done, i.e., that no internationalization or localization has + occurred. So presentation of the "wire" form to eliminate the + potential ambiguities is unlikely to be considered an acceptable + solution, regardless of its security advantages. + + If the translation tables associated with the technique suggested + here are obtained from a server, or translations are obtained from a + remote machine using some protocol, the mechanisms used should ensure + that the values received are authentic, i.e., that neither they, nor + the query for them, have been intercepted and tampered with in any + way. + +7. Acknowledgements + + This document was inspired by a number of conversations in ICANN, + IETF, MINC, and private contexts about the future evolution and + internationalization of top level domains. Unknown to the author, + but unsurprisingly (the general concept should be obvious to anyone + even slightly skilled in the relevant technologies), the concept has + been apparently developed independently in other groups but, as far + as this author knows, not written up for general comment. + Discussions within, and about, the ICANN IDN Committee were + particularly helpful, although several of the participants in that + committee may be surprised about where those discussions led. Email + correspondence with several people after the first version of this + document was posted, notably Richard Hill, Paul Hoffman, Lee + XiaoDong, and Soobok Lee, led to considerable clarification in the + subsequent versions. The author is particularly grateful to Paul + Hoffman for extensive comments and additional text for the third + version and to Patrik Faltstrom, Joel Halpern, Sam Hartman, and Russ + Housley for suggestions incorporated into the final one. + + The first version of this document was posted on October 21, 2002. + + + + + + + + + + + + + + +Klensin Informational [Page 16] + +RFC 4185 Characters for DNS TLD Names October 2005 + + +8. Informative References + + [ISO10646] International Organization for Standardization, + "Information Technology - Universal Multiple-octet coded + Character Set (UCS) - Part 1: Architecture and Basic + Multilingual Plane", ISO Standard 10646-1, May 1993. + + [ISO3166] International Organization for Standardization, "Codes for + the representation of names of countries and their + subdivisions -- Part 1: Country codes", ISO Standard + 3166-1:1977, 1997. + + [MIME] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet + Mail Extensions): Mechanisms for Specifying and Describing + the Format of Internet Message Bodies", RFC 1341, June + 1992. + + Updated and replaced by Freed, N. and N. Borenstein, + "Multipurpose Internet Mail Extensions (MIME) Part One: + Format of Internet Message Bodies", RFC2045, November + 1996. Also, Moore, K., "Representation of Non-ASCII Text + in Internet Message Headers", RFC 1342, June 1992. + Updated and replaced by Moore, K., "MIME (Multipurpose + Internet Mail Extensions) Part Three: Message Header + Extensions for Non-ASCII Text", RFC 2047, November 1996. + + [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", + STD 13, RFC 1034, November 1987. + + [RFC1035] Mockapetris, P., "Domain names - implementation and + specification", STD 13, RFC 1035, November 1987. + + [RFC1123] Braden, R., "Requirements for Internet Hosts - Application + and Support", STD 3, RFC 1123, October 1989. + + [RFC1480] Cooper, A. and J. Postel, "The US Domain", RFC 1480, June + 1993. + + [RFC1591] Postel, J., "Domain Name System Structure and Delegation", + RFC 1591, March 1994. + + [RFC2672] Crawford, M., "Non-Terminal DNS Name Redirection", RFC + 2672, August 1999. + + [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of + Internationalized Strings ("stringprep")", RFC 3454, + December 2002. + + + + +Klensin Informational [Page 17] + +RFC 4185 Characters for DNS TLD Names October 2005 + + + [RFC3467] Klensin, J., "Role of the Domain Name System (DNS)", RFC + 3467, February 2003. + + [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, + "Internationalizing Domain Names in Applications (IDNA)", + RFC 3490, March 2003. + + [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep + Profile for Internationalized Domain Names (IDN)", RFC + 3491, March 2003. + + [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode + for Internationalized Domain Names in Applications + (IDNA)", RFC 3492, March 2003. + + [RFC3696] Klensin, J., "Application Techniques for Checking and + Transformation of Names", RFC 3696, February 2004. + + [RFC3932] Alvestrand, H., "The IESG and RFC Editor Documents: + Procedures", BCP 92, RFC 3932, October 2004. + + [RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource + Identifiers (IRIs)", RFC 3987, January 2005. + + [RFC4033] Arends, R., Austein, R., Larson, M., Massey, D., and S. + Rose, "DNS Security Introduction and Requirements", RFC + 4033, March 2005. + + [RFC4034] Arends, R., Austein, R., Larson, M., Massey, D., and S. + Rose, "Resource Records for the DNS Security Extensions", + RFC 4034, March 2005. + + [RFC4035] Arends, R., Austein, R., Larson, M., Massey, D., and S. + Rose, "Protocol Modifications for the DNS Security + Extensions", RFC 4035, March 2005. + +Author's Address + + John C Klensin + 1770 Massachusetts Ave, #322 + Cambridge, MA 02140 + USA + + Phone: +1 617 491 5735 + EMail: john-ietf@jck.com + + + + + + +Klensin Informational [Page 18] + +RFC 4185 Characters for DNS TLD Names October 2005 + + +Full Copyright Statement + + Copyright (C) The Internet Society (2005). + + This document is subject to the rights, licenses and restrictions + contained in BCP 78, and except as set forth therein, the authors + retain all their rights. + + This document and the information contained herein are provided on an + "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS + OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET + ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, + INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE + INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED + WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Intellectual Property + + The IETF takes no position regarding the validity or scope of any + Intellectual Property Rights or other rights that might be claimed to + pertain to the implementation or use of the technology described in + this document or the extent to which any license under such rights + might or might not be available; nor does it represent that it has + made any independent effort to identify any such rights. Information + on the procedures with respect to rights in RFC documents can be + found in BCP 78 and BCP 79. + + Copies of IPR disclosures made to the IETF Secretariat and any + assurances of licenses to be made available, or the result of an + attempt made to obtain a general license or permission for the use of + such proprietary rights by implementers or users of this + specification can be obtained from the IETF on-line IPR repository at + http://www.ietf.org/ipr. + + The IETF invites any interested party to bring to its attention any + copyrights, patents or patent applications, or other proprietary + rights that may cover technology that may be required to implement + this standard. Please address the information to the IETF at ietf- + ipr@ietf.org. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + +Klensin Informational [Page 19] + |