author Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit 4bfd864f10b68b71482b35c818559068ef8d5797
tree e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc4290.txt
parent ea76e11061bda059ae9f9ad130a9895cc85607db
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc4290.txt')
-rw-r--r-- doc/rfc/rfc4290.txt 1571
1 files changed, 1571 insertions, 0 deletions
diff --git a/doc/rfc/rfc4290.txt b/doc/rfc/rfc4290.txt
new file mode 100644
index 0000000..c123ccc
--- /dev/null
+++ b/doc/rfc/rfc4290.txt
@@ -0,0 +1,1571 @@
+
+
+
+
+
+
+Network Working Group J. Klensin
+Request for Comments: 4290 December 2005
+Category: Informational
+
+
+ Suggested Practices for Registration of
+ Internationalized Domain Names (IDN)
+
+Status of This Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2005).
+
+IESG Note
+
+ This RFC is not a candidate for any level of Internet Standard. The
+ IETF disclaims any knowledge of the fitness of this RFC for any
+ purpose and notes that the decision to publish is not based on IETF
+ review apart from IESG review for conflict with IETF work. The RFC
+ Editor has chosen to publish this document at its discretion. See
+ RFC 3932 for more information.
+
+Abstract
+
+ This document explores the issues in the registration of
+ internationalized domain names (IDNs). The basic IDN definition
+ allows a very large number of possible characters in domain names,
+ and this richness may lead to serious user confusion about similar-
+ looking names. To avoid this confusion, the IDN registration process
+ must impose rules that disallow some otherwise-valid name
+ combinations. This document suggests a set of mechanisms that
+ registries might use to define and implement such rules for a broad
+ range of languages, including adaptation of methods developed for
+ Chinese, Japanese, and Korean domain names.
+
+
+
+
+
+
+
+
+
+
+
+
+Klensin Informational [Page 1]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 1.1. Background .................................................3
+ 1.2. The Nature and Status of these Recommendations .............4
+ 1.3. Terminology ................................................5
+ 1.3.1. Languages and Scripts .................................5
+ 1.3.2. Characters, Variants, Registrations, and Other
+ Issues ................................................6
+ 1.3.3. Confusion, Fraud, and Cybersquatting ..................9
+ 1.4. A Review of the JET Guidelines .............................9
+ 1.4.1. JET Model .............................................9
+ 1.4.2. Reserved Names and Label Packages ....................10
+ 1.5. Languages, Scripts, and Variants ..........................11
+ 1.5.1. Languages versus Scripts .............................11
+ 1.5.2. Variant Selection ....................................13
+ 1.6. Variants are not a Universal Remedy .......................14
+ 1.7. Reservations and Exclusions ...............................14
+ 1.7.1. Sequence Exclusions for Valid Characters .............14
+ 1.7.2. Character Pairing Issues .............................15
+ 1.8. The Registration Bundle ...................................15
+ 1.8.1. Definitions and Structure ............................15
+ 1.8.2. Application of the Registration Bundle ...............16
+ 2. Some Implications of This Approach .............................17
+ 3. Possible Modifications of the JET Model ........................18
+ 4. Conclusions and Recommendations About the General Approach .....18
+ 5. A Model Table Format ...........................................19
+ 6. A Model Label Registration Procedure: "CreateBundle" ...........20
+ 6.1. Description of the CreateBundle Mechanism .................21
+ 6.2. The "no-variants" Case ....................................22
+ 6.3. CreateBundle and Nameprep Mapping .........................22
+ 7. IANA Considerations ............................................23
+ 8. Internationalization Considerations ............................24
+ 9. Security Considerations ........................................24
+ 10. Acknowledgements ..............................................25
+ 11. Informative References ........................................26
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Klensin Informational [Page 2]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+1. Introduction
+
+1.1. Background
+
+ The IDNA (Internationalized Domain Names in Applications)
+ specification [RFC3490] defines the basic model for encoding non-
+ ASCII strings in the DNS. Additional specifications [RFC3491]
+ [RFC3492] define the mechanisms and tables needed to support IDNA.
+ As work on these specifications neared completion, it became apparent
+ that it would be desirable for registries to impose additional
+ restrictions on the names that could actually be registered (e.g.,
+ see [IESG-IDN] and [ICANN-IDN]) to reduce potential confusion among
+ characters that were similar in some way. This document explores
+ these IDN (internationalized domain name) registration issues and
+ suggests a set of mechanisms that IDN registries might use.
+
+ Registration restrictions are part of a long tradition. For example,
+ while the original DNS specifications [RFC1035] permitted any string
+ of octets in a DNS label, they also recommended the use of a much
+ more restricted subset. This subset was derived from the much older
+ "hostname" rules [RFC952] and defined by the "LDH" convention (for
+ the three permitted types of characters: letters, digits, and the
+ hyphen). Enforcement of this restricted subset in registrations was
+ the responsibility of the registry or domain administrator. The
+ definition of the subset was not embedded in the DNS protocol itself,
+ although some applications protocols, notably those concerned with
+ electronic mail, did impose and enforce similar rules.
+
+ If there are no constraints on registration in a zone, people can
+ register characters that increase the risk of misunderstandings,
+ cybersquatting, and other forms of confusion. A similar situation
+ existed even before the introduction of IDNA, as exemplified by
+ domain names such as example.com and examp1e.com (note that the
+ latter domain contains the digit "1" instead of the letter "l").
+
+ For non-ASCII names (so-called "internationalized domain names" or
+ "IDNs"), the problem is more complicated. In the earlier situation
+ that led to the LDH (hostname) rules, all protocols, hosts, and DNS
+ zones used ASCII exclusively in practice, so the LDH restriction
+ could reasonably be applied uniformly across the Internet. Support
+ for IDNs introduces a very large character repertoire, different
+ geographical and political locations, and languages that require
+ different collections of characters. The optimal registration
+ restrictions are no longer a global matter; they may be different in
+ different areas and, hence, in different DNS zones.
+
+
+
+
+
+
+Klensin Informational [Page 3]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+ For some human writing systems, there are characters and/or strings
+ that have equivalent or near-equivalent usages. If a name can be
+ registered with such a character or string, the registry might want
+ to automatically associate all of the names that have the same
+ meaning with the registered name. The registry might also decide
+ whether the names that are associated with, or generated by, one
+ registration should, as a group or individually, go into the zone or
+ should be blocked from registration by different parties.
+
+ To date, the best-developed system for handling registration
+ restrictions for IDNs is the JET Guidelines for Chinese, Japanese,
+ and Korean [RFC3743], the so-called "CJK" languages. The JET
+ Guidelines are limited to the CJK languages and, in particular, to
+ their common script base. Those languages are also the best-known
+ and most widely-used examples of writing systems constructed on
+ "ideographic" or "pictographic" principles. This document explores
+ the principles behind the JET guidelines. It then examines some of
+ the issues that might arise in adapting them to alphabetic languages,
+ i.e., to languages whose characters primarily represent sounds rather
+ than meanings.
+
+ This document describes five things:
+
+ 1. The general background and considerations for non-ASCII scripts
+ in names.
+
+ 2. Suggested practices for describing character variants.
+
+ 3. A method for using a zone's character variants to determine which
+ names should be associated with a registration.
+
+ 4. A format for publishing a zone's table of character variants.
+ Such tables are referred to below as "language tables" or
+ simply "tables".
+
+ 5. A model algorithm for name registration given the presence of
+ language tables.
+
+1.2. The Nature and Status of these Recommendations
+
+ The document makes recommendations for consideration by registries
+ and, where relevant, by those who coordinate them, and by those who
+ use their services. None of the recommendations are intended to be
+ normative. Instead, the intent of the document is to illustrate a
+ framework for developing variations to meet the needs of particular
+ registries and their processing of particular languages. Of course,
+ if registries make similar decisions and utilize similar tools, costs
+
+
+
+
+Klensin Informational [Page 4]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+ and confusion may be reduced -- both between registries and for users
+ and registrars who have relationships with more than one domain.
+
+ Just as the JET Guidelines contain some suggestions that may not be
+ applicable to alphabetic scripts, some of the suggestions here,
+ especially the more specific ones, may be applicable to some scripts
+ and not others.
+
+1.3. Terminology
+
+1.3.1. Languages and Scripts
+
+ This document uses the term "language" in what may be, to many
+ readers, an odd way. Neither this specification, nor IDNA, nor the
+ DNS is directly concerned with natural language, but only with the
+ characters that make up a given label. In some respects, the term
+ "script", used in the character coding community for a collection of
+ characters, might be more appropriate. However, different subsets of
+ the same script may be used with different languages, and the same
+ language may be written using different characters (or even
+ completely different scripts) in different locations, so "script" is
+ not precisely correct either.
+
+ Long-standing confusion has also resulted from the fact that most
+ scripts are, informally at least, named after one of the languages
+ written in them. "Chinese" describes both a language and a
+ collection of characters that are also used in writing Japanese,
+ Korean, and, at least historically, some other languages. "Latin"
+ describes a language, the characters used to write that language,
+ and, often, characters used to write a number of contemporary
+ languages that are derived from or similar to those used to write the
+ Latin language. The script used to write the Arabic language is
+ called "Arabic", but it is also used (typically with some additions
+ or deletions) to write a number of other languages. Situations in
+ which a script has a clearly-defined name that is independent of the
+ name of a language are the exception, rather than the rule; examples
+ include Hangul, used to write Korean, Katakana and Hiragana, used to
+ write Japanese, and a few others. Some scholars have historically
+ used "Roman" or "Roman-derived" for the script in an attempt to
+ distinguish between a script and the Latin language.
+
+ The term "language" is therefore used in this document in the
+ informal sense of a written language and is defined, for this
+ purpose, by the characters used to write it, i.e., as a language-
+ specific subset of a script. In this context, a "language" is
+ defined by the combination of a code (see Section 1.4.1) and an
+ authority that has chosen to use that code and establish a
+ character-listing for it. Authorities are normally TLD (top-level
+
+
+
+Klensin Informational [Page 5]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+ domain) registries; see Section 7 and [IANA-language-registry].
+ However, it is expected that TLD registries will find appropriate
+ experts and that advice from language and script experts selected by
+ international neutral bodies will also become part of the
+ registration system. In addition, as discussed below in Section 7,
+ registries may conclude that the best interests of registrants,
+ stakeholders, and the Internet community would be served by
+ constructing "language tables" that mix scripts and characters in
+ ways that conform to no known language. Conventions should be
+ developed for such registrations that do not misleadingly reflect
+ specific language codes.
+
+1.3.2. Characters, Variants, Registrations, and Other Issues
+
+ 1. Characters in this document are specified by their Unicode
+ codepoints in U+xxxx format, by their official names, or both.
+
+ 2. The following terms are used in this document.
+
+ * String
+
+ A "string" is a sequence of one or more characters.
+
+ * Base Character
+
+ This document discusses characters that may have equivalent or
+ near-equivalent characters or strings. A "base character" is
+ a character that has zero or more equivalents. In the JET
+ Guidelines, base characters are referred to as "valid
+ characters". In a table with variants, as described in
+ Section 5, the base characters occupy the first column.
+ Normally (and always, if the recommendation of Section 6.3 is
+ adopted), the base characters will be the characters that
+ appear in registration requests from registrants; any other
+ character will invalidate the registration attempt.
+
+ * Native Script
+
+ Native script is the form in which the relevant string would
+ normally be represented. For example, it might use Lower
+ Slobbovian characters and the glyphs normally used to write
+ them. It would not be punycode as a presentation form.
+
+ * Variant Characters/Strings
+
+ The "variant(s)" are character(s) and/or string(s) that are
+ treated as equivalent to the base character. Note that these
+ might not be exactly equivalent characters; a particular
+
+
+
+Klensin Informational [Page 6]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+ original character may be a base character with a mapping to a
+ particular variant character, but that variant character may
+ not have a mapping to the original base character. Indeed,
+ the variant character may not appear in the base character
+ list, and hence may not be valid for use in a registration.
+ Usually, characters or strings to be designated as variants
+ are considered either equivalent or sufficiently similar (by
+ some registry-specific definition) that confusion between them
+ and the base character might occur.
+
+ * Base Registration
+
+ The "base registration" is the single name that the registrant
+ requested from the registry. The JET Guidelines use the term
+ "label string" for this concept.
+
+ * Registered, Activated
+
+ A label (or "name") is described as "registered" if it is
+ actually entered into a domain (i.e., into a zone file) by the
+ registry, so that it can be accessed and resolved using
+ standard DNS tools. The JET Guidelines describe a
+ "registered" label as "activated". However, some domains use
+ a slightly different registration logic in which a name can be
+ registered with the registrar (if one is involved) and with
+ the registry, but not actually entered into the zone file
+ until an additional activation or delegation step occurs.
+ This document does not make that distinction, but is
+ compatible with it.
+
+ As specified in the IDNA Standard, the name actually placed in
+ the zone file is always the internal ("punycode") form. There
+ is no provision for actually entering any other form of an IDN
+ into the DNS. It remains controversial, with different
+ registrars and registries having adopted different policies,
+ as to whether the registration, as submitted by the
+ registrant, is in the form of:
+
+ o the native-script name, either in UTF-8 or in some coding
+ specified by the registrar, or
+
+ o the internal-form ("punycode") name, or
+
+ o both forms of the name together, so that the registrar and
+ registry can verify the intended translation.
+
+
+
+
+
+
+Klensin Informational [Page 7]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+ If any of the approaches defined in this document is used, it
+ is almost certain to be necessary that the native-script form
+ of the requested string be available to the registry.
+
+ * Registration Bundle
+
+ A "registration bundle" is the set of all labels that come
+ from expanding the base characters for a single name into
+ their variants. The presence of a label in a registration
+ bundle does not imply that it is registered. In the JET
+ Guidelines, a registration bundle is called an "IDN Package".
+
+ * Reserved Label
+
+ A "reserved label" is a label in a registration bundle that is
+ not actually registered.
+
+ * Registry
+
+ A "registry" is the administrative authority for a DNS zone.
+ The registry is the body that enforces, and typically makes,
+ policies that are used in a particular zone in the DNS.
+
+ * Coded Character Set
+
+ A "Coded Character Set" (CCS) is a list of characters and the
+ code positions assigned to them. ASCII and Unicode are CCSs.
+
+ * Language
+
+ A "language" is something spoken by humans, independent of how
+ it is written or coded. ISO Standard 639 and IETF BCP 47 (RFC
+ 3066) [RFC3066] list and define codes for identifying
+ languages.
+
+ * Script
+
+ A "script" is a collection of characters (glyphs, independent
+ of coding) that are used together, typically to represent one
+ or more languages. Note that the script for one language may
+ heavily overlap the script for another. This does not imply
+ that they have identical scripts.
+
+ * Charset
+
+ "Charset" is an IETF-invented term to describe, more or less,
+ the combination of a script, a CCS that encodes that script,
+
+
+
+
+Klensin Informational [Page 8]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+ and rules for serializing encoded bytes that are stored on a
+ computer or transmitted over the network.
+
+ The last four of these definitions are redundant with, but
+ deliberately somewhat less precise than, the definitions in
+ [RFC3536], which also provides sources. The two sets of definitions
+ are intended to be consistent.
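
The distinction drawn above between the native-script form of a label
and its internal ("punycode") form can be illustrated with a minimal
Python sketch. It assumes Python's built-in "idna" codec, which
implements the ToASCII and ToUnicode operations of [RFC3490]. The
function names to_ace and to_native are hypothetical and are not
defined by this document.

```python
def to_ace(label: str) -> str:
    """Return the internal ("punycode") form of one native-script label."""
    # The built-in "idna" codec applies Nameprep and Punycode and adds
    # the "xn--" prefix when the label contains non-ASCII characters.
    return label.encode("idna").decode("ascii")


def to_native(ace_label: str) -> str:
    """Return the native-script form of one internal-form label."""
    return ace_label.encode("ascii").decode("idna")


# Example: a registrant submits the native-script label; the registry
# derives the form that would be placed in the zone file.
native = "b\u00fccher"          # "bucher" with U+00FC in second position
zone_form = to_ace(native)      # the only form that can enter the DNS
```

A registry that accepts both forms together could compare
to_ace(native) against the submitted internal form to verify the
intended translation.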
+
+1.3.3. Confusion, Fraud, and Cybersquatting
+
+ The term "confusion" is used very generically in this document to
+ cover the entire range from accidental user misperception of the
+ relationship between characters with some characteristic in common
+ (typically appearance, sound, or meaning) to cybersquatting and
+ (other) deliberately fraudulent attempts to exploit those
+ relationships based on the nature of the characters.
+
+1.4. A Review of the JET Guidelines
+
+1.4.1. JET Model
+
+ In the JET Guidelines model, a prospective registrant approaches the
+ registry for a zone (perhaps through an intermediate registrar) with
+ a candidate base registration -- a proposed name to be registered --
+ and a list of languages in which that name is to be interpreted. The
+ languages are defined according to the fairly high-resolution coding
+ of [RFC3066] or, if the registry considers it more appropriate, a
+ coding based on scripts such as those in [LTRU-Registry]. In this
+ way, Chinese as used on the mainland of the People's Republic of
+ China ("zh-cn") can, at registry option, consist of a somewhat
+ different list of characters (code points) and be represented by a
+ separate table compared to Chinese as used in Taiwan ("zh-tw").
+
+ The design of the JET Guidelines took one important constraint as a
+ basis: IDNA was treated as a firm standard. A procedure that
+ modified some portion of the IDNA functions, or was a variant on
+ them, was considered a violation of those standards and should not be
+ encouraged (or, probably, even permitted).
+
+ Each registry is expected to construct (or obtain) a table for each
+ language it considers relevant and appropriate. These tables list,
+ for the particular zone, the characters permitted for that language.
+ If a character does not appear as a base character (called a "valid
+ code point" in the JET document) in that table, then a name
+ containing it cannot be registered. If multiple languages are listed
+ for the registration, then the character must appear in the tables
+ for each of those languages.
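
The table check described above can be sketched in Python as follows.
This is an illustration under stated assumptions, not the JET
procedure itself: the table contents are toy data, and the names
LANGUAGE_TABLES, invalid_characters, and is_registrable are
hypothetical.

```python
import string

# Toy per-language tables (an assumption for illustration): each maps a
# language code to the set of base characters a zone permits for it.
_LDH = set(string.ascii_lowercase + string.digits + "-")
LANGUAGE_TABLES = {
    "en": set(_LDH),
    "fi": _LDH | {"\u00e5", "\u00e4", "\u00f6"},   # a-ring, a/o-diaeresis
    "sv": _LDH | {"\u00e5", "\u00e4", "\u00f6"},
}


def invalid_characters(label, languages):
    """Map each listed language to the characters its table lacks."""
    problems = {}
    for lang in languages:
        table = LANGUAGE_TABLES[lang]   # KeyError if the zone has no table
        missing = sorted({ch for ch in label if ch not in table})
        if missing:
            problems[lang] = missing
    return problems


def is_registrable(label, languages):
    """True only if every character is a base character in every table."""
    return not invalid_characters(label, languages)
```

With these toy tables, a label containing U+00E5 passes when only "fi"
and "sv" are listed but fails as soon as "en" is added, because the
character must appear in the table for each listed language.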
+
+
+
+
+Klensin Informational [Page 9]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+ The tables may also contain columns that specify alternate or variant
+ forms of the valid character. If these variants appear, they are
+ used to synthesize labels that are alternatives to the original one.
+ These labels are all reserved and can be registered or "activated"
+ (placed into the DNS) only by the action or request of the original
+ registrant; some (the "preferred variant labels") are typically
+ registered automatically. The zone is expected to establish
+ appropriate policies for situations in which the variant forms of one
+ label conflict with already-reserved or already-registered labels.
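
The synthesis of alternative labels from a variant column can be
sketched in Python as follows. This is a simplified illustration, not
the CreateBundle procedure of Section 6: it assumes single-character
variants only, the table is toy data, and the names TOY_VARIANTS,
registration_bundle, and split_bundle are hypothetical.

```python
from itertools import product

# Toy variant table (an assumption): base character -> list of variants.
# U+00F8 is listed only as a variant of U+00F6, not as a base character,
# so the mapping is deliberately one-directional.
TOY_VARIANTS = {
    "k": [],
    "l": [],
    "n": [],
    "\u00f6": ["\u00f8"],
}


def registration_bundle(base_registration, variants):
    """Return all labels formed by substituting variants for base chars.

    The base registration itself is always the first element.
    """
    choices = []
    for ch in base_registration:
        if ch not in variants:
            raise ValueError(f"{ch!r} is not a base character")
        choices.append([ch] + list(variants[ch]))
    return ["".join(combo) for combo in product(*choices)]


def split_bundle(bundle, activated):
    """Split a bundle into (registered, reserved) lists, keeping order."""
    registered = [label for label in bundle if label in activated]
    reserved = [label for label in bundle if label not in activated]
    return registered, reserved
```

Which members of the bundle are activated is a zone policy decision;
the sketch simply takes that decision as the "activated" argument.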
+
+ Most of these concepts were introduced because of concerns about
+ specific issues with CJK characters, beginning from the requirement
+ that the use of Simplified Chinese by some registrants and
+ Traditional Chinese by others not be permitted to create confusion or
+ opportunities for fraud. While they may be applicable to registry
+ tables constructed for alphabetic scripts, the translation should be
+ done with care, since many analogies are not exact.
+
+ Some of the important issues are discussed in the sections that
+ follow, especially Section 3. The JET model may be considered as a
+ variation on, and inspiration for, the model and method presented by
+ the rest of this document, although the JET model has been completely
+ developed only for CJK characters. Other languages or scripts,
+ especially alphabetic ones, may require other variations.
+
+1.4.2. Reserved Names and Label Packages
+
+ A basic assumption of the JET model is that, if the evolution of
+ specific characters or the properties of Unicode [Unicode]
+ [Unicode32] or IDNA cause two strings to appear similar enough to
+ cause confusion, then both should be registered by the same party or
+ one of them should become unregisterable. The definition of "appear
+ similar enough" will differ for different cultures and circumstances,
+ and hence DNS zones, but the principle is fairly general. In the JET
+ model, all of the variant strings are identified, some are registered
+ into the DNS automatically, and others are simply reserved and can be
+ registered, if at all, only by the original registrant. Other zones
+ might find other policies appropriate. For example, a zone might
+ conclude that having similar strings registered in the DNS was
+ undesirable. If so, the list of variant strings would be used only
+ to build a list of names that would be reserved and prohibited from
+ being registered.
+
+
+
+
+
+
+
+
+
+Klensin Informational [Page 10]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+1.5. Languages, Scripts, and Variants
+
+1.5.1. Languages versus Scripts
+
+ Conversations about scripts -- collections of characters associated
+ with particular languages -- are common when discussing character
+ sets and codes. However, the boundaries between one script and
+ another are not well-defined. The Unicode Standard ([Unicode],
+ [Unicode32]), for example, does not define script boundaries at all,
+ even though it is structured in terms of usually-related blocks of
+ characters. The issue is complicated by the common origin of most
+ alphabetic scripts in use in the world today (see, for example,
+ [Drucker] or the more scholarly [Daniels]).
+
+ Because of that history, certain characters (or, more precisely,
+ symbols representing characters) appear in the scripts associated
+ with multiple languages, sometimes with very different sounds or
+ meanings. This differs from the CJK situation in which, if a
+ character appears in more than one of the relevant languages, it will
+ usually have the same interpretation in each one. For the subset of
+ characters that actually are ideographs or pictographs, pronunciation
+ is expected to vary widely while meaning is preserved. At least in
+ part because of that similarity of meaning, it made sense in the JET
+ case to permit a registration to specify multiple languages, to
+ verify that the characters in the label string (the requested "Base
+ registration") were valid for each, and then to generate variant
+ labels using each language in turn. For many alphabetic languages,
+ it may be more sensible to prohibit the label string submitted for
+ registration from being associated with more than one language.
+ Indeed, "one label, one language" has been suggested as an important
+ barrier against common sources of "look-alike" confusion. For
+ example, the imposition of that rule in a zone would prevent the
+ insertion of a few Greek or Cyrillic characters with shapes identical
+ to the Latin ones into what was otherwise a Latin-based string. For
+ a particular table, the list of base characters may be thought of as
+ the script associated with the relevant language, with the
+ understanding that the table design does not prevent the same
+ character from appearing in the tables for multiple languages.
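
The kind of mixing that a "one label, one language" rule is meant to
prevent can be detected roughly with the Python sketch below. It is a
heuristic under stated assumptions: the standard library does not
expose the Unicode Script property, so the first word of each
character name is used as a stand-in for its script, and the function
names are hypothetical. A registry applying the rule described above
would check against its own per-language tables instead.

```python
import unicodedata


def letter_scripts(label):
    """Return the set of script-like name prefixes of the letters used."""
    scripts = set()
    for ch in label:
        # Only letters are considered; digits and the hyphen are shared.
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(name.split()[0])    # e.g. "LATIN", "CYRILLIC"
    return scripts


def mixes_scripts(label):
    """True if the letters of the label come from more than one script."""
    return len(letter_scripts(label)) > 1
```

For example, a label spelled with CYRILLIC SMALL LETTER A (U+0430) in
place of LATIN SMALL LETTER A looks unchanged in many fonts but is
reported as mixing two scripts.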
+
+ Indeed, this notion of a script that is local and specifically
+ identified can be turned around: so-called "language tables" are
+ associated with languages only insofar as thinking about the
+ character structure and word forms associated with a given language
+ helps to inform the construction of the table. A country like
+ Finland, for example, might select among:
+
+ o One table each for Finnish, Swedish, and English characters and
+ conventions, permitting a string to be registered in one, two, or
+
+
+
+Klensin Informational [Page 11]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+ all three languages. However, a three-language registration would
+ necessarily prohibit any characters that did not appear in all
+ three languages, since the label would make little sense
+ otherwise.
+
+ o One table each, but with a "one label, one language" rule for the
+ zone.
+
+ o A combined table based on the observation that all three writing
+ systems were based on Roman characters and that the possibilities
+ for confusion of interest to the registry would not be reduced by
+ "language" differentiation. This option raises an interesting
+ issue about language labeling as described in Section 1.4.1; see
+ the discussion in Section 7 below.
+
+ Regardless of what decisions were made about those languages and
+ scripts, they might have a separate table for registration of labels
+ containing Cyrillic characters. That table might contain some
+ Roman-derived characters (either as base characters or as variants),
+ just as some CJK tables do. See also Section 2, below.
+
+ Tables that present multiple languages, as described above, have
+ introduced confusion and discomfort among those who have failed to
+ understand these definitions. The consequence of these definitions
+ is that use of a language or script code in a registration is a
+ mnemonic, rather than a normative statement about the language or
+ script itself. When that confusion is likely to occur, it is
+ appropriate to simply use the registry identifier and a sequence
+ number to identify the registration.
+
+ As the JET Guidelines stress, no tables or systems of this type --
+ even if identified with a language as a means of defining or
+ describing the table -- can assure linguistic or even syntactic
+ correctness of labels with regard to that language. That assurance
+ may not be possible without human intervention or at least dictionary
+ lookups of complete proposed labels. It may not even be desirable to
+ attempt that level of correctness (see Section 2).
+
+ Of course, if any language-based tests or constraints, including "one
+ label, one language", are to be applied to limit the associated
+ sources of confusion, each zone must have a table for each language
+ in which it expects to accept registrations. The notion of a single
+ combined table for the zone is, in the general case, simply
+ unworkable. One could use a single table for the zone if the intent
+ were to impose only minimal restrictions, e.g., to force alphabetic
+ and numeric characters only, excluding symbols and punctuation. That
+ type of restriction might be useful in eliminating some problems,
+ such as those of unreadable labels, but it would be unlikely to be
+
+
+
+Klensin Informational [Page 12]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+ very helpful with, e.g., confusion caused by similar-looking
+ characters.
+
+1.5.2. Variant Selection
+
+ The area of character variants is rife with difficulties (and perhaps
+ opportunities). There is no universal agreement about which base
+ characters have variants, or if they do, what those variants are.
+ For example, in some regions of the world and in some languages,
+ LATIN SMALL LETTER O WITH DIAERESIS (U+00F6) and LATIN SMALL LETTER O
+ WITH STROKE (U+00F8) are variants of each other, while in other
+ regions, most people would think that LATIN SMALL LETTER O WITH
+ STROKE has no variants. In some cases, the list of variants is
+ difficult to enumerate. For example, it required several years for
+ the Chinese language community to create variant tables for use with
+ IDNA, and it remains, at the time of this writing, questionable how
+ widely those tables will be accepted among users of Chinese from
+ areas of the world other than those represented by the groups that
+ created them.
+
+ Thus, the first thing a registry should ask is whether or not any of
+ the characters that they want to permit to be used have variants. If
+ not, the registry's work is much simpler. This is not to say that a
+ registry should ignore variants if they exist: adding variants after
+ a registry has started to take registrations will be nearly as
+ difficult administratively as removing characters from the list of
+ acceptable characters. That is, if a registry later decides that two
+ characters are variants of each other, and there are actively-used
+ names in the zones that differ only on the new variants, the registry
+ might have to transfer ownership of one of the names to a different
+ owner, using some process that is certain to be controversial.
+
+ This situation is likely to be much easier for areas and zones that
+ use characters that previously did not occur in the DNS at all than
+ it will be for zones in which non-English labels have been registered
+ in ASCII characters for some time, presumably because the language of
+ interest uses additional "Latin" characters with some conventions
+ when only ASCII is available. In the former case, the rules and
+ conventions can be established before any registrations occur. In
+ the latter, there may be conflicts or opportunities for confusion
+ between existing registrations and now-permitted Roman-based
+ characters that do not appear in ASCII. For example, a domain name
+ might exist today that uses the name of a city in Canada spelled as
+ "Montreal". If the zone in which it occurs changes its rules to
+ permit the use of the character LATIN SMALL LETTER E WITH ACUTE
+ (U+00E9), does the name of the city, spelled (correctly) using that
+ character, conflict with the existing domain name registration?
+
+
+
+
+Klensin Informational [Page 13]
+
+RFC 4290 IDN Registration Practices December 2005
+
+
+ Certainly, if both are permitted, and permitted to be registered by
+ separate parties, there are many opportunities for confusion.
+
+ Of course, zone managers should inform all current registrants when
+ the registration policy for the zone changes. This includes the
+ times when IDN characters are first allowed in the zone, when
+ additional characters are permitted, and when any change occurs in
+ the character variant tables.
+
+ Many languages contain two variants for a character, one of which is
+ strongly preferred. A registry might restrict the base registration
+ to the preferred form, or it might allow any form for the base
+ registration. If the variant tables are created carefully, the
+ resulting bundles will be the same, but some registries will give
+ special status to the base registration such as its appearance in
+ "Whois" databases.
+
+1.6. Variants are not a Universal Remedy
+
+ It is worth stressing that there are many obvious opportunities for
+ confusion that variant systems, by virtue of being based on
+ processing of individual characters, cannot address. For example, if
+ a language can be written with more than one script, or
+ transliterations of the language into another script are common,
+ variant models are insufficient to prevent conflicting registration
+ of the related forms. Avoiding those types of problems would require
+ different mechanisms, perhaps based on phonetic or natural language
+ processing techniques for the entire proposed base registration.
+
+1.7. Reservations and Exclusions
+
+1.7.1. Sequence Exclusions for Valid Characters
+
+ The JET Guidelines are based on processing only single characters.
+ Pairs or longer sequences of characters can, at the option of the
+ registry, be handled through what the Guidelines describe as
+ "additional processing". These registry-specific string processing
+ procedures are specifically permitted by the guidelines to supplement
+ the per-character processing that generates the variants.
+
+ A different zone with different needs could use a modified version of
+ the table structure, or different types of additional processing, to
+ prohibit particular sequences of characters by marking them as
+ invalid, and to accept characters by marking them as valid. Other
+ modifications or extensions might be designed to prevent certain
+ letters from appearing at the beginning or end of labels. The use of
+ regular expressions in the "valid characters" column might be one way
+ to implement these types of restrictions, but there has been no
+ experience so far with that approach.
+
+ In particular, in some scripts derived from Roman characters,
+ sequences that have historically been typographically represented by
+ single "ligature" or "digraph" characters may also be represented by
+ the separate characters (e.g., "ae" for U+00E6 or "ij" for U+0133).
+ If it is desired to either prohibit these, or to treat them as
+ variants, some extensions to the single-character JET model may be
+ needed. Some careful thinking about IDNA (especially nameprep) may
+ also be needed, since some of these combinations are excluded there.
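+
+ As a rough illustration of the "additional processing" discussed in
+ this section, the sketch below rejects labels that contain
+ prohibited sequences or that begin or end with particular
+ characters. The rule set and the names SEQUENCE_RULES and
+ label_permitted are assumptions made for this example; they are not
+ taken from the JET Guidelines.
+
```python
import re

# Hypothetical rules for one zone; the patterns are illustrative only.
SEQUENCE_RULES = [
    re.compile("ae"),        # digraph spelling of U+00E6 is prohibited
    re.compile("ij"),        # digraph spelling of U+0133 is prohibited
    re.compile(r"^[0-9]"),   # no digit at the beginning of a label
    re.compile(r"-$"),       # no hyphen at the end of a label
]


def label_permitted(label: str) -> bool:
    """Return True if no prohibition rule matches anywhere in the label."""
    return not any(rule.search(label) for rule in SEQUENCE_RULES)
```
+
+ A zone that preferred to treat such sequences as variants, rather
+ than prohibit them, would need a different structure; this sketch
+ covers only the prohibition case.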
+
+1.7.2. Character Pairing Issues
+
+ Some character pairings -- the use of a character form (glyph) in one
+ language and a different form with the same properties in a related
+ one -- closely approximate the issues with mapping between
+ Traditional and Simplified Chinese, although the history is
+ different. For example, it might be useful to have "o" with a stroke
+ (U+00F8) as a variant for "o" with diaeresis above it (U+00F6) (and
+ the equivalent upper-case pair) in a Swedish table, and vice versa in
+ a Norwegian one, or to prohibit one of these characters entirely in
+ each table. In a German table, U+00F8 would presumably be
+ prohibited, while U+00F6 might have "oe" as a variant. Obviously, if
+ the relevant language of registration is unknown, this type of
+ variant matching cannot be applied in any sensible way.
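+
+ The language dependence described above might be sketched as one
+ table per language, as below. The two-letter language tags, the
+ TABLES structure, and the function name variants_for are
+ assumptions for illustration only; the table contents simply
+ restate the Swedish, Norwegian, and German examples from the text.
+
```python
# Hypothetical per-language tables: base character -> variant strings.
# A character absent from a table is not permitted for that language.
TABLES = {
    "sv": {"\u00F6": ["\u00F8"]},   # Swedish: o-diaeresis, variant o-stroke
    "no": {"\u00F8": ["\u00F6"]},   # Norwegian: o-stroke, variant o-diaeresis
    "de": {"\u00F6": ["oe"]},       # German: o-diaeresis, variant "oe"
}


def variants_for(language: str, char: str) -> list[str]:
    """Return the variants of char under the given language's table."""
    if language not in TABLES:
        # Without a known language, no variant matching can be applied.
        raise LookupError("language of registration is unknown")
    table = TABLES[language]
    if char not in table:
        raise ValueError("character not permitted in this table")
    return table[char]
```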
+
+1.8. The Registration Bundle
+
+1.8.1. Definitions and Structure
+
+ As one of its critical innovations, the JET model defines an "IDN
+ package", known in this document as a "registration bundle", which
+ consists of the primary registered string (which is used as the name
+ of the bundle), the information about the language table(s) used, the
+ variant labels for that string, and indications of which of those
+ labels are registered in the relevant zone file ("activated" in the
+ JET terminology). Registration bundles are also atomic -- one
+ cannot add or remove variant labels from one without unregistering the
+ entire package. A label exists in only one registration bundle at a
+ time; if a new label is registered that would generate a variant that
+ matches one that appears in an existing package, that variant simply
+ is not included in the second package. A subsequent de-registration
+ of the first package does not cause the variant to be added to the
+ second. While it might be possible to change this in other models,
+ the JET conclusion was that other options would be far too complex to
+ implement and operate and would cause many new types of name
+ conflicts.
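+
+ The bundle properties described above (atomicity, one bundle per
+ label, and no migration of variants after de-registration) can be
+ sketched as follows. The class and field names are hypothetical,
+ and the variant labels are assumed to have been generated
+ elsewhere and passed in.
+
```python
from dataclasses import dataclass


@dataclass(frozen=True)          # frozen: a bundle is never edited in place
class Bundle:
    base: str                    # primary registered string, names the bundle
    language: str                # language table used
    variants: frozenset          # variant labels reserved in this bundle
    activated: frozenset         # labels actually placed in the zone file


class Registry:
    def __init__(self):
        self.bundles = {}        # base label -> Bundle

    def _taken(self):
        """All labels (bases and variants) held by existing bundles."""
        labels = set()
        for b in self.bundles.values():
            labels |= {b.base} | b.variants
        return labels

    def register(self, base, language, generated, activate=()):
        taken = self._taken()
        if base in taken:
            raise ValueError("label already belongs to a bundle")
        # Variants that already sit in another bundle are simply left out.
        variants = frozenset(generated) - taken - {base}
        activated = (frozenset(activate) & variants) | {base}
        self.bundles[base] = Bundle(base, language, variants, activated)
        return self.bundles[base]

    def unregister(self, base):
        # Removing a bundle does not hand its variants to any other bundle.
        del self.bundles[base]
```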
+
+1.8.2. Application of the Registration Bundle
+
+ A registry has three options for handling the case where the
+ registration bundle contains more than one label. The policy options
+ are:
+
+ o Register and resolve all labels in the zone, making the zone
+ information identical to that of the registered labels. This
+ option will allow end users to find names with variants more
+ easily, but will result in larger zone files. For some language
+ tables, the zone file could become so large that it could
+ negatively affect the ability of the registry to perform name
+ resolution. If the base registration contains several characters
+ that have equivalents, the owner could end up having to take care
+ of large numbers of zones. For instance, if DIGIT ONE is a
+ variant of LATIN SMALL LETTER L, the owner of the domain name all-
+ lollypops.example.com will have to manage 32 zones. If the intent
+ is to keep the contents of those zones identical, the owner may
+ then face a significant administrative problem. If other concerns
+ dictate short times to live and absolute consistency of DNS
+ responses, the challenges may be nearly impossible.
+
+ o Block all labels other than the registered label so they cannot be
+ registered in the future. This option does not increase the size
+ of the zone file and provides maximum safety against false
+ positives, but it may cause end users to not be able to find names
+ with variants that they would expect. If the base registration
+ contains characters that have equivalents, Internet users who do
+ not know what base characters were used in the registration will
+ not know what character to type in to get a DNS response. For
+ instance, if DIGIT ONE is a variant of LATIN SMALL LETTER L, and
+ LATIN SMALL LETTER L is a variant of DIGIT ONE, the user who sees
+ "pale.example.com" will not know whether to type a "1" or a "l"
+ after the "pa" in the first label.
+
+ o Resolve some labels and block some other labels. This option is
+ likely to cause the most confusion with users because including
+ some variants will cause a name to be found, but using other
+ variants will cause the name to be not found. For example, even
+ if people understood that DIGIT ONE and LATIN SMALL LETTER L were
+ variants, a typical DNS user wouldn't know which character to type
+ because they wouldn't know whether this pair were used to register
+ or block the labels. However, this option can be used to balance
+ the desires of the name owner (that every possible attempt to
+ enter their name will work) with the desires of the zone
+ administrator (to make the zone more manageable and possibly to be
+ compensated for greater amounts of work needed for a single
+ registration). For many circumstances, it may be the most
+ attractive option.
+
+ In all cases, at least the registered label should appear in the
+ zone. It would be almost impossible to describe to name owners why
+ the name that they asked for is not in the zone, but some other name
+ that they now control is. By implication, if the requested label is
+ already registered, the entire registration request must be rejected.
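+
+ The zone count mentioned under the first option follows from
+ multiplying, over every character in the label, one plus the number
+ of variants of that character. A minimal sketch, assuming a
+ hypothetical table in which "l" has the single variant "1":
+
```python
from math import prod


def count_bundle_labels(label: str, variants: dict) -> int:
    """Number of labels generated when every character varies independently."""
    return prod(1 + len(variants.get(ch, ())) for ch in label)


# Assumed table for the example in the text: "l" has the single variant "1".
ONE_FOR_L = {"l": ["1"]}
```
+
+ The label "all-lollypops" contains five instances of "l", so
+ count_bundle_labels("all-lollypops", ONE_FOR_L) gives 2**5 = 32,
+ which is the figure quoted above.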
+
+2. Some Implications of This Approach
+
+ Historically, DNS labels were considered to be arbitrary identifier
+ strings, without any inherent meaning. Even in ASCII, there was no
+ requirement that labels form words. Labels that could not possibly
+ represent words in any Romance or Germanic language (the languages
+ that have been written in "Latin" scripts since medieval times or
+ earlier) have actually been quite common. In general, in those
+ languages, words contain at least one vowel and do not have embedded
+ numbers. As a result, a string such as "bc345df" cannot possibly be
+ a "word" in these languages. More generally, the more one moves
+ toward "language"-based registry restrictions, the less it is going
+ to be possible to construct labels out of fanciful strings. While
+ fanciful strings are terrible candidates for "words", they may make
+ very good identifiers. To take a trivial example using only ASCII
+ characters, "rtr32w", "rtr32x", and "rtr32z" might be very good DNS
+ labels for a particular zone and application. However, given the
+ embedded digits and lack of vowels, they, like the "bc345df" example
+ given above, would fail even the most superficial of tests for valid
+ English (or German or French (etc.)) word forms.
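+
+ To make the point concrete, the sketch below implements about the
+ most superficial "word" test imaginable: at least one vowel and no
+ digits. The vowel list and the function name are assumptions for
+ illustration; no registry is known to use exactly this rule.
+
```python
VOWELS = set("aeiou")   # assumption: a crude ASCII-only vowel list


def passes_superficial_word_test(label: str) -> bool:
    """True if the label has at least one vowel and no digits."""
    has_vowel = any(ch in VOWELS for ch in label.lower())
    has_digit = any(ch.isdigit() for ch in label)
    return has_vowel and not has_digit
```
+
+ Every one of the identifier-style labels mentioned above fails this
+ test, while an ordinary word such as "montreal" passes.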
+
+ It is worth noting that several DNS experts have suggested that a
+ number of problems could be solved by prohibiting meaningful names in
+ labels, requiring instead that the labels be random or nonsense
+ strings. If methods similar to those discussed in this document were
+ used to force identifiers to be closer to meaningful words in real
+ languages, the result would be directly contradictory to those
+ "random name" approaches.
+
+ Interestingly, if one were trying to develop an "only words" system,
+ a rather different -- but very restrictive -- model could be
+ developed using lookups in a dictionary for the relevant language and
+ a listing of valid business names for the relevant area. If a string
+ did not appear in either, it would not be permitted to be registered.
+ Models that require a prior national business listing (or
+ registration) that is identical to the proposed domain name label
+ have historically been used to restrict registrations in some
+ country-code top level domains, so this is not a new idea. On the
+ other hand, if look-alike characters are a concern, even that type of
+ rule (or restriction) would still not avoid the need to consider
+ character variants.
+
+ Consequently, registries applying the principles outlined in this
+ document should be careful not to apply more severe restrictions than
+ are reasonable and appropriate while, at the same time, being aware
+ of how difficult it usually is to add restrictions at a later time.
+
+3. Possible Modifications of the JET Model
+
+ The JET model was designed for CJK characters. The discussion above
+ implies that some extensions to it may be needed to handle the
+ characteristics of various alphabetic scripts and the decisions that
+ might be made about them in different zones. Those extensions might
+ include facilities to process:
+
+ o Two-character (or more) sequences, such as ligatures and
+ typographic spelling conventions, as variants.
+
+ o Regular expressions or some other mechanism for dealing with
+ string positions of characters (e.g., characters that must, or
+ must not, appear at the beginning or end of strings).
+
+ o Delimiter breaks to permit multiple languages to be used,
+ separately, within the same label. E.g., is it possible to define
+ a label as consisting of two or more components, each in a
+ different language, with some particular delimiter to define the
+ boundaries of the components?
+
+4. Conclusions and Recommendations About the General Approach
+
+ After examining the implications of the potential use of the full
+ range of characters permitted by IDNA in DNS labels, multiple groups,
+ including IESG [IESG-IDN] and ICANN [ICANN-IDN] [ICANN-IDN2], have
+ concluded that some restrictions are needed to prevent many forms of
+ user confusion about the actual structure of a name or the word,
+ phrase, or term that it appears to spell out. The best way to
+ approach such restrictions appears to be to draw from the language and
+ culture of the community of registrants and users in the relevant
+ zone: if particular characters are likely to be surprising or
+ unintelligible to both of those groups, it is probably wise to not
+ permit them to be used in registrations. Registration restrictions
+ can be carried much further than restricting permitted characters to
+ a selected Unicode subset. The idea of a reserved "bundle" of
+ related labels permits probably-confusing combinations or sets of
+ characters to be bound together, under the control of a single
+ registrant. While that registrant might still use the package in a
+ way that confused his or her own users (the approach outlined here
+ will not prevent either ill-thought-out ideas or stupidity), the
+ possibility of turning potential confusion into a hostile attack
+ would be considerably reduced.
+
+ At the same time, excessive restrictions may make DNS identifiers
+ less useful for their original purpose: identifying particular hosts
+ and similar resources on the network in an orderly way. Registries
+ creating rules and policies about what can be registered in
+ particular zones -- whether those are based on the JET Guidelines or
+ the suggestions in this document -- should balance the need for
+ restrictions against the need for flexibility in constructing
+ identifiers.
+
+ The discussion above provides many options that could be selected,
+ defined, and applied in different ways in different registries
+ (zones). Registrars and registrants would almost certainly prefer
+ systems in which they can predict, at least to a first order
+ approximation, the implications of a particular potential
+ registration. Predictability of that sort probably requires more
+ standards, and less flexibility, than the model itself might suggest.
+
+5. A Model Table Format
+
+ The format of the table is meant to be machine-readable but not
+ human-readable. It is fairly trivial to convert the table into one
+ that can be read by people.
+
+ Each character in the table is given in the "U+" notation for Unicode
+ characters. The lines of the table are terminated with either a
+ carriage return character (ASCII 0x0D), a linefeed character (ASCII
+ 0x0A), or a sequence of carriage return followed by linefeed (ASCII
+ 0x0D 0x0A). The order of the lines in the table may or may not
+ matter, depending on how the table is constructed.
+
+ Comment lines in the table begin with a "#" character (ASCII
+ 0x23).
+
+ Each non-comment line in the table starts with the character that is
+ allowed in the registry and expected to be used in registrations,
+ which is also called the "base character". If the base character has
+ any variants, the base character is followed by a vertical bar
+ character ("|", ASCII 0x7C) and the variant string. If the base
+ character has more than one variant, the variants are separated by a
+ colon (":", ASCII 0x3A). Strings are given with a hyphen ("-", ASCII
+ 0x2D) between each character. Comments begin with a "#" (ASCII
+ 0x23) and may be preceded by spaces (" ", ASCII 0x20).
+
+ The following is an example of how a table might look. The entries
+ in this table are purposely silly and should not be used by any
+ registry as the basis for choosing variants. For the example, assume
+ that the registry:
+
+ o allows the FOR ALL character (U+2200) with no variants
+
+ o allows the COMPLEMENT character (U+2201) which has a single
+ variant of LATIN CAPITAL LETTER C (U+0043)
+
+ o allows the PROPORTION character (U+2237) which has one variant
+ which is the string COLON (U+003A) COLON (U+003A)
+
+ o allows the PARTIAL DIFFERENTIAL character (U+2202) which has two
+ variants: LATIN SMALL LETTER D (U+0064) and GREEK SMALL LETTER
+ DELTA (U+03B4)
+
+ The table contents (after any required header information, see
+ [IANA-language-registry] and the discussion in Section 7 below) would
+ look like:
+
+ # An example of a table
+ U+2200
+ U+2201|U+0043
+ U+2237|U+003A-U+003A # Note that the variant is a string
+ U+2202|U+0064:U+03B4 # Two variants for the same character
+
+ Implementers of table processors should remember that there are tens
+ of thousands of characters whose codepoints are greater than 0xFFFF.
+ Thus, any program that assumes that each character in the table is
+ represented in exactly six octets ("U", "+", and four octets
+ representing the character value) will fail with tables that use
+ characters whose value is greater than 0xFFFF.
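+
+ A table in this format could be read along the following lines.
+ This is a sketch rather than a reference parser: the function
+ names and the choice of four to six hexadecimal digits after "U+"
+ are assumptions, and header information is presumed to have been
+ removed already.
+
```python
import re

# Assumption: "U+" followed by 4 to 6 hex digits, so that code points
# above 0xFFFF are accepted.
CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def parse_code_point(token: str) -> str:
    match = CODE_POINT.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"bad code point: {token!r}")
    return chr(int(match.group(1), 16))


def parse_table(text: str) -> dict:
    """Map each base character to its list of variant strings."""
    table = {}
    # Lines end with CR, LF, or CR LF.
    for line in re.split(r"\r\n|\r|\n", text):
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        base, _, rest = line.partition("|")
        variants = []
        if rest:
            for string in rest.split(":"):     # ":" separates variants
                variants.append(                # "-" joins a string
                    "".join(parse_code_point(t) for t in string.split("-")))
        table[parse_code_point(base)] = variants
    return table
```
+
+ Applied to the example table above, the entry for U+2237 comes back
+ with the single two-character variant "::", and the entry for
+ U+2202 with the two variants "d" and GREEK SMALL LETTER DELTA.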
+
+6. A Model Label Registration Procedure: "CreateBundle"
+
+ This procedure has three inputs:
+
+ 1. the proposed base registration,
+
+ 2. the language (or script, if the registration is script-based, but
+ "language" is used for convenience below) for the proposed base
+ registration, and
+
+ 3. the processing table associated with that language.
+
+ The output of the process is either failure (the base registration
+ cannot be registered at all), or a registration bundle that contains
+ one or more labels (always including the base registration). As
+ described earlier, the registration bundle should be stored with its
+ date of creation so that issues with overlapping elements between
+ bundles can later be resolved on a first-come, first-served basis.
+
+ There are two steps to processing the registration:
+
+ 1. Check whether the proposed base registration exists in any
+ bundle. If it does, stop immediately with a failure.
+
+ 2. Process the base registration with the mechanism described as
+ "CreateBundle" in Section 6.1, below.
+
+ Note that the process must be executed only once. The process must
+ not be performed on any output of the process, only on the proposed
+ base registration.
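+
+ The two-step procedure might be wrapped as shown below. The
+ storage layout (a list of dictionaries) and the parameter names
+ are assumptions; the mechanism of Section 6.1 is passed in as a
+ callable so that the sketch stays independent of any particular
+ implementation of it.
+
```python
from datetime import datetime, timezone


def register(base, existing_bundles, create_bundle):
    """Run the two-step procedure once, on the proposed base registration.

    existing_bundles: list of dicts with "labels" and "created" keys
    create_bundle: callable implementing Section 6.1; returns a list of
                   labels, or None on failure
    """
    # Step 1: fail if the base registration appears in any bundle.
    if any(base in bundle["labels"] for bundle in existing_bundles):
        return None
    # Step 2: build the bundle; its output is never fed back in.
    labels = create_bundle(base)
    if labels is None:
        return None
    # Store the creation date so later overlaps can be settled
    # on a first-come, first-served basis.
    bundle = {"labels": labels, "created": datetime.now(timezone.utc)}
    existing_bundles.append(bundle)
    return bundle
```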
+
+6.1. Description of the CreateBundle Mechanism
+
+ The CreateBundle mechanism determines whether a registration bundle
+ can be created and, if so, populates that bundle with valid labels.
+
+ During the processing, a "temporary bundle" contains partial labels,
+ that is, labels that are being built and are not complete labels.
+ The partial labels in the temporary bundle consist of strings.
+
+ The steps are:
+
+ 1. Split the base registration into individual characters, called
+ "candidate characters". Compare every candidate character
+ against the base characters in the table. If any candidate
+ character does not exist in the set of base characters, the
+ system must stop and not register any names (that is, it must not
+ register either the base registration or any labels that would
+ have come from character variants).
+
+ 2. Perform the steps in IDNA's ToASCII sequence for the base
+ registration. If ToASCII fails for the base registration, the
+ system must stop and not register any label (that is, it must not
+ register either the base registration or labels that might have
+ been created from variants of characters contained in it). If
+ ToASCII succeeds, place the base registration into the
+ registration bundle.
+
+ 3. For every candidate character in the base registration, do the
+ following:
+
+ o Create the set of characters that consists of the candidate
+ character and any variants.
+
+ o For each character in the set from the previous step,
+ duplicate the temporary bundle that resulted from the previous
+ candidate character, and add the new character to the end of
+ each partial label.
+
+ 4. The temporary bundle now contains zero or more labels that
+ consist of Unicode characters. For every label in the temporary
+ bundle, do the following:
+
+ o Process the label with ToASCII to see if ToASCII succeeds. If
+ it does, add the label to the registration bundle. Otherwise,
+ do not process this label from the temporary bundle any
+ further; it will not go into the registration bundle.
+
+ The result of the processing outlined above is the registration
+ bundle with the base registration and possibly other labels.
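+
+ The four steps above can be sketched as follows. The table is
+ assumed to be a mapping from each base character to a list of
+ variant strings, and ToASCII is taken from Python's standard
+ "encodings.idna" module, which implements the IDNA of RFC 3490;
+ the function names and the None return on failure are choices
+ made for this example only.
+
```python
import encodings.idna


def to_ascii_ok(label: str) -> bool:
    """True if IDNA ToASCII succeeds for the label."""
    try:
        encodings.idna.ToASCII(label)
        return True
    except UnicodeError:
        return False


def create_bundle(base: str, table: dict):
    """Return the registration bundle as a list, or None on failure."""
    # Step 1: every candidate character must be a base character.
    if any(ch not in table for ch in base):
        return None
    # Step 2: the base registration itself must pass ToASCII.
    if not to_ascii_ok(base):
        return None
    bundle = [base]
    # Step 3: extend every partial label with each character choice.
    temporary = [""]
    for ch in base:
        choices = [ch] + list(table[ch])
        temporary = [partial + c for partial in temporary for c in choices]
    # Step 4: keep only the labels for which ToASCII succeeds.
    for label in temporary:
        if label not in bundle and to_ascii_ok(label):
            bundle.append(label)
    return bundle
```
+
+ With a hypothetical table in which "l" has the variant "1" and
+ "p", "a", and "e" have none, create_bundle("pale", table) yields
+ the base registration "pale" followed by the variant "pa1e".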
+
+6.2. The "no-variants" Case
+
+ It is clear that, for many scripts, registries will choose to create
+ tables without variants, either because variants are clearly not
+ necessary or because they are determined to cause more confusion and
+ overhead than is justified by the circumstances. For those
+ situations the table model of Section 5 becomes a trivial listing of
+ base characters and only the first two steps of CreateBundle
+ (verifying that all candidate characters are in the base ("valid")
+ character list and verifying that the resulting characters will
+ succeed in the ToASCII operation) are applicable. Even the second of
+ those steps becomes pro forma if the advice in the next subsection is
+ followed.
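+
+ For a table without variants, the whole procedure reduces to
+ something like the sketch below. The function name is
+ hypothetical, and ToASCII is again assumed to come from Python's
+ standard "encodings.idna" module.
+
```python
import encodings.idna


def acceptable_without_variants(label: str, base_characters: set) -> bool:
    # Step 1: every candidate character must be a listed base character.
    if any(ch not in base_characters for ch in label):
        return False
    # Step 2: the label must survive IDNA ToASCII.
    try:
        encodings.idna.ToASCII(label)
    except UnicodeError:
        return False
    return True
```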
+
+6.3. CreateBundle and Nameprep Mapping
+
+ One of the functions of Nameprep, and IDNA more generally, is to map
+ a large number of Unicode characters (code points) into a smaller
+ number to avoid a different but overlapping set of confusion
+ problems. For example, when a non-ASCII script makes distinctions
+ between "upper case" and "lower case", nameprep maps the upper case
+ characters to the lower case ones in order to simulate the DNS
+ protocol's rule that ASCII characters are interpreted in a
+ case-insensitive way. Unicode also contains many code points that
+ are typographic variants of each other (e.g., forms with different
+ widths and code points that designate font variations for
+ mathematical uses); the Unicode standard explicitly identifies them
+ that way, and nameprep maps them onto base characters.
+
+ While having these mapping functions available during lookup may be
+ quite helpful to users who type equivalent forms, registrations are
+ probably best performed in terms of the IDNA base characters only,
+ i.e., those characters that nameprep will not change. This will have
+ two advantages.
+
+ o Registrants will never find themselves in the rather confusing
+ position of having submitted one string for registration and
+ finding a different string in the registry database (which could
+ otherwise occur even if the relevant language table does not
+ contain variants).
+
+ o Those who are interested in what characters are permitted by a
+ given registry will only need to examine the relevant tables,
+ rather than simulating the IDNA algorithm to determine the result
+ of processing particular characters.
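+
+ A registry following this advice could test each proposed label
+ for stability under nameprep before accepting it. The sketch
+ below assumes Python's standard "encodings.idna" module, whose
+ nameprep function implements the RFC 3491 profile; the function
+ name is hypothetical.
+
```python
import encodings.idna


def is_nameprep_stable(label: str) -> bool:
    """True if nameprep accepts the label and leaves it unchanged."""
    try:
        return encodings.idna.nameprep(label) == label
    except UnicodeError:
        # Labels containing prohibited characters are not stable either.
        return False
```
+
+ Under this test "example" is stable, while "Example" is not,
+ because nameprep maps the upper-case letter to lower case.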
+
+7. IANA Considerations
+
+ Under ICANN (not IETF) direction and management, the IANA has created
+ a registry for language variant tables. The authoritative
+ documentation for that registry is in [IANA-language-registry].
+ Since the registry exists and is being managed under ICANN direction,
+ the material that follows is a review of the theory of this registry,
+ rather than new instructions for IANA.
+
+ As described above and suggested in the JET Guidelines, the
+ registration rules generally require only that:
+
+ o The application be submitted or endorsed by a TLD registry, to
+ ensure that someone cares about the particular table.
+
+ o The table be identified by the following:
+
+ * the name -- usually the top-level domain name -- of the
+ submitting or endorsing registry;
+
+ * one of: a language designation (consistent with [RFC3066] or
+ with some other system approved by the IANA), a script
+ designation, a combination of the two, or a sequence number
+ acceptable to IANA for this purpose;
+
+ * a version number; and
+
+ * a date.
+
+ o Characters listed in the table be identified by Unicode code
+ points, as discussed above.
+
+ o The table format may correspond to that identified in [RFC3743],
+ or in Section 5 above, or may be some variation on those themes
+ appropriate to the local processing model (with or without
+ variants).
+
+ This raises some issues that will need to be worked out as
+ experiences accumulate. For example, more standardization of table
+ formats would be desirable to allow processing by the same computer
+ tools for different registries and languages. But standardization
+ seems premature at this time due to differences in languages,
+ processing, and requirements and lack of experience with them.
+ Similarly, if a registry concludes that it should use a table that
+ contains characters from several scripts, it is not clear how such a
+ table should be designated. Identifying it with a language code
+ (either according to [RFC3066] or an independent code registered with
+ IANA) is likely to just introduce more confusion, especially given
+ other Internet uses of the language codes. It appears that some
+ other convention will be needed for those cases, and it should be
+ developed (if it has not already been established by the time this
+ document is published).
+
+8. Internationalization Considerations
+
+ This document specifies a model mechanism for registering
+ Internationalized Domain Names (IDNs) that can be used to reduce
+ confusion among similar-appearing names. The proposal is designed to
+ facilitate internationalization while permitting a balance between
+ internationalization concerns and concerns about keeping the Internet
+ global and domain name system references unique in the perception of
+ the user as well as in practice.
+
+9. Security Considerations
+
+ Registration of labels in the DNS that contain essentially
+ unrestricted sequences of arbitrary Unicode characters may introduce
+ opportunities for either attacks or simple confusion. Some of these
+ risks, such as confusion about which character (of several that look
+ alike) is actually intended, may be associated with the presentation
+ form of DNS names. Others may be linked to databases associated with
+ the DNS, e.g., with the difficulty of finding an entry in a "Whois
+ file" when it is not clear how to enter or to search for the
+ characters that make up a name. This document discusses a family of
+ restrictions on the names that can be registered. Restrictions of
+ the type described can be imposed by a DNS zone ("registry"). The
+ document also describes some possible tools for implementing such
+ restrictions.
+
+ While the increased number and types of characters made available by
+ Unicode considerably increases the scale of the potential problems,
+ the problems addressed by this document are not new. No plausible
+ set of restrictions will eliminate all problems and sources of
+ confusion: for example, it has often been pointed out that, even in
+ ASCII, the characters digit-one ("1") and lower case L ("l") can
+ easily be confused in some display fonts. But, to the degree to
+ which security may be aided by sensible risk reduction, these
+ techniques may be helpful.
+
+10. Acknowledgements
+
+ Discussions in the process of developing the JET Guidelines were
+ vital in developing this document and all of the JET participants are
+ consequently acknowledged. Attempts to explain some of the issues
+ uncovered there to, and feedback from, Vint Cerf, Wendy Rickard, and
+ members of the ICANN IDN Committee were also helpful in the thinking
+ leading up to this document.
+
+ An effort by Paul Hoffman to create a generic specification for
+ registration restrictions of this type helped to inspire this
+ document, which takes a somewhat different, more language-oriented,
+ approach than his initial draft. While the initial version of that
+ draft indicated that multiple languages (or multiple language tables)
+ for a single zone were infeasible, more recent versions [Hoffman-reg]
+ shifted to inclusion of language-based approaches. The current
+ version of this document incorporates considerable text, and even
+ more ideas, from those drafts, with Paul Hoffman's generous
+ permission.
+
+ Feedback was provided by several registry operators (of both country
+ code and generic TLDs), including Edmon Chung and Ram Mohan of
+ Afilias, and by ICANN and IANA staff, notably Tina Dam and Theresa
+ Swinehart. This feedback about issues encountered in registering
+ tables and designing IDN implementations resulted in the addition of
+ significant clarifying text to the current version of the document.
+
+ The opinions expressed here are the sole responsibility of the
+ author. Some of those whose ideas and comments are reflected in this
+ document may disagree with the conclusions the author has drawn from
+ them. The first draft version of this document was posted in June
+ 2003.
+
+11. Informative References
+
+ [Daniels] Daniels, P. and W. Bright, "The World's Writing
+ Systems", Oxford University Press, 1996.
+
+ [Drucker] Drucker, J., "The Alphabetic Labyrinth: The Letters in
+ History and Imagination", 1995.
+
+ [Hoffman-reg] Hoffman, P., "A Method for Registering
+ Internationalized Domain Names", Work in Progress,
+ October 2003.
+
+ [IESG-IDN] Internet Engineering Steering Group, IETF, "IESG
+ Statement on IDN", IESG Statement available from
+ http://www.ietf.org/IESG/STATEMENTS/IDNstatement.txt,
+ February 2003.
+
+ [ICANN-IDN] Internet Corporation for Assigned Names and Numbers
+ (ICANN), "Guidelines for the Implementation of
+ Internationalized Domain Names, Version 1.0", June
+ 2003.
+
+ [ICANN-IDN2] Internet Corporation for Assigned Names and Numbers
+ (ICANN), "Guidelines for the Implementation of
+ Internationalized Domain Names, Version 2.0", September
+ 2005.
+
+ [IANA-language-registry]
+ Internet Assigned Numbers Authority (IANA), "IDN
+ Language Table Registry", April 2004.
+
+ [LTRU-Registry]
+ Phillips, A., Ed. and M. Davis, Ed., "Tags for
+ Identifying Languages", Work in Progress, October 2005.
+
+ [RFC952] Harrenstien, K., Stahl, M., and E. Feinler, "DoD
+ Internet host table specification", RFC 952, October
+ 1985.
+
+ [RFC1035] Mockapetris, P., "Domain names - implementation and
+ specification", STD 13, RFC 1035, November 1987.
+
+ [RFC3066] Alvestrand, H., "Tags for the Identification of
+ Languages", BCP 47, RFC 3066, January 2001.
+
+ [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
+ "Internationalizing Domain Names in Applications
+ (IDNA)", RFC 3490, March 2003.
+
+ [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
+ Profile for Internationalized Domain Names (IDN)", RFC
+ 3491, March 2003.
+
+ [RFC3492] Costello, A., "Punycode: A Bootstring encoding of
+ Unicode for Internationalized Domain Names in
+ Applications (IDNA)", RFC 3492, March 2003.
+
+ [RFC3536] Hoffman, P., "Terminology Used in Internationalization
+ in the IETF", RFC 3536, May 2003.
+
+ [RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
+ Engineering Team (JET) Guidelines for Internationalized
+ Domain Names (IDN) Registration and Administration for
+ Chinese, Japanese, and Korean", RFC 3743, April 2004.
+
+ [Unicode] The Unicode Consortium, "The Unicode Standard --
+ Version 3.0", January 2000.
+
+ [Unicode32] The Unicode Consortium, "Unicode Standard Annex #28:
+ Unicode 3.2", March 2002.
+
+Author's Address
+
+ John C Klensin
+ 1770 Massachusetts Ave, #322
+ Cambridge, MA 02140
+ USA
+
+ Phone: +1 617 491 5735
+ EMail: john-ietf@jck.com
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2005).
+
+ This document is subject to the rights, licenses and restrictions
+ contained in BCP 78 and at www.rfc-editor.org/copyright.html, and
+ except as set forth therein, the authors retain all their rights.
+
+ This document and the information contained herein are provided on an
+ "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+ OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+ ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+ INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+ INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+ WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+ The IETF takes no position regarding the validity or scope of any
+ Intellectual Property Rights or other rights that might be claimed to
+ pertain to the implementation or use of the technology described in
+ this document or the extent to which any license under such rights
+ might or might not be available; nor does it represent that it has
+ made any independent effort to identify any such rights. Information
+ on the procedures with respect to rights in RFC documents can be
+ found in BCP 78 and BCP 79.
+
+ Copies of IPR disclosures made to the IETF Secretariat and any
+ assurances of licenses to be made available, or the result of an
+ attempt made to obtain a general license or permission for the use of
+ such proprietary rights by implementers or users of this
+ specification can be obtained from the IETF on-line IPR repository at
+ http://www.ietf.org/ipr.
+
+ The IETF invites any interested party to bring to its attention any
+ copyrights, patents or patent applications, or other proprietary
+ rights that may cover technology that may be required to implement
+ this standard. Please address the information to the IETF at
+ ietf-ipr@ietf.org.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is currently provided by the
+ Internet Society.
+