doc: Add RFC documents

author: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer: Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit: 4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
tree: e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc4690.txt
parent: ea76e11061bda059ae9f9ad130a9895cc85607db (diff)
1 files changed, 2075 insertions, 0 deletions
diff --git a/doc/rfc/rfc4690.txt b/doc/rfc/rfc4690.txt
new file mode 100644
index 0000000..233253c
--- /dev/null
+++ b/doc/rfc/rfc4690.txt
@@ -0,0 +1,2075 @@
+
+
+
+
+
+
+Network Working Group                                         J. Klensin
+Request for Comments: 4690                                  P. Faltstrom
+Category: Informational                                    Cisco Systems
+                                                                 C. Karp
+                                       Swedish Museum of Natural History
+                                                                     IAB
+                                                          September 2006
+
+
+  Review and Recommendations for Internationalized Domain Names (IDNs)
+
+Status of This Memo
+
+   This memo provides information for the Internet community.  It does
+   not specify an Internet standard of any kind.  Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2006).
+
+Abstract
+
+   This note describes issues raised by the deployment and use of
+   Internationalized Domain Names.  It describes problems both at the
+   time of registration and for use of those names in the DNS.  It
+   recommends that the IETF update the RFCs relating to IDNs, offers a
+   framework to be followed in doing so, and summarizes and identifies
+   some work that is required outside the IETF.  In
+   particular, it proposes that some changes be investigated for the
+   Internationalizing Domain Names in Applications (IDNA) standard and
+   its supporting tables, based on experience gained since those
+   standards were completed.
+
+Table of Contents
+
+   1. Introduction ....................................................3
+      1.1. The Role of IDNs and This Document .........................3
+      1.2. Status of This Document and Its Recommendations ............4
+      1.3. The IDNA Standard ..........................................4
+      1.4. Unicode Documents ..........................................5
+      1.5. Definitions ................................................5
+           1.5.1. Language ............................................6
+           1.5.2. Script ..............................................6
+           1.5.3. Multilingual ........................................6
+           1.5.4. Localization ........................................7
+           1.5.5. Internationalization ................................7
+
+
+
+
+Klensin, et al.              Informational                      [Page 1]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+      1.6. Statements and Guidelines ..................................7
+           1.6.1. IESG Statement ......................................8
+           1.6.2. ICANN Statements ....................................8
+   2. General Problems and Issues ....................................11
+      2.1. User Conceptions, Local Character Sets, and Input issues ..11
+      2.2. Examples of Issues ........................................13
+           2.2.1. Language-Specific Character Matching ...............13
+           2.2.2. Multiple Scripts ...................................13
+           2.2.3. Normalization and Character Mappings ...............14
+           2.2.4. URLs in Printed Form ...............................16
+           2.2.5. Bidirectional Text .................................17
+           2.2.6. Confusable Character Issues ........................17
+           2.2.7. The IESG Statement and IDNA issues .................19
+   3. Migrating to New Versions of Unicode ...........................20
+      3.1. Versions of Unicode .......................................20
+      3.2. Version Changes and Normalization Issues ..................21
+           3.2.1. Unnormalized Combining Sequences ...................21
+           3.2.2. Combining Characters and Character Components ......22
+           3.2.3. When does normalization occur? .....................23
+   4. Framework for Next Steps in IDN Development ....................24
+      4.1. Issues within the Scope of the IETF .......................24
+           4.1.1. Review of IDNA .....................................24
+           4.1.2. Non-DNS and Above-DNS Internationalization
+                  Approaches .........................................25
+           4.1.3. Security Issues, Certificates, etc. ................25
+           4.1.4. Protocol Changes and Policy Implications ...........27
+           4.1.5. Non-US-ASCII in Local Part of Email Addresses ......27
+           4.1.6. Use of the Unicode Character Set in the IETF .......27
+      4.2. Issues That Fall within the Purview of ICANN ..............28
+           4.2.1. Dispute Resolution .................................28
+           4.2.2. Policy at Registries ...............................28
+           4.2.3. IDNs at the Top Level of the DNS ...................29
+   5. Specific Recommendations for Next Steps ........................29
+      5.1. Reduction of Permitted Character List .....................29
+           5.1.1. Elimination of All Non-Language Characters .........30
+           5.1.2. Elimination of Word-Separation Punctuation .........30
+      5.2. Updating to New Versions of Unicode .......................30
+      5.3. Role and Uses of the DNS ..................................31
+      5.4. Databases of Registered Names .............................31
+   6. Security Considerations ........................................31
+   7. Acknowledgements ...............................................32
+   8. References .....................................................32
+      8.1. Normative References ......................................32
+      8.2. Informative References ....................................33
+
+
+
+
+
+
+
+Klensin, et al.              Informational                      [Page 2]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+1.  Introduction
+
+1.1.  The Role of IDNs and This Document
+
+   While IDNs have been advocated as the solution to a wide range of
+   problems, this document is written from the perspective that they are
+   no more and no less than DNS names, reflecting the same requirements
+   for use, stability, and accuracy as traditional "hostnames", but
+   using a much larger collection of permitted characters.  In
+   particular, while IDNs represent a step toward an Internet that is
+   equally accessible from all languages and scripts, they, at best,
+   address only a small part of that very broad objective.  There has
+   been controversy since IDNs were first suggested about how important
+   they will actually turn out to be; that controversy will probably
+   continue.  Accessibility from all languages is an important
+   objective, hence it is important that our standards and definitions
+   for IDNs be smoothly adaptable to additional scripts as they are
+   added to the Unicode character set.
+
+   The utility of IDNs must be evaluated in terms of their application
+   by users and in protocols: the ability to simply put a name into the
+   DNS and retrieve it is not, in and of itself, important.  From this
+   point of view, IDNs will be useful and effective if they provide
+   stable and predictable references -- references that are no less
+   stable and predictable, and no less secure, than their ASCII
+   counterparts.
+
+   This combination of objectives and criteria has proven very difficult
+   to satisfy.  Experience in developing the IDNA standard and during
+   the initial years of its implementation and deployment suggests that
+   it may be impossible to fully satisfy all of them and that
+   engineering compromises are needed to yield a result that is
+   workable, even if not completely satisfactory.  Based on that
+   experience and issues that have been raised, it is now appropriate to
+   review some of the implications of IDNs, the decisions made in
+   defining them, and the foundation on which they rest and determine
+   whether changes are needed and, if so, which ones.
+
+   The design of the DNS itself imposes some additional constraints.  If
+   the DNS is to remain globally interoperable, there are specific
+   characteristics that no implementation of IDNs, or the DNS more
+   generally, can change.  For example, because the DNS is a global
+   hierarchical administrative namespace with only a single name at any
+   given node, there is one and only one owner of each domain name.
+   Also, when strings are looked up in the DNS, positive responses can
+   only reflect exact matches: if there is no exact match, then one gets
+   an error reply, not a list of near matches or other supplemental
+   information.  Searches and approximate matchings are not possible.
+
+
+
+Klensin, et al.              Informational                      [Page 3]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+   Finally, because the DNS is a distributed system where any server
+   might cache responses, and later use those cached responses to
+   attempt to satisfy queries before a global lookup is done, every
+   server must use the same matching criteria.
+
+1.2.  Status of This Document and Its Recommendations
+
+   This document reviews the IDN landscape from an IETF perspective and
+   presents the recommendations and conclusions of the IAB, based
+   partially on input from an ad hoc committee charged with reviewing
+   IDN issues and the path forward (see Section 7).  Its recommendations
+   are advice to the IETF, or in a few cases to other bodies, for topics
+   to be investigated and actions to be taken if those bodies, after
+   their examinations, consider those actions appropriate.
+
+1.3.  The IDNA Standard
+
+   During 2002, the IETF completed the following RFCs that, together,
+   define IDNs:
+
+   RFC 3454  Preparation of Internationalized Strings ("Stringprep")
+      [RFC3454].
+      Stringprep is a generic mechanism for taking a Unicode string and
+      converting it into a canonical format.  Stringprep itself is just
+      a collection of rules, tables, and operations.  Any protocol or
+      algorithm that uses it must define a "Stringprep profile", which
+      specifies which of those rules are applied, how, and with which
+      characteristics.
+
+   RFC 3490  Internationalizing Domain Names in Applications (IDNA)
+      [RFC3490].
+      IDNA is the base specification in this group.  It specifies that
+      Nameprep is used as the Stringprep profile for domain names, and
+      that Punycode is the relevant encoding mechanism for use in
+      generating an ASCII-compatible ("ACE") form of the name.  It also
+      applies some additional conversions and character filtering that
+      are not part of Nameprep.
+
+   RFC 3491  Nameprep: A Stringprep Profile for Internationalized Domain
+      Names (IDN) [RFC3491].
+      Nameprep is designed to meet the specific needs of IDNs and, in
+      particular, to support case-folding for scripts that support what
+      are traditionally known as upper- and lowercase forms of the same
+      letters.  The result of the Nameprep algorithm is a string
+      containing a subset of the Unicode Character set, normalized and
+      case-folded so that case-insensitive comparison can be made.
+
+
+
+
+
+Klensin, et al.              Informational                      [Page 4]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+   RFC 3492  Punycode: A Bootstring encoding of Unicode for
+      Internationalized Domain Names in Applications (IDNA) [RFC3492].
+      Punycode is a mechanism for encoding a Unicode string in ASCII
+      characters.  The characters used are the same subset of
+      characters that are allowed in the hostname definition of DNS,
+      i.e., the "letter, digit, and hyphen" characters, sometimes known
+      as "LDH".
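A minimal sketch of how the four RFCs listed above fit together, using Python's built-in "idna" and "punycode" codecs, which implement the 2003 IDNA specifications (RFC 3490 with Nameprep, and RFC 3492). The helper names `ace_label` and `punycode_only` and the sample labels are illustrative assumptions and do not come from the RFC text.

```python
ACE_PREFIX = b"xn--"  # prefix that marks an ASCII-compatible (ACE) label


def punycode_only(label: str) -> bytes:
    """Apply only the RFC 3492 Punycode encoding (no Nameprep, no prefix)."""
    return label.encode("punycode")


def ace_label(label: str) -> bytes:
    """Apply ToASCII from RFC 3490: Nameprep, then Punycode, then the prefix."""
    return label.encode("idna")


if __name__ == "__main__":
    lower = "b\u00fccher"   # "bucher" with u-umlaut (U+00FC)
    upper = "B\u00dcCHER"   # same word in uppercase (U+00DC)
    print(punycode_only(lower))   # Punycode without the ACE prefix
    print(ace_label(lower))       # full ACE form
    print(ace_label(upper))       # Nameprep case-folds before encoding
    print(ace_label("example"))   # all-ASCII labels pass through unchanged
```

The uppercase and lowercase labels should yield the same ACE form, because Nameprep case-folds the input before Punycode is applied; the result contains only LDH characters.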
+
+1.4.  Unicode Documents
+
+   Unicode is used as the base, and defining, character set for IDNs.
+   Unicode is standardized by the Unicode Consortium, and synchronized
+   with ISO to create ISO/IEC 10646 [ISO10646].  At the time the RFCs
+   mentioned earlier were created, Unicode was at Version 3.2.  For
+   reasons explained later, it was necessary to pick a particular,
+   then-current, version of Unicode when IDNA was adopted.
+   Consequently, the RFCs are explicitly dependent on Unicode Version
+   3.2 [Unicode32].  There is, at present, no established mechanism for
+   modifying the IDNA RFCs to use newer Unicode versions (see
+   Section 3.1).
+
+   Unicode is a very large and complex character set.  (The term
+   "character set" or "charset" is used in a way that is peculiar to the
+   IETF and may not be the same as the usage in other bodies and
+   contexts.)  The Unicode Standard and related documents are created
+   and maintained by the Unicode Technical Committee (UTC), one of the
+   committees of the Unicode Consortium.
+
+   The Consortium first published The Unicode Standard [Unicode10] in
+   1991, and continues to develop standards based on that original work.
+   Unicode is developed in conjunction with the International
+   Organization for Standardization, and it shares its character
+   repertoire with ISO/IEC 10646.  Unicode and ISO/IEC 10646 function
+   equivalently as character encodings, but The Unicode Standard
+   contains much more information for implementers, covering -- in depth
+   -- topics such as bitwise encoding, collation, and rendering.  The
+   Unicode Standard enumerates a multitude of character properties,
+   including those needed for supporting bidirectional text.  The
+   Unicode Consortium and ISO standards do use slightly different
+   terminology.
+
+1.5.  Definitions
+
+   The following terms and their meanings are critical to understanding
+   the rest of this document and to discussions of IDNs more generally.
+   These terms are derived from [RFC3536], which contains additional
+   discussion of some of them.
+
+
+
+
+Klensin, et al.              Informational                      [Page 5]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+1.5.1.  Language
+
+   A language is a way that humans interact.  The use of language occurs
+   in many forms, including speech, writing, and signing.
+
+   Some languages have a close relationship between the written and
+   spoken forms, while others have a looser relationship.  RFC 3066
+   [RFC3066] discusses languages in more detail and provides identifiers
+   for languages for use in Internet protocols.  Computer languages are
+   explicitly excluded from this definition.  The most recent IETF work
+   in this area, and on script identification (see below), is documented
+   in [RFC4645] and [RFC4646].
+
+1.5.2.  Script
+
+   A script is a set of graphic characters used for the written form of
+   one or more languages.  This definition is the one used in
+   [ISO10646].
+
+   Examples of scripts are Arabic, Cyrillic, Greek, Han (the so-called
+   ideographs used in writing Chinese, Japanese, and Korean), and
+   "Latin".  Arabic, Greek, and Latin are, of course, also names of
+   languages.
+
+   Historically, the script that is known as "Latin" in Unicode and most
+   contexts associated with information technology standards is known in
+   the linguistic community as "Roman" or "Roman-derived".  The latter
+   terminology distinguishes between the Latin language and the
+   characters used to write it, especially in Republican times, from the
+   much richer and more decorated script derived and adapted from those
+   characters.  Since IDNA is defined using Unicode and that standard
+   used the term "LATIN" in its character names and descriptions, that
+   terminology will be used in this document as well except when
+   "Roman-derived" is needed for clarity.  However, readers approaching
+   this document from a cultural or linguistic standpoint should be
+   aware that the use of, or references to, "Latin script" in this
+   document refers to the entire collection of Roman-derived characters,
+   not just the characters used to write the Latin language.  Some other
+   issues with script identification and relationships with other
+   standards are discussed in [RFC4646].
+
+1.5.3.  Multilingual
+
+   The term "multilingual" has many widely-varying definitions and thus
+   is not recommended for use in standards.  Some of the definitions
+   relate to the ability to handle international characters; other
+   definitions relate to the ability to handle multiple charsets; and
+   still others relate to the ability to handle multiple languages.
+
+
+
+Klensin, et al.              Informational                      [Page 6]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+   While this term has been deprecated for IETF-related uses and does
+   not otherwise appear in this document, a discussion here seemed
+   appropriate since the term is still widely used in some discussions
+   of IDNs.
+
+1.5.4.  Localization
+
+   Localization is the process of adapting an internationalized
+   application platform or application to a specific cultural
+   environment.  In localization, the same semantics are preserved while
+   the syntax or presentation forms may be changed.
+
+   Localization is the act of tailoring an application for a different
+   language or script or culture.  Some internationalized applications
+   can handle a wide variety of languages.  Typical users understand
+   only a small number of languages, so the program must be tailored to
+   interact with users in just the languages they know.
+
+   Somewhat different definitions for localization and
+   internationalization (see below) are used by groups other than the
+   IETF.  See [W3C-Localization] for one example.
+
+1.5.5.  Internationalization
+
+   In the IETF, the term "internationalization" is used to describe
+   adding or improving the handling of non-ASCII text in a protocol.
+   Other bodies use the term in other ways, often with subtle variation
+   in meaning.  The term "internationalization" is often abbreviated
+   "i18n" (and localization as "l10n").
+
+   Many protocols that handle text only handle the characters associated
+   with one script (often, a subset of the characters used in writing
+   English text), or leave the question of what character set is used up
+   to local guesswork (which leads to interoperability problems).
+   Adding non-ASCII text to such a protocol allows the protocol to
+   handle more scripts, with the intention of being able to include all
+   of the scripts that are useful in the world.  It is naive (sic) to
+   believe that all English words can be written in ASCII, various
+   mythologies notwithstanding.
+
+1.6.  Statements and Guidelines
+
+   When the IDNA RFCs were published, the IESG and ICANN made statements
+   that were intended to guide deployment and future work.  In recent
+   months, ICANN has updated its statement and others have also made
+   contributions.  It is worth noting that the quality of understanding
+   of internationalization issues as applied to the DNS has evolved
+
+
+
+
+Klensin, et al.              Informational                      [Page 7]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+   considerably over the last few years.  Organizations that took
+   specific positions a year or more ago might not make exactly the same
+   statements today.
+
+1.6.1.  IESG Statement
+
+   The IESG made a statement on IDNA [IESG-IDN]:
+
+      IDNA, through its requirement of Nameprep [RFC3491], uses
+      equivalence tables that are based only on the characters
+      themselves; no attention is paid to the intended language (if any)
+      for the domain name.  However, for many domain names, the intended
+      language of one or more parts of the domain name actually does
+      matter to the users.
+
+      Similarly, many names cannot be presented and used without
+      ambiguity unless the scripts to which their characters belong are
+      known.  In both cases, this additional information should be of
+      concern to the registry.
+
+   The statement is longer than this, but these paragraphs are the
+   important ones.  The rest of the statement consists of explanations
+   and examples.
+
+1.6.2.  ICANN Statements
+
+1.6.2.1.  Initial ICANN Guidelines
+
+   Soon after the IDNA standards were adopted, ICANN produced an initial
+   version of its "IDN Guidelines" [ICANNv1].  This document was
+   intended to serve two purposes.  The first was to provide a basis for
+   releasing the Generic Top Level Domain (gTLD) registries that had
+   been established by ICANN from a contractual restriction on the
+   registration of labels containing hyphens in the third and fourth
+   positions.  The second was to provide a general framework for the
+   development of registry policies for the implementation of IDNs.
+
+   One of the key components of this framework prescribed strict
+   compliance with RFCs 3490, 3491, and 3492.  With the framework, ICANN
+   specified that IDNA was to be the sole mechanism to be used in the
+   DNS to represent IDNs.
+
+   Limitations on the characters available for inclusion in IDNs were
+   mandated by two mechanisms.  The first was by requiring an
+   "inclusion-based approach (meaning that code points that are not
+   explicitly permitted by the registry are prohibited) for identifying
+   permissible
+
+
+
+
+Klensin, et al.              Informational                      [Page 8]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+   code points from among the full Unicode repertoire."  The second
+   mechanism required the association of every IDN with a specific
+   language, with additional policies also being language based:
+
+   "In implementing the IDN standards, top-level domain registries will
+   (a) associate each registered internationalized domain name with one
+   language or set of languages,
+   (b) employ language-specific registration and administration rules
+   that are documented and publicly available, such as the reservation
+   of all domain names with equivalent character variants in the
+   languages associated with the registered domain name, and,
+   (c) where the registry finds that the registration and administration
+   rules for a given language would benefit from a character variants
+   table, allow registrations in that language only when an appropriate
+   table is available. ...  In implementing the IDN standards, top-level
+   domain registries should, at least initially, limit any given domain
+   label (such as a second-level domain name) to the characters
+   associated with one language or set of languages only."
+
+   It was left to each TLD registry to define the character repertoire
+   it would associate with any given language.  This led to significant
+   variation from registry to registry, with further heterogeneity in
+   the underlying language-based IDN policies.  If the guidelines had
+   made provision for IDN policies also being based on script, a
+   substantial amount of the resulting ambiguity could have been
+   avoided.  However, they did not, and the sequence of events leading
+   to the present review of IDNA was thus triggered.
+
+1.6.2.2.  ICANN Version 2 Guidelines
+
+   One of the responses of the TLD registries to what was widely
+   perceived as a crisis situation was to invoke the mechanism described
+   in the initial guidelines: "As the deployment of IDNs proceeds, ICANN
+   and the IDN registries will review these Guidelines at regular
+   intervals, and revise them as necessary based on experience."
+
+   The pivotal requirement was the modification of the guidelines to
+   permit script-based policies for IDNs.  Further concern was expressed
+   about the need for realistically implementable mechanisms for the
+   propagation of TLD registry policies into the lower levels of their
+   name trees.  In addition to the anticipated increase of constraint on
+   the protocol level, one obvious additional approach would be to
+   replace the guidelines by an instrument that itself had clear status
+   in the IETF's normative framework.  A BCP was therefore seen as the
+   appropriate focus for longer-term effort.  The most pressing issues
+   would be dealt with in the interim by incremental modification to the
+   guidelines, but no need was seen for the detailed further development
+   of those guidelines once that incremental modification was complete.
+
+
+
+Klensin, et al.              Informational                      [Page 9]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+   The outcome of this action was a version 2.0 of the guidelines
+   [ICANNv2], which was endorsed by the ICANN Board on November 8, 2005
+   for a period of nine months.  The Board stated further that it "tasks
+   the IDN working group to continue its important work and return to
+   the board with specific IDN improvement recommendations before the
+   ICANN Meeting in Morocco" and "supports the working group's continued
+   action to reframe the guidelines completely in a manner appropriate
+   for further development as a Best Current Practices (BCP) document,
+   to ensure that the Guideline directions will be used deeper into the
+   DNS hierarchy and within TLD's where ICANN has a lesser policy
+   relationship."
+
+   Retaining the inclusion-based approach established in version 1.0,
+   the crucial addition to the policy framework is that:
+
+   "All code points in a single label will be taken from the same script
+   as determined by the Unicode Standard Annex #24: Script Names at
+   http://www.unicode.org/reports/tr24.  Exception to this is
+   permissible for languages with established orthographies and
+   conventions that require the commingled use of multiple scripts.  In
+   such cases, visually confusable characters from different scripts
+   will not be allowed to coexist in a single set of permissible
+   codepoints unless a corresponding policy and character table is
+   clearly defined."
+
+   Additionally:
+
+   "Permissible code points will not include: (a) line symbol-drawing
+   characters (as those in the Unicode Box Drawing block), (b) symbols
+   and icons that are neither alphanumeric nor ideographic language
+   characters, such as typographic and pictographic dingbats, (c)
+   characters with well-established functions as protocol elements, (d)
+   punctuation marks used solely to indicate the structure of
+   sentences."
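The single-script rule quoted above can be illustrated with a rough sketch. Python's standard library does not expose the Unicode Script property defined by Annex #24, so this example assumes, purely for illustration, that the first word of a character's Unicode name (for example "LATIN" or "CYRILLIC") is an adequate stand-in. The function names are hypothetical, and a real registry check would use the actual script data and the registry's own character tables.

```python
import unicodedata


def rough_scripts(label: str) -> set:
    """Collect an approximate script tag for each character of a label.

    Assumption: the first word of the Unicode character name approximates
    the script.  This is NOT the UAX #24 Script property.
    """
    scripts = set()
    for ch in label:
        # ASCII digits and the hyphen are shared by all LDH labels,
        # so they are not counted as belonging to a script here.
        if ch.isascii() and (ch.isdigit() or ch == "-"):
            continue
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_single_script(label: str) -> bool:
    """Return True when all counted characters share one approximate script."""
    return len(rough_scripts(label)) <= 1


if __name__ == "__main__":
    plain = "paypal"        # all Latin letters
    mixed = "p\u0430ypal"   # second letter is CYRILLIC SMALL LETTER A
    print(plain, rough_scripts(plain), is_single_script(plain))
    print(mixed.encode("unicode_escape"), rough_scripts(mixed),
          is_single_script(mixed))
```

The second label looks like the first when rendered in many fonts, yet it mixes two scripts, which is the kind of label the version 2.0 rule is meant to exclude.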
+
+   Attention has been called to several points that are not adequately
+   dealt with (if at all) in the version 2.0 guidelines but that ought
+   to be included in the policy framework without waiting for the
+   production and release of a document based on a "best practices"
+   model.  The term "BCP" above does not necessarily refer to an IETF
+   consensus document.
+
+   The intention in November 2005 was for the recommended major revision
+   to be put to the ICANN Board prior to its meeting in Morocco (in late
+   June 2006), but for the changes to be collated incrementally and
+   appear in interim version 2.n releases of the guidelines.  The IAB's
+   understanding is that, while there has been some progress with this,
+
+
+
+
+Klensin, et al.              Informational                     [Page 10]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+   other issues relating to IDNs subsequently diverted much of the
+   energy that was intended to be devoted to the more extensive
+   treatment of the guidelines.
+
+2.  General Problems and Issues
+
+   This section interweaves problems and issues of several types.  Each
+   subsection outlines something that is perceived to be a problem or
+   issue "with IDNs", therefore needing correction.  Some of these
+   issues can be at least partially resolved by making changes to
+   elements of the IDNA protocol or tables.  Others will exist as long
+   as people have expectations of IDNs that are inconsistent with the
+   basic DNS architecture.  It is important to identify this entire
+   range of problems because users, registrants, and policy makers often
+   do not understand the protocol and other technical issues but only
+   the difference between what they believe happens or should happen and
+   what actually happens.  As long as those differences exist, there
+   will be demands for functionality or policy changes for IDNs.  Of
+   course, some of these demands will be less realistic than others, but
+   even the realistic ones should be understood in the same context as
+   the others.
+
+   Most of the issues that have been raised, and that are discussed in
+   this document, exist whether IDNA remains tied to Unicode 3.2 or
+   whether migration to new Unicode versions is contemplated.  A
+   migration path is necessary to accommodate newly-coded scripts and to
+   permit the maximum number of languages and scripts to be represented
+   in domain names.  However, the migration issues are largely separate
+   from those involving a single Unicode version or Version 3.2 in
+   particular, so they have been separated into this section and
+   Section 3.
+
+2.1.  User Conceptions, Local Character Sets, and Input issues
+
+   The labels of the DNS are just strings of characters that are not
+   inherently tied to a particular language.  As mentioned briefly in
+   the Introduction, DNS labels that could not lexically be words in any
+   language are possible and indeed common.  There appears to be no
+   reason to impose protocol restrictions on IDNs that would restrict
+   them more than all-ASCII hostname labels have been restricted.  For
+   that reason, even describing DNS labels or strings of them as "names"
+   is something of a misnomer, one that has probably added to user
+   confusion about what to expect.
+
+   Ordinarily, people use "words" when they think of things and wish
+   others to think of them too, for example, "orange", "tree",
+   "restaurant" or "Acme Inc".  Words are normally in a specific
+   language, such as English or Swedish.  The character-string labels
+
+
+
+Klensin, et al.              Informational                     [Page 11]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+   supported by the DNS are, as suggested above, not inherently "words".
+   While it is useful, especially for mnemonic value or to identify
+   objects, for actual words to be used as DNS labels, other constraints
+   on the DNS make it impossible to guarantee that it will be possible
+   to represent every word in every language as a DNS label,
+   internationalized or not.
+
+   When writing or typing the label (or word), a script must be selected
+   and a charset must be picked for use with that script.  The choice of
+   charset is typically not under the control of the user on a per-word
+   or per-document basis, but may depend on local input devices,
+   keyboard or terminal drivers, or other decisions made by operating
+   system or even hardware designers and implementers.
+
+   If that charset, or the local charset being used by the relevant
+   operating system or application software, is not Unicode, a further
+   conversion must be performed to produce Unicode.  How often this is
+   an issue depends on estimates of how widely Unicode is deployed as
+   the native character set for hardware, operating systems, and
+   applications.  Those estimates differ widely, but it should be noted
+   that, among other difficulties:
+
+   o  ISO 8859 versions [ISO.8859.2003], and even national variations
+      of ISO 646 [ISO.646.1991], are still widely used in parts of
+      Europe;
+
+   o  code-table switching methods, typically based on the techniques of
+      ISO 2022 [ISO.2022.1986], are still in general use in many parts
+      of the world, especially in Japan with Shift-JIS and its
+      variations; and
+
+   o  computing systems and communications in China tend to use one or
+      more of the national "GB" standards rather than native Unicode.
+
+   Additionally, not all charsets define their characters in the same
+   way and not all preexisting coding systems were incorporated into
+   Unicode without changes.  Sometimes local distinctions were made that
+   Unicode does not make or vice versa.  Consequently, conversion from
+   other systems to Unicode may potentially lose information.
+
+   The Unicode string that results from this processing -- processing
+   that is trivial in a Unicode-native system but that may be
+   significant in others -- is then used as input to IDNA.
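+
+   As a minimal sketch that is not part of the original memo, the
+   conversion step described above can be illustrated with Python's
+   standard codecs.  The octet value, the helper name to_unicode, and
+   the choice of ISO 8859-1, ISO 8859-7, and Shift-JIS are assumptions
+   made only for illustration.
+
```python
# Illustrative only: the same octet means different characters under
# different local charsets, so the charset must be known before the
# conversion to Unicode that IDNA requires.
raw = b"\xf8"

as_latin1 = raw.decode("latin-1")    # Western European reading
as_greek = raw.decode("iso8859_7")   # Greek reading of the same octet


def to_unicode(data: bytes, charset: str) -> str:
    """Convert locally encoded bytes to a Unicode string.

    Raises UnicodeDecodeError if the bytes are not valid in `charset`.
    """
    return data.decode(charset, errors="strict")


# A Shift-JIS round trip: encode a kana string, then convert it back.
kana = "\u3042\u3044"                # HIRAGANA LETTER A, HIRAGANA LETTER I
sjis_bytes = kana.encode("shift_jis")
restored = to_unicode(sjis_bytes, "shift_jis")
```
+
+   The point of the sketch is only that the result depends on knowing
+   which local charset produced the octets; it does not model the lossy
+   cases mentioned above.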
+
+2.2.  Examples of Issues
+
+   While much of the discussion below is stated in terms of Unicode
+   codings and associated rules, the IAB believes that some of the
+   issues are actually not about the Unicode character set per se, but
+   about how distributed matching systems operate in reality, and about
+   what implications the distributed delayed search for stored data that
+   characterizes the DNS has on the mapping algorithms.
+
+2.2.1.  Language-Specific Character Matching
+
+   There are similar words that can be expressed in multiple languages.
+   Consider, for example, the name Torbjorn in Norwegian and Swedish.
+   In Norwegian it is spelled with the character U+00F8 (LATIN SMALL
+   LETTER O WITH STROKE) in the second syllable, while in Swedish it is
+   spelled with U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS).  The
+   Unicode Standard and its Annexes do not treat those characters as
+   equivalent, although most people who speak Swedish, Danish, or
+   Norwegian probably think of them as equivalent.
+
+   It is neither possible nor desirable to make these characters
+   equivalent on a global basis.  To do so would, for this example,
+   rationalize the situation in Sweden while causing considerable
+   confusion in Germany because the U+00F8 character is never used in
+   the German language.  But the "variant" model introduced in [RFC3743]
+   and [RFC4290] can be used by a registry to prevent the worst
+   consequence of the possible confusion, by ensuring either that both
+   names are registered to the same party in a given domain or that one
+   of them is completely prohibited.
+
+2.2.2.  Multiple Scripts
+
+   There are languages in the world that can be expressed using multiple
+   scripts.  For example, some Eastern European and Central Asian
+   languages can be expressed in either Cyrillic or Latin characters
+   (see Section 1.5.2), and some African and Southeast Asian languages
+   can be expressed in either Arabic or Latin characters.  A few
+   languages can even be written in three different scripts.  In other
+   cases, the language is typically written in a combination of scripts
+   (e.g., Kanji, Kana, and Romaji for Japanese; Hangul and Hanja for
+   Korean).  Because of this, the same word, in the same language,
+   can be expressed in different ways.  For some languages, only a
+   single script is normally used to write a single word; for others,
+   mixed scripts are required; and, for still others, special
+   circumstances may dictate mixing scripts in labels although that is
+   not normally done for "words".  For IDN purposes, these variations
+   make the definition of "script" extremely sensitive, especially since
+   ICANN is now recommending that it be used as the primary basis for
+   registry policies.  However essential it may be to prohibit mixed-
+   script labels, additional policy nuance is required for "languages
+   with established orthographies and conventions that require the
+   commingled use of multiple scripts".
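+
+   A registry policy of this kind might be sketched as follows.  This
+   is an illustration, not part of the original memo: the function
+   names and the JAPANESE group are hypothetical, and the first word of
+   each Unicode character name is used as a rough stand-in for the
+   script property, which Python's standard library does not expose.
+
```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Return rough script tags for the letters in `label`.

    Assumption: the first word of the character name ("LATIN",
    "CYRILLIC", "HIRAGANA", "CJK", ...) approximates the script.
    """
    tags = set()
    for ch in label:
        if ch.isalpha():  # digits and the hyphen carry no script here
            tags.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return tags


# Hypothetical policy nuance: scripts that an established orthography
# commingles and that may therefore appear together in one label.
JAPANESE = frozenset({"CJK", "HIRAGANA", "KATAKANA"})


def is_acceptable(label: str, groups=(JAPANESE,)) -> bool:
    """Accept single-script labels and labels within one allowed group."""
    tags = scripts_in_label(label)
    return len(tags) <= 1 or any(tags <= group for group in groups)
```
+
+   Under these assumptions, a label mixing Latin and Cyrillic letters is
+   rejected, while a label mixing Kanji and Kana is accepted.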
+
+2.2.3.  Normalization and Character Mappings
+
+   Unicode contains several different models for representing
+   characters.  The Chinese (Han)-derived characters of the "CJK"
+   (Chinese, Japanese, and Korean) languages are "unified", i.e.,
+   characters with common derivation and similar appearances are
+   assigned to the same code point.  European characters derived from a
+   Greek-Latin base are separated into separate code blocks for Latin,
+   Greek, and Cyrillic even when individual characters are identical in
+   both form and semantics.  Separate code points based on font
+   differences alone are generally prohibited, but a large number of
+   characters for "mathematical" use have been assigned separate code
+   points even though they differ from base ASCII characters only by
+   font attributes such as "script", "bold", or "italic".  Some
+   characters that often appear together are treated as typographical
+   digraphs with specific code points assigned to the combination,
+   others require that the two-character sequences be used, and still
+   others are available in both forms.  Some Roman-derived letters that
+   were developed as decorated variations on the basic Latin letter
+   collection (e.g., by addition of diacritical marks) are assigned code
+   points as individual characters, others must be built up as two (or
+   more) character sequences using "combining characters".
+
+   Many of these differences result from the desire to maintain backward
+   compatibility while the standard evolved historically, and are hence
+   understandable.  However, the DNS requires precise knowledge of which
+   codes and code sequences represent the same character and which ones
+   do not.  Limiting the potential difficulties with confusable
+   characters (see Section 2.2.6) requires even more knowledge of which
+   characters might look alike in some fonts but not in others.  These
+   variations make it difficult or impossible to apply a single set of
+   rules to all of Unicode and, in doing so, satisfy everyone and their
+   perceived needs.  Instead, more or less complex mapping tables,
+   defined on a character-by-character basis, are required to
+   "normalize" different representations of the same character to a
+   single form so that matching is possible.
+
+   Unless normalization rules, such as those that underlie Nameprep, are
+   applied, characters that are essentially identical will not match in
+   the DNS, creating many opportunities for problems.  The most common
+   of these problems is that, due to the processing applied (and
+   discussed above) before a word is represented as a Unicode string, a
+   single word can end up being expressed as several different Unicode
+   strings.  Even if normalization rules are applied, some strings that
+   are considered identical by users will not compare equal.  That
+   problem is discussed in more detail elsewhere in this document,
+   particularly in Section 3.2.1.
+
+   IDNA attempts to compensate for these problems by using a
+   normalization algorithm defined by the Unicode Consortium.  This
+   algorithm can change a sequence of one or more Unicode characters
+   into another sequence.  One example is that the base character
+   U+0061 (LATIN SMALL LETTER A) followed by U+0308 (COMBINING
+   DIAERESIS) is changed to the single Unicode character U+00E4 (LATIN
+   SMALL LETTER A WITH DIAERESIS).
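+
+   The effect can be observed with Python's standard library, offered
+   here as an illustration rather than as part of the original memo.
+   The built-in "idna" codec implements the IDNA2003 ToASCII operation,
+   including the Nameprep normalization step; the label used below is
+   an arbitrary example.
+
```python
import unicodedata

# The example from the text: base letter plus combining mark.
composed = unicodedata.normalize("NFKC", "\u0061\u0308")

# Two spellings of the same label that differ only in representation.
precomposed_label = "b\u00fccher"   # u with diaeresis as one code point
decomposed_label = "bu\u0308cher"   # "u" followed by COMBINING DIAERESIS

# ToASCII normalizes both before the Punycode step, so the labels
# placed in the DNS are identical.
ace_precomposed = precomposed_label.encode("idna")
ace_decomposed = decomposed_label.encode("idna")
```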
+
+   This Unicode normalization process accounts only for simple character
+   equivalences, not equivalences that are language or script dependent.
+   For example, as mentioned above, the characters U+00F8 (LATIN SMALL
+   LETTER O WITH STROKE) and U+00F6 (LATIN SMALL LETTER O WITH
+   DIAERESIS) are considered to match in Swedish (and some other
+   languages), but not for all languages that use either of the
+   characters.  Having these characters be treated as equivalent in some
+   contexts and not in others requires decisions and mechanisms that, in
+   turn, depend much more on context than either IDNA or the Unicode
+   character-based normalization tables can provide.
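+
+   The following sketch, which is not part of the original memo, shows
+   why normalization alone cannot supply this equivalence.  The helper
+   name nfkc_equal is hypothetical.
+
```python
import unicodedata

O_STROKE = "\u00f8"      # LATIN SMALL LETTER O WITH STROKE
O_DIAERESIS = "\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS

# The diaeresis form has a canonical decomposition (base letter plus
# combining mark); the stroke form has none, so normalization treats
# it as an unrelated, atomic letter.
decomp_diaeresis = unicodedata.decomposition(O_DIAERESIS)
decomp_stroke = unicodedata.decomposition(O_STROKE)


def nfkc_equal(a: str, b: str) -> bool:
    """True if the two strings match after NFKC normalization."""
    norm = unicodedata.normalize
    return norm("NFKC", a) == norm("NFKC", b)
```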
+
+   Additional complications occur if the sequences are more complicated
+   or if an attacker is making a deliberate effort to confuse the
+   normalization process.  For example, if the sequence U+0069 U+0308
+   (LATIN SMALL LETTER I followed by COMBINING DIAERESIS) appears, the
+   Unicode Normalization Method known as NFKC maps it into U+00EF (LATIN
+   SMALL LETTER I WITH DIAERESIS), which is what one would predict.  But
+   consider U+0131 U+0308 (LATIN SMALL LETTER DOTLESS I and COMBINING
+   DIAERESIS):  is that the same character?  Is U+0131 U+0307 U+0307
+   (dotless i and two combining dot-above characters) equivalent to
+   U+00EF or U+0069, or neither?  NFKC does not appear to tell us, nor
+   does the definition of U+0307 appear to tell us what happens when it
+   is combined with other "symbol above" arrangements (unlike some of
+   the "accent above" combining characters, which more or less specify
+   kerning).  Similar issues arise when U+00EF is combined with various
+   dot-above combining characters.  Each of these questions provides
+   some opportunities for spoofing if different display implementations
+   interpret the rules in different ways.
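+
+   What NFKC does with sequences of this kind can be checked directly.
+   The sketch below is an illustration, not part of the original memo;
+   the helper name nfkc_codepoints is hypothetical.  It shows only
+   what the normalization tables do, not how a display implementation
+   renders the result.
+
```python
import unicodedata


def nfkc_codepoints(s: str) -> list:
    """Return the NFKC form of `s` as a list of U+XXXX strings."""
    normalized = unicodedata.normalize("NFKC", s)
    return [f"U+{ord(ch):04X}" for ch in normalized]


# i + COMBINING DIAERESIS composes to a single code point.
i_diaeresis = nfkc_codepoints("\u0069\u0308")

# i + COMBINING DOT ABOVE has no precomposed form and is left as is.
i_dot_above = nfkc_codepoints("\u0069\u0307")

# Dotless i sequences are also left untouched: NFKC neither composes
# them nor relates them to the dotted forms.
dotless_diaeresis = nfkc_codepoints("\u0131\u0308")
dotless_two_dots = nfkc_codepoints("\u0131\u0307\u0307")
```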
+
+   If we leave Latin scripts and examine those based on Chinese
+   characters, we see there is also an absence of specific, lexigraphic,
+   rules for transformations between Traditional and Simplified Chinese.
+   Even if there were such rules, unification of Japanese and Korean
+   characters with Chinese ones would make it impossible to normalize
+   Traditional Chinese characters into Simplified Chinese ones without
+   causing problems in Japanese and Korean use of the same characters.
+
+   More generally, while some mappings, such as those between
+   precomposed Latin script characters and the equivalent multiple code
+   point composed character sequences, depend only on the characters
+   themselves, in many or most cases, such as the case with Swedish
+   above, the mapping is language or culturally dependent.  There have
+   been discussions as to whether different canonicalization rules (in
+   addition to or instead of Unicode normalization) should be, or could
+   be, applied differently to different languages or scripts.  The fact
+   that most scripts included in Unicode have been initially
+   incorporated by copying an existing standard more or less intact has
+   impact on the optimization of these algorithms and on forward
+   compatibility.  Even if the language is known and language-specific
+   rules can be defined, dependencies on the language do not disappear.
+   Canonicalization operations are not possible unless they either
+   depend only on short sequences of text or have significant context
+   available that is not obvious from the text itself.  DNS lookups and
+   many other operations do not have a way to capture and utilize the
+   language or other information that would be needed to provide that
+   context.
+
+   These variations in languages and in user perceptions of characters
+   make it difficult or impossible to provide uniform algorithms for
+   matching Unicode strings in a way that no end users are ever
+   surprised by the result.  For closely-related scripts or characters,
+   surprises may even be frequent.  However, because uniform algorithms
+   are required for mappings that are applied when names are looked up
+   in the DNS, the rules that are chosen will always represent an
+   approximation that will be more or less successful in minimizing
+   those user surprises.  The current Nameprep and Stringprep algorithms
+   use mapping tables to "normalize" different representations of the
+   same text to a single form so that matching is possible.
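+
+   The mapping and normalization steps can be sketched with Python's
+   stringprep module, which exposes the Stringprep tables.  This is an
+   illustration, not part of the original memo: the function name
+   nameprep_map is hypothetical, and the prohibited-character and
+   bidirectional checks of a full Nameprep implementation are omitted.
+
```python
import stringprep
from unicodedata import ucd_3_2_0  # Unicode 3.2 tables, as in Stringprep


def nameprep_map(label: str) -> str:
    """Apply only the mapping and NFKC steps of Nameprep to `label`."""
    mapped = []
    for ch in label:
        if stringprep.in_table_b1(ch):   # "commonly mapped to nothing"
            continue
        mapped.append(stringprep.map_table_b2(ch))  # case folding
    return ucd_3_2_0.normalize("NFKC", "".join(mapped))
```
+
+   Different representations of the same text, such as upper- and
+   lowercase forms or precomposed and decomposed forms, come out of
+   this function as one string.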
+
+   More details on the creation of the normalization algorithms can be
+   found in the Unicode Specification and the associated Technical
+   Reports [UTR] and Annexes.  Technical Report #36 [UTR36] and [UTR39]
+   are specifically related to the IDN discussion.
+
+2.2.4.  URLs in Printed Form
+
+   URLs and other identifiers appear not only in electronic forms, from
+   which they can (at least in principle) be accurately copied and
+   "pasted", but also in printed forms, from which the user must
+   transcribe them into the computer system.  This is often known as
+   the "side-of-the-bus problem" because a particularly problematic
+   version of it
+   requires that the user be able to observe and accurately remember a
+   URL that is quickly glimpsed in a transient form -- a billboard seen
+   while driving, a sign on the side of a passing vehicle, a television
+   advertisement that is not frequently repeated or on-screen for a long
+   time, and so on.
+
+   The difficulty, in short, is that two Unicode strings that are
+   actually different might look exactly the same, especially when there
+   is no time to study them.  This is because, for example, some glyphs
+   in Cyrillic, Greek, and Latin do look the same, but have been
+   assigned different code points in Unicode.  Worse, one needs to be
+   reasonably familiar with a script and how it is used to understand
+   how much characters can reasonably vary as the result of artistic
+   fonts and typography.  For example, there are a few fonts for Latin
+   characters that are sufficiently highly ornamented that an observer
+   might easily confuse some of the characters with characters in Thai
+   script.  Uppercase ITC Blackadder (a registered trademark of
+   International Typeface Corporation) and Curlz MT are two fairly
+   obvious examples; these fonts use loops at the end of serifs,
+   creating a resemblance to Thai (in some fonts) for some characters.
+
+2.2.5.  Bidirectional Text
+
+   Some scripts (and because of that some words in some languages) are
+   written not left to right, but right to left.  And, to complicate
+   things, one might have something written in Arabic script right to
+   left that includes some characters that are read from left to right,
+   such as European-style digits.  This implies that some texts might
+   have a mixed left-to-right AND right-to-left order (even though in
+   most implementations, and in IDNA, all texts have a major direction,
+   with the other as an exception).
+
+   IDNA permits the inclusion of European digits in a label that is
+   otherwise a sequence of right-to-left characters, but prohibits most
+   other mixed-directional (or bidirectional) strings.  This prohibition
+   can cause other problems such as the rejection of some otherwise
+   linguistically and culturally sensible strings.  As Unicode and
+   conventions for handling so-called bidirectional ("BIDI") strings
+   evolve, the prohibition in IDNA should be reviewed and reevaluated.
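+
+   The restriction can be sketched with the Stringprep tables that
+   Python's stringprep module exposes (D.1 for right-to-left
+   characters, D.2 for left-to-right ones).  This is an illustration,
+   not part of the original memo; the function name passes_bidi_rule
+   is hypothetical.
+
```python
import stringprep


def passes_bidi_rule(label: str) -> bool:
    """Check the Stringprep bidirectional rule for one label.

    If the label contains any right-to-left character (table D.1), it
    must contain no left-to-right character (table D.2), and both its
    first and last characters must be right-to-left.
    """
    if not any(stringprep.in_table_d1(ch) for ch in label):
        return True  # no right-to-left characters: nothing to check
    if any(stringprep.in_table_d2(ch) for ch in label):
        return False
    first_ok = stringprep.in_table_d1(label[0])
    last_ok = stringprep.in_table_d1(label[-1])
    return first_ok and last_ok
```
+
+   Under this rule, a European digit is accepted between right-to-left
+   letters but not as the last character of such a label, which is one
+   way an otherwise sensible string can end up rejected.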
+
+2.2.6.  Confusable Character Issues
+
+   Similar-looking characters in identifiers can cause actual problems
+   on the Internet since they can result, deliberately or accidentally,
+   in people being directed to the wrong host or mailbox by believing
+   that they are typing, or clicking on, intended characters that are
+   different from those that actually appear in the domain name or
+   reference.  See Section 4.1.3 for further discussion of this issue.
+
+   IDNs complicate these issues, not only by providing many additional
+   characters that look sufficiently alike to be potentially confused,
+   but also by raising new policy questions.  For example, if a language
+   can be written in two different scripts, is a label constructed from
+   a word written in one script equivalent to a label constructed from
+   the same word written in the other script?  Is the answer the same
+   for words in two different languages that translate into each other?
+
+   It is now generally understood that, in addition to the collision
+   problems of possibly equivalent words and hence labels, it is
+   possible to utilize characters that look alike -- "confusable"
+   characters -- to spoof names in order to mislead or defraud users.
+   That issue, driven by particular attacks such as those known as
+   "phishing", has introduced stronger requirements for registry efforts
+   to prevent problems than were previously generally recognized as
+   important.
+
+   One commonly-proposed approach is to have a registry establish
+   restrictions on the characters, and combinations of characters, it
+   will permit to be included in a string to be registered as a label.
+   Taking the Swedish top-level domain, .SE, as an example, a rule might
+   be adopted that the registry "only accepts registrations in Swedish,
+   using Latin script, and because of this, Unicode characters Latin-a,
+   -b, -c,...".  But, because there is not a 1:1 mapping between country
+   and language, even a Country Code Top Level Domain (ccTLD) like .SE
+   might have to accept registrations in other languages.  For example,
+   there may be a requirement for Finnish (the second most-used language
+   in Sweden).  What rules and code points are then defined for Finnish?
+   Does it have special mappings that collide with those that are
+   defined for Swedish?  And what does one do in countries that use more
+   than one script?  (Finnish and Swedish use the same script.)  In all
+   cases, the dispute will ultimately be about whether two strings are
+   the same (or confusingly similar) or not.  That, in turn, will
+   generate a discussion of how one defines "what is the same" and "what
+   is similar enough to be a problem".
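+
+   A character-restriction rule of this kind might be sketched as
+   follows.  The table below is hypothetical and is not the actual
+   policy of .SE or of any other registry; the function name
+   rejected_code_points is likewise an assumption.
+
```python
import string
import unicodedata

# Hypothetical table: basic Latin letters, digits, the hyphen, and the
# three additional letters of the Swedish alphabet.
SWEDISH_ALLOWED = frozenset(
    string.ascii_lowercase + string.digits + "-" + "\u00e5\u00e4\u00f6"
)


def rejected_code_points(label: str, allowed: frozenset) -> list:
    """Return the code points of `label` that the table does not permit.

    The label is normalized and lowercased first, so precomposed and
    decomposed spellings are judged alike.
    """
    normalized = unicodedata.normalize("NFKC", label).lower()
    return [f"U+{ord(ch):04X}" for ch in normalized if ch not in allowed]
```
+
+   A second table for another language would raise exactly the
+   questions posed above about which code points it contains and
+   whether its mappings collide with the first.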
+
+   Another example arose recently that further illustrates the problem.
+   If one were to use Cyrillic characters to represent the country code
+   for Russia in a localized equivalent to the ccTLD label, the
+   characters themselves would be indistinguishable from the Latin
+   characters "P" and "Y" (in either lower- or uppercase) in most fonts.
+   We presume this might cause some consternation in Paraguay.
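+
+   The two strings can be compared at the code point level.  The
+   sketch below is an illustration, not part of the original memo; the
+   helper name describe is hypothetical.
+
```python
import unicodedata

latin = "py"                 # U+0070 U+0079
cyrillic = "\u0440\u0443"    # visually similar Cyrillic letters


def describe(s: str) -> list:
    """List each character of `s` as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s]


# Normalization does not relate the two strings: they are different
# labels as far as IDNA and the DNS are concerned.
same_after_nfkc = (
    unicodedata.normalize("NFKC", latin)
    == unicodedata.normalize("NFKC", cyrillic)
)
```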
+
+   These difficulties can never be completely eliminated by algorithmic
+   means.  Some of the problem can be addressed by appropriate tuning of
+   the protocols and their tables, other parts by registry actions to
+   reduce confusion and conflicts, and still other parts can be
+   addressed by careful design of user interfaces in application
+   programs.  But, ultimately, some responsibility to avoid being
+   tricked or harmfully confused will rest with the user.
+
+   Another registry technique that has been extensively explored
+   involves looking at confusable characters and confusion between
+   complete labels, restricting the labels that can be registered based
+   on relationships to what is registered already.  Registries that
+   adopt this approach might establish special mapping rules such as:
+
+   1.  If you register something with code point A, domain names with B
+       instead of A will be blocked from registration by others (where B
+       is a character at a separate code point that has a confusingly
+       similar appearance to A).
+
+   2.  If you register something with code point A, you also get the
+       domain name with B instead of A.
+
+   These approaches are discussed in more detail for "CJK" characters in
+   RFC 3743 [RFC3743] and more generally in RFC 4290 [RFC4290].
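+
+   The two rules above can be sketched as a toy registry.  This is an
+   illustration only, not the mechanism of RFC 3743 or RFC 4290: the
+   CANONICAL table, the variant_key function, and the Registry class
+   are hypothetical, and a real variant table would be far larger and
+   defined per registry.
+
```python
# Hypothetical variant table: each listed code point is treated as a
# variant of the code point it maps to.
CANONICAL = {
    "\u0430": "a",   # CYRILLIC SMALL LETTER A  -> Latin a
    "\u0440": "p",   # CYRILLIC SMALL LETTER ER -> Latin p
    "\u0443": "y",   # CYRILLIC SMALL LETTER U  -> Latin y
}


def variant_key(label: str) -> str:
    """Map every character to its canonical variant."""
    return "".join(CANONICAL.get(ch, ch) for ch in label)


class Registry:
    def __init__(self, bundle_variants: bool):
        self.bundle_variants = bundle_variants  # True selects rule 2
        self.holder = {}        # variant key -> registrant
        self.delegated = set()  # labels actually placed in the zone

    def register(self, label: str, registrant: str) -> bool:
        current = self.holder.setdefault(variant_key(label), registrant)
        if current != registrant:
            return False        # rule 1: variants are blocked for others
        self.delegated.add(label)
        return True

    def resolves(self, label: str) -> bool:
        if label in self.delegated:
            return True
        # Rule 2: the holder also gets the variants of a registered label.
        return self.bundle_variants and variant_key(label) in self.holder
```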
+
+2.2.7.  The IESG Statement and IDNA Issues
+
+   The issues above, at least as they were understood at the time,
+   provided the background for the IESG statement included in
+   Section 1.6.1 (which, in turn, was part of the basis for the initial
+   ICANN Guidelines) that a registry should have a policy about the
+   scripts, languages, code points and text directions for which
+   registrations will be accepted.  While "accept all" might be an
+   acceptable policy, it implies there is also a dispute resolution
+   process that takes the problems listed above into account.  This
+   process must be designed for dealing with all types of potential
+   disputes.  For example, issues might arise between registrant and
+   registry over a decision by the registry on collisions with already
+   registered domain names and between registrant and trademark holder
+   (that a domain name infringes on a trademark).  In both cases, the
+   parties disagreeing have different views on whether two strings are
+   "equivalent" or not.  They may believe that a string that is not
+   allowed to be registered is actually different from one that is
+   already registered.  Or they might believe that two strings are the
+   same, even though the rules adopted by the registry to prevent
+   confusion define them as two different domain names.
+
+3.  Migrating to New Versions of Unicode
+
+3.1.  Versions of Unicode
+
+   While opinions differ about how important the issues are in practice,
+   the use of Unicode and its supporting tables for IDNA appears to be
+   far more sensitive to subtle changes than it is in typical Unicode
+   applications.  This may be, at least in part, because many other
+   applications are internally sensitive only to the appearance of
+   characters and not to their representation.  Or those applications
+   may be able to take effective advantage of script, language, or
+   character class identification.  The working group that developed
+   IDNA concluded that attempting to encode any ancillary character
+   information into the DNS label would be impractical and unwise, and
+   the IAB, based in part on the comments in the ad hoc committee, saw
+   no reason to review that decision.
+
+   The Unicode Consortium has sometimes used the likelihood of a
+   combination of characters actually appearing in a natural language as
+   a criterion for the safety of a possible change.  However, as
+   discussed above, DNS names are often fabrications -- abbreviations,
+   strings deliberately formed to be unusual, members of a series
+   sequenced by numbers or other characters, and so on.  Consequently, a
+   criterion that considers a change to be safe if it would not be
+   visible in properly-constructed running text is not helpful for DNS
+   purposes: a change that would be safe under that criterion could
+   still be quite problematic for the DNS.
+
+   This sensitivity to changes has made it quite difficult to migrate
+   IDNA from one version of Unicode to the next if any changes are made
+   that are not strictly additive.  A change in a code point assignment
+   or definition may be extremely disruptive if a DNS label has been
+   defined using the earlier form and any of its previous components has
+   been moved from one table position or normalization rule to another.
+   Unicode normalization tables, tables of scripts or languages and
+   characters that belong to them, and even tables of confusable
+   characters as an adjunct to security recommendations may be very
+   helpful in designing registry restrictions on registrations and
+   applications provisions for avoiding or identifying suspicious names.
+   Ironically, they also extend the sensitivity of IDNA and its
+   implementations to all forms of change between one version of Unicode
+   and the next.  Consequently, they make Unicode version migration more
+   difficult.
+
+   An example of the type of change that appears to be just a small
+   correction from one perspective but may be problematic from another
+   was the correction to the normalization definition in 2004
+   [Unicode-PR29].  Community input suggested that the change would
+   cause problems for Stringprep, but the Unicode Technical Committee
+   decided, on balance, that the change was worthwhile.  Because of
+   difficulties with consistency, some deployed implementations have
+   decided to adopt the change and others have not, leading to subtle
+   incompatibilities.
+
+   This situation leads to a dilemma.  On the one hand, it is completely
+   unacceptable to freeze IDNA at a Unicode version level that excludes
+   more recently-defined characters and scripts that are important to
+   those who use them.  On the other hand, it is equally unacceptable to
+   migrate from one version of Unicode to the next if such migration
+   might invalidate an existing registered DNS name or some of its
+   registered properties or might make the string or representation of
+   that name ambiguous.  If IDNA is to be modified to accommodate new
+   versions of Unicode, the IETF will need to work with the Unicode
+   Consortium and other bodies to find an appropriate balance in this
+   area, but progress will be possible only if all relevant parties are
+   able to fairly consider and discuss possible decisions that may be
+   very difficult and unpalatable.
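+
+   The first half of the dilemma can be made concrete with Python's
+   unicodedata module, which ships a frozen Unicode 3.2 database
+   (ucd_3_2_0) next to the current one.  The sketch is an
+   illustration, not part of the original memo; the function name
+   unassigned_in_3_2 and the sample character are assumptions.
+
```python
import unicodedata
from unicodedata import ucd_3_2_0


def unassigned_in_3_2(label: str) -> list:
    """List code points in `label` that Unicode 3.2 does not define."""
    return [
        f"U+{ord(ch):04X}"
        for ch in label
        if ucd_3_2_0.category(ch) == "Cn"   # "Cn" means unassigned
    ]


# LATIN CAPITAL LETTER SHARP S was added to Unicode after version 3.2.
sample = "\u1e9e"
category_now = unicodedata.category(sample)
category_3_2 = ucd_3_2_0.category(sample)
```
+
+   A protocol frozen at the older version has to reject or ignore such
+   characters, however important they are to those who use them.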
+
+   It would also prove useful if, during the course of that dialog, the
+   need for Unicode Consortium concern with security issues in
+   applications of the Unicode character set could be clarified.  It
+   would be unfortunate, from almost every perspective considered
+   here, if such matters slowed the inclusion of as yet unencoded
+   scripts.
+
+3.2.  Version Changes and Normalization Issues
+
+3.2.1.  Unnormalized Combining Sequences
+
+   One of the advantages of the Unicode model of combining characters,
+   as with previous systems that use character overstriking to
+   accomplish similar purposes, is that it is possible to use sequences
+   of code points to generate characters that are not explicitly
+   provided for in the character set.  However, unless sequences that
+   are not explicitly provided for are prohibited by some mechanism
+   (such as the normalization tables), such combining sequences can
+   permit two related dangers.
+
+   o  The first is another risk of character confusion, especially if
+      the relationship between a combining character and the characters
+      it combines with is not precisely defined or if unexpected
+      combinations of combining characters are used.  That issue is
+      discussed in more detail, with an example, in Section 2.2.3.
+
+   o  The second is that these same issues inherently affect the
+      stability of the normalization tables.  Suppose that, somewhere in
+      the world, there is a character that looks like a Roman-derived
+      lowercase "i", but
+      with three (not one or two) dots above it.  And suppose that the
+      users of that character agree to represent it by combining a
+      traditional "i" (U+0069) with a combining diaeresis (U+0308).  So
+      far, no problem.  But, later, a broader need for this character is
+      discovered and it is coded into Unicode either as a single
+      precomposed character or, more likely under existing rules, by
+      introducing a three-dot-above combining character.  In either
+      case, that version of Unicode should include a rule in NFKC that
+      maps the "i"-plus-diaeresis sequence into the new, approved, one.
+      If one does not do so, then there is arguably a normalization that
+      should occur that does not.  If one does so, then strings that
+      were valid and normalized (although unanticipated) under the
+      previous versions of Unicode become unnormalized under the new
+      version.  That, in turn, would impact IDNA comparisons because,
+      effectively, it would introduce a change in the matching rules.
+
+   It would be useful to consider rules that would avoid or minimize
+   these problems, with the understanding that, for reasons given
+   elsewhere, simply minimizing them may not be enough for IDNA.  One
+   partial solution might be to ban any combination of a base character
+   and a combining character that does not appear in a hypothetical
+   "anticipated combinations" table from being used in a domain name
+   label.  The next subsection discusses a more radical, if impractical,
+   view of the problem and its solutions.
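+
+   One way to approximate such a table, offered purely as an
+   assumption for illustration, is to treat a base-plus-mark pair as
+   "anticipated" only when Unicode already provides a precomposed code
+   point for it.  The function name below is hypothetical.
+
```python
import unicodedata


def has_unanticipated_combination(label: str) -> bool:
    """True if any combining mark survives NFC composition.

    Assumption: a combination counts as "anticipated" only when NFC
    can replace it with a single precomposed code point.
    """
    composed = unicodedata.normalize("NFC", label)
    # General categories Mn, Mc, and Me are the combining marks.
    return any(unicodedata.category(ch).startswith("M") for ch in composed)
```
+
+   A rule this strict would also reject every script that relies on
+   combining marks with no precomposed alternatives, so it is at most
+   a starting point for the discussion.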
+
+3.2.2.  Combining Characters and Character Components
+
+   For several reasons, including those discussed above, one thing that
+   increases IDNA complexity and the need for normalization is that
+   combining characters are permitted.  Without them, complexity might
+   be reduced enough to permit easier transitions to new versions.  The
+   community should consider the impact of entirely prohibiting
+   combining characters from IDNs.  While it is almost certainly
+   unfeasible to introduce this change into Unicode as it is now defined
+   and doing so would be extremely disruptive even if it were feasible,
+   the thought experiment can be helpful in understanding both the
+   issues and the implications of the paths not taken.  For example, one
+   consequence of this, of course, is that each new language or script,
+   and several existing ones, would require that all of its characters
+   have Unicode assignments to specific, precomposed, code points.
+
+   Note that this is not currently permitted within Unicode for Latin
+   scripts.  For non-Latin scripts, some such code points have been
+   defined.  The decisions that govern the assignment of such code
+   points are managed entirely within the Unicode Consortium.  Were the
+   IETF to choose to reduce IDNA complexity by excluding combining
+   characters, no doubt there would be additional input to the Unicode
+   Consortium from users and proponents of scripts that precomposed
+   characters be required.  The IAB and the IETF should examine whether
+   it is appropriate to press the Unicode Consortium to revise these
+   policies or otherwise to recommend actions that would reduce the need
+   for normalization and the related complexities.  However, we have
+   been told that the Technical Committee does not believe it is
+   reasonable or feasible to add all possible precomposed characters to
+   Unicode.  If Unicode cannot be modified to contain the precomposed
+   characters necessary to support existing languages and scripts, much
+   less new ones, this option for IDN restrictions will not be feasible.
+
+3.2.3.  When does normalization occur?
+
+   In many Unicode applications, the preferred solution is to pick a
+   style of normalization and require that all text that is stored or
+   transmitted be normalized to that form.  (This is the approach taken
+   in ongoing work in the IETF on a standard Unicode text form
+   [net-utf8]).  IDNA does not impose this requirement.  Text is
+   normalized and case-reduced at registration time, and only the
+   normalized version is placed in the DNS.  However, there is no
+   requirement that applications show only the native (and lower-case
+   where appropriate) characters associated with the normalized form in
+   discussions or references such as URLs.  If conventions used for
+   all-ASCII DNS labels are to be extended to internationalized forms,
+   such a requirement would be unreasonable, since it would prohibit the
+   use of mixed-case references for clarity or market identification.
+   It might even be culturally inappropriate.  However, without that
+   restriction, the comparison that will ultimately be made in the DNS
+   will be between strings normalized at different times and under
+   different versions of Unicode.  The assertion that a string in
+   normalized form under one version of Unicode will still be in
+   normalized form under all future versions is not sufficient.
+   Normalization at different times also requires that a given source
+   string always normalizes to the same target string, regardless of the
+   version under which it is normalized.  That criterion is much more
+   difficult to fulfill.  The discussion above suggests that it may even
+   be impossible.
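+
+   The cross-version comparison described above can be sketched in a
+   few lines of Python.  This is only an illustration under stated
+   assumptions: it uses the Unicode 3.2 database that CPython ships as
+   "unicodedata.ucd_3_2_0" alongside the interpreter's current
+   database, and NFKC as the normalization form.  The helper names are
+   hypothetical, and agreement on one sample string says nothing about
+   strings in general.
```python
import unicodedata

# Unicode 3.2 database bundled with CPython (the version IDNA is tied to).
UCD_OLD = unicodedata.ucd_3_2_0


def nfkc_old(text: str) -> str:
    """Normalize to NFKC using the Unicode 3.2 tables."""
    return UCD_OLD.normalize("NFKC", text)


def nfkc_current(text: str) -> str:
    """Normalize to NFKC using the interpreter's current Unicode tables."""
    return unicodedata.normalize("NFKC", text)


def normalizes_consistently(text: str) -> bool:
    """True if the same source string yields the same target string
    under both Unicode versions."""
    return nfkc_old(text) == nfkc_current(text)


if __name__ == "__main__":
    decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
    print(UCD_OLD.unidata_version, unicodedata.unidata_version)
    print(normalizes_consistently(decomposed))
```
+   A check of this kind would have to hold for every string that can
+   be registered, not just for sampled ones, which is why the
+   criterion is so hard to satisfy.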
+
+   Ignoring these issues with combining characters entirely, as IDNA
+   effectively does today, may leave us "stuck" at Unicode 3.2, leading
+   either to incompatibilities with applications that otherwise
+   use a modern version of Unicode (while IDN remains at Unicode 3.2) or
+   to painful transitions to new versions.  If decisions are made
+   quickly, it may still be possible to make a one-time version upgrade
+   to Version 4.1 or Version 5 of Unicode.  However, unless we can
+   impose sufficient global restrictions to permit smooth transitions,
+   upgrades to versions beyond that one are likely to be painful (e.g.,
+   potentially requiring changing strings already in the DNS or even a
+   new Punycode prefix) or impossible.
+
+4.  Framework for Next Steps in IDN Development
+
+4.1.  Issues within the Scope of the IETF
+
+4.1.1.  Review of IDNA
+
+   The IETF should consider reviewing RFCs 3454, 3490, 3491, and/or
+   3492, and update, replace, or supplement them to meet the criteria of
+   this paragraph (one or more of them may prove impractical after
+   further study).  Any new versions or additional specifications should
+   be adapted to the version of Unicode that is current when they are
+   created.  Ideally, they should specify a path for adapting to future
+   versions of Unicode (some suggestions below may facilitate this).
+   The IETF should also consider whether there are significant
+   advantages to mapping some groups of characters, such as code points
+   assigned to font variations, into others or whether clarity and
+   comprehensibility for the user would be better served by simply
+   prohibiting those characters.  More generally, it appears that it
+   would be worthwhile for the IETF to review whether the Unicode
+   normalization rules now invoked by the Stringprep profile in Nameprep
+   are optimal for the DNS or whether more restrictive rules, or an even
+   more restrictive set of permitted character combinations, would
+   provide better support for DNS internationalization.
+
+   The IAB has concluded that there is a consensus within the broader
+   community that lists of code points should be specified by the use of
+   an inclusion-based mechanism (i.e., identifying the characters that
+   are permitted), rather than by excluding a small number of characters
+   from the total Unicode set as Stringprep and Nameprep do today.  That
+   conclusion should be reviewed by the IETF community and action taken
+   as appropriate.
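+
+   The difference between the two mechanisms can be sketched as
+   follows.  Both tables here are hypothetical and far smaller than
+   anything a real specification would contain; they are not taken
+   from Stringprep, Nameprep, or any other document.  The point is
+   only that a character nobody thought to list is rejected by an
+   inclusion list but accepted by an exclusion list.
```python
# Hypothetical inclusion table: only these characters may appear in a label.
PERMITTED = set("abcdefghijklmnopqrstuvwxyz0123456789-") | {
    "\u00e5",  # LATIN SMALL LETTER A WITH RING ABOVE
    "\u00e4",  # LATIN SMALL LETTER A WITH DIAERESIS
    "\u00f6",  # LATIN SMALL LETTER O WITH DIAERESIS
}

# Hypothetical exclusion table: everything except these is allowed.
EXCLUDED = {" ", "\u00a0", "\u3002"}


def allowed_by_inclusion(label: str) -> bool:
    """Accept a label only if every character is explicitly permitted."""
    return all(ch in PERMITTED for ch in label)


def allowed_by_exclusion(label: str) -> bool:
    """Accept a label unless it contains an explicitly excluded character."""
    return not any(ch in EXCLUDED for ch in label)


if __name__ == "__main__":
    unexpected = "a\u2500b"  # contains BOX DRAWINGS LIGHT HORIZONTAL
    print(allowed_by_exclusion(unexpected))  # not listed, so it slips through
    print(allowed_by_inclusion(unexpected))  # not permitted, so it is rejected
```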
+
+   We suggest that the individuals doing the review of the code points
+   should work as a specialized design team.  To the extent possible,
+   that work should be done jointly by people with experience from the
+   IETF and deep knowledge of the constraints of the DNS and application
+   design, participants from the Unicode Consortium, and other people
+   necessary to be able to reach a generally-accepted result.  Because
+   any work along these lines would be modifications and updates to
+   standards-track documents, final review and approval of any proposals
+   would necessarily follow normal IETF processes.
+
+   It is worth noting that sufficiently extreme changes to IDNA would
+   require a new Punycode prefix, probably with long-term support for
+   both the old prefix and the new one in both registration arrangements
+   and applications.  An alternative, which is almost certainly
+   impractical, would be some sort of "flag day", i.e., a date on which
+   the old rules are simultaneously abandoned by everyone and the new
+   ones adopted.  However, preliminary analysis indicates that few, if
+   any, of the changes recommended for consideration elsewhere in this
+   document would require this type of version change.  For example,
+   suppose additional restrictions, such as those implied above, are
+   imposed on what can be registered.  Those restrictions might require
+   policy decisions about how labels are to be disposed of if they
+   conformed to the earlier rules but not to the new ones.  But they
+   would not inherently require changes in the protocol or prefix.
+
+4.1.2.  Non-DNS and Above-DNS Internationalization Approaches
+
+   The IETF should once again examine the extent to which it is
+   appropriate to try to solve internationalization problems via the DNS
+   and what place the many varieties of so-called "keyword systems" or
+   other Internet navigational techniques might have.  Those techniques
+   can be designed to impose fewer constraints, or at least different
+   constraints, than IDNA and the DNS.  As discussed elsewhere in this
+   document, IDNA cannot support information about scripts, languages,
+   or Unicode versions on lookup.  As a consequence of the nature of DNS
+   lookups, characters and labels either match or do not match; a near-
+   match is simply not a possible concept in the DNS.  By contrast,
+   observation of near-matching is common in human communication and in
+   matching operations performed by people, especially when they have a
+   particular script or language context in mind.  The DNS is further
+   constrained by a fairly rigid internal aliasing system (via CNAME and
+   DNAME resource records), while some applications of international
+   naming may require more flexibility.  Finally, the rigid hierarchy of
+   the DNS --and the tendency in practice for it to become flat at
+   levels nearest the root-- and the need for names to be unique are
+   more suitable for some purposes than others and may not be a good
+   match for some purposes for which people wish to use IDNs.  Each of
+   these constraints can be relaxed or changed by one or more systems
+   that would provide alternatives to direct use of the DNS by users.
+   Some of the issues involved are discussed further in Section 5.3 and
+   various ideas have been discussed in detail in the IETF or IRTF.
+   Many of those ideas have even been described in Internet Drafts or
+   other documents.  As experience with IDNs and with expectations for
+   them accumulates, it will probably become appropriate for the IETF or
+   IRTF to revisit the underlying questions and possibilities.
+
+4.1.3.  Security Issues, Certificates, etc.
+
+   Some characters look like others, often as the result of common
+   origins.  The problem with these "confusable" characters, often
+   incorrectly called homographs, has always existed when characters are
+   presented to humans who interpret what is displayed and then make
+   decisions based on what is seen.  This is not a problem that exists
+   only when working with internationalized domain names, but they make
+   the problem worse.  The result of a survey that would explain what
+   the problems are might be interesting.  Many of these issues are
+   mentioned in Unicode Technical Report #36 [UTR36].
+
+   In this and other issues associated with IDNs, precise use of
+   terminology is important lest even more confusion result.  The
+   definition of the term 'homograph' that normally appears in
+   dictionaries and linguistic texts states that homographs are
+   different words that are spelled identically (for example, the
+   adjective 'brief' meaning short, the noun 'brief' meaning a document,
+   and the verb 'brief' meaning to inform).  By definition, letters in
+   two different alphabets are not the same, regardless of similarities
+   in appearance.  This means that sequences of letters from two
+   different scripts that appear to be identical on a computer display
+   cannot be homographs in the accepted sense, even if they are both
+   words in the dictionary of some language.  Assuming that there is a
+   language written with Cyrillic script in which "cap" is a word,
+   regardless of what it might mean, it is not a homograph of the
+   Latin-script English word "cap".
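+
+   The "cap" example can be made concrete with a short Python sketch.
+   It assumes the Cyrillic string is spelled with the letters ES, A,
+   and ER (U+0441, U+0430, U+0440); the helper names are hypothetical.
+   The two strings may render alike, yet they consist of different
+   code points, and normalization does not map one onto the other.
```python
import unicodedata

LATIN_CAP = "cap"                    # U+0063 U+0061 U+0070
CYRILLIC_CAP = "\u0441\u0430\u0440"  # Cyrillic ES, A, ER


def describe(text: str) -> list[str]:
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


def same_after_nfkc(a: str, b: str) -> bool:
    """True if the two strings are identical once normalized to NFKC."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


if __name__ == "__main__":
    print(describe(LATIN_CAP))
    print(describe(CYRILLIC_CAP))
    print(same_after_nfkc(LATIN_CAP, CYRILLIC_CAP))
```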
+
+   When the security implications of visually confusable characters were
+   brought to the forefront in 2005, the term homograph was used to
+   designate any instance of graphic similarity, even when comparing
+   individual characters.  This usage is not only incorrect, but risks
+   introducing even more confusion and hence should be avoided.  The
+   current preferred terminology is to describe these similar-looking
+   characters as "confusable characters" or even "confusables".
+
+   Many people have suggested that confusable characters are a problem
+   that must be addressed, at least in part, directly in the user
+   interfaces of application software.  While it should almost certainly
+   be part of a complete solution, that approach creates its own set of
+   difficulties.  For example, a user switching between systems, or even
+   between applications on the same system, may be surprised by
+   different types of behavior and different levels of protection.  In
+   addition, it is unclear how a secure setup for the end user should be
+   designed.  Today, in the web browser, a padlock is a traditional way
+   of describing some level of security for the end user.  Is this
+   binary signaling enough?  Should there be any connection between a
+   risk for a displayed string including confusable characters and the
+   padlock or similar signaling to the user?
+
+   Many web browsers have adopted a convention, based on a "whitelist"
+   or similar technique, of restricting the display of native characters
+   to subdomains of top-level domains that are deemed to have safe
+   practices for the registration of potentially confusable labels.
+   IDNs in other domains are displayed as Punycode.  These techniques
+   may not be sufficiently sensitive to differences in policies among
+   top-level domains and their subdomains and so, while they are clearly
+   helpful, they may not be adequate.  Are other methods of dealing with
+   confusable characters possible?  Would other methods of identifying
+   and listing policies about avoiding confusing registrations be
+   feasible and helpful?
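+
+   A minimal sketch of the whitelist convention described above,
+   assuming a hypothetical one-entry list of trusted top-level domains
+   and using Python's built-in "idna" codec (which implements the
+   RFC 3490 ToASCII operation) to produce the Punycode form.  No
+   browser's actual list or logic is reproduced here.
```python
# Hypothetical whitelist; real browsers maintain their own, differing lists.
TRUSTED_TLDS = {"de"}


def to_ace(name: str) -> str:
    """Convert a domain name to its ASCII-compatible (xn--) form."""
    return name.encode("idna").decode("ascii")


def display_form(name: str, trusted_tlds: set[str] = TRUSTED_TLDS) -> str:
    """Show native characters only under trusted TLDs, else Punycode."""
    tld = name.rsplit(".", 1)[-1].lower()
    if tld in trusted_tlds:
        return name
    return to_ace(name)


if __name__ == "__main__":
    print(display_form("b\u00fccher.de"))   # trusted TLD: native form
    print(display_form("b\u00fccher.com"))  # other TLD: xn-- form
```
+   Because the decision depends only on the top-level domain, such a
+   check cannot reflect policy differences among the zones below it.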
+
+   It would be interesting to see a more coordinated effort in
+   establishing guidelines for user interfaces.  If nothing else, the
+   current whitelists are browser-specific and can, and do, differ
+   between implementations.
+
+4.1.4.  Protocol Changes and Policy Implications
+
+   Some potential protocol or table changes raise important policy
+   issues about what to do with existing, registered, names.  Should
+   such changes be needed, their impact must be carefully evaluated in
+   the IETF, ICANN, and possibly other forums.  In particular, protocol
+   or policy changes that would not permit existing names to be
+   registered under the newer rules should be considered carefully,
+   balancing their importance against possible disruption and the issues
+   of invalidating older names against the importance of consistency as
+   seen by the user.
+
+4.1.5.  Non-US-ASCII in Local Part of Email Addresses
+
+   Work is going on in the IETF related to the local part of email
+   addresses.  Note that the local part of an email address has a very
+   different syntax and different constraints than a domain name label,
+   so IDNA cannot be applied directly to the local part.
+
+4.1.6.  Use of the Unicode Character Set in the IETF
+
+   Unicode and the closely-related ISO 10646 are the only coded
+   character sets that aspire to include all of the world's characters.
+   As such, they permit use of international characters without having
+   to identify particular character coding standards or tables.  The
+   requirement for a single character set is particularly important for
+   use with the DNS since there is no place to put character set
+   identification.  The decision to use Unicode as the base for IETF
+   protocols going forward is discussed in [RFC2277].  The IAB does not
+   see any reason to revisit the decision to use Unicode in IETF
+   protocols.
+
+4.2.  Issues That Fall within the Purview of ICANN
+
+4.2.1.  Dispute Resolution
+
+   IDNs create new types of collisions between trademarks and domain
+   names as well as collisions between domain names.  These have impact
+   on dispute resolution processes used by registries and otherwise.  It
+   is important that deployment of IDNs evolve in parallel with review
+   and updating of ICANN or registry-specific dispute resolution
+   processes.
+
+4.2.2.  Policy at Registries
+
+   The IAB recommends that registries use an inclusion-based model when
+   choosing what characters to allow at the time of registration.  This
+   list of characters is in turn to be a subset of what is allowed
+   according to the updated IDNA standard.  The IAB further recommends
+   that registries develop their inclusion-based models in parallel with
+   dispute resolution processes at the registry itself.
+
+   Most established policies for dealing with claimed or apparent
+   confusion or conflicts of names are based on dispute resolution.
+   Decisions about legitimate use or registration of one or more names
+   are resolved at or after the time of registration on a case-by-case
+   basis and using policies that are specific to the particular DNS zone
+   or jurisdiction involved.  These policies have generally not been
+   extended below the level of the DNS that is directly controlled by
+   the top-level registry.
+
+   Because of the number of conflicts that can be generated by the
+   larger number of available and confusable characters in Unicode, we
+   recommend that registration-restriction and dispute resolution
+   policies be developed to constrain registration of IDNs and zone
+   administrators at all levels of the DNS tree.  Of course, many of
+   these policies will be less formal than others and there is no
+   requirement for complete global consistency, but the arguments for
+   reduction of confusable characters and other issues in TLDs should
+   apply to all zones below that specific TLD.
+
+   Consistency across all zones can obviously only be accomplished by
+   changes to the protocols.  Such changes should be considered by the
+   IETF if particular restrictions are identified that are important and
+   consistent enough to be applied globally.
+
+   Some potential protocol changes or changes to character-mapping
+   tables might, if adopted, have profound registry policy implications.
+   See Section 4.1.4.
+
+4.2.3.  IDNs at the Top Level of the DNS
+
+   The IAB has concluded that there is not one issue with IDNs at the
+   top level of the DNS (IDN TLDs) but at least three very separate
+   ones:
+
+   o  If IDNs are to be entered in the root zone, decisions must first
+      be made about how these TLDs are to be named and delegated.  These
+      decisions fall within the traditional IANA scope and are ICANN
+      issues today.
+
+   o  There has been discussion of permitting some or all existing TLDs
+      to be referenced by multiple labels, with those labels presumably
+      representing some understanding of the "name" of the TLD in
+      different languages.  If actual aliases of this type are desired
+      for existing domains, the IETF may need to consider whether the
+      use of DNAME records in the root is appropriate to meet that need,
+      what constraints, if any, are needed, whether alternate
+      approaches, such as those of [RFC4185], are appropriate or whether
+      further alternatives should be investigated.  But, to the extent
+      to which aliases are considered desirable and feasible, decisions
+      presumably must be made as to which, if any, root IDN labels
+      should be associated with DNAME records and which ones should be
+      handled by normal delegation records or other mechanisms.  That
+      decision is one of DNS root-level namespace policy and hence falls
+      to ICANN although we would expect ICANN to pay careful attention
+      to any technical, operational, or security recommendations that
+      may be produced by other bodies.
+
+   o  Finally, if IDN labels are to be placed in the root zone, there
+      are issues associated with how they are to be encoded and
+      deployed.  This area may have implications for work that has been
+      done, or should be done, in the IETF.
+
+5.  Specific Recommendations for Next Steps
+
+   Consistent with the framework described above, the IAB offers these
+   recommendations as steps for further consideration in the identified
+   groups.
+
+5.1.  Reduction of Permitted Character List
+
+   Generalize from the original "hostname" rules to non-ASCII
+   characters, permitting as few characters as possible to do that job.
+   This would involve a restrictive model for characters permitted in
+   IDN labels, thus contrasting with the approach used to develop the
+   original IDNA/Nameprep tables.  That approach was to include all
+   Unicode characters that there was not a clear reason to exclude.
+
+   The specific recommendation here is to specify such internationalized
+   hostnames.  Such an activity would fall to the IETF, although the
+   task of developing the appropriate list of permitted characters will
+   require effort both in the IETF and elsewhere.  The effort should be
+   as linguistically and culturally sensitive as possible, but smooth
+   and effective operation of the DNS, including minimizing of
+   complexity, should be primary goals.  The following should be
+   considered as possible mechanisms for achieving an appropriate
+   minimum number of characters.
+
+5.1.1.  Elimination of All Non-Language Characters
+
+   Unicode characters that are not needed to write words or numbers in
+   any of the world's languages should be eliminated from the list of
+   characters that are appropriate in DNS labels.  In addition to such
+   characters as those used for box-drawing and sentence punctuation,
+   this should exclude punctuation for word structure and other
+   delimiters.  While DNS labels may conveniently be used to express
+   words in many circumstances, the goal is not to express words (or
+   sentences or phrases), but to permit the creation of unambiguous
+   labels with good mnemonic value.
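+
+   One way to approximate such a filter is by Unicode general
+   category.  The sketch below treats letters (L*), marks (M*), and
+   numbers (N*) as "language characters" and flags everything else.
+   That mapping is an assumption made for illustration, not a rule
+   from this document; a real list would need finer distinctions (the
+   N* classes, for instance, also cover fractions and other numeric
+   symbols).
```python
import unicodedata

# Assumed proxy: major general-category classes kept as language characters.
ALLOWED_MAJOR_CLASSES = {"L", "M", "N"}


def non_language_chars(label: str) -> list[str]:
    """Return the characters of label that fall outside the allowed classes."""
    return [
        ch for ch in label
        if unicodedata.category(ch)[0] not in ALLOWED_MAJOR_CLASSES
    ]


if __name__ == "__main__":
    print(non_language_chars("caf\u00e99"))  # letters and a digit only
    print(non_language_chars("abc\u2500"))   # box-drawing character is flagged
    print(non_language_chars("a!b"))         # sentence punctuation is flagged
```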
+
+5.1.2.  Elimination of Word-Separation Punctuation
+
+   The inclusion of the hyphen in the original hostname rules is a
+   historical artifact from an older, flat, namespace.  The community
+   should consider whether it is appropriate to treat it as a simple
+   legacy property of ASCII names and not attempt to generalize it to
+   other scripts.  We might, for example, not permit claimed equivalents
+   to the hyphen from other scripts to be used in IDNs.  We might even
+   consider banning use of the hyphen itself in non-ASCII strings or,
+   less restrictively, strings that contained non-Latin characters.
+
+5.2.  Updating to New Versions of Unicode
+
+   As new scripts, to support new languages, continue to be added to
+   Unicode, it is important that IDNA track updates.  If it does not do
+   so, but remains "stuck" at 3.2 or some single later version, it will
+   not be possible to include labels in the DNS that are derived from
+   words in languages that require characters that are available only in
+   later versions.  Making those upgrades is difficult, and will
+   continue to be difficult, as long as new versions require, not just
+   addition of characters, but changes to canonicalization conventions,
+   normalization tables, or matching procedures (see Section 3.1).
+   Anything that can be done to lower complexity and simplify forward
+   transitions should be seriously considered.
+
+5.3.  Role and Uses of the DNS
+
+   We wish to remind the community that there are boundaries to the
+   appropriate uses of the DNS.  It was designed and implemented to
+   serve some specific purposes.  There are additional things that it
+   does well, other things that it does badly, and still other things it
+   cannot do at all.  No amount of protocol work on IDNs will solve
+   problems with alternate spellings, near-matches, searching for
+   appropriate names, and so on.  Registration restrictions and
+   carefully-designed user interfaces can be used to reduce the risk and
+   pain of attempts to do some of these things gone wrong, as well as
+   reducing the risks of various sorts of deliberate bad behavior, but,
+   beyond a certain point, use of the DNS simply because it is available
+   becomes a bad tradeoff.  The tradeoff may be particularly unfortunate
+   when the use of IDNs does not actually solve the proposed problem.
+   For example, internationalization of DNS names does not eliminate the
+   ASCII protocol identifiers and structure of URIs [RFC3986] and even
+   IRIs [RFC3987].  Hence, DNS internationalization itself, at any or
+   all levels of the DNS tree, is not a sufficient response to the
+   desire of populations to use the Internet entirely in their own
+   languages and the characters associated with those languages.
+
+   These issues are discussed at more length, and alternatives
+   presented, in [RFC2825], [RFC3467], [INDNS], and [DNS-Choices].
+
+5.4.  Databases of Registered Names
+
+   In addition to their presence in the DNS, IDNs introduce issues in
+   other contexts in which domain names are used.  In particular, the
+   design and content of databases that bind registered names to
+   information about the registrant (commonly described as "whois"
+   databases) will require review and updating.  For example, the whois
+   protocol itself [RFC3912] has no standard capability for handling
+   non-ASCII text: one cannot search consistently for, or report, either
+   a DNS name or contact information that is not in ASCII characters.
+   This may provide some additional impetus for a switch to IRIS
+   [RFC3981] [RFC3982] but also raises a number of other questions about
+   what information, and in what languages and scripts, should be
+   included or permitted in such databases.
+
+6.  Security Considerations
+
+   This document is simply a discussion of IDNs and IDNA issues; it
+   raises no new security concerns.  However, if some of its
+   recommendations to reduce IDNA complexity, the number of available
+   characters, and various approaches to constraining the use of
+   confusable characters, are followed and prove successful, the risks
+   of name spoofing and other problems may be reduced.
+
+7.  Acknowledgements
+
+   The contributions to this report from members of the IAB-IDN ad hoc
+   committee are gratefully acknowledged.  Of course, not all of the
+   members of that group endorse every comment and suggestion of this
+   report.  In particular, this report does not claim to reflect the
+   views of the Unicode Consortium as a whole or those of particular
+   participants in the work of that Consortium.
+
+   The members of the ad hoc committee were: Rob Austein, Leslie Daigle,
+   Tina Dam, Mark Davis, Patrik Faltstrom, Scott Hollenbeck, Cary Karp,
+   John Klensin, Gervase Markham, David Meyer, Thomas Narten, Michael
+   Suignard, Sam Weiler, Bert Wijnen, Kurt Zeilenga, and Lixia Zhang.
+
+   Thanks are due to Tina Dam and others associated with the ICANN IDN
+   Working Group for contributions of considerable specific text, to
+   Marcos Sanz and Paul Hoffman for careful late-stage reading and
+   extensive comments, and to Pete Resnick for many contributions and
+   comments, both in conjunction with his former IAB service and
+   subsequently.  Olaf M. Kolkman took over IAB leadership for this
+   document after Patrik Faltstrom and Pete Resnick stepped down in
+   March 2006.
+
+   Members of the IAB at the time of approval of this document were:
+   Bernard Aboba, Loa Andersson, Brian Carpenter, Leslie Daigle, Patrik
+   Faltstrom, Bob Hinden, Kurtis Lindqvist, David Meyer, Pekka Nikander,
+   Eric Rescorla, Pete Resnick, Jonathan Rosenberg and Lixia Zhang.
+
+8.  References
+
+8.1.  Normative References
+
+   [ISO10646]          International Organization for Standardization,
+                       "Information Technology - Universal Multiple-
+                       Octet Coded Character Set (UCS) - Part 1:
+                       Architecture and Basic Multilingual Plane",
+                       ISO/IEC 10646-1:2000, October 2000.
+
+   [RFC3454]           Hoffman, P. and M. Blanchet, "Preparation of
+                       Internationalized Strings ("stringprep")",
+                       RFC 3454, December 2002.
+
+   [RFC3490]           Faltstrom, P., Hoffman, P., and A. Costello,
+                       "Internationalizing Domain Names in Applications
+                       (IDNA)", RFC 3490, March 2003.
+
+   [RFC3491]           Hoffman, P. and M. Blanchet, "Nameprep: A
+                       Stringprep Profile for Internationalized Domain
+                       Names (IDN)", RFC 3491, March 2003.
+
+   [RFC3492]           Costello, A., "Punycode: A Bootstring encoding of
+                       Unicode for Internationalized Domain Names in
+                       Applications (IDNA)", RFC 3492, March 2003.
+
+   [Unicode32]         The Unicode Consortium, "The Unicode Standard,
+                       Version 3.0", 2000.
+                       (Reading, MA, Addison-Wesley, 2000.  ISBN
+                       0-201-61633-5).  Version 3.2 consists of the
+                       definition in that book as amended by the Unicode
+                       Standard Annex #27: Unicode 3.1
+                       (http://www.unicode.org/reports/tr27/) and by the
+                       Unicode Standard Annex #28: Unicode 3.2
+                       (http://www.unicode.org/reports/tr28/).
+
+8.2.  Informative References
+
+   [DNS-Choices]       Faltstrom, P., "Design Choices When Expanding
+                       DNS", Work in Progress, June 2005.
+
+   [ICANNv1]           ICANN, "Guidelines for the Implementation of
+                       Internationalized Domain Names, Version 1.0",
+                       March 2003, <http://www.icann.org/general/
+                       idn-guidelines-20jun03.htm>.
+
+   [ICANNv2]           ICANN, "Guidelines for the Implementation of
+                       Internationalized Domain Names, Version 2.0",
+                       November 2005, <http://www.icann.org/general/
+                       idn-guidelines-20sep05.htm>.
+
+   [IESG-IDN]          Internet Engineering Steering Group (IESG), "IESG
+                       Statement on IDN", IESG Statements IDN Statement,
+                       February 2003, <http://www.ietf.org/IESG/
+                       STATEMENTS/IDNstatement.txt>.
+
+   [INDNS]             National Research Council, "Signposts in
+                       Cyberspace: The Domain Name System and Internet
+                       Navigation", National Academy Press ISBN 0309-
+                       09640-5 (Book) 0309-54979-5 (PDF), 2005, <http://
+                       www7.nationalacademies.org/cstb/pub_dns.html>.
+
+   [ISO.2022.1986]     International Organization for Standardization,
+                       "Information Processing: ISO 7-bit and 8-bit
+                       coded character sets: Code extension techniques",
+                       ISO Standard 2022, 1986.
+
+   [ISO.646.1991]      International Organization for Standardization,
+                       "Information technology - ISO 7-bit coded
+                       character set for information interchange",
+                       ISO Standard 646, 1991.
+
+   [ISO.8859.2003]     International Organization for Standardization,
+                       "Information processing - 8-bit single-byte coded
+                       graphic character sets - Part 1: Latin alphabet
+                       No. 1 (1998) - Part 2: Latin alphabet No. 2
+                       (1999) - Part 3: Latin alphabet No. 3 (1999) -
+                       Part 4: Latin alphabet No. 4 (1998) - Part 5:
+                       Latin/Cyrillic alphabet (1999) - Part 6: Latin/
+                       Arabic alphabet (1999) - Part 7: Latin/Greek
+                       alphabet (2003) - Part 8: Latin/Hebrew alphabet
+                       (1999) - Part 9: Latin alphabet No. 5 (1999) -
+                       Part 10: Latin alphabet No. 6 (1998) - Part 11:
+                       Latin/Thai alphabet (2001) - Part 13: Latin
+                       alphabet No. 7 (1998) - Part 14: Latin alphabet
+                       No. 8 (Celtic) (1998) - Part 15: Latin alphabet
+                       No. 9 (1999) - Part 16: Latin alphabet
+                       No. 10 (2001)", ISO Standard 8859, 2003.
+
+   [RFC2277]           Alvestrand, H., "IETF Policy on Character Sets
+                       and Languages", BCP 18, RFC 2277, January 1998.
+
+   [RFC2825]           IAB and L. Daigle, "A Tangled Web: Issues of
+                       I18N, Domain Names, and the Other Internet
+                       protocols", RFC 2825, May 2000.
+
+   [RFC3066]           Alvestrand, H., "Tags for the Identification of
+                       Languages", BCP 47, RFC 3066, January 2001.
+
+   [RFC3467]           Klensin, J., "Role of the Domain Name System
+                       (DNS)", RFC 3467, February 2003.
+
+   [RFC3536]           Hoffman, P., "Terminology Used in
+                       Internationalization in the IETF", RFC 3536,
+                       May 2003.
+
+   [RFC3743]           Konishi, K., Huang, K., Qian, H., and Y. Ko,
+                       "Joint Engineering Team (JET) Guidelines for
+                       Internationalized Domain Names (IDN) Registration
+                       and Administration for Chinese, Japanese, and
+                       Korean", RFC 3743, April 2004.
+
+   [RFC3912]           Daigle, L., "WHOIS Protocol Specification",
+                       RFC 3912, September 2004.
+
+
+
+
+Klensin, et al.              Informational                     [Page 34]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+   [RFC3981]           Newton, A. and M. Sanz, "IRIS: The Internet
+                       Registry Information Service (IRIS) Core
+                       Protocol", RFC 3981, January 2005.
+
+   [RFC3982]           Newton, A. and M. Sanz, "IRIS: A Domain Registry
+                       (dreg) Type for the Internet Registry Information
+                       Service (IRIS)", RFC 3982, January 2005.
+
+   [RFC3986]           Berners-Lee, T., Fielding, R., and L. Masinter,
+                       "Uniform Resource Identifier (URI): Generic
+                       Syntax", STD 66, RFC 3986, January 2005.
+
+   [RFC3987]           Duerst, M. and M. Suignard, "Internationalized
+                       Resource Identifiers (IRIs)", RFC 3987,
+                       January 2005.
+
+   [RFC4185]           Klensin, J., "National and Local Characters for
+                       DNS Top Level Domain (TLD) Names", RFC 4185,
+                       October 2005.
+
+   [RFC4290]           Klensin, J., "Suggested Practices for
+                       Registration of Internationalized Domain Names
+                       (IDN)", RFC 4290, December 2005.
+
+   [RFC4645]           Ewell, D., "Initial Language Subtag Registry",
+                       RFC 4645, September 2006.
+
+   [RFC4646]           Phillips, A. and M. Davis, "Tags for Identifying
+                       Languages", BCP 47, RFC 4646, September 2006.
+
+   [UTR]               Unicode Consortium, "Unicode Technical Reports",
+                       <http://www.unicode.org/reports/>.
+
+   [UTR36]             Davis, M. and M. Suignard, "Unicode Technical
+                       Report #36: Unicode Security Considerations",
+                       November 2005, <http://www.unicode.org/draft/
+                       reports/tr36/tr36.html>.
+
+   [UTR39]             Davis, M. and M. Suignard, "Unicode Technical
+                       Standard #39 (proposed): Unicode Security
+                       Considerations", July 2005, <http://
+                       www.unicode.org/draft/reports/tr39/tr39.html>.
+
+   [Unicode-PR29]      The Unicode Consortium, "Public Review Issue #29:
+                       Normalization Issue", Unicode PR 29,
+                       February 2004.
+
+   [Unicode10]         The Unicode Consortium, "The Unicode Standard,
+
+
+
+Klensin, et al.              Informational                     [Page 35]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+                       Version 1.0", 1991.
+
+   [W3C-Localization]  Ishida, R. and S. Miller, "Localization vs.
+                       Internationalization", W3C International/
+                       questions/qa-i18n.txt, December 2005.
+
+   [net-utf8]          Klensin, J. and M. Padlipsky, "Unicode Format for
+                       Network Interchange", Work in Progress,
+                       April 2006.
+
+Authors' Addresses
+
+   John C Klensin
+   1770 Massachusetts Ave, #322
+   Cambridge, MA  02140
+   USA
+
+   Phone: +1 617 491 5735
+   EMail: john-ietf@jck.com
+
+
+   Patrik Faltstrom
+   Cisco Systems
+
+   EMail: paf@cisco.com
+
+
+   Cary Karp
+   Swedish Museum of Natural History
+   Box 50007
+   Stockholm  SE-10405
+   Sweden
+
+   Phone: +46 8 5195 4055
+   EMail: ck@nrm.museum
+
+
+   IAB
+
+   EMail: iab@iab.org
+
+
+
+
+
+
+
+
+
+
+
+Klensin, et al.              Informational                     [Page 36]
+
+RFC 4690                 IAB -- IDN Next Steps            September 2006
+
+
+Full Copyright Statement
+
+   Copyright (C) The Internet Society (2006).
+
+   This document is subject to the rights, licenses and restrictions
+   contained in BCP 78, and except as set forth therein, the authors
+   retain all their rights.
+
+   This document and the information contained herein are provided on an
+   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+   The IETF takes no position regarding the validity or scope of any
+   Intellectual Property Rights or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; nor does it represent that it has
+   made any independent effort to identify any such rights.  Information
+   on the procedures with respect to rights in RFC documents can be
+   found in BCP 78 and BCP 79.
+
+   Copies of IPR disclosures made to the IETF Secretariat and any
+   assurances of licenses to be made available, or the result of an
+   attempt made to obtain a general license or permission for the use of
+   such proprietary rights by implementers or users of this
+   specification can be obtained from the IETF on-line IPR repository at
+   http://www.ietf.org/ipr.
+
+   The IETF invites any interested party to bring to its attention any
+   copyrights, patents or patent applications, or other proprietary
+   rights that may cover technology that may be required to implement
+   this standard.  Please address the information to the IETF at
+   ietf-ipr@ietf.org.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is provided by the IETF
+   Administrative Support Activity (IASA).
+
+
+
+
+
+
+
+Klensin, et al.              Informational                     [Page 37]
+