summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc3743.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc3743.txt')
-rw-r--r--doc/rfc/rfc3743.txt1851
1 files changed, 1851 insertions, 0 deletions
diff --git a/doc/rfc/rfc3743.txt b/doc/rfc/rfc3743.txt
new file mode 100644
index 0000000..2adb10f
--- /dev/null
+++ b/doc/rfc/rfc3743.txt
@@ -0,0 +1,1851 @@
+
+
+
+
+
+
+Network Working Group K. Konishi
+Request for Comments: 3743 K. Huang
+Category: Informational H. Qian
+ Y. Ko
+ April 2004
+
+
+ Joint Engineering Team (JET) Guidelines for
+ Internationalized Domain Names (IDN) Registration and
+ Administration for Chinese, Japanese, and Korean
+
+Status of this Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2004). All Rights Reserved.
+
+IESG Note
+
+ The IESG congratulates the Joint Engineering Team (JET) on developing
+ mechanisms to enforce their desired policy. The Language Variant
+ Table mechanisms described here allow JET to enforce language-based
+ character variant preferences, and they set an example for those who
+ might want to use variant tables for their own policy enforcement.
+
+ The IESG encourages those following this example to take JET's
+ diligence as an example, as well as its technical work. To follow
+ their example, registration authorities may need to articulate
+ policy, develop appropriate procedures and mechanisms for
+ enforcement, and document the relationship between the two. JET's
+ LVT mechanism should be adaptable to different policies, and can be
+ considered during that development process.
+
+ The IETF does not, of course, dictate policy or require the use of
+ any particular mechanisms for the implementation of these policies,
+ as these are matters of sovereignty and contract.
+
+Abstract
+
+ Achieving internationalized access to domain names raises many
+ complex issues. These are associated not only with basic protocol
+ design, such as how names are represented on the network, compared,
+ and converted to appropriate forms, but also with issues and options
+ for deployment, transition, registration, and administration.
+
+
+
+Konishi, et al. Informational [Page 1]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ The IETF Standards for Internationalized Domain Names, known as
+ "IDNA", focuses on access to domain names in a range of scripts that
+ is broader in scope than the original ASCII. The development process
+ made it clear that use of characters with similar appearances and/or
+ interpretations created potential for confusion, as well as
+ difficulties in deployment and transition. The conclusion was that,
+ while those issues were important, they could best be addressed
+ administratively rather than through restrictions embedded in the
+ protocols. This document defines a set of guidelines for applying
+ restrictions of that type for Chinese, Japanese and Korean (CJK)
+ scripts and the zones that use them and, perhaps, the beginning of a
+ framework for thinking about other zones, languages, and scripts.
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
+ 2. Definitions, Context, and Notation . . . . . . . . . . . . . . 5
+ 2.1. Definitions and Context. . . . . . . . . . . . . . . . . 5
+ 2.2. Notation for Ideographs and Other Non-ASCII CJK
+ Characters . . . . . . . . . . . . . . . . . . . . . . . 9
+ 3. Scope of the Administrative Guidelines . . . . . . . . . . . . 9
+ 3.1. Principles Underlying These Guidelines . . . . . . . . . 10
+ 3.2. Registration of IDL. . . . . . . . . . . . . . . . . . . 13
+ 3.2.1. Using the Language Variant Table . . . . . . . . 13
+ 3.2.2. IDL Package. . . . . . . . . . . . . . . . . . . 14
+ 3.2.3. Procedure for Registering IDLs . . . . . . . . . 14
+ 3.3. Deletion and Transfer of IDL and IDL Package . . . . . . 19
+ 3.4. Activation and Deactivation of IDL Variants . . . . . . 19
+ 3.4.1. Activation Algorithm . . . . . . . . . . . . . . 19
+ 3.4.2. Deactivation Algorithm . . . . . . . . . . . . . 20
+ 3.5. Managing Changes in Language Associations. . . . . . . . 21
+ 3.6. Managing Changes to Language Variant Tables. . . . . . . 21
+ 4. Examples of Guideline Use in Zones . . . . . . . . . . . . . . 21
+ 5. Syntax Description for the Language Variant Table. . . . . . . 25
+ 5.1. ABNF Syntax. . . . . . . . . . . . . . . . . . . . . . . 25
+ 5.2. Comments and Explanation of Syntax . . . . . . . . . . . 25
+ 6. Security Considerations. . . . . . . . . . . . . . . . . . . . 27
+ 7. Index to Terminology . . . . . . . . . . . . . . . . . . . . . 27
+ 8. Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . 28
+ 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 29
+ 9.1. Normative References . . . . . . . . . . . . . . . . . . 29
+ 9.2. Informative References . . . . . . . . . . . . . . . . . 30
+ 10. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 30
+ 10.1. Authors' Addresses . . . . . . . . . . . . . . . . . . . 31
+ 10.2. Editors' Addresses . . . . . . . . . . . . . . . . . . . 32
+ 11. Full Copyright Statement . . . . . . . . . . . . . . . . . . . 33
+
+
+
+
+
+Konishi, et al. Informational [Page 2]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+1. Introduction
+
+ Domain names form the fundamental naming architecture of the
+ Internet. Countless Internet protocols and applications rely on
+ them, not just for stability and continuity, but also to avoid
+ ambiguity. They were designed to be identifiers without any language
+ context. However, as domain names have become visible to end users
+ through Web URLs and e-mail addresses, the strings in domain-name
+ labels are being increasingly interpreted as names, words, or
+ phrases. It is likely that users will do the same with languages of
+ differing character sets, such as Chinese, Japanese and Korean (CJK),
+ in which many words or concepts are represented using short sequences
+ of characters.
+
+ The introduction of what are called Internationalized Domain Names
+ (IDN) amplifies both the difficulty of putting names into identifiers
+ and the confusion that exists between scripts and languages.
+ Character symbols that appear (or actually are) identical, or that
+ have similar or identical semantics but that are assigned the
+ different code points, further increase the potential for confusion.
+ DNS internationalization also affects a number of Internet protocols
+ and applications and creates additional layers of complexity in terms
+ of technical administration and services. Given the added
+ complications of using a much broader range of characters than the
+ original small ASCII subset, precautions are necessary in the
+ deployment of IDNs in order to minimize confusion and fraud.
+
+ The IETF IDN Working Group [IDN-WG] addressed the problem of handling
+ the encoding and decoding of Unicode strings into and out of Domain
+ Name System (DNS) labels with the goal that its solution would not
+ put the operational DNS at any risk. Its work resulted in one
+ primary protocol and three supporting ones, respectively:
+
+ 1. Internationalizing Host Names in Applications [IDNA]
+ 2. Preparation of Internationalized Strings [STRINGPREP]
+ 3. A Stringprep Profile for Internationalized Domain Names
+ [NAMEPREP]
+ 4. Punycode [PUNYCODE]
+
+ IDNA, which calls on the others, normalizes and transforms strings
+ that are intended to be used as IDNs. In combination, the four
+ provide the minimum functions required for internationalization, such
+ as performing case mappings, eliminating character differences that
+ would cause severe problems, and specifying matching (equality).
+ They also convert between the resulting Unicode code points and an
+ ASCII-based form that is more suitable for storing in actual DNS
+ labels. In this way, the IDNA transformations improve a user's
+ chances of getting to the correct IDN.
+
+
+
+Konishi, et al. Informational [Page 3]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ Addressing the issues around differing character sets, a primary
+ consideration and administrative challenge involves region-specific
+ definitions, interpretations, and the semantics of strings to be used
+ in IDNs. A Unicode string may have a specific meaning as a name,
+ word, or phrase in a particular language but that meaning could vary
+ depending on the country, region, culture, or other context in which
+ the string is used. It might also have different interpretations in
+ different languages that share some or all of the same characters.
+ Therefore, individual zones and zone administrators may find it
+ necessary to impose restrictions and procedures to reduce the
+ likelihood of confusion, and instabilities of reference, within their
+ own environments.
+
+ Over the centuries, the evolution of CJK characters, and the
+ differences in their use in different languages and even in different
+ regions where the same language is spoken, has given rise to the idea
+ of "variants", wherein one conceptual character can be identified
+ with several different Code Points in character sets for computer
+ use. This document provides a framework for handling such variants
+ while minimizing the possibility of serious user confusion in the
+ obtaining or using of domain names. However, the concept of variants
+ is complex and may require many different layers of solutions. This
+ guideline offers only one of those solution components. It is not
+ sufficient by itself to solve the whole problem, even with zone-
+ specific tables as described below.
+
+ Additionally, because of local language or writing-system
+ differences, it is impossible to create universally accepted
+ definitions for which potential variants are the same and which are
+ not the same. It is even more difficult to define a technical
+ algorithm to generate variants that are linguistically accurate.
+ That is, that the variant forms produced make as much sense in the
+ language as the originally specified forms. It is also possible that
+ variants generated may have no meaning in the associated language or
+ languages. The intention is not to generate meaningful "words" but
+ to generate similar variants to be reserved. So even though the
+ method described in this document may not always be linguistically
+ accurate, nor does it need to be, it increases the chances of getting
+ the right variants while accepting the inherent limitations of the
+ DNS and the complexities of human language.
+
+ This document outlines a model for such conventions for zones in
+ which labels that contain CJK characters are to be registered and a
+ system for implementing that model. It provides a mechanism that
+ allows each zone to define its own local rules for permitted
+ characters and sequences and the handling of IDNs and their variants.
+
+
+
+
+
+Konishi, et al. Informational [Page 4]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ The document is an effort of the Joint Engineering Team (JET), a
+ group composed of members of CNNIC, TWNIC, KRNIC, and JPNIC as well
+ as other individual experts. It offers guidelines for zone
+ administrators, including but not limited to registry operators and
+ registrars and information for all domain names holders on the
+ administration of domain names that contain characters drawn from
+ Chinese, Japanese, and Korean scripts. Other language groups are
+ encouraged to develop their own guidelines as needed, based on these
+ guidelines if that is helpful.
+
+2. Definitions, Context, and Notation
+
+2.1. Definitions and Context
+
+ This document uses a number of special terms. In this section,
+ definitions and explanations are grouped topically. Some readers may
+ prefer to skip over this material, returning, perhaps via the index
+ to terminology in section 7, when needed.
+
+2.1.1. IDN
+
+ IDN: The term "IDN" has a number of different uses: (a) as an
+ abbreviation for "Internationalized Domain Name"; (b) as a fully
+ qualified domain name that contains at least one label that contains
+ characters not appearing in ASCII, specifically not in the subset of
+ ASCII recommended for domain names (the so-called "hostname" or "LDH"
+ subset, see RFC1035 [STD13]); (c) as a label of a domain name that
+ contains at least one character beyond ASCII; (d) as a Unicode string
+ to be processed by Nameprep; (e) as a string that is an output from
+ Nameprep; (f) as a string that is the result of processing through
+ both Nameprep and conversion into Punycode; (g) as the abbreviation
+ of an IDN (more properly, IDL) Package, in the terminology of this
+ document; (h) as the abbreviation of the IETF IDN Working Group; (g)
+ as the abbreviation of the ICANN IDN Committee; and (h) as standing
+ for other IDN activities in other companies/organizations.
+
+ Because of the potential confusion, this document uses the term "IDN"
+ as an abbreviation for Internationalized Domain Name and,
+ specifically, in the second sense described in (b) above. It uses
+ "IDL," defined immediately below, to refer to Internationalized
+ Domain Labels.
+
+2.1.2. IDL
+
+ IDL: This document provides a guideline to be applied on a per-zone
+ basis, one label at a time. Therefore, the term "Internationalized
+ Domain Label" or "IDL" will be used instead of the more general term
+ "IDN" or its equivalents. The processing specifications of this
+
+
+
+Konishi, et al. Informational [Page 5]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ document may be applied, in some zones, to ASCII characters also, if
+ those characters are specified as valid in a Language Variant Table
+ (see below). Hence, in some zones, an IDL may contain or consist
+ entirely of "LDH" characters.
+
+2.1.3. FQDN
+
+ FQDN: A fully qualified domain name, one that explicitly contains all
+ labels, including a Top-Level Domain (TLD) name. In this context, a
+ TLD name is one whose label appears in a nameserver record in the
+ root zone. The term "Domain Name Label" refers to any label of a
+ FQDN.
+
+2.1.4. Registrations
+
+ Registration: In this document, the term "registration" refers to the
+ process by which a potential domain name holder requests that a label
+ be placed in the DNS either as an individual name within a domain or
+ as a subdomain delegation from another domain name holder. In the
+ case of a successful registration, the label or delegation records
+ are placed in the relevant zone file, or, more specifically, they are
+ "activated" or made "active" and additional IDLs may be reserved as
+ part of an "IDL Package" (see below). The guidelines presented here
+ are recommended for all zones, at any hierarchy level, in which CJK
+ characters are to appear and not just domains at the first or second
+ level.
+
+2.1.5. RFC3066
+
+ RFC3066: A system, widely used in the Internet, for coding and
+ representing names of languages [RFC3066]. It is based on an
+ International Organization for Standardization (ISO) standard for
+ coding language names [ISO639], but expands it to provide additional
+ precision.
+
+2.1.6. ISO/IEC 10646
+
+ ISO/IEC 10646: The international standard universal multiple-octet
+ coded character set ("UCS") [IS10646]. The Code Point definitions of
+ this standard are identical to those of corresponding versions of the
+ Unicode standard (see below). Consequently, the characters and their
+ coding are often referred to as "Unicode characters."
+
+2.1.7. Unicode Character
+
+ Unicode Character: The term "Unicode character" is used here in
+ reference to characters chosen from the Unicode Standard Version 3.2
+ [UNICODE] (and hence from ISO/IEC 10646). In this document, the
+
+
+
+Konishi, et al. Informational [Page 6]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ characters are identified by their positions, or "Code Points." The
+ notation U+12AB, for example, indicates the character at the position
+ 12AB (hexadecimal) in the Unicode 3.2 table. For characters in
+ positions above FFFF, i.e., requiring more than sixteen bits to
+ represent, a five to eight-character string is used, such as U+112AB
+ for the character in position 12AB of plane 1.
+
+2.1.8. Unicode String
+
+ Unicode String: "Unicode string" refers to a string of Unicode
+ characters. The Unicode string is identified by the sequence of the
+ Unicode characters regardless of the encoding scheme.
+
+2.1.9. CJK Characters
+
+ CJK Characters: CJK characters are characters commonly used in the
+ Chinese, Japanese, or Korean languages, including but not limited to
+ those defined in the Unicode Standard as ASCII (U+0020 to U+007F),
+ Han ideographs (U+3400 to U+9FAF and U+20000 to U+2A6DF), Bopomofo
+ (U+3100 to U+312F and U+31A0 to U+31BF), Kana (U+3040 to U+30FF),
+ Jamo (U+1100 to 11FF and U+3130 to U+318F), Hangul (U+AC00 to U+D7AF
+ and U+3130 to U+318F), and the respective compatibility forms. The
+ particular characters that are permitted in a given zone are
+ specified in the Language Variant Table(s) for that zone.
+
+2.1.10. Label String
+
+ Label String: A generic term referring to a string of characters that
+ is a candidate for registration in the DNS or such a string, once
+ registered. A label string may or may not be valid according to the
+ rules of this specification and may even be invalid for IDNA use.
+ The term "label", by itself, refers to a string that has been
+ validated and may be formatted to appear in a DNS zone file.
+
+2.1.11. Language Variant Table
+
+ Language Variant Table: The key mechanisms of this specification
+ utilize a three-column table, called a Language Variant Table, for
+ each language permitted to be registered in the zone. Those columns
+ are known, respectively, as "Valid Code Point", "Preferred Variant",
+ and "Character Variant", which are defined separately below. The
+ Language Variant Tables are critical to the success of the guideline
+ described in this document. However, the principles to be used to
+ generate the tables are not within the scope of this document and
+ should be worked out by each registry separately (perhaps by adopting
+ or adapting the work of some other registry). In this document,
+ "Table" and "Variant Table" are used as short forms for Language
+ Variant Table.
+
+
+
+Konishi, et al. Informational [Page 7]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+2.1.12. Valid Code Point
+
+ Valid Code Point: In a Language Variant Table, the list of Code
+ Points that is permitted for that language. Any other Code Points,
+ or any string containing them, will be rejected by this
+ specification. The Valid Code Point list appears as the first column
+ of the Language Variant Table.
+
+2.1.13. Preferred Variant
+
+ Preferred Variant: In a Language Variant Table, a list of Code Points
+ corresponding to each Valid Code Point and providing possible
+ substitutions for it. These substitutions are "preferred" in the
+ sense that the variant labels generated using them are normally
+ registered in the zone file, or "activated." The Preferred Code
+ Points appear in column 2 of the Language Variant Table. "Preferred
+ Code Point" is used interchangeably with this term.
+
+2.1.14. Character Variant
+
+ Character Variant: In a Language Variant Table, a second list of Code
+ Points corresponding to each Valid Code Point and providing possible
+ substitutions for it. Unlike the Preferred Variants, substitutions
+ based on Character Variants are normally reserved but not actually
+ registered (or "activated"). Character Variants appear in column 3
+ of the Language Variant Table. The term "Code Point Variants" is
+ used interchangeably with this term.
+
+2.1.15. Preferred Variant Label
+
+ Preferred Variant Label: A label generated by use of Preferred
+ Variants (or Preferred Code Points).
+
+2.1.16. Character Variant Label
+
+ Character Variant Label: A label generated by use of Character
+ Variants.
+
+2.1.17. Zone Variant
+
+ Zone Variant: A Preferred or Character Variant Label that is actually
+ to be entered (registered) into the DNS. That is, into the zone file
+ for the relevant zone. Zone Variants are also referred to as Zone
+ Variant Labels or Active (or Activated) Labels.
+
+
+
+
+
+
+
+Konishi, et al. Informational [Page 8]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+2.1.18. IDL Package
+
+ IDL Package: A collection of IDLs as determined by these Guidelines.
+ All labels in the package are "reserved", meaning they cannot be
+ registered by anyone other than the holder of the Package. These
+ reserved IDLs may be "activated", meaning they are actually entered
+ into a zone file as a "Zone Variant". The IDL Package also contains
+ identification of the language(s) associated with the registration
+ process. The IDL and its variant labels form a single, atomic unit.
+
+2.2. Notation for Ideographs and Other Non-ASCII CJK Characters.
+
+ For purposes of clarity, particularly in regard to examples, Han
+ ideographs appear in several places in this document. However, they
+ do not appear in the ASCII version of this document. For the
+ convenience of readers of the ASCII version, and some readers not
+ familiar with recognizing and distinguishing Chinese characters, most
+ uses of these characters will be associated with both their Unicode
+ Code Points and an "asterisk tag" with its corresponding Chinese
+ Romanization [ISO7098], with the tone mark represented by a number
+ from 1 to 4. Those tags have no meaning outside this document; they
+ are a quick visual and reading reference to help facilitate the
+ combinations and transformations of characters in the guideline and
+ table excerpts.
+
+3. Scope of the Administrative Guidelines
+
+ Zone administrators are responsible for the administration of the
+ domain name labels under their control. A zone administrator might
+ be responsible for a large zone, such as a top-level domain (TLD),
+ whether generic or country code, or a smaller one, such as a typical
+ second- or third-level domain. A large zone is often more complex
+ than its smaller counterpart. However, actual technical
+ administrative tasks, such as addition, deletion, delegation, and
+ transfer of zones between domain name holders, are similar for all
+ zones.
+
+ This document provides guidelines for the ways CJK characters should
+ be handled within a zone, for how language issues should be
+ considered and incorporated, and for how Domain Name Labels
+ containing CJK characters should be administered (including
+ registration, deletion, and transfer of labels).
+
+ Other IDN policies, such as the creation of new top-level domains
+ (TLDs), the cost structure for registrations, and how the processes
+ described here get allocated between registrar and registry if the
+ zone makes that distinction, also are outside the scope of this
+ document.
+
+
+
+Konishi, et al. Informational [Page 9]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ Technical implementation issues are not discussed here either. For
+ example, deciding which guidelines should be implemented as registry
+ actions and which should be registrar actions is left to zone
+ administrators, with the possibility that it will differ from zone to
+ zone.
+
+3.1. Principles Underlying These Guidelines
+
+ In many places, in the event of a dispute over rights to a name (or,
+ more accurately, DNS label string), this document assumes "first-
+ come, first-served" (FCFS) as a resolution policy even though FCFS is
+ not listed below as one of the principles for this document. If
+ policies are already in place governing priorities and "rights", one
+ can use the guidelines here by replacing uses of FCFS in this
+ document with policies specific to the zone. Some of the guidelines
+ here may not be applicable to other policies for determining rights
+ to labels. Still other alternatives, such as use of UDRP [UDRP] or
+ mutual exclusion, might have little impact on other aspects of these
+ guidelines.
+
+ (a) Although some Unicode strings may be pure identifiers made up of
+ an assortment of characters from many languages and scripts, IDLs are
+ likely to be "words" or "names" or "phrases" that have specific
+ meaning in a language. While a zone administration might or might
+ not require "meaning" as a registration criterion, meaning could
+ prove to be a useful tool for avoiding user confusion.
+
+ Each IDL to be registered should be associated administratively
+ with one or more languages.
+
+ Language associations should either be predetermined by the zone
+ administrator and applied to the entire zone or be chosen by the
+ registrants on a per-IDL basis. The latter may be necessary for some
+ zones, but it will make administration more difficult and will
+ increase the likelihood of conflicts in variant forms.
+
+ A given zone might have multiple languages associated with it or it
+ may have no language specified at all. Omitting specification of a
+ language may provide additional opportunities for user confusion and
+ is therefore NOT recommended.
+
+ (b) Each language uses only a subset of Unicode characters.
+ Therefore, if an IDL is associated with a language, it is not
+ permitted to contain any Unicode character that is not within the
+ valid subset for that language.
+
+ Each IDL to be registered must be verified against the valid
+ subset of Unicode for the language(s) associated with the IDL.
+
+
+
+Konishi, et al. Informational [Page 10]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ That subset is specified by the list of characters appearing in
+ the first column of the language and zone-specific tables as
+ described later in this document.
+
+ If the IDL fails this test for any of its associated languages, the
+ IDL is not valid for registration.
+
+ Note that this verification is not necessarily linguistically
+ accurate, because some languages have special rules. For example,
+ some languages impose restrictions on the order in which particular
+ combinations of characters may appear. Characters that are valid for
+ the language, and hence permitted by this specification, might still
+ not form valid words or even strings in the language.
+
+ (c) When an IDL is associated with a language, it may have Character
+ Variants that depend on that language associated with it in addition
+ to any Preferred Variants. These variants are potential sources of
+ confusion with the Code Points in the original label string.
+ Consequently, the labels generated from them should be unavailable to
+ registrants of other names, words, or phrases.
+
+ During registration, all labels generated from the Character
+ Variants for the associated language(s) of the IDL should be
+ reserved.
+
+ IDL reservations of the type described here normally do not appear in
+ the distributed DNS zone file. In other words, these reserved IDLs
+ may not resolve. Domain name holders could request that these
+ reserved IDLs be placed in the zone file and made active and
+ resolvable.
+
+ Zones will need to establish local policies about how they are to be
+ made active. Specifically, many zones, especially at the top level,
+ have prohibited or restricted the use of "CNAME"s DNS aliases,
+ especially CNAMEs that point to nameserver delegation records (NS
+ records). And long-term use of long-term aliases for domain
+ hierarchies, rather than single names ("DNAME records") are
+ considered problematic because of the recursion they can introduce
+ into DNS lookups.
+
+ (d) When an IDL is a "name", "word", or "phrase", it will have
+ Character Variants depending on the associated language.
+ Furthermore, one or more of those Character Variants will be used
+ more often than others for linguistic, political, or other reasons.
+
+ These more commonly used variants are distinguished from ordinary
+ Character Variants and are known as Preferred Variant(s) for the
+ particular language.
+
+
+
+Konishi, et al. Informational [Page 11]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ To increase the likelihood of correct and predictable resolution
+ of the IDN by end users, all labels generated from the Preferred
+ Variants for the associated language(s) should be resolvable.
+
+ In other words, the Preferred Variant Labels should appear in the
+ distributed DNS zone file.
+
+ (e) IDLs associated with one or more languages may have a large
+ number of Character Variant Labels or Preferred Variant Labels. Some
+ of these labels may include combinations of characters that are
+ meaningless or invalid linguistically. It may therefore be
+ appropriate for a zone to adopt procedures that include only
+ linguistically acceptable labels in the IDL Package.
+
+ A zone administrator may impose additional rules and other
+ processing activities to limit the number of Character Variant
+ Labels or Preferred Variant Labels that are actually reserved or
+ registered.
+
+ These additional rules and other processing activities are based on
+ policies and/or procedures imposed on a per-zone basis and therefore
+ are not within the scope of this document. Such policies or
+ procedures might be used, for example, to restrict the number of
+ Preferred Variant Labels actually reserved or to prevent certain
+ words from being registered at all.
+
+ (f) There are some Character Variant Labels and Preferred Variant
+ Labels that are associated with each IDL. These labels are
+ considered "equivalent" to each another. To avoid confusion, they
+ all should be assigned to a single domain name holder.
+
+ The IDL and its variant labels should be grouped together into a
+ single atomic unit, known in this document as an "IDL Package".
+
+ The IDL Package is created upon registration and is atomic: Transfer
+ and deletion of an IDL is performed on the IDL Package as a whole.
+ That is, an IDL within the IDL Package may not be transferred or
+ deleted individually; any re-registration, transfers, or other
+ actions that impact the IDL should also affect the other variants.
+
+ The name-conflict resolution policy associated with this zone could
+ result in a conflict with the principle of IDL Package atomicity. In
+ such a case, the policy must be defined to make the precedence clear.
+
+
+
+
+
+
+
+
+Konishi, et al. Informational [Page 12]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+3.2. Registration of IDL
+
+ To conform to the principles described in 3.1, this document
+ introduces two concepts: the Language Variant Table and the IDL
+ Package. These are described in the next two subsections, followed
+ by a description of the algorithm that is used to interpret the table
+ and generate variant labels.
+
+3.2.1. Using the Language Variant Table
+
+ For each zone that uses a given language, each language should have
+ its own Language Variant Table. The table consists of a header
+ section that identifies references and version information, followed
+ by a section with one row for each Code Point that is valid for the
+ language and three columns.
+
+ (1) The first column contains the subset of Unicode characters
+ that is valid to be registered ("Valid Code Point"). This is
+ used to verify the IDL to be registered (see 3.1b). As in the
+ registration procedure described later, this column is used as
+ an index to examine characters that appear in a proposed IDL
+ to be processed. The collection of Valid Code Points in the
+ table for a particular language can be thought of as defining
+ the script for that language, although the normal definition
+ of a script would not include, for example, ASCII characters
+ with CJK ones.
+
+ (2) The second column contains the Preferred Variant(s) of the
+ corresponding Unicode character in column one ("Valid Code
+ Point"). These variant characters are used to generate the
+ Preferred Variant Labels for the IDL. Those labels should be
+ resolvable (see 3.1d). Under normal circumstances, all of
+ those Preferred Variant Labels will be activated in the
+ relevant zone file so that they will resolve when the DNS is
+ queried for them.
+
+ (3) The third column contains the Character Variant(s) for the
+ corresponding Valid Code Point. These are used to generate
+ the Character Variant Labels of the IDL, which are then to be
+ reserved (see 3.1c). Registration, or activation, of labels
+ generated from Character Variants will normally be a
+ registrant decision, subject to local policy.
+
+ Each entry in a column consists of one or more Code Points, expressed
+ as a numeric character number in the Unicode table and optionally
+ followed by a parenthetical reference. The first column, or Valid
+ Code Point, may have only one Code Point specified in a given row.
+ The other columns may have more than one.
+
+
+
+Konishi, et al. Informational [Page 13]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ Any row may be terminated with an optional comment, starting with
+ "#".
+
+ The formal syntax of the table and more-precise definitions of some
+ of its organization appear in Section 5.
+
+ The Language Variant Table should be provided by a relevant group,
+ organization, or body. However, the question of who is relevant or
+ has the authority to create this table and the rules that define it
+ is beyond the scope of this document.
+
+3.2.2. IDL Package
+
+ The IDL Package is created on successful registration and consists
+ of:
+
+ (1) the IDL registered
+
+ (2) the language(s) associated with the IDL
+
+ (3) the version of the associated character variant table
+
+ (4) the reserved IDLs
+
+ (5) active IDLs, that is, "Zone Variant Labels" that are to appear
+ in the DNS zone file
+
+3.2.3. Procedure for Registering IDLs
+
+ An explanation follows each step.
+
+ Step 1. IN <= IDL to be registered and
+ {L} <= Set of languages associated with IN
+
+ Start the process with the label string (prospective IDL) to be
+ registered and the associated language(s) as input.
+
+ Step 2. Generate the Nameprep-processed version of the IN,
+ applying all mappings and canonicalization required by
+ IDNA.
+
+ The prospective IDL is processed by using Nameprep to apply the
+ normalizations and exclusions globally required to use IDNA. If the
+ Nameprep processing fails, then the IDL is invalid and the
+ registration process must stop.
+
+
+
+
+
+
+Konishi, et al. Informational [Page 14]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ Step 2.1. NP(IN) <= Nameprep processed IN
+ Step 2.2. Check availability of NP(IN). If not available, route to
+ conflict policy.
+
+ The Nameprep-processed IDL is then checked against the contents of
+ the zone file and previously created IDL Packages. If it is already
+ registered or reserved, then a conflict exists that must be resolved
+ by applying whatever policy is applicable for the zone. For example,
+ if FCFS is used, the registration process terminates unless the
+ conflict resolution policy provides another alternative.
+
+ Step 3. Process each language.
+ For each language (AL) in {L}
+
+ Step 3 goes through all languages associated with the proposed IDL
+ and checks each character (after Nameprep has been applied) for
+ validity in each of them. It then applies the Preferred Variants
+ (column 2 values) and the Character Variants (column 3 values) to
+ generate candidate labels.
+
+ Step 3.1. Check validity of NP(IN) in AL. If failed, stop
+ processing.
+
+ In step 3.1, IDL validation is done by checking that every Code Point
+ in the Nameprep-processed IDL is a Code Point allowed by the "Valid
+ Code Point" column of the Character Variant Table for the language.
+ This is then repeated for any other languages (and hence, Language
+ Variant Tables) specified in the registration. If one or more Code
+ Points are not valid, the registration process terminates.
+
+ Step 3.2. PV(IN,AL) <= Set of available Nameprep-processed Preferred
+ Variants of NP(IN) in AL
+
+ Step 3.2 generates the list of Preferred Variant Labels of the IDL by
+ doing a combination (see Step 3.2A below) of all possible variants
+ listed in the "Preferred Variant(s)" column for each Code Point in
+ the Nameprep-processed IDL. The generated Preferred Variant Labels
+ must be processed through Nameprep. If the Nameprep processing fails
+ for any Preferred Variant Label (this is unlikely to occur if the
+ Preferred Variants are processed through Nameprep before being placed
+ in the table), then that variant label will be removed from the list.
+ The remaining Preferred Variant Labels in the list are then checked
+ to see whether they are already registered or reserved. If any are
+ registered or reserved, then the conflict resolution policy will
+ apply. In general, this will not prevent the originally requested
+ IDL from being registered unless the policy prevents such
+ registration. For example, if FCFS is applied, then the conflicting
+ variants will be removed from the list, but the originally requested
+
+
+
+Konishi, et al. Informational [Page 15]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ IDL and any remaining variants will be registered (see steps 5 and 8
+ below).
+
+ Step 3.2A Generating variant labels from Variant Code Points.
+
+ Steps 3.2 and 3.3 require that the Preferred Variants and Character
+ Variants be combined with the original IDL to form sets of variant
+ labels. Conceptually, one starts with the original, Nameprep-
+ processed, IDL and examines each of its characters in turn. If a
+ character is encountered for which there is a corresponding Preferred
+ Variant or Character Variant, a new variant label is produced with
+ the Variant Code Point substituted for the original one. If variant
+ labels already exist as the result of the processing of characters
+ that appeared earlier in the original IDL, then the substitutions are
+ made in them as well, resulting in additional generated variant
+ labels. This operation is repeated separately for the Preferred
+ Variants (in Step 3.2) and Character Variants (in Step 3.3). Of
+ course, equivalent results could be achieved by processing the
+ original IDL's characters in order, building the Preferred Variant
+ Label set and Character Variant Label set in parallel.
+
+ This process will sometimes generate a very large number of labels.
+ For example, if only two of the characters in the original IDL are
+ associated with Preferred Variants and if the first of those
+ characters has three Preferred Variants and the second has two, one
+ ends up with 12 variant labels to be placed in the IDL Package and,
+ normally, in the zone file. Repeating the process for Character
+ Variants, if any exist, would further increase the number of labels.
+ And if more than one language is specified for the original IDL, then
+ repetition of the process for additional languages (see step 4,
+ below) might further increase the size of the set.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Konishi, et al. Informational [Page 16]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ For illustrative purposes, the "combination" process could be
+ achieved by a recursive function similar to the following pseudocode:
+
+ Function Combination(Str)
+ F <= first codepoint of Str
+ SStr <= Substring of Str, without the first code point
+ NSC <= {}
+
+ If SStr is empty then
+ for each V in (Variants of code point F)
+ NSC = NSC set-union (the string with the code point V)
+ End of Loop
+ Else
+ SubCom = Combination(SStr)
+ For each V in (Variants of code point F)
+ For each SC in SubCom
+ NSC = NSC set-union (the string with the
+ first code point V followed by the string SC)
+ End of Loop
+ End of Loop
+ Endif
+
+ Return NSC
+
+ Step 3.3. CV(IN,AL) <= Set of available Nameprep-processed Character
+ Variants of NP(IN) in AL
+
+ This step generates the list of Character Variant Labels by doing a
+ combination (see Step 3.2A above) of all the possible variants listed
+ in the "Character Variant(s)" column for each Code Point in the
+ Nameprep-processed original IDL. As with the Preferred Variant
+ Labels, the generated Character Variant Labels must be processed by,
+ and acceptable to, Nameprep. If the Nameprep processing fails for a
+ Character Variant Label, then that variant label will be removed from
+ the list. The remaining Character Variant Labels are then checked to
+ be sure they are not registered or reserved. If one or more are,
+ then the conflict resolution policy is applied. As with Preferred
+ Variant Labels, a conflict that is resolved in favor of the earlier
+ registrant does not, in general, prevent the IDL from being
+ registered, nor the remaining variants from being reserved in step 6
+ below.
+
+ Step 3.4. End of Loop
+
+
+
+
+
+
+
+
+Konishi, et al. Informational [Page 17]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ Step 4. Let PVall be the set-union of all PV(IN,AL)
+
+ Step 4 generates the Preferred Variants Label for all languages. In
+ this step, and again in step 6 below, the zone administrator may
+ impose additional rules and processing activities to restrict the
+ number of Preferred (tentatively to be reserved and activated) and
+ Character (tentatively to be reserved) Label Variants. These
+ additional rules and processing activities are zone policy specific
+ and therefore are not specified in this document.
+
+ Step 5. {ZV} <= PVall set-union NP(IN)
+
+ Step 5 generates the initial Zone Variants. The set includes all
+ Preferred Variants for all languages and the original Nameprep-
+ processed IDL. Unless excluded by further processing, these Zone
+ Variants will be activated. That is, placed into the DNS zone. Note
+ that the "set-union" operation will eliminate any duplicates.
+
+ Step 6. Let CVall be the set-union of all CV(IN,AL), set-minus
+ {ZV}
+
+ Step 6 generates the Reserved Label Variants (the Character Variant
+ Label set). These labels are normally reserved but not activated.
+ The set includes all Character Variant Labels for all languages, but
+ not the Zone Variants defined in the previous step. The set-union
+ and set-minus operations eliminate any duplicates.
+
+ Step 7. Create IDL Package for IN using IN, {L}, {ZV} and CVall
+
+ In Step 7, the "IDL Package" is created using the original IDL, the
+ associated language(s), the Zone Variant Labels, and the Reserved
+ Variant Labels. If zone-specific additional processing or filtering
+ is to be applied to eliminate linguistically inappropriate or other
+ forms, it should be applied before the IDL Package is actually
+ assembled.
+
+ Step 8. Put {ZV} into zone file
+
+ The activated IDLs are converted via ToASCII with UseSTD13ASCIIRules
+ [IDNA] before being placed into the zone file. This conversion
+ results in the IDLs being in the actual IDNA ("Punycode") form used
+ in zone files, while the IDLs have been carried in Unicode form up to
+ this point. If ToASCII fails for any of the activated IDLs, that IDL
+ must not be placed into the zone file. If the IDL is a subdomain
+ name, it will be delegated.
+
+
+
+
+
+
+Konishi, et al. Informational [Page 18]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+3.3. Deletion and Transfer of IDL and IDL Package
+
+ In traditional domain administration, every Domain Name Label is
+ independent of all other Domain Name Labels. Registration, deletion,
+ and transfer of labels is done on a per-label basis. However, with
+ the guidelines discussed here, each IDL is associated with specific
+ languages, with all label variants, both active (zone) and reserved,
+ together in an IDL Package. This quite deliberately prohibits labels
+ that contain sufficient mixtures of characters from different scripts
+ to make them impossible as words in any given language. If a zone
+ chooses to not impose that restriction--that is, to permit labels to
+ be constructed by picking characters from several different languages
+ and scripts--then the guidelines described here would be
+ inappropriate.
+
+ As stated earlier, the IDL package should be treated as a single
+ atomic unit and all variants of the IDL should belong to a single
+ domain-name holder. If the local policy related to the handling of
+ disagreements requires a particular IDL to be transferred and deleted
+ independently of the IDL Package, the conflict policy would take
+ precedence. In such an event, the conflict policy should include a
+ transfer or delete procedure that takes the nature of IDL Packages
+ into consideration.
+
+ When an IDL Package is deleted, all of the Zone and Reserved Label
+ Variants again become available. The deletion of one IDL Package
+ does not change any other IDL Packages.
+
+3.4. Activation and Deactivation of IDL variants
+
+ Because there are active (registered) IDLs and inactive (reserved but
+ not registered) IDLs within an IDL package, processes are required to
+ activate or deactivate IDL variants within an IDL Package.
+
+3.4.1. Activation Algorithm
+
+ Step 1. IN <= IDL to be activated and PA <= IDL Package
+
+ Start with the IDL to be activated and the IDL Package of which it is
+ a member.
+
+ Step 2. NP(IN) <= Nameprep processed IN
+
+ Process the IDL through Nameprep. This step should never cause a
+ problem, or even a change, since all labels that become part of the
+ IDL Package are processed through Nameprep in Step 3.2 or 3.3 of the
+ Registration procedure (section 3.2.3).
+
+
+
+
+Konishi, et al. Informational [Page 19]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ Step 3. If NP(IN) not in CVall then stop
+
+ Verify that the Nameprep-processed version of the IDL appears as a
+ still-unactivated label in the IDL Package, i.e., in the list of
+ Reserved Label Variants, CVall. It might be a useful "sanity check"
+ to also verify that it does not already appear in the zone file.
+
+ Step 4. CVall <= CVall set-minus NP(IN) and {ZV} <= {ZV} set-union
+ NP(IN)
+
+ Within the IDL Package, remove the Nameprep-processed version of the
+ IDL from the list of Reserved Label Variants and add it to the list
+ of active (zone) label variants.
+
+ Step 5. Put {ZV} into the zone file
+
+ Actually register (activate) the Zone Variant Labels.
+
+3.4.2. Deactivation Algorithm
+
+ Step 1. IN <= IDL to be deactivated and PA <= IDL Package
+
+ As with activation, start with the IDL to be deactivated and the IDL
+ Package of which it is a member.
+
+ Step 2. NP(IN) <= Nameprep processed IN
+
+ Get the Nameprep-processed version of the name (see discussion in the
+ previous section).
+
+ Step 3. If NP(IN) not in {ZV} then stop
+
+ Verify that the Nameprep-processed version of the IDL appears as an
+ activated (zone) label variant in the IDL Package. It might be a
+ useful "sanity check" at this point to also verify that it actually
+ appears in the zone file.
+
+ Step 4. CVall <= CVall set-union NP(IN) and {ZV} <= {ZV} set-minus
+ NP(IN)
+
+ Within the IDL Package, remove the Nameprep-processed version of the
+ IDL from the list of Active (Zone) Label Variants and add it to the
+ list of Reserved (but inactive) Label Variants.
+
+ Step 5. Put {ZV} into the zone file
+
+
+
+
+
+
+Konishi, et al. Informational [Page 20]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+3.5. Managing Changes in Language Associations
+
+ Since the IDL package is an atomic unit and the associated list of
+ variants must not be changed after creation, this document does not
+ include a mechanism for adding and deleting language associations
+ within the IDL package. Instead, it recommends deleting the IDL
+ package entirely, followed by a registration with the new set of
+ languages. Zone administrators may find it desirable to devise
+ procedures that prevent other parties from capturing the labels in
+ the IDL Package during these operations.
+
+3.6. Managing Changes to the Language Variant Tables
+
+ Language Variant Tables are subject to changes over time, and these
+ changes may or may not be backward compatible. It is possible that
+ updated Language Variant Tables may produce a different set of
+ Preferred Variants and Reserved Variants.
+
+ In order to preserve the atomicity of the IDL Package, when the
+ Language Variant Table is changed, IDL Packages created using the
+ previous version of the Language Variant Table must not be updated or
+ affected.
+
+4. Examples of Guideline Use in Zones
+
+ To provide a meaningful example, some Language Variant Tables must be
+ defined. Assume, then, for the purpose of giving examples, that the
+ following four Language Variant Tables are defined:
+
+ Note: these tables are not a representation of the actual tables, and
+ they do not contain sufficient entries to be used in any actual
+ implementation. IANA maintains a voluntary registry of actual tables
+ [IANA-LVTABLES] which may be consulted for complete examples.
+
+ a) Language Variant Table for zh-cn and zh-sg
+
+Reference 1 CP936 (commonly known as GBK)
+Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt [UNIHAN]
+Reference 3 List of Simplified character Table (Simplified column)
+Reference 4 zSimpVariant in Unihan.txt [UNIHAN]
+Reference 5 variant that exists in GB2312, common simplified hanzi
+
+ Version 1 20020701 # July 2002
+
+ 56E2(1);56E2(5);5718(2) # sphere, ball, circle; mass, lump
+ 5718(1);56E2(4);56E2(2),56E3(2) # sphere, ball, circle; mass, lump
+ 60F3(1);60F3(5); # think, speculate, plan, consider
+ 654E(1);6559(5);6559(2) # teach
+
+
+
+Konishi, et al. Informational [Page 21]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ 6559(1);6559(5);654E(2) # teach, class
+ 6DF8(1);6E05(5);6E05(2) # clear
+ 6E05(1);6E05(5);6DF8(2) # clear, pure, clean; peaceful
+ 771E(1);771F(5);771F(2) # real, actual, true, genuine
+ 771F(1);771F(5);771E(2) # real, actual, true, genuine
+ 8054(1);8054(3);806F(2) # connect, join; associate, ally
+ 806F(1);8054(3);8054(2),8068(2) # connect, join; associate, ally
+ 96C6(1);96C6(5); # assemble, collect together
+
+ b) Language Variant Table for zh-tw
+
+ Reference 1 CP950 (commonly known as BIG5)
+ Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
+ Reference 3 List of Simplified Character Table (Traditional column)
+ Reference 4 zTradVariant in Unihan.txt
+
+ Version 1 20020701 # July 2002
+
+ 5718(1);5718(4);56E2(2),56E3(2) # sphere, ball, circle; mass, lump
+ 60F3(1);60F3(1); # think, speculate, plan, consider
+ 6559(1);6559(1);654E(2) # teach, class
+ 6E05(1);6E05(1);6DF8(2) # clear, pure, clean; peaceful
+ 771F(1);771F(1);771E(2) # real, actual, true, genuine
+ 806F(1);806F(3);8054(2),8068(2) # connect, join; associate, ally
+ 96C6(1);96C6(1); # assemble, collect together
+
+ c) Language Variant Table for ja
+
+ Reference 1 CP932 (commonly known as Shift-JIS)
+ Reference 2 zVariant in Unihan.txt
+ Reference 3 variant that exists in JIS X0208, commonly used Kanji
+
+ Version 1 20020701 # July 2002
+
+ 5718(1);5718(3);56E3(2) # sphere, ball, circle; mass, lump
+ 60F3(1);60F3(3); # think, speculate, plan, consider
+ 654E(1);6559(3);6559(2) # teach
+ 6559(1);6559(3);654E(2) # teach, class
+ 6DF8(1);6E05(3);6E05(2) # clear
+ 6E05(1);6E05(3);6DF8(2) # clear, pure, clean; peaceful
+ 771E(1);771E(1);771F(2) # real, actual, true, genuine
+ 771F(1);771F(1);771E(2) # real, actual, true, genuine
+ 806F(1);806F(1);8068(2) # connect, join; associate, ally
+ 96C6(1);96C6(3); # assemble, collect together
+
+ d) Language Variant Table for ko
+
+ Reference 1 CP949 (commonly known as EUC-KR)
+
+
+
+Konishi, et al. Informational [Page 22]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ Reference 2 zVariant and K-source in Unihan.txt
+
+ Version 1 20020701 # July 2002
+
+ 5718(1);5718(1);56E3(2) # sphere, ball, circle; mass, lump
+ 60F3(1);60F3(1); # think, speculate, plan, consider
+ 654E(1);654E(1);6559(2) # teach
+ 6DF8(1);6DF8(1);6E05(2) # clear
+ 771E(1);771E(1);771F(2) # real, actual, true, genuine
+ 806F(1);806F(1);8068(2) # connect, join; associate, ally
+ 96C6(1);96C6(1); # assemble, collect together
+
+ Example 1: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
+ {L} = {zh-cn, zh-sg, zh-tw}
+
+ NP(IN) = (U+6E05 U+771F U+6559)
+ PV(IN,zh-cn) = (U+6E05 U+771F U+6559)
+ PV(IN,zh-sg) = (U+6E05 U+771F U+6559)
+ PV(IN,zh-tw) = (U+6E05 U+771F U+6559)
+
+ {ZV} = {(U+6E05 U+771F U+6559)}
+ CVall = {(U+6E05 U+771E U+6559),
+ (U+6E05 U+771E U+654E),
+ (U+6E05 U+771F U+654E),
+ (U+6DF8 U+771E U+6559),
+ (U+6DF8 U+771E U+654E),
+ (U+6DF8 U+771F U+6559),
+ (U+6DF8 U+771F U+654E)}
+
+ Example 2: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
+ {L} = {ja}
+
+ NP(IN) = (U+6E05 U+771F U+6559)
+ PV(IN,ja) = (U+6E05 U+771F U+6559)
+ {ZV} = {(U+6E05 U+771F U+6559)}
+
+ CVall = {(U+6E05 U+771E U+6559),
+ (U+6E05 U+771E U+654E),
+ (U+6E05 U+771F U+654E),
+ (U+6DF8 U+771E U+6559),
+ (U+6DF8 U+771E U+654E),
+ (U+6DF8 U+771F U+6559),
+ (U+6DF8 U+771F U+654E)}
+
+ Example 3: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
+ {L} = {zh-cn, zh-sg, zh-tw, ja, ko}
+
+ NP(IN) = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
+
+
+
+Konishi, et al. Informational [Page 23]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ Invalid registration because U+6E05 is invalid in L = ko
+
+ Example 4: IDL = (U+806F U+60F3 U+96C6 U+5718)
+ *lian2 xiang3 ji2 tuan2*
+ {L} = {zh-cn, zh-sg, zh-tw}
+
+ NP(IN) = (U+806F U+60F3 U+96C6 U+5718)
+ PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
+ PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
+ PV(IN,zh-tw) = (U+806F U+60F3 U+96C6 U+5718)
+ {ZV} = {(U+8054 U+60F3 U+96C6 U+56E2),
+ (U+806F U+60F3 U+96C6 U+5718)}
+ CVall = {(U+8054 U+60F3 U+96C6 U+56E3),
+ (U+8054 U+60F3 U+96C6 U+5718),
+ (U+806F U+60F3 U+96C6 U+56E2),
+ (U+806f U+60F3 U+96C6 U+56E3),
+ (U+8068 U+60F3 U+96C6 U+56E2),
+ (U+8068 U+60F3 U+96C6 U+56E3),
+ (U+8068 U+60F3 U+96C6 U+5718)
+
+ Example 5: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
+ *lian2 xiang3 ji2 tuan2*
+ {L} = {zh-cn, zh-sg}
+
+ NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2)
+ PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
+ PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
+ {ZV} = {(U+8054 U+60F3 U+96C6 U+56E2)}
+ CVall = {(U+8054 U+60F3 U+96C6 U+56E3),
+ (U+8054 U+60F3 U+96C6 U+5718),
+ (U+806F U+60F3 U+96C6 U+56E2),
+ (U+806f U+60F3 U+96C6 U+56E3),
+ (U+806F U+60F3 U+96C6 U+5718),
+ (U+8068 U+60F3 U+96C6 U+56E2),
+ (U+8068 U+60F3 U+96C6 U+56E3),
+ (U+8068 U+60F3 U+96C6 U+5718)}
+
+ Example 6: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
+ *lian2 xiang3 ji2 tuan2*
+ {L} = {zh-cn, zh-sg, zh-tw}
+
+ NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2)
+ Invalid registration because U+8054 is invalid in L = zh-tw
+
+ Example 7: IDL = (U+806F U+60F3 U+96C6 U+5718)
+ *lian2 xiang3 ji2 tuan2*
+ {L} = {ja,ko}
+
+
+
+
+Konishi, et al. Informational [Page 24]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ NP(IN) = (U+806F U+60F3 U+96C6 U+5718)
+ PV(IN,ja) = (U+806F U+60F3 U+96C6 U+5718)
+ PV(IN,ko) = (U+806F U+60F3 U+96C6 U+5718)
+ {ZV} = {(U+806F U+60F3 U+96C6 U+5718)}
+
+ CVall = {(U+806F U+60F3 U+96C6 U+56E3),
+ (U+8068 U+60F3 U+96C6 U+5718),
+ (U+8068 U+60F3 U+96C6 U+56E3)}
+
+5. Syntax Description for the Language Variant Table
+
+ The formal syntax for the Language Variant Table is as follows, using
+ the IETF "ABNF" metalanguage [ABNF]. Some comments on this syntax
+ appear immediately after it.
+
+5.1. ABNF Syntax
+
+LanguageVariantTable = 1*ReferenceLine VersionLine 1*EntryLine
+ReferenceLine = "Reference" SP RefNo SP RefDesciption [ Comment ] CRLF
+RefNo = 1*DIGIT
+RefDesciption = *[VCHAR]
+VersionLine = "Version" SP VersionNo SP VersionDate [ Comment ] CRLF
+VersionNo = 1*DIGIT
+VersionDate = YYYYMMDD
+EntryLine = VariantEntry/Comment CRLF
+
+VariantEntry = ValidCodePoint ";"
+ PreferredVariant ";" CharacterVariant [ Comment ]
+ValidCodePoint = CodePoint
+RefList = RefNo 0*( "," RefNo )
+PreferredVariant = CodePointSet 0*( "," CodePointSet )
+CharacterVariant = CodePointSet 0*( "," CodePointSet )
+CodePointSet = CodePoint 0*( SP CodePoint )
+CodePoint = 4*8DIGIT [ "(" Reflist ")" ]
+Comment = "#" *VCHAR
+
+ YYYYMMDD is an integer, in alphabetic form, representing a date,
+ where YYYY is the 4-digit year, MM is the 2-digit month, and DD is
+ the 2-digit day.
+
+5.2. Comments and Explanation of Syntax
+
+ Any lines starting with, or portions of lines after, the hash
+ symbol("#") are treated as comments. Comments have no significance
+ in the processing of the tables; nor are there any syntax
+ requirements between the hash symbol and the end of the line. Blank
+ lines in the tables are ignored completely.
+
+
+
+
+Konishi, et al. Informational [Page 25]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ Every language should have its own Language Variant Table provided by
+ a relevant group, organization, or other body. That table will
+ normally be based on some established standard or standards. The
+ group that defines a Language Variant Table should document
+ references to the appropriate standards at the beginning of the
+ table, tagged with the word "Reference" followed by an integer (the
+ reference number) followed by the description of the reference. For
+ example:
+
+ Reference 1 CP936 (commonly known as GBK)
+ Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
+ Reference 3 List of Simplified Character Table (Simplified column)
+ Reference 4 zSimpVariant in Unihan.txt
+ Reference 5 Variant that exists in GB2312, common simplified Hanzi
+
+ Each Language Variant Table must have a version number and its
+ release date. This is tagged with the word "Version" followed by an
+ integer then followed by the date in the format YYYYMMDD, where YYYY
+ is the 4-digit year, MM is the 2-digit month, and DD is the 2-digit
+ day of the publication date of the table.
+
+ Version 1 20020701 # July 2002 Version 1
+
+ The table has three columns, separated by semicolons: "Valid Code
+ Point"; "Preferred Variant(s)"; and "Character Variant(s)".
+
+ The "Valid Code Point" is the subset of Unicode characters that are
+ valid to be registered.
+
+ There can be more than one Preferred Variant; hence there could be
+ multiple entries in the "Preferred Variant(s)" column. If the
+ "Preferred Variant(s)" column is empty, then there is no
+ corresponding Preferred Variant; in other words, the Preferred
+ Variant is null, there is no corresponding preferred variant
+ codepoint, and no processing to add labels for preferred variants
+ occurs." Unless local policy dictates otherwise, the procedures
+ above will result in only those labels that reflect the valid code
+ point being activated (registered) into the zone file.
+
+ The "Character Variant(s)" column contains all Character Variants of
+ the Code Point. Since the Code Point is always a variant of itself,
+ to avoid redundancy, the Code Point is assumed to be part of the
+ "Character Variant(s)" and need not be repeated in the "Character
+ Variant(s)" column.
+
+ If the variant in the "Preferred Variant(s)" or the "Character
+ Variant(s)" column is composed of a sequence of Code Points, then
+ sequence of Code Points is listed separated by a space.
+
+
+
+Konishi, et al. Informational [Page 26]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ If there are multiple variants in the "Preferred Variant(s)" or the
+ "Character Variant(s)" column, then each variant is separated by a
+ comma.
+
+ Any Code Point listed in the "Preferred Variant(s)" column must be
+ allowed by the rules for the relevant language to be registered.
+ However, this is not a requirement for the entries in the "Character
+ Variant(s)" column; it is possible that some of those entries may not
+ be allowed to be registered.
+
+ Every Code Point in the table should have a corresponding reference
+ number (associated with the references) specified to justify the
+ entry. The reference number is placed in parentheses after the Code
+ Point. If there is more than one reference, then the numbers are
+ placed within a single set of parentheses and separated by commas.
+
+6. Security Considerations
+
+ As discussed in the Introduction, substantially-unrestricted use of
+ international (non-ASCII) characters in domain name labels may cause
+ user confusion and invite various types of attacks. In particular,
+ in the case of CJK languages, an attacker has an opportunity to
+ divert or confuse users as a result of different characters (or, more
+ specifically, assigned code points) with identical or similar
+ semantics. These Guidelines provide a partial remedy for those risks
+ by supplying a framework for prohibiting inappropriate characters
+ from being registered at all and for permitting "variant" characters
+ to be grouped together and reserved, so that they can only be
+ registered in the DNS by the same owner. However, the system it
+ suggests is no better or worse than the per-zone and per-language
+ tables whose format and use this document specifies. Specific
+ tables, and any additional local processing, will reflect per-zone
+ decisions about the balance between risk and flexibility of
+ registrations. And, of course, errors in construction of those
+ tables may significantly reduce the quality of protection provided.
+
+7. Index to Terminology
+
+ As a convenience to the reader, this section lists all of the special
+ terminology used in this document, with a pointer to the section in
+ which it is defined.
+
+ Activated Label 2.1.17
+ Activation 2.1.4
+ Active Label 2.1.17
+ Character Variant 2.1.14
+ Character Variant Label 2.1.16
+ CJK Characters 2.1.9
+
+
+
+Konishi, et al. Informational [Page 27]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ Code point 2.1.7
+ Code Point Variant 2.1.14
+ FQDN 2.1.3
+ Hostname 2.1.1
+ IDL 2.1.2
+ IDL Package 2.1.18
+ IDN 2.1.1
+ Internationalized Domain Label 2.1.2
+ ISO/IEC 10646 2.1.6
+ Label String 2.1.10
+ Language name codes 2.1.5
+ Language Variant Table 2.1.11
+ LDH Subset 2.1.1
+ Preferred Code Point 2.1.13
+ Preferred Variant 2.1.13
+ Preferred Variant Label 2.1.15
+ Registration 2.1.4
+ Reserved 2.1.18
+ RFC3066 2.1.5
+ Table 2.1.11
+ UCS 2.1.6
+ Unicode Character 2.1.7
+ Unicode String 2.1.8
+ Valid Code Point 2.1.12
+ Variant Table 2.1.11
+ Zone Variant 2.1.17
+
+8. Acknowledgments
+
+ The authors gratefully acknowledge the contributions of:
+
+ - V. CHEN, N. HSU, H. HOTTA, S. TASHIRO, Y. YONEYA, and other Joint
+ Engineering Team members at the JET meeting in Bangkok, Thailand.
+
+ - Yves Arrouye, an observer at the JET meeting in Bangkok, for his
+ contribution on the IDL Package.
+
+ - Those who commented on, and made suggestions about, earlier
+ versions, including Harald ALVESTRAND, Erin CHEN, Patrik
+ FALTSTROM, Paul HOFFMAN, Soobok LEE, LEE Xiaodong, MAO Wei, Erik
+ NORDMARK, and L.M. TSENG.
+
+
+
+
+
+
+
+
+
+
+Konishi, et al. Informational [Page 28]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+9. References
+
+9.1. Normative References
+
+ [ABNF] Crocker, D. and P. Overell, Eds., "Augmented BNF for
+ Syntax Specifications: ABNF", RFC 2234, November
+ 1997.
+
+ [STD13] Mockapetris, P., "Domain names concepts and
+ facilities" STD 13, RFC 1034, November 1987.
+ Mockapetris, P., "Domain names implementation and
+ specification", STD 13, RFC 1035, November 1987.
+
+ [RFC3066] Alvestrand, H., "Tags for the Identification of
+ Languages," BCP 47, RFC 3066, January 2001.
+
+ [IDNA] Faltstrom, P., Hoffman, P. and A. M. Costello,
+ "Internationalizing Domain Names in Applications
+ (IDNA)", RFC 3490, March 2003.
+
+ [PUNYCODE] Costello, A.M., "Punycode: A Bootstring encoding of
+ Unicode for Internationalized Domain Names in
+ Applications (IDNA)", RFC 3492, March 2003.
+
+ [STRINGPREP] Hoffman, P. and M. Blanchet, "Preparation of
+ Internationalized Strings ("stringprep")", RFC 3454,
+ December 2002.
+
+ [NAMEPREP] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
+ Profile for Internationalized Domain Names (IDN)",
+ RFC 3491, March 2003.
+
+ [IS10646] A product of ISO/IEC JTC1/SC2/WG2, Work Item
+ JTC1.02.18 (ISO/IEC 10646). It is a multipart
+ standard: Part 1, published as ISO/IEC 10646-
+ 1:2000(E), covers the Architecture and Basic
+ Multilingual Plane, and Part 2, published as ISO/IEC
+ 10646-2:2001(E), covers the supplementary
+ (additional) planes.
+
+ [UNIHAN] Unicode Han Database, Unicode Consortium
+ ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt.
+
+ [UNICODE] The Unicode Consortium, "The Unicode Standard Version
+ 3.0," ISBN 0-201-61633-5. Unicode Standard Annex #28
+ (http://www.unicode.org/unicode/reports/tr28/)
+ defines Version 3.2 of the Unicode Standard, which is
+ definitive for IDNA and this document.
+
+
+
+Konishi, et al. Informational [Page 29]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+ [ISO7098] ISO 7098;1991 Information and documentation
+ Romanization of Chinese, ISO/TC46/SC2.
+
+9.2. Informative References
+
+ [IANA-LVTABLES] Internet Assigned Numbers Authority (IANA), IDN
+ Character Registry.
+ http://www.iana.org/assignments/idn/
+
+ [IDN-WG] IETF Internationalized Domain Names Working Group,
+ now concluded,idn@ops.ietf.org, James Seng, Marc
+ Blanchet, co-chairs, http://www.i-d-n.net/.
+
+ [UDRP] ICANN, "Uniform Domain Name Dispute Resolution
+ Policy", October 1999,
+ http://www.icann.org/udrp/udrp-policy-24oct99.htm
+
+ [ISO639] "ISO 639:1988 (E/F) Code for the representation of names
+ of languages", International Organization for
+ Standardization, 1st edition, 1988-04-01.
+
+10. Contributors
+
+ The formal responsibility for this document and the ideas it contains
+ lie with K. Koniski, K. Huang, H. Qian, and Y. Ko. These authors are
+ listed on the first page as authors of record, and they are the
+ appropriate the long-term contacts for questions and comments on this
+ RFC. On the other hand, J. Seng, J. Klensin, and W. Rickard served
+ as editors of the document, transcribing and translating the ideas of
+ the four authors and the teams they represented into the current
+ written form. They were the primary contacts during the editing
+ process, but not in the long term.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Konishi, et al. Informational [Page 30]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+10.1. Authors' Addresses
+
+ Kazunori KONISHI
+ JPNIC
+ Kokusai-Kougyou-Kanda Bldg 6F
+ 2-3-4 Uchi-Kanda, Chiyoda-ku
+ Tokyo 101-0047
+ Japan
+
+ Phone: +81 49-278-7313
+ EMail: konishi@jp.apan.net
+
+
+ Kenny HUANG
+ TWNIC
+ 3F, 16, Kang Hwa Street, Taipei
+ Taiwan
+
+ Phone: 886-2-2658-6510
+ EMail: huangk@alum.sinica.edu
+
+
+ QIAN Hualin
+ CNNIC
+ No.6 Branch-box of No.349 Mailbox, Beijing 100080
+ Peoples Republic of China
+
+ EMail: Hlqian@cnnic.net.cn
+
+
+ KO YangWoo
+ PeaceNet
+ Yangchun P.O. Box 81 Seoul 158-600
+ Korea
+
+ EMail: yw@mrko.pe.kr
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Konishi, et al. Informational [Page 31]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+10.2. Editors' Addresses
+
+ James SENG
+ 180 Lompang Road
+ #22-07 Singapore 670180
+ Phone: +65 9638-7085
+
+ EMail: jseng@pobox.org.sg
+
+
+ John C KLENSIN
+ 1770 Massachusetts Avenue, No. 322
+ Cambridge, MA 02140
+ U.S.A.
+
+ EMail: Klensin+ietf@jck.com
+
+
+ Wendy RICKARD
+ The Rickard Group
+ 16 Seminary Ave
+ Hopewell, NJ 08525
+ USA
+
+ EMail: rickard@rickardgroup.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Konishi, et al. Informational [Page 32]
+
+RFC 3743 JET Guidelines for IDN April 2004
+
+
+11. Full Copyright Statement
+
+ Copyright (C) The Internet Society (2004). This document is subject
+ to the rights, licenses and restrictions contained in BCP 78 and
+ except as set forth therein, the authors retain all their rights.
+
+ This document and the information contained herein are provided on an
+ "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+ OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+ ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+ INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+ INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+ WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+ The IETF takes no position regarding the validity or scope of any
+ Intellectual Property Rights or other rights that might be claimed to
+ pertain to the implementation or use of the technology described in
+ this document or the extent to which any license under such rights
+ might or might not be available; nor does it represent that it has
+ made any independent effort to identify any such rights. Information
+ on the procedures with respect to rights in RFC documents can be
+ found in BCP 78 and BCP 79.
+
+ Copies of IPR disclosures made to the IETF Secretariat and any
+ assurances of licenses to be made available, or the result of an
+ attempt made to obtain a general license or permission for the use of
+ such proprietary rights by implementers or users of this
+ specification can be obtained from the IETF on-line IPR repository at
+ http://www.ietf.org/ipr.
+
+ The IETF invites any interested party to bring to its attention any
+ copyrights, patents or patent applications, or other proprietary
+ rights that may cover technology that may be required to implement
+ this standard. Please address the information to the IETF at ietf-
+ ipr@ietf.org.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is currently provided by the
+ Internet Society.
+
+
+
+
+
+
+
+
+
+Konishi, et al. Informational [Page 33]
+