Diffstat (limited to 'doc/rfc/rfc6912.txt')
-rw-r--r-- | doc/rfc/rfc6912.txt | 675 |
1 files changed, 675 insertions, 0 deletions
diff --git a/doc/rfc/rfc6912.txt b/doc/rfc/rfc6912.txt new file mode 100644 index 0000000..0dbe064 --- /dev/null +++ b/doc/rfc/rfc6912.txt @@ -0,0 +1,675 @@ + + + + + + +Internet Architecture Board (IAB) A. Sullivan +Request for Comments: 6912 Dyn, Inc. +Category: Informational D. Thaler +ISSN: 2070-1721 Microsoft + J. Klensin + + O. Kolkman + NLnet Labs + April 2013 + + + Principles for Unicode Code Point Inclusion in Labels in the DNS + +Abstract + + Internationalized Domain Names in Applications (IDNA) makes available + to DNS zone administrators a very wide range of Unicode code points. + Most operators of zones should probably not permit registration of + U-labels using the entire range. This is especially true of zones + that accept registrations across organizational boundaries, such as + top-level domains and, most importantly, the root. It is + unfortunately not possible to generate algorithms to determine + whether permitting a code point presents a low risk. This memo + presents a set of principles that can be used to guide the decision + of whether a Unicode code point may be wisely included in the + repertoire of permissible code points in a U-label in a zone. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This document is a product of the Internet Architecture Board (IAB) + and represents information that the IAB has deemed valuable to + provide for permanent record. It represents the consensus of the + Internet Architecture Board (IAB). Documents approved for + publication by the IAB are not a candidate for any level of Internet + Standard; see Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc6912. + + + + + + + + + +Sullivan, et al. 
Informational [Page 1] + +RFC 6912 DNS Zone Code Point Principles April 2013 + + +Copyright Notice + + Copyright (c) 2013 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. + +Table of Contents + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 + 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 + 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4 + 2.1. More-Restrictive Rules Going Up the DNS Tree . . . . . . 6 + 3. Principles Applicable to All Zones . . . . . . . . . . . . . 6 + 3.1. Longevity Principle . . . . . . . . . . . . . . . . . . . 6 + 3.2. Least Astonishment Principle . . . . . . . . . . . . . . 6 + 3.3. Contextual Safety Principle . . . . . . . . . . . . . . . 7 + 4. Principles Applicable to All Public Zones . . . . . . . . . . 7 + 4.1. Conservatism Principle . . . . . . . . . . . . . . . . . 7 + 4.2. Inclusion Principle . . . . . . . . . . . . . . . . . . . 7 + 4.3. Simplicity Principle . . . . . . . . . . . . . . . . . . 7 + 4.4. Predictability Principle . . . . . . . . . . . . . . . . 8 + 4.5. Stability Principle . . . . . . . . . . . . . . . . . . . 8 + 5. Principle Specific to the Root Zone . . . . . . . . . . . . . 8 + 5.1. Letter Principle . . . . . . . . . . . . . . . . . . . . 8 + 6. Confusion and Context . . . . . . . . . . . . . . . . . . . . 9 + 7. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . 9 + 8. Security Considerations . . . . . . . . . . . . . . . . . . . 10 + 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 10 + 10. IAB Members at the Time of Approval . . . . . . . . . . . . . 10 + 11. 
Informative References . . . . . . . . . . . . . . . . . . . 10 + + + + + + + + + + + + + + + +Sullivan, et al. Informational [Page 2] + +RFC 6912 DNS Zone Code Point Principles April 2013 + + +1. Introduction + + Operators of a DNS zone need to set policies around what Unicode code + points are allowed in labels in that zone. Typically there are a + number of important goals to consider when constructing such + policies. These include, for instance, avoiding possible visual + confusability between two labels, avoiding possible confusion between + Fully Qualified Domain Names (FQDNs) and IP address literals, + accessibility to the disabled (see "Web Content Accessibility + Guidelines (WCAG) 2.0" [WCAG20] for some discussion in a web + context), and other usability issues. + + This document provides a set of principles that zone operators can + use to construct their code point policies in order to improve + usability and clarity and thereby reduce confusion. + +1.1. Terminology + + This document uses the following terms. + + A-label: an LDH label that starts with "xn--" and meets all the + IDNA requirements, with additional restrictions as explained in + Section 2.3.2.1 of the IDNA Definitions document [RFC5890]. + + Character: a member of a set of elements used for the + organization, control, or representation of data. See Section 2 + of the Internationalization Terminology document [RFC6365] for + more details. + + Language: a way that humans communicate. The use of language + occurs in many forms, the most common of which are speech, + writing, and signing. See Section 2 of RFC 6365 for more details. + + LDH label: a string consisting of ASCII letters, digits, and the + hyphen, with additional restrictions as explained in Section 2.3.1 + of RFC 5890. + + Public zone: in this document, a DNS zone that accepts + registration requests from organizations outside the zone + administrator's own organization. 
(Whether the zone performs + delegation is a separate question. What is important is the + diversity of the registration-requesting community.) Note that + under this definition, the root zone is a public zone, though one + that has a unique function in the DNS. + + Rendering: the display of a string of text. See Section 5 of RFC + 6365 for more details. + + + + +Sullivan, et al. Informational [Page 3] + +RFC 6912 DNS Zone Code Point Principles April 2013 + + + Script: a set of graphic characters used for the written form of + one or more languages. See Section 2 of RFC 6365 for more + details. + + U-label: a string of Unicode characters that meets all the IDNA + requirements and includes at least one non-ASCII character, with + additional restrictions as explained in Section 2.3.2.1 of RFC + 5890. + + Writing system: a set of rules for using one or more scripts to + write a particular language. See Section 2 of RFC 6365 for more + details. + + This memo does not propose a protocol standard, and the use of words + such as "should" follow the ordinary English meaning, and not that + laid out in [RFC2119]. + +2. Background + + In recent communications [IABCOMM1] [IABCOMM2], the IAB has + emphasized the importance of conservatism in allocating labels + conforming to IDNA2008 [RFC5890] [RFC5891] [RFC5892] [RFC5893] + [RFC5894] [RFC5895] in DNS zones, and especially in the root zone. + Traditional LDH labels in the root zone used only alphabetic + characters (i.e., ASCII a-z, which under the DNS also match A-Z). + Matters are more complicated with U-labels, however. The IAB + communications recommended that U-labels permit only code points with + a General_Category (gc) of Ll (Lowercase_Letter), Lo (Other_Letter), + or Lm (Modifier_Letter), but noted that for practical considerations + other code points might be permitted on a case-by-case basis. + + The IAB recommendations do, however, leave some issues open that need + to be addressed. 
It is not clear that all code points permitted + under IDNA2008 that have a General_Category of Lo or Lm are + appropriate for a zone such as the root zone. To take but one + example, the code point U+02BC (MODIFIER LETTER APOSTROPHE) has a + General_Category of Lm. In practically every rendering (and we are + unaware of an exception), U+02BC is indistinguishable from U+2019 + (RIGHT SINGLE QUOTATION MARK), which has a General_Category of Pf + (Final_Punctuation). U+02BC will also be read by large numbers of + people as being the same character as U+0027 (APOSTROPHE), which has + a General_Category of Po (Other_Punctuation), and some computer + systems may treat U+02BC as U+0027. U+02BC is PROTOCOL VALID + (PVALID) under IDNA2008 (see the IDNA Code Points document + [RFC5892]), whereas both other code points are DISALLOWED. So, to + begin with, it is plain that not every code point with a + + + + + +Sullivan, et al. Informational [Page 4] + +RFC 6912 DNS Zone Code Point Principles April 2013 + + + General_Category of Ll, Lo, or Lm is consistent with the type of + conservatism principle discussed in Section 4.1 below or the previous + IAB recommendations. + + To make matters worse, some languages are dependent on code points + with General_Category Mc (Spacing_Mark) or General_Category Mn + (Nonspacing_Mark). This dependency is particularly common in Indic + languages, though not exclusive to them. (At the risk of vastly + oversimplifying, the overarching issue is mostly the interaction of + complex writing systems and the way Unicode works.) To restrict + users of those languages to only code points with General_Category of + Ll, Lo, or Lm would be extremely limiting. While DNS labels are not + words, or sentences, or phrases (as noted in the next steps for IDN + [RFC4690]), they are intended to support useful mnemonics. 
Mnemonics + that diverge wildly from the usual conventions are poor ones, because + in not following the usual conventions they are not easy to remember. + Also, wide divergence from usual conventions, if not well-justified + (and especially in a shared namespace like the root), invites + political controversy. + + Many of the issues above turn out to be relevant to all public zones. + Moreover, the overall issue of developing a policy for code point + permission is common to all zones that accept A-labels or U-labels + for registration. As Section 4.3 of the IDNA Protocol document + [RFC5891] says, every registry at every level of the DNS is "expected + to establish policies about label registrations". + + For reasons of sound management, it is not desirable to decide + whether to permit a given code point only when an application + containing that code point is pending. That approach reduces + predictability and is bound to appear subject to special pleas. It + is better instead to produce the rules governing acceptance of code + points in advance. + + As is evident from the foregoing discussion about the Letter and Mark + categories, it is simply not possible to make code point decisions + algorithmically. If it were possible to develop such an algorithm, + it would already exist: the DNS is hardly unique in needing to impose + restrictions on code points while accommodating many different + linguistic communities. Nevertheless, new guidelines can be made by + starting from overarching principles. These guidelines act more as + meta-rules, leading to the establishment of other rules about the + inclusion and exclusion of particular code points in labels in a + given zone, always based on the list of code points permitted by + IDNA. + + + + + + +Sullivan, et al. Informational [Page 5] + +RFC 6912 DNS Zone Code Point Principles April 2013 + + +2.1. 
More-Restrictive Rules Going Up the DNS Tree + + A set of principles derived from the above ideas follows in Sections + 3 through 5 below. Such principles fall into three categories. Some + principles apply to every DNS zone. Some additional principles apply + to all public zones, including the root zone. Finally, other + principles apply only to the root zone. This means that zones higher + in the DNS tree tend to have more restrictive rules (since additional + principles apply), and zones lower in the DNS tree tend to have less + restrictive rules, since they are used within a more narrow context. + In general, the relevant context for a principle is that of the zone, + not that of a given subset of the user community; for the root zone, + for example, the context is "the entire Internet population". + +3. Principles Applicable to All Zones + +3.1. Longevity Principle + + Unicode properties of a code point ought to be stable across the + versions of Unicode that users of the zone are likely to have + installed. Because it is possible for the properties of a code point + to change between Unicode versions, a good way to predict such + stability is to ensure that a code point has in fact been stable for + multiple successive versions of Unicode. This principle is related + to the Stability Principle in Section 4.5. + + The more diverse the community using the zone, the greater the + importance of following this principle. The policy for a leaf zone + in the DNS might only require stability across two Unicode versions, + whereas a more public zone might require stability across four or + more releases before the code point's properties are considered long- + lived and stable. + +3.2. Least Astonishment Principle + + Every zone administrator should be sensitive to the likely use of a + code point to be permitted, particularly taking into account the + population likely to use the zone. 
Zone administrators should + especially consider whether a candidate code point could present + difficulty if the code point is encountered outside the usual + linguistic circumstances. By the same token, the failure to support + a code point that is normal in some linguistic circumstances could be + very surprising for users likely to encounter the names in that + circumstance. + + + + + + + +Sullivan, et al. Informational [Page 6] + +RFC 6912 DNS Zone Code Point Principles April 2013 + + +3.3. Contextual Safety Principle + + Every zone administrator should be sensitive to ways in which a code + point that is permitted could be used in support of malicious + activity. This is not a completely new problem: the digit 1 and the + lowercase letter l are, for instance, easily confused in many + contexts. The very large repertoire of code points in Unicode (even + just the subset permitted for IDNs) makes the problem somewhat worse, + just because of the scale. + +4. Principles Applicable to All Public Zones + +4.1. Conservatism Principle + + Public zones are, by definition, zones that are shared by different + groups of people. Therefore, any decision to permit a code point in + a public zone (including the root) should be as conservative as + practicable. Doubts should always be resolved in favor of rejecting + a code point for inclusion rather than in favor of including it, in + order to minimize risk. + +4.2. Inclusion Principle + + Just as IDNA2008 starts from the principle that the Unicode range is + excluded, and then adds code points according to derived properties + of the code points, so a public zone should only permit inclusion of + a code point if it is known to be "safe" in terms of usability and + confusability within the context of that zone. The default treatment + of a code point should be that it is excluded. + +4.3. 
Simplicity Principle + + The rules for determining whether a code point is to be included + should be simple enough that they are readily understood by someone + with a moderate background in the DNS and Unicode issues. This + principle does not mean that a completely naive person needs to be + able to understand the rationale for including a code point, but it + does mean that if the reason for inclusion of a very peculiar code + point, even a safe one, is too difficult to understand, the code + point would not be permitted. + + The meaning of "simple" or "readily understood" is context-dependent. + For instance, the root zone has to serve everyone in the world; for + practical purposes, this means that the reasons for including a code + point need to be comprehensible even to people who cannot use the + script where the code point is found. In a zone that permits a + constrained subset of Unicode characters (for instance, only those + needed to write a single alphabetic language) and that supports a + + + +Sullivan, et al. Informational [Page 7] + +RFC 6912 DNS Zone Code Point Principles April 2013 + + + clearly delineated linguistic community (for instance, the speakers + of a single language with well-understood written conventions), more + complicated rules might be acceptable. Compare this principle with + the Least Astonishment Principle in Section 3.2. + +4.4. Predictability Principle + + The rules for determining whether a code point is to be included + should be predictable enough that those with the requisite + understanding of DNS, IDNA, and Unicode will usually reach the same + conclusion. This is not a requirement for algorithmic treatment of + code points; as previously noted, that is not possible. Rather, it + is to say that the consistent application of professional judgment is + likely to yield the same results; combined with the principle in + Section 4.1, when results are not predictable, the anomalous code + point would not be permitted. 
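As a rough illustration of the default-exclusion idea behind Sections 4.1 and 4.2, the sketch below models a zone's repertoire as an explicit list and rejects everything else. The repertoire shown (lowercase ASCII letters only) and the function names are hypothetical and chosen only for the example; they are not taken from this memo. The sketch also prints the General_Category of the three code points discussed in Section 2, which suggests why a category test alone would not be a sufficient rule: U+02BC has General_Category Lm yet is still excluded here because it is not listed.

```python
import unicodedata

# Hypothetical repertoire for an illustrative zone: lowercase ASCII letters.
# A real repertoire would come out of policy review, not out of a script.
PERMITTED = frozenset(map(ord, "abcdefghijklmnopqrstuvwxyz"))


def is_permitted(code_point: int) -> bool:
    """Default treatment is exclusion: only listed code points pass."""
    return code_point in PERMITTED


def label_is_permitted(label: str) -> bool:
    """A label passes only if it is non-empty and every code point is listed."""
    return bool(label) and all(is_permitted(ord(ch)) for ch in label)


def describe(code_point: int) -> str:
    """Report name, General_Category, and repertoire status of a code point."""
    ch = chr(code_point)
    return (
        f"U+{code_point:04X} {unicodedata.name(ch)} "
        f"gc={unicodedata.category(ch)} "
        f"permitted={is_permitted(code_point)}"
    )


if __name__ == "__main__":
    # 'a', plus the three apostrophe-like code points from Section 2.
    for cp in (0x0061, 0x02BC, 0x2019, 0x0027):
        print(describe(cp))
```

The design choice worth noting is that the check never consults General_Category when deciding; the category is printed for information only, and the decision rests entirely on the explicit list.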
+ + Just as in Section 4.3, this principle tends to cause more + restriction the more diverse the community using the zone; it is most + restrictive for the root zone. This is because what is predictable + within a given language community is possibly very surprising across + languages. + +4.5. Stability Principle + + Once a code point is permitted, it is at least very hard to stop + permitting that code point. In public zones (including the root), + the list of code points to be permitted should change very slowly, if + at all, and usually only in the direction of permitting an addition + as time and experience indicate that inclusion of such a code point + is both safe and consistent with these principles. + +5. Principle Specific to the Root Zone + +5.1. Letter Principle + + "Requirements for Internet Hosts - Application and Support" [RFC1123] + notes that top-level labels "will be alphabetic". In the absence of + widespread agreement about the force of that note, prudence suggests + that U-labels in the root zone should exclude code points that are + not normally used to write words, or that are in some cases normally + used for purposes other than writing words. This is not the same as + using Unicode's General_Category to include only letters. It is a + restriction that expands the possible class of included code points + beyond the Unicode letters, but only expands so far as to include the + things that are normally used to create words. Under this principle, + code points with (for example) General_Category Mn (Nonspacing_Mark) + might be included -- but only those that are used to write words and + + + +Sullivan, et al. Informational [Page 8] + +RFC 6912 DNS Zone Code Point Principles April 2013 + + + not (for instance) musical symbols. In addition, such marks should + only be used within a label in ways that they would be used when + making a word: combinations that would be nonsense when used in a + word should also be rejected when tried in DNS labels. 
This + principle should be applied as narrowly as possible; as the next + steps for IDN document [RFC4690] says, "While DNS labels may + conveniently be used to express words in many circumstances, the goal + is not to express words (or sentences or phrases), but to permit the + creation of unambiguous labels with good mnemonic value". + +6. Confusion and Context + + While many discussions of confusion have focused on characters, e.g., + whether two characters are confusable with each other (and under what + circumstances), a focus on characters alone could lead to the + prohibition of very large numbers of labels, including many that + present little risk. Instead, the focus should be on whether one + label is confusable with another. For example, if a label contains + several characters that are distinct to a particular script, and all + of its characters are from that script, it is inherently not + confusable with a label from any other script no matter what other + characters might appear in it. Another label that lacks those + distinguishing characters might be a problem. The notion extends + from labels to domain names, in the sense that distinguishing + characters used in a higher-level label may set expectations with + respect to the characters in the lower-level labels. This + expectation might be regarded as a benefit, but it is also a problem, + since there is no technical way to require consistent policies in + delegated namespaces. + +7. Conclusion + + The principles outlined in this document can be applied when + considering any range of Unicode code points for possible inclusion + in a DNS zone. It is worth observing that doing anything (especially + in light of Section 4.5) implicitly disadvantages communities with a + writing system not yet well understood and not represented in the + technical and policy communities involved in the discussion. 
That + disadvantage is to be guarded against as much as practical, but is + effectively impossible to prevent (while still taking action) in + light of imperfect human knowledge. + + + + + + + + + + +Sullivan, et al. Informational [Page 9] + +RFC 6912 DNS Zone Code Point Principles April 2013 + + +8. Security Considerations + + The principles outlined in this memo are intended to improve + usability and clarity and thereby reduce confusion among different + labels. While these principles may contribute to reduction of risk, + they are not sufficient to provide a comprehensive + internationalization policy for zone management. + + Additional discussion of security considerations can be found in the + Unicode Security Considerations [UTR36]. + +9. Acknowledgements + + The authors thank the participants in the IAB Internationalization + program for the discussion of the ideas in this memo, particularly + Marc Blanchet. In addition, Stephane Bortzmeyer, Paul Hoffman, + Daniel Kalchev, Panagiotis Papaspiliopoulos, and Vaggelis Segredakis + made specific comments. + +10. IAB Members at the Time of Approval + + Bernard Aboba + Jari Arkko + Marc Blanchet + Ross Callon + Alissa Cooper + Spencer Dawkins + Joel Halpern + Russ Housley + David Kessens + Danny McPherson + Jon Peterson + Dave Thaler + Hannes Tschofenig + +11. Informative References + + [IABCOMM1] Internet Architecture Board, "IAB Statement: 'The + interpretation of rules in the ICANN gTLD Applicant + Guidebook'", February 2012, <http://www.iab.org/ + documents/correspondence-reports-documents/201/>. + + [IABCOMM2] Internet Architecture Board, "Response to ICANN questions + concerning 'The interpretation of rules in the ICANN gTLD + Applicant Guidebook'", March 2012, <http://www.iab.org/ + documents/correspondence-reports-documents/201/>. + + + + + +Sullivan, et al. 
Informational [Page 10] + +RFC 6912 DNS Zone Code Point Principles April 2013 + + + [RFC1123] Braden, R., "Requirements for Internet Hosts - Application + and Support", STD 3, RFC 1123, October 1989. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and + Recommendations for Internationalized Domain Names + (IDNs)", RFC 4690, September 2006. + + [RFC5890] Klensin, J., "Internationalized Domain Names for + Applications (IDNA): Definitions and Document Framework", + RFC 5890, August 2010. + + [RFC5891] Klensin, J., "Internationalized Domain Names in + Applications (IDNA): Protocol", RFC 5891, August 2010. + + [RFC5892] Faltstrom, P., "The Unicode Code Points and + Internationalized Domain Names for Applications (IDNA)", + RFC 5892, August 2010. + + [RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for + Internationalized Domain Names for Applications (IDNA)", + RFC 5893, August 2010. + + [RFC5894] Klensin, J., "Internationalized Domain Names for + Applications (IDNA): Background, Explanation, and + Rationale", RFC 5894, August 2010. + + [RFC5895] Resnick, P. and P. Hoffman, "Mapping Characters for + Internationalized Domain Names in Applications (IDNA) + 2008", RFC 5895, September 2010. + + [RFC6365] Hoffman, P. and J. Klensin, "Terminology Used in + Internationalization in the IETF", BCP 166, RFC 6365, + September 2011. + + [UTR36] Davis, M. and M. Suignard, "Unicode Security + Considerations", Unicode Technical Report #36, July 2012. + + [WCAG20] W3C, "Web Content Accessibility Guidelines (WCAG) 2.0", + W3C Recommendation, December 2008, + <http://www.w3.org/TR/2008/REC-WCAG20-20081211/>. + + + + + + + + +Sullivan, et al. Informational [Page 11] + +RFC 6912 DNS Zone Code Point Principles April 2013 + + +Authors' Addresses + + Andrew Sullivan + Dyn, Inc. 
+ 150 Dow St + Manchester, NH 03101 + USA + + EMail: asullivan@dyn.com + + + Dave Thaler + Microsoft + One Microsoft Way + Redmond, WA 98052 + USA + + EMail: dthaler@microsoft.com + + + John C Klensin + 1770 Massachusetts Ave, Ste 322 + Cambridge, MA 02140 + USA + + Phone: +1 617 491 5735 + EMail: john-ietf@jck.com + + + Olaf Kolkman + NLnet Labs + Science Park 400 + Amsterdam 1098 XH + The Netherlands + + EMail: olaf@NLnetLabs.nl + + + + + + + + + + + + + + + +Sullivan, et al. Informational [Page 12] + |