1 files changed, 899 insertions, 0 deletions
diff --git a/doc/rfc/rfc3696.txt b/doc/rfc/rfc3696.txt
new file mode 100644
index 0000000..d56d567
--- /dev/null
+++ b/doc/rfc/rfc3696.txt
@@ -0,0 +1,899 @@
+
+
+
+
+
+
+Network Working Group                                         J. Klensin
+Request for Comments: 3696                                 February 2004
+Category: Informational
+
+
+    Application Techniques for Checking and Transformation of Names
+
+Status of this Memo
+
+   This memo provides information for the Internet community.  It does
+   not specify an Internet standard of any kind.  Distribution of this
+   memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2004).  All Rights Reserved.
+
+Abstract
+
+   Many Internet applications have been designed to deduce top-level
+   domains (or other domain name labels) from partial information.  The
+   introduction of new top-level domains, especially non-country-code
+   ones, has exposed flaws in some of the methods used by these
+   applications.  These flaws make it more difficult, or impossible, for
+   users of the applications to access the full Internet.  This memo
+   discusses some of the techniques that have been used and gives some
+   guidance for minimizing their negative impact as the domain name
+   environment evolves.  This document draws summaries of the applicable
+   rules together in one place and supplies references to the actual
+   standards.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Klensin                      Informational                      [Page 1]
+
+RFC 3696          Checking and Transformation of Names     February 2004
+
+
+Table of Contents
+
+   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  2
+   2.  Restrictions on domain (DNS) names . . . . . . . . . . . . . .  3
+   3.  Restrictions on email addresses  . . . . . . . . . . . . . . .  5
+   4.  URLs and URIs  . . . . . . . . . . . . . . . . . . . . . . . .  7
+       4.1.  URI syntax definitions and issues  . . . . . . . . . . .  7
+       4.2.  The HTTP URL . . . . . . . . . . . . . . . . . . . . . .  8
+       4.3.  The MAILTO URL . . . . . . . . . . . . . . . . . . . . .  9
+       4.4.  Guessing domain names in web contexts  . . . . . . . . . 11
+   5.  Implications of internationalization . . . . . . . . . . . . . 11
+   6.  Summary  . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
+   7.  Security Considerations  . . . . . . . . . . . . . . . . . . . 13
+   8.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13
+   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 14
+       9.1.  Normative References . . . . . . . . . . . . . . . . . . 14
+       9.2.  Informative References . . . . . . . . . . . . . . . . . 15
+   10. Author's Address . . . . . . . . . . . . . . . . . . . . . . . 15
+   11. Full Copyright Statement . . . . . . . . . . . . . . . . . . . 16
+
+1.  Introduction
+
+   Designers of user interfaces to Internet applications have often
+   found it useful to examine user-provided values for validity before
+   passing them to the Internet tools themselves.  This type of test,
+   most commonly involving syntax checks or application of other rules
+   to domain names, email addresses, or "web addresses" (URLs or,
+   occasionally, extended URI forms (see Section 4)) may enable better-
+   quality diagnostics for the user than might be available from the
+   protocol itself.  Local validity tests on values are also thought to
+   improve the efficiency of back-office processing programs and to
+   reduce the load on the protocols themselves.  Certainly, they are
+   consistent with the well-established principle that it is better to
+   detect errors as early as possible.
+
+   The tests must, however, be made correctly or at least safely.  If
+   criteria are applied that do not match the protocols, users will be
+   inconvenienced, addresses and sites will effectively become
+   inaccessible to some groups, and business and communications
+   opportunities will be lost.  Experience in recent years indicates
+   that syntax tests are often performed incorrectly and that tests for
+   top-level domain names are applied using obsolete lists and
+   conventions.  We assume that most of these incorrect tests are the
+   result of the inability to conveniently locate exact definitions for
+   the criteria to be applied.  This document draws summaries of the
+   applicable rules together in one place and supplies references to the
+   actual standards.  It does not add anything to those standards; it
+   merely draws the information together into a form that may be more
+   accessible.
+
+   Many experts on Internet protocols believe that tests and rules of
+   these sorts should be avoided in applications and that the tests in
+   the protocols and back-office systems should be relied on instead.
+   Certainly implementations of the protocols cannot assume that the
+   data passed to them will be valid.  Unless the standards specify
+   particular behavior, this document takes no position on whether or
+   not the testing is desirable.  It only identifies the correct tests
+   to be made if tests are to be applied.
+
+   The sections that follow discuss domain names, email addresses, and
+   URLs.
+
+2.  Restrictions on domain (DNS) names
+
+   The authoritative definitions of the format and syntax of domain
+   names appear in RFCs 1035 [RFC1035], 1123 [RFC1123], and 2181
+   [RFC2181].
+
+   Any characters, or combination of bits (as octets), are permitted in
+   DNS names.  However, there is a preferred form that is required by
+   most applications.  This preferred form has been the only one
+   permitted in the names of top-level domains, or TLDs.  In general, it
+   is also the only form permitted in most second-level names registered
+   in TLDs, although some names that are normally not seen by users obey
+   other rules.  It derives from the original ARPANET rules for the
+   naming of hosts (i.e., the "hostname" rule) and is perhaps better
+   described as the "LDH rule", after the characters that it permits.
+   The LDH rule, as updated, provides that the labels (words or strings
+   separated by periods) that make up a domain name must consist of only
+   the ASCII [ASCII] alphabetic and numeric characters, plus the hyphen.
+   No other symbols or punctuation characters are permitted, nor is
+   blank space.  If the hyphen is used, it is not permitted to appear at
+   either the beginning or end of a label.  There is an additional rule
+   that essentially requires that top-level domain names not be all-
+   numeric.
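+
+   As an illustration only, the LDH rule described above might be
+   checked with a sketch like the following Python.  The names
+   is_ldh_label and tld_is_acceptable are hypothetical helpers
+   introduced here; they are not defined by any of the referenced
+   standards, and label length limits are not considered.
+
```python
import re

# One label under the LDH rule: ASCII letters, digits, and hyphens only,
# with no hyphen at the beginning or end of the label.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` satisfies the LDH rule (hypothetical helper)."""
    return LDH_LABEL.fullmatch(label) is not None


def tld_is_acceptable(tld: str) -> bool:
    """A top-level label must be LDH and must not be all-numeric."""
    return is_ldh_label(tld) and not tld.isdigit()
```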
+
+   When it is necessary to express labels with non-character octets, or
+   to embed periods within labels, there is a mechanism for keying them
+   in that utilizes an escape sequence.  RFC 1035 [RFC1035] should be
+   consulted if that mechanism is needed (most common applications,
+   including email and the Web, will generally not permit those escaped
+   strings).  A special encoding is now available for non-ASCII
+   characters; see the brief discussion in Section 5.
+
+   Most Internet applications that reference other hosts or systems
+   assume they will be supplied with "fully-qualified" domain names,
+   i.e., ones that include all of the labels leading to the root,
+   including the TLD name.  Those fully-qualified domain names are then
+   passed to either the domain name resolution protocol itself or to the
+   remote systems.  Consequently, purported DNS names to be used in
+   applications and to locate resources generally must contain at least
+   one period (".") character.  Those that do not are either invalid or
+   require the application to supply additional information.  Of course,
+   this principle does not apply when the purpose of the application is
+   to process or query TLD names themselves.  The DNS specification also
+   permits a trailing period to be used to denote the root, e.g.,
+   "a.b.c" and "a.b.c." are equivalent, but the latter is more explicit
+   and is required to be accepted by applications.  This convention is
+   especially important when a TLD name is being referred to directly.
+   For example, while ".COM" has become the popular terminology for
+   referring to that top-level domain, "COM." would be strictly and
+   technically correct in talking about the DNS, since it shows that
+   "COM" is a top-level domain name.
+
+   There is a long history of applications moving beyond the "one or
+   more periods" test in an attempt to verify that a valid TLD name is
+   actually present.  They have done this either by applying some
+   heuristics to the form of the name or by consulting a local list of
+   valid names.  The historical heuristics are no longer effective.  If
+   one is to keep a local list, much more effort must be devoted to
+   keeping it up-to-date than was the case several years ago.
+
+   The heuristics were based on the observation that, since the DNS was
+   first deployed, all top-level domain names were two, three, or four
+   characters in length.  All two-character names were associated with
+   "country code" domains, with the specific labels (with a few early
+   exceptions) drawn from the ISO list of codes for countries and
+   similar entities [ISO.3166.1988].  The three-letter names were
+   "generic" TLDs, whose function was not country-specific, and there
+   was exactly one four-letter TLD, the infrastructure domain "ARPA."
+   [RFC1591].  However, these length-dependent rules were conventions,
+   rather than anything on which the protocols depended.
+
+   Before the mid-1990s, lists of valid top-level domain names changed
+   infrequently.  New country codes were gradually, and then more
+   rapidly, added as the Internet expanded, but the list of generic
+   domains did not change at all between the establishment of the "INT."
+   domain in 1988 and ICANN's allocation of new generic TLDs in 2000.
+   Some application developers responded by assuming that any two-letter
+   domain name could be valid as a TLD, but the list of generic TLDs was
+   fixed and could be kept locally and tested.  Several of these
+   assumptions changed as ICANN started to allocate new top-level
+   domains: one two-letter domain that does not appear in the ISO 3166-1
+   table [ISO.3166.1988] was tentatively approved, and new domains were
+   created with three-, four-, and even six-letter codes.
+
+   As of the first quarter of 2003, the list of valid, non-country,
+   top-level domains was .AERO, .BIZ, .COM, .COOP, .EDU, .GOV, .INFO,
+   .INT, .MIL, .MUSEUM, .NAME, .NET, .ORG, .PRO, and .ARPA.  ICANN is
+   expected to expand that list at regular intervals, so the list that
+   appears here should not be used in testing.  Instead, systems that
+   filter by testing top-level domain names should regularly update
+   their local tables of TLDs (both "generic" and country-code-related)
+   by polling the list published by IANA [DomainList].  It is
+   likely that the better strategy has now become to make the "at least
+   one period" test, to verify LDH conformance (including verification
+   that the apparent TLD name is not all-numeric), and then to use the
+   DNS to determine domain name validity, rather than trying to maintain
+   a local list of valid TLD names.
+
+   A DNS label may be no more than 63 octets long.  This is in the form
+   actually stored; if a non-ASCII label is converted to encoded
+   "punycode" form (see Section 5), the length of that form may restrict
+   the number of actual characters (in the original character set) that
+   can be accommodated.  A complete, fully-qualified, domain name must
+   not exceed 255 octets.
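+
+   The strategy suggested above (the "at least one period" test, LDH
+   conformance, a TLD that is not all-numeric, and the length limits)
+   could be sketched as follows.  This is an illustrative Python
+   example, not a definitive implementation: looks_like_fqdn is a
+   hypothetical name, the 255-octet limit is applied to the textual
+   form as an approximation, and a name that passes still has to be
+   looked up in the DNS to learn whether it actually exists.
+
```python
import re

LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")
MAX_LABEL_OCTETS = 63   # per-label limit stated in the text
MAX_NAME_OCTETS = 255   # whole-name limit stated in the text


def looks_like_fqdn(name: str) -> bool:
    """Local syntax check only; it says nothing about DNS existence."""
    # A single trailing period denotes the root and is accepted.
    if name.endswith("."):
        name = name[:-1]
    labels = name.split(".")
    # "At least one period": two or more labels are required.
    if len(labels) < 2:
        return False
    for label in labels:
        # Labels that pass the LDH pattern are ASCII, so len() is octets.
        if len(label) > MAX_LABEL_OCTETS:
            return False
        if LDH_LABEL.fullmatch(label) is None:
            return False
    # The apparent TLD must not be all-numeric.
    if labels[-1].isdigit():
        return False
    return len(name) <= MAX_NAME_OCTETS
```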
+
+   Some additional mechanisms for guessing correct domain names when
+   incomplete information is provided have been developed for use with
+   the web and are discussed in Section 4.4.
+
+3.  Restrictions on email addresses
+
+   Reference documents: RFC 2821 [RFC2821] and RFC 2822 [RFC2822]
+
+   Contemporary email addresses consist of a "local part" separated from
+   a "domain part" (a fully-qualified domain name) by an at-sign ("@").
+   The syntax of the domain part corresponds to that in the previous
+   section.  The concerns identified in that section about filtering and
+   lists of names apply to the domain names used in an email context as
+   well.  The domain name can also be replaced by an IP address in
+   square brackets, but that form is strongly discouraged except for
+   testing and troubleshooting purposes.
+
+   The local part may appear using the quoting conventions described
+   below.  The quoted forms are rarely used in practice, but are
+   required for some legitimate purposes.  Hence, they should not be
+   rejected in filtering routines but should instead be passed to the
+   email system for evaluation by the destination host.
+
+   The exact rule is that any ASCII character, including control
+   characters, may appear quoted, or in a quoted string.  When quoting
+   is needed, the backslash character is used to quote the following
+   character.  For example
+
+      Abc\@def@example.com
+
+   is a valid form of an email address.  Blank spaces may also appear,
+   as in
+
+      Fred\ Bloggs@example.com
+
+   The backslash character may also be used to quote itself, e.g.,
+
+      Joe.\\Blow@example.com
+
+   In addition to quoting using the backslash character, conventional
+   double-quote characters may be used to surround strings.  For example
+
+      "Abc@def"@example.com
+
+      "Fred Bloggs"@example.com
+
+   are alternate forms of the first two examples above.  These quoted
+   forms are rarely recommended, and are uncommon in practice, but, as
+   discussed above, must be supported by applications that are
+   processing email addresses.  In particular, the quoted forms often
+   appear in the context of addresses associated with transitions from
+   other systems and contexts; those transitional requirements do still
+   arise and, since a system that accepts a user-provided email address
+   cannot "know" whether that address is associated with a legacy
+   system, the address forms must be accepted and passed into the email
+   environment.
+
+   Without quotes, local-parts may consist of any combination of
+   alphabetic characters, digits, or any of the special characters
+
+      ! # $ % & ' * + - / = ? ^ _ ` { | } ~
+
+   A period (".") may also appear, but may not be used to start or end
+   the local part, nor may two or more consecutive periods appear.
+   Stated differently, any ASCII graphic (printing) character other than
+   the at-sign ("@"), backslash, double quote, comma, colon, semicolon,
+   parentheses, angle brackets, or square brackets may appear without
+   quoting.  If any of that list of excluded characters are to appear,
+   they must be quoted.  Forms such as
+
+      user+mailbox@example.com
+
+      customer/department=shipping@example.com
+
+      $A12345@example.com
+
+      !def!xyz%abc@example.com
+
+      _somename@example.com
+
+   are valid and are seen fairly regularly, but any of the characters
+   listed above are permitted.  In the context of local parts,
+   apostrophe ("'") and grave accent ("`") are ordinary characters, not
+   quoting characters.  Some of the characters listed above are used in
+   conventions about routing or other types of special handling by some
+   receiving hosts.  But, since there is no way to know whether the
+   remote host is using those conventions or just treating these
+   characters as normal text, sending programs (and programs evaluating
+   address validity) must simply accept the strings and pass them on.
+
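+   In addition to restrictions on syntax, there is a length limit on
+   email addresses.  That limit is a maximum of 64 characters (octets)
+   in the "local part" (before the "@") and a maximum of 255 characters
+   (octets) in the domain part (after the "@") for a total length of 320
+   characters.  In practice, RFC 2821 [RFC2821] also limits the path
+   carried in the MAIL and RCPT commands to 256 characters, including
+   the enclosing angle brackets, so an address longer than 254
+   characters is not normally usable.  Systems that handle email should
+   be prepared to process addresses up to those limits, even though
+   such long addresses are rarely encountered.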
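+
+   The rules for unquoted local parts and the per-part length limits
+   might be expressed as in the following Python sketch.  The helper
+   names are hypothetical.  A local part that fails
+   is_unquoted_local_part is not necessarily invalid: it may be one of
+   the quoted forms described earlier, which should be passed on to
+   the email system rather than rejected.
+
```python
import re

# Characters permitted in an unquoted local part besides the period:
# letters, digits, and the special characters listed in the text.
_ATOM = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]+"

# Atoms joined by single periods: no leading, trailing, or doubled period.
UNQUOTED_LOCAL_PART = re.compile(_ATOM + r"(?:\." + _ATOM + r")*")

MAX_LOCAL_OCTETS = 64
MAX_DOMAIN_OCTETS = 255


def is_unquoted_local_part(local: str) -> bool:
    """True if `local` is a local part that needs no quoting."""
    return (
        len(local) <= MAX_LOCAL_OCTETS
        and UNQUOTED_LOCAL_PART.fullmatch(local) is not None
    )


def within_length_limits(address: str) -> bool:
    """Check the 64/255 octet limits on either side of the last "@"."""
    local, at, domain = address.rpartition("@")
    if not at:
        return False
    return (
        0 < len(local.encode("utf-8")) <= MAX_LOCAL_OCTETS
        and 0 < len(domain.encode("utf-8")) <= MAX_DOMAIN_OCTETS
    )
```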
+
+4.  URLs and URIs
+
+4.1.  URI syntax definitions and issues
+
+   The syntax for URLs (Uniform Resource Locators) is specified in
+   [RFC1738].  The syntax for the more general "URI" (Uniform Resource
+   Identifier) is specified in [RFC2396].  The URI syntax is extremely
+   general, with considerable variations permitted according to the type
+   of "scheme" (e.g., "http", "ftp", "mailto") that is being used.
+   While it is possible to use the general syntax rules of RFC 2396 to
+   perform syntax checks, they are general enough --essentially only
+   specifying the separation of the scheme name and "scheme specific
+   part" with a colon (":") and excluding some characters that must be
+   escaped if used-- to provide little significant filtering or
+   validation power.
+
+   The following characters are reserved in many URIs -- they must
+   either be used for their URI-intended purpose or be encoded.  Some
+   particular schemes may either broaden or relax these restrictions
+   (see the following sections for URLs applicable to "web pages" and
+   electronic mail), or apply them only to particular URI component
+   parts.
+
+      ; / ? : @ & = + $ ,
+
+   In addition, control characters, the space character, the double-
+   quote (") character, and the following special characters
+
+      < > # %
+
+   are generally forbidden and must either be avoided or escaped, as
+   discussed below.
+
+   The colon after the scheme name, and the percent sign used to escape
+   characters, are specifically reserved for those purposes, although
+   ":" may also be used elsewhere in some schemes.
+
+   When it is necessary to encode these, or other, characters, the
+   method used is to replace each one with a percent-sign ("%") followed
+   by two hexadecimal digits representing its octet value.  See section
+   2.4.1 of [RFC2396] for an exact definition.  Unless it is used as a
+   delimiter of the URI scheme itself, any character may optionally be
+   encoded this way; systems that are testing URI syntax should be
+   prepared for these encodings to appear in any component of the URI
+   except the scheme name itself.
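+
+   The percent-encoding mechanism just described can be illustrated
+   with the Python standard library.  This is a sketch, with
+   percent_encode as a hypothetical wrapper name; note that
+   urllib.parse.quote encodes non-ASCII text as UTF-8 octets by
+   default, which is a property of that library rather than something
+   stated in this document.
+
```python
from urllib.parse import quote, unquote


def percent_encode(text: str, keep: str = "") -> str:
    """Replace each character not in `keep` (and not a letter, digit,
    or one of the characters quote() never encodes) with "%" followed
    by two hexadecimal digits for each of its octets."""
    return quote(text, safe=keep)


# A space is octet 0x20 and a percent sign is octet 0x25.
encoded_space = percent_encode("a b")
encoded_percent = percent_encode("100%")

# Decoding reverses the mapping.
decoded = unquote("%41bc")
```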
+
+   A "generic URI" syntax is specified and is more restrictive, but
+   using it to test URI strings requires that one know whether or not
+   the particular scheme in use obeys that syntax.  Consequently,
+   applications that intend to check or validate URIs should normally
+   identify the scheme name and then apply scheme-specific tests.  The
+   rules for two of those -- HTTP [RFC1738] and MAILTO [RFC2368] URLs --
+   are discussed below, but the author of an application which intends
+   to make very precise checks, or to reject particular syntax rather
+   than just warning the user, should consult the relevant scheme-
+   definition documents for precise syntax and relationships.
+
+4.2.  The HTTP URL
+
+   Absolute HTTP URLs consist of the scheme name, a host name (expressed
+   as a domain name or IP address), an optional port number, and then,
+   optionally, a path, a search part, and a fragment identifier.  These
+   are separated, respectively, by a colon and the two slashes that
+   precede the host name, a colon, a slash, a question mark, and a hash
+   mark ("#").  So we have
+
+      http://host:port/path?search#fragment
+
+      http://host/path/
+
+      http://host/path#fragment
+
+      http://host/path?search
+
+      http://host
+
+   and other variations on that form.  There is also a "relative" form,
+   but it almost never appears in text that a user might, e.g., enter
+   into a form.  See [RFC2616] for details.
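+
+   As a rough illustration of the component structure shown above, the
+   Python standard library's urllib.parse.urlsplit can separate an
+   absolute HTTP URL into its parts.  The function name
+   http_url_parts and the dictionary keys are hypothetical choices
+   made for this sketch; urlsplit performs very little validation, so
+   this divides the string rather than checking it.
+
```python
from urllib.parse import urlsplit


def http_url_parts(url: str) -> dict:
    """Split an absolute HTTP URL into the components named in the text."""
    parts = urlsplit(url)
    # urlsplit returns the scheme in lower case.
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("not an absolute HTTP URL: %r" % url)
    return {
        "host": parts.hostname,
        "port": parts.port,        # None when no port is given
        "path": parts.path,        # "" when no path is given
        "search": parts.query,
        "fragment": parts.fragment,
    }
```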
+
+   The characters
+
+      / ; ?
+
+   are reserved within the path and search parts and must be encoded
+   when they are not serving their reserved purpose; the first of these
+   may be used unencoded, and is often used within the path, to
+   designate hierarchy.
+
+4.3.  The MAILTO URL
+
+   MAILTO is a URL type whose content is an email address.  It can be
+   used to encode any of the email address formats discussed in Section
+   3 above.  It can also support multiple addresses and the inclusion of
+   headers (e.g., Subject lines) within the body of the URL.  MAILTO is
+   authoritatively defined in RFC 2368 [RFC2368]; anyone expecting to
+   accept and test multiple addresses or mail header or body formats
+   should consult that document carefully.
+
+   In accepting text for, or validating, a MAILTO URL, it is important
+   to note that, while it can be used to encode any valid email address,
+   it is not sufficient to copy an email address into a MAILTO URL since
+   email addresses may include a number of characters that are invalid
+   in, or have reserved uses for, URLs.  Those characters must be
+   encoded, as outlined in Section 4.1 above, when the addresses are
+   mapped into the URL form.  Conversely, addresses in MAILTO URLs
+   cannot, in general, be copied directly into email contexts, since few
+   email programs will reverse the decodings (and doing so might be
+   interpreted as a protocol violation).
+
+   The following characters may appear in MAILTO URLs only with the
+   specific defined meanings given.  If they appear in an email address
+   (i.e., for some other purpose), they must be encoded:
+
+      :       The colon in "mailto:"
+
+      < > # " % { } | \ ^ ~ `
+
+      These characters are "unsafe" in any URL, and must always be
+      encoded.
+
+   The following characters must also be encoded if they appear in a
+   MAILTO URL
+
+      ? & =
+         Used to delimit headers and their values when these are encoded
+         into URLs.
+
+   Some examples may be helpful:
+
+   +-------------------------+-----------------------------+-----------+
+   |      Email address      |         MAILTO URL          |   Notes   |
+   +-------------------------+-----------------------------+-----------+
+   |     Joe@example.com     |  mailto:Joe@example.com     |     1     |
+   |                         |                             |           |
+   |  user+mailbox@example   |         mailto:             |     2     |
+   |          .com           |  user%2Bmailbox@example     |           |
+   |                         |          .com               |           |
+   |                         |                             |           |
+   |  customer/department=   |  mailto:customer%2F         |     3     |
+   |  shipping@example.com   | department=shipping@example |           |
+   |                         |          .com               |           |
+   |                         |                             |           |
+   |   $A12345@example.com   |  mailto:$A12345@example     |     4     |
+   |                         |          .com               |           |
+   |                         |                             |           |
+   |  !def!xyz%abc@example   |  mailto:!def!xyz%25abc      |     5     |
+   |          .com           |       @example.com          |           |
+   |                         |                             |           |
+   |  _somename@example.com  |  mailto:_somename@example   |     4     |
+   |                         |          .com               |           |
+   +-------------------------+-----------------------------+-----------+
+
+                                  Table 1
+
+   Notes on Table
+
+   1.  No characters appear in the email address that require escaping,
+       so the body of the MAILTO URL is identical to the email address.
+
+   2.  There is actually some uncertainty as to whether or not the "+"
+       character requires escaping in MAILTO URLs (the standards are
+       not precisely clear).  But, since any character in the address
+       specification may optionally be encoded, it is probably safer to
+       encode it.
+
+   3.  The "/" character is generally reserved in URLs, and must be
+       encoded as %2F.
+
+   4.  Neither the "$" nor the "_" character is given any special
+       interpretation in MAILTO URLs, so neither needs to be encoded.
+
+   5.  While the "!" character has no special interpretation, the "%"
+       character is used to introduce encoded sequences and hence it
+       must always be encoded.
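+
+   The mapping illustrated in Table 1 might be sketched in Python as
+   follows.  This is an assumption-laden example rather than a
+   reference implementation: email_to_mailto and LOCAL_PART_SAFE are
+   hypothetical names, "=" is left unencoded only because Table 1
+   shows it that way, and urllib.parse.quote in current Python
+   versions never encodes "~", although the text above lists that
+   character as one to encode.  MAILTO URLs carrying headers or
+   multiple addresses are outside the scope of the sketch.
+
```python
from urllib.parse import quote

# Characters left as-is in the local part, following Table 1 ("$", "!",
# and "=" appear unencoded there).  quote() additionally never encodes
# letters, digits, "_", ".", "-", or "~".
LOCAL_PART_SAFE = "$!="


def email_to_mailto(address: str) -> str:
    """Map an email address to a MAILTO URL (illustrative sketch)."""
    # The last "@" separates the local part from the domain part.
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError("expected local-part@domain")
    # Only the local part is percent-encoded; the domain is copied.
    return "mailto:" + quote(local, safe=LOCAL_PART_SAFE) + "@" + domain
```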
+
+4.4.  Guessing domain names in web contexts
+
+   Several web browsers have adopted a practice that permits an
+   incomplete domain name to be used as input instead of a complete URL.
+   This has, for example, permitted users to type "microsoft" and have
+   the browser interpret the input as "http://www.microsoft.com/".
+   Other browser versions have gone even further, trying to build DNS
+   names up through a series of heuristics, testing each variation in
+   turn to see if it appears in the DNS, and accepting the first one
+   found as the intended domain name.  Still others automatically
+   invoke search engines if no period appears or if the reference fails.
+   If any of these approaches is to be used, it is often critical that
+   the browser recognize the complete list of TLDs.  If an incomplete
+   list is used, complete domain names may not be recognized as such and
+   the system may try to turn them into completely different names.  For
+   example, "example.aero" is a fully-qualified name, since "AERO." is a
+   TLD name.  But, if the system doesn't recognize "AERO" as a TLD name,
+   it is likely to try to look up "example.aero.com" and
+   "www.example.aero.com" (and then fail or find the wrong host), rather
+   than simply looking up the user-supplied name.
+
+   As discussed in Section 2 above, there are dangers associated with
+   software that attempts to "know" the list of top-level domain names
+   locally and take advantage of that knowledge.  These name-guessing
+   heuristics are another example of that situation: if the lists are
+   up-to-date and used carefully, the systems in which they are embedded
+   may provide an easier, and more attractive, experience for at least
+   some users.  But finding the wrong host, or being unable to find a
+   host even when its name is precisely known, constitute bad
+   experiences by any measure.
+
+   More generally, there have been bad experiences with attempts to
+   "complete" domain names by adding additional information to them.
+   These issues are described in some detail in RFC 1535 [RFC1535].
+
+5.  Implications of internationalization
+
+   The IETF has adopted a series of proposals ([RFC3490] - [RFC3492])
+   whose purpose is to permit encoding internationalized (i.e., non-
+   ASCII) names in the DNS.  The primary standard, and the group
+   generically, are known as "IDNA".  The actual strings stored in the
+   DNS are in an encoded form: the labels begin with the characters
+   "xn--" followed by the encoded string.  Applications should be
+   prepared to accept and process the encoded form (those strings are
+   consistent with the "LDH rule" (see Section 2) so should not raise
+   any separate issues) and the use of local, and potentially other,
+   characters as appropriate to local systems and circumstances.
+
+   The IDNA specification describes the exact process to be used to
+   validate a name or encoded string.  The process is sufficiently
+   complex that shortcuts or heuristics, especially for versions of
+   labels written directly in Unicode or other coded character sets, are
+   likely to fail and cause problems.  In particular, the strings cannot
+   be validated with syntax or semantic rules of any of the usual sorts:
+   syntax validity is defined only in terms of the result of executing a
+   particular function.
+
+   In addition to the restrictions imposed by the protocols themselves,
+   many domains are implementing rules about just which non-ASCII names
+   they will permit to be registered (see, e.g., [JET], [RegRestr]).
+   This work is still relatively new, and the rules and conventions are
+   likely to be different for each domain, or at least each language or
+   script group.  Attempting to test for those rules in a client program
+   to see if a user-supplied name might possibly exist in the relevant
+   domain would almost certainly be ill-advised.
+
+   One quick local test, however, may be reasonable: as of the time of
+   this writing, there should be no instances of labels in the DNS that
+   start with two characters, followed by two hyphens, where the two
+   characters are not "xn" (in, of course, either upper or lower case).
+   Such label strings, if they appear, are probably erroneous or
+   obsolete, and it may be reasonable to at least warn the user about
+   them.
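+
+   The quick local test described above could look like the following
+   Python sketch.  The function name suspicious_prefix_labels is
+   hypothetical, and, in keeping with the text, the result is intended
+   as a reason to warn the user rather than to reject the name.
+
```python
from typing import List


def suspicious_prefix_labels(name: str) -> List[str]:
    """Return labels that start with two characters followed by two
    hyphens where those two characters are not "xn" in either case."""
    flagged = []
    for label in name.rstrip(".").split("."):
        has_prefix_form = len(label) >= 4 and label[2:4] == "--"
        if has_prefix_form and label[:2].lower() != "xn":
            flagged.append(label)
    return flagged
```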
+
+   There is ongoing work in the IETF and elsewhere to define
+   internationalized formats for use in other protocols, including email
+   addresses.  Those forms may or may not conform to existing rules for
+   ASCII-only identifiers; anyone designing evaluators or filters should
+   watch that work closely.
+
+6.  Summary
+
+   When an application accepts a string from the user and ultimately
+   passes it on to an API for a protocol, the desirability of testing or
+   filtering the text in any way not required by the protocol itself is
+   hotly debated.  If it must divide the string into its components, or
+   otherwise interpret it, it obviously must make at least enough tests
+   to validate that process.  With, e.g., domain names or email
+   addresses that can be passed on untouched, the appropriateness of
+   trying to figure out which ones are valid and which ones are not
+   requires a more complex decision, one that should include
+   considerations of how to make exactly the correct tests and to keep
+   information that changes and evolves up-to-date.  A test containing
+   obsolete information can be extremely frustrating for potential
+   correspondents or customers and may harm desired relationships.
+
+7.  Security Considerations
+
+   Since this document merely summarizes the requirements of existing
+   standards, it does not introduce any new security issues.  However,
+   many of the techniques that motivate the document raise important
+   security concerns of their own.  Rejecting valid forms of domain
+   names, email addresses, or URIs often denies service to the user of
+   those entities.  Worse, guessing at the user's intent when an
+   incomplete address, or other string, is given can result in
+   compromises to privacy or accuracy of reference if the wrong target
+   is found and returned.  From a security standpoint, the optimum
+   behavior is probably to never guess, but instead, to force the user
+   to specify exactly what is wanted.  When that position involves a
+   tradeoff with an acceptable user experience, good judgment should be
+   used and the fact that it is a tradeoff recognized.
+
+   Some characters have special or privileged meanings on some systems
+   (e.g., ` on Unix).  Applications should be careful to escape those
+   locally if necessary.  By the same token, they are valid, and should
+   not be disallowed locally, or escaped when transmitted through
+   Internet protocols, for such reasons if a remote site chooses to use
+   them.
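+
+   One way to read this advice is that escaping belongs at the local
+   boundary and nowhere else.  The Python sketch below quotes an
+   address only when it is placed on a local Unix shell command line
+   and leaves the address itself unchanged for transmission.  The
+   program name "hypothetical-mailer" and the sample address are
+   invented for the example; shlex.quote is the standard library
+   routine for POSIX shell quoting.
+
```python
import shlex


def local_command(program: str, address: str) -> str:
    """Build a shell command line, quoting each argument for local use only."""
    return f"{shlex.quote(program)} {shlex.quote(address)}"


if __name__ == "__main__":
    # A backquote is permitted in an email local part, so the address is not rejected.
    address = "o`reilly@example.com"

    # Quoted form, safe to hand to a local POSIX shell.
    print(local_command("hypothetical-mailer", address))

    # The value sent through Internet protocols stays exactly as the user gave it.
    print(address)
```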
+
+   The presence of local checking does not permit remote checking to be
+   bypassed.  Note that this can apply to a single machine; in
+   particular, a local MTA should not assume that a local MUA has
+   properly escaped locally-significant special characters.
+
+8.  Acknowledgements
+
+   The author would like to express his appreciation for helpful
+   comments from Harald Alvestrand, Eric A. Hall, and the RFC Editor,
+   and for partial support of this work from SITA.  Responsibility for
+   any errors remains, of course, with the author.
+
+   The first Internet-Draft on this subject was posted in February 2003.
+   The document was submitted to the RFC Editor on 20 June 2003,
+   returned for revisions on 19 August, and resubmitted on 5 September
+   2003.
+
+9.  References
+
+9.1.  Normative References
+
+   [RFC1035]       Mockapetris, P., "Domain names - implementation and
+                   specification", STD 13, RFC 1035, November 1987.
+
+   [RFC1123]       Braden, R., Ed., "Requirements for Internet Hosts -
+                   Application and Support", STD 3, RFC 1123, October
+                   1989.
+
+   [RFC1535]       Gavron, E., "A Security Problem and Proposed
+                   Correction With Widely Deployed DNS Software", RFC
+                   1535, October 1993.
+
+   [RFC1738]       Berners-Lee, T., Masinter, L. and M. McCahill,
+                   "Uniform Resource Locators (URL)", RFC 1738, December
+                   1994.
+
+   [RFC2181]       Elz, R. and R. Bush, "Clarifications to the DNS
+                   Specification", RFC 2181, July 1997.
+
+   [RFC2368]       Hoffman, P., Masinter, L. and J. Zawinski, "The
+                   mailto URL scheme", RFC 2368, July 1998.
+
+   [RFC2396]       Berners-Lee, T., Fielding, R. and L. Masinter,
+                   "Uniform Resource Identifiers (URI): Generic Syntax",
+                   RFC 2396, August 1998.
+
+   [RFC2616]       Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
+                   Masinter, L., Leach, P. and T. Berners-Lee,
+                   "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616,
+                   June 1999.
+
+   [RFC2821]       Klensin, J., Ed., "Simple Mail Transfer Protocol",
+                   RFC 2821, April 2001.
+
+   [RFC2822]       Resnick, P., Ed., "Internet Message Format", RFC
+                   2822, April 2001.
+
+   [RFC3490]       Faltstrom, P., Hoffman, P. and A. Costello,
+                   "Internationalizing Domain Names in Applications
+                   (IDNA)", RFC 3490, March 2003.
+
+   [RFC3491]       Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
+                   Profile for Internationalized Domain Names (IDN)",
+                   RFC 3491, March 2003.
+
+   [RFC3492]       Costello, A., "Punycode: A Bootstring encoding of
+                   Unicode for Internationalized Domain Names in
+                   Applications (IDNA)", RFC 3492, March 2003.
+
+   [ASCII]         American National Standards Institute (formerly
+                   United States of America Standards Institute), "USA
+                   Code for Information Interchange", ANSI X3.4-1968.
+                   ANSI X3.4-1968 has been replaced by newer versions
+                   with slight modifications, but the 1968 version
+                   remains definitive for the Internet.
+
+   [DomainList]    Internet Assigned Numbers Authority (IANA), Untitled
+                   alphabetical list of current top-level domains.
+                   http://data.iana.org/TLD/tlds-alpha-by-domain.txt
+                   ftp://data.iana.org/TLD/tlds-alpha-by-domain.txt
+
+9.2.  Informative References
+
+   [ISO.3166.1988] International Organization for Standardization,
+                   "Codes for the representation of names of countries,
+                   3rd edition", ISO Standard 3166, August 1988.
+
+   [JET]           Konishi, K., et al., "Internationalized Domain Names
+                   Registration and Administration Guideline for
+                   Chinese, Japanese and Korean", Work in Progress.
+
+   [RFC1591]       Postel, J., "Domain Name System Structure and
+                   Delegation", RFC 1591, March 1994.
+
+   [RegRestr]      Klensin, J., "Registration of Internationalized
+                   Domain Names: Overview and Method", Work in Progress,
+                   February 2004.
+
+10.  Author's Address
+
+   John C Klensin
+   1770 Massachusetts Ave, #322
+   Cambridge, MA  02140
+   USA
+
+   Phone: +1 617 491 5735
+   EMail: john-ietf@jck.com
+
+11.  Full Copyright Statement
+
+   Copyright (C) The Internet Society (2004).  This document is subject
+   to the rights, licenses and restrictions contained in BCP 78 and
+   except as set forth therein, the authors retain all their rights.
+
+   This document and the information contained herein are provided on an
+   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+   The IETF takes no position regarding the validity or scope of any
+   Intellectual Property Rights or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; nor does it represent that it has
+   made any independent effort to identify any such rights.  Information
+   on the procedures with respect to rights in RFC documents can be
+   found in BCP 78 and BCP 79.
+
+   Copies of IPR disclosures made to the IETF Secretariat and any
+   assurances of licenses to be made available, or the result of an
+   attempt made to obtain a general license or permission for the use of
+   such proprietary rights by implementers or users of this
+   specification can be obtained from the IETF on-line IPR repository at
+   http://www.ietf.org/ipr.
+
+   The IETF invites any interested party to bring to its attention any
+   copyrights, patents or patent applications, or other proprietary
+   rights that may cover technology that may be required to implement
+   this standard.  Please address the information to the IETF at
+   ietf-ipr@ietf.org.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+