Diffstat (limited to 'doc/rfc/rfc6943.txt')
-rw-r--r-- doc/rfc/rfc6943.txt 1459
1 files changed, 1459 insertions, 0 deletions
diff --git a/doc/rfc/rfc6943.txt b/doc/rfc/rfc6943.txt
new file mode 100644
index 0000000..16e9098
--- /dev/null
+++ b/doc/rfc/rfc6943.txt
@@ -0,0 +1,1459 @@
+
+
+
+
+
+
+Internet Architecture Board (IAB)                         D. Thaler, Ed.
+Request for Comments: 6943                                     Microsoft
+Category: Informational                                         May 2013
+ISSN: 2070-1721
+
+
+ Issues in Identifier Comparison for Security Purposes
+
+Abstract
+
+ Identifiers such as hostnames, URIs, IP addresses, and email
+ addresses are often used in security contexts to identify security
+ principals and resources. In such contexts, an identifier presented
+ via some protocol is often compared using some policy to make
+ security decisions such as whether the security principal may access
+ the resource, what level of authentication or encryption is required,
+ etc. If the parties involved in a security decision use different
+ algorithms to compare identifiers, then failure scenarios ranging
+ from denial of service to elevation of privilege can result. This
+ document provides a discussion of these issues that designers should
+ consider when defining identifiers and protocols, and when
+ constructing architectures that use multiple protocols.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for informational purposes.
+
+ This document is a product of the Internet Architecture Board (IAB)
+ and represents information that the IAB has deemed valuable to
+ provide for permanent record. It represents the consensus of the
+ Internet Architecture Board (IAB). Documents approved for
+ publication by the IAB are not a candidate for any level of Internet
+ Standard; see Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc6943.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Thaler Informational [Page 1]
+
+RFC 6943 Identifier Comparison May 2013
+
+
+Copyright Notice
+
+ Copyright (c) 2013 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document.
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 1.1. Classes of Identifiers .....................................5
+ 1.2. Canonicalization ...........................................5
+ 2. Identifier Use in Security Policies and Decisions ...............6
+ 2.1. False Positives and Negatives ..............................7
+ 2.2. Hypothetical Example .......................................8
+ 3. Comparison Issues with Common Identifiers .......................9
+ 3.1. Hostnames ..................................................9
+ 3.1.1. IPv4 Literals ......................................11
+ 3.1.2. IPv6 Literals ......................................12
+ 3.1.3. Internationalization ...............................13
+ 3.1.4. Resolution for Comparison ..........................14
+ 3.2. Port Numbers and Service Names ............................14
+ 3.3. URIs ......................................................15
+ 3.3.1. Scheme Component ...................................16
+ 3.3.2. Authority Component ................................16
+ 3.3.3. Path Component .....................................17
+ 3.3.4. Query Component ....................................17
+ 3.3.5. Fragment Component .................................17
+ 3.3.6. Resolution for Comparison ..........................18
+ 3.4. Email Address-Like Identifiers ............................18
+ 4. General Issues .................................................19
+ 4.1. Conflation ................................................19
+ 4.2. Internationalization ......................................20
+ 4.3. Scope .....................................................21
+ 4.4. Temporality ...............................................21
+ 5. Security Considerations ........................................22
+ 6. Acknowledgements ...............................................22
+ 7. IAB Members at the Time of Approval ............................23
+ 8. Informative References .........................................23
+
+1. Introduction
+
+ In computing and the Internet, various types of "identifiers" are
+ used to identify humans, devices, content, etc. This document
+ provides a discussion of some security issues that designers should
+ consider when defining identifiers and protocols, and when
+ constructing architectures that use multiple protocols. Before
+ discussing these security issues, we first give some background on
+ some typical processes involving identifiers. Terms such as
+ "identifier", "identity", and "principal" are used as defined in
+ [RFC4949].
+
+ As depicted in Figure 1, there are multiple processes relevant to our
+ discussion.
+
+ 1. An identifier is first generated. If the identifier is intended
+ to be unique, the generation process must include some mechanism,
+ such as allocation by a central authority or verification among
+ the members of a distributed authority, to help ensure
+ uniqueness. However, the notion of "unique" involves determining
+ whether a putative identifier matches any other identifier that
+ has already been allocated. As we will see, for many types of
+ identifiers, this is not simply an exact binary match.
+
+ After the identifier is generated, it is often stored in two
+ locations: with the requester or "holder" of the identifier, and
+ with some repository of identifiers (e.g., DNS). For example, if
+ the identifier was allocated by a central authority, the
+ repository might be that authority. If the identifier identifies
+ a device or content on a device, the repository might be that
+ device.
+
+ 2. The identifier is distributed, either by the holder of the
+ identifier or by a repository of identifiers, to others who could
+ use the identifier. This distribution might be electronic, but
+ sometimes it is via other channels such as voice, business card,
+ billboard, or other form of advertisement. The identifier itself
+ might be distributed directly, or it might be used to generate a
+ portion of another type of identifier that is then distributed.
+ For example, a URI or email address might include a server name,
+ and hence distributing the URI or email address also inherently
+ distributes the server name.
+
+ 3. The identifier is used by some party. Generally, the user
+ supplies the identifier, which is (directly or indirectly) sent
+ to the repository of identifiers. The repository of identifiers
+ must then attempt to match the user-supplied identifier with an
+ identifier in its repository.
+
+ For example, using an email address to send email to the holder
+ of an identifier may result in the email arriving at the holder's
+ email server, which has access to the mail stores.
+
+       +------------+
+       | Holder of  |               1. Generation
+       | identifier +<---------+
+       +----+-------+          |
+            |                  |  Match
+            |                  v/
+            |          +-------+-------+
+            +----------+ Repository of |
+            |          |  identifiers  |
+            |          +-------+-------+
+            | 2. Distribution  ^\
+            |                  |  Match
+            v                  |
+  +---------+-------+          |
+  |     User of     |          |
+  |   identifier    +----------+
+  +-----------------+   3. Use
+
+ Figure 1: Typical Identifier Processes
+
+ Another variation is where a user is given the identifier of a
+ resource (e.g., a web site) to access securely, sometimes known as a
+ "reference identifier" [RFC6125], and the server hosting the resource
+ then presents its identity at the time of use. In this case, the
+ user application attempts to match the presented identity against the
+ reference identifier.
+
+ One key aspect is that the identifier values passed in generation,
+ distribution, and use may all be in different forms. For example, an
+ identifier might be exchanged in printed form at generation time,
+ distributed to a user via voice, and then used electronically. As
+ such, the match process can be complicated.
+
+ Furthermore, in many cases, the relationship between holder,
+ repositories, and users may be more involved. For example, when a
+ hierarchy of web caches exists, each cache is itself a repository of
+ a sort, and the match process is usually intended to be the same as
+ on the origin server.
+
+ Another aspect to keep in mind is that there can be multiple
+ identifiers that refer to the same object (i.e., resource, human,
+ device, etc.). For example, a human might have a passport number and
+ a driver's license number, and an RFC might be available at multiple
+ locations (rfc-editor.org and ietf.org). In this document, we focus
+ on comparing two identifiers to see whether they are the same
+ identifier, rather than comparing two different identifiers to see
+ whether they refer to the same entity (although a few issues with the
+ latter are touched on in several places, such as Sections 3.1.4 and
+ 3.3.6).
+
+1.1. Classes of Identifiers
+
+ In this document, we will refer to the following classes of
+ identifiers:
+
+ o Absolute: identifiers that can be compared byte-by-byte for
+ equality. Two identifiers that have different bytes are defined
+ to be different. For example, binary IP addresses are in this
+ class.
+
+ o Definite: identifiers that have a single well-defined comparison
+ algorithm. For example, URI scheme names are required to be
+ US-ASCII [USASCII] and are defined to match in a case-insensitive
+ way; the comparison is thus definite, since there is a well-
+ specified algorithm (Section 9.2.1 of [RFC4790]) on how to do a
+ case-insensitive match among ASCII strings.
+
+ o Indefinite: identifiers that have no single well-defined
+ comparison algorithm. For example, human names are in this class.
+ Everyone might want the comparison to be tailored for their
+ locale, for some definition of "locale". In some cases, there may
+ be limited subsets of parties that might be able to agree (e.g.,
+ ASCII users might all agree on a common comparison algorithm,
+ whereas users of other Roman-derived scripts, such as Turkish, may
+ not), but identifiers often tend to leak out of such limited
+ environments.
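
The first two classes can be illustrated with a minimal sketch. The helper names (`absolute_equal`, `ascii_casefold`, `definite_scheme_equal`) are hypothetical and do not come from RFC 6943 or RFC 4790; the last line only hints at why locale-sensitive case mapping pushes an identifier toward the Indefinite class.

```python
# Illustrative sketch only; all helper names are hypothetical.

def absolute_equal(a: bytes, b: bytes) -> bool:
    """Absolute identifiers: equal only if every byte is equal."""
    return a == b


def ascii_casefold(s: str) -> str:
    """Map only A-Z to a-z and leave every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def definite_scheme_equal(a: str, b: str) -> bool:
    """Definite identifiers: one agreed algorithm, here an ASCII
    case-insensitive match such as the one used for URI scheme names."""
    return ascii_casefold(a) == ascii_casefold(b)


# Two binary IPv4 addresses (10.0.1.2) compared byte by byte.
print(absolute_equal(bytes([10, 0, 1, 2]), bytes([10, 0, 1, 2])))  # True

# URI scheme names differ only in ASCII case.
print(definite_scheme_equal("HTTP", "http"))  # True

# Python's locale-independent upper() maps dotless i (U+0131) to "I",
# which is not what a Turkish-locale comparison would expect for "i".
print("\u0131".upper() == "i".upper())  # True
```

The ASCII-only fold is deliberate: it never touches non-ASCII characters, so its result does not depend on any locale.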
+
+1.2. Canonicalization
+
+ Perhaps the most common algorithm for comparison involves first
+ converting each identifier to a canonical form (a process known as
+ "canonicalization" or "normalization") and then testing the resulting
+ canonical representations for bitwise equality. In so doing, it is
+ thus critical that all entities involved agree on the same canonical
+ form and use the same canonicalization algorithm so that the overall
+ comparison process is also the same.
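
As a rough sketch of the point above, the following compares identifiers by canonicalizing and then testing the encoded bytes for equality. The two canonicalizers are hypothetical stand-ins for two parties that have not agreed on a canonical form; neither is taken from a real protocol.

```python
from typing import Callable


def compare_via_canonical(a: str, b: str, canon: Callable[[str], str]) -> bool:
    """Canonicalize both identifiers, then compare the resulting bytes."""
    return canon(a).encode("utf-8") == canon(b).encode("utf-8")


def party_one_canon(name: str) -> str:
    """Hypothetical party 1: lowercases ASCII letters only."""
    return "".join(c.lower() if c.isascii() else c for c in name)


def party_two_canon(name: str) -> str:
    """Hypothetical party 2: treats identifiers as case-sensitive."""
    return name


a, b = "Example.COM", "example.com"
print(compare_via_canonical(a, b, party_one_canon))  # True
print(compare_via_canonical(a, b, party_two_canon))  # False
```

The same pair of identifiers matches for one party and not for the other, which is the disagreement the paragraph warns about.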
+
+ Note that in some contexts, such as in internationalization, the
+ terms "canonicalization" and "normalization" have a precise meaning.
+ In this document, however, we use these terms synonymously in their
+ more generic form, to mean conversion to some standard form.
+
+ While the most common method of comparison includes canonicalization,
+ comparison can also be done by defining an equivalence algorithm,
+ where no single form is canonical. However, in most cases, a
+ canonical form is useful for other purposes, such as output, and so
+ in such cases defining a canonical form suffices to define a
+ comparison method.
+
+2. Identifier Use in Security Policies and Decisions
+
+ Identifiers such as hostnames, URIs, and email addresses are used in
+ security contexts to identify security principals (i.e., entities
+ that can be authenticated) and resources as well as other security
+ parameters such as types and values of claims. Those identifiers are
+ then used to make security decisions based on an identifier presented
+ via some protocol. For example:
+
+ o Authentication: a protocol might match a security principal's
+ identifier to look up expected keying material and then match
+ keying material.
+
+ o Authorization: a protocol might match a resource name against some
+ policy. For example, it might look up an access control list
+ (ACL) and then look up the security principal's identifier (or a
+ surrogate for it) in that ACL.
+
+ o Accounting: a system might create an accounting record for a
+ security principal's identifier or resource name, and then might
+ later need to match a presented identifier to (for example) add
+ new filtering rules based on the records in order to stop an
+ attack.
+
+ If the parties involved in a security decision use different matching
+ algorithms for the same identifiers, then failure scenarios ranging
+ from denial of service to elevation of privilege can result, as we
+ will see.
+
+ This is especially complicated in cases involving multiple parties
+ and multiple protocols. For example, there are many scenarios where
+ some form of "security token service" is used to grant to a requester
+ permission to access a resource, where the resource is held by a
+ third party that relies on the security token service (see Figure 2).
+ The protocol used to request permission (e.g., Kerberos or OAuth) may
+ be different from the protocol used to access the resource (e.g.,
+ HTTP). Opportunities for security problems arise when two protocols
+ define different comparison algorithms for the same type of
+ identifier, or when a protocol is ambiguously specified and two
+ endpoints (e.g., a security token service and a resource holder)
+ implement different algorithms within the same protocol.
+
+    +----------+
+    | security |
+    |  token   |
+    | service  |
+    +----------+
+         ^
+         | 1. supply credentials and
+         |    get token for resource
+         |                                             +--------+
+    +----------+  2. supply token and access resource  |resource|
+    |requester |=------------------------------------->| holder |
+    +----------+                                       +--------+
+
+ Figure 2: Simple Security Exchange
+
+ In many cases, the situation is more complex. With X.509 Public Key
+ Infrastructure (PKIX) certificates [RFC6125], for example, the name
+ in a certificate gets compared against names in ACLs or other things.
+ In the case of web site security, the name in the certificate gets
+ compared to a portion of the URI that a user may have typed into a
+ browser. The fact that many different people are doing the typing,
+ on many different types of systems, complicates the problem.
+
+ Add to this the certificate enrollment step, and the certificate
+ issuance step, and two more parties have an opportunity to adjust the
+ encoding, or worse, the software that supports them might make
+ changes that the parties are unaware are happening.
+
+2.1. False Positives and Negatives
+
+ It is first worth discussing in more detail the effects of errors in
+ the comparison algorithm. A "false positive" results when two
+ identifiers compare as if they were equal but in reality refer to two
+ different objects (e.g., security principals or resources). When
+ privilege is granted on a match, a false positive thus results in an
+ elevation of privilege -- for example, allowing execution of an
+ operation that should not have been permitted otherwise. When
+ privilege is denied on a match (e.g., matching an entry in a
+ block/deny list or a revocation list), a permissible operation is
+ denied. At best, this can cause worse performance (e.g., a cache
+ miss or forcing redundant authentication) and at worst can result in
+ a denial of service.
+
+ A "false negative" results when two identifiers that in reality refer
+ to the same thing compare as if they were different, and the effects
+ are the reverse of those for false positives. That is, when
+ privilege is granted on a match, the result is at best worse
+ performance and at worst a denial of service; when privilege is
+ denied on a match, elevation of privilege results.
+
+ Figure 3 summarizes these effects.
+
+                  | "Grant on match"       | "Deny on match"
+   ---------------+------------------------+-----------------------
+   False positive | Elevation of privilege | Denial of service
+   ---------------+------------------------+-----------------------
+   False negative | Denial of service      | Elevation of privilege
+   ---------------+------------------------+-----------------------
+
+ Figure 3: Worst Effects of False Positives/Negatives
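
The table in Figure 3 can be restated as a small lookup. This is only a sketch: the function name `worst_effect` and the string labels for policies and errors are hypothetical choices made for illustration.

```python
def worst_effect(policy: str, error: str) -> str:
    """Return the worst-case outcome from Figure 3.

    policy: "grant" (grant on match) or "deny" (deny on match)
    error:  "false_positive" or "false_negative"
    """
    if policy not in ("grant", "deny"):
        raise ValueError(f"unknown policy: {policy!r}")
    if error not in ("false_positive", "false_negative"):
        raise ValueError(f"unknown error: {error!r}")

    wrongly_matched = error == "false_positive"
    grants_on_match = policy == "grant"
    # Privilege is wrongly granted when a grant rule matches in error,
    # or when a deny rule fails to match in error.
    if wrongly_matched == grants_on_match:
        return "elevation of privilege"
    return "denial of service"


for policy in ("grant", "deny"):
    for error in ("false_positive", "false_negative"):
        print(policy, error, "->", worst_effect(policy, error))
```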
+
+ When designing a comparison algorithm, one can typically modify it to
+ increase the likelihood of false positives and decrease the
+ likelihood of false negatives, or vice versa. Which outcome is
+ better depends on the context.
+
+ Elevation of privilege is almost always seen as far worse than denial
+ of service. Hence, for URIs, for example, Section 6.1 of [RFC3986]
+ states that "comparison methods are designed to minimize false
+ negatives while strictly avoiding false positives".
+
+ Thus, URIs were defined with a "grant privilege on match" paradigm in
+ mind, where it is critical to prevent elevation of privilege while
+ minimizing denial of service. Using URIs in a "deny privilege on
+ match" system can thus be problematic.
+
+2.2. Hypothetical Example
+
+ In this example, both security principals and resources are
+ identified using URIs. Foo Corp has paid example.com for access to
+ the Stuff service. Foo Corp allows its employees to create accounts
+ on the Stuff service. Alice gets the account
+ "http://example.com/Stuff/FooCorp/alice" and Bob gets
+ "http://example.com/Stuff/FooCorp/bob". It turns out, however, that
+ Foo Corp's URI canonicalizer includes URI fragment components in
+ comparisons whereas example.com's does not, and Foo Corp does not
+ disallow the # character in the account name. So Chuck, who is a
+ malicious employee of Foo Corp, asks to create an account at
+ example.com with the name alice#stuff. Foo Corp's URI logic checks
+ its records for accounts it has created with Stuff and sees that
+ there is no account with the name alice#stuff. Hence, in its
+ records, it associates the account alice#stuff with Chuck and will
+ only issue tokens good for use with
+ "http://example.com/Stuff/FooCorp/alice#stuff" to Chuck.
+
+ Chuck, the attacker, goes to a security token service at Foo Corp and
+ asks for a security token good for
+ "http://example.com/Stuff/FooCorp/alice#stuff". Foo Corp issues the
+ token, since Chuck is the legitimate owner (in Foo Corp's view) of
+ the alice#stuff account. Chuck then submits the security token in a
+ request to "http://example.com/Stuff/FooCorp/alice".
+
+ But example.com uses a URI canonicalizer that, for the purposes of
+ checking equality, ignores fragments. So when example.com looks in
+ the security token to see if the requester has permission from Foo
+ Corp to access the given account, it successfully matches the URI in
+ the security token, "http://example.com/Stuff/FooCorp/alice#stuff",
+ with the requested resource name
+ "http://example.com/Stuff/FooCorp/alice".
+
+ Leveraging the inconsistencies in the canonicalizers used by Foo Corp
+ and example.com, Chuck is able to successfully launch an elevation-
+ of-privilege attack and access Alice's resource.
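
The mismatch in this example can be reproduced in a few lines. This is a sketch under stated assumptions: `foo_corp_equal` and `example_com_equal` are hypothetical names for the two parties' comparison logic, and `urllib.parse.urldefrag` is used only as a convenient way to drop the fragment.

```python
from urllib.parse import urldefrag


def foo_corp_equal(a: str, b: str) -> bool:
    """Hypothetical Foo Corp comparison: the fragment is significant."""
    return a == b


def example_com_equal(a: str, b: str) -> bool:
    """Hypothetical example.com comparison: the fragment is ignored."""
    return urldefrag(a).url == urldefrag(b).url


alice = "http://example.com/Stuff/FooCorp/alice"
chuck = "http://example.com/Stuff/FooCorp/alice#stuff"

# Foo Corp sees two distinct accounts, so it issues Chuck a token.
print(foo_corp_equal(alice, chuck))     # False

# example.com sees the token's URI as matching Alice's resource.
print(example_com_equal(alice, chuck))  # True
```

Neither function is wrong in isolation; the elevation of privilege comes from the two parties applying different functions to the same identifier.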
+
+ Furthermore, consider an attacker using a similar corporation name,
+ such as "foocorp" (or any variation containing a non-ASCII character
+ that some humans might expect to represent the same corporation). If
+ the resource holder treats the names as different but the security
+ token service treats them as the same, then elevation of privilege
+ can occur in this scenario as well.
+
+3. Comparison Issues with Common Identifiers
+
+ In this section, we walk through a number of common types of
+ identifiers and discuss various issues related to comparison that may
+ affect security whenever they are used to identify security
+ principals or resources. These examples illustrate common patterns
+ that may arise with other types of identifiers.
+
+3.1. Hostnames
+
+ Hostnames (composed of dot-separated labels) are commonly used either
+ directly as identifiers, or as components in identifiers such as in
+ URIs and email addresses. Another example is in Sections 7.2 and 7.3
+ of [RFC5280] (and updated in Section 3 of [RFC6818]), which specify
+ use in PKIX certificates.
+
+ In this section, we discuss a number of issues in comparing strings
+ that appear to be some form of hostname.
+
+ It is first worth pointing out that the term "hostname" itself is
+ often ambiguous, and hence it is important that any use clarify which
+ definition is intended. Some examples of definitions include:
+
+ a. A Fully Qualified Domain Name (FQDN),
+
+ b. An FQDN that is associated with address records in the DNS,
+
+ c. The leftmost label in an FQDN, or
+
+ d. The leftmost label in an FQDN that is associated with address
+ records.
+
+ The use of different definitions in different places results in
+ questions such as whether "example" and "example.com" are considered
+ equal or not, and hence it is important when writing new
+ specifications to be clear about which definition is meant.
+
+ Section 3 of [RFC6055] discusses the differences between a "hostname"
+ and a "DNS name", where the former is a subset of the latter by using
+ a restricted set of characters (letters, digits, and hyphens). If
+ one canonicalizer uses the "DNS name" definition whereas another uses
+ a "hostname" definition, a name might be valid in the former but
+ invalid in the latter. As long as invalid identifiers are denied
+ privilege, this difference will not result in elevation of privilege.
+
+ Section 3.1 of [RFC1034] discusses the difference between a
+ "complete" domain name, which ends with a dot (such as
+ "example.com."), and a multi-label relative name such as
+ "example.com" that assumes the root (".") is in the suffix search
+ list. In most contexts, these are considered equal, but there may be
+ issues if different entities in a security architecture have
+ different interpretations of a relative domain name.
+
+ [IAB1123] briefly discusses issues with the ambiguity around whether
+ a label will be "alphabetic" -- including, among other issues, how
+ "alphabetic" should be interpreted in an internationalized
+ environment -- and whether a hostname can be interpreted as an IP
+ address. We explore this last issue in more detail below.
+
+3.1.1. IPv4 Literals
+
+ Section 2.1 of [RFC1123] states:
+
+ Whenever a user inputs the identity of an Internet host, it SHOULD
+ be possible to enter either (1) a host domain name or (2) an IP
+ address in dotted-decimal ("#.#.#.#") form. The host SHOULD check
+ the string syntactically for a dotted-decimal number before
+ looking it up in the Domain Name System.
+
+ and
+
+ This last requirement is not intended to specify the complete
+ syntactic form for entering a dotted-decimal host number; that is
+ considered to be a user-interface issue.
+
+ In specifying the inet_addr() API, the Portable Operating System
+ Interface (POSIX) standard [IEEE-1003.1] defines "IPv4 dotted decimal
+ notation" as allowing not only strings of the form "10.0.1.2" but
+ also allowing octal and hexadecimal, and addresses with fewer than
+ four parts. For example, "10.0.258", "0xA000102", and "012.0x102"
+ all represent the same IPv4 address in standard "IPv4 dotted decimal"
+ notation. We will refer to this as the "loose" syntax of an IPv4
+ address literal.
+
+ In Section 6.1 of [RFC3493], getaddrinfo() is defined to support the
+ same (loose) syntax as inet_addr():
+
+ If the specified address family is AF_INET or AF_UNSPEC, address
+ strings using Internet standard dot notation as specified in
+ inet_addr() are valid.
+
+ In contrast, Section 6.3 of the same RFC, which specifies
+ inet_pton(), states:
+
+ If the af argument of inet_pton() is AF_INET, the src string shall
+ be in the standard IPv4 dotted-decimal form:
+
+ ddd.ddd.ddd.ddd
+
+ where "ddd" is a one to three digit decimal number between 0 and
+ 255. The inet_pton() function does not accept other formats (such
+ as the octal numbers, hexadecimal numbers, and fewer than four
+ numbers that inet_addr() accepts).
+
+ As shown above, inet_pton() uses what we will refer to as the
+ "strict" form of an IPv4 address literal. Some platforms also use
+ the strict form with getaddrinfo() when the AI_NUMERICHOST flag is
+ passed to it.
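
The difference between the two forms can be sketched as follows. `parse_loose` is a hand-written approximation of the loose syntax described above (it is not inet_addr() itself), and `parse_strict` uses Python's `ipaddress` module as a stand-in for the strict form; both function names are hypothetical.

```python
import ipaddress


def parse_part(text: str) -> int:
    """Parse one part as hexadecimal (0x...), octal (leading 0), or decimal."""
    if text[:2] in ("0x", "0X"):
        digits, base, allowed = text[2:], 16, "0123456789abcdefABCDEF"
    elif len(text) > 1 and text[0] == "0":
        digits, base, allowed = text[1:], 8, "01234567"
    else:
        digits, base, allowed = text, 10, "0123456789"
    if not digits or any(c not in allowed for c in digits):
        raise ValueError(f"bad address part: {text!r}")
    return int(digits, base)


def parse_loose(text: str) -> bytes:
    """Approximate the loose syntax: one to four parts, where the last
    part fills all bytes not covered by the leading parts."""
    parts = [parse_part(p) for p in text.split(".")]
    if len(parts) > 4:
        raise ValueError("too many parts")
    *leading, last = parts
    remaining = 4 - len(leading)
    if any(p > 255 for p in leading) or last >= 256 ** remaining:
        raise ValueError("part out of range")
    return bytes(leading) + last.to_bytes(remaining, "big")


def parse_strict(text: str) -> bytes:
    """Strict form: exactly four decimal parts (via the ipaddress module)."""
    return ipaddress.IPv4Address(text).packed


for literal in ("10.0.1.2", "10.0.258", "0xA000102", "012.0x102"):
    loose = ".".join(str(b) for b in parse_loose(literal))
    try:
        parse_strict(literal)
        strict = "accepted"
    except ValueError:
        strict = "rejected"
    print(f"{literal:12} loose -> {loose:10} strict -> {strict}")
```

All four literals map to 10.0.1.2 under the loose parser, while the strict parser accepts only the first.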
+
+ Both the strict and loose forms are standard forms, and hence a
+ protocol specification is still ambiguous if it simply defines a
+ string to be in the "standard IPv4 dotted decimal form". And, as a
+ result of these differences, names such as "10.11.12" are ambiguous
+ as to whether they are an IP address or a hostname, and even
+ "10.11.12.13" can be ambiguous because of the "SHOULD" in the above
+ text from RFC 1123, making it optional whether to treat it as an
+ address or a DNS name.
+
+ Protocols and data formats that can use addresses in string form for
+ security purposes need to resolve these ambiguities. For example,
+ for the host component of URIs, Section 3.2.2 of [RFC3986] resolves
+ the first ambiguity by only allowing the strict form and resolves the
+ second ambiguity by specifying that it is considered an IPv4 address
+ literal. New protocols and data formats should similarly consider
+ using the strict form rather than the loose form in order to better
+ match user expectations.
+
+ A string might be valid under the "loose" definition but invalid
+ under the "strict" definition. As long as invalid identifiers are
+ denied privilege, this difference will not result in elevation of
+ privilege. Some protocols, however, use strings that can be either
+ an IP address literal or a hostname. Such strings are at best
+ Definite identifiers, and often turn out to be Indefinite
+ identifiers. (See Section 4.1 for more discussion.)
+
+3.1.2. IPv6 Literals
+
+ IPv6 addresses similarly have a wide variety of alternate but
+ semantically identical string representations, as defined in
+ Section 2.2 of [RFC4291] and Section 2 of [RFC6874]. As discussed in
+ Section 3.2.5 of [RFC5952], this fact causes problems in security
+ contexts if comparison (such as in PKIX certificates) is done between
+ strings rather than between the binary representations of addresses.
+
+ [RFC5952] specified a recommended canonical string format as an
+ attempt to solve this problem, but it may not be ubiquitously
+ supported at present. And, when strings can contain non-ASCII
+ characters, the same issues (and more, since hexadecimal and colons
+ are allowed) arise as with IPv4 literals.
+
+ Whereas (binary) IPv6 addresses are Absolute identifiers, IPv6
+ address literals are Definite identifiers, since string-to-address
+ conversion for IPv6 address literals is unambiguous.
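
A minimal sketch of comparing IPv6 literals by their binary value rather than as strings, using Python's `ipaddress` module. The helper name `ipv6_equal` and the sample addresses (from the 2001:db8::/32 documentation prefix) are chosen here for illustration.

```python
import ipaddress


def ipv6_equal(a: str, b: str) -> bool:
    """Compare two IPv6 literals by their 16-byte binary representation."""
    return ipaddress.IPv6Address(a).packed == ipaddress.IPv6Address(b).packed


forms = ["2001:db8:0:0:0:0:0:1", "2001:0DB8::1", "2001:db8::0:1"]

# As strings, the three literals are all different ...
print(len(set(forms)))  # 3

# ... but they all denote the same binary address.
print(all(ipv6_equal(forms[0], f) for f in forms))  # True
```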
+
+3.1.3. Internationalization
+
+ The IETF policy on character sets and languages [RFC2277] requires
+ support for UTF-8 in protocols, and as a result many protocols now do
+ support non-ASCII characters. When a hostname is sent in a UTF-8
+ field, there are a number of ways it may be encoded. For example,
+ hostname labels might be encoded directly in UTF-8, or they might
+ first be Punycode-encoded [RFC3492] or even percent-encoded from
+ UTF-8.
+
+ For example, in URIs, Section 3.2.2 of [RFC3986] specifically allows
+ for the use of percent-encoded UTF-8 characters in the hostname as
+ well as the use of Internationalized Domain Names in Applications
+ (IDNA) encoding [RFC3490] using the Punycode algorithm.
+
+ Percent-encoding is unambiguous for hostnames, since the percent
+ character cannot appear in the strict definition of a "hostname",
+ though it can appear in a DNS name.
+
+ Punycode-encoded labels (or "A-labels"), on the other hand, can be
+ ambiguous if hosts are actually allowed to be named with a name
+ starting with "xn--", and false positives can result. While this may
+ be extremely unlikely for normal scenarios, it nevertheless provides
+ a possible vector for an attacker.
+
+ A hostname comparator thus needs to decide whether a Punycode-encoded
+ label should or should not be considered a valid hostname label, and
+ if so, then whether it should match a label encoded in some other
+ form such as a percent-encoded Unicode label (U-label).
+
+ For example, Section 3 of "Transport Layer Security (TLS) Extensions:
+ Extension Definitions" [RFC6066] states:
+
+ "HostName" contains the fully qualified DNS hostname of the
+ server, as understood by the client. The hostname is represented
+ as a byte string using ASCII encoding without a trailing dot.
+ This allows the support of internationalized domain names through
+ the use of A-labels defined in [RFC5890]. DNS hostnames are case-
+ insensitive. The algorithm to compare hostnames is described in
+ [RFC5890], Section 2.3.2.4.
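
The following sketch shows one possible comparator decision: percent-encoded, U-label, and A-label forms are all converted to A-labels before comparison. It relies on Python's built-in `idna` codec, which implements the older IDNA specification [RFC3490] rather than the [RFC5890] series; the helper names and the sample hostname are illustrative assumptions.

```python
from urllib.parse import unquote


def to_a_labels(hostname: str) -> str:
    """Convert each label to its ASCII (A-label) form and lowercase it."""
    return hostname.encode("idna").decode("ascii").lower()


def hostname_equal(a: str, b: str) -> bool:
    """Hypothetical policy: undo percent-encoding, then compare A-labels."""
    return to_a_labels(unquote(a)) == to_a_labels(unquote(b))


u_label = "b\u00fccher.example"        # labels carried directly in Unicode
a_label = "xn--bcher-kva.example"      # Punycode-encoded form
percent = "b%C3%BCcher.example"        # percent-encoded UTF-8 form

print(u_label == a_label)                # False
print(hostname_equal(u_label, a_label))  # True
print(hostname_equal(percent, a_label))  # True
```

A comparator that instead treats "xn--" labels as ordinary hostnames would report these as different, which is exactly the kind of disagreement between parties that leads to false positives or false negatives.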
+
+ For some additional discussion of security issues that arise with
+ internationalization, see Section 4.2 and [TR36].
+
+3.1.4. Resolution for Comparison
+
+ Some systems (specifically Java URLs [JAVAURL]) use the rule that if
+ two hostnames resolve to the same IP address(es) then the hostnames
+ are considered equal. That is, the canonicalization algorithm
+ involves name resolution with an IP address being the canonical form.
+
+ For example, if resolution was done via DNS, and DNS contained:
+
+      example.com.   IN  A      10.0.0.6
+      example.net.   IN  CNAME  example.com.
+      example.org.   IN  A      10.0.0.6
+
+ then the algorithm might treat all three names as equal, even though
+ the third name might refer to a different entity.
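
The false positive can be sketched with a simulated zone, so that no real DNS query is made. The `ZONE` table mirrors the records above; `resolve` and `equal_by_resolution` are hypothetical helpers, not the behavior of any particular library.

```python
# Simulated zone data mirroring the records above (no network access).
ZONE = {
    "example.com.": ("A", "10.0.0.6"),
    "example.net.": ("CNAME", "example.com."),
    "example.org.": ("A", "10.0.0.6"),
}


def resolve(name: str, max_hops: int = 8) -> str:
    """Follow CNAME records until an A record is found."""
    for _ in range(max_hops):
        rtype, value = ZONE[name]
        if rtype == "A":
            return value
        name = value
    raise LookupError("CNAME chain too long")


def equal_by_resolution(a: str, b: str) -> bool:
    """Treat two hostnames as equal if they resolve to the same address."""
    return resolve(a) == resolve(b)


# The alias and its target compare as equal, as intended ...
print(equal_by_resolution("example.net.", "example.com."))  # True

# ... but so does an unrelated name that shares the address.
print(equal_by_resolution("example.org.", "example.com."))  # True
```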
+
+ With the introduction of dynamic IP addresses; private IP addresses;
+ multiple IP addresses per name; multiple address families (e.g., IPv4
+ vs. IPv6); devices that roam to new locations; commonly deployed DNS
+ tricks that result in the answer depending on factors such as the
+ requester's location and the load on the server whose address is
+ returned; etc., this method of comparison cannot be relied upon.
+ There is no guarantee that two names for the same host will resolve
+ to the same IP addresses; nor that the addresses resolved
+ refer to the same entity, such as when the names resolve to private
+ IP addresses; nor even that the system has connectivity (and the
+ willingness to wait for the delay) to resolve names at the time the
+ answer is needed. The lifetime of the identifier, and of any cached
+ state from a previous resolution, also affects security (see
+ Section 4.4).
+
+ In addition, a comparison mechanism that relies on the ability to
+ resolve identifiers such as hostnames to other identifiers such as IP
+ addresses leaks information about security decisions to outsiders if
+ these queries are publicly observable. (See [PRIVACY-CONS] for a
+ deeper discussion of information disclosure.)
+
+ Finally, it is worth noting that resolving two identifiers to
+ determine if they refer to the same entity can be thought of as a use
+ of such identifiers, as opposed to actually comparing the identifiers
+ themselves, which is the focus of this document.
+
+3.2. Port Numbers and Service Names
+
+ Port numbers and service names are discussed in depth in [RFC6335].
+ Historically, there were port numbers, service names used in SRV
+ records, and mnemonic identifiers for assigned port numbers (known as
+ port "keywords" at [IANA-PORT]). The latter two are now unified, and
+ various protocols use one or more of these types in strings. For
+ example, the common syntax used by many URI schemes allows port
+ numbers but not service names. Some implementations of the
+ getaddrinfo() API support strings that can be either port numbers or
+ port keywords (but not service names).
+
+ For protocols that use service names that must be resolved, the
+ issues are the same as those for resolution of addresses in
+ Section 3.1.4. In addition, Section 5.1 of [RFC6335] clarifies that
+ service names/port keywords must contain at least one letter. This
+ prevents confusion with port numbers in strings where both are
+ allowed.
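+
+ The "at least one letter" rule makes the two syntaxes disjoint. The
+ following sketch assumes our reading of the remaining syntax rules in
+ Section 5.1 of [RFC6335] (1 to 15 characters; letters, digits, and
+ hyphens only; no leading, trailing, or adjacent hyphens); classify()
+ is a hypothetical helper, not part of any API:
+
```python
import re

# Characters and length permitted in a service name (assumed from RFC 6335).
_SERVICE_CHARS = re.compile(r"[A-Za-z0-9-]{1,15}")


def classify(token):
    """Return 'port', 'service', or 'invalid' for a port-or-service string."""
    if token.isascii() and token.isdigit():
        # All digits: can only be a port number, never a service name.
        return "port" if int(token) <= 65535 else "invalid"
    if (
        _SERVICE_CHARS.fullmatch(token)
        and any(c.isalpha() for c in token)  # at least one letter
        and not token.startswith("-")
        and not token.endswith("-")
        and "--" not in token
    ):
        return "service"
    return "invalid"
```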
+
+3.3. URIs
+
+ This section looks at issues related to using URIs for security
+ purposes. For example, Section 7.4 of [RFC5280] specifies comparison
+ of URIs in certificates. Examples of URIs in security-token-based
+ access control systems include WS-*, SAML 2.0 [OASIS-SAMLv2-CORE],
+ and OAuth Web Resource Authorization Profiles (WRAP) [OAuth-WRAP].
+ In such systems, a variety of participants in the security
+ infrastructure are identified by URIs. For example, requesters of
+ security tokens are sometimes identified with URIs. The issuers of
+ security tokens and the relying parties who are intended to consume
+ security tokens are frequently identified by URIs. Claims in
+ security tokens often have their types defined using URIs, and the
+ values of the claims can also be URIs.
+
+ URIs are defined with multiple components, each of which has its own
+ rules. We cover each in turn below. However, it is also important
+ to note that there exist multiple comparison algorithms. Section 6.2
+ of [RFC3986] states:
+
+ A variety of methods are used in practice to test URI equivalence.
+ These methods fall into a range, distinguished by the amount of
+ processing required and the degree to which the probability of
+ false negatives is reduced. As noted above, false negatives
+ cannot be eliminated. In practice, their probability can be
+ reduced, but this reduction requires more processing and is not
+ cost-effective for all applications.
+
+ If this range of comparison practices is considered as a ladder,
+ the following discussion will climb the ladder, starting with
+ practices that are cheap but have a relatively higher chance of
+ producing false negatives, and proceeding to those that have
+ higher computational cost and lower risk of false negatives.
+
+ The ladder approach has both pros and cons. On the pro side, it
+ allows some uses to optimize for security, and other uses to optimize
+ for cost, thus allowing URIs to be applicable to a wide range of
+ uses. A disadvantage is that when different approaches are taken by
+ different components in the same system using the same identifiers,
+ the inconsistencies can result in security issues.
+
+3.3.1. Scheme Component
+
+ [RFC3986] defines URI schemes as being case-insensitive US-ASCII and
+ in Section 6.2.2.1 specifies that scheme names should be normalized
+ to lowercase characters.
+
+ New schemes can be defined over time. In general, however, two URIs
+ with an unrecognized scheme cannot be safely compared. This is
+ because the canonicalization and comparison rules for the other
+ components may vary by scheme. For example, a new URI scheme might
+ have a default port of X, and without that knowledge, a comparison
+ algorithm cannot know whether "example.com" and "example.com:X"
+ should be considered to match in the authority component. Hence, for
+ security purposes, it is safest for unrecognized schemes to be
+ treated as invalid identifiers. However, if the URIs are only used
+ with a "grant access on match" paradigm, then unrecognized schemes
+ can be supported by doing a generic case-sensitive comparison, at the
+ expense of some false negatives.
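+
+ The two policies can be sketched as follows. The table of recognized
+ schemes (DEFAULT_PORTS) and all function names are assumptions made
+ for illustration, and the canonicalization shown is deliberately
+ incomplete (it ignores, for example, percent-encoding and userinfo):
+
```python
from urllib.parse import urlsplit

# Hypothetical table of the schemes this comparer recognizes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical(uri):
    """Return a comparable tuple, or None if the scheme is unrecognized."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        return None
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    host = (parts.hostname or "").lower()
    return (scheme, host, port, parts.path, parts.query)


def match_strict(a, b):
    """Unrecognized schemes are invalid identifiers and never match."""
    ca, cb = canonical(a), canonical(b)
    return ca is not None and ca == cb


def match_grant_only(a, b):
    """For "grant access on match" use only: unrecognized schemes fall
    back to an exact, case-sensitive string comparison."""
    ca, cb = canonical(a), canonical(b)
    if ca is not None and cb is not None:
        return ca == cb
    return a == b
```
+
+ With the fallback, "new://example.com/a" and "new://example.com:99/a"
+ never match, even if 99 turned out to be the default port of the
+ unrecognized scheme; this is the false negative mentioned above.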
+
+3.3.2. Authority Component
+
+ The authority component is scheme-specific, but many schemes follow a
+ common syntax that allows for userinfo, host, and port.
+
+3.3.2.1. Host
+
+ Section 3.1 discusses issues with hostnames in general. In addition,
+ Section 3.2.2 of [RFC3986] allows future changes using the IPvFuture
+ production. As with IPv4 and IPv6 literals, IPvFuture formats may
+ have issues with multiple semantically identical string
+ representations and may also be semantically identical to an IPv4 or
+ IPv6 address. As such, false negatives may be common if IPvFuture is
+ used.
+
+3.3.2.2. Port
+
+ See discussion in Section 3.2.
+
+3.3.2.3. Userinfo
+
+ [RFC3986] defines the userinfo production that allows arbitrary data
+ about the user of the URI to be placed before '@' signs in URIs. For
+ example, "ftp://alice:bob@example.com/bar" has the value "alice:bob"
+ as its userinfo. When comparing URIs in a security context, one must
+ decide whether to treat the userinfo as being significant or not.
+ Some URI comparison services, for example, treat
+ "ftp://alice:ick@example.com" and "ftp://example.com" as being equal.
+
+ When the userinfo is treated as being significant, it has additional
+ considerations (e.g., whether or not it is case sensitive), which we
+ cover in Section 3.4.
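+
+ The two choices can be sketched as follows; the function names are
+ hypothetical, and the sketch assumes the common authority syntax in
+ which userinfo precedes the last '@' of the authority:
+
```python
from urllib.parse import urlsplit


def split_userinfo(uri):
    """Return (userinfo or None, host[:port]) from the authority component."""
    netloc = urlsplit(uri).netloc
    userinfo, sep, hostport = netloc.rpartition("@")
    return (userinfo if sep else None), hostport


def equal_with_userinfo(a, b):
    """Userinfo is significant: all components must match."""
    return urlsplit(a) == urlsplit(b)


def equal_ignoring_userinfo(a, b):
    """Userinfo is not significant: drop it before comparing."""
    pa = urlsplit(a)._replace(netloc=split_userinfo(a)[1])
    pb = urlsplit(b)._replace(netloc=split_userinfo(b)[1])
    return pa == pb
```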
+
+3.3.3. Path Component
+
+ [RFC3986] supports the use of path segment values such as "./" or
+ "../" for relative URIs. As discussed in Section 6.2.2.3 of
+ [RFC3986], they are intended only for use within a reference relative
+ to some other base URI, but Section 5.2.4 of [RFC3986] nevertheless
+ defines an algorithm to remove them as part of URI normalization.
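+
+ The following is a sketch of that algorithm as we read Section 5.2.4
+ of [RFC3986]; it is illustrative and has not been validated against
+ any particular implementation:
+
```python
def remove_dot_segments(path):
    """Remove "." and ".." segments from a URI path."""
    out = []  # output buffer, one entry per emitted segment
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]  # "/./" becomes "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]  # "/../" becomes "/"
            if out:
                out.pop()  # and the previous segment is dropped
        elif path == "/..":
            path = "/"
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any)
            # to the output buffer.
            start = 1 if path.startswith("/") else 0
            idx = path.find("/", start)
            if idx == -1:
                out.append(path)
                path = ""
            else:
                out.append(path[:idx])
                path = path[idx:]
    return "".join(out)
```
+
+ A comparer that skips this step treats "/public/../private/x" and
+ "/private/x" as different paths, while one that applies it treats
+ them as the same path.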
+
+ Unless a scheme states otherwise, the path component is defined to be
+ case sensitive. However, if the resource is stored and accessed
+ using a filesystem using case-insensitive paths, there will be many
+ paths that refer to the same resource. As such, false negatives can
+ be common in this case.
+
+3.3.4. Query Component
+
+ It is unspecified whether "http://example.com/foo",
+ "http://example.com/foo?", and "http://example.com/foo?bar" are
+ considered equal to or different from one another.
+
+ Similarly, it is unspecified whether the order of values matters.
+ For example, should "http://example.com/blah?ick=bick&foo=bar" be
+ considered equal to "http://example.com/blah?foo=bar&ick=bick"? And
+ if a domain name is permitted to appear in a query component (e.g.,
+ in a reference to another URI), the same issues in Section 3.1 apply.
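+
+ The sketch below shows three comparison policies that give different
+ answers for the examples above. The policy names are hypothetical,
+ and treating the query as "&"-separated name=value pairs is itself
+ an assumption that does not hold for every scheme or application:
+
```python
from urllib.parse import parse_qsl, urlsplit


def equal_exact(a, b):
    """Simple string comparison; "foo" and "foo?" differ."""
    return a == b


def equal_ignoring_pair_order(a, b):
    """Compare with the query parsed into pairs and sorted."""
    def key(uri):
        p = urlsplit(uri)
        pairs = sorted(parse_qsl(p.query, keep_blank_values=True))
        return (p.scheme, p.netloc, p.path, pairs, p.fragment)
    return key(a) == key(b)


def equal_ignoring_query(a, b):
    """Compare with the query component dropped entirely."""
    def key(uri):
        p = urlsplit(uri)
        return (p.scheme, p.netloc, p.path, p.fragment)
    return key(a) == key(b)
```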
+
+3.3.5. Fragment Component
+
+ Some URI formats include fragment identifiers. These are typically
+ handles to locations within a resource and are used for local
+ reference. A classic example is the use of fragments in HTTP URIs
+ where a URI of the form "http://example.com/blah.html#ick" means
+ retrieve the resource "http://example.com/blah.html" and, once it has
+ arrived locally, find the HTML anchor named "ick" and display that.
+
+ So, for example, when a user clicks on the link
+ "http://example.com/blah.html#baz", a browser will check its cache by
+ doing a URI comparison for "http://example.com/blah.html" and, if the
+ resource is present in the cache, a match is declared.
+
+ Hence, comparisons for security purposes typically ignore the
+ fragment component and treat all fragments as equal to the full
+ resource. However, if one were actually trying to compare the piece
+ of a resource that was identified by the fragment identifier,
+ ignoring it would result in potential false positives.
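+
+ A minimal sketch of the fragment-ignoring comparison, using a
+ hypothetical helper name:
+
```python
from urllib.parse import urldefrag


def equal_ignoring_fragment(a, b):
    """Compare two URIs after stripping any fragment identifier."""
    return urldefrag(a).url == urldefrag(b).url
```
+
+ Here "http://example.com/blah.html#ick" and
+ "http://example.com/blah.html#baz" compare as equal, which is a false
+ positive if the intent was to compare the pieces they identify.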
+
+3.3.6. Resolution for Comparison
+
+ It may be tempting to define a URI comparison algorithm based on
+ whether URIs resolve to the same content, along the lines of
+ resolving hostnames as described in Section 3.1.4. However, such an
+ algorithm would result in similar problems, including content that
+ dynamically changes over time or that is based on factors such as the
+ requester's location, potential lack of external connectivity at the
+ time or place that comparison is done, introduction of potentially
+ undesirable delay, etc.
+
+ In addition, as noted in Section 3.1.4, resolution leaks information
+ about security decisions to outsiders if the queries are publicly
+ observable.
+
+3.4. Email Address-Like Identifiers
+
+ Section 3.4.1 of [RFC5322] defines the syntax of an email address-
+ like identifier, and Section 3.2 of [RFC6532] updates it to support
+ internationalization. Section 7.5 of [RFC5280] further discusses the
+ use of internationalized email addresses in certificates.
+
+ Regarding the security impact of internationalized email headers,
+ [RFC6532] points to Section 14 of [RFC6530], which contains a
+ discussion of many issues resulting from internationalization.
+
+ Email address-like identifiers have a local part and a domain part.
+ The issues with the domain part are essentially the same as with
+ hostnames, as covered earlier in Section 3.1.
+
+ The local part is left for each domain to define. People quite
+ commonly use email addresses as usernames with web sites such as
+ banks or shopping sites, but the site doesn't know whether
+ foo@example.com is the same person as FOO@example.com. Thus, email
+ address-like identifiers are typically Indefinite identifiers.
+
+ To avoid false positives, some security mechanisms (such as those
+ described in [RFC5280]) compare the local part using an exact match.
+ Hence, like URIs, email address-like identifiers are designed for use
+ in grant-on-match security schemes, not in deny-on-match schemes.
+
+ Furthermore, when such identifiers are actually used as email
+ addresses, Section 2.4 of [RFC5321] states that the local part of a
+ mailbox must be treated as case sensitive, but if a mailbox is stored
+ and accessed using a filesystem using case-insensitive paths, there
+ may be many paths that refer to the same mailbox. As such, false
+ negatives can be common in this case.
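+
+ A sketch of the exact-match approach follows. It assumes an ASCII
+ domain part (internationalized domains raise the issues in
+ Section 3.1) and a simple split at the last '@'; the function names
+ are hypothetical:
+
```python
def split_address(addr):
    """Split an address-like identifier into (local part, domain part)."""
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("not of the form local@domain: %r" % addr)
    return local, domain


def same_identifier(a, b):
    """Exact match on the local part; ASCII case-insensitive domain."""
    local_a, domain_a = split_address(a)
    local_b, domain_b = split_address(b)
    return local_a == local_b and domain_a.lower() == domain_b.lower()
```
+
+ With this rule, foo@example.com and FOO@example.com never match, even
+ if the domain happens to deliver both to the same mailbox.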
+
+4. General Issues
+
+4.1. Conflation
+
+ There are a number of examples (some in the preceding sections) of
+ strings that conflate two types of identifiers, using some heuristic
+ to try to determine which type of identifier is given. Similarly,
+ two ways of encoding the same type of identifier might be conflated
+ within the same string.
+
+ Some examples include:
+
+ 1. A string that might be an IPv4 address literal or an IPv6 address
+ literal
+
+ 2. A string that might be an IP address literal or a hostname
+
+ 3. A string that might be a port number or a service name
+
+ 4. A DNS label that might be literal or be Punycode-encoded
+
+ Strings that allow such conflation can only be considered Definite if
+ there exists a well-defined rule to determine which identifier type
+ is meant. One way to do so is to ensure that the valid syntax for
+ the two is disjoint (e.g., distinguishing IPv4 vs. IPv6 address
+ literals by the use of colons in the latter). A second way to do so
+ is to define a precedence rule that results in some identifiers being
+ inaccessible via a conflated string (e.g., a host literally named
+ "xn--de-jg4avhby1noc0d" may be inaccessible due to the "xn--" prefix
+ denoting the use of Punycode encoding). In some cases, such
+ inaccessible space may be reserved so that the actual set of
+ identifiers in use is unambiguous. For example, Section 2.5.5.2 of
+ [RFC4291] defines a range of the IPv6 address space for representing
+ IPv4 addresses.
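+
+ Both kinds of rule can be sketched with Python's ipaddress module.
+ The function names are hypothetical, and whether an IPv4-mapped IPv6
+ address should be treated as equal to the IPv4 address it embeds is
+ a policy assumption of this sketch, not a requirement:
+
```python
import ipaddress


def classify_host(text):
    """Return 'ipv4', 'ipv6', 'hostname', or 'invalid'."""
    if ":" in text:
        # Disjoint syntax: only IPv6 literals may contain a colon.
        try:
            ipaddress.IPv6Address(text)
            return "ipv6"
        except ValueError:
            return "invalid"
    try:
        # Precedence rule: anything that parses as an IPv4 literal is
        # one, so a host with such a name is inaccessible.
        ipaddress.IPv4Address(text)
        return "ipv4"
    except ValueError:
        return "hostname"


def to_ipv4(text):
    """Return the IPv4 address denoted by text, if any, else None."""
    kind = classify_host(text)
    if kind == "ipv4":
        return ipaddress.IPv4Address(text)
    if kind == "ipv6":
        # Non-None only for addresses in the IPv4-mapped range.
        return ipaddress.IPv6Address(text).ipv4_mapped
    return None
```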
+
+4.2. Internationalization
+
+ In addition to the issues with hostnames discussed in Section 3.1.3,
+ there are a number of internationalization issues that apply to many
+ types of Definite and Indefinite identifiers.
+
+ First, there is no DNS mechanism for identifying whether
+ non-identical strings would be seen by a human as being equivalent.
+ There are problematic examples even with US-ASCII (Basic Latin)
+ strings, including regional spelling variations such as "color" and
+ "colour", and with many non-English cases, including partially
+ numeric strings in Arabic script contexts, Chinese strings in
+ Simplified and Traditional forms, and so on. Attempts to produce
+ such alternate forms algorithmically could produce false positives
+ and hence have an adverse effect on security.
+
+ Second, some strings are visually confusable with others, and hence
+ if a security decision is made by a user based on visual inspection,
+ many opportunities for false positives exist. As such, using visual
+ inspection for security is unreliable. In addition to the security
+ issues, visual confusability also adversely affects the usability of
+ identifiers distributed via visual media. Similar issues can arise
+ with audible confusability when using audio (e.g., for radio
+ distribution, accessibility to the blind, etc.) in place of a visual
+ medium. Furthermore, when strings conflate two types of identifiers
+ as discussed in Section 4.1, allowing non-ASCII characters can cause
+ one type of identifier to appear to a human as another type of
+ identifier. For example, characters that may look like digits and
+ dots may appear to be an IPv4 literal to a human (especially to one
+ who might expect digits to appear in his or her native script).
+ Hence, conflation often increases the chance of confusability.
+
+ Determining whether a string is a valid identifier should typically
+ be done after, or as part of, canonicalization. Otherwise, an
+ attacker might use the canonicalization algorithm to inject (e.g.,
+ via percent encoding, Normalization Form KC (NFKC), or non-shortest-
+ form UTF-8) delimiters such as '@' in an email address-like
+ identifier, or a '.' in a hostname.
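+
+ The ordering problem can be sketched as follows. The validity rule
+ (a local part must not contain '@') and the canonicalization steps
+ (percent-decoding followed by NFKC) are simplifying assumptions
+ chosen for illustration:
+
```python
import unicodedata
from urllib.parse import unquote


def canonicalize(text):
    """Percent-decode, then apply Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", unquote(text))


def is_valid_local_part(text):
    """Toy validity rule: non-empty and free of the '@' delimiter."""
    return text != "" and "@" not in text


def check_then_canonicalize(text):
    """Unsafe order: the check runs before delimiters can appear."""
    return is_valid_local_part(text)


def canonicalize_then_check(text):
    """Safer order: the check sees the canonical form."""
    return is_valid_local_part(canonicalize(text))
```
+
+ Both "alice%40evil.example" and a string using U+FF20 (FULLWIDTH
+ COMMERCIAL AT) in place of '@' pass the first check but fail the
+ second, since canonicalization turns each into "alice@evil.example".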
+
+ Any case-insensitive comparisons need to define how comparison is
+ done, since such comparisons may vary by the locale of the endpoint.
+ As such, using case-insensitive comparisons in general often results
+ in identifiers being either Indefinite or, if the legal character set
+ is restricted (e.g., to US-ASCII), Definite.
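+
+ As an illustration, the sketch below contrasts three plausible
+ definitions of "case-insensitive" that disagree on the same inputs.
+ The function names are hypothetical; the Unicode behavior shown is
+ that of Python's built-in str.lower() and str.casefold():
+
```python
def ascii_lower(text):
    """Lowercase only A-Z; every other character is left unchanged."""
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text
    )


def equal_ascii(a, b):
    """Case-insensitive over US-ASCII letters only."""
    return ascii_lower(a) == ascii_lower(b)


def equal_unicode_lower(a, b):
    """Case-insensitive using Unicode lowercase mappings."""
    return a.lower() == b.lower()


def equal_unicode_casefold(a, b):
    """Case-insensitive using Unicode case folding."""
    return a.casefold() == b.casefold()
```
+
+ For example, a string beginning with U+212A (KELVIN SIGN) followed by
+ "ey" equals "key" under equal_unicode_lower() but not under
+ equal_ascii(), and "STRASSE" equals "stra" U+00DF "e" only under
+ equal_unicode_casefold().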
+
+ See also [WEBER] for a more visual discussion of many of these
+ issues.
+
+ Finally, the set of permitted characters and the canonical form of
+ the characters (and hence the canonicalization algorithm) sometimes
+ vary by protocol today, even when the intent is to use the same
+ identifier, such as when one protocol passes identifiers to the
+ other. See [RFC6885] for further discussion.
+
+4.3. Scope
+
+ Another issue arises when an identifier (e.g., "localhost",
+ "10.11.12.13", etc.) is not globally unique. Section 1.1 of
+ [RFC3986] states:
+
+ URIs have a global scope and are interpreted consistently
+ regardless of context, though the result of that interpretation
+ may be in relation to the end-user's context. For example,
+ "http://localhost/" has the same interpretation for every user of
+ that reference, even though the network interface corresponding to
+ "localhost" may be different for each end-user: interpretation is
+ independent of access.
+
+ Whenever an identifier that is not globally unique is passed to
+ another entity outside of the scope of uniqueness, it may refer to a
+ different resource and can result in a false positive. This problem
+ is often addressed by using the identifier together with some other
+ unique identifier of the context. For example, "alice" may uniquely
+ identify a user within a system but must be used with "example.com"
+ (as in "alice@example.com") to uniquely identify that user outside
+ of that system.
+
+ It is also worth noting that IPv6 addresses that are not globally
+ scoped can be written with, or otherwise associated with, a "zone ID"
+ to identify the context (see [RFC4007] for more information).
+ However, zone IDs are only unique within a host, so they typically
+ narrow, rather than expand, the scope of uniqueness of the resulting
+ identifier.
+
+4.4. Temporality
+
+ Often, identifiers are not unique across all time but have some
+ lifetime associated with them after which they may be reassigned to
+ another entity. For example, bob@example.com might be assigned to an
+ employee of the Example company, but if he leaves and another Bob is
+ later hired, the same identifier might be reused. As another
+ example, IP address 203.0.113.1 might be assigned to one subscriber
+ and then later reassigned to another subscriber. Security issues can
+ arise if updates are not made in all entities that store the
+ identifier (e.g., in an access control list as discussed in
+ Section 2, or in a resolution cache as discussed in Section 3.1.4).
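+
+ One way to bound the exposure is to store a lifetime with every
+ remembered binding. The record layout and function name below are
+ hypothetical, and the sketch assumes the lifetime is known when the
+ binding is stored:
+
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    identifier: str    # e.g., an email address or IP address
    principal: str     # the entity the identifier referred to
    expires_at: float  # seconds since the epoch (assumed to be known)


def lookup(cache, identifier, now):
    """Return the bound principal, or None if unknown or expired."""
    entry = cache.get(identifier)
    if entry is None or now >= entry.expires_at:
        return None  # stale state must not influence a decision
    return entry.principal
```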
+
+ This issue is similar to the issue of scope discussed in Section 4.3,
+ except that the scope of uniqueness is temporal rather than
+ topological.
+
+5. Security Considerations
+
+ This entire document is about security considerations.
+
+ To minimize issues related to elevation of privilege, any system that
+ requires the ability to use both deny and allow operations within the
+ same identifier space should avoid the use of Indefinite identifiers
+ in security comparisons.
+
+ To minimize future security risks, any new identifiers being designed
+ should specify an Absolute or Definite comparison algorithm, and if
+ extensibility is allowed (e.g., as new schemes in URIs allow), then
+ the comparison algorithm should remain invariant so that unrecognized
+ extensions can be compared. That is, security risks can be reduced
+ by specifying the comparison algorithm, making sure to resolve any
+ ambiguities pointed out in this document (e.g., "standard dotted
+ decimal").
+
+ Some issues (such as unrecognized extensions) can be mitigated by
+ treating such identifiers as invalid. Validity checking of
+ identifiers is further discussed in [RFC3696].
+
+ Perhaps the hardest issues arise when multiple protocols are used
+ together, such as in Figure 2, where the two protocols are defined or
+ implemented using different comparison algorithms. When constructing
+ an architecture that uses multiple such protocols, designers should
+ pay attention to any differences in comparison algorithms among the
+ protocols in order to fully understand the security risks. How to
+ deal with such security risks in current systems is an area for
+ future work.
+
+6. Acknowledgements
+
+ Yaron Goland contributed to the discussion on URIs. Patrik Faltstrom
+ contributed to the background on identifiers. John Klensin
+ contributed text in a number of different sections. Additional
+ helpful feedback and suggestions came from Bernard Aboba, Fred Baker,
+ Leslie Daigle, Mark Davis, Jeff Hodges, Bjoern Hoehrmann, Russ
+ Housley, Christian Huitema, Magnus Nystrom, Tom Petch, and Chris
+ Weber.
+
+7. IAB Members at the Time of Approval
+
+ Bernard Aboba
+ Jari Arkko
+ Marc Blanchet
+ Ross Callon
+ Alissa Cooper
+ Spencer Dawkins
+ Joel Halpern
+ Russ Housley
+ David Kessens
+ Danny McPherson
+ Jon Peterson
+ Dave Thaler
+ Hannes Tschofenig
+
+8. Informative References
+
+ [IAB1123] Internet Architecture Board, "IAB Statement: 'The
+ interpretation of rules in the ICANN gTLD Applicant
+ Guidebook'", February 2012, <http://www.iab.org/documents/
+ correspondence-reports-documents/2012-2/iab-statement-the-
+ interpretation-of-rules-in-the-icann-gtld-applicant-
+ guidebook>.
+
+ [IANA-PORT]
+ IANA, "Service Name and Transport Protocol Port Number
+ Registry", March 2013,
+ <http://www.iana.org/assignments/service-names-port-
+ numbers/>.
+
+ [IEEE-1003.1]
+ IEEE and The Open Group, "The Open Group Base
+ Specifications, Issue 6, IEEE Std 1003.1, 2004 Edition",
+ IEEE Std 1003.1, 2004.
+
+ [JAVAURL] Oracle, "Class URL", Java(TM) Platform Standard Ed. 7,
+ 2013, <http://docs.oracle.com/javase/7/docs/api/java/net/
+ URL.html>.
+
+ [OASIS-SAMLv2-CORE]
+ Cantor, S., Ed., Kemp, J., Ed., Philpott, R., Ed., and E.
+ Maler, Ed., "Assertions and Protocols for the OASIS
+ Security Assertion Markup Language (SAML) V2.0", OASIS
+ Standard saml-core-2.0-os, March 2005,
+ <http://docs.oasis-open.org/security/saml/v2.0/
+ saml-core-2.0-os.pdf>.
+
+ [OAuth-WRAP]
+ Hardt, D., Ed., Tom, A., Eaton, B., and Y. Goland, "OAuth
+ Web Resource Authorization Profiles", Work in Progress,
+ January 2010.
+
+ [PRIVACY-CONS]
+ Cooper, A., Tschofenig, H., Aboba, B., Peterson, J.,
+ Morris, J., Hansen, M., and R. Smith, "Privacy
+ Considerations for Internet Protocols", Work in Progress,
+ April 2013.
+
+ [RFC1034] Mockapetris, P., "Domain names - concepts and facilities",
+ STD 13, RFC 1034, November 1987.
+
+ [RFC1123] Braden, R., "Requirements for Internet Hosts - Application
+ and Support", STD 3, RFC 1123, October 1989.
+
+ [RFC2277] Alvestrand, H.T., "IETF Policy on Character Sets and
+ Languages", BCP 18, RFC 2277, January 1998.
+
+ [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
+ "Internationalizing Domain Names in Applications (IDNA)",
+ RFC 3490, March 2003.
+
+ [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode
+ for Internationalized Domain Names in Applications
+ (IDNA)", RFC 3492, March 2003.
+
+ [RFC3493] Gilligan, R., Thomson, S., Bound, J., McCann, J., and W.
+ Stevens, "Basic Socket Interface Extensions for IPv6",
+ RFC 3493, February 2003.
+
+ [RFC3696] Klensin, J., "Application Techniques for Checking and
+ Transformation of Names", RFC 3696, February 2004.
+
+ [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
+ Resource Identifier (URI): Generic Syntax", STD 66,
+ RFC 3986, January 2005.
+
+ [RFC4007] Deering, S., Haberman, B., Jinmei, T., Nordmark, E., and
+ B. Zill, "IPv6 Scoped Address Architecture", RFC 4007,
+ March 2005.
+
+ [RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing
+ Architecture", RFC 4291, February 2006.
+
+ [RFC4790] Newman, C., Duerst, M., and A. Gulbrandsen, "Internet
+ Application Protocol Collation Registry", RFC 4790,
+ March 2007.
+
+ [RFC4949] Shirey, R., "Internet Security Glossary, Version 2",
+ RFC 4949, August 2007.
+
+ [RFC5280] Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
+ Housley, R., and W. Polk, "Internet X.509 Public Key
+ Infrastructure Certificate and Certificate Revocation List
+ (CRL) Profile", RFC 5280, May 2008.
+
+ [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
+ October 2008.
+
+ [RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322,
+ October 2008.
+
+ [RFC5952] Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
+ Address Text Representation", RFC 5952, August 2010.
+
+ [RFC6055] Thaler, D., Klensin, J., and S. Cheshire, "IAB Thoughts on
+ Encodings for Internationalized Domain Names", RFC 6055,
+ February 2011.
+
+ [RFC6066] Eastlake, D., "Transport Layer Security (TLS) Extensions:
+ Extension Definitions", RFC 6066, January 2011.
+
+ [RFC6125] Saint-Andre, P. and J. Hodges, "Representation and
+ Verification of Domain-Based Application Service Identity
+ within Internet Public Key Infrastructure Using X.509
+ (PKIX) Certificates in the Context of Transport Layer
+ Security (TLS)", RFC 6125, March 2011.
+
+ [RFC6335] Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S.
+ Cheshire, "Internet Assigned Numbers Authority (IANA)
+ Procedures for the Management of the Service Name and
+ Transport Protocol Port Number Registry", BCP 165,
+ RFC 6335, August 2011.
+
+ [RFC6530] Klensin, J. and Y. Ko, "Overview and Framework for
+ Internationalized Email", RFC 6530, February 2012.
+
+ [RFC6532] Yang, A., Steele, S., and N. Freed, "Internationalized
+ Email Headers", RFC 6532, February 2012.
+
+ [RFC6818] Yee, P., "Updates to the Internet X.509 Public Key
+ Infrastructure Certificate and Certificate Revocation List
+ (CRL) Profile", RFC 6818, January 2013.
+
+ [RFC6874] Carpenter, B., Cheshire, S., and R. Hinden, "Representing
+ IPv6 Zone Identifiers in Address Literals and Uniform
+ Resource Identifiers", RFC 6874, February 2013.
+
+ [RFC6885] Blanchet, M. and A. Sullivan, "Stringprep Revision and
+ Problem Statement for the Preparation and Comparison of
+ Internationalized Strings (PRECIS)", RFC 6885, March 2013.
+
+ [TR36] Unicode Consortium, "Unicode Security Considerations",
+ Unicode Technical Report #36, Revision 11, July 2012,
+ <http://www.unicode.org/reports/tr36/>.
+
+ [USASCII] American National Standards Institute, "Coded Character
+ Sets -- 7-bit American Standard Code for Information
+ Interchange (7-bit ASCII)", ANSI X3.4, 1986.
+
+ [WEBER] Weber, C., "Attacking Software Globalization", March 2010,
+ <http://www.lookout.net/files/
+ Chris_Weber_Character%20Transformations%20v1.7_IUC33.pdf>.
+
+Author's Address
+
+ Dave Thaler (editor)
+ Microsoft Corporation
+ One Microsoft Way
+ Redmond, WA 98052
+ USA
+
+ Phone: +1 425 703 8835
+ EMail: dthaler@microsoft.com
+