From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc6943.txt | 1459 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1459 insertions(+) create mode 100644 doc/rfc/rfc6943.txt (limited to 'doc/rfc/rfc6943.txt') diff --git a/doc/rfc/rfc6943.txt b/doc/rfc/rfc6943.txt new file mode 100644 index 0000000..16e9098 --- /dev/null +++ b/doc/rfc/rfc6943.txt @@ -0,0 +1,1459 @@ + + + + + + +Internet Architecture Board (IAB) D. Thaler, Ed. +Request for Comments: 6943 Microsoft +Category: Informational May 2013 +ISSN: 2070-1721 + + + Issues in Identifier Comparison for Security Purposes + +Abstract + + Identifiers such as hostnames, URIs, IP addresses, and email + addresses are often used in security contexts to identify security + principals and resources. In such contexts, an identifier presented + via some protocol is often compared using some policy to make + security decisions such as whether the security principal may access + the resource, what level of authentication or encryption is required, + etc. If the parties involved in a security decision use different + algorithms to compare identifiers, then failure scenarios ranging + from denial of service to elevation of privilege can result. This + document provides a discussion of these issues that designers should + consider when defining identifiers and protocols, and when + constructing architectures that use multiple protocols. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This document is a product of the Internet Architecture Board (IAB) + and represents information that the IAB has deemed valuable to + provide for permanent record. It represents the consensus of the + Internet Architecture Board (IAB). Documents approved for + publication by the IAB are not a candidate for any level of Internet + Standard; see Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc6943. + + + + + + + + + + + + + +Thaler Informational [Page 1] + +RFC 6943 Identifier Comparison May 2013 + + +Copyright Notice + + Copyright (c) 2013 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. + +Table of Contents + + 1. Introduction ....................................................3 + 1.1. Classes of Identifiers .....................................5 + 1.2. Canonicalization ...........................................5 + 2. Identifier Use in Security Policies and Decisions ...............6 + 2.1. False Positives and Negatives ..............................7 + 2.2. Hypothetical Example .......................................8 + 3. Comparison Issues with Common Identifiers .......................9 + 3.1. Hostnames ..................................................9 + 3.1.1. IPv4 Literals ......................................11 + 3.1.2. IPv6 Literals ......................................12 + 3.1.3. Internationalization ...............................13 + 3.1.4. Resolution for Comparison ..........................14 + 3.2. Port Numbers and Service Names ............................14 + 3.3. URIs ......................................................15 + 3.3.1. Scheme Component ...................................16 + 3.3.2. Authority Component ................................16 + 3.3.3. Path Component .....................................17 + 3.3.4. Query Component ....................................17 + 3.3.5. Fragment Component .................................17 + 3.3.6. Resolution for Comparison ..........................18 + 3.4. Email Address-Like Identifiers ............................18 + 4. General Issues .................................................19 + 4.1. Conflation ................................................19 + 4.2. Internationalization ......................................20 + 4.3. Scope .....................................................21 + 4.4. Temporality ...............................................21 + 5. Security Considerations ........................................22 + 6. Acknowledgements ...............................................22 + 7. IAB Members at the Time of Approval ............................23 + 8. Informative References .........................................23 + + + + + + + +Thaler Informational [Page 2] + +RFC 6943 Identifier Comparison May 2013 + + +1. Introduction + + In computing and the Internet, various types of "identifiers" are + used to identify humans, devices, content, etc. This document + provides a discussion of some security issues that designers should + consider when defining identifiers and protocols, and when + constructing architectures that use multiple protocols. Before + discussing these security issues, we first give some background on + some typical processes involving identifiers. Terms such as + "identifier", "identity", and "principal" are used as defined in + [RFC4949]. + + As depicted in Figure 1, there are multiple processes relevant to our + discussion. + + 1. An identifier is first generated. If the identifier is intended + to be unique, the generation process must include some mechanism, + such as allocation by a central authority or verification among + the members of a distributed authority, to help ensure + uniqueness. However, the notion of "unique" involves determining + whether a putative identifier matches any other identifier that + has already been allocated. As we will see, for many types of + identifiers, this is not simply an exact binary match. + + After generating the identifier, it is often stored in two + locations: with the requester or "holder" of the identifier, and + with some repository of identifiers (e.g., DNS). For example, if + the identifier was allocated by a central authority, the + repository might be that authority. If the identifier identifies + a device or content on a device, the repository might be that + device. + + 2. The identifier is distributed, either by the holder of the + identifier or by a repository of identifiers, to others who could + use the identifier. This distribution might be electronic, but + sometimes it is via other channels such as voice, business card, + billboard, or other form of advertisement. The identifier itself + might be distributed directly, or it might be used to generate a + portion of another type of identifier that is then distributed. + For example, a URI or email address might include a server name, + and hence distributing the URI or email address also inherently + distributes the server name. + + 3. The identifier is used by some party. Generally, the user + supplies the identifier, which is (directly or indirectly) sent + to the repository of identifiers. The repository of identifiers + must then attempt to match the user-supplied identifier with an + identifier in its repository. + + + +Thaler Informational [Page 3] + +RFC 6943 Identifier Comparison May 2013 + + + For example, using an email address to send email to the holder + of an identifier may result in the email arriving at the holder's + email server, which has access to the mail stores. + + +------------+ + | Holder of | 1. Generation + | identifier +<---------+ + +----+-------+ | + | | Match + | v/ + | +-------+-------+ + +----------+ Repository of | + | | identifiers | + | +-------+-------+ + 2. Distribution | ^\ + | | Match + v | + +---------+-------+ | + | User of | | + | identifier +----------+ + +-----------------+ 3. Use + + Figure 1: Typical Identifier Processes + + Another variation is where a user is given the identifier of a + resource (e.g., a web site) to access securely, sometimes known as a + "reference identifier" [RFC6125], and the server hosting the resource + then presents its identity at the time of use. In this case, the + user application attempts to match the presented identity against the + reference identifier. + + One key aspect is that the identifier values passed in generation, + distribution, and use may all be in different forms. For example, an + identifier might be exchanged in printed form at generation time, + distributed to a user via voice, and then used electronically. As + such, the match process can be complicated. + + Furthermore, in many cases, the relationship between holder, + repositories, and users may be more involved. For example, when a + hierarchy of web caches exists, each cache is itself a repository of + a sort, and the match process is usually intended to be the same as + on the origin server. + + Another aspect to keep in mind is that there can be multiple + identifiers that refer to the same object (i.e., resource, human, + device, etc.). For example, a human might have a passport number and + a drivers license number, and an RFC might be available at multiple + locations (rfc-editor.org and ietf.org). In this document, we focus + + + +Thaler Informational [Page 4] + +RFC 6943 Identifier Comparison May 2013 + + + on comparing two identifiers to see whether they are the same + identifier, rather than comparing two different identifiers to see + whether they refer to the same entity (although a few issues with the + latter are touched on in several places, such as Sections 3.1.4 and + 3.3.6). + +1.1. Classes of Identifiers + + In this document, we will refer to the following classes of + identifiers: + + o Absolute: identifiers that can be compared byte-by-byte for + equality. Two identifiers that have different bytes are defined + to be different. For example, binary IP addresses are in this + class. + + o Definite: identifiers that have a single well-defined comparison + algorithm. For example, URI scheme names are required to be + US-ASCII [USASCII] and are defined to match in a case-insensitive + way; the comparison is thus definite, since there is a well- + specified algorithm (Section 9.2.1 of [RFC4790]) on how to do a + case-insensitive match among ASCII strings. + + o Indefinite: identifiers that have no single well-defined + comparison algorithm. For example, human names are in this class. + Everyone might want the comparison to be tailored for their + locale, for some definition of "locale". In some cases, there may + be limited subsets of parties that might be able to agree (e.g., + ASCII users might all agree on a common comparison algorithm, + whereas users of other Roman-derived scripts, such as Turkish, may + not), but identifiers often tend to leak out of such limited + environments. + +1.2. Canonicalization + + Perhaps the most common algorithm for comparison involves first + converting each identifier to a canonical form (a process known as + "canonicalization" or "normalization") and then testing the resulting + canonical representations for bitwise equality. In so doing, it is + thus critical that all entities involved agree on the same canonical + form and use the same canonicalization algorithm so that the overall + comparison process is also the same. + + Note that in some contexts, such as in internationalization, the + terms "canonicalization" and "normalization" have a precise meaning. + In this document, however, we use these terms synonymously in their + more generic form, to mean conversion to some standard form. + + + + +Thaler Informational [Page 5] + +RFC 6943 Identifier Comparison May 2013 + + + While the most common method of comparison includes canonicalization, + comparison can also be done by defining an equivalence algorithm, + where no single form is canonical. However, in most cases, a + canonical form is useful for other purposes, such as output, and so + in such cases defining a canonical form suffices to define a + comparison method. + +2. Identifier Use in Security Policies and Decisions + + Identifiers such as hostnames, URIs, and email addresses are used in + security contexts to identify security principals (i.e., entities + that can be authenticated) and resources as well as other security + parameters such as types and values of claims. Those identifiers are + then used to make security decisions based on an identifier presented + via some protocol. For example: + + o Authentication: a protocol might match a security principal's + identifier to look up expected keying material and then match + keying material. + + o Authorization: a protocol might match a resource name against some + policy. For example, it might look up an access control list + (ACL) and then look up the security principal's identifier (or a + surrogate for it) in that ACL. + + o Accounting: a system might create an accounting record for a + security principal's identifier or resource name, and then might + later need to match a presented identifier to (for example) add + new filtering rules based on the records in order to stop an + attack. + + If the parties involved in a security decision use different matching + algorithms for the same identifiers, then failure scenarios ranging + from denial of service to elevation of privilege can result, as we + will see. + + This is especially complicated in cases involving multiple parties + and multiple protocols. For example, there are many scenarios where + some form of "security token service" is used to grant to a requester + permission to access a resource, where the resource is held by a + third party that relies on the security token service (see Figure 2). + The protocol used to request permission (e.g., Kerberos or OAuth) may + be different from the protocol used to access the resource (e.g., + HTTP). Opportunities for security problems arise when two protocols + define different comparison algorithms for the same type of + identifier, or when a protocol is ambiguously specified and two + endpoints (e.g., a security token service and a resource holder) + implement different algorithms within the same protocol. + + + +Thaler Informational [Page 6] + +RFC 6943 Identifier Comparison May 2013 + + + +----------+ + | security | + | token | + | service | + +----------+ + ^ + | 1. supply credentials and + | get token for resource + | +--------+ + +----------+ 2. supply token and access resource |resource| + |requester |=------------------------------------->| holder | + +----------+ +--------+ + + Figure 2: Simple Security Exchange + + In many cases, the situation is more complex. With X.509 Public Key + Infrastructure (PKIX) certificates [RFC6125], for example, the name + in a certificate gets compared against names in ACLs or other things. + In the case of web site security, the name in the certificate gets + compared to a portion of the URI that a user may have typed into a + browser. The fact that many different people are doing the typing, + on many different types of systems, complicates the problem. + + Add to this the certificate enrollment step, and the certificate + issuance step, and two more parties have an opportunity to adjust the + encoding, or worse, the software that supports them might make + changes that the parties are unaware are happening. + +2.1. False Positives and Negatives + + It is first worth discussing in more detail the effects of errors in + the comparison algorithm. A "false positive" results when two + identifiers compare as if they were equal but in reality refer to two + different objects (e.g., security principals or resources). When + privilege is granted on a match, a false positive thus results in an + elevation of privilege -- for example, allowing execution of an + operation that should not have been permitted otherwise. When + privilege is denied on a match (e.g., matching an entry in a + block/deny list or a revocation list), a permissible operation is + denied. At best, this can cause worse performance (e.g., a cache + miss or forcing redundant authentication) and at worst can result in + a denial of service. + + + + + + + + + +Thaler Informational [Page 7] + +RFC 6943 Identifier Comparison May 2013 + + + A "false negative" results when two identifiers that in reality refer + to the same thing compare as if they were different, and the effects + are the reverse of those for false positives. That is, when + privilege is granted on a match, the result is at best worse + performance and at worst a denial of service; when privilege is + denied on a match, elevation of privilege results. + + Figure 3 summarizes these effects. + + | "Grant on match" | "Deny on match" + ---------------+------------------------+----------------------- + False positive | Elevation of privilege | Denial of service + ---------------+------------------------+----------------------- + False negative | Denial of service | Elevation of privilege + ---------------+------------------------+----------------------- + + Figure 3: Worst Effects of False Positives/Negatives + + When designing a comparison algorithm, one can typically modify it to + increase the likelihood of false positives and decrease the + likelihood of false negatives, or vice versa. Which outcome is + better depends on the context. + + Elevation of privilege is almost always seen as far worse than denial + of service. Hence, for URIs, for example, Section 6.1 of [RFC3986] + states that "comparison methods are designed to minimize false + negatives while strictly avoiding false positives". + + Thus, URIs were defined with a "grant privilege on match" paradigm in + mind, where it is critical to prevent elevation of privilege while + minimizing denial of service. Using URIs in a "deny privilege on + match" system can thus be problematic. + +2.2. Hypothetical Example + + In this example, both security principals and resources are + identified using URIs. Foo Corp has paid example.com for access to + the Stuff service. Foo Corp allows its employees to create accounts + on the Stuff service. Alice gets the account + "http://example.com/Stuff/FooCorp/alice" and Bob gets + "http://example.com/Stuff/FooCorp/bob". It turns out, however, that + Foo Corp's URI canonicalizer includes URI fragment components in + comparisons whereas example.com's does not, and Foo Corp does not + disallow the # character in the account name. So Chuck, who is a + malicious employee of Foo Corp, asks to create an account at + example.com with the name alice#stuff. Foo Corp's URI logic checks + its records for accounts it has created with stuff and sees that + there is no account with the name alice#stuff. Hence, in its + + + +Thaler Informational [Page 8] + +RFC 6943 Identifier Comparison May 2013 + + + records, it associates the account alice#stuff with Chuck and will + only issue tokens good for use with + "http://example.com/Stuff/FooCorp/alice#stuff" to Chuck. + + Chuck, the attacker, goes to a security token service at Foo Corp and + asks for a security token good for + "http://example.com/Stuff/FooCorp/alice#stuff". Foo Corp issues the + token, since Chuck is the legitimate owner (in Foo Corp's view) of + the alice#stuff account. Chuck then submits the security token in a + request to "http://example.com/Stuff/FooCorp/alice". + + But example.com uses a URI canonicalizer that, for the purposes of + checking equality, ignores fragments. So when example.com looks in + the security token to see if the requester has permission from Foo + Corp to access the given account, it successfully matches the URI in + the security token, "http://example.com/Stuff/FooCorp/alice#stuff", + with the requested resource name + "http://example.com/Stuff/FooCorp/alice". + + Leveraging the inconsistencies in the canonicalizers used by Foo Corp + and example.com, Chuck is able to successfully launch an elevation- + of-privilege attack and access Alice's resource. + + Furthermore, consider an attacker using a similar corporation, such + as "foocorp" (or any variation containing a non-ASCII character that + some humans might expect to represent the same corporation). If the + resource holder treats them as different but the security token + service treats them as the same, then elevation of privilege can + occur in this scenario as well. + +3. Comparison Issues with Common Identifiers + + In this section, we walk through a number of common types of + identifiers and discuss various issues related to comparison that may + affect security whenever they are used to identify security + principals or resources. These examples illustrate common patterns + that may arise with other types of identifiers. + +3.1. Hostnames + + Hostnames (composed of dot-separated labels) are commonly used either + directly as identifiers, or as components in identifiers such as in + URIs and email addresses. Another example is in Sections 7.2 and 7.3 + of [RFC5280] (and updated in Section 3 of [RFC6818]), which specify + use in PKIX certificates. + + In this section, we discuss a number of issues in comparing strings + that appear to be some form of hostname. + + + +Thaler Informational [Page 9] + +RFC 6943 Identifier Comparison May 2013 + + + It is first worth pointing out that the term "hostname" itself is + often ambiguous, and hence it is important that any use clarify which + definition is intended. Some examples of definitions include: + + a. A Fully Qualified Domain Name (FQDN), + + b. An FQDN that is associated with address records in the DNS, + + c. The leftmost label in an FQDN, or + + d. The leftmost label in an FQDN that is associated with address + records. + + The use of different definitions in different places results in + questions such as whether "example" and "example.com" are considered + equal or not, and hence it is important when writing new + specifications to be clear about which definition is meant. + + Section 3 of [RFC6055] discusses the differences between a "hostname" + and a "DNS name", where the former is a subset of the latter by using + a restricted set of characters (letters, digits, and hyphens). If + one canonicalizer uses the "DNS name" definition whereas another uses + a "hostname" definition, a name might be valid in the former but + invalid in the latter. As long as invalid identifiers are denied + privilege, this difference will not result in elevation of privilege. + + Section 3.1 of [RFC1034] discusses the difference between a + "complete" domain name, which ends with a dot (such as + "example.com."), and a multi-label relative name such as + "example.com" that assumes the root (".") is in the suffix search + list. In most contexts, these are considered equal, but there may be + issues if different entities in a security architecture have + different interpretations of a relative domain name. + + [IAB1123] briefly discusses issues with the ambiguity around whether + a label will be "alphabetic" -- including, among other issues, how + "alphabetic" should be interpreted in an internationalized + environment -- and whether a hostname can be interpreted as an IP + address. We explore this last issue in more detail below. + + + + + + + + + + + + +Thaler Informational [Page 10] + +RFC 6943 Identifier Comparison May 2013 + + +3.1.1. IPv4 Literals + + Section 2.1 of [RFC1123] states: + + Whenever a user inputs the identity of an Internet host, it SHOULD + be possible to enter either (1) a host domain name or (2) an IP + address in dotted-decimal ("#.#.#.#") form. The host SHOULD check + the string syntactically for a dotted-decimal number before + looking it up in the Domain Name System. + + and + + This last requirement is not intended to specify the complete + syntactic form for entering a dotted-decimal host number; that is + considered to be a user-interface issue. + + In specifying the inet_addr() API, the Portable Operating System + Interface (POSIX) standard [IEEE-1003.1] defines "IPv4 dotted decimal + notation" as allowing not only strings of the form "10.0.1.2" but + also allowing octal and hexadecimal, and addresses with less than + four parts. For example, "10.0.258", "0xA000102", and "012.0x102" + all represent the same IPv4 address in standard "IPv4 dotted decimal" + notation. We will refer to this as the "loose" syntax of an IPv4 + address literal. + + In Section 6.1 of [RFC3493], getaddrinfo() is defined to support the + same (loose) syntax as inet_addr(): + + If the specified address family is AF_INET or AF_UNSPEC, address + strings using Internet standard dot notation as specified in + inet_addr() are valid. + + In contrast, Section 6.3 of the same RFC states, specifying + inet_pton(): + + If the af argument of inet_pton() is AF_INET, the src string shall + be in the standard IPv4 dotted-decimal form: + + ddd.ddd.ddd.ddd + + where "ddd" is a one to three digit decimal number between 0 and + 255. The inet_pton() function does not accept other formats (such + as the octal numbers, hexadecimal numbers, and fewer than four + numbers that inet_addr() accepts). + + + + + + + +Thaler Informational [Page 11] + +RFC 6943 Identifier Comparison May 2013 + + + As shown above, inet_pton() uses what we will refer to as the + "strict" form of an IPv4 address literal. Some platforms also use + the strict form with getaddrinfo() when the AI_NUMERICHOST flag is + passed to it. + + Both the strict and loose forms are standard forms, and hence a + protocol specification is still ambiguous if it simply defines a + string to be in the "standard IPv4 dotted decimal form". And, as a + result of these differences, names such as "10.11.12" are ambiguous + as to whether they are an IP address or a hostname, and even + "10.11.12.13" can be ambiguous because of the "SHOULD" in the above + text from RFC 1123, making it optional whether to treat it as an + address or a DNS name. + + Protocols and data formats that can use addresses in string form for + security purposes need to resolve these ambiguities. For example, + for the host component of URIs, Section 3.2.2 of [RFC3986] resolves + the first ambiguity by only allowing the strict form and resolves the + second ambiguity by specifying that it is considered an IPv4 address + literal. New protocols and data formats should similarly consider + using the strict form rather than the loose form in order to better + match user expectations. + + A string might be valid under the "loose" definition but invalid + under the "strict" definition. As long as invalid identifiers are + denied privilege, this difference will not result in elevation of + privilege. Some protocols, however, use strings that can be either + an IP address literal or a hostname. Such strings are at best + Definite identifiers, and often turn out to be Indefinite + identifiers. (See Section 4.1 for more discussion.) + +3.1.2. IPv6 Literals + + IPv6 addresses similarly have a wide variety of alternate but + semantically identical string representations, as defined in + Section 2.2 of [RFC4291] and Section 2 of [RFC6874]. As discussed in + Section 3.2.5 of [RFC5952], this fact causes problems in security + contexts if comparison (such as in PKIX certificates) is done between + strings rather than between the binary representations of addresses. + + [RFC5952] specified a recommended canonical string format as an + attempt to solve this problem, but it may not be ubiquitously + supported at present. And, when strings can contain non-ASCII + characters, the same issues (and more, since hexadecimal and colons + are allowed) arise as with IPv4 literals. + + + + + + +Thaler Informational [Page 12] + +RFC 6943 Identifier Comparison May 2013 + + + Whereas (binary) IPv6 addresses are Absolute identifiers, IPv6 + address literals are Definite identifiers, since string-to-address + conversion for IPv6 address literals is unambiguous. + +3.1.3. Internationalization + + The IETF policy on character sets and languages [RFC2277] requires + support for UTF-8 in protocols, and as a result many protocols now do + support non-ASCII characters. When a hostname is sent in a UTF-8 + field, there are a number of ways it may be encoded. For example, + hostname labels might be encoded directly in UTF-8, or they might + first be Punycode-encoded [RFC3492] or even percent-encoded from + UTF-8. + + For example, in URIs, Section 3.2.2 of [RFC3986] specifically allows + for the use of percent-encoded UTF-8 characters in the hostname as + well as the use of Internationalized Domain Names in Applications + (IDNA) encoding [RFC3490] using the Punycode algorithm. + + Percent-encoding is unambiguous for hostnames, since the percent + character cannot appear in the strict definition of a "hostname", + though it can appear in a DNS name. + + Punycode-encoded labels (or "A-labels"), on the other hand, can be + ambiguous if hosts are actually allowed to be named with a name + starting with "xn--", and false positives can result. While this may + be extremely unlikely for normal scenarios, it nevertheless provides + a possible vector for an attacker. + + A hostname comparator thus needs to decide whether a Punycode-encoded + label should or should not be considered a valid hostname label, and + if so, then whether it should match a label encoded in some other + form such as a percent-encoded Unicode label (U-label). + + For example, Section 3 of "Transport Layer Security (TLS) Extensions: + Extension Definitions" [RFC6066] states: + + "HostName" contains the fully qualified DNS hostname of the + server, as understood by the client. The hostname is represented + as a byte string using ASCII encoding without a trailing dot. + This allows the support of internationalized domain names through + the use of A-labels defined in [RFC5890]. DNS hostnames are case- + insensitive. The algorithm to compare hostnames is described in + [RFC5890], Section 2.3.2.4. + + For some additional discussion of security issues that arise with + internationalization, see Section 4.2 and [TR36]. + + + + +Thaler Informational [Page 13] + +RFC 6943 Identifier Comparison May 2013 + + +3.1.4. Resolution for Comparison + + Some systems (specifically Java URLs [JAVAURL]) use the rule that if + two hostnames resolve to the same IP address(es) then the hostnames + are considered equal. That is, the canonicalization algorithm + involves name resolution with an IP address being the canonical form. + + For example, if resolution was done via DNS, and DNS contained: + + example.com. IN A 10.0.0.6 + example.net. CNAME example.com. + example.org. IN A 10.0.0.6 + + then the algorithm might treat all three names as equal, even though + the third name might refer to a different entity. + + With the introduction of dynamic IP addresses; private IP addresses; + multiple IP addresses per name; multiple address families (e.g., IPv4 + vs. IPv6); devices that roam to new locations; commonly deployed DNS + tricks that result in the answer depending on factors such as the + requester's location and the load on the server whose address is + returned; etc., this method of comparison cannot be relied upon. + There is no guarantee that two names for the same host will resolve + the name to the same IP addresses; nor that the addresses resolved + refer to the same entity, such as when the names resolve to private + IP addresses; nor even that the system has connectivity (and the + willingness to wait for the delay) to resolve names at the time the + answer is needed. The lifetime of the identifier, and of any cached + state from a previous resolution, also affects security (see + Section 4.4). + + In addition, a comparison mechanism that relies on the ability to + resolve identifiers such as hostnames to other identifiers such as IP + addresses leaks information about security decisions to outsiders if + these queries are publicly observable. (See [PRIVACY-CONS] for a + deeper discussion of information disclosure.) + + Finally, it is worth noting that resolving two identifiers to + determine if they refer to the same entity can be thought of as a use + of such identifiers, as opposed to actually comparing the identifiers + themselves, which is the focus of this document. + +3.2. Port Numbers and Service Names + + Port numbers and service names are discussed in depth in [RFC6335]. + Historically, there were port numbers, service names used in SRV + records, and mnemonic identifiers for assigned port numbers (known as + port "keywords" at [IANA-PORT]). The latter two are now unified, and + + + +Thaler Informational [Page 14] + +RFC 6943 Identifier Comparison May 2013 + + + various protocols use one or more of these types in strings. For + example, the common syntax used by many URI schemes allows port + numbers but not service names. Some implementations of the + getaddrinfo() API support strings that can be either port numbers or + port keywords (but not service names). + + For protocols that use service names that must be resolved, the + issues are the same as those for resolution of addresses in + Section 3.1.4. In addition, Section 5.1 of [RFC6335] clarifies that + service names/port keywords must contain at least one letter. This + prevents confusion with port numbers in strings where both are + allowed. + +3.3. URIs + + This section looks at issues related to using URIs for security + purposes. For example, Section 7.4 of [RFC5280] specifies comparison + of URIs in certificates. Examples of URIs in security-token-based + access control systems include WS-*, SAML 2.0 [OASIS-SAMLv2-CORE], + and OAuth Web Resource Authorization Profiles (WRAP) [OAuth-WRAP]. + In such systems, a variety of participants in the security + infrastructure are identified by URIs. For example, requesters of + security tokens are sometimes identified with URIs. The issuers of + security tokens and the relying parties who are intended to consume + security tokens are frequently identified by URIs. Claims in + security tokens often have their types defined using URIs, and the + values of the claims can also be URIs. + + URIs are defined with multiple components, each of which has its own + rules. We cover each in turn below. However, it is also important + to note that there exist multiple comparison algorithms. Section 6.2 + of [RFC3986] states: + + A variety of methods are used in practice to test URI equivalence. + These methods fall into a range, distinguished by the amount of + processing required and the degree to which the probability of + false negatives is reduced. As noted above, false negatives + cannot be eliminated. In practice, their probability can be + reduced, but this reduction requires more processing and is not + cost-effective for all applications. + + If this range of comparison practices is considered as a ladder, + the following discussion will climb the ladder, starting with + practices that are cheap but have a relatively higher chance of + producing false negatives, and proceeding to those that have + higher computational cost and lower risk of false negatives. + + + + + +Thaler Informational [Page 15] + +RFC 6943 Identifier Comparison May 2013 + + + The ladder approach has both pros and cons. On the pro side, it + allows some uses to optimize for security, and other uses to optimize + for cost, thus allowing URIs to be applicable to a wide range of + uses. A disadvantage is that when different approaches are taken by + different components in the same system using the same identifiers, + the inconsistencies can result in security issues. + +3.3.1. Scheme Component + + [RFC3986] defines URI schemes as being case-insensitive US-ASCII and + in Section 6.2.2.1 specifies that scheme names should be normalized + to lowercase characters. + + New schemes can be defined over time. In general, however, two URIs + with an unrecognized scheme cannot be safely compared. This is + because the canonicalization and comparison rules for the other + components may vary by scheme. For example, a new URI scheme might + have a default port of X, and without that knowledge, a comparison + algorithm cannot know whether "example.com" and "example.com:X" + should be considered to match in the authority component. Hence, for + security purposes, it is safest for unrecognized schemes to be + treated as invalid identifiers. However, if the URIs are only used + with a "grant access on match" paradigm, then unrecognized schemes + can be supported by doing a generic case-sensitive comparison, at the + expense of some false negatives. + +3.3.2. Authority Component + + The authority component is scheme-specific, but many schemes follow a + common syntax that allows for userinfo, host, and port. + +3.3.2.1. Host + + Section 3.1 discusses issues with hostnames in general. In addition, + Section 3.2.2 of [RFC3986] allows future changes using the IPvFuture + production. As with IPv4 and IPv6 literals, IPvFuture formats may + have issues with multiple semantically identical string + representations and may also be semantically identical to an IPv4 or + IPv6 address. As such, false negatives may be common if IPvFuture is + used. + +3.3.2.2. Port + + See discussion in Section 3.2. + + + + + + + +Thaler Informational [Page 16] + +RFC 6943 Identifier Comparison May 2013 + + +3.3.2.3. Userinfo + + [RFC3986] defines the userinfo production that allows arbitrary data + about the user of the URI to be placed before '@' signs in URIs. For + example, "ftp://alice:bob@example.com/bar" has the value "alice:bob" + as its userinfo. When comparing URIs in a security context, one must + decide whether to treat the userinfo as being significant or not. + Some URI comparison services, for example, treat + "ftp://alice:ick@example.com" and "ftp://example.com" as being equal. + + When the userinfo is treated as being significant, it has additional + considerations (e.g., whether or not it is case sensitive), which we + cover in Section 3.4. + +3.3.3. Path Component + + [RFC3986] supports the use of path segment values such as "./" or + "../" for relative URIs. As discussed in Section 6.2.2.3 of + [RFC3986], they are intended only for use within a reference relative + to some other base URI, but Section 5.2.4 of [RFC3986] nevertheless + defines an algorithm to remove them as part of URI normalization. + + Unless a scheme states otherwise, the path component is defined to be + case sensitive. However, if the resource is stored and accessed + using a filesystem using case-insensitive paths, there will be many + paths that refer to the same resource. As such, false negatives can + be common in this case. + +3.3.4. Query Component + + There is the question as to whether "http://example.com/foo", + "http://example.com/foo?", and "http://example.com/foo?bar" are each + considered equal or different. + + Similarly, it is unspecified whether the order of values matters. + For example, should "http://example.com/blah?ick=bick&foo=bar" be + considered equal to "http://example.com/blah?foo=bar&ick=bick"? And + if a domain name is permitted to appear in a query component (e.g., + in a reference to another URI), the same issues in Section 3.1 apply. + +3.3.5. Fragment Component + + Some URI formats include fragment identifiers. These are typically + handles to locations within a resource and are used for local + reference. A classic example is the use of fragments in HTTP URIs + where a URI of the form "http://example.com/blah.html#ick" means + retrieve the resource "http://example.com/blah.html" and, once it has + arrived locally, find the HTML anchor named "ick" and display that. + + + +Thaler Informational [Page 17] + +RFC 6943 Identifier Comparison May 2013 + + + So, for example, when a user clicks on the link + "http://example.com/blah.html#baz", a browser will check its cache by + doing a URI comparison for "http://example.com/blah.html" and, if the + resource is present in the cache, a match is declared. + + Hence, comparisons for security purposes typically ignore the + fragment component and treat all fragments as equal to the full + resource. However, if one were actually trying to compare the piece + of a resource that was identified by the fragment identifier, + ignoring it would result in potential false positives. + +3.3.6. Resolution for Comparison + + It may be tempting to define a URI comparison algorithm based on + whether URIs resolve to the same content, along the lines of + resolving hostnames as described in Section 3.1.4. However, such an + algorithm would result in similar problems, including content that + dynamically changes over time or that is based on factors such as the + requester's location, potential lack of external connectivity at the + time or place that comparison is done, introduction of potentially + undesirable delay, etc. + + In addition, as noted in Section 3.1.4, resolution leaks information + about security decisions to outsiders if the queries are publicly + observable. + +3.4. Email Address-Like Identifiers + + Section 3.4.1 of [RFC5322] defines the syntax of an email address- + like identifier, and Section 3.2 of [RFC6532] updates it to support + internationalization. Section 7.5 of [RFC5280] further discusses the + use of internationalized email addresses in certificates. + + Regarding the security impact of internationalized email headers, + [RFC6532] points to Section 14 of [RFC6530], which contains a + discussion of many issues resulting from internationalization. + + Email address-like identifiers have a local part and a domain part. + The issues with the domain part are essentially the same as with + hostnames, as covered earlier in Section 3.1. + + The local part is left for each domain to define. People quite + commonly use email addresses as usernames with web sites such as + banks or shopping sites, but the site doesn't know whether + foo@example.com is the same person as FOO@example.com. Thus, email + address-like identifiers are typically Indefinite identifiers. + + + + + +Thaler Informational [Page 18] + +RFC 6943 Identifier Comparison May 2013 + + + To avoid false positives, some security mechanisms (such as those + described in [RFC5280]) compare the local part using an exact match. + Hence, like URIs, email address-like identifiers are designed for use + in grant-on-match security schemes, not in deny-on-match schemes. + + Furthermore, when such identifiers are actually used as email + addresses, Section 2.4 of [RFC5321] states that the local part of a + mailbox must be treated as case sensitive, but if a mailbox is stored + and accessed using a filesystem using case-insensitive paths, there + may be many paths that refer to the same mailbox. As such, false + negatives can be common in this case. + +4. General Issues + +4.1. Conflation + + There are a number of examples (some in the preceding sections) of + strings that conflate two types of identifiers, using some heuristic + to try to determine which type of identifier is given. Similarly, + two ways of encoding the same type of identifier might be conflated + within the same string. + + Some examples include: + + 1. A string that might be an IPv4 address literal or an IPv6 address + literal + + 2. A string that might be an IP address literal or a hostname + + 3. A string that might be a port number or a service name + + 4. A DNS label that might be literal or be Punycode-encoded + + Strings that allow such conflation can only be considered Definite if + there exists a well-defined rule to determine which identifier type + is meant. One way to do so is to ensure that the valid syntax for + the two is disjoint (e.g., distinguishing IPv4 vs. IPv6 address + literals by the use of colons in the latter). A second way to do so + is to define a precedence rule that results in some identifiers being + inaccessible via a conflated string (e.g., a host literally named + "xn--de-jg4avhby1noc0d" may be inaccessible due to the "xn--" prefix + denoting the use of Punycode encoding). In some cases, such + inaccessible space may be reserved so that the actual set of + identifiers in use is unambiguous. For example, Section 2.5.5.2 of + [RFC4291] defines a range of the IPv6 address space for representing + IPv4 addresses. + + + + + +Thaler Informational [Page 19] + +RFC 6943 Identifier Comparison May 2013 + + +4.2. Internationalization + + In addition to the issues with hostnames discussed in Section 3.1.3, + there are a number of internationalization issues that apply to many + types of Definite and Indefinite identifiers. + + First, there is no DNS mechanism for identifying whether + non-identical strings would be seen by a human as being equivalent. + There are problematic examples even with US-ASCII (Basic Latin) + strings, including regional spelling variations such as "color" and + "colour", and with many non-English cases, including partially + numeric strings in Arabic script contexts, Chinese strings in + Simplified and Traditional forms, and so on. Attempts to produce + such alternate forms algorithmically could produce false positives + and hence have an adverse effect on security. + + Second, some strings are visually confusable with others, and hence + if a security decision is made by a user based on visual inspection, + many opportunities for false positives exist. As such, using visual + inspection for security is unreliable. In addition to the security + issues, visual confusability also adversely affects the usability of + identifiers distributed via visual media. Similar issues can arise + with audible confusability when using audio (e.g., for radio + distribution, accessibility to the blind, etc.) in place of a visual + medium. Furthermore, when strings conflate two types of identifiers + as discussed in Section 4.1, allowing non-ASCII characters can cause + one type of identifier to appear to a human as another type of + identifier. For example, characters that may look like digits and + dots may appear to be an IPv4 literal to a human (especially to one + who might expect digits to appear in his or her native script). + Hence, conflation often increases the chance of confusability. + + Determining whether a string is a valid identifier should typically + be done after, or as part of, canonicalization. Otherwise, an + attacker might use the canonicalization algorithm to inject (e.g., + via percent encoding, Normalization Form KC (NFKC), or non-shortest- + form UTF-8) delimiters such as '@' in an email address-like + identifier, or a '.' in a hostname. + + Any case-insensitive comparisons need to define how comparison is + done, since such comparisons may vary by the locale of the endpoint. + As such, using case-insensitive comparisons in general often results + in identifiers being either Indefinite or, if the legal character set + is restricted (e.g., to US-ASCII), Definite. + + See also [WEBER] for a more visual discussion of many of these + issues. + + + + +Thaler Informational [Page 20] + +RFC 6943 Identifier Comparison May 2013 + + + Finally, the set of permitted characters and the canonical form of + the characters (and hence the canonicalization algorithm) sometimes + vary by protocol today, even when the intent is to use the same + identifier, such as when one protocol passes identifiers to the + other. See [RFC6885] for further discussion. + +4.3. Scope + + Another issue arises when an identifier (e.g., "localhost", + "10.11.12.13", etc.) is not globally unique. Section 1.1 of + [RFC3986] states: + + URIs have a global scope and are interpreted consistently + regardless of context, though the result of that interpretation + may be in relation to the end-user's context. For example, + "http://localhost/" has the same interpretation for every user of + that reference, even though the network interface corresponding to + "localhost" may be different for each end-user: interpretation is + independent of access. + + Whenever an identifier that is not globally unique is passed to + another entity outside of the scope of uniqueness, it will refer to a + different resource and can result in a false positive. This problem + is often addressed by using the identifier together with some other + unique identifier of the context. For example, "alice" may uniquely + identify a user within a system but must be used with "example.com" + (as in "alice@example.com") to uniquely identify the context outside + of that system. + + It is also worth noting that IPv6 addresses that are not globally + scoped can be written with, or otherwise associated with, a "zone ID" + to identify the context (see [RFC4007] for more information). + However, zone IDs are only unique within a host, so they typically + narrow, rather than expand, the scope of uniqueness of the resulting + identifier. + +4.4. Temporality + + Often, identifiers are not unique across all time but have some + lifetime associated with them after which they may be reassigned to + another entity. For example, bob@example.com might be assigned to an + employee of the Example company, but if he leaves and another Bob is + later hired, the same identifier might be reused. As another + example, IP address 203.0.113.1 might be assigned to one subscriber + and then later reassigned to another subscriber. Security issues can + arise if updates are not made in all entities that store the + identifier (e.g., in an access control list as discussed in + Section 2, or in a resolution cache as discussed in Section 3.1.4). + + + +Thaler Informational [Page 21] + +RFC 6943 Identifier Comparison May 2013 + + + This issue is similar to the issue of scope discussed in Section 4.3, + except that the scope of uniqueness is temporal rather than + topological. + +5. Security Considerations + + This entire document is about security considerations. + + To minimize issues related to elevation of privilege, any system that + requires the ability to use both deny and allow operations within the + same identifier space should avoid the use of Indefinite identifiers + in security comparisons. + + To minimize future security risks, any new identifiers being designed + should specify an Absolute or Definite comparison algorithm, and if + extensibility is allowed (e.g., as new schemes in URIs allow), then + the comparison algorithm should remain invariant so that unrecognized + extensions can be compared. That is, security risks can be reduced + by specifying the comparison algorithm, making sure to resolve any + ambiguities pointed out in this document (e.g., "standard dotted + decimal"). + + Some issues (such as unrecognized extensions) can be mitigated by + treating such identifiers as invalid. Validity checking of + identifiers is further discussed in [RFC3696]. + + Perhaps the hardest issues arise when multiple protocols are used + together, such as in Figure 2, where the two protocols are defined or + implemented using different comparison algorithms. When constructing + an architecture that uses multiple such protocols, designers should + pay attention to any differences in comparison algorithms among the + protocols in order to fully understand the security risks. How to + deal with such security risks in current systems is an area for + future work. + +6. Acknowledgements + + Yaron Goland contributed to the discussion on URIs. Patrik Faltstrom + contributed to the background on identifiers. John Klensin + contributed text in a number of different sections. Additional + helpful feedback and suggestions came from Bernard Aboba, Fred Baker, + Leslie Daigle, Mark Davis, Jeff Hodges, Bjoern Hoehrmann, Russ + Housley, Christian Huitema, Magnus Nystrom, Tom Petch, and Chris + Weber. + + + + + + + +Thaler Informational [Page 22] + +RFC 6943 Identifier Comparison May 2013 + + +7. IAB Members at the Time of Approval + + Bernard Aboba + Jari Arkko + Marc Blanchet + Ross Callon + Alissa Cooper + Spencer Dawkins + Joel Halpern + Russ Housley + David Kessens + Danny McPherson + Jon Peterson + Dave Thaler + Hannes Tschofenig + +8. Informative References + + [IAB1123] Internet Architecture Board, "IAB Statement: 'The + interpretation of rules in the ICANN gTLD Applicant + Guidebook'", February 2012, . + + [IANA-PORT] + IANA, "Service Name and Transport Protocol Port Number + Registry", March 2013, + . + + [IEEE-1003.1] + IEEE and The Open Group, "The Open Group Base + Specifications, Issue 6, IEEE Std 1003.1, 2004 Edition", + IEEE Std 1003.1, 2004. + + [JAVAURL] Oracle, "Class URL", Java(TM) Platform Standard Ed. 7, + 2013, . + + [OASIS-SAMLv2-CORE] + Cantor, S., Ed., Kemp, J., Ed., Philpott, R., Ed., and E. + Maler, Ed., "Assertions and Protocols for the OASIS + Security Assertion Markup Language (SAML) V2.0", OASIS + Standard saml-core-2.0-os, March 2005, + . + + + + +Thaler Informational [Page 23] + +RFC 6943 Identifier Comparison May 2013 + + + [OAuth-WRAP] + Hardt, D., Ed., Tom, A., Eaton, B., and Y. Goland, "OAuth + Web Resource Authorization Profiles", Work in Progress, + January 2010. + + [PRIVACY-CONS] + Cooper, A., Tschofenig, H., Aboba, B., Peterson, J., + Morris, J., Hansen, M., and R. Smith, "Privacy + Considerations for Internet Protocols", Work in Progress, + April 2013. + + [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", + STD 13, RFC 1034, November 1987. + + [RFC1123] Braden, R., "Requirements for Internet Hosts - Application + and Support", STD 3, RFC 1123, October 1989. + + [RFC2277] Alvestrand, H.T., "IETF Policy on Character Sets and + Languages", BCP 18, RFC 2277, January 1998. + + [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, + "Internationalizing Domain Names in Applications (IDNA)", + RFC 3490, March 2003. + + [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode + for Internationalized Domain Names in Applications + (IDNA)", RFC 3492, March 2003. + + [RFC3493] Gilligan, R., Thomson, S., Bound, J., McCann, J., and W. + Stevens, "Basic Socket Interface Extensions for IPv6", + RFC 3493, February 2003. + + [RFC3696] Klensin, J., "Application Techniques for Checking and + Transformation of Names", RFC 3696, February 2004. + + [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform + Resource Identifier (URI): Generic Syntax", STD 66, + RFC 3986, January 2005. + + [RFC4007] Deering, S., Haberman, B., Jinmei, T., Nordmark, E., and + B. Zill, "IPv6 Scoped Address Architecture", RFC 4007, + March 2005. + + [RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing + Architecture", RFC 4291, February 2006. + + + + + + +Thaler Informational [Page 24] + +RFC 6943 Identifier Comparison May 2013 + + + [RFC4790] Newman, C., Duerst, M., and A. Gulbrandsen, "Internet + Application Protocol Collation Registry", RFC 4790, + March 2007. + + [RFC4949] Shirey, R., "Internet Security Glossary, Version 2", + RFC 4949, August 2007. + + [RFC5280] Cooper, D., Santesson, S., Farrell, S., Boeyen, S., + Housley, R., and W. Polk, "Internet X.509 Public Key + Infrastructure Certificate and Certificate Revocation List + (CRL) Profile", RFC 5280, May 2008. + + [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, + October 2008. + + [RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322, + October 2008. + + [RFC5952] Kawamura, S. and M. Kawashima, "A Recommendation for IPv6 + Address Text Representation", RFC 5952, August 2010. + + [RFC6055] Thaler, D., Klensin, J., and S. Cheshire, "IAB Thoughts on + Encodings for Internationalized Domain Names", RFC 6055, + February 2011. + + [RFC6066] Eastlake, D., "Transport Layer Security (TLS) Extensions: + Extension Definitions", RFC 6066, January 2011. + + [RFC6125] Saint-Andre, P. and J. Hodges, "Representation and + Verification of Domain-Based Application Service Identity + within Internet Public Key Infrastructure Using X.509 + (PKIX) Certificates in the Context of Transport Layer + Security (TLS)", RFC 6125, March 2011. + + [RFC6335] Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S. + Cheshire, "Internet Assigned Numbers Authority (IANA) + Procedures for the Management of the Service Name and + Transport Protocol Port Number Registry", BCP 165, + RFC 6335, August 2011. + + [RFC6530] Klensin, J. and Y. Ko, "Overview and Framework for + Internationalized Email", RFC 6530, February 2012. + + [RFC6532] Yang, A., Steele, S., and N. Freed, "Internationalized + Email Headers", RFC 6532, February 2012. + + + + + + +Thaler Informational [Page 25] + +RFC 6943 Identifier Comparison May 2013 + + + [RFC6818] Yee, P., "Updates to the Internet X.509 Public Key + Infrastructure Certificate and Certificate Revocation List + (CRL) Profile", RFC 6818, January 2013. + + [RFC6874] Carpenter, B., Cheshire, S., and R. Hinden, "Representing + IPv6 Zone Identifiers in Address Literals and Uniform + Resource Identifiers", RFC 6874, February 2013. + + [RFC6885] Blanchet, M. and A. Sullivan, "Stringprep Revision and + Problem Statement for the Preparation and Comparison of + Internationalized Strings (PRECIS)", RFC 6885, March 2013. + + [TR36] Unicode Consortium, "Unicode Security Considerations", + Unicode Technical Report #36, Revision 11, July 2012, + . + + [USASCII] American National Standards Institute, "Coded Character + Sets -- 7-bit American Standard Code for Information + Interchange (7-bit ASCII)", ANSI X3.4, 1986. + + [WEBER] Weber, C., "Attacking Software Globalization", March 2010, + . + +Author's Address + + Dave Thaler (editor) + Microsoft Corporation + One Microsoft Way + Redmond, WA 98052 + USA + + Phone: +1 425 703 8835 + EMail: dthaler@microsoft.com + + + + + + + + + + + + + + + + + +Thaler Informational [Page 26] + -- cgit v1.2.3