author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc8264.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc8264.txt')
-rw-r--r-- | doc/rfc/rfc8264.txt | 2410 |
1 files changed, 2410 insertions, 0 deletions
diff --git a/doc/rfc/rfc8264.txt b/doc/rfc/rfc8264.txt new file mode 100644 index 0000000..e7bb3cc --- /dev/null +++ b/doc/rfc/rfc8264.txt @@ -0,0 +1,2410 @@ + + + + + + +Internet Engineering Task Force (IETF) P. Saint-Andre +Request for Comments: 8264 Jabber.org +Obsoletes: 7564 M. Blanchet +Category: Standards Track Viagenie +ISSN: 2070-1721 October 2017 + + + PRECIS Framework: Preparation, Enforcement, and Comparison of + Internationalized Strings in Application Protocols + +Abstract + + Application protocols using Unicode code points in protocol strings + need to properly handle such strings in order to enforce + internationalization rules for strings placed in various protocol + slots (such as addresses and identifiers) and to perform valid + comparison operations (e.g., for purposes of authentication or + authorization). This document defines a framework enabling + application protocols to perform the preparation, enforcement, and + comparison of internationalized strings ("PRECIS") in a way that + depends on the properties of Unicode code points and thus is more + agile with respect to versions of Unicode. As a result, this + framework provides a more sustainable approach to the handling of + internationalized strings than the previous framework, known as + Stringprep (RFC 3454). This document obsoletes RFC 7564. + +Status of This Memo + + This is an Internet Standards Track document. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Further information on + Internet Standards is available in Section 2 of RFC 7841. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + https://www.rfc-editor.org/info/rfc8264. 
+ +Copyright Notice + + Copyright (c) 2017 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + + + +Saint-Andre & Blanchet Standards Track [Page 1] + +RFC 8264 PRECIS Framework October 2017 + + + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + +Table of Contents + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 + 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 + 3. Preparation, Enforcement, and Comparison . . . . . . . . . . 6 + 4. String Classes . . . . . . . . . . . . . . . . . . . . . . . 8 + 4.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 8 + 4.2. IdentifierClass . . . . . . . . . . . . . . . . . . . . . 9 + 4.2.1. Valid . . . . . . . . . . . . . . . . . . . . . . . . 9 + 4.2.2. Contextual Rule Required . . . . . . . . . . . . . . 10 + 4.2.3. Disallowed . . . . . . . . . . . . . . . . . . . . . 10 + 4.2.4. Unassigned . . . . . . . . . . . . . . . . . . . . . 10 + 4.2.5. Examples . . . . . . . . . . . . . . . . . . . . . . 11 + 4.3. FreeformClass . . . . . . . . . . . . . . . . . . . . . . 11 + 4.3.1. Valid . . . . . . . . . . . . . . . . . . . . . . . . 11 + 4.3.2. Contextual Rule Required . . . . . . . . . . . . . . 12 + 4.3.3. Disallowed . . . . . . . . . . . . . . . . . . . . . 12 + 4.3.4. Unassigned . . . . . . . . . . . . . . . . . . . . . 12 + 4.3.5. Examples . . . . . . . . . . . . . . . . . . . . . . 12 + 4.4. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 
12 + 5. Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . 14 + 5.1. Profiles Must Not Be Multiplied beyond Necessity . . . . 14 + 5.2. Rules . . . . . . . . . . . . . . . . . . . . . . . . . . 15 + 5.2.1. Width Mapping Rule . . . . . . . . . . . . . . . . . 15 + 5.2.2. Additional Mapping Rule . . . . . . . . . . . . . . . 15 + 5.2.3. Case Mapping Rule . . . . . . . . . . . . . . . . . . 16 + 5.2.4. Normalization Rule . . . . . . . . . . . . . . . . . 16 + 5.2.5. Directionality Rule . . . . . . . . . . . . . . . . . 17 + 5.3. A Note about Spaces . . . . . . . . . . . . . . . . . . . 18 + 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . 18 + 6.1. How to Use PRECIS in Applications . . . . . . . . . . . . 18 + 6.2. Further Excluded Characters . . . . . . . . . . . . . . . 20 + 6.3. Building Application-Layer Constructs . . . . . . . . . . 20 + 7. Order of Operations . . . . . . . . . . . . . . . . . . . . . 21 + 8. Code Point Properties . . . . . . . . . . . . . . . . . . . . 21 + 9. Category Definitions Used to Calculate Derived Property . . . 24 + 9.1. LetterDigits (A) . . . . . . . . . . . . . . . . . . . . 25 + 9.2. Unstable (B) . . . . . . . . . . . . . . . . . . . . . . 25 + 9.3. IgnorableProperties (C) . . . . . . . . . . . . . . . . . 25 + 9.4. IgnorableBlocks (D) . . . . . . . . . . . . . . . . . . . 25 + 9.5. LDH (E) . . . . . . . . . . . . . . . . . . . . . . . . . 25 + + + +Saint-Andre & Blanchet Standards Track [Page 2] + +RFC 8264 PRECIS Framework October 2017 + + + 9.6. Exceptions (F) . . . . . . . . . . . . . . . . . . . . . 25 + 9.7. BackwardCompatible (G) . . . . . . . . . . . . . . . . . 25 + 9.8. JoinControl (H) . . . . . . . . . . . . . . . . . . . . . 26 + 9.9. OldHangulJamo (I) . . . . . . . . . . . . . . . . . . . . 26 + 9.10. Unassigned (J) . . . . . . . . . . . . . . . . . . . . . 26 + 9.11. ASCII7 (K) . . . . . . . . . . . . . . . . . . . . . . . 26 + 9.12. Controls (L) . . . . . . . . . . . . . . . . . . . . . . 
27 + 9.13. PrecisIgnorableProperties (M) . . . . . . . . . . . . . . 27 + 9.14. Spaces (N) . . . . . . . . . . . . . . . . . . . . . . . 27 + 9.15. Symbols (O) . . . . . . . . . . . . . . . . . . . . . . . 27 + 9.16. Punctuation (P) . . . . . . . . . . . . . . . . . . . . . 27 + 9.17. HasCompat (Q) . . . . . . . . . . . . . . . . . . . . . . 28 + 9.18. OtherLetterDigits (R) . . . . . . . . . . . . . . . . . . 28 + 10. Guidelines for Designated Experts . . . . . . . . . . . . . . 28 + 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29 + 11.1. PRECIS Derived Property Value Registry . . . . . . . . . 29 + 11.2. PRECIS Base Classes Registry . . . . . . . . . . . . . . 29 + 11.3. PRECIS Profiles Registry . . . . . . . . . . . . . . . . 30 + 12. Security Considerations . . . . . . . . . . . . . . . . . . . 32 + 12.1. General Issues . . . . . . . . . . . . . . . . . . . . . 32 + 12.2. Use of the IdentifierClass . . . . . . . . . . . . . . . 33 + 12.3. Use of the FreeformClass . . . . . . . . . . . . . . . . 33 + 12.4. Local Character Set Issues . . . . . . . . . . . . . . . 33 + 12.5. Visually Similar Characters . . . . . . . . . . . . . . 33 + 12.6. Security of Passwords . . . . . . . . . . . . . . . . . 35 + 13. Interoperability Considerations . . . . . . . . . . . . . . . 36 + 13.1. Coded Character Sets . . . . . . . . . . . . . . . . . . 36 + 13.2. Dependency on Unicode . . . . . . . . . . . . . . . . . 37 + 13.3. Encoding . . . . . . . . . . . . . . . . . . . . . . . . 37 + 13.4. Unicode Versions . . . . . . . . . . . . . . . . . . . . 37 + 13.5. Potential Changes to Handling of Certain Unicode Code + Points . . . . . . . . . . . . . . . . . . . . . . . . . 37 + 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 38 + 14.1. Normative References . . . . . . . . . . . . . . . . . . 38 + 14.2. Informative References . . . . . . . . . . . . . . . . . 39 + Appendix A. Changes from RFC 7564 . . . . . . . . . . . . . . . 
43 + Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 43 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 43 + +1. Introduction + + Application protocols using Unicode code points [Unicode] in protocol + strings need to properly handle such strings in order to enforce + internationalization rules for strings placed in various protocol + slots (such as addresses and identifiers) and to perform valid + comparison operations (e.g., for purposes of authentication or + authorization). This document defines a framework enabling + application protocols to perform the preparation, enforcement, and + + + +Saint-Andre & Blanchet Standards Track [Page 3] + +RFC 8264 PRECIS Framework October 2017 + + + comparison of internationalized strings ("PRECIS") in a way that + depends on the properties of Unicode code points and thus is more + agile with respect to versions of Unicode. (Note: PRECIS is + restricted to Unicode and does not support any other coded character + set [RFC6365].) + + As described in the PRECIS problem statement [RFC6885], many IETF + protocols have used the Stringprep framework [RFC3454] as the basis + for preparing, enforcing, and comparing protocol strings that contain + Unicode code points, especially code points outside the ASCII range + [RFC20]. The Stringprep framework was developed during work on the + original technology for internationalized domain names (IDNs), here + called "IDNA2003" [RFC3490], and Nameprep [RFC3491] was the + Stringprep profile for IDNs. At the time, Stringprep was designed as + a general framework so that other application protocols could define + their own Stringprep profiles. Indeed, a number of application + protocols defined such profiles. 
+ + After the publication of [RFC3454] in 2002, several significant + issues arose with the use of Stringprep in the IDN case, as + documented in the IAB's recommendations regarding IDNs [RFC4690] + (most significantly, Stringprep was tied to Unicode version 3.2). + Therefore, the newer IDNA specifications, here called "IDNA2008" + [RFC5890] [RFC5891] [RFC5892] [RFC5893] [RFC5894], no longer use + Stringprep and Nameprep. This migration away from Stringprep for + IDNs prompted other "customers" of Stringprep to consider new + approaches to the preparation, enforcement, and comparison of + internationalized strings, as described in [RFC6885]. + + This document defines a framework for a post-Stringprep approach to + the preparation, enforcement, and comparison of internationalized + strings in application protocols, based on several principles: + + 1. Define a small set of string classes that specify the Unicode + code points appropriate for common application-protocol + constructs (where possible, maintaining compatibility with + IDNA2008 to help ensure a more consistent user experience). + + 2. Define each PRECIS string class in terms of Unicode code points + and their properties so that an algorithm can be used to + determine whether each code point or character category is + (a) valid, (b) allowed in certain contexts, (c) disallowed, or + (d) unassigned. + + 3. Use an "inclusion model" such that a string class consists only + of code points that are explicitly allowed, with the result that + any code point not explicitly allowed is forbidden. + + + + +Saint-Andre & Blanchet Standards Track [Page 4] + +RFC 8264 PRECIS Framework October 2017 + + + 4. 
Enable application protocols to define profiles of the PRECIS + string classes if necessary (addressing matters such as width + mapping, case mapping, Unicode normalization, and + directionality), but strongly discourage the multiplication of + profiles beyond necessity in order to avoid violations of the + "Principle of Least Astonishment". + + It is expected that this framework will yield the following benefits: + + o Application protocols will be more agile with regard to Unicode + versions (recognizing that complete agility cannot be realized in + practice). + + o Implementers will be able to share code point tables and software + code across application protocols, most likely by means of + software libraries. + + o End users will be able to acquire more accurate expectations about + the code points that are acceptable in various contexts. Given + this more uniform set of string classes, it is also expected that + copy/paste operations between software implementing different + application protocols will be more predictable and coherent. + + Whereas the string classes define the "baseline" code points for a + range of applications, profiling enables application protocols to + apply the string classes in ways that are appropriate for common + constructs such as usernames [RFC8265], opaque strings such as + passwords [RFC8265], and nicknames [RFC8266]. Profiles are + responsible for defining the handling of right-to-left code points as + well as various mapping operations of the kind also discussed for + IDNs in [RFC5895], such as case preservation or lowercasing, Unicode + normalization, mapping of certain code points to other code points or + to nothing, and mapping of fullwidth and halfwidth code points. + + When an application applies a profile of a PRECIS string class, it + transforms an input string (which might or might not be conforming) + into an output string that definitively conforms to the profile. 
In + particular, this document focuses on the resulting ability to achieve + the following objectives: + + a. Enforcing all the rules of a profile for a single output string + to check whether the output string conforms to the rules of the + profile and thus determine if a string can be included in a + protocol slot, communicated to another entity within a protocol, + stored in a retrieval system, etc. + + b. Comparing two output strings to determine if they are equivalent, + typically through octet-for-octet matching to test for + + + +Saint-Andre & Blanchet Standards Track [Page 5] + +RFC 8264 PRECIS Framework October 2017 + + + "bit-string identity" (e.g., to make an access decision for + purposes of authentication or authorization as further described + in [RFC6943]). + + The opportunity to define profiles naturally introduces the + possibility of a proliferation of profiles, thus potentially + mitigating the benefits of common code and violating user + expectations. See Section 5 for a discussion of this important + topic. + + In addition, it is extremely important for protocol designers and + application developers to understand that the transformation of an + input string to an output string is rarely reversible. As one + relatively simple example, case mapping would transform an input + string of "StPeter" to an output string of "stpeter", thus leading to + a loss of information about the capitalization of the first and third + characters. Similar considerations apply to other forms of mapping + and normalization. + + Although this framework is similar to IDNA2008 and includes by + reference some of the character categories defined in [RFC5892], it + defines additional character categories to meet the needs of common + application protocols other than DNS. + + The character categories and calculation rules defined under + Sections 8 and 9 are normative and apply to all Unicode code points. 
+ The code point table that results from applying the character + categories and calculation rules to the latest version of Unicode can + be found in an IANA registry (see Section 11). + +2. Terminology + + Many important terms used in this document are defined in [RFC5890], + [RFC6365], [RFC6885], and [Unicode]. The terms "left-to-right" (LTR) + and "right-to-left" (RTL) are defined in Unicode Standard Annex #9 + [UAX9]. + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and + "OPTIONAL" in this document are to be interpreted as described in + BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all + capitals, as shown here. + +3. Preparation, Enforcement, and Comparison + + This document distinguishes between three different actions that an + entity can take with regard to a string: + + + + +Saint-Andre & Blanchet Standards Track [Page 6] + +RFC 8264 PRECIS Framework October 2017 + + + o Enforcement entails applying all of the rules specified for a + particular string class, or profile thereof, to a single input + string, for the purpose of checking whether the string conforms to + all of the rules and thus determining if the string can be used in + a given protocol slot. + + o Comparison entails applying all of the rules specified for a + particular string class, or profile thereof, to two separate input + strings, for the purpose of determining if the two strings are + equivalent. + + o Preparation primarily entails ensuring that the code points in a + single input string are allowed by the underlying PRECIS string + class, and sometimes also entails applying one or more of the + rules specified for a particular string class or profile thereof. 
+ Preparation can be appropriate for constrained devices that can to + some extent restrict the code points in a string to a limited + repertoire of characters but that do not have the processing power + or onboard memory to perform operations such as Unicode + normalization. However, preparation does not ensure that an input + string conforms to all of the rules for a string class or profile + thereof. + + Note: The term "preparation" as used in this specification and + related documents has a much more limited scope than it did in + Stringprep; it essentially refers to a kind of preprocessing of + an input string, not the actual operations that apply + internationalization rules to produce an output string (here + termed "enforcement") or to compare two output strings (here + termed "comparison"). + + In most cases, authoritative entities such as servers are responsible + for enforcement, whereas subsidiary entities such as clients are + responsible only for preparation. The rationale for this distinction + is that clients might not have the facilities (in terms of device + memory and processing power) to enforce all the rules regarding + internationalized strings (such as width mapping and Unicode + normalization), although they can more easily limit the repertoire of + characters they offer to an end user. By contrast, it is assumed + that a server would have more capacity to enforce the rules, and in + any case a server acts as an authority regarding allowable strings in + protocol slots such as addresses and endpoint identifiers. In + addition, a client cannot necessarily be trusted to properly generate + such strings, especially for security-sensitive contexts such as + authentication and authorization. + + + + + + +Saint-Andre & Blanchet Standards Track [Page 7] + +RFC 8264 PRECIS Framework October 2017 + + +4. String Classes + +4.1. 
Overview + + Starting in 2010, various "customers" of Stringprep began to discuss + the need to define a post-Stringprep approach to the preparation and + comparison of internationalized strings other than IDNs. This + community analyzed the existing Stringprep profiles and also weighed + the costs and benefits of defining a relatively small set of Unicode + code points that would minimize the potential for user confusion + caused by visually similar code points (and thus be relatively + "safe") vs. defining a much larger set of Unicode code points that + would maximize the potential for user creativity (and thus be + relatively "expressive"). As a result, the community concluded that + most existing uses could be addressed by two string classes: + + IdentifierClass: a sequence of letters, numbers, and some symbols + that is used to identify or address a network entity such as a + user account, a venue (e.g., a chat room), an information source + (e.g., a data feed), or a collection of data (e.g., a file); the + intent is that this class will minimize user confusion in a wide + variety of application protocols, with the result that safety has + been prioritized over expressiveness for this class. + + FreeformClass: a sequence of letters, numbers, symbols, spaces, and + other code points that is used for free-form strings, including + passwords as well as display elements such as human-friendly + nicknames for devices or for participants in a chat room; the + intent is that this class will allow nearly any Unicode code + point, with the result that expressiveness has been prioritized + over safety for this class. Note well that protocol designers, + application developers, service providers, and end users might not + understand or be able to enter all of the code points that can be + included in the FreeformClass (see Section 12.3 for details). 
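As a non-normative illustration, the dispositions that Section 4.4 (Table 1) assigns to each category can be held in a small lookup table. The sketch below is only a reading aid and is not part of the framework: the names DISPOSITION, disposition(), and differing_categories() are hypothetical, only the categories that Table 1 lists with a per-class disposition are included, and the step that assigns a code point to a category (Sections 8 and 9) is not shown.

```python
# Hypothetical lookup mirroring Table 1 of this document (Section 4.4).
# Keys are PRECIS category names; values are the dispositions for
# (IdentifierClass, FreeformClass). Categories that Table 1 marks as
# unused or as handled by IDNA rules are deliberately left out.
CONTEXTUAL = "Contextual Rule Required"

DISPOSITION = {
    "LetterDigits": ("Valid", "Valid"),
    "Exceptions": (CONTEXTUAL, CONTEXTUAL),
    "JoinControl": (CONTEXTUAL, CONTEXTUAL),
    "OldHangulJamo": ("Disallowed", "Disallowed"),
    "Unassigned": ("Unassigned", "Unassigned"),
    "ASCII7": ("Valid", "Valid"),
    "Controls": ("Disallowed", "Disallowed"),
    "PrecisIgnorableProperties": ("Disallowed", "Disallowed"),
    "Spaces": ("Disallowed", "Valid"),
    "Symbols": ("Disallowed", "Valid"),
    "Punctuation": ("Disallowed", "Valid"),
    "HasCompat": ("Disallowed", "Valid"),
    "OtherLetterDigits": ("Disallowed", "Valid"),
}


def disposition(category, string_class):
    """Return the Table 1 disposition of a category for a string class."""
    column = {"IdentifierClass": 0, "FreeformClass": 1}[string_class]
    return DISPOSITION[category][column]


def differing_categories():
    """List the categories on which the two string classes disagree."""
    return [name for name, (ident, free) in DISPOSITION.items() if ident != free]
```

In this sketch, the categories returned by differing_categories() (Spaces, Symbols, Punctuation, HasCompat, and OtherLetterDigits) are the ones where the IdentifierClass prioritizes safety and the FreeformClass prioritizes expressiveness.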
+ + Future specifications might define additional PRECIS string classes, + such as a class that falls somewhere between the IdentifierClass and + the FreeformClass. At this time, it is not clear how useful such a + class would be. In any case, because application developers are able + to define profiles of PRECIS string classes, a protocol needing a + construct between the IdentifierClass and the FreeformClass could + define a restricted profile of the FreeformClass if needed. + + The following subsections discuss the IdentifierClass and + FreeformClass in more detail, with reference to the dimensions + described in Section 5 of [RFC6885]. Each string class is defined by + the following behavioral rules: + + + + +Saint-Andre & Blanchet Standards Track [Page 8] + +RFC 8264 PRECIS Framework October 2017 + + + Valid: Defines which code points are treated as valid for the + string. + + Contextual Rule Required: Defines which code points are treated as + allowed only if the requirements of a contextual rule are met + (i.e., either CONTEXTJ or CONTEXTO as originally defined in the + IDNA2008 specifications). + + Disallowed: Defines which code points need to be excluded from the + string. + + Unassigned: Defines application behavior in the presence of code + points that are unknown (i.e., not yet designated) for the version + of Unicode used by the application. + + This document defines the valid, contextual rule required, + disallowed, and unassigned rules for the IdentifierClass and + FreeformClass. As described under Section 5, profiles of these + string classes are responsible for defining the width mapping, + additional mapping, case mapping, normalization, and directionality + rules. + +4.2. IdentifierClass + + Most application technologies need strings that can be used to refer + to, include, or communicate protocol strings like usernames, + filenames, data feed identifiers, and chat room names. 
We group such + strings into a class called "IdentifierClass" having the following + features. + +4.2.1. Valid + + o Code points traditionally used as letters and numbers in writing + systems, i.e., the LetterDigits ("A") category first defined in + [RFC5892] and listed here under Section 9.1. + + o Code points in the range U+0021 through U+007E, i.e., the + (printable) ASCII7 ("K") category defined under Section 9.11. + These code points are "grandfathered" into PRECIS and thus are + valid even if they would otherwise be disallowed according to the + property-based rules specified in the next section. + + Note: Although the PRECIS IdentifierClass reuses the LetterDigits + category from IDNA2008, the range of code points allowed in the + IdentifierClass is wider than the range of code points allowed in + IDNA2008. The main reason is that IDNA2008 applies the + Unstable ("B") category (Section 9.2) before the LetterDigits + + + + +Saint-Andre & Blanchet Standards Track [Page 9] + +RFC 8264 PRECIS Framework October 2017 + + + category, thus disallowing uppercase code points, whereas the + IdentifierClass does not apply the Unstable category. + +4.2.2. Contextual Rule Required + + o A number of code points from the Exceptions ("F") category defined + under Section 9.6. + + o Joining code points, i.e., the JoinControl ("H") category defined + under Section 9.8. + +4.2.3. Disallowed + + o Old Hangul Jamo code points, i.e., the OldHangulJamo ("I") + category defined under Section 9.9. + + o Control code points, i.e., the Controls ("L") category defined + under Section 9.12. + + o Ignorable code points, i.e., the PrecisIgnorableProperties ("M") + category defined under Section 9.13. + + o Space code points, i.e., the Spaces ("N") category defined under + Section 9.14. + + o Symbol code points, i.e., the Symbols ("O") category defined under + Section 9.15. + + o Punctuation code points, i.e., the Punctuation ("P") category + defined under Section 9.16. 
+ + o Any code point that is decomposed and recomposed into something + other than itself under Unicode Normalization Form KC, i.e., the + HasCompat ("Q") category defined under Section 9.17. These code + points are disallowed even if they would otherwise be valid + according to the property-based rules specified in the previous + section. + + o Letters and digits other than the "traditional" letters and digits + allowed in IDNs, i.e., the OtherLetterDigits ("R") category + defined under Section 9.18. + +4.2.4. Unassigned + + Any code points that are not yet designated in the Unicode coded + character set are considered unassigned for purposes of the + IdentifierClass, and such code points are to be treated as + disallowed. See Section 9.10. + + + +Saint-Andre & Blanchet Standards Track [Page 10] + +RFC 8264 PRECIS Framework October 2017 + + +4.2.5. Examples + + As described in the Introduction to this document, the string classes + do not handle all issues related to string preparation and comparison + (such as case mapping); instead, such issues are handled at the level + of profiles. Examples for profiles of the IdentifierClass can be + found in [RFC8265] (the UsernameCaseMapped and UsernameCasePreserved + profiles). + +4.3. FreeformClass + + Some application technologies need strings that can be used in a + free-form way, e.g., as a password in an authentication exchange (see + [RFC8265]) or a nickname in a chat room (see [RFC8266]). We group + such things into a class called "FreeformClass" having the following + features. + + Security Warning: As mentioned, the FreeformClass prioritizes + expressiveness over safety; Section 12.3 describes some of the + security hazards involved with using or profiling the + FreeformClass. + + Security Warning: Consult Section 12.6 for relevant security + considerations when strings conforming to the FreeformClass, or a + profile thereof, are used as passwords. + +4.3.1. 
Valid + + o Traditional letters and numbers, i.e., the LetterDigits ("A") + category first defined in [RFC5892] and listed here under + Section 9.1. + + o Code points in the range U+0021 through U+007E, i.e., the + (printable) ASCII7 ("K") category defined under Section 9.11. + + o Space code points, i.e., the Spaces ("N") category defined under + Section 9.14. + + o Symbol code points, i.e., the Symbols ("O") category defined under + Section 9.15. + + o Punctuation code points, i.e., the Punctuation ("P") category + defined under Section 9.16. + + o Any code point that is decomposed and recomposed into something + other than itself under Unicode Normalization Form KC, i.e., the + HasCompat ("Q") category defined under Section 9.17. + + + + +Saint-Andre & Blanchet Standards Track [Page 11] + +RFC 8264 PRECIS Framework October 2017 + + + o Letters and digits other than the "traditional" letters and digits + allowed in IDNs, i.e., the OtherLetterDigits ("R") category + defined under Section 9.18. + +4.3.2. Contextual Rule Required + + o A number of code points from the Exceptions ("F") category defined + under Section 9.6. + + o Joining code points, i.e., the JoinControl ("H") category defined + under Section 9.8. + +4.3.3. Disallowed + + o Old Hangul Jamo code points, i.e., the OldHangulJamo ("I") + category defined under Section 9.9. + + o Control code points, i.e., the Controls ("L") category defined + under Section 9.12. + + o Ignorable code points, i.e., the PrecisIgnorableProperties ("M") + category defined under Section 9.13. + +4.3.4. Unassigned + + Any code points that are not yet designated in the Unicode coded + character set are considered unassigned for purposes of the + FreeformClass, and such code points are to be treated as disallowed. + +4.3.5. 
Examples + + As described in the Introduction to this document, the string classes + do not handle all issues related to string preparation and comparison + (such as case mapping); instead, such issues are handled at the level + of profiles. Examples for profiles of the FreeformClass can be found + in [RFC8265] (the OpaqueString profile) and [RFC8266] (the Nickname + profile). + +4.4. Summary + + The following table summarizes the differences between the + IdentifierClass and the FreeformClass (i.e., the disposition of a + code point as valid, contextual rule required, disallowed, or + unassigned), depending on its PRECIS category. + + + + + + + +Saint-Andre & Blanchet Standards Track [Page 12] + +RFC 8264 PRECIS Framework October 2017 + + + +===============================+=================+===============+ + | CATEGORY | IDENTIFIERCLASS | FREEFORMCLASS | + +===============================+=================+===============+ + | (A) LetterDigits | Valid | Valid | + +-------------------------------+-----------------+---------------+ + | (B) Unstable | [N/A (unused)] | + +-------------------------------+-----------------+---------------+ + | (C) IgnorableProperties | [N/A (unused)] | + +-------------------------------+-----------------+---------------+ + | (D) IgnorableBlocks | [N/A (unused)] | + +-------------------------------+-----------------+---------------+ + | (E) LDH | [N/A (unused)] | + +-------------------------------+-----------------+---------------+ + | (F) Exceptions | Contextual | Contextual | + | | Rule Required | Rule Required | + +-------------------------------+-----------------+---------------+ + | (G) BackwardCompatible | [Handled by IDNA Rules] | + +-------------------------------+-----------------+---------------+ + | (H) JoinControl | Contextual | Contextual | + | | Rule Required | Rule Required | + +-------------------------------+-----------------+---------------+ + | (I) OldHangulJamo | Disallowed | Disallowed | + 
+-------------------------------+-----------------+---------------+ + | (J) Unassigned | Unassigned | Unassigned | + +-------------------------------+-----------------+---------------+ + | (K) ASCII7 | Valid | Valid | + +-------------------------------+-----------------+---------------+ + | (L) Controls | Disallowed | Disallowed | + +-------------------------------+-----------------+---------------+ + | (M) PrecisIgnorableProperties | Disallowed | Disallowed | + +-------------------------------+-----------------+---------------+ + | (N) Spaces | Disallowed | Valid | + +-------------------------------+-----------------+---------------+ + | (O) Symbols | Disallowed | Valid | + +-------------------------------+-----------------+---------------+ + | (P) Punctuation | Disallowed | Valid | + +-------------------------------+-----------------+---------------+ + | (Q) HasCompat | Disallowed | Valid | + +-------------------------------+-----------------+---------------+ + | (R) OtherLetterDigits | Disallowed | Valid | + +-------------------------------+-----------------+---------------+ + + Table 1: Comparative Disposition of Code Points + + + + + + + + +Saint-Andre & Blanchet Standards Track [Page 13] + +RFC 8264 PRECIS Framework October 2017 + + +5. Profiles + + This framework document defines the valid, contextual rule required, + disallowed, and unassigned rules for the IdentifierClass and the + FreeformClass. A profile of a PRECIS string class MUST define the + width mapping, additional mapping (if any), case mapping, + normalization, and directionality rules. A profile MAY also restrict + the allowable code points above and beyond the definition of the + relevant PRECIS string class (but MUST NOT add as valid any code + points that are disallowed by the relevant PRECIS string class). + These matters are discussed in the following subsections. + + Profiles of the PRECIS string classes are registered with the IANA as + described under Section 11.3. 
Profile names use the following + convention: they are of the form "Profilename of BaseClass", where + the "Profilename" string is a differentiator and "BaseClass" is the + name of the PRECIS string class being profiled; for example, the + profile used for opaque strings such as passwords is the OpaqueString + profile of the FreeformClass [RFC8265]. + +5.1. Profiles Must Not Be Multiplied beyond Necessity + + The risk of profile proliferation is significant because having too + many profiles will result in different behavior across various + applications, thus violating what is known in user interface design + as the "Principle of Least Astonishment". + + Indeed, we already have too many profiles. Ideally, we would have at + most two or three profiles. Unfortunately, numerous application + protocols exist with their own quirks regarding protocol strings. + Domain names, email addresses, instant messaging addresses, chat room + names, user nicknames or display names, filenames, authentication + identifiers, passwords, and other strings already exist in the wild + and need to be supported in existing application protocols such as + DNS, SMTP, the Extensible Messaging and Presence Protocol (XMPP), + Internet Relay Chat (IRC), NFS, the Internet Small Computer System + Interface (iSCSI), the Extensible Authentication Protocol (EAP), and + the Simple Authentication and Security Layer (SASL) [RFC4422], among + others. + + Nevertheless, profiles must not be multiplied beyond necessity. + + To help prevent profile proliferation, this document recommends + sensible defaults for the various options offered to profile creators + (such as width mapping and Unicode normalization). In addition, the + guidelines for designated experts provided under Section 10 are meant + to encourage a high level of due diligence regarding new profiles. + + + + +Saint-Andre & Blanchet Standards Track [Page 14] + +RFC 8264 PRECIS Framework October 2017 + + +5.2. Rules + +5.2.1. 
Width Mapping Rule + + The width mapping rule of a profile specifies whether width mapping + is performed on a string and how the mapping is done. Typically, + such mapping consists of mapping fullwidth and halfwidth code points, + i.e., code points with a Decomposition Type of Wide or Narrow, to + their decomposition mappings; as an example, "０" (FULLWIDTH DIGIT + ZERO, U+FF10) would be mapped to "0" (DIGIT ZERO, U+0030). + + The normalization form specified by a profile (see below) has an + impact on the need for width mapping. Because width mapping is + performed as a part of compatibility decomposition, a profile + employing either Normalization Form KD (NFKD) or Normalization + Form KC (NFKC) does not need to specify width mapping. However, if + Unicode Normalization Form C (NFC) is used (as is recommended), then + the profile needs to specify whether to apply width mapping; in this + case, width mapping is in general RECOMMENDED because allowing + fullwidth and halfwidth code points to remain unmapped to their + compatibility variants would violate the "Principle of Least + Astonishment". For more information about the concept of width in + East Asian scripts within Unicode, see Unicode Standard Annex #11 + [UAX11]. + + Note: Because the East Asian width property is not guaranteed to + be stable by the Unicode Standard (see + <http://unicode.org/policies/stability_policy.html> for details), + the results of applying a given width mapping rule might not be + consistent across different versions of Unicode. + +5.2.2. Additional Mapping Rule + + The additional mapping rule of a profile specifies whether additional + mappings are performed on a string, such as: + + o Mapping of delimiter code points (such as '@', ':', '/', '+', + and '-'). + + o Mapping of special code points (e.g., non-ASCII space code points + to SPACE (U+0020) or control code points to nothing). + + The PRECIS mappings document [RFC7790] describes such mappings in + more detail.
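The width mapping described in Section 5.2.1 can be illustrated with a minimal Python sketch. It is not part of the RFC: the function name `width_map` is hypothetical, and the approach assumes that the standard-library `unicodedata.decomposition()` tags `<wide>` and `<narrow>` correspond to the Decomposition Type values Wide and Narrow mentioned above. Results depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def width_map(s: str) -> str:
    """Map fullwidth/halfwidth code points to their decomposition mappings.

    Hypothetical helper: only code points whose decomposition is tagged
    <wide> or <narrow> are changed; everything else passes through.
    """
    out = []
    for ch in s:
        # decomposition() returns e.g. "<wide> 0030" for U+FF10, or "" if none.
        fields = unicodedata.decomposition(ch).split()
        if fields and fields[0] in ("<wide>", "<narrow>"):
            # Remaining fields are the hex code points of the mapping.
            out.append("".join(chr(int(cp, 16)) for cp in fields[1:]))
        else:
            out.append(ch)
    return "".join(out)


fullwidth_zero = "\uff10"  # FULLWIDTH DIGIT ZERO

print(width_map(fullwidth_zero) == "0")                      # True
print(unicodedata.normalize("NFC", fullwidth_zero) == "0")   # False
print(unicodedata.normalize("NFKC", fullwidth_zero) == "0")  # True
```

The last two lines show why a profile using NFC has to state its width mapping rule explicitly, whereas NFKC already performs the mapping as part of compatibility decomposition.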
+ + + + + + + +Saint-Andre & Blanchet Standards Track [Page 15] + +RFC 8264 PRECIS Framework October 2017 + + +5.2.3. Case Mapping Rule + + The case mapping rule of a profile specifies whether case mapping + (instead of case preservation) is performed on a string and how the + mapping is applied (e.g., mapping uppercase and titlecase code points + to their lowercase equivalents). + + If case mapping is desired (instead of case preservation), it is + RECOMMENDED to use the Unicode toLowerCase() operation defined in the + Unicode Standard [Unicode]. In contrast to the Unicode toCaseFold() + operation, the toLowerCase() operation is less likely to violate the + "Principle of Least Astonishment", especially when an application + merely wishes to convert uppercase and titlecase code points to their + lowercase equivalents while preserving lowercase code points. + Although the toCaseFold() operation can be appropriate when an + application needs to compare two strings (such as in search + operations), in general few application developers and even fewer + users understand its implications, so toLowerCase() is almost always + the safer choice. + + Note: Neither toLowerCase() nor toCaseFold() is designed to handle + various language-specific issues, such as the character "ı" (LATIN + SMALL LETTER DOTLESS I, U+0131) in several Turkic languages. The + reader is referred to the PRECIS mappings document [RFC7790], + which describes these issues in greater detail. + + In order to maximize entropy and minimize the potential for false + accepts, it is NOT RECOMMENDED for application protocols to map + uppercase and titlecase code points to their lowercase equivalents + when strings conforming to the FreeformClass, or a profile thereof, + are used in passwords; instead, it is RECOMMENDED to preserve the + case of all code points contained in such strings and then perform + case-sensitive comparison. 
See also the related discussion in + Section 12.6 of this document and in [RFC8265]. + +5.2.4. Normalization Rule + + The normalization rule of a profile specifies which Unicode + Normalization Form (D, KD, C, or KC) is to be applied (see Unicode + Standard Annex #15 [UAX15] for background information). + + In accordance with [RFC5198], Normalization Form C (NFC) is + RECOMMENDED. + + Protocol designers and application developers need to understand that + certain Unicode normalization forms, especially NFKC and NFKD, can + result in significant loss of information in various circumstances + and that these circumstances can depend on the language and script of + + + +Saint-Andre & Blanchet Standards Track [Page 16] + +RFC 8264 PRECIS Framework October 2017 + + + the strings to which the normalization forms are applied. Extreme + care should be taken when specifying the use of these normalization + forms. + +5.2.5. Directionality Rule + + The directionality rule of a profile specifies how to treat strings + containing what are often called "right-to-left" (RTL) code points + (see Unicode Standard Annex #9 [UAX9]). RTL code points come from + scripts that are normally written from right to left and are + considered by Unicode to, themselves, have right-to-left + directionality. Some strings containing RTL code points also contain + "left-to-right" (LTR) code points, such as ASCII numerals, as well as + code points without directional properties. Consequently, such + strings are known as "bidirectional strings". + + Presenting bidirectional strings in different layout systems (e.g., a + user interface that is configured to handle primarily an RTL script + vs. an interface that is configured to handle primarily an LTR + script) can yield display results that, while predictable to those + who understand the display rules, are counterintuitive to casual + users. 
In particular, the same bidirectional string (in PRECIS + terms) might not be presented in the same way to users of those + different layout systems, even though the presentation is consistent + within any particular layout system. In some applications, these + presentation differences might be considered problematic and thus the + application designers might wish to restrict the use of bidirectional + strings by specifying a directionality rule. In other applications, + these presentation differences might not be considered problematic + (this especially tends to be true of more "free-form" strings) and + thus no directionality rule is needed. + + The PRECIS framework does not directly address how to deal with + bidirectional strings across all string classes and profiles nor does + it define any new directionality rules, because at present there is + no widely accepted and implemented solution for the safe display of + arbitrary bidirectional strings beyond the Unicode bidirectional + algorithm [UAX9]. Although rules for management and display of + bidirectional strings have been defined for domain name labels and + similar identifiers through the "Bidi Rule" specified in the IDNA2008 + specification on right-to-left scripts [RFC5893], those rules are + quite restrictive and are not necessarily applicable to all + bidirectional strings. + + The authors of a PRECIS profile might believe that they need to + define a new directionality rule of their own. Because of the + complexity of the issues involved, such a belief is almost always + misguided, even if the authors have done a great deal of careful + + + +Saint-Andre & Blanchet Standards Track [Page 17] + +RFC 8264 PRECIS Framework October 2017 + + + research into the challenges of displaying bidirectional strings. 
+ This document strongly suggests that profile authors who are thinking + about defining a new directionality rule should think again and + instead consider using the "Bidi Rule" [RFC5893] (for profiles based + on the IdentifierClass) or following the Unicode bidirectional + algorithm [UAX9] (for profiles based on the FreeformClass or in + situations where the IdentifierClass is not appropriate). + +5.3. A Note about Spaces + + With regard to the IdentifierClass, the consensus of the PRECIS + Working Group was that spaces are problematic for many reasons, + including the following: + + o Many Unicode code points are confusable with SPACE (U+0020). + + o Even if non-ASCII space code points are mapped to SPACE (U+0020), + space code points are often not rendered in user interfaces, + leading to the possibility that a human user might consider a + string containing spaces to be equivalent to the same string + without spaces. + + o In some locales, some devices are known to generate a code point + other than SPACE (U+0020), such as ZERO WIDTH JOINER (U+200D), + when a user performs an action like pressing the space bar on a + keyboard. + + One consequence of disallowing space code points in the + IdentifierClass might be to effectively discourage their use within + identifiers created in newer application protocols; given the + challenges involved with properly handling space code points + (especially non-ASCII space code points) in identifiers and other + protocol strings, the PRECIS Working Group considered this to be a + feature, not a bug. + + However, the FreeformClass does allow spaces; this in turn enables + application protocols to define profiles of the FreeformClass that + are more flexible than any profiles of the IdentifierClass. In + addition, as explained in Section 6.3, application protocols can also + define application-layer constructs containing spaces. + +6. Applications + +6.1. 
How to Use PRECIS in Applications + + Although PRECIS has been designed with applications in mind, + internationalization is not suddenly made easy through the use of + PRECIS. Indeed, because it is extremely difficult for protocol + + + +Saint-Andre & Blanchet Standards Track [Page 18] + +RFC 8264 PRECIS Framework October 2017 + + + designers and application developers to do the right thing for all + users when supporting internationalized strings, often the safest + option is to support only the ASCII range [RFC20] in various protocol + slots. This state of affairs is unfortunate but is the direct result + of the complexities involved with human languages (e.g., the vast + number of code points, scripts, user communities, and rules with + their inevitable exceptions), which kinds of strings application + developers and their users wish to support, the wide range of devices + that users employ to access services enabled by various Internet + protocols, and so on. + + Despite these significant challenges, application and protocol + developers sometimes persevere in attempting to support + internationalized strings in their systems. These developers need to + think carefully about how they will use the PRECIS string classes, or + profiles thereof, in their applications. This section provides some + guidelines to application developers (and to expert reviewers of + application-protocol specifications). + + o Don't define your own profile unless absolutely necessary (see + Section 5.1). Existing profiles have been designed for wide + reuse. It is highly likely that an existing profile will meet + your needs, especially given the ability to specify further + excluded code points (Section 6.2) and to build application-layer + constructs (see Section 6.3). + + o Do specify: + + * Exactly which entities are responsible for preparation, + enforcement, and comparison of internationalized strings (e.g., + servers or clients). 
+ + * Exactly when those entities need to complete their tasks (e.g., + a server might need to enforce the rules of a profile before + allowing a client to gain network access). + + * Exactly which protocol slots need to be checked against which + profiles (e.g., checking the address of a message's intended + recipient against the UsernameCaseMapped profile [RFC8265] of + the IdentifierClass or checking the password of a user against + the OpaqueString profile [RFC8265] of the FreeformClass). + + See [RFC8265] and [RFC7622] for definitions of these matters for + several applications. + + + + + + + +Saint-Andre & Blanchet Standards Track [Page 19] + +RFC 8264 PRECIS Framework October 2017 + + +6.2. Further Excluded Characters + + An application protocol that uses a profile MAY specify particular + code points that are not allowed in relevant slots within that + application protocol, above and beyond those excluded by the string + class or profile. + + That is, an application protocol MAY do either of the following: + + 1. Exclude specific code points that are allowed by the relevant + string class. + + 2. Exclude code points matching certain Unicode properties (e.g., + math symbols) that are included in the relevant PRECIS string + class. + + As a result of such exclusions, code points that are defined as valid + for the PRECIS string class or profile will be defined as disallowed + for the relevant protocol slot. + + Typically, such exclusions are defined for the purpose of backward + compatibility with legacy formats within an application protocol. + These are defined for application protocols, not profiles, in order + to prevent multiplication of profiles beyond necessity (see + Section 5.1). + +6.3. Building Application-Layer Constructs + + Sometimes, an application-layer construct does not map in a + straightforward manner to one of the PRECIS string classes or a + profile thereof. Consider, for example, the "simple username" + construct in SASL [RFC4422]. 
Depending on the deployment, a simple + username might take the form of a user's full name (e.g., the user's + personal name followed by a space and then the user's family name). + Such a simple username cannot be defined as an instance of the + IdentifierClass or a profile thereof, because space code points are + not allowed in the IdentifierClass; however, it could be defined + using a space-separated sequence of IdentifierClass instances, as in + the following ABNF [RFC5234] from [RFC8265]: + + username = userpart *(1*SP userpart) + userpart = 1*(idpoint) + ; + ; an "idpoint" is a Unicode code point that + ; can be contained in a string conforming to + ; the PRECIS IdentifierClass + ; + + + + +Saint-Andre & Blanchet Standards Track [Page 20] + +RFC 8264 PRECIS Framework October 2017 + + + Similar techniques could be used to define many application-layer + constructs, say of the form "user@domain" or "/path/to/file". + +7. Order of Operations + + To ensure proper comparison, the rules specified for a particular + string class or profile MUST be applied in the following order: + + 1. Width Mapping Rule + + 2. Additional Mapping Rule + + 3. Case Mapping Rule + + 4. Normalization Rule + + 5. Directionality Rule + + 6. Behavioral rules for determining whether a code point is valid, + allowed under a contextual rule, disallowed, or unassigned + + As already described, the width mapping, additional mapping, case + mapping, normalization, and directionality rules are specified for + each profile, whereas the behavioral rules are specified for each + string class. Some of the logic behind this order is provided under + Section 5.2.1 (see also the PRECIS mappings document [RFC7790]). In + addition, this order is consistent with IDNA2008, and with both + IDNA2003 and Stringprep before then, for the purpose of enabling code + reuse and of ensuring as much continuity as possible with the + Stringprep profiles that are obsoleted by several PRECIS profiles. 
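The fixed ordering of the six steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a conforming implementation: the profile shown is hypothetical (width mapping, no additional mapping, lowercasing, NFC, no directionality rule), Python's `str.lower()` stands in for the Unicode toLowerCase() operation, and the behavioral check is a deliberately partial stand-in that only rejects control and unassigned code points.

```python
import unicodedata


def width_map(s: str) -> str:
    """Step 1: map <wide>/<narrow> code points to their decomposition mappings."""
    out = []
    for ch in s:
        fields = unicodedata.decomposition(ch).split()
        if fields and fields[0] in ("<wide>", "<narrow>"):
            out.append("".join(chr(int(cp, 16)) for cp in fields[1:]))
        else:
            out.append(ch)
    return "".join(out)


def to_lower(s: str) -> str:
    """Step 3: case mapping (str.lower() used as a stand-in for toLowerCase())."""
    return s.lower()


def nfc(s: str) -> str:
    """Step 4: Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)


def check_code_points(s: str) -> str:
    """Step 6 (partial stand-in): reject control and unassigned code points."""
    for ch in s:
        if unicodedata.category(ch) in ("Cc", "Cn"):
            raise ValueError(f"code point U+{ord(ch):04X} is not allowed")
    return s


# The order of this list is the point of the example; None means the
# hypothetical profile specifies no rule for that step.
STEPS = [
    ("width mapping", width_map),
    ("additional mapping", None),
    ("case mapping", to_lower),
    ("normalization", nfc),
    ("directionality", None),
    ("behavioral rules", check_code_points),
]


def apply_rules(s: str) -> str:
    """Apply each specified rule in the order required by Section 7."""
    for _name, step in STEPS:
        if step is not None:
            s = step(s)
    return s


print(apply_rules("\uff26\uff4f\uff4f"))  # fullwidth "Foo" -> foo
```

Keeping the steps in a single ordered list makes it harder for an implementation to reorder them by accident, which matters because the result of later steps depends on the output of earlier ones.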
+ + Because of the order of operations specified here, applying the rules + for any given PRECIS profile is not necessarily an idempotent + procedure (e.g., under certain circumstances, such as when Unicode + Normalization Form KC is used, performing Unicode normalization after + case mapping can still yield uppercase characters for certain code + points). Therefore, an implementation SHOULD apply the rules + repeatedly until the output string is stable; if the output string + does not stabilize after reapplying the rules three (3) additional + times after the first application, the implementation SHOULD + terminate application of the rules and reject the input string as + invalid. + +8. Code Point Properties + + In order to implement the string classes described above, this + document does the following: + + + + +Saint-Andre & Blanchet Standards Track [Page 21] + +RFC 8264 PRECIS Framework October 2017 + + + 1. Reviews and classifies the collections of code points in the + Unicode coded character set by examining various code point + properties. + + 2. Defines an algorithm for determining a derived property value, + which can depend on the string class being used by the relevant + application protocol. + + This document is not intended to specify precisely how derived + property values are to be applied in protocol strings. That + information is the responsibility of the protocol specification that + uses or profiles a PRECIS string class from this document. The value + of the property is to be interpreted as follows. + + PROTOCOL VALID Those code points that are allowed to be used in any + PRECIS string class (currently, IdentifierClass and + FreeformClass). The abbreviated term "PVALID" is used to refer to + this value in the remainder of this document. + + SPECIFIC CLASS PROTOCOL VALID Those code points that are allowed to + be used in specific string classes. 
In the remainder of this + document, the abbreviated term *_PVAL is used, where * = (ID | + FREE), i.e., either "FREE_PVAL" for the FreeformClass or "ID_PVAL" + for the IdentifierClass. In practice, the derived property + ID_PVAL is not used in this specification, because every ID_PVAL + code point is PVALID. + + CONTEXTUAL RULE REQUIRED Some characteristics of the code point, + such as its being invisible in certain contexts or problematic in + others, require that it not be used in a string unless specific + other code points or properties are present in the string. As in + IDNA2008, there are two subdivisions of CONTEXTUAL RULE REQUIRED: + the first for Join_controls (called "CONTEXTJ") and the second for + other code points (called "CONTEXTO"). A string MUST NOT contain + any characters whose validity is context-dependent, unless the + validity is positively confirmed by a contextual rule. To check + this, each code point identified as CONTEXTJ or CONTEXTO in the + "PRECIS Derived Property Value" registry (Section 11.1) MUST have + a non-null rule. If such a code point is missing a rule, the + string is invalid. If the rule exists but the result of applying + the rule is negative or inconclusive, the proposed string is + invalid. The most notable of the CONTEXTUAL RULE REQUIRED code + points are the Join Control code points ZERO WIDTH JOINER (U+200D) + and ZERO WIDTH NON-JOINER (U+200C), which have a derived property + value of CONTEXTJ. See Appendix A of [RFC5892] for more + information. + + + + + +Saint-Andre & Blanchet Standards Track [Page 22] + +RFC 8264 PRECIS Framework October 2017 + + + DISALLOWED Those code points that are not permitted in any PRECIS + string class. + + SPECIFIC CLASS DISALLOWED Those code points that are not to be + included in one of the string classes but that might be permitted + in others. 
In the remainder of this document, the abbreviated + term *_DIS is used, where * = (ID | FREE), i.e., either "FREE_DIS" + for the FreeformClass or "ID_DIS" for the IdentifierClass. In + practice, the derived property FREE_DIS is not used in this + specification, because every FREE_DIS code point is DISALLOWED. + + UNASSIGNED Those code points that are not designated (i.e., are + unassigned) in the Unicode Standard. + + The algorithm to calculate the value of the derived property is as + follows (implementations MUST NOT modify the order of operations + within this algorithm, because doing so would cause inconsistent + results across implementations): + + If .cp. .in. Exceptions Then Exceptions(cp); + Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp); + Else If .cp. .in. Unassigned Then UNASSIGNED; + Else If .cp. .in. ASCII7 Then PVALID; + Else If .cp. .in. JoinControl Then CONTEXTJ; + Else If .cp. .in. OldHangulJamo Then DISALLOWED; + Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED; + Else If .cp. .in. Controls Then DISALLOWED; + Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL; + Else If .cp. .in. LetterDigits Then PVALID; + Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL; + Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL; + Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL; + Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL; + Else DISALLOWED; + + The value of the derived property calculated can depend on the string + class; for example, if an identifier used in an application protocol + is defined as profiling the PRECIS IdentifierClass then a space + character such as SPACE (U+0020) would be assigned to ID_DIS, whereas + if an identifier is defined as profiling the PRECIS FreeformClass + then the character would be assigned to FREE_PVAL. For the sake of + brevity, the designation "FREE_PVAL" is used herein, instead of the + longer designation "ID_DIS or FREE_PVAL". 
In practice, the derived + properties ID_PVAL and FREE_DIS are not used in this specification, + because every ID_PVAL code point is PVALID and every FREE_DIS code + point is DISALLOWED. + + + + + +Saint-Andre & Blanchet Standards Track [Page 23] + +RFC 8264 PRECIS Framework October 2017 + + + Use of the name of a rule (such as "Exceptions") implies the set of + code points that the rule defines, whereas the same name as a + function call (such as "Exceptions(cp)") implies the value that the + code point has in the Exceptions table. + + The mechanisms described here allow determination of the value of the + property for future versions of Unicode (including code points added + after Unicode 5.2 or 7.0, depending on the category, because some + categories mentioned in this document are simply pointers to IDNA2008 + and therefore were defined at the time of Unicode 5.2). Changes in + Unicode properties that do not affect the outcome of this process + therefore do not affect this framework. For example, a code point + can have its Unicode General_Category value change from So to Sm, or + from Lo to Ll, without affecting the algorithm results. Moreover, + even if such changes were to result, the BackwardCompatible list + (Section 9.7) can be adjusted to ensure the stability of the results. + +9. Category Definitions Used to Calculate Derived Property + + The derived property obtains its value based on a two-step procedure: + + 1. Code points are placed in one or more character categories either + (1) based on core properties defined by the Unicode Standard or + (2) by treating the code point as an exception and addressing the + code point based on its code point value. These categories are + not mutually exclusive. + + 2. Set operations are used with these categories to determine the + values for a property specific to a given string class. These + operations are specified under Section 8. 
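The ordered tests of Section 8, combined with the categories defined in this section, can be sketched in Python as shown below. This is a rough illustration, not a reference implementation, and it makes several assumptions that are labeled in the comments: the Exceptions table is only a three-entry subset of the table in [RFC5892], OldHangulJamo is approximated by the conjoining jamo block ranges, and PrecisIgnorableProperties covers noncharacters plus only a handful of Default_Ignorable_Code_Point members, because Python's standard `unicodedata` module does not expose those properties. All names (`derived_property`, `EXCEPTIONS`, and so on) are hypothetical.

```python
import unicodedata

# ASSUMPTION: illustrative subset of the Exceptions (F) table from RFC 5892.
EXCEPTIONS = {0x00DF: "PVALID", 0x00B7: "CONTEXTO", 0x0640: "DISALLOWED"}
# BackwardCompatible (G) is the empty set at the time of writing (Section 9.7).
BACKWARD_COMPATIBLE = {}
# ASSUMPTION: block ranges approximate Hangul_Syllable_Type in {L, V, T}.
OLD_HANGUL_JAMO = [(0x1100, 0x11FF), (0xA960, 0xA97F), (0xD7B0, 0xD7FF)]
# ASSUMPTION: only a few Default_Ignorable_Code_Point members are listed.
SOME_DEFAULT_IGNORABLE = {0x00AD, 0x034F, 0x200B, 0x2060} | set(range(0xFE00, 0xFE10))

LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}   # category A
OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}               # category R
SPACES = {"Zs"}                                              # category N
SYMBOLS = {"Sm", "Sc", "Sk", "So"}                           # category O
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}     # category P


def is_noncharacter(cp: int) -> bool:
    """Noncharacter_Code_Point: U+FDD0..U+FDEF and every U+xxFFFE / U+xxFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def derived_property(cp: int, string_class: str) -> str:
    """Return the derived property of cp for string_class "ID" or "FREE"."""
    if string_class not in ("ID", "FREE"):
        raise ValueError("string_class must be 'ID' or 'FREE'")
    ch = chr(cp)
    gc = unicodedata.category(ch)
    by_class = "ID_DIS" if string_class == "ID" else "FREE_PVAL"

    # The order of these tests follows Section 8 and must not be changed.
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    if cp in BACKWARD_COMPATIBLE:
        return BACKWARD_COMPATIBLE[cp]
    if gc == "Cn" and not is_noncharacter(cp):
        return "UNASSIGNED"
    if 0x21 <= cp <= 0x7E:                       # ASCII7 (K)
        return "PVALID"
    if cp in (0x200C, 0x200D):                   # JoinControl (H)
        return "CONTEXTJ"
    if any(lo <= cp <= hi for lo, hi in OLD_HANGUL_JAMO):
        return "DISALLOWED"
    if cp in SOME_DEFAULT_IGNORABLE or is_noncharacter(cp):
        return "DISALLOWED"                      # PrecisIgnorableProperties (M)
    if gc == "Cc":                               # Controls (L)
        return "DISALLOWED"
    if unicodedata.normalize("NFKC", ch) != ch:  # HasCompat (Q)
        return by_class
    if gc in LETTER_DIGITS:
        return "PVALID"
    if gc in OTHER_LETTER_DIGITS | SPACES | SYMBOLS | PUNCTUATION:
        return by_class
    return "DISALLOWED"


print(derived_property(0x0020, "ID"))    # ID_DIS
print(derived_property(0x0020, "FREE"))  # FREE_PVAL
print(derived_property(0x200D, "ID"))    # CONTEXTJ
```

Because the tests are evaluated strictly in order, a code point that belongs to several categories (for example, one that is both a letter and has a compatibility decomposition) receives the value of the first matching test.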
+ + Note: Unicode property names and property value names might have + short abbreviations, such as "gc" for the General_Category + property and "Ll" for the Lowercase_Letter property value of the + gc property. + + In the following specification of character categories, the operation + that returns the value of a particular Unicode code point property + for a code point is designated by using the formal name of that + property (from the Unicode PropertyAliases.txt file [PropertyAliases]) + followed by "(cp)" for "code point". For example, the value of the + General_Category property for a code point is indicated by + General_Category(cp). + + The first ten categories (A-J) shown below were previously defined + for IDNA2008 and are referenced from [RFC5892] to ease the + understanding of how PRECIS handles various code points. Some of + these categories are reused in PRECIS, and some of them are not; + + + +Saint-Andre & Blanchet Standards Track [Page 24] + +RFC 8264 PRECIS Framework October 2017 + + + however, the lettering of categories is retained to prevent overlap + and to ease implementation of both IDNA2008 and PRECIS in a single + software application. The next eight categories (K-R) are specific + to PRECIS. + +9.1. LetterDigits (A) + + This category is defined in Section 2.1 of [RFC5892] and is included + by reference for use in PRECIS. + +9.2. Unstable (B) + + This category is defined in Section 2.2 of [RFC5892]. However, it is + not used in PRECIS. + +9.3. IgnorableProperties (C) + + This category is defined in Section 2.3 of [RFC5892]. However, it is + not used in PRECIS. + + Note: See the PrecisIgnorableProperties ("M") category below for a + more inclusive category used in PRECIS identifiers. + +9.4. IgnorableBlocks (D) + + This category is defined in Section 2.4 of [RFC5892]. However, it is + not used in PRECIS. + +9.5. LDH (E) + + This category is defined in Section 2.5 of [RFC5892]. However, it is + not used in PRECIS.
+ + Note: See the ASCII7 ("K") category below for a more inclusive + category used in PRECIS identifiers. + +9.6. Exceptions (F) + + This category is defined in Section 2.6 of [RFC5892] and is included + by reference for use in PRECIS. + +9.7. BackwardCompatible (G) + + This category is defined in Section 2.7 of [RFC5892] and is included + by reference for use in PRECIS. + + Note: Management of this category is handled via the processes + specified in [RFC5892]. At the time of this writing (and also at the + + + +Saint-Andre & Blanchet Standards Track [Page 25] + +RFC 8264 PRECIS Framework October 2017 + + + time that RFC 5892 was published), this category consisted of the + empty set; however, that is subject to change as described in + RFC 5892. + +9.8. JoinControl (H) + + This category is defined in Section 2.8 of [RFC5892] and is included + by reference for use in PRECIS. + + Note: In particular, the code points ZERO WIDTH JOINER (U+200D) and + ZERO WIDTH NON-JOINER (U+200C) are necessary to produce certain + combinations of characters in certain scripts (e.g., Arabic, Persian, + and Indic scripts), but if used in other contexts, they can have + consequences that violate the "Principle of Least Astonishment". + Therefore, these code points are allowed only in contexts where they + are appropriate, specifically where the relevant rule (CONTEXTJ or + CONTEXTO) has been defined. See [RFC5892] and [RFC5894] for further + discussion. + +9.9. OldHangulJamo (I) + + This category is defined in Section 2.9 of [RFC5892] and is included + by reference for use in PRECIS. + + Note: Exclusion of these code points results in disallowing certain + archaic Korean syllables and in restricting supported Korean + syllables to preformed, modern Hangul characters. + +9.10. Unassigned (J) + + This category is defined in Section 2.10 of [RFC5892] and is included + by reference for use in PRECIS. + +9.11. 
ASCII7 (K) + + This PRECIS-specific category consists of all printable, non-space + code points from the 7-bit ASCII range. By applying this category, + the algorithm specified under Section 8 exempts these code points + from other rules that might be applied during PRECIS processing, on + the assumption that these code points are in such wide use that + disallowing them would be counterproductive. + + K: cp is in {0021..007E} + + + + + + + + +Saint-Andre & Blanchet Standards Track [Page 26] + +RFC 8264 PRECIS Framework October 2017 + + +9.12. Controls (L) + + This PRECIS-specific category consists of all control code points, + such as LINE FEED (U+000A). + + L: Control(cp) = True + +9.13. PrecisIgnorableProperties (M) + + This PRECIS-specific category is used to group code points that are + discouraged from use in PRECIS string classes. + + M: Default_Ignorable_Code_Point(cp) = True or + Noncharacter_Code_Point(cp) = True + + The definition for Default_Ignorable_Code_Point can be found in the + DerivedCoreProperties.txt file [DerivedCoreProperties]. + + Note: In general, these code points are constructs such as so-called + "soft hyphens", certain joining code points, various specialized code + points for use within Unicode itself (e.g., language tags and + variation selectors), and so on. Disallowing these code points in + PRECIS reduces the potential for unexpected results in the use of + internationalized strings. + +9.14. Spaces (N) + + This PRECIS-specific category is used to group code points that are + spaces. + + N: General_Category(cp) is in {Zs} + +9.15. Symbols (O) + + This PRECIS-specific category is used to group code points that are + symbols. + + O: General_Category(cp) is in {Sm, Sc, Sk, So} + +9.16. Punctuation (P) + + This PRECIS-specific category is used to group code points that are + punctuation. 
+ + P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po} + + + + + + +Saint-Andre & Blanchet Standards Track [Page 27] + +RFC 8264 PRECIS Framework October 2017 + + +9.17. HasCompat (Q) + + This PRECIS-specific category is used to group any code point that is + decomposed and recomposed into something other than itself under + Unicode Normalization Form KC. + + Q: toNFKC(cp) != cp + + Typically, this category is true of code points that are + "compatibility decomposable characters" as defined in the Unicode + Standard. + + The toNFKC() operation returns the code point in Normalization + Form KC. For more information, see Unicode Standard Annex #15 + [UAX15]. + +9.18. OtherLetterDigits (R) + + This PRECIS-specific category is used to group code points that are + letters and digits other than the "traditional" letters and digits + grouped under the LetterDigits ("A") category (see Section 9.1). + + R: General_Category(cp) is in {Lt, Nl, No, Me} + +10. Guidelines for Designated Experts + + Experience with internationalization in application protocols has + shown that protocol designers and application developers usually do + not understand the subtleties and trade-offs involved with + internationalization and that they need considerable guidance in + making reasonable decisions with regard to the options before them. + + Therefore: + + o Protocol designers are strongly encouraged to question the + assumption that they need to define new profiles, because existing + profiles are designed for wide reuse (see Section 5 for further + discussion). + + o Those who persist in defining new profiles are strongly encouraged + to clearly explain a strong justification for doing so and to + publish a stable specification that provides all of the + information described under Section 11.3. 
+ + o The designated experts for profile registration requests ought to + seek answers to all of the questions provided under Section 11.3 + and ought to encourage applicants to provide a stable + specification documenting the profile (even though the + registration policy for PRECIS profiles is "Expert Review" and a + stable specification is not strictly required). + + o Developers of applications that use PRECIS are strongly encouraged + to apply the guidelines provided under Section 6 and to seek out + the advice of the designated experts or other knowledgeable + individuals in doing so. + + o All parties are strongly encouraged to help prevent the + multiplication of profiles beyond necessity, as described under + Section 5.1, and to use PRECIS in ways that will minimize user + confusion and insecure application behavior. + + Internationalization can be difficult and contentious; designated + experts, profile registrants, and application developers are strongly + encouraged to work together in a spirit of good faith and mutual + understanding to achieve rough consensus on profile registration + requests and the use of PRECIS in particular applications. They are + also encouraged to bring additional expertise into the discussion if + that would be helpful in adding perspective or otherwise resolving + issues. + +11. IANA Considerations + +11.1. PRECIS Derived Property Value Registry + + IANA has created and now maintains the "PRECIS Derived Property + Value" registry (<https://www.iana.org/assignments/precis-tables/>), + which records the derived properties for each version of Unicode + released starting from version 6.3. The derived property value is to + be calculated in cooperation with a designated expert [RFC8126] + according to the rules specified under Sections 8 and 9.
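As a rough illustration of the Section 9 rules referenced above, several of the category tests (ASCII7, Controls, Spaces, Symbols, Punctuation, HasCompat, and OtherLetterDigits) can be sketched in Python with the standard unicodedata module. The function names are hypothetical and not part of this specification, treating Control(cp) as General_Category Cc is an assumption of this sketch, and the results depend on the Unicode version bundled with the Python runtime, so this is not a substitute for the derived property values in the registry.

```python
import unicodedata

SYMBOLS = {"Sm", "Sc", "Sk", "So"}
PUNCTUATION = {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"}
OTHER_LETTER_DIGITS = {"Lt", "Nl", "No", "Me"}


def general_category(cp: int) -> str:
    """Return the General_Category of a code point, e.g. 'Zs'."""
    return unicodedata.category(chr(cp))


def is_ascii7(cp: int) -> bool:
    # K: cp is in {0021..007E}
    return 0x21 <= cp <= 0x7E


def is_control(cp: int) -> bool:
    # L: Control(cp) = True (assumed here to mean General_Category Cc)
    return general_category(cp) == "Cc"


def is_space(cp: int) -> bool:
    # N: General_Category(cp) is in {Zs}
    return general_category(cp) == "Zs"


def is_symbol(cp: int) -> bool:
    # O: General_Category(cp) is in {Sm, Sc, Sk, So}
    return general_category(cp) in SYMBOLS


def is_punctuation(cp: int) -> bool:
    # P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po}
    return general_category(cp) in PUNCTUATION


def has_compat(cp: int) -> bool:
    # Q: toNFKC(cp) != cp
    ch = chr(cp)
    return unicodedata.normalize("NFKC", ch) != ch


def is_other_letter_digit(cp: int) -> bool:
    # R: General_Category(cp) is in {Lt, Nl, No, Me}
    return general_category(cp) in OTHER_LETTER_DIGITS
```

For instance, is_control(0x000A) holds for LINE FEED, and has_compat(0xFB01) holds for LATIN SMALL LIGATURE FI because NFKC rewrites it as the two letters "fi". The PrecisIgnorableProperties test is omitted because unicodedata does not expose Default_Ignorable_Code_Point.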
+ + The IESG is to be notified if backward-incompatible changes to the + table of derived properties are discovered or if other problems arise + during the process of creating the table of derived property values + or during Expert Review. Changes to the rules defined under + Sections 8 and 9 require IETF Review. + + Note: IANA is requested to not make further updates to this registry + until it receives notice from the IESG that the issues described in + [IAB-Statement] and Section 13.5 of this document have been settled. + +11.2. PRECIS Base Classes Registry + + IANA has created the "PRECIS Base Classes" registry + (<https://www.iana.org/assignments/precis-parameters/>). In + accordance with [RFC8126], the registration policy is "RFC Required". + + + +Saint-Andre & Blanchet Standards Track [Page 29] + +RFC 8264 PRECIS Framework October 2017 + + + The registration template is as follows: + + Base Class: [the name of the PRECIS string class] + + Description: [a brief description of the PRECIS string class and its + intended use, e.g., "A sequence of letters, numbers, and symbols + that is used to identify or address a network entity."] + + Reference: [the RFC number] + + The initial registrations are as follows: + + Base Class: FreeformClass + Description: A sequence of letters, numbers, symbols, spaces, and + other code points that is used for free-form strings. + Specification: Section 4.3 of RFC 8264 + + Base Class: IdentifierClass + Description: A sequence of letters, numbers, and symbols that is + used to identify or address a network entity. + Specification: Section 4.2 of RFC 8264 + +11.3. PRECIS Profiles Registry + + IANA has created the "PRECIS Profiles" registry + (<https://www.iana.org/assignments/precis-parameters/>) to identify + profiles that use the PRECIS string classes. In accordance with + [RFC8126], the registration policy is "Expert Review". 
This policy + was chosen in order to ease the burden of registration while ensuring + that "customers" of PRECIS receive appropriate guidance regarding the + sometimes complex and subtle internationalization issues related to + profiles of PRECIS string classes. + + The registration template is as follows: + + Name: [the name of the profile] + + Base Class: [which PRECIS string class is being profiled] + + Applicability: [the specific protocol elements to which this profile + applies, e.g., "Usernames in security and application protocols."] + + Replaces: [the Stringprep profile that this PRECIS profile replaces, + if any] + + Width Mapping Rule: [the behavioral rule for handling of width, + e.g., "Map fullwidth and halfwidth code points to their + compatibility variants."] + + + +Saint-Andre & Blanchet Standards Track [Page 30] + +RFC 8264 PRECIS Framework October 2017 + + + Additional Mapping Rule: [any additional mappings that are required + or recommended, e.g., "Map non-ASCII space code points to SPACE + (U+0020)."] + + Case Mapping Rule: [the behavioral rule for handling of case, e.g., + "Apply the Unicode toLowerCase() operation."] + + Normalization Rule: [which Unicode normalization form is applied, + e.g., "NFC"] + + Directionality Rule: [the behavioral rule for handling of right-to- + left code points, e.g., "The 'Bidi Rule' defined in RFC 5893 + applies."] + + Enforcement: [which entities enforce the rules, and when that + enforcement occurs during protocol operations] + + Specification: [a pointer to relevant documentation, such as an RFC + or Internet-Draft] + + In order to request a review, the registrant shall send a completed + template to the <precis@ietf.org> list or its designated successor. + + Factors to focus on while defining profiles and reviewing profile + registrations include the following: + + o Would an existing PRECIS string class or profile solve the + problem? If not, why not? (See Section 5.1 for related + considerations.) 
+ + o Is the problem being addressed by this profile well defined? + + o Does the specification define what kinds of applications are + involved and the protocol elements to which this profile applies? + + o Is the profile clearly defined? + + o Is the profile based on an appropriate dividing line between user + interface (culture, context, intent, locale, device limitations, + etc.) and the use of conformant strings in protocol elements? + + o Are the width mapping, case mapping, additional mapping, + normalization, and directionality rules appropriate for the + intended use? + + o Does the profile explain which entities enforce the rules and when + such enforcement occurs during protocol operations? + + + + +Saint-Andre & Blanchet Standards Track [Page 31] + +RFC 8264 PRECIS Framework October 2017 + + + o Does the profile reduce the degree to which human users could be + surprised or confused by application behavior (the "Principle of + Least Astonishment")? + + o Does the profile introduce any new security concerns such as those + described under Section 12 of this document (e.g., false accepts + for authentication or authorization)? + +12. Security Considerations + +12.1. General Issues + + If input strings that appear "the same" to users are programmatically + considered to be distinct in different systems or if input strings + that appear distinct to users are programmatically considered to be + "the same" in different systems, then users can be confused. Such + confusion can have security implications, such as the false accepts + and false rejects discussed in [RFC6943] (the terms "false positives" + and "false negatives" are used in that document). One starting goal + of work on the PRECIS framework was to limit the number of times that + users are confused (consistent with the "Principle of Least + Astonishment"). Unfortunately, this goal has been difficult to + achieve given the large number of application protocols already in + existence. 
Despite these difficulties, profiles should not be + multiplied beyond necessity (see Section 5.1). In particular, + designers of application protocols should think long and hard before + defining a new profile instead of using one that has already been + defined, and if they decide to define a new profile then they should + clearly explain their reasons for doing so. + + The security of applications that use this framework can depend in + part on the proper preparation, enforcement, and comparison of + internationalized strings. For example, such strings can be used to + make authentication and authorization decisions, and the security of + an application could be compromised if an entity providing a given + string is connected to the wrong account or online resource based on + different interpretations of the string (again, see [RFC6943]). + + Specifications of application protocols that use this framework are + strongly encouraged to describe how internationalized strings are + used in the protocol, including the security implications of any + false accepts and false rejects that might result from various + enforcement and comparison operations. For some helpful guidelines, + refer to [RFC6943], [RFC5890], [UTR36], and [UTS39]. + + + + + + + +Saint-Andre & Blanchet Standards Track [Page 32] + +RFC 8264 PRECIS Framework October 2017 + + +12.2. Use of the IdentifierClass + + Strings that conform to the IdentifierClass, and any profile thereof, + are intended to be relatively safe for use in a broad range of + applications, primarily because they include only letters, digits, + and "grandfathered" non-space code points from the ASCII range; thus, + they exclude spaces, code points with compatibility equivalents, and + almost all symbols and punctuation marks. 
However, because such + strings can still include so-called "confusable code points" (see + Section 12.5), protocol designers and implementers are encouraged to + pay close attention to the security considerations described + elsewhere in this document. + +12.3. Use of the FreeformClass + + Strings that conform to the FreeformClass, and many profiles thereof, + can include virtually any Unicode code point. This makes the + FreeformClass quite expressive, but also problematic from the + perspective of possible user confusion. Protocol designers are + hereby warned that the FreeformClass contains code points they might + not understand, and they are encouraged to profile the + IdentifierClass wherever feasible; however, if an application + protocol requires more code points than are allowed by the + IdentifierClass, protocol designers are encouraged to define a + profile of the FreeformClass that restricts the allowable code points + as tightly as possible. (The PRECIS Working Group considered the + option of allowing "superclasses" as well as profiles of PRECIS + string classes but decided against allowing superclasses to reduce + the likelihood of security and interoperability problems.) + +12.4. Local Character Set Issues + + When systems use local character sets other than ASCII and Unicode, + this specification leaves the problem of converting between the local + character set and Unicode up to the application or local system. If + different applications (or different versions of one application) + implement different rules for conversions among coded character sets, + they could interpret the same name differently and contact different + application servers or other network entities. This problem is not + solved by security protocols, such as Transport Layer Security (TLS) + [RFC5246] and SASL [RFC4422], that do not take local character sets + into account. + +12.5. 
Visually Similar Characters + + Some code points are visually similar and thus can cause confusion + among humans. Such characters are often called "confusable + characters" or "confusables". + + + +Saint-Andre & Blanchet Standards Track [Page 33] + +RFC 8264 PRECIS Framework October 2017 + + + The problem of confusable characters is not necessarily caused by the + use of Unicode code points outside the ASCII range. For example, in + some presentations and to some individuals the string "ju1iet" + (spelled with DIGIT ONE (U+0031) as the third character) might appear + to be the same as "juliet" (spelled with LATIN SMALL LETTER L + (U+006C)), especially on casual visual inspection. This phenomenon + is sometimes called "typejacking". + + However, the problem is made more serious by introducing the full + range of Unicode code points into protocol strings. A well-known + example is confusion between "а" CYRILLIC SMALL LETTER A (U+0430) and + "a" LATIN SMALL LETTER A (U+0061). As another example, the + characters "ᏚᎢᎵᏋᎢᏋᏒ" (U+13DA U+13A2 U+13B5 U+13AC U+13A2 U+13AC + U+13D2) from the Cherokee block look similar to the ASCII code points + representing "STPETER" as they might appear when presented using a + "creative" font family. Confusion among such characters is perhaps + not unexpected, given that the alphabetic writing systems involved + all bear a family resemblance or historical lineage. Perhaps more + surprising is confusion among characters from disparate writing + systems, such as "O" (LATIN CAPITAL LETTER O, U+004F), "0" (DIGIT + ZERO, U+0030), "໐" (LAO DIGIT ZERO, U+0ED0), "ዐ" (ETHIOPIC SYLLABLE + PHARYNGEAL A, U+12D0), and other graphemes that have the appearance + of open circles. And the reader needs to be aware that the foregoing + represent merely a small sample of characters that are confusable in + Unicode. 
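The examples above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the helper name describe() is hypothetical, and the sketch shows merely that confusable strings are distinct at the code point level and that NFKC normalization does not unify them. It does not detect confusables.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


latin_a = "\u0061"     # LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A

# The two strings render alike in many fonts but compare unequal.
print(latin_a == cyrillic_a)
print(describe(latin_a), describe(cyrillic_a))

# Compatibility normalization leaves the Cyrillic letter untouched,
# so NFKC cannot be relied on to merge the two.
print(unicodedata.normalize("NFKC", cyrillic_a) == latin_a)

# The ASCII-only pair from the text differs in a single position.
differing = [(x, y) for x, y in zip("ju1iet", "juliet") if x != y]
print(differing)
```

Printing code point names in this way is one option an administrative tool could use to reveal the difference to an operator; it does not resolve the underlying ambiguity for end users.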
+ + In some instances of confusable characters, it is unlikely that the + average human could tell the difference between the real string and + the fake string. (Indeed, there is no programmatic way to + distinguish with full certainty which is the fake string and which is + the real string; in some contexts, the string formed of Cherokee code + points might be the real string and the string formed of ASCII code + points might be the fake string.) Because PRECIS-compliant strings + can contain almost any properly encoded Unicode code point, it can be + relatively easy to fake or mimic some strings in systems that use the + PRECIS framework. The fact that some strings are easily confused + introduces security vulnerabilities of the kind that have also + plagued the World Wide Web, specifically the phenomenon known as + phishing. + + Despite the fact that some specific suggestions about identification + and handling of confusable characters appear in the Unicode Security + Considerations [UTR36] and the Unicode Security Mechanisms [UTS39], + it is also true (as noted in [RFC5890]) that "there are no + comprehensive technical solutions to the problems of confusable + characters." Because it is impossible to map visually similar + characters without a great deal of context (such as knowing the font + families used), the PRECIS framework does nothing to map + similar-looking characters together, nor does it prohibit some characters + because they look like others. + + Nevertheless, specifications for application protocols that use this + framework are strongly encouraged to describe how confusable + characters can be abused to compromise the security of systems that + use the protocol in question, along with any protocol-specific + suggestions for overcoming those threats.
In particular, software + implementations and service deployments that use PRECIS-based + technologies are strongly encouraged to define and implement + consistent policies regarding the registration, storage, and + presentation of visually similar characters. The following + recommendations are appropriate: + + 1. An application service SHOULD define a policy that specifies the + scripts or blocks of code points that the service will allow to + be registered (e.g., in an account name) or stored (e.g., in a + filename). Such a policy SHOULD be informed by the languages and + scripts that are used to write registered account names; in + particular, to reduce confusion, the service SHOULD forbid + registration or storage of strings that contain code points from + more than one script and SHOULD restrict registrations to code + points drawn from a very small number of scripts (e.g., scripts + that are well understood by the administrators of the service, to + improve manageability). + + 2. User-oriented application software SHOULD define a policy that + specifies how internationalized strings will be presented to a + human user. Because every human user of such software has a + preferred language or a small set of preferred languages, the + software SHOULD gather that information either explicitly from + the user or implicitly via the operating system of the user's + device. + + The challenges inherent in supporting the full range of Unicode code + points have in the past led some to hope for a way to + programmatically negotiate more restrictive ranges based on locale, + script, or other relevant factors; to tag the locale associated with + a particular string; etc. As a general-purpose internationalization + technology, the PRECIS framework does not include such mechanisms. + +12.6. Security of Passwords + + Two goals of passwords are to maximize the amount of entropy and to + minimize the potential for false accepts. 
These goals can be + achieved in part by allowing a wide range of code points and by + ensuring that passwords are handled in such a way that code points + are not compared aggressively. Therefore, it is NOT RECOMMENDED for + application protocols to profile the FreeformClass for use in + passwords in a way that removes entire categories (e.g., by + disallowing symbols or punctuation). Furthermore, it is + NOT RECOMMENDED for application protocols to map uppercase and + titlecase code points to their lowercase equivalents in such strings; + instead, it is RECOMMENDED to preserve the case of all code points + contained in such strings and to compare them in a case-sensitive + manner. + + That said, software implementers need to be aware that there exist + trade-offs between entropy and usability. For example, allowing a + user to establish a password containing "uncommon" code points might + make it difficult for the user to access a service when using an + unfamiliar or constrained input device. + + Some application protocols use passwords directly, whereas others + reuse technologies that themselves process passwords (one example of + such a technology is SASL [RFC4422]). Moreover, passwords are often + carried by a sequence of protocols with backend authentication + systems or data storage systems such as RADIUS [RFC2865] and the + Lightweight Directory Access Protocol (LDAP) [RFC4510]. Developers + of application protocols are encouraged to look into reusing existing + profiles, such as those defined in [RFC8265], instead of defining new + ones, so that end-user expectations about passwords are consistent no + matter which application protocol is used.
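The recommendation earlier in this section to preserve case and to compare passwords in a case-sensitive manner can be sketched as follows. This is a minimal illustration, not a PRECIS profile: the function names are hypothetical, the use of NFC as the only normalization step is an assumption of the sketch, and a real deployment would normally compare salted hashes rather than plaintext values (see [RFC8265] for an actual password profile).

```python
import hmac
import unicodedata


def prepare_password(password: str) -> bytes:
    """Normalize to NFC (an assumption of this sketch) and encode as UTF-8.

    No case mapping is applied, so uppercase and titlecase code points
    are preserved exactly as entered.
    """
    normalized = unicodedata.normalize("NFC", password)
    return normalized.encode("utf-8")


def passwords_match(supplied: str, stored: str) -> bool:
    """Compare two passwords case-sensitively after preparation."""
    # compare_digest avoids leaking the position of the first mismatch.
    return hmac.compare_digest(
        prepare_password(supplied), prepare_password(stored)
    )
```

With these definitions, a decomposed "e" followed by COMBINING ACUTE ACCENT matches the precomposed U+00E9, while "Secret" and "secret" remain distinct.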
+ + In protocols that provide passwords as input to a cryptographic + algorithm such as a hash function, the client will need to perform + proper preparation of the password before applying the algorithm, + because the password is not available to the server in plaintext + form. + + Further discussion of password handling can be found in [RFC8265]. + +13. Interoperability Considerations + +13.1. Coded Character Sets + + It is known that some existing applications and systems do not + support the full Unicode coded character set, or even any characters + outside the ASCII repertoire [RFC20]. If two (or more) applications + or systems need to interoperate when exchanging data (e.g., for the + purpose of authenticating the combination of a username and + password), naturally they will need to have in common at least one + coded character set and the repertoire of characters being exchanged + (see [RFC6365] for definitions of these terms). Establishing such a + baseline is a matter for the application or system that uses PRECIS, + not for the PRECIS framework. + + + +Saint-Andre & Blanchet Standards Track [Page 36] + +RFC 8264 PRECIS Framework October 2017 + + +13.2. Dependency on Unicode + + The only coded character set supported by PRECIS is Unicode. If an + application or system does not support Unicode or uses a different + coded character set [RFC6365], then the PRECIS rules cannot be + applied to that application or system. + +13.3. Encoding + + Although strings that are consumed in PRECIS-based application + protocols are often encoded using UTF-8 [RFC3629], the exact encoding + is a matter for the application protocol that uses PRECIS, not for + the PRECIS framework or for specifications that define PRECIS string + classes or profiles thereof. + +13.4. 
Unicode Versions + + It is extremely important for protocol designers and application + developers to understand that various changes can occur across + versions of the Unicode Standard, and such changes can result in + instability of PRECIS categories. The following are merely a few + examples: + + o As described in [RFC6452], between Unicode 5.2 (current at the + time IDNA2008 was originally published) and Unicode 6.0, three + code points underwent changes in their General_Category, resulting + in modified handling, depending on which version of Unicode is + available on the underlying system. + + o The HasCompat() categorization of a given input string could + change if, for example, the string includes a precomposed + character that was added in a recent version of Unicode. + + o The East Asian width property, which is used in many PRECIS width + mapping rules, is not guaranteed to be stable across Unicode + versions. + +13.5. Potential Changes to Handling of Certain Unicode Code Points + + As part of the review of Unicode 7.0 for IDNA, a question was raised + about a newly added code point that led to a re-analysis of the + normalization rules used by IDNA and inherited by this document + (Section 5.2.4). Some of the general issues are described in + [IAB-Statement] and pursued in more detail in [IDNA-Unicode]. + + At the time of this writing, these issues have yet to be settled. + However, implementers need to be aware that this specification is + likely to be updated in the future to address these issues. The + potential changes include but might not be limited to the following: + + o The range of code points in the LetterDigits category + (Sections 4.2.1 and 9.1) might be narrowed. + + o Some code points with special properties that are now allowed + might be excluded. + + o More additional mapping rules (Section 5.2.2) might be defined.
+ + o Alternative normalization methods might be added. + + As described in Section 11.1, until these issues are settled, it is + reasonable for the IANA to apply the same precautionary principle + described in [IAB-Statement] to the "PRECIS Derived Property Value" + registry as is applied to the "IDNA Parameters" registry + <https://www.iana.org/assignments/idna-tables/>: that is, to not make + further updates to the registry. + + Nevertheless, implementations and deployments are unlikely to + encounter significant problems as a consequence of these issues or + potential changes if they follow the advice given in this + specification to use the more restrictive IdentifierClass whenever + possible or, if using the FreeformClass, to allow only a restricted + set of code points, particularly avoiding code points whose + implications they do not understand. + +14. References + +14.1. Normative References + + [RFC20] Cerf, V., "ASCII format for network interchange", STD 80, + RFC 20, DOI 10.17487/RFC0020, October 1969, + <https://www.rfc-editor.org/info/rfc20>. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + <https://www.rfc-editor.org/info/rfc2119>. + + [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network + Interchange", RFC 5198, DOI 10.17487/RFC5198, March 2008, + <https://www.rfc-editor.org/info/rfc5198>. + + + + + + + +Saint-Andre & Blanchet Standards Track [Page 38] + +RFC 8264 PRECIS Framework October 2017 + + + [RFC6365] Hoffman, P. and J. Klensin, "Terminology Used in + Internationalization in the IETF", BCP 166, RFC 6365, + DOI 10.17487/RFC6365, September 2011, + <https://www.rfc-editor.org/info/rfc6365>. + + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, + May 2017, <https://www.rfc-editor.org/info/rfc8174>. 
+ + [Unicode] The Unicode Consortium, "The Unicode Standard", + <http://www.unicode.org/versions/latest/>. + +14.2. Informative References + + [DerivedCoreProperties] + The Unicode Consortium, "DerivedCoreProperties- + 10.0.0.txt", Unicode Character Database, March 2017, + <http://www.unicode.org/Public/UCD/latest/ucd/ + DerivedCoreProperties.txt>. + + [Err4568] RFC Errata, Erratum ID 4568, RFC 7564, + <https://www.rfc-editor.org/errata/eid4568>. + + [IAB-Statement] + Internet Architecture Board, "IAB Statement on Identifiers + and Unicode 7.0.0", February 2015, + <https://www.iab.org/documents/ + correspondence-reports-documents/2015-2/ + iab-statement-on-identifiers-and-unicode-7-0-0/>. + + [IDNA-Unicode] + Klensin, J. and P. Faltstrom, "IDNA Update for Unicode + 7.0.0", Work in Progress, draft-klensin-idna-5892upd- + unicode70-04, March 2015. + + [PropertyAliases] + The Unicode Consortium, "PropertyAliases-10.0.0.txt", + Unicode Character Database, February 2017, + <http://www.unicode.org/Public/UCD/latest/ucd/ + PropertyAliases.txt>. + + [RFC2865] Rigney, C., Willens, S., Rubens, A., and W. Simpson, + "Remote Authentication Dial In User Service (RADIUS)", + RFC 2865, DOI 10.17487/RFC2865, June 2000, + <https://www.rfc-editor.org/info/rfc2865>. + + + + + + +Saint-Andre & Blanchet Standards Track [Page 39] + +RFC 8264 PRECIS Framework October 2017 + + + [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of + Internationalized Strings ("stringprep")", RFC 3454, + DOI 10.17487/RFC3454, December 2002, + <https://www.rfc-editor.org/info/rfc3454>. + + [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, + "Internationalizing Domain Names in Applications (IDNA)", + RFC 3490, DOI 10.17487/RFC3490, March 2003, + <https://www.rfc-editor.org/info/rfc3490>. + + [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep + Profile for Internationalized Domain Names (IDN)", + RFC 3491, DOI 10.17487/RFC3491, March 2003, + <https://www.rfc-editor.org/info/rfc3491>. 
+ + [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO + 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November + 2003, <https://www.rfc-editor.org/info/rfc3629>. + + [RFC4422] Melnikov, A., Ed. and K. Zeilenga, Ed., "Simple + Authentication and Security Layer (SASL)", RFC 4422, + DOI 10.17487/RFC4422, June 2006, + <https://www.rfc-editor.org/info/rfc4422>. + + [RFC4510] Zeilenga, K., Ed., "Lightweight Directory Access Protocol + (LDAP): Technical Specification Road Map", RFC 4510, + DOI 10.17487/RFC4510, June 2006, + <https://www.rfc-editor.org/info/rfc4510>. + + [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and + Recommendations for Internationalized Domain Names + (IDNs)", RFC 4690, DOI 10.17487/RFC4690, September 2006, + <https://www.rfc-editor.org/info/rfc4690>. + + [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax + Specifications: ABNF", STD 68, RFC 5234, + DOI 10.17487/RFC5234, January 2008, + <https://www.rfc-editor.org/info/rfc5234>. + + [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security + (TLS) Protocol Version 1.2", RFC 5246, + DOI 10.17487/RFC5246, August 2008, + <https://www.rfc-editor.org/info/rfc5246>. + + [RFC5890] Klensin, J., "Internationalized Domain Names for + Applications (IDNA): Definitions and Document Framework", + RFC 5890, DOI 10.17487/RFC5890, August 2010, + <https://www.rfc-editor.org/info/rfc5890>. + + + +Saint-Andre & Blanchet Standards Track [Page 40] + +RFC 8264 PRECIS Framework October 2017 + + + [RFC5891] Klensin, J., "Internationalized Domain Names in + Applications (IDNA): Protocol", RFC 5891, + DOI 10.17487/RFC5891, August 2010, + <https://www.rfc-editor.org/info/rfc5891>. + + [RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and + Internationalized Domain Names for Applications (IDNA)", + RFC 5892, DOI 10.17487/RFC5892, August 2010, + <https://www.rfc-editor.org/info/rfc5892>. + + [RFC5893] Alvestrand, H., Ed. and C. 
Karp, "Right-to-Left Scripts + for Internationalized Domain Names for Applications + (IDNA)", RFC 5893, DOI 10.17487/RFC5893, August 2010, + <https://www.rfc-editor.org/info/rfc5893>. + + [RFC5894] Klensin, J., "Internationalized Domain Names for + Applications (IDNA): Background, Explanation, and + Rationale", RFC 5894, DOI 10.17487/RFC5894, August 2010, + <https://www.rfc-editor.org/info/rfc5894>. + + [RFC5895] Resnick, P. and P. Hoffman, "Mapping Characters for + Internationalized Domain Names in Applications (IDNA) + 2008", RFC 5895, DOI 10.17487/RFC5895, September 2010, + <https://www.rfc-editor.org/info/rfc5895>. + + [RFC6452] Faltstrom, P., Ed. and P. Hoffman, Ed., "The Unicode Code + Points and Internationalized Domain Names for Applications + (IDNA) - Unicode 6.0", RFC 6452, DOI 10.17487/RFC6452, + November 2011, <https://www.rfc-editor.org/info/rfc6452>. + + [RFC6885] Blanchet, M. and A. Sullivan, "Stringprep Revision and + Problem Statement for the Preparation and Comparison of + Internationalized Strings (PRECIS)", RFC 6885, + DOI 10.17487/RFC6885, March 2013, + <https://www.rfc-editor.org/info/rfc6885>. + + [RFC6943] Thaler, D., Ed., "Issues in Identifier Comparison for + Security Purposes", RFC 6943, DOI 10.17487/RFC6943, May + 2013, <https://www.rfc-editor.org/info/rfc6943>. + + [RFC7564] Saint-Andre, P. and M. Blanchet, "PRECIS Framework: + Preparation, Enforcement, and Comparison of + Internationalized Strings in Application Protocols", + RFC 7564, DOI 10.17487/RFC7564, May 2015, + <https://www.rfc-editor.org/info/rfc7564>. + + + + + + +Saint-Andre & Blanchet Standards Track [Page 41] + +RFC 8264 PRECIS Framework October 2017 + + + [RFC7622] Saint-Andre, P., "Extensible Messaging and Presence + Protocol (XMPP): Address Format", RFC 7622, + DOI 10.17487/RFC7622, September 2015, + <https://www.rfc-editor.org/info/rfc7622>. + + [RFC7790] Yoneya, Y. and T. 
Nemoto, "Mapping Characters for Classes + of the Preparation, Enforcement, and Comparison of + Internationalized Strings (PRECIS)", RFC 7790, + DOI 10.17487/RFC7790, February 2016, + <https://www.rfc-editor.org/info/rfc7790>. + + [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for + Writing an IANA Considerations Section in RFCs", BCP 26, + RFC 8126, DOI 10.17487/RFC8126, June 2017, + <https://www.rfc-editor.org/info/rfc8126>. + + [RFC8265] Saint-Andre, P. and A. Melnikov, "Preparation, + Enforcement, and Comparison of Internationalized Strings + Representing Usernames and Passwords", RFC 8265, + DOI 10.17487/RFC8265, October 2017, + <https://www.rfc-editor.org/info/rfc8265>. + + [RFC8266] Saint-Andre, P., "Preparation, Enforcement, and Comparison + of Internationalized Strings Representing Nicknames", + RFC 8266, DOI 10.17487/RFC8266, October 2017, + <https://www.rfc-editor.org/info/rfc8266>. + + [UAX11] Unicode Standard Annex #11, "East Asian Width", edited by + Ken Lunde. An integral part of The Unicode Standard, + <http://unicode.org/reports/tr11/>. + + [UAX15] Unicode Standard Annex #15, "Unicode Normalization Forms", + edited by Mark Davis and Ken Whistler. An integral part + of The Unicode Standard, + <http://unicode.org/reports/tr15/>. + + [UAX9] Unicode Standard Annex #9, "Unicode Bidirectional + Algorithm", edited by Mark Davis, Aharon Lanin, and Andrew + Glass. An integral part of The Unicode Standard, + <http://unicode.org/reports/tr9/>. + + [UTR36] Unicode Technical Report #36, "Unicode Security + Considerations", edited by Mark Davis and Michel Suignard, + <http://unicode.org/reports/tr36/>. + + [UTS39] Unicode Technical Standard #39, "Unicode Security + Mechanisms", edited by Mark Davis and Michel Suignard, + <http://unicode.org/reports/tr39/>. + + + +Saint-Andre & Blanchet Standards Track [Page 42] + +RFC 8264 PRECIS Framework October 2017 + + +Appendix A. Changes from RFC 7564 + + The following changes were made from [RFC7564]. 
+ + o Recommended the Unicode toLowerCase() operation over the Unicode + toCaseFold() operation in most PRECIS applications. + + o Clarified the meaning of "preparation", and described the + motivation for including it in PRECIS. + + o Updated references. + + See [RFC7564] for a description of the differences from [RFC3454]. + +Acknowledgements + + Thanks to Martin Duerst, William Fisher, John Klensin, Christian + Schudt, and Sam Whited for their feedback. Thanks to Sam Whited also + for submitting [Err4568]. + + See [RFC7564] for acknowledgements related to the specification that + this document supersedes. + + Some algorithms and textual descriptions have been borrowed from + [RFC5892]. Some text regarding security has been borrowed from + [RFC5890], [RFC8265], and [RFC7622]. + +Authors' Addresses + + Peter Saint-Andre + Jabber.org + P.O. Box 787 + Parker, CO 80134 + United States of America + + Phone: +1 720 256 6756 + Email: stpeter@jabber.org + URI: https://www.jabber.org/ + + + Marc Blanchet + Viagenie + 246 Aberdeen + Québec, QC G1R 2E1 + Canada + + Email: Marc.Blanchet@viagenie.ca + URI: http://www.viagenie.ca/ |