Diffstat (limited to 'doc/rfc/rfc7940.txt')
-rw-r--r-- doc/rfc/rfc7940.txt | 4595
1 files changed, 4595 insertions, 0 deletions
diff --git a/doc/rfc/rfc7940.txt b/doc/rfc/rfc7940.txt
new file mode 100644
index 0000000..9f0fea8
--- /dev/null
+++ b/doc/rfc/rfc7940.txt
@@ -0,0 +1,4595 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) K. Davies
+Request for Comments: 7940 ICANN
+Category: Standards Track A. Freytag
+ISSN: 2070-1721 ASMUS, Inc.
+ August 2016
+
+
+ Representing Label Generation Rulesets Using XML
+
+Abstract
+
+ This document describes a method of representing rules for validating
+ identifier labels and alternate representations of those labels using
+ Extensible Markup Language (XML). These policies, known as "Label
+ Generation Rulesets" (LGRs), are used for the implementation of
+ Internationalized Domain Names (IDNs), for example. The rulesets are
+ used to implement and share that aspect of policy defining which
+ labels and Unicode code points are permitted for registrations, which
+ alternative code points are considered variants, and what actions may
+ be performed on labels containing those variants.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc7940.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Davies & Freytag Standards Track [Page 1]
+
+RFC 7940 Label Generation Rulesets in XML August 2016
+
+
+Copyright Notice
+
+ Copyright (c) 2016 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction ....................................................4
+ 2. Design Goals ....................................................5
+ 3. Normative Language ..............................................6
+ 4. LGR Format ......................................................6
+ 4.1. Namespace ..................................................7
+ 4.2. Basic Structure ............................................7
+ 4.3. Metadata ...................................................8
+ 4.3.1. The "version" Element ...............................8
+ 4.3.2. The "date" Element ..................................9
+ 4.3.3. The "language" Element ..............................9
+ 4.3.4. The "scope" Element ................................10
+ 4.3.5. The "description" Element ..........................10
+ 4.3.6. The "validity-start" and "validity-end" Elements ...11
+ 4.3.7. The "unicode-version" Element ......................11
+ 4.3.8. The "references" Element ...........................12
+ 5. Code Points and Variants .......................................13
+ 5.1. Sequences .................................................14
+ 5.2. Conditional Contexts ......................................15
+ 5.3. Variants ..................................................16
+ 5.3.1. Basic Variants .....................................16
+ 5.3.2. The "type" Attribute ...............................17
+ 5.3.3. Null Variants ......................................18
+ 5.3.4. Variants with Reflexive Mapping ....................19
+ 5.3.5. Conditional Variants ...............................20
+ 5.4. Annotations ...............................................22
+ 5.4.1. The "ref" Attribute ................................22
+ 5.4.2. The "comment" Attribute ............................23
+ 5.5. Code Point Tagging ........................................23
+ 6. Whole Label and Context Evaluation .............................23
+ 6.1. Basic Concepts ............................................23
+ 6.2. Character Classes .........................................25
+ 6.2.1. Declaring and Invoking Named Classes ...............25
+ 6.2.2. Tag-Based Classes ..................................26
+ 6.2.3. Unicode Property-Based Classes .....................26
+ 6.2.4. Explicitly Declared Classes ........................28
+ 6.2.5. Combined Classes ...................................29
+ 6.3. Whole Label and Context Rules .............................30
+ 6.3.1. The "rule" Element .................................31
+ 6.3.2. The Match Operators ................................32
+ 6.3.3. The "count" Attribute ..............................33
+ 6.3.4. The "name" and "by-ref" Attributes .................34
+ 6.3.5. The "choice" Element ...............................34
+ 6.3.6. Literal Code Point Sequences .......................35
+ 6.3.7. The "any" Element ..................................35
+ 6.3.8. The "start" and "end" Elements .....................35
+ 6.3.9. Example Context Rule from IDNA Specification .......36
+ 6.4. Parameterized Context or When Rules .......................37
+ 6.4.1. The "anchor" Element ...............................37
+ 6.4.2. The "look-behind" and "look-ahead" Elements ........38
+ 6.4.3. Omitting the "anchor" Element ......................40
+ 7. The "action" Element ...........................................40
+ 7.1. The "match" and "not-match" Attributes ....................41
+ 7.2. Actions with Variant Type Triggers ........................41
+ 7.2.1. The "any-variant", "all-variants", and
+ "only-variants" Attributes .........................41
+ 7.2.2. Example from Tables in the Style of RFC 3743 .......44
+ 7.3. Recommended Disposition Values ............................45
+ 7.4. Precedence ................................................45
+ 7.5. Implied Actions ...........................................45
+ 7.6. Default Actions ...........................................46
+ 8. Processing a Label against an LGR ..............................47
+ 8.1. Determining Eligibility for a Label .......................47
+ 8.1.1. Determining Eligibility Using Reflexive
+ Variant Mappings ...................................47
+ 8.2. Determining Variants for a Label ..........................48
+ 8.3. Determining a Disposition for a Label or Variant Label ....49
+ 8.4. Duplicate Variant Labels ..................................50
+ 8.5. Checking Labels for Collision .............................50
+ 9. Conversion to and from Other Formats ...........................51
+ 10. Media Type ....................................................51
+ 11. IANA Considerations ...........................................52
+ 11.1. Media Type Registration ..................................52
+ 11.2. URN Registration .........................................53
+ 11.3. Disposition Registry .....................................53
+ 12. Security Considerations .......................................54
+ 12.1. LGRs Are Only a Partial Remedy for Problem Space .........54
+ 12.2. Computational Expense of Complex Tables ..................54
+ 13. References ....................................................55
+ 13.1. Normative References .....................................55
+ 13.2. Informative References ...................................56
+ Appendix A. Example Tables ........................................58
+ Appendix B. How to Translate Tables Based on RFC 3743 into the
+ XML Format ............................................63
+ Appendix C. Indic Syllable Structure Example ......................68
+ C.1. Reducing Complexity .......................................70
+ Appendix D. RELAX NG Compact Schema ...............................71
+ Acknowledgements ..................................................82
+ Authors' Addresses ................................................82
+
+1. Introduction
+
+ This document specifies a method of using Extensible Markup Language
+ (XML) to describe Label Generation Rulesets (LGRs). LGRs are
+ algorithms used to determine whether, and under what conditions, a
+ given identifier label is permitted, based on the code points it
+ contains and their context. These algorithms comprise a list of
+ permissible code points, variant code point mappings, and a set of
+ rules that act on the code points and mappings. LGRs form part of an
+ administrator's policies. In deploying Internationalized Domain
+ Names (IDNs), they have also been known as IDN tables or variant
+ tables.
+
+ There are other kinds of policies relating to labels that are not
+ normally covered by LGRs and are therefore not necessarily
+ representable by the XML format described here. These include, but
+ are not limited to, policies around trademarks, or prohibition of
+ fraudulent or objectionable words.
+
+ Administrators of the zones for top-level domain registries have
+ historically published their LGRs using ASCII text or HTML. The
+ formatting of these documents has been loosely based on the format
+ used for the Language Variant Table described in [RFC3743].
+ [RFC4290] also provides a "model table format" that describes a
+ similar set of functionality. Common to these formats is that the
+ algorithms used to evaluate the data therein are implicit or
+ specified elsewhere.
+
+ Through the first decade of IDN deployment, experience has shown that
+ LGRs derived from these formats are difficult to consistently
+ implement and compare, due to their differing formats. A universal
+ format, such as one using a structured XML format, will assist by
+ improving machine readability, consistency, reusability, and
+ maintainability of LGRs.
+
+ When used to represent a simple list of permitted code points, the
+ format is quite straightforward. At the cost of some complexity in
+ the resulting file, it also allows for an implementation of more
+ sophisticated handling of conditional variants that reflects the
+ known requirements of current zone administrator policies.
+
+ Another feature of this format is that it allows many of the
+ algorithms to be made explicit and machine implementable. A
+ remaining small set of implicit algorithms is described in this
+ document to allow commonality in implementation.
+
+ While the predominant usage of this specification is to represent IDN
+ label policy, the format is not limited to IDN usage and may also be
+ used for describing ASCII domain name label rulesets, or other types
+ of identifier labels beyond those used for domain names.
+
+2. Design Goals
+
+ The following goals informed the design of this format:
+
+ o The format needs to be implementable in a reasonably
+ straightforward manner in software.
+
+ o The format should be able to be automatically checked for
+ formatting errors, so that common mistakes can be caught.
+
+ o An LGR needs to be able to express the set of valid code points
+ that are allowed for registration under a specific administrator's
+ policies.
+
+ o An LGR needs to be able to express computed alternatives to a
+ given identifier based on mapping relationships between code
+ points, whether one-to-one or many-to-many. These computed
+ alternatives are commonly known as "variants".
+
+ o Variant code points should be able to be tagged with explicit
+ dispositions or categories that can be used to support registry
+ policy (such as whether to allocate the computed variant or to
+ merely block it from usage or registration).
+
+ o Variants and code points must be able to be stipulated based on
+ contextual information. For example, some variants may only be
+ applicable when they follow a certain code point or when the code
+ point is displayed in a specific presentation form.
+
+ o The data contained within an LGR must be able to be interpreted
+ unambiguously, so that independent implementations that utilize
+ the contents will arrive at the same results.
+
+ o To the largest extent possible, policy rules should be able to be
+ specified in the XML format without relying on hidden or built-in
+ algorithms in implementations.
+
+ o LGRs should be suitable for comparison and reuse, such that one
+ could easily compare the contents of two or more to see the
+ differences, to merge them, and so on.
+
+ o As many existing IDN tables as practicable should be able to be
+ migrated to the LGR format with all applicable interpretation
+ logic retained.
+
+ These requirements are partly derived from reviewing the existing
+ corpus of published IDN tables, plus the requirements of ICANN's work
+ to implement an LGR for the DNS root zone [LGR-PROCEDURE]. In
+ particular, Section B of that document identifies five specific
+ requirements for an LGR methodology.
+
+ The syntax and rules in [RFC5892] and [RFC3743] were also reviewed.
+
+ It is explicitly not the goal of this format to stipulate what code
+ points should be listed in an LGR by a zone administrator. Which
+ registration policies are used for a particular zone are outside the
+ scope of this memo.
+
+3. Normative Language
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [RFC2119].
+
+4. LGR Format
+
+ An LGR is expressed as a well-formed XML document [XML] that conforms
+ to the schema defined in Appendix D.
+
+ As XML is case sensitive, an LGR must be authored with the correct
+ casing. For example, the XML element names MUST be in lowercase as
+ described in this specification, and matching of attribute values is
+ only performed in a case-sensitive manner.
+
+ A document that is not well-formed, is non-conforming, or violates
+ other constraints specified in this specification MUST be rejected.
+
+4.1. Namespace
+
+ The XML Namespace URI is "urn:ietf:params:xml:ns:lgr-1.0".
+
+ See Section 11.2 for more information.
+
+4.2. Basic Structure
+
+ The basic XML framework of the document is as follows:
+
+ <?xml version="1.0"?>
+ <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
+ ...
+ </lgr>
+
+ The "lgr" element contains up to three sub-elements or sections.
+ First is an optional "meta" element that contains all metadata
+ associated with the LGR, such as its authorship, what it is used for,
+ implementation notes, and references. This is followed by a required
+ "data" element that contains the substantive code point data.
+ Finally, an optional "rules" element contains information on rules
+ for evaluating labels, if any, along with "action" elements providing
+ for the disposition of labels and computed variant labels.
+
+ <?xml version="1.0"?>
+ <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
+ <meta>
+ ...
+ </meta>
+ <data>
+ ...
+ </data>
+ <rules>
+ ...
+ </rules>
+ </lgr>
+
+ A document MUST contain exactly one "lgr" element. Each "lgr"
+ element MUST contain zero or one "meta" element, exactly one "data"
+ element, and zero or one "rules" element; and these three elements
+ MUST be in that order.
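
The cardinality and ordering constraints above can be sketched in Python
with the standard library alone. The function name
check_basic_structure and its message strings are illustrative
assumptions, not part of this specification, and the sketch is no
substitute for validating against the schema in Appendix D.

```python
import xml.etree.ElementTree as ET

LGR_NS = "urn:ietf:params:xml:ns:lgr-1.0"


def check_basic_structure(xml_text):
    """Return a list of problems; an empty list means the top-level
    layout looks acceptable. ET.fromstring raises ParseError when the
    input is not well-formed XML."""
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}lgr" % LGR_NS:
        return ["root element is not 'lgr' in the LGR namespace"]
    problems = []
    prefix = "{%s}" % LGR_NS
    allowed = ["meta", "data", "rules"]
    # Local names of the direct children, in document order.
    children = []
    for child in root:
        name = child.tag[len(prefix):] if child.tag.startswith(prefix) else None
        if name in allowed:
            children.append(name)
        else:
            problems.append("unexpected child element: " + child.tag)
    # meta: zero or one, data: exactly one, rules: zero or one.
    for name, low, high in (("meta", 0, 1), ("data", 1, 1), ("rules", 0, 1)):
        count = children.count(name)
        if not low <= count <= high:
            problems.append("%s occurs %d times" % (name, count))
    # The sections that are present must appear as meta, data, rules.
    if children != sorted(children, key=allowed.index):
        problems.append("sections are out of order")
    return problems
```

A caller would treat any non-empty result, or a ParseError, as grounds
for rejecting the document.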
+
+ Some elements that are direct or nested child elements of the "rules"
+ element MUST be placed in a specific relative order to other elements
+ for the LGR to be valid. An LGR that violates these constraints MUST
+ be rejected. In other cases, changing the ordering would result in a
+ valid, but different, specification.
+
+ In the following descriptions, required, non-repeating elements or
+ attributes are generally not called out explicitly, in contrast to
+ "OPTIONAL" ones, or those that "MAY" be repeated. For attributes
+ that take lists as values, the elements MUST be space-separated.
+
+4.3. Metadata
+
+ The "meta" element expresses metadata associated with the LGR, and
+ the element SHOULD be included so that the associated metadata are
+ available as part of the LGR and cannot become disassociated. The
+ following subsections describe elements that may appear within the
+ "meta" element.
+
+ The "meta" element can be used to identify the author or relevant
+ contact person, explain the intended usage of the LGR, and provide
+ implementation notes as well as references. Detailed metadata allow
+ the LGR document to become self-documenting -- for example, if
+ rendered in a human-readable format by an appropriate tool.
+
+ Providing metadata pertaining to the date and version of the LGR is
+ particularly encouraged to make it easier for interoperating
+ consumers to ensure that they are using the correct LGR.
+
+ With the exception of the "unicode-version" element, the data
+ contained within is not required by software consuming the LGR in
+ order to calculate valid labels or to calculate variants. If
+ present, the "unicode-version" element MUST be used by a consumer of
+ the table to identify that it has the correct Unicode property data
+ to perform operations on the table. This ensures that possible
+ differences in code point properties between editions of the Unicode
+ Standard do not impact the product of calculations utilizing an LGR.
+
+4.3.1. The "version" Element
+
+ The "version" element is OPTIONAL. It is used to uniquely
+ identify each version of the LGR. No specific format is required,
+ but it is RECOMMENDED that it be the decimal representation of a
+ single positive integer, which is incremented with each revision of
+ the file.
+
+ An example of a typical first edition of a document:
+
+ <version>1</version>
+
+ The "version" element may have an OPTIONAL "comment" attribute.
+
+ <version comment="draft">1</version>
+
+4.3.2. The "date" Element
+
+ The OPTIONAL "date" element is used to identify the date the LGR was
+ posted. The contents of this element MUST be a valid ISO 8601
+ "full-date" string as described in [RFC3339].
+
+ Example of a date:
+
+ <date>2009-11-01</date>
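
A consumer might check the "full-date" form along these lines. This is
a sketch only: the helper name parse_full_date is an assumption, and the
explicit pattern is used because more lenient parsers may accept forms
such as "2009-1-1" that are not "full-date" strings.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
FULL_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_full_date(text):
    """Return a datetime.date for a YYYY-MM-DD string, or raise ValueError."""
    match = FULL_DATE.fullmatch(text)
    if match is None:
        raise ValueError("not in YYYY-MM-DD form: %r" % text)
    year, month, day = (int(part) for part in match.groups())
    # date() itself rejects impossible dates such as February 30.
    return date(year, month, day)
```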
+
+4.3.3. The "language" Element
+
+ Each OPTIONAL "language" element identifies a language or script for
+ which the LGR is intended. The value of the "language" element MUST
+ be a valid language tag as described in [RFC5646]. The tag may refer
+ to a script plus undefined language if the LGR is not intended for a
+ specific language.
+
+ Example of an LGR for the English language:
+
+ <language>en</language>
+
+ If the LGR applies to a script rather than a specific language, the
+ "und" language tag SHOULD be used followed by the relevant script
+ subtag from [RFC5646]. For example, for a Cyrillic script LGR:
+
+ <language>und-Cyrl</language>
+
+ If the LGR covers a set of multiple languages or scripts, the
+ "language" element MAY be repeated. However, for cases of a
+ script-specific LGR exhibiting insignificant admixture of code points
+ from other scripts, it is RECOMMENDED to use a single "language"
+ element identifying the predominant script. In the exceptional case
+ of a multi-script LGR where no script is predominant, use Zyyy
+ (Common):
+
+ <language>und-Zyyy</language>
+
+4.3.4. The "scope" Element
+
+ This OPTIONAL element refers to a scope, such as a domain, to which
+ this policy is applied. The "type" attribute specifies the type of
+ scope being defined. A type of "domain" means that the scope is a
+ domain that represents the apex of the DNS zone to which the LGR is
+ applied. For that type, the content of the "scope" element MUST be a
+ domain name written relative to the root zone, in presentation format
+ with no trailing dot. However, in the unique case of the DNS root
+ zone, it is represented as ".".
+
+ <scope type="domain">example.com</scope>
+
+ Multiple "scope" elements may be used -- for example, to reflect a
+ list of domains to which the LGR is applied.
+
+ No other values of the "type" attribute are defined by this
+ specification; however, this specification can be used for
+ applications other than domain names. Implementers of LGRs for
+ applications other than domain names SHOULD define the scope
+ extension grammar in an IETF specification or use XML namespaces to
+ distinguish their scoping mechanism distinctly from the base LGR
+ namespace. An explanation of any custom usage of the scope in the
+ "description" element is RECOMMENDED.
+
+ <scope xmlns="http://example.com/ns/scope/1.0">
+ ... content per alternate namespace ...
+ </scope>
+
+4.3.5. The "description" Element
+
+ The "description" element is an OPTIONAL, free-form element that
+ contains any additional relevant description that is useful for the
+ user in its interpretation. Typically, this field contains
+ authorship information, as well as additional context on how the LGR
+ was formulated and how it applies, such as citations and references
+ that apply to the LGR as a whole.
+
+ This field should not be relied upon for providing instructions on
+ how to parse or utilize the data contained elsewhere in the
+ specification. Authors of tables should expect that software
+ applications that parse and use LGRs will not use the "description"
+ element to condition the application of the LGR's data and rules.
+
+ The element has an OPTIONAL "type" attribute, which refers to the
+ Internet media type [RFC2045] of the enclosed data. Typical types
+ would be "text/plain" or "text/html". The attribute SHOULD be a
+ valid media type. If supplied, it will be assumed that the contents
+ are of that media type. If the description lacks a "type" value, it
+ will be assumed to be plain text ("text/plain").
+
+4.3.6. The "validity-start" and "validity-end" Elements
+
+ The "validity-start" and "validity-end" elements are OPTIONAL
+ elements that give, respectively, the date from which the contents of
+ the LGR become valid (are used in registry policy) and the date on
+ which the contents of the LGR cease to be used.
+
+ The dates MUST conform to the "full-date" format described in
+ Section 5.6 of [RFC3339].
+
+ <validity-start>2014-03-12</validity-start>
+
+4.3.7. The "unicode-version" Element
+
+ Whenever an LGR depends on character properties from a given version
+ of the Unicode Standard, the version number used in creating the LGR
+ MUST be listed in the form x.y.z, where x, y, and z are positive
+ decimal integers (see [Unicode-Versions]). If any software
+ processing the table does not have access to character property data
+ of the requisite version, it MUST NOT perform any operations relating
+ to whole-label evaluation relying on Unicode character properties
+ (Section 6.2.3).
+
+ The value of a given Unicode character property may change between
+ versions of the Unicode Character Database [UAX44], unless such
+ change has been explicitly disallowed in [Unicode-Stability]. It is
+ RECOMMENDED to only reference properties defined as stable or
+ immutable. As an alternative to referencing the property, the
+ information can be presented explicitly in the LGR.
+
+ <unicode-version>6.3.0</unicode-version>
+
+ It is not necessary to include a "unicode-version" element for LGRs
+ that do not make use of Unicode character properties; however, it is
+ RECOMMENDED.
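
The version gate described in this section could be sketched as
follows. Two assumptions are made here and are not stated by the text:
"the requisite version" is read as an exact match, and each component
may be zero, as in the "6.3.0" example. The function name is
hypothetical; unicodedata.unidata_version is the version of the Unicode
Character Database bundled with the running Python interpreter.

```python
import re
import unicodedata

# Three dot-separated decimal components, for example "6.3.0".
VERSION = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")


def property_classes_usable(lgr_version, available=unicodedata.unidata_version):
    """Return True when property data of exactly the declared version is
    at hand, so that property-based classes may be evaluated."""
    if VERSION.fullmatch(lgr_version) is None:
        raise ValueError("not in x.y.z form: %r" % lgr_version)
    # Assumption: only an exact version match counts as "requisite".
    return lgr_version == available
```

When the function returns False, a processor following this section
would skip whole-label evaluation that relies on Unicode character
properties rather than guess with data from another version.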
+
+4.3.8. The "references" Element
+
+ An LGR may define a list of references that are used to associate
+ various individual elements in the LGR to one or more normative
+ references. A common use for references is to annotate that code
+ points belong to an externally defined collection or standard or to
+ give normative references for rules.
+
+ References are specified in an OPTIONAL "references" element
+ containing one or more "reference" elements, each with a unique "id"
+ attribute. It is RECOMMENDED that the "id" attribute be a zero-based
+ integer; however, in addition to digits 0-9, it MAY contain uppercase
+ letters A-Z, as well as a period, hyphen, colon, or underscore. The
+ value of each "reference" element SHOULD be the citation of a
+ standard, dictionary, or other specification in any suitable format.
+ In addition to an "id" attribute, a "reference" element MAY have a
+ "comment" attribute for an optional free-form annotation.
+
+ <references>
+ <reference id="0">The Unicode Consortium. The Unicode
+ Standard, Version 8.0.0, (Mountain View, CA: The Unicode
+ Consortium, 2015. ISBN 978-1-936213-10-8)
+ http://www.unicode.org/versions/Unicode8.0.0/</reference>
+ <reference id="1">Big-5: Computer Chinese Glyph and Character
+ Code Mapping Table, Technical Report C-26, 1984</reference>
+ <reference id="2" comment="synchronized with Unicode 6.1">
+ ISO/IEC
+ 10646:2012 3rd edition</reference>
+ ...
+ </references>
+ ...
+ <data>
+ <char cp="0620" ref="0 2" />
+ ...
+ </data>
+
+ A reference is associated with an element by using its id as part of
+ an optional "ref" attribute (see Section 5.4.1). The "ref" attribute
+ may be used with many kinds of elements in the "data" or "rules"
+ sections of the LGR, most notably those defining code points,
+ variants, and rules. However, a "ref" attribute may not occur in
+ certain kinds of elements, including references to named character
+ classes or rules. See below for the description of these elements.
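
A tool could cross-check "id" and "ref" values roughly as shown below.
The sketch assumes that a "ref" value naming an undeclared "id" should
be reported, which this section does not state outright. The function
name and message strings are hypothetical, and the sketch does not
check which element kinds are allowed to carry a "ref" attribute.

```python
import re
import xml.etree.ElementTree as ET

NS = {"lgr": "urn:ietf:params:xml:ns:lgr-1.0"}
# Digits, uppercase letters, period, hyphen, colon, or underscore.
ID_PATTERN = re.compile(r"[-_.:0-9A-Z]+")


def check_references(root):
    """Return a list of problems with reference ids and their uses."""
    problems = []
    declared = set()
    for ref in root.findall("lgr:meta/lgr:references/lgr:reference", NS):
        ref_id = ref.get("id", "")
        if ID_PATTERN.fullmatch(ref_id) is None:
            problems.append("malformed id: %r" % ref_id)
        elif ref_id in declared:
            problems.append("duplicate id: %r" % ref_id)
        declared.add(ref_id)
    # A "ref" attribute holds a space-separated list of ids.
    for element in root.iter():
        for used in element.get("ref", "").split():
            if used not in declared:
                problems.append("undeclared reference: %r" % used)
    return problems
```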
+
+5. Code Points and Variants
+
+ The bulk of an LGR is a description of which set of code points is
+ eligible for a given label. For rulesets that perform operations
+ that result in potential variants, the code point-level relationships
+ between variants need to also be described.
+
+ The code point data is collected within the "data" element. Within
+ this element, a series of "char" and "range" elements describe
+ eligible code points or ranges of code points, respectively.
+ Collectively, these are known as the repertoire.
+
+ Discrete permissible code points or code point sequences (see
+ Section 5.1) are declared with a "char" element. Here is a minimal
+ example declaration for a single code point, with the code point
+ value given in the "cp" attribute:
+
+ <char cp="002D"/>
+
+ As described below, a full declaration for a "char" element, whether
+ it is used for a single code point or for a sequence (see
+ Section 5.1), may have optional child elements defining variants.
+ Both the "char" and "range" elements can take a number of optional
+ attributes for conditional inclusion, commenting, cross-referencing,
+ and character tagging, as described below.
+
+ Ranges of permissible code points may be declared with a "range"
+ element, as in this minimal example:
+
+ <range first-cp="0030" last-cp="0039"/>
+
+ The range is inclusive of the first and last code points. Any
+ additional attributes defined for a "range" element act as if applied
+ to each code point within. A "range" element has no child elements.
+
+ It is always possible to substitute a list of individually specified
+ code points for a "range" element. The reverse is not necessarily
+ the case. Whenever such a substitution is possible, it makes no
+ difference in processing the data. Tools reading or writing the LGR
+ format are free to aggregate sequences of consecutive code points of
+ the same properties into "range" elements.
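
The equivalence between a "range" element and a list of individual code
points can be illustrated with a short sketch. The function names are
hypothetical, and the sketch assumes that every code point passed in
carries identical attributes, since only such code points may be
aggregated.

```python
def expand_range(first, last):
    """List every code point of an inclusive range."""
    return list(range(first, last + 1))


def aggregate_ranges(code_points):
    """Collapse code points into inclusive (first, last) pairs, merging
    runs of consecutive values."""
    ranges = []
    for cp in sorted(set(code_points)):
        if ranges and cp == ranges[-1][1] + 1:
            # Extend the current run by one code point.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges
```

Expanding a range and aggregating the result returns the original
range, which is why the substitution makes no difference in processing.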
+
+ Code points MUST be represented according to the standard Unicode
+ convention but without the prefix "U+": they are expressed in
+ uppercase hexadecimal and are zero-padded to a minimum of 4 digits.
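
A sketch of reading and writing a single code point in this notation
follows. The upper bound of six digits and the check against U+10FFFF
are assumptions drawn from the size of the Unicode code space rather
than from the sentence above, and both function names are hypothetical.

```python
import re

# Uppercase hexadecimal, at least four digits, no "U+" prefix.
CODE_POINT = re.compile(r"[0-9A-F]{4,6}")


def parse_code_point(text):
    """Return the integer value of a code point string such as "002D"."""
    if CODE_POINT.fullmatch(text) is None:
        raise ValueError("malformed code point: %r" % text)
    value = int(text, 16)
    if value > 0x10FFFF:
        raise ValueError("beyond the Unicode code space: %r" % text)
    return value


def format_code_point(value):
    """Return uppercase hexadecimal, zero-padded to at least 4 digits."""
    return "%04X" % value
```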
+
+ The rationale for not allowing other encoding formats, including
+ native Unicode encoding in XML, is explored in [UAX42]. The XML
+ conventions used in this format, such as element and attribute names,
+ mirror that document where practical and reasonable to do so. It is
+ RECOMMENDED to list all "char" elements in ascending order of the
+ "cp" attribute. Not doing so makes it unnecessarily difficult for
+ authors and reviewers to check for errors, such as duplications, or
+ to review and compare against listing of code points in other
+ documents and specifications.
+
+ All "char" elements in the "data" section MUST have distinct "cp"
+ attributes. The "range" elements MUST NOT specify code point ranges
+ that overlap either another range or any single code point "char"
+ elements. An LGR that defines the same code point more than once by
+ any combination of "char" or "range" elements MUST be rejected.
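
The uniqueness constraints in the preceding paragraph could be checked
with a sketch like the one below. It assumes the "char" and "range"
elements have already been parsed: each "char" is a tuple of integers
(length one for a single code point) and each range is an inclusive
(first, last) pair. The function name and conflict labels are
hypothetical.

```python
def find_repertoire_conflicts(chars, ranges):
    """Return a list of conflicts; an empty list means none were found."""
    conflicts = []
    seen = set()
    for cp in chars:
        if cp in seen:
            conflicts.append(("duplicate char", cp))
        seen.add(cp)
    # After sorting by start, any overlap shows up between neighbours,
    # so at least one conflict is reported whenever ranges overlap.
    ordered = sorted(ranges)
    for (first_a, last_a), (first_b, last_b) in zip(ordered, ordered[1:]):
        if first_b <= last_a:
            conflicts.append(
                ("overlapping ranges", (first_a, last_a), (first_b, last_b)))
    # Only single code points can collide with a range; sequences cannot.
    for cp in sorted(seen):
        if len(cp) == 1:
            for first, last in ordered:
                if first <= cp[0] <= last:
                    conflicts.append(("char inside range", cp, (first, last)))
    return conflicts
```

An LGR for which this returns a non-empty list would be rejected.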
+
+5.1. Sequences
+
+ A sequence of two or more code points may be specified in an LGR --
+ for example, when defining the source for n:m variant mappings.
+ Another use of sequences would be in cases when the exact sequence of
+ code points is required to occur in order for the constituent
+ elements to be eligible, such as when some code point is only
+ eligible when preceded or followed by a certain code point. The
+ following would define the eligibility of the MIDDLE DOT (U+00B7)
+ only when both preceded and followed by the LATIN SMALL LETTER L
+ (U+006C):
+
+ <char cp="006C 00B7 006C" comment="Catalan middle dot"/>
+
+ All sequences defined this way must be distinct, but sub-sequences
+ may be defined. Thus, the sequence defined here may coexist with
+ single code point definitions such as:
+
+ <char cp="006C" />
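
How a sequence and its sub-sequences interact can be illustrated with a
simplified coverage check. This is only a sketch under stated
assumptions: it treats a label as covered when it can be split, left to
right, into entries of the repertoire, and it ignores contexts,
variants, and rules. The function name is hypothetical, and the sketch
is not the eligibility procedure of this specification.

```python
def is_covered(label, repertoire):
    """Return True when the label splits into repertoire entries.

    repertoire is a set of tuples of integers, one tuple per "char"
    element (length one for a single code point)."""
    cps = tuple(ord(ch) for ch in label)
    longest = max((len(entry) for entry in repertoire), default=0)
    # reachable[i] is True when the first i code points can be split.
    reachable = [True] + [False] * len(cps)
    for end in range(1, len(cps) + 1):
        for size in range(1, min(longest, end) + 1):
            if reachable[end - size] and cps[end - size:end] in repertoire:
                reachable[end] = True
                break
    return reachable[len(cps)]
```

With a repertoire holding "0061", "006C", and the sequence
"006C 00B7 006C", the MIDDLE DOT is covered between two "l" code points
but not between two "a" code points.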
+
+ As an alternative to using sequences to define a required context, a
+ "char" or "range" element may specify a conditional context using an
+ optional "when" attribute as described below in Section 5.2. Using a
+ conditional context is more flexible because a context is not limited
+ to a specific sequence of code points. In addition, using a context
+ allows the choice of specifying either a prohibited or a required
+ context.
+
+5.2. Conditional Contexts
+
+ A conditional context is specified by a rule that must be satisfied
+ (or, alternatively, must not be satisfied) for a code point in a
+ given label, often at a particular location in a label.
+
+ To specify a conditional context, either a "when" or "not-when"
+ attribute may be used. The value of each "when" or "not-when"
+ attribute is a context rule as described below in Section 6.3. This
+ rule can be a rule evaluating the whole label or a parameterized
+ context rule. The context condition is met when the rule specified
+ in the "when" attribute is matched or when the rule in the "not-when"
+ attribute fails to match. It is an error to reference a rule that is
+ not actually defined in the "rules" element.
+
+ A parameterized context rule (see Section 6.4) defines the context
+ immediately surrounding a given code point; unlike a sequence, the
+ context is not limited to a specific fixed code point but, for
+ example, may designate any member of a certain character class or a
+ code point that has a certain Unicode character property.
+
+ Given a suitable definition of a parameterized context rule named
+ "follows-virama", this example specifies that a ZERO WIDTH JOINER
+ (U+200D) is restricted to immediately follow any of several code
+ points classified as virama:
+
+ <char cp="200D" when="follows-virama" />
+
+ For a complete example, see Appendix A.
+
+ In contrast, a whole label rule (see Section 6.3) specifies a
+ condition to be met by the entire label -- for example, that it must
+ contain at least one code point from a given script anywhere in the
+ label. In the following example, no digit from either range may
+ occur in a label that mixes digits from both ranges:
+
+ <data>
+   <range first-cp="0660" last-cp="0669" not-when="mixed-digits"
+          tag="arabic-indic-digits" />
+   <range first-cp="06F0" last-cp="06F9" not-when="mixed-digits"
+          tag="extended-arabic-indic-digits" />
+ </data>
+
+ (See Section 6.3.9 for an example of the "mixed-digits" rule.)
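+
+ The evaluation of "when" and "not-when" can be sketched in Python as
+ follows. This is only an illustration under stated assumptions: the
+ "rules" element is modeled as a hypothetical dictionary of predicate
+ functions, and "mixed_digits" is a hand-written stand-in for the
+ rule of the same name, not its actual definition:
+

```python
def context_met(rules, label, pos, when=None, not_when=None):
    """Return True if the conditional context for label[pos] is met.

    `rules` maps a rule name to a predicate taking (label, pos); it is
    a hypothetical stand-in for the "rules" element.
    """
    if when is not None and not_when is not None:
        raise ValueError('"when" and "not-when" are mutually exclusive')
    name = when if when is not None else not_when
    if name is None:
        return True  # no conditional context specified
    if name not in rules:
        raise ValueError("undefined rule: " + name)
    matched = rules[name](label, pos)
    # "when" requires a match; "not-when" requires a failure to match.
    return matched if when is not None else not matched


ARABIC_INDIC = range(0x0660, 0x066A)
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)


def mixed_digits(label, pos):
    # Whole label rule: the position is ignored.
    return (any(cp in ARABIC_INDIC for cp in label)
            and any(cp in EXTENDED_ARABIC_INDIC for cp in label))


rules = {"mixed-digits": mixed_digits}
ok = context_met(rules, [0x0661, 0x0662], 0, not_when="mixed-digits")
bad = context_met(rules, [0x0661, 0x06F2], 0, not_when="mixed-digits")
```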
+
+ The OPTIONAL "when" or "not-when" attributes are mutually exclusive.
+ They MAY be applied to both "char" and "range" elements in the "data"
+ element, including "char" elements defining sequences of code points,
+ as well as to "var" elements (see Section 5.3.5).
+
+ If a label contains one or more code points that fail to satisfy a
+ conditional context, the label is invalid (see Section 7.5). For
+ variants, the conditional context restricts the definition of the
+ variant to the case where the condition is met. Outside the
+ specified context, a variant is not defined.
+
+5.3. Variants
+
+ Most LGRs determine only simple code point eligibility, and for
+ them, the elements described so far are the only ones required in
+ their "data" section. Others additionally specify a
+ mapping of code points to other code points, known as "variants".
+ What constitutes a variant code point is a matter of policy and
+ varies for each implementation. The following examples are intended
+ to demonstrate the syntax; they are not necessarily typical.
+
+5.3.1. Basic Variants
+
+ Variant code points are specified using one or more "var" elements as
+ children of a "char" element. The target mapping is specified using
+ the "cp" attribute. Other, optional attributes for the "var" element
+ are described below.
+
+ For example, to map LATIN SMALL LETTER V (U+0076) as a variant of
+ LATIN SMALL LETTER U (U+0075):
+
+ <char cp="0075">
+   <var cp="0076"/>
+ </char>
+
+ A sequence of multiple code points can be specified as a variant of a
+ single code point. For example, the sequence of LATIN SMALL LETTER O
+ (U+006F) then LATIN SMALL LETTER E (U+0065) might hypothetically be
+ specified as a variant for a LATIN SMALL LETTER O WITH DIAERESIS
+ (U+00F6) as follows:
+
+ <char cp="00F6">
+   <var cp="006F 0065"/>
+ </char>
+
+ The source and target of a variant mapping may both be sequences but
+ not ranges.
+
+ If the source of one mapping is a prefix sequence of the source for
+ another, both variant mappings will be considered at the same
+ location in the input label when generating permuted variant labels.
+ If poorly designed, an LGR containing such an instance of a prefix
+ relation could generate multiple instances of the same variant label
+ for the same original label, but with potentially different
+ dispositions. Any duplicate variant labels encountered MUST be
+ treated as an error (see Section 8.4).
+
+ The "var" element specifies variant mappings in only one direction,
+ even though the variant relation is usually considered symmetric;
+ that is, if A is a variant of B, then B should also be a variant of
+ A. The format requires that the inverse of the variant be given
+ explicitly to fully specify symmetric variant relations in the LGR.
+ This has the beneficial side effect of making the symmetry explicit:
+
+ <char cp="006F 0065">
+   <var cp="00F6"/>
+ </char>
+
+ Variant relations are normally not only symmetric but also
+ transitive. If A is a variant of B and B is a variant of C, then A
+ is also a variant of C. As with symmetry, these transitive relations
+ are only part of the LGR if spelled out explicitly. Implementations
+ that require an LGR to be symmetric and transitive should verify this
+ mechanically.
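+
+ A mechanical check of this kind might look like the following Python
+ sketch. It is a simplification: mappings are modeled as a set of
+ (source, target) pairs of code point sequences, conditional contexts
+ and types are ignored, and the function names are hypothetical:
+

```python
def missing_symmetric(mappings):
    """Return the reverse pairs needed to make the relation symmetric."""
    return {(t, s) for (s, t) in mappings if (t, s) not in mappings}


def missing_transitive(mappings):
    """Return pairs (a, c) implied by a->b and b->c but not listed."""
    missing = set()
    for (a, b) in mappings:
        for (b2, c) in mappings:
            # Reflexive pairs (a == c) are not required here.
            if b == b2 and a != c and (a, c) not in mappings:
                missing.add((a, c))
    return missing


# U+00F6 and the sequence U+006F U+0065 from the examples above.
O_UML, OE = (0x00F6,), (0x006F, 0x0065)
one_way = {(O_UML, OE)}
both_ways = {(O_UML, OE), (OE, O_UML)}
```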
+
+ All variant mappings are unique. For a given "char" element, all
+ "var" elements MUST have a unique combination of "cp", "when", and
+ "not-when" attributes. It is RECOMMENDED to list the "var" elements
+ in ascending order of their target code point sequence. (For "when"
+ and "not-when" attributes, see Section 5.3.5.)
+
+5.3.2. The "type" Attribute
+
+ Variants may be tagged with an OPTIONAL "type" attribute. The value
+ of the "type" attribute may be any non-empty value not starting with
+ an underscore and not containing spaces. This value is used to
+ resolve the disposition of any variant labels created using a given
+ variant. (See Section 7.2.)
+
+ By default, the values of the "type" attribute directly describe the
+ target policy status (disposition) for a variant label that was
+ generated using a particular variant, with any variant label being
+ assigned a disposition corresponding to the most restrictive variant
+ type. Several conventional disposition values are predefined below
+ in Section 7. Whenever these values can represent the desired
+ policy, they SHOULD be used.
+
+ <char cp="767C">
+   <var cp="53D1" type="allocatable"/>
+   <var cp="5F42" type="blocked"/>
+   <var cp="9AEA" type="blocked"/>
+   <var cp="9AEE" type="blocked"/>
+ </char>
+
+ By default, if a variant label contains any instance of one of the
+ variants of type "blocked", the label would be blocked, but if it
+ contained only instances of variants to be allocated, it could be
+ allocated. See the discussion about implied actions in Section 7.6.
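+
+ The default behavior described above can be sketched in Python. The
+ ordering list below is an assumption made for illustration and only
+ covers the two types used in the example; the authoritative
+ precedence is the one given in Section 7:
+

```python
# Assumed ordering, most restrictive first (illustration only).
RESTRICTIVENESS = ["blocked", "allocatable"]


def default_disposition(variant_types):
    """Return the most restrictive type among the variants used."""
    return min(variant_types, key=RESTRICTIVENESS.index)


# Types of the variants of U+767C, transcribed from the example above.
TYPES = {0x53D1: "allocatable", 0x5F42: "blocked",
         0x9AEA: "blocked", 0x9AEE: "blocked"}

only_allocatable = default_disposition([TYPES[0x53D1], TYPES[0x53D1]])
with_blocked = default_disposition([TYPES[0x53D1], TYPES[0x5F42]])
```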
+
+ The XML format for the LGR makes the relation between the values of
+ the "type" attribute on variants and the resulting disposition of
+ variant labels fully explicit. See the discussion in Section 7.2.
+ Making this relation explicit allows a generalization of the "type"
+ attribute from directly reflecting dispositions to a more
+ differentiated intermediate value that is then used in the resolution
+ of label disposition. Instead of the default action of applying the
+ most restrictive disposition to the entire label, such a generalized
+ resolution can be used to achieve additional goals, such as limiting
+ the set of allocatable variant labels or implementing other policies
+ found in existing LGRs (see, for example, Appendix B).
+
+ Because variant mappings MUST be unique, it is not possible to define
+ the same variant for the same "char" element with different "type"
+ attributes (however, see Section 5.3.5).
+
+5.3.3. Null Variants
+
+ A null variant is a variant string that maps to no code point. This
+ is used when a particular code point sequence is considered
+ discretionary in the context of a whole label. To specify a null
+ variant, use an empty "cp" attribute. For example, to map a string
+ containing a ZERO WIDTH NON-JOINER (U+200C) to the same string
+ without the ZERO WIDTH NON-JOINER:
+
+ <char cp="200C">
+   <var cp=""/>
+ </char>
+
+ This is useful in expressing the intent that some code points in a
+ label are to be mapped away when generating a canonical variant of
+ the label. However, in tables that are designed to have symmetric
+ variant mappings, this could lead to combinatorial explosion if not
+ handled carefully.
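+
+ The effect of a null variant on variant generation can be sketched
+ as follows. This Python fragment is a simplified illustration that
+ handles only single-code-point sources and ignores types and
+ contexts; "VARIANTS" and "variant_labels" are hypothetical names:
+

```python
from itertools import product

ZWNJ = 0x200C
VARIANTS = {ZWNJ: [()]}  # null variant: maps to the empty sequence


def variant_labels(label):
    """Yield every label obtained by keeping or replacing each code point."""
    options = [[(cp,)] + VARIANTS.get(cp, []) for cp in label]
    for choice in product(*options):
        yield tuple(cp for part in choice for cp in part)


# "a", ZWNJ, "b" yields the original label and the label without ZWNJ.
labels = set(variant_labels((0x0061, ZWNJ, 0x0062)))
```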
+
+ The symmetric form of a null variant is expressed as follows:
+
+ <char cp="">
+   <var cp="200C" type="invalid" />
+ </char>
+
+ A "char" element with an empty "cp" attribute MUST specify at least
+ one variant mapping. It is strongly RECOMMENDED to use a type of
+ "invalid" or equivalent when defining variant mappings from null
+ sequences, so that variant mappings from null sequences are removed
+ in variant label generation (see Section 5.3.2).
+
+5.3.4. Variants with Reflexive Mapping
+
+ At first glance, there seems to be no call for adding variant
+ mappings for which source and target code points are the same -- that
+ is, for which the mapping is reflexive, or, in other words, an
+ identity mapping. Yet, such reflexive mappings occur frequently in
+ LGRs that follow [RFC3743].
+
+ Adding a "var" element allows both a type and a reference id to be
+ specified for it. While the reference id is not used in processing,
+ the type of the variant can be used to trigger actions. In permuting
+ the label to generate all possible variants, the type associated with
+ a reflexive variant mapping is applied to any of the permuted labels
+ containing the original code point.
+
+ In the following example, assume that the goal is to allocate
+ only those labels that contain a variant that is considered
+ "preferred" in some way. As defined in the example, the code point
+ U+3473 exists both as a variant of U+3447 and as a variant of itself
+ (reflexive mapping). Assuming an original label of "U+3473 U+3447",
+ the permuted variant "U+3473 U+3473" would consist of the reflexive
+ variant of U+3473 followed by a variant of U+3447. Given the variant
+ mappings as defined here, the types for both of the variant mappings
+ used to generate that particular permutation would have the value
+ "preferred":
+
+ <char cp="3447" ref="0">
+   <var cp="3473" type="preferred" ref="1 3" />
+ </char>
+ <char cp="3473" ref="0">
+   <var cp="3447" type="blocked" ref="1 3" />
+   <var cp="3473" type="preferred" ref="0" />
+ </char>
+
+ Having established the variant types in this way, a set of actions
+ could be defined that return a disposition of "allocatable" or
+ "activated" for a label consisting exclusively of variants with type
+ "preferred", for example. (For details on how to define actions
+ based on variant types, see Section 7.2.1.)
+
+ In general, using reflexive variant mappings in this manner makes it
+ possible to calculate disposition values using a uniform approach for
+ all labels, whether they consist of mapped variant code points,
+ original code points, or a mixture of both. In particular, the
+ dispositions for two otherwise identical labels may differ based on
+ which variant mappings were executed in order to generate each of
+ them. (For details on how to generate variants and evaluate
+ dispositions, see Section 8.)
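+
+ To make the role of the reflexive mapping concrete, the following
+ Python sketch looks up the variant type used at each position of a
+ permuted label. It assumes single code point mappings and labels of
+ equal length; the table is transcribed from the example above, and
+ "types_used" is a hypothetical helper:
+

```python
# (source, target) -> type, transcribed from the example above.
VARIANT_TYPES = {
    (0x3447, 0x3473): "preferred",
    (0x3473, 0x3447): "blocked",
    (0x3473, 0x3473): "preferred",  # reflexive mapping
}


def types_used(original, permuted):
    """List the variant type per position; None if no mapping applies."""
    return [VARIANT_TYPES.get(pair) for pair in zip(original, permuted)]


original = [0x3473, 0x3447]
all_preferred = types_used(original, [0x3473, 0x3473])
unchanged = types_used(original, [0x3473, 0x3447])
```

+
+ In this sketch, U+3447 has no reflexive mapping, so the unchanged
+ label carries no type at its second position.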
+
+ Another useful convention that uses reflexive variants is described
+ below in Section 7.2.1.
+
+5.3.5. Conditional Variants
+
+ Fundamentally, variants are mappings between two sequences of code
+ points. However, in some instances, for a variant relationship to
+ exist, some context external to the code point sequence must also be
+ considered. For example, a positional context may determine whether
+ two code point sequences are variants of each other.
+
+ Arabic code points are an example: they can have different forms
+ based on position, with some code points sharing forms, thus
+ making them variants in the positions corresponding to those forms.
+ Such positional context cannot be derived from the code point by
+ itself, as the code point is the same for the various forms.
+
+ As described in Section 5.2, an OPTIONAL "when" or "not-when"
+ attribute may be given for any "var" element to specify required or
+ prohibited contextual conditions under which the variant is defined.
+
+ Assuming that the "rules" element contains suitably defined rules for
+ "arabic-isolated" and "arabic-final", the following example shows how
+ to mark ARABIC LETTER ALEF WITH WAVY HAMZA BELOW (U+0673) as a
+ variant of ARABIC LETTER ALEF WITH HAMZA BELOW (U+0625), but only
+ when it appears in its isolated or final forms:
+
+ <char cp="0625">
+   <var cp="0673" when="arabic-isolated"/>
+   <var cp="0673" when="arabic-final"/>
+ </char>
+
+ While a "var" element MUST NOT contain multiple conditions (it is
+ only allowed a single "when" or "not-when" attribute), multiple "var"
+ elements using the same mapping MAY be specified with different
+ "when" or "not-when" attributes. The combination of mapping and
+ conditional context defines a unique variant.
+
+ For each variant label, care must be taken to ensure that at most one
+ of the contextual conditions is met for variants with the same
+ mapping; otherwise, duplicate variant labels would be created for the
+ same input label. Any such duplicate variant labels MUST be treated
+ as an error; see Section 8.4.
+
+ Two contexts may be complementary, as in the following example, which
+ shows ARABIC LETTER TEH MARBUTA (U+0629) as a variant of ARABIC
+ LETTER HEH (U+0647), but with two different types.
+
+ <char cp="0647" >
+   <var cp="0629" not-when="arabic-final" type="blocked" />
+   <var cp="0629" when="arabic-final" type="allocatable" />
+ </char>
+
+ The intent is that a label that uses U+0629 instead of U+0647 in a
+ final position should be considered essentially the same label and,
+ therefore, allocatable to the same entity, while the same
+ substitution in a non-final position leads to labels that are
+ different, but considered confusable, so that either one, but not
+ both, should be delegatable.
+
+ For symmetry, the reverse mappings must exist and must agree in their
+ "when" or "not-when" attributes. However, symmetry does not apply to
+ the other attributes. For example, these are potential reverse
+ mappings for the above:
+
+ <char cp="0629" >
+   <var cp="0647" not-when="arabic-final" type="allocatable" />
+   <var cp="0647" when="arabic-final" type="allocatable" />
+ </char>
+
+ Here, both variants have the same "type" attribute. It is tempting
+ to observe that, in this instance, the "when" and "not-when"
+ attributes are complementary and therefore, between them, cover
+ every possible context. Even so, it is strongly RECOMMENDED to use
+ the format shown in the example, which makes the symmetry easily
+ verifiable by parsers and tools. (The same applies to entries
+ created for transitivity.)
+
+ Arabic is an example of a script for which such conditional variants
+ have been implemented based on the joining contexts for Arabic code
+ points. The mechanism defined here supports other forms of
+ conditional variants that may be required by other scripts.
+
+5.4. Annotations
+
+ Two attributes, the "ref" and "comment" attributes, can be used to
+ annotate individual elements in the LGR. They are ignored in
+ machine-processing of the LGR. The "ref" attribute is intended for
+ formal annotations and the "comment" attribute for free-form
+ annotations. The latter can be applied more widely.
+
+5.4.1. The "ref" Attribute
+
+ Reference information MAY be specified by a "ref"
+ attribute consisting of a space-delimited sequence of reference
+ identifiers (see Section 4.3.8).
+
+ <char cp="5220" ref="0">
+   <var cp="5220" ref="5"/>
+   <var cp="522A" ref="2 3"/>
+ </char>
+
+ This facility is typically used to give source information for code
+ points or variant relations. This information is ignored when
+ machine-processing an LGR. If applied to a range, the "ref"
+ attribute applies to every code point in the range. All reference
+ identifiers MUST be from the set declared in the "references" element
+ (see Section 4.3.8). It is an error to repeat a reference identifier
+ in the same "ref" attribute. It is RECOMMENDED that identifiers be
+ listed in ascending order.
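+
+ A validator for these constraints might be sketched in Python as
+ shown below. The set of declared identifiers is hypothetical and
+ stands in for the content of the "references" element:
+

```python
def parse_ref(value, declared):
    """Split a "ref" attribute and validate it against declared ids."""
    ids = value.split()
    if len(ids) != len(set(ids)):
        raise ValueError("repeated reference identifier in: " + value)
    unknown = [i for i in ids if i not in declared]
    if unknown:
        raise ValueError("undeclared identifier(s): " + " ".join(unknown))
    return ids


declared = {"0", "1", "2", "3", "5"}  # hypothetical declarations
refs = parse_ref("2 3", declared)
```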
+
+ In addition to "char", "range", and "var" elements in the "data"
+ section, a "ref" attribute may be present for a number of element
+ types contained in the "rules" element as described below: actions
+ and literals ("char" inside a rule), as well as for definitions of
+ rules and classes, but not for references to named character classes
+ or rules using the "by-ref" attribute defined below. (The use of the
+ "by-ref" and "ref" attributes is mutually exclusive.) None of the
+ elements in the metadata take a "ref" attribute; to provide
+ additional information, use the "description" element instead.
+
+5.4.2. The "comment" Attribute
+
+ Any "char", "range", or "var" element in the "data" section may
+ contain an OPTIONAL "comment" attribute. The contents of a "comment"
+ attribute are free-form plain text. Comments are ignored in machine
+ processing of the table. "comment" attributes MAY also be placed on
+ all elements in the "rules" section of the document, such as actions
+ and match operators, as well as definitions of classes and rules, but
+ not on child elements of the "class" element. Finally, in the
+ metadata, only the "version" and "reference" elements MAY have
+ "comment" attributes (to match the syntax in [RFC3743]).
+
+5.5. Code Point Tagging
+
+ Typically, LGRs are used to explicitly designate allowable code
+ points, where any label that contains a code point not explicitly
+ listed in the LGR is considered an ineligible label according to the
+ ruleset.
+
+ For more-complex registry rules, there may be a need to discern one
+ or more subsets of code points. This can be accomplished by applying
+ an OPTIONAL "tag" attribute to "char" or "range" elements that are
+ child elements of the "data" element. By collecting code points that
+ share the same tag value, character classes may be defined (see
+ Section 6.2.2) that can then be used in parameterized context or
+ whole label rules (see Section 6.3.2).
+
+ Each "tag" attribute MAY contain multiple values separated by
+ white space. A tag value is an identifier that may also include
+ certain punctuation marks, such as a colon. Formally, it MUST
+ correspond to the XML 1.0 Nmtoken (Name token) production (see [XML]
+ Section 2.3). It is an error to duplicate a value within the same
+ "tag" attribute. A "tag" attribute for a "range" element applies to
+ all code points in the range. Because code point sequences are not
+ proper members of a set of code points, a "tag" attribute MUST NOT be
+ present in a "char" element defining a code point sequence.
+
+6. Whole Label and Context Evaluation
+
+6.1. Basic Concepts
+
+ The "rules" element contains the specification of both context-based
+ and whole label rules. Collectively, these are known as Whole Label
+ Evaluation (WLE) rules (Section 6.3). The "rules" element also
+ contains the character classes (Section 6.2) that they depend on, and
+ any actions (Section 7) that assign dispositions to labels based on
+ rules or variant mappings.
+
+ A whole label rule is applied to the whole label. It is used to
+ validate both original labels and any variant labels computed
+ from them.
+
+ A rule implementing a conditional context as discussed in Section 5.2
+ does not necessarily apply to the whole label but may be specific to
+ the context around a single code point or code point sequence.
+ Certain code points in a label sometimes need to satisfy
+ context-based rules -- for example, for the label to be considered
+ valid, or to satisfy the context for a variant mapping (see the
+ description of the "when" attribute in Section 6.4).
+
+ For example, if a rule is referenced in the "when" attribute of a
+ variant mapping, it is used to describe the conditional context under
+ which the particular variant mapping is defined to exist.
+
+ Each rule is defined in a "rule" element. A rule may contain the
+ following as child elements:
+
+ o literal code points or code point sequences
+
+ o character classes, which define sets of code points to be used for
+ context comparisons
+
+ o context operators, which define when character classes and
+ literals may appear
+
+ o nested rules, whether defined in place or invoked by reference
+
+ Collectively, these are called "match operators" and are listed in
+ Section 6.3.2. An LGR containing rules or match operators that
+
+ 1. are incorrectly defined or nested,
+
+ 2. have invalid attributes, or
+
+ 3. have invalid or undefined attribute values
+
+ MUST be rejected. Note that not all of the constraints defined here
+ are validated by the schema.
+
+6.2. Character Classes
+
+ Character classes are sets of characters that often share a
+ particular property. While they function like sets in every way,
+ even supporting the usual set operators, they are called "character
+ classes" here in a nod to the use of that term in regular expression
+ syntax. (This also avoids confusion with the term "character set" in
+ the sense of character encoding.)
+
+ Character classes can be specified in several ways:
+
+ o by defining the class via matching a tag in the code point data.
+ All characters with the same "tag" attribute are part of the same
+ class;
+
+ o by referencing a value of one of the Unicode character properties
+ defined in the Unicode Character Database;
+
+ o by explicitly listing all the code points in the class; or
+
+ o by defining the class as a set combination of any number of other
+ classes.
+
+6.2.1. Declaring and Invoking Named Classes
+
+ A character class has an OPTIONAL "name" attribute consisting of a
+ single identifier not containing spaces. All names for classes must
+ be unique. If the "name" attribute is omitted, the class is
+ anonymous and exists only inside the rule or combined class where it
+ is defined. A named character class is defined independently and can
+ be referenced by name from within any rules or as part of other
+ character class definitions.
+
+ <class name="example" comment="an example class definition">
+   0061 4E00
+ </class>
+ ...
+ <rule>
+   <class by-ref="example" />
+ </rule>
+
+ An empty "class" element with a "by-ref" attribute is a reference to
+ an existing named class. The "by-ref" attribute MUST NOT be used in
+ the same "class" element with any of these attributes: "name",
+ "from-tag", "property", or "ref". The "name" attribute MUST be
+ present if and only if the class is a direct child element of the
+ "rules" element. It is an error to reference a named class for which
+ the definition has not been seen.
+
+6.2.2. Tag-Based Classes
+
+ The "char" or "range" elements that are child elements of the "data"
+ element MAY contain a "tag" attribute that consists of one or more
+ space-separated tag values; for example:
+
+ <char cp="0061" tag="letter lower"/>
+ <char cp="4E00" tag="letter"/>
+
+ This defines two tags for use with code point U+0061, the tag
+ "letter" and the tag "lower". Use
+
+ <class name="letter" from-tag="letter" />
+ <class name="lower" from-tag="lower" />
+
+ to define two named character classes, "letter" and "lower",
+ containing all code points with the respective tags, the first with
+ 0061 and 4E00 as elements, and the latter with 0061 but not 4E00 as
+ an element. The "name" attribute may be omitted for an anonymous
+ in-place definition of a nested, tag-based class.
+
+ Tag values are typically identifiers, with the addition of a few
+ punctuation symbols, such as a colon. Formally, they MUST correspond
+ to the XML 1.0 Nmtoken production. While a "tag" attribute may
+ contain a list of tag values, the "from-tag" attribute MUST always
+ contain a single tag value.
+
+ If the document contains no "char" or "range" elements with a
+ corresponding tag, the character class represents the empty set.
+ This is valid, to allow a common "rules" element to be shared across
+ files. However, it is RECOMMENDED that implementations allow for a
+ warning to ensure that referring to an undefined tag in this way is
+ intentional.
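+
+ Collecting a tag-based class can be sketched with the Python
+ standard library as follows. For brevity, the sample omits the XML
+ namespace that a real LGR document would carry, and the "digit"
+ range is a hypothetical addition to the example data above:
+

```python
import xml.etree.ElementTree as ET

DATA = """
<data>
  <char cp="0061" tag="letter lower"/>
  <char cp="4E00" tag="letter"/>
  <range first-cp="0030" last-cp="0039" tag="digit"/>
</data>
"""


def class_from_tag(data, tag):
    """Collect all code points whose "tag" attribute lists `tag`."""
    members = set()
    for elem in data:
        if tag not in elem.get("tag", "").split():
            continue
        if elem.tag == "char":
            members.add(int(elem.get("cp"), 16))
        elif elem.tag == "range":
            first = int(elem.get("first-cp"), 16)
            last = int(elem.get("last-cp"), 16)
            members.update(range(first, last + 1))
    return members


data = ET.fromstring(DATA)
letter = class_from_tag(data, "letter")
lower = class_from_tag(data, "lower")
undefined = class_from_tag(data, "no-such-tag")  # empty set, not an error
```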
+
+6.2.3. Unicode Property-Based Classes
+
+ A class is defined in terms of Unicode properties by giving the
+ Unicode property alias and the property value or property value
+ alias, separated by a colon.
+
+ <class name="virama" property="ccc:9" />
+
+ The example above selects all code points for which the Unicode
+ Canonical Combining Class (ccc) value is 9. This value of the ccc is
+ assigned to all code points that encode viramas.
+
+ Unicode property values MUST be designated via a composite of the
+ attribute name and value as defined for the property value in
+ [UAX42], separated by a colon. Loose matching of property values and
+ names as described in [UAX44] is not appropriate for an XML schema
+ and is not supported; it is likewise not supported in the XML
+ representation [UAX42] of the Unicode Character Database itself.
+
+ A property-based class MAY be anonymous, or, when defined as an
+ immediate child of the "rules" element, it MAY be named to relate a
+ formal property definition to its usage, such as the use of the value
+ 9 for ccc to designate a virama (or halant) in various scripts.
+
+ Unicode properties may, in principle, change between versions of the
+ Unicode Standard. However, the values assigned for a given version
+ are fixed. If Unicode properties are used, a Unicode version MUST be
+ declared in the "unicode-version" element in the header. (Note: Some
+ Unicode properties are by definition stable across versions and do
+ not change once assigned; see [Unicode-Stability].)
+
+ All implementations processing LGR files SHOULD provide support for
+ the following minimal set of Unicode properties:
+
+ o General Category (gc)
+
+ o Script (sc)
+
+ o Canonical Combining Class (ccc)
+
+ o Bidi Class (bc)
+
+ o Arabic Joining Type (jt)
+
+ o Indic Syllabic Category (InSC)
+
+ o Deprecated (Dep)
+
+ The short name for each property is given in parentheses.
+
+ If a program that is using an LGR to determine the validity of a
+ label encounters a property that it does not support, it MUST abort
+ with an error.
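+
+ As an illustration, a property-based class could be evaluated with
+ Python's "unicodedata" module as sketched below. The sketch handles
+ only "ccc" and "gc" and raises an error for anything else. Note that
+ "unicodedata" reflects the Unicode version bundled with the Python
+ interpreter, which may differ from the version declared in the LGR:
+

```python
import unicodedata


def class_from_property(spec, code_points):
    """Select code points matching a "property" value such as "ccc:9".

    Only the "ccc" and "gc" properties are handled in this sketch.
    """
    name, _, value = spec.partition(":")
    if name == "ccc":
        return {cp for cp in code_points
                if unicodedata.combining(chr(cp)) == int(value)}
    if name == "gc":
        return {cp for cp in code_points
                if unicodedata.category(chr(cp)) == value}
    raise ValueError("unsupported property: " + name)


# U+0915 DEVANAGARI LETTER KA, U+094D DEVANAGARI SIGN VIRAMA
virama = class_from_property("ccc:9", [0x0915, 0x094D])
```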
+
+6.2.4. Explicitly Declared Classes
+
+ A class of code points may also be declared by listing all code
+ points that are members of the class. This is useful when tagging
+ cannot be used because code points are not listed individually as
+ part of the eligible set of code points for the given LGR -- for
+ example, because they only occur in code point sequences.
+
+ To define a class in terms of an explicit list of code points, use a
+ space-separated list of hexadecimal code point values:
+
+ <class name="abcd">0061 0062 0063 0064</class>
+
+ This defines a class named "abcd" containing the code points for
+ characters "a", "b", "c", and "d". The ordering of the code points
+ is not material, but it is RECOMMENDED to list them in ascending
+ order; not doing so makes it unnecessarily difficult for users to
+ detect errors such as duplicates or to compare and review these
+ classes against other specifications.
+
+ In a class definition, ranges of code points are represented by a
+ hexadecimal start and end value separated by a hyphen. The following
+ declaration is equivalent to the preceding:
+
+ <class name="abcd">0061-0064</class>
+
+ Range and code point declarations can be freely intermixed:
+
+ <class name="abcd">0061 0062-0063 0064</class>
+
+ The contents of a class differ from a repertoire in that the latter
+ MAY contain sequences as elements, while the former MUST NOT.
+ Instead, they closely resemble character classes as found in regular
+ expressions.
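+
+ The three equivalent declarations above can be checked with a small
+ Python sketch. The function name is hypothetical, and rejecting a
+ descending range is an assumption of this sketch rather than a
+ requirement stated here:
+

```python
def parse_class(text):
    """Parse class content such as "0061 0062-0063 0064" into a set."""
    members = set()
    for item in text.split():
        first, sep, last = item.partition("-")
        lo = int(first, 16)
        hi = int(last, 16) if sep else lo
        if hi < lo:
            raise ValueError("descending range: " + item)
        members.update(range(lo, hi + 1))
    return members


listed = parse_class("0061 0062 0063 0064")
ranged = parse_class("0061-0064")
mixed = parse_class("0061 0062-0063 0064")
```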
+
+6.2.5. Combined Classes
+
+ Classes may be combined using operators for set complement, union,
+ intersection, difference (elements of the first class that are not in
+ the second), and symmetric difference (elements in either class but
+ not both). Because classes fundamentally function like sets, the
+ union of several character classes is itself a class, for example.
+
+ +-------------------+----------------------------------------------+
+ | Logical Operation | Example                                      |
+ +-------------------+----------------------------------------------+
+ | Complement        | <complement>                                 |
+ |                   |   <class by-ref="xxx"/>                      |
+ |                   | </complement>                                |
+ +-------------------+----------------------------------------------+
+ | Union             | <union>                                      |
+ |                   |   <class by-ref="class-1"/>                  |
+ |                   |   <class by-ref="class-2"/>                  |
+ |                   |   <class by-ref="class-3"/>                  |
+ |                   | </union>                                     |
+ +-------------------+----------------------------------------------+
+ | Intersection      | <intersection>                               |
+ |                   |   <class by-ref="class-1"/>                  |
+ |                   |   <class by-ref="class-2"/>                  |
+ |                   | </intersection>                              |
+ +-------------------+----------------------------------------------+
+ | Difference        | <difference>                                 |
+ |                   |   <class by-ref="class-1"/>                  |
+ |                   |   <class by-ref="class-2"/>                  |
+ |                   | </difference>                                |
+ +-------------------+----------------------------------------------+
+ | Symmetric         | <symmetric-difference>                       |
+ | Difference        |   <class by-ref="class-1"/>                  |
+ |                   |   <class by-ref="class-2"/>                  |
+ |                   | </symmetric-difference>                      |
+ +-------------------+----------------------------------------------+
+
+ Set Operators
+
+ The elements from this table may be arbitrarily nested inside each
+ other, subject to the following restriction: a "complement" element
+ MUST contain precisely one "class" or one of the operator elements,
+ while an "intersection", "symmetric-difference", or "difference"
+ element MUST contain precisely two, and a "union" element MUST
+ contain two or more of these elements.
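+
+ The operators and their operand counts map directly onto ordinary
+ set operations, as the following Python sketch suggests. The
+ "universe" parameter used for the complement is an assumption of
+ this sketch, and "combine" is a hypothetical name:
+

```python
def combine(op, operands, universe=frozenset()):
    """Evaluate one set operator over already-evaluated operand sets."""
    if op == "complement":
        if len(operands) != 1:
            raise ValueError("complement takes exactly one operand")
        return set(universe) - operands[0]
    if op == "union":
        if len(operands) < 2:
            raise ValueError("union takes two or more operands")
        return set().union(*operands)
    binary = {
        "intersection": set.intersection,
        "difference": set.difference,
        "symmetric-difference": set.symmetric_difference,
    }
    if op not in binary:
        raise ValueError("unknown operator: " + op)
    if len(operands) != 2:
        raise ValueError(op + " takes exactly two operands")
    return binary[op](operands[0], operands[1])


a, b = {1, 2, 3}, {3, 4}
```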
+
+ An anonymous combined class can be defined directly inside a rule or
+ any of the match operator elements that allow child elements (see
+ Section 6.3.2) by using the set combination as the outer element.
+
+ <rule>
+   <union>
+     <class by-ref="xxx"/>
+     <class by-ref="yyy"/>
+   </union>
+ </rule>
+
+ The example shows the definition of an anonymous combined class that
+ represents the union of classes "xxx" and "yyy". There is no need to
+ wrap this union inside another "class" element, and, in fact, set
+ combination elements MUST NOT be nested inside a "class" element.
+
+ Lastly, to create a named combined class that can be referenced in
+ other classes or in rules as <class by-ref="xxxyyy"/>, add a "name"
+ attribute to the set combination element -- for example,
+ <union name="xxxyyy" /> -- and place it at the top level immediately
+ below the "rules" element (see Section 6.2.1).
+
+ <rules>
+   <union name="xxxyyy">
+     <class by-ref="xxx"/>
+     <class by-ref="yyy"/>
+   </union>
+   ...
+ </rules>
+
+ Because (as for ordinary sets) a combination of classes is itself a
+ class, no matter by what combinations of set operators a combined
+ class is created, a reference to it always uses the "class" element
+ as described in Section 6.2.1. That is, a named class is always
+ referenced via an empty "class" element using the "by-ref" attribute
+ containing the name of the class to be referenced.
+
+6.3. Whole Label and Context Rules
+
+ Each rule comprises a series of matching operators that must be
+ satisfied in order to determine whether a label meets a given
+ condition. Rules may reference other rules or character classes
+ defined elsewhere in the table.
+
+6.3.1. The "rule" Element
+
+ A matching rule is defined by a "rule" element, each child element of
+ which is one of the match operators from Section 6.3.2. In
+ evaluating a rule, each child element is matched in order. "rule"
+ elements MAY be nested inside each other and inside certain match
+ operators.
+
+ A simple rule to match a label where all characters are members of
+ some class called "preferred-codepoint":
+
+ <rule name="preferred-label">
+   <start />
+   <class by-ref="preferred-codepoint" count="1+"/>
+   <end />
+ </rule>
+
+ Rules are paired with explicit and implied actions, triggering these
+ actions when a rule matches a label. For example, a simple explicit
+ action for the rule shown above would be:
+
+ <action disp="allocatable" match="preferred-label" />
+
+ The rule in this example would have the effect of setting the policy
+ disposition for a label made up entirely of preferred code points to
+ "allocatable". Explicit actions are further discussed in Section 7
+ and implied actions in Section 7.5. Another use of rules is in
+ defining conditional contexts for code points and variants as
+ discussed in Sections 5.2 and 5.3.5.
+
+ A rule that is an immediate child element of the "rules" element MUST
+ be named using a "name" attribute containing a single identifier
+ string with no spaces. A named rule may be incorporated into another
+ rule by reference and may also be referenced by an "action" element,
+ "when" attribute, or "not-when" attribute. If the "name" attribute
+ is omitted, the rule is anonymous and MUST be nested inside another
+ rule or match operator.
+
+6.3.2. The Match Operators
+
+ The child elements of a rule are a series of match operators, which
+ are listed here by type and name and with a basic example or two.
+
+ +------------+-------------+------------------------------------+
+ | Type       | Operator    | Examples                           |
+ +------------+-------------+------------------------------------+
+ | logical    | any         | <any />                            |
+ |            +-------------+------------------------------------+
+ |            | choice      | <choice>                           |
+ |            |             |  <rule by-ref="alternative1"/>     |
+ |            |             |  <rule by-ref="alternative2"/>     |
+ |            |             | </choice>                          |
+ +------------+-------------+------------------------------------+
+ | positional | start       | <start />                          |
+ |            +-------------+------------------------------------+
+ |            | end         | <end />                            |
+ +------------+-------------+------------------------------------+
+ | literal    | char        | <char cp="0061 0062 0063" />       |
+ +------------+-------------+------------------------------------+
+ | set        | class       | <class by-ref="class1" />          |
+ |            |             | <class>0061 0064-0065</class>      |
+ +------------+-------------+------------------------------------+
+ | group      | rule        | <rule by-ref="rule1" />            |
+ |            |             | <rule><any /></rule>               |
+ +------------+-------------+------------------------------------+
+ | contextual | anchor      | <anchor />                         |
+ |            +-------------+------------------------------------+
+ |            | look-ahead  | <look-ahead><any /></look-ahead>   |
+ |            +-------------+------------------------------------+
+ |            | look-behind | <look-behind><any /></look-behind> |
+ +------------+-------------+------------------------------------+
+
+ Match Operators
+
+ Any element defining an anonymous class can be used as a match
+ operator, including any of the set combination operators (see
+ Section 6.2.5) as well as references to named classes.
+
+ None of the match operators shown as empty elements in the Examples
+ column of the table above support child elements of their own; all
+ other match operators MAY be nested. In particular, anonymous
+ "rule" elements can be used for grouping.
+
+6.3.3. The "count" Attribute
+
+ The OPTIONAL "count" attribute, when present, specifies the minimum
+ required or maximum permitted number of times a match operator is
+ used to match input. If the "count" attribute is
+
+ n the match operator matches the input exactly n times, where n is
+ 1 or greater.
+
+ n+ the match operator matches the input at least n times, where n
+ is 0 or greater.
+
+ n:m the match operator matches the input at least n times, where n
+ is 0 or greater, but matches the input up to m times in total,
+ where m > n. If m = n and n > 0, the match operator matches the
+ input exactly n times.
+
+ If there is no "count" attribute, the match operator matches the
+ input exactly once.
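+
+ The three forms of the "count" attribute and the default can be
+ reduced to a (minimum, maximum) pair. The following Python sketch is
+ illustrative only; the function name "parse_count" is hypothetical,
+ and rejecting "0:0" and any n:m with m < n reflects one reading of
+ the bounds stated above.
+
```python
import re

def parse_count(value=None):
    """Return (minimum, maximum) repetitions; maximum None means unbounded."""
    if value is None:
        return (1, 1)          # no "count" attribute: exactly once
    m = re.fullmatch(r"(\d+)(?:(\+)|:(\d+))?", value)
    if m is None:
        raise ValueError(f"malformed count: {value!r}")
    n = int(m.group(1))
    if m.group(2):             # "n+": at least n times, n may be 0
        return (n, None)
    if m.group(3) is None:     # "n": exactly n times, n must be 1 or more
        if n < 1:
            raise ValueError("an exact count must be 1 or greater")
        return (n, n)
    upper = int(m.group(3))    # "n:m": between n and m times
    if upper < n or upper == 0:
        raise ValueError("n:m requires m > n, or m = n with n > 0")
    return (n, upper)
```
+
+ For instance, "0+" maps to (0, None), "1:4" to (1, 4), and "2:2" to
+ (2, 2), the same as a plain "2".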
+
+ In matching, greedy evaluation is used in the sense defined for
+ regular expressions: beyond the required number of times, the input
+ is matched as many times as possible, but not so often as to prevent
+ a match of the remainder of the rule.
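+
+ Because greedy evaluation is defined by analogy with regular
+ expressions, it can be illustrated with one. The Python sketch below
+ is an analogy only, not LGR syntax: the pattern stands in for a rule
+ consisting of <any count="0+"/> followed by <char cp="0061"/>,
+ matched against the whole input.
+
```python
import re

# ".*" plays the role of <any count="0+"/>; "a" the role of <char cp="0061"/>.
pattern = re.compile(r"(.*)(a)")

m = pattern.fullmatch("banana")
# The repeated operator takes as much input as it can ...
greedy_part = m.group(1)
# ... but leaves enough for the remainder of the rule to match.
literal_part = m.group(2)
```
+
+ The repeated operator consumes "banan" and gives back only the final
+ "a", which the literal then matches.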
+
+ A "count" attribute MUST NOT be applied to any element that contains
+ a "name" attribute but MAY be applied to operators such as "class"
+ that declare anonymous classes (including combined classes) or invoke
+ any predefined classes by reference. The "count" attribute MUST NOT
+ be applied to any "class" element, or element defining a combined
+ class, when it is nested inside a combined class.
+
+ A "count" attribute MUST NOT be applied to match operators of type
+ "start", "end", "anchor", "look-ahead", or "look-behind" or to any
+ operators, such as "rule" or "choice", that contain a nested instance
+ of them. This limitation applies recursively and irrespective of
+ whether a "rule" element containing these nested instances is
+ declared in place or used by reference.
+
+ However, the "count" attribute MAY be applied to any other instances
+ of either an anonymous "rule" element or a "choice" element,
+ including those instances nested inside other match operators. It
+ MAY also be applied to the elements "any" and "char", when used as
+ match operators.
+
+6.3.4. The "name" and "by-ref" Attributes
+
+ Like classes (see Section 6.2.1), rules declared as immediate child
+ elements of the "rules" element MUST be named using a unique "name"
+ attribute, and all other instances MUST NOT be named. Anonymous
+ rules and classes can be nested directly inside other match
+ operators, while named rules and classes are nested by reference.
+
+ To reference a named rule or class inside a rule or match operator,
+ use a "rule" or "class" element with an OPTIONAL "by-ref" attribute
+ containing the name of the referenced element. It is an error to
+ reference a rule or class for which the complete definition has not
+ been seen. In other words, it is explicitly not possible to define
+ recursive rules or class definitions. The "by-ref" attribute
+ MUST NOT appear in the same element as the "name" attribute or in an
+ element that has any child elements.
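+
+ The requirement that a definition be complete before it is
+ referenced can be checked in a single pass over the "rules" element.
+ The Python sketch below is one possible approach, not a normative
+ algorithm; the function name "undefined_refs" is hypothetical, and
+ the XML namespace of a real LGR is omitted for brevity.
+
```python
import xml.etree.ElementTree as ET

def undefined_refs(rules_xml):
    """List (kind, name) pairs referenced before their definition is complete."""
    defined = set()
    problems = []
    for top in ET.fromstring(rules_xml):
        for elem in top.iter():
            ref = elem.get("by-ref")
            if ref is not None and (elem.tag, ref) not in defined:
                problems.append((elem.tag, ref))
        # Register the name only after the whole definition has been scanned,
        # so a rule that refers to itself is reported as well.
        name = top.get("name")
        if name is not None:
            kind = "rule" if top.tag == "rule" else "class"
            defined.add((kind, name))
    return problems

GOOD = """<rules>
  <class name="letter" property="gc:L"/>
  <class name="combining-mark" property="gc:M"/>
  <rule name="letter-grapheme">
    <class by-ref="letter" count="1+"/>
    <class by-ref="combining-mark" count="0+"/>
  </rule>
</rules>"""
RECURSIVE = '<rules><rule name="r"><rule by-ref="r"/></rule></rules>'
```
+
+ The first document passes; the second is reported because rule "r"
+ refers to itself before its definition is complete.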
+
+ The example shows several named classes and a named rule referencing
+ some of them by name.
+
+ <class name="letter" property="gc:L"/>
+ <class name="combining-mark" property="gc:M"/>
+ <class name="digit" property="gc:Nd" />
+ <rule name="letter-grapheme">
+   <class by-ref="letter" count="1+"/>
+   <class by-ref="combining-mark" count="0+"/>
+ </rule>
+
+6.3.5. The "choice" Element
+
+ The "choice" element is used to represent a list of two or more
+ alternatives:
+
+ <rule name="ldh">
+   <choice count="1+">
+     <class by-ref="letter"/>
+     <class by-ref="digit"/>
+     <char cp="002D" comment="literal HYPHEN"/>
+   </choice>
+ </rule>
+
+ Each child element of a "choice" element represents one alternative.
+ The first matching alternative determines the match for the
+ "choice" element. To express a choice where an alternative itself
+ consists of a sequence of elements, the sequence must be wrapped in
+ an anonymous rule.
+
+6.3.6. Literal Code Point Sequences
+
+ A literal code point sequence matches a single code point or a
+ sequence. It is defined by a "char" element, with the code point or
+ sequence to be matched given by the "cp" attribute. When used as a
+ literal, a "char" element MAY contain a "count" attribute in addition
+ to the "cp" attribute and OPTIONAL "comment" or "ref" attributes. No
+ other attributes or child elements are permitted.
+
+6.3.7. The "any" Element
+
+ The "any" element is an empty element that matches any single code
+ point. It MAY have a "count" attribute. For an example, see
+ Section 6.3.9.
+
+ Unlike a literal, the "any" element MUST NOT have a "ref" attribute.
+
+6.3.8. The "start" and "end" Elements
+
+ To match the beginning or end of a label, use the "start" or "end"
+ element. An empty label would match this rule:
+
+ <rule name="empty-label">
+   <start/>
+   <end/>
+ </rule>
+
+ Conceptually, whole label rules evaluate the label as a whole, but in
+ practice, many rules do not actually need to be specified to match
+ the entire label. For example, to express a requirement of not
+ starting a label with a digit, a rule needs to describe only the
+ initial part of a label.
+
+ This example uses the previously defined rules, together with "start"
+ and "end" elements, to define a rule that requires that an entire
+ label be well-formed. For this example, that means that it must
+ start with a letter and that it contains no leading digits or
+ combining marks nor combining marks placed on digits.
+
+ <rule name="leading-letter" >
+   <start />
+   <rule by-ref="letter-grapheme" count="1"/>
+   <choice count="0+">
+     <rule by-ref="letter-grapheme" count="0+"/>
+     <class by-ref="digit" count="0+"/>
+   </choice>
+   <end />
+ </rule>
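+
+ The effect of the "leading-letter" rule can be restated as a direct
+ scan of the label. The Python sketch below is an equivalent
+ formulation written for illustration, not a rule evaluator; the
+ function names are hypothetical, and the General_Category values
+ come from whatever Unicode version the Python runtime ships.
+
```python
import unicodedata

def kind(ch):
    """Classify a code point by General_Category, as the classes above do."""
    category = unicodedata.category(ch)
    if category.startswith("L"):
        return "letter"
    if category.startswith("M"):
        return "mark"
    if category == "Nd":
        return "digit"
    return None

def leading_letter(label):
    """True if the whole label fits the shape the rule describes."""
    previous = None
    for ch in label:
        current = kind(ch)
        if current is None:
            return False          # outside letter, combining-mark, and digit
        if previous is None and current != "letter":
            return False          # must begin with a letter
        if current == "mark" and previous == "digit":
            return False          # no combining mark placed on a digit
        previous = current
    return previous is not None   # the empty label does not match
```
+
+ Under this sketch, "abc1" is accepted, while "1abc" and a label with
+ a combining mark directly after a digit are rejected.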
+
+ Each "start" or "end" element occurs at most once in a rule, except
+ if nested inside a "choice" element in such a way that in matching
+ each alternative at most one occurrence of each is encountered.
+ Otherwise, the result is an error, as is any case where a "start" or
+ "end" element is not encountered as the first or last element to be
+ matched, respectively, in matching a rule. "start" and "end"
+ elements are empty elements that do not have a "count" attribute or
+ any attribute other than "comment". It is an error for any
+ match operator enclosing a nested "start" or "end" element to have a
+ "count" attribute.
+
+6.3.9. Example Context Rule from IDNA Specification
+
+ This is an example of the WLE rule from [RFC5892] forbidding the
+ mixture of the Arabic-Indic and extended Arabic-Indic digits in the
+ same label. It is implemented as a whole label rule associated with
+ the code point ranges using the "not-when" attribute, which defines
+ an impermissible context. The example also demonstrates several
+ instances of the use of anonymous rules for grouping.
+
+ <data>
+   <range first-cp="0660" last-cp="0669" not-when="mixed-digits"
+          tag="arabic-indic-digits" />
+   <range first-cp="06F0" last-cp="06F9" not-when="mixed-digits"
+          tag="extended-arabic-indic-digits" />
+ </data>
+ <rules>
+   <rule name="mixed-digits">
+     <choice>
+       <rule>
+         <class from-tag="arabic-indic-digits"/>
+         <any count="0+"/>
+         <class from-tag="extended-arabic-indic-digits"/>
+       </rule>
+       <rule>
+         <class from-tag="extended-arabic-indic-digits"/>
+         <any count="0+"/>
+         <class from-tag="arabic-indic-digits"/>
+       </rule>
+     </choice>
+   </rule>
+ </rules>
+
+ As specified in the example, a code point from either of the two
+ digit ranges is invalid in any label matching the "mixed-digits"
+ rule, that is, any time that a code point from the other range is
+ also present. Note that invalidating the label is not the same as
+ invalidating the definition of the "range" elements; in particular,
+ the definition of the tag values does not depend on the "not-when"
+ attribute.
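+
+ Since the "mixed-digits" rule is not tied to the start or end of the
+ label, it matches exactly when both ranges are represented somewhere
+ in the label. A minimal Python sketch of that condition follows; the
+ function name "mixed_digits" is hypothetical.
+
```python
ARABIC_INDIC = range(0x0660, 0x066A)            # U+0660..U+0669
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)   # U+06F0..U+06F9

def mixed_digits(label):
    """True when code points from both tagged ranges occur in the label."""
    code_points = [ord(ch) for ch in label]
    return (any(cp in ARABIC_INDIC for cp in code_points)
            and any(cp in EXTENDED_ARABIC_INDIC for cp in code_points))
```
+
+ A label for which this returns True contains a digit whose
+ "not-when" context is matched, so the label is invalid.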
+
+6.4. Parameterized Context or When Rules
+
+ To recap: When a rule is intended to provide a context for evaluating
+ the validity of a code point or variant mapping, it is invoked by the
+ "when" or "not-when" attributes described in Section 5.2. For "char"
+ and "range" elements, an action implied by a context rule always has
+ a disposition of "invalid" whenever the rule given by the "when"
+ attribute is not matched (see Section 7.5). Conversely, a "not-when"
+ attribute results in a disposition of "invalid" whenever the rule is
+ matched. When a rule is used in this way, it is called a context or
+ "when" rule.
+
+ The example in the previous section shows a whole label rule used as
+ a context rule, essentially making the whole label the context. The
+ next sections describe several match operators that can be used to
+ specify a context more precisely, allowing a parameterized context
+ rule. See Section 7 for an alternative method of defining an
+ invalid disposition for a label not matching a whole label rule.
+
+6.4.1. The "anchor" Element
+
+ Such parameterized context rules are rules that contain a special
+ placeholder represented by an "anchor" element. As each When Rule is
+ evaluated, if an "anchor" element is present, it is replaced by a
+ literal corresponding to the "cp" attribute of the element containing
+ the "when" (or "not-when") attribute. The match to the "anchor"
+ element must be at the same position in the label as the code point
+ or variant mapping triggering the When Rule.
+
+ For example, the Greek lower numeral sign is invalid if not
+ immediately preceding a character in the Greek script. This is most
+ naturally addressed with a parameterized When Rule using
+ "look-ahead":
+
+ <char cp="0375" when="preceding-greek"/>
+ ...
+ <class name="greek-script" property="sc:Grek"/>
+ <rule name="preceding-greek">
+   <anchor/>
+   <look-ahead>
+     <class by-ref="greek-script"/>
+   </look-ahead>
+ </rule>
+
+ In evaluating this rule, the "anchor" element is treated as if it was
+ replaced by a literal
+
+ <char cp="0375"/>
+
+ but only the instance of U+0375 at the given position is evaluated.
+ If a label had two instances of U+0375 with the first one matching
+ the rule and the second not, then evaluating the When Rule MUST
+ succeed for the first instance and fail for the second.
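+
+ The position-specific evaluation of the anchor can be sketched as a
+ function of the label and the index of the code point being checked.
+ The Python sketch below is illustrative only: the function names are
+ hypothetical, and the character name is used as a stand-in for the
+ "sc:Grek" script property, which Python's standard library does not
+ expose, so the classification is approximate.
+
```python
import unicodedata

def is_greek(ch):
    # Assumption: the character name approximates the "sc:Grek" script
    # property; a real implementation would consult Unicode script data.
    return unicodedata.name(ch, "").startswith("GREEK")

def preceding_greek(label, pos):
    """Evaluate the rule with the anchor fixed at index pos of label."""
    if label[pos] != "\u0375":
        return False                     # the anchor stands for U+0375 here
    # look-ahead: the next code point must exist and be Greek
    return pos + 1 < len(label) and is_greek(label[pos + 1])

# U+0375, GREEK SMALL LETTER ALPHA, U+0375
label = "\u0375\u03b1\u0375"
```
+
+ With this label, the rule succeeds for the instance at index 0 and
+ fails for the instance at index 2, which nothing follows.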
+
+ Unlike other rules, rules containing an "anchor" element MUST only be
+ invoked via the "when" or "not-when" attributes on code points or
+ variants; otherwise, their "anchor" elements cannot be evaluated.
+ However, it is possible to invoke rules not containing an "anchor"
+ element from a "when" or "not-when" attribute. (See Section 6.4.3.)
+
+ The "anchor" element is an empty element, with no attributes
+ permitted except "comment".
+
+6.4.2. The "look-behind" and "look-ahead" Elements
+
+ Context rules use the "look-behind" and "look-ahead" elements to
+ define context before and after the code point sequence matched by
+ the "anchor" element. If the "anchor" element is omitted, neither
+ the "look-behind" nor the "look-ahead" element may be present in
+ a rule.
+
+ Here is an example of a rule that defines an "initial" context for an
+ Arabic code point:
+
+ <class name="transparent" property="jt:T"/>
+ <class name="right-joining" property="jt:R"/>
+ <class name="left-joining" property="jt:L"/>
+ <class name="dual-joining" property="jt:D"/>
+ <class name="non-joining" property="jt:U"/>
+ <rule name="Arabic-initial">
+   <look-behind>
+     <choice>
+       <start/>
+       <rule>
+         <class by-ref="transparent" count="0+"/>
+         <class by-ref="non-joining"/>
+       </rule>
+     </choice>
+   </look-behind>
+   <anchor/>
+   <look-ahead>
+     <class by-ref="transparent" count="0+" />
+     <choice>
+       <class by-ref="right-joining" />
+       <class by-ref="dual-joining" />
+     </choice>
+   </look-ahead>
+ </rule>
+
+ A "when" rule (or context rule) is a named rule that contains any
+ combination of "look-behind", "anchor", and "look-ahead" elements, in
+ that order. Each of these elements occurs at most once, except if
+ nested inside a "choice" element in such a way that in matching each
+ alternative at most one occurrence of each is encountered.
+ Otherwise, the result is undefined. None of these elements takes a
+ "count" attribute, nor does any enclosing match operator; otherwise,
+ the result is undefined. If a context rule contains a "look-ahead"
+ or "look-behind" element, it MUST contain an "anchor" element. If,
+ because of a "choice" element, a required anchor is not actually
+ encountered, the results are undefined.
+
+6.4.3. Omitting the "anchor" Element
+
+ If the "anchor" element is omitted, the evaluation of the context
+ rule is not tied to the position of the code point or sequence
+ associated with the "when" attribute.
+
+ According to [RFC5892], the Katakana middle dot is invalid in any
+ label not containing at least one Japanese character anywhere in the
+ label. Because this requirement is independent of the position of
+ the middle dot, the rule does not require an "anchor" element.
+
+ <char cp="30FB" when="japanese-in-label"/>
+ <rule name="japanese-in-label">
+   <union>
+     <class property="sc:Hani"/>
+     <class property="sc:Kata"/>
+     <class property="sc:Hira"/>
+   </union>
+ </rule>
+
+ The Katakana middle dot is used only with Han, Katakana, or Hiragana.
+ The corresponding When Rule requires that at least one code point in
+ the label be in one of these scripts, but the position of that code
+ point is independent of the location of the middle dot; therefore, no
+ anchor is required. (Note that the Katakana middle dot itself is of
+ script Common, that is, "sc:Zyyy".)
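+
+ A position-independent context of this kind amounts to a search over
+ the whole label. The Python sketch below is illustrative only: the
+ names are hypothetical, and character-name prefixes stand in for the
+ script properties "sc:Hani", "sc:Kata", and "sc:Hira", which Python's
+ standard library does not expose. The prefixes are chosen so that
+ the middle dot itself does not count.
+
```python
import unicodedata

# Assumption: these name prefixes approximate the three script classes;
# a real implementation would consult Unicode script data instead.
JAPANESE_NAME_PREFIXES = (
    "CJK UNIFIED IDEOGRAPH",
    "KATAKANA LETTER",
    "HIRAGANA LETTER",
)

def japanese_in_label(label):
    """True if any code point in the label looks Han, Katakana, or Hiragana."""
    return any(unicodedata.name(ch, "").startswith(JAPANESE_NAME_PREFIXES)
               for ch in label)
```
+
+ A label consisting only of U+30FB, or of U+30FB between Latin
+ letters, fails this check, so the middle dot is invalid there.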
+
+7. The "action" Element
+
+ The purpose of an action is to assign a disposition to a label in
+ response to being triggered by the label meeting a specified
+ condition. Often, the action simply results in blocking or
+ invalidating a label that does not match a rule. An example of an
+ action invalidating a label because it does not match a rule named
+ "leading-letter" is as follows:
+
+ <action disp="invalid" not-match="leading-letter"/>
+
+ If an action is to be triggered on matching a rule, a "match"
+ attribute is used instead. Actions are evaluated in the order that
+ they appear in the XML file. Once an action is triggered by a label,
+ the disposition defined in the "disp" attribute is assigned to the
+ label and no other actions are evaluated for that label.
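+
+ First-match evaluation of actions can be sketched as a simple loop.
+ In the Python sketch below, which is illustrative only, each action
+ is a dictionary of its attributes and "matched_rules" is the set of
+ rule names the label matches; both representations and the function
+ name "first_triggered" are hypothetical.
+
```python
def first_triggered(actions, matched_rules):
    """Return the "disp" of the first action the label triggers, else None."""
    for action in actions:
        if "match" in action and action["match"] not in matched_rules:
            continue              # required rule was not matched
        if "not-match" in action and action["not-match"] in matched_rules:
            continue              # rule matched, so "not-match" fails
        return action["disp"]     # later actions are not evaluated
    return None

ACTIONS = [
    {"disp": "invalid", "not-match": "leading-letter"},
    {"disp": "allocatable", "match": "preferred-label"},
    {"disp": "blocked"},
]
```
+
+ A label matching neither rule is assigned "invalid" by the first
+ action, and the remaining actions are never consulted for it.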
+
+ The goal of the LGR is to identify all labels and variant labels and
+ to assign them disposition values. These dispositions are then fed
+ into a further process that ultimately implements all aspects of
+ policy. To allow this specification to be used with the widest range
+ of policies, the permissible values for the "disp" attribute are
+ neither defined nor restricted. Nevertheless, a set of commonly used
+ disposition values is RECOMMENDED. (See Section 7.3.)
+
+7.1. The "match" and "not-match" Attributes
+
+ An OPTIONAL "match" or "not-match" attribute specifies a rule that
+ must be matched or not matched as a condition for triggering an
+ action. Only a single rule may be named as the value of a "match" or
+ "not-match" attribute. Because rules may be composed of other rules,
+ this restriction to a single attribute value does not impose any
+ limitation on the contexts that can trigger an action.
+
+ An action MUST NOT contain both a "match" and a "not-match"
+ attribute, and the value of either attribute MUST be the name of a
+ previously defined rule; otherwise, the document MUST be rejected.
+ An action without any attributes is triggered by all labels
+ unconditionally. For a very simple LGR, the following action would
+ allocate all labels that match the repertoire:
+
+ <action disp="allocatable" />
+
+ Since rules are evaluated for all labels, whether they are the
+ original label or computed by permuting the defined and valid variant
+ mappings for the label's code points, actions based on matching or
+ not matching a rule may be triggered for both original and variant
+ labels, but the rules are not affected by the disposition attributes
+ of the variant mappings. To trigger any actions based on these
+ dispositions requires the use of additional optional attributes for
+ actions described next.
+
+7.2. Actions with Variant Type Triggers
+
+7.2.1. The "any-variant", "all-variants", and "only-variants"
+ Attributes
+
+ An action may contain one of the OPTIONAL attributes "any-variant",
+ "all-variants", or "only-variants" defining triggers based on variant
+ types. The permitted value for these attributes consists of one or
+ more variant type values, separated by spaces. These MAY include
+ type values that are not used in any "var" element in the LGR. When
+ a variant label is generated, these variant type values are compared
+ to the set of type values on the variant mappings used to generate
+ the particular variant label (see Section 8).
+
+ Any single match may trigger an action that contains an "any-variant"
+ attribute, while for an "all-variants" or "only-variants" attribute,
+ the variant type for all variant code points must match one or
+ several of the type values specified in the attribute to trigger the
+ action. There is no requirement that the entire list of variant type
+ values be matched, as long as all variant code points match at least
+ one of the values.
+
+ An "only-variants" attribute will trigger the action only if all code
+ points of the variant label have variant mappings from the original
+ code points. In other words, the label contains no original code
+ points other than those with a reflexive mapping (see Section 5.3.4).
+
+ <char cp="0078" comment="x">
+   <var cp="0078" type="allocatable" comment="reflexive" />
+   <var cp="0079" type="blocked" />
+ </char>
+ <char cp="0079" comment="y">
+   <var cp="0078" type="allocatable" />
+ </char>
+ ...
+ <action disp="blocked" any-variant="blocked" />
+ <action disp="allocatable" only-variants="allocatable" />
+ <action disp="some-disp" any-variant="allocatable" />
+
+ In the example above, the label "xx" would have variant labels "xx",
+ "xy", "yx", and "yy". The first action would result in blocking any
+ variant label containing "y", because the variant mapping from "x" to
+ "y" is of type "blocked", triggering the "any-variant" condition.
+ Because in this example "x" has a reflexive variant mapping to itself
+ of type "allocatable", the original label "xx" has a reflexive
+ variant "xx" that would trigger the "only-variants" condition on the
+ second action.
+
+ A label "yy" would have the variants "xy", "yx", and "xx". Because
+ the variant mapping from "y" to "x" is of type "allocatable" and a
+ mapping from "y" to "y" is not defined, the labels "xy" and "yx"
+ trigger the "any-variant" condition on the third action. The variant
+ "xx", being generated using the mapping from "y" to "x" of type
+ "allocatable", would trigger the "only-variants" condition on the
+ second action. As there is no reflexive variant "yy", the original
+ label "yy" cannot trigger any variant type triggers. However, it
+ could still trigger an action defined as matching or not matching
+ a rule.
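+
+ The walk-through above can be reproduced with a short program. The
+ Python sketch below is illustrative only and is not the normative
+ procedure of Section 8: all names are hypothetical, single-character
+ strings stand for code points, and it assumes that "all-variants"
+ and "only-variants" require at least one variant mapping to have
+ been used.
+
```python
from itertools import product

# Variant mappings from the example: source -> {target: type}.
MAPPINGS = {"x": {"x": "allocatable", "y": "blocked"},
            "y": {"x": "allocatable"}}

# (disposition, trigger attribute, listed type values), in document order.
ACTIONS = [
    ("blocked", "any-variant", {"blocked"}),
    ("allocatable", "only-variants", {"allocatable"}),
    ("some-disp", "any-variant", {"allocatable"}),
]

def variants(label):
    """Yield (variant label, per-position types); None marks an original."""
    options = []
    for cp in label:
        choices = list(MAPPINGS.get(cp, {}).items())
        if cp not in MAPPINGS.get(cp, {}):
            choices.append((cp, None))   # no reflexive mapping: keep original
        options.append(choices)
    for combo in product(*options):
        yield "".join(t for t, _ in combo), [ty for _, ty in combo]

def triggered(kind, wanted, types):
    used = [t for t in types if t is not None]
    if kind == "any-variant":
        return any(t in wanted for t in used)
    # "only-variants" additionally forbids original code points.
    if kind == "only-variants" and len(used) != len(types):
        return False
    # Assumption: at least one variant mapping must have been used.
    return bool(used) and all(t in wanted for t in used)

def disposition(types):
    for disp, kind, wanted in ACTIONS:
        if triggered(kind, wanted, types):
            return disp
    return None                          # no variant type trigger fired

from_xx = {label: disposition(types) for label, types in variants("xx")}
from_yy = {label: disposition(types) for label, types in variants("yy")}
```
+
+ For the original label "xx", only the reflexive variant "xx" comes
+ out as "allocatable"; for "yy", the variants "xy" and "yx" receive
+ "some-disp", "xx" is "allocatable", and "yy" triggers nothing.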
+
+ In each action, one variant type trigger may be present by itself or
+ in conjunction with an attribute matching or not matching a rule. If
+ variant triggers and rule-matching triggers are used together, the
+ label MUST "match" or respectively "not-match" the specified rule AND
+ satisfy the conditions on the variant type values given by the
+ "any-variant", "all-variants", or "only-variants" attribute.
+
+ A useful convention combines the "any-variant" trigger with reflexive
+ variant mappings (Section 5.3.4). This convention is used, for
+ example, when multiple LGRs with overlapping repertoires are defined
+ within the same registry. In some cases, the delegation of a label
+ from one LGR must prohibit the delegation of another label in some
+ other LGR. This can be done using a variant of type "blocked" as in
+ this example from an Armenian LGR, where the Armenian, Latin, and
+ Cyrillic letters all look identical:
+
+ <char cp="0570" comment="ARMENIAN SMALL LETTER HO">
+   <var cp="0068" type="blocked" comment="LATIN SMALL LETTER H" />
+   <var cp="04BB" type="blocked"
+        comment="CYRILLIC SMALL LETTER SHHA" />
+ </char>
+
+ The issue is that the target code points for these two variants are
+ both outside the Armenian repertoire. By using a reflexive variant
+ with the following convention:
+
+ <char cp="0068" comment="not part of repertoire">
+   <var cp="0068" type="out-of-repertoire-var"
+        comment="reflexive mapping" />
+   <var cp="04BB" type="blocked" />
+   <var cp="0570" type="blocked" />
+ </char>
+ ...
+
+ and associating this with an action of the form:
+
+ <action disp="invalid" any-variant="out-of-repertoire-var" />
+
+ it is possible to list the symmetric and transitive variant mappings
+ in the LGR even where they involve out-of-repertoire code points. By
+ associating the action shown with the special type for these
+ reflexive mappings, any original labels containing one or more of the
+ out-of-repertoire code points are filtered out, just as if these code
+ points had not been listed in the LGR in the first place.
+ Nevertheless, they do participate in the permutation of variant
+ labels for in-repertoire labels (Armenian in the example), and these
+ permuted variants can be used to detect collisions with
+ out-of-repertoire labels (see Section 8).
+
+7.2.2. Example from Tables in the Style of RFC 3743
+
+ This section gives an example of using variant type triggers,
+ combined with variants with reflexive mappings (Section 5.3.4), to
+ achieve LGRs that implement tables like those defined according to
+ [RFC3743] where the goal is to allow as variants only labels that
+ consist entirely of simplified or traditional variants, in addition
+ to the original label.
+
+ This example assumes an LGR where all variants have been given
+ suitable "type" attributes of "blocked", "simplified", "traditional",
+ or "both", similar to the ones discussed in Appendix B. Given such
+ an LGR, the following example actions evaluate the disposition for
+ the variant label:
+
+ <action disp="blocked" any-variant="blocked" />
+ <action disp="allocatable" only-variants="simplified both" />
+ <action disp="allocatable" only-variants="traditional both" />
+ <action disp="blocked" all-variants="simplified traditional" />
+ <action disp="allocatable" />
+
+ The first action matches any variant label for which at least one of
+ the code point variants is of type "blocked". The second matches any
+ variant label for which all of the code point variants are of type
+ "simplified" or "both" -- in other words, an all-simplified label.
+ The third matches any label for which all variants are of type
+ "traditional" or "both" -- that is, all traditional. These two
+ actions are not triggered by any variant labels containing some
+ original code points, unless each of those code points has a variant
+ defined with a reflexive mapping (Section 5.3.4).
+
+ The final two actions rely on the fact that actions are evaluated in
+ sequence and that the first action triggered also defines the final
+ disposition for a variant label (see Section 7.4). They further rely
+ on the assumption that the only variants with type "both" are also
+ reflexive variants.
+
+ Given these assumptions, any remaining simplified or traditional
+ variants must then be part of a mixed label and so are blocked; all
+ labels surviving to the last action are original code points only
+ (that is, the original label). The example assumes that an original
+ label may be a mixed label; if that is not the case, the disposition
+ for the last action would be set to "blocked".
+
+ There are exceptions where the assumption on reflexive mappings made
+ above does not hold, so this basic scheme needs some refinements to
+ cover all cases. For a more complete example, see Appendix B.
+
+7.3. Recommended Disposition Values
+
+ The precise nature of the policy action taken in response to a
+ disposition and the name of the corresponding "disp" attributes are
+ only partially defined here. It is strongly RECOMMENDED to use the
+ following dispositions only in their conventional sense.
+
+ invalid The resulting string is not a valid label. This disposition
+ may be assigned implicitly; see Section 7.5. No variant labels
+ should be generated from a variant mapping with this type.
+
+ blocked The resulting string is a valid label but should be blocked
+ from registration. This would typically apply for a derived
+ variant that is undesirable due to having no practical use or
+ being confusingly similar to some other label.
+
+ allocatable The resulting string should be reserved for use by the
+ same operator of the origin string but not automatically
+ allocated for use.
+
+ activated The resulting string should be activated for use. (This
+ is the same as a Preferred Variant [RFC3743].)
+
+ valid The resulting string is a valid label. (This is the typical
+ default action if no dispositions are defined.)
+
+7.4. Precedence
+
+ Actions are applied in the order of their appearance in the file.
+ This defines their relative precedence. The first action triggered
+ by a label defines the disposition for that label. To define the
+ order of precedence, list the actions in the desired order. The
+ conventional order of precedence for the actions defined in
+ Section 7.3 is "invalid", "blocked", "allocatable", "activated", and
+ then "valid". This default precedence is used for the default
+ actions defined in Section 7.6.
+
+7.5. Implied Actions
+
+ The context rules on code points ("not-when" or "when" rules) carry
+ an implied action with a disposition of "invalid" (not eligible) if a
+ "when" context is not satisfied or a "not-when" context is matched,
+ respectively. These rules are evaluated at the time the code points
+ for a label or its variant labels are checked for validity (see
+ Section 8). In other words, they are evaluated before any of the
+ actions are applied, and with higher precedence. The context rules
+ for variant mappings are evaluated when variants are generated and/or
+ when variant tables are made symmetric and transitive. They have an
+ implied action with a disposition of "invalid", which means that a
+ putative variant mapping does not exist whenever the given context
+ matches a "not-when" rule or fails to match a "when" rule specified
+ for that mapping. The result of that disposition is that the variant
+ mapping is ignored in generating variant labels and the value is
+ therefore not accessible to trigger any explicit actions.
+
+ Note that such a nonexistent variant mapping is different from a
+ blocked variant, which is a variant code point mapping that exists
+ but results in a label that may not be allocated.
+
+7.6. Default Actions
+
+ If a label does not trigger any of the actions defined explicitly in
+ the LGR, the following implicitly defined default actions are
+ evaluated. They are shown below in their relative order of
+ precedence (see Section 7.4). Default actions have a lower order of
+ precedence than explicit actions (see Section 8.3).
+
+ The default actions for variant labels are defined as follows. The
+ first set is triggered based on the standard variant type values of
+ "invalid", "blocked", "allocatable", and "activated":
+
+ <action disp="invalid" any-variant="invalid"/>
+ <action disp="blocked" any-variant="blocked"/>
+ <action disp="allocatable" any-variant="allocatable"/>
+ <action disp="activated" all-variants="activated"/>
+
+ A final default action sets the disposition to "valid" for any label
+ matching the repertoire for which no other action has been triggered.
+ This "catch-all" action also matches all remaining variant labels
+ from variants that do not have a type value.
+
+ <action disp="valid" comment="Catch-all if other rules not met"/>
+
+ Conceptually, the implicitly defined default actions act just like a
+ block of "action" elements that is added (virtually) beyond the last
+ of the user-supplied actions. Any label not processed by the
+ user-supplied actions would thus be processed by the default actions
+ as if they were present in the LGR. As the last default action is a
+ "catch-all", all processing is guaranteed to end with a definite
+ disposition for the label.
+
+8. Processing a Label against an LGR
+
+8.1. Determining Eligibility for a Label
+
+ In order to test a given label for membership in the LGR, a consumer
+ of the LGR must iterate through each code point within a given label
+ and test that each instance of a code point is a member of the LGR.
+ If any instance of a code point is not a member of the LGR, the label
+ shall be deemed invalid.
+
+ An individual instance of a code point is deemed a member of the LGR
+ when it is listed using a "char" element, or is part of a range
+ defined with a "range" element, and all necessary conditions in any
+ "when" or "not-when" attributes are correctly satisfied for that
+ instance.
+
+ Alternatively, an instance of a code point is also deemed a member of
+ the LGR when it forms part of a sequence that corresponds to a
+ sequence listed using a "char" element for which the "cp" attribute
+ defines a sequence, and all necessary conditions in any "when" or
+ "not-when" attributes are correctly satisfied for that instance of
+ the sequence.
+
+ In determining eligibility, at each position the longest possible
+ sequence of code points is evaluated first. If that sequence matches
+ a sequence defined in the LGR and satisfies any required context at
+ that position, the instances of its constituent code points are
+ deemed members of the LGR and evaluation proceeds with the next code
+ point following the sequence. If the sequence does not match a
+ defined sequence or does not satisfy the required context,
+ successively shorter sequences are evaluated until only a single code
+ point remains. The eligibility of that code point is determined as
+ described above for an individual code point instance.
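+
+ As an informal illustration, the longest-sequence-first procedure
+ described above might be sketched in Python as follows. This
+ sketch is not part of the specification: the function name and the
+ representation of the repertoire as a set of code point tuples are
+ assumptions of the example, and context rules ("when" and
+ "not-when") are deliberately left out.
+

```python
# Illustrative sketch only; context rules ("when"/"not-when") are
# omitted, and the repertoire layout is an assumption.
def is_eligible(label, repertoire):
    """Check a label against a repertoire of code point tuples.

    label: sequence of integer code points.
    repertoire: set of tuples; a single code point is a 1-tuple.
    """
    longest = max((len(seq) for seq in repertoire), default=0)
    pos = 0
    while pos < len(label):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(longest, len(label) - pos), 0, -1):
            if tuple(label[pos:pos + length]) in repertoire:
                pos += length
                break
        else:
            return False  # no sequence or code point matched here
    return True
```
+
+ With a repertoire that defines the sequence 006C 00B7 006C but not
+ 006C on its own, the sketch accepts the sequence and rejects a
+ lone 006C.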
+
+ A label must also not trigger any action that results in a
+ disposition of "invalid"; otherwise, it is deemed not eligible.
+ (This step may need to be deferred until variant code point
+ dispositions have been determined.)
+
+8.1.1. Determining Eligibility Using Reflexive Variant Mappings
+
+ For LGRs that contain reflexive variant mappings (defined in
+ Section 5.3.4), the final evaluation of eligibility for the label
+ must be deferred until variants are generated. In essence, LGRs that
+ use this feature treat the original label as the (identity) variant
+ of itself. For such LGRs, the ordinary determination of eligibility
+ described here is but a first step that generally excludes only a
+ subset of invalid labels.
+
+ To further check the validity of a label with reflexive mappings, it
+ is not necessary to generate all variant labels. Only a single
+ variant needs to be created, where any reflexive variants are applied
+ for each code point, and the label disposition is evaluated (as
+ described in Section 8.3). A disposition of "invalid" results in the
+ label being not eligible. (In the exceptional case where context
+ rules are present on reflexive mappings, multiple reflexive variants
+ may be defined, but for each original label, at most one of these can
+ be valid at each code position. However, see Section 8.4.)
+
+8.2. Determining Variants for a Label
+
+ For a given eligible label, the set of variant labels is deemed to
+ consist of each possible permutation of original code points and
+ substituted code points or sequences defined in "var" elements,
+ whereby all "when" and "not-when" attributes are correctly satisfied
+ for each "char" or "var" element in the given permutation and all
+ applicable whole label rules are satisfied as follows:
+
+ 1. Create each possible permutation of a label by substituting each
+ code point or code point sequence in turn by any defined variant
+ mapping (including any reflexive mappings).
+
+ 2. Apply variant mappings with "when" or "not-when" attributes only
+ if the conditions are satisfied; otherwise, they are not defined.
+
+ 3. Record each of the "type" values on the variant mappings used in
+ creating a given variant label in a disposition set; for any
+ unmapped code point, record the "type" value of any reflexive
+ variant (see Section 5.3.4).
+
+ 4. Determine the disposition for each variant label per Section 8.3.
+
+ 5. If the disposition is "invalid", remove the label from the set.
+
+ 6. If final evaluation of the disposition for the unpermuted label
+ per Section 8.3 results in a disposition of "invalid", remove all
+ associated variant labels from the set.
+
+ The number of potential permutations can be very large. In practice,
+ implementations would use suitable optimizations to avoid having to
+ actually create all permutations (see Section 8.5).
+
+ In determining the permuted set of variant labels in step (1) above,
+ all eligible partitions into sequences must be evaluated. A label
+ "ab" that matches a sequence "ab" defined in the LGR but also matches
+ the sequence of individual code points "a" and "b" (both defined in
+ the LGR) must be permuted using any defined variant mappings for both
+ the sequence "ab" and the code points "a" and "b" individually.
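+
+ The brute-force form of steps 1 and 3, including the evaluation of
+ all partitions into sequences, might be sketched in Python as
+ follows. This is an informal illustration under stated
+ assumptions: the function name and data layout are hypothetical,
+ context rules and whole label rules are omitted, and the original
+ label is yielded along with its variants.
+

```python
# Illustrative sketch only; context rules and whole label rules are
# omitted, and the data layout is an assumption of this example.
def variant_labels(label, repertoire, variants):
    """Yield (variant_label, recorded_types) over all partitions.

    label: tuple of code points.
    repertoire: set of tuples (code points and sequences).
    variants: dict mapping a source tuple to a list of
        (target_tuple, type) pairs.
    """
    if not label:
        yield (), ()
        return
    for length in range(1, len(label) + 1):
        head, tail = label[:length], label[length:]
        if head not in repertoire:
            continue
        choices = list(variants.get(head, []))
        # Keep the original unless a reflexive mapping covers it.
        if all(target != head for target, _type in choices):
            choices.append((head, None))
        for target, vtype in choices:
            recorded = () if vtype is None else (vtype,)
            for rest, types in variant_labels(tail, repertoire, variants):
                yield target + rest, recorded + types
```
+
+ Because the generator enumerates every combination, it is suitable
+ only for short labels; it is meant to make the definition
+ concrete, not to suggest an implementation strategy.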
+
+8.3. Determining a Disposition for a Label or Variant Label
+
+ For a given label (variant or original), its disposition is
+ determined by evaluating, in order of their appearance, all actions
+ for which the label or variant label satisfies the conditions.
+
+ 1. For any label that contains code points or sequences not defined
+ in the repertoire, or does not satisfy the context rules on all
+ of its code points and variants, the disposition is "invalid".
+
+ 2. For all other labels, the disposition is given by the value of
+ the "disp" attribute for the first action triggered by the label.
+ An action is triggered if all of the following are true:
+
+ * the label matches the whole label rule given in the "match"
+ attribute for that action;
+
+ * the label does not match the whole label rule given in the
+ "not-match" attribute for that action;
+
+ * any of the recorded variant types for a variant label match
+ the types given in the "any-variant" attribute for that
+ action;
+
+ * all of the recorded variant types for a variant label match
+ the types given in the "all-variants" or "only-variants"
+ attribute given for that action;
+
+ * in case of an "only-variants" attribute, the label contains
+ only code points that are the target of applied variant
+ mappings;
+
+ or
+
+ * the action does not contain any "match", "not-match",
+ "any-variant", "all-variants", or "only-variants" attributes:
+ catch-all.
+
+ 3. For any remaining variant label, assign the variant label the
+ disposition using the default actions defined in Section 7.6.
+ For this step, variant types outside the predefined recommended
+ set (see Section 7.3) are ignored.
+
+ 4. For any remaining label, set the disposition to "valid".
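+
+ Steps 2 through 4 might be sketched in Python as follows. This is
+ an informal illustration, not a normative algorithm: the
+ dictionary keys mirror the "action" attributes, whole label rules
+ are modelled as plain predicates, "only-variants" is not handled,
+ and the label is assumed to have passed step 1 already. Treating
+ an empty set of recorded types as not triggering "all-variants" is
+ an assumption of the sketch.
+

```python
# Illustrative sketch only; the dictionary keys mirror the "action"
# attributes, while whole label rules are modelled as plain
# predicates, which is an assumption of this example.
STANDARD = {"invalid", "blocked", "allocatable", "activated"}
DEFAULTS = [
    {"disp": "invalid", "any-variant": {"invalid"}},
    {"disp": "blocked", "any-variant": {"blocked"}},
    {"disp": "allocatable", "any-variant": {"allocatable"}},
    {"disp": "activated", "all-variants": {"activated"}},
    {"disp": "valid"},  # catch-all
]


def triggers(action, label, types, rules):
    """Return True if every condition present on the action holds."""
    if "match" in action and not rules[action["match"]](label):
        return False
    if "not-match" in action and rules[action["not-match"]](label):
        return False
    if "any-variant" in action:
        if not any(t in action["any-variant"] for t in types):
            return False
    if "all-variants" in action:
        # Assumption: at least one variant type must be recorded.
        if not types or not all(t in action["all-variants"] for t in types):
            return False
    return True


def disposition(label, types, actions, rules):
    """Return the "disp" of the first action triggered by the label."""
    for action in actions:
        if triggers(action, label, types, rules):
            return action["disp"]
    # Step 3: non-standard variant types are ignored for defaults.
    standard = [t for t in types if t in STANDARD]
    for action in DEFAULTS:
        if triggers(action, label, standard, {}):
            return action["disp"]
```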
+
+8.4. Duplicate Variant Labels
+
+ For a poorly designed LGR, it is possible to generate duplicate
+ variant labels from the same input label, but with different, and
+ potentially conflicting, dispositions. Implementations MUST treat
+ any duplicate variant labels encountered as an error, irrespective of
+ their dispositions.
+
+ This situation can arise in two ways. One is described in
+ Section 5.3.5 and involves defining the same variant mapping with two
+ context rules that are formally distinct but nevertheless overlap so
+ that they are not mutually exclusive for the same label.
+
+ The other case involves variants defined for sequences, where one
+ sequence is a prefix of another (see Section 5.3.1). The following
+ shows such an example resulting in conflicting reflexive variants:
+
+ <char cp="0061">
+   <var cp="0061" type="allocatable"/>
+ </char>
+ <char cp="0062"/>
+ <char cp="0061 0062">
+   <var cp="0061 0062" type="blocked"/>
+ </char>
+
+ A label "ab" would generate the variant labels "{a}{b}" and "{ab}"
+ where the curly braces show the sequence boundaries as they were
+ applied during variant mapping. The result is a duplicate variant
+ label "ab", one based on a variant of type "allocatable" plus an
+ original code point "b" that has no variant, and another one based on
+ a single variant of type "blocked", thus creating two variant labels
+ with conflicting dispositions.
+
+ In the general case, it is difficult, if not impossible, to prove by
+ mechanical inspection of the LGR that duplicate variant labels will
+ never occur, so implementations have to be prepared to detect this
+ error during variant label generation. The condition is easily
+ avoided by careful design of context rules and special attention to
+ the relation among code point sequences with variants.
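+
+ Detection during generation can be as simple as remembering the
+ variant labels already produced. The following Python sketch is
+ an informal illustration; the function name and the (label, types)
+ pair layout are assumptions of the example. An implementation
+ would treat a non-empty result as an error.
+

```python
# Illustrative sketch only; it expects (label, types) pairs such as
# a variant generator might produce.
def find_duplicates(generated):
    """Return labels that were generated more than once."""
    seen = set()
    duplicates = []
    for label, _types in generated:
        if label in seen and label not in duplicates:
            duplicates.append(label)
        seen.add(label)
    return duplicates
```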
+
+8.5. Checking Labels for Collision
+
+ The obvious method for checking for collision between labels is to
+ generate the fully permuted set of variants for one of them and see
+ whether it contains the other label as a member. As discussed above,
+ this can be prohibitive and is not necessary.
+
+ Because of symmetry and transitivity, all variant mappings form
+ disjoint sets. In each of these sets, the source and target of each
+ mapping are also variants of the sources and targets of all the other
+ mappings. However, members of two different sets are never variants
+ of each other.
+
+ If two labels have code points at the same position that are members
+ of two different variant mapping sets, any variant labels of one
+ cannot be variant labels of the other: the sets of their variant
+ labels are likewise disjoint. Instead of generating all permutations
+ to compare all possible variants, it is enough to find out whether
+ code points at the same position belong to the same variant set
+ or not.
+
+ For that, it is sufficient to substitute an "index" mapping that
+ identifies the set. This index mapping could be, for example, the
+ variant mapping for which the target code point (or sequence) comes
+ first in some sorting order. This index mapping would, in effect,
+ identify the set of variant mappings for that position.
+
+ To check for collision then means generating a single variant label
+ from the original by substituting the respective "index" value for
+ each code point. This results in an "index label". Two labels
+ collide whenever the index labels for them are the same.
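+
+ The index label technique might be sketched in Python as follows.
+ This is an informal illustration limited to single code points;
+ the function names are hypothetical, the variant sets are assumed
+ to be symmetric and transitive already, and the smallest code
+ point of each set is chosen as the index value, which is only one
+ possible sorting order.
+

```python
# Illustrative sketch only; single code points only, and the
# mapping table is assumed to be symmetric and transitive already.
def build_index(variant_sets):
    """Map each code point to the smallest member of its set."""
    index = {}
    for members in variant_sets:
        representative = min(members)
        for cp in members:
            index[cp] = representative
    return index


def index_label(label, index):
    """Substitute the index value for each code point."""
    return tuple(index.get(cp, cp) for cp in label)


def collide(label_a, label_b, index):
    """Two labels collide when their index labels are equal."""
    return index_label(label_a, index) == index_label(label_b, index)
```
+
+ The cost is linear in the length of the labels, regardless of how
+ many variant labels the full permutation would contain.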
+
+9. Conversion to and from Other Formats
+
+ Both [RFC3743] and [RFC4290] provide different grammars for IDN
+ tables. The formats in those documents are unable to fully support
+ the increased requirements of contemporary IDN variant policies.
+
+ This specification is a superset of functionality provided by the
+ older IDN table formats; thus, any table expressed in those formats
+ can be expressed in this new format. Automated conversion can be
+ conducted between tables conformant with the grammar specified in
+ each document.
+
+ For notes on how to translate a table in the style of RFC 3743, see
+ Appendix B.
+
+10. Media Type
+
+ Well-formed LGRs that comply with this specification SHOULD be
+ transmitted with a media type of "application/lgr+xml". This media
+ type will signal to an LGR-aware client that the content is designed
+ to be interpreted as an LGR.
+
+11. IANA Considerations
+
+ IANA has completed the following actions:
+
+11.1. Media Type Registration
+
+ The media type "application/lgr+xml" has been registered to denote
+ transmission of LGRs that are compliant with this specification, in
+ accordance with [RFC6838].
+
+ Type name: application
+
+ Subtype name: lgr+xml
+
+ Required parameters: N/A
+
+ Optional parameters: charset (as for application/xml per [RFC7303])
+
+ Security considerations: See the security considerations for
+ application/xml in [RFC7303] and the specific security
+ considerations for Label Generation Rulesets (LGRs) in RFC 7940
+
+ Interoperability considerations: As for application/xml per
+ [RFC7303]
+
+ Published specification: See RFC 7940
+
+ Applications that use this media type: Software using LGRs for
+ international identifiers, such as IDNs, including registry
+ applications and client validators.
+
+ Additional information:
+
+ Deprecated alias names for this type: N/A
+
+ Magic number(s): N/A
+
+ File extension(s): .lgr
+
+ Macintosh file type code(s): N/A
+
+ Person & email address to contact for further information:
+
+ Kim Davies <kim.davies@icann.org>
+
+ Asmus Freytag <asmus@unicode.org>
+
+ Intended usage: COMMON
+
+ Restrictions on usage: N/A
+
+ Author:
+
+ Kim Davies <kim.davies@icann.org>
+
+ Asmus Freytag <asmus@unicode.org>
+
+ Change controller: IESG
+
+ Provisional registration? (standards tree only): No
+
+11.2. URN Registration
+
+ This specification uses a URN to describe the XML namespace, in
+ accordance with [RFC3688].
+
+ URI: urn:ietf:params:xml:ns:lgr-1.0
+
+ Registrant Contact: See the Authors of this document.
+
+ XML: None.
+
+11.3. Disposition Registry
+
+ This document establishes a vocabulary of "Label Generation Ruleset
+ Dispositions", which has been reflected as a new IANA registry. This
+ registry is divided into two subregistries:
+
+ o Standard Dispositions - This registry lists dispositions that have
+ been defined in published specifications, i.e., the eligibility
+ for such registrations is "Specification Required" [RFC5226]. The
+ initial set of registrations consists of the five dispositions
+ described in Section 7.3 of this document.
+
+ o Private Dispositions - This registry lists dispositions that have
+ been registered "First Come First Served" [RFC5226] by third
+ parties with the IANA. Such dispositions must take the form
+ "entity:disposition" where the entity is a domain name that
+ uniquely identifies the private user of the namespace. For
+ example, "example.org:reserved" could be a private extension used
+ by the example organization to denote a disposition relating to
+ reserved labels. These extensions are not intended to be
+ interoperable, but registration is designed to minimize potential
+ conflicts. It is strongly recommended that any new dispositions
+ that require interoperability and have applicability beyond a
+ single organization be defined as Standard Dispositions.
+
+ In order to distinguish them from Private Dispositions, Standard
+ Dispositions MUST NOT contain the ":" character. All disposition
+ names shall be in lowercase ASCII.
+
+ The IANA registry provides data on the name of the disposition, the
+ intended purposes, and the registrant or defining specification for
+ the disposition.
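+
+ A processor could distinguish the two kinds of disposition names
+ syntactically. The following Python sketch is an informal
+ illustration; the function name is hypothetical, and the exact
+ character set accepted (lowercase ASCII letters, digits, and
+ hyphen, with dots in the entity part) is an assumption that goes
+ beyond what this document states.
+

```python
import re

# Illustrative sketch only; the exact character set permitted in a
# disposition name is an assumption (lowercase ASCII letters,
# digits, and hyphen, with dots allowed in the entity part).
STANDARD_RE = re.compile(r"[a-z0-9-]+")
PRIVATE_RE = re.compile(r"[a-z0-9-]+(\.[a-z0-9-]+)+:[a-z0-9-]+")


def classify_disposition(name):
    """Return "standard", "private", or None if malformed."""
    if STANDARD_RE.fullmatch(name):
        return "standard"
    if PRIVATE_RE.fullmatch(name):
        return "private"
    return None
```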
+
+12. Security Considerations
+
+12.1. LGRs Are Only a Partial Remedy for Problem Space
+
+ Substantially unrestricted use of non-ASCII characters in security-
+ relevant identifiers such as domain name labels may cause user
+ confusion and invite various types of attacks. In many languages, in
+ particular those using complex or large scripts, an attacker has an
+ opportunity to divert or confuse users as a result of different code
+ points with identical appearance or similar semantics.
+
+ The use of an LGR provides a partial remedy for these risks by
+ supplying a framework for prohibiting inappropriate code points or
+ sequences from being registered at all and for permitting "variant"
+ code points to be grouped together so that labels containing them may
+ be mutually exclusive or registered only to the same owner.
+
+ In addition, by being fully machine processable the format may enable
+ automated checks for known weaknesses in label generation rules.
+ However, the use of this format, or compliance with this
+ specification, by itself does not ensure that the LGRs expressed in
+ this format are free of risk. Additional approaches may be
+ considered, depending on the acceptable trade-off between flexibility
+ and risk for a given application. One method of managing risk may
+ involve a case-by-case evaluation of a proposed label in context with
+ already-registered labels -- for example, when reviewing labels for
+ their degree of visual confusability.
+
+12.2. Computational Expense of Complex Tables
+
+ A naive implementation attempting to generate all variant labels for
+ a given label could exhaust the resources of the machine running the
+ LGR processor, potentially causing a denial of service. For many
+ operations, brute-force generation can be avoided by optimization,
+ and if needed, the number of permuted labels can be estimated more
+ cheaply ahead of time.
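+
+ One inexpensive estimate multiplies the number of choices
+ available at each position of the label. The following Python
+ sketch is an informal illustration; the function name and data
+ layout are assumptions, and because sequences and context rules
+ are ignored, the result is only a rough estimate.
+

```python
from math import prod

# Illustrative sketch only; sequences and context rules are ignored,
# so the result is only a rough estimate, not an exact count.
def estimate_variant_count(label, variants):
    """Multiply the number of choices available at each position.

    variants: dict mapping a code point to the set of its variant
    code points (a reflexive mapping lists the code point itself).
    """
    return prod(len(variants.get(cp, set()) | {cp}) for cp in label)
```
+
+ A processor could compare this figure against a configured limit
+ before attempting to enumerate any variant labels.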
+
+ The implementation of WLE rules, using certain backtracking
+ algorithms, can take exponential time for pathological rules or
+ labels and exhaust stack resources. This can be mitigated by
+ proper implementation and by enforcing the restrictions on
+ permissible label length.
+
+13. References
+
+13.1. Normative References
+
+ [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
+ Extensions (MIME) Part One: Format of Internet Message
+ Bodies", RFC 2045, DOI 10.17487/RFC2045, November 1996,
+ <http://www.rfc-editor.org/info/rfc2045>.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119,
+ DOI 10.17487/RFC2119, March 1997,
+ <http://www.rfc-editor.org/info/rfc2119>.
+
+ [RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet:
+ Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002,
+ <http://www.rfc-editor.org/info/rfc3339>.
+
+ [RFC5646] Phillips, A., Ed., and M. Davis, Ed., "Tags for
+ Identifying Languages", BCP 47, RFC 5646,
+ DOI 10.17487/RFC5646, September 2009,
+ <http://www.rfc-editor.org/info/rfc5646>.
+
+ [UAX42] The Unicode Consortium, "Unicode Character Database in
+ XML", May 2016, <http://unicode.org/reports/tr42/>.
+
+ [Unicode-Stability]
+ The Unicode Consortium, "Unicode Encoding Stability
+ Policy, Property Value Stability", April 2015,
+ <http://www.unicode.org/policies/
+ stability_policy.html#Property_Value>.
+
+ [Unicode-Versions]
+ The Unicode Consortium, "Unicode Version Numbering",
+ June 2016,
+ <http://unicode.org/versions/#Version_Numbering>.
+
+ [XML] Bray, T., Paoli, J., Sperberg-McQueen, M., Maler, E., and
+ F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth
+ Edition)", World Wide Web Consortium, November 2008,
+ <http://www.w3.org/TR/REC-xml/>.
+
+13.2. Informative References
+
+ [ASIA-TABLE]
+ DotAsia Organisation, ".ASIA ZH IDN Language Table",
+ February 2012,
+ <http://www.dot.asia/policies/ASIA-ZH-1.2.pdf>.
+
+ [LGR-PROCEDURE]
+ Internet Corporation for Assigned Names and Numbers,
+ "Procedure to Develop and Maintain the Label Generation
+ Rules for the Root Zone in Respect of IDNA Labels",
+ December 2012, <http://www.icann.org/en/resources/idn/
+ draft-lgr-procedure-07dec12-en.pdf>.
+
+ [RELAX-NG] The Organization for the Advancement of Structured
+ Information Standards (OASIS), "RELAX NG Compact Syntax",
+ November 2002, <https://www.oasis-open.org/committees/
+ relax-ng/compact-20021121.html>.
+
+ [RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
+ DOI 10.17487/RFC3688, January 2004,
+ <http://www.rfc-editor.org/info/rfc3688>.
+
+ [RFC3743] Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
+ Engineering Team (JET) Guidelines for Internationalized
+ Domain Names (IDN) Registration and Administration for
+ Chinese, Japanese, and Korean", RFC 3743,
+ DOI 10.17487/RFC3743, April 2004,
+ <http://www.rfc-editor.org/info/rfc3743>.
+
+ [RFC4290] Klensin, J., "Suggested Practices for Registration of
+ Internationalized Domain Names (IDN)", RFC 4290,
+ DOI 10.17487/RFC4290, December 2005,
+ <http://www.rfc-editor.org/info/rfc4290>.
+
+ [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
+ IANA Considerations Section in RFCs", BCP 26, RFC 5226,
+ DOI 10.17487/RFC5226, May 2008,
+ <http://www.rfc-editor.org/info/rfc5226>.
+
+ [RFC5564] El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
+ "Linguistic Guidelines for the Use of the Arabic Language
+ in Internet Domains", RFC 5564, DOI 10.17487/RFC5564,
+ February 2010, <http://www.rfc-editor.org/info/rfc5564>.
+
+ [RFC5891] Klensin, J., "Internationalized Domain Names in
+ Applications (IDNA): Protocol", RFC 5891,
+ DOI 10.17487/RFC5891, August 2010,
+ <http://www.rfc-editor.org/info/rfc5891>.
+
+ [RFC5892] Faltstrom, P., Ed., "The Unicode Code Points and
+ Internationalized Domain Names for Applications (IDNA)",
+ RFC 5892, DOI 10.17487/RFC5892, August 2010,
+ <http://www.rfc-editor.org/info/rfc5892>.
+
+ [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type
+ Specifications and Registration Procedures", BCP 13,
+ RFC 6838, DOI 10.17487/RFC6838, January 2013,
+ <http://www.rfc-editor.org/info/rfc6838>.
+
+ [RFC7303] Thompson, H. and C. Lilley, "XML Media Types", RFC 7303,
+ DOI 10.17487/RFC7303, July 2014,
+ <http://www.rfc-editor.org/info/rfc7303>.
+
+ [TDIL-HINDI]
+ Technology Development for Indian Languages (TDIL)
+ Programme, "Devanagari Script Behaviour for Hindi Ver2.0",
+ <http://tdil-dc.in/index.php?option=com_download&task=show
+ resourceDetails&toolid=1625&lang=en>.
+
+ [UAX44] The Unicode Consortium, "Unicode Character Database",
+ June 2016, <http://unicode.org/reports/tr44/>.
+
+ [WLE-RULES]
+ Internet Corporation for Assigned Names and Numbers,
+ "Whole Label Evaluation (WLE) Rules", August 2016,
+ <https://community.icann.org/download/
+ attachments/43989034/WLE-Rules.pdf>.
+
+Appendix A. Example Tables
+
+ The following presents a minimal LGR table defining the lowercase LDH
+ (letters, digits, hyphen) repertoire and containing no rules or
+ metadata elements. Many simple LGR tables will look quite similar,
+ except that they would contain some metadata.
+
+ <?xml version="1.0" encoding="utf-8"?>
+ <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
+   <data>
+     <char cp="002D" comment="HYPHEN (-)" />
+     <range first-cp="0030" last-cp="0039"
+            comment="DIGIT ZERO - DIGIT NINE" />
+     <range first-cp="0061" last-cp="007A"
+            comment="LATIN SMALL LETTER A - LATIN SMALL LETTER Z" />
+   </data>
+ </lgr>
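+
+ A table of this kind can be read with any namespace-aware XML
+ parser. The following Python sketch uses the standard library's
+ xml.etree.ElementTree module to collect the repertoire; it is an
+ informal illustration, the function name is hypothetical, and
+ "var" children, tags, and context rules are not handled.
+

```python
import xml.etree.ElementTree as ET

NS = {"lgr": "urn:ietf:params:xml:ns:lgr-1.0"}

# Illustrative sketch only; "var" children, tags, and context rules
# are not handled here.
def load_repertoire(xml_text):
    """Return the set of code point tuples defined in "data"."""
    root = ET.fromstring(xml_text)
    repertoire = set()
    for char in root.findall("lgr:data/lgr:char", NS):
        cps = tuple(int(cp, 16) for cp in char.get("cp").split())
        repertoire.add(cps)
    for rng in root.findall("lgr:data/lgr:range", NS):
        first = int(rng.get("first-cp"), 16)
        last = int(rng.get("last-cp"), 16)
        repertoire.update((cp,) for cp in range(first, last + 1))
    return repertoire
```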
+
+ In practice, any LGR that includes the hyphen might also contain
+ rules invalidating any label that begins with a hyphen, ends with a
+ hyphen, or contains consecutive hyphens in the third and fourth
+ positions, as required by [RFC5891].
+
+ <?xml version="1.0" encoding="utf-8"?>
+ <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
+   <data>
+     <char cp="002D"
+           not-when="hyphen-minus-disallowed" />
+     <range first-cp="0030" last-cp="0039" />
+     <range first-cp="0061" last-cp="007A" />
+   </data>
+   <rules>
+     <rule name="hyphen-minus-disallowed"
+           comment="RFC5891 restrictions on U+002D">
+       <choice>
+         <rule comment="no leading hyphen">
+           <look-behind>
+             <start />
+           </look-behind>
+           <anchor />
+         </rule>
+         <rule comment="no trailing hyphen">
+           <anchor />
+           <look-ahead>
+             <end />
+           </look-ahead>
+         </rule>
+         <rule comment="no consecutive hyphens
+               in third and fourth positions">
+           <look-behind>
+             <start />
+             <any />
+             <any />
+             <char cp="002D" comment="hyphen-minus" />
+           </look-behind>
+           <anchor />
+         </rule>
+       </choice>
+     </rule>
+   </rules>
+ </lgr>
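+
+ For readers less familiar with the rule syntax, the effect of the
+ "hyphen-minus-disallowed" rule can be restated procedurally. The
+ following Python sketch is an informal paraphrase of the three
+ alternatives in the "choice" element, operating on a label given
+ as a string; the function names are hypothetical, and the sketch
+ is not a general rule evaluator.
+

```python
# Illustrative sketch only; it mirrors the three alternatives of the
# "hyphen-minus-disallowed" rule for a label given as a string.
def hyphen_disallowed(label, pos):
    """Return True if the hyphen at index pos violates the rule."""
    if pos == 0:
        return True  # leading hyphen
    if pos == len(label) - 1:
        return True  # trailing hyphen
    # Fourth position, directly after a hyphen in the third.
    return pos == 3 and label[2] == "-"


def hyphens_ok(label):
    """Return True if no hyphen in the label is disallowed."""
    return not any(
        ch == "-" and hyphen_disallowed(label, i)
        for i, ch in enumerate(label)
    )
```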
+
+ The following sample LGR shows a more complete collection of the
+ elements and attributes defined in this specification in a somewhat
+ typical context.
+
+ <?xml version="1.0" encoding="utf-8"?>
+
+ <!-- This example uses a large subset of the features of this
+ specification. It does not include every set operator,
+ match operator element, or action trigger attribute, their
+ use being largely parallel to the ones demonstrated. -->
+
+ <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
+ <!-- meta element with all optional elements -->
+ <meta>
+ <version comment="initial version">1</version>
+ <date>2010-01-01</date>
+ <language>sv</language>
+ <scope type="domain">example.com</scope>
+ <validity-start>2010-01-01</validity-start>
+ <validity-end>2013-12-31</validity-end>
+ <description type="text/html">
+ <![CDATA[
+ This language table was developed with the
+ <a href="http://swedish.example/">Swedish
+ examples institute</a>.
+ ]]>
+ </description>
+ <unicode-version>6.3.0</unicode-version>
+ <references>
+ <reference id="0" comment="the most recent" >The
+ Unicode Standard 9.0</reference>
+ <reference id="1" >RFC 5892</reference>
+ <reference id="2" >Big-5: Computer Chinese Glyph
+ and Character Code Mapping Table, Technical Report
+ C-26, 1984</reference>
+ </references>
+ </meta>
+
+ <!-- the "data" section describing the repertoire -->
+ <data>
+ <!-- single code point "char" element -->
+ <char cp="002D" ref="1" comment="HYPHEN" />
+
+ <!-- "range" elements for contiguous code points, with tags -->
+ <range first-cp="0030" last-cp="0039" ref="1" tag="digit" />
+ <range first-cp="0061" last-cp="007A" ref="1" tag="letter" />
+
+ <!-- code point sequence -->
+ <char cp="006C 00B7 006C" comment="Catalan middle dot" />
+
+ <!-- alternatively, use a When Rule -->
+ <char cp="00B7" when="catalan-middle-dot" />
+
+ <!-- code point with context rule -->
+ <char cp="200D" when="joiner" ref="2" />
+
+ <!-- code points with variants -->
+ <char cp="4E16" tag="preferred" ref="0">
+ <var cp="4E17" type="blocked" ref="2" />
+ <var cp="534B" type="allocatable" ref="2" />
+ </char>
+ <char cp="4E17" ref="0">
+ <var cp="4E16" type="allocatable" ref="2" />
+ <var cp="534B" type="allocatable" ref="2" />
+ </char>
+ <char cp="534B" ref="0">
+ <var cp="4E16" type="allocatable" ref="2" />
+ <var cp="4E17" type="blocked" ref="2" />
+ </char>
+ </data>
+
+ <!-- Context and whole label rules -->
+ <rules>
+ <!-- Require the given code point to be between two 006C
+ code points -->
+ <rule name="catalan-middle-dot" ref="0">
+ <look-behind>
+ <char cp="006C" />
+ </look-behind>
+ <anchor />
+ <look-ahead>
+ <char cp="006C" />
+ </look-ahead>
+ </rule>
+
+ <!-- example of a context rule based on property -->
+ <class name="virama" property="ccc:9" />
+ <rule name="joiner" ref="1" >
+ <look-behind>
+ <class by-ref="virama" />
+ </look-behind>
+ <anchor />
+ </rule>
+
+ <!-- example of using set operators -->
+
+ <!-- Subtract vowels from letters to get
+ consonant, demonstrating the different
+ set notations and the difference operator -->
+ <difference name="consonants">
+ <class comment="all letters">0061-007A</class>
+ <class comment="all vowels">
+ 0061 0065 0069 006F 0075
+ </class>
+ </difference>
+
+ <!-- by using the start and end, rule matches whole label -->
+ <rule name="three-or-more-consonants">
+ <start />
+ <!-- reference the class defined by the difference,
+ and require three or more matches -->
+ <class by-ref="consonants" count="3+" />
+ <end />
+ </rule>
+
+ <!-- rule for negative matching -->
+ <rule name="non-preferred"
+ comment="matches any non-preferred code point">
+ <complement comment="non-preferred" >
+ <class from-tag="preferred" />
+ </complement>
+ </rule>
+
+ <!-- actions triggered by matching rules and/or
+ variant types -->
+ <action disp="invalid"
+ match="three-or-more-consonants" />
+ <action disp="blocked" any-variant="blocked" />
+ <action disp="allocatable" all-variants="allocatable"
+ not-match="non-preferred" />
+ </rules>
+ </lgr>
+
+Appendix B. How to Translate Tables Based on RFC 3743 into the XML
+ Format
+
+ As background, the rules specified in [RFC3743] work as follows:
+
+ 1. The original (requested) label is checked to make sure that all
+ the code points are a subset of the repertoire.
+
+ 2. If it passes the check, the original label is allocatable.
+
+ 3. Generate the all-simplified and all-traditional variant labels
+ (union of all the labels generated using all the simplified
+ variants of the code points) for allocation.
+
+ To illustrate by example, here is one of the more complicated sets of
+ variants:
+
+ U+4E7E
+ U+4E81
+ U+5E72
+ U+5E79
+ U+69A6
+ U+6F27
+
+ The following shows the relevant section of the Chinese language
+ table published by the .ASIA registry [ASIA-TABLE]. Its
+ entries read:
+
+ <codepoint>;<simpl-variant(s)>;<trad-variant(s)>;<other-variant(s)>
+
+ These are the lines corresponding to the set of variants
+ listed above:
+
+ U+4E7E;U+4E7E,U+5E72;U+4E7E;U+4E81,U+5E72,U+6F27,U+5E79,U+69A6
+ U+4E81;U+5E72;U+4E7E;U+5E72,U+6F27,U+5E79,U+69A6
+ U+5E72;U+5E72;U+5E72,U+4E7E,U+5E79;U+4E7E,U+4E81,U+69A6,U+6F27
+ U+5E79;U+5E72;U+5E79;U+69A6,U+4E7E,U+4E81,U+6F27
+ U+69A6;U+5E72;U+69A6;U+5E79,U+4E7E,U+4E81,U+6F27
+ U+6F27;U+4E7E;U+6F27;U+4E81,U+5E72,U+5E79,U+69A6
+
+ The corresponding "data" section in the XML format would look like this:
+
+ <data>
+   <char cp="4E7E">
+     <var cp="4E7E" type="both" comment="identity" />
+     <var cp="4E81" type="blocked" />
+     <var cp="5E72" type="simp" />
+     <var cp="5E79" type="blocked" />
+     <var cp="69A6" type="blocked" />
+     <var cp="6F27" type="blocked" />
+   </char>
+   <char cp="4E81">
+     <var cp="4E7E" type="trad" />
+     <var cp="5E72" type="simp" />
+     <var cp="5E79" type="blocked" />
+     <var cp="69A6" type="blocked" />
+     <var cp="6F27" type="blocked" />
+   </char>
+   <char cp="5E72">
+     <var cp="4E7E" type="trad"/>
+     <var cp="4E81" type="blocked"/>
+     <var cp="5E72" type="both" comment="identity"/>
+     <var cp="5E79" type="trad"/>
+     <var cp="69A6" type="blocked"/>
+     <var cp="6F27" type="blocked"/>
+   </char>
+   <char cp="5E79">
+     <var cp="4E7E" type="blocked"/>
+     <var cp="4E81" type="blocked"/>
+     <var cp="5E72" type="simp"/>
+     <var cp="5E79" type="trad" comment="identity"/>
+     <var cp="69A6" type="blocked"/>
+     <var cp="6F27" type="blocked"/>
+   </char>
+   <char cp="69A6">
+     <var cp="4E7E" type="blocked"/>
+     <var cp="4E81" type="blocked"/>
+     <var cp="5E72" type="simp"/>
+     <var cp="5E79" type="blocked"/>
+     <var cp="69A6" type="trad" comment="identity"/>
+     <var cp="6F27" type="blocked"/>
+   </char>
+   <char cp="6F27">
+     <var cp="4E7E" type="simp"/>
+     <var cp="4E81" type="blocked"/>
+     <var cp="5E72" type="blocked"/>
+     <var cp="5E79" type="blocked"/>
+     <var cp="69A6" type="blocked"/>
+     <var cp="6F27" type="trad" comment="identity"/>
+   </char>
+ </data>
+
+ Here, the simplified variants have been given a type of "simp" and
+ the traditional variants one of "trad", and all other ones are given
+ "blocked".
+
+ Because some variant mappings appear in more than one column, while the
+ XML format allows only a single type value, they have been given the
+ type of "both".
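+
+ The column-to-type assignment just described can be sketched as
+ follows. This is a minimal illustration, assuming the four-field
+ semicolon-separated row format shown earlier; the function names
+ split_field, convert_row, and to_xml are hypothetical and not part of
+ any published tool.

```python
from typing import Dict, List, Tuple


def split_field(field: str) -> List[str]:
    """Turn "U+4E7E,U+5E72" into ["4E7E", "5E72"]; an empty field gives []."""
    return [cp.strip().replace("U+", "") for cp in field.split(",") if cp.strip()]


def convert_row(row: str) -> Tuple[str, Dict[str, str]]:
    """Map one "<cp>;<simp>;<trad>;<other>" row to {variant: type}."""
    source, simp, trad, other = (split_field(f) for f in row.split(";"))
    types: Dict[str, str] = {}
    for cp in simp + trad + other:
        if cp in simp and cp in trad:
            types[cp] = "both"      # listed in both preferred columns
        elif cp in simp:
            types[cp] = "simp"
        elif cp in trad:
            types[cp] = "trad"
        else:
            types[cp] = "blocked"   # only listed as an "other" variant
    return source[0], types


def to_xml(row: str) -> str:
    """Render the row as a "char" element with one "var" child per variant."""
    cp, types = convert_row(row)
    lines = ['<char cp="%s">' % cp]
    for var in sorted(types):
        lines.append('  <var cp="%s" type="%s" />' % (var, types[var]))
    lines.append("</char>")
    return "\n".join(lines)


print(to_xml("U+4E81;U+5E72;U+4E7E;U+5E72,U+6F27,U+5E79,U+69A6"))
```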
+
+ Note that some variant mappings map to themselves (identity); that
+ is, the mapping is reflexive (see Section 5.3.4). In creating the
+ permutation of all variant labels, these mappings have no effect,
+ other than adding a value to the variant type list for the variant
+ label containing them.
+
+ In the example so far, all of the entries with type="both" are also
+ mappings where source and target are identical. That is, they are
+ reflexive mappings as defined in Section 5.3.4.
+
+ Given a label "U+4E7E U+4E81", the following labels would be ruled
+ allocatable per [RFC3743], based on how that standard is commonly
+ implemented in domain registries:
+
+ Original label: U+4E7E U+4E81
+ Simplified label 1: U+4E7E U+5E72
+ Simplified label 2: U+5E72 U+5E72
+ Traditional label: U+4E7E U+4E7E
+
+ However, if allocatable labels were generated simply by a straight
+ permutation of all variants with type other than type="blocked" and
+ without regard to the simplified and traditional variants, we would
+ end up with an extra allocatable label of "U+5E72 U+4E7E". This
+ label mixes a Simplified Chinese code point with a Traditional
+ Chinese code point and therefore shouldn't be allocatable.
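+
+ The effect of a straight permutation can be reproduced with a short
+ sketch. It assumes that each position of the label is replaced by one
+ of its non-blocked variants, with the original label considered
+ separately; the table NON_BLOCKED is copied by hand from the "data"
+ section above, and the function name is hypothetical.

```python
from itertools import product

# Non-blocked variants ("simp", "trad", or "both") for each code point
# of the original label U+4E7E U+4E81.
NON_BLOCKED = {
    "4E7E": ["4E7E", "5E72"],
    "4E81": ["4E7E", "5E72"],
}


def straight_permutation(label):
    """All variant labels built by picking one non-blocked variant per position."""
    return set(product(*(NON_BLOCKED[cp] for cp in label)))


original = ("4E7E", "4E81")
# The three variant labels expected to be allocatable, as listed above.
expected = {("4E7E", "5E72"), ("5E72", "5E72"), ("4E7E", "4E7E")}
extra = straight_permutation(original) - expected
print(sorted(extra))
```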
+
+ To more fully resolve the dispositions requires several actions to be
+ defined, as described in Section 7.2.2, that will override the
+ default actions from Section 7.6. After blocking all labels that
+ contain a variant with type "blocked", these actions will set labels
+ to "allocatable" based on the following variant types: "simp",
+ "trad", and "both". Note that these variant types do not directly
+ relate to dispositions for the variant label, but that the actions
+ will resolve them to the Standard Dispositions on labels, i.e.,
+ "blocked" and "allocatable".
+
+ To resolve label dispositions requires five actions to be defined (in
+ the "rules" section of the XML document in question); these actions
+ apply in order, and the first one triggered defines the disposition
+ for the label. The actions are as follows:
+
+ 1. Block all variant labels containing at least one blocked variant.
+
+ 2. Allocate all labels that consist entirely of variants that are
+ "simp" or "both".
+
+ 3. Also allocate all labels that are entirely "trad" or "both".
+
+ 4. Block all surviving labels containing any one of the variant types
+ "simp" or "trad" or "both", because they are now known to be part
+ of an undesirable mixed simplified/traditional label.
+
+ 5. Allocate any remaining label; the original label would be such a
+ label.
+
+ The rules declarations would be represented as:
+
+ <rules>
+   <!--"action" elements - order defines precedence-->
+   <action disp="blocked" any-variant="blocked" />
+   <action disp="allocatable" only-variants="simp both" />
+   <action disp="allocatable" only-variants="trad both" />
+   <action disp="blocked" any-variant="simp trad" />
+   <action disp="allocatable" comment="catch-all" />
+ </rules>
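+
+ One possible reading of these five "action" elements is sketched
+ below. The representation is an assumption made for illustration: a
+ variant label is given as a list with one entry per code point, holding
+ the type of the variant mapping used at that position, or None where
+ the original code point was kept and has no mapping. The sketch also
+ assumes that "only-variants" triggers only if every position comes
+ from a variant mapping. The function name disposition is hypothetical.

```python
from typing import List, Optional


def disposition(types: List[Optional[str]]) -> str:
    """Apply the five actions in order; the first one triggered wins."""
    used = [t for t in types if t is not None]
    # True when every code point of the label came from a variant mapping.
    all_are_variants = len(types) > 0 and len(used) == len(types)

    if "blocked" in used:                                         # action 1
        return "blocked"
    if all_are_variants and all(t in ("simp", "both") for t in used):
        return "allocatable"                                      # action 2
    if all_are_variants and all(t in ("trad", "both") for t in used):
        return "allocatable"                                      # action 3
    if any(t in ("simp", "trad") for t in used):                  # action 4
        return "blocked"
    return "allocatable"                                          # action 5
```

+
+ For the label "U+4E7E U+4E81", the original label would be passed as
+ ["both", None], because only U+4E7E has a reflexive mapping, and the
+ mixed label "U+5E72 U+4E7E" as ["simp", "trad"].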
+
+ Up to now, variants with type "both" have occurred only associated
+ with reflexive variant mappings. The "action" elements defined above
+ rely on the assumption that this is always the case. However,
+ consider the following set of variants:
+
+ U+62E0;U+636E;U+636E;U+64DA
+ U+636E;U+636E;U+64DA;U+62E0
+ U+64DA;U+636E;U+64DA;U+62E0
+
+ The corresponding XML would be:
+
+ <char cp="62E0">
+   <var cp="636E" type="both" comment="both, but not reflexive" />
+   <var cp="64DA" type="blocked" />
+ </char>
+ <char cp="636E">
+   <var cp="636E" type="simp" comment="reflexive, but not both" />
+   <var cp="64DA" type="trad" />
+   <var cp="62E0" type="blocked" />
+ </char>
+ <char cp="64DA">
+   <var cp="636E" type="simp" />
+   <var cp="64DA" type="trad" comment="reflexive" />
+   <var cp="62E0" type="blocked" />
+ </char>
+
+ To make such variant sets work requires a way to selectively trigger
+ an action based on whether a variant type is associated with an
+ identity or reflexive mapping, or is associated with an ordinary
+ variant mapping. This can be done by adding a prefix "r-" to the
+ "type" attribute on reflexive variant mappings. For example, the
+ "trad" for code point U+64DA in the preceding figure would become
+ "r-trad".
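+
+ The prefixing step is mechanical and can be sketched as below. The
+ tuple layout (source, target, type) and the name tag_reflexive are
+ assumptions made for this illustration; the mappings are copied from
+ the preceding XML.

```python
from typing import List, Tuple

Mapping = Tuple[str, str, str]  # (source code point, target code point, type)

MAPPINGS: List[Mapping] = [
    ("62E0", "636E", "both"),
    ("62E0", "64DA", "blocked"),
    ("636E", "636E", "simp"),
    ("636E", "64DA", "trad"),
    ("636E", "62E0", "blocked"),
    ("64DA", "636E", "simp"),
    ("64DA", "64DA", "trad"),
    ("64DA", "62E0", "blocked"),
]


def tag_reflexive(mappings: List[Mapping]) -> List[Mapping]:
    """Prefix the type with "r-" wherever source and target are identical."""
    return [(s, t, "r-" + vt if s == t else vt) for s, t, vt in mappings]


for source, target, vtype in tag_reflexive(MAPPINGS):
    print(source, target, vtype)
```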
+
+ With the variant types prepared in this way, only a slight
+ modification to the actions is needed to yield the correct set of
+ allocatable labels:
+
+ <action disp="blocked" any-variant="blocked" />
+ <action disp="allocatable" only-variants="simp r-simp both r-both" />
+ <action disp="allocatable" only-variants="trad r-trad both r-both" />
+ <action disp="blocked" all-variants="simp trad both" />
+ <action disp="allocatable" />
+
+ The first three actions get triggered by the same labels as before.
+
+ The fourth action blocks any label that combines an original code
+ point with any mix of ordinary variant mappings; however, no labels
+ that are a combination of only original code points (code points
+ having either no variant mappings or a reflexive mapping) would be
+ affected. These are the original labels, and they are allocated in
+ the last action.
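+
+ A table-driven sketch of the revised actions follows. As an
+ assumption for illustration, a variant label is a list with one entry
+ per code point: the variant type used at that position, or None for an
+ original code point without any mapping. The sketch further assumes
+ that "all-variants" needs at least one variant type to be present and
+ that "only-variants" additionally requires every position to be a
+ variant. ACTIONS, triggers, and evaluate are hypothetical names.

```python
from typing import FrozenSet, List, Optional, Tuple

# (disposition, trigger attribute, variant types); None means "always".
ACTIONS: List[Tuple[str, Optional[str], FrozenSet[str]]] = [
    ("blocked", "any-variant", frozenset({"blocked"})),
    ("allocatable", "only-variants",
     frozenset({"simp", "r-simp", "both", "r-both"})),
    ("allocatable", "only-variants",
     frozenset({"trad", "r-trad", "both", "r-both"})),
    ("blocked", "all-variants", frozenset({"simp", "trad", "both"})),
    ("allocatable", None, frozenset()),
]


def triggers(kind: Optional[str], wanted: FrozenSet[str],
             types: List[Optional[str]]) -> bool:
    """Decide whether one action is triggered by the label's variant types."""
    used = [t for t in types if t is not None]
    if kind is None:
        return True
    if kind == "any-variant":
        return any(t in wanted for t in used)
    if kind == "all-variants":
        return bool(used) and all(t in wanted for t in used)
    if kind == "only-variants":
        return (bool(types) and len(used) == len(types)
                and all(t in wanted for t in used))
    raise ValueError("unknown trigger: %s" % kind)


def evaluate(types: List[Optional[str]]) -> str:
    """Return the disposition of the first action that triggers."""
    for disp, kind, wanted in ACTIONS:
        if triggers(kind, wanted, types):
            return disp
    raise AssertionError("unreachable: the last action always triggers")
```

+
+ For the original label "U+62E0 U+636E", the input would be
+ [None, "r-simp"], which falls through to the final catch-all action.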
+
+ Using this scheme of assigning types to ordinary and reflexive
+ variants, all tables in the style of RFC 3743 can be converted to
+ XML. By defining a set of actions as outlined above, the LGR will
+ yield the correct set of allocatable variants: all variant labels
+ consisting entirely of variant code points preferred for simplified
+ or traditional, respectively, will be allocated, as will the
+ original label. All other variant labels will be blocked.
+
+Appendix C. Indic Syllable Structure Example
+
+ In LGRs for Indic scripts, it may be desirable to restrict valid
+ labels to sequences of valid Indic syllables, or aksharas. This
+ appendix gives a sample set of rules designed to enforce this
+ restriction.
+
+ Below is an example of BNF for an akshara, which has been published
+ in "Devanagari Script Behaviour for Hindi" [TDIL-HINDI]. The rules
+ for other languages and scripts used in India are expected to be
+ generally similar.
+
+ For Hindi, the BNF has the form:
+
+ V[m]|{C[N]H}C[N](H|[v][m])
+
+ Where:
+
+ V (uppercase) is any independent vowel
+
+ m is any vowel modifier (Devanagari Anusvara, Visarga, and
+ Candrabindu)
+
+ C is any consonant (with inherent vowel)
+
+ N is Nukta
+
+ H is a halant (or virama)
+
+ v (lowercase) is any dependent vowel sign (matra)
+
+ {} encloses items that may be repeated one or more times
+
+ [ ] encloses items that may or may not be present
+
+ | separates items, out of which only one can be present
+
+ By using the Unicode character property "InSC" or
+ "Indic_Syllabic_Category", which corresponds rather directly to the
+ classification of characters in the BNF above, we can translate the
+ BNF into a set of WLE rules matching the definition of an akshara.
+
+ <rules>
+   <!--Character class definitions go here-->
+   <class name="halant" property="InSC:Virama" />
+   <union name="vowel-modifier">
+     <class property="InSC:Visarga" />
+     <class property="InSC:Bindu" comment="includes anusvara" />
+   </union>
+   <!--Whole label evaluation and context rules go here-->
+   <rule name="consonant-with-optional-nukta">
+     <class property="InSC:Consonant" />
+     <class property="InSC:Nukta" count="0:1"/>
+   </rule>
+   <rule name="independent-vowel-with-optional-modifier">
+     <class property="InSC:Vowel_Independent" />
+     <class by-ref="vowel-modifier" count="0:1" />
+   </rule>
+   <rule name="optional-dependent-vowel-with-opt-modifier" >
+     <class property="InSC:Vowel_Dependent" count="0:1" />
+     <class by-ref="vowel-modifier" count="0:1" />
+   </rule>
+   <rule name="consonant-cluster">
+     <rule count="0+">
+       <rule by-ref="consonant-with-optional-nukta" />
+       <class by-ref="halant" />
+     </rule>
+     <rule by-ref="consonant-with-optional-nukta" />
+     <choice>
+       <class by-ref="halant" />
+       <rule by-ref="optional-dependent-vowel-with-opt-modifier" />
+     </choice>
+   </rule>
+   <rule name="akshara">
+     <choice>
+       <rule by-ref="independent-vowel-with-optional-modifier" />
+       <rule by-ref="consonant-cluster" />
+     </choice>
+   </rule>
+   <rule name="WLE-akshara-or-other" comment="series of one or
+     more aksharas, possibly alternating with other types of
+     code points such as digits">
+     <start />
+     <choice count="1+">
+       <class property="InSC:other" />
+       <rule by-ref="akshara" />
+     </choice>
+     <end />
+   </rule>
+   <!--"action" elements go here - order defines precedence-->
+   <action disp="invalid" not-match="WLE-akshara-or-other" />
+ </rules>
+
+ With the rules and classes as defined above, the final action assigns
+ a disposition of "invalid" to all labels that are not composed of a
+ sequence of well-formed aksharas, optionally interspersed with other
+ characters such as digits.
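+
+ The same structure can be prototyped outside of an LGR with a regular
+ expression over one-letter class symbols. The sketch below is an
+ illustration under stated assumptions: the classify function uses
+ hand-picked Devanagari code point ranges as a stand-in for the "InSC"
+ property (it is not a lookup of the real property values), ASCII
+ digits stand for "other" code points, and the repeated group is
+ treated as "zero or more", matching count="0+" in the rules above.

```python
import re


def classify(ch: str) -> str:
    """Map a character to a BNF symbol; "?" marks anything unclassified."""
    cp = ord(ch)
    if 0x0904 <= cp <= 0x0914:
        return "V"  # independent vowel
    if 0x0901 <= cp <= 0x0903:
        return "m"  # candrabindu, anusvara, visarga
    if 0x0915 <= cp <= 0x0939:
        return "C"  # consonant
    if cp == 0x093C:
        return "N"  # nukta
    if cp == 0x094D:
        return "H"  # halant (virama)
    if 0x093E <= cp <= 0x094C:
        return "v"  # dependent vowel sign
    if "0" <= ch <= "9":
        return "O"  # "other" code point
    return "?"


# V[m] | {C[N]H} C[N] (H | [v][m]), with the braces read as "zero or more".
AKSHARA = r"(?:Vm?|(?:CN?H)*CN?(?:H|v?m?))"
WLE = re.compile(r"(?:O|" + AKSHARA + r")+")


def is_valid(label: str) -> bool:
    """True if the whole label is a series of aksharas and "other" symbols."""
    symbols = "".join(classify(ch) for ch in label)
    return WLE.fullmatch(symbols) is not None
```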
+
+ The relevant Unicode character property could be replicated by
+ tagging repertoire values directly in the LGR; this would remove the
+ dependency on any specific version of the Unicode Standard.
+
+ Generally, dependent vowels may only follow consonant expressions;
+ however, for some scripts, like Bengali, the Unicode Standard
+ supports sequences of dependent vowels or their application on
+ independent vowels. This makes the definition of akshara less
+ restrictive.
+
+C.1. Reducing Complexity
+
+ As presented in this example, the rules are rather complex --
+ although useful in demonstrating the features of the XML format, such
+ complexity would be an undesirable feature in an actual LGR.
+
+ It is possible to reduce the complexity of the rules in this example
+ by defining alternate rules that simply define the permissible
+ pair-wise context of adjacent code points by character class, such as
+ a rule that a halant can only follow a (nuktated) consonant. Such
+ pair-wise contexts are easier to understand, implement, and verify,
+ and have the additional benefit of allowing tools to better pinpoint
+ why a label failed to validate. They also tend to correspond more
+ directly to the kind of well-formedness requirements that are most
+ relevant to DNS security, like the requirement to limit the
+ application of a combining mark (such as a vowel modifier) to only
+ selected base characters (in this case, vowels). (See the example
+ and discussion in [WLE-RULES].)
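+
+ A pair-wise check of this kind might look like the following sketch.
+ The rule table is hypothetical and only loosely derived from the BNF
+ in this appendix (it is not taken from [WLE-RULES]); labels are
+ assumed to have been converted to a string of class symbols (V, m, C,
+ N, H, v) beforehand.

```python
from typing import Optional

# Class symbol -> symbols allowed immediately before it.  Symbols that
# are not listed (such as C and V) may follow anything or start a label.
ALLOWED_BEFORE = {
    "N": {"C"},                 # nukta only after a consonant
    "H": {"C", "N"},            # halant only after a (nuktated) consonant
    "v": {"C", "N"},            # dependent vowel only after a consonant
    "m": {"V", "C", "N", "v"},  # modifier after a vowel or consonant
}


def first_violation(classes: str) -> Optional[int]:
    """Return the index of the first symbol in a forbidden context, or None."""
    for i, cls in enumerate(classes):
        allowed = ALLOWED_BEFORE.get(cls)
        if allowed is None:
            continue
        prev = classes[i - 1] if i > 0 else None
        if prev not in allowed:
            return i
    return None
```

+
+ Because the function reports a position, a tool built this way could
+ point at the offending code point rather than only rejecting the label.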
+
+Appendix D. RELAX NG Compact Schema
+
+ This schema is provided in RELAX NG Compact format [RELAX-NG].
+
+ <CODE BEGINS>
+ #
+ # LGR XML Schema 1.0
+ #
+
+ default namespace = "urn:ietf:params:xml:ns:lgr-1.0"
+
+ #
+ # SIMPLE TYPES
+ #
+
+ # RFC 5646 language tag (e.g., "de", "und-Latn")
+ language-tag = xsd:token
+
+ # The scope to which the LGR applies. For the "domain" scope type,
+ # it should be a fully qualified domain name.
+ scope-value = xsd:token {
+ minLength = "1"
+ }
+
+ ## a single code point
+ code-point = xsd:token {
+ pattern = "[0-9A-F]{4,6}"
+ }
+
+ ## a space-separated sequence of code points
+ code-point-sequence = xsd:token {
+ pattern = "[0-9A-F]{4,6}( [0-9A-F]{4,6})+"
+ }
+
+ ## single code point, or a sequence of code points, or empty string
+ code-point-literal = code-point | code-point-sequence | ""
+
+ ## code point or sequence only
+ non-empty-code-point-literal = code-point | code-point-sequence
+
+ ## code point set represented in short form
+ code-point-set-shorthand = xsd:token {
+ pattern = "([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6})"
+ ~ "( ([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6}))*"
+ }
+
+ ## dates are used in information fields in the meta
+ ## section ("YYYY-MM-DD")
+ date-pattern = xsd:token {
+ pattern = "\d{4}-\d\d-\d\d"
+ }
+
+ ## variant type
+ ## the variant type MUST be non-empty and MUST NOT
+ ## start with a "_"; using xsd:NMTOKEN here because
+ ## we need space-separated lists of them
+ variant-type = xsd:NMTOKEN
+
+ ## variant type list for action triggers
+ ## the list MUST NOT be empty, and entries MUST NOT
+ ## start with a "_"
+ variant-type-list = xsd:NMTOKENS
+
+ ## reference to a rule name (used in "when" and "not-when"
+ ## attributes, as well as the "by-ref" attribute of the "rule"
+ ## element).
+ rule-ref = xsd:IDREF
+
+ ## a space-separated list of tags. Tags should generally follow
+ ## xsd:Name syntax. However, we are using the xsd:NMTOKENS here
+ ## because there is no native XSD datatype for space-separated
+ ## xsd:Name
+ tags = xsd:NMTOKENS
+
+ ## The value space of a "from-tag" attribute. Although it is closer
+ ## to xsd:IDREF lexically and semantically, tags are not unique in
+ ## the document. As such, we are unable to take advantage of
+ ## facilities provided by a validator. xsd:NMTOKEN is used instead
+ ## of the stricter xsd:Names here so as to be consistent with
+ ## the above.
+ tag-ref = xsd:NMTOKEN
+
+ ## an identifier type (used by "name" attributes).
+ identifier = xsd:ID
+
+ ## used in the class "by-ref" attribute to reference another class of
+ ## the same "name" attribute value.
+ class-ref = xsd:IDREF
+
+ ## "count" attribute pattern ("n", "n+", or "n:m")
+ count-pattern = xsd:token {
+ pattern = "\d+(\+|:\d+)?"
+ }
+
+ ## "ref" attribute pattern
+ ## space-separated list of "id" attribute values for
+ ## "reference" elements. These reference ids
+ ## must be declared in a "reference" element
+ ## before they can be used in a "ref" attribute
+ ref-pattern = xsd:token {
+ pattern = "[\-_.:0-9A-Z]+( [\-_.:0-9A-Z]+)*"
+ }
+
+ #
+ # STRUCTURES
+ #
+
+ ## Representation of a single code point or a sequence of code
+ ## points
+ char = element char {
+ attribute cp { code-point-literal },
+ attribute comment { text }?,
+ attribute when { rule-ref }?,
+ attribute not-when { rule-ref }?,
+ attribute tag { tags }?,
+ attribute ref { ref-pattern }?,
+ variant*
+ }
+
+ ## Representation of a range of code points
+ range = element range {
+ attribute first-cp { code-point },
+ attribute last-cp { code-point },
+ attribute comment { text }?,
+ attribute when { rule-ref }?,
+ attribute not-when { rule-ref }?,
+ attribute tag { tags }?,
+ attribute ref { ref-pattern }?
+ }
+
+ ## Representation of a variant code point or sequence
+ variant = element var {
+ attribute cp { code-point-literal },
+ attribute type { xsd:NMTOKEN }?,
+ attribute when { rule-ref }?,
+ attribute not-when { rule-ref }?,
+ attribute comment { text }?,
+ attribute ref { ref-pattern }?
+ }
+
+ #
+ # Classes
+ #
+
+ ## a "class" element that references the name of another "class"
+ ## (or set-operator like "union") defined elsewhere.
+ ## If used as a matcher (appearing under a "rule" element),
+ ## the "count" attribute may be present.
+ class-invocation = element class { class-invocation-content }
+
+ class-invocation-content =
+ attribute by-ref { class-ref },
+ attribute count { count-pattern }?,
+ attribute comment { text }?
+
+ ## defines a new class (set of code points) using Unicode property
+ ## or code points of the same tag value or code point literals
+ class-declaration = element class { class-declaration-content }
+
+ class-declaration-content =
+ # "name" attribute MUST be present if this is a "top-level"
+ # class declaration, i.e., appearing directly under the "rules"
+ # element. Otherwise, it MUST be absent.
+ attribute name { identifier }?,
+ # If used as a matcher (appearing in a "rule" element, but not
+ # when nested inside a set-operator or class), the "count"
+ # attribute may be present. Otherwise, it MUST be absent.
+ attribute count { count-pattern }?,
+ attribute comment { text }?,
+ attribute ref { ref-pattern }?,
+ (
+ # define the class by property (e.g., property="sc:Latn"), OR
+ attribute property { xsd:NMTOKEN }
+ # define the class by tagged code points, OR
+ | attribute from-tag { tag-ref }
+ # text node to allow for shorthand notation
+ # e.g., "0061 0062-0063"
+ | code-point-set-shorthand
+ )
+
+ class-invocation-or-declaration = element class {
+ class-invocation-content | class-declaration-content
+ }
+
+ class-or-set-operator-nested =
+ class-invocation-or-declaration | set-operator
+
+ class-or-set-operator-declaration =
+ # a "class" element or set-operator (effectively defining a class)
+ # directly in the "rules" element.
+ class-declaration | set-operator
+
+
+ #
+ # set-operators
+ #
+
+ complement-operator = element complement {
+ attribute name { identifier }?,
+ attribute comment { text }?,
+ attribute ref { ref-pattern }?,
+ # "count" attribute MUST only be used when this set-operator is
+ # used as a matcher (i.e., nested in a "rule" element but not
+ # inside a set-operator or class)
+ attribute count { count-pattern }?,
+ class-or-set-operator-nested
+ }
+
+ union-operator = element union {
+ attribute name { identifier }?,
+ attribute comment { text }?,
+ attribute ref { ref-pattern }?,
+ # "count" attribute MUST only be used when this set-operator is
+ # used as a matcher (i.e., nested in a "rule" element but not
+ # inside a set-operator or class)
+ attribute count { count-pattern }?,
+ class-or-set-operator-nested,
+ # needs two or more child elements
+ class-or-set-operator-nested+
+ }
+
+ intersection-operator = element intersection {
+ attribute name { identifier }?,
+ attribute comment { text }?,
+ attribute ref { ref-pattern }?,
+ # "count" attribute MUST only be used when this set-operator is
+ # used as a matcher (i.e., nested in a "rule" element but not
+ # inside a set-operator or class)
+ attribute count { count-pattern }?,
+ class-or-set-operator-nested,
+ class-or-set-operator-nested
+ }
+
+ difference-operator = element difference {
+ attribute name { identifier }?,
+ attribute comment { text }?,
+ attribute ref { ref-pattern }?,
+ # "count" attribute MUST only be used when this set-operator is
+ # used as a matcher (i.e., nested in a "rule" element but not
+ # inside a set-operator or class)
+ attribute count { count-pattern }?,
+ class-or-set-operator-nested,
+ class-or-set-operator-nested
+ }
+
+ symmetric-difference-operator = element symmetric-difference {
+ attribute name { identifier }?,
+ attribute comment { text }?,
+ attribute ref { ref-pattern }?,
+ # "count" attribute MUST only be used when this set-operator is
+ # used as a matcher (i.e., nested in a "rule" element but not
+ # inside a set-operator or class)
+ attribute count { count-pattern }?,
+ class-or-set-operator-nested,
+ class-or-set-operator-nested
+ }
+
+ ## operators that transform class(es) into a new class.
+ set-operator = complement-operator
+ | union-operator
+ | intersection-operator
+ | difference-operator
+ | symmetric-difference-operator
+
+ #
+ # Match operators (matchers)
+ #
+
+ any-matcher = element any {
+ attribute count { count-pattern }?,
+ attribute comment { text }?
+ }
+
+ choice-matcher = element choice {
+ ## "count" attribute MUST only be used when the choice-matcher
+ ## contains no nested "start", "end", "anchor", "look-behind",
+ ## or "look-ahead" operators and no nested rule-matchers
+ ## containing any of these elements
+ attribute count { count-pattern }?,
+ attribute comment { text }?,
+ # two or more match operators
+ match-operator-choice,
+ match-operator-choice+
+ }
+
+ char-matcher =
+ # for use as a matcher - like "char" but without a "tag" attribute
+ element char {
+ attribute cp { non-empty-code-point-literal },
+ # If used as a matcher (appearing in a "rule" element), the
+ # "count" attribute may be present. Otherwise, it MUST be
+ # absent.
+ attribute count { count-pattern }?,
+ attribute comment { text }?,
+ attribute ref { ref-pattern }?
+ }
+
+ start-matcher = element start {
+ attribute comment { text }?
+ }
+
+ end-matcher = element end {
+ attribute comment { text }?
+ }
+
+ anchor-matcher = element anchor {
+ attribute comment { text }?
+ }
+
+ look-ahead-matcher = element look-ahead {
+ attribute comment { text }?,
+ match-operators-non-pos
+ }
+ look-behind-matcher = element look-behind {
+ attribute comment { text }?,
+ match-operators-non-pos
+ }
+
+ ## non-positional match operator that can be used as a direct child
+ ## element of the choice-matcher.
+ match-operator-choice = (
+ any-matcher | choice-matcher | start-matcher | end-matcher
+ | char-matcher | class-or-set-operator-nested | rule-matcher
+ )
+
+ ## non-positional match operators do not contain any "anchor",
+ ## "look-behind", or "look-ahead" elements.
+ match-operators-non-pos = (
+ start-matcher?,
+ (any-matcher | choice-matcher | char-matcher
+ | class-or-set-operator-nested | rule-matcher)*,
+ end-matcher?
+ )
+
+ ## positional match operators have an "anchor" element, which may be
+ ## preceded by a "look-behind" element, or followed by a "look-ahead"
+ ## element, or both.
+ match-operators-pos =
+ look-behind-matcher?, anchor-matcher, look-ahead-matcher?
+
+ match-operators = match-operators-non-pos | match-operators-pos
+
+ #
+ # Rules
+ #
+
+ # top-level rule must have "name" attribute
+ rule-declaration-top = element rule {
+ attribute name { identifier },
+ attribute comment { text }?,
+ attribute ref { ref-pattern }?,
+ match-operators
+ }
+
+ ## "rule" element used as a matcher (either "by-ref" or contains
+ ## other match operators itself)
+ rule-matcher =
+ element rule {
+ ## "count" attribute MUST only be used when the rule-matcher
+ ## contains no nested "start", "end", "anchor", "look-behind",
+ ## or "look-ahead" operators and no nested rule-matchers
+ ## containing any of these elements
+ attribute count { count-pattern }?,
+ attribute comment { text }?,
+ attribute ref { ref-pattern }?,
+ (attribute by-ref { rule-ref } | match-operators)
+ }
+
+ #
+ # Actions
+ #
+
+ action-declaration = element action {
+ attribute comment { text }?,
+ attribute ref { ref-pattern }?,
+ # dispositions are often named after variant types or vice versa
+ attribute disp { variant-type },
+ ( attribute match { rule-ref }
+ | attribute not-match { rule-ref } )?,
+ ( attribute any-variant { variant-type-list }
+ | attribute all-variants { variant-type-list }
+ | attribute only-variants { variant-type-list } )?
+ }
+
+ # DOCUMENT STRUCTURE
+
+ start = lgr
+ lgr = element lgr {
+ meta-section?,
+ data-section,
+ rules-section?
+ }
+
+ ## Meta section - information recorded with an LGR that generally
+ ## does not affect machine processing (except for "unicode-version").
+ ## However, if any "class-declaration" uses the "property" attribute,
+ ## a "unicode-version" element MUST be present.
+ meta-section = element meta {
+ element version {
+ attribute comment { text }?,
+ text
+ }?
+ & element date { date-pattern }?
+ & element language { language-tag }*
+ & element scope {
+ # type may be "domain" or an application-defined value
+ attribute type { xsd:NCName },
+ scope-value
+ }*
+ & element validity-start { date-pattern }?
+ & element validity-end { date-pattern }?
+ & element unicode-version {
+ xsd:token {
+ pattern = "\d+\.\d+\.\d+"
+ }
+ }?
+ & element description {
+ # this SHOULD be a valid MIME type
+ attribute type { text }?,
+ text
+ }?
+ & element references {
+ element reference {
+ attribute id {
+ xsd:token {
+ # limit "id" attribute to uppercase letters,
+ # digits, and a few punctuation marks; use of
+ # integers is RECOMMENDED
+ pattern = "[\-_.:0-9A-Z]*"
+ minLength = "1"
+ }
+ },
+ attribute comment { text }?,
+ text
+ }*
+ }?
+ }
+
+ data-section = element data { (char | range)+ }
+
+ ## Note that action declarations are strictly order dependent.
+ ## class-or-set-operator-declaration and rule-declaration-top
+ ## are weakly order dependent; they must precede first use of the
+ ## identifier via "by-ref".
+ rules-section = element rules {
+ ( class-or-set-operator-declaration
+ | rule-declaration-top
+ | action-declaration)*
+ }
+
+ <CODE ENDS>
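+
+ For quick experiments, the simple-type patterns of the schema can be
+ tried out with ordinary regular expressions. The sketch below copies
+ a few patterns from the schema and applies them with a full match,
+ since XSD patterns are implicitly anchored. The whitespace
+ normalization that xsd:token performs is not modelled, and the names
+ PATTERNS and conforms are hypothetical.

```python
import re

# Patterns copied from the schema's simple types.
PATTERNS = {
    "code-point": r"[0-9A-F]{4,6}",
    "code-point-sequence": r"[0-9A-F]{4,6}( [0-9A-F]{4,6})+",
    "code-point-set-shorthand": (
        r"([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6})"
        r"( ([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6}))*"
    ),
    "date-pattern": r"\d{4}-\d\d-\d\d",
    "count-pattern": r"\d+(\+|:\d+)?",
}


def conforms(name: str, value: str) -> bool:
    """True if the whole value matches the named schema pattern."""
    return re.fullmatch(PATTERNS[name], value) is not None


for name, value in [("code-point", "0061"), ("code-point", "006f"),
                    ("count-pattern", "3+"), ("count-pattern", "+3")]:
    print(name, repr(value), conforms(name, value))
```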
+
+Acknowledgements
+
+ This format builds upon the work on documenting IDN tables by many
+ different registry operators. Notably, a comprehensive language
+ table for Chinese, Japanese, and Korean was developed by the "Joint
+ Engineering Team" [RFC3743]; this table is the basis of many registry
+ policies. Also, a set of guidelines for Arabic script registrations
+ [RFC5564] was published by the Arabic-language community.
+
+ Contributions that have shaped this document have been provided by
+ Francisco Arias, Julien Bernard, Mark Davis, Martin Duerst, Paul
+ Hoffman, Sarmad Hussain, Barry Leiba, Alexander Mayrhofer, Alexey
+ Melnikov, Nicholas Ostler, Thomas Roessler, Audric Schiltknecht,
+ Steve Sheng, Michel Suignard, Andrew Sullivan, Wil Tan, and John
+ Yunker.
+
+Authors' Addresses
+
+ Kim Davies
+ Internet Corporation for Assigned Names and Numbers
+ 12025 Waterfront Drive
+ Los Angeles, CA 90094
+ United States of America
+
+ Phone: +1 310 301 5800
+ Email: kim.davies@icann.org
+ URI: http://www.icann.org/
+
+
+ Asmus Freytag
+ ASMUS, Inc.
+
+ Email: asmus@unicode.org
+