Diffstat (limited to 'doc/rfc/rfc2141.txt')
-rw-r--r-- | doc/rfc/rfc2141.txt | 451 |
1 files changed, 451 insertions, 0 deletions
diff --git a/doc/rfc/rfc2141.txt b/doc/rfc/rfc2141.txt
new file mode 100644
index 0000000..1c9f685
--- /dev/null
+++ b/doc/rfc/rfc2141.txt
@@ -0,0 +1,451 @@
+Network Working Group                                           R. Moats
+Request for Comments: 2141                                          AT&T
+Category: Standards Track                                       May 1997
+
+
+                               URN Syntax
+
+Status of This Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements. Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol. Distribution of this memo is unlimited.
+
+Abstract
+
+   Uniform Resource Names (URNs) are intended to serve as persistent,
+   location-independent, resource identifiers. This document sets
+   forward the canonical syntax for URNs. A discussion of both existing
+   legacy and new namespaces and requirements for URN presentation and
+   transmission are presented. Finally, there is a discussion of URN
+   equivalence and how to determine it.
+
+1. Introduction
+
+   Uniform Resource Names (URNs) are intended to serve as persistent,
+   location-independent, resource identifiers and are designed to make
+   it easy to map other namespaces (which share the properties of URNs)
+   into URN-space. Therefore, the URN syntax provides a means to encode
+   character data in a form that can be sent in existing protocols,
+   transcribed on most keyboards, etc.
+
+2. Syntax
+
+   All URNs have the following syntax (phrases enclosed in quotes are
+   REQUIRED):
+
+                     <URN> ::= "urn:" <NID> ":" <NSS>
+
+   where <NID> is the Namespace Identifier, and <NSS> is the Namespace
+   Specific String. The leading "urn:" sequence is case-insensitive.
+   The Namespace ID determines the _syntactic_ interpretation of the
+   Namespace Specific String (as discussed in [1]).
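As a rough illustration of the top-level production above, the following
Python sketch splits a URN into its NID and NSS. The function name
split_urn is hypothetical and not part of the RFC, and the sketch only
checks the overall "urn:" <NID> ":" <NSS> shape; it does not enforce the
character-level rules for the NID or the NSS.

```python
from typing import Tuple


def split_urn(urn: str) -> Tuple[str, str]:
    """Split a URN into (NID, NSS) per <URN> ::= "urn:" <NID> ":" <NSS>.

    Hypothetical helper: only the top-level shape is checked here.
    """
    prefix, sep1, rest = urn.partition(":")
    # The leading "urn:" sequence is case-insensitive.
    if not sep1 or prefix.lower() != "urn":
        raise ValueError("missing 'urn:' prefix")
    # The NID ends at the next ":"; everything after it is the NSS,
    # which may itself contain further ":" characters.
    nid, sep2, nss = rest.partition(":")
    if not sep2 or not nid or not nss:
        raise ValueError("expected <NID>:<NSS> after 'urn:'")
    return nid, nss
```

For example, split_urn("URN:isbn:0451450523") yields the pair
("isbn", "0451450523"), while a string with no NSS, such as "urn:foo",
is rejected.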
+
+Moats                       Standards Track                     [Page 1]
+
+RFC 2141                       URN Syntax                       May 1997
+
+
+   RFC 1630 [2] and RFC 1737 [3] each presents additional considerations
+   for URN encoding, which have implications as far as limiting syntax.
+   On the other hand, the requirement to support existing legacy naming
+   systems has the effect of broadening syntax. Thus, we discuss the
+   acceptable syntax for both the Namespace Identifier and the Namespace
+   Specific String separately.
+
+2.1 Namespace Identifier Syntax
+
+   The following is the syntax for the Namespace Identifier. To (a) be
+   consistent with all potential resolution schemes and (b) not put any
+   undue constraints on any potential resolution scheme, the syntax for
+   the Namespace Identifier is:
+
+   <NID>         ::= <let-num> [ 1,31<let-num-hyp> ]
+
+   <let-num-hyp> ::= <upper> | <lower> | <number> | "-"
+
+   <let-num>     ::= <upper> | <lower> | <number>
+
+   <upper>       ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
+                     "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
+                     "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
+                     "Y" | "Z"
+
+   <lower>       ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
+                     "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
+                     "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
+                     "y" | "z"
+
+   <number>      ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
+                     "8" | "9"
+
+
+   This is slightly more restrictive than what is stated in [4] (which
+   allows the characters "." and "+"). Further, the Namespace
+   Identifier is case insensitive, so that "ISBN" and "isbn" refer to
+   the same namespace.
+
+   To avoid confusion with the "urn:" identifier, the NID "urn" is
+   reserved and MUST NOT be used.
+
+2.2 Namespace Specific String Syntax
+
+   As required by RFC 1737, there is a single canonical representation
+   of the NSS portion of an URN. The format of this single canonical
+   form follows:
+
+   <NSS>         ::= 1*<URN chars>
+
+   <URN chars>   ::= <trans> | "%" <hex> <hex>
+
+   <trans>       ::= <upper> | <lower> | <number> | <other> | <reserved>
+
+   <hex>         ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" |
+                     "a" | "b" | "c" | "d" | "e" | "f"
+
+   <other>       ::= "(" | ")" | "+" | "," | "-" | "." |
+                     ":" | "=" | "@" | ";" | "$" |
+                     "_" | "!" | "*" | "'"
+
+   Depending on the rules governing a namespace, valid identifiers in a
+   namespace might contain characters that are not members of the URN
+   character set above (<URN chars>). Such strings MUST be translated
+   into canonical NSS format before using them as protocol elements or
+   otherwise passing them on to other applications. Translation is done
+   by encoding each character outside the URN character set as a
+   sequence of one to six octets using UTF-8 encoding [5], and the
+   encoding of each of those octets as "%" followed by two characters
+   from the <hex> character set above. The two characters give the
+   hexadecimal representation of that octet.
+
+2.3 Reserved characters
+
+   The remaining character set left to be discussed above is the
+   reserved character set, which contains various characters reserved
+   from normal use. The reserved character set follows, with a
+   discussion on the specifics of why each character is reserved.
+
+   The reserved character set is:
+
+   <reserved>    ::= "%" | "/" | "?" | "#"
+
+2.3.1 The "%" character
+
+   The "%" character is reserved in the URN syntax for introducing the
+   escape sequence for an octet. Literal use of the "%" character in a
+   namespace must be encoded using "%25" in URNs for that namespace.
+   Each "%" character in an URN MUST be followed by two characters
+   from the <hex> character set.
+
+   Namespaces MAY designate one or more characters from the URN
+   character set as having special meaning for that namespace. If the
+   namespace also uses that character in a literal sense, the
+   character used in a literal sense MUST be encoded with "%" followed
+   by the hexadecimal representation of that octet. Further, a
+   character MUST NOT be "%"-encoded if the character is not a reserved
+   character. Therefore, the process of registering a namespace
+   identifier shall include publication of a definition of which
+   characters have a special meaning to that namespace.
+
+2.3.2 The other reserved characters
+
+   RFC 1630 [2] reserves the characters "/", "?", and "#" for particular
+   purposes. The URN-WG has not yet debated the applicability and
+   precise semantics of those purposes as applied to URNs. Therefore,
+   these characters are RESERVED for future developments. Namespace
+   developers SHOULD NOT use these characters in unencoded form, but
+   rather use the appropriate %-encoding for each character.
+
+2.4 Excluded characters
+
+   The following list is included only for the sake of completeness.
+   Any octets/characters on this list are explicitly NOT part of the URN
+   character set, and if used in an URN, MUST be %-encoded:
+
+   <excluded> ::= octets 1-32 (1-20 hex) | "\" | """ | "&" | "<"
+                  | ">" | "[" | "]" | "^" | "`" | "{" | "|" | "}" | "~"
+                  | octets 127-255 (7F-FF hex)
+
+   In addition, octet 0 (0 hex) should NEVER be used, in either
+   unencoded or %-encoded form.
+
+   An URN ends when an octet/character from the excluded character set
+   (<excluded>) is encountered. The character from the excluded
+   character set is NOT part of the URN.
+
+3. Support of existing legacy naming systems and new naming systems
+
+   Any namespace (existing or newly-devised) that is proposed as an
+   URN-namespace and fulfills the criteria of URN-namespaces MUST be
+   expressed in this syntax. If names in these namespaces contain
+   characters other than those defined for the URN character set, they
+   MUST be translated into canonical form as discussed in section 2.2.
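The translation into canonical form discussed in section 2.2 can be
sketched in Python as follows. This is a minimal sketch under stated
assumptions, not a reference implementation: the names is_valid_nid and
to_canonical_nss are hypothetical, and the sketch chooses to %-encode
the reserved characters "%", "/", "?" and "#" whenever they appear in
the input, treating every input character as literal namespace data. A
namespace that assigns special meaning to some character would need its
own rules.

```python
import re

# <NID> ::= <let-num> [ 1,31<let-num-hyp> ], i.e. 1 to 32 characters.
_NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,31}")

# <upper>, <lower>, <number> and <other> from sections 2.1 and 2.2.
_UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "()+,-.:=@;$_!*'"
)


def is_valid_nid(nid: str) -> bool:
    """Check the NID grammar; the NID "urn" is reserved in any case."""
    return _NID_RE.fullmatch(nid) is not None and nid.lower() != "urn"


def to_canonical_nss(raw: str) -> str:
    """Translate a raw namespace string into canonical NSS form.

    Assumption: all input characters are literal data, so reserved
    characters (including "%") are %-encoded rather than passed through.
    """
    if not raw:
        raise ValueError("an NSS needs at least one character")
    if "\x00" in raw:
        raise ValueError("octet 0 must never be used")
    out = []
    for ch in raw:
        if ch in _UNRESERVED:
            out.append(ch)
        else:
            # Encode the character as UTF-8, then each octet as %XX.
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)
```

Under these assumptions, a space is emitted as "%20" and a literal "%"
as "%25", while a string made only of letters, digits and <other>
characters passes through unchanged.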
+
+4. URN presentation and transport
+
+   The URN syntax defines the canonical format for URNs and all URN
+   transport and interchanges MUST take place in this format. Further,
+   all URN-aware applications MUST offer the option of displaying URNs
+   in this canonical form to allow for direct transcription (for example
+   by cut and paste techniques). Such applications MAY support display
+   of URNs in a more human-friendly form and may use a character set
+   that includes characters that aren't permitted in URN syntax as
+   defined in this RFC (that is, they may replace %-notation by
+   characters in some extended character set in display to humans).
+
+5. Lexical Equivalence in URNs
+
+   For various purposes such as caching, it's often desirable to
+   determine if two URNs are the same without resolving them. The
+   general purpose means of doing so is by testing for "lexical
+   equivalence" as defined below.
+
+   Two URNs are lexically equivalent if they are octet-by-octet equal
+   after the following preprocessing:
+
+           1. normalize the case of the leading "urn:" token
+           2. normalize the case of the NID
+           3. normalize the case of any %-escaping
+
+   Note that %-escaping MUST NOT be removed.
+
+   Some namespaces may define additional lexical equivalences, such as
+   case-insensitivity of the NSS (or parts thereof). Additional lexical
+   equivalences MUST be documented as part of namespace registration,
+   MUST always have the effect of eliminating some of the false
+   negatives obtained by the procedure above, and MUST NEVER say that
+   two URNs are not equivalent if the procedure above says they are
+   equivalent.
+
+6. Examples of lexical equivalence
+
+   The following URN comparisons highlight the lexical equivalence
+   definitions:
+
+           1- URN:foo:a123,456
+           2- urn:foo:a123,456
+           3- urn:FOO:a123,456
+           4- urn:foo:A123,456
+           5- urn:foo:a123%2C456
+           6- URN:FOO:a123%2c456
+
+   URNs 1, 2, and 3 are all lexically equivalent. URN 4 is not
+   lexically equivalent to any of the other URNs of the above set. URNs 5
+   and 6 are only lexically equivalent to each other.
+
+7. Functional Equivalence in URNs
+
+   Functional equivalence is determined by practice within a given
+   namespace and managed by resolvers for that namespace. Thus, it is
+   beyond the scope of this document. Namespace registration must
+   include guidance on how to determine functional equivalence for that
+   namespace, i.e. when two URNs are identical within a namespace.
+
+8. Security considerations
+
+   This document specifies the syntax for URNs. While some namespace
+   resolvers may assign special meaning to certain of the characters of
+   the Namespace Specific String, any security considerations resulting
+   from such assignment are outside the scope of this document. It is
+   strongly recommended that the process of registering a namespace
+   identifier include any such considerations.
+
+9. Acknowledgments
+
+   Thanks to various members of the URN working group for comments on
+   earlier drafts of this document. This document is partially
+   supported by the National Science Foundation, Cooperative Agreement
+   NCR-9218179.
+
+10. References
+
+   Request For Comments (RFC) and Internet Draft documents are available
+   from <URL:ftp://ftp.internic.net> and numerous mirror sites.
+
+   [1]         Sollins, K. R., "Requirements and a Framework for
+               URN Resolution Systems," Work in Progress.
+
+   [2]         Berners-Lee, T., "Universal Resource Identifiers in
+               WWW," RFC 1630, June 1994.
+
+   [3]         Sollins, K. and L. Masinter, "Functional Requirements
+               for Uniform Resource Names," RFC 1737,
+               December 1994.
+
+   [4]         Berners-Lee, T., R. Fielding, L. Masinter, "Uniform
+               Resource Locators (URL)," Work in Progress.
+
+   [5]         Appendix A.2 of The Unicode Consortium, "The
+               Unicode Standard, Version 2.0", Addison-Wesley
+               Developers Press, 1996. ISBN 0-201-48345-9.
+
+11. Editor's address
+
+   Ryan Moats
+   AT&T
+   15621 Drexel Circle
+   Omaha, NE 68135-2358
+   USA
+
+   Phone: +1 402 894-9456
+   EMail: jayhawk@ds.internic.net
+
+Appendix A. Handling of URNs by URL resolvers/browsers.
+
+   The URN syntax has been defined so that URNs can be used in places
+   where URLs are expected. A resolver that conforms to the current URL
+   syntax specification [3] will extract a scheme value of "urn:" rather
+   than a scheme value of "urn:<nid>".
+
+   An URN MUST be considered an opaque URL by URL resolvers and passed
+   (with the "urn:" tag) to an URN resolver for resolution. The URN
+   resolver can either be an external resolver that the URL resolver
+   knows of, or it can be functionality built into the URL resolver.
+
+   To avoid confusing users, an URL browser SHOULD display the
+   complete URN (including the "urn:" tag) to ensure that there is no
+   confusion between URN namespace identifiers and URL scheme
+   identifiers.
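The lexical equivalence test of section 5 can be sketched in Python as
follows. The names normalize_urn and lexically_equivalent are
hypothetical, and the choice of lower case for the "urn:" token and the
NID and upper case for %-escapes is an assumption of this sketch; the
RFC only requires that case be normalized consistently. No
namespace-specific equivalences are applied.

```python
import re

# Matches one %-escape: "%" followed by two <hex> characters.
_PCT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_urn(urn: str) -> str:
    """Apply the three preprocessing steps of section 5."""
    prefix, _, rest = urn.partition(":")
    nid, sep, nss = rest.partition(":")
    if prefix.lower() != "urn" or not sep or not nid or not nss:
        raise ValueError("not of the form urn:<NID>:<NSS>")
    # Step 3: normalize the case of %-escapes without removing them.
    nss = _PCT_ESCAPE.sub(lambda m: m.group(0).upper(), nss)
    # Steps 1 and 2: normalize the case of "urn:" and of the NID.
    return "urn:" + nid.lower() + ":" + nss


def lexically_equivalent(a: str, b: str) -> bool:
    """Compare two URNs after normalization (the NSS stays case-sensitive)."""
    return normalize_urn(a) == normalize_urn(b)
```

Applied to the examples of section 6, this sketch treats URNs 1, 2 and
3 as equivalent, URN 4 as distinct from all others, and URNs 5 and 6 as
equivalent only to each other.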