summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc2141.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc2141.txt')
-rw-r--r--doc/rfc/rfc2141.txt451
1 files changed, 451 insertions, 0 deletions
diff --git a/doc/rfc/rfc2141.txt b/doc/rfc/rfc2141.txt
new file mode 100644
index 0000000..1c9f685
--- /dev/null
+++ b/doc/rfc/rfc2141.txt
@@ -0,0 +1,451 @@
+
+
+
+
+
+
+Network Working Group R. Moats
+Request for Comments: 2141 AT&T
+Category: Standards Track May 1997
+
+
+ URN Syntax
+
+Status of This Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Abstract
+
+ Uniform Resource Names (URNs) are intended to serve as persistent,
+ location-independent, resource identifiers. This document sets
+ forward the canonical syntax for URNs. A discussion of both existing
+ legacy and new namespaces and requirements for URN presentation and
+ transmission are presented. Finally, there is a discussion of URN
+ equivalence and how to determine it.
+
+1. Introduction
+
+ Uniform Resource Names (URNs) are intended to serve as persistent,
+ location-independent, resource identifiers and are designed to make
+ it easy to map other namespaces (which share the properties of URNs)
+ into URN-space. Therefore, the URN syntax provides a means to encode
+ character data in a form that can be sent in existing protocols,
+ transcribed on most keyboards, etc.
+
+2. Syntax
+
+ All URNs have the following syntax (phrases enclosed in quotes are
+ REQUIRED):
+
+ <URN> ::= "urn:" <NID> ":" <NSS>
+
+ where <NID> is the Namespace Identifier, and <NSS> is the Namespace
+ Specific String. The leading "urn:" sequence is case-insensitive.
+ The Namespace ID determines the _syntactic_ interpretation of the
+ Namespace Specific String (as discussed in [1]).
+
+
+
+
+
+
+
+Moats Standards Track [Page 1]
+
+RFC 2141 URN Syntax May 1997
+
+
+ RFC 1630 [2] and RFC 1737 [3] each presents additional considerations
+ for URN encoding, which have implications as far as limiting syntax.
+ On the other hand, the requirement to support existing legacy naming
+ systems has the effect of broadening syntax. Thus, we discuss the
+ acceptable syntax for both the Namespace Identifier and the Namespace
+ Specific String separately.
+
+2.1 Namespace Identifier Syntax
+
+ The following is the syntax for the Namespace Identifier. To (a) be
+ consistent with all potential resolution schemes and (b) not put any
+ undue constraints on any potential resolution scheme, the syntax for
+ the Namespace Identifier is:
+
+ <NID> ::= <let-num> [ 1,31<let-num-hyp> ]
+
+ <let-num-hyp> ::= <upper> | <lower> | <number> | "-"
+
+ <let-num> ::= <upper> | <lower> | <number>
+
+ <upper> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
+ "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
+ "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
+ "Y" | "Z"
+
+ <lower> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
+ "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
+ "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
+ "y" | "z"
+
+ <number> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
+ "8" | "9"
+
+
+ This is slightly more restrictive that what is stated in [4] (which
+ allows the characters "." and "+"). Further, the Namespace
+ Identifier is case insensitive, so that "ISBN" and "isbn" refer to
+ the same namespace.
+
+ To avoid confusion with the "urn:" identifier, the NID "urn" is
+ reserved and MUST NOT be used.
+
+
+
+
+
+
+
+
+
+
+Moats Standards Track [Page 2]
+
+RFC 2141 URN Syntax May 1997
+
+
+2.2 Namespace Specific String Syntax
+
+ As required by RFC 1737, there is a single canonical representation
+ of the NSS portion of an URN. The format of this single canonical
+ form follows:
+
+ <NSS> ::= 1*<URN chars>
+
+ <URN chars> ::= <trans> | "%" <hex> <hex>
+
+ <trans> ::= <upper> | <lower> | <number> | <other> | <reserved>
+
+ <hex> ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" |
+ "a" | "b" | "c" | "d" | "e" | "f"
+
+ <other> ::= "(" | ")" | "+" | "," | "-" | "." |
+ ":" | "=" | "@" | ";" | "$" |
+ "_" | "!" | "*" | "'"
+
+ Depending on the rules governing a namespace, valid identifiers in a
+ namespace might contain characters that are not members of the URN
+ character set above (<URN chars>). Such strings MUST be translated
+ into canonical NSS format before using them as protocol elements or
+ otherwise passing them on to other applications. Translation is done
+ by encoding each character outside the URN character set as a
+ sequence of one to six octets using UTF-8 encoding [5], and the
+ encoding of each of those octets as "%" followed by two characters
+ from the <hex> character set above. The two characters give the
+ hexadecimal representation of that octet.
+
+2.3 Reserved characters
+
+ The remaining character set left to be discussed above is the
+ reserved character set, which contains various characters reserved
+ from normal use. The reserved character set follows, with a
+ discussion on the specifics of why each character is reserved.
+
+ The reserved character set is:
+
+ <reserved> ::= '%" | "/" | "?" | "#"
+
+2.3.1 The "%" character
+
+ The "%" character is reserved in the URN syntax for introducing the
+ escape sequence for an octet. Literal use of the "%" character in a
+ namespace must be encoded using "%25" in URNs for that namespace.
+ The presence of an "%" character in an URN MUST be followed by two
+ characters from the <hex> character set.
+
+
+
+Moats Standards Track [Page 3]
+
+RFC 2141 URN Syntax May 1997
+
+
+ Namespaces MAY designate one or more characters from the URN
+ character set as having special meaning for that namespace. If the
+ namespace also uses that character in a literal sense as well, the
+ character used in a literal sense MUST be encoded with "%" followed
+ by the hexadecimal representation of that octet. Further, a
+ character MUST NOT be "%"-encoded if the character is not a reserved
+ character. Therefore, the process of registering a namespace
+ identifier shall include publication of a definition of which
+ characters have a special meaning to that namespace.
+
+2.3.2 The other reserved characters
+
+ RFC 1630 [2] reserves the characters "/", "?", and "#" for particular
+ purposes. The URN-WG has not yet debated the applicability and
+ precise semantics of those purposes as applied to URNs. Therefore,
+ these characters are RESERVED for future developments. Namespace
+ developers SHOULD NOT use these characters in unencoded form, but
+ rather use the appropriate %-encoding for each character.
+
+2.4 Excluded characters
+
+ The following list is included only for the sake of completeness.
+ Any octets/characters on this list are explicitly NOT part of the URN
+ character set, and if used in an URN, MUST be %encoded:
+
+ <excluded> ::= octets 1-32 (1-20 hex) | "\" | """ | "&" | "<"
+ | ">" | "[" | "]" | "^" | "`" | "{" | "|" | "}" | "~"
+ | octets 127-255 (7F-FF hex)
+
+ In addition, octet 0 (0 hex) should NEVER be used, in either
+ unencoded or %-encoded form.
+
+ An URN ends when an octet/character from the excluded character set
+ (<excluded>) is encountered. The character from the excluded
+ character set is NOT part of the URN.
+
+3. Support of existing legacy naming systems and new naming systems
+
+ Any namespace (existing or newly-devised) that is proposed as an
+ URN-namespace and fulfills the criteria of URN-namespaces MUST be
+ expressed in this syntax. If names in these namespaces contain
+ characters other than those defined for the URN character set, they
+ MUST be translated into canonical form as discussed in section 2.2.
+
+
+
+
+
+
+
+
+Moats Standards Track [Page 4]
+
+RFC 2141 URN Syntax May 1997
+
+
+4. URN presentation and transport
+
+ The URN syntax defines the canonical format for URNs and all URN
+ transport and interchanges MUST take place in this format. Further,
+ all URN-aware applications MUST offer the option of displaying URNs
+ in this canonical form to allow for direct transcription (for example
+ by cut and paste techniques). Such applications MAY support display
+ of URNs in a more human-friendly form and may use a character set
+ that includes characters that aren't permitted in URN syntax as
+ defined in this RFC (that is, they may replace %-notation by
+ characters in some extended character set in display to humans).
+
+5. Lexical Equivalence in URNs
+
+ For various purposes such as caching, it's often desirable to
+ determine if two URNs are the same without resolving them. The
+ general purpose means of doing so is by testing for "lexical
+ equivalence" as defined below.
+
+ Two URNs are lexically equivalent if they are octet-by-octet equal
+ after the following preprocessing:
+
+ 1. normalize the case of the leading "urn:" token
+ 2. normalize the case of the NID
+ 3. normalizing the case of any %-escaping
+
+ Note that %-escaping MUST NOT be removed.
+
+ Some namespaces may define additional lexical equivalences, such as
+ case-insensitivity of the NSS (or parts thereof). Additional lexical
+ equivalences MUST be documented as part of namespace registration,
+ MUST always have the effect of eliminating some of the false
+ negatives obtained by the procedure above, and MUST NEVER say that
+ two URNs are not equivalent if the procedure above says they are
+ equivalent.
+
+6. Examples of lexical equivalence
+
+ The following URN comparisons highlight the lexical equivalence
+ definitions:
+
+ 1- URN:foo:a123,456
+ 2- urn:foo:a123,456
+ 3- urn:FOO:a123,456
+ 4- urn:foo:A123,456
+ 5- urn:foo:a123%2C456
+ 6- URN:FOO:a123%2c456
+
+
+
+
+Moats Standards Track [Page 5]
+
+RFC 2141 URN Syntax May 1997
+
+
+ URNs 1, 2, and 3 are all lexically equivalent. URN 4 is not
+ lexically equivalent any of the other URNs of the above set. URNs 5
+ and 6 are only lexically equivalent to each other.
+
+7. Functional Equivalence in URNs
+
+ Functional equivalence is determined by practice within a given
+ namespace and managed by resolvers for that namespeace. Thus, it is
+ beyond the scope of this document. Namespace registration must
+ include guidance on how to determine functional equivalence for that
+ namespace, i.e. when two URNs are the identical within a namespace.
+
+8. Security considerations
+
+ This document specifies the syntax for URNs. While some namespaces
+ resolvers may assign special meaning to certain of the characters of
+ the Namespace Specific String, any security consideration resulting
+ from such assignment are outside the scope of this document. It is
+ strongly recommended that the process of registering a namespace
+ identifier include any such considerations.
+
+9. Acknowledgments
+
+ Thanks to various members of the URN working group for comments on
+ earlier drafts of this document. This document is partially
+ supported by the National Science Foundation, Cooperative Agreement
+ NCR-9218179.
+
+10. References
+
+ Request For Comments (RFC) and Internet Draft documents are available
+ from <URL:ftp://ftp.internic.net> and numerous mirror sites.
+
+ [1] Sollins, K. R., "Requirements and a Framework for
+ URN Resolution Systems," Work in Progress.
+
+ [2] Berners-Lee, T., "Universal Resource Identifiers in
+ WWW," RFC 1630, June 1994.
+
+ [3] Sollins, K. and L. Masinter, "Functional Requirements
+ for Uniform Resource Names," RFC 1737.
+ December 1994.
+
+
+
+
+
+
+
+
+
+Moats Standards Track [Page 6]
+
+RFC 2141 URN Syntax May 1997
+
+
+ [4] Berners-Lee, T., R. Fielding, L. Masinter, "Uniform
+ Resource Locators (URL)," Work in Progress.
+
+ [5] Appendix A.2 of The Unicode Consortium, "The
+ Unicode Standard, Version 2.0", Addison-Wesley
+ Developers Press, 1996. ISBN 0-201-48345-9.
+
+11. Editor's address
+
+ Ryan Moats
+ AT&T
+ 15621 Drexel Circle
+ Omaha, NE 68135-2358
+ USA
+
+ Phone: +1 402 894-9456
+ EMail: jayhawk@ds.internic.net
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Moats Standards Track [Page 7]
+
+RFC 2141 URN Syntax May 1997
+
+
+Appendix A. Handling of URNs by URL resolvers/browsers.
+
+ The URN syntax has been defined so that URNs can be used in places
+ where URLs are expected. A resolver that conforms to the current URL
+ syntax specification [3] will extract a scheme value of "urn:" rather
+ than a scheme value of "urn:<nid>".
+
+ An URN MUST be considered an opaque URL by URL resolvers and passed
+ (with the "urn:" tag) to an URN resolver for resolution. The URN
+ resolver can either be an external resolver that the URL resolver
+ knows of, or it can be functionality built-in to the URL resolver.
+
+ To avoid confusion of users, an URL browser SHOULD display the
+ complete URN (including the "urn:" tag) to ensure that there is no
+ confusion between URN namespace identifiers and URL scheme
+ identifiers.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Moats Standards Track [Page 8]
+