1 files changed, 451 insertions, 0 deletions
diff --git a/doc/rfc/rfc2141.txt b/doc/rfc/rfc2141.txt
new file mode 100644
index 0000000..1c9f685
--- /dev/null
+++ b/doc/rfc/rfc2141.txt
@@ -0,0 +1,451 @@
+
+
+
+
+
+
+Network Working Group                                           R. Moats
+Request for Comments: 2141                                          AT&T
+Category: Standards Track                                       May 1997
+
+
+                               URN Syntax
+
+Status of This Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Abstract
+
+   Uniform Resource Names (URNs) are intended to serve as persistent,
+   location-independent, resource identifiers. This document sets
+   forth the canonical syntax for URNs.  It also discusses support for
+   both existing legacy and new namespaces, and the requirements for
+   URN presentation and transmission.  Finally, it discusses URN
+   equivalence and how to determine it.
+
+1. Introduction
+
+   Uniform Resource Names (URNs) are intended to serve as persistent,
+   location-independent, resource identifiers and are designed to make
+   it easy to map other namespaces (which share the properties of URNs)
+   into URN-space. Therefore, the URN syntax provides a means to encode
+   character data in a form that can be sent in existing protocols,
+   transcribed on most keyboards, etc.
+
+2. Syntax
+
+   All URNs have the following syntax (phrases enclosed in quotes are
+   REQUIRED):
+
+                     <URN> ::= "urn:" <NID> ":" <NSS>
+
+   where <NID> is the Namespace Identifier, and <NSS> is the Namespace
+   Specific String.  The leading "urn:" sequence is case-insensitive.
+   The Namespace ID determines the _syntactic_ interpretation of the
+   Namespace Specific String (as discussed in [1]).
+
+
+
+
+
+
+
+Moats                       Standards Track                     [Page 1]
+
+RFC 2141                       URN Syntax                      May 1997
+
+
+   RFC 1630 [2] and RFC 1737 [3] each present additional considerations
+   for URN encoding, which tend to restrict the syntax.  On the other
+   hand, the requirement to support existing legacy naming systems
+   tends to broaden the syntax.  Thus, we discuss the acceptable syntax
+   for the Namespace Identifier and the Namespace Specific String
+   separately.
+
+2.1 Namespace Identifier Syntax
+
+   The following is the syntax for the Namespace Identifier. To (a) be
+   consistent with all potential resolution schemes and (b) not put any
+   undue constraints on any potential resolution scheme, the syntax for
+   the Namespace Identifier is:
+
+   <NID>         ::= <let-num> [ 1,31<let-num-hyp> ]
+
+   <let-num-hyp> ::= <upper> | <lower> | <number> | "-"
+
+   <let-num>     ::= <upper> | <lower> | <number>
+
+   <upper>       ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
+                     "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
+                     "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
+                     "Y" | "Z"
+
+   <lower>       ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
+                     "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
+                     "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
+                     "y" | "z"
+
+   <number>      ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
+                     "8" | "9"
+
+
+   This is slightly more restrictive than what is stated in [4] (which
+   allows the characters "." and "+").  Further, the Namespace
+   Identifier is case insensitive, so that "ISBN" and "isbn" refer to
+   the same namespace.
+
+   To avoid confusion with the "urn:" identifier, the NID "urn" is
+   reserved and MUST NOT be used.
+
+
+
+
+
+
+
+
+
+
+Moats                       Standards Track                     [Page 2]
+
+RFC 2141                       URN Syntax                      May 1997
+
+
+2.2 Namespace Specific String Syntax
+
+   As required by RFC 1737, there is a single canonical representation
+   of the NSS portion of an URN.  The format of this single canonical
+   form follows:
+
+   <NSS>         ::= 1*<URN chars>
+
+   <URN chars>   ::= <trans> | "%" <hex> <hex>
+
+   <trans>       ::= <upper> | <lower> | <number> | <other> | <reserved>
+
+   <hex>         ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" |
+                     "a" | "b" | "c" | "d" | "e" | "f"
+
+   <other>       ::= "(" | ")" | "+" | "," | "-" | "." |
+                     ":" | "=" | "@" | ";" | "$" |
+                     "_" | "!" | "*" | "'"
+
+   Depending on the rules governing a namespace, valid identifiers in a
+   namespace might contain characters that are not members of the URN
+   character set above (<URN chars>).  Such strings MUST be translated
+   into canonical NSS format before using them as protocol elements or
+   otherwise passing them on to other applications. Translation is done
+   by encoding each character outside the URN character set as a
+   sequence of one to six octets using UTF-8 encoding [5], and then
+   encoding each of those octets as "%" followed by two characters
+   from the <hex> character set above. The two characters give the
+   hexadecimal representation of that octet.
+
+2.3 Reserved characters
+
+   The one character set used above that remains to be discussed is the
+   reserved character set, which contains various characters reserved
+   from normal use.  The reserved character set follows, with a
+   discussion of why each character is reserved.
+
+   The reserved character set is:
+
+   <reserved>    ::= "%" | "/" | "?" | "#"
+
+2.3.1 The "%" character
+
+   The "%" character is reserved in the URN syntax for introducing the
+   escape sequence for an octet.  Literal use of the "%" character in a
+   namespace must be encoded using "%25" in URNs for that namespace.
+   Any "%" character in an URN MUST be followed by two
+   characters from the <hex> character set.
+
+
+
+Moats                       Standards Track                     [Page 3]
+
+RFC 2141                       URN Syntax                      May 1997
+
+
+   Namespaces MAY designate one or more characters from the URN
+   character set as having special meaning for that namespace.  If the
+   namespace also uses that character in a literal sense, the
+   character used in a literal sense MUST be encoded with "%" followed
+   by the hexadecimal representation of that octet.  Further, a
+   character MUST NOT be "%"-encoded if the character is not a reserved
+   character.  Therefore, the process of registering a namespace
+   identifier shall include publication of a definition of which
+   characters have a special meaning to that namespace.
+
+2.3.2 The other reserved characters
+
+   RFC 1630 [2] reserves the characters "/", "?", and "#" for particular
+   purposes. The URN-WG has not yet debated the applicability and
+   precise semantics of those purposes as applied to URNs. Therefore,
+   these characters are RESERVED for future developments.  Namespace
+   developers SHOULD NOT use these characters in unencoded form, but
+   rather use the appropriate %-encoding for each character.
+
+2.4 Excluded characters
+
+   The following list is included only for the sake of completeness.
+   Any octets/characters on this list are explicitly NOT part of the URN
+   character set, and if used in an URN, MUST be %-encoded:
+
+   <excluded> ::= octets 1-32 (1-20 hex) | "\" | """ | "&" | "<"
+                  | ">" | "[" | "]" | "^" | "`" | "{" | "|" | "}" | "~"
+                  | octets 127-255 (7F-FF hex)
+
+   In addition, octet 0 (0 hex) should NEVER be used, in either
+   unencoded or %-encoded form.
+
+   An URN ends when an octet/character from the excluded character set
+   (<excluded>) is encountered.  The character from the excluded
+   character set is NOT part of the URN.
+
+3. Support of existing legacy naming systems and new naming systems
+
+   Any namespace (existing or newly-devised) that is proposed as an
+   URN-namespace and fulfills the criteria of URN-namespaces MUST be
+   expressed in this syntax.  If names in these namespaces contain
+   characters other than those defined for the URN character set, they
+   MUST be translated into canonical form as discussed in section 2.2.
+
+
+
+
+
+
+
+
+Moats                       Standards Track                     [Page 4]
+
+RFC 2141                       URN Syntax                      May 1997
+
+
+4. URN presentation and transport
+
+   The URN syntax defines the canonical format for URNs and all URN
+   transport and interchanges MUST take place in this format. Further,
+   all URN-aware applications MUST offer the option of displaying URNs
+   in this canonical form to allow for direct transcription (for example
+   by cut and paste techniques).  Such applications MAY support display
+   of URNs in a more human-friendly form and may use a character set
+   that includes characters that aren't permitted in URN syntax as
+   defined in this RFC (that is, they may replace %-notation by
+   characters in some extended character set in display to humans).
+
+5. Lexical Equivalence in URNs
+
+   For various purposes such as caching, it's often desirable to
+   determine if two URNs are the same without resolving them. The
+   general purpose means of doing so is by testing for "lexical
+   equivalence" as defined below.
+
+   Two URNs are lexically equivalent if they are octet-by-octet equal
+   after the following preprocessing:
+
+           1. normalize the case of the leading "urn:" token
+           2. normalize the case of the NID
+           3. normalize the case of any %-escaping
+
+   Note that %-escaping MUST NOT be removed.
+
+   Some namespaces may define additional lexical equivalences, such as
+   case-insensitivity of the NSS (or parts thereof).  Additional lexical
+   equivalences MUST be documented as part of namespace registration,
+   MUST always have the effect of eliminating some of the false
+   negatives obtained by the procedure above, and MUST NEVER say that
+   two URNs are not equivalent if the procedure above says they are
+   equivalent.
+
+6. Examples of lexical equivalence
+
+   The following URN comparisons highlight the lexical equivalence
+   definitions:
+
+           1- URN:foo:a123,456
+           2- urn:foo:a123,456
+           3- urn:FOO:a123,456
+           4- urn:foo:A123,456
+           5- urn:foo:a123%2C456
+           6- URN:FOO:a123%2c456
+
+
+
+
+Moats                       Standards Track                     [Page 5]
+
+RFC 2141                       URN Syntax                      May 1997
+
+
+   URNs 1, 2, and 3 are all lexically equivalent.  URN 4 is not
+   lexically equivalent to any of the other URNs in the above set.
+   URNs 5 and 6 are lexically equivalent only to each other.
+
+7. Functional Equivalence in URNs
+
+   Functional equivalence is determined by practice within a given
+   namespace and managed by resolvers for that namespace. Thus, it is
+   beyond the scope of this document.  Namespace registration must
+   include guidance on how to determine functional equivalence for that
+   namespace, i.e. when two URNs are identical within a namespace.
+
+8. Security considerations
+
+   This document specifies the syntax for URNs.  While some namespace
+   resolvers may assign special meaning to certain characters of the
+   Namespace Specific String, any security considerations resulting
+   from such assignment are outside the scope of this document.  It is
+   strongly recommended that the process of registering a namespace
+   identifier include any such considerations.
+
+9. Acknowledgments
+
+   Thanks to various members of the URN working group for comments on
+   earlier drafts of this document.  This document is partially
+   supported by the National Science Foundation, Cooperative Agreement
+   NCR-9218179.
+
+10. References
+
+   Request For Comments (RFC) and Internet Draft documents are available
+   from <URL:ftp://ftp.internic.net> and numerous mirror sites.
+
+   [1]         Sollins, K. R., "Requirements and a Framework for
+               URN Resolution Systems," Work in Progress.
+
+   [2]         Berners-Lee, T., "Universal Resource Identifiers in
+               WWW," RFC 1630, June 1994.
+
+   [3]         Sollins, K. and L. Masinter,  "Functional Requirements
+               for Uniform Resource Names," RFC 1737,
+               December 1994.
+
+
+
+
+
+
+
+
+
+Moats                       Standards Track                     [Page 6]
+
+RFC 2141                       URN Syntax                      May 1997
+
+
+   [4]         Berners-Lee, T., R. Fielding, L. Masinter, "Uniform
+               Resource Locators (URL),"  Work in Progress.
+
+   [5]         Appendix A.2 of The Unicode Consortium, "The
+               Unicode Standard, Version 2.0", Addison-Wesley
+               Developers Press, 1996.  ISBN 0-201-48345-9.
+
+11. Editor's address
+
+      Ryan Moats
+      AT&T
+      15621 Drexel Circle
+      Omaha, NE 68135-2358
+      USA
+
+      Phone:  +1 402 894-9456
+      EMail:  jayhawk@ds.internic.net
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Moats                       Standards Track                     [Page 7]
+
+RFC 2141                       URN Syntax                      May 1997
+
+
+Appendix A. Handling of URNs by URL resolvers/browsers.
+
+   The URN syntax has been defined so that URNs can be used in places
+   where URLs are expected.  A resolver that conforms to the current URL
+   syntax specification [4] will extract a scheme value of "urn:" rather
+   than a scheme value of "urn:<nid>".
+
+   An URN MUST be considered an opaque URL by URL resolvers and passed
+   (with the "urn:" tag) to an URN resolver for resolution.  The URN
+   resolver can either be an external resolver that the URL resolver
+   knows of, or it can be functionality built into the URL resolver.
+
+   To avoid confusion of users, an URL browser SHOULD display the
+   complete URN (including the "urn:" tag) to ensure that there is no
+   confusion between URN namespace identifiers and URL scheme
+   identifiers.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Moats                       Standards Track                     [Page 8]
+
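Note on the rules in Sections 2.1 and 5 of the RFC text added above: the
NID grammar and the three lexical-equivalence preprocessing steps can be
sketched in a few lines of Python.  This is only an illustrative sketch
placed after the end of the hunk, not part of the RFC or of the patch
itself.  The function names (split_urn, normalize, lexically_equivalent)
are hypothetical, and the sketch assumes the URN has already been
delimited; it checks the NID and the outer "urn:<NID>:<NSS>" shape but
does not validate every character of the NSS.

```python
import re

# Section 2.1: one letter or digit, then up to 31 letters, digits, or "-".
_NID = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,31}")
# Section 2.3.1: "%" followed by two characters from the <hex> set.
_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def split_urn(urn):
    """Return (NID, NSS), or raise ValueError if the outer shape is wrong."""
    prefix, sep1, rest = urn.partition(":")
    nid, sep2, nss = rest.partition(":")
    if prefix.lower() != "urn" or not sep1 or not sep2:
        raise ValueError("not of the form urn:<NID>:<NSS>")
    if not _NID.fullmatch(nid):
        raise ValueError("NID does not match the Section 2.1 grammar")
    if nid.lower() == "urn":
        raise ValueError('the NID "urn" is reserved')
    if not nss:
        raise ValueError("NSS must contain at least one character")
    return nid, nss


def normalize(urn):
    """Apply the three preprocessing steps of Section 5."""
    nid, nss = split_urn(urn)
    # Steps 1 and 2: normalize the case of "urn:" and of the NID.
    # Step 3: normalize the case of %-escapes without removing them.
    nss = _ESCAPE.sub(lambda m: m.group(0).upper(), nss)
    return "urn:" + nid.lower() + ":" + nss


def lexically_equivalent(a, b):
    """Octet-by-octet comparison after preprocessing."""
    return normalize(a) == normalize(b)
```

Applied to the six examples of Section 6, this sketch treats URNs 1, 2,
and 3 as equivalent, URN 4 as equivalent to none of the others, and URNs
5 and 6 as equivalent only to each other.  Upper case for %-escapes and
lower case for the NID are arbitrary choices; the RFC only requires that
case be normalized consistently.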