Diffstat (limited to 'doc/rfc/rfc1630.txt')
-rw-r--r-- doc/rfc/rfc1630.txt 1571
1 files changed, 1571 insertions, 0 deletions
diff --git a/doc/rfc/rfc1630.txt b/doc/rfc/rfc1630.txt
new file mode 100644
index 0000000..c1e9d9a
--- /dev/null
+++ b/doc/rfc/rfc1630.txt
@@ -0,0 +1,1571 @@
+
+
+
+
+
+
+Network Working Group                                    T. Berners-Lee
+Request for Comments: 1630                                         CERN
+Category: Informational                                       June 1994
+
+
+ Universal Resource Identifiers in WWW
+
+ A Unifying Syntax for the Expression of
+ Names and Addresses of Objects on the Network
+ as used in the World-Wide Web
+
+Status of this Memo
+
+ This memo provides information for the Internet community. This memo
+ does not specify an Internet standard of any kind. Distribution of
+ this memo is unlimited.
+
+IESG Note:
+
+ Note that the work contained in this memo does not describe an
+ Internet standard. An Internet standard for general Resource
+ Identifiers is under development within the IETF.
+
+Introduction
+
+ This document defines the syntax used by the World-Wide Web
+ initiative to encode the names and addresses of objects on the
+ Internet. The web is considered to include objects accessed using an
+ extendable number of protocols, existing, invented for the web
+ itself, or to be invented in the future. Access instructions for an
+ individual object under a given protocol are encoded into forms of
+ address string. Other protocols allow the use of object names of
+ various forms. In order to abstract the idea of a generic object,
+ the web needs the concepts of the universal set of objects, and of
+ the universal set of names or addresses of objects.
+
+ A Universal Resource Identifier (URI) is a member of this universal
+ set of names in registered name spaces and addresses referring to
+ registered protocols or name spaces. A Uniform Resource Locator
+ (URL), defined elsewhere, is a form of URI which expresses an address
+ which maps onto an access algorithm using network protocols. Existing
+ URI schemes which correspond to the (still mutating) concept of IETF
+ URLs are listed here. The Uniform Resource Name (URN) debate attempts
+ to define a name space (and presumably resolution protocols) for
+ persistent object names. This area is not addressed by this document,
+ which is written in order to document existing practice and provide a
+ reference point for URL and URN discussions.
+
+
+
+
+Berners-Lee [Page 1]
+
+RFC 1630 URIs in WWW June 1994
+
+
+ The world-wide web protocols are discussed on the mailing list
+ www-talk-request@info.cern.ch; the newsgroup comp.infosystems.www
+ is preferable for beginners' questions. The mailing list
+ uri-request@bunyip.com carries discussion related particularly to
+ the URI issue. The author may be contacted at timbl@info.cern.ch.
+
+ This document is available in hypertext form at:
+
+ http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.html
+
+The Need For a Universal Syntax
+
+ This section describes the concept of the URI and does not form part
+ of the specification.
+
+ Many protocols and systems for document search and retrieval are
+ currently in use, and many more protocols or refinements of existing
+ protocols are to be expected in a field whose expansion is explosive.
+
+ These systems are aiming to achieve global search and readership of
+ documents across differing computing platforms, and despite a
+ plethora of protocols and data formats. As protocols evolve,
+ gateways can allow global access to remain possible. As data formats
+ evolve, format conversion programs can preserve global access. There
+ is one area, however, in which it is impractical to make conversions,
+ and that is in the names and addresses used to identify objects.
+ This is because names and addresses of objects are passed on in so
+ many ways, from the backs of envelopes to hypertext objects, and may
+ have a long life.
+
+ A common feature of almost all the data models of past and proposed
+ systems is something which can be mapped onto a concept of "object"
+ and some kind of name, address, or identifier for that object. One
+ can therefore define a set of name spaces in which these objects can
+ be said to exist.
+
+ Practical systems need to access and mix objects which are part of
+ different existing and proposed systems. Therefore, the concept of
+ the universal set of all objects, and hence the universal set of
+ names and addresses, in all name spaces, becomes important. This
+ allows names in different spaces to be treated in a common way, even
+ though names in different spaces have differing characteristics, as
+ do the objects to which they refer.
+
+ URIs
+
+ This document defines a way to encapsulate a name in any
+ registered name space, and label it with the name space,
+ producing a member of the universal set. Such an encoded and
+ labelled member of this set is known as a Universal Resource
+ Identifier, or URI.
+
+ The universal syntax allows access of objects available using
+ existing protocols, and may be extended with technology.
+
+ The specification of the URI syntax does not imply anything about
+ the properties of names and addresses in the various name spaces
+ which are mapped onto the set of URI strings. The properties
+ follow from the specifications of the protocols and the associated
+ usage conventions for each scheme.
+
+ URLs
+
+ For existing Internet access protocols, it is necessary in most
+ cases to define the encoding of the access algorithm into
+ something concise enough to be termed an address. URIs which refer
+ to objects accessed with existing protocols are known as "Uniform
+ Resource Locators" (URLs) and are listed here as used in WWW, but
+ to be formally defined in a separate document.
+
+ URNs
+
+ There is currently a drive to define a space of more persistent
+ names than any URLs. These "Uniform Resource Names" are the
+ subject of an IETF working group's discussions. (See Sollins and
+ Masinter, Functional Specifications for URNs, circulated
+ informally.)
+
+ The URI syntax and URL forms have been in widespread use by
+ World-Wide Web software since 1990.
+
+Design Criteria and Choices
+
+ This section is not part of the specification: it is simply an
+ explanation of the way in which the specification was derived.
+
+ Design criteria
+
+ The syntax was designed to be:
+
+ Extensible     New naming schemes may be added later.
+
+ Complete       It is possible to encode any naming scheme.
+
+ Printable      It is possible to express any URI using 7-bit ASCII
+                characters so that URIs may, if necessary, be passed
+                using pen and ink.
+
+ Choices for a universal syntax
+
+ For the syntax itself there is little choice except for the order
+ and punctuation of the elements, and the acceptable characters and
+ escaping rules.
+
+ The extensibility requirement is met by allowing an arbitrary (but
+ registered) string to be used as a prefix. A prefix is chosen as
+ left to right parsing is more common than right to left. The
+ choice of a colon as separator of the prefix from the rest of the
+ URI was arbitrary.
+
+ The decoding of the rest of the string is defined as a function of
+ the prefix. New prefixes are introduced for new schemes as
+ necessary, in agreement with the registration authority. The
+ registration of a new scheme clearly requires the definition of
+ the decoding of the URI into a given name space, and a definition
+ of the properties and, where applicable, resolution protocols, for
+ the name space.
+
+ The completeness requirement is easily met by allowing
+ particularly strange or plain binary names to be encoded in base
+ 16 or 64 using the acceptable characters.
+
+ The printability requirement could have been met by requiring all
+ schemes to encode characters not part of a basic set. This led to
+ many discussions of what the basic set should be. A difficult
+ case, for example, is when an ISO Latin-1 string appears in a URL,
+ and within an application with ISO Latin-1 capability, it can be
+ handled intact. However, for transport in general, the non-ASCII
+ characters need to be escaped.
+
+ The solution to this was to specify a safe set of characters, and
+ a general escaping scheme which may be used for encoding "unsafe"
+ characters. This "safe" set is suitable, for example, for use in
+ electronic mail. This is the canonical form of a URI.
+
+ The choice of escape character for introducing representations of
+ non-allowed characters also tends to be a matter of taste. An
+ ANSI standard exists in the C language, using the back-slash
+ character "\". The use of this character on unix command lines,
+ however, can be a problem as it is interpreted by many shell
+ programs, and would have itself to be escaped. It is also a
+ character which is not available on certain keyboards. The equals
+ sign is commonly used in the encoding of names having
+ attribute=value pairs. The percent sign was eventually chosen as
+ a suitable escape character.
+
+ There is a conflict between the need to be able to represent many
+ characters including spaces within a URI directly, and the need to
+ be able to use a URI in environments which have limited character
+ sets or in which certain characters are prone to corruption. This
+ conflict has been resolved by use of a hexadecimal escaping
+ method which may be applied to any characters forbidden in a given
+ context. When URLs are moved between contexts, the set of
+ characters escaped may be enlarged or reduced unambiguously.
+
+ The use of white space characters is risky in URIs to be printed
+ or sent by electronic mail, and the use of multiple white space
+ characters is very risky. This is because of the frequent
+ introduction of extraneous white space when lines are wrapped by
+ systems such as mail, or sheer necessity of narrow column width,
+ and because of the inter-conversion of various forms of white
+ space which occurs during character code conversion and the
+ transfer of text between applications. This is why the canonical
+ form for URIs has all white spaces encoded.
+
+Recommendations
+
+ This section describes the syntax for URIs as used in the World-Wide
+ Web initiative. The generic syntax provides a framework for new
+ schemes for names to be resolved using as yet undefined protocols.
+
+URI syntax
+
+ A complete URI consists of a naming scheme specifier followed by a
+ string whose format is a function of the naming scheme. For locators
+ of information on the Internet, a common syntax is used for the IP
+ address part. A BNF description of the URL syntax is given in a
+ later section. The components are as follows. Fragment identifiers
+ and relative URIs are not involved in the basic URL definition.
+
+ SCHEME
+
+ Within the URI of an object, the first element is the name of the
+ scheme, separated from the rest of the object by a colon.
+
+ PATH
+
+ The rest of the URI follows the colon in a format depending on the
+ scheme. The path is interpreted in a manner dependent on the
+ protocol being used. However, when it contains slashes, these
+ must imply a hierarchical structure.
+
+Reserved characters
+
+ The path in the URI has a significance defined by the particular
+ scheme. Typically, it is used to encode a name in a given name
+ space, or an algorithm for accessing an object. In either case, the
+ encoding may use those characters allowed by the BNF syntax, or
+ hexadecimal encoding of other characters.
+
+ Some of the reserved characters have special uses as defined here.
+
+ THE PERCENT SIGN
+
+ The percent sign ("%", ASCII 25 hex) is used as the escape
+ character in the encoding scheme and is never allowed for anything
+ else.
+
+ HIERARCHICAL FORMS
+
+ The slash ("/", ASCII 2F hex) character is reserved for the
+ delimiting of substrings whose relationship is hierarchical. This
+ enables partial forms of the URI. Substrings consisting of single
+ or double dots ("." or "..") are similarly reserved.
+
+ The significance of the slash between two segments is that the
+ segment of the path to the left is more significant than the
+ segment of the path to the right. ("Significance" in this case
+ refers solely to closeness to the root of the hierarchical
+ structure and makes no value judgement!)
+
+ Note
+
+ The similarity to unix and other disk operating system filename
+ conventions should be taken as purely coincidental, and should
+ not be taken to indicate that URIs should be interpreted as
+ file names.
+
+ HASH FOR FRAGMENT IDENTIFIERS
+
+ The hash ("#", ASCII 23 hex) character is reserved as a delimiter
+ to separate the URI of an object from a fragment identifier.
+
+ QUERY STRINGS
+
+ The question mark ("?", ASCII 3F hex) is used to delimit the
+ boundary between the URI of a queryable object, and a set of words
+ used to express a query on that object. When this form is used,
+ the combined URI stands for the object which results from the
+ query being applied to the original object.
+
+ Within the query string, the plus sign is reserved as shorthand
+ notation for a space. Therefore, real plus signs must be encoded.
+ This method was used to make query URIs easier to pass in systems
+ which did not allow spaces.
+
+ The query string represents some operation applied to the object,
+ but this specification gives no common syntax or semantics for it.
+ In practice the syntax and semantics may depend on the scheme and
+ may even depend on the base URI.
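
Editorial sketch (not part of the memo): the query-string convention above can be illustrated in a few lines of Python. The helper names and the SAFE character set are assumptions made for this illustration only; the memo's own BNF, which appears in a later section, is the authority on which characters may appear unencoded.

```python
import re
import string

# Assumed set of characters left unencoded; chosen for illustration only.
SAFE = set(string.ascii_letters + string.digits + "$-_@.&")


def escape(text):
    """Percent-encode every character outside SAFE (input must be Latin-1)."""
    out = []
    for byte in text.encode("latin-1"):
        ch = chr(byte)
        out.append(ch if ch in SAFE else "%%%02X" % byte)
    return "".join(out)


def unescape(text):
    """Replace each %XX escape with the character it encodes."""
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)


def build_query_uri(base, words):
    """Join the query words with '+', encoding real plus signs and spaces."""
    return base + "?" + "+".join(escape(w) for w in words)


def split_query(uri):
    """Split a query URI into the object's URI and its list of words."""
    base, sep, query = uri.partition("?")
    words = [unescape(w) for w in query.split("+")] if sep else []
    return base, words
```

Because "+" is not in SAFE, a literal plus sign inside a word is emitted as %2B, so the "+" separators remain unambiguous when the query is split again.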
+
+ OTHER RESERVED CHARACTERS
+
+ The asterisk ("*", ASCII 2A hex) and exclamation mark ("!", ASCII
+ 21 hex) are reserved for use as having special significance within
+ specific schemes.
+
+Unsafe characters
+
+ In canonical form, certain characters such as spaces, control
+ characters, some characters whose ASCII code is used differently in
+ different national character variant 7-bit sets, and all 8-bit
+ characters beyond DEL (7F hex) of the ISO Latin-1 set, shall not be
+ used unencoded. This is a recommendation for trouble-free
+ interchange, and as indicated below, the encoded set may be extended
+ or reduced.
+
+Encoding reserved characters
+
+ When a system uses a local addressing scheme, it is useful to provide
+ a mapping from local addresses into URIs so that references to
+ objects within the addressing scheme may be referred to globally, and
+ possibly accessed through gateway servers.
+
+ For a new naming scheme, any mapping scheme may be defined provided
+ it is unambiguous, reversible, and provides valid URIs. It is
+ recommended that where hierarchical aspects to the local naming
+ scheme exist, they be mapped onto the hierarchical URL path syntax in
+ order to allow the partial form to be used.
+
+ It is also recommended that the conventional scheme below be used in
+ all cases except for any scheme which encodes binary data as opposed
+ to text, in which case a more compact encoding such as pure
+ hexadecimal or base 64 might be more appropriate. For example, the
+ conventional URI encoding method is used for mapping WAIS, FTP,
+ Prospero and Gopher addresses in the URI specification.
+
+ CONVENTIONAL URI ENCODING SCHEME
+
+ Where the local naming scheme uses ASCII characters which are not
+ allowed in the URI, these may be represented in the URL by a
+ percent sign "%" immediately followed by two hexadecimal digits
+ (0-9, A-F) giving the ISO Latin 1 code for that character.
+ Character codes other than those allowed by the syntax shall not
+ be used unencoded in a URI.
+
+ REDUCED OR INCREASED SAFE CHARACTER SETS
+
+ The same encoding method may be used for encoding characters whose
+ use, although technically allowed in a URI, would be unwise due to
+ problems of corruption by imperfect gateways or misrepresentation
+ due to the use of variant character sets, or which would simply be
+ awkward in a given environment. Because a % sign always indicates
+ an encoded character, a URI may be made "safer" simply by encoding
+ any characters considered unsafe, while leaving already encoded
+ characters still encoded. Similarly, in cases where a larger set
+ of characters is acceptable, % signs can be selectively and
+ reversibly expanded.
+
+ Before two URIs can be compared, it is therefore necessary to
+ bring them to the same encoding level.
+
+ However, the reserved characters mentioned above have a quite
+ different significance when encoded, and so may NEVER be encoded
+ and unencoded in this way.
+
+ The percent sign intended as such must always be encoded, as its
+ presence otherwise always indicates an encoding. Sequences which
+ start with a percent sign but are not followed by two hexadecimal
+ characters are reserved for future extension. (See Example 3.)
+
+ Example 1
+
+ The URIs
+
+ http://info.cern.ch/albert/bertram/marie-claude
+
+ and
+
+ http://info.cern.ch/albert/bertram/marie%2Dclaude
+
+ are identical, as the %2D encodes a hyphen character.
+
+ Example 2
+
+ The URIs
+
+ http://info.cern.ch/albert/bertram/marie-claude
+
+ and
+
+ http://info.cern.ch/albert/bertram%2Fmarie-claude
+
+ are NOT identical, as in the second case the encoded slash does not
+ have hierarchical significance.
+
+ Example 3
+
+ The URIs
+
+ fxqn:/us/va/reston/cnri/ietf/24/asdf%*.fred
+
+ and
+
+ news:12345667123%asdghfh@info.cern.ch
+
+ are illegal, as all % characters imply encodings, and there is no
+ decoding defined for "%*" or "%as" in this recommendation.
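
Editorial sketch (not part of the memo): the three examples above can be checked mechanically. The Python below is a minimal illustration under stated assumptions: the function names are invented here, lowercase hex digits are accepted although the memo lists only 0-9 and A-F, and the KEEP_ENCODED set is one reading of the reserved characters named in this memo rather than a definitive list.

```python
import re

HEX_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")
BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")

# Assumed reading of the reserved characters: their escapes are never expanded.
KEEP_ENCODED = set("%/#?+*!.")


def has_valid_escapes(uri):
    """True if every '%' is followed by exactly two hexadecimal digits."""
    return BAD_ESCAPE.search(uri) is None


def same_encoding_level(uri):
    """Expand escapes of printable, non-reserved ASCII; keep the rest encoded."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in KEEP_ENCODED or not ("!" <= char <= "~"):
            return "%" + match.group(1).upper()
        return char
    return HEX_ESCAPE.sub(replace, uri)


def equivalent(a, b):
    """Compare two URIs after bringing both to the same encoding level."""
    return same_encoding_level(a) == same_encoding_level(b)
```

Under these assumptions the pair in Example 1 compares equal, the pair in Example 2 does not (the %2F stays encoded), and both URIs in Example 3 fail has_valid_escapes.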
+
+Partial (relative) form
+
+ Within an object whose URI is well defined, the URI of another object
+ may be given in abbreviated form, where parts of the two URIs are the
+ same. This allows objects within a group to refer to each other
+ without requiring the space for a complete reference, and it
+ incidentally allows the group of objects to be moved without changing
+ any references. It must be emphasized that when a reference is
+ passed in anything other than a well controlled context, the full
+ form must always be used.
+
+ In the World-Wide Web applications, the context URI is that of the
+ document or object containing a reference. In this case partial URIs
+ can be generated in virtual objects or stored in real objects,
+ without the need for dramatic change if the higher-order parts of a
+ hierarchical naming system are modified. Apart from terseness, this
+ gives greater robustness to practical systems, by enabling
+ information hiding between system components.
+
+ The partial form relies on a property of the URI syntax that certain
+ characters ("/") and certain path elements ("..", ".") have a
+ significance reserved for representing a hierarchical space, and must
+ be recognized as such by both clients and servers.
+
+ A partial form can be distinguished from an absolute form in that the
+ latter must have a colon and that colon must occur before any slash
+ characters. Systems not requiring partial forms should not use any
+ unencoded slashes in their naming schemes. If they do, absolute URIs
+ will still work, but confusion may result. (See note on Gopher
+ below.)
+
+ The rules for the use of a partial name relative to the URI of the
+ context are:
+
+ If the scheme parts are different, the whole absolute URI must
+ be given. Otherwise, the scheme is omitted, and:
+
+ If the partial URI starts with a non-zero number of consecutive
+ slashes, then everything from the context URI up to (but not
+ including) the first occurrence of exactly the same number of
+ consecutive slashes which has no greater number of consecutive
+ slashes anywhere to the right of it is taken to be the same and
+ so prepended to the partial URL to form the full URL. Otherwise:
+
+ The last part of the path of the context URI (anything following
+ the rightmost slash) is removed, and the given partial URI
+ appended in its place, and then:
+
+ Within the result, all occurrences of "xxx/../" or "/." are
+ recursively removed, where xxx, ".." and "." are complete path
+ elements.
+
+ Note: Trailing slashes
+
+ If a path of the context locator ends in slash, partial URIs are
+ treated differently to the URI with the same path but without a
+ trailing slash. The trailing slash indicates a void segment of the
+ path.
+
+ Note: Gopher
+
+ The gopher system does not have the concept of relative URIs, and the
+ gopher community currently allows / as data characters in gopher URIs
+ without escaping them to %2F. Relative forms may not in general be
+ used for documents served by gopher servers. If they are used, then
+ WWW software assumes, normally correctly, that in fact they do have
+ hierarchical significance despite the specifications. The use of HTTP
+ rather than gopher protocol is however recommended.
+
+ Examples
+
+ In the context of URI
+
+ magic://a/b/c//d/e/f
+
+ the partial URIs would expand as follows:
+
+ g        magic://a/b/c//d/e/g
+
+ /g       magic://a/g
+
+ //g      magic://g
+
+ ../g     magic://a/b/c//d/g
+
+ g:h      g:h
+
+ and in the context of the URI
+
+ magic://a/b/c//d/e/
+
+ the results would be exactly the same.
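
Editorial sketch (not part of the memo): the Python below illustrates only part of the partial-form rules, namely the test that separates absolute from partial forms, the replacement of the last path segment, and the removal of "xxx/../" and "/." elements. Partial URIs that begin with one or more slashes are deliberately left out, and the function names are invented for this illustration. Treating an empty segment as not being a removable "xxx" element is an assumption.

```python
def is_absolute(reference):
    """An absolute form has a colon that occurs before any slash."""
    colon = reference.find(":")
    slash = reference.find("/")
    return colon != -1 and (slash == -1 or colon < slash)


def resolve_simple(context, partial):
    """Resolve a partial URI that does not start with a slash."""
    if is_absolute(partial):
        return partial
    if partial.startswith("/"):
        raise ValueError("leading-slash forms are not covered by this sketch")
    scheme, _, rest = context.partition(":")
    # Drop anything following the rightmost slash, then append the partial URI.
    joined = rest[: rest.rfind("/") + 1] + partial
    kept = []
    for segment in joined.split("/"):
        if segment == ".":
            continue  # a "/." element is removed
        if segment == ".." and kept and kept[-1] not in ("", ".."):
            kept.pop()  # an "xxx/../" pair is removed
            continue
        kept.append(segment)
    return scheme + ":" + "/".join(kept)
```

For the context magic://a/b/c//d/e/f this reproduces the "g", "../g" and "g:h" rows of the table above, and it gives the same results for the context ending in a trailing slash.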
+
+Fragment-id
+
+ This represents a part of, fragment of, or a sub-function within, an
+ object. Its syntax and semantics are defined by the application
+ responsible for the object, or the specification of the content type
+ of the object. The only definition here is of the allowed characters
+ by which it may be represented in a URL.
+
+ Specific syntaxes for representing fragments in text documents by
+ line and character range, or in graphics by coordinates, or in
+ structured documents using ladders, are suitable for standardization
+ but not defined here.
+
+ The fragment-id follows the URL of the whole object from which it is
+ separated by a hash sign (#). If the fragment-id is void, the hash
+ sign may be omitted: A void fragment-id with or without the hash sign
+ means that the URL refers to the whole object.
+
+ While this hook is allowed for identification of fragments, the
+ question of addressing of parts of objects, or of the grouping of
+ objects and relationship between contained and containing objects, is
+ not addressed by this document.
+
+ Fragment identifiers do NOT address the question of objects which are
+ different versions of a "living" object, nor of expressing the
+ relationships between different versions and the living object.
+
+ There is no implication that a fragment identifier refers to anything
+ which can be extracted as an object in its own right. It may, for
+ example, refer to an indivisible point within an object.
+
+Specific Schemes
+
+ The mapping for URIs onto some existing standard and experimental
+ protocols is outlined in the BNF syntax definition. Notes on
+ particular protocols follow. These URIs are frequently referred to
+ as URLs, though the exact definition of the term URL is still under
+ discussion (March 1993). The schemes covered are:
+
+ http Hypertext Transfer Protocol (examples)
+
+ ftp File Transfer Protocol
+
+ gopher Gopher protocol
+
+ mailto Electronic mail address
+
+ news Usenet news
+
+ telnet, rlogin and tn3270
+ Reference to interactive sessions
+
+ wais Wide Area Information Servers
+
+ file Local file access
+
+ The following schemes are proposed as essential to the unification of
+ the web with electronic mail, but not currently (to the author's
+ knowledge) implemented:
+
+ mid Message identifiers for electronic mail
+
+ cid Content identifiers for MIME body part
+
+ The schemes for X.500, network management database, and Whois++ have
+ not been specified and may be the subject of further study. Schemes
+ for Prospero, and restricted NNTP use are not currently implemented
+ as far as the author is aware.
+
+ The "urn" prefix is reserved for use in encoding a Uniform Resource
+ Name when that has been developed by the IETF working group.
+
+ New schemes may be registered at a later time.
+
+HTTP
+
+ The HTTP protocol specifies that the path is handled transparently by
+ those who handle URLs, except for the servers which de-reference
+ them. The path is passed by the client to the server with any
+ request, but is not otherwise understood by the client.
+
+ The host details are not passed on to the server when the URL is an
+ HTTP URL which refers to the server in question. In this case the
+ string sent starts with the slash which follows the host details.
+ However, when an HTTP server is being used as a gateway (or "proxy")
+ then the entire URI, whether HTTP or some other scheme, is passed on
+ the HTTP command line. The search part, if present, is sent as part
+ of the HTTP command, and may in this respect be treated as part of
+ the path. No fragment-id part of a WWW URI (the hash sign and
+ following) is sent with the request. Spaces and control characters
+ in URLs must be escaped for transmission in HTTP, as must other
+ disallowed characters.
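
Editorial sketch (not part of the memo): the string a client places on the HTTP command line, as described above, might be derived as follows. The function name and the via_proxy flag are assumptions made for this illustration, as is the choice to send "/" when the URL has no path at all.

```python
def request_target(url, via_proxy=False):
    """Return the string a client would send for the given URL."""
    # The fragment-id (hash sign and following) is never sent.
    without_fragment = url.split("#", 1)[0]
    if via_proxy:
        # A gateway or proxy receives the entire URI, whatever its scheme.
        return without_fragment
    # Otherwise the string starts with the slash that follows the host details.
    _, _, rest = without_fragment.partition("://")
    _, _, path = rest.partition("/")
    return "/" + path
```

The search part stays attached to the path, so a URL ending in "?dobbins" is sent with its query intact.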
+
+ EXAMPLES
+
+ These examples are not part of the specification: they are
+ provided as illustrations only. The URI of the "welcome" page to a
+ server is conventionally
+
+ http://www.my.work.com/
+
+ As the rest of the URL (after the hostname and port) is opaque
+ to the client, it shows great variety but the following are all
+ fairly typical.
+
+http://www.my.uni.edu/info/matriculation/enroling.html
+
+http://info.my.org/AboutUs/Phonebook
+
+http://www.library.my.town.va.us/Catalogue/76523471236%2Fwen44--4.98
+
+http://www.my.org/462F4F2D4241522A314159265358979323846
+
+ A URL for a server on a different port to 80 looks like
+
+ http://info.cern.ch:8000/imaginary/test
+
+ A reference to a particular part of a document may, including the
+ fragment identifier, look like
+
+ http://www.myu.edu/org/admin/people#andy
+
+ in which case the string "#andy" is not sent to the server, but is
+ retained by the client and used when the whole object has been
+ retrieved.
+
+ A search on a text database might look like
+
+ http://info.my.org/AboutUs/Index/Phonebook?dobbins
+
+ and on another database
+
+ http://info.cern.ch/RDB/EMP?*%20where%20name%3Ddobbins
+
+ In all cases the client passes the path string to the server
+ uninterpreted, and for the client to deduce anything from
+
+FTP
+
+ The ftp: prefix indicates that the FTP protocol is used, as defined
+ in STD 9, RFC 959 or any successor. The port number, if present,
+ gives the port of the FTP server if not the FTP default.
+
+ User name and password
+
+ The syntax allows for the inclusion of a user name and even a
+ password for those systems which do not use the anonymous FTP
+ convention. The default, however, if no user or password is
+ supplied, will be to use that convention, viz. that the user name
+ is "anonymous" and the password the user's Internet-style mail
+ address.
+
+ Where possible, this mail address should correspond to a usable
+ mail address for the user, and preferably give a DNS host name
+ which resolves to the IP address of the client. Note that servers
+ currently vary in their treatment of the anonymous password.
+
+ Path
+
+ The FTP protocol allows for a sequence of CWD commands (change
+ working directory) and a TYPE command prior to service commands
+ such as RETR (retrieve) or NLST (name list) which actually access a
+ file.
+
+ The arguments of any CWD commands are successive segment parts of
+ the URL delimited by slash, and the final segment is suitable as
+ the filename argument to the RETR command for retrieval or the
+ directory argument to NLST.
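
Editorial sketch (not part of the memo): the mapping from URL path segments to FTP commands can be illustrated as below. The function name is invented, NLST is the name-list command of RFC 959, and treating a trailing slash as a request for a directory listing is an assumption drawn from the "/" suffix described under "Data type". No TYPE command is generated.

```python
import re


def unescape(text):
    """Replace each %XX escape with the character it encodes."""
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)


def ftp_command_sequence(url_path):
    """Turn the path part of an FTP URL into a list of FTP commands."""
    wants_listing = url_path.endswith("/")
    # Split on unencoded slashes first, then decode each segment separately.
    segments = [unescape(s) for s in url_path.split("/") if s]
    if not segments:
        return ["NLST"]
    *directories, last = segments
    commands = ["CWD " + d for d in directories]
    commands.append(("NLST " if wants_listing else "RETR ") + last)
    return commands
```

Splitting before decoding means that an encoded slash (%2F) ends up inside a single command argument and does not introduce an extra CWD step.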
+
+ For some file systems (Unix in particular), the "/" used to denote
+ the hierarchical structure of the URL corresponds to the delimiter
+ used to construct a file name hierarchy, and thus, the filename
+ will look the same as the URL path. This does NOT mean that the
+ URL is a Unix filename.
+
+ Note: Retrieving subsequent URLs from the same host
+
+ There is no common hierarchical model to the FTP protocol, so if a
+ directory change command has been given, it is impossible in
+ general to deduce what sequence should be given to navigate to
+ another directory for a second retrieval, if the paths are
+ different. The only reliable algorithm is to disconnect and
+ reestablish the control connection.
+
+ Data type
+
+ The data content type of a file can only, in the general FTP case,
+ be deduced from the name, normally the suffix of the name. This
+ is not standardized. An alternative is for it to be transferred in
+ information outside the URL. A suitable FTP transfer type (for
+ example binary "I" or text "A") must in turn be deduced from the
+ data content type. It is recommended that conventions for
+ suffixes of public archives be established, but it is outside the
+ scope of this standard.
+
+ An FTP URL may optionally specify the FTP data transfer type by
+ which an object is to be retrieved. Most of the methods correspond
+ to the FTP "Data Types" ASCII and IMAGE for the retrieval of a
+ document, as specified in FTP by the TYPE command. One method
+ indicates directory access.
+
+ The data type is specified by a suffix to the URL. Possible
+ suffixes are:
+
+ ;type = <type-code> Use FTP type as given to perform data
+ transfer.
+
+ / Use FTP directory list commands to read
+ directory
+
+ The type code is in the format defined in RFC 959 except that THE
+ SPACE IS OMITTED FROM THE URL.
+
+ Transfer Mode
+
+ Stream Mode is always used.
+
+Gopher
+
+ The gopher URL specifies the host and optionally the port to which
+ the client should connect. This is followed by a slash and a single
+ gopher type code. This type code is used by the client to determine
+ how to interpret the server's reply and is not for sending to the
+ server. The command string to be sent to the server immediately
+ follows the gopher type character. It consists of the gopher
+ selector string followed by any "Gopher plus" syntax, but always
+ omitting the trailing CR LF pair.
+
+ When the gopher command string contains characters (such as embedded
+ CR LF and HT characters) not allowed in a URL, these are encoded
+ using the conventional encoding.
+
+ Note that some gopher selector strings begin with a copy of the
+ gopher type character, in which case that character will occur twice
+ consecutively. Also note that the gopher selector string may be an
+ empty string since this is how gopher clients refer to the top-level
+ directory on a gopher server.
+
+ If the encoded command string (with trailing CR LF stripped) would be
+ void then the gopher type character may be omitted and "1" (ASCII 31
+ hex) is assumed.
+
+ Note that slash "/" in gopher selector strings may not correspond to
+ a level in a hierarchical structure.
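
Editorial sketch (not part of the memo): a gopher URL might be taken apart as shown below. The function name and the example host are hypothetical, and the default port of 70 is the conventional gopher port rather than something stated in this section.

```python
import re


def unescape(text):
    """Replace each %XX escape with the character it encodes."""
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)


def parse_gopher(url):
    """Return (host, port, type code, command string) for a gopher URL."""
    prefix = "gopher://"
    if not url.startswith(prefix):
        raise ValueError("not a gopher URL")
    hostport, _, tail = url[len(prefix):].partition("/")
    host, _, port = hostport.partition(":")
    if tail == "":
        # Void command string: the type character is omitted and "1" assumed.
        gopher_type, command = "1", ""
    else:
        # The type code guides the client only; it is not sent to the server.
        gopher_type, command = tail[0], unescape(tail[1:])
    return host, int(port) if port else 70, gopher_type, command
```

The command string is everything after the single type character, decoded as a whole, so slashes inside it are treated as plain data and not as hierarchy.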
+
+Mailto
+
+ This allows a URL to specify an RFC822 addr-spec mail address. Note
+ that use of % , for example as used in forming a gatewayed mail
+ address, requires conversion to %25 in a URL.
+
+News
+
+ The news locators refer to either news group names or article message
+ identifiers which must conform to the rules for a Message-Id of RFC
+ 1036 (Horton 1987). A message identifier may be distinguished from a
+ news group name by the presence of the commercial at "@" character.
+ These rules imply that within an article, a reference to a news group
+ or to another article will be a valid URL (in the partial form).
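+
+ A minimal sketch of the "@" test described above, assuming the full
+ (non-partial) form of the URL. The function name and the example
+ values are hypothetical.
+
```python
def classify_news_url(url):
    """Classify a news URL as all groups, one group, or one article."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "news":
        raise ValueError("not a news URL")
    if rest == "*":
        return ("all", rest)
    # A message identifier is recognised by the commercial at sign.
    if "@" in rest:
        return ("article", rest)
    return ("group", rest)


print(classify_news_url("news:comp.infosystems.www"))
print(classify_news_url("news:1234.abc@news.example.org"))
```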
+
+ A news URL may be dereferenced using NNTP (RFC 977, Kantor 1986),
+ with the ARTICLE by message-id command, or using any other protocol
+ for the conveyance of usenet news articles, or by reference to a body
+ of news articles already received.
+
+ Note 1:
+
+ Among URLs the "news" URLs are anomalous in that they are
+ location-independent. They are unsuitable as URN candidates
+ because the NNTP architecture relies on the expiry of articles and
+ therefore a small number of articles being available at any time.
+ When a news: URL is quoted, the assumption is that the reader will
+ fetch the article or group from his or her local news host. News
+ host names are NOT part of news URLs.
+
+ Note 2:
+
+ An outstanding problem is that the message identifier is
+ insufficient to allow the retrieval of an expired article, as no
+ algorithm exists for deriving an archive site and file name. The
+ addition of the date and news group set to the article's URL would
+ allow this if a directory existed of archive sites by news group.
+
+ This is a suggested subject of study in conjunction with the NNTP
+ working group. A possible further extension is to allow the naming
+ of subject threads as addressable objects.
+
+Telnet, rlogin, tn3270
+
+ The use of URLs to represent interactive sessions is a convenient
+ extension to their uses for objects. This allows access to
+ information systems which only provide an interactive service, and no
+ information server. As information within the service cannot be
+ addressed individually or, in general, automatically retrieved, this
+ is a less desirable, though currently common, solution.
+
+URN
+
+ The "Universal Resource Name" is currently (March 1993) under
+ development in the IETF. A requirements specification is in
+ preparation. It currently looks as though it will be a short string
+ suitable for encoding in URI syntax, for which case the "urn:" prefix
+ is reserved. The URN shall be encoded precisely as defined in the
+ (future) URN standard, except in that:
+
+ If the official description of the URN syntax includes any
+ constant wrapper characters, then they shall not be omitted from
+ the URI encoding of the URN;
+
+ If the URN has a hierarchical nature, then the slash delimiter
+ shall be used in the URI encoding;
+
+ If the URN has a hierarchical nature, the most significant part
+ shall be encoded on the left in the URI encoding;
+
+ Any characters with reserved meanings in the URI syntax shall be
+ escape encoded
+
+ These rules of course apply to any URI scheme. It is of course
+ possible that the URN syntax will be chosen such that the URI
+ encoding will be a 1-1 transcription.
+
+ An example might be a name such as
+
+ urn:/iana/dns/ch/cern/cn/techdoc/94/1642-3
+
+ but the reader should refer to the latest URN drafts or
+ specifications.
+
+WAIS
+
+ The current public domain WAIS implementation requires that a client
+ know the "type" of an object prior to retrieval. This value is
+ returned along with the internal object identifier in the search
+ response. It has been encoded into the path part of the URL in order
+ to make the URL sufficient for the retrieval of the object.
+
+ Within the WAIS world, names do not of course need to be prefixed by
+ "wais:" (by the partial form rules).
+
+ The wpath of a WAIS URL consists of encoded fields of the WAIS
+ identifier, in the same order as in the WAIS identifier. For each
+ field, the identifier field number is the digits before the equals
+ sign, and the field contents follow, encoded in the conventional
+ encoding, terminated by ";".
+
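+ As a sketch of how a wpath might be taken apart (an illustration
+ under the assumption that ";" and "=" inside field contents are
+ always escaped), the hypothetical helper below returns the field
+ numbers and decoded contents in order. The sample value is made up.
+
```python
from urllib.parse import unquote


def parse_wpath(wpath):
    """Split a WAIS wpath into (field number, decoded contents) pairs."""
    fields = []
    for chunk in wpath.split(";"):
        if not chunk:
            continue  # the terminating ";" leaves an empty tail
        number, sep, content = chunk.partition("=")
        if not sep or not number.isdigit():
            raise ValueError(f"malformed wpath field: {chunk!r}")
        # Contents use the conventional (percent) encoding.
        fields.append((int(number), unquote(content)))
    return fields


print(parse_wpath("1=doc%3Bone;7=a%3Db;"))
```
+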
+file
+
+ The other URI schemes (except nntp) share the property that they are
+ equally valid at any geographical place.
+
+ There is however a real practical requirement to be able to generate
+ a URL for an object in a machine's local file system.
+
+ The syntax is similar to the ftp syntax, but in this case the slash
+ is used to denote boundaries between directory levels of a
+ hierarchical file system. The "client" software converts the
+ file URL into a file name in the local file name conventions. This
+ allows local files to be treated just as network objects without any
+ necessity to use a network server for access. This may be used for
+ example for defining a user's "home" document in WWW.
+
+ There is clearly a danger of confusion if a link made to a local
+ file is followed by someone on a different system, with
+ unexpected and possibly harmful results. Therefore, the convention
+ is that even a "file" URL is provided with a host part. This allows
+ a client on another system to know that it cannot access the file
+ system, or perhaps to use some other local mechanism to access the
+ file.
+
+ The special value "localhost" is used in the host field to indicate
+ that the filename should really be used on whatever host one is on.
+ This for example allows links to be made to files which are
+ distributed on many machines, or to "your unix local password file"
+ subject of course to consistency across the users of the data.
+
+ A void host field is equivalent to "localhost".
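+
+ The host rules above can be sketched as follows. This is only an
+ illustration: the helper name and the this_host parameter are
+ hypothetical, and the conversion to local file name conventions is
+ assumed to be the identity, as on a POSIX-style system.
+
```python
from urllib.parse import unquote, urlsplit


def local_path(url, this_host="localhost"):
    """Return a local path for a file URL, or None if it names another host."""
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file URL")
    # A void host field is equivalent to "localhost".
    host = (parts.hostname or "localhost").lower()
    if host not in ("localhost", this_host.lower()):
        return None  # the file lives on some other system
    return unquote(parts.path)


print(local_path("file://localhost/etc/motd"))
print(local_path("file:///etc/motd"))
print(local_path("file://other.example.org/etc/motd"))
```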
+
+Message-Id
+
+ For systems which include information transferred using mail
+ protocols, there is a need to be able to make cross-references
+ between different items of information, even though, by the nature of
+ mail, those items are only available to a restricted set of people.
+
+ Two schemes are defined. The first, "mid:", refers to the STD 11,
+ RFC 822 Message-Id of a mail message. This identifier is already
+ used in RFC 822 in, for example, the References and In-Reply-To
+ fields.
+
+ The rest of the URL after the "mid:" is the RFC822 msg-id with the
+ constant <> wrapper removed, leaving an identifier whose format in
+ fact happens to be the same as addr-spec format for mailboxes (though
+ the semantics are different).
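+
+ A minimal sketch of the wrapper removal, assuming the identifier
+ contains only characters that are already allowed in a URL (no
+ escaping is attempted). The helper name and sample value are
+ hypothetical.
+
```python
def mid_url(message_id):
    """Form a mid: URL from an RFC 822 msg-id such as "<id@host>"."""
    value = message_id.strip()
    if not (value.startswith("<") and value.endswith(">")):
        raise ValueError("msg-id must be wrapped in <>")
    # Drop the constant <> wrapper; what remains looks like an addr-spec.
    return "mid:" + value[1:-1]


print(mid_url("<9406011234.AA01@mail.example.org>"))
```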
+
+ The use of a "mid" URL implies access to a body of mail already
+ received. If a message has been distributed using NNTP or other
+ usenet protocols over the news system, then the "news:" form should
+ be used.
+
+Content-Id
+
+ The second scheme, "cid:", is similar to "mid:", but makes reference
+ to a body part of a MIME message by the value of its content-id
+ field. This allows, for example, a master document being the first
+ part of a multipart/related MIME message to refer to component parts
+ which are transferred in the same message.
+
+ Note
+
+ Beware, however, that content identifiers are only required to be
+ unique within the context of a given MIME message, and so the cid:
+ URL is only meaningful within the context of the same MIME message.
+ For a reference outside the message, it would need to be appended
+ to the message-id of the whole message. A syntax for this has not
+ been defined.
+
+Schemes for Further Study
+
+ X500
+
+ The mapping of x500 names onto URLs is not defined here. A
+ decision is required as to whether "distinguished names" or "user
+ friendly names" (ufn), or both, should be allowed. If any
+ punctuation conversions are needed from the adopted x500
+ representation (such as the use of slashes between parts of a ufn)
+ they must be defined. This is a subject for study.
+
+ WHOIS
+
+ This prefix describes the access using the "whois++" scheme in the
+ process of definition. The host name part is the same as for
+ other IP based schemes. The path part can be either a whois
+ handle for a whois object, or it can be a valid whois query
+ string. This is a subject for further study.
+
+ NETWORK MANAGEMENT DATABASE
+
+ This is a subject for study.
+
+ NNTP
+
+ This is an alternative form of reference for news articles,
+ specifically to be used with NNTP servers, and particularly those
+ incomplete server implementations which do not allow retrieval by
+ message identifier. In all other cases the "news" scheme should
+ be used.
+
+ The news server name, newsgroup name, and index number of an
+ article within the newsgroup on that particular server are given.
+ The NNTP protocol must be used.
+
+ Note 1.
+
+ This form of URL is not of global accessibility, as typically
+ NNTP servers only allow access from local clients. Note that
+ the article numbers within groups vary from server to server.
+
+ This form of URL should not be quoted outside this local area.
+ It should not be used within news articles for wider
+ circulation than the one server. This is a local identifier
+ for a resource which is often available globally, and so is not
+ recommended except in the case in which incomplete NNTP
+ implementations on the local server force its adoption.
+
+Prospero
+
+ The Prospero (Neuman, 1991) directory service is used to resolve the
+ URL yielding an access method for the object (which can then itself
+ be represented as a URL if translated). The host part contains a
+ host name or internet address. The port part is optional.
+
+ The path part contains a host specific object name and an optional
+ version number. If present, the version number is separated from the
+ host specific object name by the characters "%00" (percent zero
+ zero), this being an escaped string terminator (null). External
+ Prospero links are represented as URLs of the underlying access
+ method and are not represented as Prospero URLs.
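+
+ As an illustration of the "%00" separator, the hypothetical helper
+ below splits a path part into the host specific object name and the
+ version number. It assumes that no attributes follow the version.
+
```python
def split_prospero_path(path):
    """Split a Prospero path part into (object name, version or None)."""
    # "%00" is an escaped string terminator (null) used as the separator.
    name, sep, version = path.partition("%00")
    if not sep:
        return name, None  # the version number is optional
    if not version.isdigit():
        raise ValueError("version must consist of digits")
    return name, int(version)


print(split_prospero_path("docs/report%003"))
print(split_prospero_path("docs/report"))
```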
+
+Registration of naming schemes
+
+ A new naming scheme may be introduced by defining a mapping onto a
+ conforming URL syntax, using a new prefix. Experimental prefixes may
+ be used by mutual agreement between parties, and must start with the
+ characters "x-". The scheme name "urn:" is reserved for the work in
+ progress on a scheme for more persistent names.
+
+ It is proposed that the Internet Assigned Numbers Authority (IANA)
+ perform the function of registration of new schemes. Any submission
+ of a new URI scheme must include a definition of an algorithm for the
+ retrieval of any object within that scheme. The algorithm must take
+ the URI and produce either a set of URL(s) which will lead to the
+ desired object, or the object itself, in a well-defined or
+ determinable format.
+
+ It is recommended that those proposing a new scheme demonstrate its
+ utility and operability by the provision of a gateway which will
+ provide images of objects in the new scheme for clients using an
+ existing protocol. If the new scheme is not a locator scheme, then
+ the properties of names in the new space should be clearly defined.
+ It is likewise recommended that, where a protocol allows for
+ retrieval by URL, the client software have provision for being
+ configured to use specific gateway locators for indirect access
+ through new naming schemes.
+
+BNF of Generic URI Syntax
+
+ This is a BNF-like description of the URI syntax, at the level at
+ which specific schemes are not considered.
+
+ A vertical line "|" indicates alternatives, and [brackets] indicate
+ optional parts. Spaces are represented by the word "space", and the
+ vertical line character by "vline". Single letters stand for single
+ letters. All words of more than one letter below are entities
+ described somewhere in this description.
+
+ The "generic" production gives a higher level parsing of the same
+ URIs as the other productions. The "national" and "punctuation"
+ characters do not appear in any productions and therefore may not
+ appear in URIs.
+
+ fragmentaddress uri [ # fragmentid ]
+
+ uri scheme : path [ ? search ]
+
+ scheme ialpha
+
+ path void | xpalphas [ / path ]
+
+ search xalphas [ + search ]
+
+ fragmentid xalphas
+
+ xalpha alpha | digit | safe | extra | escape
+
+ xalphas xalpha [ xalphas ]
+
+ xpalpha xalpha | +
+
+ xpalphas xpalpha [ xpalphas ]
+
+ ialpha alpha [ xalphas ]
+
+ alpha a | b | c | d | e | f | g | h | i | j | k |
+ l | m | n | o | p | q | r | s | t | u | v |
+ w | x | y | z | A | B | C | D | E | F | G |
+ H | I | J | K | L | M | N | O | P | Q | R |
+ S | T | U | V | W | X | Y | Z
+
+ digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
+ safe $ | - | _ | @ | . | &
+
+ extra ! | * | " | ' | ( | ) | ,
+
+ reserved = | ; | / | # | ? | : | space
+
+ escape % hex hex
+
+ hex digit | a | b | c | d | e | f | A | B | C |
+ D | E | F
+
+ national { | } | vline | [ | ] | \ | ^ | ~
+
+ punctuation < | >
+
+ void
+
+ (end of URI BNF)
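+
+ The top-level productions (fragmentaddress, uri, search) can be
+ sketched as a coarse split. This is an illustration, not a
+ validator: the regular expression is an assumption that is looser
+ than the grammar and does not check the xalpha character classes.
+ The example URI and the "x-demo" scheme are hypothetical.
+
```python
import re

# scheme : path [ ? search ] [ # fragmentid ]
GENERIC = re.compile(
    r"^(?P<scheme>[A-Za-z][^:/?#]*):"
    r"(?P<path>[^?#]*)"
    r"(?:\?(?P<search>[^#]*))?"
    r"(?:#(?P<fragmentid>.*))?$"
)


def split_generic(uri):
    """Split a URI into the parts named by the generic productions."""
    match = GENERIC.match(uri)
    if match is None:
        raise ValueError("not a generic URI")
    search = match.group("search")
    return {
        "scheme": match.group("scheme"),
        "path": match.group("path"),
        # search is "xalphas [ + search ]": terms separated by "+".
        "search": search.split("+") if search is not None else [],
        "fragmentid": match.group("fragmentid"),
    }


print(split_generic("x-demo://host/a/b?one+two#frag"))
```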
+
+BNF for specific URL schemes
+
+ This is a BNF-like description of the Uniform Resource Locator
+ syntax. A vertical line "|" indicates alternatives, and [brackets]
+ indicate optional parts. Spaces are represented by the word "space",
+ and the vertical line character by "vline". Single letters stand for
+ single letters. All words of more than one letter below are entities
+ described somewhere in this description.
+
+ The current IETF URI Working Group preference is for the prefixedurl
+ production. (Nov 1993. July 93: url).
+
+ The "national" and "punctuation" characters do not appear in any
+ productions and therefore may not appear in URLs.
+
+ The "afsaddress" is left in as a historical note, but is not a url
+ production.
+
+ prefixedurl u r l : url
+
+ url httpaddress | ftpaddress | newsaddress |
+ nntpaddress | prosperoaddress | telnetaddress
+ | gopheraddress | waisaddress |
+ mailtoaddress | midaddress | cidaddress
+
+ scheme ialpha
+
+ httpaddress h t t p : / / hostport [ / path ] [ ?
+ search ]
+
+ ftpaddress f t p : / / login / path [ ftptype ]
+
+ afsaddress a f s : / / cellname / path
+
+ newsaddress n e w s : groupart
+
+ nntpaddress n n t p : group / digits
+
+ midaddress m i d : addr-spec
+
+ cidaddress c i d : content-identifier
+
+ mailtoaddress m a i l t o : xalphas @ hostname
+
+ waisaddress waisindex | waisdoc
+
+ waisindex w a i s : / / hostport / database [ ? search
+ ]
+
+ waisdoc w a i s : / / hostport / database / wtype /
+ wpath
+
+ wpath digits = path ; [ wpath ]
+
+ groupart * | group | article
+
+ group ialpha [ . group ]
+
+ article xalphas @ host
+
+ database xalphas
+
+ wtype xalphas
+
+ prosperoaddress prosperolink
+
+ prosperolink p r o s p e r o : / / hostport / hsoname [ %
+ 0 0 version [ attributes ] ]
+
+ hsoname path
+
+ version digits
+
+ attributes attribute [ attributes ]
+
+ attribute alphanums
+
+ telnetaddress t e l n e t : / / login
+
+ gopheraddress g o p h e r : / / hostport [ / gtype [
+ gcommand ] ]
+
+ login [ user [ : password ] @ ] hostport
+
+ hostport host [ : port ]
+
+ host hostname | hostnumber
+
+ ftptype A formcode | E formcode | I | L digits
+
+ formcode N | T | C
+
+ cellname hostname
+
+ hostname ialpha [ . hostname ]
+
+ hostnumber digits . digits . digits . digits
+
+ port digits
+
+ gcommand path
+
+ path void | segment [ / path ]
+
+ segment xpalphas
+
+ search xalphas [ + search ]
+
+ user alphanum2 [ user ]
+
+ password alphanum2 [ password ]
+
+ fragmentid xalphas
+
+ gtype xalpha
+
+ alphanum2 alpha | digit | - | _ | . | +
+
+ xalpha alpha | digit | safe | extra | escape
+
+ xalphas xalpha [ xalphas ]
+
+ xpalpha xalpha | +
+
+ xpalphas xpalpha [ xpalphas ]
+
+ ialpha alpha [ xalphas ]
+
+ alpha a | b | c | d | e | f | g | h | i | j | k |
+ l | m | n | o | p | q | r | s | t | u | v |
+ w | x | y | z | A | B | C | D | E | F | G |
+ H | I | J | K | L | M | N | O | P | Q | R |
+ S | T | U | V | W | X | Y | Z
+
+ digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
+
+ safe $ | - | _ | @ | . | & | +
+
+ extra ! | * | " | ' | ( | ) | ,
+
+ reserved = | ; | / | # | ? | : | space
+
+ escape % hex hex
+
+ hex digit | a | b | c | d | e | f | A | B | C |
+ D | E | F
+
+ national { | } | vline | [ | ] | \ | ^ | ~
+
+ punctuation < | >
+
+ digits digit [ digits ]
+
+ alphanum alpha | digit
+
+ alphanums alphanum [ alphanums ]
+
+ void
+
+ (end of URL BNF)
+
+References
+
+ Alberti, R., et al., "Notes on the Internet Gopher Protocol",
+ University of Minnesota, December 1991,
+ <ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol>. See also
+ <gopher://gopher.micro.umn.edu/00/Information About Gopher/About
+ Gopher>
+
+ Berners-Lee, T., "Hypertext Transfer Protocol (HTTP)", CERN, December
+ 1991, as updated from time to time,
+ <ftp://info.cern.ch/pub/www/doc/http-spec.txt>
+
+ Crocker, D., "Standard for ARPA Internet Text Messages" STD 11, RFC
+ 822, UDel, August 1982.
+
+ Davis, F., et al., "WAIS Interface Protocol: Prototype Functional
+ Specification", Thinking Machines Corporation, April 23, 1990.
+ <ftp://quake.think.com/pub/wais/doc/protspec.txt>
+
+ International Standards Organization, Information and Documentation -
+ Search and Retrieve Application Protocol Specification for open
+ Systems Interconnection, ISO-10163.
+
+ Horton, M., and R. Adams, "Standard for Interchange of USENET
+ messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic
+ Studies, December 1987.
+
+ Huitema, C., "Naming: strategies and techniques", Computer Networks
+ and ISDN Systems 23 (1991) 107-110.
+
+ Kahle, B., "Document Identifiers, or International Standard Book
+ Numbers for the Electronic Age",
+ <ftp://quake.think.com/pub/wais/doc/doc-ids.txt>
+
+ Kantor, B., and P. Lapsley, "Network News Transfer Protocol", RFC
+ 977, UC San Diego & UC Berkeley, February 1986.
+ <ftp://ds.internic.net/rfc/rfc977.txt>
+
+ Kunze, J., "Requirements for URLs", Work in Progress.
+
+ Lynch, C., Coalition for Networked Information: "Workshop on ID and
+ Reference Structures for Networked Information", November 1991. See
+ <wais://quake.think.com/wais-discussion-archives?lynch>
+
+ Mockapetris, P., "Domain Names - Concepts and Facilities", STD 13, RFC
+ 1034, USC/Information Sciences Institute, November 1987,
+ <ftp://ds.internic.net/rfc/rfc1034.txt>
+
+ Neuman, B. Clifford, "Prospero: A Tool for Organizing Internet
+ Resources", Electronic Networking: Research, Applications and
+ Policy, Vol 1 No 2, Meckler Westport CT USA, 1992. See also
+ <ftp://prospero.isi.edu/pub/prospero/oir.ps>
+
+ Postel, J., and J. Reynolds, "File Transfer Protocol (FTP)", STD 9,
+ RFC 959, USC/Information Sciences Institute, October 1985.
+ <ftp://ds.internic.net/rfc/rfc959.txt>
+
+ Sollins, K., and L. Masinter, "Requirements for URNs", Work in
+ Progress.
+
+ Yeong, W., "Towards Networked Information Retrieval", Technical report
+ 91-06-25-01, June 1991, Performance Systems International, Inc.
+ <ftp://uu.psi.com/wp/nir.txt>
+
+ Yeong, W., "Representing Public Archives in the Directory", Work in
+ Progress, November 1991, now expired.
+
+Security Considerations
+
+ Security issues are not discussed in this memo.
+
+Author's Address
+
+ Tim Berners-Lee
+ World-Wide Web project
+ CERN
+ 1211 Geneva 23,
+ Switzerland
+
+ Phone: +41 (22)767 3755
+ Fax: +41 (22)767 7155
+ EMail: timbl@info.cern.ch