diff options
Diffstat (limited to 'doc/rfc/rfc1738.txt')
-rw-r--r-- | doc/rfc/rfc1738.txt | 1403 |
1 files changed, 1403 insertions, 0 deletions
diff --git a/doc/rfc/rfc1738.txt b/doc/rfc/rfc1738.txt new file mode 100644 index 0000000..3728866 --- /dev/null +++ b/doc/rfc/rfc1738.txt @@ -0,0 +1,1403 @@ + + + + + + +Network Working Group T. Berners-Lee +Request for Comments: 1738 CERN +Category: Standards Track L. Masinter + Xerox Corporation + M. McCahill + University of Minnesota + Editors + December 1994 + + + Uniform Resource Locators (URL) + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Abstract + + This document specifies a Uniform Resource Locator (URL), the syntax + and semantics of formalized information for location and access of + resources via the Internet. + +1. Introduction + + This document describes the syntax and semantics for a compact string + representation for a resource available via the Internet. These + strings are called "Uniform Resource Locators" (URLs). + + The specification is derived from concepts introduced by the World- + Wide Web global information initiative, whose use of such objects + dates from 1990 and is described in "Universal Resource Identifiers + in WWW", RFC 1630. The specification of URLs is designed to meet the + requirements laid out in "Functional Requirements for Internet + Resource Locators" [12]. + + This document was written by the URI working group of the Internet + Engineering Task Force. Comments may be addressed to the editors, or + to the URI-WG <uri@bunyip.com>. Discussions of the group are archived + at <URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html> + + + + + + + + +Berners-Lee, Masinter & McCahill [Page 1] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + +2. General URL Syntax + + Just as there are many different methods of access to resources, + there are several schemes for describing the location of such + resources. + + The generic syntax for URLs provides a framework for new schemes to + be established using protocols other than those defined in this + document. + + URLs are used to `locate' resources, by providing an abstract + identification of the resource location. Having located a resource, + a system may perform a variety of operations on the resource, as + might be characterized by such words as `access', `update', + `replace', `find attributes'. In general, only the `access' method + needs to be specified for any URL scheme. + +2.1. The main parts of URLs + + A full BNF description of the URL syntax is given in Section 5. + + In general, URLs are written as follows: + + <scheme>:<scheme-specific-part> + + A URL contains the name of the scheme being used (<scheme>) followed + by a colon and then a string (the <scheme-specific-part>) whose + interpretation depends on the scheme. + + Scheme names consist of a sequence of characters. The lower case + letters "a"--"z", digits, and the characters plus ("+"), period + ("."), and hyphen ("-") are allowed. For resiliency, programs + interpreting URLs should treat upper case letters as equivalent to + lower case in scheme names (e.g., allow "HTTP" as well as "http"). + +2.2. URL Character Encoding Issues + + URLs are sequences of characters, i.e., letters, digits, and special + characters. A URLs may be represented in a variety of ways: e.g., ink + on paper, or a sequence of octets in a coded character set. The + interpretation of a URL depends only on the identity of the + characters used. + + In most URL schemes, the sequences of characters in different parts + of a URL are used to represent sequences of octets used in Internet + protocols. For example, in the ftp scheme, the host name, directory + name and file names are such sequences of octets, represented by + parts of the URL. Within those parts, an octet may be represented by + + + +Berners-Lee, Masinter & McCahill [Page 2] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + + the chararacter which has that octet as its code within the US-ASCII + [20] coded character set. + + In addition, octets may be encoded by a character triplet consisting + of the character "%" followed by the two hexadecimal digits (from + "0123456789ABCDEF") which forming the hexadecimal value of the octet. + (The characters "abcdef" may also be used in hexadecimal encodings.) + + Octets must be encoded if they have no corresponding graphic + character within the US-ASCII coded character set, if the use of the + corresponding character is unsafe, or if the corresponding character + is reserved for some other interpretation within the particular URL + scheme. + + No corresponding graphic US-ASCII: + + URLs are written only with the graphic printable characters of the + US-ASCII coded character set. The octets 80-FF hexadecimal are not + used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent + control characters; these must be encoded. + + Unsafe: + + Characters can be unsafe for a number of reasons. The space + character is unsafe because significant spaces may disappear and + insignificant spaces may be introduced when URLs are transcribed or + typeset or subjected to the treatment of word-processing programs. + The characters "<" and ">" are unsafe because they are used as the + delimiters around URLs in free text; the quote mark (""") is used to + delimit URLs in some systems. The character "#" is unsafe and should + always be encoded because it is used in World Wide Web and in other + systems to delimit a URL from a fragment/anchor identifier that might + follow it. The character "%" is unsafe because it is used for + encodings of other characters. Other characters are unsafe because + gateways and other transport agents are known to sometimes modify + such characters. These characters are "{", "}", "|", "\", "^", "~", + "[", "]", and "`". + + All unsafe characters must always be encoded within a URL. For + example, the character "#" must be encoded within URLs even in + systems that do not normally deal with fragment or anchor + identifiers, so that if the URL is copied into another system that + does use them, it will not be necessary to change the URL encoding. + + + + + + + + +Berners-Lee, Masinter & McCahill [Page 3] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + + Reserved: + + Many URL schemes reserve certain characters for a special meaning: + their appearance in the scheme-specific part of the URL has a + designated semantics. If the character corresponding to an octet is + reserved in a scheme, the octet must be encoded. The characters ";", + "/", "?", ":", "@", "=" and "&" are the characters which may be + reserved for special meaning within a scheme. No other characters may + be reserved within a scheme. + + Usually a URL has the same interpretation when an octet is + represented by a character and when it encoded. However, this is not + true for reserved characters: encoding a character reserved for a + particular scheme may change the semantics of a URL. + + Thus, only alphanumerics, the special characters "$-_.+!*'(),", and + reserved characters used for their reserved purposes may be used + unencoded within a URL. + + On the other hand, characters that are not required to be encoded + (including alphanumerics) may be encoded within the scheme-specific + part of a URL, as long as they are not being used for a reserved + purpose. + +2.3 Hierarchical schemes and relative links + + In some cases, URLs are used to locate resources that contain + pointers to other resources. In some cases, those pointers are + represented as relative links where the expression of the location of + the second resource is in terms of "in the same place as this one + except with the following relative path". Relative links are not + described in this document. However, the use of relative links + depends on the original URL containing a hierarchical structure + against which the relative link is based. + + Some URL schemes (such as the ftp, http, and file schemes) contain + names that can be considered hierarchical; the components of the + hierarchy are separated by "/". + + + + + + + + + + + + + +Berners-Lee, Masinter & McCahill [Page 4] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + +3. Specific Schemes + + The mapping for some existing standard and experimental protocols is + outlined in the BNF syntax definition. Notes on particular protocols + follow. The schemes covered are: + + ftp File Transfer protocol + http Hypertext Transfer Protocol + gopher The Gopher protocol + mailto Electronic mail address + news USENET news + nntp USENET news using NNTP access + telnet Reference to interactive sessions + wais Wide Area Information Servers + file Host-specific file names + prospero Prospero Directory Service + + Other schemes may be specified by future specifications. Section 4 of + this document describes how new schemes may be registered, and lists + some scheme names that are under development. + +3.1. Common Internet Scheme Syntax + + While the syntax for the rest of the URL may vary depending on the + particular scheme selected, URL schemes that involve the direct use + of an IP-based protocol to a specified host on the Internet use a + common syntax for the scheme-specific data: + + //<user>:<password>@<host>:<port>/<url-path> + + Some or all of the parts "<user>:<password>@", ":<password>", + ":<port>", and "/<url-path>" may be excluded. The scheme specific + data start with a double slash "//" to indicate that it complies with + the common Internet scheme syntax. The different components obey the + following rules: + + user + An optional user name. Some schemes (e.g., ftp) allow the + specification of a user name. + + password + An optional password. If present, it follows the user + name separated from it by a colon. + + The user name (and password), if present, are followed by a + commercial at-sign "@". Within the user and password field, any ":", + "@", or "/" must be encoded. + + + + +Berners-Lee, Masinter & McCahill [Page 5] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + + Note that an empty user name or password is different than no user + name or password; there is no way to specify a password without + specifying a user name. E.g., <URL:ftp://@host.com/> has an empty + user name and no password, <URL:ftp://host.com/> has no user name, + while <URL:ftp://foo:@host.com/> has a user name of "foo" and an + empty password. + + host + The fully qualified domain name of a network host, or its IP + address as a set of four decimal digit groups separated by + ".". Fully qualified domain names take the form as described + in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 + [5]: a sequence of domain labels separated by ".", each domain + label starting and ending with an alphanumerical character and + possibly also containing "-" characters. The rightmost domain + label will never start with a digit, though, which + syntactically distinguishes all domain names from the IP + addresses. + + port + The port number to connect to. Most schemes designate + protocols that have a default port number. Another port number + may optionally be supplied, in decimal, separated from the + host by a colon. If the port is omitted, the colon is as well. + + url-path + The rest of the locator consists of data specific to the + scheme, and is known as the "url-path". It supplies the + details of how the specified resource can be accessed. Note + that the "/" between the host (or port) and the url-path is + NOT part of the url-path. + + The url-path syntax depends on the scheme being used, as does the + manner in which it is interpreted. + +3.2. FTP + + The FTP URL scheme is used to designate files and directories on + Internet hosts accessible using the FTP protocol (RFC959). + + A FTP URL follow the syntax described in Section 3.1. If :<port> is + omitted, the port defaults to 21. + + + + + + + + + +Berners-Lee, Masinter & McCahill [Page 6] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + +3.2.1. FTP Name and Password + + A user name and password may be supplied; they are used in the ftp + "USER" and "PASS" commands after first making the connection to the + FTP server. If no user name or password is supplied and one is + requested by the FTP server, the conventions for "anonymous" FTP are + to be used, as follows: + + The user name "anonymous" is supplied. + + The password is supplied as the Internet e-mail address + of the end user accessing the resource. + + If the URL supplies a user name but no password, and the remote + server requests a password, the program interpreting the FTP URL + should request one from the user. + +3.2.2. FTP url-path + + The url-path of a FTP URL has the following syntax: + + <cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode> + + Where <cwd1> through <cwdN> and <name> are (possibly encoded) strings + and <typecode> is one of the characters "a", "i", or "d". The part + ";type=<typecode>" may be omitted. The <cwdx> and <name> parts may be + empty. The whole url-path may be omitted, including the "/" + delimiting it from the prefix containing user, password, host, and + port. + + The url-path is interpreted as a series of FTP commands as follows: + + Each of the <cwd> elements is to be supplied, sequentially, as the + argument to a CWD (change working directory) command. + + If the typecode is "d", perform a NLST (name list) command with + <name> as the argument, and interpret the results as a file + directory listing. + + Otherwise, perform a TYPE command with <typecode> as the argument, + and then access the file whose name is <name> (for example, using + the RETR command.) + + Within a name or CWD component, the characters "/" and ";" are + reserved and must be encoded. The components are decoded prior to + their use in the FTP protocol. In particular, if the appropriate FTP + sequence to access a particular file requires supplying a string + containing a "/" as an argument to a CWD or RETR command, it is + + + +Berners-Lee, Masinter & McCahill [Page 7] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + + necessary to encode each "/". + + For example, the URL <URL:ftp://myname@host.dom/%2Fetc/motd> is + interpreted by FTP-ing to "host.dom", logging in as "myname" + (prompting for a password if it is asked for), and then executing + "CWD /etc" and then "RETR motd". This has a different meaning from + <URL:ftp://myname@host.dom/etc/motd> which would "CWD etc" and then + "RETR motd"; the initial "CWD" might be executed relative to the + default directory for "myname". On the other hand, + <URL:ftp://myname@host.dom//etc/motd>, would "CWD " with a null + argument, then "CWD etc", and then "RETR motd". + + FTP URLs may also be used for other operations; for example, it is + possible to update a file on a remote file server, or infer + information about it from the directory listings. The mechanism for + doing so is not spelled out here. + +3.2.3. FTP Typecode is Optional + + The entire ;type=<typecode> part of a FTP URL is optional. If it is + omitted, the client program interpreting the URL must guess the + appropriate mode to use. In general, the data content type of a file + can only be guessed from the name, e.g., from the suffix of the name; + the appropriate type code to be used for transfer of the file can + then be deduced from the data content of the file. + +3.2.4 Hierarchy + + For some file systems, the "/" used to denote the hierarchical + structure of the URL corresponds to the delimiter used to construct a + file name hierarchy, and thus, the filename will look similar to the + URL path. This does NOT mean that the URL is a Unix filename. + +3.2.5. Optimization + + Clients accessing resources via FTP may employ additional heuristics + to optimize the interaction. For some FTP servers, for example, it + may be reasonable to keep the control connection open while accessing + multiple URLs from the same server. However, there is no common + hierarchical model to the FTP protocol, so if a directory change + command has been given, it is impossible in general to deduce what + sequence should be given to navigate to another directory for a + second retrieval, if the paths are different. The only reliable + algorithm is to disconnect and reestablish the control connection. + + + + + + + +Berners-Lee, Masinter & McCahill [Page 8] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + +3.3. HTTP + + The HTTP URL scheme is used to designate Internet resources + accessible using HTTP (HyperText Transfer Protocol). + + The HTTP protocol is specified elsewhere. This specification only + describes the syntax of HTTP URLs. + + An HTTP URL takes the form: + + http://<host>:<port>/<path>?<searchpart> + + where <host> and <port> are as described in Section 3.1. If :<port> + is omitted, the port defaults to 80. No user name or password is + allowed. <path> is an HTTP selector, and <searchpart> is a query + string. The <path> is optional, as is the <searchpart> and its + preceding "?". If neither <path> nor <searchpart> is present, the "/" + may also be omitted. + + Within the <path> and <searchpart> components, "/", ";", "?" are + reserved. The "/" character may be used within HTTP to designate a + hierarchical structure. + +3.4. GOPHER + + The Gopher URL scheme is used to designate Internet resources + accessible using the Gopher protocol. + + The base Gopher protocol is described in RFC 1436 and supports items + and collections of items (directories). The Gopher+ protocol is a set + of upward compatible extensions to the base Gopher protocol and is + described in [2]. Gopher+ supports associating arbitrary sets of + attributes and alternate data representations with Gopher items. + Gopher URLs accommodate both Gopher and Gopher+ items and item + attributes. + +3.4.1. Gopher URL syntax + + A Gopher URL takes the form: + + gopher://<host>:<port>/<gopher-path> + + where <gopher-path> is one of + + <gophertype><selector> + <gophertype><selector>%09<search> + <gophertype><selector>%09<search>%09<gopher+_string> + + + + +Berners-Lee, Masinter & McCahill [Page 9] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + + If :<port> is omitted, the port defaults to 70. <gophertype> is a + single-character field to denote the Gopher type of the resource to + which the URL refers. The entire <gopher-path> may also be empty, in + which case the delimiting "/" is also optional and the <gophertype> + defaults to "1". + + <selector> is the Gopher selector string. In the Gopher protocol, + Gopher selector strings are a sequence of octets which may contain + any octets except 09 hexadecimal (US-ASCII HT or tab) 0A hexadecimal + (US-ASCII character LF), and 0D (US-ASCII character CR). + + Gopher clients specify which item to retrieve by sending the Gopher + selector string to a Gopher server. + + Within the <gopher-path>, no characters are reserved. + + Note that some Gopher <selector> strings begin with a copy of the + <gophertype> character, in which case that character will occur twice + consecutively. The Gopher selector string may be an empty string; + this is how Gopher clients refer to the top-level directory on a + Gopher server. + +3.4.2 Specifying URLs for Gopher Search Engines + + If the URL refers to a search to be submitted to a Gopher search + engine, the selector is followed by an encoded tab (%09) and the + search string. To submit a search to a Gopher search engine, the + Gopher client sends the <selector> string (after decoding), a tab, + and the search string to the Gopher server. + +3.4.3 URL syntax for Gopher+ items + + URLs for Gopher+ items have a second encoded tab (%09) and a Gopher+ + string. Note that in this case, the %09<search> string must be + supplied, although the <search> element may be the empty string. + + The <gopher+_string> is used to represent information required for + retrieval of the Gopher+ item. Gopher+ items may have alternate + views, arbitrary sets of attributes, and may have electronic forms + associated with them. + + To retrieve the data associated with a Gopher+ URL, a client will + connect to the server and send the Gopher selector, followed by a tab + and the search string (which may be empty), followed by a tab and the + Gopher+ commands. + + + + + + +Berners-Lee, Masinter & McCahill [Page 10] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + +3.4.4 Default Gopher+ data representation + + When a Gopher server returns a directory listing to a client, the + Gopher+ items are tagged with either a "+" (denoting Gopher+ items) + or a "?" (denoting Gopher+ items which have a +ASK form associated + with them). A Gopher URL with a Gopher+ string consisting of only a + "+" refers to the default view (data representation) of the item + while a Gopher+ string containing only a "?" refer to an item with a + Gopher electronic form associated with it. + +3.4.5 Gopher+ items with electronic forms + + Gopher+ items which have a +ASK associated with them (i.e. Gopher+ + items tagged with a "?") require the client to fetch the item's +ASK + attribute to get the form definition, and then ask the user to fill + out the form and return the user's responses along with the selector + string to retrieve the item. Gopher+ clients know how to do this but + depend on the "?" tag in the Gopher+ item description to know when to + handle this case. The "?" is used in the Gopher+ string to be + consistent with Gopher+ protocol's use of this symbol. + +3.4.6 Gopher+ item attribute collections + + To refer to the Gopher+ attributes of an item, the Gopher URL's + Gopher+ string consists of "!" or "$". "!" refers to the all of a + Gopher+ item's attributes. "$" refers to all the item attributes for + all items in a Gopher directory. + +3.4.7 Referring to specific Gopher+ attributes + + To refer to specific attributes, the URL's gopher+_string is + "!<attribute_name>" or "$<attribute_name>". For example, to refer to + the attribute containing the abstract of an item, the gopher+_string + would be "!+ABSTRACT". + + To refer to several attributes, the gopher+_string consists of the + attribute names separated by coded spaces. For example, + "!+ABSTRACT%20+SMELL" refers to the +ABSTRACT and +SMELL attributes + of an item. + +3.4.8 URL syntax for Gopher+ alternate views + + Gopher+ allows for optional alternate data representations (alternate + views) of items. To retrieve a Gopher+ alternate view, a Gopher+ + client sends the appropriate view and language identifier (found in + the item's +VIEW attribute). To refer to a specific Gopher+ alternate + view, the URL's Gopher+ string would be in the form: + + + + +Berners-Lee, Masinter & McCahill [Page 11] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + + +<view_name>%20<language_name> + + For example, a Gopher+ string of "+application/postscript%20Es_ES" + refers to the Spanish language postscript alternate view of a Gopher+ + item. + +3.4.9 URL syntax for Gopher+ electronic forms + + The gopher+_string for a URL that refers to an item referenced by a + Gopher+ electronic form (an ASK block) filled out with specific + values is a coded version of what the client sends to the server. + The gopher+_string is of the form: + ++%091%0D%0A+-1%0D%0A<ask_item1_value>%0D%0A<ask_item2_value>%0D%0A.%0D%0A + + To retrieve this item, the Gopher client sends: + + <a_gopher_selector><tab>+<tab>1<cr><lf> + +-1<cr><lf> + <ask_item1_value><cr><lf> + <ask_item2_value><cr><lf> + .<cr><lf> + + to the Gopher server. + +3.5. MAILTO + + The mailto URL scheme is used to designate the Internet mailing + address of an individual or service. No additional information other + than an Internet mailing address is present or implied. + + A mailto URL takes the form: + + mailto:<rfc822-addr-spec> + + where <rfc822-addr-spec> is (the encoding of an) addr-spec, as + specified in RFC 822 [6]. Within mailto URLs, there are no reserved + characters. + + Note that the percent sign ("%") is commonly used within RFC 822 + addresses and must be encoded. + + Unlike many URLs, the mailto scheme does not represent a data object + to be accessed directly; there is no sense in which it designates an + object. It has a different use than the message/external-body type in + MIME. + + + + + +Berners-Lee, Masinter & McCahill [Page 12] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + +3.6. NEWS + + The news URL scheme is used to refer to either news groups or + individual articles of USENET news, as specified in RFC 1036. + + A news URL takes one of two forms: + + news:<newsgroup-name> + news:<message-id> + + A <newsgroup-name> is a period-delimited hierarchical name, such as + "comp.infosystems.www.misc". A <message-id> corresponds to the + Message-ID of section 2.1.5 of RFC 1036, without the enclosing "<" + and ">"; it takes the form <unique>@<full_domain_name>. A message + identifier may be distinguished from a news group name by the + presence of the commercial at "@" character. No additional characters + are reserved within the components of a news URL. + + If <newsgroup-name> is "*" (as in <URL:news:*>), it is used to refer + to "all available news groups". + + The news URLs are unusual in that by themselves, they do not contain + sufficient information to locate a single resource, but, rather, are + location-independent. + +3.7. NNTP + + The nntp URL scheme is an alternative method of referencing news + articles, useful for specifying news articles from NNTP servers (RFC + 977). + + A nntp URL take the form: + + nntp://<host>:<port>/<newsgroup-name>/<article-number> + + where <host> and <port> are as described in Section 3.1. If :<port> + is omitted, the port defaults to 119. + + The <newsgroup-name> is the name of the group, while the <article- + number> is the numeric id of the article within that newsgroup. + + Note that while nntp: URLs specify a unique location for the article + resource, most NNTP servers currently on the Internet today are + configured only to allow access from local clients, and thus nntp + URLs do not designate globally accessible resources. Thus, the news: + form of URL is preferred as a way of identifying news articles. + + + + + +Berners-Lee, Masinter & McCahill [Page 13] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + +3.8. TELNET + + The Telnet URL scheme is used to designate interactive services that + may be accessed by the Telnet protocol. + + A telnet URL takes the form: + + telnet://<user>:<password>@<host>:<port>/ + + as specified in Section 3.1. The final "/" character may be omitted. + If :<port> is omitted, the port defaults to 23. The :<password> can + be omitted, as well as the whole <user>:<password> part. + + This URL does not designate a data object, but rather an interactive + service. Remote interactive services vary widely in the means by + which they allow remote logins; in practice, the <user> and + <password> supplied are advisory only: clients accessing a telnet URL + merely advise the user of the suggested username and password. + +3.9. WAIS + + The WAIS URL scheme is used to designate WAIS databases, searches, or + individual documents available from a WAIS database. WAIS is + described in [7]. The WAIS protocol is described in RFC 1625 [17]; + Although the WAIS protocol is based on Z39.50-1988, the WAIS URL + scheme is not intended for use with arbitrary Z39.50 services. + + A WAIS URL takes one of the following forms: + + wais://<host>:<port>/<database> + wais://<host>:<port>/<database>?<search> + wais://<host>:<port>/<database>/<wtype>/<wpath> + + where <host> and <port> are as described in Section 3.1. If :<port> + is omitted, the port defaults to 210. The first form designates a + WAIS database that is available for searching. The second form + designates a particular search. <database> is the name of the WAIS + database being queried. + + The third form designates a particular document within a WAIS + database to be retrieved. In this form <wtype> is the WAIS + designation of the type of the object. Many WAIS implementations + require that a client know the "type" of an object prior to + retrieval, the type being returned along with the internal object + identifier in the search response. The <wtype> is included in the + URL in order to allow the client interpreting the URL adequate + information to actually retrieve the document. + + + + +Berners-Lee, Masinter & McCahill [Page 14] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + + The <wpath> of a WAIS URL consists of the WAIS document-id, encoded + as necessary using the method described in Section 2.2. The WAIS + document-id should be treated opaquely; it may only be decomposed by + the server that issued it. + +3.10 FILES + + The file URL scheme is used to designate files accessible on a + particular host computer. This scheme, unlike most other URL schemes, + does not designate a resource that is universally accessible over the + Internet. + + A file URL takes the form: + + file://<host>/<path> + + where <host> is the fully qualified domain name of the system on + which the <path> is accessible, and <path> is a hierarchical + directory path of the form <directory>/<directory>/.../<name>. + + For example, a VMS file + + DISK$USER:[MY.NOTES]NOTE123456.TXT + + might become + + <URL:file://vms.host.edu/disk$user/my/notes/note12345.txt> + + As a special case, <host> can be the string "localhost" or the empty + string; this is interpreted as `the machine from which the URL is + being interpreted'. + + The file URL scheme is unusual in that it does not specify an + Internet protocol or access method for such files; as such, its + utility in network protocols between hosts is limited. + +3.11 PROSPERO + + The Prospero URL scheme is used to designate resources that are + accessed via the Prospero Directory Service. The Prospero protocol is + described elsewhere [14]. + + A prospero URLs takes the form: + + prospero://<host>:<port>/<hsoname>;<field>=<value> + + where <host> and <port> are as described in Section 3.1. If :<port> + is omitted, the port defaults to 1525. No username or password is + + + +Berners-Lee, Masinter & McCahill [Page 15] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + + allowed. + + The <hsoname> is the host-specific object name in the Prospero + protocol, suitably encoded. This name is opaque and interpreted by + the Prospero server. The semicolon ";" is reserved and may not + appear without quoting in the <hsoname>. + + Prospero URLs are interpreted by contacting a Prospero directory + server on the specified host and port to determine appropriate access + methods for a resource, which might themselves be represented as + different URLs. External Prospero links are represented as URLs of + the underlying access method and are not represented as Prospero + URLs. + + Note that a slash "/" may appear in the <hsoname> without quoting and + no significance may be assumed by the application. Though slashes + may indicate hierarchical structure on the server, such structure is + not guaranteed. Note that many <hsoname>s begin with a slash, in + which case the host or port will be followed by a double slash: the + slash from the URL syntax, followed by the initial slash from the + <hsoname>. (E.g., <URL:prospero://host.dom//pros/name> designates a + <hsoname> of "/pros/name".) + + In addition, after the <hsoname>, optional fields and values + associated with a Prospero link may be specified as part of the URL. + When present, each field/value pair is separated from each other and + from the rest of the URL by a ";" (semicolon). The name of the field + and its value are separated by a "=" (equal sign). If present, these + fields serve to identify the target of the URL. For example, the + OBJECT-VERSION field can be specified to identify a specific version + of an object. + +4. REGISTRATION OF NEW SCHEMES + + A new scheme may be introduced by defining a mapping onto a + conforming URL syntax, using a new prefix. URLs for experimental + schemes may be used by mutual agreement between parties. Scheme names + starting with the characters "x-" are reserved for experimental + purposes. + + The Internet Assigned Numbers Authority (IANA) will maintain a + registry of URL schemes. Any submission of a new URL scheme must + include a definition of an algorithm for accessing of resources + within that scheme and the syntax for representing such a scheme. + + URL schemes must have demonstrable utility and operability. One way + to provide such a demonstration is via a gateway which provides + objects in the new scheme for clients using an existing protocol. If + + + +Berners-Lee, Masinter & McCahill [Page 16] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + + the new scheme does not locate resources that are data objects, the + properties of names in the new space must be clearly defined. + + New schemes should try to follow the same syntactic conventions of + existing schemes, where appropriate. It is likewise recommended + that, where a protocol allows for retrieval by URL, that the client + software have provision for being configured to use specific gateway + locators for indirect access through new naming schemes. + + The following scheme have been proposed at various times, but this + document does not define their syntax or use at this time. It is + suggested that IANA reserve their scheme names for future definition: + + afs Andrew File System global file names. + mid Message identifiers for electronic mail. + cid Content identifiers for MIME body parts. + nfs Network File System (NFS) file names. + tn3270 Interactive 3270 emulation sessions. + mailserver Access to data available from mail servers. + z39.50 Access to ANSI Z39.50 services. + +5. BNF for specific URL schemes + + This is a BNF-like description of the Uniform Resource Locator + syntax, using the conventions of RFC822, except that "|" is used to + designate alternatives, and brackets [] are used around optional or + repeated elements. Briefly, literals are quoted with "", optional + elements are enclosed in [brackets], and elements may be preceded + with <n>* to designate n or more repetitions of the following + element; n defaults to 0. + +; The generic form of a URL is: + +genericurl = scheme ":" schemepart + +; Specific predefined schemes are defined here; new schemes +; may be registered with IANA + +url = httpurl | ftpurl | newsurl | + nntpurl | telneturl | gopherurl | + waisurl | mailtourl | fileurl | + prosperourl | otherurl + +; new schemes follow the general syntax +otherurl = genericurl + +; the scheme is in lower case; interpreters should use case-ignore +scheme = 1*[ lowalpha | digit | "+" | "-" | "." ] + + + +Berners-Lee, Masinter & McCahill [Page 17] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + +schemepart = *xchar | ip-schemepart + + +; URL schemeparts for ip based protocols: + +ip-schemepart = "//" login [ "/" urlpath ] + +login = [ user [ ":" password ] "@" ] hostport +hostport = host [ ":" port ] +host = hostname | hostnumber +hostname = *[ domainlabel "." ] toplabel +domainlabel = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit +toplabel = alpha | alpha *[ alphadigit | "-" ] alphadigit +alphadigit = alpha | digit +hostnumber = digits "." digits "." digits "." digits +port = digits +user = *[ uchar | ";" | "?" | "&" | "=" ] +password = *[ uchar | ";" | "?" | "&" | "=" ] +urlpath = *xchar ; depends on protocol see section 3.1 + +; The predefined schemes: + +; FTP (see also RFC959) + +ftpurl = "ftp://" login [ "/" fpath [ ";type=" ftptype ]] +fpath = fsegment *[ "/" fsegment ] +fsegment = *[ uchar | "?" | ":" | "@" | "&" | "=" ] +ftptype = "A" | "I" | "D" | "a" | "i" | "d" + +; FILE + +fileurl = "file://" [ host | "localhost" ] "/" fpath + +; HTTP + +httpurl = "http://" hostport [ "/" hpath [ "?" search ]] +hpath = hsegment *[ "/" hsegment ] +hsegment = *[ uchar | ";" | ":" | "@" | "&" | "=" ] +search = *[ uchar | ";" | ":" | "@" | "&" | "=" ] + +; GOPHER (see also RFC1436) + +gopherurl = "gopher://" hostport [ / [ gtype [ selector + [ "%09" search [ "%09" gopher+_string ] ] ] ] ] +gtype = xchar +selector = *xchar +gopher+_string = *xchar + + + + +Berners-Lee, Masinter & McCahill [Page 18] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + +; MAILTO (see also RFC822) + +mailtourl = "mailto:" encoded822addr +encoded822addr = 1*xchar ; further defined in RFC822 + +; NEWS (see also RFC1036) + +newsurl = "news:" grouppart +grouppart = "*" | group | article +group = alpha *[ alpha | digit | "-" | "." | "+" | "_" ] +article = 1*[ uchar | ";" | "/" | "?" | ":" | "&" | "=" ] "@" host + +; NNTP (see also RFC977) + +nntpurl = "nntp://" hostport "/" group [ "/" digits ] + +; TELNET + +telneturl = "telnet://" login [ "/" ] + +; WAIS (see also RFC1625) + +waisurl = waisdatabase | waisindex | waisdoc +waisdatabase = "wais://" hostport "/" database +waisindex = "wais://" hostport "/" database "?" search +waisdoc = "wais://" hostport "/" database "/" wtype "/" wpath +database = *uchar +wtype = *uchar +wpath = *uchar + +; PROSPERO + +prosperourl = "prospero://" hostport "/" ppath *[ fieldspec ] +ppath = psegment *[ "/" psegment ] +psegment = *[ uchar | "?" | ":" | "@" | "&" | "=" ] +fieldspec = ";" fieldname "=" fieldvalue +fieldname = *[ uchar | "?" | ":" | "@" | "&" ] +fieldvalue = *[ uchar | "?" | ":" | "@" | "&" ] + +; Miscellaneous definitions + +lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | + "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | + "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | + "y" | "z" +hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | + "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | + "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" + + + +Berners-Lee, Masinter & McCahill [Page 19] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + +alpha = lowalpha | hialpha +digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | + "8" | "9" +safe = "$" | "-" | "_" | "." | "+" +extra = "!" | "*" | "'" | "(" | ")" | "," +national = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`" +punctuation = "<" | ">" | "#" | "%" | <"> + + +reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" +hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | + "a" | "b" | "c" | "d" | "e" | "f" +escape = "%" hex hex + +unreserved = alpha | digit | safe | extra +uchar = unreserved | escape +xchar = unreserved | reserved | escape +digits = 1*digit + +6. Security Considerations + + The URL scheme does not in itself pose a security threat. Users + should beware that there is no general guarantee that a URL which at + one time points to a given object continues to do so, and does not + even at some later time point to a different object due to the + movement of objects on servers. + + A URL-related security threat is that it is sometimes possible to + construct a URL such that an attempt to perform a harmless idempotent + operation such as the retrieval of the object will in fact cause a + possibly damaging remote operation to occur. The unsafe URL is + typically constructed by specifying a port number other than that + reserved for the network protocol in question. The client + unwittingly contacts a server which is in fact running a different + protocol. The content of the URL contains instructions which when + interpreted according to this other protocol cause an unexpected + operation. An example has been the use of gopher URLs to cause a rude + message to be sent via a SMTP server. Caution should be used when + using any URL which specifies a port number other than the default + for the protocol, especially when it is a number within the reserved + space. + + Care should be taken when URLs contain embedded encoded delimiters + for a given protocol (for example, CR and LF characters for telnet + protocols) that these are not unencoded before transmission. This + would violate the protocol but could be used to simulate an extra + operation or parameter, again causing an unexpected and possible + harmful remote operation to be performed. + + + +Berners-Lee, Masinter & McCahill [Page 20] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + + The use of URLs containing passwords that should be secret is clearly + unwise. + +7. Acknowledgements + + This paper builds on the basic WWW design (RFC 1630) and much + discussion of these issues by many people on the network. The + discussion was particularly stimulated by articles by Clifford Lynch, + Brewster Kahle [10] and Wengyik Yeong [18]. Contributions from John + Curran, Clifford Neuman, Ed Vielmetti and later the IETF URL BOF and + URI working group were incorporated. + + Most recently, careful readings and comments by Dan Connolly, Ned + Freed, Roy Fielding, Guido van Rossum, Michael Dolan, Bert Bos, John + Kunze, Olle Jarnefors, Peter Svanberg and many others have helped + refine this RFC. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Berners-Lee, Masinter & McCahill [Page 21] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + +APPENDIX: Recommendations for URLs in Context + + URIs, including URLs, are intended to be transmitted through + protocols which provide a context for their interpretation. + + In some cases, it will be necessary to distinguish URLs from other + possible data structures in a syntactic structure. In this case, is + recommended that URLs be preceeded with a prefix consisting of the + characters "URL:". For example, this prefix may be used to + distinguish URLs from other kinds of URIs. + + In addition, there are many occasions when URLs are included in other + kinds of text; examples include electronic mail, USENET news + messages, or printed on paper. In such cases, it is convenient to + have a separate syntactic wrapper that delimits the URL and separates + it from the rest of the text, and in particular from punctuation + marks that might be mistaken for part of the URL. For this purpose, + is recommended that angle brackets ("<" and ">"), along with the + prefix "URL:", be used to delimit the boundaries of the URL. This + wrapper does not form part of the URL and should not be used in + contexts in which delimiters are already specified. + + In the case where a fragment/anchor identifier is associated with a + URL (following a "#"), the identifier would be placed within the + brackets as well. + + In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may + need to be added to break long URLs across lines. The whitespace + should be ignored when extracting the URL. + + No whitespace should be introduced after a hyphen ("-") character. + Because some typesetters and printers may (erroneously) introduce a + hyphen at the end of line when breaking a line, the interpreter of a + URL containing a line break immediately after a hyphen should ignore + all unencoded whitespace around the line break, and should be aware + that the hyphen may or may not actually be part of the URL. + + Examples: + + Yes, Jim, I found it under <URL:ftp://info.cern.ch/pub/www/doc; + type=d> but you can probably pick it up from <URL:ftp://ds.in + ternic.net/rfc>. Note the warning in <URL:http://ds.internic. + net/instructions/overview.html#WARNING>. + + + + + + + + +Berners-Lee, Masinter & McCahill [Page 22] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + +References + + [1] Anklesaria, F., McCahill, M., Lindner, P., Johnson, D., + Torrey, D., and B. Alberti, "The Internet Gopher Protocol + (a distributed document search and retrieval protocol)", + RFC 1436, University of Minnesota, March 1993. + <URL:ftp://ds.internic.net/rfc/rfc1436.txt;type=a> + + [2] Anklesaria, F., Lindner, P., McCahill, M., Torrey, D., + Johnson, D., and B. Alberti, "Gopher+: Upward compatible + enhancements to the Internet Gopher protocol", + University of Minnesota, July 1993. + <URL:ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol + /Gopher+/Gopher+.txt> + + [3] Berners-Lee, T., "Universal Resource Identifiers in WWW: A + Unifying Syntax for the Expression of Names and Addresses of + Objects on the Network as used in the World-Wide Web", RFC + 1630, CERN, June 1994. + <URL:ftp://ds.internic.net/rfc/rfc1630.txt> + + [4] Berners-Lee, T., "Hypertext Transfer Protocol (HTTP)", + CERN, November 1993. + <URL:ftp://info.cern.ch/pub/www/doc/http-spec.txt.Z> + + [5] Braden, R., Editor, "Requirements for Internet Hosts -- + Application and Support", STD 3, RFC 1123, IETF, October 1989. + <URL:ftp://ds.internic.net/rfc/rfc1123.txt> + + [6] Crocker, D. "Standard for the Format of ARPA Internet Text + Messages", STD 11, RFC 822, UDEL, April 1982. + <URL:ftp://ds.internic.net/rfc/rfc822.txt> + + [7] Davis, F., Kahle, B., Morris, H., Salem, J., Shen, T., Wang, R., + Sui, J., and M. Grinbaum, "WAIS Interface Protocol Prototype + Functional Specification", (v1.5), Thinking Machines + Corporation, April 1990. + <URL:ftp://quake.think.com/pub/wais/doc/protspec.txt> + + [8] Horton, M. and R. Adams, "Standard For Interchange of USENET + Messages", RFC 1036, AT&T Bell Laboratories, Center for Seismic + Studies, December 1987. + <URL:ftp://ds.internic.net/rfc/rfc1036.txt> + + [9] Huitema, C., "Naming: Strategies and Techniques", Computer + Networks and ISDN Systems 23 (1991) 107-110. + + + + + +Berners-Lee, Masinter & McCahill [Page 23] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + + [10] Kahle, B., "Document Identifiers, or International Standard + Book Numbers for the Electronic Age", 1991. + <URL:ftp://quake.think.com/pub/wais/doc/doc-ids.txt> + + [11] Kantor, B. and P. Lapsley, "Network News Transfer Protocol: + A Proposed Standard for the Stream-Based Transmission of News", + RFC 977, UC San Diego & UC Berkeley, February 1986. + <URL:ftp://ds.internic.net/rfc/rfc977.txt> + + [12] Kunze, J., "Functional Requirements for Internet Resource + Locators", Work in Progress, December 1994. + <URL:ftp://ds.internic.net/internet-drafts + /draft-ietf-uri-irl-fun-req-02.txt> + + [13] Mockapetris, P., "Domain Names - Concepts and Facilities", + STD 13, RFC 1034, USC/Information Sciences Institute, + November 1987. + <URL:ftp://ds.internic.net/rfc/rfc1034.txt> + + [14] Neuman, B., and S. Augart, "The Prospero Protocol", + USC/Information Sciences Institute, June 1993. + <URL:ftp://prospero.isi.edu/pub/prospero/doc + /prospero-protocol.PS.Z> + + [15] Postel, J. and J. Reynolds, "File Transfer Protocol (FTP)", + STD 9, RFC 959, USC/Information Sciences Institute, + October 1985. + <URL:ftp://ds.internic.net/rfc/rfc959.txt> + + [16] Sollins, K. and L. Masinter, "Functional Requirements for + Uniform Resource Names", RFC 1737, MIT/LCS, Xerox Corporation, + December 1994. + <URL:ftp://ds.internic.net/rfc/rfc1737.txt> + + [17] St. Pierre, M, Fullton, J., Gamiel, K., Goldman, J., Kahle, B., + Kunze, J., Morris, H., and F. Schiettecatte, "WAIS over + Z39.50-1988", RFC 1625, WAIS, Inc., CNIDR, Thinking Machines + Corp., UC Berkeley, FS Consulting, June 1994. + <URL:ftp://ds.internic.net/rfc/rfc1625.txt> + + [18] Yeong, W. "Towards Networked Information Retrieval", Technical + report 91-06-25-01, Performance Systems International, Inc. + <URL:ftp://uu.psi.com/wp/nir.txt>, June 1991. + + [19] Yeong, W., "Representing Public Archives in the Directory", + Work in Progress, November 1991. + + + + + +Berners-Lee, Masinter & McCahill [Page 24] + +RFC 1738 Uniform Resource Locators (URL) December 1994 + + + [20] "Coded Character Set -- 7-bit American Standard Code for + Information Interchange", ANSI X3.4-1986. + +Editors' Addresses + +Tim Berners-Lee +World-Wide Web project +CERN, +1211 Geneva 23, +Switzerland + +Phone: +41 (22)767 3755 +Fax: +41 (22)767 7155 +EMail: timbl@info.cern.ch + + +Larry Masinter +Xerox PARC +3333 Coyote Hill Road +Palo Alto, CA 94034 + +Phone: (415) 812-4365 +Fax: (415) 812-4333 +EMail: masinter@parc.xerox.com + + +Mark McCahill +Computer and Information Services, +University of Minnesota +Room 152 Shepherd Labs +100 Union Street SE +Minneapolis, MN 55455 + +Phone: (612) 625 1300 +EMail: mpm@boombox.micro.umn.edu + + + + + + + + + + + + + + + + +Berners-Lee, Masinter & McCahill [Page 25] + |