Diffstat (limited to 'doc/rfc/rfc2718.txt')
-rw-r--r-- | doc/rfc/rfc2718.txt | 563 |
1 files changed, 563 insertions, 0 deletions
diff --git a/doc/rfc/rfc2718.txt b/doc/rfc/rfc2718.txt new file mode 100644 index 0000000..e07de5e --- /dev/null +++ b/doc/rfc/rfc2718.txt @@ -0,0 +1,563 @@ + + + + + + +Network Working Group L. Masinter +Request for Comments: 2718 Xerox Corporation +Category: Informational H. Alvestrand + Maxware, Pirsenteret + D. Zigmond + WebTV Networks, Inc. + R. Petke + UUNET Technologies + November 1999 + + + Guidelines for new URL Schemes + +Status of this Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (1999). All Rights Reserved. + +Abstract + + A Uniform Resource Locator (URL) is a compact string representation + of the location for a resource that is available via the Internet. + This document provides guidelines for the definition of new URL + schemes. + +1. Introduction + + A Uniform Resource Locator (URL) is a compact string representation + of the location for a resource that is available via the Internet. + RFC 2396 [1] defines the general syntax and semantics of URIs, and, + by inclusion, URLs. URLs are designated by including a "<scheme>:" + and then a "<scheme-specific-part>". Many URL schemes are already + defined. + + This document provides guidelines for the definition of new URL + schemes, for consideration by those who are defining and registering + or evaluating those definitions. + + The process by which new URL schemes are registered is defined in RFC + 2717 [2]. + + + + + + +Masinter, et al. Informational [Page 1] + +RFC 2718 Guidelines for new URL Schemes November 1999 + + +2. Guidelines for new URL schemes + + Because new URL schemes potentially complicate client software, new + schemes must have demonstrable utility and operability, as well as + compatibility with existing URL schemes. This section elaborates + these criteria. 
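As a small illustration of the "<scheme>:" followed by "<scheme-specific-part>" form described in the Introduction, the following Python sketch splits a URL at its first colon and checks the scheme name against the grammar given in RFC 2396, section 3.1. The function name `split_scheme` is hypothetical and the example URLs are placeholders. This is a minimal sketch and not part of the RFC text.

```python
import re

# Scheme grammar from RFC 2396, section 3.1:
#   scheme = alpha *( alpha | digit | "+" | "-" | "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def split_scheme(url):
    """Split a URL into (scheme, scheme-specific-part).

    Hypothetical helper for illustration only. The scheme is lowercased
    because RFC 2396 treats scheme names as case-insensitive.
    """
    scheme, colon, rest = url.partition(":")
    if not colon or not SCHEME_RE.fullmatch(scheme):
        raise ValueError("no valid scheme in %r" % (url,))
    return scheme.lower(), rest


if __name__ == "__main__":
    for example in ("http://www.example.com/a/b", "mailto:user@example.com"):
        print(split_scheme(example))
```

Only the part before the first colon is generic here; everything after it is left to the individual scheme, which is why the guidelines that follow concentrate on how much of that remainder a new scheme should share with existing ones.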
+ +2.1 Syntactic compatibility + + New URL schemes should follow the same syntactic conventions of + existing schemes when appropriate. If a URI scheme that has embedded + links in content accessed by that scheme does not share syntax with a + different scheme, the same content cannot be served up under + different schemes without rewriting the content. This can already be + a problem, and with future digital signature schemes, rewriting may + not even be possible. Deployment of other schemes in the future + could therefore become extremely difficult. + +2.1.1 Motivations for syntactic compatibility + + Why should new URL schemes share as much of the generic URI syntax + (that makes sense to share) as possible? Consider the following: + + o If fragment syntax isn't shared between two schemes, (e.g. "<a + href="#foo">"), you can't move individual completely self + referential documents between schemes without rewriting the + embedded references within the document. In the Web, the fragment + syntax is a property of the media type, and evaluated by the + client. + + o If fragment syntax is not shared between different media types of + the same capability (e.g. HTML, XML, Word, or image types such as + GIF, JPEG, PNG) then you can't have a URI reference that can + evolve to superior media types as they become available, or even + likely work properly today with content negotiation. + + o If relative syntax (to the extent of understanding the URI is + relative, and what part of the URI string is relative) isn't + shared between two schemes, (e.g. "<a href="foo">"), you can't + move sets of documents that are internally self referential + between schemes without rewriting the embedded URIs. + + o If the ".." syntax as a path component in relative URI's isn't + shared between schemes, you can't easily have sets of document + sets and refer to them between schemes without rewriting the + embedded references. + + + + + +Masinter, et al. 
Informational [Page 2] + +RFC 2718 Guidelines for new URL Schemes November 1999 + + + o If the "/" syntax (to the extent of understanding that the URI + refers to a path relative to the current naming authority, see + section 2.1.2) isn't shared, you can't have multiple sets of + documents easily be moved up or down in a relative hierarchy of + names and share a common set of documents between them, without + rewriting the content, shared either in that scheme or between + schemes. The best example is a site that has a common set of + GIF, JPEG and PNG images, and you want to reorganize the site + changing the depth of a subtree from one depth to another, or from + one directory to another where the depth isn't the same. + + o If naming authority syntax (e.g. what comes after "//" in most URL + schemes, see section 2.1.2) and relative path syntax (to the + extent of understanding that the URI has a naming authority, and + what part of the URI string is the naming authority vs. path) + aren't shared between two schemes, you can't share identical name + spaces and serve them up via different schemes. (The naming + authority syntax is a property of the scheme.) The fact that + HTTP and FTP have the same syntax, for example, has often been + exploited by sites transitioning from FTP archive service to HTTP + archive service so that the URLs can be identical between schemes + except for the scheme; the same content can be served via two + schemes simultaneously. + +2.1.2 Improper use of "//" following "<scheme>:" + + Contrary to some examples set in past years, the use of double + slashes as the first component of the <scheme-specific-part> of a URL + is not simply an artistic indicator that what follows is a URL: + Double slashes are used ONLY when the syntax of the URL's <scheme- + specific-part> contains a hierarchical structure as described in RFC + 2396.
In URLs from such schemes, the use of double slashes indicates + that what follows is the top hierarchical element for a naming + authority. (See section 3 of RFC 2396 for more details.) URL + schemes which do not contain a conformant hierarchical structure in + their <scheme-specific-part> should not use double slashes following + the "<scheme>:" string. + +2.1.3 Compatibility with relative URLs + + URL schemes should use the generic URL syntax if they are intended to + be used with relative URLs. A description of the allowed relative + forms should be included in the scheme's definition. Many + applications use relative URLs extensively. Specifically, + + o Can the scheme be parsed according to RFC 2396 - for example, if + the tokens "//", "/", ";", or "?" are used, do they have the + meaning given in RFC 2396? + + + +Masinter, et al. Informational [Page 3] + +RFC 2718 Guidelines for new URL Schemes November 1999 + + + o Does the scheme make sense to use it in relative URLs like those + RFC 2396 specifies? + + o If the scheme syntax is designed to be broken into pieces, does + the documentation for the scheme's syntax specify what those + pieces are, why it should be broken in this way, and why the + breaks aren't where RFC 2396 says that they usually should be? + + o If the scheme has a hierarchy, does it go left-to-right and with + slash separators like RFC 2396? + +2.2 Is the scheme well defined? + + It is important that the semantics of the "resource" that a URL + "locates" be well defined. This might mean different things + depending on the nature of the URL scheme. + +2.2.1 Clear mapping from other name spaces + + In many cases, new URL schemes are defined as ways to translate + other protocols and name spaces into the general framework of + URLs. The "ftp" URL scheme translates from the FTP protocol, + while the "mid" URL scheme translates from the Message-ID field of + messages. 
+ + In either case, the description of the mapping must be complete, + must describe how characters get encoded or not in URLs, must + describe exactly how all legal values of the base standard can be + represented using the URL scheme, and exactly which modifiers, + alternate forms and other artifacts from the base standards are + included or not included. These requirements are elaborated + below. + +2.2.2 URL schemes associated with network protocols + + Most new URL schemes are associated with network resources that + have one or several network protocols that can access them. The + 'ftp', 'news', and 'http' schemes are of this nature. For such + schemes, the specification should completely describe how URLs are + translated into protocol actions in sufficient detail to make the + access of the network resource unambiguous. If an implementation + of the URL scheme requires some configuration, the configuration + elements must be clearly identified. (For example, the 'news' + scheme, if implemented using NNTP, requires configuration of the + NNTP server.) + + + + + + +Masinter, et al. Informational [Page 4] + +RFC 2718 Guidelines for new URL Schemes November 1999 + + +2.2.3 Definition of non-protocol URL schemes + + In some cases, URL schemes do not have particular network + protocols associated with them, because their use is limited to + contexts where the access method is understood. This is the case, + for example, with the "cid" and "mid" URL schemes. For these URL + schemes, the specification should describe the notation of the + scheme and a complete mapping of the locator from its source. + +2.2.4 Definition of URL schemes not associated with data resources + + Most URL schemes locate Internet resources that correspond to data + objects that can be retrieved or modified. This is the case with + "ftp" and "http", for example. However, some URL schemes do not; + for example, the "mailto" URL scheme corresponds to an Internet + mail address.
+ + If a new URL scheme does not locate resources that are data + objects, the properties of names in the new space must be clearly + defined. + +2.2.5 Character encoding + + When describing URL schemes in which (some of) the elements of the + URL are actually representations of sequences of characters, care + should be taken not to introduce unnecessary variety in the ways + in which characters are encoded into octets and then into URL + characters. Unless there is some compelling reason for a + particular scheme to do otherwise, translating character sequences + into UTF-8 (RFC 2279) [3] and then subsequently using the %HH + encoding for unsafe octets is recommended. + +2.2.6 Definition of operations + + In some contexts (for example, HTML forms) it is possible to + specify any one of a list of operations to be performed on a + specific URL. (Outside forms, it is generally assumed to be + something you GET.) + + The URL scheme definition should describe all well-defined + operations on the URL identifier, and what they are supposed to + do. + + Some URL schemes (for example, "telnet") provide location + information for hooking onto bi-directional data streams, and + don't fit the "infoaccess" paradigm of most URLs very well; this + should be documented. + + + + +Masinter, et al. Informational [Page 5] + +RFC 2718 Guidelines for new URL Schemes November 1999 + + + NOTE: It is perfectly valid to say that "no operation apart from + GET is defined for this URL". It is also valid to say that + "there's only one operation defined for this URL, and it's not + very GET-like". The important point is that what is defined on + this type is described. + +2.3 Demonstrated utility + + URL schemes should have demonstrated utility. New URL schemes are + expensive things to support. Often they require special code in + browsers, proxies, and/or servers. Having a lot of ways to say + the same thing needlessly complicates these programs without adding + value to the Internet.
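Returning briefly to the recommendation in section 2.2.5 (translate character sequences into UTF-8, then use the %HH encoding for unsafe octets), the two steps can be sketched in Python as follows. The helper name `encode_component` is hypothetical, and treating every octet outside the RFC 2396 "unreserved" set as unsafe is an assumption made for this example; a real scheme definition would state which characters it leaves unescaped.

```python
import string
from urllib.parse import unquote

# "unreserved" characters from RFC 2396, section 2.3: alphanum | mark.
# Assumption for this sketch: every other octet is escaped.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-_.!~*'()")


def encode_component(text):
    """Encode text as UTF-8 octets, then %HH-escape unsafe octets."""
    pieces = []
    for octet in text.encode("utf-8"):      # step 1: characters -> octets
        char = chr(octet)
        if char in UNRESERVED:
            pieces.append(char)             # safe octet, kept as-is
        else:
            pieces.append("%{:02X}".format(octet))  # step 2: octet -> %HH
    return "".join(pieces)


if __name__ == "__main__":
    encoded = encode_component("caf\u00e9 menu")
    print(encoded)
    # urllib.parse.unquote decodes %HH escapes as UTF-8 by default.
    print(unquote(encoded))
```

The loop is written out by hand to keep the two stages visible; in practice the standard library's `urllib.parse.quote` performs a comparable UTF-8 then %HH encoding, with its own default set of safe characters.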
+ + The kinds of things that are useful include: + + o Things that cannot be referred to in any other way. + + o Things where it is much easier to get at them using this scheme + than (for instance) a proxy gateway. + +2.3.1 Proxy into HTTP/HTML + + One way to provide a demonstration of utility is via a gateway which + provides objects in the new scheme for clients using an existing + protocol. It is much easier to deploy gateways to a new service than + it is to deploy browsers that understand the new URL object. + + Things to look for when thinking about a proxy are: + + o Is there a single global resolution mechanism whereby any proxy + can find the referenced object? + o If not, is there a way in which the user can find any object of + this type, and "run his own proxy"? + o Are the operations mappable one-to-one (or possibly using + modifiers) to HTTP operations? + o Is the type of returned objects well defined? + - as MIME content-types? + - as something that can be translated to HTML? + o Is there running code for a proxy? + + + + + + + + + + + +Masinter, et al. Informational [Page 6] + +RFC 2718 Guidelines for new URL Schemes November 1999 + + +2.4 Are there security considerations? + + Above and beyond the security considerations of the base mechanism a + scheme builds upon, one must think of things that can happen in the + normal course of URL usage. + + In particular: + + o Does the user need to be warned that such a thing is happening + without an explicit request (GET for the source of an IMG tag, for + instance)? This has implications for the design of a proxy + gateway, of course. + + o Is it possible to fake URLs of this type that point to different + things in a dangerous way? + + o Are there mechanisms for identifying the requester that can be + used or need to be used with this mechanism (the From: field in a + mailto: URL, or the Kerberos login required for AFS access in the + AFS: URL, for instance)? 
+ + o Does the mechanism contain passwords or other security information + that are passed inside the referring document in the clear (as in + the "ftp" URL, for instance)? + +2.5 Does it start with UR? + + Any scheme starting with the letters "U" and "R", in particular if it + attaches any of the meanings "uniform", "universal" or "unifying" to + the first letter, is going to cause intense debate, and generate much + heat (but maybe little light). + + Any such proposal should either make sure that there is a large + consensus behind it that it will be the only scheme of its type, or + pick another name. + +2.6 Non-considerations + + Some issues that are often raised but are not relevant to new URL + schemes include the following. + + + + + + + + + + + +Masinter, et al. Informational [Page 7] + +RFC 2718 Guidelines for new URL Schemes November 1999 + + +2.6.1 Are all objects accessible? + + Can all objects in the world that are validly identified by a scheme + be accessed by any UA implementing it? + + Sometimes the answer will be yes and sometimes no; often it will + depend on factors (like firewalls or client configuration) not + directly related to the scheme itself. + +3. Security Considerations + + New URL schemes are required to address all security considerations + in their definitions. + +4. References + + [1] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource + Identifiers (URI): Generic Syntax", RFC 2396, August 1998. + + [2] Petke, R. and I. King, "Registration Procedures for URL Scheme + Names", BCP 35, RFC 2717, November 1999. + + [3] Yergeau, F., "UTF-8, A Transformation Format of Unicode and ISO + 10646", RFC 2279, January 1998. + + + + + + + + + + + + + + + + + + + + + + + + + + + +Masinter, et al. Informational [Page 8] + +RFC 2718 Guidelines for new URL Schemes November 1999 + + +5. 
Authors' Addresses + + Larry Masinter + Xerox Corporation + Palo Alto Research Center + 3333 Coyote Hill Road + Palo Alto, CA 94304 + + URL: http://purl.org/NET/masinter + EMail: masinter@parc.xerox.com + + + Harald Tveit Alvestrand + Maxware, Pirsenteret + N-7005 Trondheim + NORWAY + + Phone: +47 73 54 57 00 + EMail: harald.alvestrand@maxware.no + + + Dan Zigmond + WebTV Networks, Inc. + 305 Lytton Avenue + Palo Alto, CA 94301 + USA + + Phone: +1-650-614-6071 + EMail: djz@corp.webtv.net + + + Rich Petke + UUNET Technologies + 5000 Britton Road + P. O. Box 5000 + Hilliard, OH 43026-5000 + + Phone: +1-614-723-4157 + Fax: +1-614-723-8407 + EMail: rpetke@wcom.net + + + + + + + + + + + +Masinter, et al. Informational [Page 9] + +RFC 2718 Guidelines for new URL Schemes November 1999 + + +6. Full Copyright Statement + + Copyright (C) The Internet Society (1999). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. 
+ + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + + + + + + + + + + + + + +Masinter, et al. Informational [Page 10] + |