Diffstat (limited to 'doc/rfc/rfc6596.txt')
-rw-r--r-- doc/rfc/rfc6596.txt 451
1 files changed, 451 insertions, 0 deletions
diff --git a/doc/rfc/rfc6596.txt b/doc/rfc/rfc6596.txt
new file mode 100644
index 0000000..6167b38
--- /dev/null
+++ b/doc/rfc/rfc6596.txt
@@ -0,0 +1,451 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) M. Ohye
+Request for Comments: 6596 J. Kupke
+Category: Informational April 2012
+ISSN: 2070-1721
+
+
+ The Canonical Link Relation
+
+Abstract
+
+ RFC 5988 specifies a way to define relationships between links on the
+ web. This document describes a new type of such a relationship,
+ "canonical", to designate an Internationalized Resource Identifier
+ (IRI) as preferred over resources with duplicative content.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for informational purposes.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Not all documents
+ approved by the IESG are a candidate for any level of Internet
+ Standard; see Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc6596.
+
+Copyright Notice
+
+ Copyright (c) 2012 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+
+
+
+
+
+Ohye & Kupke Informational [Page 1]
+
+RFC 6596 The Canonical Link Relation April 2012
+
+
+1. Introduction
+
+ The canonical link relation specifies the preferred IRI from
+ resources with duplicative content. Common implementations of the
+ canonical link relation are to specify the preferred version of an
+ IRI from duplicate pages created with the addition of IRI parameters
+ (e.g., session IDs) or to specify the single-page version as
+ preferred over the same content separated on multiple component
+ pages.
+
+ In regard to the link relation type, "canonical" can be described
+ informally as the author's preferred version of a resource. More
+ formally, the canonical link relation specifies the preferred IRI
+ from a set of resources that return the context IRI's content in
+ duplicated form. Once specified, applications such as search engines
+ can focus processing on the canonical, and references to the context
+ (referring) IRI can be updated to reference the target (canonical)
+ IRI.
+
+2. Notational Conventions
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [RFC2119].
+
+3. The Canonical Link Relation
+
+ The target (canonical) IRI MUST identify content that is either
+ duplicative or a superset of the content at the context (referring)
+ IRI. Authors who declare the canonical link relation ought to
+ anticipate that applications such as search engines can:
+
+ o Index content only from the target IRI (i.e., content from the
+ context IRIs will likely be disregarded as duplicative).
+
+ o Consolidate IRI properties, such as link popularity, to the target
+ IRI.
+
+ o Display the target IRI as the representative IRI.
+
+ The target (canonical) IRI MAY:
+
+ o Specify a relative IRI (see [RFC3986], Section 4.2).
+
+ o Be self-referential (context IRI identical to target IRI).
+
+ o Exist on a different hostname or domain.
+
+ o Have different scheme names, such as "http" to "https" or "gopher"
+ to "ftp".
+
+ o Be a superset of the content at the context IRI.
+
+ * As an example, each component page (e.g., page-1.html,
+ page-2.html) of a multi-page article MAY specify the "view-all"
+ version (e.g., page-all.html), the superset of their content,
+ as the target IRI. This is because the content from each
+ component page is contained within the view-all version. Given
+ this implementation, applications can mark page-1.html and
+ page-2.html as duplicates of page-all.html, process content
+ only from page-all.html, and disregard the component pages.
+ All references can then be made to the view-all version
+ (page-all.html, the target IRI), and no content will have been
+ lost in this process.
+
+ * Using the same example above, page-2.html SHOULD NOT designate
+ page-1.html as the target (canonical) IRI because this may
+ cause a loss of data. When page-2.html designates page-1.html
+ as the canonical, only content from the target IRI,
+ page-1.html, will be processed. page-2.html may be marked as a
+ duplicate of page-1.html and its content disregarded.
+
+ o Be the source IRI of a temporary redirect. For HTTP, this refers
+ to status codes 302, 303, or 307 (Sections 10.3.3, 10.3.4, and
+ 10.3.8, respectively, of [RFC2616]).
+
+ To better ensure that applications properly handle the canonical link
+ relation, administrators ought to consider the following guidelines:
+
+ o Specify only one canonical link relation for a resource. (It
+ would be confusing to consider/label/designate more than one IRI
+ as authoritative.)
+
+ o Avoid designating the target (canonical) as:
+
+ * The source IRI of a permanent redirect (for HTTP, this refers
+ to 300 and 301 response codes, defined in Sections 10.3.1 and
+ 10.3.2 of [RFC2616]).
+
+ * An IRI that also specifies a canonical link relation to an IRI
+ other than itself.
+
+ * An IRI that returns an error code, such as a 4xx response in
+ HTTP (Section 10.4 of [RFC2616]).
+
+ * The first page of a multi-page article or multi-page listing of
+ items (since the first page is not duplicative or a superset of
+ the context IRI). For example, page-2.html and page-3.html of
+ an article SHOULD NOT specify page-1.html as the canonical.
+ This may cause a loss of data from page-2.html and page-3.html
+ as they will be marked duplicative of page-1.html with only
+ content from page-1.html being processed.
+
+ When the canonical link relation is declared improperly, such as
+ creating chained canonicals (i.e., target IRI specifies the source
+ IRI of a permanent redirect) or designating a target IRI that returns
+ a 4xx response, applications can use their own heuristics when
+ processing the resource. For instance, an application can choose to
+ ignore any improper canonical designation and continue to process the
+ remaining content on a page.
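The fallback behavior described above can be sketched as follows. This is a non-normative illustration, not part of the RFC: the `Resource` record, the `effective_canonical` function, and the choice to fall back to the context IRI are all assumptions made for the example. The RFC leaves the actual heuristics to each application.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Resource:
    """Hypothetical record of what an application learned about one IRI."""
    status: int                      # HTTP status code returned for the IRI
    canonical: Optional[str] = None  # declared canonical target, if any


def effective_canonical(context: str, resources: Dict[str, Resource]) -> str:
    """Return the IRI to treat as canonical for `context`.

    Falls back to the context IRI itself (an assumed policy) whenever the
    declared target matches one of the cases Section 3 says to avoid.
    """
    target = resources[context].canonical
    if target is None or target == context:
        return context  # nothing declared, or self-referential
    info = resources.get(target)
    if info is None:
        return context  # target was never fetched; nothing to verify
    if 400 <= info.status < 500:
        return context  # target returns a 4xx error
    if info.status in (300, 301):
        return context  # target is the source of a permanent redirect
    if info.canonical not in (None, target):
        return context  # target points at yet another canonical (chain)
    return target       # 302, 303, and 307 targets are permitted
```

A caller would populate `resources` from its own crawl data and keep processing the page content regardless of which IRI is returned.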
+
+4. Examples
+
+ The following example illustrates:
+
+ o Three IRIs that serve duplicate content.
+
+ o One IRI that is the canonical or "preferred version".
+
+ o Two IRIs with additional query parameters, making them the
+ non-preferred versions of the content (duplicates). The canonical
+ link relation is therefore specified on these duplicates.
+
+ If the preferred version of an IRI and its content exists at:
+
+ http://www.example.com/page.php?item=purse
+
+ Then duplicate content IRIs such as:
+
+ http://www.example.com/page.php?item=purse&category=bags
+ http://www.example.com/page.php?item=purse&category=bags&sid=1234
+
+ may designate the canonical link relation in HTML as specified in
+ [REC-html401-19991224]:
+
+ <link rel="canonical"
+ href="http://www.example.com/page.php?item=purse">
+
+ or as a relative IRI:
+
+ <link rel="canonical" href="page.php?item=purse">
+
+ or alternatively, in the HTTP header field as specified in Section 5
+ of [RFC5988]:
+
+ Link: <http://www.example.com/page.php?item=purse>; rel="canonical"
+
+ This signals to applications, such as search engines, that these are
+ duplicates of the target (canonical) IRI:
+
+ http://www.example.com/page.php?item=purse.
+
+ Applications may then select the canonical value as the display IRI
+ (such as in search results), and additional IRI properties such as
+ indexing and ranking signals can be transferred as well.
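As a non-normative illustration of how an application might read the declarations shown in this section, the sketch below extracts the target IRI from either the HTML element or the Link header field and resolves a relative target against the context IRI. The names `CanonicalLinkParser`, `canonical_from_html`, and `canonical_from_link_header` are hypothetical. The sketch makes simplifying assumptions: it ignores any HTML `<base>` element, uses a regular expression that does not cover the full Link header grammar, and treats multiple canonical declarations as "no usable canonical".

```python
import re
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Collects href values of <link> elements whose rel includes "canonical"."""

    def __init__(self) -> None:
        super().__init__()
        self.targets: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_types = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_types and attr.get("href"):
            self.targets.append(attr["href"])


def _resolve(targets: List[str], context_iri: str) -> Optional[str]:
    # Assumed policy: exactly one declaration is required to be usable.
    if len(targets) != 1:
        return None
    return urljoin(context_iri, targets[0])  # handles relative targets


def canonical_from_html(html: str, context_iri: str) -> Optional[str]:
    parser = CanonicalLinkParser()
    parser.feed(html)
    return _resolve(parser.targets, context_iri)


def canonical_from_link_header(value: str, context_iri: str) -> Optional[str]:
    targets = []
    # Each link-value is "<target>" followed by ";"-separated parameters.
    for match in re.finditer(r"<([^>]*)>([^<]*)", value):
        target, params = match.group(1), match.group(2)
        rel = re.search(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', params, re.I)
        if rel:
            rel_types = (rel.group(1) or rel.group(2) or "").lower().split()
            if "canonical" in rel_types:
                targets.append(target)
    return _resolve(targets, context_iri)
```

With the duplicate IRI `http://www.example.com/page.php?item=purse&category=bags` as the context, both the absolute and the relative HTML forms above resolve to the same target, `http://www.example.com/page.php?item=purse`.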
+
+5. Recommendations
+
+ Before adding the canonical link relation, verification of the
+ following is RECOMMENDED:
+
+ 1. The content of the context IRI is duplicated within the content
+ of the target (canonical) IRI.
+
+ 2. For HTTP, permanent HTTP redirects (Section 10.3.2 of [RFC2616]),
+ the traditional strong indicator that an IRI's content has been
+ permanently moved, could not be implemented in place of the
+ canonical link relation.
+
+ 3. In the case where the target (canonical) IRI is a superset of
+ content from the context IRI (i.e., the case where page-1.html
+ and page-2.html designate page-all.html as the canonical), that
+ the user experience is strongly taken into consideration, both in
+ regard to possible increased load time and potential complexity
+ in navigation.
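The first check in the list above could be approximated before a canonical is published. The sketch below is a non-normative illustration under a strong assumption: content is compared as whitespace-normalized paragraphs of plain text, and "duplicated within" is taken to mean that every paragraph of the context appears in the target. The RFC does not define how duplication is measured, and the names `_paragraphs` and `content_is_contained` are hypothetical.

```python
import re
from typing import Set


def _paragraphs(text: str) -> Set[str]:
    """Split on blank lines and collapse whitespace inside each paragraph."""
    blocks = re.split(r"\n\s*\n", text)
    return {" ".join(block.split()) for block in blocks if block.strip()}


def content_is_contained(context_text: str, target_text: str) -> bool:
    """True if every paragraph of the context also appears in the target.

    Covers both exact duplicates and the superset ("view-all") case.
    """
    return _paragraphs(context_text) <= _paragraphs(target_text)
```

Under this heuristic, page-1.html and page-2.html would each pass against page-all.html, while page-2.html would fail against page-1.html, matching the guidance in Section 3.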
+
+6. IANA Considerations
+
+ IANA has registered the Canonical Link Relation below as per
+ [RFC5988].
+
+ Relation Name:
+
+ canonical
+
+ Description:
+
+ Designates the preferred version of a resource (the IRI and its
+ contents).
+
+ Reference:
+
+ This specification.
+
+ Notes:
+
+ None.
+
+ Application Data:
+
+ None.
+
+7. Security Considerations
+
+ When a site is compromised, the canonical link relation can be
+ implemented with malicious intent to designate the attacker's IRI as
+ the preferred version of the content. While this technique is
+ largely unnoticeable to humans, automated programs may cluster the
+ compromised resource as duplicative of the attacker's target IRI,
+ transferring properties such as link popularity away from the
+ compromised resource to the attacker's designated canonical.
+ (Naturally, even a site that is not compromised could provide
+ inaccurate or misleading information about which URI is canonical.)
+
+8. Internationalization Considerations
+
+ Internationalization considerations for link relations are provided
+ in Section 8 of [RFC5988].
+
+9. Normative References
+
+ [REC-html401-19991224]
+ Raggett, D., Le Hors, A., and I. Jacobs, "HTML 4.01
+ Specification", W3C Recommendation REC-html401-19991224,
+ December 1999,
+ <http://www.w3.org/TR/1999/REC-html401-19991224>.
+
+ Latest version available at
+ <http://www.w3.org/TR/html401>.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+ [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
+ Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
+ Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
+
+ [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
+ Resource Identifier (URI): Generic Syntax", STD 66,
+ RFC 3986, January 2005.
+
+ [RFC5988] Nottingham, M., "Web Linking", RFC 5988, October 2010.
+
+Appendix A. Implementations
+
+ Automated programs that implement functionality with regard for the
+ canonical link relation include:
+
+ o Google, canonical link relation HTML and HTTP header support,
+ within the same domain and across domains:
+
+ * <http://googlewebmastercentral.blogspot.com/2009/02/
+ specify-your-canonical.html>
+
+ * <http://googlewebmastercentral.blogspot.com/2011/06/
+ supporting-relcanonical-http-headers.html>
+
+ * <http://googlewebmastercentral.blogspot.com/2009/12/
+ handling-legitimate-cross-domain.html>
+
+ o Yahoo, canonical link relation HTML support within the same
+ domain:
+
+ * <http://www.ysearchblog.com/2009/02/12/
+ fighting-duplication-adding-more-arrows-to-your-quiver/>
+
+ o Bing, canonical link relation HTML support within the same domain:
+
+ * <http://www.bing.com/community/site_blogs/b/webmaster/archive/
+ 2009/02/12/
+ partnering-to-help-solve-duplicate-content-issues.aspx>
+
+Authors' Addresses
+
+ Maile Ohye
+
+ EMail: maileohye@gmail.com
+ URI: http://maileohye.com/
+
+
+ Joachim Kupke
+
+ EMail: joachim@kupke.za.net
+