diff options
Diffstat (limited to 'doc/rfc/rfc2854.txt')
-rw-r--r-- | doc/rfc/rfc2854.txt | 451 |
1 files changed, 451 insertions, 0 deletions
diff --git a/doc/rfc/rfc2854.txt b/doc/rfc/rfc2854.txt new file mode 100644 index 0000000..ad1e0f7 --- /dev/null +++ b/doc/rfc/rfc2854.txt @@ -0,0 +1,451 @@ + + + + + + +Network Working Group D. Connolly +Request for Comments: 2854 World Wide Web Consortium (W3C) +Obsoletes: 2070, 1980, 1942, 1867, 1866 L. Masinter +Category: Informational AT&T + June 2000 + + + The 'text/html' Media Type + +Status of this Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2000). All Rights Reserved. + +Abstract + + This document summarizes the history of HTML development, and defines + the "text/html" MIME type by pointing to the relevant W3C + recommendations; it is intended to obsolete the previous IETF + documents defining HTML, including RFC 1866, RFC 1867, RFC 1980, RFC + 1942 and RFC 2070, and to remove HTML from IETF Standards Track. + + This document was prepared at the request of the W3C HTML working + group. Please send comments to www-html@w3.org, a public mailing list + with archive at <http://lists.w3.org/Archives/Public/www-html/>. + +1. Introduction and background + + HTML has been in use in the World Wide Web information infrastructure + since 1990, and specified in various informal documents. The + text/html media type was first officially defined by the IETF HTML + working group in 1995 in [HTML20]. Extensions to HTML were proposed + in [HTML30], [UPLOAD], [TABLES], [CLIMAPS], and [I18N]. + + The IETF HTML working group closed Sep 1996, and work on defining + HTML moved to the World Wide Web Consortium (W3C). The proposed + extensions were incorporated to some extent in [HTML32], and to a + larger extent in [HTML40]. The definition of multipart/form-data from + [UPLOAD] was described in [FORMDATA]. In addition, a reformulation of + HTML 4.0 in XML 1.0[XHTML1] was developed. + + + + + + +Connolly & Masinter Informational [Page 1] + +RFC 2854 The 'text/html' Media Type June 2000 + + + [HTML32] notes "This specification defines HTML version 3.2. HTML 3.2 + aims to capture recommended practice as of early '96 and as such to + be used as a replacement for HTML 2.0 (RFC 1866)." Subsequent + specifications for HTML describe the differences in each version. + + In addition to the development of standards, a wide variety of + additional extensions, restrictions, and modifications to HTML were + popularized by NCSA's Mosaic system and subsequently by the + competitive implementations of Netscape Navigator and Microsoft + Internet Explorer; these extensions are documented in numerous books + and online guides. + +2. Registration of MIME media type text/html + + MIME media type name: text + MIME subtype name: html + Required parameters: none + Optional parameters: + + charset + The optional parameter "charset" refers to the character + encoding used to represent the HTML document as a sequence of + bytes. Any registered IANA charset may be used, but UTF-8 is + preferred. Although this parameter is optional, it is strongly + recommended that it always be present. See Section 6 below for + a discussion of charset default rules. + + Note that [HTML20] included an optional "level" parameter; in + practice, this parameter was never used and has been removed from + this specification. [HTML30] also suggested a "version" + parameter; in practice, this parameter also was never used and has + been removed from this specification. + + Encoding considerations: + See Section 4 of this document. + + Security considerations: + See Section 7 of this document. + + Interoperability considerations: + HTML is designed to be interoperable across the widest possible + range of platforms and devices of varying capabilities. However, + there are contexts (platforms of limited display capability, for + example) where not all of the capabilities of the full HTML + definition are feasible. There is ongoing work to develop both a + modularization of HTML and a set of profiling capabilities to + identify and negotiate restricted (and extended) capabilities. + + + + +Connolly & Masinter Informational [Page 2] + +RFC 2854 The 'text/html' Media Type June 2000 + + + Due to the long and distributed development of HTML, current + practice on the Internet includes a wide variety of HTML variants. + Implementors of text/html interpreters must be prepared to be + "bug-compatible" with popular browsers in order to work with many + HTML documents available the Internet. + + Typically, different versions are distinguishable by the DOCTYPE + declaration contained within them, although the DOCTYPE + declaration itself is sometimes omitted or incorrect. + + Published specification: + The text/html media type is now defined by W3C Recommendations; + the latest published version is [HTML401]. In addition, [XHTML1] + defines a profile of use of XHTML which is compatible with HTML + 4.01 and which may also be labeled as text/html. + + Applications which use this media type: + The first and most common application of HTML is the World Wide + Web; commonly, HTML documents contain URI references [URI] to + other documents and media to be retrieved using the HTTP protocol + [HTTP]. Many gateway applications provide HTML-based interfaces to + other underlying complex services. Numerous other applications now + also use HTML as a convenient platform-independent multimedia + document representation. + + Additional information: + + Magic number: + There is no single initial string that is always present for + HTML files. However, Section 5 below gives some guidelines for + recognizing HTML files. + + File extension: + The file extensions 'html' or 'htm' are commonly used, but + other extensions denoting file formats for preprocessing are + also common. + + Macintosh File Type code: TEXT + + Person & email address to contact for further information: + Dan Connolly <connolly@w3.org> + Larry Masinter <lmm@acm.org> + + + + + + + + + +Connolly & Masinter Informational [Page 3] + +RFC 2854 The 'text/html' Media Type June 2000 + + + Intended usage: COMMON + + Author/Change controller: + The HTML specification is a work product of the World Wide Web + Consortium's HTML Working Group. The W3C has change control over + the HTML specification. + + Further information: + HTML has a means of including, by reference via URI, additional + resources (image, video clip, applet) within the base document. In + order to transfer a complete HTML object and the included + resources in a single MIME object, the mechanisms of [MHTML] may + be used. + +3. Fragment Identifiers + + The URI specification [URI] notes that the semantics of a fragment + identifier (part of a URI after a "#") is a property of the data + resulting from a retrieval action, and that the format and + interpretation of fragment identifiers is dependent on the media type + of the retrieval result. + + For documents labeled as text/html, the fragment identifier + designates the correspondingly named element; any element may be + named with the "id" attribute, and A, APPLET, FRAME, IFRAME, IMG and + MAP elements may be named with a "name" attribute. This is described + in detail in [HTML40] section 12. + +4. Encoding considerations + + Because of the availability within HTML itself for using character + entity references, documents that use a wide repertoire of characters + may still be represented using the US-ASCII charset and transported + without encoding. However, transport of text/html using a charset + other than US-ASCII may require base64 or quoted-printable encoding + for 7-bit channels. + + As with all MIME text subtypes, the canonical form of "text/html" + must always represent a line break as a sequence of a CR byte value + (0x0D) followed by an LF (0x0A) byte value. Similarly, any + occurrence of such a CRLF sequence in "text/html" must represent a + line break. Use of CR byte values and LF byte values outside of line + break sequences is also forbidden. This rule applies regardless of + the character encoding ('charset') involved. + + + + + + + +Connolly & Masinter Informational [Page 4] + +RFC 2854 The 'text/html' Media Type June 2000 + + + Note, however, that the HTTP protocol allows the transport of data + not in canonical form, and, in particular, with other end-of-line + conventions; see [HTTP] section 3.7.1. This exception is commonly + used for HTML. + + HTML sent via email is still subject to the MIME restrictions; this + is discussed fully in [MHTML] Section 10. + +5. Recognizing HTML files + + Almost all HTML files have the string "<html" or "<HTML" near the + beginning of the file. + + Documents conformant to HTML 2.0, HTML 3.2 and HTML 4.0 will start + with a DOCTYPE declaration "<!DOCTYPE HTML" near the beginning, + before the "<html". These dialects are case insensitive. Files may + start with white space, comments (introduced by "<!--" ), or + processing instructions (introduced by "<?") prior to the DOCTYPE + declaration. + + XHTML documents (optionally) start with an XML declaration which + begins with "<?xml" and are required to have a DOCTYPE declaration + "<!DOCTYPE html". + +6. Charset default rules + + The use of an explicit charset parameter is strongly recommended. + While [MIME] specifies "The default character set, which must be + assumed in the absence of a charset parameter, is US-ASCII." [HTTP] + Section 3.7.1, defines that "media subtypes of the 'text' type are + defined to have a default charset value of 'ISO-8859-1'". Section + 19.3 of [HTTP] gives additional guidelines. Using an explicit + charset parameter will help avoid confusion. + + Using an explicit charset parameter also takes into account that the + overwhelming majority of deployed browsers are set to use something + else than 'ISO-8859-1' as the default; the actual default is either a + corporate character encoding or character encodings widely deployed + in a certain national or regional community. For further + considerations, please also see Section 5.2 of [HTML40]. + + + + + + + + + + + +Connolly & Masinter Informational [Page 5] + +RFC 2854 The 'text/html' Media Type June 2000 + + +7. Security Considerations + + [HTML401], section B.10, notes various security issues with + interpreting anchors and forms in HTML documents. + + In addition, the introduction of scripting languages and interactive + capabilities in HTML 4.0 introduced a number of security risks + associated with the automatic execution of programs written by the + sender but interpreted by the recipient. User agents executing such + scripts or programs must be extremely careful to insure that + untrusted software is executed in a protected environment. + +8. Authors' Addresses + + Daniel W. Connolly + World Wide Web Consortium (W3C) + MIT Laboratory for Computer Science + 545 Technology Square + Cambridge, MA 02139, U.S.A. + + EMail: connolly@w3.org + http://www.w3.org/People/Connolly/ + + + Larry Masinter + AT&T + 75 Willow Road + Menlo Park, CA 94025 + + EMail: LM@att.com + http://larry.masinter.net + +9. References + + [CLIMAPS] Seidman, J., "A Proposed Extension to HTML: Client-Side + Image Maps", RFC 1980, August 1996. + + [FORMDATA] Masinter, L., "Returning Values from Forms: + multipart/form-data", RFC 2388, August 1998. + + [HTML20] Berners-Lee, T. and D. Connolly, "Hypertext Markup + Language - 2.0", RFC 1866, November 1995. + + [HTML30] Raggett, D., "HyperText Markup Language Specification + Version 3.0", September 1995. (Available at + <http://www.w3.org/MarkUp/html3/CoverPage>). + + + + + +Connolly & Masinter Informational [Page 6] + +RFC 2854 The 'text/html' Media Type June 2000 + + + [HTML32] Raggett, D., "HTML 3.2 Reference Specification", W3C + Recomendation, January 1997. + Available at <http://www.w3.org/TR/REC-html32>. + + [HTML40] Raggett, D., et al., "HTML 4.0 Specification", W3C + Recommendation, December 1997. + Available at <http://www.w3.org/TR/1998/REC-html40- + 19980424> + + [HTML401] Raggett, D., et al., "HTML 4.01 Specification", W3C + Recommendation, December 1999. + Available at <http://www.w3.org/TR/html401>. + + [HTTP] Gettys, J., Fielding, R., Mogul, J., Frystyk, H., + Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext + Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. + + [I18N] Yergeau, F., Nicol, G. and M. Duerst, + "Internationalization of the Hypertext Markup Language", + RFC 2070, January 1997. + + [MHTML] Palme, J., Hotmann, A. and N. Shelness, "MIME + Encapsulation of Aggregate Documents, such as HTML + (MHTML)", RFC 2557, March 1999. + + [MIME] Freed, N. and N. Borenstein, "Multipurpose Internet Mail + Extensions (MIME) Part Two: Media Types", RFC 2046, + November 1996. + + [TABLES] Raggett, D., "HTML Tables", RFC 1942, May 1996. + + [UPLOAD] Nebel, E. and L. Masinter, "Form-based File Upload in + HTML", RFC 1867, November 1995. + + [URI] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform + Resource Identifiers (URI): Generic Syntax", RFC 2396, + August 1998. + + [XHTML1] "XHTML 1.0: The Extensible HyperText Markup Language: A + Reformulation of HTML 4 in XML 1.0", W3C Recommendation, + January 2000. Available at <http://www.w3.org/TR/xhtml1>. + + + + + + + + + + +Connolly & Masinter Informational [Page 7] + +RFC 2854 The 'text/html' Media Type June 2000 + + +10. Full Copyright Statement + + Copyright (C) The Internet Society (2000). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + + + + + + + + + + + + + +Connolly & Masinter Informational [Page 8] + |