From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc7578.txt | 843 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 843 insertions(+) create mode 100644 doc/rfc/rfc7578.txt (limited to 'doc/rfc/rfc7578.txt') diff --git a/doc/rfc/rfc7578.txt b/doc/rfc/rfc7578.txt new file mode 100644 index 0000000..1e68cec --- /dev/null +++ b/doc/rfc/rfc7578.txt @@ -0,0 +1,843 @@ + + + + + + +Internet Engineering Task Force (IETF) L. Masinter +Request for Comments: 7578 Adobe +Obsoletes: 2388 July 2015 +Category: Standards Track +ISSN: 2070-1721 + + + Returning Values from Forms: multipart/form-data + +Abstract + + This specification defines the multipart/form-data media type, which + can be used by a wide variety of applications and transported by a + wide variety of protocols as a way of returning a set of values as + the result of a user filling out a form. This document obsoletes + RFC 2388. + +Status of This Memo + + This is an Internet Standards Track document. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Further information on + Internet Standards is available in Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc7578. + +Copyright Notice + + Copyright (c) 2015 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + + + + + + +Masinter Standards Track [Page 1] + +RFC 7578 multipart/form-data July 2015 + + +Table of Contents + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 + 2. Percent-Encoding Option . . . . . . . . . . . . . . . . . . . 3 + 3. Advice for Forms and Form Processing . . . . . . . . . . . . 3 + 4. Definition of multipart/form-data . . . . . . . . . . . . . . 4 + 4.1. "Boundary" Parameter of multipart/form-data . . . . . . . 4 + 4.2. Content-Disposition Header Field for Each Part . . . . . 4 + 4.3. Multiple Files for One Form Field . . . . . . . . . . . . 5 + 4.4. Content-Type Header Field for Each Part . . . . . . . . . 5 + 4.5. The Charset Parameter for "text/plain" Form Data . . . . 5 + 4.6. The _charset_ Field for Default Charset . . . . . . . . . 6 + 4.7. Content-Transfer-Encoding Deprecated . . . . . . . . . . 6 + 4.8. Other "Content-" Header Fields . . . . . . . . . . . . . 7 + 5. Operability Considerations . . . . . . . . . . . . . . . . . 7 + 5.1. Non-ASCII Field Names and Values . . . . . . . . . . . . 7 + 5.1.1. Avoid Non-ASCII Field Names . . . . . . . . . . . . . 7 + 5.1.2. Interpreting Forms and Creating multipart/form-data + Data . . . . . . . . . . . . . . . . . . . . . . . . 7 + 5.1.3. Parsing and Interpreting Form Data . . . . . . . . . 8 + 5.2. Ordered Fields and Duplicated Field Names . . . . . . . . 8 + 5.3. Interoperability with Web Applications . . . . . . . . . 8 + 5.4. Correlating Form Data with the Original Form . . . . . . 9 + 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 + 7. Security Considerations . . . . . . . . . . . . . . . . . . . 9 + 8. Media Type Registration for multipart/form-data . . . . . . . 10 + 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 + 9.1. Normative References . . . . . . . . . . . . . . . . . . 11 + 9.2. Informative References . . . . . . . . . . . . . . . . . 12 + Appendix A. Changes from RFC 2388 . . . . . . . . . . . . . . . 14 + Appendix B. Alternatives . . . . . . . . . . . . . . . . . . . . 14 + Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 15 + Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 15 + +1. Introduction + + In many applications, it is possible for a user to be presented with + a form. The user will fill out the form, including information that + is typed, generated by user input, or included from files that the + user has selected. When the form is filled out, the data from the + form is sent from the user to the receiving application. + + The definition of multipart/form-data is derived from one of those + applications, originally set out in [RFC1867] and subsequently + incorporated into HTML 3.2 [W3C.REC-html32-19970114], where forms are + expressed in HTML, and the form data is sent via HTTP or electronic + mail. This representation is widely implemented in numerous web + browsers and web servers. + + + +Masinter Standards Track [Page 2] + +RFC 7578 multipart/form-data July 2015 + + + However, multipart/form-data is also used for forms that are + presented using representations other than HTML (spreadsheets, PDF, + etc.) and for transport using means other than electronic mail or + HTTP; it is used in distributed applications that do not involve + forms at all or do not have users filling out the form. For this + reason, this document defines a general syntax and semantics + independent of the application for which it is used, with specific + rules for web applications noted in context. + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in BCP 14, RFC 2119 + [RFC2119]. + +2. Percent-Encoding Option + + Within this specification, "percent-encoding" (as defined in + [RFC3986]) is offered as a possible way of encoding characters in + file names that are otherwise disallowed, including non-ASCII + characters, spaces, control characters, and so forth. The encoding + is created replacing each non-ASCII or disallowed character with a + sequence, where each byte of the UTF-8 encoding of the character is + represented by a percent-sign (%) followed by the (case-insensitive) + hexadecimal of that byte. + +3. Advice for Forms and Form Processing + + The representation and interpretation of forms and the nature of form + processing is not specified by this document. However, for forms and + form processing that result in the generation of multipart/form-data, + some suggestions are included. + + In a form, there is generally a sequence of fields, where each field + is expected to be supplied with a value, e.g., by a user who fills + out the form. Each field has a name. After a form has been filled + out and the form's data is "submitted", the form processing results + in a set of values for each field -- the "form data". + + In forms that work with multipart/form-data, field names could be + arbitrary Unicode strings; however, restricting field names to ASCII + will help avoid some interoperability issues (see Section 5.1). + + Within a given form, ensuring field names are unique is also helpful. + Some fields may have default values or presupplied values in the form + itself. Fields with presupplied values might be hidden or invisible; + this allows using generic processing for form data from a variety of + actual forms. + + + + +Masinter Standards Track [Page 3] + +RFC 7578 multipart/form-data July 2015 + + +4. Definition of multipart/form-data + + The media type multipart/form-data follows the model of multipart + MIME data streams as specified in Section 5.1 of [RFC2046]; changes + are noted in this document. + + A multipart/form-data body contains a series of parts separated by a + boundary. + +4.1. "Boundary" Parameter of multipart/form-data + + As with other multipart types, the parts are delimited with a + boundary delimiter, constructed using CRLF, "--", and the value of + the "boundary" parameter. The boundary is supplied as a "boundary" + parameter to the multipart/form-data type. As noted in Section 5.1 + of [RFC2046], the boundary delimiter MUST NOT appear inside any of + the encapsulated parts, and it is often necessary to enclose the + "boundary" parameter values in quotes in the Content-Type header + field. + +4.2. Content-Disposition Header Field for Each Part + + Each part MUST contain a Content-Disposition header field [RFC2183] + where the disposition type is "form-data". The Content-Disposition + header field MUST also contain an additional parameter of "name"; the + value of the "name" parameter is the original field name from the + form (possibly encoded; see Section 5.1). For example, a part might + contain a header field such as the following, with the body of the + part containing the form data of the "user" field: + + Content-Disposition: form-data; name="user" + + For form data that represents the content of a file, a name for the + file SHOULD be supplied as well, by using a "filename" parameter of + the Content-Disposition header field. The file name isn't mandatory + for cases where the file name isn't available or is meaningless or + private; this might result, for example, when selection or drag-and- + drop is used or when the form data content is streamed directly from + a device. + + If a "filename" parameter is supplied, the requirements of + Section 2.3 of [RFC2183] for the "receiving MUA" (i.e., the receiving + Mail User Agent) apply to receivers of multipart/form-data as well: + do not use the file name blindly, check and possibly change to match + local file system conventions if applicable, and do not use directory + path information that may be present. + + + + + +Masinter Standards Track [Page 4] + +RFC 7578 multipart/form-data July 2015 + + + In most multipart types, the MIME header fields in each part are + restricted to US-ASCII; for compatibility with those systems, file + names normally visible to users MAY be encoded using the percent- + encoding method in Section 2, following how a "file:" URI + [URI-SCHEME] might be encoded. + + NOTE: The encoding method described in [RFC5987], which would add a + "filename*" parameter to the Content-Disposition header field, MUST + NOT be used. + + Some commonly deployed systems use multipart/form-data with file + names directly encoded including octets outside the US-ASCII range. + The encoding used for the file names is typically UTF-8, although + HTML forms will use the charset associated with the form. + +4.3. Multiple Files for One Form Field + + The form data for a form field might include multiple files. + + [RFC2388] suggested that multiple files for a single form field be + transmitted using a nested "multipart/mixed" part. This usage is + deprecated. + + To match widely deployed implementations, multiple files MUST be sent + by supplying each file in a separate part but all with the same + "name" parameter. + + Receiving applications intended for wide applicability (e.g., + multipart/form-data parsing libraries) SHOULD also support the older + method of supplying multiple files. + +4.4. Content-Type Header Field for Each Part + + Each part MAY have an (optional) "Content-Type" header field, which + defaults to "text/plain". If the contents of a file are to be sent, + the file data SHOULD be labeled with an appropriate media type, if + known, or "application/octet-stream". + +4.5. The Charset Parameter for "text/plain" Form Data + + In the case where the form data is text, the charset parameter for + the "text/plain" Content-Type MAY be used to indicate the character + encoding used in that part. For example, a form with a text field in + which a user typed "Joe owes 100", where is the Euro symbol, + might have form data returned as: + + + + + + +Masinter Standards Track [Page 5] + +RFC 7578 multipart/form-data July 2015 + + + --AaB03x + content-disposition: form-data; name="field1" + content-type: text/plain;charset=UTF-8 + content-transfer-encoding: quoted-printable + + Joe owes =E2=82=AC100. + --AaB03x + + In practice, many widely deployed implementations do not supply a + charset parameter in each part, but rather, they rely on the notion + of a "default charset" for a multipart/form-data instance. + Subsequent sections will explain how the default charset is + established. + +4.6. The _charset_ Field for Default Charset + + Some form-processing applications (including HTML) have the + convention that the value of a form entry with entry name "_charset_" + and type "hidden" is automatically set when the form is opened; the + value is used as the default charset of text field values (see form- + charset in Section 5.1.2). In such cases, the value of the default + charset for each "text/plain" part without a charset parameter is the + supplied value. For example: + + --AaB03x + content-disposition: form-data; name="_charset_" + + iso-8859-1 + --AaB03x-- + content-disposition: form-data; name="field1" + + ...text encoded in iso-8859-1 ... + AaB03x-- + +4.7. Content-Transfer-Encoding Deprecated + + Previously, it was recommended that senders use a Content-Transfer- + Encoding encoding (such as "quoted-printable") for each non-ASCII + part of a multipart/form-data body because that would allow use in + transports that only support a "7bit" encoding. This use is + deprecated for use in contexts that support binary data such as HTTP. + Senders SHOULD NOT generate any parts with a Content-Transfer- + Encoding header field. + + Currently, no deployed implementations that send such bodies have + been discovered. + + + + + +Masinter Standards Track [Page 6] + +RFC 7578 multipart/form-data July 2015 + + +4.8. Other "Content-" Header Fields + + The multipart/form-data media type does not support any MIME header + fields in parts other than Content-Type, Content-Disposition, and (in + limited circumstances) Content-Transfer-Encoding. Other header + fields MUST NOT be included and MUST be ignored. + +5. Operability Considerations + +5.1. Non-ASCII Field Names and Values + + Normally, MIME header fields in multipart bodies are required to + consist only of 7-bit data in the US-ASCII character set. While + [RFC2388] suggested that non-ASCII field names be encoded according + to the method in [RFC2047], this practice doesn't seem to have been + followed widely. + + This specification makes three sets of recommendations for three + different states of workflow. + +5.1.1. Avoid Non-ASCII Field Names + + For broadest interoperability with existing deployed software, those + creating forms SHOULD avoid non-ASCII field names. This should not + be a burden because, in general, the field names are not visible to + users. The field names in the underlying need not match what the + user sees on the screen. + + If non-ASCII field names are unavoidable, form or application + creators SHOULD use UTF-8 uniformly. This will minimize + interoperability problems. + +5.1.2. Interpreting Forms and Creating multipart/form-data Data + + Some applications of this specification will supply a character + encoding to be used for interpretation of the multipart/form-data + body. In particular, HTML 5 [W3C.REC-html5-20141028] uses + + o the content of a "_charset_" field, if there is one; + + o the value of an accept-charset attribute of the
element, if + there is one; + + o the character encoding of the document containing the form, if it + is US-ASCII compatible; + + o otherwise, UTF-8. + + + + +Masinter Standards Track [Page 7] + +RFC 7578 multipart/form-data July 2015 + + + Call this value the form-charset. Any text, whether field name, + field value, or ("text/plain") form data that uses characters outside + the ASCII range MAY be represented directly encoded in the form- + charset. + +5.1.3. Parsing and Interpreting Form Data + + While this specification provides guidance for the creation of + multipart/form-data, parsers and interpreters should be aware of the + variety of implementations. File systems differ as to whether and + how they normalize Unicode names, for example. The matching of form + elements to form-data parts may rely on a fuzzier match. In + particular, some multipart/form-data generators might have followed + the previous advice of [RFC2388] and used the "encoded-word" method + of encoding non-ASCII values, as described in [RFC2047]: + + encoded-word = "=?" charset "?" encoding "?" encoded-text "?=" + + Others have been known to follow [RFC2231], to send unencoded UTF-8, + or even to send strings encoded in the form-charset. + + For this reason, interpreting multipart/form-data (even from + conforming generators) may require knowing the charset used in form + encoding in cases where the _charset_ field value or a charset + parameter of a "text/plain" Content-Type header field is not + supplied. + +5.2. Ordered Fields and Duplicated Field Names + + Form processors given forms with a well-defined ordering SHOULD send + back results in order. (Note that there are some forms that do not + define a natural order.) Intermediaries MUST NOT reorder the + results. Form parts with identical field names MUST NOT be + coalesced. + +5.3. Interoperability with Web Applications + + Many web applications use the "application/x-www-form-urlencoded" + method for returning data from forms. This format is quite compact, + for example: + + name=Xavier+Xantico&verdict=Yes&colour=Blue&happy=sad&Utf%F6r=Send + + However, there is no opportunity to label the enclosed data with a + content type, apply a charset, or use other encoding mechanisms. + + + + + + +Masinter Standards Track [Page 8] + +RFC 7578 multipart/form-data July 2015 + + + Many form-interpreting programs (primarily web browsers) now + implement and generate multipart/form-data, but a receiving + application might also need to support the + "application/x-www-form-urlencoded" format. + +5.4. Correlating Form Data with the Original Form + + This specification provides no specific mechanism by which multipart/ + form-data can be associated with the form that caused it to be + transmitted. This separation is intentional; many different forms + might be used for transmitting the same data. In practice, + applications may supply a specific form processing resource (in HTML, + the ACTION attribute in a FORM tag) for each different form. + Alternatively, data about the form might be encoded in a "hidden + field" (a field that is part of the form but that has a fixed value + to be transmitted back to the form-data processor). + +6. IANA Considerations + + The media type registration of multipart/form-data has been updated + to point to this document, using the template in Section 8. In + addition, the registrations of the "name" parameter and the "form- + data" value in the "Content Disposition Values and Parameters" + registry have been updated to both point to this document. + +7. Security Considerations + + All form-processing software should treat user supplied form-data + with sensitivity, as it often contains confidential or personally + identifying information. There is widespread use of form "auto-fill" + features in web browsers; these might be used to trick users to + unknowingly send confidential information when completing otherwise + innocuous tasks. multipart/form-data does not supply any features + for checking integrity, ensuring confidentiality, avoiding user + confusion, or other security features; those concerns must be + addressed by the form-filling and form-data-interpreting + applications. + + Applications that receive forms and process them must be careful not + to supply data back to the requesting form-processing site that was + not intended to be sent. + + It is important when interpreting the filename of the Content- + Disposition header field to not inadvertently overwrite files in the + recipient's file space. + + + + + + +Masinter Standards Track [Page 9] + +RFC 7578 multipart/form-data July 2015 + + + User applications that request form information from users must be + careful not to cause a user to send information to the requestor or a + third party unwillingly or unwittingly. For example, a form might + request that spam information be sent to an unintended third party or + private information be sent to someone that the user might not + actually intend. While this is primarily an issue for the + representation and interpretation of forms themselves (rather than + the data representation of the form data), the transportation of + private information must be done in a way that does not expose it to + unwanted prying. + + With the introduction of form-data that can reasonably send back the + content of files from a user's file space, the possibility arises + that a user might be sent an automated script that fills out a form + and then sends one of the user's local files to another address. + Thus, additional caution is required when executing automated + scripting where form-data might include a user's files. + + Files sent via multipart/form-data may contain arbitrary executable + content, and precautions against malicious content are necessary. + + The considerations of Sections 2.3 and 5 of [RFC2183], with respect + to the "filename" parameter of the Content-Disposition header field, + also apply to its usage here. + +8. Media Type Registration for multipart/form-data + + This section is the media type registration using the template from + [RFC6838]. + + Type name: multipart + + Subtype name: form-data + + Required parameters: boundary + + Optional parameters: none + + Encoding considerations: Common use is BINARY. + In limited use (or transports that restrict the encoding to 7bit + or 8bit), each part is encoded separately using Content-Transfer- + Encoding; see Section 4.7. + + Security considerations: See Section 7 of this document. + + Interoperability considerations: This document makes several + recommendations for interoperability with deployed + implementations, including Section 4.7. + + + +Masinter Standards Track [Page 10] + +RFC 7578 multipart/form-data July 2015 + + + Published specification: This document. + + Applications that use this media type: Numerous web browsers, + servers, and web applications. + + Fragment identifier considerations: None; fragment identifiers are + not defined for this type. + + Additional information: + + Additional information: + + Deprecated alias names for this type: N/A + Magic number(s): N/A + File extension(s): N/A + Macintosh file type code(s): N/A + + Person & email address to contact for further information: Author of + this document. + + Intended usage: COMMON + + Restrictions on usage: none + + Author: Author of this document. + + Change controller: IETF + + Provisional registration: N/A + +9. References + +9.1. Normative References + + [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail + Extensions (MIME) Part Two: Media Types", RFC 2046, + DOI 10.17487/RFC2046, November 1996, + . + + [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions) + Part Three: Message Header Extensions for Non-ASCII Text", + RFC 2047, DOI 10.17487/RFC2047, November 1996, + . + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + . + + + +Masinter Standards Track [Page 11] + +RFC 7578 multipart/form-data July 2015 + + + [RFC2183] Troost, R., Dorner, S., and K. Moore, Ed., "Communicating + Presentation Information in Internet Messages: The + Content-Disposition Header Field", RFC 2183, + DOI 10.17487/RFC2183, August 1997, + . + + [RFC2231] Freed, N. and K. Moore, "MIME Parameter Value and Encoded + Word Extensions: Character Sets, Languages, and + Continuations", RFC 2231, DOI 10.17487/RFC2231, November + 1997, . + + [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform + Resource Identifier (URI): Generic Syntax", STD 66, + RFC 3986, DOI 10.17487/RFC3986, January 2005, + . + +9.2. Informative References + + [RFC1867] Nebel, E. and L. Masinter, "Form-based File Upload in + HTML", RFC 1867, DOI 10.17487/RFC1867, November 1995, + . + + [RFC2388] Masinter, L., "Returning Values from Forms: multipart/ + form-data", RFC 2388, DOI 10.17487/RFC2388, August 1998, + . + + [RFC5987] Reschke, J., "Character Set and Language Encoding for + Hypertext Transfer Protocol (HTTP) Header Field + Parameters", RFC 5987, DOI 10.17487/RFC5987, August 2010, + . + + [RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type + Specifications and Registration Procedures", BCP 13, + RFC 6838, DOI 10.17487/RFC6838, January 2013, + . + + [URI-SCHEME] + Kerwin, M., "The file URI Scheme", Work in Progress, + draft-ietf-appsawg-file-scheme-02, May 2015. + + [W3C.REC-html32-19970114] + Raggett, D., "HTML 3.2 Reference Specification", W3C + Recommendation REC-html32-19970114, January 1997, + . + + + + + + + +Masinter Standards Track [Page 12] + +RFC 7578 multipart/form-data July 2015 + + + [W3C.REC-html5-20141028] + Hickson, I., Berjon, R., Faulkner, S., Leithead, T., + Navara, E., O'Connor, E., and S. Pfeiffer, "HTML5", W3C + Recommendation REC-html5-20141028, October 2014, + . + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Masinter Standards Track [Page 13] + +RFC 7578 multipart/form-data July 2015 + + +Appendix A. Changes from RFC 2388 + + The handling of non-ASCII field names has changed -- the method + described in RFC 2047 is no longer recommended; instead, it is + suggested that senders send UTF-8 field names directly and that file + names be sent directly in the form-charset. + + The handling of multiple files submitted as the result of a single + form field (e.g., HTML's element) results + in each file having its own top-level part with the same name + parameter; the method of using a nested "multipart/mixed" from + [RFC2388] is no longer recommended for creators and is not required + for receivers as there are no known implementations of senders. + + The _charset_ convention and use of an explicit "form-data" charset + is documented; also, "boundary" is now a required parameter in + Content-Type. + + The relationship of the ordering of fields within a form and the + ordering of returned values within multipart/form-data was not + defined before, nor was the handling of the case where a form has + multiple fields with the same name. + + Various editorial changes were made; they include removing the + obsolete discussion of alternatives from the appendix, updating the + references, and moving the outline of form processing into the + introduction. + +Appendix B. Alternatives + + There are numerous alternative ways in which form data can be + encoded; many are listed in Section 5.2 of [RFC2388]. The multipart/ + form-data encoding is verbose, especially if there are many fields + with short values. In most use cases, this overhead isn't + significant. + + More problematic are the differences introduced when implementors + opted to not follow [RFC2388] when encoding non-ASCII field names + (perhaps because "may" should have been "MUST"). As a result, + parsers need to be more complex for matching against the possible + outputs of various encoding methods. + + + + + + + + + + +Masinter Standards Track [Page 14] + +RFC 7578 multipart/form-data July 2015 + + +Acknowledgements + + Many thanks to the those who reviewed this document -- Alexey + Melnikov, Salvatore Loreto, Chris Lonvick, Kathleen Moriarty, Barry + Leiba, Julian Reschke, Tom Petch, Ned Freed, Cedric Brancourt, as + well as others, including Ian Hickson, who requested it be produced + in the first place. + +Author's Address + + Larry Masinter + Adobe + + Email: masinter@adobe.com + URI: http://larry.masinter.net + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Masinter Standards Track [Page 15] + -- cgit v1.2.3