summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc1867.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc1867.txt')
-rw-r--r--doc/rfc/rfc1867.txt731
1 files changed, 731 insertions, 0 deletions
diff --git a/doc/rfc/rfc1867.txt b/doc/rfc/rfc1867.txt
new file mode 100644
index 0000000..9c43522
--- /dev/null
+++ b/doc/rfc/rfc1867.txt
@@ -0,0 +1,731 @@
+
+
+
+
+
+
+Network Working Group E. Nebel
+Request For Comments: 1867 L. Masinter
+Category: Experimental Xerox Corporation
+ November 1995
+
+
+ Form-based File Upload in HTML
+
+Status of this Memo
+
+ This memo defines an Experimental Protocol for the Internet
+ community. This memo does not specify an Internet standard of any
+ kind. Discussion and suggestions for improvement are requested.
+ Distribution of this memo is unlimited.
+
+1. Abstract
+
+ Currently, HTML forms allow the producer of the form to request
+ information from the user reading the form. These forms have proven
+ useful in a wide variety of applications in which input from the user
+ is necessary. However, this capability is limited because HTML forms
+ don't provide a way to ask the user to submit files of data. Service
+ providers who need to get files from the user have had to implement
+ custom user applications. (Examples of these custom browsers have
+ appeared on the www-talk mailing list.) Since file-upload is a
+ feature that will benefit many applications, this proposes an
+ extension to HTML to allow information providers to express file
+ upload requests uniformly, and a MIME compatible representation for
+ file upload responses. This also includes a description of a
+ backward compatibility strategy that allows new servers to interact
+ with the current HTML user agents.
+
+ The proposal is independent of which version of HTML it becomes a
+ part.
+
+2. HTML forms with file submission
+
+ The current HTML specification defines eight possible values for the
+ attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE,
+ PASSWORD, RADIO, RESET, SUBMIT, TEXT.
+
+ In addition, it defines the default ENCTYPE attribute of the FORM
+ element using the POST METHOD to have the default value
+ "application/x-www-form-urlencoded".
+
+
+
+
+
+
+
+Nebel & Masinter Experimental [Page 1]
+
+RFC 1867 Form-based File Upload in HTML November 1995
+
+
+ This proposal makes two changes to HTML:
+
+ 1) Add a FILE option for the TYPE attribute of INPUT.
+ 2) Allow an ACCEPT attribute for INPUT tag, which is a list of
+ media types or type patterns allowed for the input.
+
+ In addition, it defines a new MIME media type, multipart/form-data,
+ and specifies the behavior of HTML user agents when interpreting a
+ form with ENCTYPE="multipart/form-data" and/or <INPUT type="file">
+ tags.
+
+ These changes might be considered independently, but are all
+ necessary for reasonable file upload.
+
+ The author of an HTML form who wants to request one or more files
+ from a user would write (for example):
+
+ <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>
+
+ File to process: <INPUT NAME="userfile1" TYPE="file">
+
+ <INPUT TYPE="submit" VALUE="Send File">
+
+ </FORM>
+
+ The change to the HTML DTD is to add one item to the entity
+ "InputType". In addition, it is proposed that the INPUT tag have an
+ ACCEPT attribute, which is a list of comma-separated media types.
+
+ ... (other elements) ...
+
+ <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
+ RADIO | SUBMIT | RESET |
+ IMAGE | HIDDEN | FILE )">
+ <!ELEMENT INPUT - 0 EMPTY>
+ <!ATTLIST INPUT
+ TYPE %InputType TEXT
+ NAME CDATA #IMPLIED -- required for all but submit and reset
+ VALUE CDATA #IMPLIED
+ SRC %URI #IMPLIED -- for image inputs --
+ CHECKED (CHECKED) #IMPLIED
+ SIZE CDATA #IMPLIED --like NUMBERS,
+ but delimited with comma, not space
+ MAXLENGTH NUMBER #IMPLIED
+ ALIGN (top|middle|bottom) #IMPLIED
+ ACCEPT CDATA #IMPLIED --list of content types
+ >
+
+
+
+
+Nebel & Masinter Experimental [Page 2]
+
+RFC 1867 Form-based File Upload in HTML November 1995
+
+
+ ... (other elements) ...
+
+3. Suggested implementation
+
+ While user agents that interpret HTML have wide leeway to choose the
+ most appropriate mechanism for their context, this section suggests
+ how one class of user agent, WWW browsers, might implement file
+ upload.
+
+3.1 Display of FILE widget
+
+ When a INPUT tag of type FILE is encountered, the browser might show
+ a display of (previously selected) file names, and a "Browse" button
+ or selection method. Selecting the "Browse" button would cause the
+ browser to enter into a file selection mode appropriate for the
+ platform. Window-based browsers might pop up a file selection window,
+ for example. In such a file selection dialog, the user would have the
+ option of replacing a current selection, adding a new file selection,
+ etc. Browser implementors might choose let the list of file names be
+ manually edited.
+
+ If an ACCEPT attribute is present, the browser might constrain the
+ file patterns prompted for to match those with the corresponding
+ appropriate file extensions for the platform.
+
+3.2 Action on submit
+
+ When the user completes the form, and selects the SUBMIT element, the
+ browser should send the form data and the content of the selected
+ files. The encoding type application/x-www-form-urlencoded is
+ inefficient for sending large quantities of binary data or text
+ containing non-ASCII characters. Thus, a new media type,
+ multipart/form-data, is proposed as a way of efficiently sending the
+ values associated with a filled-out form from client to server.
+
+3.3 use of multipart/form-data
+
+ The definition of multipart/form-data is included in section 7. A
+ boundary is selected that does not occur in any of the data. (This
+ selection is sometimes done probabilisticly.) Each field of the form
+ is sent, in the order in which it occurs in the form, as a part of
+ the multipart stream. Each part identifies the INPUT name within the
+ original HTML form. Each part should be labelled with an appropriate
+ content-type if the media type is known (e.g., inferred from the file
+ extension or operating system typing information) or as
+ application/octet-stream.
+
+
+
+
+
+Nebel & Masinter Experimental [Page 3]
+
+RFC 1867 Form-based File Upload in HTML November 1995
+
+
+ If multiple files are selected, they should be transferred together
+ using the multipart/mixed format.
+
+ While the HTTP protocol can transport arbitrary BINARY data, the
+ default for mail transport (e.g., if the ACTION is a "mailto:" URL)
+ is the 7BIT encoding. The value supplied for a part may need to be
+ encoded and the "content-transfer-encoding" header supplied if the
+ value does not conform to the default encoding. [See section 5 of
+ RFC 1521 for more details.]
+
+ The original local file name may be supplied as well, either as a
+ 'filename' parameter either of the 'content-disposition: form-data'
+ header or in the case of multiple files in a 'content-disposition:
+ file' header of the subpart. The client application should make best
+ effort to supply the file name; if the file name of the client's
+ operating system is not in US-ASCII, the file name might be
+ approximated or encoded using the method of RFC 1522. This is a
+ convenience for those cases where, for example, the uploaded files
+ might contain references to each other, e.g., a TeX file and its .sty
+ auxiliary style description.
+
+ On the server end, the ACTION might point to a HTTP URL that
+ implements the forms action via CGI. In such a case, the CGI program
+ would note that the content-type is multipart/form-data, parse the
+ various fields (checking for validity, writing the file data to local
+ files for subsequent processing, etc.).
+
+3.4 Interpretation of other attributes
+
+ The VALUE attribute might be used with <INPUT TYPE=file> tags for a
+ default file name. This use is probably platform dependent. It might
+ be useful, however, in sequences of more than one transaction, e.g.,
+ to avoid having the user prompted for the same file name over and
+ over again.
+
+ The SIZE attribute might be specified using SIZE=width,height, where
+ width is some default for file name width, while height is the
+ expected size showing the list of selected files. For example, this
+ would be useful for forms designers who expect to get several files
+ and who would like to show a multiline file input field in the
+ browser (with a "browse" button beside it, hopefully). It would be
+ useful to show a one line text field when no height is specified
+ (when the forms designer expects one file, only) and to show a
+ multiline text area with scrollbars when the height is greater than 1
+ (when the forms designer expects multiple files).
+
+
+
+
+
+
+Nebel & Masinter Experimental [Page 4]
+
+RFC 1867 Form-based File Upload in HTML November 1995
+
+
+4. Backward compatibility issues
+
+ While not necessary for successful adoption of an enhancement to the
+ current WWW form mechanism, it is useful to also plan for a migration
+ strategy: users with older browsers can still participate in file
+ upload dialogs, using a helper application. Most current web browers,
+ when given <INPUT TYPE=FILE>, will treat it as <INPUT TYPE=TEXT> and
+ give the user a text box. The user can type in a file name into this
+ text box. In addition, current browsers seem to ignore the ENCTYPE
+ parameter in the <FORM> element, and always transmit the data as
+ application/x-www-form-urlencoded.
+
+ Thus, the server CGI might be written in a way that would note that
+ the form data returned had content-type application/x-www-form-
+ urlencoded instead of multipart/form-data, and know that the user was
+ using a browser that didn't implement file upload.
+
+ In this case, rather than replying with a "text/html" response, the
+ CGI on the server could instead send back a data stream that a helper
+ application might process instead; this would be a data stream of
+ type "application/x-please-send-files", which contains:
+
+ * The (fully qualified) URL to which the actual form data should
+ be posted (terminated with CRLF)
+ * The list of field names that were supposed to be file contents
+ (space separated, terminated with CRLF)
+ * The entire original application/x-www-form-urlencoded form data
+ as originally sent from client to server.
+
+ In this case, the browser needs to be configured to process
+ application/x-please-send-files to launch a helper application.
+
+ The helper would read the form data, note which fields contained
+ 'local file names' that needed to be replaced with their data
+ content, might itself prompt the user for changing or adding to the
+ list of files available, and then repackage the data & file contents
+ in multipart/form-data for retransmission back to the server.
+
+ The helper would generate the kind of data that a 'new' browser
+ should actually have sent in the first place, with the intention that
+ the URL to which it is sent corresponds to the original ACTION URL.
+ The point of this is that the server can use the *same* CGI to
+ implement the mechanism for dealing with both old and new browsers.
+
+ The helper need not display the form data, but *should* ensure that
+ the user actually be prompted about the suitability of sending the
+ files requested (this is to avoid a security problem with malicious
+ servers that ask for files that weren't actually promised by the
+
+
+
+Nebel & Masinter Experimental [Page 5]
+
+RFC 1867 Form-based File Upload in HTML November 1995
+
+
+ user.) It would be useful if the status of the transfer of the files
+ involved could be displayed.
+
+5. Other considerations
+
+5.1 Compression, encryption
+
+ This scheme doesn't address the possible compression of files. After
+ some consideration, it seemed that the optimization issues of file
+ compression were too complex to try to automatically have browsers
+ decide that files should be compressed. Many link-layer transport
+ mechanisms (e.g., high-speed modems) perform data compression over
+ the link, and optimizing for compression at this layer might not be
+ appropriate. It might be possible for browsers to optionally produce
+ a content-transfer-encoding of x-compress for file data, and for
+ servers to decompress the data before processing, if desired; this
+ was left out of the proposal, however.
+
+ Similarly, the proposal does not contain a mechanism for encryption
+ of the data; this should be handled by whatever other mechanisms are
+ in place for secure transmission of data, whether via secure HTTP or
+ mail.
+
+5.2 Deferred file transmission
+
+ In some situations, it might be advisable to have the server validate
+ various elements of the form data (user name, account, etc.) before
+ actually preparing to receive the data. However, after some
+ consideration, it seemed best to require that servers that wish to do
+ this should implement this as a series of forms, where some of the
+ data elements that were previously validated might be sent back to
+ the client as 'hidden' fields, or by arranging the form so that the
+ elements that need validation occur first. This puts the onus of
+ maintaining the state of a transaction only on those servers that
+ wish to build a complex application, while allowing those cases that
+ have simple input needs to be built simply.
+
+ The HTTP protocol may require a content-length for the overall
+ transmission. Even if it were not to do so, HTTP clients are
+ encouraged to supply content-length for overall file input so that a
+ busy server could detect if the proposed file data is too large to be
+ processed reasonably and just return an error code and close the
+ connection without waiting to process all of the incoming data. Some
+ current implementations of CGI require a content-length in all POST
+ transactions.
+
+ If the INPUT tag includes the attribute MAXLENGTH, the user agent
+ should consider its value to represent the maximum Content-Length (in
+
+
+
+Nebel & Masinter Experimental [Page 6]
+
+RFC 1867 Form-based File Upload in HTML November 1995
+
+
+ bytes) which the server will accept for transferred files. In this
+ way, servers can hint to the client how much space they have
+ available for a file upload, before that upload takes place. It is
+ important to note, however, that this is only a hint, and the actual
+ requirements of the server may change between form creation and file
+ submission.
+
+ In any case, a HTTP server may abort a file upload in the middle of
+ the transaction if the file being received is too large.
+
+5.3 Other choices for return transmission of binary data
+
+ Various people have suggested using new mime top-level type
+ "aggregate", e.g., aggregate/mixed or a content-transfer-encoding of
+ "packet" to express indeterminate-length binary data, rather than
+ relying on the multipart-style boundaries. While we are not opposed
+ to doing so, this would require additional design and standardization
+ work to get acceptance of "aggregate". On the other hand, the
+ 'multipart' mechanisms are well established, simple to implement on
+ both the sending client and receiving server, and as efficient as
+ other methods of dealing with multiple combinations of binary data.
+
+5.4 Not overloading <INPUT>:
+
+ Various people have wondered about the advisability of overloading
+ 'INPUT' for this function, rather than merely providing a different
+ type of FORM element. Among other considerations, the migration
+ strategy which is allowed when using <INPUT> is important. In
+ addition, the <INPUT> field *is* already overloaded to contain most
+ kinds of data input; rather than creating multiple kinds of <INPUT>
+ tags, it seems most reasonable to enhance <INPUT>. The 'type' of
+ INPUT is not the content-type of what is returned, but rather the
+ 'widget-type'; i.e., it identifies the interaction style with the
+ user. The description here is carefully written to allow <INPUT
+ TYPE=FILE> to work for text browsers or audio-markup.
+
+5.5 Default content-type of field data
+
+ Many input fields in HTML are to be typed in. There has been some
+ ambiguity as to how form data should be transmitted back to servers.
+ Making the content-type of <INPUT> fields be text/plain clearly
+ disambiguates that the client should properly encode the data before
+ sending it back to the server with CRLFs.
+
+5.6 Allow form ACTION to be "mailto:"
+
+ Independent of this proposal, it would be very useful for HTML
+ interpreting user agents to allow a ACTION in a form to be a
+
+
+
+Nebel & Masinter Experimental [Page 7]
+
+RFC 1867 Form-based File Upload in HTML November 1995
+
+
+ "mailto:" URL. This seems like a good idea, with or without this
+ proposal. Similarly, the ACTION for a HTML form which is received via
+ mail should probably default to the "reply-to:" of the message.
+ These two proposals would allow HTML forms to be served via HTTP
+ servers but sent back via mail, or, alternatively, allow HTML forms
+ to be sent by mail, filled out by HTML-aware mail recipients, and the
+ results mailed back.
+
+5.7 Remote files with third-party transfer
+
+ In some scenarios, the user operating the client software might want
+ to specify a URL for remote data rather than a local file. In this
+ case, is there a way to allow the browser to send to the client a
+ pointer to the external data rather than the entire contents? This
+ capability could be implemented, for example, by having the client
+ send to the server data of type "message/external-body" with
+ "access-type" set to, say, "uri", and the URL of the remote data in
+ the body of the message.
+
+5.8 File transfer with ENCTYPE=x-www-form-urlencoded
+
+ If a form contains <INPUT TYPE=file> elements but does not contain an
+ ENCTYPE in the enclosing <FORM>, the behavior is not specified. It
+ is probably inappropriate to attempt to URN-encode large quantities
+ of data to servers that don't expect it.
+
+5.9 CRLF used as line separator
+
+ As with all MIME transmissions, CRLF is used as the separator for
+ lines in a POST of the data in multipart/form-data.
+
+5.10 Relationship to multipart/related
+
+ The MIMESGML group is proposing a new type called multipart/related.
+ While it contains similar features to multipart/form-data, the use
+ and application of form-data is different enough that form-data is
+ being described separately.
+
+ It might be possible at some point to encode the result of HTML forms
+ (including files) in a multipart/related body part; this is not
+ incompatible with this proposal.
+
+5.11 Non-ASCII field names
+
+ Note that mime headers are generally required to consist only of 7-
+ bit data in the US-ASCII character set. Hence field names should be
+ encoded according to the prescriptions of RFC 1522 if they contain
+ characters outside of that set. In HTML 2.0, the default character
+
+
+
+Nebel & Masinter Experimental [Page 8]
+
+RFC 1867 Form-based File Upload in HTML November 1995
+
+
+ set is ISO-8859-1, but non-ASCII characters in field names should be
+ encoded.
+
+6. Examples
+
+ Suppose the server supplies the following HTML:
+
+ <FORM ACTION="http://server.dom/cgi/handle"
+ ENCTYPE="multipart/form-data"
+ METHOD=POST>
+ What is your name? <INPUT TYPE=TEXT NAME=submitter>
+ What files are you sending? <INPUT TYPE=FILE NAME=pics>
+ </FORM>
+
+ and the user types "Joe Blow" in the name field, and selects a text
+ file "file1.txt" for the answer to 'What files are you sending?'
+
+ The client might send back the following data:
+
+ Content-type: multipart/form-data, boundary=AaB03x
+
+ --AaB03x
+ content-disposition: form-data; name="field1"
+
+ Joe Blow
+ --AaB03x
+ content-disposition: form-data; name="pics"; filename="file1.txt"
+ Content-Type: text/plain
+
+ ... contents of file1.txt ...
+ --AaB03x--
+
+ If the user also indicated an image file "file2.gif" for the answer
+ to 'What files are you sending?', the client might client might send
+ back the following data:
+
+ Content-type: multipart/form-data, boundary=AaB03x
+
+ --AaB03x
+ content-disposition: form-data; name="field1"
+
+ Joe Blow
+ --AaB03x
+ content-disposition: form-data; name="pics"
+ Content-type: multipart/mixed, boundary=BbC04y
+
+ --BbC04y
+ Content-disposition: attachment; filename="file1.txt"
+
+
+
+Nebel & Masinter Experimental [Page 9]
+
+RFC 1867 Form-based File Upload in HTML November 1995
+
+
+ Content-Type: text/plain
+
+ ... contents of file1.txt ...
+ --BbC04y
+ Content-disposition: attachment; filename="file2.gif"
+ Content-type: image/gif
+ Content-Transfer-Encoding: binary
+
+ ...contents of file2.gif...
+ --BbC04y--
+ --AaB03x--
+
+7. Registration of multipart/form-data
+
+ The media-type multipart/form-data follows the rules of all multipart
+ MIME data streams as outlined in RFC 1521. It is intended for use in
+ returning the data that comes about from filling out a form. In a
+ form (in HTML, although other applications may also use forms), there
+ are a series of fields to be supplied by the user who fills out the
+ form. Each field has a name. Within a given form, the names are
+ unique.
+
+ multipart/form-data contains a series of parts. Each part is expected
+ to contain a content-disposition header where the value is "form-
+ data" and a name attribute specifies the field name within the form,
+ e.g., 'content-disposition: form-data; name="xxxxx"', where xxxxx is
+ the field name corresponding to that field. Field names originally in
+ non-ASCII character sets may be encoded using the method outlined in
+ RFC 1522.
+
+ As with all multipart MIME types, each part has an optional Content-
+ Type which defaults to text/plain. If the contents of a file are
+ returned via filling out a form, then the file input is identified as
+ application/octet-stream or the appropriate media type, if known. If
+ multiple files are to be returned as the result of a single form
+ entry, they can be returned as multipart/mixed embedded within the
+ multipart/form-data.
+
+ Each part may be encoded and the "content-transfer-encoding" header
+ supplied if the value of that part does not conform to the default
+ encoding.
+
+ File inputs may also identify the file name. The file name may be
+ described using the 'filename' parameter of the "content-disposition"
+ header. This is not required, but is strongly recommended in any case
+ where the original filename is known. This is useful or necessary in
+ many applications.
+
+
+
+
+Nebel & Masinter Experimental [Page 10]
+
+RFC 1867 Form-based File Upload in HTML November 1995
+
+
+8. Security Considerations
+
+ It is important that a user agent not send any file that the user has
+ not explicitly asked to be sent. Thus, HTML interpreting agents are
+ expected to confirm any default file names that might be suggested
+ with <INPUT TYPE=file VALUE="yyyy">. Never have any hidden fields be
+ able to specify any file.
+
+ This proposal does not contain a mechanism for encryption of the
+ data; this should be handled by whatever other mechanisms are in
+ place for secure transmission of data, whether via secure HTTP, or by
+ security provided by MOSS (described in RFC 1848).
+
+ Once the file is uploaded, it is up to the receiver to process and
+ store the file appropriately.
+
+9. Conclusion
+
+ The suggested implementation gives the client a lot of flexibility in
+ the number and types of files it can send to the server, it gives the
+ server control of the decision to accept the files, and it gives
+ servers a chance to interact with browsers which do not support INPUT
+ TYPE "file".
+
+ The change to the HTML DTD is very simple, but very powerful. It
+ enables a much greater variety of services to be implemented via the
+ World-Wide Web than is currently possible due to the lack of a file
+ submission facility. This would be an extremely valuable addition to
+ the capabilities of the World-Wide Web.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Nebel & Masinter Experimental [Page 11]
+
+RFC 1867 Form-based File Upload in HTML November 1995
+
+
+Authors' Addresses
+
+ Larry Masinter
+ Xerox Palo Alto Research Center
+ 3333 Coyote Hill Road
+ Palo Alto, CA 94304
+
+ Phone: (415) 812-4365
+ Fax: (415) 812-4333
+ EMail: masinter@parc.xerox.com
+
+
+ Ernesto Nebel
+ XSoft, Xerox Corporation
+ 10875 Rancho Bernardo Road, Suite 200
+ San Diego, CA 92127-2116
+
+ Phone: (619) 676-7817
+ Fax: (619) 676-7865
+ EMail: nebel@xsoft.sd.xerox.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Nebel & Masinter Experimental [Page 12]
+
+RFC 1867 Form-based File Upload in HTML November 1995
+
+
+A. Media type registration for multipart/form-data
+
+Media Type name:
+ multipart
+
+Media subtype name:
+ form-data
+
+Required parameters:
+ none
+
+Optional parameters:
+ none
+
+Encoding considerations:
+ No additional considerations other than as for other multipart types.
+
+Published specification:
+ RFC 1867
+
+Security Considerations
+
+ The multipart/form-data type introduces no new security
+ considerations beyond what might occur with any of the enclosed
+ parts.
+
+References
+
+[RFC 1521] MIME (Multipurpose Internet Mail Extensions) Part One:
+ Mechanisms for Specifying and Describing the Format of
+ Internet Message Bodies. N. Borenstein & N. Freed.
+ September 1993.
+
+[RFC 1522] MIME (Multipurpose Internet Mail Extensions) Part Two:
+ Message Header Extensions for Non-ASCII Text. K. Moore.
+ September 1993.
+
+[RFC 1806] Communicating Presentation Information in Internet
+ Messages: The Content-Disposition Header. R. Troost & S.
+ Dorner, June 1995.
+
+
+
+
+
+
+
+
+
+
+
+Nebel & Masinter Experimental [Page 13]
+