Diffstat (limited to 'doc/rfc/rfc2184.txt')
-rw-r--r-- doc/rfc/rfc2184.txt 507
1 files changed, 507 insertions, 0 deletions
diff --git a/doc/rfc/rfc2184.txt b/doc/rfc/rfc2184.txt
new file mode 100644
index 0000000..254321a
--- /dev/null
+++ b/doc/rfc/rfc2184.txt
@@ -0,0 +1,507 @@
+
+
+
+
+
+
+Network Working Group N. Freed
+Request for Comments: 2184 Innosoft
+Updates: 2045, 2047, 2183 K. Moore
+Category: Standards Track University of Tennessee
+ August 1997
+
+
+ MIME Parameter Value and Encoded Word Extensions:
+ Character Sets, Languages, and Continuations
+
+Status of this Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+1. Abstract
+
+ This memo defines extensions to the RFC 2045 media type and RFC 2183
+ disposition parameter value mechanisms to provide
+
+ (1) a means to specify parameter values in character sets
+ other than US-ASCII,
+
+ (2) a means to specify the language to be used should the
+ value be displayed, and
+
+ (3) a continuation mechanism for long parameter values to
+ avoid problems with header line wrapping.
+
+ This memo also defines an extension to the encoded words defined in
+ RFC 2047 to allow the specification of the language to be used for
+ display as well as the character set.
+
+2. Introduction
+
+ The Multipurpose Internet Mail Extensions, or MIME [RFC-2045, RFC-
+ 2046, RFC-2047, RFC-2048, RFC-2049], define a message format that
+ allows for
+
+ (1) textual message bodies in character sets other than
+ US-ASCII,
+
+ (2) non-textual message bodies,
+
+ (3) multi-part message bodies, and
+
+
+
+Freed & Moore Standards Track [Page 1]
+
+RFC 2184 MIME Parameter Value and Encoded Word Extensions August 1997
+
+
+ (4) textual header information in character sets other than
+ US-ASCII.
+
+ MIME is now widely deployed and is used by a variety of Internet
+ protocols, including, of course, Internet email. However, MIME's
+ success has resulted in the need for additional mechanisms that were
+ not provided in the original protocol specification.
+
+ In particular, existing MIME mechanisms provide for named media type
+ (content-type field) parameters as well as named disposition
+ (content-disposition field) parameters. A MIME media type may
+ specify any number of parameters associated with all of its
+ subtypes, and any specific subtype may specify additional parameters
+ for its own use. A MIME disposition value may specify any number of
+ associated parameters, the most important of which is probably the
+ attachment disposition's filename parameter.
+
+ These parameter names and values end up appearing in the content-type
+ and content-disposition header fields in Internet email. This
+ inherently imposes three crucial limitations:
+
+ (1) Lines in Internet email header fields are folded according to
+ RFC 822 folding rules. This makes long parameter values
+ problematic.
+
+ (2) MIME headers, like the RFC 822 headers they often appear in,
+ are limited to 7bit US-ASCII, and the encoded-word mechanisms
+ of RFC 2047 are not available to parameter values. This makes
+ it impossible to have parameter values in character sets other
+ than US-ASCII without specifying some sort of private per-
+ parameter encoding.
+
+ (3) It has recently become clear that character set information
+ is not sufficient to properly display some sorts of
+ information -- language information is also needed [RFC-2130].
+ For example, support for handicapped users may require reading a
+ text string aloud. The language the text is written in is
+ needed for this to be done correctly. Some parameter values
+ may need to be displayed, hence there is a need to allow for
+ the inclusion of language information.
+
+ The last problem on this list is also an issue for the encoded words
+ defined by RFC 2047, as encoded words are intended primarily for
+ display purposes.
+
+ This document defines extensions that address all of these
+ limitations. All of these extensions are implemented in a fashion
+ that is completely compatible at a syntactic level with existing MIME
+ implementations. In addition, the extensions are designed to have as
+ little impact as possible on existing uses of MIME.
+
+ IMPORTANT NOTE: These mechanisms end up being somewhat cumbersome when
+ they actually are used. As such, these mechanisms should not be
+ used lightly; they should be reserved for situations where a real
+ need for them exists.
+
+2.1. Requirements notation
+
+ This document occasionally uses terms that appear in capital letters.
+ When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
+ appear capitalized, they are being used to indicate particular
+ requirements of this specification. A discussion of the meanings of
+ these terms appears in [RFC-2119].
+
+
+3. Parameter Value Continuations
+
+ Long MIME media type or disposition parameter values do not interact
+ well with header line wrapping conventions. In particular, proper
+ header line wrapping depends on there being places where linear
+ whitespace (LWSP) is allowed, which may or may not be present in a
+ parameter value, and even if present may not be recognizable as such
+ since specific knowledge of parameter value syntax may not be
+ available to the agent doing the line wrapping. The result is that
+ long parameter values may end up getting truncated or otherwise
+ damaged by incorrect line wrapping implementations.
+
+ A mechanism is therefore needed to break up parameter values into
+ smaller units that are amenable to line wrapping. Any such mechanism
+ MUST be compatible with existing MIME processors. This means that
+
+ (1) the mechanism MUST NOT change the syntax of MIME media
+ type and disposition lines, and
+
+ (2) the mechanism MUST NOT depend on parameter ordering
+ since MIME states that parameters are not order sensitive.
+ Note that while MIME does prohibit modification of MIME
+ headers during transport, it is still possible that parameters
+ will be reordered when user agent level processing is done.
+
+ The obvious solution, then, is to use multiple parameters to contain
+ a single parameter value and to use some kind of distinguished name
+ to indicate when this is being done. And this obvious solution is
+ exactly what is specified here: The asterisk character ("*") followed
+ by a decimal count is employed to indicate that multiple parameters
+ are being used to encapsulate a single parameter value. The count
+ starts at 0 and increments by 1 for each subsequent section of the
+ parameter value. Decimal values are used and neither leading zeroes
+ nor gaps in the sequence are allowed.
+
+ The original parameter value is recovered by concatenating the
+ various sections of the parameter, in order. For example, the
+ content-type field
+
+ Content-Type: message/external-body; access-type=URL;
+ URL*0="ftp://";
+ URL*1="cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"
+
+ is semantically identical to
+
+ Content-Type: message/external-body; access-type=URL;
+ URL="ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"
+
+ Note that quotes around parameter values are part of the value
+ syntax; they are NOT part of the value itself. Furthermore, it is
+ explicitly permitted to have a mixture of quoted and unquoted
+ continuation fields.
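+
+ A minimal sketch of this reassembly rule in Python, assuming the
+ parameters have already been parsed into a mapping from raw
+ parameter name to unquoted value. The helper name
+ join_continuations and the mapping shape are hypothetical and are
+ not defined by this memo.
+
```python
import re

# Matches "name*N" where N is a decimal count without leading zeroes.
_SECTION = re.compile(r"^(?P<name>[^*]+)\*(?P<index>0|[1-9][0-9]*)$")


def join_continuations(params):
    """Merge name*0, name*1, ... sections into single parameter values.

    `params` maps raw parameter names to unquoted values (an assumed
    input shape). Names that are not numbered sections pass through
    unchanged, apart from lower-casing the name.
    """
    sections = {}  # lower-cased name -> {section number: value}
    result = {}
    for raw_name, value in params.items():
        match = _SECTION.match(raw_name)
        if match is None:
            result[raw_name.lower()] = value
            continue
        name = match.group("name").lower()
        sections.setdefault(name, {})[int(match.group("index"))] = value
    for name, parts in sections.items():
        # The count starts at 0 and gaps in the sequence are not allowed.
        if sorted(parts) != list(range(len(parts))):
            raise ValueError(f"gap in continuation sequence for {name!r}")
        result[name] = "".join(parts[i] for i in range(len(parts)))
    return result


# Sections are deliberately listed out of order here.
example = {
    "access-type": "URL",
    "URL*1": "cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar",
    "URL*0": "ftp://",
}
joined = join_continuations(example)
```
+
+ Because sections are looked up by number, the order in which they
+ appear in the header field does not affect the result.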
+
+4. Parameter Value Character Set and Language Information
+
+ Some parameter values may need to be qualified with character set or
+ language information. It is clear that a distinguished parameter
+ name is needed to identify when this information is present along
+ with a specific syntax for the information in the value itself. In
+ addition, a lightweight encoding mechanism is needed to accommodate
+ 8-bit information in parameter values.
+
+ Asterisks ("*") are reused to provide the indicator that language and
+ character set information is present and encoding is being used. A
+ single quote ("'") is used to delimit the character set and language
+ information at the beginning of the parameter value. Percent signs
+ ("%") are used as the encoding flag, which agrees with RFC 2047.
+
+ Specifically, an asterisk at the end of a parameter name acts as an
+ indicator that character set and language information may appear at
+ the beginning of the parameter value. A single quote is used to
+ separate the character set, language, and actual value information in
+ the parameter value string, and a percent sign is used to flag
+ octets encoded in hexadecimal. For example:
+
+ Content-Type: application/x-stuff;
+ title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A
+
+ Note that it is perfectly permissible to leave either the character
+ set or language field blank. Note also that the single quote
+ delimiters MUST be present even when one of the field values is
+ omitted. This is done when either character set, language, or both
+ are not relevant to the parameter value at hand. This MUST NOT be
+ done in order to indicate a default character set or language --
+ parameter field definitions MUST NOT assign a default character set
+ or language.
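+
+ As an illustration only, the value of a parameter whose name ends
+ in an asterisk could be taken apart as sketched below. The function
+ name split_extended_value is hypothetical. Blank fields are
+ reported as None and the text is returned as raw octets, so the
+ sketch does not assume any default character set.
+
```python
from urllib.parse import unquote_to_bytes


def split_extended_value(raw):
    """Split charset'language'text and undo the percent encoding.

    Returns (charset, language, octets). A blank charset or language
    field is returned as None rather than replaced by a default.
    """
    parts = raw.split("'", 2)
    if len(parts) != 3:
        # Both single-quote delimiters must be present, even when a
        # field is left blank.
        raise ValueError("missing single-quote delimiter")
    charset, language, encoded = parts
    return charset or None, language or None, unquote_to_bytes(encoded)


charset, language, octets = split_extended_value(
    "us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A"
)
text = octets.decode(charset)
```
+
+ For the example above this yields the character set "us-ascii",
+ the language "en-us", and the text "This is ***fun***".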
+
+4.1. Combining Character Set, Language, and Parameter Continuations
+
+ Character set and language information may be combined with the
+ parameter continuation mechanism. For example:
+
+ Content-Type: application/x-stuff;
+ title*0*=us-ascii'en'This%20is%20even%20more%20;
+ title*1*=%2A%2A%2Afun%2A%2A%2A%20;
+ title*2="isn't it!"
+
+ Note that:
+
+ (1) Language and character set information only appear at
+ the beginning of a given parameter value.
+
+ (2) Continuations do not provide a facility for using more
+ than one character set or language in the same parameter
+ value.
+
+ (3) A value presented using multiple continuations may
+ contain a mixture of encoded and unencoded segments.
+
+ (4) The first segment of a continuation MUST be encoded if
+ language and character set information are given.
+
+ (5) If the first segment of a continued parameter value is
+ encoded the language and character set field delimiters MUST
+ be present even when the fields are left blank.
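+
+ The notes above can be sketched in Python as follows, assuming
+ that sections are numbered from 0 as Section 3 specifies and that
+ the sections of one parameter are supplied as a mapping from
+ section name to unquoted value. The function name
+ decode_continued_parameter and this input shape are hypothetical.
+
```python
import re
from urllib.parse import unquote_to_bytes

# "name*N" is a plain section; "name*N*" is a percent-encoded section.
_NAME = re.compile(r"^(?P<base>[^*]+)\*(?P<index>0|[1-9][0-9]*)(?P<ext>\*?)$")


def decode_continued_parameter(sections):
    """Reassemble one parameter from {section name: unquoted value}.

    Returns (charset, language, octets); blank or absent charset and
    language information is reported as None.
    """
    parsed = []
    for name, value in sections.items():
        match = _NAME.match(name)
        if match is None:
            raise ValueError(f"not a continuation section: {name!r}")
        parsed.append((int(match.group("index")), bool(match.group("ext")), value))
    parsed.sort()
    if [index for index, _, _ in parsed] != list(range(len(parsed))):
        raise ValueError("sections must be numbered 0..n-1 without gaps")

    charset = language = None
    octets = b""
    for index, encoded, value in parsed:
        if index == 0 and encoded:
            # Only the first section carries charset'language', and the
            # delimiters must be present when that section is encoded.
            charset, language, value = value.split("'", 2)
        # Encoded and unencoded sections may be mixed freely.
        octets += unquote_to_bytes(value) if encoded else value.encode("ascii")
    return charset or None, language or None, octets


charset, language, octets = decode_continued_parameter({
    "title*0*": "us-ascii'en'This%20is%20even%20more%20",
    "title*1*": "%2A%2A%2Afun%2A%2A%2A%20",
    "title*2": "isn't it!",
})
title = octets.decode(charset)
```
+
+ Note that the unencoded last section is taken literally, so its
+ apostrophe is not mistaken for a field delimiter.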
+
+5. Language specification in Encoded Words
+
+ RFC 2047 provides support for non-US-ASCII character sets in RFC 822
+ message header comments, phrases, and any unstructured text field.
+ This is done by defining an encoded word construct which can appear
+ in any of these places. Given that these are fields intended for
+ display, it is sometimes necessary to associate language information
+ with encoded words as well as just the character set. This
+ specification extends the definition of an encoded word to allow the
+ inclusion of such information. This is simply done by suffixing the
+ character set specification with an asterisk followed by the language
+ tag. For example:
+
+ From: =?US-ASCII*EN?Q?Keith_Moore?= <moore@cs.utk.edu>
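+
+ A receiving agent might separate the fields of such an encoded
+ word as sketched below. The regular expression and the function
+ name split_encoded_word are illustrative assumptions. Decoding of
+ the encoded text itself is left to an RFC 2047 decoder.
+
```python
import re

# charset, optional "*language", encoding (B or Q), then encoded text.
_ENCODED_WORD = re.compile(
    r"^=\?(?P<charset>[^?*]+)(?:\*(?P<language>[^?*]+))?"
    r"\?(?P<encoding>[BbQq])\?(?P<text>[^?]*)\?=$"
)


def split_encoded_word(word):
    """Return (charset, language, encoding, encoded_text) for one word.

    `language` is None when no "*language" suffix is present.
    """
    match = _ENCODED_WORD.match(word)
    if match is None:
        raise ValueError(f"not an encoded word: {word!r}")
    return (
        match.group("charset"),
        match.group("language"),
        match.group("encoding").upper(),
        match.group("text"),
    )


with_language = split_encoded_word("=?US-ASCII*EN?Q?Keith_Moore?=")
without_language = split_encoded_word("=?US-ASCII?Q?Keith_Moore?=")
```
+
+ An encoded word without the suffix still parses, with the language
+ reported as absent.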
+
+6. IMAP4 Handling of Parameter Values
+
+ IMAP4 [RFC-2060] servers SHOULD decode parameter value continuations
+ when generating the BODY and BODYSTRUCTURE fetch attributes.
+
+7. Modifications to MIME ABNF
+
+ The ABNF for MIME parameter values given in RFC 2045 is:
+
+ parameter := attribute "=" value
+
+ attribute := token
+ ; Matching of attributes
+ ; is ALWAYS case-insensitive.
+
+ This specification changes this ABNF to:
+
+ parameter := regular-parameter / extended-parameter
+
+ regular-parameter := regular-parameter-name "=" value
+
+ regular-parameter-name := attribute [section]
+
+ attribute := 1*attribute-char
+
+ attribute-char := <any (US-ASCII) CHAR except SPACE, CTLs,
+ "*", "'", "%", or tspecials>
+
+ section := initial-section / other-sections
+
+ initial-section := "*0"
+
+ other-sections := "*" ("1" / "2" / "3" / "4" / "5" /
+ "6" / "7" / "8" / "9") *DIGIT
+
+ extended-parameter := (extended-initial-name "="
+ extended-initial-value) /
+ (extended-other-names "="
+ extended-other-values)
+
+ extended-initial-name := attribute [initial-section] "*"
+
+ extended-other-names := attribute other-sections "*"
+
+ extended-initial-value := [charset] "'" [language] "'"
+ extended-other-values
+
+ extended-other-values := *(ext-octet / attribute-char)
+
+ ext-octet := "%" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
+
+ charset := <registered character set name>
+
+ language := <registered language tag [RFC-1766]>
+
+ The ABNF given in RFC 2047 for encoded-words is:
+
+ encoded-word := "=?" charset "?" encoding "?" encoded-text "?="
+
+ This specification changes this ABNF to:
+
+ encoded-word := "=?" charset ["*" language] "?" encoding "?"
+ encoded-text "?="
+
+
+8. Character sets which allow specification of language
+
+ In the future it is likely that some character sets will provide
+ facilities for inline language labelling. Such facilities are
+ inherently more flexible than those defined here as they allow for
+ language switching in the middle of a string.
+
+ If and when such facilities are developed they SHOULD be used in
+ preference to the language labelling facilities specified here. Note
+ that all the mechanisms defined here allow for the omission of
+ language labels so as to be able to accommodate this possible future
+ usage.
+
+9. Security Considerations
+
+ This RFC does not discuss security issues and is not believed to
+ raise any security issues not already endemic in electronic mail and
+ present in fully conforming implementations of MIME.
+
+10. References
+
+ [RFC-822]
+ Crocker, D., "Standard for the Format of ARPA Internet Text
+ Messages", STD 11, RFC 822, August 1982.
+
+ [RFC-1766]
+ Alvestrand, H., "Tags for the Identification of Languages", RFC
+ 1766, March 1995.
+
+ [RFC-2045]
+ Freed, N. and Borenstein, N., "Multipurpose Internet Mail
+ Extensions (MIME) Part One: Format of Internet Message Bodies",
+ RFC 2045, Innosoft, First Virtual Holdings, December 1996.
+
+ [RFC-2046]
+ Freed, N. and Borenstein, N., "Multipurpose Internet Mail
+ Extensions (MIME) Part Two: Media Types", RFC 2046, Innosoft,
+ First Virtual Holdings, December 1996.
+
+ [RFC-2047]
+ Moore, K., "Multipurpose Internet Mail Extensions (MIME) Part
+ Three: Representation of Non-ASCII Text in Internet Message
+ Headers", RFC 2047, University of Tennessee, December 1996.
+
+ [RFC-2048]
+ Freed, N., Klensin, J., Postel, J., "Multipurpose Internet Mail
+ Extensions (MIME) Part Four: MIME Registration Procedures", RFC
+ 2048, Innosoft, MCI, ISI, December 1996.
+
+ [RFC-2049]
+ Freed, N. and Borenstein, N., "Multipurpose Internet Mail
+ Extensions (MIME) Part Five: Conformance Criteria and Examples",
+ RFC 2049, Innosoft, First Virtual Holdings, December 1996.
+
+ [RFC-2060]
+ Crispin, M., "Internet Message Access Protocol - Version 4rev1",
+ RFC 2060, December 1996.
+
+ [RFC-2119]
+ Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", RFC 2119, March 1997.
+
+ [RFC-2130]
+ Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson,
+ R., Crispin, M., Svanberg, P., "Report from the IAB Character Set
+ Workshop", RFC 2130, April 1997.
+
+ [RFC-2183]
+ Troost, R., Dorner, S., and Moore, K., "Communicating Presentation
+ Information in Internet Messages: The Content-Disposition
+ Header", RFC 2183, August 1997.
+
+11. Authors' Addresses
+
+ Ned Freed
+ Innosoft International, Inc.
+ 1050 East Garvey Avenue South
+ West Covina, CA 91790
+ USA
+ tel: +1 818 919 3600 fax: +1 818 919 3614
+ email: ned@innosoft.com
+
+ Keith Moore
+ Computer Science Dept.
+ University of Tennessee
+ 107 Ayres Hall
+ Knoxville, TN 37996-1301
+ USA
+ email: moore@cs.utk.edu