diff options
Diffstat (limited to 'doc/rfc/rfc2184.txt')
-rw-r--r-- | doc/rfc/rfc2184.txt | 507 |
1 files changed, 507 insertions, 0 deletions
diff --git a/doc/rfc/rfc2184.txt b/doc/rfc/rfc2184.txt new file mode 100644 index 0000000..254321a --- /dev/null +++ b/doc/rfc/rfc2184.txt @@ -0,0 +1,507 @@ + + + + + + +Network Working Group N. Freed +Request for Comments: 2184 Innosoft +Updates: 2045, 2047, 2183 K. Moore +Category: Standards Track University of Tennessee + August 1997 + + + MIME Parameter Value and Encoded Word Extensions: + Character Sets, Languages, and Continuations + +Status of this Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +1. Abstract + + This memo defines extensions to the RFC 2045 media type and RFC 2183 + disposition parameter value mechanisms to provide + + (1) a means to specify parameter values in character sets + other than US-ASCII, + + (2) to specify the language to be used should the value be + displayed, and + + (3) a continuation mechanism for long parameter values to + avoid problems with header line wrapping. + + This memo also defines an extension to the encoded words defined in + RFC 2047 to allow the specification of the language to be used for + display as well as the character set. + +2. Introduction + + The Multipurpose Internet Mail Extensions, or MIME [RFC-2045, RFC- + 2046, RFC-2047, RFC-2048, RFC-2049], define a message format that + allows for + + (1) textual message bodies in character sets other than + US-ASCII, + + (2) non-textual message bodies, + + (3) multi-part message bodies, and + + + +Freed & Moore Standards Track [Page 1] + +RFC 2184 MIME Parameter Value and Encoded Word Extensions August 1997 + + + (4) textual header information in character sets other than + US-ASCII. + + MIME is now widely deployed and is used by a variety of Internet + protocols, including, of course, Internet email. However, MIME's + success has resulted in the need for additional mechanisms that were + not provided in the original protocol specification. + + In particular, existing MIME mechanisms provide for named media type + (content-type field) parameters as well as named disposition + (content-disposition field). A MIME media type may specify any + number of parameters associated with all of its subtypes, and any + specific subtype may specify additional parameters for its own use. A + MIME disposition value may specify any number of associated + parameters, the most important of which is probably the attachment + disposition's filename parameter. + + These parameter names and values end up appearing in the content-type + and content-disposition header fields in Internet email. This + inherently imposes three crucial limitations: + + (1) Lines in Internet email header fields are folded according to + RFC 822 folding rules. This makes long parameter values + problematic. + + (2) MIME headers, like the RFC 822 headers they often appear in, + are limited to 7bit US-ASCII, and the encoded-word mechanisms + of RFC 2047 are not available to parameter values. This makes + it impossible to have parameter values in character sets other + than US-ASCII without specifying some sort of private per- + parameter encoding. + + (3) It has recently become clear that character set information + is not sufficient to properly display some sorts of + information -- language information is also needed [RFC-2130]. + For example, support for handicapped users may require reading + text string aloud. The language the text is written in is + needed for this to be done correctly. Some parameter values + may need to be displayed, hence there is a need to allow for + the inclusion of language information. + + The last problem on this list is also an issue for the encoded words + defined by RFC 2047, as encoded words are intended primarily for + display purposes. + + + + + + + +Freed & Moore Standards Track [Page 2] + +RFC 2184 MIME Parameter Value and Encoded Word Extensions August 1997 + + + This document defines extensions that address all of these + limitations. All of these extensions are implemented in a fashion + that is completely compatible at a syntactic level with existing MIME + implementations. In addition, the extensions are designed to have as + little impact as possible on existing uses of MIME. + + IMPORTANT NOTE: These mechanisms end up being somewhat gibbous when + they actually are used. As such, use of these mechanisms should not + be used lightly; they should be reserved for situations where a real + need for them exists. + +2.1. Requirements notation + + This document occasionally uses terms that appear in capital letters. + When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY" + appear capitalized, they are being used to indicate particular + requirements of this specification. A discussion of the meanings of + these terms appears in [RFC-2119]. + + +3. Parameter Value Continuations + + Long MIME media type or disposition parameter values do not interact + well with header line wrapping conventions. In particular, proper + header line wrapping depends on there being places where linear + whitespace (LWSP) is allowed, which may or may not be present in a + parameter value, and even if present may not be recognizable as such + since specific knowledge of parameter value syntax may not be + available to the agent doing the line wrapping. The result is that + long parameter values may end up getting truncated or otherwise + damaged by incorrect line wrapping implementations. + + A mechanism is therefore needed to break up parameter values into + smaller units that are amenable to line wrapping. Any such mechanism + MUST be compatible with existing MIME processors. This means that + + (1) the mechanism MUST NOT change the syntax of MIME media + type and disposition lines, and + + (2) the mechanism MUST NOT depend on parameter ordering + since MIME states that parameters are not order sensitive. + Note that while MIME does prohibit modification of MIME + headers during transport, it is still possible that parameters + will be reordered when user agent level processing is done. + + + + + + + +Freed & Moore Standards Track [Page 3] + +RFC 2184 MIME Parameter Value and Encoded Word Extensions August 1997 + + + The obvious solution, then, is to use multiple parameters to contain + a single parameter value and to use some kind of distinguished name + to indicate when this is being done. And this obvious solution is + exactly what is specified here: The asterisk character ("*") followed + by a decimal count is employed to indicate that multiple parameters + are being used to encapsulate a single parameter value. The count + starts at 0 and increments by 1 for each subsequent section of the + parameter value. Decimal values are used and neither leading zeroes + nor gaps in the sequence are allowed. + + The original parameter value is recovered by concatenating the + various sections of the parameter, in order. For example, the + content-type field + + Content-Type: message/external-body; access-type=URL; + URL*0="ftp://"; + URL*1="cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar" + + is semantically identical to + + Content-Type: message/external-body; access-type=URL; + URL="ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar" + + Note that quotes around parameter values are part of the value + syntax; they are NOT part of the value itself. Furthermore, it is + explicitly permitted to have a mixture of quoted and unquoted + continuation fields. + +4. Parameter Value Character Set and Language Information + + Some parameter values may need to be qualified with character set or + language information. It is clear that a distinguished parameter + name is needed to identify when this information is present along + with a specific syntax for the information in the value itself. In + addition, a lightweight encoding mechanism is needed to accomodate 8 + bit information in parameter values. + + Asterisks ("*") are reused to provide the indicator that language and + character set information is present and encoding is being used. A + single quote ("'") is used to delimit the character set and language + information at the beginning of the parameter value. Percent signs + ("%") are used as the encoding flag, which agrees with RFC 2047. + + + + + + + + + +Freed & Moore Standards Track [Page 4] + +RFC 2184 MIME Parameter Value and Encoded Word Extensions August 1997 + + + Specifically, an asterisk at the end of a parameter name acts as an + indicator that character set and language information may appear at + the beginning of the parameter value. A single quote is used to + separate the character set, language, and actual value information in + the parameter value string, and an percent sign is used to flag + octets encoded in hexadecimal. For example: + + Content-Type: application/x-stuff; + title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A + + Note that it is perfectly permissible to leave either the character + set or language field blank. Note also that the single quote + delimiters MUST be present even when one of the field values is + omitted. This is done when either character set, language, or both + are not relevant to the parameter value at hand. This MUST NOT be + done in order to indicate a default character set or language -- + parameter field definitions MUST NOT assign a default character set + or lanugage. + +4.1. Combining Character Set, Language, and Parameter Continuations + + Character set and language information may be combined with the + parameter continuation mechanism. For example: + + Content-Type: application/x-stuff + title*1*=us-ascii'en'This%20is%20even%20more%20 + title*2*=%2A%2A%2Afun%2A%2A%2A%20 + title*3="isn't it!" + + Note that: + + (1) Language and character set information only appear at + the beginning of a given parameter value. + + (2) Continuations do not provide a facility for using more + than one character set or language in the same parameter + value. + + (3) A value presented using multiple continuations may + contain a mixture of encoded and unencoded segments. + + (4) The first segment of a continuation MUST be encoded if + language and character set information are given. + + (5) If the first segment of a continued parameter value is + encoded the language and character set field delimiters MUST + be present even when the fields are left blank. + + + + +Freed & Moore Standards Track [Page 5] + +RFC 2184 MIME Parameter Value and Encoded Word Extensions August 1997 + + +5. Language specification in Encoded Words + + RFC 2047 provides support for non-US-ASCII character sets in RFC 822 + message header comments, phrases, and any unstructured text field. + This is done by defining an encoded word construct which can appear + in any of these places. Given that these are fields intended for + display, it is sometimes necessary to associate language information + with encoded words as well as just the character set. This + specification extends the definition of an encoded word to allow the + inclusion of such information. This is simply done by suffixing the + character set specification with an asterisk followed by the language + tag. For example: + + From: =?US-ASCII*EN?Q?Keith_Moore?= <moore@cs.utk.edu> + +6. IMAP4 Handling of Parameter Values + + IMAP4 [RFC-2060] servers SHOULD decode parameter value continuations + when generating the BODY and BODYSTRUCTURE fetch attributes. + +7. Modifications to MIME ABNF + + The ABNF for MIME parameter values given in RFC 2045 is: + + parameter := attribute "=" value + + attribute := token + ; Matching of attributes + ; is ALWAYS case-insensitive. + + This specification changes this ABNF to: + + parameter := regular-parameter / extended-parameter + + regular-parameter := regular-parameter-name "=" value + + regular-parameter-name := attribute [section] + + attribute := 1*attribute-char + + attribute-char := <any (US-ASCII) CHAR except SPACE, CTLs, + "*", "'", "%", or tspecials> + + section := initial-section / other-sections + + initial-section := "*1" + + + + + +Freed & Moore Standards Track [Page 6] + +RFC 2184 MIME Parameter Value and Encoded Word Extensions August 1997 + + + other-sections := "*" (("2" / "3" / "4" / "5" / + "6" / "7" / "8" / "9") *DIGIT) / + ("1" 1*DIGIT)) + + extended-parameter := (extended-initial-name "=" + extended-value) / + (extended-other-names "=" + extended-other-values) + + extended-initial-name := attribute [initial-section] "*" + + extended-other-names := attribute other-sections "*" + + extended-initial-value := [charset] "'" [language] "'" + extended-other-values + + extended-other-values := *(ext-octet / attribute-char) + + ext-octet := "%" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F") + + charset := <registered character set name> + + language := <registered language tag [RFC-1766]> + + The ABNF given in RFC 2047 for encoded-words is: + + encoded-word := "=?" charset "?" encoding "?" encoded-text "?=" + + This specification changes this ABNF to: + + encoded-word := "=?" charset ["*" language] "?" encoded-text "?=" + + +8. Character sets which allow specification of language + + In the future it is likely that some character sets will provide + facilities for inline language labelling. Such facilities are + inherently more flexible than those defined here as they allow for + language switching in the middle of a string. + + If and when such facilities are developed they SHOULD be used in + preference to the language labelling facilities specified here. Note + that all the mechanisms defined here allow for the omission of + language labels so as to be able to accomodate this possible future + usage. + + + + + + +Freed & Moore Standards Track [Page 7] + +RFC 2184 MIME Parameter Value and Encoded Word Extensions August 1997 + + +9. Security Considerations + + This RFC does not discuss security issues and is not believed to + raise any security issues not already endemic in electronic mail and + present in fully conforming implementations of MIME. + +10. References + + [RFC-822] + Crocker, D., "Standard for the Format of ARPA Internet Text + Messages", STD 11, RFC 822, August 1982. + + [RFC-1766] + Alvestrand, H., "Tags for the Identification of Languages", RFC + 1766, March 1995. + + [RFC-2045] + Freed, N. and Borenstein, N., "Multipurpose Internet Mail + Extensions (MIME) Part One: Format of Internet Message Bodies", + RFC 2045, Innosoft, First Virtual Holdings, December 1996. + + [RFC-2046] + Freed, N. and Borenstein, N., "Multipurpose Internet Mail + Extensions (MIME) Part Two: Media Types", RFC 2046, Innosoft, + First Virtual Holdings, December 1996. + + [RFC-2047] + Moore, K., "Multipurpose Internet Mail Extensions (MIME) Part + Three: Representation of Non-ASCII Text in Internet Message + Headers", RFC 2047, University of Tennessee, December 1996. + + [RFC-2048] + Freed, N., Klensin, J., Postel, J., "Multipurpose Internet Mail + Extensions (MIME) Part Four: MIME Registration Procedures", RFC + 2048, Innosoft, MCI, ISI, December 1996. + + [RFC-2049] + Freed, N. and Borenstein, N., "Multipurpose Internet Mail + Extensions (MIME) Part Five: Conformance Criteria and Examples", + RFC 2049, Innosoft, FIrst Virtual Holdings, December 1996. + + [RFC-2060] + Crispin, M., "Internet Message Access Protocol - Version 4rev1", + RFC 2060, December 1996. + + [RFC-2119] + Bradner, S., "Key words for use in RFCs to Indicate Requirement + Levels", RFC 2119, March 1997. + + + +Freed & Moore Standards Track [Page 8] + +RFC 2184 MIME Parameter Value and Encoded Word Extensions August 1997 + + + [RFC-2130] + Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson, + R., Crispin, M., Svanberg, P., "Report from the IAB Character Set + Workshop", RFC 2130, April 1997. + + [RFC-2183] + Troost, R., Dorner, S., and Moore, K., "Communicating Presentation + Information in Internet Messages: The Content-Disposition + Header", RFC 2183, August 1997. + +11. Authors' Addresses + + Ned Freed + Innosoft International, Inc. + 1050 East Garvey Avenue South + West Covina, CA 91790 + USA + tel: +1 818 919 3600 fax: +1 818 919 3614 + email: ned@innosoft.com + + Keith Moore + Computer Science Dept. + University of Tennessee + 107 Ayres Hall + Knoxville, TN 37996-1301 + USA + email: moore@cs.utk.edu + + + + + + + + + + + + + + + + + + + + + + + + +Freed & Moore Standards Track [Page 9] + |