1 files changed, 507 insertions, 0 deletions
diff --git a/doc/rfc/rfc2184.txt b/doc/rfc/rfc2184.txt
new file mode 100644
index 0000000..254321a
--- /dev/null
+++ b/doc/rfc/rfc2184.txt
@@ -0,0 +1,507 @@
+
+
+
+
+
+
+Network Working Group                                         N. Freed
+Request for Comments: 2184                                    Innosoft
+Updates: 2045, 2047, 2183                                     K. Moore
+Category: Standards Track                      University of Tennessee
+                                                           August 1997
+
+
+           MIME Parameter Value and Encoded Word Extensions:
+              Character Sets, Languages, and Continuations
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+1.  Abstract
+
+   This memo defines extensions to the RFC 2045 media type and RFC 2183
+   disposition parameter value mechanisms to provide
+
+    (1)   a means to specify parameter values in character sets
+          other than US-ASCII,
+
+    (2)   a means to specify the language to be used should the
+          value be displayed, and
+
+    (3)   a continuation mechanism for long parameter values to
+          avoid problems with header line wrapping.
+
+   This memo also defines an extension to the encoded words defined in
+   RFC 2047 to allow the specification of the language to be used for
+   display as well as the character set.
+
+2.  Introduction
+
+   The Multipurpose Internet Mail Extensions, or MIME [RFC-2045, RFC-
+   2046, RFC-2047, RFC-2048, RFC-2049], define a message format that
+   allows for
+
+    (1)   textual message bodies in character sets other than
+          US-ASCII,
+
+    (2)   non-textual message bodies,
+
+    (3)   multi-part message bodies, and
+
+
+
+Freed & Moore               Standards Track                     [Page 1]
+
+RFC 2184    MIME Parameter Value and Encoded Word Extensions August 1997
+
+
+    (4)   textual header information in character sets other than
+          US-ASCII.
+
+   MIME is now widely deployed and is used by a variety of Internet
+   protocols, including, of course, Internet email.  However, MIME's
+   success has resulted in the need for additional mechanisms that were
+   not provided in the original protocol specification.
+
+   In particular, existing MIME mechanisms provide for named media type
+   (content-type field) parameters as well as named disposition
+   (content-disposition field) parameters.  A MIME media type may
+   specify any number of parameters associated with all of its
+   subtypes, and any specific subtype may specify additional parameters
+   for its own use. A MIME disposition value may specify any number of
+   associated parameters, the most important of which is probably the
+   attachment disposition's filename parameter.
+
+   These parameter names and values end up appearing in the content-type
+   and content-disposition header fields in Internet email.  This
+   inherently imposes three crucial limitations:
+
+    (1)   Lines in Internet email header fields are folded according to
+          RFC 822 folding rules.  This makes long parameter values
+          problematic.
+
+    (2)   MIME headers, like the RFC 822 headers they often appear in,
+          are limited to 7bit US-ASCII, and the encoded-word mechanisms
+          of RFC 2047 are not available to parameter values.  This makes
+          it impossible to have parameter values in character sets other
+          than US-ASCII without specifying some sort of private per-
+          parameter encoding.
+
+    (3)   It has recently become clear that character set information
+          is not sufficient to properly display some sorts of
+          information -- language information is also needed [RFC-2130].
+          For example, support for handicapped users may require reading
+          text strings aloud. The language the text is written in is
+          needed for this to be done correctly.  Some parameter values
+          may need to be displayed, hence there is a need to allow for
+          the inclusion of language information.
+
+   The last problem on this list is also an issue for the encoded words
+   defined by RFC 2047, as encoded words are intended primarily for
+   display purposes.
+
+
+
+
+
+
+
+Freed & Moore               Standards Track                     [Page 2]
+
+RFC 2184    MIME Parameter Value and Encoded Word Extensions August 1997
+
+
+   This document defines extensions that address all of these
+   limitations. All of these extensions are implemented in a fashion
+   that is completely compatible at a syntactic level with existing MIME
+   implementations. In addition, the extensions are designed to have as
+   little impact as possible on existing uses of MIME.
+
+   IMPORTANT NOTE: These mechanisms end up being somewhat gibbous when
+   they actually are used. As such, these mechanisms should not be used
+   lightly; they should be reserved for situations where a real need
+   for them exists.
+
+2.1.  Requirements notation
+
+   This document occasionally uses terms that appear in capital letters.
+   When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
+   appear capitalized, they are being used to indicate particular
+   requirements of this specification. A discussion of the meanings of
+   these terms appears in [RFC-2119].
+
+
+3.  Parameter Value Continuations
+
+   Long MIME media type or disposition parameter values do not interact
+   well with header line wrapping conventions.  In particular, proper
+   header line wrapping depends on there being places where linear
+   whitespace (LWSP) is allowed, which may or may not be present in a
+   parameter value, and even if present may not be recognizable as such
+   since specific knowledge of parameter value syntax may not be
+   available to the agent doing the line wrapping. The result is that
+   long parameter values may end up getting truncated or otherwise
+   damaged by incorrect line wrapping implementations.
+
+   A mechanism is therefore needed to break up parameter values into
+   smaller units that are amenable to line wrapping. Any such mechanism
+   MUST be compatible with existing MIME processors. This means that
+
+    (1)   the mechanism MUST NOT change the syntax of MIME media
+          type and disposition lines, and
+
+    (2)   the mechanism MUST NOT depend on parameter ordering
+          since MIME states that parameters are not order sensitive.
+          Note that while MIME does prohibit modification of MIME
+          headers during transport, it is still possible that parameters
+          will be reordered when user agent level processing is done.
+
+
+
+
+
+
+
+Freed & Moore               Standards Track                     [Page 3]
+
+RFC 2184    MIME Parameter Value and Encoded Word Extensions August 1997
+
+
+   The obvious solution, then, is to use multiple parameters to contain
+   a single parameter value and to use some kind of distinguished name
+   to indicate when this is being done.  And this obvious solution is
+   exactly what is specified here: The asterisk character ("*") followed
+   by a decimal count is employed to indicate that multiple parameters
+   are being used to encapsulate a single parameter value.  The count
+   starts at 0 and increments by 1 for each subsequent section of the
+   parameter value.  Decimal values are used and neither leading zeroes
+   nor gaps in the sequence are allowed.
+
+   The original parameter value is recovered by concatenating the
+   various sections of the parameter, in order.  For example, the
+   content-type field
+
+     Content-Type: message/external-body; access-type=URL;
+      URL*0="ftp://";
+      URL*1="cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"
+
+   is semantically identical to
+
+     Content-Type: message/external-body; access-type=URL;
+      URL="ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"
+
+   Note that quotes around parameter values are part of the value
+   syntax; they are NOT part of the value itself.  Furthermore, it is
+   explicitly permitted to have a mixture of quoted and unquoted
+   continuation fields.
+
+4.  Parameter Value Character Set and Language Information
+
+   Some parameter values may need to be qualified with character set or
+   language information.  It is clear that a distinguished parameter
+   name is needed to identify when this information is present along
+   with a specific syntax for the information in the value itself.  In
+   addition, a lightweight encoding mechanism is needed to accommodate
+   8-bit information in parameter values.
+
+   Asterisks ("*") are reused to provide the indicator that language and
+   character set information is present and encoding is being used. A
+   single quote ("'") is used to delimit the character set and language
+   information at the beginning of the parameter value. Percent signs
+   ("%") are used as the encoding flag, which agrees with RFC 2047.
+
+
+
+
+
+
+
+
+
+Freed & Moore               Standards Track                     [Page 4]
+
+RFC 2184    MIME Parameter Value and Encoded Word Extensions August 1997
+
+
+   Specifically, an asterisk at the end of a parameter name acts as an
+   indicator that character set and language information may appear at
+   the beginning of the parameter value. A single quote is used to
+   separate the character set, language, and actual value information in
+   the parameter value string, and a percent sign is used to flag
+   octets encoded in hexadecimal.  For example:
+
+     Content-Type: application/x-stuff;
+      title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A
+
+   Note that it is perfectly permissible to leave either the character
+   set or language field blank.  Note also that the single quote
+   delimiters MUST be present even when one of the field values is
+   omitted.  This is done when either character set, language, or both
+   are not relevant to the parameter value at hand.  This MUST NOT be
+   done in order to indicate a default character set or language --
+   parameter field definitions MUST NOT assign a default character set
+   or language.
+
+4.1.  Combining Character Set, Language, and Parameter Continuations
+
+   Character set and language information may be combined with the
+   parameter continuation mechanism. For example:
+
+   Content-Type: application/x-stuff;
+    title*0*=us-ascii'en'This%20is%20even%20more%20;
+    title*1*=%2A%2A%2Afun%2A%2A%2A%20;
+    title*2="isn't it!"
+
+   Note that:
+
+    (1)   Language and character set information only appear at
+          the beginning of a given parameter value.
+
+    (2)   Continuations do not provide a facility for using more
+          than one character set or language in the same parameter
+          value.
+
+    (3)   A value presented using multiple continuations may
+          contain a mixture of encoded and unencoded segments.
+
+    (4)   The first segment of a continuation MUST be encoded if
+          language and character set information are given.
+
+    (5)   If the first segment of a continued parameter value is
+          encoded the language and character set field delimiters MUST
+          be present even when the fields are left blank.
+
+
+
+
+Freed & Moore               Standards Track                     [Page 5]
+
+RFC 2184    MIME Parameter Value and Encoded Word Extensions August 1997
+
+
+5.  Language specification in Encoded Words
+
+   RFC 2047 provides support for non-US-ASCII character sets in RFC 822
+   message header comments, phrases, and any unstructured text field.
+   This is done by defining an encoded word construct which can appear
+   in any of these places.  Given that these are fields intended for
+   display, it is sometimes necessary to associate language information
+   with encoded words as well as just the character set.  This
+   specification extends the definition of an encoded word to allow the
+   inclusion of such information.  This is simply done by suffixing the
+   character set specification with an asterisk followed by the language
+   tag.  For example:
+
+        From: =?US-ASCII*EN?Q?Keith_Moore?= <moore@cs.utk.edu>
+
+6.  IMAP4 Handling of Parameter Values
+
+   IMAP4 [RFC-2060] servers SHOULD decode parameter value continuations
+   when generating the BODY and BODYSTRUCTURE fetch attributes.
+
+7.  Modifications to MIME ABNF
+
+   The ABNF for MIME parameter values given in RFC 2045 is:
+
+   parameter := attribute "=" value
+
+   attribute := token
+                ; Matching of attributes
+                ; is ALWAYS case-insensitive.
+
+   This specification changes this ABNF to:
+
+   parameter := regular-parameter / extended-parameter
+
+   regular-parameter := regular-parameter-name "=" value
+
+   regular-parameter-name := attribute [section]
+
+   attribute := 1*attribute-char
+
+   attribute-char := <any (US-ASCII) CHAR except SPACE, CTLs,
+                     "*", "'", "%", or tspecials>
+
+   section := initial-section / other-sections
+
+   initial-section := "*0"
+
+
+
+
+
+Freed & Moore               Standards Track                     [Page 6]
+
+RFC 2184    MIME Parameter Value and Encoded Word Extensions August 1997
+
+
+   other-sections := "*" ("1" / "2" / "3" / "4" / "5" /
+                          "6" / "7" / "8" / "9") *DIGIT
+                     ; Leading zeroes are not allowed.
+
+   extended-parameter := (extended-initial-name "="
+                          extended-initial-value) /
+                         (extended-other-names "="
+                          extended-other-values)
+
+   extended-initial-name := attribute [initial-section] "*"
+
+   extended-other-names := attribute other-sections "*"
+
+   extended-initial-value := [charset] "'" [language] "'"
+                             extended-other-values
+
+   extended-other-values := *(ext-octet / attribute-char)
+
+   ext-octet := "%" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
+
+   charset := <registered character set name>
+
+   language := <registered language tag [RFC-1766]>
+
+   The ABNF given in RFC 2047 for encoded-words is:
+
+   encoded-word := "=?" charset "?" encoding "?" encoded-text "?="
+
+   This specification changes this ABNF to:
+
+   encoded-word := "=?" charset ["*" language] "?" encoding "?"
+                   encoded-text "?="
+
+8.  Character sets which allow specification of language
+
+   In the future it is likely that some character sets will provide
+   facilities for inline language labelling. Such facilities are
+   inherently more flexible than those defined here as they allow for
+   language switching in the middle of a string.
+
+   If and when such facilities are developed they SHOULD be used in
+   preference to the language labelling facilities specified here. Note
+   that all the mechanisms defined here allow for the omission of
+   language labels so as to be able to accommodate this possible future
+   usage.
+
+
+
+
+
+
+Freed & Moore               Standards Track                     [Page 7]
+
+RFC 2184    MIME Parameter Value and Encoded Word Extensions August 1997
+
+
+9.  Security Considerations
+
+   This RFC does not discuss security issues and is not believed to
+   raise any security issues not already endemic in electronic mail and
+   present in fully conforming implementations of MIME.
+
+10.  References
+
+   [RFC-822]
+      Crocker, D., "Standard for the Format of ARPA Internet Text
+      Messages", STD 11, RFC 822, August 1982.
+
+   [RFC-1766]
+      Alvestrand, H., "Tags for the Identification of Languages", RFC
+      1766, March 1995.
+
+   [RFC-2045]
+      Freed, N. and Borenstein, N., "Multipurpose Internet Mail
+      Extensions (MIME) Part One: Format of Internet Message Bodies",
+      RFC 2045, Innosoft, First Virtual Holdings, December 1996.
+
+   [RFC-2046]
+      Freed, N. and Borenstein, N., "Multipurpose Internet Mail
+      Extensions (MIME) Part Two: Media Types", RFC 2046, Innosoft,
+      First Virtual Holdings, December 1996.
+
+   [RFC-2047]
+      Moore, K., "Multipurpose Internet Mail Extensions (MIME) Part
+      Three: Representation of Non-ASCII Text in Internet Message
+      Headers", RFC 2047, University of Tennessee, December 1996.
+
+   [RFC-2048]
+      Freed, N., Klensin, J., Postel, J., "Multipurpose Internet Mail
+      Extensions (MIME) Part Four: MIME Registration Procedures", RFC
+      2048, Innosoft, MCI, ISI, December 1996.
+
+   [RFC-2049]
+      Freed, N. and Borenstein, N., "Multipurpose Internet Mail
+      Extensions (MIME) Part Five: Conformance Criteria and Examples",
+      RFC 2049, Innosoft, First Virtual Holdings, December 1996.
+
+   [RFC-2060]
+      Crispin, M., "Internet Message Access Protocol - Version 4rev1",
+      RFC 2060, December 1996.
+
+   [RFC-2119]
+      Bradner, S., "Key words for use in RFCs to Indicate Requirement
+      Levels", RFC 2119, March 1997.
+
+
+
+Freed & Moore               Standards Track                     [Page 8]
+
+RFC 2184    MIME Parameter Value and Encoded Word Extensions August 1997
+
+
+   [RFC-2130]
+      Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson,
+      R., Crispin, M., Svanberg, P., "Report from the IAB Character Set
+      Workshop", RFC 2130, April 1997.
+
+   [RFC-2183]
+      Troost, R., Dorner, S., and Moore, K., "Communicating Presentation
+      Information in Internet Messages:  The Content-Disposition
+      Header", RFC 2183, August 1997.
+
+11.  Authors' Addresses
+
+   Ned Freed
+   Innosoft International, Inc.
+   1050 East Garvey Avenue South
+   West Covina, CA 91790
+   USA
+    tel: +1 818 919 3600           fax: +1 818 919 3614
+    email: ned@innosoft.com
+
+   Keith Moore
+   Computer Science Dept.
+   University of Tennessee
+   107 Ayres Hall
+   Knoxville, TN 37996-1301
+   USA
+    email: moore@cs.utk.edu
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Freed & Moore               Standards Track                     [Page 9]
+