Diffstat (limited to 'doc/rfc/rfc1342.txt')
-rw-r--r-- | doc/rfc/rfc1342.txt | 395 |
1 files changed, 395 insertions, 0 deletions
diff --git a/doc/rfc/rfc1342.txt b/doc/rfc/rfc1342.txt new file mode 100644 index 0000000..5315aff --- /dev/null +++ b/doc/rfc/rfc1342.txt @@ -0,0 +1,395 @@ + + + + + + +Network Working Group K. Moore +Request for Comments: 1342 University of Tennessee + June 1992 + + + Representation of Non-ASCII Text in Internet Message Headers + +Status of this Memo + + This RFC specifies an IAB standards track protocol for the Internet + community, and requests discussion and suggestions for improvements. + Please refer to the current edition of the "IAB Official Protocol + Standards" for the standardization state and status of this protocol. + Distribution of this memo is unlimited. + +Abstract + + This memo describes an extension to the message format defined in [1] + (known to the IETF Mail Extensions Working Group as "RFC 1341"), to + allow the representation of character sets other than ASCII in RFC + 822 message headers. The extensions described were designed to be + highly compatible with existing Internet mail handling software, and + to be easily implemented in mail readers that support RFC 1341. + +Introduction + + RFC 1341 describes a mechanism for denoting textual body parts which + are coded in various character sets, as well as methods for encoding + such body parts as sequences of printable ASCII characters. This + memo describes similar techniques to allow the encoding of non-ASCII + text in various portions of a RFC 822 [2] message header, in a manner + which is unlikely to confuse existing message handling software. + + Like the encoding techniques described in RFC 1341, the techniques + outlined here were designed to allow the use of non-ASCII characters + in message headers in a way which is unlikely to be disturbed by the + quirks of existing Internet mail handling programs. 
In particular, + some mail relaying programs are known to (a) delete some message + header fields while retaining others, (b) rearrange the order of + addresses in To or Cc fields, (c) rearrange the (vertical) order of + header fields, and/or (d) "wrap" message headers at different places + than those in the original message. In addition, some mail reading + programs are known to have difficulty correctly parsing message + headers which, while legal according to RFC 822, make use of + backslash-quoting to "hide" special characters such as "<", ",", or + ":", or which exploit other infrequently-used features of that + specification. + + + + +Moore [Page 1] + +RFC 1342 Non-ASCII Mail Headers June 1992 + + + While it is unfortunate that these programs do not correctly + interpret RFC 822 headers, to "break" these programs would cause + severe operational problems for the Internet mail system. The + extensions described in this memo therefore do not rely on little- + used features of RFC 822. Instead, certain sequences of "ordinary" + printable ASCII characters (which are assumed to be unlikely to + otherwise appear in message headers) are reserved for use as encoded + data. The characters used in these encodings are restricted to those + which do not have special meanings in the context in which the + encoded text appears. + +Encodings + + An "encoded-word" is a sequence of printable ASCII characters that + begins with "=?", ends with "?=", and has two "?"s in between. It + specifies a character set and an encoding method, and also includes + the original text encoded as ASCII characters, according to the rules + for that encoding method. + + A mail composer that implements this specification will provide a + means of inputing non-ASCII text in header fields, but will translate + these fields (or appropriate portions of these fields) into encoded- + words before inserting them into the message header. 
+ + A mail reader that implements this specification will recognize + encoded-words when they appear in certain portions of the message + header. Instead of displaying the encoded-word "as is", it will + reverse the encoding and display the original text in the designated + character set. + + An "encoded-word" is more precisely defined by the following EBNF + grammar, using the notation of RFC 822: + + encoded-word = "=" "?" charset "?" encoding "?" encoded-text "?" "=" + + charset = token ; legal charsets defined by RFC 1341 + + encoding = token ; Either "B" or "Q" + + token = 1*<Any CHAR except SPACE, CTLs, and tspecials> + + tspecials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" / + <"> / "/" / "[" / "]" / "?" / "." / "=" + + encoded-text = 1*<Any printable ASCII character other than "?" or + ; SPACE> (but see "Use of encoded-words in message + ; headers", below) + + + + +Moore [Page 2] + +RFC 1342 Non-ASCII Mail Headers June 1992 + + + An encoded-word may not be more than 75 characters long, including + charset, encoding, encoded-text, and delimiters. If it is desirable + to encode more text than will fit in an encoded-word of 75 + characters, multiple encoded-words (separated by SPACE or newline) + may be used. Message header lines that contain one or more encoded- + words should be no more than 76 characters long. NOTE: These + restrictions are included not only to ease interoperbility through + internetwork mail gateways, but also to impose a limit on the amount + of lookahead a header parser must employ (while looking for a final + ?= delimiter) before it can decide whether a token is an encoded-word + or something else. + + Initially, the legal values for "encoding" are "Q" and "B". These + encodings are described below. The "Q" encoding is recommended for + use with Latin character sets, and the "B" encoding for all others. 
+ Nevertheless, a mail reader which claims to recognize encoded-words + MUST be able to accept either encoding for any character set which it + supports. + + Only a subset of the printable ASCII characters may be used in + encoded-text. The SPACE character is not allowed, so that the + beginning and end of an encoded-word are obvious. The "?" character + is used within an encoded-word to separate the various portions of + the encoded-word from one another, and thus cannot appear in the + encoded-text portion. Other characters are also illegal in certain + contexts. For example, an encoded-word in a "phrase" preceeding an + address in a From header field may not contain any of the "specials" + defined in RFC 822. Finally, certain other characters are disallowed + in some contexts, to ensure reliability for messages that pass + through internetwork mail gateways. + + The "B" encoding automatically meets these requirements. The "Q" + encoding allows a wide range of printable characters to be used in + non-critical locations in the message header (e.g., Subject), with + fewer characters available for use in other locations. + +The "B" encoding + + The "B" encoding is identical to the "BASE64" encoding defined by RFC + 1341. + +The "Q" encoding + + The "Q" encoding is similar to the "Quoted-Printable" content- + transfer-encoding defined in RFC 1341. It is designed to allow text + containing mostly ASCII characters to be decipherable on an ASCII + terminal without decoding. + + + + +Moore [Page 3] + +RFC 1342 Non-ASCII Mail Headers June 1992 + + + 1. Any 8-bit value may be represented by a "=" followed by two + hexadecimal digits. For example, if the character set in use + were ISO-8859-1, the "=" character would thus be encoded as + "=3D", and a SPACE by "=20". + + 2. The 8-bit hexadecimal value 20 (e.g., IS0-8859-1 SPACE) may be + represented as "_" (underscore, ASCII 95.). 
(This character may + not pass through some internetwork mail gateways, but its use + will greatly enhance readability of "Q" encoded data with mail + readers that do not support this encoding.) Note that the "_" + always represents hexadecimal 20, even if the SPACE character + occupies a different code position in the character set in use. + + 3. 8-bit values which correspond to printable ASCII characters other + than "=", "?", "_" (underscore), and SPACE may be represented as + those characters. (But see "Use of encoded-words in message + headers", below). + +Character sets + + In an encoded-word, the character set associated with the unencoded + text is specified by a charset. A charset can be any of the + character set names allowed in an RFC 1341 "charset" parameter of a + "text/plain" body part. (See section 7.1.1 of RFC 1341 for a list of + valid charset parameters). + + When there is a possibility of using more than one character set to + represent the text in an encoded-word, and in the absence of private + agreements between sender and recipients of a message, it is + recommended that members of the ISO-8859-* series be used in + preference to other character sets. Among the various ISO-8859-* + character sets, the lowest-numbered set which contains all of the + required characters should be used. + +Use of encoded-words in message headers + + A sequence of one or more encoded-words is used to represent non- + ASCII textual data within a header field. An encoded-word must be + separated from an adjacent encoded-word, "word", "text", "ctext", or + "special" by a linear white-space character or a newline. When + displaying a particular header field" (in the RFC 822 sense) + containing one or more encoded-words, an unencoded SPACE character + that immediately follows the encoded-word is not displayed. A + newline that immediately follows an encoded-word is not displayed + unless the encoded-word is the last token in that "field". 
(This is + to allow the use of multiple encoded-words to represent long strings + of unencoded text, without having to separate encoded-words where + spaces occur in the unencoded text.) + + + +Moore [Page 4] + +RFC 1342 Non-ASCII Mail Headers June 1992 + + + An encoded-word may appear in a message header or body part header + according to the following rules: + +- An encoded-word may replace a "text" token (as defined by RFC 822) in: + (1) a Subject or Comments header field, (2) any extension message + header field, (3) any user-defined message header field, or (4) any + RFC 1341 body part header field (such as Content-Description) for + which the field body contains only "text"s. + +- An encoded-word may appear within a comment delimited by "(" and ")", + i.e., wherever a "ctext" is allowed. More precisely, the RFC 822 EBNF + definition for "comment" is amended as follows: + + comment = "(" *(ctext / quoted-pair / comment / encoded-word) ")" + + A "Q"-encoded encoded-word which appears in a comment MUST NOT contain + the characters "(", ")" or "\". + +- As a replacement for a "word" entity within a "phrase", for example, + one that precedes an address in a From, To, or Cc header. The EBNF + definition for phrase from RFC 822 thus becomes: + + phrase = 1*(encoded-word / word) + + In this case the set of characters that may be used in a "Q"-encoded + encoded-word is restricted to: <upper and lower case ASCII letters, + decimal digits, "!", "*", "+", "-", "/", "=", and "_" (underscore, + ASCII 95.)>. + + These are the ONLY locations where an encoded-word may appear. In + particular, an encoded-word MUST NOT appear in any portion of an + "address". In addition, an encoded-word MUST NOT be used in a + Received header field. + + Whenever such words appear in a header being displayed, an enlightened + mail reader will decode the text and render it appropriately. + + Only textual data (printable and white space characters) should be + encoded using this scheme. 
However, since these encoding schemes + allow the encoding of arbitrary 8-bit values, mail readers that + implement this decoding should also ensure that display of the + decoded data on the recipient's terminal will not cause unwanted + side-effects. + + Use of these methods to encode non-textual data (e.g., pictures or + sounds) is not defined by this memo. Use of encoded-words to + represent strings of purely ASCII characters is allowed, but + discouraged. + + + +Moore [Page 5] + +RFC 1342 Non-ASCII Mail Headers June 1992 + + +Recognition of encoded-words in message headers. + + An encoded-word may be distinguished from an ordinary "word", "text", + or "ctext", as follows: An encoded-word begins with "=?", ends with + "?=", contains exactly four "?" characters including the delimiters, + and is followed by a SPACE or newline. If the "word", "text", or + "ctext" does not meet the above tests, it should be displayed as it + appears in the message header. + + If the mail reader does not support the character set used, it may + either display the encoded-word as ordinary text (i.e., as it appears + in the header), or it may substitute an appropriate message + indicating that the decoded text could not be displayed. + +Conformance + + A mail composing program claiming compliance with this specification + MUST ensure that any string of printable ASCII characters in a + message header that begins with "=?" and ends with "?=" be a valid + encoded-word. + + A mail reading program claiming compliance with this specification + must be able to distinguish encoded-words from "text", "ctext", or + "word"s anytime they appear in appropriate places in message headers. + The program must be able to display unencoded text if the character + set is "US-ASCII". For the ISO-8859-* character sets, the mail + reading program must at least be able to display the characters which + are also in the ASCII set. 
+ +Examples + + From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu> + To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk> + CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be> + Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?= + =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?= + + From: =?ISO-8859-1?Q?Olle_J=E4rnefors?= <ojarnef@admin.kth.se> + To: ietf-822@dimacs.rutgers.edu, ojarnef@admin.kth.se + Subject: Time for ISO 10646? + + To: Dave Crocker <dcrocker@mordor.stanford.edu> + Cc: ietf-822@dimacs.rutgers.edu, paf@comsol.se + From: =?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?= <paf@nada.kth.se> + Subject: Re: RFC-HDR care and feeding + + + + + + +Moore [Page 6] + +RFC 1342 Non-ASCII Mail Headers June 1992 + + + From: Nathaniel Borenstein <nsb@thumper.bellcore.com> + (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=) + To: Greg Vaudreuil <gvaudre@NRI.Reston.VA.US>, Ned Freed + <ned@innosoft.com>, + Keith Moore <moore@cs.utk.edu> + Subject: Test of new header generator + MIME-Version: 1.0 + Content-type: text/plain; charset=ISO-8859-1 + +References + + [1] Borenstein N., and N. Freed, "MIME (Multipurpose Internet Mail + Extensions): Mechanisms for Specifying and Describing the Format + of Internet Message Bodies", RFC 1341, Bellcore, Innosoft, + June 1992. + + [2] Crocker, D., "Standard for the Format of ARPA Internet Text + Messages", RFC 822, UDEL, August 1982. + +Security Considerations + + Security issues are not discussed in this memo. + +Author's Address + + Keith Moore + University of Tennessee + 107 Ayres Hall + Knoxville TN 37996-1301 + + EMail: moore@cs.utk.edu + + + + + + + + + + + + + + + + + + + + +Moore [Page 7] +
\ No newline at end of file
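
Note on the added file: the encoded-word grammar and the "Q" and "B" rules in the memo can be illustrated with a short decoder. What follows is a minimal sketch in Python, not part of the commit or of the RFC. The names ENCODED_WORD, decode_q and decode_word are hypothetical and chosen for illustration. It assumes the charset name is one that Python's codec registry recognizes, and it falls back to returning the word unchanged, as the memo allows when a word is not a valid encoded-word or the character set is unsupported.

```python
import base64
import re
import string

# "=?" charset "?" encoding "?" encoded-text "?=" as in the memo's grammar.
# encoded-text: printable ASCII except "?" and SPACE (0x21-0x3E, 0x40-0x7E).
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([!->@-~]+)\?=")


def decode_q(text):
    """Apply the three "Q" rules: "=XX" hex pairs, "_" as 0x20, literals."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        hex_pair = text[i + 1:i + 3]
        if ch == "=" and len(hex_pair) == 2 and all(
            c in string.hexdigits for c in hex_pair
        ):
            out.append(int(hex_pair, 16))  # rule 1: "=" plus two hex digits
            i += 3
        elif ch == "_":
            out.append(0x20)  # rule 2: "_" always means hexadecimal 20
            i += 1
        else:
            out.append(ord(ch))  # rule 3: printable ASCII stands for itself
            i += 1
    return bytes(out)


def decode_word(word):
    """Decode one encoded-word; return the input unchanged if it is not one."""
    match = ENCODED_WORD.fullmatch(word)
    # The memo limits an encoded-word to 75 characters including delimiters.
    if match is None or len(word) > 75:
        return word
    charset, encoding, text = match.groups()
    try:
        if encoding.upper() == "B":
            raw = base64.b64decode(text, validate=True)
        else:
            raw = decode_q(text)
        return raw.decode(charset)
    except (LookupError, ValueError):
        # Unknown charset or malformed data: display the word as it appears.
        return word


if __name__ == "__main__":
    print(decode_word("=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?="))
    print(decode_word("=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?="))
```

The sketch handles a single encoded-word only. It does not implement the memo's rule about suppressing the SPACE or newline that follows an encoded-word, nor the narrower character sets required inside a "phrase" or a comment, so a real header parser would need both.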