Diffstat (limited to 'doc/rfc/rfc6532.txt')
-rw-r--r-- doc/rfc/rfc6532.txt 619
1 files changed, 619 insertions, 0 deletions
diff --git a/doc/rfc/rfc6532.txt b/doc/rfc/rfc6532.txt
new file mode 100644
index 0000000..98f6cee
--- /dev/null
+++ b/doc/rfc/rfc6532.txt
@@ -0,0 +1,619 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF)                           A. Yang
+Request for Comments: 6532                                         TWNIC
+Obsoletes: 5335                                                S. Steele
+Updates: 2045                                                  Microsoft
+Category: Standards Track                                       N. Freed
+ISSN: 2070-1721                                                   Oracle
+                                                           February 2012
+
+
+ Internationalized Email Headers
+
+Abstract
+
+ Internet mail was originally limited to 7-bit ASCII. MIME added
+ support for the use of 8-bit character sets in body parts, and also
+ defined an encoded-word construct so other character sets could be
+ used in certain header field values. However, full
+ internationalization of electronic mail requires additional
+ enhancements to allow the use of Unicode, including characters
+ outside the ASCII repertoire, in mail addresses as well as direct use
+ of Unicode in header fields like "From:", "To:", and "Subject:",
+ without requiring the use of complex encoded-word constructs. This
+ document specifies an enhancement to the Internet Message Format and
+ to MIME that allows use of Unicode in mail addresses and most header
+ field content.
+
+ This specification updates Section 6.4 of RFC 2045 to eliminate the
+ restriction prohibiting the use of non-identity content-transfer-
+ encodings on subtypes of "message/".
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc6532.
+
+
+
+
+
+
+
+
+Yang, et al. Standards Track [Page 1]
+
+RFC 6532 Internationalized Email Headers February 2012
+
+
+Copyright Notice
+
+ Copyright (c) 2012 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
+ 2. Terminology Used in This Specification . . . . . . . . . . . . 3
+ 3. Changes to Message Header Fields . . . . . . . . . . . . . . . 4
+ 3.1. UTF-8 Syntax and Normalization . . . . . . . . . . . . . . 4
+ 3.2. Syntax Extensions to RFC 5322 . . . . . . . . . . . . . . 5
+ 3.3. Use of 8-bit UTF-8 in Message-IDs . . . . . . . . . . . . 5
+ 3.4. Effects on Line Length Limits . . . . . . . . . . . . . . 5
+ 3.5. Changes to MIME Message Type Encoding Restrictions . . . . 6
+ 3.6. Use of MIME Encoded-Words . . . . . . . . . . . . . . . . 6
+ 3.7. The message/global Media Type . . . . . . . . . . . . . . 7
+ 4. Security Considerations . . . . . . . . . . . . . . . . . . . 8
+ 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9
+ 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9
+ 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10
+ 7.1. Normative References . . . . . . . . . . . . . . . . . . . 10
+ 7.2. Informative References . . . . . . . . . . . . . . . . . . 10
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Yang, et al. Standards Track [Page 2]
+
+RFC 6532 Internationalized Email Headers February 2012
+
+
+1. Introduction
+
+ Internet mail distinguishes a message from its transport and further
+ divides a message between a header and a body [RFC5322]. Internet
+ mail header field values contain a variety of strings that are
+ intended to be user-visible. The range of supported characters for
+ these strings was originally limited to [ASCII] in 7-bit form. MIME
+ [RFC2045] [RFC2046] [RFC2047] provides the ability to use additional
+ character sets, but this support is limited to body part data and to
+ special encoded-word constructs that were only allowed in a limited
+ number of places in header field values.
+
+ Globalization of the Internet requires support of the much larger set
+ of characters provided by Unicode [RFC5198] in both mail addresses
+ and most header field values. Additionally, complex encoding schemes
+ like encoded-words introduce inefficiencies as well as significant
+ opportunities for processing errors. And finally, native support for
+ the UTF-8 charset is now available on most systems. Hence, it is
+ strongly desirable for Internet mail to support UTF-8 [RFC3629]
+ directly.
+
+ This document specifies an enhancement to the Internet Message Format
+ [RFC5322] and to MIME that permits the direct use of UTF-8, rather
+ than only ASCII, in header field values, including mail addresses. A
+ new media type, message/global, is defined for messages that use this
+ extended format. This specification also lifts the MIME restriction
+ on having non-identity content-transfer-encodings on any subtype of
+ the message top-level type so that message/global parts can be safely
+ transmitted across existing mail infrastructure.
+
+ This specification is based on a model of native, end-to-end support
+ for UTF-8, which depends on having an "8-bit-clean" environment
+ assured by the transport system. Support for carriage across legacy,
+ 7-bit infrastructure and for processing by 7-bit receivers requires
+ additional mechanisms that are not provided by these specifications.
+
+ This specification is a revision of and replacement for [RFC5335].
+ Section 6 of [RFC6530] describes the change in approach between this
+ specification and the previous version.
+
+2. Terminology Used in This Specification
+
+ A plain ASCII string is fully compatible with [RFC5321] and
+ [RFC5322]. In this document, non-ASCII strings are UTF-8 strings if
+ they are in header field values that contain at least one
+ <UTF8-non-ascii> (see Section 3.1).
+
+
+
+
+
+Yang, et al. Standards Track [Page 3]
+
+RFC 6532 Internationalized Email Headers February 2012
+
+
+ Unless otherwise noted, all terms used here are defined in [RFC5321],
+ [RFC5322], [RFC6530], or [RFC6531].
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [RFC2119].
+
+ The term "8-bit" means octets are present in the data with values
+ above 0x7F.
+
+3. Changes to Message Header Fields
+
+ To permit non-ASCII Unicode characters in field values, the header
+ definition in [RFC5322] is extended to support the new format. The
+ following sections specify the necessary changes to RFC 5322's ABNF.
+
+ The syntax rules not mentioned below remain defined as in [RFC5322].
+
+ Note that this protocol does not change rules in RFC 5322 for
+ defining header field names. The bodies of header fields are allowed
+ to contain Unicode characters, but the header field names themselves
+ must consist of ASCII characters only.
+
+ Also note that messages in this format require the use of the
+ SMTPUTF8 extension [RFC6531] to be transferred via SMTP.
+
+3.1. UTF-8 Syntax and Normalization
+
+ UTF-8 characters can be defined in terms of octets using the
+ following ABNF [RFC5234], taken from [RFC3629]:
+
+ UTF8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4
+
+ UTF8-2 = <Defined in Section 4 of RFC3629>
+
+ UTF8-3 = <Defined in Section 4 of RFC3629>
+
+ UTF8-4 = <Defined in Section 4 of RFC3629>
+
+ See [RFC5198] for a discussion of Unicode normalization;
+ normalization form NFC [UNF] SHOULD be used. Actually, if one is
+ going to do internationalization properly, one of the most often
+ cited goals is to permit people to spell their names correctly.
+ Since many mailbox local parts reflect personal names, that principle
+ applies to mailboxes as well. The NFKC normalization form [UNF]
+ SHOULD NOT be used because it may lose information that is needed to
+ correctly spell some names in some unusual circumstances.
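
Illustration (not part of the RFC text): the normalization guidance in Section 3.1 can be sketched in Python with the standard `unicodedata` module. The function names `to_nfc` and `has_utf8_non_ascii` are hypothetical helpers introduced here. The sketch assumes that Python's strict UTF-8 decoder, which rejects overlong forms and surrogate code points, is an acceptable stand-in for the RFC 3629 grammar.

```python
import unicodedata


def to_nfc(value: str) -> str:
    """Return the value in Unicode Normalization Form C (the recommended form)."""
    return unicodedata.normalize("NFC", value)


def has_utf8_non_ascii(raw: bytes) -> bool:
    """True if raw is well-formed UTF-8 containing at least one non-ASCII character."""
    try:
        text = raw.decode("utf-8")  # strict: overlong forms and surrogates raise
    except UnicodeDecodeError:
        return False
    return any(ord(ch) > 0x7F for ch in text)


# NFC composes "e" + combining acute accent into a single code point.
composed = to_nfc("Jose\u0301")

# NFKC would replace the ligature U+FB01 with the two letters "fi", which is
# the kind of information loss the text warns about; NFC leaves it untouched.
ligature_nfkc = unicodedata.normalize("NFKC", "\ufb01")
ligature_nfc = to_nfc("\ufb01")
```

Normalizing before comparison or storage keeps visually identical local parts from being treated as different strings, while avoiding NFKC preserves distinctions that may matter for spelling a name.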
+
+
+
+
+Yang, et al. Standards Track [Page 4]
+
+RFC 6532 Internationalized Email Headers February 2012
+
+
+3.2. Syntax Extensions to RFC 5322
+
+ The following rules extend the ABNF syntax defined in [RFC5322] and
+ [RFC5234] in order to allow UTF-8 content.
+
+ VCHAR =/ UTF8-non-ascii
+
+ ctext =/ UTF8-non-ascii
+
+ atext =/ UTF8-non-ascii
+
+ qtext =/ UTF8-non-ascii
+
+ text =/ UTF8-non-ascii
+ ; note that this upgrades the body to UTF-8
+
+ dtext =/ UTF8-non-ascii
+
+ The preceding changes mean that the following constructs now allow
+ UTF-8:
+
+ 1. Unstructured text, used in header fields like "Subject:" or
+ "Content-description:".
+
+ 2. Any construct that uses atoms, including but not limited to the
+ local parts of addresses and Message-IDs. This includes
+ addresses in the "for" clauses of "Received:" header fields.
+
+ 3. Quoted strings.
+
+ 4. Domains.
+
+ Note that header field names are not on this list; these are still
+ restricted to ASCII.
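
Illustration (not part of the RFC text): the effect of `atext =/ UTF8-non-ascii` can be sketched as a character test. The names `ASCII_ATEXT`, `is_atext`, and `is_atom` are hypothetical. The sketch assumes the input is a Python `str` that was already decoded from valid UTF-8, so any code point above 0x7F (other than a stray surrogate) corresponds to a `UTF8-non-ascii` sequence.

```python
import string

# atext as defined in RFC 5322: letters, digits, and these printable symbols.
ASCII_ATEXT = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_atext(ch: str) -> bool:
    """Extended atext: the RFC 5322 set plus any non-ASCII character."""
    if ch in ASCII_ATEXT:
        return True
    code = ord(ch)
    # Lone surrogates cannot come from valid UTF-8, so they are rejected.
    return code > 0x7F and not (0xD800 <= code <= 0xDFFF)


def is_atom(token: str) -> bool:
    """True if token is a non-empty run of extended atext characters."""
    return len(token) > 0 and all(is_atext(ch) for ch in token)
```

For example, `is_atom("用户")` and `is_atom("user")` both hold, while `is_atom("a b")` and `is_atom("a@b")` do not, because space and "@" are not part of atext in either the original or the extended grammar.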
+
+3.3. Use of 8-bit UTF-8 in Message-IDs
+
+ Implementers of Message-ID generation algorithms MAY prefer to
+ restrain their output to ASCII since that has some advantages, such
+ as when constructing "In-reply-to:" and "References:" header fields
+ in mailing-list threads where some senders use internationalized
+ addresses and others do not.
+
+3.4. Effects on Line Length Limits
+
+ Section 2.1.1 of [RFC5322] limits lines to 998 characters and
+ recommends that the lines be restricted to only 78 characters. This
+ specification changes the former limit to 998 octets. (Note that, in
+
+
+
+Yang, et al. Standards Track [Page 5]
+
+RFC 6532 Internationalized Email Headers February 2012
+
+
+ ASCII, octets and characters are effectively the same, but this is
+ not true in UTF-8.) The 78-character limit remains defined in terms
+ of characters, not octets, since it is intended to address display
+ width issues, not line-length issues.
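
Illustration (not part of the RFC text): the two limits in Section 3.4 are measured in different units, which a short Python sketch makes concrete. The names `MAX_LINE_OCTETS`, `RECOMMENDED_LINE_CHARS`, and `check_line` are hypothetical. Counting Unicode code points is used here as a rough approximation of "characters"; it does not account for combining marks or wide glyphs, so it is only a stand-in for display width.

```python
from typing import Tuple

MAX_LINE_OCTETS = 998        # hard limit, counted in octets, excluding CRLF
RECOMMENDED_LINE_CHARS = 78  # recommended limit, counted in characters


def check_line(line: str) -> Tuple[bool, bool]:
    """Return (within_octet_limit, within_recommended_width) for one line.

    The line is expected without its trailing CRLF.
    """
    octets = len(line.encode("utf-8"))
    chars = len(line)  # code points, an approximation of characters
    return octets <= MAX_LINE_OCTETS, chars <= RECOMMENDED_LINE_CHARS


# 78 two-octet characters: 156 octets, so both limits are satisfied.
short_line = check_line("\u00e9" * 78)

# 500 two-octet characters: only 500 characters, but 1000 octets.
long_line = check_line("\u00e9" * 500)
```

The second case shows why the change of unit matters: a line well under 998 characters can still exceed 998 octets once non-ASCII text is present.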
+
+3.5. Changes to MIME Message Type Encoding Restrictions
+
+ This specification updates Section 6.4 of [RFC2045]. [RFC2045]
+ prohibits applying a content-transfer-encoding to any subtypes of
+ "message/". This specification relaxes that rule -- it allows newly
+ defined MIME types to permit content-transfer-encoding, and it allows
+ content-transfer-encoding for message/global (see Section 3.7).
+
+ Background: Normally, transfer of message/global will be done in
+ 8-bit-clean channels, and body parts will have "identity" encodings,
+ that is, no decoding is necessary.
+
+ But in the case where a message containing a message/global is
+ downgraded from 8-bit to 7-bit as described in [RFC6152], an encoding
+ might have to be applied to the message. If the message travels
+ multiple times between a 7-bit environment and an environment
+ implementing these extensions, multiple levels of encoding may occur.
+ This is expected to be rarely seen in practice, and the potential
+ complexity of other ways of dealing with the issue is thought to be
+ larger than the complexity of allowing nested encodings where
+ necessary.
+
+3.6. Use of MIME Encoded-Words
+
+ The MIME encoded-words facility [RFC2047] provides the ability to
+ place non-ASCII text, but only in a subset of the places allowed by
+ this extension. Additionally, encoded-words are substantially more
+ complex since they allow the use of arbitrary charsets. Accordingly,
+ encoded-words SHOULD NOT be used when generating header fields for
+ messages employing this extension. Agents MAY, when incorporating
+ material from another message, convert encoded-word use to direct use
+ of UTF-8.
+
+ Note that care must be taken when decoding encoded-words because the
+ results after replacing an encoded-word with its decoded equivalent
+ in UTF-8 may be syntactically invalid. Processors that elect to
+ decode encoded-words MUST NOT generate syntactically invalid fields.
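
Illustration (not part of the RFC text): the warning about syntactically invalid results can be sketched with Python's standard `email.header` module. The helpers `decode_words`, `quote_phrase`, and `utf8_mailbox` are hypothetical names, and the quoting rule shown (wrap the display name in a quoted-string whenever it contains an RFC 5322 "specials" character) is one possible approach rather than a procedure the specification prescribes.

```python
from email.header import decode_header, make_header

# RFC 5322 "specials": characters that may not appear bare in a phrase.
SPECIALS = set('()<>[]:;@\\,."')


def decode_words(value: str) -> str:
    """Replace RFC 2047 encoded-words in value with their decoded text."""
    return str(make_header(decode_header(value)))


def quote_phrase(phrase: str) -> str:
    """Wrap phrase in a quoted-string if it contains any specials."""
    if any(ch in SPECIALS for ch in phrase):
        escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return phrase


def utf8_mailbox(encoded_name: str, addr_spec: str) -> str:
    """Build a mailbox with a directly encoded UTF-8 display name."""
    return f"{quote_phrase(decode_words(encoded_name))} <{addr_spec}>"


# The encoded-word hides a comma; pasted in bare, the decoded name would
# split the field into two addresses, so it has to be quoted.
mailbox = utf8_mailbox("=?UTF-8?Q?M=C3=BCller=2C_Hans?=", "hans@example.com")
```

Here `mailbox` is `"Müller, Hans" <hans@example.com>` with the display name quoted. Without the quoting step the decoded comma would change how the field parses, which is the failure the MUST NOT requirement guards against.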
+
+
+
+
+
+
+
+
+
+Yang, et al. Standards Track [Page 6]
+
+RFC 6532 Internationalized Email Headers February 2012
+
+
+3.7. The message/global Media Type
+
+ Internationalized messages in this format MUST only be transmitted as
+ authorized by [RFC6531] or within a non-SMTP environment that
+ supports these messages. A message is a "message/global message" if:
+
+ o it contains 8-bit UTF-8 header values as specified in this
+ document, or
+
+ o it contains 8-bit UTF-8 values in the header fields of body parts.
+
+ The content of a message/global part is otherwise identical to that
+ of a message/rfc822 part.
+
+ If an object of this type is sent to a 7-bit-only system, it MUST
+ have an appropriate content-transfer-encoding applied. (Note that a
+ system compliant with MIME that doesn't recognize message/global is
+ supposed to treat it as "application/octet-stream" as described in
+ Section 5.2.4 of [RFC2046].)
+
+ The registration is as follows:
+
+ Type name: message
+
+ Subtype name: global
+
+ Required parameters: none
+
+ Optional parameters: none
+
+ Encoding considerations: Any content-transfer-encoding is permitted.
+ The 8-bit or binary content-transfer-encodings are recommended
+ where permitted.
+
+ Security considerations: See Section 4.
+
+ Interoperability considerations: This media type provides
+ functionality similar to the message/rfc822 content type for email
+ messages with internationalized email headers. When there is a
+ need to embed or return such content in another message, there is
+ generally an option to use this media type and leave the content
+ unchanged or down-convert the content to message/rfc822. Each of
+ these choices will interoperate with the installed base, but with
+ different properties. Systems unaware of internationalized
+ headers will typically treat a message/global body part as an
+ unknown attachment, while they will understand the structure of a
+ message/rfc822. However, systems that understand message/global
+
+
+
+
+Yang, et al. Standards Track [Page 7]
+
+RFC 6532 Internationalized Email Headers February 2012
+
+
+ will provide functionality superior to the result of a down-
+ conversion to message/rfc822. The most interoperable choice
+ depends on the deployed software.
+
+ Published specification: RFC 6532
+
+ Applications that use this media type: SMTP servers and email
+ clients that support multipart/report generation or parsing.
+ Email clients that forward messages with internationalized headers
+ as attachments.
+
+ Additional information:
+
+ Magic number(s): none
+
+ File extension(s): The extension ".u8msg" is suggested.
+
+ Macintosh file type code(s): A uniform type identifier (UTI) of
+ "public.utf8-email-message" is suggested. This conforms to
+ "public.message" and "public.composite-content", but does not
+ necessarily conform to "public.utf8-plain-text".
+
+ Person & email address to contact for further information: See the
+ Authors' Addresses section of this document.
+
+ Intended usage: COMMON
+
+ Restrictions on usage: This is a structured media type that embeds
+ other MIME media types. An 8-bit or binary content-transfer-
+ encoding SHOULD be used unless this media type is sent over a
+ 7-bit-only transport.
+
+ Author: See the Authors' Addresses section of this document.
+
+ Change controller: IETF Standards Process
+
+4. Security Considerations
+
+ Because UTF-8 often requires several octets to encode a single
+ character, internationalization may cause header field values (in
+ general) and mail addresses (in particular) to become longer. As
+ specified in [RFC5322], each line of characters MUST be no more than
+ 998 octets, excluding the CRLF. On the other hand, MDA (Mail
+ Delivery Agent) processes that parse, store, or handle email
+ addresses or local parts must take extra care not to overflow
+ buffers, truncate addresses, or exceed storage allotments. Also,
+ they must take care, when comparing, to use the entire lengths of the
+ addresses.
+
+
+
+Yang, et al. Standards Track [Page 8]
+
+RFC 6532 Internationalized Email Headers February 2012
+
+
+ There are lots of ways to use UTF-8 to represent something equivalent
+ or similar to a particular displayed character or group of
+ characters; see the security considerations in [RFC3629] for details
+ on the problems this can cause. The normalization process described
+ in Section 3.1 is recommended to minimize these issues.
+
+ The security impact of UTF-8 headers on email signature systems such
+ as Domain Keys Identified Mail (DKIM), S/MIME, and OpenPGP is
+ discussed in Section 14 of [RFC6530].
+
+ If a user has a non-ASCII mailbox address and an ASCII mailbox
+ address, a digital certificate that identifies that user might have
+ both addresses in the identity. Having multiple email addresses as
+ identities in a single certificate is already supported in PKIX
+ (Public Key Infrastructure using X.509) [RFC5280] and OpenPGP
+ [RFC3156], but there may be user-interface issues associated with the
+ introduction of UTF-8 into addresses in this context.
+
+5. IANA Considerations
+
+ IANA has updated the registration of the message/global MIME type
+ using the registration form contained in Section 3.7.
+
+6. Acknowledgements
+
+ This document incorporates many ideas first described in a draft
+ document by Paul Hoffman, although many details have changed from
+ that earlier work.
+
+ The authors especially thank Jeff Yeh for his efforts and
+ contributions on editing previous versions.
+
+ Most of the content of this document was provided by John C Klensin
+ and Dave Crocker. Significant comments and suggestions were received
+ from Martin Duerst, Julien Elie, Arnt Gulbrandsen, Kristin Hubner,
+ Kari Hurtta, Yangwoo Ko, Charles H. Lindsey, Alexey Melnikov, Chris
+ Newman, Pete Resnick, Yoshiro Yoneya, and additional members of the
+ Joint Engineering Team (JET) and were incorporated into the document.
+ The authors wish to sincerely thank them all for their contributions.
+
+
+
+
+
+
+
+
+
+
+
+
+Yang, et al. Standards Track [Page 9]
+
+RFC 6532 Internationalized Email Headers February 2012
+
+
+7. References
+
+7.1. Normative References
+
+ [ASCII] "Coded Character Set -- 7-bit American Standard Code for
+ Information Interchange", ANSI X3.4, 1986.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+ [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
+ 10646", STD 63, RFC 3629, November 2003.
+
+ [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network
+ Interchange", RFC 5198, March 2008.
+
+ [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
+ Specifications: ABNF", STD 68, RFC 5234, January 2008.
+
+ [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
+ October 2008.
+
+ [RFC5322] Resnick, P., Ed., "Internet Message Format", RFC 5322,
+ October 2008.
+
+ [RFC6530] Klensin, J. and Y. Ko, "Overview and Framework for
+ Internationalized Email", RFC 6530, February 2012.
+
+ [RFC6531] Yao, J. and W. Mao, "SMTP Extension for Internationalized
+ Email", RFC 6531, February 2012.
+
+ [UNF] Davis, M. and K. Whistler, "Unicode Standard Annex #15:
+ Unicode Normalization Forms", September 2010,
+ <http://www.unicode.org/reports/tr15/>.
+
+7.2. Informative References
+
+ [RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
+ Extensions (MIME) Part One: Format of Internet Message
+ Bodies", RFC 2045, November 1996.
+
+ [RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
+ Extensions (MIME) Part Two: Media Types", RFC 2046,
+ November 1996.
+
+ [RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions)
+ Part Three: Message Header Extensions for Non-ASCII Text",
+ RFC 2047, November 1996.
+
+
+
+Yang, et al. Standards Track [Page 10]
+
+RFC 6532 Internationalized Email Headers February 2012
+
+
+ [RFC3156] Elkins, M., Del Torto, D., Levien, R., and T. Roessler,
+ "MIME Security with OpenPGP", RFC 3156, August 2001.
+
+ [RFC5280] Cooper, D., Santesson, S., Farrell, S., Boeyen, S.,
+ Housley, R., and W. Polk, "Internet X.509 Public Key
+ Infrastructure Certificate and Certificate Revocation List
+ (CRL) Profile", RFC 5280, May 2008.
+
+ [RFC5335] Yang, A., "Internationalized Email Headers", RFC 5335,
+ September 2008.
+
+ [RFC6152] Klensin, J., Freed, N., Rose, M., and D. Crocker, "SMTP
+ Service Extension for 8-bit MIME Transport", STD 71,
+ RFC 6152, March 2011.
+
+Authors' Addresses
+
+ Abel Yang
+ TWNIC
+ 4F-2, No. 9, Sec 2, Roosevelt Rd.
+ Taipei 100
+ Taiwan
+
+ Phone: +886 2 23411313 ext 505
+ EMail: abelyang@twnic.net.tw
+
+
+ Shawn Steele
+ Microsoft
+
+ EMail: Shawn.Steele@microsoft.com
+
+
+ Ned Freed
+ Oracle
+ 800 Royal Oaks
+ Monrovia, CA 91016-6347
+ USA
+
+ EMail: ned+ietf@mrochek.com
+
+
+
+
+
+
+
+
+
+
+
+Yang, et al. Standards Track [Page 11]
+