author Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit 4bfd864f10b68b71482b35c818559068ef8d5797
tree e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc7103.txt
parent ea76e11061bda059ae9f9ad130a9895cc85607db
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc7103.txt')
-rw-r--r-- doc/rfc/rfc7103.txt 1347
1 files changed, 1347 insertions, 0 deletions
diff --git a/doc/rfc/rfc7103.txt b/doc/rfc/rfc7103.txt
new file mode 100644
index 0000000..fbda5b0
--- /dev/null
+++ b/doc/rfc/rfc7103.txt
@@ -0,0 +1,1347 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) M. Kucherawy
+Request for Comments: 7103 G. Shapiro
+Category: Informational N. Freed
+ISSN: 2070-1721 January 2014
+
+
+ Advice for Safe Handling of Malformed Messages
+
+Abstract
+
+ Although Internet message formats have been precisely defined since
+ the 1970s, authoring and handling software often shows only mild
+ conformance to the specifications. The malformed messages that
+ result are non-standard. Nonetheless, decades of experience have
+ shown that using some tolerance in the handling of the malformations
+ that result is often an acceptable approach and is better than
+ rejecting the messages outright as nonconformant. This document
+ includes a collection of the best advice available regarding a
+ variety of common malformed mail situations; it is to be used as
+ implementation guidance.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for informational purposes.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Not all documents
+ approved by the IESG are a candidate for any level of Internet
+ Standard; see Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc7103.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Kucherawy, et al. Informational [Page 1]
+
+RFC 7103 Safe Mail Handling January 2014
+
+
+Copyright Notice
+
+ Copyright (c) 2014 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction
+ 1.1. The Purpose of This Work
+ 1.2. Not the Purpose of This Work
+ 1.3. General Considerations
+ 2. Document Conventions
+ 2.1. Examples
+ 3. Background
+ 4. Invariant Content
+ 5. Mail Submission Agents
+ 6. Line Termination
+ 7. Header Anomalies
+ 7.1. Converting Obsolete and Invalid Syntaxes
+ 7.1.1. Host-Address Syntax
+ 7.1.2. Excessive Angle Brackets
+ 7.1.3. Unbalanced Angle Brackets
+ 7.1.4. Unbalanced Parentheses
+ 7.1.5. Commas in Address Lists
+ 7.1.6. Unbalanced Quotes
+ 7.1.7. Naked Local-Parts
+ 7.2. Non-Header Lines
+ 7.3. Unusual Spacing
+ 7.4. Header Malformations
+ 7.5. Header Field Counts
+ 7.5.1. Repeated Header Fields
+ 7.5.2. Missing Header Fields
+ 7.5.3. Return-Path
+ 7.6. Missing or Incorrect Charset Information
+ 7.7. Eight-Bit Data
+ 8. MIME Anomalies
+ 8.1. Missing MIME-Version Field
+ 8.2. Faulty Encodings
+ 9. Body Anomalies
+ 9.1. Oversized Lines
+ 10. Security Considerations
+ 11. References
+ 11.1. Normative References
+ 11.2. Informative References
+ Appendix A. Acknowledgements
+
+1. Introduction
+
+1.1. The Purpose of This Work
+
+ The history of email standards, going back to [RFC733] and beyond,
+ contains a fairly rigid evolution of specifications. However,
+ implementations within that culture have also long had an
+ undercurrent known formally as "the robustness principle", also known
+ informally as "Postel's Law": "Be liberal in what you accept, and
+ conservative in what you send" [RFC1122].
+
+ Jon Postel's directive is often interpreted to mean that any deviance
+ from a specification is acceptable. However, we believe it was
+ intended only to account for legitimate variations in interpretation
+ within specifications, as well as basic transit errors, like bit
+ errors. Taken to its unintended extreme, excessive tolerance would
+ imply that there are no limits to the liberties that a sender might
+ take, while presuming a burden on a receiver to guess "correctly" at
+ the meaning of any such variation. These matters are further
+ compounded by receiver software -- the end users' mail readers --
+ which are also sometimes flawed, leaving senders to craft messages
+ (sometimes bending the rules) to overcome those flaws.
+
+ In general, this served the email ecosystem well by allowing a few
+ errors in implementations without obstructing participation in the
+ game. The proverbial bar was set low. However, as we have evolved
+ into the current era, some of these lenient stances have begun to
+ expose opportunities that can be exploited by malefactors. Various
+ email-based applications rely on the strong application of these
+ standards for simple security checks, while the very basic building
+ blocks of that infrastructure, intending to be robust, fail utterly
+ to assert those standards.
+
+ The distributed and non-interactive nature of email has often
+ prompted adjustments to receiving software, to handle these
+ variations, rather than trying to gain better conformance by senders,
+ since the receiving operator is primarily driven by complaints from
+ recipient users and has no authority over the sending side of the
+ system. Processing with such flexibility comes at some cost, since
+ mail software is faced with decisions about whether to permit non-
+ conforming messages to continue toward their destinations unaltered,
+ adjust them to conform (possibly at the cost of losing some of the
+ original message), or reject them outright.
+
+ This document includes a collection of the best advice available
+ regarding a variety of common malformed mail situations; it is to be
+ used as implementation guidance. These malformations are typically
+
+ based around loose interpretations or implementations of
+ specifications such as the Internet Message Format [MAIL] and
+ Multipurpose Internet Mail Extensions [MIME].
+
+1.2. Not the Purpose of This Work
+
+ It is important to understand that this work is not an effort to
+ endorse or standardize certain common malformations. The code and
+ culture that introduces such messages into the mail stream needs to
+ be repaired, as the security penalty now being paid for this lax
+ processing arguably outweighs the reduction in support costs to end
+ users who are not expected to understand the standards. However, the
+ reality is that this will not be fixed quickly.
+
+ Given this, it is beneficial to provide implementers with guidance
+ about the safest or most effective way to handle malformed messages
+ when they arrive, taking into consideration the trade-offs of the
+ choices available, especially with respect to how various actors in
+ the email ecosystem respond to such messages in terms of handling,
+ parsing, or rendering to end users.
+
+1.3. General Considerations
+
+ Many deviations from message format standards are considered by some
+ receivers to be strong indications that the message is undesirable,
+ such as spam or something containing malware. These receivers
+ quickly decide that the best handling choice is simply to reject or
+ discard the message. This means malformations caused by innocent
+ misunderstandings or ignorance of proper syntax can cause messages
+ with no ill intent also to fail to be delivered.
+
+ Senders that want to ensure message delivery are best advised to
+ adhere strictly to the relevant standards (including, but not limited
+ to, [MAIL], [MIME], and [DKIM]), as well as observe other industry
+ best practices such as may be published from time to time by either
+ the IETF or independently.
+
+ Receivers that haven't the luxury of strict enforcement of the
+ standards on inbound messages are usually best served by observing
+ the following guidelines for handling of malformed messages:
+
+ 1. Whenever possible, mitigation of syntactic malformations should
+ be guided by an assessment of the most likely semantic intent.
+ For example, it is reasonable to conclude that multiple sets of
+ angle brackets around an address are simply superfluous and can
+ be dropped.
+
+ 2. When the intent is unclear, or when it is clear but also
+ impractical to change the content to reflect that intent,
+ mitigation should be limited to cases where not taking any
+ corrective action would clearly lead to a worse outcome.
+
+ 3. Security issues, when present, need to be addressed and may force
+ mitigation strategies that are otherwise suboptimal.
+
+2. Document Conventions
+
+2.1. Examples
+
+ Examples of message content include a number within braces at the end
+ of each line. These are line numbers for use in subsequent
+ discussion, and they are not actually part of the message content
+ presented in the example.
+
+ Blank lines are not numbered in the examples.
+
+3. Background
+
+ The reader would benefit from reading [EMAIL-ARCH] for some general
+ background about the overall email architecture. Of particular
+ interest is the Internet Message Format, detailed in [MAIL].
+ Throughout this document, the use of the term "message" should be
+ assumed to mean a block of text conforming to the Internet Message
+ Format.
+
+4. Invariant Content
+
+ An agent handling a message could use two distinct
+ representations of the message. One is an internal representation,
+ such as separate blocks of storage for the header and body, some
+ header or body alterations, or tables indexed by header name, set up
+ to make particular kinds of processing easier. The other is the
+ representation passed along to the next agent in the handling chain.
+ This might be identical to the message input to the module, or it
+ might have some changes such as added or reordered header fields or
+ body elisions to remove malicious content.
+
+ Message handling is usually most effective when each in a sequence of
+ handling modules receives the same content for analysis. A module
+ that "fixes" or otherwise alters the content passed to later modules
+ can prevent the later modules from identifying malicious or other
+ content that exposes the end user to harm. It is important that all
+ processing modules can make consistent assertions about the content.
+ Modules that operate sequentially sometimes add private header fields
+ to relay information downstream for later filters to use (and
+ possibly remove), or they may have out-of-band ways of doing so.
+ However, even the presence of private header fields can impact a
+ downstream handling agent unaware of their local semantics, so an
+ out-of-band method is always preferable.
+
+ The above is less of a concern when multiple analysis modules are
+ operated in parallel, independent of one another.
+
+ Often, abuse reporting systems can act effectively only when a
+ complaint or report contains the original message exactly as it was
+ generated. Messages that have been altered by handling modules might
+ render a complaint unactionable, since the system receiving the report
+ may be unable to identify the original message as one of its own.
+
+ Some message changes alter syntax without changing semantics. For
+ example, Section 7.4 describes a situation where an agent removes
+ additional header whitespace. This is a syntax change without a
+ change in semantics, though some systems (such as DKIM) are sensitive
+ to such changes. Message system developers need to be aware of the
+ downstream impact of making either kind of change.
+
+ Where a change to content between modules is unavoidable, it is a
+ good idea to add standard trace data to indicate a "visible" handoff
+ between modules has occurred. The only advisable way to do this is
+ to prepend Received fields with the appropriate information, as
+ described in Section 3.6.7 of [MAIL].
+
+ There will always be local handling exceptions, but these guidelines
+ should be useful for developing integrated message processing
+ environments.
+
+ In most cases, this document only discusses techniques used on
+ internal representations. It is occasionally necessary to make
+ changes between the input and output versions; such cases will be
+ called out explicitly.
+
+5. Mail Submission Agents
+
+ Within the email context, the single most influential component that
+ can reduce the presence of malformed items in the email system is the
+ Mail Handling Service (MHS; see [EMAIL-ARCH]), which includes the
+ Mail Submission Agent (MSA). The MSA is essentially the interface
+ between the end users who create content and the mail stream.
+
+ MHSs need to become more strict about enforcement of all relevant
+ email standards, especially [MAIL] and the [MIME] family of
+ documents.
+
+ More strict conformance by relaying Mail Transfer Agents (MTAs) will
+ also be helpful. Although preventing the dissemination of malformed
+ messages is desirable, the rejection of such mail already in transit
+ also has a support cost -- namely, the creation of a [DSN] that many
+ end users might not understand.
+
+6. Line Termination
+
+ For interoperable Internet Mail messages, the only valid line
+ separation sequence during a typical SMTP session is ASCII 0x0D
+ ("carriage return", or CR) followed by ASCII 0x0A ("line feed", or
+ LF), commonly referred to as "CRLF". This is not the case for binary
+ mode SMTP (see [BINARYSMTP]).
+
+ Common UNIX user tools, however, typically only use LF for internal
+ line termination. This means that a protocol engine that converts
+ between UNIX and Internet message formats has to convert between
+ these two end-of-line representations before transmitting a message
+ or after receiving it.
+
+ Non-compliant implementations can create messages with a mix of line
+ terminations, such as LF everywhere except CRLF only at the end of
+ the message. According to [SMTP] and [MAIL], this means the entire
+ message actually exists on a single line.
+
+ Within modern Internet Mail, it is highly unlikely that an isolated
+ CR or LF is valid in common ASCII text. Furthermore, when content
+ actually does need to contain such an unusual character sequence,
+ [MIME] provides mechanisms for encoding that content in an SMTP-safe
+ manner.
+
+ Thus, it will typically be safe and helpful to treat an isolated CR
+ or LF as equivalent to a CRLF when parsing a message.
+
+ Note that this advice pertains only to the raw SMTP data and not to
+ decoded MIME entities. As noted above, when MIME encoding mechanisms
+ are used, the unusual character sequences are not visible in the raw
+ SMTP stream.
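The following Python sketch shows one way to apply this rule. It assumes that the raw SMTP DATA payload is held as bytes and is processed before any MIME decoding. The name `normalize_line_endings` is illustrative and does not come from any existing library.

```python
import re

# At each position a CRLF pair is tried first, so a well-formed CRLF stays a
# single line break; any remaining lone CR or lone LF is rewritten as CRLF.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def normalize_line_endings(raw: bytes) -> bytes:
    """Treat every isolated CR or LF in raw SMTP data as a CRLF."""
    return _LINE_BREAK.sub(b"\r\n", raw)
```

Existing CRLF pairs pass through unchanged, while input such as `b"a\nb"` becomes `b"a\r\nb"`.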
+
+7. Header Anomalies
+
+ This section covers common syntactic and semantic anomalies found in
+ a message header and presents suggested methods of mitigation.
+
+7.1. Converting Obsolete and Invalid Syntaxes
+
+ A message using an obsolete header syntax (see Section 4 of [MAIL])
+ might confound an agent that is attempting to be robust in its
+ handling of syntax variations. A bad actor could exploit such a
+ weakness in order to get abusive or malicious content through a
+ filter. This section presents some examples of such variations.
+ Messages including these variations ought to be rejected; where this
+ is not possible, recommended internal interpretations are provided.
+
+7.1.1. Host-Address Syntax
+
+ The following obsolete syntax attempts to specify source routing:
+
+ To: <@example.net:fran@example.com>
+
+ This means "send to fran@example.com via the mail service at
+ example.net". It can safely be interpreted as:
+
+ To: <fran@example.com>
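As a rough illustration, a receiver could drop the route portion with a regular expression. The helper below is hypothetical and works on a single address field value. It assumes that the route appears immediately after the opening angle bracket, and it does not handle comments or quoted strings.

```python
import re

# Obsolete route: one or more "@domain" entries, separated by commas,
# ending in ":" just inside the opening "<".
_SOURCE_ROUTE = re.compile(r"<\s*@[^<>:]*:")


def strip_source_route(value: str) -> str:
    """Drop an obsolete source route, keeping only the final mailbox."""
    return _SOURCE_ROUTE.sub("<", value)
```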
+
+7.1.2. Excessive Angle Brackets
+
+ The following overuse of angle brackets:
+
+ To: <<<user2@example.org>>>
+
+ can safely be interpreted as:
+
+ To: <user2@example.org>
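Here is a minimal sketch of this interpretation, again written as a hypothetical helper over a field value. It collapses any run of brackets around a bracket-free address to a single pair.

```python
import re

# One or more "<", an address containing no brackets, then one or more ">".
_EXTRA_BRACKETS = re.compile(r"<+([^<>]*)>+")


def collapse_angle_brackets(value: str) -> str:
    """Reduce any run of repeated angle brackets to a single pair."""
    return _EXTRA_BRACKETS.sub(r"<\1>", value)
```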
+
+7.1.3. Unbalanced Angle Brackets
+
+ The following use of unbalanced angle brackets:
+
+ To: <another@example.net
+
+ can usually be treated as:
+
+ To: <another@example.net>
+
+ The following:
+
+ To: second@example.org>
+
+ can usually be treated as:
+
+ To: second@example.org
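The two cases above can be sketched as follows. This hypothetical helper makes two assumptions: the address list has already been split into individual addresses, and angle brackets never appear inside quoted strings or comments.

```python
def balance_angle_brackets(addr: str) -> str:
    """Repair a single address with one missing "<" or ">"."""
    s = addr.strip()
    opens, closes = s.count("<"), s.count(">")
    if opens == 1 and closes == 0:
        return s + ">"             # "<another@example.net": add the ">"
    if opens == 0 and closes == 1:
        return s.replace(">", "")  # "second@example.org>": drop the ">"
    return s                       # balanced, or too ambiguous to touch
```

Appending ">" at the end presumes that the address is the last element of the value, with no trailing comment.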
+
+7.1.4. Unbalanced Parentheses
+
+ The following use of unbalanced parentheses:
+
+ To: (Testing <fran@example.com>
+
+ can safely be interpreted as:
+
+ To: (Testing) <fran@example.com>
+
+ Likewise, this case:
+
+ To: Testing) <sam@example.com>
+
+ can safely be interpreted as:
+
+ To: "Testing)" <sam@example.com>
+
+ In both cases, it is obvious where the active email address in the
+ string can be found. The former case retains the active email
+ address in the string by completing what appears to be intended as a
+ comment; the intent in the latter case is less obvious, so the
+ leading string is interpreted as a display name.
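One way to express both interpretations is to look only at the text before the angle-addr, as in this hypothetical Python helper. It counts parentheses rather than tracking their order and nesting, which is a simplifying assumption, and it does not consider quoted strings.

```python
def repair_parentheses(addr: str) -> str:
    """Apply the two interpretations above to a single name-addr."""
    lt = addr.find("<")
    if lt == -1:
        return addr  # this sketch only handles values with an angle-addr
    prefix, angle_addr = addr[:lt].rstrip(), addr[lt:]
    opens, closes = prefix.count("("), prefix.count(")")
    if opens > closes:
        # Unclosed comment: complete it just before the address.
        return f"{prefix}{')' * (opens - closes)} {angle_addr}"
    if closes > opens:
        # Stray ")": keep the leading text as a quoted display name.
        quoted = prefix.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{quoted}" {angle_addr}'
    return addr
```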
+
+7.1.5. Commas in Address Lists
+
+ This use of an errant comma:
+
+ To: <third@example.net, fourth@example.net>
+
+ can usually be interpreted as ending an address, so the above is
+ best interpreted as:
+
+ To: third@example.net, fourth@example.net
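Below is a sketch of this interpretation as a hypothetical helper. It assumes that any obsolete source route (Section 7.1.1) has already been removed. Without that step, a route such as "<@a.example,@b.example:x@c.example>" would also match, because it too has a comma inside angle brackets.

```python
import re

# An angle-addr whose contents include a comma, e.g. "<a@x, b@y>".
_BRACKETED_LIST = re.compile(r"<([^<>]*,[^<>]*)>")


def split_bracketed_list(value: str) -> str:
    """Treat a comma inside angle brackets as ending an address."""
    def unwrap(match) -> str:
        parts = (p.strip() for p in match.group(1).split(","))
        return ", ".join(p for p in parts if p)
    return _BRACKETED_LIST.sub(unwrap, value)
```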
+
+7.1.6. Unbalanced Quotes
+
+ The following use of unbalanced quotation marks:
+
+ To: "Joe <joe@example.com>
+
+ leaves software with no unambiguous interpretation. One possible
+ interpretation is:
+
+ To: "Joe <joe@example.com>"@example.net
+
+ where "example.net" is the domain name or host name of the handling
+ agent making the interpretation. However, the more obvious and
+ likely best interpretation is simply:
+
+ To: "Joe" <joe@example.com>
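The preferred interpretation amounts to closing the quoted display name just before the angle-addr. The minimal sketch below is hypothetical and ignores escaped quotation marks.

```python
def close_unbalanced_quote(addr: str) -> str:
    """Close an unterminated display-name quote just before the angle-addr."""
    if addr.count('"') % 2 == 0:
        return addr  # quotes already balanced
    lt = addr.rfind("<")
    if lt == -1:
        return addr  # no angle-addr to anchor on; leave for other handling
    display = addr[:lt].rstrip()
    return f'{display}" {addr[lt:]}'
```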
+
+7.1.7. Naked Local-Parts
+
+ [MAIL] defines a local-part as the user portion of an email address,
+ and the display-name as the "user-friendly" label that accompanies
+ the address specification.
+
+ Some broken submission agents might introduce messages with only a
+ local-part or only a display-name and no properly formed address.
+ For example:
+
+ To: Joe
+
+ A submission agent ought to reject this or, at a minimum, append "@"
+ followed by its own host name or some other valid name likely to
+ enable a reply to be delivered to the correct mailbox. Where this is
+ not done, an agent receiving such a message will probably be
+ successful by synthesizing a valid header field for evaluation using
+ the techniques described in Section 7.5.2.
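The submission-agent repair can be sketched as shown below. The `domain` parameter is hypothetical and stands for the agent's own host name or another valid name. The helper only touches values that are a bare dot-atom local-part (Section 3.2.3 of [MAIL]). A display name such as "Joe Smith" is left alone, for rejection or for the Section 7.5.2 treatment.

```python
import re

# dot-atom-text: runs of atext separated by single dots.
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
_DOT_ATOM = re.compile(_ATEXT + r"(?:\." + _ATEXT + r")*")


def qualify_naked_local_part(value: str, domain: str) -> str:
    """Append "@domain" to a bare local-part such as "Joe"."""
    s = value.strip()
    if _DOT_ATOM.fullmatch(s):
        return f"{s}@{domain}"
    return s  # already qualified, or not a simple local-part: leave alone
```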
+
+7.2. Non-Header Lines
+
+ Some messages contain a line of text in the header that is not a
+ valid message header field of any kind. For example:
+
+ From: user@example.com {1}
+ To: userpal@example.net {2}
+ Subject: This is your reminder {3}
+ about the football game tonight {4}
+ Date: Wed, 20 Oct 2010 20:53:35 -0400 {5}
+
+ Don't forget to meet us for the tailgate party! {7}
+
+ The cause of this is typically a bug in a message generator of some
+ kind. Line {4} was intended to be a continuation of line {3}; it
+ should have been indented by whitespace as set out in Section 2.2.3
+ of [MAIL].
+
+ This anomaly has varying impacts on processing software, depending on
+ the implementation:
+
+ 1. Some agents choose to separate the header of the message from the
+ body only at the first empty line (that is, a CRLF immediately
+ followed by another CRLF).
+
+ 2. Some agents assume this anomaly should be interpreted to mean the
+ body starts at line {4}, as the end of the header is assumed by
+ encountering something that is not a valid header field or folded
+ portion thereof.
+
+ 3. Some agents assume this should be interpreted as an intended
+ header folding as described above and thus simply append a single
+ space character (ASCII 0x20) and the content of line {4} to that
+ of line {3}.
+
+ 4. Some agents reject this outright as line {4} is neither a valid
+ header field nor a folded continuation of a header field prior to
+ an empty line.
+
+ This can be exploited if it is known that one message handling agent
+ will take one action, while the next agent in the handling chain will
+ take another. Consider, for example, a message filter attached to a
+ Mail Transfer Agent (MTA) implementing option 2 above; the filter
+ searches message headers for properties indicative of abusive or
+ malicious content. An attacker could craft a message that includes this
+ malformation at a position above the property of interest, knowing
+ the MTA will not consider that content part of the header.
+ Consequently, the MTA will not feed it to the filter; thus, it avoids
+ detection. Meanwhile, the Mail User Agent (MUA), which presents the
+ content to an end user, implements option 1 or 3 and so treats the
+ unfiltered lines as header fields, with potentially undesirable effects.
+
+ It should be noted that a few implementations choose option 4 above
+ since any reputable message generation program will get header
+ folding right, and thus anything so blatant as this malformation is
+ likely an error caused by a malefactor.
+
+ The preferred implementation if option 4 above is not employed is to
+ apply the following heuristic when this malformation is detected:
+
+ 1. Search forward for an empty line. If one is found, then apply
+ option 3 above to the anomalous line, and continue.
+
+ 2. Search forward for another line that appears to be a new header
+ field (a name followed by a colon). If one is found, then apply
+ option 3 above to the anomalous line, and continue.
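The heuristic can be sketched as a small header parser. The representation here is an assumption made for illustration. Input is a list of message lines already split on CRLF. Output is a list of (name, value) pairs plus the index of the first body line. The text does not say what to do when neither search succeeds; in that case this sketch assumes the body starts at the anomalous line.

```python
import re

# field-name: printable US-ASCII other than ":" (Section 2.2 of [MAIL]).
_FIELD_START = re.compile(r"[!-9;-~]+:")


def parse_header(lines):
    """Return (fields, body_start) for message lines without CRLF."""
    fields = []
    for i, line in enumerate(lines):
        if line == "":
            return fields, i + 1             # the empty line ends the header
        if line[0] in " \t" and fields:
            name, value = fields[-1]         # properly folded continuation
            fields[-1] = (name, value + line)
        elif _FIELD_START.match(line):
            name, _, value = line.partition(":")
            fields.append((name, value.lstrip(" \t")))
        else:
            # Anomalous line: search forward for an empty line or a line
            # that looks like a new header field.
            rest = lines[i + 1:]
            if fields and ("" in rest or any(_FIELD_START.match(r) for r in rest)):
                name, value = fields[-1]     # option 3: fold with one space
                fields[-1] = (name, value + " " + line)
            else:
                return fields, i             # assumption: body starts here
    return fields, len(lines)
```

Applied to the example above, this folds line {4} into the Subject field, giving "This is your reminder about the football game tonight", and the body begins after the empty line.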
+
+7.3. Unusual Spacing
+
+ The following message is valid per [MAIL]:
+
+ From: user@example.com {1}
+ To: userpal@example.net {2}
+ Subject: This is your reminder {3}
+ {4}
+ about the football game tonight {5}
+ Date: Wed, 20 Oct 2010 20:53:35 -0400 {6}
+
+ Don't forget to meet us for the tailgate party! {8}
+
+ Line {4} contains a single whitespace character. The intended result is that
+ lines {3}, {4}, and {5} comprise a single continued header field.
+ However, some agents are aggressive at stripping trailing whitespace,
+ which will cause line {4} to be treated as an empty line, and thus
+ the separator line between header and body. This can affect header-
+ specific processing algorithms as described in the previous section.
+
+ This example was legal in earlier versions of the Internet message
+ format standard but was rendered obsolete as of [RFC2822] as line {4}
+ could be interpreted as the separator between the header and body.
+
+ The best handling of this example is for a message parsing engine to
+ behave as if line {4} were not present in the message and for a
+ message creation engine to emit the message with line {4} removed.
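Here is a sketch of the parsing side, under two hypothetical assumptions. First, the message is available as a list of lines without CRLF. Second, this step runs before any trailing-whitespace stripping, which would otherwise turn line {4} into an empty line.

```python
def drop_whitespace_only_header_lines(lines):
    """Remove non-empty, whitespace-only lines that precede the header's end."""
    out = []
    in_header = True
    for line in lines:
        if in_header:
            if line == "":
                in_header = False   # a truly empty line ends the header
            elif line.strip(" \t") == "":
                continue            # behave as if the line were not present
        out.append(line)
    return out
```

Whitespace-only lines in the body are kept, since only the header portion is affected.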
+
+7.4. Header Malformations
+
+ Among the many possible malformations, a common one is insertion of
+ whitespace at unusual locations, such as:
+
+ From: user@example.com {1}
+ To: userpal@example.net {2}
+ Subject: This is your reminder {3}
+ MIME-Version : 1.0 {4}
+ Content-Type: text/plain {5}
+ Date: Wed, 20 Oct 2010 20:53:35 -0400 {6}
+
+ Don't forget to meet us for the tailgate party! {8}
+
+ Note the addition of whitespace in line {4} after the header field
+ name but before the colon that separates the name from the value.
+
+ The obsolete grammar of Section 4 of [MAIL] permits that extra
+ whitespace, so it cannot be considered invalid. However, a consensus
+ of implementations prefers to remove that whitespace. There is no
+ perceived change to the semantics of the header field being altered
+ as the whitespace is itself semantically meaningless. Therefore, it
+ is best to remove all whitespace after the field name but before the
+ colon and to emit the field in this modified form.
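The repair is confined to the first line of a field, where the name appears. The hypothetical helper below uses the field-name character range from Section 2.2 of [MAIL], which is printable US-ASCII except the colon.

```python
import re

# A field name followed by spaces or tabs before the colon.
_SPACE_BEFORE_COLON = re.compile(r"^([!-9;-~]+)[ \t]+:")


def strip_space_before_colon(line: str) -> str:
    """Rewrite "Name :" as "Name:" on the first line of a header field."""
    return _SPACE_BEFORE_COLON.sub(r"\1:", line, count=1)
```

Continuation lines begin with whitespace and so never match the pattern, which leaves folded values untouched. As Section 4 notes, this is still a syntax change that some systems, such as DKIM, are sensitive to.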
+
+7.5. Header Field Counts
+
+ Section 3.6 of [MAIL] prescribes specific header field counts for a
+ valid message. Few agents actually enforce these: a message whose
+ header exceeds one or more of the limits set there is generally
+ allowed to pass, although agents typically add any required fields
+ that are missing.
+
+ Also, few agents that use messages as input, including MUAs that
+ actually display messages to users, verify that the input is valid
+ before proceeding. Some popular open-source filtering programs and
+ some popular Mailing List Management (MLM) packages select either the
+ first or last instance of a particular field name, such as From, to
+ decide who sent a message. Absent strict enforcement of [MAIL], an
+ attacker can craft a message with multiple instances of the same
+ fields if that attacker knows the filter will make a decision based
+ on one, but the user will be shown the others.
+
+ This situation is exacerbated when message validity is assessed, such
+ as through enhanced authentication methods like DomainKeys Identified
+ Mail [DKIM]. Such methods might cover one instance of a constrained
+ field but not another, taking the wrong one as "good" or "safe". An
+ MUA, for example, could show the first of two From fields to an end
+ user as "good" or "safe", while an authentication method actually
+ only verified the second.
+
+ In attempting to counter this exposure, one of the following
+ strategies can be used:
+
+ 1. reject outright or refuse to process further any input message
+ that does not conform to Section 3.6 of [MAIL];
+
+ 2. remove or, in the case of an MUA, refuse to render any instances
+ of a header field whose presence exceeds a limit prescribed in
+ Section 3.6 of [MAIL] when generating its output;
+
+ 3. where a field can contain multiple distinct values (such as From)
+ or is free-form text (such as Subject), combine them into a
+ semantically identical, single header field of the same name (see
+ Section 7.5.1);
+
+ 4. alter the name of any header field whose presence exceeds a limit
+ prescribed in Section 3.6 of [MAIL] when generating its output so
+ that later agents can produce a consistent result. Any
+ alteration likely to cause the field to be ignored by downstream
+ agents is acceptable. A common approach is to prefix the field
+ names with a string such as "BAD-".
+
+ When selecting a mitigation action (or some other action) from the
+ above list, an operator must consider its needs and the nature of its
+ user base.
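As an illustration of strategy 4, the following hypothetical helper renames over-limit fields using the "BAD-" prefix mentioned above. It assumes a header already parsed into (name, value) pairs. The sketch renames every instance rather than only the extras. That is a design choice: keeping one instance would reintroduce the question of which one is authoritative.

```python
from collections import Counter

# Fields that the table in Section 3.6 of [MAIL] limits to one instance.
SINGLE_INSTANCE_FIELDS = {
    "date", "from", "sender", "reply-to", "to", "cc", "bcc",
    "message-id", "in-reply-to", "references", "subject",
}


def rename_excess_fields(fields, prefix="BAD-"):
    """Rename every instance of a field that exceeds its limit."""
    counts = Counter(name.lower() for name, _ in fields)
    renamed = []
    for name, value in fields:
        key = name.lower()
        if key in SINGLE_INSTANCE_FIELDS and counts[key] > 1:
            name = prefix + name
        renamed.append((name, value))
    return renamed
```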
+
+7.5.1. Repeated Header Fields
+
+ Repeated fields are sometimes encountered where only one is
+ expected. Two examples follow. First:
+
+ From: reminders@example.com {1}
+ To: jqpublic@example.com {2}
+ Subject: Automatic Meeting Reminder {3}
+ Subject: 4pm Today -- Staff Meeting {4}
+ Date: Wed, 20 Oct 2010 08:00:00 -0700 {5}
+
+ Reminder of the staff meeting today in the small {6}
+ auditorium. Come early! {7}
+
+ The message above has two Subject fields, which is in violation of
+ Section 3.6 of [MAIL]. A safe interpretation of this would be to
+ treat it as though the two Subject field values were concatenated, so
+ long as they are not identical, such as:
+
+ From: reminders@example.com {1}
+ To: jqpublic@example.com {2}
+ Subject: Automatic Meeting Reminder {3}
+ 4pm Today -- Staff Meeting {4}
+ Date: Wed, 20 Oct 2010 08:00:00 -0700 {5}
+
+ Reminder of the staff meeting today in the small {6}
+ auditorium. Come early! {7}
+
+ Second:
+
+ From: president@example.com {1}
+ From: vice-president@example.com {2}
+ To: jqpublic@example.com {3}
+ Subject: A note from the E-Team {4}
+ Date: Wed, 20 Oct 2010 08:00:00 -0700 {5}
+
+ This memo is to remind you of the corporate dress {6}
+ code. Attached you will find an updated copy of {7}
+ the policy. {8}
+ ...
+
+ As with the first example, there is a violation in terms of the
+ number of instances of the From field. A likely safe interpretation
+ would be to combine these into a comma-separated address list in a
+ single From field:
+
+ From: president@example.com, {1}
+ vice-president@example.com {2}
+ To: jqpublic@example.com {3}
+ Subject: A note from the E-Team {4}
+ Date: Wed, 20 Oct 2010 08:00:00 -0700 {5}
+
+ This memo is to remind you of the corporate dress {6}
+ code. Attached you will find an updated copy of {7}
+ the policy. {8}
+ ...
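Both interpretations can be sketched with a hypothetical helper over (name, value) pairs whose values have already been unfolded. Joining Subject values with a single space matches the folded form in the first example once it is unfolded.

```python
def combine_repeated(fields):
    """Merge repeated From or Subject fields into the first instance.

    From values are joined as an address list; Subject values are joined
    with a space. A value identical to one already kept is dropped.
    """
    separators = {"from": ", ", "subject": " "}
    first_index = {}  # lower-case name -> position in result
    kept = {}         # lower-case name -> distinct values seen so far
    result = []
    for name, value in fields:
        key = name.lower()
        if key not in separators:
            result.append((name, value))
        elif key not in first_index:
            first_index[key] = len(result)
            kept[key] = [value]
            result.append((name, value))
        elif value not in kept[key]:
            kept[key].append(value)
            idx = first_index[key]
            result[idx] = (result[idx][0], separators[key].join(kept[key]))
    return result
```

A From field holding more than one mailbox also requires a Sender field under Section 3.6.2 of [MAIL]. This sketch does not add one.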
+
+7.5.2. Missing Header Fields
+
+ Similar to the previous section, there are messages seen in the wild
+ that lack certain required header fields. In particular, [MAIL]
+ requires that From and Date fields be present in all messages.
+
+ When presented with a message lacking these fields, an MTA might
+ perform one of the following:
+
+ 1. Make no changes.
+
+ 2. Add an instance of the missing field(s) using synthesized content
+ based on data provided in other parts of the protocol.
+
+ Option 2 is recommended for handling this case. Handling agents
+ should add these for internal handling if they are missing, but
+ should not add them to the external representation. The reason for
+ this advice is that there are some filter modules that would consider
+ the absence of such fields to be a condition warranting special
+ treatment (for example, rejection), and thus the effectiveness of
+ such modules would be stymied by an upstream filter adding them in a
+ way visible to other components.
+
+ The synthesized fields should contain a best guess as to what should
+ have been there; for From, the SMTP MAIL command's address can be
+ used (if not null) or a placeholder address followed by an address
+ literal (for example, unknown@[192.0.2.1]); for Date, a date
+ extracted from a Received field is a reasonable choice.
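+
+ A minimal Python sketch of this synthesis follows. It assumes the
+ agent knows the SMTP MAIL address (empty for a null reverse-path)
+ and the client's IPv4 address; the function name and parameters are
+ hypothetical. The values are returned separately rather than written
+ into the message, in keeping with the advice to keep them internal.
+ Taking the date from the oldest Received field is a design choice:
+ that field is closest to origination, but it is also the one most
+ easily forged.
+
```python
from email.message import Message
from email.utils import format_datetime, parsedate_to_datetime


def synthesize_internal_fields(msg: Message, mail_from: str,
                               client_ip: str) -> dict[str, str]:
    """Best-guess From/Date values for internal handling only.

    The message itself is not modified, so downstream filters still
    see that the fields were absent.
    """
    synthesized: dict[str, str] = {}
    if msg.get("From") is None:
        # SMTP MAIL address if non-null, else a placeholder plus an
        # address literal (IPv4 assumed here).
        synthesized["From"] = mail_from or f"unknown@[{client_ip}]"
    if msg.get("Date") is None:
        # Oldest Received field first: its date is closest to origination.
        for received in reversed(msg.get_all("Received", [])):
            _, sep, stamp = str(received).rpartition(";")
            if not sep:
                continue  # no date clause in this Received field
            try:
                when = parsedate_to_datetime(stamp.strip())
            except (TypeError, ValueError):
                continue  # unparseable date; try the next field
            synthesized["Date"] = format_datetime(when)
            break
    return synthesized
```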
+
+ One other important case to consider is a missing Message-ID field.
+ An MTA that encounters a message missing this field should synthesize
+ a valid one and add it to the external representation, since many
+ deployed tools use the content of that field as a unique message
+ reference; its absence therefore inhibits correlation of message
+ processing. Section 3.6.4 of [MAIL] describes advisable practice for
+ synthesizing the content of this field when it is absent, and
+ establishes a requirement that it be globally unique.
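+
+ In Python, email.utils.make_msgid() generates a value in the
+ required angle-bracketed form, combining a timestamp, the process ID,
+ and a random number. The sketch below assumes the MTA passes a domain
+ it is entitled to use; ensure_message_id is a hypothetical name.
+
```python
from email.message import Message
from email.utils import make_msgid


def ensure_message_id(msg: Message, local_domain: str) -> str:
    """Add a Message-ID if none is present and return the effective value.

    Unlike the internal-only From/Date synthesis, this field is added
    to the external representation so downstream tools can correlate
    the message.
    """
    existing = msg.get("Message-ID")
    if existing is not None:
        return existing
    new_id = make_msgid(domain=local_domain)  # e.g. "<...@mta.example.net>"
    msg["Message-ID"] = new_id
    return new_id
```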
+
+7.5.3. Return-Path
+
+ While legitimate messages can contain more than one Return-Path
+ header field, such usage is often an error rather than a valid
+ message containing multiple header field blocks as described in
+ Section 3.6 of [MAIL]. Accordingly, when a message containing
+ multiple Return-Path header fields is encountered, all but the
+ topmost one are to be disregarded, since the topmost one is most
+ likely to have been added nearest to the mailbox that received the
+ message.
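+
+ With Python's email package, get_all() returns field values in
+ header order, so selecting the topmost Return-Path amounts to taking
+ the first element. A sketch, with a hypothetical function name:
+
```python
from typing import Optional
from email.message import Message


def effective_return_path(msg: Message) -> Optional[str]:
    """Return the topmost Return-Path value, ignoring any others.

    get_all() yields fields in header order, so index 0 is the topmost
    instance, the one most likely added at final delivery.
    """
    paths = msg.get_all("Return-Path")
    return paths[0] if paths else None
```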
+
+7.6. Missing or Incorrect Charset Information
+
+ MIME provides the means to include textual material employing
+ character sets ("charsets") other than US-ASCII. Such material is
+ required to have an identified charset. Charset identification is
+ done using a "charset" parameter in the Content-Type header field, a
+ charset label within the MIME entity itself, or the charset can be
+ implicitly specified by the Content-Type (see [CHARSET]).
+
+ Unfortunately, it is fairly common for required character set
+ information to be missing or incorrect in textual MIME entities. As
+ such, processing agents should perform basic sanity checks, such as:
+
+ o US-ASCII contains bytes between 1 and 127 inclusive only
+ (colloquially, "7-bit" data), so material including bytes outside
+ of that range ("8-bit" data) is necessarily not US-ASCII. (See
+ Section 2.1 of [MAIL].)
+
+ o [UTF-8] has a very specific syntactic structure that other 8-bit
+ charsets are unlikely to follow.
+
+ o Null bytes (ASCII 0x00) are not allowed in either 7-bit or 8-bit
+ data.
+
+ o Not all 7-bit material is US-ASCII. The presence of the various
+ escape sequences used for character switching can be used as an
+ indication of the various charsets based on ISO/IEC 2022
+ [ISO-2022], such as those defined in [ISO-2022-CN], [ISO-2022-JP],
+ and [ISO-2022-KR].
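+
+ The checks above can be sketched in Python as follows. The escape
+ sequences listed are designations defined in [ISO-2022-JP],
+ [ISO-2022-KR], and [ISO-2022-CN], but the list is illustrative
+ rather than complete. The function name and problem strings are
+ hypothetical; a caller with no declared charset might pass
+ "us-ascii", the [MIME] default for text/plain.
+
```python
# Designation escapes from RFC 1468 (JP), RFC 1557 (KR) and RFC 1922 (CN).
# Illustrative, not exhaustive.
ISO_2022_ESCAPES = (
    b"\x1b$B", b"\x1b$@", b"\x1b(J",  # ISO-2022-JP
    b"\x1b$)C",                       # ISO-2022-KR
    b"\x1b$)A", b"\x1b$)G",           # ISO-2022-CN
)


def charset_problems(declared: str, body: bytes) -> list[str]:
    """Return the sanity-check failures for a textual MIME entity."""
    problems = []
    label = declared.strip().lower()
    eight_bit = any(byte > 0x7F for byte in body)
    if b"\x00" in body:
        problems.append("null bytes present")
    if label == "us-ascii":
        if eight_bit:
            # US-ASCII is bytes 1..127 only
            problems.append("8-bit data labelled US-ASCII")
        elif any(esc in body for esc in ISO_2022_ESCAPES):
            # 7-bit, but the escapes point to an ISO 2022 charset
            problems.append("ISO-2022 escape sequences in data labelled US-ASCII")
    elif label == "utf-8":
        try:
            body.decode("utf-8")  # strict: rejects malformed sequences
        except UnicodeDecodeError:
            problems.append("data is not valid UTF-8")
    return problems
```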
+
+ When a character set error is detected, processing agents should:
+
+ 1. apply heuristics to determine the most likely character set and,
+ if successful, proceed using that information; or
+
+ 2. refuse to process the malformed MIME entity.
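+
+ One possible shape for this logic, sketched in Python: the declared
+ charset is tried first, then a short list of fallbacks, and a
+ ValueError signals that the entity should be refused. The fallback
+ list (UTF-8, then Windows-1252) is a hypothetical example; real
+ heuristics are usually richer. Strict decoding matters here, since a
+ charset such as ISO-8859-1 accepts every byte sequence and would
+ never report an error.
+
```python
from typing import Optional

# Hypothetical fallback order; a real agent might weigh locale, the
# sender's domain, or statistical detection instead.
FALLBACK_CHARSETS = ("utf-8", "cp1252")


def decode_text_entity(body: bytes, declared: Optional[str]) -> tuple[str, str]:
    """Decode using the declared charset, else the first plausible guess.

    Returns (charset_used, text); raises ValueError to signal that the
    entity should be refused.
    """
    candidates = ([declared] if declared else []) + list(FALLBACK_CHARSETS)
    for name in candidates:
        try:
            return name, body.decode(name)  # strict: any error rejects
        except (LookupError, UnicodeDecodeError):
            continue  # unknown label or wrong guess; try the next one
    raise ValueError("no plausible charset; refusing malformed MIME entity")
```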
+
+ A null byte inside a textual MIME entity can cause typical string
+ processing functions to misidentify the end of a string, which can be
+ exploited to hide malicious content from analysis processes.
+ Accordingly, null bytes require additional special handling.
+
+ A few null bytes in isolation are likely to be the result of poor
+ message construction practices. Such nulls should be silently
+ dropped.
+
+ Large numbers of null bytes are usually the result of binary material
+ that is improperly encoded, improperly labeled, or both. Such
+ material is likely to be damaged beyond the hope of recovery, so the
+ best course of action is to refuse to process it.
+
+ Finally, the presence of null bytes may be used as indication of
+ possible malicious intent.
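+
+ A Python sketch of this null-byte policy follows. The threshold
+ separating "a few" nulls from "large numbers" is not specified here,
+ so MAX_ISOLATED_NULLS is a hypothetical value, as is the choice to
+ treat any run of adjacent nulls as binary data.
+
```python
# Hypothetical cut-off between "a few isolated nulls" and binary data.
MAX_ISOLATED_NULLS = 3


def strip_nulls(body: bytes) -> tuple[bytes, bool]:
    """Apply the null-byte policy to a textual MIME entity.

    Returns (cleaned_body, suspicious). A handful of isolated nulls is
    silently dropped but still flagged, since nulls may indicate
    malicious intent. Many nulls, or runs of them, suggest mislabelled
    binary data and raise ValueError (refuse to process).
    """
    count = body.count(b"\x00")
    if count == 0:
        return body, False
    if count > MAX_ISOLATED_NULLS or b"\x00\x00" in body:
        raise ValueError("too many null bytes; refusing to process entity")
    return body.replace(b"\x00", b""), True
```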
+
+7.7. Eight-Bit Data
+
+ Standards-compliant email messages do not contain any non-ASCII data
+ without indicating that such content is present by means of published
+ SMTP extensions. Absent that, MIME encodings are typically used to
+ convert non-ASCII data to ASCII in a way that can be reversed by
+ other handling agents or end users.
+
+ The best way to handle non-compliant 8-bit material depends on its
+ location.
+
+ Non-compliant 8-bit material in MIME entity content should simply be
+ processed as if the necessary SMTP extensions had been used to
+ transfer the message. Note that improperly labeled 8-bit material in
+ textual MIME entities may require treatment as described in
+ Section 7.6.
+
+ Non-compliant 8-bit material in message or MIME entity header fields
+ can be handled as follows:
+
+ 1. Occurrences in unstructured text fields, comments, and phrases
+ can be converted into encoded-words (see [MIME3]) if a likely
+ character set can be determined. Alternatively, 8-bit
+ characters can be removed or replaced with some other character.
+
+ 2. Occurrences in header fields whose syntax is unknown may be
+ handled by dropping the field entirely or by removing/replacing
+ the 8-bit character as described above.
+
+ 3. Occurrences in addresses are especially problematic. Agents
+ supporting [EAI] may, if the 8-bit material is valid UTF-8,
+ elect to treat the message as an EAI message and process it
+ accordingly. Otherwise, in most cases, it is best to exclude
+ the address from any sort of processing -- which may mean
+ dropping it entirely -- since any attempt to fix it definitively
+ is unlikely to be successful.
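+
+ The first option can be sketched with Python's email.header module,
+ whose Header class produces [MIME3] encoded-words. The sketch
+ assumes UTF-8 as the likely character set and "?" as the replacement
+ character; both are hypothetical choices, and the function name is
+ illustrative. Only unstructured field values are handled; comments
+ and phrases would first need to be isolated from the surrounding
+ structured syntax.
+
```python
from email.header import Header


def repair_unstructured(raw: bytes, likely_charset: str = "utf-8") -> str:
    """Make an unstructured header value 7-bit clean.

    If the raw bytes decode under the likely charset, the value is
    re-expressed as encoded-words; otherwise each 8-bit byte is
    replaced with "?".
    """
    if raw.isascii():
        return raw.decode("ascii")  # already compliant
    try:
        text = raw.decode(likely_charset)  # strict decoding
    except (LookupError, UnicodeDecodeError):
        # No plausible charset: replace the offending 8-bit bytes.
        return "".join(chr(b) if b < 0x80 else "?" for b in raw)
    return Header(text, likely_charset).encode()
```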
+
+8. MIME Anomalies
+
+ The five-part set of MIME specifications defines extensions to the
+ message format for text in character sets other than ASCII, non-text
+ attachments, multipart message bodies, and similar facilities.
+
+ Anomalies in the generation of MIME-compliant messages are also
+ common. This section discusses some of them and presents preferred
+ methods of mitigation.
+
+8.1. Missing MIME-Version Field
+
+ Any message that uses [MIME] constructs is required to have a MIME-
+ Version header field. Without it, the Content-Type and associated
+ fields have no semantic meaning.
+
+ It is often observed that a message has complete MIME structure, yet
+ lacks this header field. It is prudent to disregard this absence and
+ conduct analysis of the message as if it were present, especially by
+ agents attempting to identify malicious material.
+
+ Further, the absence of MIME-Version might be an indication of
+ malicious intent, and extra scrutiny of the message may be warranted.
+ Such omissions are not expected from compliant message generators.
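+
+ Python's email package, for example, interprets Content-Type whether
+ or not MIME-Version is present, so an analyzer built on it already
+ parses the structure; the sketch below additionally flags the
+ omission for extra scrutiny. The function name is hypothetical.
+
```python
from email import message_from_string
from email.message import Message


def parse_for_analysis(raw: str) -> tuple[Message, bool]:
    """Parse a message, treating its MIME structure as meaningful regardless.

    Returns (message, suspicious); suspicious is True when MIME header
    fields appear without a MIME-Version field.
    """
    msg = message_from_string(raw)
    has_mime_fields = any(
        msg.get(name) is not None
        for name in ("Content-Type", "Content-Transfer-Encoding")
    )
    suspicious = has_mime_fields and msg.get("MIME-Version") is None
    return msg, suspicious
```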
+
+8.2. Faulty Encodings
+
+ There have been a few different specifications of base64 in the past.
+ The implementation defined in [MIME] instructs decoders to discard
+ characters that are not part of the base64 alphabet. Other
+ implementations consider an encoded body containing such characters
+ to be completely invalid. Very early specifications of base64 (see
+ [PEM89], for example, which was later obsoleted by [PEM93]) allowed
+ email-style comments within base64-encoded data.
+
+ The attack vector here involves constructing a base64 body whose
+ meaning varies given different possible decodings. If a security
+ analysis module wishes to be thorough, it should consider scanning
+ the possible outputs of the known decoding dialects in an attempt to
+ anticipate how the MUA will interpret the data.
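+
+ In Python, base64.b64decode() can model two of these dialects: with
+ validate=False it discards non-alphabet characters as [MIME]
+ instructs, and with validate=True it rejects them. The sketch below,
+ with a hypothetical function name, returns both results so that each
+ distinct output can be scanned; the [PEM89] comment syntax is not
+ modelled.
+
```python
import base64
import binascii
from typing import Optional


def base64_dialects(body: bytes) -> dict[str, Optional[bytes]]:
    """Decode a base64 body the ways different decoders might.

    "lenient" discards characters outside the alphabet, as [MIME]
    instructs; "strict" rejects such bodies outright (line breaks are
    still allowed). None means that dialect considers the body invalid.
    """
    results: dict[str, Optional[bytes]] = {}
    try:
        results["lenient"] = base64.b64decode(body, validate=False)
    except binascii.Error:
        results["lenient"] = None  # e.g. bad padding even after discarding
    unfolded = body.replace(b"\r", b"").replace(b"\n", b"")
    try:
        results["strict"] = base64.b64decode(unfolded, validate=True)
    except binascii.Error:
        results["strict"] = None
    return results
```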
+
+9. Body Anomalies
+
+9.1. Oversized Lines
+
+ A message containing a line of content that exceeds 998 characters,
+ not counting the CRLF line terminator (that is, 1000 characters in
+ total), violates Section 2.1.1 of [MAIL]. Some handling agents may
+ not look at content in a single line past the first 998 bytes,
+ providing bad actors an opportunity to hide malicious content.
+
+ There is no specified way to handle such messages, other than to
+ observe that they are non-compliant and reject them or rewrite the
+ oversized line such that the message is compliant.
+
+ To ensure long lines do not prevent analysis of potentially malicious
+ data, handling agents are strongly encouraged to take one of the
+ following actions:
+
+ 1. Break such lines into multiple lines at a position that does not
+ change the semantics of the text being thus altered. For
+ example, break an oversized line at a position such that a [URI]
+ does not span two lines (which could inhibit the proper
+ identification of the URI).
+
+ 2. Rewrite the MIME part (or the entire message if not MIME) that
+ contains the excessively long line using a content encoding that
+ breaks the line in the transmission but would still result in the
+ line being intact on decoding for presentation to the user. Both
+ of the encodings declared in [MIME] can accomplish this.
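+
+ Both options are sketched below in Python. Option 1 splits only at
+ spaces, on the assumption that the text is plain prose in which a
+ space is a safe break point; URIs cannot contain spaces, so none is
+ split. Option 2 uses quoted-printable via the standard quopri module,
+ whose soft line breaks are removed on decoding. Function names are
+ hypothetical, and the limit is counted in characters, so non-ASCII
+ text would need a byte-based count.
+
```python
import quopri

MAX_LINE = 998  # characters, excluding CRLF, per Section 2.1.1 of [MAIL]


def break_long_line(line: str, limit: int = MAX_LINE) -> list[str]:
    """Option 1: split only at spaces, so no URI is broken across lines.

    A token longer than the limit (for example a very long URI) is left
    intact; the caller should then fall back to option 2.
    """
    pieces = []
    while len(line) > limit:
        cut = line.rfind(" ", 0, limit + 1)  # last space within the limit
        if cut <= 0:
            break  # no safe break point
        pieces.append(line[:cut])
        line = line[cut + 1:]  # the space itself becomes the line break
    pieces.append(line)
    return pieces


def reencode_part(body: bytes) -> bytes:
    """Option 2: quoted-printable soft line breaks keep transmitted lines
    short, yet decoding restores the original long line."""
    return quopri.encodestring(body)
```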
+
+10. Security Considerations
+
+ The discussions of the anomalies above and their prescribed solutions
+ are themselves security considerations. The practices enumerated in
+ this document are generally perceived as attempts to resolve security
+ considerations that already exist rather than introducing new ones.
+ However, some of the attacks described here may not have appeared in
+ previous email specifications.
+
+11. References
+
+11.1. Normative References
+
+ [EMAIL-ARCH] Crocker, D., "Internet Mail Architecture", RFC 5598,
+ July 2009.
+
+ [MAIL] Resnick, P., Ed., "Internet Message Format", RFC 5322,
+ October 2008.
+
+ [MIME] Freed, N. and N. Borenstein, "Multipurpose Internet
+ Mail Extensions (MIME) Part One: Format of Internet
+ Message Bodies", RFC 2045, November 1996.
+
+11.2. Informative References
+
+ [BINARYSMTP] Vaudreuil, G., "SMTP Service Extensions for
+ Transmission of Large and Binary MIME Messages", RFC
+ 3030, December 2000.
+
+ [CHARSET] Melnikov, A. and J. Reschke, "Update to MIME regarding
+ "charset" Parameter Handling in Textual Media Types",
+ RFC 6657, July 2012.
+
+ [DKIM] Crocker, D., Ed., Hansen, T., Ed., and M. Kucherawy,
+ Ed., "DomainKeys Identified Mail (DKIM) Signatures",
+ RFC 6376, September 2011.
+
+ [DSN] Moore, K. and G. Vaudreuil, "An Extensible Message
+ Format for Delivery Status Notifications", RFC 3464,
+ January 2003.
+
+ [EAI] Yang, A., Steele, S., and N. Freed, "Internationalized
+ Email Headers", RFC 6532, February 2012.
+
+ [ISO-2022-CN] Zhu, HF., Hu, DY., Wang, ZG., Kao, TC., Chang, WCH.,
+ and M. Crispin, "Chinese Character Encoding for
+ Internet Messages", RFC 1922, March 1996.
+
+ [ISO-2022-JP] Murai, J., Crispin, M., and E. van der Poel, "Japanese
+ Character Encoding for Internet Messages", RFC 1468,
+ June 1993.
+
+ [ISO-2022-KR] Choi, U., Chon, K., and H. Park, "Korean Character
+ Encoding for Internet Messages", RFC 1557, December
+ 1993.
+
+ [ISO-2022] ISO/IEC, "Information technology -- Character code
+ structure and extension techniques", ISO/IEC 2022,
+ 1994, <http://www.iso.org/iso/
+ catalogue_detail.htm?csnumber=22747>.
+
+ [MIME3] Moore, K., "MIME (Multipurpose Internet Mail
+ Extensions) Part Three: Message Header Extensions for
+ Non-ASCII Text", RFC 2047, November 1996.
+
+ [PEM89] Linn, J., "Privacy Enhancement for Internet Electronic
+ Mail: Part I -- Message Encipherment and Authentication
+ Procedures", RFC 1113, August 1989.
+
+ [PEM93] Linn, J., "Privacy Enhancement for Internet Electronic
+ Mail: Part I: Message Encryption and Authentication
+ Procedures", RFC 1421, February 1993.
+
+ [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts --
+ Communication Layers", RFC 1122, October 1989.
+
+ [RFC2822] Resnick, P., Ed., "Internet Message Format", RFC 2822,
+ April 2001.
+
+ [RFC733] Crocker, D., Vittal, J., Pogran, K., and D. Henderson,
+ Jr., "Standard for the Format of Internet Text
+ Messages", RFC 733, November 1977.
+
+ [SMTP] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
+ October 2008.
+
+ [URI] Berners-Lee, T., Fielding, R., and L. Masinter,
+ "Uniform Resource Identifier (URI): Generic Syntax",
+ RFC 3986, January 2005.
+
+ [UTF-8] Yergeau, F., "UTF-8, a transformation format of ISO
+ 10646", RFC 3629, November 2003.
+
+Appendix A. Acknowledgements
+
+ The authors wish to acknowledge the following for their review and
+ constructive criticism of this proposal: Dave Cridland, Dave Crocker,
+ Jim Galvin, Tony Hansen, John Levine, Franck Martin, Alexey Melnikov,
+ and Timo Sirainen.
+
+Authors' Addresses
+
+ Murray S. Kucherawy
+
+ EMail: superuser@gmail.com
+
+
+ Gregory N. Shapiro
+
+ EMail: gshapiro@proofpoint.com
+
+
+ Ned Freed
+
+ EMail: ned.freed@mrochek.com
+